Sunday, March 18, 2012

I think of the child, Surrounded by soldiers, I think of the child, Who asks why (Non Non Rien N'a Changé - Les Poppys)

After eight years developing server and embedded applications with Hibernate as the ORM, racking my brain for ways to improve Hibernate performance, reading blogs and attending conferences, I have decided to share the knowledge I acquired during these years with you.

This is the first of many posts to come.

Last year I went to Devoxx as a speaker, but I also attended Patrycja Wegrzynowicz's talk on Hibernate anti-patterns. In that presentation Patrycja showed an anti-pattern that shocked me, because it proved that you should expect the unexpected.

We are going to see what happens when Hibernate detects a dirty collection and has to re-create it.

Let's start with the model we are going to use: just two classes, Starship and Officer, related by a one-to-many association.

In these classes, we should pay attention to three important points:
  • we are annotating at property level (on getters) instead of field level.
  • @OneToMany and @ManyToOne use default options (apart from the cascade definition).
  • the officers getter on the Starship class returns an unmodifiable list.
To test the model configuration, we are going to write a test that creates and persists one Starship and seven Officers and then, in a different Transaction and EntityManager, finds the created Starship.

Now that we have written this test, we can run it and observe the Hibernate console output.

Look at the number of queries executed during the first commit (persisting objects) and during the commit of the second transaction (finding a Starship). In total, ignoring the sequence generator, we can count 22 inserts, 2 selects and 1 delete: not bad when we are only creating 8 objects and doing 1 find by primary key.

At this point let's examine why these SQL queries are executed:

The first eight inserts are unavoidable; they are required to insert the data into the database.

The next seven inserts are required because we have annotated the getOfficers property without a mappedBy attribute. If we look closely at the Hibernate documentation, it tells us that “Without describing any physical mapping, a unidirectional one to many with join table is used.”

The next group of queries is even stranger. The first select statement finds the Starship by id, but why are rows that we have already created being deleted and inserted again?

During commit, Hibernate checks whether collection properties are dirty by comparing object references. When a collection is marked as dirty, Hibernate needs to re-create the whole collection, even if it contains the same objects. In our case, the officers getter returns a different collection instance, specifically an unmodifiable list wrapping Hibernate's collection, so Hibernate considers the officers collection dirty.

Because a join table is used, the Starship_Officer table has to be re-created: the previously inserted tuples are deleted and the new ones are inserted (although they have the same values).
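The reference check described above can be illustrated without Hibernate at all. The sketch below is plain Java, not Hibernate's real flush code: StarshipSketch, ReferenceDirtyCheck and the use of String in place of Officer are hypothetical simplifications. It shows that a getter built on Collections.unmodifiableList hands back a new wrapper object on every call, so an identity comparison against the instance that was originally stored always reports a change, even though the contents are equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the Starship entity (String replaces Officer).
class StarshipSketch {
    // Plays the role of the collection instance Hibernate injects and remembers.
    private final List<String> officers = new ArrayList<>();

    // Same shape as the getter in the post: a NEW unmodifiable wrapper per call.
    List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }

    // Direct access to the stored instance, for comparison.
    List<String> storedOfficers() {
        return officers;
    }
}

class ReferenceDirtyCheck {
    // Simplified identity check: "dirty" whenever the instance is not the
    // remembered one, regardless of whether the contents are equal.
    static boolean looksDirty(Object remembered, Object current) {
        return remembered != current;
    }
}
```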

Let's try to fix this problem. We start by mapping a bidirectional one-to-many association, with the many-to-one side as the owning side.

Now we rerun the same test and inspect the output again.

Although we have reduced the number of SQL statements from 25 to 10, we still have an unnecessary query: the one in the commit section of the second transaction. If officers are lazy by default (JPA specification) and we never access the officers in the transaction, why does Hibernate execute a select on the Officers table? For the same reason as with the previous configuration: the returned collection is a different Java object, so Hibernate treats it as a newly instantiated collection, although now, obviously, join table operations are no longer required. We have reduced the number of queries, but we still have a performance problem. We need some other solution, and it is not the most obvious one: we are not going to return the collection object that Hibernate gave us (we might expand on this later); instead, we are going to change where the annotations are placed.

What we are going to do is switch from property mapping to field mapping: we simply move all annotations from the getters to the class attributes (fields).

And finally we run the test again and see what happens.

Why does Hibernate run queries during commit with property mapping, but not with field mapping? When a Transaction is committed, Hibernate executes a flush to synchronize the underlying persistent store with the persistable state held in memory. When property mapping is used, Hibernate calls the getter/setter methods to synchronize data, and the getOfficers method returns a collection that is treated as dirty (because of the unmodifiableList call). On the other hand, with field mapping Hibernate reads the field directly, so the collection is not considered dirty and no re-creation is required.
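A rough way to see the difference between the two access strategies is to read the same property both ways with reflection. This is only an approximation of what Hibernate does internally; Ship, readViaProperty and readViaField are hypothetical names, and the JPA annotations are left out so the example runs on a plain JDK.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entity-like class; JPA annotations omitted so it runs on a plain JDK.
class Ship {
    private List<String> officers = new ArrayList<>();

    public List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }
}

class AccessStrategyDemo {
    // Approximates property access: the value is obtained by calling the getter.
    static Object readViaProperty(Object entity, String getterName) throws ReflectiveOperationException {
        Method getter = entity.getClass().getMethod(getterName);
        return getter.invoke(entity);
    }

    // Approximates field access: the field is read directly, bypassing the getter.
    static Object readViaField(Object entity, String fieldName) throws ReflectiveOperationException {
        Field field = entity.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);
        return field.get(entity);
    }
}
```

Reading through the getter yields a fresh unmodifiable wrapper each time, while reading the field returns the instance that was stored, which is why field access leaves the collection untouched during flush.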

But we have not finished yet. I suppose you are wondering why we did not simply remove Collections.unmodifiableList from the getter and return the Hibernate collection. I agree that it would have been quicker, and the change would look like @OneToMany(cascade={CascadeType.ALL}) public List<Officer> getOfficers() { return officers; }, but returning the original collection creates an encapsulation problem; in fact, it breaks encapsulation! Callers could add anything they like to the mutable list, applying uncontrolled changes to the internal state of the object.

Using an unmodifiableList is one way to avoid breaking encapsulation, but of course we could also have used different accessors for public access and for Hibernate access, without calling Collections.unmodifiableList in the one Hibernate uses.
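One way to keep both concerns apart is sketched below, assuming hypothetical names (EncapsulatedStarship, getOfficersForPersistence, addOfficer) and String standing in for Officer. The stored list is only handed out through a protected accessor, application code gets a read-only view, and changes go through an explicit method. The comments note where the JPA annotations would go; they are omitted so the class compiles without a JPA provider.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: String stands in for Officer, JPA annotations omitted.
class EncapsulatedStarship {
    // With field access, the mapping annotations would sit on this field
    // (e.g. @OneToMany(mappedBy = "starship", cascade = CascadeType.ALL))
    // and Hibernate would read and write it directly.
    private List<String> officers = new ArrayList<>();

    // Persistence-only accessors: they expose the real instance, so they are not public.
    // With property access, the mapping would go on this getter instead.
    protected List<String> getOfficersForPersistence() {
        return officers;
    }

    protected void setOfficersForPersistence(List<String> officers) {
        this.officers = officers;
    }

    // Read-only view for application code. Under property access this getter
    // would need @Transient so it is not mapped as a second property.
    public List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }

    // The single, controlled way to change the collection from outside.
    public void addOfficer(String officer) {
        if (officer == null) {
            throw new IllegalArgumentException("officer must not be null");
        }
        officers.add(officer);
    }
}
```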

Considering what we have seen today, I suggest always using field annotations instead of property mapping; it will save you from plenty of surprises.

Hope you have found this post useful.

Tuesday, March 06, 2012

Keep 'em laughing as you go, Just remember that the last laugh is on you, And always look on the bright side of life..., Always look on the right side of life... (Always Look on the Bright Side of Life - Monty Python)

Integration tests are tests in which individual modules are combined and tested as a whole. Moreover, integration tests may use system-dependent values, access external systems like the file system, a database or web services, and test multiple aspects in one test case. We can say they are high-level tests.

This differs from unit tests, where only a single component is tested. Unit tests run in isolation, mocking out external components or using an in-memory database in the case of DAO layers. A unit test should be:
  • Repeatable.
  • Consistent.
  • In Memory.
  • Fast.
  • Self-validating.
  • Testing a single concept.

The problem when writing tests is how to test rare (or atypical) conditions, like "No disk space" when accessing the file system, or "Connection lost" when executing a database query.

In unit testing this is not a problem: you can mock that component (database connection or file system access) so that it produces the required output, such as throwing an IOException.
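For comparison, this is what the unit-testing approach can look like: the collaborator is replaced by a hand-written stub that fails the way a full disk would. The sketch assumes the code under test receives a java.io.Writer; FullDiskWriter, LocalSaver and saveContent are hypothetical names, not part of the post's example.

```java
import java.io.IOException;
import java.io.Writer;

// Hand-written stub: every write fails as if the disk were full.
class FullDiskWriter extends Writer {
    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        throw new IOException("No space left on device");
    }

    @Override
    public void flush() {
        // nothing buffered, nothing to flush
    }

    @Override
    public void close() {
        // nothing to release
    }
}

class LocalSaver {
    // Hypothetical code under test: reports failure instead of propagating it.
    static boolean saveContent(Writer writer, String content) {
        try {
            writer.write(content);
            writer.flush();
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

This only works if the design lets you pass the stub in, and the real file-system code is never executed, which is exactly the gap that matters in integration tests.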

The problem becomes "harder" with integration tests. It would be strange to mock a component when what you really want to do is validate the real system. So at this point I see two possibilities:
  • Creating a partial mock.
  • Using fault injection.
In this post I am going to show you how to use the fault injection approach to test unusual error situations.

Fault injection is a technique that involves changing the application code under test at specific locations. These modifications introduce faults that exercise error-handling code paths which would otherwise rarely be followed.

I am going to show how to apply fault injection with Byteman in a JUnit test, and how to run it with Maven.

Let's start coding. Imagine you need to write a backup module that saves a string into a local file, but if the hard disk is full (an IOException is thrown), the content is sent to a remote server.

First we are going to code a class that writes content into a file.

The next class sends data through a socket; it is not shown because it is not necessary for this example.

And finally, the backup service responsible for implementing the described behavior.
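The three pieces described above can be sketched as follows. This is a reconstruction under assumptions, not the post's original code: the method name createFileWithContent comes from the text, but its (File, String) signature, the RemoteSender interface standing in for the unshown socket class, and the BackupService constructor are hypothetical. The write call on BufferedWriter inside createFileWithContent is the call the test's targetLocation points at.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Writes content into a local file. The writer.write(...) call is where a
// disk-full IOException would surface in production.
class FileUtils {
    void createFileWithContent(File file, String content) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
            writer.write(content);
        }
    }
}

// Hypothetical stand-in for the socket-based class that the post does not show.
interface RemoteSender {
    void send(String content);
}

// Saves locally and falls back to the remote server when the local write fails.
class BackupService {
    private final FileUtils fileUtils;
    private final RemoteSender remoteSender;

    BackupService(FileUtils fileUtils, RemoteSender remoteSender) {
        this.fileUtils = fileUtils;
        this.remoteSender = remoteSender;
    }

    void backup(File target, String content) {
        try {
            fileUtils.createFileWithContent(target, content);
        } catch (IOException e) {
            // e.g. disk full: keep the data by sending it elsewhere
            remoteSender.send(content);
        }
    }
}
```

Without Byteman, the fallback can only be reached through failures that are easy to provoke, such as passing a directory as the target; a full disk during the write itself is exactly the case that is hard to reproduce.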

And now it is testing time. First of all, a brief introduction to Byteman.

Byteman is a tool that allows you to insert or modify code in an application at runtime. These modifications can be used to inject code into your compiled application that causes unusual or unexpected operations (aka fault injection).

Byteman uses a clear, simple scripting language, based on a formalism called Event Condition Action (ECA) rules to specify where, when and how the original Java code should be transformed.

An example of ECA script is:

But Byteman also supports annotations, and in my opinion annotations are a better approach than a script file, because just by reading your test case you can understand exactly what you are testing. Otherwise you have to switch context from the test class to the script file to understand what you are testing.

So let's create an integration test that validates that, when an IOException is thrown while writing content to disk, the data is sent to a server.

Note that BMUnitRunner (a special JUnit runner that comes with Byteman) is required.

The first test, aFileWithContentShouldBeCreated, is a standard test that writes "Hello world" into the backup file.

But the second one, dataShouldBeSentToServerInCaseOfIOException, has a BMRule annotation, which specifies when, where and what code should be injected. The first parameter is the name of the rule, in this case a description of what we are going to do (throwing an IOException). The next attributes, targetClass and targetMethod, configure where the injected code is added, in this case the FileUtils.createFileWithContent method. The next attribute, targetLocation, is the point within that method where the code is inserted; in our case, where createFileWithContent calls the write method of BufferedWriter. And finally, the action, which in this test is, obviously, throwing an IOException.

Now you can go to your IDE and run them, and all tests should pass; but if you run them through Maven using the Surefire plugin, the tests will not work. To use Byteman with Maven, the Surefire plugin has to be configured in a specific way.

The first important thing is adding the tools jar as a dependency. This jar provides the classes needed to dynamically install the Byteman agent.

In the Surefire plugin configuration it is important to set useManifestOnlyJar to false to ensure that the Byteman jar appears in the classpath of the test JVM. Also note that we are defining BYTEMAN_HOME and org.jboss.byteman.home with empty values. This is because, when it loads the agent, the BMUnit package uses the environment variable BYTEMAN_HOME or the System property org.jboss.byteman.home to locate byteman.jar, but only if it is a non-empty string; otherwise it scans the classpath to locate the jar. Because we want to ensure that the jar added in the dependency section is used, we override any other configuration present on the system.

And now you can run mvn clean test, and both tests pass there too.

Note that Byteman opens up a new world in how we write our integration tests: now we can easily test unusual exceptions like communication errors, input/output exceptions or an OutOfMemoryError. Moreover, because we are not mocking FileUtils, we are executing real code; for example, in our second test we run a few lines of the FileUtils object until the write method is reached. If we had mocked the FileUtils class, these lines would not be executed. Thanks to fault injection, our code coverage improves.

Byteman offers more than what I have shown you: it also has built-ins designed for testing in multithreaded environments, parameter binding, and a range of location specifiers, to cite a few.

I hope you have found this post useful and that it helps you test rare conditions in your classes.
