Java Persistence Performance: December 2010

Java performance optimization is part analysis, and part superstition and witch craft.

In deciding how to optimize their applications a lot of developers search the web for what people say is faster, or go by what they have heard from their co-workers sister's boyfriend's cousin, or by what seems to be the "public opinion". Sometimes the public opinion is correct, and their application benefits, and sometimes it is not, and they waste their time and effort and make little performance improvement, or make things worse.

True performance optimization involves measuring the current performance. Profiling the application and determining the bottlenecks. Investigating and implementing optimizations to avoid the bottlenecks.

But how should one code on a day to day basis for optimal performance? Are synchronized methods slow? Are final methods fast? What is faster Vector, ArrayList or LinkedList? What is the optimal way to iterate a List? Is refection slow? I will attempt to answer these questions in this post.

In the EclipseLink project we are generally very concerned about performance, as we are a persistence library used by other applications, so like the JDK we are part of the plumbing and need to be as optimized as possible.

We have a number of performance test suites in EclipseLink, mainly to measure our persistence performance, but we have one test suite that has nothing to do with persistence. Our Java performance test suite just measures the performance of different operations in Java. These tests help us determine how to best optimize our code.

The tests function by running a certain operation for a set amount of time and measuring the number of operations in the time period. Next the next operation is run and measured the same way. This is then repeated 5 times, and the max/min values are rejected, the average of the middle 3 is computed, the standard deviation is computed, and the averages of the two operations are compared. Tests are single threaded.

The tests were run on Oracle Sun JDK 1.6.0_07 on 32 bit Windows.

Maps

There are several different Map implementations in Java. This test compares the performance for various sizes of Maps. The test instantiates the Map, does n (size) puts, then n gets and n removes.

Map Operation Performance Comparison

Map	Size	Average (operations/10 seconds)	%STD	%DIF (with Hashtable)
Hashtable	10	3605235	0.03%	0%
HashMap	10	2908854	0.02	-23%
LinkedHashMap	10	2594492	0.03%	-38%
IdentityHashMap	10	1346278	0.01%	-167%
ConcurrentHashMap	10	1009259	0.0%	-257%
HashSet	10	2927254	0.02%	-23%
Hashtable	100	357229	0.01%	0%
HashMap	100	274587	0.03%	-30%
LinkedHashMap	100	269840	0.03%	-32%
IdentityHashMap	100	110801	0.02%	-222%
ConcurrentHashMap	100	119068	0.01%	-200%
HashSet	100	281960	0.04%	-26%
Hashtable	1000	34034	6.2%	0%
HashMap	1000	27818	0.06%	-22%
LinkedHashMap	1000	25514	0.03%	-33%
IdentityHashMap	1000	11650	0.04%	-192%
ConcurrentHashMap	1000	12420	2.9%	-174%
HashSet	1000	26888	0.04%	-26%

These results are quite interesting, I never would have expected such a big difference in performance between classes doing basically the same well defined thing, that has been around for quite some time. It is surprising that Hashtable performs better than HashMap, when HashMap is suppose to be the replacement for Hashtable, and does not suffer from its limitation of using synchronized methods, which are suppose to be slow. Note: After exploring other JVMs in my next post, it seems that this anomaly only exists in JDK 1.6.7, in the latest JVM, and most other JVM, HashMap seems to be faster than Hashtable.

I would expect LinkedHashMap and ConcurrentHashMap to be somewhat slower, as they have additional overhead and perform better in specific use cases, but it is odd that IdentityHashMap is so much slower, given my test object did not define a hashCode(), so was using its identity, so the map was doing the identical thing as HashMap and Hashtable. Also interesting that HashSet has the same performance as HashMap, given it in theory could be a simpler data structure (in reality it is a subclass of HashMap, so performs the same).

Note that these are single threaded results. Although Hashtable outperformed HashMap in this test, in a multi-threaded (and multi-CPU) environment, Hashtable would perform much worse because of the method synchronization. Assuming read-only access of coarse, as concurrent access to a HashMap would blow up. ConcurrentHashMap does perform the best in a concurrent read-write environment.

Lists

For Maps, Hashtable turned out to have the best performance. Lets now investigate Lists, will the old Vector implementation have better performance than ArrayList? The list test instantiates a list, then does n (size) adds followed by n gets by index.

List Operation Performance Comparison

List	Size	Average (operations/10 seconds)	%STD	%DIF (with Vector)
Vector	10	11625453	0.006%	0%
ArrayList	10	18448453	0.08%	+56%
LinkedList	10	12362290	0.03%	+6.0%
Vector	100	1241420	0.2%	0%
ArrayList	100	1893581	0.1%	+52%
LinkedList	100	1012740	0.01%	-22%
Vector	1000	132182	0.2%	0%
ArrayList	1000	223969	0.1%	+69%
LinkedList	1000	12689	0.02%	-941%

So, it seems that ArrayList is much faster than Vector. Given both Vector and Hashtable are synchronized, I assume it is not the synchronized methods, just the implementation. LinkedList surprisingly performs as good as Vector for small sizes, but then performs much worse on larger sizes because of the indexed get. This is expected as LinkedList performs good in specific use cases, such as removing from the head or middle, but this was not tested.

Iteration

There are many way to iterate a List in Java. For a Vector a Enumeration can be used, or an Iterator in any List. A simple for loop with an index can also be used for any List. Java 5 also defines a simplified for syntax for iterating any Collection. This test compares the performance of iteration, the List has already been pre-populated.

Iteration Performance Comparison

List	Average (operations/10 seconds)	%STD	%DIF (with Vector)
for index (Vector)	6079272	0.06%	0%
for index (ArrayList)	6048324	0.03%	-0.5%
Enumeration (Vector)	4736049	0.02%	-28%
Iterator (Vector)	2840094	0.05%	-114%
Iterator (ArrayList)	1935122	0.008%	-214%
for (Vector)	2841567	0.05%	-113%
for (ArrayList)	1933576	0.04%	-214%

So using a for loop with an index is faster than an Enumerator or Iterator. This is expected, as it avoids the cost of the iteration object instance. It is interesting that a Enumerator is faster than an Iterator, and a Vector Iterator is faster than an ArrayList Iterator. Unfortunately there is no magic in the Java 5 for syntax, it just calls iterator().

So the optimal way to iterate a List is:


int size = list.size();
for (int index = 0; index < size; index++) {
    Object object = list.get(index);
    ...
}

However, the Java 5 syntax is simpler, so I must admit I would use it in any non performance critical code, and maybe some day the JVM will do some magic and for will have better performance.

Method Execution

A method in Java can be defined in several different ways. It can be synchronized, final, called reflectively, or not called at all (in-lined). This next test tries to determine the performance overhead to a method call. The test executes a simple method that just increments an index. The test executes the method 100 times to avoid the test call overhead. For the reflective usage the Method object and arguments array are cached.

Method Execution Performance Comparison

Method	Average (100 operations/10 seconds)	%STD	%DIF (with normal)
Normal	25533130	0.7%	0%
synchronized	13383707	4.3%	-93%
Block synchronized	8244087	4.7%	-203%
final	26873873	1.2%	+6.1%
In-lined	26816109	1.5%	+5.2%
volatile	1503727	0.1%	-1539%
Reflection	159069	3.2%	-15646%

Interesting results. Synchronized methods are slower, and using a synchronized block inside a method seems to be slower than just making the method synchronized. Final methods seem to be slightly faster, but very minor, and seem to have the same improvement as in-lining the method, so I assume that is what the JVM is doing.

Calling a method through reflection is significantly slower. Reflection has improved much since 1.5 added byte-code generation, but if you profile a reflective call you will see several isAssignable() checks before the method is invoked to ensure the object and arguments are of the correct type, it is odd that these cannot be handled through casts in the generated code, or typing errors just trapped.

CORRECTION
Originally volatile was showing an improvement in performance because of a bug in the volatile test. After correcting the bug the test now shows a huge overhead in performance. The usage of volatile on the int field that is being incremented in the test method is causing a 15x performance overhead. This is surprising, and much larger than the synchronized overhead, but still smaller than the reflection overhead. This is especially odd as the field is an int which one would think is atomic anyway. I imaging the affect the volatile has on the JVM memory usage is somehow causing the overhead. In spite of the overhead, I would still use volatile when required, in concurrent pieces of code where the same variable is concurrently modified. I have done multi-threaded tests comparing its concurrency with synchronizing the method, and using volatile is better than using synchronization for concurrency.

Note that this difference in cost of execution is on a method that does basically nothing. Generally, performance bottlenecks in applications are in places that do something, not nothing, and it is the something that is the bottleneck, not how the method is called. Even in the reflective call, the cost was less than 1 millionth of a second, so on a method that toke 1 millisecond the overhead would be irrelevant. Also, I assume these results are very JVM specific, other JVMs, older JVMs, and the JVMs of the future may behave much differently.

Reflection

There are two types of reflection, field reflection and method reflection. In JPA this normally corresponds to using FIELD access of PROPERTY access (annotating the fields or the get methods). Which is faster? This test sets a single variable, either directly, by calling a set method, or through field or set method reflection.

Reflection Performance Comparison

Type	Average (operations/10 seconds)	%STD	%DIF (with in-lined)
In-lined	44105730	0.7%	0%
Set method	46189638	1.2%	+4.7%
Field Reflection	23488987	1.4%	-87%
Method Reflection	10862684	0.8%	-306%

So its appears that in JDK 1.6 field reflection is faster than set method reflection. Part of this difference is that for method reflection an Object array must be created to call the set method, where as field reflection does not require this. Method reflection improved a lot in JDK 1.5, but it still seems slower than field reflection, at least for set methods. Both have an overhead over a compiled method execution or direct variable access.

If you look at the results the difference between reflection and normal execution is much less than in the previous test. This is because in the previous test the method was executed 100 times inside the test, so the overhead of calling the test method was reduced, where as this test only called it once, so had the test method call overhead in it. This highlights an important fact, that how you call the method that sets the field has a big impact on the performance. If you use a complex layering of interfaces and generated code to call a set method directly, versus calling it reflectively, the overhead of the layering may be worse than the reflective call.

In EclipseLink field reflection is used by default, but it depends if you annotate your fields or get methods. If you annotate your get methods, then method reflection is used. If you use weaving (enabled through either, the EclipseLink agent, static weaving, JEE, or Spring) and use FIELD access, then EclipseLink will weave in a generic get and set method into your classes to avoid reflection. This amounts to a minor optimization in the context of database persistence, and can be disabled using the persistence unit property "eclipselink.weaving.internal"="false".

Summary

So this post has presented a lot of data on how different things perform in Oracle Sun JDK 1.6 on Windows. It would be interesting to see how the same tests perform in different JVMs in different environments. Perhaps I will investigate that next.

The source code to the above tests is in the EclipseLink SVN repository,
here

Java Persistence Performance

Pages

Friday, December 17, 2010

What is faster? JVM Performance