EntityManagerFactory, this is generally done for you in Java EE, but in Java SE you need to do this yourself, such as by storing it in a static variable. If you don't cache your EntityManagerFactory, then your persistence unit will be redeployed on every call, which will really suck.
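As a rough illustration of the "store it in a static variable" advice, here is a minimal sketch using only the JDK. CachedFactory, CachedFactoryDemo and the build counter are hypothetical names invented for this example; a plain Object stands in for the real EntityManagerFactory so the sketch runs without a JPA provider on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Holds one expensive-to-build object for the life of the application. */
final class CachedFactory<T> {
    private final Supplier<T> builder;
    private T instance; // guarded by the synchronized get()

    CachedFactory(Supplier<T> builder) {
        this.builder = builder;
    }

    /** Builds the object on first use, then returns the same instance on every later call. */
    synchronized T get() {
        if (instance == null) {
            instance = builder.get();
        }
        return instance;
    }
}

final class CachedFactoryDemo {
    // Counts how often the expensive build step actually runs.
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Assumption: in a real Java SE application the supplier would be something like
    // () -> Persistence.createEntityManagerFactory("my-unit"), where "my-unit" is a placeholder name.
    static final CachedFactory<Object> FACTORY = new CachedFactory<>(() -> {
        BUILDS.incrementAndGet();
        return new Object();
    });
}
```

Every caller goes through CachedFactoryDemo.FACTORY.get(), so the build step runs once no matter how many times the factory is requested.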
Other caches in JPA include the cache of JDBC connections, the cache of JDBC statements, the result set cache, and most importantly the object cache, which is what I would like to discuss today.
JPA 1.0 did not define caching, although most JPA providers did support a cache in some form or another. JPA 2.0 defined caching through the @Cacheable annotation and the <shared-cache-mode> persistence.xml element. Some describe caching in JPA as two levels. Conceptually there is the L1 cache on an EntityManager, and the L2 cache on the EntityManagerFactory.
The EntityManager cache is an isolated, transactional cache that only caches the objects read by that EntityManager, and shares nothing with other EntityManagers. The main purpose of the L1 cache is to maintain object identity (i.e. person == person.getSpouse().getSpouse()), and maintain transaction consistency. The L1 cache will also improve performance by avoiding querying the same object multiple times. The only way to avoid the L1 cache is to refresh, create a new EntityManager, or call clear().
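The identity guarantee can be pictured as one private identity map per EntityManager. The sketch below is a toy model in plain Java, not how any JPA provider implements it; SessionCache, its loader function and the loads counter are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy stand-in for an EntityManager: one private identity map per instance. */
final class SessionCache<K, V> {
    private final Map<K, V> identityMap = new HashMap<>();
    private final Function<K, V> loader; // stands in for a database read
    private int loads;

    SessionCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the already-read instance if there is one, otherwise loads and remembers it. */
    V find(K id) {
        V cached = identityMap.get(id);
        if (cached == null) {
            cached = loader.apply(id);
            loads++;
            identityMap.put(id, cached);
        }
        return cached;
    }

    /** Drops everything read so far, so the next find() loads a fresh instance. */
    void clear() {
        identityMap.clear();
    }

    /** Number of times the loader was actually called. */
    int loads() {
        return loads;
    }
}
```

Two find() calls for the same id on one SessionCache return the same instance and cost one load. A second SessionCache, or the same one after clear(), loads its own separate instance.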
The EntityManagerFactory cache is a shared cache across all EntityManagers, and reflects the current committed state of the database (stale data is possible depending on your configuration and whether other applications access the database). The main purpose of the L2 cache is to improve performance by avoiding queries for objects that have already been read. The L2 cache is normally what is referred to when caching is discussed in JPA, and what the JPA <shared-cache-mode> and @Cacheable refer to.
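To make the two levels and the stale data caveat concrete, here is a small plain Java model. All class names (TwoLevelCacheSketch, Database, SharedCache, Session) are hypothetical, rows are modelled as strings, and nothing here reflects a provider's real internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TwoLevelCacheSketch {
    /** Toy database: a map of id to row, plus a counter of reads. */
    static final class Database {
        final Map<Integer, String> rows = new ConcurrentHashMap<>();
        int reads;

        synchronized String read(int id) {
            reads++;
            return rows.get(id);
        }
    }

    /** Stand-in for the EntityManagerFactory-level (L2) cache, shared by all sessions. */
    static final class SharedCache {
        private final Map<Integer, String> entries = new ConcurrentHashMap<>();
        private final Database db;

        SharedCache(Database db) {
            this.db = db;
        }

        String get(int id) {
            // Only goes to the database the first time an id is requested.
            return entries.computeIfAbsent(id, db::read);
        }
    }

    /** Stand-in for an EntityManager: a private L1 map in front of the shared cache. */
    static final class Session {
        private final Map<Integer, String> local = new HashMap<>();
        private final SharedCache shared;

        Session(SharedCache shared) {
            this.shared = shared;
        }

        String find(int id) {
            return local.computeIfAbsent(id, shared::get);
        }
    }
}
```

Two sessions reading the same id cost one database read between them. If something else then changes the row directly in the Database, new sessions still see the old value from the shared cache, which is the stale data case mentioned above.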
There are many types of caches provided by the various JPA providers. Some provide data caches, some provide object caches, some have relationship caches, some have query caches, some have distributed caches, or coordinated caches.
EclipseLink provides an object cache, what I would call a "live" object cache. I believe most other JPA providers provide a data cache. The difference between a data cache and an object cache is that a data cache just caches the object's row, whereas an object cache caches the entire object, including its relationships.
Caching relationships is normally more important than caching the object's data, as each relationship normally represents a database query, so saving n database queries to build an object's relationships is more important than saving the 1 query for the object itself. Some JPA providers augment their data cache with a relationship cache, or a query cache. If a data cache caches relationships at all, it is normally in the form of caching only the ids of the related objects. This can be a major issue. Consider caching a OneToMany relationship: if you only have a set of ids, then you need to query the database for each id that is not in the cache, causing up to n database queries. With an object cache, you have the related objects, so you never need to query the database.
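The query counting argument can be written down as a tiny calculation. This is only a sketch of the reasoning, with hypothetical names (RelationshipCacheCost and its two methods); it counts queries rather than running any.

```java
import java.util.List;
import java.util.Set;

final class RelationshipCacheCost {
    /**
     * Queries needed to resolve a relationship that is cached only as a list of ids:
     * one query for every related id whose object is not itself in the cache.
     */
    static int queriesWithIdOnlyCache(List<Integer> relatedIds, Set<Integer> idsAlreadyCached) {
        int queries = 0;
        for (int id : relatedIds) {
            if (!idsAlreadyCached.contains(id)) {
                queries++;
            }
        }
        return queries;
    }

    /** With an object cache the related objects are held directly, so nothing is queried. */
    static int queriesWithObjectCache(List<Integer> relatedIds) {
        return 0;
    }
}
```

For an order with five order lines of which two happen to be cached, the id-only cache still needs three queries, while the object cache needs none.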
The other advantage to caching objects is that you also save the cost of building the objects from the data. If the object or query is read-only, the cached object can be used directly, otherwise it only needs to be copied, not rebuilt from data.
EclipseLink also supports not caching relationships through the @Noncacheable annotation. Also the @Cache(isolation=PROTECTED) option can be used to ensure read-only entities and queries always copy the cached objects. So you can simulate a data cache with EclipseLink.
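The difference between using the cached object directly and always copying it can be sketched in plain Java. CopyOnReadCache, Order, getShared and getCopy are hypothetical names for this illustration only; this models the idea, not EclipseLink's implementation of protected isolation.

```java
import java.util.HashMap;
import java.util.Map;

final class CopyOnReadCache {
    /** Minimal mutable entity used for the illustration. */
    static final class Order {
        final int id;
        String status;

        Order(int id, String status) {
            this.id = id;
            this.status = status;
        }

        Order copy() {
            return new Order(id, status);
        }
    }

    private final Map<Integer, Order> cache = new HashMap<>();

    void put(Order order) {
        cache.put(order.id, order);
    }

    /** Read-only style access: hands out the cached instance itself, so callers must not modify it. */
    Order getShared(int id) {
        return cache.get(id);
    }

    /** Copying access: every caller gets its own copy, so the cached instance cannot be changed by accident. */
    Order getCopy(int id) {
        Order cached = cache.get(id);
        return cached == null ? null : cached.copy();
    }
}
```

The copy costs an allocation per read, but it is still cheaper than rebuilding the object from a database row, which is the trade-off described above.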
One should not underestimate the performance benefits of caching. Whereas other JPA optimizations may improve performance by 10-20%, or 2-5x for the major ones, caching has the potential to improve performance by factors of 100x or even 1000x.
So what are the numbers? In this simple benchmark I compare reading a simple Order object and its relationships (orderLines, customer) under the various caching options. (Results are in queries per second, so a bigger number is better. The test was single threaded, randomly querying an order from a data set of 1000 orders. Tests were run 5 times and averaged. The database was an Oracle database over a local area network, and low end hardware was used.)
Cache Option | Cache Config | Average Result (q/s) | % Difference |
---|---|---|---|
No Cache | @Cacheable(false) | 965 | 0% |
Object Cache | @Cacheable(true) | 36,544 | 3,686% |
Object Cache | @Cache(isolation=PROTECTED) | 35,107 | 3,538% |
Data Cache | @Cache(isolation=PROTECTED) + @Noncacheable(true) | 1,889 | 95% |
Read Only Cache | @ReadOnly | 940,123 | 97,322% |
Protected Read Only Cache | @ReadOnly + @Cache(isolation=PROTECTED) | 625,602 | 64,729% |

Cache Option | Cache Config | Average Result (q/s) | % Difference |
---|---|---|---|
No Cache | @Cacheable(false) | 186 | 0% |
Object Cache | @Cacheable(true) | 1,021 | 448% |
Object Cache | @Cache(isolation=PROTECTED) | 1,085 | 483% |
Data Cache | @Cache(isolation=PROTECTED) + @Noncacheable(true) | 198 | 6% |
Read Only Query | "eclipselink.read-only"="true" | 1,391 | 647% |
Read Only Query - Protected Cache | "eclipselink.read-only"="true" + @Cache(isolation=PROTECTED) | 1,351 | 626% |
Query Cache | "eclipselink.query-results-cache"="true" | 5,114 | 2,649% |
In-memory Query | "eclipselink.cache-usage"="CheckCacheOnly" | 2,397 | 1,188% |
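For anyone wanting to check the % Difference column, the figures are consistent with the gain over the "No Cache" row of the same table, truncated to a whole percent. The truncation is inferred from the numbers rather than stated anywhere, and PercentDifference is a hypothetical helper written for this check.

```java
final class PercentDifference {
    /**
     * Percentage gain of a result over the baseline, truncated toward zero.
     * Integer arithmetic is used on purpose so that 95.75% comes out as 95.
     */
    static long percentOver(long baselinePerSecond, long resultPerSecond) {
        return (resultPerSecond - baselinePerSecond) * 100 / baselinePerSecond;
    }
}
```

For example, percentOver(965, 36544) gives 3686 and percentOver(186, 5114) gives 2649, matching the Object Cache and Query Cache rows.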
Other caching options available in EclipseLink include:
- @Cache - type : FULL, WEAK, SOFT, SOFT_WEAK, HARD_WEAK
- @Cache - size : size of cache in number of objects
- @Cache - expiry : millisecond time to live expiry
- @Cache - expiryTimeOfDay : daily expiry
- @CacheIndex : non-id cache indexing
- "eclipselink.cache.coordination" : clustered cache synchronization or invalidation
- "eclipselink.cache.database-event-listener" : database event driven cache invalidation (Oracle DCN)
- "eclipselink.query-results-cache.expiry" : query cache time to live expiry
- "eclipselink.query-results-cache.expiry-time-of-day" : query cache daily expiry
- TopLink Grid : integration with Oracle Coherence distributed cache
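As one example of what these options mean, time to live expiry can be modelled in a few lines of plain Java. This is a sketch of the general idea only, assuming an entry simply becomes invisible once it is older than the configured number of milliseconds; ExpiringCache and its injected clock are hypothetical and say nothing about how EclipseLink implements expiry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Toy cache whose entries expire a fixed number of milliseconds after they were stored. */
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long storedAtMillis;

        Entry(V value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clock; // injected so the behaviour can be checked without sleeping

    ExpiringCache(long timeToLiveMillis, LongSupplier clock) {
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the cached value, or null if it is missing or has reached its time to live. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.storedAtMillis >= timeToLiveMillis) {
            entries.remove(key); // expired: the caller has to go back to the database
            return null;
        }
        return entry.value;
    }
}
```

In production code the clock would typically be System::currentTimeMillis; passing it in keeps the sketch easy to verify.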