In JPA the primary key is defined as the JPA Id, and since JPA's object cache is indexed by Id, any queries by this Id obtain cache hits, and avoid database access. This is great when querying by id, and for traversing relationships across foreign keys based on the id, but what about queries not using the id?
Sequence generated ids are great for computers and databases, but are not very useful to people. How many times how you been to a store or website, and to look up your record they asked your for your sequence generated id?
Probably not very often, you are more likely to be asked for your phone number, email address, ssn or other such key that is easy to remember. These data keys can be considered alternative or secondary keys, and are very common in databases. Querying using alternative keys are very common, so it is important that they perform optimally.
The most important thing to ensure is that the columns are properly indexed in the database, otherwise each query will require a full table scan. The second thing to ensure is that the columns are indexed in the JPA object cache.
EclipseLink Database Indexes
To create a database index on a column you can use your own DDL script, or use the@Index
annotation in EclipseLink (since EclipseLink 2.2). JPA 2.1 will also defined its own @Index
annotation. For the EclipseLink index annotation you can just put this on the attributes you would like to index.
@Entity @Index(columnNames={"F_NAME", "L_NAME"}) public class Employee { @Id private long id; @Index @Column(name="F_NAME") private String firstName; @Index @Column(name="L_NAME") private String lastName; @Index(unique=true) private String ssn; @Index private String phoneNumber; @Index private String emailAddress; ... }
JPA 2.1 Database Indexes
The upcoming JPA 2.1 spec (draft) also defines support for database indexes. EclipseLink 2.5 is the reference implementation for JPA 2.1, so in EclipseLink 2.5 (dev builds) you can also create database indexes using the JPA 2.1 annotations. This is a little bit more complex, as you cannot define the@Index
on attributes, only inside table annotations.
@Entity @Table(indexes={ @Index(name="EMP_SSN_INDEX", unique=true, columnList={"SSN"}), @Index(name="EMP_PHONE_INDEX", columnList="PHONENUMBER"), @Index(name="EMP_EMAIL_INDEX", columnList="EMAILADDRESS"), @Index(name="EMP_F_NAME_INDEX", columnList="F_NAME"), @Index(name="EMP_L_NAME_INDEX", columnList="L_NAME"), @Index(name="EMP_NAME_INDEX", columnList={"F_NAME", "L_NAME"}) }) public class Employee { @Id private long id; @Column(name="F_NAME") private String firstName; @Column(name="L_NAME") private String lastName; private String ssn; private String phoneNumber; private String emailAddress; ... }
EclipseLink Cache Indexes
EclipseLink also supports indexes on the object cache (since EclipseLink 2.4). This allows JPQL and Criteria queries on indexed fields to obtain cache hits, and avoid all database access. The@CacheIndex
annotation is used to index an attribute, or set of columns. When a set of columns are indexed, any queries using the attributes that map to those columns will use the index.
Cache indexes only provide a benefit to queries that expect a single result. Indexing a field such as firstName
would provide no benefit, as there are many results with the same first name, and EclipseLink could never be certain is has them all loaded in the cache, so must access the database.
@Entity @CacheIndex(columnNames={"F_NAME", "L_NAME"}) public class Employee { @Id private long id; @Column(name="F_NAME") private String firstName; @Column(name="L_NAME") private String lastName; @CacheIndex private String ssn; @CacheIndex private String phoneNumber; @CacheIndex private String emailAddress; ... }
So, is indexing worth it? I created an example benchmark that shows the difference between indexed and non-indexed queries. These results were obtained accessing an Oracle database across a LAN from a desktop machine. Results are queries per second, so a bigger number is better. The test consisted of querying a Customer object by name, with a database size of 1,000 customers.
Config | Average Result (q/s) | % Difference |
---|---|---|
No index, no cache | 7,155 | 0% |
No index, cache | 8,095 | 13% |
Database index, cache | 9,900 | 38% |
Database index, cache index | 137,120 | 1,816% |
The results show that the object cache indexed by id provides a little benefit (13%), as it avoids having to rebuild the object, but still has to access the database. Note the test did not access any relationships on Customer, if it had this object cache would still provide a major benefit in avoiding relationship queries. The database index provides a better benefit, (38%), this is a factor of the size of the table, the bigger the table the bigger the benefit. The object cache index provides the best benefit, almost a 20x improvement.
This shows it is important to properly index your database and your cache. Avoid indexing everything, as there is a cost in maintaining indexes. For the database, index any commonly queried columns, for the cache, index any secondary key fields.
See the EclipseLink UserGuide for more info on caching.
The source code for the benchmarks used in this post can be found here, or downloaded here.