Now, don't get me wrong, I'm not trying to dis Java. Java serialization has many requirements, the main one being able to serialize anything (or at least anything that implements Serializable
), into any other JVM (even a different JVM version/implementation), even running a different version of the classes being serialized (as long as you set a serialVersionUID
). The main thing is it just works, and that is really great. Performance is not the main requirement, and the format is standard and must be backward compatible, so optimization is very difficult. Also, for many types of use cases, Java serialization performs very well.
I got started on this journey into the bowels of serialization while working on a three tier concurrency benchmark. I noticed a lot of the CPU time being spent inside Java serialization, so I decided to investigate. I started by serializing a simple Order
object that had a couple of fields. I serialized the object and output the bytes. Although the Order object only had a few bytes of data, I was not that naive to think it would serialize to only a few bytes, I knew enough about serialization that it would at least need to write out the full class name, so it knew what it had serialized, so it could read it back. So I expected, maybe 50 bytes or so. The result was over 600 bytes, that's when I realized Java serialization was not as simple as I had imagined.
Java serialization bytes for Order object
----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--xq-~--
(note "-" means an unprintable character)
As you may have noticed, Java serialization writes out not only the full class name of the object being serialized, but also the entire class definition of the class being serialized, and all of the referenced classes. The class definition can be quite large, and seems to be the main performance and efficiency issue, especially when writing out a single object. If you are writing out a large number of objects of the same class, then the class definition overhead is not normally a big issue. One other thing that I noticed, is that if your object has a reference to a class (such as a meta-data object), then Java serialization will write the entire class definition, not just the class name, so using Java serialization to write out meta-data is very expensive.
Externalizable
It is possible to optimize Java serialization through implementing the Externalizable interface. Implementing this interface avoids writing out the entire class definition, just the class name is written. It requires that you implement thereadExternal
and writeExternal
methods, so requires some work and maintenance on your part, but is faster and more efficient than just implementing Serializable.
One interesting note on the results for Externalizable is that it was much more efficient for a small number of objects, but actually output slightly more bytes than Serializable for a large number of objects. I assume the Externalizable format is slightly less efficient for repeated objects.
Externalizable class
public class Order implements Externalizable {
private long id;
private String description;
private BigDecimal totalCost = BigDecimal.valueOf(0);
private List orderLines = new ArrayList();
private Customer customer;
public Order() {
}
public void readExternal(ObjectInput stream) throws IOException, ClassNotFoundException {
this.id = stream.readLong();
this.description = (String)stream.readObject();
this.totalCost = (BigDecimal)stream.readObject();
this.customer = (Customer)stream.readObject();
this.orderLines = (List)stream.readObject();
}
public void writeExternal(ObjectOutput stream) throws IOException {
stream.writeLong(this.id);
stream.writeObject(this.description);
stream.writeObject(this.totalCost);
stream.writeObject(this.customer);
stream.writeObject(this.orderLines);
}
}
Externalizable serialization bytes for Order object
----sr--model.Order---*3--^---xpw---------psr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine-!!|---S---xpw-----pq-~--q-~--xxx
Other Serialization Options
I started to investigate what other serialization options there were in Java. I started with EclipseLink MOXy, which supports serializing objects to XML or JSON through the JAXB API. I was not expecting XML serialization to outperform Java serialization, so was quite surprised when it did for certain use cases. I also found a product Kryo, which is an open source project for optimized serialization. I also investigated the Oracle Coherence POF serialization format. There are pros and cons with each product, but my main focus was on how their performance compared, and how efficient they were.
EclipseLink MOXy - XML and JSON
The main advantage of using EclipseLink MOXy to serialize to XML or JSON is that both are standard, portable formats. You can access the data from any client using any language, so are not restricted to Java, as with Java serialization. You can also integrate your data with web services and REST services. Both formats are also text based, so human readable. No coding or special interfaces are required, only meta-data. The performance is quite acceptable, and outperforms Java serialization for small data-sets.
The draw backs are that the text formats are less efficient than optimized binary formats, and JAXB requires meta-data. So you need to annotate your classes with JAXB annotations, or provide an XML configuration file. Also, circular references are not handled by default, you need to use an
Kryo works similar to Java serialization, and respects transient fields, but does not require a class be Serializable. I found Kryo to have some limitations, such as requiring classes to have a default constructor, and encountered some issues in serializing java.sql.Time, java.sql.Date and java.sql.Timestamp classes.
POF provides a serialization framework, and can be used independently of Coherence (if you already have Coherence licensed). POF requires that your class implement a
The total bytes for POF was 32 bytes for a single Order/OrderLine object, and 1593 bytes for 100 OrderLines. I'm not going to give the results, as POF is part of a commercially licensed product, but is was very fast.
The results show that Java serialization is slow for a small number of objects, but good for a large number of objects. Conversely, XML and JSON can outperform Java serialization for a small number of objects, but Java serialization is faster for a large number of objects. Kryo and other optimized binary serializers outperform Java serialization with both types of data.
You may wonder why it is relevant that something that takes less than a millisecond is of any relevance to performance, and you may be right. In general you would only have a real performance problem if you wrote out a very large number of objects, and then Java serialization performs quite well, so is the fact that it performs very poorly for a small number of object relevant? For a single operation, this is probably true, but if you execute many small serialization operations, then the cost is relevant. A typical server servicing many clients will typically send out many small requests, so while the cost of serialization is not great enough to make any of these single requests take a long time, it will dramatically effects the scalability of the server.
One such place is in serialized @Lob mappings. You can now use the @Convert annotation to specify a serializer such as @Convert(XML), @Convert(JSON), @Convert(Kryo). In addition to optimizing performance, this provides an easy mechanism to write XML and JSON data to your database.
Also for EclipseLink cache coordination you can choose your serializer using the "eclipselink.cache.coordination.serializer" property.
The source code for the benchmarks used in this post can be found here, or downloaded here.
JAXB annotated classes
@XmlRootElement
public class Order {
@XmlID
@XmlAttribute
private long id;
@XmlAttribute
private String description;
@XmlAttribute
private BigDecimal totalCost = BigDecimal.valueOf(0);
private List
EclipseLink MOXy serialization XML for Order object
<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>
EclipseLink MOXy serialization JSON for Order object
{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}
Kryo
Kryo is a fast, efficient serialization framework for Java.
Kryo is an open source project on Google code that is provided under the New BSD license. It is a small project, with only 3 members, it first shipped in 2009 and last shipped the 2.21 release in Feb 2013, so is still actively being developed.
Kryo serialization bytes for Order object
------java-util-ArrayLis-----model-OrderLin----java-math-BigDecima---------model-Orde-----
Oracle Coherence POF
The Oracle Coherence product provides its own optimized binary format called POF (portable object format). Oracle Coherence is an in-memory data grid solution (distributed cache). Coherence is a commercial product, and requires a license. EclipseLink supports an integration with Oracle Coherence through the Oracle TopLink Grid product that uses Coherence as the EclipseLink shared cache.
PortableObject
interface and read/write methods. You can also implement a separate Serializer class, or use annotations in the latest Coherence release. POF requires that each class be assigned an constant id ahead of time, so you need some way determine this id. The POF format is a binary format, very compact, efficient, and fast, but does require some work on your part.
POF PortableObject
public class Order implements PortableObject {
private long id;
private String description;
private BigDecimal totalCost = BigDecimal.valueOf(0);
private List
POF serialization bytes for Order object
-----B--G---d-U------A--G-------
Results
So how does each perform? I made a simple benchmark to compare the different serialization mechanisms. I compared the serialization of two different use cases. The first is a single Order object with a single OrderLine object. The second is a single Order object with 100 OrderLine objects. I compared the average serialization operations per second, and measure the size in bytes of the serialized data. Different object models, use cases, and environments will give different results, but this gives you a general idea of the performance differences in different serializers.
Order with 1 OrderLine
Serializer Size (bytes) Serialize (operations/second) Deserialize (operations/second) % Difference (from Java serialize) % Difference (deserialize) Java Serializable 636 128,634 19,180 0% 0% Java Externalizable 435 160,549 26,678 24% 39% EclipseLink MOXy XML 101 348,056 47,334 170% 146% Kryo 90 359,368 346,984 179% 1709% Order with 100 OrderLines
Serializer Size (bytes) Serialize (operations/second) Deserialize (operations/second) % Difference (from Java serialize) % Difference (deserialize) Java Serializable 2,715 16,470 10,215 0% 0% Java Externalizable 2,811 16,206 11,483 -1% 12% EclipseLink MOXy XML 6,628 7,304 2,731 -55% -73% Kryo 1216 22,862 31,499 38% 208% EclipseLink JPA
In EclipseLink 2.6 development builds, and to some degree 2.5, we have added the ability to choose your serializer anywhere that EclipseLink does serialization.