Tuesday, August 13, 2013

Optimizing Java Serialization - Java vs XML vs JSON vs Kryo vs POF

Perhaps I'm naive, but I always thought Java serialization must surely be the fastest and most efficient way to serialize Java objects into binary form. After all, Java is on it's 7th major release, so this is not new technology, and since every JDK seems to be faster than the last, I incorrectly assumed serialization must be very fast and efficient by now. I thought, since Java serialization is binary, and language dependent, it must be much faster and more efficient than XML or JSON. Unfortunately, I was wrong, if you concerned about performance, I would recommend avoiding Java serialization.

Now, don't get me wrong, I'm not trying to dis Java. Java serialization has many requirements, the main one being able to serialize anything (or at least anything that implements Serializable), into any other JVM (even a different JVM version/implementation), even running a different version of the classes being serialized (as long as you set a serialVersionUID). The main thing is it just works, and that is really great. Performance is not the main requirement, and the format is standard and must be backward compatible, so optimization is very difficult. Also, for many types of use cases, Java serialization performs very well.

I got started on this journey into the bowels of serialization while working on a three tier concurrency benchmark. I noticed a lot of the CPU time being spent inside Java serialization, so I decided to investigate. I started by serializing a simple Order object that had a couple of fields. I serialized the object and output the bytes. Although the Order object only had a few bytes of data, I was not that naive to think it would serialize to only a few bytes, I knew enough about serialization that it would at least need to write out the full class name, so it knew what it had serialized, so it could read it back. So I expected, maybe 50 bytes or so. The result was over 600 bytes, that's when I realized Java serialization was not as simple as I had imagined.

Java serialization bytes for Order object

----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--xq-~--
(note "-" means an unprintable character)

As you may have noticed, Java serialization writes out not only the full class name of the object being serialized, but also the entire class definition of the class being serialized, and all of the referenced classes. The class definition can be quite large, and seems to be the main performance and efficiency issue, especially when writing out a single object. If you are writing out a large number of objects of the same class, then the class definition overhead is not normally a big issue. One other thing that I noticed, is that if your object has a reference to a class (such as a meta-data object), then Java serialization will write the entire class definition, not just the class name, so using Java serialization to write out meta-data is very expensive.

Externalizable

It is possible to optimize Java serialization through implementing the Externalizable interface. Implementing this interface avoids writing out the entire class definition, just the class name is written. It requires that you implement the readExternal and writeExternal methods, so requires some work and maintenance on your part, but is faster and more efficient than just implementing Serializable.

One interesting note on the results for Externalizable is that it was much more efficient for a small number of objects, but actually output slightly more bytes than Serializable for a large number of objects. I assume the Externalizable format is slightly less efficient for repeated objects.

Externalizable class

public class Order implements Externalizable {
    private long id;
    private String description;
    private BigDecimal totalCost = BigDecimal.valueOf(0);
    private List orderLines = new ArrayList();
    private Customer customer;

    public Order() {
    }

    public void readExternal(ObjectInput stream) throws IOException, ClassNotFoundException {
        this.id = stream.readLong();
        this.description = (String)stream.readObject();
        this.totalCost = (BigDecimal)stream.readObject();
        this.customer = (Customer)stream.readObject();
        this.orderLines = (List)stream.readObject();
    }

    public void writeExternal(ObjectOutput stream) throws IOException {
        stream.writeLong(this.id);
        stream.writeObject(this.description);
        stream.writeObject(this.totalCost);
        stream.writeObject(this.customer);
        stream.writeObject(this.orderLines);
    }
}

Externalizable serialization bytes for Order object

----sr--model.Order---*3--^---xpw---------psr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine-!!|---S---xpw-----pq-~--q-~--xxx

Other Serialization Options

I started to investigate what other serialization options there were in Java. I started with EclipseLink MOXy, which supports serializing objects to XML or JSON through the JAXB API. I was not expecting XML serialization to outperform Java serialization, so was quite surprised when it did for certain use cases. I also found a product Kryo, which is an open source project for optimized serialization. I also investigated the Oracle Coherence POF serialization format. There are pros and cons with each product, but my main focus was on how their performance compared, and how efficient they were.

EclipseLink MOXy - XML and JSON

The main advantage of using EclipseLink MOXy to serialize to XML or JSON is that both are standard, portable formats. You can access the data from any client using any language, so are not restricted to Java, as with Java serialization. You can also integrate your data with web services and REST services. Both formats are also text based, so human readable. No coding or special interfaces are required, only meta-data. The performance is quite acceptable, and outperforms Java serialization for small data-sets.

The draw backs are that the text formats are less efficient than optimized binary formats, and JAXB requires meta-data. So you need to annotate your classes with JAXB annotations, or provide an XML configuration file. Also, circular references are not handled by default, you need to use an @XmlIDREF to handle cycles.

JAXB annotated classes

@XmlRootElement
public class Order {
    @XmlID
    @XmlAttribute
    private long id;
    @XmlAttribute
    private String description;
    @XmlAttribute
    private BigDecimal totalCost = BigDecimal.valueOf(0);
    private List orderLines = new ArrayList();
    private Customer customer;
}

public class OrderLine {
    @XmlIDREF
    private Order order;
    @XmlAttribute
    private int lineNumber;
    @XmlAttribute
    private String description;
    @XmlAttribute
    private BigDecimal cost = BigDecimal.valueOf(0);
}

EclipseLink MOXy serialization XML for Order object

<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>

EclipseLink MOXy serialization JSON for Order object

{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}

Kryo

Kryo is a fast, efficient serialization framework for Java. Kryo is an open source project on Google code that is provided under the New BSD license. It is a small project, with only 3 members, it first shipped in 2009 and last shipped the 2.21 release in Feb 2013, so is still actively being developed.

Kryo works similar to Java serialization, and respects transient fields, but does not require a class be Serializable. I found Kryo to have some limitations, such as requiring classes to have a default constructor, and encountered some issues in serializing java.sql.Time, java.sql.Date and java.sql.Timestamp classes.

Kryo serialization bytes for Order object

------java-util-ArrayLis-----model-OrderLin----java-math-BigDecima---------model-Orde-----

Oracle Coherence POF

The Oracle Coherence product provides its own optimized binary format called POF (portable object format). Oracle Coherence is an in-memory data grid solution (distributed cache). Coherence is a commercial product, and requires a license. EclipseLink supports an integration with Oracle Coherence through the Oracle TopLink Grid product that uses Coherence as the EclipseLink shared cache.

POF provides a serialization framework, and can be used independently of Coherence (if you already have Coherence licensed). POF requires that your class implement a PortableObject interface and read/write methods. You can also implement a separate Serializer class, or use annotations in the latest Coherence release. POF requires that each class be assigned an constant id ahead of time, so you need some way determine this id. The POF format is a binary format, very compact, efficient, and fast, but does require some work on your part.

The total bytes for POF was 32 bytes for a single Order/OrderLine object, and 1593 bytes for 100 OrderLines. I'm not going to give the results, as POF is part of a commercially licensed product, but is was very fast.

POF PortableObject

public class Order implements PortableObject {
    private long id;
    private String description;
    private BigDecimal totalCost = BigDecimal.valueOf(0);
    private List orderLines = new ArrayList();
    private Customer customer;

    public Order() {
    }
    
    public void readExternal(PofReader in) throws IOException {
        this.id = in.readLong(0);
        this.description = in.readString(1);
        this.totalCost = in.readBigDecimal(2);
        this.customer = (Customer)in.readObject(3);
        this.orderLines = (List)in.readCollection(4, new ArrayList());
    }

    public void writeExternal(PofWriter out) throws IOException {
        out.writeLong(0, this.id);
        out.writeString(1, this.description);
        out.writeBigDecimal(2, this.totalCost);
        out.writeObject(3, this.customer);
        out.writeCollection(4, this.orderLines);
    }
}

POF serialization bytes for Order object

-----B--G---d-U------A--G-------

Results

So how does each perform? I made a simple benchmark to compare the different serialization mechanisms. I compared the serialization of two different use cases. The first is a single Order object with a single OrderLine object. The second is a single Order object with 100 OrderLine objects. I compared the average serialization operations per second, and measure the size in bytes of the serialized data. Different object models, use cases, and environments will give different results, but this gives you a general idea of the performance differences in different serializers.

The results show that Java serialization is slow for a small number of objects, but good for a large number of objects. Conversely, XML and JSON can outperform Java serialization for a small number of objects, but Java serialization is faster for a large number of objects. Kryo and other optimized binary serializers outperform Java serialization with both types of data.

You may wonder why it is relevant that something that takes less than a millisecond is of any relevance to performance, and you may be right. In general you would only have a real performance problem if you wrote out a very large number of objects, and then Java serialization performs quite well, so is the fact that it performs very poorly for a small number of object relevant? For a single operation, this is probably true, but if you execute many small serialization operations, then the cost is relevant. A typical server servicing many clients will typically send out many small requests, so while the cost of serialization is not great enough to make any of these single requests take a long time, it will dramatically effects the scalability of the server.

Order with 1 OrderLine

SerializerSize (bytes)Serialize (operations/second)Deserialize (operations/second)% Difference (from Java serialize)% Difference (deserialize)
Java Serializable636128,63419,1800%0%
Java Externalizable435160,54926,67824%39%
EclipseLink MOXy XML101348,05647,334170%146%
Kryo90359,368346,984179%1709%

Order with 100 OrderLines

SerializerSize (bytes)Serialize (operations/second)Deserialize (operations/second)% Difference (from Java serialize)% Difference (deserialize)
Java Serializable2,71516,47010,2150%0%
Java Externalizable2,81116,20611,483-1%12%
EclipseLink MOXy XML6,6287,3042,731-55%-73%
Kryo121622,86231,49938%208%

EclipseLink JPA

In EclipseLink 2.6 development builds, and to some degree 2.5, we have added the ability to choose your serializer anywhere that EclipseLink does serialization.

One such place is in serialized @Lob mappings. You can now use the @Convert annotation to specify a serializer such as @Convert(XML), @Convert(JSON), @Convert(Kryo). In addition to optimizing performance, this provides an easy mechanism to write XML and JSON data to your database.

Also for EclipseLink cache coordination you can choose your serializer using the "eclipselink.cache.coordination.serializer" property.

The source code for the benchmarks used in this post can be found here, or downloaded here.

13 comments :

  1. This comment has been removed by the author.

    ReplyDelete
  2. Serializing objects directly using ByteBuffers or sun.misc.Unsafe can be orders of magnitude faster than Java serialization: http://mechanical-sympathy.blogspot.jp/2012/07/native-cc-like-performance-for-java.html

    ReplyDelete
  3. I can't resist... Here are some numbers from the article above:

    2.4GHz Sandy Bridge - Java 1.7.0_04
    ===================================
    0 Serialisation write=1,940ns read=9,006ns total=10,946ns
    1 Serialisation write=1,674ns read=8,567ns total=10,241ns
    2 Serialisation write=1,666ns read=8,680ns total=10,346ns
    3 Serialisation write=1,666ns read=8,623ns total=10,289ns
    4 Serialisation write=1,715ns read=8,586ns total=10,301ns
    0 ByteBuffer write=199ns read=198ns total=397ns
    1 ByteBuffer write=176ns read=178ns total=354ns
    2 ByteBuffer write=174ns read=174ns total=348ns
    3 ByteBuffer write=172ns read=183ns total=355ns
    4 ByteBuffer write=174ns read=180ns total=354ns
    0 UnsafeMemory write=38ns read=75ns total=113ns
    1 UnsafeMemory write=26ns read=52ns total=78ns
    2 UnsafeMemory write=26ns read=51ns total=77ns
    3 UnsafeMemory write=25ns read=51ns total=76ns
    4 UnsafeMemory write=27ns read=50ns total=77ns

    ReplyDelete
  4. @remko Thanks for the comment, interesting blog post, in terms of rolling your own serialization, and managing your own memory. The object being tested is quite simplistic, rolling your own becomes more complex as your object model's complexity increases. Also the test was only writing a single object, where Java serialization is going to have a big overhead, Java serialization would do much better for a larger set of the objects.

    ReplyDelete
  5. Wonder whether you can include GSON and protobuf in the benchmark.

    ReplyDelete
  6. @Remko, @James:
    FYI, the trunk version of Kryo (2.22-SNAPSHOT) provides support for serialization using ByteBuffers and sun.misc.Unsafe. Just use UnsafeInput/UnsafeOutput or UnsafeMemoryInput/UnsafeMemoryOutput instead of the usual Input/Output classes provided by Kryo. These specialized classes provide a significant performance boost in many situations.

    BTW, similar approach was recently borrowed from Kryo by Hazelcast 3.0 and Avro. So, expect good results from them as well.

    ReplyDelete
  7. If you are interested in serialization performance, you should first check out JVM-serializers project (https://github.com/eishay/jvm-serializers). It has already tested most of things you did.

    One very fast choice you did not include is Jackson (https://github.com/FasterXML/jackson).
    Jackson is the fastest way to serialize POJOs as JSON (and possibly fastest for XML as well). It also supports binary JSON representation called Smile.
    The main reason for Jackson inclusion is actually the fact that most Java service frameworks nowadays include it in one way or another; I was actually surprised to see that it was not included here.

    ReplyDelete

  8. Hello Mr Sutherland,

    Nice blog! I would like to discuss a partnership opportunity between
    our sites. Is there an email address I can contact you in private?

    Thanks,

    Eleftheria Kiourtzoglou

    Head of Editorial Team

    Java Code Geeks

    email: [dot][at]javacodegeeks[dot]com

    ReplyDelete
  9. For my own interest, I did the same with Java Chronicle 2.0. Java Chronicle includes writing and reading from a file which makes it more interesting for me.

    order lines: 1
    orderCache-Kryo-serialize : 743181
    orderCache-Kryo-deserialize : 707776
    Size: 90

    orderCache-Java-serialize : 217936
    orderCache-Java-deserialize : 31579
    Size: 636

    order-Chronicle-serialize : 6892399
    order-Chronicle-deserialize : 5943322
    Size: 12

    ---

    order lines: 100
    order-Chronicle-serialize : 201030
    order-Chronicle-deserialize : 146758
    Size: 507

    ReplyDelete
    Replies
    1. Cool! Could you provide the source code for your experiment so that others can test it as well? Thanks!

      Delete
  10. https://github.com/RuedigerMoeller/fast-serialization
    is a 100% jdk serialization replacement, which outperforms kryo for most real live applications.

    ReplyDelete
  11. Problem: HP Printer not connecting to my laptop.
    I had an issue while connecting my 2 year old HP printer to my brother's laptop that I had borrowed for starting my own business. I used a quick google search to fix the problem but that did not help me. I then decided to get professional help to solve my problem. After having received many quotations from various companies, i decided to go ahead with Online Tech Repair (www.onlinetechrepairs.com).
    Reasons I chose them over the others:
    1) They were extremely friendly and patient with me during my initial discussions and responded promptly to my request.
    2) Their prices were extremely reasonable.
    3) They were ready and willing to walk me through the entire process step by step and were on call with me till i got it fixed.
    How did they do it
    1) They first asked me to state my problem clearly and asked me a few questions. This was done to detect any physical connectivity issues with the printer.
    2) After having answered this, they confirmed that the printer and the laptop were functioning correctly.
    3) They then, asked me if they could access my laptop remotely to troubleshoot the problem and fix it. I agreed.
    4) One of the tech support executives accessed my laptop and started troubleshooting.
    5) I sat back and watched as the tech support executive was navigating my laptop to spot the issue. The issue was fixed.
    6) I was told that it was due to an older version of the driver that had been installed.
    My Experience I loved the entire friendly conversation that took place with them. They understood my needs clearly and acted upon the solution immediately. Being a technical noob, i sometimes find it difficult to communicate with tech support teams. It was a very
    different experience with the guys at Online Tech Repairs. You can check out their website www.onlinetechrepairs.com or call them on 1-914-613-3786.
    Would definitely recommend this service to anyone who needs help fixing their computers.
    Thanks a ton guys. Great Job....

    ReplyDelete
  12. Is Your VIRUS REMOVAL
    Computer Sluggish or Plagued With a Virus? – If So you Need Online Tech Repairs
    As a leader in online computer repair, Online Tech Repairs Inc has the experience to deliver professional system optimization and virus removal.Headquartered in Great Neck, New York our certified technicians have been providing online computer repair and virus removal for customers around the world since 2004.
    Our three step system is easy to use; and provides you a safe, unobtrusive, and cost effective alternative to your computer service needs. By using state-of-the-art technology our computer experts can diagnose, and repair your computer system through the internet, no matter where you are.
    Our technician will guide you through the installation of Online Tech Repair Inc secure software. This software allows your dedicated computer expert to see and operate your computer just as if he was in the room with you. That means you don't have to unplug everything and bring it to our shop, or have a stranger tramping through your home.
    From our remote location the Online Tech Repairs.com expert can handle any computer issue you want addressed, like:
    • - System Optimization
    • - How it works Software Installations or Upgrades
    • - How it works Virus Removal
    • - How it works Home Network Set-ups
    Just to name a few.
    If you are unsure of what the problem may be, that is okay. We can run a complete diagnostic on your system and fix the problems we encounter. When we are done our software is removed; leaving you with a safe, secure and properly functioning system. The whole process usually takes less than an hour. You probably couldn't even get your computer to your local repair shop that fast!
    Call us now for a FREE COMPUTER DIAGONISTIC using DISCOUNT CODE (otr214425@gmail.com) on +1-914-613-3786 or chat with us on www.onlinetechrepairs.com.

    ReplyDelete