BCL: Serialization Performance

Almost all real-world applications need to perform some form of serialization to turn objects into data and back again. If you are working against a relational database, that usually means using a framework to take care of the object mapping for you. NHibernate and the Entity Framework are the most popular choices at the moment. The time it takes to turn data into an object is usually small compared to the database query time, so the serialization mechanism is not a major performance bottleneck.

There are many instances where using a full database for the back end is overkill. In that case you need to take care of the serialization yourself, and there are several choices available. To make the right choice we need to measure the speed and output size of each option.

Output Size
We will start by using the following trivial definition of a structure that has just a single field.


    [Serializable]
    [DataContract]
    [StructLayout(LayoutKind.Sequential)]
    public struct SmallStruct
    {
        [DataMember]
        public int i;
    }

To discover the output size when using the XmlSerializer we can use the following code. It writes the output to a MemoryStream, which uses an array of bytes for storage. The length of that array gives us the size in bytes.


    var ss = new SmallStruct();
    var xml = new XmlSerializer(typeof(SmallStruct));
    using (var ms = new MemoryStream())
    {
        xml.Serialize(ms, ss);
        Console.WriteLine("Xml Size = {0}", ms.ToArray().Length);
    }

Swapping the XmlSerializer for the BinaryFormatter, DataContractSerializer and DataContractJsonSerializer classes gives us the output size for those alternatives. Our final option for comparison is to write our own serializer that is as fast and concise as possible. This acts as a benchmark for comparing the maximum possible performance against the available implementations. My implementation uses the Marshal class to create an unmanaged copy of the structure and then copy the unmanaged bytes back to managed code as a byte array.


    var ss = new SmallStruct();
    var size = Marshal.SizeOf(ss);
    var bytes = new byte[size];
    var ptr = Marshal.AllocHGlobal(size);
    try
    {
        // false: the freshly allocated buffer holds no previous structure to release
        Marshal.StructureToPtr(ss, ptr, false);
        Marshal.Copy(ptr, bytes, 0, size);
    }
    finally
    {
        Marshal.FreeHGlobal(ptr);
    }
    Console.WriteLine("Marshal Size = {0}\n", bytes.Length);
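The swap to the two data contract serializers mentioned above can be sketched roughly as follows. This is only an illustration: the `SizeDemo` class and `MeasureSize` helper are names made up for this example, and it relies on both serializers deriving from `XmlObjectSerializer` so that one helper can serve both.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

[DataContract]
public struct SmallStruct
{
    [DataMember]
    public int i;
}

// Hypothetical helper class, named for this example only.
public static class SizeDemo
{
    // Writes the value with the given serializer and returns the number of bytes produced.
    public static long MeasureSize(XmlObjectSerializer serializer, object value)
    {
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, value);
            return ms.Length;
        }
    }

    public static void Main()
    {
        var ss = new SmallStruct();
        Console.WriteLine("DataContract Size = {0}",
            MeasureSize(new DataContractSerializer(typeof(SmallStruct)), ss));
        Console.WriteLine("Json Size = {0}",
            MeasureSize(new DataContractJsonSerializer(typeof(SmallStruct)), ss));
    }
}
```

Note that these serializers use WriteObject rather than Serialize, so the XmlSerializer snippet cannot be reused by changing the type name alone.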

The following table shows how many bytes are output from the various options.

Serialization Size for Single Integer

The Marshal implementation provides the minimum possible number of bytes: an integer is four bytes, so that is the smallest possible output size. JSON is a very compact format that essentially consists of name/value pairs. With only a single value inside the structure, the output is just one name and one value, which is very concise.

All the others are relatively large because they include information about the type that is being serialized. Hence even a trivial struct carries an overhead of at least 150 bytes. We can conclude from this that if you are dealing with a large number of small objects you are best off using JSON or a custom Marshal-style implementation. Otherwise be prepared for very bloated output.
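To see where the extra bytes come from, it can help to print the serialized text rather than just its length. The sketch below does that for the XmlSerializer and the JSON serializer; `OutputTextDemo`, `XmlText` and `JsonText` are names invented for this example, and decoding the stream as UTF-8 is an assumption about the default encoding.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml.Serialization;

[DataContract]
public struct SmallStruct
{
    [DataMember]
    public int i;
}

// Hypothetical helper class, named for this example only.
public static class OutputTextDemo
{
    // Returns the XmlSerializer output decoded as text (UTF-8 assumed).
    public static string XmlText(SmallStruct value)
    {
        var xml = new XmlSerializer(typeof(SmallStruct));
        using (var ms = new MemoryStream())
        {
            xml.Serialize(ms, value);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Returns the DataContractJsonSerializer output decoded as text (UTF-8 assumed).
    public static string JsonText(SmallStruct value)
    {
        var json = new DataContractJsonSerializer(typeof(SmallStruct));
        using (var ms = new MemoryStream())
        {
            json.WriteObject(ms, value);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        var ss = new SmallStruct();
        Console.WriteLine(XmlText(ss));
        Console.WriteLine(JsonText(ss));
    }
}
```

Comparing the two printed strings side by side makes it easy to see how much of the XML output is declaration and type information rather than the value itself.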

Performance
To compare performance we use a loop that times the duration of the serialize action. Note that we create the serializer instance outside of the loop and so the set-up cost of any initialization code in the constructor is not included. This is fair because we should be spending far more time persisting objects than creating the serializer instances themselves.


    var sw = new Stopwatch();
    var ss = new SmallStruct();
    var xml = new XmlSerializer(typeof(SmallStruct));
    using (var ms = new MemoryStream(500))
    {
        sw.Start();
        for (int i = 0; i < 300000; i++)
        {
            ms.Seek(0, SeekOrigin.Begin);
            xml.Serialize(ms, ss);
        }
        sw.Stop();
    }

    Console.WriteLine("Serializations per second = {0}",
        (int)((double)1000 / sw.ElapsedMilliseconds * 300000));

The code for the Marshal-based implementation is a little different and caches as much set-up information as it can outside the loop.


    var sw = new Stopwatch();
    var ss = new SmallStruct();
    var size = Marshal.SizeOf(ss);
    var ptr = Marshal.AllocHGlobal(size);
    byte[] bytes = new byte[size];
    sw.Start();
    for (int i = 0; i < 300000; i++)
    {
        // false: a struct of plain value fields leaves nothing in the buffer to release
        Marshal.StructureToPtr(ss, ptr, false);
        Marshal.Copy(ptr, bytes, 0, size);
    }
    sw.Stop();

    Marshal.FreeHGlobal(ptr);

    Console.WriteLine("Serializations per second = {0}",
        (int)((double)1000 / sw.ElapsedMilliseconds * 300000));

The following table shows how many serializations were performed per second.

Serialization Speed for Single Integer

Again the Marshal implementation is the best, which is not surprising: it is hard to imagine performing less work per iteration than simply copying four bytes into a byte array. Of the others, the XmlSerializer is the clear loser, being significantly slower than any other option.
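The timing pattern shared by both loops could be factored into a small helper, sketched below. `Benchmark` and `OperationsPerSecond` are hypothetical names, and the warm-up call is an addition of this sketch rather than something the measurements above used. It divides by the elapsed time in seconds, which avoids the coarse millisecond rounding of the original expression.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper class, named for this example only.
public static class Benchmark
{
    // Runs the action 'iterations' times and returns the number of operations per second.
    public static double OperationsPerSecond(Action action, int iterations)
    {
        action(); // warm-up call so that one-time costs are not included in the timing

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return iterations / sw.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Stand-in workload: copy the four bytes of an integer into a byte array.
        var bytes = new byte[4];
        double rate = OperationsPerSecond(
            () => BitConverter.GetBytes(12345).CopyTo(bytes, 0), 300000);
        Console.WriteLine("Operations per second = {0:N0}", rate);
    }
}
```

The delegate call adds a small fixed cost to every iteration, so very fast operations such as the Marshal copy will look slightly slower here than in a hand-written loop.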

Larger Structures
Not many real-world objects consist of just a single field, so we are going to run the tests again, this time using the following struct.


    [Serializable]
    [DataContract]
    [StructLayout(LayoutKind.Sequential)]
    public struct SmallStruct
    {
        [DataMember] public int integer1;
        [DataMember] public int integer2;
        [DataMember] public int integer3;
        [DataMember] public float floating1;
        [DataMember] public double floating2;
        [DataMember] public string firstname;

        // The dummy parameter distinguishes this from the implicit parameterless constructor.
        public SmallStruct(bool x)
        {
            integer1 = 100;
            integer2 = 1000;
            integer3 = 10000;
            floating1 = 123.456F;
            floating2 = 1234.5678;
            firstname = "A typical length string";
        }
    }

Our new struct gives the following output sizes…

Serialization Size for Typical Struct

…and serializations per second.

Serialization Speed for Typical Struct
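One caveat when applying the Marshal approach to a struct with a string field: by default the string is marshalled as a pointer to a separately allocated buffer, so the copied bytes hold that pointer and not the characters. A possible way to keep the text inside the struct's own bytes is a fixed-size inline buffer, sketched below. `InlineStringStruct` is a hypothetical variant of the struct above, and the 32-byte limit is an arbitrary choice for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical variant of the struct above, for illustration only.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
public struct InlineStringStruct
{
    public int integer1;
    public int integer2;
    public int integer3;
    public float floating1;
    public double floating2;

    // Stored inline as a fixed 32-byte buffer; longer strings are truncated to fit.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
    public string firstname;
}

public static class InlineStringDemo
{
    public static void Main()
    {
        // The marshalled size now includes the 32 bytes reserved for the string.
        Console.WriteLine("Marshal Size = {0}",
            Marshal.SizeOf(typeof(InlineStringStruct)));
    }
}
```

The trade-off is that every instance pays for the full buffer regardless of the actual string length, and anything beyond the limit is lost.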

Results
We can conclude that there is no good reason to use the XmlSerializer. It produces the most bloated output and runs much slower than anything else. This is not terribly surprising, as it is the oldest of the implementations and has been around since the earliest days of the .NET Framework. The introduction of WCF brought us the DataContractSerializer, which WCF uses to convert objects into XML for transport.

So the only real choice to be made is between the DataContractSerializer, which outputs XML, and its JSON variation. JSON produces more compact output, but XML is a better-known standard with better tools available for consuming it. The project requirements would likely make this decision for you. If only there were a serializer based on the Marshal-style approach, we would be able to get the smallest possible output size and lightning-fast performance. A graph of relative performance shows just how handy that would be.

Serialization Graph
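As a rough idea of what such a Marshal-style serializer might look like, here is a minimal sketch that also covers the return trip from bytes to struct. `MarshalSerializer`, `ToBytes` and `FromBytes` are hypothetical names and not part of the BCL, and the sketch assumes the struct contains only plain value fields such as integers and floating-point numbers.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct SmallStruct
{
    public int i;
}

// Hypothetical helper, not part of the BCL. Assumes T holds only plain value fields.
public static class MarshalSerializer
{
    // Copies the struct into unmanaged memory and returns those bytes as a managed array.
    public static byte[] ToBytes<T>(T value) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        var bytes = new byte[size];
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.StructureToPtr(value, ptr, false);
            Marshal.Copy(ptr, bytes, 0, size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
        return bytes;
    }

    // Copies the bytes into unmanaged memory and rebuilds the struct from them.
    public static T FromBytes<T>(byte[] bytes) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        if (bytes == null || bytes.Length != size)
        {
            throw new ArgumentException("Byte array does not match the struct size.");
        }
        IntPtr ptr = Marshal.AllocHGlobal(size);
        try
        {
            Marshal.Copy(bytes, 0, ptr, size);
            return (T)Marshal.PtrToStructure(ptr, typeof(T));
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }

    public static void Main()
    {
        var original = new SmallStruct { i = 42 };
        byte[] data = ToBytes(original);
        SmallStruct copy = FromBytes<SmallStruct>(data);
        Console.WriteLine("Bytes = {0}, Value = {1}", data.Length, copy.i);
    }
}
```

Unlike the timed loop earlier, this version allocates and frees the unmanaged buffer on every call, which keeps it simple at the expense of some speed. The output also carries no type or version information, so both sides must agree on the exact struct layout.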

Resources
MSDN – XmlSerializer
MSDN – BinaryFormatter
MSDN – DataContractSerializer
MSDN – DataContractJsonSerializer
MSDN – Marshal Class
