Shared Strings

Shared Strings (or string deduplication) is a relatively lightweight way to decrease the size of a serialized FlatBuffer.

How it works

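FlatBuffers stores strings by reference: the table or vector that owns a string holds an offset to the string's data rather than the data itself. Writing a regular string involves: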

  1. Allocating a vector to hold the string
  2. Writing the string into the allocated spot
  3. Writing a pointer into the table or vector that holds the string

Shared Strings in FlatSharp defer these writes. Instead of writing "Dog" 10 times, FlatSharp keeps track of the 10 locations in the buffer that refer to "Dog", writes the string once, and then updates those locations to point at it.
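The deferred-write idea can be sketched in a few lines. The following is a language-neutral toy model written in Python, not FlatSharp code: the class name DedupWriter and its methods are hypothetical, alignment and vtables are ignored, and the only FlatBuffers facts it relies on are that a string is stored as a 32-bit length prefix, UTF-8 bytes, and a null terminator, and that a reference is a 32-bit offset relative to the location that holds it.

```python
import struct


class DedupWriter:
    """Toy buffer writer that stores each shared string once (hypothetical, not FlatSharp)."""

    def __init__(self):
        self.buf = bytearray()
        # Maps each string to the positions of the 4-byte slots that must point at it.
        self.pending = {}

    def write_shared_string_ref(self, value):
        """Reserve a 4-byte offset slot and remember that it refers to `value`."""
        slot = len(self.buf)
        self.buf += b"\x00\x00\x00\x00"  # placeholder, patched in flush()
        self.pending.setdefault(value, []).append(slot)
        return slot

    def flush(self):
        """Write each distinct string once, then patch every slot that refers to it."""
        for value, slots in self.pending.items():
            data = value.encode("utf-8")
            target = len(self.buf)
            # Length prefix, UTF-8 bytes, null terminator (padding omitted for brevity).
            self.buf += struct.pack("<I", len(data)) + data + b"\x00"
            for slot in slots:
                # Offsets are relative to the slot that stores them.
                struct.pack_into("<I", self.buf, slot, target - slot)
        self.pending.clear()


# Ten references to "Dog" produce ten 4-byte slots and a single copy of the string.
writer = DedupWriter()
dog_slots = [writer.write_shared_string_ref("Dog") for _ in range(10)]
writer.flush()
```

In this model the buffer holds 40 bytes of offsets plus one 8-byte string, where ten separate copies would need 80 bytes of string data in addition to the offsets.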

Shared Strings are not a compression technique, and they do not make reading from or writing to buffers any faster except in contrived cases. LZ4 and other fast compression algorithms will generally achieve greater size reductions than Shared Strings, but may be slower.

When is it useful?

Shared Strings are useful only in narrow cases. A canonical example is a collection of property bags:

attribute "fs_sharedString";

table DataSet
{
    Items:[Item];
}

table Item
{
    Pairs:[KeyValuePair];
}

table KeyValuePair
{
    Name:string (fs_sharedString);
    Value:string;
}

With a relatively small number of distinct Name values and a large number of items, Shared Strings can save significant space, because each distinct name is stored once rather than once per pair.
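A rough back-of-the-envelope estimate shows where the savings come from. The sketch below is an approximation under stated assumptions: each string costs a 4-byte length prefix, its UTF-8 bytes, and a null terminator, rounded up to a multiple of 4; each reference costs a 4-byte offset; and every duplicate is successfully deduplicated. The function names and the sample property names are hypothetical, and real buffers also contain vtables, vectors, and the Value strings, none of which are counted here.

```python
def string_size(s):
    """Approximate bytes for one string body: uint32 length + UTF-8 bytes + NUL, padded to 4."""
    raw = 4 + len(s.encode("utf-8")) + 1
    return (raw + 3) // 4 * 4


def name_bytes(names, pairs_per_name, shared):
    """Approximate bytes spent on Name strings, including one 4-byte offset per reference."""
    total = 0
    for name in names:
        offsets = 4 * pairs_per_name
        copies = 1 if shared else pairs_per_name
        total += offsets + string_size(name) * copies
    return total


# Hypothetical data set: 100,000 items, each with the same three property names.
names = ["Color", "Weight", "Height"]
unshared = name_bytes(names, 100_000, shared=False)
shared = name_bytes(names, 100_000, shared=True)
```

Under these assumptions the Name strings take about 4.8 MB unshared and about 1.2 MB shared. The remaining cost is almost entirely the per-reference offsets, which Shared Strings do not remove.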

How to use

  1. Annotate your string fields (or string vector fields) with the fs_sharedString attribute.
  2. Optionally, configure the shared string writer using ISerializer<T>.WithSettings(...). The FlatSharp shared string writer is used by default, but you may inject your own custom shared string writer or remove the shared string writer entirely. The default FlatSharp shared string writer is a flush-on-evict hash table whose size is configurable via a constructor.
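To make "flush-on-evict hash table" concrete, here is a minimal conceptual sketch in Python. It is one plausible reading of the term, not FlatSharp's implementation: the class FlushOnEvictTable, its methods, and the write_string callback are all hypothetical. The table has a fixed number of slots; when a new string hashes to a slot occupied by a different string, the occupant is flushed (written out with all of its pending locations) and replaced.

```python
import zlib


class FlushOnEvictTable:
    """Fixed-size string table that flushes an entry when it is evicted (hypothetical sketch)."""

    def __init__(self, capacity, write_string):
        # Each slot is either None or a (string, [pending locations]) pair.
        self.slots = [None] * capacity
        # Callback invoked as write_string(string, locations) when an entry is flushed.
        self.write_string = write_string

    def add(self, value, location):
        """Record that `location` must point at `value`."""
        # crc32 keeps the slot choice deterministic across runs.
        index = zlib.crc32(value.encode("utf-8")) % len(self.slots)
        entry = self.slots[index]
        if entry is not None and entry[0] == value:
            entry[1].append(location)  # same string: share the pending write
            return
        if entry is not None:
            self.write_string(*entry)  # different string: flush the old occupant
        self.slots[index] = (value, [location])

    def flush(self):
        """Write out every entry still held in the table."""
        for i, entry in enumerate(self.slots):
            if entry is not None:
                self.write_string(*entry)
                self.slots[i] = None


# With a single slot, every distinct string evicts the previous one.
written = []
table = FlushOnEvictTable(1, lambda s, locs: written.append((s, list(locs))))
for location, value in [(0, "Dog"), (4, "Dog"), (8, "Cat"), (12, "Dog")]:
    table.add(value, location)
table.flush()
```

In this run "Dog" is written twice, because "Cat" evicted it in between. That is the trade-off in this design: memory use is bounded by the table size, and a table that is too small for the number of distinct strings gives up some deduplication.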

Implementing ISharedStringWriter

FlatSharp provides a default implementation of ISharedStringWriter that is optimized for a balance of size reduction and speed. Custom implementations based on Dictionary are straightforward to write; the FlatSharp samples include one.