Serialization

Serializable Concept

A type T is Serializable if and only if one of the following is true:

it is a primitive type.
By primitive type we mean a C++ built-in type and ONLY a C++ built-in type. Arithmetic (including characters), bool, enum are primitive types. Below in serialization traits, we define a "primitive" implementation level in a different way for a different purpose. This can be a source of confusion.
It is a class type and one of the following has been declared according to the prototypes detailed below:
- a class member function serialize
- a global function serialize
it is a pointer to a Serializable type.
it is a reference to a Serializable type.
it is a native C++ Array of Serializable type.

Primitive Types

The template operators &, <<, and >> of the archive classes described above will generate code to save/load all primitive types to/from an archive. This code will usually just add the data to the archive according to the archive format. For example, a four byte integer is appended to a binary archive as 4 binary bytes while a to a text archive it would be rendered as a space followed by a string representation.

Class Types

For class/struct types, the template operators &, <<, and >> will generate code that invokes the programmer's serialization code for the particular data type. There is no default. An attempt to serialize a class/struct for which no serialization has been explicitly specified will result in a compile time error. The serialiation of a class can be specified via either a class member function or a free funcation which takes a reference to an instance of the class as an argument.

Member Function

The serialization library invokes the following code to save or load a class instance to/from and archive.


template<class Archive, class T>
inline void serialize(
    Archive & ar, 
    T & t, 
    const unsigned int file_version
){
    // invoke member function for class T
    t.serialize(ar, file_version);
}

That is, the default definition of template serialize presumes the existence of a class member function template of the following signature:


template<class Archive>
void serialize(Archive &ar, const unsigned int version){
    ...
}

If such a member function is not declared, a compile time error will occur. In order that the member function generated by this template can be called to append the data to an archive, it either must be public or the class must be made accessible to the serialization library by including:


friend class boost::serialization::access;

in the class definition. This latter method should be preferred over the option of making the member function public. This will prevent serialization functions from being called from outside the library. This is almost certainly an error. Unfortunately, it may appear to function but fail in a way that is very difficult to find.

It may not be immediately obvious how this one template serves for both saving data to an archive as well as loading data from the archive. The key is that the & operator is defined as << for output archives and as >> input archives. The "polymorphic" behavior of the & permits the same template to be used for both save and load operations. This is very convenient in that it saves a lot of typing and guarantees that the saving and loading of class data members are always in sync. This is the key to the whole serialization system.

Free Function

Of course we're not restricted to using the default implementation described above. We can override the default one with our own. Doing this will permit us to implement serialization of a class without altering the class definition itself. We call this non-intrusive serialization. Suppose our class is named my_class, the override would be specified as:


// namespace selection

template<class Archive>
inline void serialize(
    Archive & ar, 
    my_class & t, 
    const unsigned int file_version
){
    ...
}

Note that we have called this override "non-intrusive". This is slightly inaccurate. It does not require that the class have special functions, that it be derived from some common base class or any other fundamental design changes. However, it will require access to the class members that are to be saved and loaded. If these members are private, it won't be possible to serialize them. So in some instances, minor modifications to the class to be serialized will be necessary even when using this "non-intrusive" method. In practice this may not be such a problem as many libraries (E.G. STL) expose enough information to permit implementation of non-intrusive serialization with absolutly no changes to the library.

Namespaces for Free Function Overrides

For maximum portability, include any free functions templates and definitions in the namespace boost::serialization. If portability is not a concern and the compiler being used supports ADL (Argument Dependent Lookup) the free functions and templates can be in any of the following namespaces:

boost::serialization
namespace of the archive class
namespace of the type being serialized

Note that, at first glance, this suggestion may seem to be wrong for compilers which implement two phase lookup. In fact, the serialization library used a perhaps overly clever method to support this rule even for such compilers. Those with an interest in studying this further will find more information in serialization.hpp

Serialization of Class Members

Regardless of which of the above methods is used, the body of the serialize function must specify the data to be saved/loaded by sequential application of the archive operator & to all the data members of the class.


{
    // save/load class member variables
    ar & member1;
    ar & member2;
}

Base Classes

The header file base_object.hpp includes the template:


template<class Base, class Derived>
Base & base_object(Derived &d);

which should be used to create a reference to an object of the base which can be used as an argument to the archive serialization operators. So for a class of Serializable type T the base class state should be serialized like this:


{
    // invoke serialization of the base class 
    ar & boost::serialization::base_object<base_class_of_T>(*this);
    // save/load class member variables
    ar & member1;
    ar & member2;
}

Resist the temptation to just cast *this to the base class. This might seem to work but may fail to invoke code necessary for proper serialization.

Note that this is NOT the same as calling the serialize function of the base class. This might seem to work but will circumvent certain code used for tracking of objects, and registering base-derived relationships and other bookkeeping that is required for the serialization system to function as designed. For this reason, all serialize member functions should be private.

`const` Members

Saving const members to an archive requires no special considerations. Loading const members can be addressed by using a const_cast:


    ar & const_cast<T &>(t);

Note that this violates the spirit and intention of the const keyword. const members are intialized when a class instance is constructed and not changed thereafter. However, this may be most appropriate in many cases. Ultimately, it comes down to the question about what const means in the context of serialization.

Templates

Implementation of serialization for templates is exactly the same process as for normal classes and requires no additional considerations. Among other things, this implies that serialization of compositions of templates are automatically generated when required if serialization of the component templates is defined. For example, this library includes definition of serialization for boost::shared_ptr<T> and for std::list<T>. If I have defined serialization for my own class my_t, then serialization for std::list< boost::shared_ptr< my_t> > is already available for use.

For an example that shows how this idea might be implemented for your own class templates, see demo_auto_ptr.cpp. This shows how non-intrusive serialization for the template auto_ptr from the standard library can be implemented.

A somewhat trickier addition of serialization to a standard template can be found in the example shared_ptr.hpp

In the specification of serialization for templates, its common to split serialize into a load/save pair. Note that the convenience macro described above isn't helpful in these cases as the number and kind of template class arguments won't match those used when splitting serialize for a simple class. Use the override syntax instead.

Versioning

It will eventually occur that class definitions change after archives have been created. When a class instance is saved, the current version in included in the class information stored in the archive. When the class instance is loaded from the archive, the original version number is passed as an argument to the loading function. This permits the load function to include logic to accommodate older definitions for the class and reconcile them with latest version. Save functions always save the current version. So this results in automatically converting older format archives to the newest versions. Version numbers are maintained independently for each class. This results in a simple system for permitting access to older files and conversion of same. The current version of the class is assigned as a Class Serialization Trait described later in this manual.


{
    // invoke serialization of the base class 
    ar & boost::serialization::base_object<base_class_of_T>(*this);
    // save/load class member variables
    ar & member1;
    ar & member2;
    // if its a recent version of the class
    if(1 < file_version)
        // save load recently added class members
        ar & member3;
}

Splitting `serialize` into Save/Load

There are times when it is inconvenient to use the same template for both save and load functions. For example, this might occur if versioning gets complex.

Splitting Member Functions

For member functions this can be addressed by including the header file boost/serialization/split_member.hpp including code like this in the class:


template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    // invoke serialization of the base class 
    ar << boost::serialization::base_object<const base_class_of_T>(*this);
    ar << member1;
    ar << member2;
    ar << member3;
}

template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    // invoke serialization of the base class 
    ar >> boost::serialization::base_object<base_class_of_T>(*this);
    ar >> member1;
    ar >> member2;
    if(version > 0)
        ar >> member3;
}

template<class Archive>
void serialize(
    Archive & ar,
    const unsigned int file_version 
){
    boost::serialization::split_member(ar, *this, file_version);
}

This splits the serialization into two separate functions save and load. Since the new serialize template is always the same it can be generated by invoking the macro BOOST_SERIALIZATION_SPLIT_MEMBER() defined in the header file boost/serialization/split_member.hpp . So the entire serialize function above can be replaced with:


BOOST_SERIALIZATION_SPLIT_MEMBER()

Splitting Free Functions

The situation is same for non-intrusive serialization with the free serialize function template. To use save and load function templates rather than serialize:


namespace boost { namespace serialization {
template<class Archive>
void save(Archive & ar, const my_class & t, unsigned int version)
{
    ...
}
template<class Archive>
void load(Archive & ar, my_class & t, unsigned int version)
{
    ...
}
}}

include the header file boost/serialization/split_free.hpp . and override the free serialize function template:


namespace boost { namespace serialization {
template<class Archive>
inline void serialize(
    Archive & ar,
    my_class & t,
    const unsigned int file_version
){
    split_free(ar, t, file_version); 
}
}}

To shorten typing, the above template can be replaced with the macro:


BOOST_SERIALIZATION_SPLIT_FREE(my_class)

Note that although the functionality to split the


serialize

function into save/load has been provided, the usage of the serialize function with the corresponding & operator is preferred. The key to the serialization implementation is that objects are saved and loaded in exactly the same sequence. Using the & operator and serialize function guarantees that this is always the case and will minimize the occurrence of hard to find errors related to synchronization of save and load functions.

Also note that BOOST_SERIALIZATION_SPLIT_FREE must be used outside of any namespace.

Pointers

A pointer to any class instance can be serialized with any of the archive save/load operators.

To properly save and restore an object through a pointer the following situations must be addressed:

If the same object is saved multiple times through different pointers, only one copy of the object need be saved.
If an object is loaded multiple times through different pointers, only one new object should be created and all returned pointers should point to it.
The system must detect the case where an object is first saved through a pointer then the object itself is saved. Without taking extra precautions, loading would result in the creation of multiple copies of the original object. This system detects this case when saving and throws an exception - see below.
An object of a derived class may be stored through a pointer to the base class. The true type of the object must be determined and saved. Upon restoration the correct type must be created and its address correctly cast to the base class. That is, polymorphic pointers have to be considered.
NULL pointers must be dectected when saved and restored to NULL when deserialized.

This serialization library addresses all of the above considerations. The process of saving and loading an object through a pointer is non-trivial. It can be summarized as follows:

Saving a pointer:

determine the true type of the object being pointed to.
write a special tag to the archive
if the object pointed to has not already been written to the archive, do so now

Loading a pointer:

read a tag from the archive.
determine the type of object to be created
if the object has already been loaded, return its address.
otherwise, create a new instance of the object
read the data back in using the operators described above
return the address of the newly created object.

Given that class instances are saved/loaded to/from the archive only once, regardless of how many times they are serialized with the << and >> operators

Loading the same pointer object multiple times results in only one object being created, thereby replicating the original pointer configuration.
Structures, such as collections of polymorphic pointers, are handled with no special effort on the part of users of this library.

Serialization of pointers of derived types through a pointer to the base class may require a little extra "help". Also, the programmer may desire to modify the process described above for his own reasons. For example, it might be desired to suppress the tracking of objects as it is known a priori that the application in question can never create duplicate objects. Serialization of pointers can be "fine tuned" via the specification of Class Serialization Traits as described in another section of this manual

Non-Default Constructors

Serialization of pointers is implemented in the library with code similar to the following:


// load data required for construction and invoke constructor in place
template<class Archive, class T>
inline void load_construct_data(
    Archive & ar, T * t, const unsigned int file_version
){
    // default just uses the default constructor to initialize
    // previously allocated memory. 
    ::new(t)T();
}

The default load_construct_data invokes the default constructor "in-place" to initialize the memory.

If there is no such default constructor, the function templates load_construct_data and perhaps save_construct_data will have to be overridden. Here is a simple example:


class my_class {
private:
    friend class boost::serialization::access;
    const int m_attribute;  // some immutable aspect of the instance
    int m_state;            // mutable state of this instance
    template<class Archive>
    void serialize(Archive &ar, const unsigned int file_version){
        ar & m_state;
    }
public:
    // no default construct guarentees that no invalid object
    // ever exists
    my_class(int attribute) :
        m_attribute(attribute),
        m_state(0)
    {}
};

the overrides would be:


namespace boost { namespace serialization {
template<class Archive>
inline void save_construct_data(
    Archive & ar, const my_class * t, const unsigned int file_version
){
    // save data required to construct instance
    ar << t->m_attribute;
}

template<class Archive>
inline void load_construct_data(
    Archive & ar, my_class * t, const unsigned int file_version
){
    // retrieve data from archive required to construct new instance
    int attribute;
    ar >> attribute;
    // invoke inplace constructor to initialize instance of my_class
    ::new(t)my_class(attribute);
}
}} // namespace ...

In addition to the deserialization of pointers, these overrides are used in the deserialization of STL containers whose element type has no default constructor.

Pointers to Objects of Derived Classes

Registration

Consider the following:


class base {
    ...
};
class derived_one : public base {
    ...
};
class derived_two : public base {
    ...
};
main(){
    ...
    base *b;
    ...
    ar & b; 
}

When saving b what kind of object should be saved? When loading b what kind of object should be created? Should it be an object of class derived_one, derived_two, or maybe base?

It turns out that the kind of object serialized depends upon whether the base class (base in this case) is polymophic or not. If base is not polymorphic, that is if it has no virtual functions, then an object of the type base will be serialized. Information in any derived classes will be lost. If this is what is desired (it usually isn't) then no other effort is required.

If the base class is polymorphic, an object of the most derived type (derived_one or derived_two in this case) will be serialized. The question of which type of object is to be serialized is (almost) automatically handled by the library.

The system "registers" each class in an archive the first time an object of that class it is serialized and assigns a sequential number to it. Next time an object of that class is serialized in that same archive, this number is written in the archive. So every class is identified uniquely within the archive. When the archive is read back in, each new sequence number is re-associated with the class being read. Note that this implies that "registration" has to occur during both save and load so that the class-integer table built on load is identical to the class-integer table built on save. In fact, the key to whole serialization system is that things are always saved and loaded in the same sequence. This includes "registration".

Expanding our previous example:


main(){
    derived_one d1;
    derived_two d2:
    ...
    ar & d1;
    ar & d2;
    // A side effect of serialization of objects d1 and d2 is that
    // the classes derived_one and derived_two become known to the archive.
    // So subsequent serialization of those classes by base pointer works
    // without any special considerations.
    base *b;
    ...
    ar & b; 
}

When b is read it is preceded by a unique (to the archive) class identifier which has previously been related to class derived_one or derived_two.

If a derived class has NOT been automatically "registered" as described above, an unregistered_class exception will be thrown when serialization is invoked.

This can be addressed by registering the derived class explicitly. All archives are derived from a base class which implements the following template:


template<class T>
register_type();

So our problem could just as well be addressed by writing:


main(){
    ...
    ar.template register_type<derived_one>();
    ar.template register_type<derived_two>();
    base *b;
    ...
    ar & b; 
}

Note that if the serialization function is split between save and load, both functions must include the registration. This is required to keep the save and corresponding load in syncronization.

Export

The above will work but may be inconvenient. We don't always know which derived classes we are going to serialize when we write the code to serialize through a base class pointer. Every time a new derived class is written we have to go back to all the places where the base class is serialized and update the code.

So we have another method:


#include <boost/serialization/export.hpp>
...
BOOST_CLASS_EXPORT_GUID(derived_one, "derived_one")
BOOST_CLASS_EXPORT_GUID(derived_two, "derived_two")

main(){
    ...
    base *b;
    ...
    ar & b; 
}

The macro BOOST_CLASS_EXPORT_GUID associates a string literal with a class. In the above example we've used a string rendering of the class name. If a object of such an "exported" class is serialized through a pointer and is otherwise unregistered, the "export" string is included in the archive. When the archive is later read, the string literal is used to find the class which should be created by the serialization library. This permits each class to be in a separate header file along with its string identifier. There is no need to maintain a separate "pre-registration" of derived classes that might be serialized. This method of registration is referred to as "key export". More information on this topic is found in the section Class Traits - Export Key.

Instantiation

Registration by means of any of the above methods fulfill another role whose importance might not be obvious. This system relies on templated functions of the form template<class Archive, class T>. This means that serialization code must be instantiated for each combination of archive and data type that is serialized in the program.

Polymorphic pointers of derived classes may never be referred to explictly by the program so normally code to serialize such classes would never be instantiated. So in addition to including export key strings in an archive, BOOST_CLASS_EXPORT_GUID explicitly instantiates the class serialization code for all archive classes used by the program.

Selective Tracking

Whether or not an object is tracked is determined by its object tracking trait. The default setting for user defined types is track_selectively. That is, track objects if and only if they are serialized through pointers anywhere in the program. Any objects that are "registered" by any of the above means are presumed to be serialized through pointers somewhere in the program and therefore would be tracked. In certain situations this could lead to an inefficiency. Suppose we have a class module used by multiple programs. Because some programs serializes polymorphic pointers to objects of this class, we export a class identifier by specifying BOOST_CLASS_EXPORT in the class header. When this module is included by another program, objects of this class will always be tracked even though it may not be necessary. This situation could be addressed by using track_never in those programs.

It could also occur that even though a program serializes through a pointer, we are more concerned with efficiency than avoiding the the possibility of creating duplicate objects. It could be that we happen to know that there will be no duplicates. It could also be that the creation of a few duplicates is benign and not worth avoiding given the runtime cost of tracking duplicates. Again, track_never can be used.

Runtime Casting

In order to properly translate between base and derived pointers at runtime, the system requires each base/derived pair be found in a table. A side effect of serializing a base object with boost::serialization::base_object<Base>(Derived &) is to ensure that the base/derived pair is added to the table before the main function is entered. This is very convenient and results in a clean syntax. The only problem is that it can occur where a derived class serialized through a pointer has no need to invoke the serialization of its base class. In such a case, there are two choices. The obvious one is to invoke the base class serialization with base_object and specify an empty function for the base class serialization. The alternative is to "register" the Base/Derived relationship explicitly by invoking the template void_cast_register<Derived, Base>();. Note that this usage of the term "register" is not related to its usage in the previous section. Here is an example of how this is done:


#include <sstream>
#include <boost/serialization/serialization.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/export.hpp>

class base {
    friend class boost::serialization::access;
    //...
    // only required when using method 1 below
    // no real serialization required - specify a vestigial one
    template<class Archive>
    void serialize(Archive & ar, const unsigned int file_version){}
};

class derived : public base {
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int file_version){
        // method 1 : invoke base class serialization
        ar & boost::serialization::base_object<base>(*this);
        // method 2 : explicitly register base/derived relationship
        boost::serialization::void_cast_register<derived, base>(
            static_cast<derived *>(NULL),
            static_cast<base *>(NULL)
        )
    }
};

BOOST_CLASS_EXPORT_GUID(derived, "derived")

main(){
    //...
    std::stringstream ss;
    boost::archive::text_iarchive ar(ss);
    base *b;
    ar >> b; 
}

In order for this template to be invoked in code compiled by non-conforming compilers, the following syntax may be used:


boost::serialization::void_cast_register(
    static_cast<Derived *>(NULL),
    static_cast<Base *>(NULL)
);

For more information, see Template Invocation syntax

References

Classes that contain reference members will generally require non-default constructors as references can only be set when an instance is constructed. The example of the previous section is slightly more complex if the class has reference members. This raises the question of how and where the objects being referred to are stored and how are they created. Also there is the question about references to polymorphic base classes. Basically, these are the same questions that arise regarding pointers. This is no surprise as references are really a special kind of pointer. We address these questions by serializing references as though they were pointers.


class object;
class my_class {
private:
    friend class boost::serialization::access;
    int member1;
    object & member2;
    template<class Archive>
    void serialize(Archive &ar, const unsigned int file_version);
public:
    my_class(int m, object & o) :
        member1(m), 
        member2(o)
    {}
};

the overrides would be:


namespace boost { namespace serialization {
template<class Archive>
inline void save_construct_data(
    Archive & ar, const my_class * t, const unsigned int file_version
){
    // save data required to construct instance
    ar << t.member1;
    // serialize reference to object as a pointer
    ar << & t.member2;
}

template<class Archive>
inline void load_construct_data(
    Archive & ar, my_class * t, const unsigned int file_version
){
    // retrieve data from archive required to construct new instance
    int m;
    ar >> m;
    // create and load data through pointer to object
    // tracking handles issues of duplicates.
    object * optr;
    ar >> optr;
    // invoke inplace constructor to initialize instance of my_class
    ::new(t)my_class(m, *optr);
}
}} // namespace ...

Arrays

If T is a serializable type, then any native C++ array of type T is a serializable type. That is, if T is a serializable type, then the following is automatically available and will function as expected:


T t[4];
ar << t;
    ...
ar >> t;

Class Serialization Traits

Serialization Wrappers

Models - Serialization Implementations Included in the Library

The facilities described above are sufficient to implement serialization for all STL containers. In fact, this has been done and has been included in the library. For example, in order to use the included serialization code for std::list, use:


#include <boost/serialization/list.hpp>

rather than


#include <list>

Since the former includes the latter, this is all that is necessary. The same holds true for all STL collections as well as templates required to support them (e.g. std::pair).

As of this writing, the library contains serialization of the following boost classes:

optional
variant
scoped_ptr
shared_ptr
auto_ptr (demo)

Others are being added to the list so check the boost files section and headers for new implementations!