On Reflection and Serialization

By Kyle Wilson
Monday, June 17, 2002

If I could add just one feature to standard C++, it would be support for reflection. Reflection, also known as introspection or the use of metaclasses, is described in A System Of Patterns (yes, the other pattern book) far better than I could sum it up here. Suffice it to say that it refers to the ability of an object-oriented language to generate objects that describe classes. Most languages that offer reflection also seem to offer the ability to construct new classes (not objects, classes) at runtime, which certainly wouldn't be appropriate for a language designed to be as efficient and type-safe as C++.

What I'd really like to see in C++ is a reflection mechanism which allows access to type information from template functions, so I could iterate through a class's member variables like

template <class T>
void Save(const T& t, CoreStream* stream)
{
     // Hypothetical syntax, not valid C++: visit each member variable X of T.
     foreach (X in T)
          stream->Write(t.X);
}

and have the iteration unroll at template instantiation time into a series of Write calls on the stream, each resolved to the appropriate overload based on the type of the corresponding member variable of T. This is certainly information that's available to the compiler. It's just not exposed to programmers.
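
For comparison, here is a rough approximation of that unrolling in a much later standard, C++17, which still requires the member list to be written out by hand. It is only a sketch: the CoreStream stand-in, the Point struct and the Members() function are all assumptions made up for illustration, not part of any real library.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical stand-in for CoreStream: it just appends tagged text.
struct CoreStream {
    std::ostringstream out;
    void Write(int v)   { out << "int:" << v << ';'; }
    void Write(float v) { out << "float:" << v << ';'; }
    void Write(bool v)  { out << "bool:" << (v ? 1 : 0) << ';'; }
};

struct Point {
    int   x = 3;
    float scale = 1.5f;
    bool  visible = true;

    // The hand-written part that real reflection would make unnecessary:
    // a compile-time list of pointers to the members to be written.
    static constexpr auto Members() {
        return std::make_tuple(&Point::x, &Point::scale, &Point::visible);
    }
};

template <class T>
void Save(const T& t, CoreStream* stream) {
    assert(stream != nullptr);
    // std::apply expands the tuple into a parameter pack; the fold over the
    // comma operator becomes one Write call per member, in order, each
    // resolved by overload on the member's type.
    std::apply(
        [&](auto... member) { (stream->Write(t.*member), ...); },
        T::Members());
}

// Usage helper: saves a default Point and returns the text that was written.
inline std::string SavePointToText() {
    CoreStream stream;
    Save(Point{}, &stream);
    return stream.out.str();
}
```

The weakness is exactly the one described above: nothing forces Members() to stay in step with the member variables, because the compiler's own knowledge of the class is still not being used.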

Serialization Through Reflection

What I'd like to do with this is automatically serialize objects. Serialization is the ability to write objects to and read objects from byte streams. In game terms, serialization encompasses export/import of static state, saving/loading of dynamic state, and transmitting/receiving of state information over a network. A reflection-based serialization scheme would allow me to write classes that look like

class Example
{
public:
     /* Methods */

private:
     CoreSerial<int, kExport | kSave>      m_testInt;
     CoreSerial<float, kSave>              m_dynamicFloat;
     CoreSerial<bool, kSave | kNetwork>    m_networkState;
};

where CoreSerial is simply defined as a thin template wrapper around any type

template <typename T, int F>
class CoreSerial
{
public:
     CoreSerial(const T& t) : m_t(t)  { }
     operator T() const     { return m_t; }

private:
     T m_t;
};

and kExport, kSave and kNetwork are defined as bitmasks. On iteration through the members of Example, I could process each member variable appropriately based on the flags associated with it. If an object was flagged for saving, I could save it. If not, I could ignore it.
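
The flag test itself can be done at compile time, since the flags are part of each member's type. The sketch below assumes C++17, invents numeric values for the three flags, and again falls back on a hand-written Members() list in place of real reflection. ProcessIfFlagged and Collect are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Flag names from the text above; the numeric values are assumptions.
enum SerialFlags { kExport = 1 << 0, kSave = 1 << 1, kNetwork = 1 << 2 };

template <typename T, int F>
class CoreSerial {
public:
    CoreSerial(const T& t) : m_t(t) {}
    operator T() const { return m_t; }
private:
    T m_t;
};

class Example {
public:
    Example() : m_testInt(7), m_dynamicFloat(0.5f), m_networkState(true) {}

    // Hand-written member list standing in for compiler-provided reflection.
    auto Members() const {
        return std::tie(m_testInt, m_dynamicFloat, m_networkState);
    }
private:
    CoreSerial<int,   kExport | kSave>  m_testInt;
    CoreSerial<float, kSave>            m_dynamicFloat;
    CoreSerial<bool,  kSave | kNetwork> m_networkState;
};

// Appends the member's value as text only if its flags include Mask.
// The test is resolved at compile time; unflagged members generate no code.
template <int Mask, typename T, int F>
void ProcessIfFlagged(const CoreSerial<T, F>& member,
                      std::vector<std::string>& out) {
    if constexpr ((F & Mask) != 0) {
        out.push_back(std::to_string(static_cast<T>(member)));
    }
}

// Visits every listed member of an Example and keeps those matching Mask.
template <int Mask>
std::vector<std::string> Collect(const Example& e) {
    std::vector<std::string> out;
    std::apply(
        [&](const auto&... m) { (ProcessIfFlagged<Mask>(m, out), ...); },
        e.Members());
    assert(out.size() <= 3);   // Example lists exactly three members
    return out;
}
```

Calling Collect<kSave> on an Example visits all three members, while Collect<kNetwork> keeps only m_networkState.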

Unfortunately, I can't iterate through the member variables of a class. I can still make my CoreSerial template work, but it's going to be much more expensive.

Self-Registering Serialized Variables

I can make CoreSerial work by doing per object at runtime what really only ought to be done per class at compile time. I'd like to access at compile time the offsets for each member variable of class Example. Instead, my only option is to access those variables directly. I can create a singleton to manage all serialized state in the scene and have CoreSerial's constructor call CoreSerialManager::Instance()->Register(this, sizeof(T), F) and have CoreSerial's destructor call the equivalent Unregister function. (This approach owes a great deal to Adrian Stone, who first proposed something like it -- hsSaveable<int> -- when we were working at HeadSpin.)
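
A minimal sketch of that registration scheme might look like the following. The SerialEntry struct, the Count() accessor and the copy-handling are assumptions added to make the example complete. The sketch registers the address of the wrapped value rather than of the wrapper, which comes to the same thing for a single-member wrapper like this one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One record per live serialized variable: where it is, how big, which flags.
struct SerialEntry {
    void*       address;
    std::size_t size;
    int         flags;
};

class CoreSerialManager {
public:
    static CoreSerialManager* Instance() {
        static CoreSerialManager instance;   // constructed on first use
        return &instance;
    }
    void Register(void* address, std::size_t size, int flags) {
        m_entries.push_back(SerialEntry{address, size, flags});
    }
    void Unregister(void* address) {
        // Linear search plus erase: the per-deletion cost of this scheme.
        auto it = std::find_if(m_entries.begin(), m_entries.end(),
            [address](const SerialEntry& e) { return e.address == address; });
        assert(it != m_entries.end());
        if (it != m_entries.end()) m_entries.erase(it);
    }
    std::size_t Count() const { return m_entries.size(); }
private:
    CoreSerialManager() = default;
    std::vector<SerialEntry> m_entries;
};

template <typename T, int F>
class CoreSerial {
public:
    CoreSerial(const T& t) : m_t(t) {
        CoreSerialManager::Instance()->Register(&m_t, sizeof(T), F);
    }
    // A copy is a new variable at a new address, so it registers itself too.
    CoreSerial(const CoreSerial& other) : m_t(other.m_t) {
        CoreSerialManager::Instance()->Register(&m_t, sizeof(T), F);
    }
    // Assignment changes the value only; the registration stays as it is.
    CoreSerial& operator=(const CoreSerial& other) {
        m_t = other.m_t;
        return *this;
    }
    ~CoreSerial() { CoreSerialManager::Instance()->Unregister(&m_t); }
    operator T() const { return m_t; }
private:
    T m_t;
};
```

Declaring a CoreSerial<int, 1> inside a scope adds one entry to the manager, and leaving the scope removes it again.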

The cost of this is an extra function call for every serialized variable in your game, paid up front as objects are constructed, and again every time an object is deleted. CoreSerialManager will keep a vector of pointers and byte sizes that needs to be grown for each new serialized variable and, worse, searched and compacted for each member of each deleted object. CoreSerialManager will also need the space to store that vector, which holds a pointer and a size for each variable. If most variables are 32-bit ints and floats, and pointers and sizes are 32 bits each as well, then every 4-byte variable picks up 8 bytes of bookkeeping and storage requirements for your serialized data will triple. Storing pointers and sizes in a list instead of a vector might make object deletion faster, but at a cost of even more pointers for each link in the list, doubling storage requirements for serialized variables yet again (if you use std::list).

Despite the space requirements and initialization overhead, this approach has a couple of advantages that the other approaches I'm about to describe lack. First, CoreSerialManager can detect contiguous blocks of serializable data and (on platforms where internal byte order is the same as byte order on disk) group them in single read/write calls for more efficient I/O. Second, all you need to do to serialize a variable is declare it properly. Everything else is done for you automatically.
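
The block detection mentioned above could be sketched roughly as follows, assuming the manager can hand over its registered addresses and sizes. The Block struct and the Coalesce function are hypothetical names, and addresses are modeled as plain integers to keep the example simple.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Block {
    std::uintptr_t begin;   // address of the first byte
    std::size_t    size;    // length in bytes
};

// Sorts blocks by address, then merges every block that starts exactly
// where the previous one ends. Each merged block can be read or written
// with a single I/O call.
inline std::vector<Block> Coalesce(std::vector<Block> blocks) {
    std::sort(blocks.begin(), blocks.end(),
              [](const Block& a, const Block& b) { return a.begin < b.begin; });
    std::vector<Block> merged;
    for (const Block& b : blocks) {
        assert(b.size > 0);   // a zero-sized entry would be a registration bug
        if (!merged.empty() &&
            merged.back().begin + merged.back().size == b.begin) {
            merged.back().size += b.size;   // extend the current run
        } else {
            merged.push_back(b);            // start a new run
        }
    }
    return merged;
}
```

Three adjacent 4-byte variables at addresses 100, 104 and 108 collapse into one 12-byte block. Members separated by padding or by unserialized variables stay in separate blocks, so how much this helps depends on how classes are laid out.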

Explicit Serialization

If we don't want to pay for all those list insertions and deletions at object construction and destruction time, then we're stuck writing an actual serialization function for every class. You should probably derive every serialized class from something like

class CoreSerializer;     // defined below

class CoreSerializable
{
public:
     virtual ~CoreSerializable()     { }
     virtual void Serialize(CoreSerializer* serializer) = 0;
};

where CoreSerializer is defined as

class CoreSerializer
{
public:
     virtual ~CoreSerializer()     { }
     virtual void Process(float& rhs) = 0;
     virtual void Process(int& rhs) = 0;
     virtual void Process(bool& rhs) = 0;
     /* ... */
};

and serializers are derived from CoreSerializer to read from a file, write to a file, read from a network stream, write to a network stream and so forth. Separate Read and Write calls would let us be const correct, but would double the amount of typing to be done and make it much more likely that Read and Write would get out of sync. I'm leaving out the flags fields for each variable, but it would be just as easy to flag variables for different types of serialization in this scheme as in those I discussed before -- probably even easier.
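
To show how one Serialize function can serve both directions, here is a small sketch with two serializers that work on an in-memory byte buffer instead of a file or network stream. MemoryWriter, MemoryReader and Monster are hypothetical names invented for the example, and the raw-byte format assumes reader and writer run on the same platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class CoreSerializer {
public:
    virtual ~CoreSerializer() {}
    virtual void Process(float& rhs) = 0;
    virtual void Process(int& rhs) = 0;
    virtual void Process(bool& rhs) = 0;
};

class CoreSerializable {
public:
    virtual ~CoreSerializable() {}
    virtual void Serialize(CoreSerializer* serializer) = 0;
};

// Hypothetical writer: appends each value's raw bytes to a buffer.
class MemoryWriter : public CoreSerializer {
public:
    void Process(float& rhs) override { Put(&rhs, sizeof rhs); }
    void Process(int& rhs) override   { Put(&rhs, sizeof rhs); }
    void Process(bool& rhs) override  { Put(&rhs, sizeof rhs); }
    std::vector<unsigned char> bytes;
private:
    void Put(const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        bytes.insert(bytes.end(), b, b + n);
    }
};

// Hypothetical reader: copies bytes back out in the order they were written.
class MemoryReader : public CoreSerializer {
public:
    explicit MemoryReader(const std::vector<unsigned char>& src)
        : m_src(src), m_pos(0) {}
    void Process(float& rhs) override { Get(&rhs, sizeof rhs); }
    void Process(int& rhs) override   { Get(&rhs, sizeof rhs); }
    void Process(bool& rhs) override  { Get(&rhs, sizeof rhs); }
private:
    void Get(void* p, std::size_t n) {
        assert(m_pos + n <= m_src.size());   // reading past the end is a bug
        std::memcpy(p, m_src.data() + m_pos, n);
        m_pos += n;
    }
    const std::vector<unsigned char>& m_src;
    std::size_t m_pos;
};

// One Serialize function handles both saving and loading, so the two
// directions cannot drift apart.
struct Monster : CoreSerializable {
    int   health = 0;
    float speed = 0.0f;
    bool  hostile = false;
    void Serialize(CoreSerializer* s) override {
        s->Process(health);
        s->Process(speed);
        s->Process(hostile);
    }
};
```

Passing a MemoryWriter to Serialize fills the buffer, and passing a MemoryReader over the same buffer to a second Monster restores the same three values.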

The coding and maintenance cost of all this is that every serializable class now has a Serialize function that needs to be kept up to date as new variables are added to the class. The runtime cost is, at least, only paid at serialization time now. At that point, a virtual function call is made for each serialized variable. Those calls probably cost less than what we lose because the serializer no longer has sufficient context to aggregate variables for bulk reading/writing.

This is probably the serialization approach I'd use in most cases, but there's one more worthy of mention.

Full Metaclass Implementation

The heavyweight solution to serialization, of course, is to fully encapsulate each variable in a hold-anything structure and create "classes" that are really trees of value structs assembled at runtime. Reflection in this case reflects not C++ classes, but external "class" descriptions read from data files or script. Detlef Vollmann's article, "Metaclasses and Reflection in C++", describes one implementation of such a scheme.
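
As a very small illustration of the idea, and not of Vollmann's implementation or any engine's, the sketch below builds a "class" at runtime from field descriptions and serializes instances generically. It assumes C++17 for std::variant, limits values to three types, and every name in it (Value, FieldDesc, ClassDesc, DynamicObject) is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// A hold-anything value, limited here to three types for brevity.
using Value = std::variant<int, float, bool>;

// Runtime description of one field: its name and default value.
struct FieldDesc {
    std::string name;
    Value       defaultValue;
};

// A "class" assembled at runtime, for example from a data file or script.
struct ClassDesc {
    std::string            name;
    std::vector<FieldDesc> fields;
};

// An instance is just a bag of values keyed by field name.
class DynamicObject {
public:
    explicit DynamicObject(const ClassDesc& desc) : m_desc(&desc) {
        for (const FieldDesc& f : desc.fields) m_values[f.name] = f.defaultValue;
    }
    void Set(const std::string& field, const Value& v) {
        auto it = m_values.find(field);
        assert(it != m_values.end());        // unknown field names are a bug
        if (it != m_values.end()) it->second = v;
    }
    const Value& Get(const std::string& field) const {
        return m_values.at(field);
    }
    // Generic serialization: walk the description, no per-class code needed.
    std::string ToText() const {
        std::string out = m_desc->name + "{";
        for (const FieldDesc& f : m_desc->fields) {
            out += f.name + "=";
            std::visit([&out](auto v) { out += std::to_string(v); },
                       m_values.at(f.name));
            out += ";";
        }
        return out + "}";
    }
private:
    const ClassDesc* m_desc;                 // must outlive the object
    std::map<std::string, Value> m_values;
};
```

Every field access now goes through a map lookup and a variant, which is the kind of overhead that makes this the heavyweight option.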

As far as I can tell, Dungeon Siege and Unreal Engine games both use an approach like this to store all script-based objects. I'm not sure if script persistence alone is enough to completely reproduce their world state, or if they also store some amount of hard-coded engine state. I'm a little skeptical about the efficiency of this approach in a true general-purpose engine -- in Plasma, we had a lot of (possibly) dynamic state, and I can only imagine that engines like NetImmerse and Alchemy have even more. I've never worked on an engine that pushed as much state into script objects as Unreal does, though, so I can only theorize.

I'm Kyle Wilson. I've worked in the game industry since I got out of grad school in 1997. Any opinions expressed herein are in no way representative of those of my employers.
