"Google Protocol Buffers"

Posted by Ryan C. Scott on Fri 13 July 2012

Recently I got around to actually experimenting with Google Protocol Buffers (protobufs). The experiment didn't take long, is over, and now I will put protobufs into anything you let me... on the serious. I will throw out the 90% of your data file processing code that you don't actually need and go right to Protobufs Town.

The Basics:

Protobufs work by way of code generation. A proto file defines the various data types you will be creating (i.e. "Messages") and the protobufs compiler (protoc) generates the code to include in your project.

C++, Java, and Python are all supported out of the box, and there's a plugin architecture to hook into if you need another language (although admittedly I've not dealt with any of that yet).

It is worth noting that Python 3 is not yet supported. This was a shame when it came to integrating with the latest Blender (2.6).

test.proto

package muhdata;

message Vec3 
{
    optional float x = 1;
    optional float y = 2;
    optional float z = 3;
}

message Transform
{
    optional Vec3 position = 1;
    optional Vec3 rotation = 2;
    optional float scale = 3 [ default = 1 ];
}

message TestObject
{
    repeated Transform trans = 1;
}

Generating the C++ code is then a single protoc call:

protoc --cpp_out=. test.proto

Basic Usage

#include <iostream>

#include "test.pb.h"

void TestStuff( const char *data, int size )
{
    // A stack object is enough here; no need for new/delete.
    muhdata::TestObject obj;
    if( !obj.ParseFromArray( data, size ) ) {
        std::cerr << "Failed to parse TestObject" << std::endl;
        return;
    }

    // Now we can use the sweet, sweet data inside...
    // Repeated fields expose <name>_size() and a const accessor <name>( index ).
    for( int i = 0; i < obj.trans_size(); ++i ) {
        const muhdata::Transform &thisTrans = obj.trans( i );
        std::cout << "X: " << thisTrans.position().x() << std::endl;
        std::cout << "Scale: " << thisTrans.scale() << std::endl;
    }
}

The above example is hardly feature complete, nor would I guarantee that it would compile, but gives you a sense of what's going on.

Textual Format:

When I first looked at protobufs roughly two years ago, I was extremely put off by the fact that the data files were binary. Binary files for data that doesn't inherently need to be binary will, without fail, make your version control software cry itself to sleep. I was surprised that there was no option to use a text-based format in addition to the binary one.

At the time they had a text format that was recommended only for debugging, and it could only be written, not read back. Since then they've extended it so that it can be both read and written. I know this works for the C++ generated code and that it's not implemented in Python yet. I'm not sure what the state of the generated Java code is.

test_data.text

trans {
    scale: 1
    position {
        x: 5
        y: -3
    }
    rotation {
        y: 90
    }
}

trans {
    scale: 5
    position {
        x: 0
        y: 3
        z: -2
    }
}

protoc can help you encode/decode between the text format and the binary format, as well as dump the values out of raw binary files.

This provides a damn near perfect solution to the whole thing. You get your source control friendly, human readable/writable text format, a highly compressed binary format, and a modest set of simple tools to poke around in both with.

Nuanced Results:

An interesting side effect of not dealing with the minutiae of data parsing is that you're freed up just enough to really think about the organization of your data and the ways in which you will actually be using it.

A huge part of that is the lowered cost in time and effort of refactoring. With that pile of data parsing code that I previously alluded to, you, and really not just you but anyone that values their time at all, would think twice before changing up their data formats, even in the cases where those changes would just be removing now unnecessary components.

This reluctance itself is a byproduct of the fact that organically grown code, which, let's face it, is most likely going to be your data file parsing and similar, almost inherently becomes layered, brittle, and rife with booby traps of sorts over time.

Pulling one thing can throw off a size calculation somewhere else, etc., etc. Granted that's a nasty case you've got there when you started writing that code without thinking deeply about your data formats and long-term needs, but please don't try to put up a front as if you've never gotten yourself into that situation. Lying like that is awkward and not becoming of you at all.

With an abstraction like protobufs in place those changes are somewhat trivial. Your "Business Logic" may change, but you now have a uniform way to access the pieces of data squirreled away in binary chunks.
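Part of why those changes stay cheap is the binary encoding itself: every value on the wire is prefixed with a key holding its field number and a wire type, so a reader can step over fields it doesn't know about instead of relying on fixed offsets. The following is only a rough, hand-rolled sketch of that idea and not something you'd write in practice, since the generated code does all of this for you. The function names ReadVarint and ListFieldNumbers are made up for illustration, and only wire types 0, 1, 2, and 5 are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Read a base-128 varint starting at pos; advances pos past it.
std::uint64_t ReadVarint( const std::string &buf, std::size_t &pos )
{
    std::uint64_t value = 0;
    for( int shift = 0; shift < 64; shift += 7 ) {
        if( pos >= buf.size() ) {
            throw std::runtime_error( "truncated varint" );
        }
        const unsigned char byte = static_cast<unsigned char>( buf[pos++] );
        value |= static_cast<std::uint64_t>( byte & 0x7F ) << shift;
        if( ( byte & 0x80 ) == 0 ) {
            return value;
        }
    }
    throw std::runtime_error( "varint too long" );
}

// Hypothetical helper: walk the top-level fields of an encoded message and
// return their field numbers in order, skipping every value without
// knowing what it means.
std::vector<int> ListFieldNumbers( const std::string &buf )
{
    std::vector<int> fields;
    std::size_t pos = 0;
    while( pos < buf.size() ) {
        // key = ( field_number << 3 ) | wire_type
        const std::uint64_t key = ReadVarint( buf, pos );
        const int wireType = static_cast<int>( key & 0x7 );
        fields.push_back( static_cast<int>( key >> 3 ) );

        std::size_t skip = 0;
        switch( wireType ) {
            case 0: ReadVarint( buf, pos ); break;  // varint value
            case 1: skip = 8; break;                // fixed 64-bit
            case 2:                                 // length-delimited
                skip = static_cast<std::size_t>( ReadVarint( buf, pos ) );
                break;
            case 5: skip = 4; break;                // fixed 32-bit (e.g. float)
            default: throw std::runtime_error( "unsupported wire type" );
        }
        if( skip > buf.size() - pos ) {
            throw std::runtime_error( "truncated field" );
        }
        pos += skip;
    }
    return fields;
}
```

Fed an encoded Transform that also carries some field number the reader has never heard of, this walks straight past the stranger. That is the property that makes dropping or adding fields so painless.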

Left-Over Thoughts

A potentially non-obvious usage of protobufs is embedding encoded binary messages inside of other file formats. In one particular case I wanted to extend my geometry file format, but not ditch what was there completely and do all of it with protobufs. It turned out to be trivial to insert a protobuf into the file format and the generated code for dealing with that was easily incorporated both into the engine itself and the tools.
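To give a sense of how little is involved, here is a minimal sketch of wrapping an opaque blob in a tagged, length-prefixed chunk. The chunk layout (4-byte id, 4-byte little-endian size, payload) and the names AppendChunk and ReadChunk are assumptions made up for this example, not my actual geometry format.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical chunk layout (an assumption for this sketch):
//   4 bytes  chunk id, e.g. "PBUF"
//   4 bytes  payload size, little-endian
//   N bytes  payload (the serialized protobuf message)

// Append one chunk to an in-memory file image.
void AppendChunk( std::string &file, const char id[4], const std::string &payload )
{
    const std::uint32_t size = static_cast<std::uint32_t>( payload.size() );
    file.append( id, 4 );
    for( int shift = 0; shift < 32; shift += 8 ) {
        file.push_back( static_cast<char>( ( size >> shift ) & 0xFF ) );
    }
    file.append( payload );
}

// Read the chunk starting at offset, advance offset past it, and return
// the payload bytes. The chunk id is written to idOut.
std::string ReadChunk( const std::string &file, std::size_t &offset, std::string &idOut )
{
    if( offset > file.size() || file.size() - offset < 8 ) {
        throw std::runtime_error( "truncated chunk header" );
    }
    idOut.assign( file, offset, 4 );

    std::uint32_t size = 0;
    for( int i = 0; i < 4; ++i ) {
        const unsigned char byte = static_cast<unsigned char>( file[offset + 4 + i] );
        size |= static_cast<std::uint32_t>( byte ) << ( 8 * i );
    }
    if( file.size() - offset - 8 < size ) {
        throw std::runtime_error( "truncated chunk payload" );
    }

    std::string payload = file.substr( offset + 8, size );
    offset += 8 + size;
    return payload;
}
```

In real use the payload would come from obj.SerializeToString( &payload ) on the writing side, and the bytes returned by ReadChunk would go to obj.ParseFromArray( payload.data(), static_cast<int>( payload.size() ) ) on the reading side. The surrounding file format never needs to know what is inside the chunk.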

It had not dawned on me that there would be version issues with Python in Blender. That was a damn shame, but generating the text format by hand hasn't been the worst thing that's ever happened to me.

-r