bidar

Binary Data Representation (serialization format)

Bidar: Binary Data Representation.

Oh no! Not another BSON?!?!

Actually, yes.

The proliferation of serialized data formats, and of binary JSON-like formats in particular, demonstrates that there isn't actually a solid consensus on a single unified and universal format. As such, I (Danfuzz) think it's reasonable and prudent to keep exploring the design of these things.

To be clear, I am not about to claim that the format defined here is the One True Serialized JavaScript-Like Object Format. What I will claim, though, is that this one is at a significantly different spot in the coordinate-space of possible serialization formats compared to the other ones that showed up on my radar. As such, this one may turn out to be useful to folks in cases where the other existing solutions don't seem to be particularly apropos.

Here are the properties being aimed for here:

  • reasonably space-efficient

  • single-representation (non-equal binary forms imply non-isomorphic / non-equal object forms)

  • without gratuitous quoting (e.g., look what happens when you put a JSON-encoded string inside another JSON-encoded string; avoid that!)

  • supports a buffer (binary blob) type

  • embraces as much of the JavaScript data model as is reasonably possible. For example, this format differentiates null and undefined. Why? Because you can tell the difference between them in the purely-local case, and you shouldn't necessarily have to change your code if you decide to interpose serialization (e.g., a machine boundary). I have no love of undefined myself and wish I could do away with it, but I'm not about to try to do so wishfully.

  • supports "holes" where non-pure-data elements may be filled in. More specifically, supports clients that want to be able to set up function/callback-like things for over-the-wire communication.

  • supports non-tree graphs of objects. This includes both circular structures and arbitrary acyclic graphs. Non-cycles might be tempting to convert to trees on serialization, but that would have poor interplay with the idea of "holes" (above). That is, once you have "holes," object identity starts to matter a lot more.
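To make a few of these properties concrete, here is a small sketch of how plain JSON behaves in the cases called out above. It uses only the standard JSON object for contrast; nothing in it is bidar API.

```javascript
// Plain JSON, used only for contrast; none of this is bidar API.

// Gratuitous quoting: each level of nesting re-escapes the previous one.
var inner = JSON.stringify("say \"hi\"");
var outer = JSON.stringify(inner);
console.log(inner); // "say \"hi\""
console.log(outer); // "\"say \\\"hi\\\"\""

// null vs. undefined: JSON turns undefined array elements into null.
var roundTripped = JSON.parse(JSON.stringify([null, undefined]));
console.log(roundTripped); // [ null, null ]

// Non-tree graphs: a circular structure cannot be encoded at all...
var circular = {};
circular.self = circular;
var circularFailed = false;
try {
  JSON.stringify(circular);
} catch (e) {
  circularFailed = true;
}

// ...and shared references come back as separate copies.
var shared = { name: "shared" };
var copy = JSON.parse(JSON.stringify([shared, shared]));
console.log(copy[0] === copy[1]); // false
```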

Non-Goals:

  • partially streamable. If you want to incrementally stream a (notionally) single structure, that can be done by defining a layer above this one.

  • random-access. If you need an index into a structure, then (as with the previous item) it can be built as an additional layer.

A few references:

Bonus

Though not directly about serialization, this module also provides a generic object iterator, which is handy for all sorts of things.

Example

One of the major reasons for the development of this module was to have the ability to sensibly manage and manipulate non-data "holes" in what is otherwise a graph of data objects, particularly so that two sides of a communication link can succeed in communicating about at least some "non-data-y things" despite not being able to actually share an address space.

In this example, imagine that the two sides (serializing and parsing) do a bit more than the trivial transformations here. Hopefully, this is enough to give a flavor of things.

(You can find this example in the example directory.)

var bidar = require("../");

function muffins() {
  return "muffins";
}

function are() {
  return "!are";
}

function tasty() {
  return "tasty";
}

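function holeFilter(hole) {
  // Each hole in this example is a function; call it once to get its text.
  var replacement = hole();
  if (replacement[0] === "!") {
    // Leading "!": replace the hole with a plain value.
    return { value: replacement.substr(1) };
  } else {
    // Otherwise: keep it as a hole, carrying bracketed data.
    return { data: "[" + replacement + "]" };
  }
}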

function holeReverser(hole) {
  return "<" + hole + ">";
}

var message = [ muffins, are, "very", tasty ];

// The "sending" side.
var serialized = bidar.serialize(message, holeFilter);

// The "receiving" side.
var parsed = bidar.parse(serialized, holeReverser);
console.log(parsed);

Here is a transcript of running it:

$ node example/transform.js
[ '<[muffins]>', 'are', 'very', '<[tasty]>' ]

Format Details

The format will be more fully described once it has baked a bit and seems stable. In the meantime, refer to the source. (It's got comments.)

Usage

This library provides just a few top-level functions.

serialize(root, [holeFilter])

Serialize an object graph, rooted at the given object, into binary form. Returns a buffer of the result. The hole filter, if provided, is used as a transformation on "holes" encountered in the object graph.

A "hole" is any object that is not representable as pure data. This includes:

  • functions

  • non-array non-buffer objects whose prototype is not the default object prototype

  • objects that define any dynamic properties (getters or setters)
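As a rough illustration of these three criteria, the check could be sketched as follows. This is an approximation written from the list above, not bidar's own code; the function names are made up for the example, and the library's actual test may differ in edge cases.

```javascript
// A rough sketch of the three criteria above; bidar's own check may differ.
function hasDynamicProperty(obj) {
  return Object.getOwnPropertyNames(obj).some(function (name) {
    var desc = Object.getOwnPropertyDescriptor(obj, name);
    return Boolean(desc.get || desc.set);
  });
}

function looksLikeHole(value) {
  if (typeof value === "function") {
    return true;
  }
  if (value === null || typeof value !== "object") {
    return false; // primitives are pure data
  }
  if (hasDynamicProperty(value)) {
    return true; // getters or setters
  }
  if (Array.isArray(value) || Buffer.isBuffer(value)) {
    return false;
  }
  // Anything else is a hole unless it is a plain {}-style object.
  return Object.getPrototypeOf(value) !== Object.prototype;
}
```

Under this sketch, Date and RegExp instances count as holes (their prototypes are not the default object prototype), which matches the known deficiencies listed under To Do.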

The hole filter is called with a single argument -- the original "hole" object encountered in the graph -- and is expected to return a serializable replacement, tagged as either hole-replacement data or a simple value replacement. The tag is expressed by returning a single-property object: { data: theHoleData } or { value: theValueReplacement }. In the data case, the hole is marked explicitly as a hole in the serialized output, and upon parsing, the corresponding "hole filler" will be called with that replacement data as its argument. In the value case, the would-be hole turns into a simple value in the serialized form, which needs no special code to parse on the receiving side. If this is confusing, the example code may shed some light.

It is possible for a hole replacement to itself have holes in it. This is fine, so long as the hole filter can successfully replace all the holes, eventually bottoming out at pure data in some form.
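One way a filter/filler pair might be set up for callback-like holes is sketched below. The registry, the id scheme, and the function names are all hypothetical; only the { data: ... } / { value: ... } return shapes come from the description above. In a real deployment the two functions would live on opposite sides of the link, and the filler would send a message back rather than look the callback up locally.

```javascript
// Hypothetical callback registry; these names are illustrative, not bidar API.
var callbacksById = {};
var nextId = 0;

// For serialize(): functions become tagged hole data, Dates become plain values.
function callbackHoleFilter(hole) {
  if (typeof hole === "function") {
    var id = "cb" + (nextId++);
    callbacksById[id] = hole;
    return { data: id };
  }
  if (hole instanceof Date) {
    return { value: hole.getTime() };
  }
  throw new Error("Don't know how to replace this hole");
}

// For parse(): turn hole data back into something callable.
function callbackHoleFiller(data) {
  return function () {
    // A real receiver would send `data` and the arguments back over the wire;
    // here both sides share one process, so look the callback up directly.
    return callbacksById[data].apply(null, arguments);
  };
}
```

These would be passed as the second argument to serialize() and parse() respectively.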

parse(buf, [holeFiller])

Parse a serialized form into an object graph, returning the root of the graph as originally serialized. The holeFiller, if specified, is a function which is consulted to transform "hole replacements" (see the description of serialize() for more details) back into directly-usable objects.

The entire contents of the given buffer are expected to be consumed in the process of parsing. If there is extra data at the end of the buffer that is not needed, then this function throws an exception.

This will also throw an exception if there is insufficient data to complete the operation.

And finally, this will throw an exception if given a serialized form that has holes, if holeFiller was not specified.

parsePartial(buf, [startIndex], [holeFiller])

This is like parse(), except that only a portion of the input buffer is expected to be used. The startIndex (defaults to 0) is the index into the buffer at which to begin parsing. The optional holeFiller is the same as with parse().

The return value of this call is an object with two properties: root is the root of the parsed object graph, and bytesConsumed is the number of bytes that were read from the buffer during the parse operation.
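As a sketch of how bytesConsumed might be used, the loop below reads several encoded values stored back to back in one buffer. The parser is passed in as a parameter; with this library it would be the parsePartial function. The sketch assumes bytesConsumed counts bytes from startIndex onward, which the description above suggests but does not spell out. The fakeParsePartial function is a stand-in with a made-up format, included only so the loop can run on its own.

```javascript
// Sketch: read back-to-back encoded values out of one buffer.
// Assumption: bytesConsumed counts bytes from startIndex onward.
function parseSequence(parsePartial, buf, holeFiller) {
  var roots = [];
  var index = 0;
  while (index < buf.length) {
    var result = parsePartial(buf, index, holeFiller);
    if (result.bytesConsumed <= 0) {
      throw new Error("Parser made no progress at index " + index);
    }
    roots.push(result.root);
    index += result.bytesConsumed;
  }
  return roots;
}

// Stand-in parser (NOT bidar's format): one length byte, then that many
// bytes of ASCII text. It exists only so the loop above can be exercised.
function fakeParsePartial(buf, startIndex) {
  var length = buf[startIndex];
  var start = startIndex + 1;
  return {
    root: buf.toString("ascii", start, start + length),
    bytesConsumed: 1 + length
  };
}

var packed = Buffer.from([2, 0x68, 0x69, 3, 0x79, 0x6f, 0x75]);
console.log(parseSequence(fakeParsePartial, packed)); // [ 'hi', 'you' ]
```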

parseNoHead(buf, [holeFiller])

parsePartialNoHead(buf, [startIndex], [holeFiller])

serializeNoHead(root, [holeFilter])

These perform the same functions as their non-"no head" companions, but these variants neither produce nor expect a fixed header or footer section.

These are useful if one is going to embed bidar-encoded data inside some other container, where that other container can unambiguously identify both itself and where to find the bidar-encoded data it contains.
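For illustration, a container of that kind might look like the sketch below. The layout (a four-byte magic string followed by a four-byte length) is entirely hypothetical and is not part of bidar; the payload is treated as opaque bytes, such as the buffer returned by serializeNoHead().

```javascript
// Hypothetical container layout (not part of bidar):
//   4 bytes  magic "DEMO"
//   4 bytes  payload length, big-endian
//   N bytes  payload, e.g. the result of serializeNoHead(root)
var MAGIC = Buffer.from("DEMO", "ascii");

function wrapPayload(payload) {
  var header = Buffer.alloc(8);
  MAGIC.copy(header, 0);
  header.writeUInt32BE(payload.length, 4);
  return Buffer.concat([header, payload]);
}

function unwrapPayload(container) {
  if (container.length < 8 || !container.subarray(0, 4).equals(MAGIC)) {
    throw new Error("Not a DEMO container");
  }
  var length = container.readUInt32BE(4);
  if (container.length < 8 + length) {
    throw new Error("Truncated container");
  }
  // The returned bytes could then be handed to parseNoHead().
  return container.subarray(8, 8 + length);
}
```

Because the container carries its own identification and length, the embedded data does not need a header or footer of its own.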

iterate(root, visitor, [includeHidden])

Starting at the given root object, iterate through all references reachable from it, calling various visit* methods on the given visitor.

Some of the visit methods define an innerVisitor parameter. This is a function to call in order to continue the iteration into the element indicated by the original call. For example, when visitObject(x, innerVisitor) is called, the innerVisitor it is passed will iterate over the prototype and name bindings of x. If innerVisitor is not called, then x's contents will not be iterated over. innerVisitor may either be called with no arguments, in which case further callbacks get made to the original visitor passed in to iterate(), or with a single argument to use as a replacement visitor.

The includeHidden argument, if specified, is taken to be a boolean and indicates whether (true) or not (false) to include hidden (non-enumerable) object properties in the iteration. It defaults to false. Note that, even when true, the length property of arrays will never get included. (It's particularly special.)

Here is a rundown of the various called methods:

  • visitObject(obj, innerVisitor) -- called on non-array non-function objects.

  • visitObjectPrototype(obj, proto, innerVisitor) -- called for non-array non-function object prototypes as part of the inner visit to an object. It won't be called for the default object prototype (e.g. the prototype of the object returned by {}). If called, it will be the first call made during an inner visit.

  • visitObjectBinding(obj, name, props, innerVisitor) -- called for arbitrary non-index bindings of any object (regular, array, or function). Always called in sorted order of binding names. props is the return value from a call to Object.getOwnPropertyDescriptor.

  • visitObjectGetter(obj, name, getter, innerVisitor) -- called for synthetic object properties that have a defined getter. getter is the getter function. This is called from an object binding inner visitor.

  • visitObjectSetter(obj, name, setter, innerVisitor) -- called for synthetic object properties that have a defined setter. setter is the setter function. This is called from an object binding inner visitor.

  • visitArray(arr, innerVisitor) -- called on array objects.

  • visitArrayElement(arr, index, value, innerVisitor, missing) -- called on array elements, in index order. This is called multiple times from an array inner visitor, and is always called before visitObjectBinding() is called for the named object bindings. The missing parameter (placed after innerVisitor both so that it is easily ignored and so that the innerVisitor is always the fourth argument for object binding iteration calls) is a boolean that indicates if an array element is actually missing (as opposed to being set to undefined); it will only ever be true if value is undefined.

  • visitFunction(func, innerVisitor) -- called on function objects.

  • visitString(str) -- called on strings.

  • visitNumber(num) -- called on numbers.

  • visitBoolean(bool) -- called on booleans.

  • visitUndefined() -- called when the undefined value is encountered.

  • visitNull() -- called when the null value is encountered.
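As a sketch, a visitor that collects every string reachable through plain objects and arrays might look like this. It defines every method in the list above, in case iterate() expects them all to be present, and it assumes that calling the inner visitor of a binding or array element visits that binding's or element's value. The makeStringCollector name and the strings property are made up for the example.

```javascript
// Sketch of a visitor that collects strings reachable from a root.
function makeStringCollector() {
  var strings = [];
  function skip() {} // used where the sketch chooses not to descend

  return {
    strings: strings,
    visitObject: function (obj, innerVisitor) { innerVisitor(); },
    visitObjectPrototype: skip,
    visitObjectBinding: function (obj, name, props, innerVisitor) {
      innerVisitor();
    },
    visitObjectGetter: skip,
    visitObjectSetter: skip,
    visitArray: function (arr, innerVisitor) { innerVisitor(); },
    visitArrayElement: function (arr, index, value, innerVisitor, missing) {
      if (!missing) {
        innerVisitor();
      }
    },
    visitFunction: skip,
    visitString: function (str) { strings.push(str); },
    visitNumber: skip,
    visitBoolean: skip,
    visitUndefined: skip,
    visitNull: skip
  };
}
```

The collector would be passed as the visitor argument to iterate(), and its strings array read once the call returns.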

createVisitorProxy(proxyFunction)

Construct an iteration visitor proxy which calls through to the given function for every visitor method. See iterate() above for details on the visitor methods.

The proxy function is always called as proxyFunction(name, args) where name is the original visitor function name and args is an array (per se) of the original arguments to the visitor.

This is useful if you want to implement homogeneous visitor behavior (e.g. wrapping some other visitor).
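For example, a proxy function that records each visitor call and then forwards it to a wrapped visitor could be sketched as follows. Only the proxyFunction(name, args) calling convention comes from the description above; makeTracingProxyFunction, target, and log are illustrative names.

```javascript
// Sketch of a proxy function that logs each visitor call, then forwards it.
// `target` is the wrapped visitor; `log` is an array that receives the names.
function makeTracingProxyFunction(target, log) {
  return function proxyFunction(name, args) {
    log.push(name);
    if (typeof target[name] === "function") {
      return target[name].apply(target, args);
    }
    // Methods the target does not define are logged but otherwise ignored.
  };
}
```

The returned function is what would be handed to createVisitorProxy(), and the resulting proxy would then be passed to iterate() as the visitor.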

Building and Installing

npm install bidar

Or grab the source and

npm install

Testing

npm test

Or

node ./test/test.js

To Do

Known Deficiencies:

  • If there are extra properties on Buffer objects, they get silently ignored. E.g.: On serialization, foo will get dropped with this definition of x: x = new Buffer(10); x.foo = "bar";

  • Regex objects are currently considered holes. They should perhaps be handled specially (as Buffer objects currently are).

  • Date objects are currently considered holes. They should perhaps be handled specially (as Buffer objects currently are).

Contributing

Questions, comments, bug reports, and pull requests are all welcome. Submit them at the project on GitHub.

Bug reports that include steps-to-reproduce (including code) are the best. Even better, make them in the form of pull requests that update the test suite. Thanks!

Author

Dan Bornstein (personal website), supported by The Obvious Corporation.

License

Copyright 2012 The Obvious Corporation.

Licensed under the Apache License, Version 2.0. See the top-level file LICENSE.txt and (http://www.apache.org/licenses/LICENSE-2.0).
