Typewise structured sorting for arbirarily complex data structures
Want to see pretty graphs? Log in now!
npm install typewise
|2||downloads in the last week|
|23||downloads in the last month|
|Last Published By|
|Version||0.7.0 last updated 5 months ago|
|Keywords||sort, order, collation, leveldb, indexeddb, couchdb|
|Dependencies||bops, ses, esprima, escodegen, buffertools|
Typewise structured sorting for arbirarily complex data structures. This package defines and imlements the collation used by bytewise (and eventually elementwise -- an array-based version of bytewise suitable for CouchDB and IndexedDB).
The actual ordering is subject to change, although ideally it will remain a superset of the type-based collations defined by CouchDB and IndexedDB.
But this isn't nearly as bad as it sounds. We're not trying to define anything fancy like type checking, just the structures of our types, how they should be sorted and what constitutes equality.
We're leaning on some pretty heavyweight dependencies to support a very uncommon use case (parsing and reviving functions). This functionality could be pushed into another package and typewise could be completely agnostic to these parsing and reviving concerns for functions and even complex collections like Map and Set and friends.
The ordering chosen for some of the types is somewhat arbitrary. It is intentionally structured to support those sorts defined by CouchDB and IndexedDB but there might be more logical placements, specifically for BUFFER, SET, and FUNCTION, which aren't defined in either. It may be beneficial to fully characterize the distinctions between collections that affect collation.
One possible breakdown for collection types:
- unordered set (order unimportant and thus sorted using standard collation)
- unordered multiset, duplicates allowed
- chain (ordered set) (order-preserving with distinct values)
- ordered multiset, duplicates allowed (a list)
- unordered map (keys as unordered set, objects are string-keyed maps)
- unordered multimap (keys as unordered multiset), duplicates allowed
- ordered map (keys as ordered set)
- ordered multimap (keys as ordered multiset), duplicates allowed
Perhaps we should always allow duplicates, and have the prevention of duplicates be a enforced at runtime by a schema of some sort.
There is a third characterizing bit: whether or not duplicates are allowed. The effect this has on sort is very localized, only for breaking ties between two otherwise identical keys -- otherwise records are completely interwoven when sorted, regardless of whether duplicates are allowed or not.
We may want unique symbols to signal these characterizing bits for serialization.
We probably want hooks for custom revivers.
Sparse arrays could be modeled with sorted maps of integer keys, but should we use a trailer to preserve this data?
This is very close to a generalized total order for all possible data structure models.
Encoding and decoding is surely slower than the native
JSON functions, but there is plenty of room to narrow the gap. Once the serialization stabilizes a C port should be straitforward to narrow the gap further.
Where this serialization should really shine is streaming. Building a stream from individually encoded elements should require little more than strait concatenation, and parsing a stream would be the same as parsing an array. Parsing is a little more complex than msgpack and many other binary encodings because we have to use termination characters, not length specifiers, for certain types to keep from screwing up our collation invariants. This also means we have to do a little escapement dance, which costs a little extra too.