elephant

0.1.0-2 • Public • Published

node-elephant

Elephant Logo

A fast & memory-efficient data structure that cat tell if it saw a string before

Summary

In case you need a fast, memory-efficient data structure that would remember whether or not it had seen a string before, for example you are running through a list of filename with possible duplicates that you do not want to double-process, or through long list of emails/GUIDs with possible duplicate that you wanna deal with, here is the module for you.

The Node.js/c++ binding is 2/3 times faster, and uses 6 times less memory than a JavaScript implementation. The pure c++ is even faster and uses 8 times less memory than the nodejs binding, that is 200% faster than pure JavaScript, and 50 times (5000%) more memory-efficient*. Check the Time Table for details.

* For some reason arguments passed from JavaScript to c++ lingers in memory, if you have a way to set them free let me know

Prerequisites

You will need google sparsehash, under Ubuntu you can do sudo apt-get install sparsehash.

Usage

Check the test/time-binding.js file for an example usage.

Example

var elephant = require('elephant')
 
var obj = new elephant.Elephant();
 
console.log(obj.memorize("hello")); // false: first time to be seen, increases stats_added by 1
console.log(obj.memorize("hello")); // true: seen before, increases stats_duplicates by 1
console.log(obj.remember("hello")); // true: remembered, told before to memorize it
console.log(obj.remember("hi")); // false: not remembered, as has not been told to memorize it
console.log(obj.remember("hi")); // false: still not remembered, of course
 
console.log(obj.stats()); // { added: 1, duplicates: 1 }
 
console.log('Success, because you can read this : )');

API

  • bool .memorize("string") tells the elephant to memorize a string, returns true if the string was seen before, and false otherwise.
  • bool .remember("string") ask the elephant whether it remembers a string (has seen, aka told to memorize that string before), returns true if the string was seen before, and false otherwise.
  • object .stats() returns some stats.

Time Table

Numbers between (brackets) refer to the difference between current cell value and the one to its left.

Command being timed"build/time-native 5000000""node test/time-binding.js 5000000""node test/time-node.js 5000000"
User time (seconds)10.2714.6 (142%)22.18 (152%)
System time (seconds)00.1 (Infinity%)0.41 (410%)
Percent of CPU this job got99%100%100%
Elapsed (wall clock) time (h:mm:ss or m:ss)0:10.310:14.650:22.53
Average shared text size (kbytes)00 (=)0 (=)
Average unshared data size (kbytes)00 (=)0 (=)
Average stack size (kbytes)00 (=)0 (=)
Average total size (kbytes)00 (=)0 (=)
Maximum resident set size (kbytes)44576364368 (817%)2188960 (601%)
Average resident set size (kbytes)00 (=)0 (=)
Major (requiring I/O) page faults00 (=)0 (=)
Minor (reclaiming a frame) page faults299723614 (788%)167480 (709%)
Voluntary context switches114790 (1479000%)19914 (135%)
Involuntary context switches8721253 (144%)2008 (160%)
Swaps00 (=)0 (=)
File system inputs00 (=)0 (=)
File system outputs00 (=)0 (=)
Socket messages sent00 (=)0 (=)
Socket messages received00 (=)0 (=)
Signals delivered00 (=)0 (=)
Page size (bytes)40964096 (=)4096 (=)
Exit status00 (=)0 (=)

Roadmap

  • Add more hashes options, besides the default 32-bit MurmurHash3, probably FNV-1 32-bit & 64-bit..

Copyright & License

© 2013 Hasan Arous. All rights reserved.

Used libraries are copyrighted to their owners, as seen in files comments.

Mozilla Public License Version 2.0

Dependencies (0)

    Dev Dependencies (0)

      Package Sidebar

      Install

      npm i elephant

      Weekly Downloads

      4

      Version

      0.1.0-2

      License

      none

      Last publish

      Collaborators

      • aularon