simplemapreduce

Simple MapReduce implementation, written in JavaScript

Installation

Via npm on Node:

npm install simplemapreduce

Usage

Reference the module in your program:

var simplemapreduce = require('simplemapreduce');

Run

Synchronous run

simplemapreduce.runSync(items, mapfn, newfn, processfn);

where

  • items: the items to be processed. In the current version, this can be any object that defines a forEach function, such as an array.
  • mapfn(item): given an item to be processed, returns its associated key.
  • newfn(item, key): given a new key, returns the new object to be associated with that key.
  • processfn(item, result, [key, map]): processes an item, usually by modifying its associated result object. It can optionally also receive and use the associated key and the map, that is, the dictionary being built by the process.
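To make the roles of the four functions concrete, here is a minimal sketch of the loop that runSync is described as performing. It is not the library's source: runSyncSketch is a hypothetical name, and passing key and map as the third and fourth arguments of processfn is an assumption based on the bracketed [key, map] in the signature above.

```javascript
// Hypothetical re-implementation of the behavior described for runSync;
// this is NOT the actual simplemapreduce source.
function runSyncSketch(items, mapfn, newfn, processfn) {
    // The dictionary being built: key -> result object.
    var map = {};

    // Only forEach is required of `items`, so arrays, Sets, etc. work.
    items.forEach(function (item) {
        var key = mapfn(item);

        // First time this key is seen: ask newfn for its result object.
        if (!Object.prototype.hasOwnProperty.call(map, key))
            map[key] = newfn(item, key);

        // Let processfn update the result; key and map are passed as extras.
        processfn(item, map[key], key, map);
    });

    return map;
}

// Same word-count arguments as in the example that follows.
var counts = runSyncSketch(
    ["A", "word", "is", "a", "word"],
    function (item) { return item.toLowerCase(); },
    function (item, key) { return { count: 0 }; },
    function (item, result) { result.count++; }
);
console.dir(counts);
```

The hasOwnProperty check is used instead of the in operator so that keys such as "constructor" are not mistaken for existing entries inherited from Object.prototype.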

Example

var result = simplemapreduce.runSync(
    ["A", "word", "is", "a", "word"], 
    function (item) { return item.toLowerCase(); },
    function (item, key) { return { count: 0 }; },
    function (item, result) { result.count++; }
);
console.dir(result);

Output

{ a: { count: 2 }, word: { count: 2 }, is: { count: 1 } }

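There is also a run function that passes the result to a callback:

simplemapreduce.run(items, mapfn, newfn, processfn, callback);

where callback(result) receives the resulting map. This function is still under development; the current implementation internally uses runSync. Example: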

simplemapreduce.run(
    ["A", "word", "is", "a", "word"], 
    function (item) { return item.toLowerCase(); },
    function (item, key) { return { count: 0 }; },
    function (item, result) { result.count++; },
    function (result) {
        console.dir(result);
    }
);
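The callback in the example receives the result as its only argument instead of following Node's error-first (err, value) convention, so util.promisify is not a good fit: it would treat the result object as an error. If you prefer Promises or async/await, a small hand-written wrapper is one option. The helper name runAsync is hypothetical and not part of simplemapreduce; it takes the module object as its first parameter so that run is still called as a method.

```javascript
// Hypothetical helper (not part of simplemapreduce): wraps a callback-style
// run(items, mapfn, newfn, processfn, callback) in a Promise.
// `smr` is the module object, e.g. the value of require('simplemapreduce').
function runAsync(smr, items, mapfn, newfn, processfn) {
    return new Promise(function (resolve) {
        // The callback's single argument (the result map) resolves the
        // Promise. A synchronous throw inside smr.run rejects it instead,
        // because exceptions thrown in a Promise executor become rejections.
        smr.run(items, mapfn, newfn, processfn, resolve);
    });
}
```

Inside an async function this could be used as var result = await runAsync(simplemapreduce, items, mapfn, newfn, processfn);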

Run Task

Alternatively, you can define a task: an object with the following functions:

  • getItems(): returns the items to be processed.
  • getKey(item): maps an item to its associated key.
  • getResult(item, key): creates the new object or value to be associated with the key. Usually it is used to accumulate results.
  • processItem(item, result, [key, map]): processes an item, usually by updating its result object.
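If you already have functions written for runSync, the task functions line up with its parameters: getItems supplies items, getKey plays the role of mapfn, getResult of newfn, and processItem of processfn. A minimal adapter, assuming that correspondence (makeTask is a hypothetical name, not a simplemapreduce export):

```javascript
// Hypothetical helper: packages runSync-style functions as a task object
// exposing the four methods listed above.
function makeTask(items, mapfn, newfn, processfn) {
    return {
        // Corresponds to the `items` argument of runSync.
        getItems: function () { return items; },
        // Corresponds to mapfn(item).
        getKey: function (item) { return mapfn(item); },
        // Corresponds to newfn(item, key).
        getResult: function (item, key) { return newfn(item, key); },
        // Corresponds to processfn(item, result, [key, map]); the optional
        // key and map are forwarded unchanged (undefined if not supplied).
        processItem: function (item, result, key, map) {
            return processfn(item, result, key, map);
        }
    };
}
```

The adapter does not rely on this, so its methods behave the same whether they are called as methods of the task or detached.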

Example:

var task = {
    items: ["A", "word", "is", "a", "word"], 
    getItems: function () { return this.items; },
    getKey: function (item) { return item.toLowerCase(); },
    getResult: function (item, key) { return { count: 0 }; },
    processItem: function (item, result) { result.count++; }
};

simplemapreduce.runTask(task, function (result) { console.dir(result); });

Notice that in this case getItems returns items defined in the task itself. You can provide a more complex function, e.g. one that reads from a stream or a file.
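As a sketch of the file-reading case, the hypothetical factory below builds a word-count task whose getItems reads a UTF-8 text file with fs.readFileSync and splits it on whitespace. The function name, the file path you pass in, and the splitting rule are all assumptions for illustration.

```javascript
var fs = require('fs');

// Hypothetical task factory: items come from a text file instead of an
// inline array.
function makeFileWordCountTask(filePath) {
    return {
        filePath: filePath,
        getItems: function () {
            // Read the whole file, split on runs of whitespace, and drop the
            // empty strings produced by leading/trailing whitespace.
            var text = fs.readFileSync(this.filePath, 'utf8');
            return text.split(/\s+/).filter(function (word) { return word.length > 0; });
        },
        getKey: function (item) { return item.toLowerCase(); },
        getResult: function (item, key) { return { count: 0 }; },
        processItem: function (item, result) { result.count++; }
    };
}
```

It could then be run with simplemapreduce.runTask(makeFileWordCountTask('words.txt'), function (result) { console.dir(result); }); where words.txt is a placeholder path. Because getItems uses this.filePath, the sketch assumes runTask calls it as a method of the task, as the inline example above also does with this.items. readFileSync loads the whole file into memory, so this suits small inputs.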

Development

git clone git://github.com/ajlopez/SimpleMapReduce.git
cd SimpleMapReduce
npm install
npm test

Samples

  • Words: Word Count sample with callback.
  • Words Sync: Synchronous Word Count sample.
  • Task: Run Task sample with callback.
  • Task Sync: Synchronous Run Task sample.

To do

  • Improve async processing
  • Distributed sample

Versions

  • 0.0.1 : Published
  • 0.0.2 : Under development

Contribution

Feel free to file issues and submit pull requests; contributions are welcome.

If you submit a pull request, please be sure to add or update corresponding test cases, and ensure that npm test continues to pass.
