disruptor

Distributed real-time computation system.

disruptor intends to be a distributed real-time computation system for node.js. disruptor makes it easy to process unbounded streams of data across many machines, with minimal configuration and no single point of failure. It is still in early development.

Nodes are started by pointing them at another peer, and they quickly find all the other nodes in the network. Workers are written in JavaScript as independent node.js applications which get spawned by each node in the cluster. As work comes in, in the form of JSON payloads over HTTP requests, it gets distributed among the live workers. Results come back as JSON payloads in the HTTP responses.

There is no master peer, monitoring node or other single point of failure. The design stresses simplicity wherever possible and requires a minimum of setup.

Install

npm install -g disruptor

or

git clone https://github.com/anders94/disruptor.git
npm install

Usage

The application takes an IP and port on which to listen and the IP and port of some other peer on the network. All the peers will find each other and stay in communication as peers enter and leave the network.

disruptor peer myHost:myPort anotherHost:anotherPort

Example

In the first shell:

disruptor peer 127.0.0.1:1111 127.0.0.1:2222

In the second shell:

disruptor peer 127.0.0.1:2222 127.0.0.1:1111

The processes should find each other. Start a few more, pointing each at any live node in the network, and they should all find each other.

To see what other nodes the first disruptor peer knows about, visit it with a web browser:

http://127.0.0.1:1111

Usually this is done machine to machine with network-accessible IP addresses, not all on the same host as in this example.

Creating Worker Apps

Workers run code that lives in app directories under apps/ (for example apps/wordcount) and respond to:

process.on('message', function(message) { ... })

They emit results with:

process.send( ... );

For example, here is a word counting worker:

var natural = require('natural'),
    tokenizer = new natural.WordTokenizer();

process.on('message', function(message) {
    var total = 0, unique = 0;
    var hash = {};
    var ary = tokenizer.tokenize(message);
    for (var id in ary) { // throw each stemmed word into a hash
        hash[natural.PorterStemmer.stem(ary[id])] = true;
        total++;
    }

    for (var key in hash) // count unique word stems
        unique++;

    process.send({ message: message, total: total, unique: unique });
});

With this example, this input:

The First World War was to be the war to end all wars.

creates this output:

{ message: 'The First World War was to be the war to end all wars.',
  total: 13,
  unique: 9 }

Worker apps, once started, run continuously and can send responses at any time. Any number of differently named workers can run on the same node at the same time.
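
As a dependency-free illustration of the contract above, here is a minimal sketch of another worker. The worker name and logic are hypothetical, not part of disruptor; the work itself is factored into a plain function so it can be exercised outside the cluster.

```javascript
// hypothetical minimal worker: reports the length of each incoming message.
// Factoring the logic into a plain function keeps it testable on its own.
function measure(message) {
    return { message: message, length: message.length };
}

// the contract disruptor expects: receive work as 'message' events,
// emit results with process.send()
process.on('message', function(message) {
    process.send(measure(message));
});

module.exports = measure;
```

Dropped into an app directory (say, apps/length/worker.js), this would receive JSON string payloads and respond with their lengths.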

Note: Any npm packages used in worker apps need to be installed on every node. Disruptor will do this automatically if you install the modules locally to each app (i.e. apps/wordcount/node_modules for the above example). A standard 'npm install' will instead put them in disruptor's own node_modules; this will work, but that code will not be automatically distributed to other nodes by disruptor, so you would have to do that by hand.

Note: This functionality is under active development.

Starting Workers

You start workers by telling one of the nodes to tell all the peers it knows about to start a particular application.

disruptor start 127.0.0.1:1111 apps/wordcount/counter

Stopping all the workers is done similarly.

disruptor stop 127.0.0.1:1111 apps/wordcount/counter

Note: Code is not yet distributed automatically. You have to sync the app directory to all the peers yourself. For the time being, rsync is a good tool for this:

rsync -ae ssh ~/disruptor/apps 1.2.3.4:~/disruptor

In the future, starting a job will first make sure it runs locally, package it up into a compressed archive, distribute it and then start it on all known peers.

Note: This functionality is under active development.

Sending Compute Tasks to Workers

You can send JSON payloads to any node in the cluster for processing over an HTTP connection. The task will be sent to a random worker, and the response will flow back the same way.

disruptor send 127.0.0.1:1111 apps/wordcount/counter \
    '"the quick brown fox jumped over the lazy dog"'

Alternatively, you can send requests directly via HTTP:

$ curl -X POST -H "content-type: application/json" \
    http://127.0.0.1:1111 --data @-<<\EOF
"the quick brown fox jumped over the lazy dog"
EOF

JSON results come back as expected.

{ message: 'the quick brown fox jumped over the lazy dog',
  total: 9,
  unique: 8 }

Note: This functionality is under active development.

Author

Anders Brownworth

Please get in touch if you would like to contribute.

Are You Using This?

I started this project to do distributed natural language processing and machine learning. However, I'm sure the need to massively distribute node.js processing exists for many other jobs. I'm interested in solving real-world problems with disruptor, so it is useful to know what jobs it is or isn't solving. Please tweet @anders94 (http://twitter.com/anders94) or otherwise get in touch.

Copyright 2013 Anders Brownworth

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License in the LICENSE file, or at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
