node-ckan-crawler

0.0.3 • Public • Published

A simple, fast Node.js crawler for sites powered by CKAN (http://ckan.org).

  • Uses the package_search action of the CKAN Action API to crawl packages (datasets)
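
For orientation, here is a minimal sketch of how paginated package_search request URLs can be built. The endpoint path /api/3/action/package_search and the rows and start parameters come from CKAN's Action API; the helper names buildSearchUrl and buildPageUrls and the page size are illustrative assumptions, and the crawler's internal request format may differ.

```javascript
// Hypothetical helper (not part of node-ckan-crawler): builds a CKAN
// package_search URL for one page of results.
function buildSearchUrl(baseUrl, start, rows) {
  // Strip trailing slashes so the path joins cleanly.
  var base = baseUrl.replace(/\/+$/, '');
  return base + '/api/3/action/package_search?rows=' + rows + '&start=' + start;
}

// Hypothetical helper: lists the page URLs needed to cover `total` datasets.
function buildPageUrls(baseUrl, total, rows) {
  var urls = [];
  for (var start = 0; start < total; start += rows) {
    urls.push(buildSearchUrl(baseUrl, start, rows));
  }
  return urls;
}

console.log(buildPageUrls('http://datahub.io/', 250, 100));
```

Paging with start and rows is how a client typically walks through a large CKAN catalogue, since a single package_search call returns only a limited number of results.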

Install

npm install node-ckan-crawler

Usage

var CKANCrawler = require('node-ckan-crawler');

var crawler = new CKANCrawler();

crawler.queueSite('http://datahub.io/');
crawler.on('content', function(response, content){
  console.log('content', response.uri, content.length);
});

More examples

See the examples/ directory for more examples.

API

Events

Event: 'content'

Emitted when a response from the site has been parsed and the results are ready for consumption.

response: an http.IncomingMessage object returned by mikeal's request()

body: the parsed JSON object from response.body

crawler.on('content', function(response, body) {
    ...
});
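
As an illustration of what a 'content' handler might do with body, the sketch below pulls dataset names out of a parsed response. It assumes body follows the standard CKAN Action API envelope ({ success, result: { count, results } }); the helper name datasetNames and the sample data are made up for this example.

```javascript
// Hypothetical helper: extracts dataset names from a parsed package_search
// response. Assumes the CKAN envelope { success, result: { results } }.
function datasetNames(body) {
  if (!body || body.success !== true || !body.result) {
    return [];
  }
  var results = body.result.results || [];
  return results.map(function (pkg) {
    return pkg.name;
  });
}

// Sample body shaped like a CKAN package_search response (made-up data).
var sampleBody = {
  success: true,
  result: { count: 2, results: [{ name: 'dataset-a' }, { name: 'dataset-b' }] }
};

console.log(datasetNames(sampleBody));
```

Inside a real handler, the same helper could be called as datasetNames(body) within crawler.on('content', function(response, body) { ... }).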

Event: 'beforeQueue'

Emitted when the next link is ready to be added to the crawler queue. Call next with a non-true value to skip the link.

url: a string containing the next link ready to be added to the crawler queue

next: a callback function; call next(true) to queue the link

crawler.on('beforeQueue', function(url, next) {
    next(true); // to add the link to the queue
    // next(false) // to skip link
});
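
One possible use of 'beforeQueue' is to cap a crawl and skip duplicate links. The sketch below is a stand-alone illustration: makeQueueFilter is a hypothetical helper, not part of the module, and it is exercised here with a fake next callback rather than a real crawler.

```javascript
// Hypothetical factory: returns a 'beforeQueue' handler that accepts each
// URL only once and only while fewer than `limit` links have been accepted.
function makeQueueFilter(limit) {
  var seen = Object.create(null); // no inherited keys to collide with URLs
  var accepted = 0;
  return function (url, next) {
    if (seen[url] || accepted >= limit) {
      next(false); // skip duplicates and anything past the limit
      return;
    }
    seen[url] = true;
    accepted += 1;
    next(true); // add the link to the queue
  };
}

// Demonstration with a fake `next` callback that records each decision.
var decisions = [];
var filter = makeQueueFilter(2);
[
  'http://a.example/p1',
  'http://a.example/p1',
  'http://a.example/p2',
  'http://a.example/p3'
].forEach(function (url) {
  filter(url, function (ok) { decisions.push(ok); });
});
console.log(decisions);
```

With a crawler instance, the handler would be attached as crawler.on('beforeQueue', makeQueueFilter(100)).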

Event: 'queued'

Emitted after a link has been added to the crawler queue.

url: a string containing the link that was added to the crawler queue

crawler.on('queued', function(url) {
    ...
});

Event: 'drain'

Emitted when the crawler has drained its queue and has no more links to crawl.

crawler.on('drain', function() {
    ...
});

Event: 'error'

Emitted when an error has occurred.

crawler.on('error', function(err) {
    ...
});
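
The events above can be combined to keep simple crawl statistics. The following sketch uses Node's built-in EventEmitter as a stand-in for a crawler instance, so it runs without the module installed; trackCrawl is a hypothetical helper, and whether the real crawler keeps going and still emits 'drain' after an 'error' is an assumption here.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: attaches bookkeeping handlers to any emitter that
// fires 'content', 'error' and 'drain', and calls done(stats) on drain.
function trackCrawl(emitter, done) {
  var stats = { pages: 0, errors: [] };
  emitter.on('content', function () { stats.pages += 1; });
  emitter.on('error', function (err) { stats.errors.push(err.message); });
  emitter.on('drain', function () { done(stats); });
  return stats;
}

// Stand-in emitter used here instead of a real crawler instance.
var fake = new EventEmitter();
var finalStats = null;
trackCrawl(fake, function (stats) { finalStats = stats; });

// Simulate one parsed page, one failure, then an empty queue.
fake.emit('content', {}, {});
fake.emit('error', new Error('request failed'));
fake.emit('drain');
console.log(finalStats);
```

Registering an 'error' handler matters with any EventEmitter: an 'error' event emitted with no listener attached is thrown as an exception.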

Methods

queueSite(url)

Queue a CKAN-powered site by specifying its base API URL.

Example:

crawler.queueSite('http://datahub.io');

Known Issues

Credits

Links

License

Copyright (c) 2014 Hafiz Ismail. This software is licensed under the MIT License.
