alexandria A storage interface to store crawled content in Elasticsearch

bas Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.

birdeater A command-line tool for backing up a user's public Tweets in JSON format.

cobweb Web auditing and analysis framework

cobweb-queue Adds queuing functionality to Cobweb

console-crawler A simple web crawler that keeps to the domain

crawlit A Node.js crawler that supports custom crawl rules for specific sites via plugins.

crawljs A basic nodejs crawler.

crawlstream Crawl websites in a streaming fashion

easyspider mini spider.

extrae A web scraping framework written in CoffeeScript

flexible Easily build flexible, scalable, and distributed web crawlers.

forage-fetch Fetch pure HTML from a webserver and save it to disk

funnelweb Detect search engine crawlers by their User-Agent strings.

google-play-search Crawls the Google Play store apps website, returning results as JSON

googlebot Express middleware that returns the resulting HTML after executing JavaScript, allowing crawlers to read the page

gretel Follows and collects breadcrumbs across the web

hcrawler A hierarchical web crawler with concurrency control and server-side jQuery support

huntsman Super configurable async web spider

img-crawler A module to download images from a given URL

jdistiller A page scraping DSL for extracting structured information from unstructured XHTML, built on Node.js and jQuery.

jedi-crawler Lightsabing Node/PhantomJS crawler. Crawl almost everything, including AJAX content.

js-crawler Web crawler for Node.js

jwebquery A jQuery-style web crawler (actually extends jQuery).

kickstarter-crawler Crawls a given Kickstarter project page. Returns over 40 data points.

krawler Fast and lightweight web crawler with built-in cheerio, XML, and JSON parsers.

listal bot to download pictures from

loki Crawl all the things

netty net tools for node.js

node-bot Fast, real-time extraction of web page information (HTML, text, etc.) using node-dom, based on given criteria

node-crawler Node.js multithreaded web crawler with rules to parse sites

node-spider Generic web crawler powered by NodeJS

pagemunch A node.js wrapper for the PageMunch web crawler API

phantalyzer A PhantomJS script for running Wappalyzer over many sites using a headless Webkit browser

phantom-crawl Web crawler for ajax applications

rawblog-crawler DEEP BETA. Crawls pages, a bit like Jekyll

repunt Simple, configurable and extensible webcrawler

routers-news A crawler for various popular tech news sources. Read technology news from the comfort of your CLI.

salmonjs Web crawler in Node.js to dynamically spider whole websites.

scawler A scraping crawler

scrapebp Boilerplate code for a Node.js scraper with CLI

scrapinode content driven and route based scraper

simplecrawler Very straightforward web crawler. Uses EventEmitter. Generates queue statistics and has a basic cache mechanism with extensible backend.

snapshooter Simple crawler for Single Page Applications

snapshoter Recursively loads JavaScript pages and renders them to plain HTML files

spa-crawler Crawl 100% JS single page apps with phantomjs and node.

spidey Web crawler in Node.js to dynamically spider whole websites.

steer Use steer to control your chrome (the browser)

tarantula Node.js crawler/spider which provides a simple interface for crawling the Web

trawler Express middleware to troll bots. A combination of "trolling" and "crawler", also a boat for catching a lot of fish. Gotta catch 'em all.

web-crawler Scalable, extensible, web crawler framework.

web-htmlparser Web HTML parser; can send POST form-data or URL-encoded data and parse the result as a table

wikifetch Uses jQuery to return a structured JSON representation of a Wikipedia article.

zoo-crawler The crawler module fetches the best general information for any URL. It's an open-source port of Zootool's content detection engine.

