Browse by Keyword: "crawler"
alexandria A storage interface to store crawled content in Elasticsearch
bas Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.
birdeater A command-line tool for backing up a user's public Tweets in JSON format.
console-crawler A simple web crawler that keeps to the domain
crawljs A basic nodejs crawler.
crawlstream Crawl websites in a streaming fashion
easyspider mini spider.
flexible Easily build flexible, scalable, and distributed web crawlers.
forage-fetch Fetch pure HTML from a webserver and save it to disk
funnelweb Detect search engine crawlers by their User-Agent strings.
google-play-search Crawls Google Play store apps website, returning results as JSON
gretel Follows and collects breadcrumbs across the web
hcrawler a hierarchical web crawler with concurrency control and server-side jQuery support
huntsman Super configurable async web spider
img-crawler A module to download images from a given URL
jdistiller A page scraping DSL for extracting structured information from unstructured XHTML, built on Node.js and jQuery.
jedi-crawler Lightsabing Node/PhantomJS crawler. Crawl almost everything, including AJAX content.
js-crawler Web crawler for Node.js
jwebquery A jQuery-style web crawler (actually extends jQuery).
krawler Fast and lightweight web crawler with built-in cheerio, xml and json parser.
listal bot to download pictures from listal.com
loki Crawl all the things
netty net tools for node.js
node-bot Fast, real-time extraction of web page information using node-dom (HTML, text, etc.) based on given criteria
node-crawler Node.JS Multithreaded Web Crawler with rules to parse site
node-spider Generic web crawler powered by NodeJS
pagemunch A node.js wrapper for the PageMunch web crawler API
phantalyzer A PhantomJS script for running Wappalyzer over many sites using a headless Webkit browser
phantom-crawl Web crawler for ajax applications
rawblog-crawler DEEP BETA. Crawls pages, a bit like Jekyll
routers-news A crawler for various popular tech news sources. Read technology news from the comfort of your CLI.
salmonjs Web Crawler in Node.js to dynamically spider whole websites.
scawler A scraping crawler
scrapebp Boilerplate code for a Node.js scraper with CLI
scrapinode content driven and route based scraper
simplecrawler Very straightforward web crawler. Uses EventEmitter. Generates queue statistics and has a basic cache mechanism with extensible backend.
snapshooter Simple crawler for Single Page Applications
spidey Web Crawler in Node.js to dynamically spider whole websites.
steer Use steer to control your chrome (the browser)
tarantula nodejs crawler/spider which provides a simple interface for crawling the Web
trawler Express middleware to troll bots. A combination of "trolling" and "crawler", also a boat for catching a lot of fish. Gotta catch 'em all.
web-crawler Scalable, extensible web crawler framework.
wikifetch Uses jQuery to return a structured JSON representation of a Wikipedia article.
zoo-crawler The crawler module fetches the best general information for any URL. It's an open-source port of Zootool's content detection engine