Browse by Keyword: "crawler"
alexandria A storage interface to store crawled content in Elasticsearch
bas Behaviour Assertion Sheets: CSS-like declarative syntax for client-side integration testing and quality assurance.
birdeater A command-line tool for backing up a user's public Tweets in JSON format.
cobweb Web auditing and analysis framework
cobweb-queue Adds queuing functionality to Cobweb
console-crawler A simple web crawler that keeps to the domain
crawlit A Node.js crawler that supports custom crawl rules for specific sites via plugins.
crawljs A basic nodejs crawler.
crawlstream Crawl websites in a streaming fashion
easyspider mini spider.
extrae A web scraping framework written in coffeescript
flexible Easily build flexible, scalable, and distributed web crawlers.
forage-fetch Fetch pure HTML from a webserver and save it to disk
funnelweb Detect search engine crawlers by their User-Agent strings.
google-play-search Crawls Google Play store apps website, returning results as JSON
gretel Follows and collects breadcrumbs across the web
hcrawler A hierarchical web crawler with concurrency control and server-side jQuery support
huntsman Super configurable async web spider
img-crawler A module to download images from a given URL
jdistiller A page scraping DSL for extracting structured information from unstructured XHTML, built on Node.js and jQuery.
jedi-crawler Lightsabing Node/PhantomJS crawler. Crawl almost everything, including AJAX content.
js-crawler Web crawler for Node.js
jwebquery A jQuery-style web crawler (actually extends jQuery).
kickstarter-crawler Crawls a given kickstarter project page. Returns over 40 data points.
krawler Fast and lightweight web crawler with built-in cheerio, xml and json parser.
listal bot to download pictures from listal.com
loki Crawl all the things
netty net tools for node.js
node-bot Fast, real-time extraction of web page information using node-dom (HTML, text, etc.) based on given criteria
node-crawler Node.JS Multithreaded Web Crawler with rules to parse site
node-spider Generic web crawler powered by NodeJS
pagemunch A node.js wrapper for the PageMunch web crawler API
phantalyzer A PhantomJS script for running Wappalyzer over many sites using a headless Webkit browser
phantom-crawl Web crawler for ajax applications
rawblog-crawler DEEP BETA. Crawls pages, a bit like Jekyll
repunt Simple, configurable and extensible webcrawler
routers-news A crawler for various popular tech news sources. Read technology news from the comfort of your CLI.
salmonjs Web crawler in Node.js to dynamically spider whole websites.
scawler A scraping crawler
scrapebp Boilerplate code for a Node.js scraper with CLI
scrapinode content driven and route based scraper
simplecrawler Very straightforward web crawler. Uses EventEmitter. Generates queue statistics and has a basic cache mechanism with extensible backend.
snapshooter Simple crawler for Single Page Applications
spa-crawler Crawl 100% JS single page apps with phantomjs and node.
spidey Web crawler in Node.js to dynamically spider whole websites.
steer Use steer to control your chrome (the browser)
tarantula nodejs crawler/spider which provides a simple interface for crawling the Web
trawler Express middleware to troll bots. A combination of "trolling" and "crawler", also a boat for catching a lot of fish. Gotta catch 'em all.
web-crawler Scalable, extensible, web crawler framework.
web-htmlparser Web HTML parser; can send POST form-data or URL-encoded data and parse the result as a table
wikifetch Uses jQuery to return a structured JSON representation of a Wikipedia article.
zoo-crawler The crawler module fetches the best general information for any url. It's an open-source port of Zootool's content detection engine