tiny-scraper
A simple web scraper with a friendly API.
Features
- Requests are queued, with configurable request frequency and delay.
- Page parsing logic can be customized based on URL routes.
Dependencies
- flyd
- transducers.js
- path-to-regexp
- co
Install
npm install tiny-scraper
API
createRouter
Returns a router used to parse the specified pages.
const createRouter = ;const router = ;
router.match
Matches a site's base URI and returns a function for filtering URLs within that site. Refer to the path-to-regexp documentation for the route expression format.
Parameters
- baseUri
const matchGithub = router ;
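The idea of "match a base URI, get back a URL filter" can be sketched in plain JavaScript. This is only an illustration built on the built-in URL class; `matchBase` and `belongsToSite` are hypothetical names, not tiny-scraper functions, and the real router also handles path-to-regexp route expressions, which this sketch does not.

```javascript
// Hypothetical helper, NOT part of tiny-scraper: takes a base URI and
// returns a predicate that tells whether a URL belongs to that site.
function matchBase(baseUri) {
  const base = new URL(baseUri);
  return function belongsToSite(url) {
    let target;
    try {
      // Relative URLs are resolved against the base URI.
      target = new URL(url, base);
    } catch (err) {
      // Unparsable input never matches.
      return false;
    }
    // Same origin, and the path sits under the base path.
    return target.origin === base.origin &&
      target.pathname.startsWith(base.pathname);
  };
}

const isGithub = matchBase('https://github.com');
const urls = [
  'https://github.com/paldepind/flyd',
  'https://example.com/other',
  '/pillarjs/path-to-regexp',
];
// Keeps the first and third entries; the second is on another origin.
const githubUrls = urls.filter(isGithub);
```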
createScraper
Creates a scraper.
Parameters
- options an object containing the config fields.
  - maxRequest maximum number of requests running in parallel.
  - requestDuration minimum duration of a request; if a request completes early, the scraper waits until the specified duration has elapsed.
  - router the router you implemented.
  - downloader the method used to request a page, config => responsePromise. Example: axios.request
const createScraper = ;const scraper = ; scraper
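To make the requestDuration option more concrete, here is a minimal sketch of how a downloader of the shape config => responsePromise could be paced so that each call takes at least a given time. `remainingDelay`, `withMinDuration` and `fakeDownloader` are hypothetical names used for illustration; this is not how tiny-scraper is known to implement it, and maxRequest is not covered.

```javascript
// Hypothetical helpers, NOT part of tiny-scraper.

// How long to keep waiting after a request that took `elapsedMs`.
function remainingDelay(elapsedMs, requestDuration) {
  return Math.max(0, requestDuration - elapsedMs);
}

// Wraps a downloader (config => responsePromise) so that every call
// resolves no earlier than `requestDuration` milliseconds after it started.
function withMinDuration(downloader, requestDuration) {
  return async function paced(config) {
    const startedAt = Date.now();
    const response = await downloader(config);
    const wait = remainingDelay(Date.now() - startedAt, requestDuration);
    if (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
    return response;
  };
}

// Stub downloader standing in for something like axios.request.
const fakeDownloader = (config) => Promise.resolve({ status: 200, config });
const pacedDownloader = withMinDuration(fakeDownloader, 50);
```

A request that finishes in 120 ms with a 500 ms duration would wait another 380 ms; one that takes longer than the duration is not delayed further.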
scraper.task$
Task input stream. You can send seed URLs or resend failed requests into this stream.
Parameters
- input an array of request configs. Refer to the axios documentation.
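As a sketch of what such an input might look like, the snippet below builds an array of axios-style request configs (`url`, `method`, `headers`) from plain URLs. `toRequestConfigs` is a hypothetical helper, and the seed URL and header value are made up for the example.

```javascript
// Hypothetical helper, NOT part of tiny-scraper: turns plain URLs into
// axios-style request configs.
function toRequestConfigs(urls, headers = {}) {
  return urls.map((url) => ({ url, method: 'get', headers }));
}

const seeds = toRequestConfigs(
  ['https://github.com/trending'],
  { 'User-Agent': 'tiny-scraper-example' }
);

// Assumption: since flyd streams are pushed to by calling them with a value,
// the seeds would presumably be sent with something like:
//   scraper.task$(seeds);
```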
scraper.running$
Currently running tasks.
scraper.requestError$
Stream of failed requests.
scraper.routeError$
Route execution errors. You can debug your route code with this stream.