@headwall/url-crawler

0.2.7 • Public

head-spider

URL crawler and web content analyser.

The idea is to create an instance of the crawler, then add one or more URLs to it along with one or more response/document processors. The crawl finishes when there are no more URLs left in its queue.
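To make the queue-driven flow concrete, here is a minimal sketch of the idea. It is not this package's API: `SketchCrawler`, `addUrl`, `addProcessor`, `run` and the injected `fetcher` are hypothetical names chosen for illustration, and the fetcher is synchronous and in-memory so the example runs without a network. A real crawler would fetch asynchronously.

```typescript
// Hypothetical names throughout; this is NOT the @headwall/url-crawler API.
type FetchedPage = { url: string; status: number; body: string };
// Synchronous for brevity; a real crawler would await network I/O here.
type Fetcher = (url: string) => FetchedPage;
type PageProcessor = (page: FetchedPage, enqueue: (url: string) => void) => void;

class SketchCrawler {
  private readonly fetcher: Fetcher;
  private readonly queue: string[] = [];
  private readonly seen = new Set<string>();
  private readonly processors: PageProcessor[] = [];

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Queue a URL once; URLs that were already queued are ignored.
  addUrl(url: string): void {
    if (this.seen.has(url)) return;
    this.seen.add(url);
    this.queue.push(url);
  }

  addProcessor(processor: PageProcessor): void {
    this.processors.push(processor);
  }

  // Fetch pages until the queue is empty, then return the URLs in visit order.
  run(): string[] {
    const visited: string[] = [];
    while (this.queue.length > 0) {
      const url = this.queue.shift() as string;
      const page = this.fetcher(url);
      visited.push(url);
      // Processors may enqueue newly discovered URLs.
      for (const processor of this.processors) {
        processor(page, (found) => this.addUrl(found));
      }
    }
    return visited;
  }
}

// Usage: a three-page in-memory "site" where each body lists outgoing links.
const demoSite: Record<string, string[]> = { "/a": ["/b", "/c"], "/b": ["/a"], "/c": [] };
const demoCrawler = new SketchCrawler((url) => ({
  url,
  status: 200,
  body: (demoSite[url] ?? []).join(" "),
}));
demoCrawler.addProcessor((page, enqueue) => {
  for (const link of page.body.split(" ").filter(Boolean)) enqueue(link);
});
demoCrawler.addUrl("/a");
const demoVisited = demoCrawler.run(); // ["/a", "/b", "/c"]
```

The `seen` set is what lets the loop terminate on sites whose pages link back to each other: `/b` links to `/a`, but `/a` is never queued a second time.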

This can form the basis of a technical SEO crawler, or any other content crawler/scraper.

When a page has been fetched, a series of "processors" are run over it to extract structured data.

After all the processors have finished, the "analysers" are run. These can look for things like missing IMG alt text, out-of-sequence heading elements, or anything else you want to check.
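The two-stage flow can be sketched roughly as follows. This is an illustration only, not the package's real interfaces: `PageData`, `Processor`, `Analyser` and `analysePage` are assumed names, and the regex-based extraction is deliberately simplistic (it only handles double-quoted attributes). A real processor would more likely use an HTML parser.

```typescript
// Hypothetical shapes; NOT the @headwall/url-crawler API.
type PageData = {
  images: { src: string; alt: string | null }[];
  headingLevels: number[];
};
type Processor = (html: string, data: PageData) => void;
type Analyser = (data: PageData) => string[];

// Processor: record each <img> tag's src and alt (alt is null when absent).
const imageProcessor: Processor = (html, data) => {
  const imgTags = html.match(/<img\b[^>]*>/gi) ?? [];
  for (const tag of imgTags) {
    const src = /\bsrc="([^"]*)"/i.exec(tag);
    const alt = /\balt="([^"]*)"/i.exec(tag);
    data.images.push({ src: src ? src[1] : "", alt: alt ? alt[1] : null });
  }
};

// Processor: record heading levels (1 to 6) in document order.
const headingProcessor: Processor = (html, data) => {
  const headingPattern = /<h([1-6])\b/gi;
  let match: RegExpExecArray | null;
  while ((match = headingPattern.exec(html)) !== null) {
    data.headingLevels.push(Number(match[1]));
  }
};

// Analyser: flag images with no alt attribute at all.
// An empty alt="" is valid for decorative images, so it is not flagged.
const missingAltAnalyser: Analyser = (data) =>
  data.images
    .filter((img) => img.alt === null)
    .map((img) => `Image ${img.src} has no alt attribute`);

// Analyser: flag headings that jump more than one level deeper (e.g. h1 to h3).
const headingOrderAnalyser: Analyser = (data) => {
  const issues: string[] = [];
  let previous = 0;
  for (const level of data.headingLevels) {
    if (level > previous + 1) {
      issues.push(`Heading h${level} skips a level (previous level: ${previous})`);
    }
    previous = level;
  }
  return issues;
};

// Run every processor first, then every analyser over the collected data.
function analysePage(html: string, processors: Processor[], analysers: Analyser[]): string[] {
  const data: PageData = { images: [], headingLevels: [] };
  for (const processor of processors) processor(html, data);
  const issues: string[] = [];
  for (const analyser of analysers) issues.push(...analyser(data));
  return issues;
}
```

Keeping extraction (processors) separate from judgement (analysers) means several analysers can share the same extracted data without each re-reading the HTML.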

You can easily add your own processors and analysers.

This is still in early development as I'm working on the test suite and setting up some basic document processors.

You can run the test suite with npm run test.

Install

npm i @headwall/url-crawler

Weekly Downloads

11

Version

0.2.7

License

MIT

Unpacked Size

18.3 kB

Total Files

13

Collaborators

  • headwall