head-spider

URL crawler and web content analyser.

The idea is to create an instance of the crawler, then add one or more URLs to it along with one or more response/document processors. When the crawler has no more URLs in its queue, it has finished.
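As a rough illustration of that flow, here is a minimal sketch of a queue-driven crawler. It is not head-spider's actual API: every name in it (SketchCrawler, addUrl, addProcessor, run, and the injected fetchPage function) is hypothetical, and the fetcher is a synchronous stub so the example runs without network access.

```typescript
// Hypothetical sketch: these names are illustrative, not head-spider's API.
type FetchedPage = { url: string; body: string };
type PageProcessor = (page: FetchedPage) => void;
// Injected so the sketch runs offline. A real crawler would fetch over HTTP
// and would almost certainly be asynchronous.
type PageFetcher = (url: string) => string;

class SketchCrawler {
  private queue: string[] = [];
  // URLs already queued, so the same URL is never fetched twice.
  private seen: { [url: string]: boolean } = {};
  private processors: PageProcessor[] = [];
  private fetchPage: PageFetcher;

  constructor(fetchPage: PageFetcher) {
    this.fetchPage = fetchPage;
  }

  addUrl(url: string): void {
    if (this.seen[url]) return;
    this.seen[url] = true;
    this.queue.push(url);
  }

  addProcessor(processor: PageProcessor): void {
    this.processors.push(processor);
  }

  // Drains the queue and returns the number of pages fetched.
  // The crawl is finished as soon as the queue is empty.
  run(): number {
    let fetched = 0;
    let url = this.queue.shift();
    while (url !== undefined) {
      const page: FetchedPage = { url: url, body: this.fetchPage(url) };
      for (const processor of this.processors) {
        processor(page);
      }
      fetched += 1;
      url = this.queue.shift();
    }
    return fetched;
  }
}

// Usage with a stubbed two-page site.
const fakeSite: { [url: string]: string } = {
  "https://example.com/": "<title>Home</title>",
  "https://example.com/about": "<title>About</title>",
};
const crawler = new SketchCrawler((url) => fakeSite[url] ?? "");
const visitedUrls: string[] = [];
crawler.addProcessor((page) => {
  visitedUrls.push(page.url);
});
crawler.addUrl("https://example.com/");
crawler.addUrl("https://example.com/about");
crawler.addUrl("https://example.com/"); // duplicate, ignored
const fetchedCount = crawler.run();
```

A processor could also call addUrl for links it discovers, which is how a crawl of this shape would grow beyond its seed URLs.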

This can form the basis of a technical SEO crawler, or any other content crawler/scraper.

When a page has been fetched, a series of "processors" are run over it to extract structured data.
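To make the processor idea concrete, here is a small sketch in which each processor reads the page HTML and writes its findings into a shared data object. The names (HtmlProcessor, extractTitle, extractHeadings, runProcessors) and the shape of the data are assumptions for illustration only, not head-spider's interfaces. Regular expressions keep the sketch dependency-free; a real processor would more likely use an HTML parser.

```typescript
// Hypothetical sketch: names and data shapes are illustrative only.
type ExtractedData = { [key: string]: unknown };
type HtmlProcessor = (html: string, data: ExtractedData) => void;

// Stores the trimmed <title> text, or null when the page has no title.
const extractTitle: HtmlProcessor = (html, data) => {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = match ? match[1] : undefined;
  data["title"] = title === undefined ? null : title.trim();
};

// Stores every h1..h6 in document order, with nested tags stripped.
const extractHeadings: HtmlProcessor = (html, data) => {
  const headings: { level: number; text: string }[] = [];
  const pattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  let match = pattern.exec(html);
  while (match !== null) {
    const inner = match[2] ?? "";
    headings.push({
      level: Number(match[1]),
      text: inner.replace(/<[^>]+>/g, "").trim(),
    });
    match = pattern.exec(html);
  }
  data["headings"] = headings;
};

// Runs each processor in order over the same page.
function runProcessors(html: string, processors: HtmlProcessor[]): ExtractedData {
  const data: ExtractedData = {};
  for (const processor of processors) {
    processor(html, data);
  }
  return data;
}

const sampleHtml =
  "<html><head><title> Example </title></head>" +
  "<body><h1>Welcome</h1><h3>Sub <em>topic</em></h3></body></html>";
const extracted = runProcessors(sampleHtml, [extractTitle, extractHeadings]);
```

Here extracted holds a title of "Example" and two headings, at levels 1 and 3.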

After all the processors have finished, the "analysers" are run. These can look for things like missing img alt text, out-of-sequence heading elements, or anything else you want to check.
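The two checks mentioned above could be sketched as follows. This assumes the processors have already produced a list of headings and images; the types and function names (PageFacts, PageAnalyser, missingAltAnalyser, headingSequenceAnalyser, runAnalysers) are hypothetical and not taken from head-spider. The sketch only flags images with no alt attribute at all, since an empty alt is a legitimate way to mark decorative images.

```typescript
// Hypothetical sketch: names and data shapes are illustrative only.
type HeadingFact = { level: number; text: string };
type ImageFact = { src: string; alt: string | null }; // null = attribute absent
type PageFacts = { headings: HeadingFact[]; images: ImageFact[] };
type PageIssue = { code: string; message: string };
type PageAnalyser = (facts: PageFacts) => PageIssue[];

// Flags images that have no alt attribute at all.
const missingAltAnalyser: PageAnalyser = (facts) =>
  facts.images
    .filter((image) => image.alt === null)
    .map((image) => ({
      code: "img-missing-alt",
      message: "Image " + image.src + " has no alt attribute",
    }));

// Flags headings that jump more than one level deeper than the previous one,
// for example an h4 directly after an h2.
const headingSequenceAnalyser: PageAnalyser = (facts) => {
  const issues: PageIssue[] = [];
  let previousLevel = 0;
  for (const heading of facts.headings) {
    if (heading.level > previousLevel + 1) {
      const context =
        previousLevel === 0 ? "starts the page" : "follows an h" + previousLevel;
      issues.push({
        code: "heading-skip",
        message: "h" + heading.level + ' "' + heading.text + '" ' + context,
      });
    }
    previousLevel = heading.level;
  }
  return issues;
};

// Runs every analyser over the same facts and collects the issues in order.
function runAnalysers(facts: PageFacts, analysers: PageAnalyser[]): PageIssue[] {
  const issues: PageIssue[] = [];
  for (const analyser of analysers) {
    for (const issue of analyser(facts)) {
      issues.push(issue);
    }
  }
  return issues;
}

const sampleFacts: PageFacts = {
  headings: [
    { level: 1, text: "Home" },
    { level: 2, text: "News" },
    { level: 4, text: "Archive" },
  ],
  images: [
    { src: "/logo.png", alt: "Logo" },
    { src: "/hero.jpg", alt: null },
    { src: "/spacer.gif", alt: "" },
  ],
};
const foundIssues = runAnalysers(sampleFacts, [
  missingAltAnalyser,
  headingSequenceAnalyser,
]);
```

With this sample data, foundIssues contains one missing-alt issue for /hero.jpg and one heading-skip issue for the h4.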

You can easily add your own processors and analysers.

This is still in early development: I'm working on the test suite and setting up some basic document processors.

You can run the test suite with npm run test.

@headwall/url-crawler
