Spider parser
Parses the spider output from wget into an object structure of links.
This object can then be processed further to build a tree of the website's hierarchy, for example to generate a sitemap.
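As an illustration of that further processing, the following is a minimal sketch of turning a flat list of URLs into a nested tree. The `buildTree` helper and the `children` and `url` property names are hypothetical and are not part of this module; the sketch only assumes that the parsed links can be reduced to absolute URL strings.

```javascript
// Hypothetical helper, not part of this module's API.
// Builds a nested tree keyed by host and then by path segment.
function buildTree(urls) {
  const root = { children: {} };
  for (const href of urls) {
    const u = new URL(href);
    // First level is the host, then one level per non-empty path segment.
    const segments = [u.host].concat(u.pathname.split('/').filter(Boolean));
    let node = root;
    for (const segment of segments) {
      if (!node.children[segment]) {
        node.children[segment] = { children: {} };
      }
      node = node.children[segment];
    }
    // Remember which URL this node was created for.
    node.url = href;
  }
  return root;
}

const tree = buildTree([
  'http://example.com/',
  'http://example.com/docs/',
  'http://example.com/docs/api.html'
]);
console.log(JSON.stringify(tree, null, 2));
```

A sitemap generator could then walk `tree.children` depth-first and emit one entry per node that carries a `url`.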
Tested using wget v1.15 on Linux.
Usage
var parser = buf = 0; // buffer should contain the spider output
console.dir;
parser.Parser: The parser class.
parser.Link: The class that represents a link.
parser.ParseStream: Parse stream class.
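To give a feel for the kind of link structure involved, here is a minimal, self-contained sketch that extracts links from spider output. It is not this module's `Parser`: the `parseSpiderOutput` function and the `url` and `broken` fields are hypothetical, and the sketch assumes that wget marks each request with a line such as `--2014-05-20 10:00:00--  http://example.com/` and flags missing pages with a line containing `broken link`.

```javascript
// Hypothetical sketch, NOT this module's Parser implementation.
// Assumed request line format: "--YYYY-MM-DD HH:MM:SS--  <url>".
const REQUEST = /^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)/;

function parseSpiderOutput(text) {
  const links = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const match = REQUEST.exec(line);
    if (match) {
      // A new request line starts a new link record.
      current = { url: match[1], broken: false };
      links.push(current);
    } else if (current && line.includes('broken link')) {
      // Assumed marker for a missing page; applies to the latest request.
      current.broken = true;
    }
  }
  return links;
}

// Illustrative input only, abbreviated from the assumed wget log format.
const sample = [
  '--2014-05-20 10:00:00--  http://example.com/',
  'HTTP request sent, awaiting response... 200 OK',
  '--2014-05-20 10:00:01--  http://example.com/missing',
  'HTTP request sent, awaiting response... 404 Not Found',
  'Remote file does not exist -- broken link!!!'
].join('\n');

console.dir(parseSpiderOutput(sample));
```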
Stream support is available; see the test spec for example usage.
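The test spec remains the reference for the module's own stream API. As a general illustration of stream-based parsing, the sketch below uses Node's `stream.Transform` to emit one object per request line as chunks arrive. The `UrlStream` class and its `url` field are hypothetical and are not the module's `ParseStream`; the request line format is the same assumption about wget's log output as elsewhere in this document.

```javascript
const { Transform } = require('stream');

// Assumed request line format: "--YYYY-MM-DD HH:MM:SS--  <url>".
const REQUEST_LINE = /^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)/;

// Hypothetical sketch, not the module's ParseStream.
// Accepts raw text chunks and emits { url } objects.
class UrlStream extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.remainder = '';
  }

  _transform(chunk, encoding, callback) {
    const lines = (this.remainder + chunk.toString()).split(/\r?\n/);
    // The last piece may be an incomplete line; keep it for the next chunk.
    this.remainder = lines.pop();
    lines.forEach((line) => this.emitLink(line));
    callback();
  }

  _flush(callback) {
    // Handle a final line that was not terminated by a newline.
    this.emitLink(this.remainder);
    callback();
  }

  emitLink(line) {
    const match = REQUEST_LINE.exec(line);
    if (match) {
      this.push({ url: match[1] });
    }
  }
}

const stream = new UrlStream();
stream.on('data', (link) => console.dir(link));
// The URL is deliberately split across two chunks.
stream.write('--2014-05-20 10:00:00--  http://exam');
stream.end('ple.com/\n');
```

Buffering the trailing partial line is the important design choice here, because chunk boundaries rarely line up with line boundaries.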
wget-parser
A program that reads from stdin and prints the result of the parse as JSON. It exits with code 1 if any broken links are found.
cat test/fixtures/mock.txt | wget-parser
cat test/fixtures/broken.txt | wget-parser; echo $?;
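The exit code behaviour described above could be sketched as follows. This is not the `wget-parser` source: the `report` function is hypothetical, and it assumes the parse result can be treated as an array of objects with a boolean `broken` field.

```javascript
// Hypothetical sketch of the described behaviour, not the wget-parser source.
// `links` is assumed to be an array of objects with a boolean `broken` field.
function report(links) {
  const broken = links.filter((link) => link.broken);
  return {
    output: JSON.stringify(links, null, 2),
    // Non-zero exit code signals that at least one link is broken.
    code: broken.length > 0 ? 1 : 0
  };
}

const result = report([
  { url: 'http://example.com/', broken: false },
  { url: 'http://example.com/missing', broken: true }
]);
console.log(result.output);
console.log('exit code:', result.code);
```

A command-line wrapper would print `result.output` and pass `result.code` to `process.exit`, which lets shell scripts and CI jobs fail on broken links.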
wget-spider
A program that spiders a site with wget and pipes the output to wget-parser:
wget-spider http://google.com
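For readers who want to wire the two programs together from Node, the sketch below shows one possible approach with `child_process.spawn`. It is not the `wget-spider` source: the `spiderArgs` and `spider` functions are hypothetical, the choice of the `--spider` and `-r` flags is an assumption about how the crawl is run, and both `wget` and `wget-parser` are assumed to be on the `PATH`.

```javascript
const { spawn } = require('child_process');

// Hypothetical sketch, not the wget-spider source.
function spiderArgs(url) {
  // --spider: check pages without saving them; -r: follow links recursively.
  return ['--spider', '-r', url];
}

function spider(url) {
  // wget writes its log to stderr, so stderr is what the parser needs to read.
  const wget = spawn('wget', spiderArgs(url), {
    stdio: ['ignore', 'ignore', 'pipe']
  });
  const parser = spawn('wget-parser', [], {
    stdio: ['pipe', 'inherit', 'inherit']
  });
  wget.stderr.pipe(parser.stdin);
  return parser;
}

// Example (performs network requests when run):
// spider('http://example.com').on('close', (code) => process.exit(code));
console.log(['wget'].concat(spiderArgs('http://example.com')).join(' '));
```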
Output
Example output from the parser:
Developer
Test
To run the test suite:
npm test
Cover
To generate code coverage run:
npm run cover
Lint
Run the source tree through jshint and jscs:
npm run lint
Clean
Remove generated files:
npm run clean
Readme
To build the readme file from the partial definitions:
npm run readme
Generated by mdp(1).