SimpleScraper
SimpleScraper defines parsers.
Installation
Via npm on Node:
npm install simplescraper
Usage
Reference in your program:
var ss = require('simplescraper');
Create a document from its text:
var doc = ss.document(doctext);
Find and process elements:
var elems = doc.elements();
for (var elem = elems.next(); elem; elem = elems.next()) {
    // process element
}
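When an array is more convenient than the `next()` loop, the iterator can be drained into one. This is a minimal sketch that assumes only what the loop above shows: `next()` returns the next element, or a falsy value when none are left. The helper name `toArray` is hypothetical and not part of SimpleScraper.

```javascript
// Hypothetical helper, not part of SimpleScraper: collects every element
// from an iterator that exposes next(), as doc.elements() does above.
function toArray(elems) {
    var result = [];

    for (var elem = elems.next(); elem; elem = elems.next()) {
        result.push(elem);
    }

    return result;
}
```

It would be called as `toArray(doc.elements('div'))`.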
Find elements by tag:
var elems = doc.elements('div');
Find elements by class:
var elems = doc.elements('.news');
Find elements by id:
var elems = doc.elements('#content');
Combined filters:
var elems = doc.elements('div .news');
Find the first matching element (or null if none matches):
var firstelem = doc.element('div');
var firstelem = doc.element('.news');
var firstelem = doc.element('#content');
Filter elements:
var elems = doc.elements(function (element) { return element.attribute('style') != null; });
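Filter functions like the one above can also be built by a small factory, so the same test can be reused with different arguments. This is a sketch only: `withAttribute` is a hypothetical name, and it relies solely on the `element.attribute(name)` call shown in this README.

```javascript
// Hypothetical helper: builds a filter function that accepts elements
// whose named attribute is exactly equal to the given value.
function withAttribute(name, value) {
    return function (element) {
        return element.attribute(name) === value;
    };
}
```

It would be passed as `doc.elements(withAttribute('type', 'text'))`.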
Get an attribute of an element (or null if it is not set):
var myattr = elem.attribute('myattr');
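Since `doc.element` returns null when nothing matches, it helps to guard before reading an attribute from its result. A minimal sketch follows; `firstAttribute` is a hypothetical helper name, and it uses only the `element` and `attribute` calls described in this README.

```javascript
// Hypothetical helper: returns the named attribute of the first element
// matching the selector, or null when no element matches.
function firstAttribute(doc, selector, name) {
    var elem = doc.element(selector);

    if (!elem) {
        return null;
    }

    return elem.attribute(name);
}
```

It would be called as `firstAttribute(doc, '#content', 'class')`.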
Get attributes (as a plain JavaScript object, each attribute name is a property):
var attrs = elem.attributes();
// { class: 'news', type: 'text', ... }
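Because the result is a plain object, ordinary property access is enough to inspect it. The sketch below checks for a class name, assuming the `class` property holds the usual space-separated string of class names. `hasClass` is a hypothetical helper and not part of SimpleScraper.

```javascript
// Hypothetical helper: checks whether the plain object returned by
// elem.attributes() lists the given class name.
function hasClass(attrs, name) {
    if (!attrs || typeof attrs.class !== 'string') {
        return false;
    }

    // Split on whitespace so 'news' does not match 'newsletter'.
    return attrs.class.split(/\s+/).indexOf(name) >= 0;
}
```

It would be called as `hasClass(elem.attributes(), 'news')`.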
Get element tag name as string:
var tagname = elem.tag();
Development
git clone git://github.com/ajlopez/SimpleScraper.git
cd SimpleScraper
npm install
npm test
Samples
References
- HTML as an Application of SGML
- Names: "A name consists of a letter followed by letters, digits, periods, or hyphens... Element and attribute names are not case sensitive, but entity names are..."
- Attributes
Versions
- 0.0.1: Published
- 0.0.2: Published, new examples, internal refactor
License
MIT
Contribution
Feel free to file issues and submit pull requests; contributions are welcome.
If you submit a pull request, please be sure to add or update corresponding
test cases, and ensure that npm test continues to pass.