promise-parser
Promise-based HTML/XML parser and web scraper for Node.js.
Features
- Fast: uses libxml C bindings
- Lightweight: no dependencies like jQuery, cheerio, or jsdom
- Clean: promise-based interface, no more nested callbacks
- Flexible: supports both CSS and XPath selectors
Example
var pp = ;
var parser = ;
// scrape all craigslist listings
parser
Install
npm install promise-parser
Usage
- opts [object]
- opts.http [object] - HTTP options given to needle instance
- opts.http.timeout [int] - Timeout in milliseconds
- opts.http.proxy [string] - Forward requests through an HTTP(S) proxy
- opts.http.concurrency [int] - Number of simultaneous HTTP requests
- opts.http.tries [int] - Number of tries before giving up on a request
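As a rough illustration of the shape described above, the options could be collected in a plain object like the one below. Only the key names come from the list; the values, the proxy URL, and the `withHttpDefaults` helper are hypothetical and are not part of promise-parser.

```javascript
// Hypothetical default values; only the key names come from the option list.
const defaultHttp = {
  timeout: 30000,  // milliseconds (assumed value)
  concurrency: 5,  // simultaneous HTTP requests (assumed value)
  tries: 3         // attempts before giving up on a request (assumed value)
};

// Hypothetical helper: overlay user-supplied opts.http on the defaults.
function withHttpDefaults(opts) {
  const http = Object.assign({}, defaultHttp, (opts && opts.http) || {});
  return Object.assign({}, opts, { http: http });
}

const opts = withHttpDefaults({
  http: { timeout: 10000, proxy: 'http://127.0.0.1:8080' }
});

console.log(opts.http);
```

Keys supplied by the caller win over the defaults, so only the settings that differ need to be spelled out.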
Promises
.parse(string)
Parse an HTML or XML string
.get(url, [data], [opts])
HTTP GET request
.post(url, [data], [opts])
HTTP POST request
.find(selector, [opts])
Find elements matching selector within the current context
.follow([selector], [opts])
Follow URLs found in the element's text or attribute
.set([args])
Find and set values for context.data
// set 'title' to current element text
pp
// set 'title' to text of 'a.title'
pp
// set multiple
pp;
.then(callback(next))
Calls callback from the context of the current element. To continue, the callback must call next([context]) at least once. The context argument can optionally be a new context.
pp
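The calling convention for .then can be pictured with a small stand-in. The sketch below is not promise-parser's implementation: `runThen` and the sample context object are hypothetical, and a real context would be a libxmljs Element. It only shows a callback that receives `next`, runs with `this` bound to the current context, and may call `next` more than once.

```javascript
// Stand-in runner, NOT promise-parser's implementation: it calls `callback`
// with `this` bound to the current context and collects every context that
// the callback passes on through next().
function runThen(context, callback) {
  const forwarded = [];
  function next(newContext) {
    // With no argument, the current context continues down the chain.
    forwarded.push(newContext === undefined ? context : newContext);
  }
  callback.call(context, next);
  return forwarded;
}

// Hypothetical context object; a real context is a libxmljs Element.
const current = { name: 'ul', children: [{ name: 'li' }, { name: 'li' }] };

const results = runThen(current, function (next) {
  // `this` is the current context. Calling next() once per child
  // continues the chain once for each of them.
  this.children.forEach(function (child) {
    next(child);
  });
});

console.log(results.length); // 2
```

Calling `next()` with no argument passes the current context along unchanged, which matches the optional `[context]` argument described above.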
context
The this value of the .then callback function is set to the current context. The context is a libxmljs Element object representing the current HTML/XML element. In addition to all of the libxmljs Element functions, each context also supports these functions and properties:
- context.request(url, [data], callback(context))
- context.post(url, [data], callback(context))
- context.log(msg)
- context.debug(msg)
- context.error(msg)
- context.data [object]
.data(callback(data))
Get data stored in context.data
.done(callback)
Calls callback when parsing has completely finished
.log(callback(msg))
Calls callback when any log messages are received
.error(callback(msg))
Calls callback when any error messages are received
.debug(callback(msg))
Calls callback when any debug messages are received
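One way to picture the three message handlers is as listeners on separate channels. The sketch below uses Node's built-in EventEmitter as a stand-in; the `messages` emitter and its event names are hypothetical and say nothing about how promise-parser delivers messages internally.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a parser's message channels.
const messages = new EventEmitter();
const seen = [];

// One handler per channel, mirroring .log, .error and .debug.
messages.on('log', (msg) => seen.push('log: ' + msg));
messages.on('error', (msg) => seen.push('error: ' + msg));
messages.on('debug', (msg) => seen.push('debug: ' + msg));

messages.emit('log', 'loaded page');
messages.emit('error', 'request timed out');

console.log(seen);
```

Keeping the channels separate lets a scraper print errors while leaving debug output silent unless a handler is attached.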
CSS helpers
These CSS helper selectors are provided to simplify complex CSS expressions and to add jQuery-like functionality.
:contains(string)
Select elements whose contents contain string
:starts-with(string)
Select elements whose contents start with string
:ends-with(string)
Select elements whose contents end with string
:first
Select first element (shortcut for :first-of-type)
:first(n), :limit(n)
Select first n elements
:last
Select last element (shortcut for :last-of-type)
:last(n)
Select last n elements
:even
Select even elements
:odd
Select odd elements
:skip(n), :skip-first(n)
Skip first n elements
:skip-last(n)
Skip last n elements
:range(n1, n2)
Select elements n1 through n2, inclusive
.exampleSelector[n]
Select nth element (shortcut for :nth-of-type)
@attribute
Select attribute
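The positional and text helpers above can be modelled on plain arrays and strings. This is only a sketch of the intended semantics, not how promise-parser translates selectors: the `positional` and `textual` objects are hypothetical, the matches are treated as one flat list, and positions are assumed to be 1-based, as in CSS :nth-of-type.

```javascript
// Positional helpers modelled on a plain array of matches.
// Assumption: positions are 1-based, as in CSS :nth-of-type.
const positional = {
  first: (items, n = 1) => items.slice(0, n),
  last: (items, n = 1) => (n > 0 ? items.slice(-n) : []),
  skip: (items, n) => items.slice(n),
  skipLast: (items, n) => items.slice(0, Math.max(items.length - n, 0)),
  range: (items, n1, n2) => items.slice(n1 - 1, n2)
};

// Text helpers modelled on an element's text content.
const textual = {
  contains: (text, s) => text.includes(s),
  startsWith: (text, s) => text.startsWith(s),
  endsWith: (text, s) => text.endsWith(s)
};

const items = ['a', 'b', 'c', 'd', 'e'];
console.log(positional.first(items, 2));    // [ 'a', 'b' ]
console.log(positional.last(items, 2));     // [ 'd', 'e' ]
console.log(positional.skip(items, 2));     // [ 'c', 'd', 'e' ]
console.log(positional.skipLast(items, 2)); // [ 'a', 'b', 'c' ]
console.log(positional.range(items, 2, 4)); // [ 'b', 'c', 'd' ]
console.log(textual.startsWith('promise-parser', 'promise')); // true
```

:even and :odd are left out of the model because the text above does not say whether counting starts at zero (jQuery style) or one (CSS style).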