tokenize-html
streaming html tokenizer.
Like html-tokenize, but built on the forgiving htmlparser2 parser underneath.
example
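var fs = require('fs');
var tokenize = require('tokenize-html');
// assumption: the module name was lost in the source; through2 is a guess
var through = require('through2');

// 'table.html' is a placeholder path for a file holding the html shown below
fs.createReadStream(__dirname + '/table.html')
    .pipe(tokenize())
    .pipe(through.obj(function (row, enc, next) {
        console.log(row);
        next();
    }));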
this html:
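<table cols="3">
 <tbody>blah blah blah</tbody>
 <tr><td>there</td></tr>
 <tr><td>it</td></tr>
 <tr><td bgcolor="blue">is</td></tr>
</table>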
generates this output:
[ 'open', 'table', { cols: '3' } ]
[ 'text', '\n ' ]
[ 'open', 'tbody', {} ]
[ 'text', 'blah blah blah' ]
[ 'close', 'tbody' ]
[ 'text', '\n ' ]
[ 'open', 'tr', {} ]
[ 'open', 'td', {} ]
[ 'text', 'there' ]
[ 'close', 'td' ]
[ 'close', 'tr' ]
[ 'text', '\n ' ]
[ 'open', 'tr', {} ]
[ 'open', 'td', {} ]
[ 'text', 'it' ]
[ 'close', 'td' ]
[ 'close', 'tr' ]
[ 'text', '\n ' ]
[ 'open', 'tr', {} ]
[ 'open', 'td', { bgcolor: 'blue' } ]
[ 'text', 'is' ]
[ 'close', 'td' ]
[ 'close', 'tr' ]
[ 'text', '\n' ]
[ 'close', 'table' ]
[ 'text', '\n' ]
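As a rough sketch of how rows like these might be consumed, the snippet below pulls the non-whitespace text out of a list of rows. The `textOf` helper is hypothetical and not part of this module, and the `rows` array is just the first few rows of the output above pasted in by hand.

```javascript
// The first rows of the example output above, copied by hand.
var rows = [
    [ 'open', 'table', { cols: '3' } ],
    [ 'text', '\n ' ],
    [ 'open', 'tbody', {} ],
    [ 'text', 'blah blah blah' ],
    [ 'close', 'tbody' ],
    [ 'text', '\n ' ],
    [ 'open', 'tr', {} ],
    [ 'open', 'td', {} ],
    [ 'text', 'there' ],
    [ 'close', 'td' ],
    [ 'close', 'tr' ]
];

// Hypothetical helper: keep only 'text' rows and drop the ones that are
// pure whitespace. String() also covers the case where row[1] is a Buffer.
function textOf(rows) {
    return rows
        .filter(function (row) { return row[0] === 'text'; })
        .map(function (row) { return String(row[1]).trim(); })
        .filter(function (text) { return text.length > 0; });
}

console.log(textOf(rows));
// prints [ 'blah blah blah', 'there' ]
```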
api
var t = tokenize()
Returns a transform stream that takes html input and produces rows of output.
The output rows are of the form:
[ name, tag|text [, attrs] ]
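The possible values of name are:
- open: an opening tag; the row is [ 'open', tag, attrs ]
- close: a closing tag; the row is [ 'close', tag ]
- text: a run of text; the row is [ 'text', text ]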
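To make the row format concrete, here is a minimal sketch that turns rows back into html. `escapeAttr`, `rowToHtml` and `serialize` are hypothetical names made up for this illustration, not part of this module's api, and the sketch assumes text rows can be emitted as they are.

```javascript
var Transform = require('stream').Transform;

// Hypothetical helper: escape a value for a double-quoted attribute.
function escapeAttr(value) {
    return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Hypothetical helper: turn one row back into an html fragment.
function rowToHtml(row) {
    var name = row[0];
    if (name === 'open') {
        var attrs = row[2] || {};
        var pairs = Object.keys(attrs).map(function (key) {
            return ' ' + key + '="' + escapeAttr(attrs[key]) + '"';
        });
        return '<' + row[1] + pairs.join('') + '>';
    }
    if (name === 'close') return '</' + row[1] + '>';
    // 'text' rows: assumed safe to emit unchanged; escape here if needed
    return String(row[1]);
}

// Transform stream: takes row objects in, writes html strings out.
function serialize() {
    return new Transform({
        writableObjectMode: true,
        transform: function (row, enc, next) {
            next(null, rowToHtml(row));
        }
    });
}

console.log(rowToHtml([ 'open', 'td', { bgcolor: 'blue' } ]));
// prints <td bgcolor="blue">
```

Under these assumptions, piping the output of tokenize() into serialize() and then into process.stdout should print something close to the original html.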
license
MIT