Use jQuery selectors to convert web content into JSON data.
Quick start
- Installing the module
- Using the module
Installing the module
npm install webtojson
or
yarn add webtojson
Using the module
- Import the module
var webtojson = require('webtojson');
- Call the method
Execute the webtojson method.
For example, the following code crawls the Google search results for the keyword "node":
;
async var googleData = await ; console;;
Result:
[
  { id: 0, title: 'Node.js' },
  { id: 1, title: 'Node.js-Wikipedia' },
  { id: 2, title: 'nodejs / node: Node.js JavaScript runtime-GitHub' },
  { id: 3, title: 'Introduction to Node.js' },
  { id: 4, title: 'Node.js Introduction-W3Schools' },
  { id: 5, title: 'Node-Web APIs | MDN-Mozilla' },
  { id: 6, title: 'node-npm' },
  { id: 7, title: 'node-Docker Hub' },
  { id: 8, title: 'Express-Node.js web application framework' },
  { id: 9, title: 'Node.js-Introduction-Tutorialspoint' },
  { id: 10, title: 'Node-RED' }
]
API
The module provides the following methods:
- webtojson (required)
- extend (optional)
- add (optional)
- getData or saveFile (required)
webtojson
webtojson(urls, selector, option);
The webtojson method grabs data from a single page.
- urls: string | array | collection. The URL of the content to be crawled; it can be a single URL or several. (required)
- selector: object. Specifies the content to be crawled. (required)
- option: object. Configuration items. (optional)
  - paging: paging configuration (optional)
    - pageNum: number, total number of pages
    - offset: number, offset of each page
    - keyword: string, the corresponding keyword in the url
  - headers: request header settings (optional)
    - User-Agent: specifies the user agent (optional)
    - Cookie: sets a cookie; for example, some search engines only return results when a cookie is provided (optional)
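As an illustration of the option structure described above, a configuration object might look like the following. Every value here (the page count, the offset, the keyword name `start`, and the header strings) is a hypothetical placeholder rather than a default of the module.

```javascript
// Illustrative option object; every value below is a made-up placeholder.
const option = {
  paging: {
    pageNum: 3,        // total pages
    offset: 10,        // offset of each page
    keyword: "start"   // keyword in the url used for paging (assumed name)
  },
  headers: {
    "User-Agent": "Mozilla/5.0 (example)", // placeholder user agent string
    Cookie: "name=value"                   // placeholder cookie
  }
};
```

Both paging and headers are optional, so either block can be left out.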
urls supports three types: string, array, and collection.
When urls is a string, only the content of that single url is crawled, as follows:
;
When urls is an array, every url in the array is crawled and all the values are merged together, as follows:
;
When urls is a collection, the url field of each object is used and all the values are merged together, as follows:
;
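The three accepted shapes of urls can be pictured with a small sketch. The helper `normalizeUrls` is hypothetical and not part of webtojson; it assumes a "collection" is an array of objects that each carry a url field, and the example.com addresses are placeholders.

```javascript
// Hypothetical helper (not part of webtojson): reduce the three accepted
// shapes of `urls` to a flat array of URL strings.
function normalizeUrls(urls) {
  // A single string becomes a one-element list.
  if (typeof urls === "string") return [urls];
  if (!Array.isArray(urls)) {
    throw new TypeError("urls must be a string, an array, or a collection");
  }
  // Array items are either plain strings or objects with a `url` field.
  return urls.map((item) => (typeof item === "string" ? item : item.url));
}

const single = normalizeUrls("https://example.com/a");
const list = normalizeUrls(["https://example.com/a", "https://example.com/b"]);
const fromCollection = normalizeUrls([
  { id: 0, url: "https://example.com/a" },
  { id: 1, url: "https://example.com/b" }
]);
```

In each case the result is a list of addresses, which matches the description that the values from all urls are merged together.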
The following is an example of a selector:
{
id: ".typecont span a",
title: ".typecont span a",
url: ".typecont span a"
}
selector takes the form key / value: key is the attribute name in the final JSON, and value determines the attribute value.
value has two types, String and Function:
When value is a String, it is a jQuery-like selector, and the content of the DOM node matched by the selector is used as the JSON attribute value.
When value is a Function, the function's return value is used as the JSON attribute value; something like this:
id: ".typecont" title: ".typecont span a" { return ; }
The function has two parameters:
- $: the jQuery $ function
- parentEle: the parent element node; child nodes are selected relative to this parent element
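A selector that mixes both value types might look like the sketch below. The key names and the `.typecont` selectors come from the example above; the body of the `url` function and the tiny `fake$` stand-in (used only so the sketch runs without a DOM) are assumptions.

```javascript
// Illustrative selector: `id` and `title` are string selectors,
// `url` is a function receiving ($, parentEle).
const selector = {
  id: ".typecont span a",
  title: ".typecont span a",
  url: function ($, parentEle) {
    // Select a child node relative to the parent element and read its href.
    return $(parentEle).find("a").attr("href");
  }
};

// Minimal stand-in for jQuery's $ on plain objects; it supports only
// $(node).find(sel).attr(name), which is all this sketch needs.
function fake$(node) {
  return {
    find: (sel) => fake$(node.children[sel]),
    attr: (name) => node.attrs[name]
  };
}

// Hypothetical parent node with one <a> child.
const parentEle = {
  attrs: {},
  children: { a: { attrs: { href: "/node" }, children: {} } }
};
const url = selector.url(fake$, parentEle);
```

With a real page, `$` would be the jQuery-style function supplied by the module, and the function's return value would become the `url` attribute of the resulting JSON.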
extend(selector) or extend(url, selector)
Captures data from other pages and merges it with the previous data.
extend accepts two forms of parameters:
extend(selector): when only a selector is given, the value of the url field in the previous data is used as the address to crawl;
extend(url, selector): same parameters as the webtojson function.
var googleData = await ;
Here extend merges the names in the search list with the content of each name.
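One way to picture this merge is the sketch below. It is not the module's code: the helper `mergeDetails` is hypothetical, the network fetch is replaced by a lookup table keyed by url, and the field names and values are made up for illustration.

```javascript
// Hypothetical sketch (not the module's implementation): merge each
// previously collected row with detail data found via the row's `url` field.
function mergeDetails(previous, detailsByUrl) {
  return previous.map((row) =>
    // Copy the row, then lay the detail fields on top of it.
    Object.assign({}, row, detailsByUrl[row.url] || {})
  );
}

// Rows as a first crawl might have produced them (placeholder values).
const previous = [
  { id: 0, title: "Node.js", url: "https://example.com/node" }
];
// Stand-in for the data crawled from each row's url.
const detailsByUrl = {
  "https://example.com/node": { summary: "JavaScript runtime" }
};
const merged = mergeDetails(previous, detailsByUrl);
```

Each row keeps its original fields and gains the fields captured from its own page.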
add(selector) or add(url, selector)
Adds information to the previous data. The parameters are the same as those of the extend method.
var googleData = await ;
getData()
Returns the data collected by the entire chain.
saveFile(filePath)
Saves the data to a file at filePath.
var googleData = await ;