html-urls

2.4.58 • Public • Published

Get all URLs from HTML markup. It's based on the W3C Link Checker.

Install

$ npm install html-urls --save

Usage

const got = require('got')
const htmlUrls = require('html-urls')

;(async () => {
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as first argument.')
  const { body: html } = await got(url)
  const links = htmlUrls({ html, url })

  links.forEach(({ url }) => console.log(url))

  // => [
  //   'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
  //   'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
  //   'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
  //   'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
  //   'https://microlink.io/commons-8b286eac293678e1c98c.js',
  //   'https://microlink.io',
  //   ...
  // ]
})()

It returns the following structure for every value detected in the HTML markup:

value

Type: <string>

The original value.

url

Type: <string|undefined>

The normalized URL, if the value can be considered a URL.

uri

Type: <string|undefined>

The normalized value as URI.
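
Since `url` may be undefined, callers will usually want to skip entries that did not resolve to a URL. The sketch below shows one way to do that. The `links` array is hypothetical sample data shaped like the structure described above, not real output from the library, and which values end up with an undefined `url` is an assumption.

```javascript
// Hypothetical sample shaped like the documented structure.
// These values are made up for illustration; they are not library output.
const links = [
  {
    value: '/about',
    url: 'https://example.com/about',
    uri: 'https://example.com/about'
  },
  {
    value: 'mailto:hello@example.com',
    url: undefined, // assumption: this value is not considered a URL
    uri: 'mailto:hello@example.com'
  }
]

// Keep only the entries whose value could be normalized to a URL.
const urls = links
  .filter(link => link.url !== undefined)
  .map(link => link.url)

console.log(urls)
```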


See examples for more!

API

htmlUrls([options])

options

html

Type: string
Default: ''

The HTML markup.

url

Type: string
Default: ''

The URL associated with the HTML markup.

It is used to resolve relative links that may be present in the HTML markup.
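
To illustrate what resolving a relative link against a base URL means, here is a minimal sketch using the standard WHATWG `URL` class built into Node.js. It shows the general resolution rules only; whether html-urls uses this class internally is not stated here, and the `base` and `hrefs` values are made up for the example.

```javascript
// Hypothetical page URL and href values, for illustration only.
const base = 'https://example.com/blog/post'
const hrefs = [
  '/about', // root-relative
  'img/logo.png', // relative to the current directory
  '../feed.xml', // relative to the parent directory
  'https://other.example/x' // already absolute, left unchanged
]

// new URL(input, base) resolves input against base.
const resolved = hrefs.map(href => new URL(href, base).href)

console.log(resolved)
// => [
//   'https://example.com/about',
//   'https://example.com/blog/img/logo.png',
//   'https://example.com/feed.xml',
//   'https://other.example/x'
// ]
```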

whitelist

Type: array
Default: []

A list of links to be excluded from the final output. It supports regex patterns.

See matcher to know more.

removeDuplicates

Type: boolean
Default: true

Remove duplicate links detected across all the HTML tags.
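
Conceptually, removing duplicates means keeping the first entry seen for each normalized URL. The sketch below shows that idea on hypothetical data; it is not the library's implementation, and treating `url` as the comparison key is an assumption.

```javascript
// Hypothetical input: the same URL referenced from two different tags.
const found = [
  { value: '/app.js', url: 'https://example.com/app.js' },
  { value: 'https://example.com/app.js', url: 'https://example.com/app.js' },
  { value: '/style.css', url: 'https://example.com/style.css' }
]

// Track the URLs already emitted and drop later repeats.
const seen = new Set()
const unique = found.filter(({ url }) => {
  if (seen.has(url)) return false
  seen.add(url)
  return true
})

console.log(unique.map(({ url }) => url))
// => [ 'https://example.com/app.js', 'https://example.com/style.css' ]
```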

Related

  • xml-urls – Get all URLs from Feed/Atom/RSS/Sitemap XML markup.
  • css-urls – Get all URLs referenced from stylesheet files.

License

html-urls © Kiko Beats, released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

kikobeats.com · GitHub @Kiko Beats · Twitter @Kikobeats
