gatsby-remark-extract-keywords
Extract the most important keywords from your content using Natural's tf-idf. From their docs:
Term Frequency–Inverse Document Frequency (tf-idf) is implemented to determine how important a word (or words) is to a document relative to a corpus. The following formulas are used for calculating tf and idf:
- tf(t, d) is a so-called raw count, so just the count of the term in the document
- idf(t, D) uses the following formula: 1 + ln(N / (1 + n_t)), where N is the number of documents and n_t is the number of documents in which the term appears. The 1 + in the denominator is for handling the possibility that n_t is 0.
In our context, N is just 1: the only document is your page/post content.
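As a rough illustration of the two formulas quoted above, here is a minimal sketch in plain JavaScript. It is not Natural's actual code: the `tokenize`, `tf`, `idf`, and `scoreTerms` helpers are hypothetical names, and the tokenizer is deliberately naive. With N = 1, every term that appears has n_t = 1, so the idf factor is the same for all terms and the ranking follows the raw count.

```javascript
// Sketch of the quoted tf and idf formulas (assumption: not Natural's real source).

// Hypothetical, naive tokenizer used only for this illustration.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// tf(t, d): raw count of the term among the document's tokens.
function tf(term, tokens) {
  return tokens.filter((token) => token === term).length;
}

// idf(t, D) = 1 + ln(N / (1 + n_t))
function idf(totalDocs, docsWithTerm) {
  return 1 + Math.log(totalDocs / (1 + docsWithTerm));
}

// Score every distinct term of a single document (N = 1, n_t = 1).
function scoreTerms(text) {
  const tokens = tokenize(text);
  const weight = idf(1, 1); // constant for every term when N = 1
  const scores = {};
  for (const term of new Set(tokens)) {
    scores[term] = tf(term, tokens) * weight;
  }
  return scores;
}

const scores = scoreTerms("Gatsby plugins extend Gatsby; Gatsby builds fast sites");
// "gatsby" occurs three times, so it scores three times higher than "plugins".
```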
Supports both MD and MDX format.
Installation
npm install --save gatsby-remark-extract-keywords
or
yarn add gatsby-remark-extract-keywords
It has gatsby as a peerDependency.
Usage
In your gatsby-config.js:
    module.exports = {
      plugins: [
        {
          resolve: `gatsby-transformer-remark`,
          options: {
            plugins: [`gatsby-remark-extract-keywords`],
          },
        },
      ],
    };
This creates a new field called keywords on each MD/MDX node, which you can use in your GraphQL query:
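    query ListingQuery {
      # Root field assumed for gatsby-transformer-remark; with gatsby-plugin-mdx it is allMdx.
      allMarkdownRemark {
        edges {
          node {
            id
            frontmatter {
              title
            }
            fields {
              keywords
            }
          }
        }
      }
    }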
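Once the query has run, the result can be reshaped however your page needs. The sketch below is one possible approach, not part of the plugin: `keywordsByTitle` is a hypothetical helper, the sample data is invented, and it assumes `fields.keywords` is an array of strings on nodes that have it.

```javascript
// Hypothetical helper: maps each node's title to its extracted keywords.
// `edges` is assumed to have the shape requested by the query above.
function keywordsByTitle(edges) {
  const result = {};
  for (const { node } of edges) {
    // Fall back to an empty list when a node has no keywords field.
    const keywords = (node.fields && node.fields.keywords) || [];
    result[node.frontmatter.title] = keywords;
  }
  return result;
}

// Sample data for illustration only; real values come from Gatsby's data layer.
const sampleEdges = [
  {
    node: {
      id: "1",
      frontmatter: { title: "Hello" },
      fields: { keywords: ["gatsby", "remark"] },
    },
  },
  { node: { id: "2", frontmatter: { title: "Empty" }, fields: {} } },
];

const byTitle = keywordsByTitle(sampleEdges);
```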
blacklist option as function
This returns only keywords longer than 5 characters.
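    // Keep only terms longer than 5 characters.
    const filterKeywords = (term) => term.length > 5;

    module.exports = {
      plugins: [
        {
          resolve: `gatsby-transformer-remark`,
          options: {
            plugins: [
              {
                resolve: `gatsby-remark-extract-keywords`,
                options: {
                  blacklist: filterKeywords,
                },
              },
            ],
          },
        },
      ],
    };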
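To see how such a predicate behaves, the sketch below applies it with `Array.prototype.filter` to a made-up list of candidate terms. It assumes the plugin keeps the terms for which the function returns true, which matches the description above; the `candidates` array is invented for the example and does not come from the plugin.

```javascript
// Same predicate as in the configuration: true for terms longer than 5 characters.
const filterKeywords = (term) => term.length > 5;

// Invented candidate terms, for illustration only.
const candidates = ["gatsby", "plugin", "remark", "mdx", "node", "keywords"];

// Terms for which the predicate returns true are kept.
const kept = candidates.filter(filterKeywords);

console.log(kept); // [ 'gatsby', 'plugin', 'remark', 'keywords' ]
```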
Options
Option | Description |
---|---|
max | Maximum number of keywords to return |
blacklist | String, array of strings, or function to blacklist terms. If it is a function, it is used as the filter callback. |
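For instance, both options could be combined as in the sketch below. The values are illustrative assumptions, not defaults: a cap of 10 keywords and an array of two terms to exclude.

```javascript
// Sketch of a gatsby-config.js using both options; the values are made up.
const config = {
  plugins: [
    {
      resolve: `gatsby-transformer-remark`,
      options: {
        plugins: [
          {
            resolve: `gatsby-remark-extract-keywords`,
            options: {
              max: 10, // return at most 10 keywords per node
              blacklist: [`gatsby`, `plugin`], // terms to exclude
            },
          },
        ],
      },
    },
  ],
};

module.exports = config;
```

A single string works the same way when only one term needs to be excluded.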
Contributors ✨
Thanks goes to these wonderful people (emoji key):
Eduardo Reveles 💻 📖 🤔
This project follows the all-contributors specification. Contributions of any kind welcome!