microparsec

0.0.1 • Public • Published

Bare-bones text processing using keywords and regex

Extract words, special characters, or regex matches from strings of text using this simple text parser. Inspired by Todoist's Quick Add feature, which uses NLP (Natural Language Processing) to extract dates, projects, and other keywords.

Give microparsec lists of keywords, regex, or special characters you want found and extracted, and some text to parse. It will return lists of keywords it found or regex it matched, along with the rest of the string input.

Installation


yarn:

yarn add microparsec

npm:

npm i --save microparsec

Usage


microparsec takes each keyword it's given and searches for its last occurrence in the input, using this case-insensitive regular expression: /\bkeyword\b(?!.*\bkeyword\b)/i. The negative lookahead rejects any occurrence that is followed by another one later in the string, so only the final occurrence matches. Keywords can have spaces in them. If a regular expression is passed in place of a keyword, the input is matched directly against that regex.
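
To see how the last-occurrence pattern behaves, here is a minimal sketch that builds it for a plain-string keyword. Both helper names (escapeRegExp, lastOccurrencePattern) are hypothetical and not part of microparsec's API, and escaping regex metacharacters in the keyword is an assumption about what a robust implementation would do, not something this README states.

```javascript
// Hypothetical helper: escape regex metacharacters so the keyword is matched literally.
// (Assumption: the README does not say whether microparsec escapes keywords.)
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

// Hypothetical helper: build /\bkeyword\b(?!.*\bkeyword\b)/i for a string keyword.
function lastOccurrencePattern(keyword) {
  const k = escapeRegExp(keyword)
  return new RegExp(`\\b${k}\\b(?!.*\\b${k}\\b)`, "i")
}

// The first "The" is skipped because another "the" follows it.
const match = "The cat saw the dog".match(lastOccurrencePattern("the"))
console.log(match[0], match.index) // the 12
```

Because the pattern is case-insensitive, "The" and "the" count as the same keyword, and the returned text keeps the casing found in the input.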

Initialize microparsec with the keywords you want parsed out, which can take the form of:

  • a string
new Microparsec("keyword")
  • a regex
new Microparsec(/regex/)
  • an array of strings or regexes
new Microparsec(["key", "word", /regex/])
  • a parser object, containing a name and list of keywords
new Microparsec({ name: "What", keywords: ["the", /heck/] })
  • an array of parser objects
new Microparsec([
  { name: "What", keywords: ["the", /heck/] },
  { name: "Goodbye", keywords: ["hello", "hi"] },
])
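
One way to picture how these five input forms relate is to reduce them all to an array of parser objects. The sketch below is an assumption about how that could be done, not microparsec's actual internals; normalizeParsers is a hypothetical name, and the "Default" parser name is taken from the sample output in this README.

```javascript
// Hypothetical sketch: microparsec's real internals may differ.
function normalizeParsers(input) {
  const isKeyword = (v) => typeof v === "string" || v instanceof RegExp

  // A single string or regex becomes one "Default" parser.
  if (isKeyword(input)) return [{ name: "Default", keywords: [input] }]

  // A single parser object is wrapped in an array.
  if (!Array.isArray(input)) return [input]

  // An array of bare keywords is grouped under one "Default" parser.
  if (input.every(isKeyword)) return [{ name: "Default", keywords: input }]

  // Otherwise assume it is already an array of parser objects.
  return input
}

console.log(normalizeParsers(["key", "word"]))
// [ { name: 'Default', keywords: [ 'key', 'word' ] } ]
```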

Once initialized, you can parse any input string and expect a result like this:

const parser = new Microparsec("Hello")
const result = parser.parse("Hello World!")
console.log(result)
/*
{
  matches: [
    {
      name: "Default",
      matches: ["Hello"]
    }
  ],
  leftovers: "World!"
}
*/

Any parser that had a match is returned with all its matches, and the rest of the unmatched string is returned as leftovers, trimmed of excess whitespace. Parsers without a match do not appear in the result. Matches against regex keywords return the text that matched, not the regex itself.

const parser = new Microparsec([
  { name: "One", keywords: [/ee+/] },
  { name: "Two", keywords: ["What", "the"] },
  { name: "Three", keywords: ["no", "matches"] }, // no match, so absent from the result
])
const result = parser.parse("What the wheeeee is that?")
console.log(result)
/*
{
  matches: [
    {
      name: "One",
      matches: ["eeeee"]
    },
    {
      name: "Two",
      matches: ["What", "the"]
    }
  ],
  leftovers: "is that?"
}
*/
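
The result shape described above (matched parsers plus trimmed leftovers) can be approximated with a short loop. This is a rough sketch under stated assumptions, not microparsec's implementation: parseSketch is a hypothetical name, each match is assumed to be cut out of the working string before the next keyword is tried, string keywords are assumed to contain no regex metacharacters, and regex keywords are assumed not to use the g flag.

```javascript
// Hypothetical sketch of the parse step; the real library may behave differently.
function parseSketch(parsers, input) {
  let leftovers = input
  const matches = []

  for (const { name, keywords } of parsers) {
    const found = []
    for (const keyword of keywords) {
      // Regex keywords are used as given; strings use the last-occurrence pattern.
      const pattern = keyword instanceof RegExp
        ? keyword
        : new RegExp(`\\b${keyword}\\b(?!.*\\b${keyword}\\b)`, "i")
      const hit = leftovers.match(pattern)
      if (!hit) continue
      found.push(hit[0]) // the matched text, not the pattern
      // Assumption: remove the matched text from the remaining input.
      leftovers = leftovers.slice(0, hit.index) + leftovers.slice(hit.index + hit[0].length)
    }
    // Parsers without a match are left out of the result.
    if (found.length > 0) matches.push({ name, matches: found })
  }

  // Collapse runs of whitespace and trim the ends.
  return { matches, leftovers: leftovers.replace(/\s+/g, " ").trim() }
}

const result = parseSketch([{ name: "Default", keywords: ["Hello"] }], "Hello World!")
console.log(JSON.stringify(result))
// {"matches":[{"name":"Default","matches":["Hello"]}],"leftovers":"World!"}
```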

Development


yarn install
yarn test

Version

0.0.1

License

ISC

Collaborators

  • xandroxygen