cjk-tokenizer

0.1.0 • Public • Published

Extract terms from CJK text. The original idea is borrowed from timdream/wordfreq.

Why?

The JavaScript ecosystem lacks a CJK text tokenizer that works as expected, so I decided to build one with these features:

  • Chinese, Japanese and Korean support
  • Extracted terms carry a score, their position in the original text, etc.
  • A broader collection of common stop words
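
To make the feature list concrete, here is a minimal sketch of what extracting scored, positioned terms from CJK text can look like. It is not this package's API: `extractTerms`, its `n` and `stopWords` options, and the shape of the returned objects are all hypothetical names chosen for illustration. It assumes a simple overlapping n-gram strategy (bigrams by default), with the score being a plain occurrence count and positions given as UTF-16 offsets into the original string.

```javascript
// Hypothetical sketch, NOT the cjk-tokenizer API.
// Matches runs of Han, Hiragana, Katakana and Hangul characters.
const CJK_RUN =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]+/gu;

// Returns [{ term, score, positions }] sorted by score, highest first.
// `n` (n-gram size) and `stopWords` (a Set of terms to skip) are assumed options.
function extractTerms(text, { n = 2, stopWords = new Set() } = {}) {
  const terms = new Map();
  for (const match of text.matchAll(CJK_RUN)) {
    // Split by code point so characters outside the BMP stay intact.
    const chars = Array.from(match[0]);
    let offset = match.index; // UTF-16 offset of chars[i] in `text`
    for (let i = 0; i + n <= chars.length; i++) {
      const term = chars.slice(i, i + n).join('');
      if (!stopWords.has(term)) {
        const entry = terms.get(term) ?? { term, score: 0, positions: [] };
        entry.score += 1; // score is simply the occurrence count here
        entry.positions.push(offset);
        terms.set(term, entry);
      }
      offset += chars[i].length;
    }
  }
  return [...terms.values()].sort((a, b) => b.score - a.score);
}

const result = extractTerms('我喜欢苹果，他也喜欢苹果');
console.log(result[0]); // the most frequent bigram with its score and positions
```

Overlapping n-grams need no dictionary, which keeps the sketch short, but they also produce fragments that are not real words; a stop-word set is one simple way to filter some of that noise.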

Install

Use in a project:

npm i cjk-tokenizer --save

CLI:

npm i cjk-tokenizer -g

Demo

Contribute

License

MIT

Collaborators

  • leungwensen