KodeTokenizer
Generic source code tokenizer. WIP.
Installation
Via npm on Node:
npm install kodetokenizer
Usage
Reference in your program:
var kt = require('kodetokenizer');
Given a text, get its content as tokens:
var tokens = kt(text);
The result is an array of tokens; each one is a plain JavaScript object with:
- value: text
- type: a number, from kt.Types
The types are:
- kt.Types.Word: a sequence of letters
- kt.Types.Digits: a sequence of digits
- kt.Types.WhiteSpace: a sequence of whitespace
- kt.Types.NewLine: a new line: \n, \r\n or \r
- kt.Types.Symbol: a sequence of symbols (not a letter, digit, whitespace, new line or separator)
- kt.Types.Unknown
- kt.Types.Separator: a character separator
The separators are language dependent, so you must indicate them in an options object parameter, e.g.:
var tokens = kt
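To make the token types and the role of separators concrete, here is a minimal, self-contained sketch of a character-class tokenizer. It is not KodeTokenizer's implementation: `sketchTokenize`, `classify`, the `separators` option key, and the numeric type values are all assumptions made for illustration. It also assumes each separator character becomes its own token, and it never produces the Unknown type.

```javascript
// Illustrative sketch only: NOT KodeTokenizer's actual implementation.
// Type names follow the README; the numeric values are assumptions.
var Types = {
    Word: 1, Digits: 2, WhiteSpace: 3, NewLine: 4,
    Symbol: 5, Separator: 6, Unknown: 7
};

// Classify one character. `separators` is a string of separator characters.
function classify(ch, separators) {
    if (ch === '\n' || ch === '\r') return Types.NewLine;
    if (separators.indexOf(ch) >= 0) return Types.Separator;
    if (/[A-Za-z]/.test(ch)) return Types.Word;
    if (/[0-9]/.test(ch)) return Types.Digits;
    if (/\s/.test(ch)) return Types.WhiteSpace;
    return Types.Symbol;
}

// Hypothetical stand-in for the library call; the `separators`
// option key is an assumption.
function sketchTokenize(text, options) {
    var separators = (options && options.separators) || '';
    var tokens = [];
    var position = 0;

    while (position < text.length) {
        var start = position;
        var type = classify(text[position], separators);
        position++;

        if (type === Types.NewLine) {
            // Treat \r\n as a single new line token.
            if (text[start] === '\r' && text[position] === '\n') position++;
        } else if (type !== Types.Separator) {
            // Separators stay single characters; other types extend over a run.
            while (position < text.length &&
                   classify(text[position], separators) === type) {
                position++;
            }
        }

        tokens.push({ value: text.slice(start, position), type: type });
    }

    return tokens;
}

var tokens = sketchTokenize('foo(42);\r\n', { separators: '();' });
```

In this sketch, characters listed as separators are checked before the letter, digit, and symbol classes, which is why `(`, `)` and `;` do not merge into a single Symbol token.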
You can add processors: functions that, given an initial character, return a token:
{ //...} var tokens = kt;
The parameter ch is the detected character. position points to a character in text, the next unprocessed one.
The processor can return:
- null: no token detected, so the tokenizer takes control again.
- { position: anumber, token: atoken }: where position is the new unprocessed char position in text, and token is the token to be used.
See test/string.js as an example of a processor. Note that you can use:
var Types = kt.Types;
Types.String = ++Types.MaxValue;
to add your own token types.
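The processor contract and the custom type registration can be sketched together. This is a hypothetical example, not the code in test/string.js: the `stringProcessor` name, the `(ch, position, text)` parameter order, and the numeric values in the stand-in `Types` object are assumptions. The stand-in object replaces kt.Types only so the snippet runs without the package installed.

```javascript
// Stand-in for kt.Types so the sketch runs on its own;
// the numeric values are assumptions.
var Types = {
    Word: 1, Digits: 2, WhiteSpace: 3, NewLine: 4,
    Symbol: 5, Separator: 6, Unknown: 7, MaxValue: 7
};

// Register a custom token type, following the pattern shown above.
Types.String = ++Types.MaxValue;

// Hypothetical processor for double-quoted strings.
// Assumed signature: (ch, position, text), where `position` is the
// index of the first unprocessed character, right after `ch`.
function stringProcessor(ch, position, text) {
    // Not a string start: return null so the tokenizer takes control again.
    if (ch !== '"') return null;

    var end = text.indexOf('"', position);

    // Unterminated string: this sketch simply declines to produce a token.
    if (end < 0) return null;

    return {
        position: end + 1, // first character after the closing quote
        token: { value: text.slice(position, end), type: Types.String }
    };
}

// Call the processor directly to see both possible results.
var text = 'say "hi" now';
var hit = stringProcessor(text[4], 5, text);
var miss = stringProcessor(text[0], 1, text);
```

Here `hit` carries the new position and a token of the custom type, while `miss` is null because the character is not a quote. Escape sequences inside strings are deliberately left out to keep the sketch short.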
Development
git clone git://github.com/ajlopez/KodeTokenizer.git
cd KodeTokenizer
npm install
npm test
Samples
TBD
Versions
- 0.0.1: Published
References
TBD
Contribution
Feel free to file issues and submit pull requests; contributions are welcome.
If you submit a pull request, please be sure to add or update corresponding
test cases, and ensure that npm test continues to pass.