ux-lexer

version: 0.10.0-alpha1

Extensible Lexer for JavaScript without requiring regular expressions.

This library provides a foundation for writing custom lexers in JavaScript that perform lexical analysis and produce tokens.

  • Create rules.
  • Add the rules to a lexer in order of precedence.
  • Parse the lexical syntax from a string.
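
The three steps above can be illustrated without the library. The following is a minimal, self-contained sketch of the idea and not ux-lexer's implementation; spaceRule, wordRule, and tokenize are hypothetical names, and the rule shape (a match function that returns a length) is an assumption made for brevity.

```javascript
// 1. Create rules. Each rule reports how many characters it consumes
//    starting at `index`, or 0 when it does not match.
var spaceRule = {
    name: "SPACE",
    match: function (text, index) {
        return text.charAt(index) === " " ? 1 : 0;
    }
};

var wordRule = {
    name: "WORD",
    match: function (text, index) {
        var end = index;
        while (end < text.length && text.charAt(end) !== " ") {
            end++;
        }
        return end - index;
    }
};

// 2. Add the rules in order of precedence.
var rules = [spaceRule, wordRule];

// 3. Walk the string and let the first matching rule produce a token.
function tokenize(text) {
    var tokens = [];
    var index = 0;
    while (index < text.length) {
        var matched = false;
        for (var i = 0; i < rules.length; i++) {
            var length = rules[i].match(text, index);
            if (length > 0) {
                tokens.push({
                    name: rules[i].name,
                    value: text.slice(index, index + length)
                });
                index += length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error("No rule matched at position " + index);
        }
    }
    return tokens;
}

console.log(tokenize("var x")); // three tokens: WORD, SPACE, WORD
```

No regular expression is involved: every rule inspects characters directly.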

Key Requirements

  • A lexer that does not require regular expressions.
  • Runs in multiple JavaScript environments: browsers, web workers, Node.js, and Windows RT.
  • Keep the library at a bare minimum.

back to top

Rationale

Some problems need a better solution than using regular expressions to parse tokens from strings, especially HTML. Regular expressions can be inefficient and hard to read, and they give you only limited control.

back to top

Builds

The builds for Node.js, the browser, and AMD let you include files as needed. The builds also include a util.js file that bundles all the methods in a single file. The build process creates three versions of the scripts: one for CommonJS, one for AMD, and one with closures for the browser.

The source files are included with the npm module inside the src folder, so that developers and script consumers can cherry-pick the functionality they want and include the files as needed for custom builds.

To build and test everything use:

$ grunt ci

There are two different versions of ux-lexer. lexer.js requires ux-util as a dependency, while lexer-all.js bundles only the methods it needs from ux-util and so has zero external dependencies.

back to top

Browser Distribution

location: dist/browser

The browser distribution wraps its functionality in closures and exposes it through the global variable "ux.util". To use a method, do the following:

    <script src="/scripts/ux-util/equals.js"></script>
    <script>
        var equals = self.ux.util.equals;
        if(equals(left, right))
        {
            // do something
        }
    </script>

back to top

AMD Distribution

location: dist/amd[/lib]

The AMD distribution has the main file in the root, and the rest of the files are placed in the lib folder. This way the same require statements work with Node.js and with a loader such as RequireJS in a browser.

back to top

CommonJS Distribution

location: lib

The files are located inside of the lib folder.

back to top

API

Reader

Provides methods to read, scan, and peek at parts of an array-like value or object.

back to top



constructor(Array|String enumerable)

Takes an array-like source that can be iterated over and supports zero-based index access.

example
    var reader = new Reader("text to reader");

    var example = function() {
        var argReader = new Reader(arguments);
    };

back to top

current

Gets the value or object at the current position.

back to top

limit

Gets the number of items in the enumerable, so the farthest index that holds an item is limit - 1.

back to top

position

Gets the index in the enumerable that the reader is currently pointing to.



data

Gets the enumerable data that the reader was given.

back to top

emptyValue

Gets or sets the value that the reader treats as the end of the string or file. This defaults to null.
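
How position, limit, current, and emptyValue relate is easiest to see with a stand-in. The sketch below is not ux-lexer's Reader; SketchReader is a hypothetical name, and it assumes the position starts before the first item, which is consistent with the examples in this document where the first read returns the first item.

```javascript
// A stand-in reader, NOT ux-lexer's Reader. It assumes the position starts
// before the first item, so the first nextValue() call returns item 0.
function SketchReader(data) {
    this.data = data;
    this.limit = data.length;   // number of items
    this.position = -1;         // nothing has been read yet
    this.emptyValue = null;     // end-of-data marker
    this.current = this.emptyValue;
}

// Returns the item at `position`, or emptyValue when out of range.
SketchReader.prototype.peek = function (position) {
    return position >= 0 && position < this.limit
        ? this.data[position]
        : this.emptyValue;
};

// Advances by one item and returns it, or emptyValue at the end.
SketchReader.prototype.nextValue = function () {
    if (this.position < this.limit) {
        this.position++;
    }
    this.current = this.peek(this.position);
    return this.current;
};

var reader = new SketchReader("AB");
console.log(reader.nextValue(), reader.position);      // A 0
console.log(reader.nextValue(), reader.position);      // B 1
console.log(reader.nextValue() === reader.emptyValue); // true
```

Setting emptyValue to something other than null only makes sense when null can be a legitimate item in the data.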

back to top

dispose()

Remove any references that the reader is holding on to. By default, the reader will dispose of the reference to the enumerable and methods that are created in the constructor.

example
    var using = require("ux-util/lib/using");

    using(new Reader("some text here!"), function(r){

    }); // dispose() will be called.

    var reader = new Reader("some other text");

    console.log(reader.peek(0));

    reader.dispose();
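
The using helper comes from ux-util, and its source is not shown here. A helper of this kind is commonly a try/finally wrapper; the sketch below is an assumption about that pattern, not ux-util's code. usingSketch and the fake resource are hypothetical.

```javascript
// Runs `callback` with `resource`, then always calls resource.dispose(),
// even when the callback throws.
function usingSketch(resource, callback) {
    try {
        return callback(resource);
    } finally {
        resource.dispose();
    }
}

// A fake disposable resource used only for this illustration.
var resource = {
    disposed: false,
    dispose: function () { this.disposed = true; }
};

var result = usingSketch(resource, function (r) {
    return r.disposed; // still false inside the callback
});

console.log(result);            // false
console.log(resource.disposed); // true
```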

back to top

next()

Returns the next value in the enumerable. If the environment supports StopIteration, next() throws it when the reader is done. Otherwise it throws an Error with the message "StopIteration".

example
    var example = function(){
        for(var value of new Reader(arguments)) {
            console.log(value);
        }
    };

    example("one", "two", "three");

    var reader = new Reader("function(arg1, arg2) {}");
    try {
        while(true) {
            console.log(reader.next());
        }
    } catch(e) {
        // StopIteration only signals the end of the data.
        var stopped = e.message === "StopIteration" ||
            (typeof StopIteration !== "undefined" && e instanceof StopIteration);
        if(!stopped) {
            // log error or rethrow
        }
    }

back to top

nextValue()

Returns the next value, or the emptyValue if the reader has reached the end of the enumerable.

example
    var reader = new Reader("function(arg1, arg2) {}");
    var current  = null;
    while((current = reader.nextValue()) !== reader.emptyValue) {
        console.log(current);
    }

back to top

peek(Number position)

Returns the value or object at the specified position if the position is less than the limit, otherwise it returns the emptyValue.

example
    var reader = new Reader("ABCDEF");
    console.log(reader.peek(3)); // D
    console.log(reader.peek(9)); // null

back to top

peekAtNext()

Returns the value, object, or emptyValue for the next position in the reader.

example

    var reader = new Reader("ABCDED");
    reader.next();
    var c = reader.peekAtNext();
    console.log(c); // B

back to top

reset()

Returns the reader to the start position so that the enumerable can be read again.

example
    var reader = new Reader("ABCDEF");
    var current = null;
    while((current = reader.nextValue())) {
        console.log(current);
    }

    reader.reset();
    while((current = reader.nextValue())) {
        console.log("v2: " + current);
    }

back to top

scan(Function|Object predicate, [Number position], [Number limit])

Searches a section of the enumerable for a value or object that matches the predicate and returns the position of the match.

example
    var reader = new Reader("ABCDEFEDCBA");

    var position = reader.scan("D");
    console.log(position); // 3

    position = reader.scan(function scan(c) {
        var count = scan.count || 0;
        if(c === 'D')
        {
            if(count === 1)
                return true;
            scan.count = 1;
        }
        return false;
    });

    position = reader.scan("D", 5);
    console.log(position); // 7
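
The two predicate forms (a plain value or a function) can be sketched with a stand-in. This is not the library's scan: scanSketch is a hypothetical name, the optional limit argument is left out, and returning -1 when nothing matches is an assumption, since the not-found result is not documented here.

```javascript
// Returns the index of the first item matching `predicate`, starting at
// `position` (default 0). The -1 "not found" result is an assumption.
function scanSketch(data, predicate, position) {
    var test = typeof predicate === "function"
        ? predicate
        : function (item) { return item === predicate; };

    for (var i = position || 0; i < data.length; i++) {
        if (test(data[i])) {
            return i;
        }
    }
    return -1;
}

console.log(scanSketch("ABCDEFEDCBA", "D"));    // 3
console.log(scanSketch("ABCDEFEDCBA", "D", 5)); // 7
console.log(scanSketch("ABCDEFEDCBA", function (c) { return c > "E"; })); // 5
```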

back to top

slice(Number offset, Number limit)

Returns an array of values or objects that starts at the offset position and contains up to limit items.

example
    var reader = new Reader("ABCDEFEDCBA");

    var slice = reader.slice(1,2);
    console.log(slice); // ["B","C"];

back to top

to(Number position)

Moves the reader to the specified position.

example
    var reader = new Reader("ABCDEFEDCBA");
    reader.to(3);
    var next = reader.nextValue();
    console.log(next); // "E"

back to top



LexerRule

Rules determine how characters are consumed and transformed into a token.

LexerRule Example

    var IdentifierRule = LexerRule.extend({
        tokenName: "IDENTIFIER",
        value: null,
        position: null,
        match: function(character, reader) {
            // An identifier starts with a letter, "_", or "$".
            if(!Lexer.isLetter(character) && character !== "_" && character !== "$")
                return false;

            this.value = this.position = null;
            var start = reader.position,
                i = start + 1,
                c = null;

            // Look ahead until a space or the end of the input.
            while((c = reader.peek(i)) !== reader.emptyValue && c !== ' ')
            {
                if(!Lexer.isLetterOrDigit(c) && c !== '_' && c !== '$')
                    return false;
                i++;
            }

            // slice takes an offset and a count of items.
            this.value = reader.slice(start, i - start).join('');
            // Index of the last character that belongs to this token.
            this.position = i - 1;

            return true;
        },
        createToken: function(reader) {
            var token = {name: this.tokenName, value: this.value, ruleIndex: this.ruleIndex };
            reader.to(this.position);

            this.value = this.position = null;

            return token;
        }
    });
    var SpaceRule = LexerRule.extend({
        match:function(character, reader) {
            return character === ' ';
        }
    });


    var AnyRule = LexerRule.extend({
        match: function(character, reader) {
            return character !== ' ';
        },
        next: function(reader) {
            var c = reader.peekAtNext();
            if(c !== ' ' && c !== null)
                return true;
            return false;
        }
    });

    var enumerable = "$test word hyphen-word";

    var reader = new Reader(enumerable),
        c = null, 
        rules = [new IdentifierRule(), new SpaceRule(), new AnyRule()],
        tokens = [],
        i = 0,
        l = rules.length;

    while((c = reader.nextValue()) !== reader.emptyValue)
    {
        // Try the rules in order; the first match creates the token.
        for(i = 0; i < l; i++)
        {
            var rule = rules[i];
            if(rule.match(c, reader))
            {
                tokens.push(rule.createToken(reader));
                break;
            }
        }
    }

    console.log(tokens);

back to top



symbol

A static property on the constructor that has the same value as the tokenName property. This gives you a single, statically available constant for the token name.

example

    var NewRule = LexerRule.extend({
        tokenName: "NEW",
        // other stuff
    });

    // elsewhere

    // if token.name === "NEW"
    if(tokens[2].name === NewRule.symbol) {
        // do something.  
    }

back to top

constructor()

Creates an instance of LexerRule.

tokenName

Gets or sets the name for the token when the rule generates the token object.

back to top | example

createToken(Reader reader)

Returns the token generated by this rule when a match is found.

back to top | example

match(String character, Reader reader)

Returns true when the rule matches on the character(s), otherwise it returns false.

back to top | example

next(Reader reader)

Returns true when the rule matches on the next character(s) in the sequence, otherwise it returns false. This method is used by createToken to generate the value for the token and move the reader forward as needed.
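
One way match, next, and createToken can cooperate is sketched below. It is an assumption about the pattern described above, not the library's code: makeReader and wordRule are hypothetical stand-ins, and createToken here takes the matched character as an extra argument to keep the sketch short, which differs from the createToken(Reader reader) signature documented above.

```javascript
// Stand-in reader over a string (hypothetical, for illustration only).
function makeReader(text) {
    return {
        position: -1,
        peekAtNext: function () {
            var p = this.position + 1;
            return p < text.length ? text.charAt(p) : null;
        },
        nextValue: function () {
            var c = this.peekAtNext();
            if (c !== null) { this.position++; }
            return c;
        }
    };
}

// A rule for runs of non-space characters.
var wordRule = {
    tokenName: "WORD",
    match: function (character) {
        return character !== " ";
    },
    // True while the upcoming character still belongs to the token.
    next: function (reader) {
        var c = reader.peekAtNext();
        return c !== " " && c !== null;
    },
    // Assumed behavior: start from the matched character and keep
    // consuming while next() says the token continues.
    createToken: function (character, reader) {
        var value = character;
        while (this.next(reader)) {
            value += reader.nextValue();
        }
        return { name: this.tokenName, value: value };
    }
};

var reader = makeReader("abc def");
var first = reader.nextValue();
var token = wordRule.match(first) ? wordRule.createToken(first, reader) : null;
console.log(token.name, token.value); // WORD abc
console.log(reader.position);         // 2
```

The reader stops on the last character of the token, so the next read starts at the space that follows it.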

back to top | example

extends(Object prototype)

Creates a subclass of LexerRule. This is the preferred way to subclass LexerRule and create rules for the lexer.

back to top | example

Lexer

The base class to inherit from in order to create a customized lexer.

Lexer example

    var SimpleLexer = Lexer.extend({
        emptyValue: null,
        addRules: function() {
            this.addRule(new IdentifierRule());
            this.addRule(new SpaceRule());
            this.addRule(new AnyRule());
        }
    });

    var lexer = new SimpleLexer("var x = new Test();"),
        tokens = [],
        token = null;

    while((token = lexer.nextValue()) !== lexer.emptyValue)
    {
        if(token.name !== SpaceRule.symbol)
            tokens.push(token);
    }

    console.log(tokens);

back to top

constructor(Object enumerable)

The Lexer constructor. It takes an array-like object as its main parameter. This could be a string, an array, or an arguments object.

back to top | example

emptyValue

Gets or sets the emptyValue, the end-of-sequence marker. It tells the lexer that it has reached the end of the known tokens.

back to top | example

initialized

Gets a value that indicates whether or not the Lexer has been initialized.

back to top

reader

Gets a reference to the reader for the Lexer.

back to top

rules

Gets the array of rules for the lexer. The order of the rules is important because the rules are processed in order. The rule that matches first determines how the token is created.
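
Why the order matters can be shown with two rules that both accept the same input. The sketch is independent of the library; keywordRule, identifierRule, and classify are hypothetical names.

```javascript
// Two hypothetical rules that both match the text "new".
var keywordRule = {
    name: "KEYWORD",
    match: function (word) { return word === "new" || word === "var"; }
};
var identifierRule = {
    name: "IDENTIFIER",
    match: function (word) { return word.length > 0; }
};

// Returns the name of the first rule that matches, or null.
function classify(rules, word) {
    for (var i = 0; i < rules.length; i++) {
        if (rules[i].match(word)) {
            return rules[i].name;
        }
    }
    return null;
}

console.log(classify([keywordRule, identifierRule], "new")); // KEYWORD
console.log(classify([identifierRule, keywordRule], "new")); // IDENTIFIER
```

Putting the more specific rule first is what keeps the general rule from swallowing its input.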

back to top

slots

Gets the array of positions where tokens were found within the enumerable value that was passed to the lexer for analysis.

back to top

addRule(LexerRule rule)

Adds a lexer rule to the lexer in order to find matches and create tokens.

back to top | example

addRules()

An abstract method that subclasses are meant to override in order to add rules to the lexer.

back to top | example

dispose()

Disposes of resources that the lexer is holding onto in order to free up memory.

back to top | example

init()

Initializes the lexer. This is called by the constructor.

back to top

iterator()

Returns an iterator for the lexer. This method targets iterators in ECMAScript 6; however, it can also be used in earlier versions of JavaScript.

    var lexer  = new SimpleLexer("var x ='one';"),
        tokens = [];

    for(var token of lexer) {

        if(token.name !== SpaceRule.symbol) 
            tokens.push(token);
    }
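
In the standardized form of ECMAScript 6, for...of looks up an iterator through Symbol.iterator and ends on { done: true } rather than on StopIteration. The sketch below shows how such an iterator can be layered over a nextValue()/emptyValue pair. SketchSource is a hypothetical stand-in, not the library's Lexer, and whether this library version exposes Symbol.iterator is not stated here.

```javascript
// A stand-in for a lexer that only exposes nextValue() and emptyValue.
function SketchSource(items) {
    this.items = items;
    this.index = -1;
    this.emptyValue = null;
}

SketchSource.prototype.nextValue = function () {
    this.index++;
    return this.index < this.items.length
        ? this.items[this.index]
        : this.emptyValue;
};

// The ES6 iteration protocol: return an object whose next() yields
// { value, done } pairs until the source reports emptyValue.
SketchSource.prototype[Symbol.iterator] = function () {
    var source = this;
    return {
        next: function () {
            var value = source.nextValue();
            return value === source.emptyValue
                ? { value: undefined, done: true }
                : { value: value, done: false };
        }
    };
};

var seen = [];
for (var token of new SketchSource(["var", "x", "="])) {
    seen.push(token);
}
console.log(seen.join(" ")); // var x =
```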

back to top | example

next()

Returns the next value in the iteration or throws a StopIteration exception. If the environment does not support StopIteration, then an Error with the message "StopIteration" is thrown.

back to top | example

nextValue()

Returns the next value in the iteration or returns the emptyValue when the sequence / loop has ended.

back to top | example

extends(Object prototype)

A static method that creates a subclass of Lexer.

back to top | example

isDigit(String character)

A static method that returns true if the character is a digit (0-9), otherwise it returns false.

back to top

isLetter(String character)

A static method that returns true if the character is a letter (a-zA-Z), otherwise it returns false.

back to top

isLetterOrDigit(String character)

A static method that returns true if the character is a letter or digit (0-9a-zA-Z), otherwise it returns false.
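
Checks like these need no regular expressions. A minimal sketch based on character codes follows; the functions are stand-ins with the same names as the static methods above, not the library's implementation, and they cover ASCII only.

```javascript
// Character tests based on character codes instead of regular expressions.
// These are stand-ins, not the library's Lexer.isDigit and friends.
function isDigit(character) {
    var code = character.charCodeAt(0);
    return code >= 48 && code <= 57;          // "0".."9"
}

function isLetter(character) {
    var code = character.charCodeAt(0);
    return (code >= 65 && code <= 90) ||      // "A".."Z"
           (code >= 97 && code <= 122);       // "a".."z"
}

function isLetterOrDigit(character) {
    return isLetter(character) || isDigit(character);
}

console.log(isDigit("7"));         // true
console.log(isLetter("_"));        // false
console.log(isLetterOrDigit("z")); // true
```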

back to top

License

For extends, isPlainObject, isWindow: Copyright 2014 jQuery Foundation and other contributors http://jquery.com/

The MIT License (MIT)

Copyright (c) 2013-2014 Michael Herndon http://dev.michaelherndon.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
