duckdown

Simple, lightweight Markdown-like language with extensible grammar.

Duckdown

Ultra-simple Markdown-inspired markup language, implemented initially in JS (targeting both the browser and node.)

Duckdown differs in one important way, though: it doesn't work through naive regex hacks. It's a proper recursive descent parser/state machine with a customisable grammar!

Try Duckdown live, in your browser!

You can use it as is, extend it, or build your very own text markup language with it.

Writing with Duckdown

Duckdown is intended to be very simple and flexible, but also very strict, and consequently unambiguous for authors. Some aspects of Markdown were omitted or changed as we felt they were too complex for novice editors.

WARNING: You should consider the API and text-specification unstable until further notice. Hopefully everything will be formalised soon.

Like Markdown, Duckdown is primarily a line-based language. Inline text styling and linking are similar. Remember that this document describes the default Duckdown grammar, and the parser is not necessarily bound by these same limitations or patterns.

Bold, Italic, Underline, and Strikethrough

Semantic Level: text

Bold, italic, underline, and strikethrough are specified by prepending a string of text with a token, and closing that string with the same token.

*This text is bold.*

~This text is emphasised.~

-This text is struck through-

_This text is underlined._

*This text is bold ~and this is bold & em!~*

"When I asked her ~why~ she'd done it, she replied '*Just because.*'"

Duckdown is quite strict in what it considers valid. You may not wrap a text style over multiple lines. Opening tokens which aren't given breathing room (they directly abut a word or non-significant token) will be ignored. Closing tokens which do not directly abut the string of text they close will be ignored. Text-level tags which are not closed are considered invalid. Mismatched nesting is also considered invalid.

Headings

Semantic Level: textblock

Headings in Duckdown are described in only one way - by a tag at the beginning of the line, like so:

h1. This is heading 1

h2. This is heading 2 (With some ~emphasised~ text!)

Headings may contain inline tagging/styling, such as emphasis, strikethrough, or a link. Duckdown supports headings one (h1.) through six (h6.)

Links

Semantic Level: text

The primary rationale behind the Duckdown link syntax design is ease of use and readability. The secondary goals are content archival and maintainability.

With that in mind, we've made the possibly controversial decision to scrap relative links. Instead, all links must include the full path (including the protocol!) This ensures relative reorganisation of content will not break link relationships. Links are left plain, and simply included in text like so:

http://www.example.com/

Of course, often it won't make much sense to include a URL in the middle of a sentence! In that circumstance you can use parentheses to add a link description:

You can purchase http://example.com/barbeques/fourburner (four burner barbeques) at the Acme BBQ store.

It is possible to include any inline text styles in the link text.

https://example.com/sinisterconspiracy.html (Recently, I chanced upon a sinister Mafia conspiracy involving none other than ~*The Queen herself!*~)
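
As a rough illustration of the link rules above, the sketch below finds absolute URLs and an optional parenthesised description. The findLinks helper and its regular expression are assumptions made for this example; they are not part of Duckdown, which parses links through its grammar rather than a regex, and they do not handle nested parentheses.

```javascript
// Hypothetical helper (not part of Duckdown): finds absolute URLs and an
// optional "(description)" that directly follows them.
function findLinks(text) {
    var pattern = /(https?:\/\/[^\s()]+)(?:\s\(([^()]*)\))?/g;
    var links = [];
    var match;
    while ((match = pattern.exec(text)) !== null) {
        links.push({
            url: match[1],
            // Fall back to the URL itself when no description follows.
            description: match[2] !== undefined ? match[2] : match[1]
        });
    }
    return links;
}

var found = findLinks(
    "You can purchase http://example.com/barbeques/fourburner " +
    "(four burner barbeques) at the Acme BBQ store."
);
console.log(found);
```

Because the pattern requires a protocol, a relative path such as barbeques/fourburner is never picked up, which mirrors the decision to drop relative links.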

Horizontal Rules

Semantic Level: block

Horizontal rules can be embedded in any block element. Simply connect three dashes (---) on a separate line, like so:

---

You may use the horizontal rule syntax in blockquotes and lists (among other block elements.)

Lists

Semantic Level: textblock

Bulleted / Unordered Lists

Bulleted (unordered) lists in Duckdown are very similar to those in Markdown. Simply begin a line with an asterisk (and then some whitespace) like so:

* Oranges;
* Apples,
* Pears, and
* Potatoes.

You must give the list some breathing room - it either has to be the first thing in the document, a direct child (and the first element) of a block level item like another list or blockquote, or be preceded by a blank line. The following is valid, and will be rendered as an unordered list:

Here's a preceding paragraph. This is followed by a blank line.

* Here's a list item.
* Here's another list item. These will be rendered correctly.

On the other hand, without the blank line, the list will be interpreted as a continuation of the previous paragraph. The Duckdown snippet below:

Here's a preceding paragraph. No blank line here, punks!
* What are you expecting?
* Hopefully not a UL here!
* You'll be disappointed!

will be rendered as so in HTML:

<p>Here's a preceding paragraph. No blank line here, punks! * What are you expecting? * Hopefully not a UL here! * You'll be disappointed!</p>

Failing to add whitespace after the asterisk will also prevent it from being considered a list item.

Lists may be nested by indenting them - either by a single tab or four spaces.

* Here's a root-level list item.
    * Without leaving a blank line above, the next line is indented.
    * Both this line and the next will be rendered as second-level list items.
        * Here's a third-level item!

Numbered / Ordered Lists

Unlike ordered lists in Markdown, Duckdown supports flexible list tokens designed to make the raw Duckdown much easier to read. It also explicitly supports three different list types:

  • Numeric - the default display style for a regular ordered list.
  • Lower, roman - lowercase roman numerals
  • Alphabetical, lowercase

In order to specify the list type, just use a letter, number, or roman numeral accordingly, followed by a full stop (period) and some whitespace.

1. Ordered List 1
2. Ordered List 2
3. Ordered List 3

a. Important legal subsection a!
b. Important legal subsection b!
c. Important legal subsection c!

i. Important roman-numeral list!
ii. Important...
iii. Roman...
iv. Numeral...
v. List!

Duckdown automatically determines the list type based on the first item in the list. Consider a list which changes types halfway through, like so:

a. Alphabetic item!
ii. Roman Numeral Item!
3. Regular Numbered Item!

In this case, the first item in the list takes precedence, and the whole list is ordered alphabetically.

This restriction does not apply to nested lists: you may nest ordered lists inside any other block element or list, just as you would an unordered list.

1. Item 1
    a. Alphabetic list nested beneath regular ordered list
    b. Item b.
2. Item 2
    i. Roman numeral sub-list!
        * And of course, it's possible to nest bullets as well.
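
The "first item wins" rule for ordered lists can be sketched as follows. The helper names and the returned type labels are hypothetical, and the order of the checks is an assumption: a marker such as "i." or "c." is both a letter and a roman numeral, and this sketch treats it as roman. Duckdown's own grammar may resolve that ambiguity differently.

```javascript
// Hypothetical helper: classifies an ordered-list marker such as "1." or "iv.".
function listTypeOf(marker) {
    var label = marker.replace(/\.$/, "");
    if (/^[0-9]+$/.test(label)) return "numeric";
    // Assumption: roman numerals are checked before plain letters,
    // so "i." starts a roman list rather than an alphabetical one.
    if (/^[ivxlcdm]+$/.test(label)) return "lower-roman";
    if (/^[a-z]+$/.test(label)) return "lower-alpha";
    return null;
}

// The first marker decides the type of the whole list.
function listTypeFor(markers) {
    return markers.length ? listTypeOf(markers[0]) : null;
}

console.log(listTypeFor(["a.", "ii.", "3."])); // prints "lower-alpha"
```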

Blockquotes

Semantic Level: block

Duckdown supports blockquotes as multiple consecutive lines prepended with a greater-than sign '>'.

This text is outside the blockquote.

> This text is inside the blockquote. The text in 
> blockquotes is also consolidated into paragraphs
> just like regular text.
>
> Separated by a blank line, this is a new paragraph
> inside the blockquote.

This text is outside the blockquote.

You may add attribution to the blockquote by appending a citation on the following line, like so:

> The march of science and technology does not imply growing
> intellectual complexity in the lives of most people.
> It often means the opposite.
-- Thomas Sowell

This adds a new paragraph with a linked <cite> tag.

You may also nest blockquotes:

> Two hours ago, MATSUMOTO Hiroshi wrote:
>
> I don't agree with your assertion as stated in your last email:
>
>> Four hours ago, Jacob Slim wrote:
>>
>> Shouldn't the API endpoint be idempotent regardless of the version?
>> This is a data integrity issue.
>
> This isn't a data integrity issue - this is about making things
> easy to understand for app developers.

Preformatted Text

Semantic Level: block

Preformatted text works in exactly the same way as Markdown: indent each line of a preformatted block with either a single tab or four spaces. In the example below, consider \t equal to one tab character.

\tHere's a block of preformatted text.
\tHere's another line. No further processing occurs in this region.

Feathers

Semantic Level: hybrid (may be overridden by feather function)

One of the key considerations leading to the development of Duckdown (as opposed to using Markdown) was extensibility. We needed a way to incorporate extra functionality into the syntax without polluting it, and since the language is designed to be independent from HTML, we could not use HTML to cover these use cases.

Some examples of this functionality might be:

  • Tweet (or social) buttons
  • Inlining external content
  • Inline video
  • Image galleries and other embedded multimedia

Because a lot of this content is also site or application specific, it didn't make sense to include it in the Duckdown core either.

Instead, I created a method of calling external JavaScript procedures from Duckdown itself, named Feathers (in keeping with the Duck theme).

Feathers look similar to an HTML tag, with a different parameter syntax:

<feathername param:value paramtwo:value>

In this case, we've already registered a handler with Duckdown, with the name feathername. Duckdown chops up the parameters, and passes them to the feather function as a big object (containing strings.) In this case, such an object would look like the following:

{
    "param": "value",
    "paramtwo": "value"
}

It's totally up to the function defined as to how it handles the parameters. The content of the feather node is replaced with whatever it returns immediately upon execution - although asynchronous code in the handler can retain a reference to the node in question and act on it (mutate it in any way it wants!) before compilation.

The exact way in which feathers work is described in more detail later in this document.

The parameters may have spaces in the values, but not in the names. The parameter values need not be quoted, but the closing angle bracket (>) character must be escaped or avoided.

An example of real-world feather use could include embedding a video in the page:

<video external:true source:youtube id:v982fSFd2 showcomments:false caption:Prime Minister Gordon Brown being introduced to visiting dignitaries.>

This would result in the following hash:

{
    "external": "true",
    "source": "youtube",
    "id": "v982fSFd2",
    "showcomments": "false",
    "caption": "Prime Minister Gordon Brown being introduced to visiting dignitaries."
}

The feather function would then take this information, and generate the appropriate HTML embed code for the video.
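
To make the parameter splitting concrete, here is a minimal sketch of how a feather tag could be chopped into a name and a hash of string values. The parseFeather function is hypothetical and is not Duckdown's implementation; it assumes a parameter starts wherever a run of word characters followed by a colon appears after whitespace, so a value that itself contains such a sequence would be split incorrectly.

```javascript
// Hypothetical parser for "<name key:value key:value with spaces>".
// Not Duckdown's own code; escaping of ">" is not handled.
function parseFeather(source) {
    var inner = source.replace(/^</, "").replace(/>$/, "").trim();
    var firstSpace = inner.indexOf(" ");
    var name = firstSpace === -1 ? inner : inner.slice(0, firstSpace);
    var rest = firstSpace === -1 ? "" : inner.slice(firstSpace + 1);

    // Locate every "key:" that sits at the start or after whitespace.
    var keyPattern = /(?:^|\s)(\w+):/g;
    var keys = [];
    var match;
    while ((match = keyPattern.exec(rest)) !== null) {
        keys.push({
            name: match[1],
            start: match.index,
            valueStart: keyPattern.lastIndex
        });
    }

    // Each value runs from its colon up to the start of the next key.
    var params = {};
    keys.forEach(function (key, i) {
        var end = i + 1 < keys.length ? keys[i + 1].start : rest.length;
        params[key.name] = rest.slice(key.valueStart, end).trim();
    });
    return { name: name, params: params };
}

var parsed = parseFeather(
    "<video external:true source:youtube id:v982fSFd2 showcomments:false " +
    "caption:Prime Minister Gordon Brown being introduced to visiting dignitaries.>"
);
console.log(parsed.name, parsed.params);
```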

A word on text and block-level semantics

Duckdown inherits an HTML-like understanding of block/text semantics.

Each token/language construct has a semantic class associated with it. These are:

  • text
  • textblock
  • hybrid
  • block

This concept, like in HTML, defines reasonable defaults around nesting behaviour:

  • A block level element is permitted to nest only within other blocks, and hybrid elements. It can contain any other element, regardless of text semantics.
  • A hybrid element can contain any element, and nest within any element. It is mainly used for elements where the semantics are indefinite, such as feathers.
  • A textblock can nest within hybrid and block elements, but not text or other textblock elements. Only text elements can be contained within it. An example of a textblock element would be a heading.
  • A text element can nest within any element, but can only contain other text elements

Duckdown checks nesting compatibility with a function that returns true or false. If no current node is present, and the new node is being inserted directly into the document, this function will return true regardless of text semantics.
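
The nesting rules listed above can be expressed as a small lookup table. The canNest function and allowedParents table are hypothetical names used only for this sketch. Where the rules overlap (a hybrid element "can nest within any element", while text and textblock elements "can only contain text"), the sketch assumes the hybrid rule wins; the real parser may decide otherwise.

```javascript
// Hypothetical table: for each semantic level, the parents it may nest within.
var allowedParents = {
    block: ["block", "hybrid"],
    textblock: ["block", "hybrid"],
    // Assumption: hybrid and text elements may appear inside any parent.
    hybrid: ["block", "textblock", "hybrid", "text"],
    text: ["block", "textblock", "hybrid", "text"]
};

function canNest(parentLevel, childLevel) {
    // Inserting directly into the document is always permitted.
    if (parentLevel === null || parentLevel === undefined) return true;
    var parents = allowedParents[childLevel];
    return !!parents && parents.indexOf(parentLevel) !== -1;
}

console.log(canNest("textblock", "text"));  // prints true
console.log(canNest("text", "block"));      // prints false
```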

A word on encoding

Duckdown works with the regular JavaScript string methods, and is bound by the restrictions of the VM it runs in (in nearly all cases, this means Duckdown will output UCS-2 in a way that is functionally indistinguishable from UTF-16.)

Any character outside the 7-bit ASCII range (the first 128 code points), or not permitted in XML, will be escaped as an XML/HTML hexadecimal entity.
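
A minimal sketch of this kind of escaping is shown below. The escapeForOutput function is hypothetical, and two details are assumptions rather than documented Duckdown behaviour: the hexadecimal digits are written in upper case, and the markup characters &, <, > and " are escaped the same way as non-ASCII characters.

```javascript
// Hypothetical escaping routine, not Duckdown's own encoder.
function escapeForOutput(text) {
    var out = "";
    // for...of iterates by code point, so characters outside the
    // Basic Multilingual Plane are escaped as a single entity.
    for (var ch of text) {
        var code = ch.codePointAt(0);
        var isMarkup = ch === "&" || ch === "<" || ch === ">" || ch === '"';
        if (code > 127 || isMarkup) {
            out += "&#x" + code.toString(16).toUpperCase() + ";";
        } else {
            out += ch;
        }
    }
    return out;
}

console.log(escapeForOutput("caf\u00e9 & cr\u00e8me")); // prints "caf&#xE9; &#x26; cr&#xE8;me"
```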

Using Duckdown

Duckdown may be run on the server or in the browser. Let's start with node.

Installing

If you're using npm, you may install Duckdown locally or globally. Installing globally will permit you to easily use Duckdown's CLI tool.

npm install -g duckdown

If you plan on running the tests or building Duckdown yourself, you should install the development dependencies:

npm install -g --dev duckdown

And if you're using git:

git clone https://github.com/cgiffard/Duckdown.git
cd Duckdown
npm install

Running npm install in the git repo will ensure that the required dependencies for testing and building Duckdown are available.

CLI

If you installed Duckdown globally, you should now have a duck CLI tool available to you in your $PATH.

Usage is simple. By default, the tool accepts uncompiled Duckdown on STDIN and pipes compiled HTML to STDOUT.

You may specify a filename to compile:

duck README.dd

Options:

  • -t, --tokens
    Outputs an array of tokens from the original text, prior to parsing.
  • -a, --ast
    Outputs the Duckdown AST for the file or input, prior to compilation.
  • -l, --log
    Displays the Duckdown parse log, along with the cumulative execution time. Log items are gathered via parser events (See Events)
  • -d, --disk
    Write parse log to disk
  • -v, --verbose
    Verbose output - returns extra data in the log, as well as detailed attributes for AST nodes when outputting the AST.
  • -s, --surpress
    Suppress compiled output - in circumstances where you're just interested, for example, in the tokens, log, or AST, you can suppress display of the compiled HTML.
  • -e, --echo
    Include raw duckdown in output
  • -b, --build
    Builds a combined JS file representing the Duckdown source, intended for use in the browser, in both minified and unminified forms. Development dependencies are required in order to use this option. You may specify a filename to write to - which will be considered the name of the 'minified' version. The unminified version will have '-unminified' appended to the name.

Example usage:

# Suppresses compiled output, but displays tokens and an AST verbosely
duck -atvs myDuckdownDocument.dd

# Build duckdown to the current folder
duck -b ./duckdown.js

Using the Duckdown API

Fundamentally, the Duckdown API is very simple. Depending on whether you're using it with node or in the browser, the method of instantiation will be different - but the subsequent use is the same across platforms.

Basically, you'll want to create a new instance of the Duckdown parser. In node, you'll need to require it. In the browser, just include the compiled version of Duckdown (you can find the latest build at Github, or you can build it yourself.)

// Instantiating Duckdown in Node
var Duckdown = require("duckdown"),
    duckdown = new Duckdown();

// Instantiating Duckdown in the browser
var duckdown = new Duckdown();

Assuming you've already got the text you want to compile in a variable, compilation can be as simple as one call:

var compiledHTML = duckdown.compile(myRawDuckdown);

There's a catch though - in order to enable streaming, the parser retains any input it receives, so subsequent compilations will include the Duckdown of the calls before them. You'll need to clear the parser object before compiling again:

duckdown.clear();
var myNewCompiledHTML = duckdown.compile(someOtherDocument);
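
If you compile several unrelated documents with one parser, a small wrapper can make the reset hard to forget. The compileFresh helper below is hypothetical, and fakeParser is only a stand-in that imitates the retained-input behaviour described above so that the example runs without Duckdown installed; it is not the real parser.

```javascript
// Hypothetical wrapper: resets the parser before every compilation.
function compileFresh(parser, source) {
    parser.clear();
    return parser.compile(source);
}

// Stand-in object that mimics a parser which retains earlier input.
// With the real library you would pass a `new Duckdown()` instance instead.
var fakeParser = {
    buffer: "",
    clear: function () { this.buffer = ""; },
    compile: function (source) {
        this.buffer += source;
        return "<p>" + this.buffer + "</p>";
    }
};

fakeParser.compile("first");
console.log(fakeParser.compile("second"));      // prints "<p>firstsecond</p>"
console.log(compileFresh(fakeParser, "third")); // prints "<p>third</p>"
```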

Using Feathers

The syntax of feathers was described earlier, but feathers must be registered with Duckdown in order to be correctly parsed.

A feather is a non-blocking JavaScript function which accepts an object hash of parameters defined by the Duckdown document being parsed, and returns a string to insert into the document (on compilation) over the top of the feather token.

It receives a reference to the feather node itself, so it may mutate the node later, in an asynchronous callback - but it must be non-blocking or it will totally destroy parsing and compilation performance.

Feathers are registered with Duckdown using the Duckdown.registerFeather() method:

var featherHandler = function(input,duckdown){
    return "abc123";
};

duckdown.registerFeather("myfeather",featherHandler,"text");

The first parameter of the registration function is the name by which you would access the feather from the Duckdown document itself (eg. <myfeather>.)

The second parameter is the function to handle the feather.

The third (optional) parameter describes the semantic level of the feather result (since a feather could reasonably be used inline with text, or as a block, like a video or image gallery.) This is used to support nesting behaviour.
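
As a slightly fuller illustration, here is a sketch of a handler for the video feather shown earlier. The videoFeather function, the markup it returns, and the use of YouTube's /embed/ URL are all assumptions made for this example; only the (input, duckdown) handler signature and the registerFeather call come from the description above.

```javascript
// Hypothetical helper: minimal escaping for text placed inside HTML.
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical handler for a <video ...> feather.
// `input` is the hash of string parameters taken from the document.
function videoFeather(input, duckdown) {
    if (input.source !== "youtube" || !input.id) {
        return ""; // Unknown source: render nothing.
    }
    var caption = input.caption
        ? "<figcaption>" + escapeHtml(input.caption) + "</figcaption>"
        : "";
    return '<figure><iframe src="https://www.youtube.com/embed/' +
        encodeURIComponent(input.id) + '"></iframe>' + caption + "</figure>";
}

// With a real parser instance, this would be registered as a block-level feather:
// duckdown.registerFeather("video", videoFeather, "block");

console.log(videoFeather({ source: "youtube", id: "v982fSFd2" }));
```

Note that every parameter arrives as a string, so a flag such as showcomments has to be compared against "true" or "false" rather than used as a boolean.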


That's it! You're good to go.

How Duckdown works

Still here? OK - Here's a little more about what this does.

The compile method described above hides a lot of complexity. Behind the scenes, a number of major functions are called, shown here in roughly sequential order:

  • Duckdown.tokenise
    Turns the raw text into tokens dictated by the grammar
  • Duckdown.parse
    Parses the tokens into an intermediary AST
    • Duckdown.parseToken
      Called by the Duckdown parser, this function is responsible for the brunt of the work. It parses an individual token according to state stored in the Duckdown parser object itself.
    • Duckdown.completeParse
      Finalises a parse operation (Technically speaking, it restores pointers to the AST root, closing any open nodes.)
  • Duckdown.compile
    Actually compiles the source code. Recursively loops through the AST, and calls out to compilation handlers defined by the grammar where required.

Tokenisation

The first stage in any parsing process is to extract a list of meaningful tokens from the input text.

The duckdown tokenising function is Duckdown.tokenise().

Duckdown uses a two-condition process. It splits the input stream based on matches with tokens in the grammar, but also at the boundaries of word and non-word characters. Duckdown prefers longer, more specific matches over more generic ones.

It advances through the text one character at a time, and takes a section of characters between the current pointer and an index determined by the longest token in the grammar.

It then checks the substring against each item in the grammar. If a match is found, it saves the result as a token, and advances the stream pointer to one character after the end of the match.

If a match isn't found, the length of the substring is decreased by one character, and is compared to the grammar again. This repeats until either a match is found, or the length of the string reaches just one character.

If the substring is only one character long and no grammar match has been found, the character is classified according to whether it is a 'word' or 'non-word' character. 'Runs' of word and non-word characters are buffered and each run is converted into a token when the tokeniser state changes, or completes.
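
The steps above can be sketched as a short function. This is a simplified model rather than Duckdown's own tokeniser: the tokenise function here is standalone, the grammar is reduced to a plain array of token strings (a hypothetical stand-in for the real grammar object), and \w is assumed as the definition of a 'word' character.

```javascript
// Simplified model of longest-match tokenising with word/non-word runs.
function tokenise(text, grammarTokens) {
    var longest = grammarTokens.reduce(function (max, token) {
        return Math.max(max, token.length);
    }, 1);
    var tokens = [];
    var run = "";
    var runIsWord = null;

    // Convert the buffered run of characters into a token.
    function flushRun() {
        if (run !== "") tokens.push(run);
        run = "";
        runIsWord = null;
    }

    var pointer = 0;
    while (pointer < text.length) {
        // Try the longest candidate first, shrinking by one character each time.
        var matched = null;
        var maxLength = Math.min(longest, text.length - pointer);
        for (var length = maxLength; length >= 1; length--) {
            var candidate = text.slice(pointer, pointer + length);
            if (grammarTokens.indexOf(candidate) !== -1) {
                matched = candidate;
                break;
            }
        }
        if (matched !== null) {
            flushRun();
            tokens.push(matched);
            pointer += matched.length;
            continue;
        }
        // No grammar match: classify the single character and extend the run.
        var ch = text.charAt(pointer);
        var isWord = /\w/.test(ch);
        if (runIsWord !== null && runIsWord !== isWord) flushRun();
        run += ch;
        runIsWord = isWord;
        pointer += 1;
    }
    flushRun();
    return tokens;
}

console.log(tokenise("h1. Hi *there*", ["h1.", "*", "\n"]));
// prints [ 'h1.', ' ', 'Hi', ' ', '*', 'there', '*' ]
```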

You may use the duck CLI tool to observe the token buffer for the document - see the CLI section for usage instructions.

Parsing process

The parsing process is initiated by Duckdown.parse().

It loops through each of the tokens made available by the tokenising stage, and runs Duckdown.parseToken() (see Token Parsing below) on each of them in order to build an AST for the document/stream.

Once each token in the stream is parsed, it executes Duckdown.completeParse, which ties up any loose ends, and restores any pointers that it had to nodes deep in the parser AST to point to the root of the AST itself.

This means that input parsed later cannot mutate nodes already in the Duckdown AST. If you need to leave the parser state as is so that you can add content to the document later (for example, when cumulatively processing a stream), you can pass a leaveHanging argument to Duckdown.parse():

// Duckdown.parse(input,leaveHanging);
duckdown.parse(null,true);

Token Parsing

The Duckdown.parseToken() function is called for each token, and recursively builds an AST from them. It is responsible for the bulk of the work Duckdown does.

Each time it is called, it observes the context it stores against the Duckdown parser object itself, and evaluates the current token according to that state.

In order of execution, it first checks to see whether the current token terminates any existing state, and recursively closes any open AST nodes if applicable. At this point, it emits, for each closed node, any relevant events, and mutates nodes depending on whether the grammar defines specific requirements for them that are only evaluable upon termination.

If the current token hasn't been 'swallowed' by this process (used up when terminating an AST node) then it will be checked again against the grammar, to determine if a new node should be created for it.

If a node is not created, the token is deemed to be 'text', and it is buffered.

If a node is created, any currently buffered tokens are appended to the previous current node as 'children'. The new node is then also appended as a child, and initialised.

The token pointer is then advanced by one, and the parseToken function is called again as required, until the token buffer is exhausted.

You may use the duck CLI tool to observe the final AST for the document - see the CLI section for usage instructions.
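
The close, open, or buffer sequence described in this section can be modelled in a few lines. The sketch below is a heavily reduced illustration, not Duckdown's parser: inlineGrammar is a hypothetical two-rule grammar, nodes are plain objects, and the validity checks, events, and node mutation mentioned above are left out.

```javascript
// Hypothetical two-rule grammar: a token opens a node and the same token closes it.
var inlineGrammar = { "*": "bold", "~": "emphasis" };

function parseTokens(tokens) {
    var root = { type: "root", children: [] };
    var stack = [root];   // open nodes, innermost last
    var buffered = [];    // text tokens waiting to be attached

    function current() { return stack[stack.length - 1]; }
    function flushText() {
        if (buffered.length) current().children.push(buffered.join(""));
        buffered = [];
    }

    tokens.forEach(function (token) {
        var type = inlineGrammar[token];
        if (type && current().type === type) {
            // 1. The token terminates the current state: close the node.
            flushText();
            stack.pop();
        } else if (type) {
            // 2. The token opens a new node: attach buffered text first.
            flushText();
            var node = { type: type, children: [] };
            current().children.push(node);
            stack.push(node);
        } else {
            // 3. Otherwise the token is plain text and is buffered.
            buffered.push(token);
        }
    });
    flushText();
    return root;
}

var tree = parseTokens(["*", "This", " ", "is", " ", "~", "bold", "~", "*"]);
console.log(JSON.stringify(tree));
```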

Compilation

Once an AST has been built, Duckdown can compile the document to HTML.

Duckdown recursively loops downward through the AST, compiling each node and appending the result to a text buffer, which it then returns.

Text tokens are encoded and appended as is. Duckdown nodes are compiled according to the rules defined in the grammar. If a node does not have a compilation rule associated with it in the grammar, Duckdown will simply descend into the node and compile its children.

If the node does define a compilation rule, that rule may determine whether further descent occurs. Each compilation rule is passed a reference to the Duckdown compiler, which it can use to compile child nodes, or simply ignore.
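
A recursive compile step of this shape might look like the following sketch. The compileRules table, the node layout, and the choice of <b> and <em> as output tags are assumptions for illustration; Duckdown's real compilation rules live in its grammar and may emit different markup.

```javascript
// Hypothetical compile rules keyed by node type. Each rule receives the node
// and a function it can call to compile that node's children.
var compileRules = {
    bold: function (node, compileChildren) {
        return "<b>" + compileChildren(node) + "</b>";
    },
    emphasis: function (node, compileChildren) {
        return "<em>" + compileChildren(node) + "</em>";
    }
};

// Text is encoded before being appended to the output.
function encodeText(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function compileChildren(node) {
    return node.children.map(compileNode).join("");
}

function compileNode(node) {
    if (typeof node === "string") return encodeText(node);
    var rule = compileRules[node.type];
    // Without a rule, simply descend into the node and compile its children.
    return rule ? rule(node, compileChildren) : compileChildren(node);
}

var ast = { type: "root", children: [
    { type: "bold", children: ["This is ", { type: "emphasis", children: ["bold & em"] }] }
] };
console.log(compileNode(ast)); // prints "<b>This is <em>bold &amp; em</em></b>"
```

A rule that never calls compileChildren stops the descent at that node, which is how a rule can choose to ignore its children.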

Events

During the tokenising, parse, and compilation process, Duckdown emits a number of events which you can listen to in order to introspect the parser operation.

Duckdown itself uses this to generate the parser event logs and performance profile that you can see in the duck CLI tool.

Duckdown implements a kind of pseudo-EventEmitter (because this code also has to run in the browser, and bundling the complete EventEmitter class was overkill!) which you can use like so:

//  Listen to the parse token event
duckdown.on("parsetoken",function handler(currentToken) {
    // do something
    console.log("Looks like the token '%s' is being parsed!",currentToken);
});

Here's a list:

  • clear
    Emitted when initialising the Duckdown parser object, or when the Duckdown parser state is destroyed. No arguments.

  • tokenisestart
    Emitted when the tokenising process begins. No arguments.

  • tokeniseend
    Emitted when the tokenising process is completed. Hands the resultant token list over as the first argument.

  • parsestart
    Emitted when the parsing process is initiated. No arguments.

  • parseend
    Emitted when the parsing process completes. No arguments.

  • parsetoken
    Emitted when Duckdown begins parsing a token. Passes the current token as the first argument.

  • compilestart
    Emitted when Duckdown begins compiling. No arguments.

  • compileend
    Emitted when Duckdown completes compilation. Passes the final HTML document as the first argument.

  • addstate
    Emitted when Duckdown adds another state to its internal state stack. The state name/ID in question is passed as the first argument.

  • nodeclosed
    Emitted when Duckdown closes an AST node. A reference to the node itself is passed as the first argument.

  • nodeinvalid
    Emitted when a static grammar rule, or processing function determines that a node is invalid. The current node is passed as the first argument. If the node was determined to be invalid by a regex condition, the condition will be passed as the second argument, and the raw node source as the third.

  • nodeselfdestruct
    Emitted when a node processing function determines that the node is invalid and should be converted to text instead of remaining as a node. The node in question is passed as the first parameter.

Be aware that Duckdown doesn't try and clean up after you. If you throw an error or do something untoward in an event listener, you'll kill the current operation at hand.
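
To show why an error in a listener interrupts the operation that emitted the event, here is a minimal emitter of the kind described above. TinyEmitter is a hypothetical name and this is not Duckdown's implementation; it only models an on/emit pair that calls listeners synchronously without catching their exceptions.

```javascript
// Hypothetical minimal event emitter, written in the ES5 style used above.
function TinyEmitter() {
    this.listeners = {};
}

TinyEmitter.prototype.on = function (eventName, handler) {
    (this.listeners[eventName] = this.listeners[eventName] || []).push(handler);
    return this;
};

TinyEmitter.prototype.emit = function (eventName) {
    var args = Array.prototype.slice.call(arguments, 1);
    (this.listeners[eventName] || []).forEach(function (handler) {
        // No try/catch here: an exception thrown by a listener
        // propagates to whoever called emit().
        handler.apply(null, args);
    });
    return this;
};

var emitter = new TinyEmitter();
var seen = [];
emitter.on("parsetoken", function (token) { seen.push(token); });
emitter.emit("parsetoken", "*");
console.log(seen); // prints [ '*' ]
```

If you need a listener that can fail safely, wrap its body in your own try/catch.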

Building and Testing Duckdown

At the moment, Duckdown only needs to be built for the browser, as the raw source form will work natively in node.

When installed globally, Duckdown makes available a duck CLI tool, which you can use to build the source for the browser. (See CLI for details.) The git repository also includes an up-to-date version of Duckdown built for the browser, in both minified and unminified form. (/compiled/duckdown.js)

Duckdown uses mocha and chai to run its test suite. You can run the test suite with npm:

npm test

Or, with mocha itself for more flexibility:

# Show just the syntax tests with the spec reporter
# - and watch for changes
mocha -w -R spec -g reference

You can check the current build status at Travis CI.

Writing a Duckdown Grammar

The Duckdown Grammar, as it currently stands, has exhausted the ability of its own architecture/structure to keep it clean and organised.

It is currently in the midst of being totally refactored to ensure it is clean, understandable, and maintainable.

When this process is complete, the new architecture will be documented. Sorry!

Licence & Credits

Who's responsible for this monstrosity!?

Christopher Giffard, with contributions to the test suite and language design by Daniel Nitsche.

And the licence? BSD 2-Clause!

Copyright (c) 2012, Christopher Giffard.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
