Sparser Technical Documentation
Execution
Execute the application by running this instruction, where options is an object described in Options.
Supplied Runtimes
Sparser is intended for inclusion in other applications as an embedded utility. To run Sparser immediately and experiment without any configuration, some simple runtime interfaces are provided.
Browser Runtime
A handy-dandy browser utility, demo/index.xhtml, is provided to run Sparser in a web browser. This utility can be run from any location, whether on your local file system or from a web server. First, run the build to create the necessary JavaScript file. To start the web server, execute
on the command line. The server provides a local web server and a web socket channel so that the provided HTML tool refreshes automatically whenever the application rebuilds.
The browser utility produces output in an HTML table and color-codes the output based on the lexer used.
Terminal Runtime
All Node.js support is consolidated into a single location. To examine the available features, run this command:
Embedding
Sparser is completely environment agnostic: it can be embedded anywhere, runs the same way everywhere, and produces the same output.
Browser Embedding
The browser environment makes use of a single dynamically created file: js/browser.js. Install Sparser from NPM or run the build
to compile the TypeScript and generate this file. The file is API agnostic, except that it builds out the application and attaches it as a property of the window object. It also contains every available lexer.
Include the js/browser.js file in your HTML:
Inside your browser-based JavaScript application, simply call Sparser.
Node Embedding
The file for embedding into Node, js/parser.js, is identical to the file for embedding into the browser, except that its one reference to the window object instead refers to Node's global object. Include this file in your application by any means, for example:
Ignore Code
Parts of code can be ignored from parsing by sandwiching that code between two comments. The first comment must start with and the second comment must contain
. For example:
some code to ignore
Universal Parse Model
Sparser supports several output data structures, selected with the format option. All of these formats represent the same data; they differ only in how the data is shaped, so users can pick whichever is most comfortable to access. The following explanation uses examples in the default arrays format.
Data Types
- begin - number - The index where the current structure begins. For tokens of type start, this refers to the parent container or the global scope.
- ender - number - The index where the current structure ends. Unlike begin, for a token of type end this refers to the token itself.
- lexer - string - The type of rules used to scan and resolve the current token.
- lines - number - Describes the white space immediately prior to the token's first character. A value of 0 means no white space. A value of 1 means some amount of white space not containing a new line character. Values of 2 and greater indicate the number of new lines plus 1. For example, an empty line preceding the current token produces a value of 3, because the white space contains two new line characters.
- stack - string - A description of the current structure represented by the begin and ender data values.
- token - string - The atomic code fragment.
- types - string - A categorical description of the current token. Types are defined in the markdown file accompanying each lexer file.
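The `lines` rule above is easy to misread, so here is a minimal TypeScript sketch of it. `linesValue` is a hypothetical helper written for illustration, not part of Sparser, and it assumes its argument contains only the white space that precedes a token.

```typescript
// Hypothetical helper (not Sparser API): computes the `lines` value for the
// white space immediately before a token, following the rule described above.
function linesValue(whitespace: string): number {
    if (whitespace.length === 0) {
        return 0; // no white space at all
    }
    // Count new line characters ("\r\n" also contains exactly one "\n").
    const newlines = whitespace.split("\n").length - 1;
    // White space without a new line is 1; otherwise new lines plus 1.
    return newlines === 0 ? 1 : newlines + 1;
}

console.log(linesValue(""));     // 0
console.log(linesValue("    ")); // 1
console.log(linesValue("\n  ")); // 2
console.log(linesValue("\n\n")); // 3, an empty line precedes the token
```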
Each of these data types is an array, and all of the arrays have identical length and are populated and modified in unison. Think of this as a database table: each array is a column, the name of the array (the object key name) is the column metadata, and each index across the arrays is a record in the table. Here is an example:
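The column-and-record analogy can be sketched in TypeScript. The interface names `ParsedArrays` and `TokenRecord` and the function `toRecords` are hypothetical, chosen for illustration only; this is not one of Sparser's own output formats. The sample data is the five-token markup example that follows.

```typescript
// Shape of the default arrays output, using the seven keys listed above.
interface ParsedArrays {
    begin: number[];
    ender: number[];
    lexer: string[];
    lines: number[];
    stack: string[];
    token: string[];
    types: string[];
}

// One "row" of the table: every data type for a single token.
interface TokenRecord {
    index: number;
    begin: number;
    ender: number;
    lexer: string;
    lines: number;
    stack: string;
    token: string;
    types: string;
}

// Hypothetical helper (not Sparser API): reads each array index as a record.
function toRecords(data: ParsedArrays): TokenRecord[] {
    const length = data.token.length;
    const keys: (keyof ParsedArrays)[] = ["begin", "ender", "lexer", "lines", "stack", "token", "types"];
    // All columns must share one length, otherwise the rows would not line up.
    for (const key of keys) {
        if (data[key].length !== length) {
            throw new Error(`column "${key}" has length ${data[key].length}, expected ${length}`);
        }
    }
    const records: TokenRecord[] = [];
    for (let i = 0; i < length; i += 1) {
        records.push({
            index: i,
            begin: data.begin[i],
            ender: data.ender[i],
            lexer: data.lexer[i],
            lines: data.lines[i],
            stack: data.stack[i],
            token: data.token[i],
            types: data.types[i],
        });
    }
    return records;
}

const example: ParsedArrays = {
    begin: [-1, 0, 1, 1, 0],
    ender: [4, 3, 3, 3, 4],
    lexer: ["markup", "markup", "markup", "markup", "markup"],
    lines: [0, 0, 0, 0, 0],
    stack: ["global", "a", "b", "b", "a"],
    token: ["<a>", "<b>", "class=\"cat\"", "</b>", "</a>"],
    types: ["start", "start", "attribute", "end", "end"],
};

const rows = toRecords(example);
console.log(rows[2].stack, rows[2].token); // b class="cat"
```

Because the arrays are modified in unison, reading index `i` from every column is all it takes to reassemble a record; the length check guards against columns that have drifted apart.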
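Consider the code `<a><b class="cat"></b></a>`. The parsed output in the default format will be:

```javascript
{
    begin: [-1, 0, 1, 1, 0],
    ender: [4, 3, 3, 3, 4],
    lexer: ["markup", "markup", "markup", "markup", "markup"],
    lines: [0, 0, 0, 0, 0],
    stack: ["global", "a", "b", "b", "a"],
    token: ["<a>", "<b>", "class=\"cat\"", "</b>", "</a>"],
    types: ["start", "start", "attribute", "end", "end"]
}
```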
Arranged as a table, that parsed output would look something like this:
index | begin | ender | lexer | lines | stack | token | types |
---|---|---|---|---|---|---|---|
0 | -1 | 4 | "markup" | 0 | "global" | "<a>" | "start" |
1 | 0 | 3 | "markup" | 0 | "a" | "<b>" | "start" |
2 | 1 | 3 | "markup" | 0 | "b" | "class=\"cat\"" | "attribute" |
3 | 1 | 3 | "markup" | 0 | "b" | "</b>" | "end" |
4 | 0 | 4 | "markup" | 0 | "a" | "</a>" | "end" |
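The begin and ender columns in the table follow a simple stack discipline. The sketch below is a hypothetical reconstruction inferred from this one example, not Sparser's actual source: `deriveBeginEnder` rebuilds both columns from the types column alone, and `structureTokens` shows how ender lets you slice out a complete structure. Leaving tokens outside any structure with an ender of -1 is an assumption of the sketch.

```typescript
// Hypothetical reconstruction (not Sparser's source): derives begin and ender
// from token types, following the rules illustrated by the table above.
function deriveBeginEnder(types: string[]): { begin: number[]; ender: number[] } {
    const begin = new Array<number>(types.length).fill(-1);
    const ender = new Array<number>(types.length).fill(-1); // -1 until closed (assumption)
    const open: number[] = []; // indexes of start tokens not yet closed

    types.forEach((type, i) => {
        // A token belongs to the innermost open structure, or global (-1).
        begin[i] = open.length > 0 ? open[open.length - 1] : -1;
        if (type === "start") {
            open.push(i);
        } else if (type === "end") {
            const start = open.pop();
            if (start === undefined) {
                throw new Error(`unmatched end token at index ${i}`);
            }
            begin[i] = start; // an end token points back at its own start
            ender[i] = i;     // and its structure ends at itself
            ender[start] = i; // the start token now knows where it ends
            // Non-start tokens directly inside this structure end here too;
            // nested start tokens keep the ender of their own end token.
            for (let j = start + 1; j < i; j += 1) {
                if (begin[j] === start && types[j] !== "start") {
                    ender[j] = i;
                }
            }
        }
    });
    return { begin, ender };
}

// Returns every token of the structure that starts at `index`.
function structureTokens(token: string[], ender: number[], index: number): string[] {
    return token.slice(index, ender[index] + 1);
}

const types = ["start", "start", "attribute", "end", "end"];
const token = ["<a>", "<b>", "class=\"cat\"", "</b>", "</a>"];
const { begin, ender } = deriveBeginEnder(types);
console.log(JSON.stringify(begin)); // [-1,0,1,1,0]
console.log(JSON.stringify(ender)); // [4,3,3,3,4]
console.log(structureTokens(token, ender, 1).join(" ")); // <b> class="cat" </b>
```

Because each start token's ender points at its matching end token, extracting a structure is a single slice, with no need to rescan the tokens for a matching tag.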