The Abandon Wars

Saturday, April 07, 2007

Like the man said: Just a quickie.

All I have to say is that I finally documented how the parser definitions work. I was changing things so often I confused myself and decided that unless I pinned the definition down in text I'd never stop.

And here it is: the parser document in all its minimalist glory.

A definition is declared with the $ character prefixing the definition name. The one exception is the 'main' definition, which is always the first text in the file and is not named.

A node can be explicitly created by using parentheses. Elements in a node can be separated using one of three separators. The | character chooses only one of the elements in the node, eg: (A | B) will parse either A or B, not both.

The & character parses if all the elements parse. It is only valid with the ! modifier and is used to disallow elements eg: (A & !B & !C).

A 'space' - ie: no separator - means the elements must parse in order eg: (A B C). If any elements fail then the node fails.
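For the curious, here is a rough Python sketch of how the | and 'space' separators could behave. It is not the actual parser: the names lit, choice and seq are made up for illustration, a parser is assumed to be a function that takes a token list and a position and returns the new position (or None on failure), and | is assumed to try its alternatives left to right.

```python
# Hypothetical model: a parser is a function (tokens, pos) -> new pos, or None on failure.

def lit(expected):
    """Match a single token exactly."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return parse

def choice(*parsers):
    """(A | B): take the first alternative that parses (ordering is an assumption)."""
    def parse(tokens, pos):
        for p in parsers:
            end = p(tokens, pos)
            if end is not None:
                return end
        return None
    return parse

def seq(*parsers):
    """(A B C): every element must parse, in order, or the node fails."""
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse

a_or_b = choice(lit("A"), lit("B"))        # (A | B)
a_b_c = seq(lit("A"), lit("B"), lit("C"))  # (A B C)
```

Here a_or_b accepts a lone A or a lone B, while a_b_c fails as soon as any one of its three elements fails.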

The ! character modifies an element by logically negating the parse result. A node with a !-modified element will fail if the element parses normally (before negation).
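A minimal Python sketch of how & and ! might combine, again as an illustration rather than the real implementation. The helper names (lit, word, negate, all_of) are invented, and two things are assumed: every element of an & node is tried from the same start position, and a negated element consumes no input.

```python
# Hypothetical model: a parser is a function (tokens, pos) -> new pos, or None on failure.

def lit(expected):
    """Match a single token exactly."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return parse

def word(tokens, pos):
    """Match any purely alphabetic token (stand-in for a real element)."""
    if pos < len(tokens) and tokens[pos].isalpha():
        return pos + 1
    return None

def negate(parser):
    """!A: succeed, consuming nothing, only when A would fail here."""
    def parse(tokens, pos):
        return pos if parser(tokens, pos) is None else None
    return parse

def all_of(*parsers):
    """(A & !B & !C): every element must parse from the same start position."""
    def parse(tokens, pos):
        end = pos
        for p in parsers:
            result = p(tokens, pos)
            if result is None:
                return None
            end = max(end, result)  # keep whatever the non-negated element consumed
        return end
    return parse

# (word & !"if" & !"else"): any word that is not one of the disallowed ones
not_keyword = all_of(word, negate(lit("if")), negate(lit("else")))
```

So not_keyword accepts foo but rejects if and else, which is the "disallow elements" use described for &.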

A node may be followed by the + or - characters. The + indicates that the node will not be simplified eg: ((A)+) will remain ((A)) and not collapse to (A). The - means the contents of the element should be discarded and not added into the token tree. The node is still created as necessary.

Following the + or - is an optional node name. eg: (A)+MyName or (B)-OtherName. It is possible for the - node to be collapsed at which point the name is lost.
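One way to picture the + and - options is as flags on a tree node. The Python sketch below is only a guess at the behaviour: the dict layout and the keys "keep" (for +), "discard" (for -) and "name" are invented for the example, and "simplifying" is assumed to mean replacing a node that wraps a single child with that child.

```python
def simplify(node):
    """Collapse single-child wrapper nodes unless they are flagged with '+'."""
    if not isinstance(node, dict):
        return node  # a leaf token, nothing to simplify
    if node.get("discard"):
        # '-' : the node is still created, but its contents are thrown away
        return {"name": node.get("name"), "children": []}
    children = [simplify(child) for child in node["children"]]
    if len(children) == 1 and not node.get("keep"):
        return children[0]  # collapse the wrapper; any name on it is lost
    return {"name": node.get("name"), "children": children}

# ((A)) collapses all the way down to the leaf A
plain = simplify({"children": [{"children": ["A"]}]})

# ((A)+MyName): the inner node is flagged '+', so it survives with its name
kept = simplify({"children": [{"name": "MyName", "keep": True, "children": ["A"]}]})

# (B)-OtherName: the node exists but its contents are not added to the tree
dropped = simplify({"name": "OtherName", "discard": True, "children": ["B"]})
```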

A node's repetition can be set by the pattern ?-?. eg: (A) 0-1 makes the node A optional. (B) 1-1 makes B compulsory. If omitted the node is 1-1. The right hand digit may be replaced by an X, which implies no limit. eg: (C) 0-X means C can be repeated as often as needed or not be present at all.
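As a sketch of the repetition rule, the Python below reads a min-max spec and applies it to a parser. The function names (parse_repeat, repeat, lit) are hypothetical, None is used to stand for the X "no limit" case, and accepting multi-digit counts is an assumption, since the text only talks about digits.

```python
import re

def parse_repeat(spec):
    """Turn '0-1', '1-1' or '0-X' into (minimum, maximum); maximum None means no limit."""
    match = re.fullmatch(r"(\d+)-(\d+|X)", spec)
    if match is None:
        raise ValueError(f"bad repetition spec: {spec!r}")
    minimum = int(match.group(1))
    maximum = None if match.group(2) == "X" else int(match.group(2))
    return minimum, maximum

def lit(expected):
    """Match a single token exactly: (tokens, pos) -> new pos, or None."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return parse

def repeat(parser, minimum=1, maximum=1):
    """Apply parser between minimum and maximum times; the default is 1-1."""
    def parse(tokens, pos):
        count = 0
        while maximum is None or count < maximum:
            end = parser(tokens, pos)
            if end is None:
                break
            count += 1
            if end == pos:  # matched nothing; stop rather than loop forever
                break
            pos = end
        return pos if count >= minimum else None
    return parse

optional_a = repeat(lit("A"), *parse_repeat("0-1"))  # (A) 0-1
any_cs = repeat(lit("C"), *parse_repeat("0-X"))      # (C) 0-X
one_b = repeat(lit("B"))                             # (B), i.e. 1-1
```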

The . character is the last node option and stops the token tree from rolling back past it. eg: (C | (A B).) This is useful for error reporting, as otherwise the entire parse tree will roll back and confusingly report the root node as being in error.
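Here is one possible reading of the . option, sketched in Python. Everything in it is an assumption made for illustration: the helper names, the state dict with its "committed" entry, and the trailing ";" element tacked on after (C | (A B).) so that there is something left to fail. The idea is simply that once a . node has parsed, a later failure is reported at that point rather than back at the root.

```python
# Hypothetical model: a parser is (tokens, pos, state) -> new pos, or None on failure.

def lit(expected):
    def parse(tokens, pos, state):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return parse

def seq(*parsers):
    def parse(tokens, pos, state):
        for p in parsers:
            pos = p(tokens, pos, state)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    def parse(tokens, pos, state):
        for p in parsers:
            end = p(tokens, pos, state)
            if end is not None:
                return end
        return None
    return parse

def commit(parser):
    """(...). : once this node parses, rollback may not go back past its end."""
    def parse(tokens, pos, state):
        end = parser(tokens, pos, state)
        if end is not None:
            state["committed"] = max(state["committed"], end)
        return end
    return parse

# (C | (A B).) followed by a made-up ';' element
grammar = seq(choice(lit("C"), commit(seq(lit("A"), lit("B")))), lit(";"))

def run(tokens):
    """Return ('ok', end) or ('error', position where rollback stopped)."""
    state = {"committed": 0}
    end = grammar(tokens, 0, state)
    if end is None:
        return ("error", state["committed"])
    return ("ok", end)
```

With the input A B ? the error is pinned just after A B (position 2) instead of being blamed on the root at position 0.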

An element can be a string literal eg: "Hello" or character literal eg: 'c'. An element can also be one of the primitive keywords: integer, real, string or char. A string must be surrounded by double quotes, eg: "Yo!", but there is no constraint on what the string contains.

A character must be surrounded by single quotes eg: '$'. Escape codes can be used as normal in both strings and chars.

An element can be the keyword 'identifier' which parses C style identifiers, eg: iIndex or _Node38 but not 74Id or $%.
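The usual C identifier rule (a letter or underscore, followed by any number of letters, digits or underscores) can be written as a regular expression. This Python snippet is just a sketch of that rule; the names IDENTIFIER and is_identifier are made up, and the real 'identifier' keyword may differ in details.

```python
import re

# A letter or underscore first, then any mix of letters, digits and underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """True when the whole of text is a C style identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under this rule iIndex and _Node38 pass, while 74Id (starts with a digit) and $% (no legal characters at all) do not.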

Lastly an element can be a definition. eg: $DefinitionA can be used as ("Hello" DefinitionA). This replaces 'DefinitionA' with whatever the node $DefinitionA actually contained.
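The substitution can be pictured as a simple lookup. In this Python sketch a node is assumed to be a list of elements, string literals keep their double quotes so they cannot be mistaken for definition names, and the contents given to DefinitionA are invented for the example. A definition that refers to itself would recurse forever here, so treat it as a toy.

```python
def expand(node, definitions):
    """Replace each definition name with the node that definition contains."""
    if isinstance(node, list):
        return [expand(child, definitions) for child in node]
    if node in definitions:
        return expand(definitions[node], definitions)
    return node  # a literal or primitive keyword, left as it is

# $DefinitionA with made-up contents: ("World" integer)
definitions = {"DefinitionA": ['"World"', "integer"]}

# ("Hello" DefinitionA)
expanded = expand(['"Hello"', "DefinitionA"], definitions)
```

After expansion the name DefinitionA is gone and the node it stood for sits in its place: ("Hello" ("World" integer)).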


That's it. You may notice that the parser is described as the dumb beast alluded to last week. This has made life much, much simpler - which is better for all of us as I may actually start working on Space Crusade specifics again ;)

Later!

2 Comments:

  • Sounds like a regular expression parser ;)

    By Blogger Justin Paver, at 2:00 am  

  • Um... well... pretty much ;)

    Hold on! You actually read that? Even I didn't read it and I wrote it!

    The language definition thingy does contain a bit of meta information that it passes to the token parser.

    Disturbingly the token tree resembles XML but without the verbosity.

    By Blogger Andrew, at 9:48 am  
