Parser

The parser for the GMAD language is independent from the rest of BDSIM and can be found in the parser directory. The main parser interface can be found in parser.h. The parser is currently a singleton (only one instance with global scope).
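As a rough illustration of what "singleton" means here, the following C++ sketch shows one common way to guarantee a single instance. The class name ParserSketch and its members are hypothetical and do not reproduce the interface declared in parser.h.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser class; not the real parser.h interface.
class ParserSketch
{
public:
  // Return the single global instance, constructing it on first use.
  static ParserSketch& Instance()
  {
    static ParserSketch instance; // initialised exactly once
    return instance;
  }

  // Copying is forbidden so that no second instance can exist.
  ParserSketch(const ParserSketch&) = delete;
  ParserSketch& operator=(const ParserSketch&) = delete;

  void AddLine(const std::string& line) { lines.push_back(line); }
  std::size_t NumberOfLines() const { return lines.size(); }

private:
  ParserSketch() = default; // private: only Instance() can construct
  std::vector<std::string> lines;
};
```

Every call to ParserSketch::Instance() returns a reference to the same object, so state added through one reference is visible through any other.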

GMAD input is parsed by an LALR (Look-Ahead LR) parser generated by Bison (from the file parser.y), with a lexical analyser generated by flex (from parser.l). GMAD provides basic arithmetic and Boolean operators. It also provides predefined units, constants in SI units and some common mathematical functions, such as the trigonometric ones, by binding them to actual C variables and functions. The language has a global scope only.
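The idea of binding language-level names to C variables and functions can be sketched with two lookup tables. This is only an illustration: the table layout, the helper names and the particular entries (pi, m, cm, mrad, sin, cos, sqrt) are assumptions and are not taken from the BDSIM sources.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical lookup tables; names and entries are illustrative only.
using UnaryFunction = double (*)(double);

// Wrappers give each function an unambiguous double(double) signature.
static double SinWrapper(double x)  { return std::sin(x); }
static double CosWrapper(double x)  { return std::cos(x); }
static double SqrtWrapper(double x) { return std::sqrt(x); }

// Predefined constants and units, expressed in SI (metres, radians).
inline const std::map<std::string, double>& PredefinedConstants()
{
  static const std::map<std::string, double> table = {
    {"pi",   3.14159265358979323846},
    {"m",    1.0},
    {"cm",   1e-2},
    {"mrad", 1e-3}
  };
  return table;
}

// Names that the language exposes, bound to ordinary C++ functions.
inline const std::map<std::string, UnaryFunction>& PredefinedFunctions()
{
  static const std::map<std::string, UnaryFunction> table = {
    {"sin",  &SinWrapper},
    {"cos",  &CosWrapper},
    {"sqrt", &SqrtWrapper}
  };
  return table;
}

// Evaluate "name(argument)"; throws std::out_of_range for an unknown name.
inline double CallFunction(const std::string& name, double argument)
{
  return PredefinedFunctions().at(name)(argument);
}
```

With such tables, an expression like 5*cm only needs a lookup of cm followed by an ordinary multiplication.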

Note

GMAD is designed to be an extension of the syntax used by MAD, which means that any MAD syntax should be supported.

Parser Classes

All options for a BDSIM run are contained in an instance of the Options class. This is passed to BDSIM. The Options class is built out of a struct-like OptionsBase and a layer of self-inspection provided by the Published class. The struct-like OptionsBase is needed to output the options to the rootevent output.
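One way such a self-inspection layer can work is to map option names to pointer-to-member values, so that a plain struct can be filled by name. The sketch below illustrates that technique under stated assumptions: the class names carry a Sketch suffix, the two option members are invented for the example, and the real Published and OptionsBase classes may be organised differently.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Plain struct of values, similar in spirit to OptionsBase (members are hypothetical).
struct OptionsBaseSketch
{
  double beamEnergy   = 0.0;
  double lengthSafety = 0.0;
};

// Self-inspection layer: maps a member name to a pointer-to-member of C.
template <typename C>
class PublishedSketch
{
public:
  void Publish(const std::string& name, double C::*member) { doubles[name] = member; }

  // Set a member of 'instance' by name; throws if the name was never published.
  void Set(C* instance, const std::string& name, double value) const
  {
    auto it = doubles.find(name);
    if (it == doubles.end()) { throw std::runtime_error("unknown option: " + name); }
    instance->*(it->second) = value;
  }

private:
  std::map<std::string, double C::*> doubles;
};

// Options = struct of values + name-based access to those values.
class OptionsSketch : public PublishedSketch<OptionsBaseSketch>, public OptionsBaseSketch
{
public:
  OptionsSketch()
  {
    Publish("beamEnergy",   &OptionsBaseSketch::beamEnergy);
    Publish("lengthSafety", &OptionsBaseSketch::lengthSafety);
  }
  void SetValue(const std::string& name, double value) { Set(this, name, value); }
};
```

The struct part stays a simple collection of data members, which is convenient when it has to be written to an output file, while the name lookup lives entirely in the separate layer.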

All other parser classes like Element, Region, etc. also have this layer of self-inspection.

The Parameter class is a temporary storage class of the parameters before the actual Elements are created.

The enum ElementType has a list of all elements currently in BDSIM.

The Array class is the array representation used by Bison; an array holds either strings or doubles.

The file python.h provides the Python interface to the parser.

The Symtab class represents a parser variable. All variables are stored in a map.
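A minimal sketch of a variable class stored in a name-keyed map is given below. SymbolSketch and SymbolTableSketch are hypothetical names, and the members shown are assumptions made for illustration; the real Symtab interface may differ.

```cpp
#include <list>
#include <map>
#include <string>

// Hypothetical variable class: one object can hold any of the supported kinds.
struct SymbolSketch
{
  enum class Kind { None, Number, String, Array, Function };
  using Function = double (*)(double);

  Kind              kind     = Kind::None;
  double            number   = 0.0;
  std::string       text;
  std::list<double> array;
  Function          function = nullptr;

  void SetNumber(double v)                  { kind = Kind::Number;   number = v; }
  void SetString(const std::string& v)      { kind = Kind::String;   text = v; }
  void SetArray(const std::list<double>& v) { kind = Kind::Array;    array = v; }
  void SetFunction(Function f)              { kind = Kind::Function; function = f; }
};

// All variables live in one map keyed by name (a single global scope).
class SymbolTableSketch
{
public:
  // Return the variable called 'name', creating an empty one if needed.
  SymbolSketch& Lookup(const std::string& name) { return table[name]; }

  bool Has(const std::string& name) const { return table.count(name) > 0; }

private:
  std::map<std::string, SymbolSketch> table;
};
```

Because there is only one map, a name refers to the same variable everywhere in the input, which matches a language with global scope only.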

Bison

This section gives a brief overview of Bison. For more comprehensive reading, the Bison manual is recommended.

The parser.y file contains the typical four main sections:

%{
C declarations
%}

Bison token and types declarations

%%
Grammar rules
%%

Additional C code

The GMAD keywords are translated to Bison tokens in the lexer file parser.l.

C-declarations

The C-declarations are a few global variables.

Bison Tokens

Bison tokens (produced directly by the lexer) and types (more general grammar symbols) take their values from a union, whose members can be of the following types:

  • double

  • int (for the enum class ElementType)

  • std::string* (a pointer so its size can fit in the union; its memory is stored in the Parser class)

  • GMAD::Array*

  • GMAD::Symtab* (a pointer to a general symbol / variable class, which can represent a double, string, GMAD::Array or a function)
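The list above can be pictured as a C++ union. In the sketch below, the member names dval and str follow the type tags used in this section, while ival, array and symp, the namespace and the helper function are assumptions made for illustration. Keeping only built-in and pointer members makes the union trivial, so values can be copied freely; the pointed-to objects must then be owned elsewhere.

```cpp
#include <string>
#include <type_traits>

namespace sketch
{
  class Array;   // forward declarations are enough for pointer members
  class Symtab;

  // Hypothetical mirror of the semantic-value union.
  union SemanticValue
  {
    double       dval;  // numbers
    int          ival;  // e.g. an element type stored as an integer
    std::string* str;   // address only; the string itself lives elsewhere
    Array*       array;
    Symtab*      symp;
  };

  // Only built-in and pointer members: the union needs no constructor or destructor.
  static_assert(std::is_trivial<SemanticValue>::value,
                "the union should stay trivially copyable and constructible");

  // Store a string in the union: only the address is kept, the caller owns the memory.
  inline SemanticValue MakeStringValue(std::string* owned)
  {
    SemanticValue value;
    value.str = owned;
    return value;
  }
}
```

Whoever creates the string must keep it alive for as long as the parser may still dereference the pointer.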

The union type of each token is defined in the Bison declaration section of parser.y, for example:

%token <str> STR
%type <dval> aexpr

STR is a token of type string, and aexpr is a general number of type double.

A token can also have no value attached to it at all:

%token MATERIAL

Grammar Rules

The grammar rules define a syntax tree. Bison generates a bottom-up parser. It tries, by shifts and reductions, to reduce the entire input down to a single grouping whose symbol is the grammar’s start symbol, which in our case is input:

// every statement ends in a semicolon
input :
      | input stmt ';'

This rule is split into two parts:

  • Input can be empty (indicated by no text after the colon)

  • It is a recursive rule, where it breaks the input into statements (stmt) ending with a semicolon.
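The effect of the two alternatives above can be imitated by a hand-written loop: start from the empty input and repeatedly append one statement terminated by a semicolon. The function name SplitStatements is hypothetical, and the sketch deliberately ignores strings and comments, which the real lexer has to handle.

```cpp
#include <string>
#include <vector>

// Hand-written analogue of:  input : /* empty */ | input stmt ';'
// Left recursion in the grammar corresponds to a loop over statements.
inline std::vector<std::string> SplitStatements(const std::string& text)
{
  std::vector<std::string> statements; // the empty alternative: zero statements
  std::string current;
  for (char c : text)
  {
    if (c == ';')
    {
      statements.push_back(current);   // one more "input stmt ';'" recognised
      current.clear();
    }
    else
    {
      current += c;
    }
  }
  // Text after the last ';' is not a complete statement and is not returned.
  return statements;
}
```

An empty string gives an empty list, and text without a closing semicolon is never accepted as a statement, mirroring the rule that every statement ends in a semicolon.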

A rule can be split into as many parts as needed.

Another example is the rule for atomic statements (single lines without if constructs):

// atomic statements can be a mathematical expression, a declaration or a command
atomic_stmt :
| expr     { if(ECHO_GRAMMAR) printf("atomic_stmt -> expr\n"); }
| command  { if(ECHO_GRAMMAR) printf("atomic_stmt -> command\n"); }
| decl     { if(ECHO_GRAMMAR) printf("atomic_stmt -> decl\n"); }

The part inside the braces is the actual C code, which is only a debug printout in this case. The rules for expr, command and decl are defined elsewhere.

Rules can be tokens and types as well, and can have a value. For example, the rule for addition looks like:

aexpr : aexpr '+' aexpr      { $$ = $1 + $3; }

aexpr is a symbol of type double. The rule reduces the syntax “number + number” to a single number. The new value (indicated with $$) is the value of the first symbol ($1) plus the value of the third symbol ($3). Note that the second symbol is the ‘+’ token.
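The reduction can be pictured as an operation on a stack of values. The sketch below is a simplified model: the function name ReduceAddition and the use of std::vector as the stack are assumptions, and the slot for '+' carries an unused placeholder value.

```cpp
#include <cassert>
#include <vector>

// Simplified model of a value stack with one slot per grammar symbol.
// Reduce "aexpr '+' aexpr" on top of the stack to a single aexpr.
inline void ReduceAddition(std::vector<double>& values)
{
  assert(values.size() >= 3);          // three symbols on the right-hand side

  const double third = values.back();  // $3
  values.pop_back();
  values.pop_back();                   // $2: the '+' token, its value is unused
  const double first = values.back();  // $1
  values.pop_back();

  values.push_back(first + third);     // $$ = $1 + $3
}
```

For example, a stack holding 2.0, a placeholder for '+', and 3.0 is replaced by a single entry holding their sum.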

Debugging

Since adding or changing Bison rules can often have unforeseen consequences, it is strongly recommended, when extending the GMAD language, to first write a test case for the new syntax and check that it fails. There are many GMAD CMake tests in the parser/test directory.

Often the compiler will complain when the rules are inconsistent and the CMake tests cover many syntax cases which all should still work. For debugging there are several options in parser.y, all of which need recompilation:

  • The variables ECHO_GRAMMAR and INTERACTIVE can be switched on for extra output.

  • Run Bison with the “-t” flag. This is automatically done when CMAKE_BUILD_TYPE equals Debug.

  • Uncomment the line with %debug. This will print out the token stack after each step.