.. _dev-parser:

Parser
******

The parser for the GMAD language is independent of the rest of BDSIM and can be found in
the parser directory. The main parser interface can be found in :code:`parser.h`. The
parser is currently a singleton (only one instance with global scope).

The GMAD parser is an LALR (Look-Ahead LR) parser generated by Bison (from the file
:code:`parser.y`), with a lexical analyser generated by flex (from :code:`parser.l`).
GMAD provides basic arithmetic and Boolean operators. GMAD also provides predefined units,
constants in SI units and some common mathematical functions, such as the trigonometric
ones, by binding them to actual C variables and functions. GMAD has a single global scope.

.. note:: GMAD is designed to be an extension of the syntax used by MAD, which means that
          any MAD syntax should be supported.

Parser Classes
==============

All options for a BDSIM run are contained in an instance of the Options class, which is
passed to BDSIM. The Options class is built out of a struct-like OptionsBase and a layer of
self-inspection provided by the Published class. The struct-like OptionsBase is needed to
write the options to the rootevent output. All other parser classes, such as Element and
Region, also have this layer of self-inspection.

The Parameter class is a temporary storage class for the parameters before the actual
Elements are created. The enum ElementType lists all elements currently in BDSIM.

The Array class is an array representation for Bison that holds either strings or doubles.

The header :code:`python.h` provides the Python interface to the parser.

The Symtab class represents a parser variable. All variables are stored in a map.

Bison
=====

This section gives a brief overview of Bison. For more comprehensive reading, the Bison
manual is recommended.

The :code:`parser.y` file contains the typical four main sections::

   %{
   C declarations
   %}
   Bison token and types declarations
   %%
   Grammar rules
   %%
   Additional C code

The GMAD keywords are translated to Bison tokens in the *lexer* file :code:`parser.l`.

C-declarations
^^^^^^^^^^^^^^

The C-declarations are a few global variables.

Bison Tokens
^^^^^^^^^^^^

The values of Bison tokens (translated directly by the lexer) and types (more general
variables) are stored in a union, and can be one of the following types:

* double
* int (for the enum class ElementType)
* std::string* (a pointer so that its size fits in the union; its memory is owned by the
  Parser class)
* GMAD::Array*
* GMAD::Symtab* (a pointer to a general symbol / variable class, which can represent a
  double, string, GMAD::Array or a function)

The union type of each token is defined in the Bison declaration section of
:code:`parser.y`, for example::

   %token STR
   %type aexpr

*STR* is a token of type string, and *aexpr* is a general number of type double. A token
can also have no value attached to it at all::

   %token MATERIAL

Grammar Rules
^^^^^^^^^^^^^

The grammar rules define a syntax tree. Bison generates a bottom-up parser. It tries, by
shifts and reductions, to reduce the entire input down to a single grouping whose symbol is
the grammar's start symbol, which in our case is *input*::

   // every statement ends in a semicolon
   input :
         | input stmt ';'

This rule is split into two parts:

* Input can be empty (indicated by no text after the colon).
* It is a recursive rule, which breaks the input into statements (*stmt*), each ending with
  a semicolon.

A rule can be split into as many parts as needed.

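
The effect of the recursive *input* rule can be pictured with a small hand-written loop.
The following is only a sketch, not how Bison or the BDSIM parser is implemented: the
function name :code:`SplitStatements` is hypothetical, and it ignores details such as
semicolons inside quoted strings.

.. code-block:: cpp

   #include <cassert>
   #include <string>
   #include <vector>

   // Hypothetical helper, not part of the BDSIM parser.
   // Mimics "input : | input stmt ';'" by collecting every piece of text
   // that is terminated by a semicolon.
   std::vector<std::string> SplitStatements(const std::string& text)
   {
     std::vector<std::string> statements; // empty alternative: nothing matched yet
     std::string current;                 // characters of the statement being read
     for (char c : text)
       {
         if (c == ';')
           {// recursive alternative: the input so far, plus one more stmt ';'
             statements.push_back(current);
             current.clear();
           }
         else
           {current += c;}
       }
     // Text after the last ';' is not a complete statement and is dropped.
     return statements;
   }

An empty string gives no statements, which corresponds to the empty alternative of the
rule, while each terminating semicolon corresponds to one application of the recursive
alternative.
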
Another example is the rule for atomic statements (single lines without if constructs)::

   // atomic statements can be a mathematical expression, a declaration or a command
   atomic_stmt :
               | expr    { if(ECHO_GRAMMAR) printf("atomic_stmt -> expr\n"); }
               | command { if(ECHO_GRAMMAR) printf("atomic_stmt -> command\n"); }
               | decl    { if(ECHO_GRAMMAR) printf("atomic_stmt -> decl\n"); }

The part inside the braces is the actual C code, which in this case is only a debug
printout. The rules for *expr*, *command* and *decl* are defined elsewhere.

Rules can refer to tokens and types as well, and can have a value. For example, the rule
for addition looks like::

   aexpr | aexpr '+' aexpr { $$ = $1 + $3;}

*aexpr* is a variable of type double. The rule reduces the syntax "number + number" to a
single number. The new value (indicated with $$) is the value of the first token ($1) plus
the value of the third token ($3). Note that the second token is '+'.

Debugging
^^^^^^^^^

Since adding or changing Bison rules can often have unforeseen consequences, it is strongly
recommended, when extending the GMAD language, to first write a test case for the new
syntax and check that it fails. There are many GMAD CMake tests in the *parser/test*
directory. Often the compiler will complain when the rules are inconsistent, and the CMake
tests cover many syntax cases, all of which should still work.

For debugging there are several options in :code:`parser.y`, all of which require
recompilation:

* The variables ECHO_GRAMMAR and INTERACTIVE can be switched on for extra output.
* Run Bison with the :code:`-t` flag. This is done automatically when CMAKE_BUILD_TYPE
  equals Debug.
* Uncomment the line with %debug. This will print out the token stack after each step.
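
To get a feeling for what a stack printout after each step looks like, the sketch below
evaluates a chain of additions with an explicit stack and records the stack after every
shift and reduction. It is a toy model under simplifying assumptions, not Bison's
implementation or its trace format; the names :code:`StackItem`, :code:`DescribeStack` and
:code:`EvaluateSum` are hypothetical. The reduction step plays the role of the action
:code:`$$ = $1 + $3`.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <sstream>
   #include <string>
   #include <vector>

   // Hypothetical toy types, not part of the BDSIM parser.
   struct StackItem
   {
     char   symbol; // 'n' for an aexpr, '+' for the operator token
     double value;  // semantic value, only meaningful for 'n'
   };

   // Render the stack as text, e.g. "aexpr(1) + aexpr(2)".
   std::string DescribeStack(const std::vector<StackItem>& stack)
   {
     std::ostringstream out;
     for (std::size_t i = 0; i < stack.size(); ++i)
       {
         if (i > 0)
           {out << ' ';}
         if (stack[i].symbol == 'n')
           {out << "aexpr(" << stack[i].value << ")";}
         else
           {out << '+';}
       }
     return out.str();
   }

   // Evaluate numbers[0] + numbers[1] + ... and append the stack contents
   // to trace after every shift and every reduction.
   double EvaluateSum(const std::vector<double>& numbers,
                      std::vector<std::string>&  trace)
   {
     std::vector<StackItem> stack;
     for (std::size_t i = 0; i < numbers.size(); ++i)
       {
         if (i > 0)
           {// shift the '+' token that separates two numbers
             stack.push_back({'+', 0.0});
             trace.push_back("shift: " + DescribeStack(stack));
           }
         // shift the number itself
         stack.push_back({'n', numbers[i]});
         trace.push_back("shift: " + DescribeStack(stack));
         if (stack.size() >= 3)
           {// reduce "aexpr '+' aexpr" to one aexpr: $$ = $1 + $3
             double third = stack.back().value;
             stack.pop_back();
             stack.pop_back(); // discard '+', which is $2
             double first = stack.back().value;
             stack.pop_back();
             stack.push_back({'n', first + third});
             trace.push_back("reduce: " + DescribeStack(stack));
           }
       }
     return stack.empty() ? 0.0 : stack.back().value;
   }

Calling :code:`EvaluateSum` with the numbers 1, 2 and 3 returns 6 and records seven trace
entries, the last of which is :code:`reduce: aexpr(6)`.
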