bisonc++input(7) Organization of bisonc++'s grammar file(s)

DESCRIPTION

Bisonc++ derives from bison++(1), which was itself derived from bison(1). Like these programs, bisonc++ generates a parser for an LALR(1) grammar. Bisonc++ generates C++ code: an expandable C++ class.

Refer to bisonc++(1) for a general overview. This manual page covers the structure and organization of bisonc++'s grammar file(s).

Bisonc++'s grammar file has the following generic outline:

    directives (see the next section)
    %%
    grammar rules
        

Grammar rules have the following generic form:

    nonterminal:
        production-rules
    ;
        

Production rules consist of zero or more sequences of terminal tokens, nonterminal tokens and/or action blocks. When multiple production rules are used they must be separated from each other by vertical bars. Action blocks are C++ compound statements.

This manual page contains the following sections:

  • DESCRIPTION: this section;
  • DIRECTIVES: bisonc++'s grammar-specification directives;
  • POLYMORPHIC SEMANTIC VALUES: how to use polymorphic semantic values in parsers generated by bisonc++;
  • DOLLAR NOTATIONS: available $-shorthand notations with single, union, and polymorphic semantic value types;
  • RESTRICTIONS ON TOKEN NAMES: name restrictions for user-defined symbols;
  • OBSOLETE SYMBOLS: symbols available to bison(1), but not to bisonc++;
  • EXAMPLE: an example of using bisonc++;
  • USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS: how to refer to tokens defined in the grammar from within a lexical scanner;
  • AUTHOR: at the end of this man-page.

DIRECTIVES

Quite a few directives can be specified in the initial section of the grammar specification file. If command-line options for directives are available, then their specifications take precedence over the corresponding directives in the grammar file. Once class header or implementation header files exist, directives affecting those files are ignored.

Directives accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); directives accepting a `pathname' may contain directory separators. A `pathname' containing blank characters should be surrounded by double quotes.

Some directives may generate errors. This happens when their specifications conflict with the contents of files bisonc++ cannot modify (e.g., a parser class header file exists that doesn't define a namespace, but in a later run a %namespace directive was provided).

To resolve such errors the offending directive could be omitted, the existing file could be removed, or the existing file could be hand-edited according to the directive's specification.

o
%baseclass-header filename
Filename defines the name of the file to contain the parser's base class. This class defines, e.g., the parser's symbolic tokens. Defaults to the name of the parser class plus the suffix base.h. This directive is overruled by the --baseclass-header (-b) command-line option.
It is an error if this directive is used and an already existing parser class header file does not contain #include "filename".
o
%baseclass-preinclude pathname
Pathname defines the path to the file preincluded by the parser's base-class header. See the description of the --baseclass-preinclude option for details about this directive. By default, bisonc++ surrounds pathname by double quotes. However, when pathname itself is surrounded by pointed brackets, #include <pathname> is included.
o
%class-header filename
Filename defines the name of the file to contain the parser class. Defaults to the name of the parser class plus the suffix .h. This directive is overruled by the --class-header (-c) command-line option.
It is an error if this directive is used and an already existing implementation header file does not contain #include "filename".
o
%class-name parser-class-name
Declares the name of the parser class. It defines the name of the C++ class that is generated. If no %class-name is specified the default class name Parser is used.
It is an error if this directive is used and an already existing parser-class header file does not define class `className' and/or if an already existing implementation header file does not define members of the class `className'.
o
%debug
Add debugging code to the generated parse function and its support functions, which can show (on the standard output stream) the steps performed by the parsing function while it parses input streams. When this directive is specified, the parsing steps are shown by default. The setDebug members can be used to suppress outputting these parsing steps. #ifdef DEBUG macros are not used. Existing debugging code can be removed by rerunning bisonc++ without specifying the debug option or directive.
o
%default-actions off|quiet|warn|std
By default, bisonc++ adds a $$ = $1 action block to rules not having final action blocks, but not to empty production rules. This default behavior can also explicitly be configured using the default-actions std option or directive.
Bisonc++ also supports alternate ways of handling rules not having final action blocks. When off is specified, bisonc++ does not add $$ = $1 action blocks; when polymorphic semantic values are used, then specifying
- warn adds specialized action blocks, using the semantic types of the first elements of the production rules, while issuing a warning;
- quiet adds these action blocks without issuing warnings.
When either warn or quiet is specified, the types of $$ and $1 must match. When bisonc++ detects a type mismatch it issues an error.
o
%error-verbose
This directive can be specified to dump the parser's state stack to the standard output stream when the parser encounters a syntactic error. The stack dump shows on separate lines a stack index followed by the state stored at the indicated stack element. The first stack element is the stack's top element.
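The ordering of such a stack dump can be sketched in a few lines of C++. This is only an illustration: the function names and the `index: state` line layout are assumptions, and the code is not taken from bisonc++'s generated parser.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write one line per stack element: the stack index, then the state stored
// at that element. The stack's top (the vector's last element) comes first.
// The "index: state" layout is an assumption, not bisonc++'s exact format.
void dumpStateStack(std::ostream &out, std::vector<std::size_t> const &states)
{
    for (std::size_t idx = states.size(); idx-- != 0; )
        out << idx << ": " << states[idx] << '\n';
}

// Convenience wrapper returning the dump as a string.
std::string stateStackDump(std::vector<std::size_t> const &states)
{
    std::ostringstream out;
    dumpStateStack(out, states);
    return out.str();
}
```

For a stack holding the states 0, 4 and 7 (7 on top) this sketch writes the element with index 2 first and the element with index 0 last.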
o
%expect number
This directive specifies the exact number of shift/reduce and reduce/reduce conflicts for which no warnings are to be generated. Details of the conflicts are reported in the verbose output file (e.g., grammar.output). If the number of actually encountered conflicts deviates from `number', then this directive is ignored.
o
%filenames filename
Filename is a generic filename that is used for all header files generated by bisonc++. Options defining specific filenames are also available (which then, in turn, overrule the name specified by this directive). This directive is overruled by the --filenames (-f) command-line option.
o
%flex
When provided, the scanner member returning the matched text is called as d_scanner.YYText(), and the scanner member returning the next lexical token is called as d_scanner.yylex(). This directive is only interpreted if the %scanner directive is also provided.
o
%implementation-header filename
Filename defines the name of the file to contain the implementation header. It defaults to the name of the generated parser class plus the suffix .ih.
The implementation header should contain all directives and declarations that are only used by the parser's member functions. It is the only header file that is included by the source file containing parse's implementation. User defined implementation of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the parser class in one header file.
o
%include pathname
This directive is used to switch to pathname while processing a grammar specification. Unless pathname defines an absolute file-path, pathname is searched relative to the location of bisonc++'s main grammar specification file (i.e., the grammar file that was specified as bisonc++'s command-line option). This directive can be used to split long grammar specification files into shorter, meaningful units. After processing pathname, processing continues beyond the %include pathname directive.
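The lookup rule for %include arguments can be modelled with std::filesystem (C++17). The function name resolveInclude is hypothetical, and the sketch only mirrors the rule described above (absolute paths are used as-is, other paths are relative to the main grammar file's directory); it is not bisonc++'s own implementation.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Resolve the argument of an %include directive: an absolute pathname is
// used as-is, any other pathname is taken relative to the directory that
// contains the main grammar specification file.
fs::path resolveInclude(fs::path const &mainGrammar, fs::path const &pathname)
{
    if (pathname.is_absolute())
        return pathname;

    return mainGrammar.parent_path() / pathname;
}
```

With a main grammar file /home/user/proj/grammar (a made-up location), an argument rules/expr would be looked up as /home/user/proj/rules/expr on a POSIX system.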
o
%left terminal ...
Defines the names of symbolic terminal tokens that must be treated as left-associative. I.e., in case of a shift/reduce conflict, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority. See also %token below.
o
%locationstruct struct-definition
Defines the organization of the location-struct data type LTYPE__. This struct should be specified analogously to the way the parser's stacktype is defined using %union (see below). By default (if neither locationstruct nor LTYPE__ is specified) the standard location struct (see the next directive) is used.
o
%lsp-needed
This directive results in bisonc++ generating a parser using the standard location stack. This stack's default type is:
    struct LTYPE__
    {
        int timestamp;
        int first_line;
        int first_column;
        int last_line;
        int last_column;
        char *text;
    };
           
Bisonc++ does not provide the elements of the LTYPE__ struct with values. Action blocks of production rules may refer to the location stack element associated with a production element using @ variables, like @1.timestamp, @3.text, @5. The rule's location struct itself may be referred to as either d_loc__ or @@.
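Since the location values are not filled in automatically, user code has to maintain them. The following C++ sketch shows one way to compute a location spanning several components. The struct repeats the default layout under the name Location (names containing double underscores are best left to generated code), and the helper span is hypothetical.

```cpp
#include <cassert>

// Same members as the default location struct shown above.
struct Location
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};

// Hypothetical helper: the location starting where `first` starts and
// ending where `last` ends. All other members are copied from `first`.
Location span(Location const &first, Location const &last)
{
    assert(first.first_line <= last.last_line);

    Location ret = first;
    ret.last_line = last.last_line;
    ret.last_column = last.last_column;
    return ret;
}
```

Assuming the scanner or earlier actions have filled @1 and @3, an action block could then contain a statement along the lines of `@@ = span(@1, @3);`.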
o
%ltype typename
Specifies a user-defined token location type. If %ltype is used, typename should be the name of an alternate (predefined) type (e.g., size_t). It should not be used if a %locationstruct specification is defined (see above). Within the parser class, this type is available as the type `LTYPE__'. All text on the line following %ltype is used for the typename specification. It should therefore not contain comment or any other characters that are not part of the actual type definition.
o
%namespace namespace
Define all of the code generated by bisonc++ in the namespace namespace. By default no namespace is defined. If this directive is used the implementation header is provided with a commented out using namespace declaration for the specified namespace. In addition, the parser and parser base class header files also use the specified namespace to define their include guard directives.
It is an error if this directive is used and an already existing parser-class header file and/or implementation header file does not define namespace identifier.
o
%negative-dollar-indices
Do not generate warnings when zero- or negative dollar-indices are used in the grammar's action blocks. Zero or negative dollar-indices are commonly used to implement inherited attributes, and should normally be avoided. When used, they can be specified like $-1, or like $<type>-1, where type is empty; an STYPE__ tag; or a field-name. However, note that in combination with the %polymorphic directive (see below) only the $-i format can be used.
o
%no-lines
By default #line preprocessor directives are inserted just before action statements in the file containing the parser's parse function. These directives are suppressed by the %no-lines directive.
o
%nonassoc terminal ...
Defines the names of symbolic terminal tokens that should be treated as non-associative. I.e., in case of a shift/reduce conflict, a reduction is preferred over a shift. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority. See also %token below.
o
%parsefun-source filename
Filename defines the name of the file to contain the parser member function parse. Defaults to parse.cc. This directive is overruled by the --parse-source (-p) command-line option.
o
%polymorphic polymorphic-specification(s)
Bison's traditional way of handling multiple semantic values is to use a %union specification (see below). Although %union is supported by bisonc++, a polymorphic semantic value class is preferred due to its improved type safety.
The %polymorphic directive defines a polymorphic semantic value class and can be used instead of a %union specification. Refer to section POLYMORPHIC SEMANTIC VALUES below or to bisonc++'s user manual for a detailed description of the specification, characteristics, and use of polymorphic semantic values.
o
%prec token
Overrules the defined precedence of an operator for a particular grammatical rule. A well known application of %prec is:
    expression:
        '-' expression %prec UMINUS
        {
            ...
        }
                
Here, the default priority and precedence of the `-' token as the subtraction operator is overruled by the precedence and priority of the UMINUS token, which is commonly defined as
    %right UMINUS
                
(see below) following, e.g., the '*' and '/' operators.
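The effect of %prec on a shift/reduce conflict can be illustrated with the conventional yacc-style resolution scheme. The sketch below assumes that bisonc++ follows this scheme for %left and %right tokens; the types and the function resolve are hypothetical, and %nonassoc is not modelled.

```cpp
enum class Assoc { Left, Right };
enum class Action { Shift, Reduce };

struct Precedence
{
    int level;      // higher value: declared later, binds more tightly
    Assoc assoc;
};

// Resolve a shift/reduce conflict between a rule (whose precedence may have
// been set using %prec) and the lookahead token.
Action resolve(Precedence const &rule, Precedence const &lookahead)
{
    if (rule.level > lookahead.level)
        return Action::Reduce;

    if (rule.level < lookahead.level)
        return Action::Shift;

    // Equal levels: associativity decides (left: reduce, right: shift).
    return lookahead.assoc == Assoc::Left ? Action::Reduce : Action::Shift;
}
```

With '-' at level 1, '*' at level 2 and UMINUS at level 3, the rule `'-' expression %prec UMINUS` is reduced when '*' is the lookahead token, whereas without %prec the '*' would be shifted.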
o
%print-tokens
This directive provides an implementation of the Parser class's print__ function, displaying the current token value and the text matched by the lexical scanner as received by the generated parse function.
o
%required-tokens number
Following a syntactic error, require at least number successfully processed tokens before another syntactic error can be reported. By default number is zero.
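The idea behind this directive can be sketched as a small counter class. ErrorGate and its members are hypothetical names, and resetting the counter at every error (reported or not) is an assumption of this sketch rather than documented bisonc++ behavior.

```cpp
#include <cstddef>

class ErrorGate
{
    std::size_t d_required;     // the %required-tokens value
    std::size_t d_processed;    // tokens processed since the last error

    public:
        explicit ErrorGate(std::size_t required)
        :
            d_required(required),
            d_processed(required)   // so the first error is always reported
        {}

        // Called for each successfully processed token.
        void tokenProcessed()
        {
            ++d_processed;
        }

        // Called at a syntactic error: true if the error may be reported.
        bool error()
        {
            bool report = d_processed >= d_required;
            d_processed = 0;        // assumption: every error restarts counting
            return report;
        }
};
```

With the default value zero every syntactic error is reported; with a value of 2, an error that follows a previous error after fewer than two processed tokens is not reported.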
o
%right terminal ...
Defines the names of symbolic terminal tokens that should be treated as right-associative. I.e., in case of a shift/reduce conflict, a shift is preferred over a reduction. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority. See also %token below.
o
%scanner pathname
Use pathname as the path name to the file pre-included in the parser's class header. See the description of the --scanner option for details about this directive. Similar to the convention adopted for this argument, pathname by default is surrounded by double quotes. However, when the argument is surrounded by pointed brackets #include <pathname> is included. This directive results in the definition of a composed Scanner d_scanner data member into the generated parser, and in the definition of a int lex() member, returning d_scanner.lex().
By specifying the %flex directive the function d_scanner.yylex() is called. Any other function to call can be specified using the --scanner-token-function option (or %scanner-token-function directive).
It is an error if this directive is used and an already existing parser class header file does not include `pathname'.
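The relation between the parser and its composed scanner can be pictured as follows. This is a reduced model, not generated code: the Scanner class below is a hypothetical stand-in returning pre-recorded tokens, and only the d_scanner member and the lex member described above are modelled (lex is public here merely to keep the sketch easy to try out).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical scanner: returns pre-recorded token values, and 0 once all
// tokens have been returned.
class Scanner
{
    std::vector<int> d_tokens;
    std::size_t d_next = 0;

    public:
        explicit Scanner(std::vector<int> tokens)
        :
            d_tokens(std::move(tokens))
        {}

        int lex()
        {
            return d_next < d_tokens.size() ? d_tokens[d_next++] : 0;
        }
};

// Only the parts of a parser class that relate to %scanner are modelled.
class Parser
{
    Scanner d_scanner;          // the composed scanner object

    public:
        explicit Parser(Scanner scanner)
        :
            d_scanner(std::move(scanner))
        {}

        int lex()               // forwards to the scanner's token function
        {
            return d_scanner.lex();
        }
};
```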
o
%scanner-class-name scannerClassName
Defines the name of the scanner class, declared by the pathname header file that is specified at the scanner option or directive. By default the class name Scanner is used.
It is an error if this directive is used and either the scanner directive was not provided, or the parser class interface in an already existing parser class header file does not declare a scanner class d_scanner object.
o
%scanner-matched-text-function function-call
The scanner function returning the text that was matched by the lexical scanner after its token function (see below) has returned. A complete function call expression should be provided (including a scanner object, if used). Example:
    %scanner-matched-text-function myScanner.matchedText()
                
By specifying the %flex directive the function d_scanner.YYText() is called.
If the function call contains white space, function-call should be surrounded by double quotes.
o
%scanner-token-function function-call
The scanner function returning the next token, called from the generated parser's lex function. A complete function call expression should be provided (including a scanner object, if used). Example:
    %scanner-token-function d_scanner.lex()
                
If the function call contains white space, function-call should be surrounded by double quotes.
It is an error if this directive is used and the scanner token function is not called from the code in an already existing implementation header.
o
%stack-expansion size
Defines the number of elements to be added to the generated parser's semantic value stack when it must be enlarged. By default 10 elements are added to the stack. This option/directive is interpreted only once, and only if size at least equals the default stack expansion size of 10.
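Growing a stack by a fixed number of elements can be sketched as shown below. The class ValueStack is a hypothetical model using int values; it merely mirrors the description above (sizes below 10 are not used) and is not the stack implementation generated by bisonc++.

```cpp
#include <cstddef>
#include <vector>

// Simplified value stack, enlarged by a fixed number of elements when full.
class ValueStack
{
    std::vector<int> d_values;
    std::size_t d_used = 0;
    std::size_t d_expansion;

    public:
        explicit ValueStack(std::size_t expansion = 10)
        :
            d_expansion(expansion < 10 ? 10 : expansion)    // at least 10
        {}

        void push(int value)
        {
            if (d_used == d_values.size())                  // stack is full
                d_values.resize(d_values.size() + d_expansion);

            d_values[d_used++] = value;
        }

        std::size_t used() const
        {
            return d_used;
        }

        std::size_t capacity() const
        {
            return d_values.size();
        }
};
```

Pushing an eleventh value onto a default ValueStack enlarges it from 10 to 20 elements.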
o
%start nonterminal
The nonterminal nonterminal should be used as the grammar's start-symbol. If omitted, the first grammatical rule is used as the grammar's starting rule. All syntactically correct sentences must be derivable from this starting rule.
o
%stype typename
The type of the semantic value of nonterminal tokens. By default it is int. %stype, %union, and %polymorphic are mutually exclusive directives.
Within the parser class, the semantic value type is available as the type `STYPE__'. All text on the line following %stype is used for the typename specification. It should therefore not contain comment or any other characters that are not part of the actual type definition.
o
%tag-mismatches on|off
This directive is only interpreted when polymorphic semantic values are used. When on is specified (which is used by default) the parse member of the generated parser dynamically checks that the tag that is used when calling a semantic value's get member matches the actual tag of the semantic value.
If a mismatch is observed, then the parsing function aborts after displaying a fatal error message. If this happens, and if the option/directive debug was specified when bisonc++ created the parser's parsing function, then the program can be rerun, specifying parser.setDebug(Parser::ACTIONCASES) before calling the parsing function. As a result the case-entry numbers of the switch, defined in the parser's executeAction member, are inserted into the standard output stream. The action case number reported just before the program displays the fatal error message tells you in which of the grammar's action blocks the error was encountered.
o
%target-directory pathname
Pathname defines the directory where generated files should be written. By default this is the directory where bisonc++ is called. This directive is overruled by the --target-directory command-line option.
o
%token terminal ...
Defines the names of symbolic terminal tokens. Sequences of %left, %nonassoc, %right and %token directives may be used to define the precedence of operators. In expressions, the first used directive defines the tokens having the lowest precedence, the last used defines the tokens having the highest priority.
NOTE: Symbolic tokens are defined as enum-values in the parser's base class. The names of symbolic tokens may not be equal to the names of the members and types defined by bisonc++ itself (see the next sections). This requirement is not enforced by bisonc++, but compilation errors may result if this requirement is violated.
o
%type <type> nonterminal ...
In combination with %polymorphic or %union: associate the semantic value of a nonterminal symbol with a polymorphic semantic value tag or union field defined by these directives.
o
%union union-definition
Acts identically to the identically named bison and bison++ declaration. Bisonc++ generates a union, named STYPE__, as its semantic type.
o
%weak-tags
This directive is ignored unless the %polymorphic directive was specified. It results in the declaration of enum Tag__ rather than enum class Tag__. When in doubt, don't use this directive.

POLYMORPHIC SEMANTIC VALUES

Like bison(1), bisonc++ by default uses int semantic values, and also supports the %stype and %union directives for using single-type or traditional C-type unions as semantic values. These types of semantic values are covered in bisonc++'s manual.

In addition, the %polymorphic directive can be specified to generate a parser using `polymorphic' semantic values. In this case semantic values are specified as pairs, consisting of tags (which are C++ identifiers), and C++ (pointer or value) type names. Tags and type names are separated by colons. Multiple tag and type name combinations are separated by semicolons, and an optional semicolon ends the final tag/type pair.

Here is an example, defining three semantic values: an int, a std::string and a std::vector<double>:

    %polymorphic INT: int; STRING: std::string; 
                 VECT: std::vector<double>
        
The identifier to the left of the colon is called the tag-identifier (or simply tag), and the type name to the right of the colon is called the type-name. Starting with bisonc++ version 4.12.00 the types no longer have to provide default constructors.

When polymorphic type-names refer to types that have not yet been declared by the parser's base class header, then these types must be (directly or indirectly) declared in a header file whose location is specified using the %baseclass-preinclude directive.

%type directives are used to associate (non-)terminals with semantic value types. E.g., after:

    %polymorphic INT: int; TEXT: std::string
    %type <INT> expr
        
the expr nonterminal returns int semantic values. In a rule like:
    expr:
        expr '+' expr
        {
            // Action block: C++ statements here.
        }
        
symbols $$, $1, and $3 represent int values, and can be used that way in the C++ action block.

Definitions and declarations

The %polymorphic directive adds the following definitions and declarations to the generated base class header and parser source file (if the %namespace directive was used then all declared/defined elements are placed inside the namespace that is specified by the %namespace directive):

  • All semantic value type identifiers are collected in a strongly typed `Tag__' enumeration. E.g.,
        enum class Tag__
        {
            INT,
            STRING,
            VECT
        };
            
    
  • An anonymous enum defining the symbolic constant sizeofTag__ equal to the number of tags in the Tag__ enumeration.
  • The namespace Meta__ contains almost all of the code implementing polymorphic values.

The namespace Meta__ contains, among other classes, the class SType. The parser's semantic value type STYPE__ is equal to Meta__::SType.

STYPE__ equals Meta__::SType

Meta__::SType provides the standard user interface for using polymorphic semantic data types. It declares the following public interface:

o
Constructors: Default, copy and move constructors. No data can be retrieved from SType objects that were constructed by SType's default constructor, but they can accept values of defined polymorphic types, which may then be retrieved from those objects.
o
Operators: The standard overloaded assignment operators (copy and move assignment operators) are available.
In addition the member templates
    SType &operator=(Type const &value)
and 
    SType &operator=(Type &&tmp)
    
can be used for all polymorphic semantic value types. Type must (maybe after casting) exactly match one of the defined polymorphic semantic types, because Type is used to determine the appropriate Meta__::Tag__ value.
When operator=(Type const &value) is used, the left-hand side SType object receives a copy of value; when operator=(Type &&tmp) is used, tmp is move-assigned to the left-hand side SType object.
o
void assign<tag>(Args &&...args)
The tag template argument must be a Tag__ value. This member function constructs a semantic value of the type matching tag from the arguments that are passed to this member (zero arguments are OK if the type associated with tag supports default construction). The constructed value (not a copy of this value) is then stored in the STYPE__ object for which assign has been called.
As a Meta__::Tag__ value must be specified when using assign the compiler can use the explicit tag to convert assign's arguments to an SType object of the type matching the specified tag.
The member assign can be used to store a specific polymorphic semantic value in an STYPE__ object. It differs from the set of operator=(Type) members in that assign accepts multiple arguments to construct the requested SType value from, whereas the operator= members only accept single arguments of defined polymorphic types.
To initialize an STYPE__ object with a default STYPE__ value, direct assignment can be used (e.g., d_lval__ = STYPE__{}).
o
DataType &get<tag>(), and DataType const &get<tag>() const
These members return references to the object's semantic values. The tag must be a Tag__ value: its specification tells the compiler which semantic value type it must use.
When the option/directive tag-mismatches on was specified then get, when called from the generated parse function, performs a run-time check to confirm that the specified tag corresponds to the object's actual Tag__ value. If a mismatch is observed, then the parsing function aborts with a fatal error message. When shorthand notations (like $$ and $1) are used in production rules' action blocks, then bisonc++ can determine the correct tag, preventing the run-time check from failing.
But once a fatal error is encountered, it can be difficult to determine which action block generated the error. If this happens, then consider regenerating the parser specifying the --debug option, calling
parser.setDebug(Parser::ACTIONCASES)
before calling the parser's parse function.
Following this, the case-entry numbers of the switch which is defined in the parser's executeAction member are inserted into the standard output stream just before the matching statements are executed. The action case number that's reported just before the program reports the fatal error tells you in which of the grammar's action blocks the error was encountered.
o
Tag__ tag() const
The tag matching the semantic value's polymorphic type is returned. The returned value is a valid Tag__ value when the SType object's valid member returns true.
By default, or after assigning a plain (default) STYPE__ object to an STYPE__ object (e.g., using a statement like $$ = STYPE__{}), valid returns false, and the tag member returns Meta__::sizeofTag__.
o
bool valid() const
The value true is returned if the object contains a semantic value. Otherwise false is returned. Note that default STYPE__ values can be assigned to STYPE__ objects, but they do not represent valid semantic values. See also the previous description of the tag member.
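The interplay of assign, get, tag and valid can be tried out with a small stand-alone model built on std::variant (C++17). This is not the code bisonc++ generates: the names Tag and SemanticValue are hypothetical stand-ins for Tag__ and Meta__::SType, the enumerator sizeofTag stands in for the sizeofTag__ constant, the three value types follow the earlier %polymorphic example, and a tag mismatch throws an exception here instead of aborting the parsing function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for the generated Tag__ enumeration.
enum class Tag { INT, STRING, VECT, sizeofTag };

// Simplified model of a polymorphic semantic value.
class SemanticValue
{
    // std::monostate models a default constructed (invalid) value.
    std::variant<std::monostate, int, std::string, std::vector<double>> d_data;

    // Position of tagValue's type in d_data (index 0 is std::monostate).
    static constexpr std::size_t index(Tag tagValue)
    {
        return static_cast<std::size_t>(tagValue) + 1;
    }

    public:
        // Construct the value matching tagValue in place from args.
        template <Tag tagValue, typename ...Args>
        void assign(Args &&...args)
        {
            d_data.emplace<index(tagValue)>(std::forward<Args>(args)...);
        }

        // Return the stored value; throw if another tag is stored.
        template <Tag tagValue>
        auto &get()
        {
            if (tag() != tagValue)
                throw std::logic_error("tag mismatch");

            return std::get<index(tagValue)>(d_data);
        }

        Tag tag() const
        {
            return valid() ? static_cast<Tag>(d_data.index() - 1)
                           : Tag::sizeofTag;
        }

        bool valid() const
        {
            return d_data.index() != 0;
        }
};
```

In this model, `value.assign<Tag::STRING>(3, 'x')` constructs the string "xxx" in place from two arguments, which is the kind of multi-argument construction that distinguishes assign from the single-argument operator= members.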

DOLLAR NOTATIONS

Inside action blocks dollar-notations can be used to retrieve and assign values from/to the elements of production rules. %type directives are used to associate dollar-notations with semantic types.

When %stype is specified (and with the default int semantic value type) the following dollar-notations are available:

o
$$ =
A value is assigned to the rule's nonterminal's semantic value. The right-hand side (rhs) of the assignment expression must be an expression of a type that can be assigned to the STYPE__ type.
o
$$(expr)
Same as the previous dollar-notation: expr's value is assigned to the rule's nonterminal's semantic value.
o
_$$
This refers to the semantic value of the rule's nonterminal.
o
$$
Same as the previous item: this refers to the semantic value of the rule's nonterminal.
o
$$.
If STYPE__ is a class-type then this dollar-notation is shorthand for the member selector operator, applied to the rule's nonterminal's semantic value.
o
$$->
If STYPE__ is a class-type then this dollar-notation is shorthand for the pointer to member operator, applied to the rule's nonterminal's semantic value.
o
_$1
This refers to the current production rule's first component's semantic value.
o
$1
Same as the previous dollar-notation: this refers to the current production rule's first component's semantic value.
o
$1.
If STYPE__ is a class-type then this dollar-notation is shorthand for the member selector operator, applied to the current production rule's first component's semantic value.
o
$1->
If STYPE__ is a class-type then this dollar-notation is shorthand for the pointer to member operator, applied to the current production rule's first component's semantic value.
o
_$-1
This refers to the semantic value of a component in a production rule, listed immediately before the current rule's nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
o
$-1
Same as the previous item: this refers to the semantic value of a component in a production rule, listed immediately before the current rule's nonterminal.
o
$-1.
If STYPE__ is a class-type then this dollar-notation is shorthand for the member selector operator, applied to the semantic value of some production rule element, 1 element before the current rule's nonterminal.
o
$-1->
If STYPE__ is a class-type then this dollar-notation is shorthand for the pointer to member operator, applied to the semantic value of some production rule element, 1 element before the current rule's nonterminal.
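How positive dollar-indices relate to a semantic value stack can be sketched as follows, using the default int semantic value type. The names SemanticStack, component and reduceAddition are hypothetical, the '+' token's own stack slot is given a dummy value, and zero or negative indices are deliberately left out of this model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SemanticStack = std::vector<int>;     // int: the default value type

// The value of component idx (1-based) of a production rule having
// nComponents components, whose values occupy the top of the stack.
int &component(SemanticStack &stack, std::size_t nComponents, std::size_t idx)
{
    assert(1 <= idx && idx <= nComponents && nComponents <= stack.size());
    return stack[stack.size() - nComponents + idx - 1];
}

// Models the reduction by:  expr: expr '+' expr { $$ = $1 + $3; }
void reduceAddition(SemanticStack &stack)
{
    int lhs = component(stack, 3, 1) + component(stack, 3, 3);  // $1 + $3

    stack.resize(stack.size() - 3);     // pop the rule's three components
    stack.push_back(lhs);               // push the value assigned to $$
}
```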

When %union is specified these dollar-notations are available:

  • $$ =
  • A value is assigned to the rule's nonterminal's semantic value. If the rule's nonterminal was associated with one of the union's field types, then the matching union field receives the value of the assignment expression's right-hand side. If no association was defined then the variable representing the nonterminal's semantic value is a plain union (i.e., STYPE__) variable.
  • $$(expr)
  • Expr's value is assigned to the rule's nonterminal's plain union (i.e., STYPE__) type. Any association that may have been defined between the nonterminal and a union field is ignored.
  • _$$
  • This refers to the rule's nonterminal's plain union (i.e., STYPE__) type. Any association that may have been defined between the nonterminal and a union field is ignored.
  • $$
  • This refers to the rule's nonterminal's semantic value. If it was associated with one of the union's types, then $$ refers to the associated union field. If no association was defined then $$ represents a plain union (i.e., STYPE__) type of variable.
  • $$.
  • If the rule's nonterminal's semantic value was associated with one of the union's types, then $$. is shorthand for the member selector operator, applied to the associated union field type. If no association was defined then $$. is shorthand for the field selector operator, applied to the nonterminal's semantic value's plain union (i.e., STYPE__) type.
  • $$->
  • If the rule's nonterminal's semantic value was associated with one of the union's types, then $$-> is shorthand for the pointer to member operator, applied to the associated union field type. If no association was defined then an error message is issued, as the pointer to member operator is not defined for plain union types.
  • _$1
  • This refers to the current production rule's first component's plain union (STYPE__) value.
  • $1
  • This shorthand refers to the semantic value of the production rule's first element. If it was associated with one of the union's types, then $1 refers to the associated union field. If no association was defined then $1 represents a plain union (i.e., STYPE__) type of variable.
  • $1.
  • If the production rule's first component's semantic value was associated with one of the union's types, then $1. is shorthand for the member selector operator, applied to the associated union field type. If no association was defined then $1. is shorthand for the field selector operator, applied to the first component's semantic value's plain union (i.e., STYPE__) type.
  • $1->
  • If the production rule's first component's semantic value was associated with one of the union's types, then $1-> is shorthand for the pointer to member operator, applied to the associated union field type. If no association was defined then an error message is issued, as the pointer to member operator is not defined for plain union types.
  • _$-1
  • This refers to the plain union (STYPE__) value of a component in a production rule, listed immediately before the current rule's nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
  • $-1
  • Same: this refers to the plain union (STYPE__) value of a component in a production rule, listed immediately before the current rule's nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
  • $-1.
  • This is shorthand for the field selector operator applied to the plain union (STYPE__) value of some production rule element, 1 element before the current rule's nonterminal.
  • $-1->
  • This shorthand refers to the pointer to member operator applied to the plain union (STYPE__) value of some production rule element, 1 element before the current rule's nonterminal. Its use results in an error message, as the pointer to member operator is not defined for plain union types.
  • $<field>-1
  • This refers to the field union field of a component in a production rule, listed immediately before the current rule's nonterminal. Note that the validity of the specified field for that particular component cannot be verified by bisonc++.
  • $<field>-1.
  • This refers to the member selector operator of the field union field of a component in a production rule, listed immediately before the current rule's nonterminal. Note that the validity of the specified field for that particular component cannot be verified by bisonc++.
  • $<field>-1->
  • This refers to the pointer to member operator of the field union field of a component in a production rule, listed immediately before the current rule's nonterminal. Note that the validity of the specified field for that particular component cannot be verified by bisonc++.
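
As a rough illustration of the union notations above, the following C++ sketch models what the shorthands in an action block might correspond to. The union SType (standing in for STYPE__), its fields number and text, the function names, and the plain array playing the role of the value stack are all hypothetical; the code that bisonc++ actually generates differs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical %union with two fields; stands in for STYPE__.
union SType
{
    int number;
    std::string *text;
};

// Models a rule `sum: sum '+' term`, assuming that sum and term were
// associated with the `number` field (e.g., via %type <number> sum term).
// vsp[0] plays the role of $1, vsp[2] the role of $3.
inline SType reduceSum(SType const *vsp)
{
    SType result;                         // plays the role of _$$
    result.number =                       // $$ selects the associated field
        vsp[0].number + vsp[2].number;    // $1 + $3
    return result;
}

// Models $<text>-1->size(): the field is named explicitly, and nothing
// verifies that the element really holds a std::string pointer.
inline std::size_t precedingTextLength(SType const &elementBefore)
{
    return elementBefore.text->size();
}
```

The second function shows why the explicit-field forms need care: the field is selected purely on the programmer's say-so.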

When %polymorphic is specified these dollar-notations can be used:

o
$$ =
A semantic value is assigned to the rule's nonterminal's semantic value. The right-hand side (rhs) of the assignment expression must be an expression of the type that is associated with $$. This assignment operation assumes that the type of the rhs-expression equals $$'s semantic value type. If the types don't match the compiler issues a compilation error when compiling parse.cc. Casting the rhs to the correct value type is possible, but in that case the function call operator (see the next item) is preferred, as it does not require casting. If no semantic value type was associated with $$ then the assignment $$ = STYPE__{} can be used.
o
$$(expr)
A value is assigned to the rule's nonterminal's semantic value. Expr must be of a type that can be statically cast to $$'s semantic value type. The required static_cast is generated by bisonc++ and doesn't have to be specified for expr.
o
_$$
This refers to the rule's nonterminal's semantic value, disregarding any polymorphic type that might have been associated with the rule's nonterminal.
o
$$
If no polymorphic type was associated with the rule's nonterminal then this is shorthand for a reference to the rule's plain STYPE__ value. If a polymorphic value type was associated with the rule's nonterminal then this shorthand represents a reference to a value of that particular type.
o
$$.
If no polymorphic type was associated with the rule's nonterminal then this is shorthand for the member selector operator, applied to a reference to the rule's nonterminal's STYPE__ value. If a polymorphic value type was associated with the rule's nonterminal then this shorthand represents the member selector operator, applied to a reference of that particular type.
o
$$->
If no polymorphic type was associated with the rule's nonterminal then this is shorthand for the pointer to member operator, applied to a reference to the rule's nonterminal's STYPE__ value. If a polymorphic value type was associated with the rule's nonterminal then this shorthand represents the pointer to member operator, applied to a reference of that particular type.
o
_$1
This refers to the current production rule's first component's generic STYPE__ value.
o
$1
This shorthand refers to the semantic value of the production rule's first element. If it was associated with a polymorphic type, then $1 refers to a value of that particular type. If no association was defined then $1 represents a generic STYPE__ value.
o
$1.
If the production rule's first component's semantic value was associated with a polymorphic type, then $1. is shorthand for the member selector operator, applied to the value of the associated polymorphic type. If no association was defined then $1. is shorthand for the member selector operator, applied to the first component's generic STYPE__ value.
o
$1->
If the production rule's first component's semantic value was associated with a polymorphic type, then $1-> is shorthand for the pointer to member operator, applied to the value of the associated polymorphic type. If no association was defined then $1-> is shorthand for the pointer to member operator, applied to the first component's generic STYPE__ value.
o
_$-1
This refers to the generic (STYPE__) value of a component in a production rule, listed immediately before the current rule's nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
o
$-1
Same: this refers to the generic (STYPE__) value of a component in a production rule, listed immediately before the current rule's nonterminal ($-2 refers to a component used two elements before the current nonterminal, etc.).
o
$-1.
This is shorthand for the member selector operator applied to the generic STYPE__ value of some production rule element, 1 element before the current rule's nonterminal.
o
$-1->
This is shorthand for the pointer to member operator applied to the generic STYPE__ value of some production rule element, 1 element before the current rule's nonterminal.
o
$<tag>-1
This shorthand represents a reference to the semantic value of the polymorphic type associated with tag of some production rule element, 1 element before the current rule's nonterminal.
If, when using the generated parser's class parse function, the polymorphic type of that element turns out not to match the type that is associated with tag then a run-time fatal error results.
If that happens, and the debug option/directive had been specified when bisonc++ was run, then rerun the program after specifying parser.setDebug(Parser::ACTIONCASES) to locate the parse function's action block where the fatal error was encountered.
o
$<tag>-1.
This shorthand represents the member selector operator, applied to the semantic value of the polymorphic type associated with tag of some production rule element, 1 element before the current rule's nonterminal.
If, when using the generated parser's class parse function, the polymorphic type of that element turns out not to match the type that is associated with tag then a run-time fatal error results. The procedure suggested at the previous ($<tag>-1) item for solving such errors can be applied here as well.
o
$<tag>-1->
This shorthand represents the pointer to member operator, applied to the semantic value of the polymorphic type associated with tag of some production rule element, 1 element before the current rule's nonterminal.
If, when using the generated parser's class parse function, the polymorphic type of that element turns out not to match the type that is associated with tag then a run-time fatal error results. The procedure suggested at the previous ($<tag>-1) item for solving such errors can be applied here as well.
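
The following C++ sketch illustrates two ideas from the list above: assignment through a generated static_cast (as with $$(expr)) and the run-time type check performed for $<tag>-1. The class SemanticValue, its members, and the tags INT and TEXT are hypothetical stand-ins, and throwing std::logic_error is merely this sketch's way of modelling the fatal error; it is not bisonc++'s generated code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical tags, each associated with one value type.
enum class Tag { INT, TEXT };

// Stand-in for the generic semantic value: stores one tagged value.
class SemanticValue
{
    Tag d_tag = Tag::INT;
    int d_int = 0;
    std::string d_text;

    public:
        // Models $$(expr) for a nonterminal associated with INT:
        // the static_cast is supplied here, not by the caller.
        template <typename Expr>
        void assignInt(Expr const &expr)
        {
            d_tag = Tag::INT;
            d_int = static_cast<int>(expr);
        }

        // Same, for a nonterminal associated with TEXT.
        template <typename Expr>
        void assignText(Expr const &expr)
        {
            d_tag = Tag::TEXT;
            d_text = static_cast<std::string>(expr);
        }

        // Models $<INT>-1: fails at run-time if the stored tag differs.
        int &asInt()
        {
            if (d_tag != Tag::INT)
                throw std::logic_error("semantic value is not tagged INT");
            return d_int;
        }

        // Models $<TEXT>-1.
        std::string &asText()
        {
            if (d_tag != Tag::TEXT)
                throw std::logic_error("semantic value is not tagged TEXT");
            return d_text;
        }
};
```

A mismatch is only detected when the accessor is called, which is why such errors surface while parsing rather than when the parser is compiled.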

RESTRICTIONS ON TOKEN NAMES

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

  • Identifiers ending in two underscores;
  • Any of the following identifiers: ABORT, ACCEPT, ERROR, clearin, debug, or setDebug.
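
The two restrictions can be expressed as a small check. This is a minimal sketch; the function name isRestrictedTokenName is hypothetical, and nothing here implies that bisonc++ itself offers or performs such a check in this form.

```cpp
#include <string>

// Returns true if `name` falls under one of the restrictions listed above:
// it ends in two underscores, or it equals one of the reserved identifiers.
inline bool isRestrictedTokenName(std::string const &name)
{
    static char const *const reserved[] =
        { "ABORT", "ACCEPT", "ERROR", "clearin", "debug", "setDebug" };

    if (name.size() >= 2 && name.compare(name.size() - 2, 2, "__") == 0)
        return true;

    for (char const *identifier: reserved)
        if (name == identifier)
            return true;

    return false;
}
```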

OBSOLETE SYMBOLS

All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore as they can simply be declared in the class header file and defined elsewhere.

EXAMPLE

Using a fairly worn-out example, we'll construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.

First an associated grammar is constructed. When a syntactic error is encountered all tokens are skipped until the next newline and a simple message is printed using the default error function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatical production rule is recognized. The grammar itself is rather standard and straightforward, but note the first part of the specification file, containing various other directives, among which the %scanner directive, resulting in a composed d_scanner object as well as an implementation of the member function int lex. In this example, a common Scanner class construction strategy was used: the class Scanner was derived from the class yyFlexLexer generated by flex++(1). The actual process of constructing a class using flex++(1) is beyond the scope of this man-page, but flex++(1)'s specification file is mentioned below, to further complete the example. Here is bisonc++'s input file:

%filenames parser
%scanner ../scanner/scanner.h
                                // lowest precedence
%token  NUMBER                  // integral numbers
        EOLN                    // newline
%left   '+' '-' 
%left   '*' '/' 
%right  UNARY
                                // highest precedence 
%%
expressions:
    expressions  evaluate
|
    prompt
;
evaluate:
    alternative prompt
;
prompt:
    {
        prompt();
    }
;
alternative:
    expression EOLN
    {
        cout << $1 << endl;
    }
|
    'q' done
|
    EOLN
|
    error EOLN
;
done:
    {
        cout << "Done.\n";
        ACCEPT();
    }
;
expression:
    expression '+' expression
    {
        $$ = $1 + $3;
    }
|
    expression '-' expression
    {
        $$ = $1 - $3;
    }
|
    expression '*' expression
    {
        $$ = $1 * $3;
    }
|
    expression '/' expression
    {
        $$ = $1 / $3;
    }
|
    '-' expression      %prec UNARY
    {
        $$ = -$2;
    }
|
    '+' expression      %prec UNARY
    {
        $$ = $2;
    }
|
    '(' expression ')'
    {
        $$ = $2;
    }
|
    NUMBER
    {
        $$ = stoul(d_scanner.matched());
    }
;
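
To make the effect of the actions and the precedence declarations concrete, the following C++ sketch replays the reductions for the input 2 + 3 * 4 on a conceptual value stack. The names ValueStack, reduceBinary and replayExample are hypothetical, and the stack is a simplification: the generated parser maintains its own state and value stacks.

```cpp
#include <vector>

// Conceptual value stack holding the semantic values of expressions.
using ValueStack = std::vector<int>;

// Models reducing `expression: expression <op> expression`: the values of
// $1 and $3 are replaced by the value assigned to $$. The operator token's
// own stack slot is omitted to keep the sketch short.
inline void reduceBinary(ValueStack &stack, char op)
{
    int const rhs = stack.back();       // $3
    stack.pop_back();
    int const lhs = stack.back();       // $1
    stack.pop_back();

    int result = 0;
    switch (op)
    {
        case '+': result = lhs + rhs; break;
        case '-': result = lhs - rhs; break;
        case '*': result = lhs * rhs; break;
        case '/': result = lhs / rhs; break;   // no division-by-zero check
    }
    stack.push_back(result);            // $$
}

// Replays "2 + 3 * 4": '*' is declared after '+', giving it the higher
// precedence, so 3 * 4 is reduced before the addition.
inline int replayExample()
{
    ValueStack stack{2, 3, 4};          // values of the three NUMBER tokens
    reduceBinary(stack, '*');
    reduceBinary(stack, '+');
    return stack.back();
}
```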

Next, bisonc++ processes this file. In the process, bisonc++ generates the following files from its skeletons:

o
The parser's base class, which should not be modified by the programmer:
// Generated by Bisonc++ V5.00.00 on Wed, 13 Apr 2016 10:19:24 +0530
#ifndef ParserBase_h_included
#define ParserBase_h_included
#include <exception>
#include <vector>
#include <iostream>
namespace // anonymous
{
    struct PI__;
}
class ParserBase
{
    public:
        enum DebugMode__
        {
            OFF           = 0,
            ON            = 1 << 0,
            ACTIONCASES   = 1 << 1
        };
// $insert tokens
    // Symbolic tokens:
    enum Tokens__
    {
        NUMBER = 257,
        EOLN,
        UNARY,
    };
// $insert STYPE
typedef int STYPE__;
    private:
        int d_stackIdx__ = -1;
        std::vector<size_t>   d_stateStack__;
        std::vector<STYPE__>  d_valueStack__;
    protected:
        enum Return__
        {
            PARSE_ACCEPT__ = 0,   // values used as parse()'s return values
            PARSE_ABORT__  = 1
        };
        enum ErrorRecovery__
        {
            DEFAULT_RECOVERY_MODE__,
            UNEXPECTED_TOKEN__,
        };
        bool        d_actionCases__ = false;
        bool        d_debug__ = true;
        size_t      d_nErrors__ = 0;
        size_t      d_requiredTokens__;
        size_t      d_acceptedTokens__;
        int         d_token__;
        int         d_nextToken__;
        size_t      d_state__;
        STYPE__    *d_vsp__;
        STYPE__     d_val__;
        STYPE__     d_nextVal__;
        ParserBase();
        void ABORT() const;
        void ACCEPT() const;
        void ERROR() const;
        void clearin();
        bool actionCases() const;
        bool debug() const;
        void pop__(size_t count = 1);
        void push__(size_t nextState);
        void popToken__();
        void pushToken__(int token);
        void reduce__(PI__ const &productionInfo);
        void errorVerbose__();
        size_t top__() const;
    public:
        void setDebug(bool mode);
        void setDebug(DebugMode__ mode);
}; 
inline ParserBase::DebugMode__ operator|(ParserBase::DebugMode__ lhs, 
                                     ParserBase::DebugMode__ rhs)
{
    return static_cast<ParserBase::DebugMode__>(static_cast<int>(lhs) | rhs);
};
inline bool ParserBase::debug() const
{
    return d_debug__;
}
inline bool ParserBase::actionCases() const
{
    return d_actionCases__;
}
inline void ParserBase::ABORT() const
{
    throw PARSE_ABORT__;
}
inline void ParserBase::ACCEPT() const
{
    throw PARSE_ACCEPT__;
}
inline void ParserBase::ERROR() const
{
    throw UNEXPECTED_TOKEN__;
}
// For convenience, when including ParserBase.h its symbols are available as
// symbols in the class Parser, too.
#define Parser ParserBase
#endif

o
The parser class parser.h itself. In the grammar specification various member functions are used (e.g., done and prompt). These functions are so small that they can very well be implemented inline. Note that done calls ACCEPT to terminate further parsing. ACCEPT and related members (e.g., ABORT) can be called from any member called by parse. As a consequence, action blocks could contain mere function calls, rather than several statements, thus minimizing the need to rerun bisonc++ when an action is modified.
After bisonc++ created parser.h the additional members were added to it, resulting in the following final version:
// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:49:17 +0200
#ifndef Parser_h_included
#define Parser_h_included
// $insert baseclass
#include "parserbase.h"
// $insert scanner.h
#include "../scanner/scanner.h"
#undef Parser
class Parser: public ParserBase
{
    // $insert scannerobject
    Scanner d_scanner;
        
    public:
        int parse();
    private:
        void error(char const *msg);    // called on (syntax) errors
        int lex();                      // returns the next token from the
                                        // lexical scanner. 
        void print();                   // use, e.g., d_token, d_loc
        void prompt();
        void done();
    // support functions for parse():
        void executeAction(int ruleNr);
        void errorRecovery();
        int lookup(bool recovery);
        void nextToken();
        void print__();
        void exceptionHandler__(std::exception const &exc);
};
inline void Parser::prompt()
{
    std::cout << "? " << std::flush;
}
inline void Parser::done()
{
    std::cout << "Done\n";
    ACCEPT();
}
#endif

o
The lexical scanner specification, used by flexc++(1) to create the lexical scanner, completes the example.
%interactive
%filenames scanner
%%
[ \t]+                          // skip white space
\n                              return Parser::EOLN;
[0-9]+                          return Parser::NUMBER;
.                               return matched()[0];
%%

o
Since no member functions other than parse were defined in separate source files, only parse includes parser.ih. Since the grammar's actions use cout, endl and stoul without the std:: prefix, a using namespace std or comparable directive is required. It was specified at the end of parser.ih. Here is the implementation header declaring the standard namespace:
// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:51:26 +0200
    // Include this file in the sources of the class Parser.
// $insert class.h
#include "parser.h"
inline void Parser::error(char const *msg)
{
    std::cerr << msg << '\n';
}
// $insert lex
inline int Parser::lex()
{
    return d_scanner.lex();
}
inline void Parser::print()         
{
    print__();           // displays tokens if --print was specified
}
inline void Parser::exceptionHandler__(std::exception const &exc)         
{
    throw;              // re-implement to handle exceptions thrown by actions
}
    // Add here includes that are only required for the compilation 
    // of Parser's sources.
    // UN-comment the next using-declaration if you want to use
    // int Parser's sources symbols from the namespace std without
    // specifying std::
using namespace std;

In the current context the parsing member function parse's implementation is not very relevant, since it should not be modified by the programmer. It is not shown here, but is available in the example's source file calculator/parser/parse.cc.
o
Finally, here is the program's main function:
#include "parser/parser.h"
int main()
{
    Parser calculator;
    return calculator.parse();
}
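
The ACCEPT and ABORT members shown in parserbase.h terminate parsing by throwing an enum value, which is why they can be called from any member that is called by parse. The sketch below illustrates that technique in isolation. The names Return, runParse and the free functions are hypothetical stand-ins (trailing underscores were dropped), and the assumption that parse catches the thrown value and returns it is inferred from the comment on Return__ rather than taken from parse.cc.

```cpp
#include <functional>

// Mirrors the values of the Return__ enum shown in parserbase.h.
enum Return
{
    PARSE_ACCEPT = 0,
    PARSE_ABORT  = 1
};

inline void ACCEPT() { throw PARSE_ACCEPT; }
inline void ABORT()  { throw PARSE_ABORT; }

// Hypothetical stand-in for parse(): runs `actions` and turns a thrown
// Return value into the function's return value.
inline int runParse(std::function<void ()> const &actions)
{
    try
    {
        actions();              // may call ACCEPT() or ABORT() at any depth
    }
    catch (Return result)
    {
        return result;
    }
    return PARSE_ABORT;         // assumption of this sketch: running out of
                                // actions without ACCEPT counts as failure
}
```

Because the value travels as an exception, intermediate functions need no special return codes to end parsing.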

USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS

Although the file parserbase.h, defining the parser class' base-class, rather than the header file parser.h defining the parser class is included by scanner.ih, the lexical scanner may simply return tokens of the class Parser (e.g., Parser::NUMBER rather than ParserBase::NUMBER). This former specification is considered somewhat more intuitively appealing than the latter specification. It was realized by a simple #define - #undef pair generated by bisonc++ near the end of parserbase.h and just before the definition of the parser class itself in the file parser.h. Note that this feature can only be used to access base class types and enum values. The actual parser class is not available by the time the lexical scanner is being defined, avoiding circular class dependencies.
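
The #define - #undef construction can be tried in a single translation unit. In this minimal sketch the contents of parserbase.h, the scanner, and parser.h are reduced to a few lines each; the helper tokenFor is hypothetical, and the enum is named Tokens rather than Tokens__.

```cpp
// Stand-in for the generated base class; only the token enum is shown.
class ParserBase
{
    public:
        enum Tokens { NUMBER = 257, EOLN, UNARY };
};

// As at the end of parserbase.h: until the #undef, Parser names ParserBase.
#define Parser ParserBase

// Hypothetical scanner helper, compiled before class Parser is defined.
inline int tokenFor(char ch)
{
    if (ch == '\n')
        return Parser::EOLN;        // expands to ParserBase::EOLN
    if (ch >= '0' && ch <= '9')
        return Parser::NUMBER;      // expands to ParserBase::NUMBER
    return ch;
}

// As in parser.h: drop the alias just before the real class is defined.
#undef Parser

class Parser: public ParserBase
{};
```

Since Parser inherits the enum, Parser::NUMBER denotes the same value before and after the #undef.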

AUTHOR

Frank B. Brokken ([email protected]).