bisonc++(1) Generate a C++ parser class and parsing function

SYNOPSIS

bisonc++ [OPTIONS] grammar-file

DESCRIPTION

Bisonc++ derives from previous work on bison by Alain Coetmeur ([email protected]), who in the early '90s created a C++ class encapsulating the yyparse function as generated by the GNU-bison parser generator.

Initial versions of bisonc++ (up to version 0.92) wrapped Alain's program in a program offering a more modern user-interface, removing all old-style (C) %define directives from bison++'s input specification file (see below for an in-depth discussion of the differences between bison++ and bisonc++). Starting with version 0.98, bisonc++ represents a complete rebuild of the parser generator, closely following descriptions given in Aho, Sethi and Ullman's Dragon Book. Since version 0.98 bisonc++ is a C++ program, rather than a C program generating C++ code.

Bisonc++ expands the concepts initially implemented in bison and bison++, offering a cleaner setup of the generated parser class. The parser class is derived from a base-class, mainly containing the parser's token- and type-definitions as well as several member functions which should not be modified by the programmer.

Most of these base-class members could also have been defined directly in the parser class, but were instead placed in the parser's base class. This design results in a very lean parser class, declaring only members that are actually defined by the programmer or that have to be defined by bisonc++ itself (e.g., the member function parse as well as some support functions requiring access to facilities that are only available in the parser class itself, rather than in the parser's base class).

This design does not require any virtual members: the members which are not involved in the actual parsing process may always be (re)implemented directly by the programmer. Thus there is no need to apply or define virtual member functions.
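
The layout described above can be sketched as follows. This is a minimal, hand-written illustration and not code produced by bisonc++: the token values, the data members, the errors member, the signature of error, and the toy body of parse are all assumptions made for the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the generated base class (hypothetical, heavily simplified):
// here it only holds the symbolic tokens.
class ParserBase
{
    public:
        enum Tokens
        {
            NUMBER = 257,           // token values are assumptions
            IDENT
        };
};

// Stand-in for the lean parser class: it declares parse and the members
// provided by the programmer. No virtual members are involved.
class Parser: public ParserBase
{
    std::vector<int> d_tokens;      // canned input replacing a real scanner
    std::size_t d_next = 0;
    std::size_t d_errors = 0;

    public:
        explicit Parser(std::vector<int> tokens)
        :
            d_tokens(std::move(tokens))
        {}

        int parse();                // normally generated by bisonc++
        std::size_t errors() const { return d_errors; }

    private:
        int lex();                  // provided by the programmer
        void error();               // may be reimplemented by the programmer
};

int Parser::lex()
{
    // 0 signals the end of the input
    return d_next < d_tokens.size() ? d_tokens[d_next++] : 0;
}

void Parser::error()
{
    ++d_errors;
}

// Toy replacement for the generated parsing function: it accepts any
// sequence of NUMBER and IDENT tokens, returning 0 on success, 1 otherwise.
int Parser::parse()
{
    while (int token = lex())
    {
        if (token != NUMBER && token != IDENT)
        {
            error();
            return 1;
        }
    }
    return 0;
}
```

Since error is an ordinary member of Parser, changing its behavior only requires editing its implementation; no overriding is needed.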

Before version 5.00.00 bisonc++ offered one single manual page. The advantage of one man-page is of course that you never have to look for which manual page contains which information. But on the other hand, bisonc++'s man-page grew into a huge man-page of about 2000 lines in which it was hard to find your way. From release 5.00.00 onward, three man-pages are provided. The following index relates manual pages to their specific contents:

This man-page

This man-page concentrates on the bisonc++ program itself, offering the following sections:

  • DESCRIPTION: a short description of bisonc++ and its roots;
  • OPTIONS: options supported by bisonc++;
  • QUICK START: a quick start overview about how to use bisonc++;
  • GENERATED FILES: files generated by bisonc++ and their purposes;
  • FILES: skeleton files used by bisonc++;
  • SEE ALSO: references to other programs and documentation;
  • BUGS: some additional stuff that should not qualify as bugs;
  • ABOUT bisonc++: some history;
  • AUTHOR: at the end of this man-page.

The bisonc++input(7) man-page covers the details of the grammar-specification file. This man-page offers these sections:

  • DESCRIPTION: a short description of bisonc++ and its grammar file(s);
  • DIRECTIVES: bisonc++'s grammar-specification directives;
  • POLYMORPHIC SEMANTIC VALUES: how to use polymorphic semantic values in parsers generated by bisonc++;
  • DOLLAR NOTATIONS: available $-shorthand notations with single, union, and polymorphic semantic value types;
  • RESTRICTIONS ON TOKEN NAMES: name restrictions for user-defined symbols;
  • OBSOLETE SYMBOLS: symbols available to bison(1), but not to bisonc++;
  • EXAMPLE: an example of using bisonc++;
  • USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS: how to refer to tokens defined in the grammar from within a lexical scanner;
  • SEE ALSO: references to other programs and documentation;
  • AUTHOR: at the end of this man-page.

The bisonc++api(3) man-page describes the application programmer's interface, containing these sections:

  • DESCRIPTION: a short description of bisonc++ and its application programmer's interface;
  • PUBLIC MEMBERS AND -TYPES: members and types that can be used by calling software;
  • PRIVATE ENUMS AND -TYPES: enumerations and types only available to the Parser class;
  • PRIVATE MEMBER FUNCTIONS: member functions that are only available to the Parser class;
  • PRIVATE DATA MEMBERS: data members that are only available to the Parser class;
  • TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE: an overview of the types and variables that are used to define and store the grammar-tables generated by bisonc++;
  • SEE ALSO: references to other programs and documentation;
  • AUTHOR: at the end of this man-page.

OPTIONS

Where available, single letter options are listed between parentheses following their associated long-option variants. Single letter options require arguments if their associated long options also require arguments. Options affecting the class- or implementation header files are ignored if these files already exist. Options accepting a `filename' do not accept path names, i.e., they cannot contain directory separators (/); options accepting a `pathname' may contain directory separators.

Some options may cause errors. This happens when they conflict with the contents of a file which bisonc++ cannot modify (e.g., a parser class header file exists and doesn't define a namespace, while a --namespace option was specified).

To solve the error the offending option could be omitted; the existing file could be removed; or the existing file could be hand-edited according to the option's specification.

Note that bisonc++ currently does not handle the opposite error condition: if a previously used option is omitted, then bisonc++ does not report an inconsistency. In those cases compilation errors may be observed.

o
--analyze-only (-A)
Only analyze the grammar. No files are (re)written. This option can be used to test the grammatical correctness of modifications `in situ', without overwriting previously generated files. If the grammar contains syntactic errors only syntax analysis is performed.
o
--baseclass-header=filename (-b)
Filename defines the name of the file to contain the parser's base class. This class defines, e.g., the parser's symbolic tokens. Defaults to the name of the parser class plus the suffix base.h. It is generated, unless otherwise indicated (see --no-baseclass-header and --dont-rewrite-baseclass-header below).
It is an error if this option is used and an already existing parser class header file does not contain #include "filename".
o
--baseclass-preinclude=pathname (-H)
Pathname defines the path to the file preincluded in the parser's base-class header. This option is needed in situations where the base class header file refers to types which might not yet be known. E.g., with polymorphic semantic values a std::string value type might be used. Since the string header file is not by default included in parserbase.h, the compiler must be informed about this and possibly other headers. The suggested procedure is to use a pre-include header file declaring the required types. By default `header' is surrounded by double quotes: #include "header" is used when the option -H header is specified. When the argument is surrounded by angle brackets #include <header> is included. In the latter case, quotes might be required to escape interpretation by the shell (e.g., using -H '<header>').
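As an illustration, a pre-include header could look like the sketch below. The file name preinclude.h, the include guard and the type Variable are hypothetical; only the idea of declaring the required types before the base class header is processed comes from the description above.

```cpp
// preinclude.h (hypothetical file name)
#ifndef INCLUDED_PREINCLUDE_H_
#define INCLUDED_PREINCLUDE_H_

#include <string>
#include <vector>

// Types used as semantic values must be known before the parser's
// base class header refers to them.
struct Variable                 // hypothetical user-defined type
{
    std::string name;
    double value;
};

#endif
```

Assuming this file is stored next to the grammar, it could be passed to bisonc++ using -H preinclude.h, resulting in #include "preinclude.h" in the base class header.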
o
--baseclass-skeleton=pathname (-B)
Pathname defines the path name to the file containing the skeleton of the parser's base class. It defaults to the installation-defined default path name (e.g., /usr/share/bisonc++/ plus bisonc++base.h).
o
--class-header=filename (-c)
Filename defines the name of the file to contain the parser class. Defaults to the name of the parser class plus the suffix .h.
It is an error if this option is used and an already existing implementation header file does not contain #include "filename".
o
--class-name className
Defines the name of the C++ class that is generated. If neither this option nor the %class-name directive is specified, then the default class name (Parser) is used.
It is an error if this option is used and className differs from the name of the class that is defined in an already existing parser-class header file and/or if an already existing implementation header file does not define members of the class `className'.
o
--class-skeleton=pathname (-C)
Pathname defines the path name to the file containing the skeleton of the parser class. It defaults to the installation-defined default path name (e.g., /usr/share/bisonc++/ plus bisonc++.h).
o
--construction
Details about the construction of the parsing tables are written to the same file as written by the --verbose option (i.e., <grammar>.output, where <grammar> is the input file read by bisonc++). This information is primarily useful for developers. It augments the information written to the verbose grammar output file, generated by the --verbose option.
o
--debug
Provide the generated parse function and its support functions with debugging code, optionally showing the actual parsing process on the standard output stream. When included, the debugging output is active by default, but its activity may be controlled using the setDebug(bool on-off) member. Bisonc++ does not use #ifdef DEBUG macros. Rerun bisonc++ without the --debug option to remove the debugging code.
Note that this option does not show the actions of bisonc++'s own parser, its own lexical scanner or merely the numbers of the case-entries executed by the parser's parse function. If that is what you want, use the --own-debug, --action-cases, --scanner-debug and/or --own-tokens options.
When polymorphic semantic values are used (see section POLYMORPHIC SEMANTIC VALUES) then the generated parser might attempt to retrieve an incorrect polymorphic value. In that case a fatal error is generated, ending bisonc++'s run. The error message itself cannot refer to the action block where the error occurred. If this situation is encountered, rerun bisonc++, specifying --debug and call parser.setDebug(Parser::ACTIONCASES): as a debugging aid the generated parser then shows the executeAction switch's case entry numbers just before their execution.
o
--default-actions=off|quiet|warn (-d)
When warn is specified (which is the default) an action block executing $$ = $1 (or $$ = STYPE__{} for empty production rules) is added to production rules that do not explicitly define their own final action blocks, while issuing a warning. When quiet is specified these action blocks are silently added. It is an error when the types of $$ and $1 differ (but it is OK if neither $$ nor $1 is associated with a specific type). When off is specified no action blocks are added (a warning is issued, unless the option/directive tag-mismatches off has been specified).
o
--error-verbose
When a syntactic error is reported, the generated parse function dumps the parser's state stack to the standard output stream. The stack dump shows on separate lines a stack index followed by the state stored at the indicated stack element. The first stack element is the stack's top element.
o
--filenames=filename (-f)
Filename is a generic file name that is used for all header files generated by bisonc++. Options defining specific file names are also available (which then, in turn, overrule the name specified by this option).
o
--flex
Bisonc++ generates code calling d_scanner.yylex() to obtain the next lexical token, and calling d_scanner.YYText() for the matched text, unless overruled by options or directives explicitly defining these functions. By default, the interface defined by flexc++(1) is used. This option is only interpreted if the --scanner option or %scanner directive is also used.
o
--help (-h)
Write basic usage information to the standard output stream and terminate.
o
--implementation-header=filename (-i)
Filename defines the name of the file to contain the implementation header. It defaults to the name of the generated parser class plus the suffix .ih.
The implementation header should contain all directives and declarations only used by the implementations of the parser's member functions. It is the only header file that is included by the source file containing parse's implementation. User-defined implementations of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the parser class in one header file.
o
--implementation-skeleton=pathname (-I)
Pathname defines the path name to the file containing the skeleton of the implementation header. It defaults to the installation-defined default path name (e.g., /usr/share/bisonc++/ plus bisonc++.ih).
o
--insert-stype
This option is only effective if the debug option (or %debug directive) has been specified. When insert-stype has been specified the parsing function's debug output also shows selected semantic values. It should only be used if objects or variables of the semantic value type STYPE__ can be inserted into ostreams.
o
--max-inclusion-depth=value
Set the maximum number of nested grammar files. Defaults to 10.
o
--namespace identifier
Define all of the code generated by bisonc++ in the namespace identifier. By default no namespace is defined. If this option is used the implementation header is provided with a commented out using namespace declaration for the specified namespace. In addition, the parser and parser base class header files also use the specified namespace to define their include guard directives.
It is an error if this option is used and an already existing parser-class header file and/or implementation header file does not define namespace identifier.
o
--no-baseclass-header
Do not write the file containing the parser class' base class, even if that file doesn't yet exist. By default the file containing the parser's base class is (re)written each time bisonc++ is called. Note that this option should normally be avoided, as the base class defines the symbolic terminal tokens that are returned by the lexical scanner. When the construction of this file is suppressed, modifications of these terminal tokens are not communicated to the lexical scanner.
o
--no-decoration (-D)
Do not include user-defined or default actions when generating the parser's parse member. This effectively generates a parser which merely performs syntax checks, without performing the actions which are normally executed when rules have been matched. This may be useful in situations where a (partially or completely) decorated grammar is reorganized, and the syntactic correctness of the modified grammar must be verified, or in situations where the grammar has already been decorated, but functions which are called from the rules' actions have not yet been implemented.
o
--no-lines
Do not put #line preprocessor directives in the file containing the parser's parse function. By default the file containing the parser's parse function also contains #line preprocessor directives. These directives allow the compiler and debuggers to associate errors with lines in your grammar specification file, rather than with the source file containing the parse function itself.
o
--no-parse-member
Do not write the file containing the parser's predefined parser member functions, even if that file doesn't yet exist. By default the file containing the parser's parse member function is (re)written each time bisonc++ is called. Note that this option should normally be avoided, as this file contains parsing tables which are altered whenever the grammar definition is modified.
o
--own-debug
Displays the actions performed by bisonc++'s parser when it processes the grammar specification file(s) (lots of output!). This implies the --verbose option.
o
--own-tokens (-T)
The tokens returned as well as the text matched by bisonc++'s lexical scanner are shown when this option is used.
This option does not result in the generated parsing function displaying returned tokens and matched text. If that is what you want, use the --print-tokens option.
o
--parsefun-skeleton=pathname (-P)
Pathname defines the path name of the file containing the parsing member function's skeleton. It defaults to the installation-defined default path name (e.g., /usr/share/bisonc++/ plus bisonc++.cc).
o
--parsefun-source=filename (-p)
Filename defines the name of the source file to contain the parser member function parse. Defaults to parse.cc.
o
--polymorphic-code-skeleton=pathname (-L)
Pathname defines the path name of the file containing the non-template members of the polymorphic Base class. It defaults to the installation-defined default path name (e.g., /usr/share/bisonc++/ plus bisonc++polymorphic.code).
o
--polymorphic-skeleton=pathname (-M)
Pathname defines the path name of the file containing the skeleton of the polymorphic template classes. It defaults to the installation-defined default path name (e.g., /usr/share/bisonc++/ plus bisonc++polymorphic).
o
--print-tokens (-t)
The generated parsing function implements a function print__ displaying (on the standard output stream) the tokens returned by the parser's scanner as well as the corresponding matched text. This implementation is suppressed when the parsing function is generated without using this option. The member print__ is called from Parser::print, which is defined in-line in the parser's class header. Calling Parser::print__ can thus easily be controlled from print, using, e.g., a variable that is set by the program using the parser generated by bisonc++.
This option does not show the tokens returned and text matched by bisonc++ itself when it is reading its input file(s). If that is what you want, use the --own-tokens option.
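Controlling print__ from print could be organized as in the following sketch. The class below is a hand-written stand-in: the members d_showTokens, showTokens, printed and processToken, as well as the body of print__, are assumptions for this example and are not generated by bisonc++.

```cpp
#include <cstddef>
#include <iostream>

class Parser
{
    bool d_showTokens = false;      // hypothetical flag set by the program
    std::size_t d_printed = 0;      // counts print__ calls (illustration only)
    int d_token = 0;
    char const *d_text = "";

    public:
        void showTokens(bool on) { d_showTokens = on; }
        std::size_t printed() const { return d_printed; }

        // Toy stand-in for the part of parse receiving a token
        void processToken(int token, char const *text)
        {
            d_token = token;
            d_text = text;
            print();
        }

    private:
        // In-line member deciding whether print__ is called
        void print()
        {
            if (d_showTokens)
                print__();
        }

        // Stand-in for the generated member showing token and matched text
        void print__()
        {
            ++d_printed;
            std::cout << "Token: " << d_token <<
                         ", text: `" << d_text << "'\n";
        }
};
```

A program could call showTokens(true), e.g., when it is started with a command-line flag of its own choosing.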
o
--required-tokens=number
Following a syntactic error, require at least number successfully processed tokens before another syntactic error can be reported. By default number is zero.
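The effect of this option can be illustrated by a small counter-based sketch. The class ErrorGate and its members are hypothetical, and resetting the counter at each syntactic error is an assumption; this is not the code used by the generated parser.

```cpp
#include <cstddef>

// Hypothetical helper deciding whether a syntactic error is reported
class ErrorGate
{
    std::size_t d_required;     // tokens required between reported errors
    std::size_t d_accepted;     // tokens processed since the last error

    public:
        explicit ErrorGate(std::size_t required)
        :
            d_required(required),
            d_accepted(required)    // the first error is always reported
        {}

        // Called for each successfully processed token
        void tokenAccepted()
        {
            ++d_accepted;
        }

        // Called at a syntactic error: returns true if it should be reported
        bool syntaxError()
        {
            bool report = d_accepted >= d_required;
            d_accepted = 0;
            return report;
        }
};
```

With the default of zero required tokens every syntactic error is reported.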
o
--scanner=pathname (-s)
Pathname defines the path name to the file defining the scanner's class interface (e.g., "../scanner/scanner.h"). When this option is used the parser's member int lex() is predefined as
    int Parser::lex()
    {
        return d_scanner.lex();
    }
                
and an object Scanner d_scanner is composed into the parser (but see also option scanner-class-name). The example shows the function that's called by default. When the --flex option (or %flex directive) is specified the function d_scanner.yylex() is called. Any other function to call can be specified using the --scanner-token-function option (or %scanner-token-function directive).
By default bisonc++ surrounds pathname by double quotes (using, e.g., #include "pathname"). When pathname is surrounded by angle brackets #include <pathname> is included.
It is an error if this option is used and an already existing parser class header file does not include `pathname'.
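The composition of a scanner into the parser may be sketched as follows. Both classes are hand-written stand-ins: the Scanner below merely replays canned tokens, and the members nextToken and text of Parser are hypothetical additions that make the sketch testable. Only the names d_scanner, lex and matched are taken from this man-page.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in scanner returning previously stored (token, text) pairs
class Scanner
{
    std::vector<std::pair<int, std::string>> d_input;
    std::size_t d_idx = 0;
    std::string d_matched;

    public:
        explicit Scanner(std::vector<std::pair<int, std::string>> input)
        :
            d_input(std::move(input))
        {}

        int lex()                           // 0 signals the end of the input
        {
            if (d_idx == d_input.size())
            {
                d_matched.clear();
                return 0;
            }
            d_matched = d_input[d_idx].second;
            return d_input[d_idx++].first;
        }

        std::string const &matched() const
        {
            return d_matched;
        }
};

// Stand-in parser class: the scanner is a data member
class Parser
{
    Scanner d_scanner;

    public:
        explicit Parser(Scanner scanner)
        :
            d_scanner(std::move(scanner))
        {}

        int nextToken() { return lex(); }   // hypothetical, for illustration
        std::string const &text() const { return d_scanner.matched(); }

    private:
        int lex()
        {
            return d_scanner.lex();
        }
};
```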
o
--scanner-class-name scannerClassName
Defines the name of the scanner class, declared by the pathname header file that is specified at the scanner option or directive. By default the class name Scanner is used.
It is an error if this option is used and either the scanner option was not provided, or the parser class interface in an already existing parser class header file does not declare a scanner class d_scanner object.
o
--scanner-debug
Show the scanner's matched rules and returned tokens. This extensively displays the rules and tokens matched and returned by bisonc++'s scanner, instead of just showing the tokens and matched text which are received by bisonc++. If you want the latter, use the option --own-tokens.
o
--scanner-matched-text-function=function-call
The scanner function returning the text that was matched at the last call of the scanner's token function. A complete function call expression should be provided (including a scanner object, if used). This option overrules the d_scanner.matched() call used by default when the %scanner directive is specified, and it overrules the d_scanner.YYText() call used when the %flex directive is provided. Example:
    --scanner-matched-text-function "myScanner.matchedText()"
                

o
--scanner-token-function=function-call
The scanner function returning the next token, called from the parser's lex function. A complete function call expression should be provided (including a scanner object, if used). This option overrules the d_scanner.lex() call used by default when the %scanner directive is specified, and it overrules the d_scanner.yylex() call used when the %flex directive is provided. Example:
    --scanner-token-function "myScanner.nextToken()"
                

It is an error if this option is used and the scanner token function is not called from the code in an already existing implementation header.
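When the function call expressions myScanner.nextToken() and myScanner.matchedText() from the examples are used, the parser class must offer an object named myScanner. A minimal sketch, assuming a hypothetical class MyScanner and a hand-written stand-in for the parser class (its member matched is also hypothetical), might be:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scanner using its own member names
class MyScanner
{
    std::vector<std::string> d_words;
    std::size_t d_idx = 0;

    public:
        explicit MyScanner(std::vector<std::string> words)
        :
            d_words(std::move(words))
        {}

        // Returns 257 (an assumed token value) for each word, 0 at the end
        int nextToken()
        {
            if (d_idx == d_words.size())
                return 0;
            ++d_idx;
            return 257;
        }

        // Returns the text of the most recently returned token
        std::string matchedText() const
        {
            if (d_idx == 0)
                return std::string();
            return d_words[d_idx - 1];
        }
};

// Stand-in parser class declaring the scanner object itself
class Parser
{
    MyScanner myScanner;

    public:
        explicit Parser(MyScanner scanner)
        :
            myScanner(std::move(scanner))
        {}

        // Uses the call specified at --scanner-token-function
        int lex()
        {
            return myScanner.nextToken();
        }

        // Uses the call specified at --scanner-matched-text-function
        std::string matched() const
        {
            return myScanner.matchedText();
        }
};
```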
o
--show-filenames
Writes the names of the generated files to the standard error stream.
o
--skeleton-directory=directory (-S)
Specifies the directory containing the skeleton files. In addition to specifying a common directory for the skeleton files, the locations of individual skeleton files can be specified using the options -B, -C, -I, -L, -M and -P.
o
--stack-expansion=size
Defines the number of elements to be added to the generated parser's semantic value stack when it must be enlarged. By default 10 elements are added to the stack. This option/directive is interpreted only once, and only if size at least equals the default stack expansion size of 10.
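Enlarging a stack by a fixed number of elements can be sketched as shown below. The class ValueStack is hypothetical and uses int values for brevity; it only illustrates the described policy (a requested size below the default of 10 is ignored) and is not the generated parser's stack implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stack growing by a fixed number of elements
class ValueStack
{
    std::vector<int> d_values;
    std::size_t d_top = 0;          // number of elements in use
    std::size_t d_expansion;

    public:
        explicit ValueStack(std::size_t expansion = 10)
        :
            // requests below the default of 10 are ignored
            d_expansion(expansion < 10 ? 10 : expansion)
        {}

        void push(int value)
        {
            if (d_top == d_values.size())   // full: add d_expansion elements
                d_values.resize(d_values.size() + d_expansion);

            d_values[d_top++] = value;
        }

        std::size_t size() const { return d_top; }
        std::size_t capacity() const { return d_values.size(); }
};
```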
o
--tag-mismatches off|on
When on is specified (which is the default), a warning is issued if no $$ assignment was detected in an action block, or if adding a default $$ = ... action was suppressed (cf. the default-actions off option or directive).
o
--target-directory=pathname
Pathname defines the directory where generated files should be written. By default this is the directory where bisonc++ is called.
o
--thread-safe
No static data are modified by the generated parsing function, making it thread-safe.
o
--usage
Writes basic usage information to the standard output stream and terminates.
o
--verbose (-V)
Writes a file containing verbose descriptions of the parser states and what is done for each type of look-ahead token in that state. This file also describes all conflicts detected in the grammar, both those resolved by operator precedence and those that remain unresolved. It is not created by default, but if requested the information is written to <grammar>.output, where <grammar> is the grammar specification file passed to bisonc++.
o
--version (-v)
Displays bisonc++'s version number and terminates.

QUICK START

Bisonc++ may be used as follows:

  • First, define a grammar. The reader is referred to bisonc++'s manual and other sources (like Aho, Sethi and Ullman's book) for details about how to define and decorate grammars.
  • No `macro style' %define declarations are required anymore. Instead, the normal practice of defining class members in source files and declaring them in class header files can be followed when using bisonc++. Bisonc++ concentrates on its main tasks: defining a parser class and implementing the parsing function int parse, leaving all other parts of the parser class' definition to the programmer.
  • Having defined a grammar and (usually) some directives, bisonc++ is run, generating the essential elements of a parser class. See the next section for details about the files generated by bisonc++.
  • Next, members required in addition to the bisonc++-generated member parse and its support functions must be implemented by the programmer, and declared in the parser's class header. At the very least a member int lex must be defined (a default implementation can be generated by bisonc++).
  • The generated parser can now be used in a program. A very simple example would be:
        int main()
        {
            Parser parser;
            return parser.parse();
        }
            
    

GENERATED FILES

Bisonc++ may create the following files:

  • A file containing the implementation of the member function parse and its support functions. The member parse is a public member that can be called to parse a token-sequence according to a specified LALR(1) type of grammar. By default the implementations of these members are written to the file parse.cc. The programmer should not modify the contents of this file; it is rewritten every time bisonc++ is called.
  • A file containing an initial setup of the parser class, containing the declaration of the public member parse and of its (private) support members. New members may safely be declared in the parser class, as it is only created by bisonc++ if not yet existing, using the filename <parser-class>.h (where <parser-class> is the name of the defined parser class).
  • A file containing the parser class' base class. This base class should not be modified by the programmer. It contains types defined by bisonc++, as well as several (protected) data members and member functions, which should not be redefined by the programmer. All symbolic parser terminal tokens are defined in this class, thereby escalating these definitions to a separate class (cf. Lakos, (2001)), which in turn prevents circular dependencies between the lexical scanner and the parser (here, circular dependencies may easily be encountered, as the parser needs access to the lexical scanner class when defining the lexical scanner as one of its data members, whereas the lexical scanner needs access to the parser class to know about the grammar's symbolic terminal tokens; escalation is a way out of such circular dependencies). By default this file is (re)written any time bisonc++ is called, using the filename <parser-class>base.h.
  • A file containing an implementation header. The implementation header rather than the parser's class header file should be included by the parser's source files implementing member functions declared by the programmer. The implementation header first includes the parser class's header file, and then provides default in-line implementations for its members error and print (which may be altered by the programmer). The member lex may also receive a standard in-line implementation. Alternatively, its implementation can be provided by the programmer (see below). Any directives and/or namespace directives required for the proper compilation of the parser's additional member functions should be declared next. The implementation header is included by the file defining parse. By default the implementation header is created if not yet existing, receiving the filename <parser-class>.ih.
  • A verbose description of the generated parser. This file is comparable to the verbose output file originally generated by bison++. It is generated when the option --verbose or -V is provided. If so, bisonc++ writes the file <grammar>.output, where <grammar> is the name of the file containing the grammar definition.
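
The escalation of the token definitions to the base class, mentioned above, can be illustrated by the following single-file sketch. All classes are hand-written stand-ins with assumed token values and members; the comments indicate in which header each part would live. The point is that the scanner only needs the base class, while the parser contains the scanner.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// --- would live in the parser's base class header: tokens only
class ParserBase
{
    public:
        enum Tokens
        {
            NUMBER = 257,       // assumed token values
            WORD
        };
};

// --- would live in the scanner's header: only needs ParserBase
class Scanner
{
    std::string d_text;
    std::size_t d_pos = 0;
    std::string d_matched;

    public:
        explicit Scanner(std::string const &text)
        :
            d_text(text)
        {}

        int lex();              // returns NUMBER, WORD, or 0 at the end
        std::string const &matched() const { return d_matched; }
};

int Scanner::lex()
{
    while (d_pos < d_text.size() &&
           std::isspace(static_cast<unsigned char>(d_text[d_pos])))
        ++d_pos;                                // skip white space

    if (d_pos == d_text.size())
        return 0;

    std::size_t begin = d_pos;
    bool digits = true;
    while (d_pos < d_text.size() &&
           !std::isspace(static_cast<unsigned char>(d_text[d_pos])))
    {
        if (!std::isdigit(static_cast<unsigned char>(d_text[d_pos])))
            digits = false;
        ++d_pos;
    }
    d_matched = d_text.substr(begin, d_pos - begin);
    return digits ? ParserBase::NUMBER : ParserBase::WORD;
}

// --- would live in the parser's class header: contains a Scanner
class Parser: public ParserBase
{
    Scanner d_scanner;
    std::size_t d_tokens = 0;

    public:
        explicit Parser(std::string const &text)
        :
            d_scanner(text)
        {}

        int parse()             // toy stand-in: merely counts the tokens
        {
            while (lex())
                ++d_tokens;
            return 0;
        }
        std::size_t tokens() const { return d_tokens; }

    private:
        int lex() { return d_scanner.lex(); }
};
```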

FILES

  • bisonc++base.h: skeleton of the parser's base class;
  • bisonc++.h: skeleton of the parser class;
  • bisonc++.ih: skeleton of the implementation header;
  • bisonc++.cc: skeleton of the member parse;
  • bisonc++polymorphic: skeleton of the declarations used by %polymorphic;
  • bisonc++polymorphic.code: skeleton of the non-inline implementations of the members declared in bisonc++polymorphic.
  • debugdecl.in: skeleton declaring members of the parser's base class that are only required when the debug option or directive was specified.
  • debugfunctions1.in: skeleton defining the members declared in debugdecl.in.
  • debugfunctions2.in: skeleton implementing symbol__, which is only needed when the print-tokens option or directive was specified.
  • debugfunctions3.in: skeleton implementing errorVerbose__, which is only needed when the error-verbose option or directive was specified.
  • debugincludes.in: skeleton specifying the #include directives of the header files that are required when the debug option or directive was specified.
  • debuglookup.in: skeleton containing extra code required in the Parser::lookup member when the debug option or directive was specified.
  • lex.in: skeleton implementing the Parser::lex function.
  • ltypedata.in: skeleton declaring the location variables.
  • ltype.in: skeleton defining the default or user defined LTYPE__.
  • print.in: skeleton implementing the actions of Parser::print if the print-tokens option or directive was specified.
  • threading.in: skeleton defining the variables required for generating a thread-safe parsing function.

BUGS

Parser-class header files (e.g., Parser.h) and parser-class internal header files (e.g., Parser.ih) generated with bisonc++ < 4.02.00 require two hand-modifications when used in combination with bisonc++ >= 4.02.00. See the description of exceptionHandler__ for details.

Discontinued options:

  • --include-only
  • --namespace
  • --polymorphic-inline-skeleton

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

  • Identifiers ending in two underscores;
  • Any of the following identifiers: ABORT, ACCEPT, ERROR, clearin, debug, error, or setDebug.

When re-using files generated by bisonc++ before version 2.0.0, minor hand-modification may be necessary. Refer to bisonc++'s git (https://github.com/fbb-git/bisoncpp) for details.

The Semantic parser, mentioned in bison++(1), is not implemented in bisonc++(1). According to bison++(1) the semantic parser was not available in bison++ either. Maybe a so-called pure parser is available through the --thread-safe option.

ABOUT bisonc++

Bisonc++ was based on bison++, originally developed by Alain Coetmeur ([email protected]), R&D department (RDT), Informatique-CDC, France, who based his work on bison, GNU version 1.21.

Bisonc++ version 0.98 and beyond is a complete rewrite of an LALR(1) parser generator, closely following the construction process as described in Aho, Sethi and Ullman's (1986) book Compilers (i.e., the Dragon book). It uses the same grammar specification as bison and bison++, and it uses practically the same options and directives as bisonc++ versions earlier than 0.98. Variables, declarations and macros that are obsolete were removed.

Compared to bison and bison++, the number and functions of the various %define declarations were thoroughly modified. All of bison's %define declarations were replaced by their (former) first arguments. Furthermore, `macro-style' declarations are not supported or required. Finally, all directives only use lower-case characters and do not contain underscore characters (but sometimes hyphens). E.g., %define DEBUG is now declared as %debug; %define LSP_NEEDED is now declared as %lsp-needed (note the hyphen).

AUTHOR

Frank B. Brokken ([email protected]).