flexc++api(3) Application programmer's interface of flexc++ generated classes

DESCRIPTION

Flexc++(1) was designed after flex(1) and flex++(1). Like these latter two programs flexc++ generates code performing pattern-matching on text, possibly executing actions when certain regular expressions are recognized.

Refer to flexc++(1) for a general overview. This manual page covers the Application Programmer's Interface of classes generated by flexc++, offering the following sections:

  • 1. INTERACTIVE SCANNERS: how to create an interactive scanner
  • 2. THE CLASS INTERFACE: SCANNER.H: Constructors and members of the scanner class generated by flexc++
  • 3. NAMING CONVENTION: symbols defined by flexc++ in the scanner class.
  • 4. CONSTRUCTORS: constructors defined in the scanner class.
  • 5. PUBLIC MEMBER FUNCTIONS: public members declared in the scanner class.
  • 6. PRIVATE MEMBER FUNCTIONS: private members declared in the scanner class.
  • 7. SCANNER CLASS HEADER EXAMPLE: an example of a generated scanner class header
  • 8. THE SCANNER BASE CLASS: the scanner class is derived from a base class. The base class is described in this section
  • 9. PUBLIC ENUMS AND -TYPES: enums and types declared by the base class
  • 10. PROTECTED ENUMS AND -TYPES: enumerations and types used by the scanner and scanner base classes
  • 11. NO PUBLIC CONSTRUCTORS: the scanner base class does not offer public constructors.
  • 12. PUBLIC MEMBER FUNCTIONS: several members defined by the scanner base class have public access rights.
  • 13. PROTECTED CONSTRUCTORS: the base class can be constructed by a derived class. Usually this is the scanner class generated by flexc++.
  • 14. PROTECTED MEMBER FUNCTIONS: this section covers the base class member functions that can only be used by scanner class or scanner base class members
  • 15. PROTECTED DATA MEMBERS: this section covers the base class data members that can only be used by scanner class or scanner base class members
  • 16. FLEX++ TO FLEXC++ MEMBERS: a short overview of frequently used flex(1) members that received different names in flexc++.
  • 17. THE CLASS INPUT: the scanner's job is completely decoupled from the actual input stream. The class Input, nested within the scanner base class, handles the communication with the input streams. The class Input is described in this section.
  • 18. INPUT CONSTRUCTORS: the class Input can easily be replaced by another class. The constructor-requirements are described in this section.
  • 19. REQUIRED PUBLIC MEMBER FUNCTIONS: this section covers the required public members of a self-made Input class

1. INTERACTIVE SCANNERS

An interactive scanner is characterized by the fact that scanning is postponed until an end-of-line character has been received, after which all information on the line read so far is scanned. Flexc++ supports the %interactive directive, generating an interactive scanner. Here it is assumed that Scanner is the name of the scanner class generated by flexc++.

Caveat: generating interactive and non-interactive scanners should not be mixed as their class organizations fundamentally differ, and several of the Scanner class's members are only available in the non-interactive scanner. As the Scanner.h file contains the Scanner class's interface, which is normally left untouched by flexc++, flexc++ cannot adapt the Scanner class when requested to change the interactivity of an existing Scanner class. Because of this, support for the --interactive option was discontinued at flexc++'s 1.01.00 release.

The interactive scanner generated by flexc++ has the following characteristics:

o
The Scanner class is derived privately from std::istringstream and (as usual) publicly from ScannerBase.
o
The istringstream base class is constructed by its default constructor.
o
The function lex's default implementation is removed from Scanner.h and is implemented in the generated lex.cc source file. It performs the following tasks:
- If the token returned by the scanner is not equal to 0 it is returned as the next token;
- Otherwise the next line is retrieved from the input stream passed to the Scanner's constructor (by default std::cin). If this fails, 0 is returned.
- A '\n' character is appended to the just read line, and the scanner's std::istringstream base class object is re-initialized with that line;
- The member lex__ returns the next token. This implementation allows code calling Scanner::lex() to conclude, as usual, that the input is exhausted when lex returns 0.

Here is an example of how such a scanner could be used:

    // scanner generated using 'flexc++ lexer' with lexer containing 
    // the %interactive directive
    int main()
    {
        Scanner scanner;        // by default: read from std::cin 
    
        while (true)
        {
            cout << "? ";       // prompt at each line
            while (true)        // process all the line's tokens
            {
                int token = scanner.lex();
    
                if (token == '\n')  // end of line: new prompt
                    break;
    
                if (token == 0)     // end of input: done
                    return 0;
    
                                    // process other tokens
                cout << scanner.matched() << '\n';
                if (scanner.matched()[0] == 'q')
                    return 0;
            }
        }
    }
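
The tasks performed by the interactive lex function, as listed earlier in this section, can be sketched as follows. This is a minimal, self-contained illustration and not the code that flexc++ generates: the class LineScanner, its member nextToken (standing in for lex__), the token value WORD and the tokenization into blank-separated words are all assumptions made for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an interactive scanner (NOT flexc++ output).
// Like the generated class it derives privately from std::istringstream,
// which holds the line that is currently being scanned.
class LineScanner: private std::istringstream
{
    std::istream &d_in;         // stream providing the input lines
    std::string d_matched;      // most recently matched text

    public:
        enum Token { WORD = 256 };  // '\n' is returned as itself, 0: done

        explicit LineScanner(std::istream &in)
        :
            d_in(in)
        {}

        std::string const &matched() const
        {
            return d_matched;
        }

        int lex()
        {
            if (int token = nextToken())    // token left on the current line
                return token;

            std::string line;
            if (!std::getline(d_in, line))  // no next line: input exhausted
                return 0;

            clear();                        // reset the base's eof state
            str(line + '\n');               // re-initialize with the new line

            return nextToken();
        }

    private:
        int nextToken()     // plays the role of lex__ in this sketch
        {
            int ch = peek();
            while (ch == ' ' || ch == '\t') // skip blanks
            {
                get();
                ch = peek();
            }

            if (ch == traits_type::eof())   // current line is exhausted
                return 0;

            if (ch == '\n')
            {
                get();
                d_matched = "\n";
                return '\n';
            }

            d_matched.clear();
            while (ch != traits_type::eof() && ch != ' ' && ch != '\t' &&
                   ch != '\n')
            {
                d_matched += static_cast<char>(get());
                ch = peek();
            }
            return WORD;
        }
};
```

As each stored line ends in '\n', nextToken only returns 0 when the current line has been consumed completely, so a 0 returned by lex means that no further line could be read.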
        

2. THE CLASS INTERFACE: SCANNER.H

By default, flexc++ generates a file Scanner.h containing the initial interface of the scanner class performing the lexical scan according to the specifications given in flexc++'s input file. The name of the file that is generated can easily be changed using flexc++'s --class-header option. In this man-page we'll stick to using the default name.

The file Scanner.h is generated only once, unless an explicit request is made to rewrite it (using flexc++'s --force-class-header option).

The provided interface is very light-weight, primarily offering a link to the scanner's base class (see this manpage's sections 8 through 16).

Many of the facilities offered by the scanner class are inherited from the ScannerBase base class. Additional facilities offered by the Scanner class are covered below.

3. NAMING CONVENTION

All symbols that are required by the generated scanner class end in two consecutive underscore characters (e.g., executeAction__). These names should not be redefined. As they are part of the Scanner and ScannerBase class their scope is immediately clear and confusion with identically named identifiers elsewhere is unlikely.

Some member functions do not use the underscore convention. These are the scanner class's constructors, or names that are similar or equal to names that have historically been used (e.g., length). Also, some functions offer hooks into the implementation (like preCode). The names of these hook functions do not end in underscores either.

4. CONSTRUCTORS

o
explicit Scanner(std::istream &in = std::cin, std::ostream &out = std::cout) This constructor by default reads information from the standard input stream and writes to the standard output stream. When the Scanner object goes out of scope the input and output files are closed.
With interactive scanners input stream switching or stacking is not available; switching output streams, however, is.
o
Scanner(std::string const &infile, std::string const &outfile) This constructor opens the input and output streams whose file names were specified. When the Scanner object goes out of scope the input and output files are closed. If outfile == "-" then the standard output stream is used as the scanner's output medium; if outfile == "" then the standard error stream is used as the scanner's output medium.
This constructor is not available with interactive scanners.
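
The file name conventions documented for the second constructor ("-" selects the standard output stream, an empty name selects the standard error stream) can be illustrated by the following sketch. The function selectOutput is hypothetical: it is not part of flexc++, and it only mimics the documented mapping from a name to an output stream.

```cpp
#include <fstream>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical helper (not a flexc++ function): maps an output name to a
// stream using the convention documented for Scanner(infile, outfile).
// 'owned' receives the ofstream if a real file had to be opened.
std::ostream &selectOutput(std::string const &outfile,
                           std::unique_ptr<std::ofstream> &owned)
{
    if (outfile == "-")             // "-": standard output stream
        return std::cout;

    if (outfile.empty())            // "": standard error stream
        return std::cerr;

    owned.reset(new std::ofstream(outfile));    // existing file is rewritten
    return *owned;
}
```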

5. PUBLIC MEMBER FUNCTIONS

o
int lex() The lex function performs the lexical scanning of the input file specified at construction time (but also see section 6.1. for information about intermediate stream-switching facilities). It returns an int representing the token associated with the matched regular expression. The returned value 0 indicates end-of-file. Considering its default implementation, it could be redefined by the user. Lex's default implementation merely calls lex__:
inline int Scanner::lex()
{
    return lex__();
}
        

Caveat: with interactive scanners the lex function is defined in the generated lex.cc file. Once flexc++ has generated the scanner class header file this scanner class header file isn't automatically rewritten by flexc++. If, at some later stage, an interactive scanner must be generated, then the inline lex implementation must be removed `by hand' from the scanner class header file. Likewise, a lex member implementation (like the above) must be provided `by hand' if a non-interactive scanner is required after first having generated files implementing an interactive scanner.

6. PRIVATE MEMBER FUNCTIONS

o
int lex__() This function is used internally by lex and should not otherwise be used.
o
int executeAction__(size_t ruleNr) This function is used internally by lex and should not otherwise be used.
o
void preCode() By default this function has an empty, inline implementation in Scanner.h. It can safely be replaced by a user-defined implementation. This function is called by lex__, just before it starts to match input characters against its rules: preCode is called by lex__ when lex__ is called and also after having executed the actions of a rule which did not execute a return statement. The outline of lex__'s implementation looks like this:
int Scanner::lex__()
{
    ...
    preCode();
    while (true)
    {
        size_t ch = get__();            // fetch next char
        ...
        switch (actionType__(range))    // determine the action
        {
            ... maybe return
        }
        ... no return, continue scanning
        preCode();
    } // while
}
        
o
void postCode(PostEnum__ type) By default this function has an empty, inline implementation in Scanner.h. It can safely be replaced by a user-defined implementation. This function is called by lex__, just after a rule has been matched. Values of the enum class PostEnum__ indicate the characteristic of the matched rule. PostEnum__ has four values: PostEnum__::END, PostEnum__::POP, PostEnum__::RETURN, and PostEnum__::WIP. Refer to section 10 for their meanings.
o
void print() When the --print-tokens option or the %print-tokens directive is used this function is called to display, on the standard output stream, the tokens returned and the text matched by the scanner generated by flexc++.
Displaying is suppressed when the lex.cc file is (re)generated without using this option or directive. The function actually showing the tokens (ScannerBase::print__) is called from print, which is defined inline in Scanner.h. Calls to ScannerBase::print__ can therefore also easily be made conditional on an option of the program using the scanner object.

7. SCANNER CLASS HEADER EXAMPLE

#ifndef Scanner_H_INCLUDED_
#define Scanner_H_INCLUDED_
// $insert baseclass_h
#include "Scannerbase.h"
// $insert classHead
class Scanner: public ScannerBase
{
    public:
        explicit Scanner(std::istream &in = std::cin,
                                std::ostream &out = std::cout);
        Scanner(std::string const &infile, std::string const &outfile);
        
        // $insert lexFunctionDecl
        int lex();
    private:
        int lex__();
        int executeAction__(size_t ruleNr);
        void print();
        void preCode();     // re-implement this function for code that must 
                            // be exec'ed before the patternmatching starts
        void postCode(PostEnum__ type);    
                            // re-implement this function for code that must 
                            // be exec'ed after the rules's actions.
};
// $insert scannerConstructors
inline Scanner::Scanner(std::istream &in, std::ostream &out)
:
    ScannerBase(in, out)
{}
inline Scanner::Scanner(std::string const &infile, std::string const &outfile)
:
    ScannerBase(infile, outfile)
{}
// $insert inlineLexFunction
inline int Scanner::lex()
{
    return lex__();
}
inline void Scanner::preCode() 
{
    // optionally replace by your own code
}
inline void Scanner::postCode(PostEnum__ type) 
{
    // optionally replace by your own code
}
inline void Scanner::print() 
{
    print__();
}
#endif // Scanner_H_INCLUDED_
        

8. THE SCANNER BASE CLASS

By default, flexc++ generates a file Scannerbase.h containing the interface of the base class of the scanner class also generated by flexc++. The name of the file that is generated can easily be changed using flexc++'s --baseclass-header option. In this man-page we use the default name.

The file Scannerbase.h is generated at each new flexc++ run. It contains no user-serviceable or extensible parts. Rewriting can be prevented by specifying flexc++'s --no-baseclass-header option.

9. PUBLIC ENUMS AND -TYPES

  • enum class StartCondition__ This strongly typed enumeration defines the names of the start conditions (i.e., mini scanners). It at least contains INITIAL, but when the %s or %x directives were used it also contains the identifiers of the mini scanners declared by these directives. Since StartCondition__ is a strongly typed enum its values must be preceded by its enum name. E.g.,
        begin(StartCondition__::INITIAL);
            
    

10. PROTECTED ENUMS AND -TYPES

o
enum class ActionType__ This strongly typed enumeration is for internal use only.
o
enum Leave__ This enumeration is for internal use only.
o
enum class PostEnum__ Values of this strongly typed enumeration are passed to the scanner's private member postCode, indicating the scanner's action after matching a rule. The values of this enumeration are:
PostEnum__::END: the function lex__ immediately returns 0 once postCode returns, indicating the end of the input was reached;
PostEnum__::POP: the end of an input stream was reached, and processing continues with the previously pushed input stream. In this case the function lex__ doesn't return, it simply continues processing the previously pushed stream;
PostEnum__::RETURN: the function lex__ immediately returns once postCode returns, returning the next token;
PostEnum__::WIP: the function lex__ has matched a non-returning rule, and continues its rule-matching process.
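
The interplay between the preCode and postCode hooks and the four PostEnum__ values can be made visible with a small mock. The sketch below is not generated code: the class HookDemo, its scripted list of rule outcomes and its log are assumptions made for illustration, and the enum is named PostEnum (without trailing underscores). It also assumes that preCode is called again after a POP, just like after a WIP.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Mirrors the four values documented for PostEnum__.
enum class PostEnum { END, POP, RETURN, WIP };

// Hypothetical mock: 'script' lists the outcome of each successive match.
class HookDemo
{
    std::vector<PostEnum> d_script;
    std::size_t d_idx = 0;
    std::string d_log;              // records the order of the hook calls

    public:
        explicit HookDemo(std::vector<PostEnum> script)
        :
            d_script(std::move(script))
        {}

        std::string const &log() const
        {
            return d_log;
        }

        int lex()                   // simplified stand-in for lex__
        {
            preCode();              // called when scanning starts
            while (true)
            {
                PostEnum type = d_idx < d_script.size() ?
                                    d_script[d_idx++] : PostEnum::END;

                postCode(type);     // called just after a rule was matched

                switch (type)
                {
                    case PostEnum::END:
                    return 0;       // end of input
                    case PostEnum::RETURN:
                    return 1;       // some token value
                    case PostEnum::POP:
                    case PostEnum::WIP:
                    break;          // no return: continue scanning
                }
                preCode();          // called again before matching resumes
            }
        }

    private:
        void preCode()
        {
            d_log += 'p';
        }

        void postCode(PostEnum type)    // logs E, P, R or W
        {
            d_log += "EPRW"[static_cast<int>(type)];
        }
};
```

With the script WIP, POP, RETURN, END the first call of lex logs the hook calls as pWpPpR before returning a token, and the second call appends pE before returning 0.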

11. NO PUBLIC CONSTRUCTORS

There are no public constructors. ScannerBase is a base class for the Scanner class generated by flexc++. ScannerBase only offers protected constructors.

12. PUBLIC MEMBER FUNCTIONS

o
bool debug() const returns true if --debug or %debug was specified, otherwise false.
o
bool interactiveLine() this member is only available with interactive scanners. All remaining contents of the current interactive line buffer are discarded, and the interactive line buffer is filled with the contents of the next input line. This member can be used when a condition is encountered which invalidates the remaining contents of a line. Following a call to interactiveLine the next token that is returned by the lexical scanner will be the first token on the next line. This member returns true if the next line is available and false otherwise.
o
std::string const &filename() const returns the name of the file currently processed by the scanner object.
o
size_t length() const returns the length of the text that was matched by lex. With flex++ this function was called leng.
o
size_t lineNr() const returns the line number of the currently scanned line. This function is always available (note: flex++ only offered a similar function (called lineno) after using the %lineno option).
o
std::string const &matched() const returns the text matched by lex (note: flex++ offers a similar member called YYText).
o
void setDebug(bool onOff) Switches on/off debugging output by providing the argument true or false. Switching on debugging output only has visible effects if the debug option was specified.
o
void switchIstream(std::string const &infilename) The currently processed input stream is closed, and processing continues at the stream whose name is specified as the function's argument. This is not a stack-operation: after processing infilename processing does not return to the original stream.
This member is not available with interactive scanners.
o
void switchOstream(std::ostream &out) The currently processed output stream is closed, and new output is written to out.
o
void switchOstream(std::string const &outfilename)
The current output stream is closed, and output is written to outfilename. If this file already exists, it is rewritten.
o
void switchStreams(std::istream &in, std::ostream &out = std::cout) The currently processed input and output streams are closed, and processing continues at in, writing output to out. This is not a stack-operation: after processing in processing does not return to the original stream.
This member is not available with interactive scanners.
o
void switchStreams(std::string const &infilename, std::string const &outfilename) The currently processed input and output streams are closed, and processing continues at the stream whose name is specified as the function's first argument, writing output to the file whose name is specified as the function's second argument. This latter file is rewritten. This is not a stack-operation: after processing infilename processing does not return to the original stream. If outfilename == "-" then the standard output stream is used as the scanner's output medium; if outfilename == "" then the standard error stream is used as the scanner's output medium.
This member is not available with interactive scanners.

13. PROTECTED CONSTRUCTORS

  • ScannerBase(std::string const &infilename, std::string const &outfilename) The scanner object opens and reads infilename and opens (rewrites) and writes outfilename. It is called from the corresponding Scanner constructor. This member is not available for interactive scanners.
  • ScannerBase(std::istream &in, std::ostream &out) The in and out parameters are, respectively, the derived class constructor's input and output streams.

14. PROTECTED MEMBER FUNCTIONS

All member functions ending in two underscore characters are for internal use only and should not be called by user-defined members of the Scanner class.

The following members, however, can safely be called by members of the generated Scanner class:

o
void accept(size_t nChars = 0) accept(n) returns all but the first `nChars' characters of the current token back to the input stream, where they will be rescanned when the scanner looks for the next match. So, it matches `nChars' of the characters in the input buffer, rescanning the rest. This function effectively sets length's return value to nChars (note: with flex++ this function was called less);
o
void begin(StartCondition__ startCondition) activate the regular expression rules associated with StartCondition__ startCondition. As this enumeration is a strongly typed enum the StartCondition__ scope must be specified as well. E.g.,
        begin(StartCondition__::INITIAL);
            

o
void echo() const The currently matched text (i.e., the text returned by the member matched) is inserted into the scanner object's output stream;
o
void leave(int retValue) actions defined in the lexical scanner specification file may or may not return. This frequently results in complicated or overlong compound statements, blurring the readability of the specification file. By encapsulating the actions in a member function readability is enhanced. However, frequently a compound statement is still required, as in:
    regex-to-match  {
                        if (int ret = memberFunction())
                            return ret;
                    }
            
The member leave removes the need for constructions like the above. The member leave can be called from within member functions encapsulating actions performed when a regular expression has been matched. It ends lex, returning retValue to its caller. The above rule can now be written like this:
    regex-to-match  memberFunction();
            
and memberFunction could be implemented as follows:
    void memberFunction()
    {
        if (someCondition())
        {                           // any action, e.g., 
                                    // switch mini-scanner
            begin(StartCondition__::INITIAL);
            leave(Parser::TOKENVALUE);    // lex returns TOKENVALUE
            // this point is never reached
        }
    
        pushStream(d_matched);      // switch to the next stream
                                    // lex continues
    }
            
The member leave should only (indirectly) be called (usually nested) from actions defined in the scanner's specification file; calling leave outside of this context results in undefined behavior.
o
void more() the matched text is kept and will be prefixed to the text that is matched at the next lexical scan;
o
std::ostream &out() returns a reference to the scanner's output stream;
o
bool popStream() closes the currently processed input stream and continues to process the most recently stacked input stream (removing it from the stack of streams). If this switch was successfully performed true is returned, otherwise (e.g., when the stream stack is empty) false is returned;
o
void push(size_t ch) character ch is pushed back onto the input stream. I.e., it will be the character that is retrieved at the next attempt to obtain a character from the input stream;
o
void push(std::string const &txt) the characters in the string txt are pushed back onto the input stream. I.e., they will be the characters that are retrieved at the next attempt to obtain characters from the input stream. The characters in txt are retrieved from the first character to the last. So if txt == "hello" then the 'h' will be the character that's retrieved next, followed by 'e', etc, until 'o';
o
void pushStream(std::istream &curStream) this function pushes curStream on the stream stack;
This member is not available with interactive scanners.
o
void pushStream(std::string const &curName) same, but the stream curName is opened first, and the resulting istream is pushed on the stream stack;
This member is not available with interactive scanners.
o
void redo(size_t nChars = 0) this member acts like accept but its argument counts backward from the end of the matched text. All but these nChars characters are kept and the last nChars characters are rescanned. This function effectively reduces length's return value by nChars;
o
void setFilename(std::string const &name) this function sets the name of the stream returned by filename to name;
o
void setMatched(std::string const &text) this function stores text in the matched text buffer. Following a call to this function matched returns text.
o
StartCondition__ startCondition() const returns the currently active start condition (mini scanner);
o
std::vector<StreamStruct> const &streamStack() const returns the vector of currently stacked input streams. The vector's size equals 0 unless pushStream has been used. So flexc++'s input file is not counted here. The StreamStruct is a struct only having one accessible member: std::string const &pushedName, which holds the name of the pushed stream. The vector is used internally as a stack: the stream that was first pushed is found at index position 0, the most recently pushed stream is found at streamStack().back().
This member is not available with interactive scanners.
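
The effects of accept, redo and push on the matched text and on the characters that are to be rescanned can be summarized in a small model. The struct MatchModel below is hypothetical and does not show ScannerBase's implementation; it only mimics the behavior documented above. Clamping out-of-range arguments is an assumption made by this sketch.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of the matched text and of the characters waiting to
// be rescanned (NOT the implementation used by ScannerBase).
struct MatchModel
{
    std::string matched;            // the text returned by matched()
    std::deque<char> pending;       // front() is the character read next

    // txt's first character becomes the next character to be read
    void push(std::string const &txt)
    {
        pending.insert(pending.begin(), txt.begin(), txt.end());
    }

    // keep the first nChars characters, rescan the rest
    void accept(std::size_t nChars = 0)
    {
        if (nChars >= matched.length())     // assumption: nothing to return
            return;

        push(matched.substr(nChars));
        matched.resize(nChars);             // length() now equals nChars
    }

    // rescan the last nChars characters, keep the rest
    void redo(std::size_t nChars = 0)
    {
        if (nChars > matched.length())      // assumption: clamp the count
            nChars = matched.length();

        accept(matched.length() - nChars);  // length() reduced by nChars
    }
};
```

Starting from the matched text "hello", accept(2) leaves "he" as matched text and queues "llo" for rescanning, whereas redo(2) leaves "hel" and queues "lo".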

15. PROTECTED DATA MEMBERS

All protected data members are for internal use only, allowing lex__ to access them. All of them end in two underscore characters.

16. FLEX++ TO FLEXC++ MEMBERS

Flex++ (old)      Flexc++ (new)
lineno()          lineNr()
YYText()          matched()
less()            accept()

17. THE CLASS INPUT

Flexc++ generates a file Scannerbase.h defining the scanner class's base class, by default named ScannerBase (which is the name used in this man-page). The base class ScannerBase contains a nested class Input whose interface looks like this:

class Input
{
    public:
        Input();
        Input(std::istream *iStream, size_t lineNr = 1);
        size_t get();
        size_t lineNr() const;          
        size_t nPending() const;          
        void setPending(size_t nPending);          
        void reRead(size_t ch);
        void reRead(std::string const &str, size_t fmIdx);
        void close();
};
        
The members of this class are all required and offer a level in between the operations of ScannerBase and flexc++'s actual input file that's being processed.

By default, flexc++ provides an implementation for all of Input's required members. Therefore, in most situations this section of this man-page can safely be ignored.

However, users may define and extend their own Input class and provide flexc++'s base class with that Input class. To do so flexc++'s rules file must contain the following two directives:

       %input-implementation = "sourcefile"
       %input-interface = "interface"
        
Here, interface is the name of a file containing the class Input's interface. This interface is then inserted into ScannerBase's interface instead of the default class Input's interface. This interface must at least offer the aforementioned members and constructors (their functions are described below). The class may contain additional members if required by the user-defined implementation. The implementation itself is expected in sourcefile. The contents of this file are inserted in the generated lex.cc file instead of Input's default implementation. The file sourcefile should probably not have a .cc extension to prevent its compilation by a program maintenance utility.

When the lexical scanner generated by flexc++ switches streams using the //include directive (see also section 2. FILE SWITCHING in the flexc++input(7) man page), then the input stream that's currently processed is pushed on an Input stack maintained by ScannerBase, and processing continues at the file named at the //include directive. Once the latter file has been processed, the previously pushed stream is popped off the stack, and processing of the popped stream continues. This implies that Input objects must be `stack-able'. The required interface is designed to satisfy this requirement.

18. INPUT CONSTRUCTORS

o
Input() The default constructor is used by ScannerBase to prepare the stack for Input objects. It must make sure that a default (empty) Input object is in a valid state and can be destroyed. It serves no further purpose. Input objects, however, must support the default (or overloaded) assignment operator.
o
Input(std::istream *iStream, size_t lineNr = 1) This constructor receives a pointer to a dynamically allocated istream object. The Input constructor should preserve this pointer when the Input object is pushed on and popped off the stack. A shared_ptr probably comes in handy here. The Input object becomes the owner of the istream object, albeit that its destructor is not supposed to destroy the istream object. Destruction remains the responsibility of the ScannerBase object, which calls the Input::close member (see below) when it's time to destroy (close) the stream.
The new input stream's line counter is set to lineNr, by default 1.

19. REQUIRED PUBLIC MEMBER FUNCTIONS

  • size_t get() returns the next character to be processed by the lexical scanner. Usually it will be the next character from the istream passed to the Input class at construction time. It is never called by the ScannerBase object for Input objects defined using Input's default constructor. It should return 0x100 once istream's end-of-file has been reached.
  • size_t lineNr() const should return the (1-based) line number of the istream object passed to the Input object. At construction time the istream has just been opened and so at that point lineNr should return 1.
  • size_t nPending() const should return the number of pending characters (i.e., the number of characters which were passed back to the Input object using its reRead members which were not yet retrieved again by its get member).
  • void setPending(size_t nPending) should remove nPending characters from the head of the Input object's pending input queue. The lexical scanner always passes the value received from nPending to setPending, without calling get in between.
  • void reRead(size_t ch) if provided with a value smaller than 0x100, ch should be pushed back onto the istream, where it becomes the character next to be returned. Physically the character doesn't have to be pushed back. The default implementation uses a deque onto which the character is pushed-front. Only when this deque is exhausted are characters retrieved from the Input object's istream.
  • void reRead(std::string const &str, size_t fmIdx) the characters in str from fmIdx until the string's final character are pushed back onto the istream object so that the string's first character is retrieved first and the string's last character is retrieved last.
  • void close() the istream object initially passed to the Input object is deleted by close, thereby not only freeing the stream's memory, but also closing the stream if the stream in fact was an ifstream. Note that the Input's destructor should not destroy the Input's istream object.

FILES

Flexc++'s default skeleton files are in /usr/share/flexc++.
By default, flexc++ generates the following files:

  • Scanner.h: the header file containing the scanner class's interface.
  • Scannerbase.h: the header file containing the interface of the scanner class's base class.
  • Scanner.hh: the internal header file that is meant to be included by the scanner class's source files (e.g., it is included by lex.cc, see the next item's file), and that should contain all declarations required for compiling the scanner class's sources.
  • lex.cc: the source file implementing the scanner class member function lex (and support functions), performing the lexical scan.

BUGS

  • Generating interactive and non-interactive scanners (see section 1. INTERACTIVE SCANNERS) cannot be mixed.

COPYRIGHT

This is free software, distributed under the terms of the GNU General Public License (GPL).

AUTHOR

Frank B. Brokken,
Jean-Paul van Oosten,
Richard Berendsen (until 2010).