Chapter 19: Concrete examples of C++

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.

Please state the document version you're referring to, as found in the title (in this document: 5.1.0d) and please state the paragraph you're referring to.

All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case, I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't misinterpret this as a lack of appreciation.

This chapter presents a number of concrete examples of programming in C++. Topics covered earlier in this document, such as virtual functions and static members, are revisited here. Examples of container classes are shown as well.

Another example digs into the peculiarities of using a parser and scanner generator with C++. Once the input for a program exceeds a certain level of complexity, it is advantageous to use a scanner and parser generator to create the code that does the actual input recognition. The example describes the use of these tools in a C++ environment.

19.1: `streambuf' classes using file descriptors

19.1.1: A class for output operations

Usually extensions to the ANSI/ISO standard are available allowing us to read from and/or write to file descriptors. However, these extensions are not standard, and may thus vary across compilers. As a file descriptor can be considered a device, it seems natural to use the class streambuf as the starting point for the construction of classes interfacing file descriptors. In this section we will discuss the construction of a class which may be used to write to a device defined by its file descriptor: it may be a file, but could be a pipe or socket. Section 19.1.2 will discuss the matter of reading from devices that are known by their file descriptors, while section 19.3 will reconsider the question of redirection, which was discussed earlier in section 5.8.3.

Basically, deriving a class for output operations is very simple. The only member function that must be overridden is the virtual int overflow(int c) member. This member is responsible for writing characters to the device in the case where there is no buffer or no more room in the buffer. If fd represents the file descriptor to write to, and if we decide against using a buffer, the member overflow() can be as simple as:

    class UnbufferedFD: public std::streambuf
    {
        public:
            int
            overflow(int c)
            {
                if (c != EOF)
                {
                    char ch = static_cast<char>(c); // write() needs an address,
                                                    // so use a local variable
                    if (write(fd, &ch, 1) != 1)
                        return EOF;
                }
                return c;
            }
            ...
    };
The received argument is either written as a char to the file descriptor, or EOF is returned. However, this approach does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), we will discuss the construction of a class using an output buffer in somewhat greater detail.

When an output buffer is used, the overflow() member gets a bit more complex, as it is now called only when the buffer is full. So we first have to flush the buffer, for which the (virtual) function streambuf::sync() is used. Defining sync() and using it in overflow() is not all: the information to write might be less than the size of the buffer, so at the end of the lifetime of our special streambuf object it might find itself holding a buffer that is only partially full. We must therefore make sure that the buffer is flushed by the time our object goes out of scope. This is very simple: sync() is called in the destructor as well.

Now that we've considered the consequences of using an output buffer, we're ready to construct our derived class. Here it is:

#include <unistd.h>     // write()
#include <cstdio>       // EOF
#include <streambuf>
#include <ios>

class ofdstreambuf: public std::streambuf
{
    public:
        ofdstreambuf(int fd)
        :
            fd(fd)
        {
            setp(buffer, buffer + BUFSIZE);             // 12
        }

        ~ofdstreambuf()
        {
            sync();                                     // 17
        }

        int
        sync()
        {
            if (pptr() > pbase())
            {
                write(fd, buffer, pptr() - pbase());    // 25
                setp(buffer, buffer + BUFSIZE);         // 26
            }
            return 0;
        }

        int
        overflow(int c)
        {
            sync();                                     // 34
            if (c != EOF)
            {
                *pptr() = static_cast<char>(c);         // 37
                pbump(1);
            }
            return c;
        }

    private:
        static unsigned const BUFSIZE = 20;

        int     fd;
        char    buffer[BUFSIZE];
};
  • At line 12, in the constructor, the buffer is initialized. Using setp() the beginning and end of the buffer are passed to the streambuf base class, allowing it to set up pbase(), pptr(), and epptr().
  • At line 17, in the destructor, sync() is called to write any unprocessed characters in the output buffer to the device.
  • At line 25, in the member sync(), the unflushed characters in the buffer are written to the device. Then, at the next line, the buffer is reinitialized. Note that sync() should return 0 after a successful flush operation.
  • At line 26 the buffer pointers maintained by streambuf are reset to their initial values by calling setp() once again.
  • At lines 34 and 37, in the member overflow(), sync() is called to flush the now filled up output buffer to the device. As this recreates an empty buffer, the character c which could not be written to the buffer is now entered into the buffer using the member functions pptr() and pbump(). Notice that entering a character into the buffer is realized using available member functions of streambuf.
  • Depending on the number of arguments, the following program uses the ofdstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
    #include "fdout.h"
    #include <string>
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        ofdstreambuf
            fds(STDOUT_FILENO);
        ostream
            os(&fds);
    
        switch (argc)
        {
            case 1:
                os << "COPYING cin LINE BY LINE\n";
                for (string  s; getline(cin, s); )
                    os << s << endl;
            break;        
    
            case 2:
                os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";
    
                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
            break;
    
            case 3:    
                os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
                os << cin.rdbuf();
            break;
        }
    }
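
    Since the class only needs a file descriptor, the same ofdstreambuf can be attached to any writable descriptor, not just the standard output. Here is a minimal sketch doing so, writing to a file opened with open(2) (the file name out.txt is merely an illustration):
    #include "fdout.h"
    #include <fcntl.h>      // open()
    #include <unistd.h>     // close()
    #include <ostream>

    int main()
    {
        // "out.txt" is just an illustration: any writable descriptor will do
        int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return 1;

        ofdstreambuf fds(fd);
        std::ostream os(&fds);

        os << "written through a file descriptor\n";
        os.flush();                 // flush() calls sync()
        close(fd);                  // the destructor flushes as well
    }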
    

    19.1.2: Classes for input operations

    When a class performing input operations is derived from the class streambuf, it should use an input buffer of at least one character, to allow the use of the member functions istream::putback() or istream::unget(). Stream classes (like istream) normally allow ungetting at least one character using their member functions putback() or unget(). This is important, as these stream classes usually interface to streambuf objects. Although strictly speaking it is not necessary to implement a buffer in classes derived from streambuf, we strongly suggest using buffers in these cases: the implementation is very simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf a buffer of at least one character will be defined.

    19.1.2.1: Using a buffer of one character

    When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of at least one character, at the very least the member streambuf::underflow() should be overridden, as this is the member to which all requests for input are eventually directed. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the dimensions of the input buffer, allowing it to set up its input buffer pointers so that eback(), gptr(), and egptr() return correct values. Here is the corresponding class definition:

    #include <unistd.h>     // read()
    #include <cstdio>       // EOF
    #include <streambuf>
    #include <ios>
    
    class ifdstreambuf: public std::streambuf
    {
        public:
            ifdstreambuf(int fd)
            :
                fd(fd)
            {
                setg(buffer, buffer + 1, buffer + 1);   // 12
            }
    
            int 
            underflow() 
            {
                if (gptr() < egptr())                   // 18
                    return *gptr();                     // 19
        
                if (read(fd, buffer, 1) <= 0)           // 21
                    return EOF;
        
                setg(buffer, buffer, buffer + 1);       // 24
                return *gptr();
            }
        protected:                                      // 27
            int     fd;
            char    buffer[1];
    };
    
    Please note the following:
  • At line 12 the buffer is set up. In the protected section a small array of one character is defined, and setg() sets gptr() equal to egptr(). Since this implies that the buffer is empty, underflow() will be called to refill the buffer.
  • At lines 18 and 19 underflow() will first see whether the buffer is really empty. If not, then the character to be retrieved next is returned.
  • At line 21 the buffer is refilled. If this fails (for whatever reason), EOF is returned. More sophisticated implementations could react more intelligently here, of course.
  • At line 24 the buffer has been refilled, so setg() is called once again to set up streambuf's buffer pointers correctly.
  • Below line 27 two data members are defined: they were defined as protected data members so that derived classes (e.g., see section 19.1.2.3) can access them.
  • The above ifdstreambuf class is used in the following program:
    #include "ifdbuf.h"
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdstreambuf
            fds(0);
        istream
            is(&fds);
    
        cout << is.rdbuf();
    }        
    

    19.1.2.2: Using an n-character buffer

    How complex would things get if we decide to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, and it is basically the same class as ifdstreambuf from the previous section. To make things a bit more interesting, in the current class ifdnstreambuf the member streambuf::xsgetn() is also overridden, to optimize the reading of series of characters at once. Here is the class ifdnstreambuf:

    #include <unistd.h>     // read()
    #include <cstdio>       // EOF
    #include <cstring>      // memcpy()
    #include <streambuf>
    #include <ios>
    
    class ifdnstreambuf: public std::streambuf
    {
        public:
            ifdnstreambuf(int fd, unsigned bufsize = 1)
            :
                fd(fd),
                bufsize(bufsize),
                buffer(new char[bufsize])
            {
                setg(buffer, buffer + bufsize, buffer + bufsize);
            }
    
            ~ifdnstreambuf()
            {
                delete[] buffer;                        // 19
            }
    
            int
            underflow()
            {
                if (gptr() < egptr())
                    return *gptr();
    
                int
                    nread = read(fd, buffer, bufsize);
    
                if (nread <= 0)
                    return EOF;
        
                setg(buffer, buffer, buffer + nread);   // 34
                return *gptr();
            }
    
            std::streamsize
            xsgetn(char *dest, std::streamsize n)       // 39
            {
                int nread = 0;
    
                while (n)
                {
                    if (!in_avail())
                    {
                        if (underflow() == EOF)         // 47
                            break;
                    }
    
                    int avail = in_avail();
    
                    if (avail > n)
                        avail = n;
    
                    memcpy(dest + nread, gptr(), avail);// 56
                    gbump(avail);                       // 57
    
                    nread += avail;
                    n -= avail;
                }
    
                return nread;
            }                                           // 64
    
        private:
            int         fd;
            unsigned    bufsize;
            char*       buffer;
    };
    
    With this class, please note:
  • At line 19 the memory allocated for the buffer is deleted, using delete[] as the buffer was allocated with new[]. Since the constructor dynamically allocates memory for the buffer, we must make sure that the allocated memory is returned to the common pool when the ifdnstreambuf object goes out of scope.
  • At line 34 the buffer has just been refilled by underflow(). The input buffer pointers of streambuf are now reset in accord with the actual number of characters read from the device.
  • At lines 39 to 64 the function xsgetn() is overridden. In a loop, n is reduced to 0, at which point the function terminates. Alternatively (line 47), the member returns if underflow() fails to obtain more characters. In lines 56 and 57 the reading optimization takes place: instead of calling streambuf::sbumpc() n times, a block of avail characters is copied to the destination, and streambuf::gbump() is used to consume avail characters from the buffer with one function call.
  • The member function xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
    #include "ifdnbuf.h"
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdnstreambuf fds(0, 30);   // internally: 30 char buffer
        char buf[80];               // main() reads blocks of 80 
                                    // chars
        while (true)
        {
            unsigned n = fds.sgetn(buf, 80); 
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }        
    

    19.1.2.3: Seeking positions in `streambuf' objects

    When devices support seek operations, classes derived from streambuf should override streambuf::seekoff() and streambuf::seekpos(). In the following class ifdseek, which is derived from our previously constructed class ifdstreambuf, we assume that the device whose file descriptor we have available supports these seek operations. Since ifdseek is derived from ifdstreambuf, it uses a buffer of just one character. The seek facilities added to our new class make sure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class ifdnstreambuf, in which case the arguments resetting the input buffer must be adapted such that its second and third parameters point beyond the available input buffer. Here is the class definition:

    #include "ifdbuf.h"
    #include <unistd.h>
    
    class ifdseek: public ifdstreambuf
    {
        typedef std::streambuf::pos_type        pos_type;   // 6
        typedef std::streambuf::off_type        off_type;
        typedef std::ios::seekdir               seekdir;
        typedef std::ios::openmode              openmode;   // 9
    
        public:
            ifdseek(int fd)                                 // 12
            :
                ifdstreambuf(fd)
            {}
    
            pos_type                                        // 17
            seekoff(off_type offset, seekdir dir, openmode)
            {
                pos_type pos = 
                    lseek
                    (
                        fd, offset, 
                        (dir ==  std::ios::beg) ? SEEK_SET :
                        (dir ==  std::ios::cur) ? SEEK_CUR :
                                                  SEEK_END
                    );
    
                if (pos < 0)
                    return -1;
    
                setg(buffer, buffer + 1, buffer + 1);       // 32
                return pos;
            }
    
            pos_type                                        // 36
            seekpos(pos_type offset, openmode mode)
            {
                return seekoff(offset, std::ios::beg, mode);
            }
    };
    
  • In lines 6-9 several type names from the streambuf and ios classes are redefined to prevent us from having to specify std::streambuf and std::ios again and again when referring to these types.
  • At line 12 a plain constructor, deferring its job to its base class constructor, is defined.
  • At line 17 the function lseek() is used to seek a new position in a file whose file descriptor is known.
  • At line 32, if the seek succeeds, the input buffer pointers maintained by the streambuf base class are reset to an empty buffer, so that underflow() will refill the buffer at the next request for input.
  • At line 36 the companion function seekpos is overridden as well: it is actually defined by a call to seekoff().
  • An example of a program using the class ifdseek is the following. If this program is given its own source file using input redirection, then seeking is supported and, apart from the first line, every subsequent line is shown twice:
        #include "fdinseek.h"
        #include <string>
        #include <iostream>
        #include <istream>
        #include <iomanip>
        
        using namespace std;
        
        int main(int argc)
        {
            ifdseek fds(0);
            istream is(&fds);
            string  s;
        
            while (true)
            {
                if (!getline(is, s))
                    break;
        
                streampos pos = is.tellg();
        
                cout << setw(5) << pos << ": `" << s << "'\n";
        
                if (!getline(is, s))
                    break;
        
                streampos pos2 = is.tellg();
        
                cout << setw(5) << pos2 << ": `" << s << "'\n";
        
                if (!is.seekg(pos))
                {
                    cout << "Seek failed\n";
                    break;
                }
            }
        }        
    


    19.1.2.4: Multiple `unget()' calls in `streambuf' objects

    As stated earlier, streambuf classes and classes derived from streambuf should at least support ungetting the character that was last read. Special care must be taken when series of unget() calls are to be supported. In this section we will show how to construct a class, derived from streambuf, which allows a configurable number of istream::unget() or istream::putback() calls. Support for multiple (say `n') unget() calls is realized by setting aside an initial section of the input buffer, which is gradually filled with the last n characters read. Here is an implementation of a class supporting series of unget() calls:

    #include <unistd.h>     // read()
    #include <cstdio>       // EOF
    #include <cstring>      // memcpy()
    #include <streambuf>
    #include <ios>
    #include <algorithm>
    
    class fdunget: public std::streambuf
    {
        public:
            fdunget (int fd, unsigned bufsz, unsigned unget)    // 9
            :
                fd(fd),
                reserved(unget)                                 // 12
            {
                unsigned allocate =                             // 14
                        bufsz > reserved ?
                            bufsz
                        :
                            reserved + 1;
    
                buffer = new char [allocate];                   // 20
    
                base = buffer + reserved;                       // 22
                setg(base, base, base);                         // 23
    
                bufsize = allocate - reserved;                  // 25
            }
    
            ~fdunget()
            {
                delete[] buffer;
            }
    
            int 
            underflow()             
            {
                if (gptr() < egptr())
                    return *gptr();                             
    
                unsigned consumed =                             // 39
                    static_cast<unsigned>(gptr() - eback());    
    
                unsigned move = std::min(consumed, reserved);   // 41    
    
                memcpy(base - move, egptr() - move, move);      // 43
    
                int nread = read(fd, base, bufsize);            // 45
                if (nread <= 0)       // none read -> 
                    return EOF;
        
                setg(base - move, base, base + nread);          // 50
    
                return *gptr();
            }
    
        private:
            int         fd;
            unsigned    bufsize;
            unsigned    reserved;
            char*       buffer;
            char*       base;
    };
    
  • At line 9 the constructor's definition starts. It expects as its arguments a file descriptor, a buffer size and the number of characters that can be ungot.
  • At line 12 unget determines the size of a reserved area, consisting of the first reserved bytes of the buffer that is used.
  • At line 14 we ensure that the number of bytes to allocate is at least one byte larger than reserved. This way at least one character can be read at each refill and, once reserved characters have been read, at least reserved characters can be ungot.
  • At line 20 the buffer is allocated.
  • At line 22 we set up the starting point for reading operations: it is called base, which is located reserved bytes into the buffer area. This is always the point where refills of the buffer start.
  • At line 23 streambuf's buffer pointers are set up. As no characters have been read yet, all pointers are set to base. At the first read request underflow() will notice that the buffer is empty and refill it; an unget() attempted before anything has been read fails immediately.
  • At line 25 the size of the refill buffer is determined as the number of allocated bytes minus the size of the reserved area.
  • At line 39 we are inside the member underflow(). Up to this point everything is standard, and now we determine how many characters were consumed. Remember that at line 23 we ascertained that the pointers indicated 0 bytes in the buffer, so consumed is either 0 or the actual number of bytes read into the buffer after a successful refill.
  • At line 41 the number of bytes to move to the reserved area is computed. This number is at most reserved, but it is less if fewer characters were available. So, move is set to the minimum of the number of consumed characters and reserved.
  • At line 43 the characters to move are moved from the end of the used section of the input buffer to the area immediately before base.
  • Then, at line 45 the buffer is refilled. This is standard, but notice that reading starts from base, not from buffer.
  • Finally, at line 50 streambuf's read buffer pointers are set up. The pointer returned by eback() is set to move locations before base, thus defining the guaranteed unget-area; gptr() is set to base, since that is the location of the first character read after a refill; and egptr() is set just beyond the location of the last character read.
  • Here is an example of a program illustrating the use of the class fdunget. The program reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined, in a buffer of 3 characters. Just before reading a character, an attempt is made to unget at most 6 characters. This is of course never possible, but the program will neatly unget as many characters as possible, considering the actual number of characters read:
        #include "fdunget.h"
        #include <string>
        #include <iostream>
        #include <istream>
        
        using namespace std;
        
        int main(int argc)
        {
            fdunget
                fds(0, 3, 2);
            istream
                is(&fds);
            char
                c;
        
            for (int idx = 0; idx < 10; ++idx)
            {
                cout << "after reading " << idx << " characters:\n";
                for (int ug = 0; ug <= 6; ++ug)
                {
                    if (!is.unget())
                    {
                        cout 
                        << "\tunget failed at attempt " << (ug + 1) << "\n" 
                        << "\trereading: '";
        
                        is.clear();
                        while (ug--)
                        {
                            is.get(c);
                            cout << c;
                        }
                        cout << "'\n";
                        break;
                    }
                }
        
                if (!is.get(c))
                {
                    cout << " reached\n";
                    break;
                }    
                cout << "Next character: " << c << endl;
            }
        }        
        /*
            Generated output after 'echo abcde | program':
        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character: 
        
        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
        */
    

    19.2: Using form() with ostream objects

    The ANSI/ISO standard does not include the Gnu extensions form() (and scan()) that were previously available for stream objects. In this section the construction of a class oformstream is described, which is derived from ostream and does support a printf()-like member form(). Analogously to oformstream a class iscanstream can be developed, supporting scan(). The construction of this latter class is left as an exercise to the reader.

    Here is the class interface and definition. It is annotated below:

        #include <iostream>
        
        class oformstream: public std::ostream
        {
            public:
                oformstream(std::ostream &ostr)
                :
                    std::ostream(ostr.rdbuf())
                {}
    
                std::ostream &form(char const *fmt, ...);
                
        };
    
  • The class oformstream is derived from std::ostream, so it inherits ostream's capabilities.
  • Its constructor expects a reference to an ostream object. The streambuf of that object, obtained through rdbuf(), is passed to the std::ostream base class constructor, so an oformstream writes to the same streambuf as the stream it was constructed from.
  • The member form() accepts a printf()-like format string and a variable number of additional arguments. It first calls vsnprintf() with a zero-sized buffer: nothing is written, but the return value is the number of characters the formatted text requires, not counting the terminating ASCII-Z character.
  • Next, a buffer of the required size is allocated, the va_list is restarted, the text is formatted into that buffer, and the buffer's contents are inserted into the ostream base class object. Since form() returns a reference to the stream itself, calls to form() can be embedded in larger insertion statements.
  • The implementation of form() and a program using the class oformstream are shown in the next example. The program prints a well-known string and some marker text:
        #include "oformstream.h"
        #include <memory>
    
        std::ostream &oformstream::form(char const *fmt, ...)
        {
            va_list
                list;
            va_start(list, fmt);
            char
                c;
            unsigned
                n = vsnprintf(&c, 0, fmt, list);
            std::auto_ptr<char>
                buf(new char[n + 1]);
    
            vsprintf(buf.get(), fmt, list);
    
            return *this << buf.get();
        }
    
        using namespace std;
    
        int main()
        {
            oformstream
                of(cout);
    
            of << "Hello world\n";
            cout << "Ok, ";
            of << "That's all, folks\n";
    
            of.form("%s\n", "Hello world of C++") << endl;
        }
    

    19.3: Redirection revisited

    Earlier, in section 5.8.3, it was noted that within a C++ program streams can be redirected using the ios::rdbuf() member function. By handing the streambuf of one stream to another stream, both stream objects access the same streambuf, thus realizing redirection at the level of the programming language itself.

    It should be realized that this is fine within the context of the C++ program, but if that context is left, the redirection is terminated, as the operating system does not know about streambuf objects. This happens, e.g., when a program uses a system() call to start a subprogram. The following program uses C++ redirection to redirect the information inserted into cout to a file, and then calls

        system("echo hello world")
    
    to echo a well-known line of text. Since echo writes its information to the standard output, this output would end up on the program's redirected file if C++'s redirection were recognized by the operating system. However, this is not the case, so hello world still appears at the program's standard output and not on the redirected file. A solution to this problem involves redirection at the operating system level, for which system calls like dup() and dup2() are available on some operating systems (e.g., Unix and friends). Examples of the use of these system calls are given in section 19.4.3 in this chapter.

    Here is the example of the failing redirection after a C++ redirection:

        #include <iostream>
        #include <fstream>
        #include <cstdlib>
    
        using namespace std;
    
        int main()
        {
            ofstream
                of("outfile");
    
            cout.rdbuf(of.rdbuf());
            cout << "To the of stream" << endl;
            system("echo hello world");
            cout << "To the of stream" << endl;
        }
        /*
            Generated output: on the file `outfile'
        To the of stream
        To the of stream
            On standard output:
        hello world
        */
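
    For comparison, here is a minimal sketch of redirection at the operating system level, using the dup2() system call on a Unix-like system. Since the operating system itself now associates file descriptor 1 with the file, the output of the echo command ends up on outfile as well:

        #include <iostream>
        #include <cstdlib>
        #include <fcntl.h>
        #include <unistd.h>

        using namespace std;

        int main()
        {
            int fd = open("outfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0)
                return 1;

            dup2(fd, STDOUT_FILENO);    // the operating system now writes the
            close(fd);                  // standard output to `outfile'

            cout << "To the outfile" << endl;
            system("echo hello world"); // now also ends up on `outfile'
        }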
    

    19.4: The fork() system call

    The fork() system call is well known from the C programming language. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork(). In this section we will see how C++ can be used to wrap a class around a complex system call like fork(). Much of what follows applies directly to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities.

    When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork() which is different in the original process (called the parent process) and in the newly created process (called the child process):

  • In the parent process fork() returns the process id of the child process that was created by the fork() system call. This is a positive integer value.
  • In the child process fork() returns 0.
  • If fork() fails, -1 is returned.

    19.4.1: The `Fork' class

    A basic Fork class should hide the bookkeeping details of a system call like fork() from its users. We will first construct a basic Fork class doing just that. The class itself is derived from ForkExceptions, a class merely defining empty enum types (see section 16.4.1), representing several error conditions which may be encountered while using the various Fork classes. The ForkExceptions class is defined as follows:
        class ForkExceptions
        {
            public:
                enum bad_fork
                {};
                enum bad_pipe
                {};
                enum bad_redirection
                {};
                enum bad_istream
                {};
                enum bad_ostream
                {};
        };
    

    Fork will be used as a base class for other classes performing special functions. The design consideration here is that constructing a derived class is appropriate if the nature of the task to be performed by the derived class differs from the tasks performed by existing classes. This results in relatively slim classes, whose names should give a clear indication of what may be expected of them.

    The Fork class simply allows us to perform the fork() system call. Its main added value is that its fork() member throws an exception on failure, and that it defines an enumeration indicating the process we're in after the Fork::fork() member has been called: Fork::CHILD or Fork::PARENT. This process type is the return value of Fork::fork(). If fork() fails, a Fork::bad_fork exception is thrown, bad_fork being an empty enum defined in the class ForkExceptions. When a child process is created, the parent process must be prepared to handle the signal generated by the child process should the child terminate before the parent does. If this situation occurs and the parent process does not catch that signal, the child process remains visible as a zombie process. To prevent zombie processes, the Fork class defines a default signal handler, which merely catches the signal generated by the child process and stores the exit value of the process in the static variable childExitValue.

    The default signal handler is installed by the static function Fork::defaultSignalHandler(). If the default signal handler is considered inappropriate, then an alternate signal handler can be designed and installed. It is the responsibility of the programmer to make sure a SIGCHLD signal handler has been activated. The default signal handler is:

        #include "fork.H"
    
        void Fork::sig_child(int signalnumber)
        {
            wait(&childExitValue);
            signal(SIGCHLD, &sig_child);
        }
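
    The installing member defaultSignalHandler() itself is not shown above. A minimal sketch of what it might look like, assuming it merely registers sig_child() as the SIGCHLD handler (the required definition of the static childExitValue data member is included here as well):

        #include "fork.H"
        #include <csignal>              // signal(), SIGCHLD

        int Fork::childExitValue;       // definition of the static data member

        void Fork::defaultSignalHandler()
        {
            signal(SIGCHLD, &sig_child);    // install sig_child() for SIGCHLD
        }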
    

    Here is the basic class Fork, together with the implementation of the Fork::fork() member function. Note the protected nature of the p_id data member, which is thus made available to all classes derived from Fork:

    #ifndef __FORK_H__
    #define __FORK_H__
        #include <sys/types.h>
        #include "forkexceptions.h"
    
        class Fork: public ForkExceptions
        {
            public:
                enum ProcessType
                {
                    CHILD,
                    PARENT
                };
                ProcessType fork();
                ProcessType processType()
                {
                    return p_id ? PARENT : CHILD;
                }
                pid_t pid()
                {
                    return p_id;
                }
    
                static void defaultSignalHandler();
                static int childExitValue;
            protected:
                pid_t
                    p_id;
            private:
                static void sig_child(int signalnumber);
        };
    #endif
    
        #include "fork.H"
    
        Fork::ProcessType Fork::fork()
        {
            p_id = ::fork();
            
            if (p_id < 0)
                throw bad_fork();
    
            return processType();
        }
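
    A minimal sketch showing how Fork might be used directly (the messages and the one-second sleep are merely illustrations):

        #include <iostream>
        #include <unistd.h>
        #include "fork.h"

        int main()
        {
            Fork::defaultSignalHandler();   // catch SIGCHLD, preventing zombies

            Fork forker;

            try
            {
                if (forker.fork() == Fork::CHILD)
                {
                    std::cout << "child process " << getpid() << "\n";
                    return 0;               // the child's exit value
                }
                std::cout << "parent: created child " << forker.pid() << "\n";
                sleep(1);                   // give the child time to finish
            }
            catch (Fork::bad_fork)
            {
                std::cout << "fork() failed\n";
                return 1;
            }
        }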
    

    19.4.2: The `Daemon' class

    Applications exist in which the only purpose of fork() is to start a child process. The parent process terminates immediately after spawning the child process. This can be considered a natural event: in real life, parents tend to die before their children die. If this happens in a program, the child process continues to run, now as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process. Thus our next class is Daemon. Daemon redefines fork() to end the parent process, allowing the child process to continue as child of init. Since we know that we're the child process after Daemon::fork(), it can be a void function. Here is the class interface and the definition of Daemon::fork():
        #include "fork.h"
    
        class Daemon: public Fork
        {
            public:
                void fork();
        };
    
        #include "daemon.h"
        #include <cstdlib>
    
        void Daemon::fork()
        {
            if (Fork::fork() == PARENT)
                exit(0);                // the parent dies
        }
    
    Here is an example illustrating the use of the Daemon class:
        #include <iostream>
        #include <unistd.h>
    
        #include "daemon.h"
    
        using namespace std;
    
        
        int main()
        {
            Daemon
                daemon;
    
            daemon.fork();
    
            sleep(5);                               // wait 5 seconds
            std::cout << "Hello from the child process\n";
            return 0;
        }
        /*
            Generated output:
        The next command prompt, then after 5 seconds:
        Hello from the child process
        */
    
    ;

    19.4.3: The `IFork' class

    Next we'll mimic the workings of the procbuf class. We construct the class IFork, which allows us to read information from the IFork object using, e.g., the extraction operator. The extracted information is made available by a function or an external process writing to its standard output.

    The IFork::fork() call now comes in two flavors:

  • When called without arguments, the standard output written by the virtual function IFork::child can be extracted from the IFork object. To use IFork::fork() a class should be derived from IFork, overriding the IFork::child() member.
  • When called with a char *const * argument, the argument is interpreted as an array of character strings, defining the name and optionally arguments of an external process whose standard output can be extracted from the IFork object. If this variant is used, it is the responsibility of the programmer to make sure that the last element of the array that is passed as argument is 0 (zero).

    The IFork class interface has several special characteristics, which are discussed below the interface itself:
        #include <fstream>
        #include <unistd.h>
        #include "fork.h"
    
        class IFork: public Fork
        {
            public:
                typedef int (IFork::*memberPtr)() const;
    
                IFork();
                virtual ~IFork();
                operator std::istream &() const
                {
                    return *istreamPtr;
                }
                void fork()
                {
                    fork(&IFork::child);
                }
                void fork(char *const *argv);
                virtual int child() const;
    
            protected:
                void fork(memberPtr ptr);
    
                virtual void parentSetup();
                virtual void childSetup();
            private:
                int externalCommand() const
                {
                    return execvp(argv[0], argv);
                }
    
                IFork(IFork const &other);              // Not Implemented
                IFork &operator=(IFork const &other);   // Not Implemented
    
                char 
                    *const *argv;
                std::ifstream
                    *istreamPtr;
                int
                    io[2];
        };
    
    The interface has several characteristics deserving extra attention:
  • Later on, we will use a pointer to a member of IFork. To reduce typing and clutter, the typedef memberPtr is defined to indicate a pointer to a const member function of IFork, returning an int.
  • It would have been nice if we had been able to use multiple inheritance here, to construct a class which is both a Fork and an ifstream object. However, by the time the IFork object is constructed, the stream from which information must be read is not yet known: it will be a stream constructed around a file descriptor. Unfortunately it is not possible to use ifstream::open() with a file descriptor. So, the ifstream object will have to be constructed later, and IFork can only be derived from the Fork base class.
  • Since we persist in our desire to extract information from an IFork object, we use a conversion operator, so that we're able to interpret the object as an istream. The conversion operator doesn't change the IFork object, so it can be a const member function.
  • The fork() member function and the member void fork(char *const *argv) are almost identical functions, so it was decided to put their common code in another (protected) fork() member, expecting a memberPtr as its argument: void fork(memberPtr ptr). At this point in our discussion there is no real need for protected members, but later on, in section 19.4.5, we will use the protected members of IFork.

    IFork::fork() calls fork(memberPtr) using the address of IFork::child() as its argument; IFork::fork(char *const *argv) stores the value of its argument in IFork::argv, and then calls the protected IFork::fork(memberPtr) member:

        #include "ifork.H"
    
        void IFork::fork(char *const *arg)
        {
            argv = arg;
            fork(&IFork::externalCommand);
        }
    
    Now we have accomplished what we want: as the member functions externalCommand() and child() have identical prototypes, we can use their addresses as arguments for IFork::fork(memberPtr) (discussed below).
  • Looking at the private data of IFork, we see that the class uses an int io[2] array. This array is used to set up the communication between the parent and child processes. This communication process is realized using a pipe. A pipe is a one-way channel through which information may flow: one process writes information to the pipe, another process may read information from the pipe.

    The pipe() system call can be used to create a pipe, and it will fill io[] with two file descriptors: io[0] represents the end of the pipe from which information may be read, io[1] represents the end to which information may be written. Constructing a pipe may fail: in those cases a bad_pipe exception will be thrown.

  • The protected IFork::fork() member forks, calling childSetup() to set up the child's part of the communication and parentSetup() to set up the parent's part. The function is constructed such that it can be called only once during the lifetime of an IFork object: if an attempt is made to call fork() again, bad_fork is thrown. After the setup, the function to which member points is called: either child() or externalCommand(), the latter executing the command defined in the elements of the argv data member. The return value of the called member becomes the exit value of the child process (externalCommand() normally does not return at all, as execvp() replaces the child's program). Here is its code:
        #include "ifork.H"
    
        void IFork::fork(memberPtr member)
        {
            if (istreamPtr)                 // already initialized ?
                throw bad_fork();
    
            if (Fork::fork() == CHILD)
            {
                childSetup();
                exit((this->*member)());
            }
            
            parentSetup();
        }
    
  • After forking, io has been duplicated: one copy is available in the parent process, the other copy in the child process. As the pipe is used for the information flow from child to parent, the read end of the pipe (io[0]) is not used in the child and the write end of the pipe (io[1]) is not used in the parent, so they can be closed. The handling of the io[] array is done separately for the parent and the child process, in parentSetup() and childSetup(), respectively.
  • In the protected virtual member function parentSetup() the unused writing end of the pipe is closed, and an ifstream object is allocated using the read section of the io pipe:
        #include "ifork.H"
    
        void IFork::parentSetup()
        {
            close(io[1]);       // no writing done by the parent process
    
            istreamPtr = new std::ifstream(io[0]);
    
            if (!*istreamPtr)   // construction failed ?
                throw bad_istream();
        }
        
    This function, again, was given protected access rights for the benefit of the IOFork class, discussed in section 19.4.5. IOFork will not only use this member, but it will also override it, so it is also defined as a virtual member function.
  • Meanwhile, in the child process childSetup() makes sure that information written to the standard output actually ends up in the write section of the pipe. This is realized using redirection: information written to cout is eventually written to the STDOUT_FILENO file descriptor, and using the dup2() system call the write section of the pipe is `copied' to STDOUT_FILENO. From then on, all information written by the child process to cout eventually appears at the write section of the pipe. Like parentSetup(), and for the same reasons, childSetup() is defined as a protected virtual member function.
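    The implementation of childSetup() is not shown in this section. A minimal sketch based on the description above (closing the original write descriptor after dup2() is an added, common practice) could be:
        #include "ifork.H"

        void IFork::childSetup()
        {
            close(io[0]);                           // no reading by the child

            if (dup2(io[1], STDOUT_FILENO) < 0)     // cout now reaches the pipe
                throw bad_redirection();

            close(io[1]);                           // the original descriptor is
        }                                           // no longer needed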
  • The default child() member function does not do anything impressive. It can be overridden in a class derived from IFork, as shown below. The default implementation just writes its name to cout:
        #include "ifork.H"
    
        int IFork::child() const
        {
            std::cout << "IFork::child()\n";
            return 0;
        }
    
  • The copy constructor and overloaded assignment operator are defined as private members, so they cannot be used to copy or reassign IFork objects. As they won't be used by the IFork member functions either, they will not be implemented.
  • The IFork() constructor creates the pipe, and initializes istreamPtr to zero:
        #include "ifork.H"
    
        IFork::IFork()
        :
            istreamPtr(0)
        {
            if (pipe(io))
                throw bad_pipe();
        }
        
  • The destructor closes the ifstream and deletes allocated memory:
        #include "ifork.H"
    
        IFork::~IFork()
        {
            if (istreamPtr)
            {
                istreamPtr->close();
                delete istreamPtr;
            }
        }
        
  • An example illustrating the use of IFork to call an external command is:
        #include <string>
        #include <iostream>
        #include "ifork.h"

        using namespace std;

        int main()
        {
            IFork
                ifork;

            char *const
                argv[] = {"ls", "-Fla",  0};

            ifork.fork(argv);

            istream
                &in = ifork;        // use the istream conversion operator

            string
                s;

            while (getline(in, s))  // shows a list of files
                std::cout << s << endl;
        }
    
    To use IFork::child() it should be overridden in a class that is derived from IFork. Here is an example:
        #include <string>
        #include <iostream>
        #include "ifork.h"

        using namespace std;

        class IF2: public IFork
        {
            public:
                int child() const
                {
                    std::cout << "Hello from:\n"
                            "IF2::child()\n";
                    return 0;
                }
        };

        int main()
        {
            IF2
                ifork;

            ifork.fork();

            istream
                &in = ifork;        // use the istream conversion operator

            string
                s;

            while (getline(in, s))
                std::cout << s << endl;
        }
    

    19.4.4: The `OFork' class

    The OFork class does the opposite of the IFork class. Their implementations are highly comparable. The implementation of OFork is left as an exercise to the reader. The suggested class interface is:
        #include <fstream>
        #include <unistd.h>
        #include "fork.h"
    
        class OFork: public Fork
        {
            public:
                OFork();
                virtual ~OFork();
                operator std::ostream &() const
                {
                    return *ostreamPtr;
                }
                void fork()
                {
                    fork(&OFork::child);
                }
                void fork(char *const *argv);
                virtual int child() const
                {
                    return 0;
                }
    
            protected:
                void parentSetup();
                void childSetup();
    
            private:
                int externalCommand() const
                {
                    return execvp(argv[0], argv);
                }
    
                void fork(int (OFork::*member)() const);
    
                OFork(OFork const &other);              // NI
                OFork &operator=(OFork const &other);
    
                std::ofstream
                    *ostreamPtr;
                char 
                    *const *argv;
                int
                    io[2];
        };
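
    Once implemented, OFork could be used analogously to IFork, now inserting information into the object. A sketch of such a usage, assuming the suggested interface above (an explicit ostream reference is used to invoke the conversion operator):

        #include <string>
        #include <iostream>
        #include "ofork.h"

        using namespace std;

        int main()
        {
            OFork
                ofork;

            char *const
                argv[] = {"wc", 0};     // let `wc' count what the parent writes

            ofork.fork(argv);

            ostream
                &out = ofork;           // use the ostream conversion operator

            out << "hello world\n";     // intended to reach wc's standard input
        }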
    

    19.4.5: The `IOFork' class

    With the construction of the IFork and OFork classes we have primarily reconstructed the facilities already provided by the procbuf class. One of the limiting characteristics of a procbuf is that its communication is always one-way: either information is read from the external process, or it is written to an external process. The class IOFork, developed below, tries to overcome that restriction by implementing two-way communication between the parent and child processes.

    Since IOFork combines the characteristics of IFork and OFork, it can be constructed as a class using multiple derivation: it is derived from both IFork and OFork. This implies that on the one hand, some members are available through inheritance, and on the other hand, that ambiguity is introduced, because IOFork harbors two Fork objects and because several members of IFork and OFork have identical names. If this is considered impractical, then IOFork could also be constructed as a separate class. However, it is probably instructive to see how the introduced ambiguity is handled here. To make matters concrete, we show the final interface, annotating it below. Here is the interface:

        #include "ifork.h"
        #include "ofork.h"
    
        class IOFork: public IFork, public OFork
        {
            public:     
                void fork()
                {
                    IFork::fork();
                }
                void fork(char *const *argv)
                {
                    IFork::fork(argv);
                }
            private:
                void parentSetup();
                void childSetup();
         };
    
  • The class contains no constructor or destructor. There is no need for them, as by default the constructor will call the default constructors of the base classes and the destructor will call the default destructors of the base classes, which is exactly what we want. Also note that no overloaded assignment operator or copy constructor is defined: by leaving them out, their default implementations are used, which will call the analogous members of IFork and OFork. As these base class members are unavailable (as they are private), it is again not possible for applications to copy or assign IOFork objects using existing IOFork objects.
  • The conversion operators returning istream & and ostream & are nicely inherited from their respective base classes, and require no further effort.
  • As we have seen, the IFork base class has several protected and virtual members; IFork was specifically designed with the use IOFork can make of these features in mind. The IFork::fork(memberPtr) member function calls IFork::parentSetup() and IFork::childSetup(). As these latter two functions are protected virtual members, they can easily be overridden here. No ambiguity with OFork's members occurs, since a derived class is simply allowed to redefine members of its base classes. As the members are virtual in IFork, the redefined IOFork::parentSetup() and IOFork::childSetup() are virtual as well. Now, by calling IFork::fork(memberPtr) from an IOFork object, the IOFork::parentSetup() and IOFork::childSetup() members are used, rather than IFork's own versions. This is what we want, as the IOFork object must redirect both the standard input and the standard output of its child process.
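    The implementations of these overriding members are not shown here. A sketch of what they might look like, assuming OFork's setup members mirror IFork's and may simply be called in sequence:
        #include "iofork.h"

        void IOFork::childSetup()
        {
            IFork::childSetup();    // the child's cout now writes to IFork's pipe
            OFork::childSetup();    // the child's cin now reads from OFork's pipe
        }

        void IOFork::parentSetup()
        {
            IFork::parentSetup();   // reading end, used by operator istream &()
            OFork::parentSetup();   // writing end, used by operator ostream &()
        }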
  • With fork(), we now call IFork::fork(): this member function will call IFork::child(). Since IFork::child() is also a virtual function, we can derive a class from IOFork in which a child() member function is defined. This latter function overrides IFork::child() and will consequently be called by IFork::fork(). Analogously, IOFork::fork(char *const *argv) calls IFork::fork(char *const *argv).
  • Since the IFork section of IOFork does the forking, the Fork base class within IFork contains the information about the process id. To obtain the child's process id from an IOFork object obj, call obj.IFork::pid().
  • Here is an example using a class that is derived from IOFork:
        #include <string>
        #include <iostream>
        #include "iofork.h"

        using namespace std;

        class IOF2: public IOFork
        {
            public:
                int child() const
                {
                    string
                        s;
                    while (getline(cin, s))
                    {
                        cerr << "Child process: got `" << s << "'\n";
                        s = "Received: " + s;
                        std::cout << s << endl;
                        cerr << "Child process: sent `" << s << "'\n";
                    }
                    return 0;
                }
        };

        int main()
        {
            IOF2
                obj;

            obj.fork();

            istream
                &in = obj;          // reading end: IFork's conversion operator
            ostream
                &out = obj;         // writing end: OFork's conversion operator

            string
                s;

            std::cout << "Child process id: " << obj.IFork::pid() << endl;

            while (getline(cin, s))
            {
                out << s << endl;
                std::cout << "Parent: sent `" << s << "'\n";
                getline(in, s);
                std::cout << "Parent: got `" << s << "'\n";
            }
        }
        /*
            Example of generated output:
        Child process id: 32279
        Hello world
        Parent: sent `Hello world'
        Child process: got `Hello world'
        Child process: sent `Received: Hello world'
        Parent: got `Received: Hello world'
        That's all, folks!
        Parent: sent `That's all, folks!'
        Child process: got `That's all, folks!'
        Child process: sent `Received: That's all, folks!'
        Parent: got `Received: That's all, folks!'
        */

    19.5: Function objects performing bitwise operations

    In section 17.1 several types of predefined function objects were introduced. Predefined function objects exist performing arithmetic operations, relational operations, and logical operations, corresponding to a multitude of binary and unary operators.

    For whatever reason, some operators are missing: there are no predefined function objects corresponding to bitwise operations. However, their construction is, given the available predefined function objects, not difficult. The following examples show a template class implementing a function object calling the bitwise and ( operator&()), and a template class implementing a function object calling the unary not ( operator~()). Other missing operators can be constructed similarly. This is, however, left as an exercise to the reader.

    Here is the implementation of a function object calling the bitwise operator&():

        #include <functional>
    
        template <typename _Tp>
        struct bit_and : public std::binary_function<_Tp,_Tp,_Tp>
        {
            _Tp operator()(const _Tp& __x, const _Tp& __y) const 
            { 
                return __x & __y; 
            }
        };
    

    Here is the implementation of a function object calling operator~():

        #include <functional>
    
        template <typename _Tp>
        struct bit_not : public std::unary_function<_Tp,_Tp>
        {
            _Tp operator()(const _Tp& __x) const 
            { 
                return ~__x; 
            }
        };
    

    These and other missing predefined function objects are also implemented in the file bitfunctional, which is found in one of the cplusplus zip-archives.
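
    As an illustration of the pattern, a bit_or function object could be written as follows (a sketch, constructed by analogy with bit_and above):

        #include <functional>

        template <typename _Tp>
        struct bit_or : public std::binary_function<_Tp,_Tp,_Tp>
        {
            _Tp operator()(const _Tp& __x, const _Tp& __y) const
            {
                return __x | __y;
            }
        };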

    An example using bit_and to remove all odd numbers from a vector of int values with the remove_if() generic algorithm is:

        #include <iostream>
        #include <algorithm>
        #include <iterator>
        #include <vector>
        #include "bitand.h"

        using namespace std;
    
        int main()
        {
            vector<int>
                vi;
            
            for (int idx = 0; idx < 10; ++idx)
                vi.push_back(idx);
    
            copy
            (
                vi.begin(), 
                remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
                ostream_iterator<int>(cout, " ")
            );
            cout << endl;
        }
        /*
            Generated output:
        0 2 4 6 8
        */
        

    19.6: Implementing a reverse_iterator

    In this section we will extend the class Vector, introduced in section 18.3, with a reverse_iterator that is a RandomAccessIterator.

    First, we note that applying the ordinary iterators of the Vector class (made available by the begin() and end() member functions) to the generic algorithms is without any problem, as the pointers that are returned by these members are accepted by the generic algorithms:

        #include <iostream>
        #include <algorithm>
        #include <iterator>
        #include "vector.h"

        using namespace std;
        
        int main()
        {
            Vector<int>
                vi;
    
            for (int idx = 0; idx < 10; ++idx)
                vi.push_back(idx);
    
            random_shuffle(vi.begin(), vi.end());
    
            copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
            cout << endl;
        }
        /*
            generated output, e.g.,
        1 7 9 3 5 6 0 4 8 2 
        */

    However, the use of a reverse iterator is not without problems: the reverse iterator shows rather complex behavior, in that it visits the elements in its range backwards, resulting in a reversal of the meanings of the operators that are used to step forward and step backward in a range. If a reverse iterator rit points to element 11 in a range, then it will point to element 10, rather than 12 after ++rit. Because of this complex behavior, the reverse iterator will be implemented as a class reverse_iterator, which can do the required bookkeeping for us.
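
    The following small program illustrates this behavior, using the reverse iterators that the standard vector class itself provides (this illustration is independent of the class that is developed below):

        #include <iostream>
        #include <vector>
        using namespace std;

        int main()
        {
            vector<int>
                v;

            for (int idx = 10; idx <= 12; ++idx)    // v contains 10 11 12
                v.push_back(idx);

            vector<int>::reverse_iterator
                rit = v.rbegin() + 1;               // points to element 11

            cout << *rit << endl;                   // shows 11
            ++rit;                                  // one step: backward
            cout << *rit << endl;                   // shows 10
        }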

    Furthermore, in the current context reverse_iterator is a reverse iterator only for the elements of the class Vector. So, a nested class seems to be appropriate here. Since Vector has already been defined, we decide to leave it untouched. Instead of modifying Vector, a template class Vector2 is derived from it, and Vector2 will be given its nested class reverse_iterator. This latter class can be derived from random_access_iterator to realize a RandomAccessIterator for the class Vector2:

        template <typename Type>
        class Vector2: public Vector<Type>
        {
            public:
                class reverse_iterator: 
                    public random_access_iterator<Type, ptrdiff_t>
                {
                    friend class Vector2<Type>;     // see the text, below
                    ...
                };
            ...
        };
    
    The ptrdiff_t type that is mentioned here is ordinarily defined as an int (e.g., with the Linux operating system it is defined as a __kernel_ptrdiff_t in types.h, which is defined as an int in posix_types.h).

    The reverse_iterator class is now defined as a nested template class, derived from the template class random_access_iterator. Its surrounding class is Vector2, which itself is a template class that was derived from the template class Vector.

    Within reverse_iterator we store two Type pointers, called begin and end, which will be used to define the reversed range [end - 1, begin - 1). In the private section of reverse_iterator we declare:

        Type
            *begin,
            *end;
    
    Begin and end receive their initial values from the begin() and end() members of the Vector object. However, the class reverse_iterator is closely tied to Vector2: only Vector2 objects should be allowed to construct their own reverse iterators. This policy is enforced by making Vector2<Type> a bound friend of reverse_iterator, and by declaring its constructor private. The constructor expects the two Type pointers, defining the range [begin(), end()). Internally begin() - 1 will be used as endpoint of the reversed range, while end() - 1 will be used as the beginning of the reversed range. The constructor, defined in the private section of the reverse_iterator class, is therefore:
        reverse_iterator(Type *begin, Type *end)
        :
            begin(begin),
            end(end)
        {}
    
    In the implementation of reverse_iterator, end will gradually walk towards begin, which will keep its value. The dereferencing operator will not return *begin, but rather end[-1]: the element just before the element to which end points. All this is allowed, as Vector::begin() and Vector::end() return random access iterators themselves, for which pointer arithmetic operations are defined.

    Since the reverse_iterator class declares Vector2<Type> as a friend, the Vector2 class can implement two members, rbegin() and rend() returning reverse_iterator objects:

        reverse_iterator rbegin()
        {
            return reverse_iterator(begin(), end());
        }
        reverse_iterator rend()
        {
            return reverse_iterator(begin(), begin());
        }
    
    Apart from these two members, the Vector2 class could define two constructors, analogous to the constructors in the Vector class. E.g.,
        Vector2()
        :
            Vector<Type>()
        {}
    

    In order to allow programs to use these reverse iterators, a public copy constructor and an overloaded assignment operator are needed in the class reverse_iterator. Since the Type pointers in reverse_iterator are merely used for pointing to existing information, a destructor is not required. Of course, this implies that the usefulness of a reverse_iterator terminates if the object from which it was obtained goes out of scope. It is the responsibility of the programmer to make sure this doesn't cause disaster. In the public section of reverse_iterator we add the following members (implementations given as inline code):

        reverse_iterator(reverse_iterator const &other)
        :
            begin(other.begin),
            end(other.end)
        {}
    
        reverse_iterator &operator=(reverse_iterator const &other)
        {
            begin = other.begin;
            end = other.end;
            return *this;
        }
    
    The implementation shown below is fairly Spartan: illegal pointer values are not detected. As the generic algorithms do not require these checks we've left them out. Of course, one could modify the implementation and incorporate checks at several locations. It's up to the reader.
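
    For instance, a range-checked dereference operator might look as follows (merely a sketch, not used in the implementation given below; it requires the <stdexcept> header):

        Type &operator*() const
        {
            if (end <= begin)       // at or beyond rend(): refuse to dereference
                throw std::out_of_range("reverse_iterator: *rend()");
            return end[-1];
        }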

    Random access iterators require us to implement a series of member functions. Here they are, as they should appear in the public section of the reverse_iterator nested class. Most implementations will be self-explanatory, and consist of single lines of code.

        Type &operator*() const
        {
            return end[-1];             // reference to the previous element
        }        
    
        bool operator==(reverse_iterator const &other) const
        {
            return other.end == end;    // comparing two iterators for equality
        }
    
        reverse_iterator &operator++()
        {
            --end;                      // increment is: to the previous element
            return *this;
        }
    
        reverse_iterator &operator--()
        {
            ++end;                      // decrement is: to the next element
            return *this;
        }
    
                                        // remaining members: see the text    
        bool operator<(reverse_iterator const &other) const;
    
        reverse_iterator operator++(int);  
        reverse_iterator operator--(int);
    
        int operator-(reverse_iterator const &other) const;
        reverse_iterator operator+(int step) const;
        reverse_iterator operator-(int step) const;
    
    The following members deserve special attention:
  • bool operator<(reverse_iterator const &other) const
    This function compares two iterators. The operator<() must be interpreted as `the lvalue iterator points to an element earlier in the range than the rvalue iterator'. Since we're talking reversed iterators here, this means that the lvalue operator points to a later element than the rvalue operator in the normal iterator range. So, the overloaded operator should return true if this->end > other.end, implying that the current iterator points to an element that is located earlier in the range than the other iterator. So, we implement:
        return end > other.end;
    
  • reverse_iterator operator++(int)
    The postfix increment operator must return a reverse_iterator pointing to the current element, incrementing the pointer. Again, this implies that end is decremented. Since we have a constructor expecting two (normal) iterators, the implementation again is a one-liner:
        return reverse_iterator(begin, end--);
    
    Using the postfix decrement with end, a reverse_iterator object is returned representing the iterator's position at the time the postfix increment operator was called, incrementing (so, decrementing end) the reversed iterator as a side effect.
  • reverse_iterator operator--(int)
    Implemented analogously:
        return reverse_iterator(begin, end++);
    
  • int operator-(reverse_iterator const &other) const
    This is the pointer arithmetic operator: it returns the difference between two pointers, representing the number of steps to make to reach the lvalue operand starting from the rvalue operand. Normally we would implement this as end - other.end. However, since reversed iterators reduce their values to reach elements later in the range, the algebraic sign must be negated with reversed iterators: -(end - other.end), or, using simple algebraic manipulation:
        return other.end - end;
    
  • reverse_iterator operator-(int step) const
    This operator allows us to take random steps backward using a random access iterator. With a reversed iterator this means that we must increase the value of the iterator. So we return the following reverse_iterator object: return reverse_iterator(begin, end + step);
  • reverse_iterator operator+(int step) const
    This operator does the opposite, so we implement it in terms of its counterpart: return operator-(-step); (all these one-liners are collected in a single sketch following this list).
  • For illustration purposes we show how to implement some of these member functions outside of the template class interface: these implementations could be included in the header file defining the Vector2 interface, if the corresponding member functions are only declared:
        template <typename Type>                   // this is a template function
                                                // the class in which the 
        Vector2<Type>::reverse_iterator::       // constructor is defined
        reverse_iterator(Type *begin, Type *end)// the constructor itself
        :
            begin(begin),
            end(end)
        {}    
    
        template <typename Type>                   // this is a template function
        typename Vector2<Type>::reverse_iterator// this is its return type
                                                // the class in which the function
        Vector2<Type>::reverse_iterator::       // is defined
        operator+(int step) const               // this is the function itself
        {
            return reverse_iterator(begin, end - step);
        }
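
    For reference, the one-liners discussed in the list above could appear in the class interface as follows (a sketch, collecting the bodies given in the text):

        bool operator<(reverse_iterator const &other) const
        {
            return end > other.end;
        }
        reverse_iterator operator++(int)
        {
            return reverse_iterator(begin, end--);
        }
        reverse_iterator operator--(int)
        {
            return reverse_iterator(begin, end++);
        }
        int operator-(reverse_iterator const &other) const
        {
            return other.end - end;
        }
        reverse_iterator operator+(int step) const
        {
            return reverse_iterator(begin, end - step);
        }
        reverse_iterator operator-(int step) const
        {
            return reverse_iterator(begin, end + step);
        }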
    
    Finally, an example of a program using the reverse_iterator:
        #include <iostream>
        #include <algorithm>
        #include <iterator>
        #include "vector3.h"

        using namespace std;
        
        int main()
        {
            Vector2<int>
                vi;
    
            for (int idx = 0; idx < 10; ++idx)
                vi.push_back(idx);
    
            copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
            cout << endl;
    
            random_shuffle(vi.rbegin(), vi.rend());
            copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
            cout << endl;
    
            sort(vi.rbegin(), vi.rend());          
            copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
            cout << endl;
    
            reverse(vi.rbegin(), vi.rend());
            copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
            cout << endl;
    
            copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " "));
            cout << endl;
        }
        /*
            Generated output:
        0 1 2 3 4 5 6 7 8 9 
        7 1 5 9 3 4 6 0 2 8 
        9 8 7 6 5 4 3 2 1 0 
        0 1 2 3 4 5 6 7 8 9 
        9 8 7 6 5 4 3 2 1 0 
        */
    

    19.7: Using Bison and Flex

    The example discussed in this section digs into the peculiarities of using a parser- and scanner generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The current example assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex, are described in several books, e.g. in O'Reilly's book `lex & yacc'.

    However, scanner- and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex can be obtained from ftp://prep.ai.mit.edu/pub/gnu. Flex will create a C++ class when called as flex++, or when the -+ flag is used. With bison the situation is a bit more complex. Scattered over the Internet several bison++ archives can be found (e.g., in ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or rzbsdi01.uni-trier.de). The information in these archives usually dates back to 1993, irrespective of the version number mentioned with the archive itself.

    Using flex++ and bison++ a class-based scanner and parser can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become a bit cleaner than without the class interface.

    Below two examples are given. In the first example only a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the information. The second example uses both a scanner and a parser to transform standard arithmetic expressions to their postfix notation, commonly encountered in code generated by compilers and in HP-calculators. The second example focuses on the parser.

    19.7.1: Using Flex++ to create a scanner

    In this example a lexical scanner is used to monitor the production of a file from several subfiles. This example focuses on the lexical scanner, and on switching files while churning through the information. The setup is as follows: The input-language knows of an #include statement, which is followed by a string indicating the file which should be included at the location of the #include.

    In order to avoid complexities that are irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program should terminate using a proper error message.

    The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

    The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filename stack indicating where which file was included should be printed.

    One small extra feature is that comment-lines should be recognized: include directives in comment-lines should be ignored, comment being the standard C++ comment-types.
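
    As an illustration, an input file processed by this program could look as follows (the filenames used here are, of course, arbitrary):

        This text is simply copied to the output.
        // #include <skipped.txt>    a directive in a comment-line is ignored
        #include <part1.txt>
        /*
            #include <alsoskipped.txt>
        */
        Text following the included file.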

    The program is created in the following steps:

  • First, the file lexer is constructed, containing the specifications of the input-language.
  • From the specifications in lexer the requirements for the class Scanner evolve. The Scanner class is a wrapper around the class yyFlexLexer generated by flex++. These requirements result in the specification of the interface for the class Scanner.
  • Next, the main() function is constructed. A Startup object is created to inspect the command-line arguments. If successful, the scanner's member yylex() is called to construct the output file.
  • Now that the global setup of the program has been specified, the member functions of the different classes are constructed.
  • Finally, the program is compiled and linked.

    19.7.1.1: The flex++ specification file

    The organization of the lexical scanner specification file is similar to the one used with flex. However, flex++ creates a class ( yyFlexLexer) from which the class Scanner will be derived.

    The code associated with the regular expression rules will be located inside the class yyFlexLexer. However, it would be handy to access the member functions of the derived class within that code. Fortunately, inheritance helps us to realize this. In the specification of the class yyFlexLexer, we notice that the function yylex() is a virtual function. In the FlexLexer.h header file we see virtual int yylex():

        class yyFlexLexer: public FlexLexer 
        {
            public:
                yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );
        
                virtual ~yyFlexLexer();
        
                void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
                struct yy_buffer_state* yy_create_buffer( istream* s, int size );
                void yy_delete_buffer( struct yy_buffer_state* b );
                void yyrestart( istream* s );
        
                virtual int yylex();
                virtual void switch_streams( istream* new_in, ostream* new_out );
        };
    
    Consequently, if yylex() is defined in a derived class, then this function of the derived class will be called from a base class (i.e., yyFlexLexer) pointer. Since the yylex() function of the derived class is called, that function will have access to the members of its class, and to the public and protected members of its base class.
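
    The following fragment merely illustrates this principle (it is a sketch, assuming the Scanner class that is developed below; the function process() is hypothetical):

        #include "scanner.h"

        int process(FlexLexer *lexer)
        {
            return lexer->yylex();  // calls Scanner::yylex() when `lexer'
        }                           // actually points to a Scanner object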

    The context in which the generated scanner is placed is (by default) the function yyFlexLexer::yylex(). However, this context can be changed by defining the YY_DECL-macro. This macro, if defined, determines the context in which the generated scanner will be placed. So, in order to make the generated scanner part of the derived class function yylex(), three things must be done:

  • The macro YY_DECL must be defined in the lexer specification file. It must define the derived class function yylex() as the scanner function. For example:
    #define YY_DECL int Scanner::yylex()
  • The function yylex() must be declared in the class definition of the derived class.
  • As the function yyFlexLexer::yylex() is a virtual function, it must still be defined. It is not called, though, so its definition may be a simple
        int yyFlexLexer::yylex()
        {
            return 0;
        }
    
    The definition of the YY_DECL macro and the yyFlexLexer::yylex() function can conveniently be placed in the lexer specification file, as shown below.

    Looking at the regular expressions themselves, notice that we'll need rules for the recognition of the comment, for the #include directive, and for the remaining characters. This is all fairly standard practice. When an #include directive is detected, the directive is parsed by the scanner. This is common practice: preprocessor directives are normally not analyzed by a parser, but by the lexical scanner. The scanner uses a mini scanner to extract the filename from the directive, throwing a Scanner::Error value (invalidInclude) if this fails. If the filename could be extracted, it is stored in nextSource, and on completion of the reading of the #include directive performSwitch() is called to perform the switch to another file.

    When the end of the file (EOF) is detected, the derived class' member function popSource() is called, which will pop the previously pushed file, returning true. Once the file-stack is empty, the function will return false, resulting in the call of yyterminate(), which will terminate the scanner.

    The lexical scanner specification file has three sections: a C++ preamble, containing code which can be used in the code defining the actions to be performed once a regular expression is matched; a Flex++ symbol area, which is used for the definition of symbols, like a mini scanner, or options, like %option yylineno when the lexical scanner should keep track of the line numbers of the files it is scanning; and, finally, a rules section, in which the regular expressions and their actions are given. In the current example, the main task of the lexer is to copy information from the istream *yyin to the ostream *yyout, for which the predefined macro ECHO can be used. The complete and annotated lexical scanner specification file to be used with flex++ is:

    %{
    /* ----------------------------------------------------------------------------
                                     C++ -preamble.
       Include header files, other than those generated by flex++ and bison++.
          E.g., include the interface to the class derived from yyFlexLexer
    ----------------------------------------------------------------------------*/
    
                                // the yylex() function that's actually
                                // used
    #define YY_DECL int Scanner::yylex()
    
    #include "scanner.H"        // The interface of the derived class
    
    int yyFlexLexer::yylex()    // not called: overruled by
    {                           // Scanner::yylex()
        return 0;
    }
      
    %}
    
    /* ----------------------------------------------------------------------------
                                  Flex++ symbol area
                                  ~~~~~~~~~~~~~~~~~~
          The symbols mentioned here are used for defining e.g., a mini scanner
    ---------------------------------------------------------------------------- */
    %x comment 
    %x include
    %option yylineno
    
    eolnComment     "//".*
    anyChar         .|\n
    
    /* ----------------------------------------------------------------------------
                                   Flex rules area:
                                   ~~~~~~~~~~~~~~~~
         Regular expressions below here define what the lexer will recognize.
    ---------------------------------------------------------------------------- */
    %%
        /*
            The comment-rules: comment lines are ignored.    
        */
    {eolnComment}
    "/*"                    BEGIN comment;
    <comment>{anyChar}
    <comment>"*/"           BEGIN INITIAL;
    
        /*                
            File switching: #include <filepath>
        */
    #include[ \t]+"<"       BEGIN include;
    <include>[^ \t>]+       nextSource = yytext;
    <include>">"[ \t]*\n    {
                                BEGIN INITIAL;
                                performSwitch();
                            }
    <include>{anyChar}      throw invalidInclude;
    
        /* 
            The default rules: eating all the rest, echoing it to output    
        */ 
    {anyChar}               ECHO;
    
        /*
            The <<EOF>> rule: pop a pushed file, or terminate the lexer
        */
    <<EOF>>                 {
                                if (!popSource())
                                    yyterminate();
                            }
    
    Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), much of the processing can be done by the derived class' member functions. This results in a very clean setup of the lexer specification file, which requires hardly any code in the preamble.

    19.7.1.2: The derived class: Scanner

    The class Scanner is derived from the class yyFlexLexer, generated by flex++. The derived class has access to the data controlled by the lexical scanner. In particular, the derived class has access to the following data members:

  • char *yytext: contains the text matched by a regular expression (accessible outside of the scanner from the YYText() member);
  • int yyleng: the length of the text in yytext (accessible outside of the scanner from the YYLeng() member);
  • int yylineno: the current line number (only if %option yylineno was specified in the lexer specification file, accessible outside of the scanner from the lineno() member);
  • Other members are available as well, but we feel they are less often used. Details can be found in the file FlexLexer.h, which is part of the flex distribution.

    The class Scanner performs two tasks: it pushes information about the current file onto a file stack, and it pops the information pushed last once EOF is detected in a file.

    Several member functions are needed to accomplish these tasks. As they are auxiliary to the scanner, they are private members. In practice, these private members are developed once the need for them arises. In the following interface of the Scanner class the final header file is given. Note that, apart from the private member functions, several private data members are used as well. These members are initialized by the Scanner() constructor and are used in the private member functions. They are discussed below, in the context of the member functions that are using them. Here is the scanner.h header file:

    #include <FlexLexer.h>  // provides yyFlexLexer interface
    #include <fstream>
    #include <string>
    #include <stack>
    #include <vector>
    
    using namespace std;    // FBB: needed for g++ >= 3.0
    
    class Scanner: public yyFlexLexer
    {
        public:         
            enum Error
            {
                invalidInclude,
                circularInclusion,
                nestingTooDeep,
                cantRead,
            };
                    
            Scanner(istream *yyin, string const &initialName);
            string const &lastFile()
            {
                return fileName.back();
            }
            void stackTrace();  // dumps filename stack to cerr
            int yylex();        // overruling yyFlexLexer's yylex()
        private:
            Scanner(Scanner const &other);      // no Scanner copy-initialization
            Scanner &operator=(Scanner const &other);   // no assignment either
    
            bool popSource();   // true: source popped
            void performSwitch();
            void throwOnCircularInclusion();
    
            stack<yy_buffer_state *>
                state;
            vector<string>
                fileName;
            string
                nextSource;
    
            static unsigned const
                sizeof_buffer = 16384,
                maxDepth = 10;
    };
    
    If the scanner can extract a filename from an #include directive, a switch to another file is performed by performSwitch(). If the filename could not be extracted, the scanner throws an invalidInclude exception value. The performSwitch() member and the matching function popSource() handle the file switch. In particular, note that yylineno is not updated when a file switch is performed. If line numbers are to be monitored, then the current value of yylineno should be pushed on a stack, and yylineno should be reset by performSwitch(), while popSource() should reinstate a former value of yylineno by popping a previous value from the stack. In the current implementation of Scanner a simple stack of yy_buffer_state pointers is maintained. Changing that into a stack of pair<yy_buffer_state *, unsigned> elements allows you to save the line numbers. This modification is left as an exercise to the reader.
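
    A sketch of this modification is given below (it is not used in the implementation that follows; pair and make_pair require the <utility> header):

    // in scanner.h: the stack now holds the buffer state and the line number
    stack<pair<yy_buffer_state *, unsigned> >
        state;

    // in performSwitch(), just before switching to the new buffer:
    state.push(make_pair(yy_current_buffer, unsigned(yylineno)));
    yylineno = 1;                           // restart counting in the new file

    // in popSource(), when reinstating the previous buffer:
    yy_switch_to_buffer(state.top().first);
    yylineno = int(state.top().second);     // reinstate the former line number
    state.pop();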

    As mentioned, performSwitch() performs the actual file-switching. The yyFlexLexer class provides a series of member functions that can be used for file switching purposes. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the file that is currently scanned by the lexical scanner. This buffer is pushed on the state stack when an #include is encountered. Then this buffer is replaced by the buffer that will be created for the file that is mentioned in the #include directive. The actual file switching is realized as follows:

  • First, the current depth of the include-nesting is inspected. If maxDepth is reached, the stack is considered full, and the scanner throws a nestingTooDeep exception.
  • Next, the fileName vector is inspected, to avoid circular inclusions. If nextSource is encountered in the fileName vector, the inclusion is refused, and the scanner throws a circularInclusion exception. The member function throwOnCircularInclusion() is called for this task.
  • Then, a new ifstream object is created, for the filename in nextSource. If this fails, the scanner throws a cantRead exception.
  • Finally, a new yy_buffer_state is created for the newly opened stream, and the lexical scanner is instructed to switch to that stream using yyFlexLexer's member function yy_switch_to_buffer().

    The sources for the member functions performSwitch() and throwOnCircularInclusion() are:
    #include "scanner.H"
    
    void Scanner::performSwitch()
    {   
        if (state.size() == maxDepth)
            throw nestingTooDeep;
    
        fileName.push_back(nextSource);
        throwOnCircularInclusion();
        
        ifstream
            *newStream = new ifstream(nextSource.c_str());
            
        if (!*newStream)
            throw cantRead;
    
        state.push(yy_current_buffer);
        yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer));
    }
    
    #include "scanner.H"
    
    void Scanner::throwOnCircularInclusion()
    {   
        vector<string>::iterator
            it = find(fileName.begin(), fileName.end(), nextSource);
    
        if (it != fileName.end())
            throw circularInclusion;
    }
    
    The member function popSource() is called to pop the previously pushed buffer from the stack, allowing the scanner to continue its scan just beyond the just processed #include directive. The popSource() function first inspects the size of the state stack: if it is empty, false is returned and the function terminates. If it is not empty, then the current buffer is deleted, to be replaced by the state waiting on top of the stack. The file switch is performed by the yyFlexLexer members yy_delete_buffer() and yy_switch_to_buffer(). Note that yy_delete_buffer() takes care of the closing of the ifstream and of deleting the memory allocated for this stream in performSwitch(). Furthermore, the filename that was last entered in the fileName vector is removed. Having done all this, the function returns true:
    #include "scanner.H"
    
    bool Scanner::popSource()
    {       
        if (state.empty())
            return false;
    
        yy_delete_buffer(yy_current_buffer);
        yy_switch_to_buffer(state.top());
        state.pop();
        fileName.pop_back();
        return true;
    }
    
    These functions complete the implementation of the lexical scanner. The lexical scanner itself is stored in the Scanner::yylex() function. The Scanner object itself only has three public member functions and a constructor. One member function is yylex(), the other two allow the caller to dump a stack trace and to obtain the name of the file that was last processed by the scanner. The constructor is a simple piece of code. Here is its source:
    #include "scanner.h"
    
    Scanner::Scanner(istream *yyin, string const &initialName)
    {
        switch_streams(yyin, yyout);
        fileName.push_back(initialName);
    }
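
    The stackTrace() member, called from main() below when an exception has been caught, simply dumps the fileName vector to cerr. A minimal implementation (a sketch) could be:

    #include "scanner.h"

    void Scanner::stackTrace()
    {
        for (unsigned idx = 0; idx < fileName.size(); ++idx)
            cerr << (idx + 1) << ": " << fileName[idx] << endl;
    }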
    

    19.7.1.3: The main() function

    The main program is very simple. As the program expects a filename argument to start the scanning process, the number of arguments is checked first. If at least one argument was given, an ifstream object is created. If this object can be created, a Scanner object is constructed, receiving the address of the ifstream object and the name of the file as its arguments. Then the yylex() member function of the Scanner object is called. This member overrides the yylex() member declared in Scanner's base class yyFlexLexer. If anything fails inside the scanner, an exception is thrown, which is caught at the end of the main() function:

    /*                              lexer.cc
    
       A C++ main()-frame generated by C++ for lexer.cc
    
    */
    
    #include "lexer.h"           /* program header file */
    
    
    int main(int argc, char **argv)
    {       
        if (argc == 1)
        {
            cerr << "Filename argument required\n";
            exit (1);
        }
    
        ifstream
            yyin(argv[1]);
    
        if (!yyin)
        {
            cerr << "Can't read " << argv[1] << endl;
            exit(1);
        }
         
        Scanner
            scanner(&yyin, argv[1]);
    
        try
        {
            scanner.yylex();
        }
        catch (Scanner::Error err)
        {
            char const *msg[] =
            {
                "Include specification",
                "Circular Include",
                "Nesting",
                "Read",
            };
    
            cerr << msg[err] << " error in " << scanner.lastFile() << 
                                ", line " << scanner.lineno() << endl;
            scanner.stackTrace();
            return 1;
        }
    }
    

    19.7.1.4: Building the scanner-program

    The final program is constructed in two steps. These steps are given for a Unix system, on which flex++ and the Gnu C++ compiler g++ have been installed:

  • First, the lexical scanner's source is created using flex++. For this the command
    flex++ lexer

    can be given.

  • Next, all sources are compiled and linked, using the libfl.a library. The appropriate command here is
    g++ -o scanner *.cc -lfl

  • For the purpose of debugging a lexical scanner the rules that are matched and the tokens that are returned are useful information. When flex++ is called with the -d flag, debugging code will be part of the generated scanner. Apart from that, the debugging code must be activated. Assuming the scanner object is called scanner, the statement
    scanner.set_debug(1);

    must be given following the construction of the scanner object.
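
    In the main() function shown earlier this amounts to:

        Scanner
            scanner(&yyin, argv[1]);

        scanner.set_debug(1);   // activate the debugging code generated by -d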

    19.7.2: Using both bison++ and flex++

    When an input language exceeds a certain level of complexity, a parser is generally needed to be able to control the complexity of the input language. In these cases, a parser generator is used to generate the code that is required to determine the grammatical correctness of the input. The function of the lexical scanner is to provide chunks of the input, called tokens, for the parser to work with.

    Starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer). Finally, auxiliary code is provided to `fill in the blanks': the actions which are performed by the parser and the lexer are not normally specified in the grammatical rules or lexical regular expressions, but are executed by functions, which are called from within the parser's rules or associated with the lexer's regular expressions.

    In the previous section we've seen an example of a C++ class generated by flex++. In the current section we concern ourselves with the parser. The parser can be generated from a grammar specified for the program bison++. The specification of bison++ is similar to the specifications required for bison, but a class is generated, rather than a single function. In the next sections we'll develop a program converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written behind their operands. A comparable situation holds true for the unary operators - and +: we ignore the + operator, but the - will be converted to a unary minus.

    Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.

    In the next few sections we'll develop the grammar in a bison++ specification file. Then, the regular expressions for the scanner are specified according to the requirements of flex++. Finally the program is constructed.

    The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.

    19.7.2.1: The bison++ specification file

    The grammar specification file used with bison++ is comparable to the specification file used with bison. Differences are related to the class nature of the resulting parser. Our calculator will distinguish real numbers from integers, and will support the basic set of arithmetic operators.

    The bison++ specification file consists of the following sections:

  • The header section. This section is comparable to the C specification section used with bison. The difference being the %header{...} opening. In this section we'll encounter mainly declarations: header files are included, and the yyFlexLexer object is declared.
  • The token section. In this section the bison tokens, and the priority rules for the operators are declared. However, bison++ has several extra items that can be declared here. They are important and warrant a section of their own.
  • The rule section. The grammatical rules define the grammar. This section is essentially unchanged compared to bison.

    19.7.2.2: The bison++ token section

    The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:

  • %name ParserName. The name ParserName will be the name of the parser's class. This entry should be the first entry of the token section. It is used in cases where multiple grammars are used, to make sure that the different parser-classes use unique identifiers. By default the name parse is used.
  • %define name content. The %define has the same function as the #define statement for the C++ preprocessor. It can be used to define, e.g., a macro. Internally, the defined symbol will be the concatenation of YY_, the parser's class name, and the name of the macro. E.g.,
    YY_ParserName_name

    Several symbols will normally be defined here. Of all the definitions that can be given here, the following two are required:

  • %define LEX_BODY { inline-code }: here the body of the call to the lexer is defined. It can be defined as = 0 for an abstract parser-class, but otherwise it must contain the code (including surrounding curly braces) representing the call to the lexer. For example, if the lexer object generated by flex++ is called lexer, this declaration should be
    %define LEX_BODY { return lexer.yylex(); }

  • %define ERROR_BODY { inline-code }: similarly, the body of the code of the call to the error-function is defined here. It can be defined as = 0, in which case the parser's class will again become abstract. Otherwise, it is used to specify the inner workings of the error function, including surrounding braces. E.g.,
    %define ERROR_BODY { cerr << "syntax Error\n"; }

    When the LEX_BODY and ERROR_BODY definitions are omitted, the compiler is not able to complete the virtual table of the parser class, and the linking phase will report an error like
    undefined reference to `Parser virtual table'

    The remaining symbols are optional, and can be (re)defined as needed:

  • %define DEBUG 1: if non-0 debugging code will be included in the parser's source.
  • %define ERROR_VERBOSE: if defined, the parser's stack will be dumped when an error occurs.
  • %define LVAL yylval: the default variable name is shown here: the variable name containing the parser's semantic value is by default yylval, but its name may be redefined here.
  • %define INHERIT : public ClassA, public ClassB: the inheritance list for the parser's class. Note that it starts with a colon. The space character between INHERIT and the colon may be omitted. The %define itself should be omitted if the parser class is not derived from another class.
  • %define MEMBERS member-prototypes: if the parser contains extra members, they must be declared here. Note that there is only one %define MEMBERS definition allowed. So, if multiple members are to be declared, they must all be declared at this point. To prevent very long lines in the specification file, the \ ( backslash) can be used at the end of a line, to indicate that it continues on the next line of the source-text. E.g.,
        %define MEMBERS void lookup(); \ 
                        void lookdown();
    
    The MEMBERS section starts as a public section. If private members are required too, a private: directive can be part of the MEMBERS section.
  • Constructor-related defines: When a special parser constructor is needed, then three %defines can be used:
  • %define CONSTRUCTOR_PARAM parameterlist: this defines the parameterlist for the parser's constructor. Here the types and names of the parameters of the parser's constructor should be given. The surrounding parentheses of the parameter list are not part of the CONSTRUCTOR_PARAM definition.
  • %define CONSTRUCTOR_INIT : initializer(s): this defines the base class and member initializers of the constructor. Note the initial colon following CONSTRUCTOR_INIT, which is required. The colon may be given immediately after the CONSTRUCTOR_INIT statement, or blanks may be used to separate the symbol from the colon.
  • %define CONSTRUCTOR_CODE { code }: this defines the code of the parser's constructor (including surrounding curly braces).
  • When the parser doesn't need special effects, a constructor will not be needed. In those cases the parser can be created as follows (using the default parser-name):
    parse parser;

  • %union. This starts the definition of the semantical value union. It replaces the #define YYSTYPE definition seen with bison. An example of a %union declaration is
        %union 
        {   
            int 
                i;
            double
                d;
        };
    
    The union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a string cannot be a member of the union. A string *, however, is a possible union member. As a side line: the lexical scanner does not have to know about this union. The scanner can simply pass its scanned text to the parser through its YYText() member function. For example, a statement like:
    $$.i = atoi(scanner.YYText());

    can be used to convert matched text to a value of an appropriate type.

  • Associating tokens and non-terminals with union fields. Tokens and non-terminals can be associated with union fields. This is strongly advised. By doing so, the parser's action implementations become much cleaner than if the tokens aren't associated with fields. As non-terminals can be associated with union fields, the generic return variable $$ or the generic return values $1, $2, etc, that are associated with components of rules can be used, rather than $$.i, $3.d, etc. To associate a non-terminal or a token with a union field, the <fieldname> specification is used. E.g.,
        %token <i> INT          // token association (deprecated)
               <d> DOUBLE
        %type  <i> intExpr      // non-terminal association
    
    In this example, note that both the tokens and the non-terminals can be associated with a field of the union. However, as noted earlier, the lexical scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser knows what the input is all about, and may convert strings like "123" to an integer value. Consequently, we are discouraging the association of a union field and a token. In the upcoming description of the rules of the grammar this will be further illustrated.
  • In the %union discussion the %token and %type specifications should be noted. They are used for the specification of the tokens ( terminal symbols) that can be returned by the lexical scanner, and for the specification of the return types of non-terminals. Apart from %token the token indicators %left, %right and %nonassoc may be used to specify the associativity of operators. The tokens mentioned at these indicators are interpreted as tokens indicating operators, associating in the indicated direction. The precedence of operators is given by their order: the first specification has the lowest priority. To overrule a certain precedence in a certain context, %prec can be used. As all this is standard bison practice, it isn't further discussed in this context. The documentation provided with the bison distribution should be consulted for further reference.

    19.7.2.3: The bison++ grammar rules

    The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.

        lines:
            lines
            line
        |
            line
        ;
        
        line:
            intExpr
            '\n'
            {
                cerr << "int: " << $1 << endl;
            }
        |
            doubleExpr
            '\n'
            {
                cerr << "double: " << $1 << endl;
            }
        |
            '\n'
            {
                cout << "Good bye\n";
                YYACCEPT;
            }
        |
            error
            '\n'
        ;
        
        intExpr:
            intExpr '*' intExpr
            {
                $$ = $1 * $3;
            }
        |
            intExpr '+' intExpr
            {
                $$ = $1 + $3;
            }
        |
            '(' intExpr ')'
            {
                $$ = $2;
            }
        |
            '-' intExpr         %prec UnaryMinus
            {
                $$ = -$2;
            }
        |
            INT
            {
                $$ = atoi(lexer.YYText());
            }
        ;
        
        doubleExpr:
            doubleExpr '*' doubleExpr
            {
                $$ = $1 * $3;
            }
        |
            doubleExpr '+' doubleExpr
            {
                $$ = $1 + $3;
            }
        |
            doubleExpr '*' intExpr
            {
                $$ = $1 * $3;
            }
        |
            doubleExpr '+' intExpr
            {
                $$ = $1 + $3;
            }
        |
            intExpr '*' doubleExpr
            {
                $$ = $1 * $3;
            }
        |
            intExpr '+' doubleExpr
            {
                $$ = $1 + $3;
            }
        |
            '(' doubleExpr ')'
            {
                $$ = $2;
            }
        |
            '-' doubleExpr         %prec UnaryMinus
            {
                $$ = -$2;
            }
        |
            DOUBLE
            {
                $$ = atof(lexer.YYText());
            }
        ;
    
    Using these rules a simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association are defined in the token section of the parser specification file, which is:
        %name  Parser                
        %union 
        {
            int i;
            double d;
        };
        %token      INT
                    DOUBLE
                    DONE
        %type   <i> intExpr
                <d> doubleExpr
        
        %left   '+'
        %left   '*'
        %right  UnaryMinus
        
        %define MEMBERS                                         \ 
            virtual ~Parser()   {}                              \ 
            private:                                            \ 
                yyFlexLexer lexer;              
        %define LEX_BODY {return lexer.yylex();}
        
        %define ERROR_BODY { cerr << "error encountered\n"; }
    
    In the token section we see the use of the %type specifiers, connecting intExpr to the i-field of the semantic-value union, and connecting doubleExpr to the d-field. At first sight it looks complex, since the expression rules must be included for each individual return type. On the other hand, if the union itself would have been used, we would have had to specify somewhere in the returned semantic values what field to use: fewer rules, but more complex and error-prone code.

    Also, note that the lexical scanner is included as a member of the parser. There is no need to define the scanner outside of the parser, as it's not used outside of the parser object. The virtual destructor is included as a member to prevent the compiler from complaining about the parser having a non-virtual destructor.

    19.7.2.4: The flex++ specification file

    The flex-specification file that is used by our calculator is simple: blanks are skipped, single characters are returned, and numerical values are returned as either Parser::INT or Parser::DOUBLE tokens. Here is the complete flex++ specification file:

    %{ 
    #include "parser.h"
    %}
    %option yylineno
    %%
    
    [ \t]                       ;
    [0-9]+                      return(Parser::INT);
    
    "."[0-9]*                   |                                        
    [0-9]+("."[0-9]*)?          return(Parser::DOUBLE);
    
    exit                        |
    quit                        return (Parser::DONE);
    
    .|\n                        return (*yytext);
    

    19.7.2.5: The generation of the code

    The code is generated in the same way as with bison and flex. To order bison++ to generate the files parser.cc and parser.h, give the command:

    bison++ -d -o parser.cc parser

    Next, flex++ will generate code on lexer.cc using the command

    flex++ -I -olexer.cc lexer

    Note here that flex++ expects no blanks between the -o flag and lexer.cc.

    On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:

    g++ -o calc -Wall *.cc -lfl -s

    Note the fact that the libfl.a library is mentioned here. If it's not mentioned, the linker will complain about unresolved references like yywrap().

    A source in which the main() function, the lexical scanner and the parser objects are defined is, finally:

    #include "parser.h"
    
    int main()
    {
        Parser
            parser;
    
        cout << "Enter (nested) expressions containing ints, doubles, *, + and "
                "unary -\n"
                "operators. Enter an empty line, exit or quit to exit.\n";
    
        parser.yyparse();
    
        return (0);
    }