Chapter 19: Concrete examples of C++

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.

Please state the document version you're referring to, as found in the title (in this document: 5.1.1a) and please state the paragraph you're referring to.

All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the occasional case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't interpret this as a lack of appreciation.

This chapter presents a number of concrete examples of programming in C++. Items from this document such as virtual functions, static members, etc. are rediscussed. Examples of container classes are shown.

Also, with the advent of the ANSI/ISO standard, classes supporting streams based on file descriptors are no longer available, including the Gnu procbuf extension. These classes were frequently used in older C++ programs. This section of the C++ Annotations develops an alternative: classes extending streambuf, allowing the use of file descriptors, and classes around the fork() system call.

Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tools in a C++ environment.

19.1: `streambuf' classes using file descriptors

19.1.1: A class for output operations

Usually extensions to the ANSI/ISO standard are available allowing us to read from and/or write to file descriptors. However, these extensions are not standard, and may thus vary across compilers. As a file descriptor can be considered a device, it seems natural to use the class streambuf as the starting point for the construction of classes interfacing file descriptors. In this section we will discuss the construction of a class which may be used to write to a device defined by its file descriptor: it may be a file, but could be a pipe or socket. Section 19.1.2 will discuss the matter of reading from devices that are known by their file descriptors, while section 19.3 will reconsider the question of redirection, which was discussed earlier in section 5.8.3.

Basically, deriving a class for output operations is very simple. The only member function that must be overridden is the virtual int overflow(int c) member. This member is responsible for writing characters to the device in the case where there is no buffer or no more room in the buffer. If fd represents the file descriptor to write to, and if we decide against using a buffer, the member overflow() can be as simple as:

    class UnbufferedFD: public std::streambuf
    {
        public:
            int
            overflow(int c)
            {
                if (c != EOF)
                {
                    char ch = static_cast<char>(c);
                    if (write(fd, &ch, 1) != 1)
                        return EOF;
                }
                return c;
            }
            ...
    };
The received argument is either written as a char to the file descriptor, or EOF is returned. However, this approach does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), we will discuss the construction of a class using an output buffer in somewhat greater detail.

When an output buffer is to be used, the overflow() member gets a bit more complex, due to the fact that it is now called only when the buffer is full. So, first we will have to flush the buffer, for which the (virtual) function streambuf::sync() should be used. Defining sync() and using it in overflow() is not all that must be done: the information to write might be less than the size of the buffer. So, at the end of the lifetime of our special streambuf object, it might find itself having a buffer that is only partially full. So, we must make sure that the buffer is flushed by the time our object goes out of scope. This is of course very simple: sync() should be called in the destructor as well.

Now that we've considered the consequences of using an output buffer, we're almost ready to construct our derived class. We add a couple of extra features. First, we permit the specification of the size of the output buffer by the user of the class. Second, we permit the construction of an object of the class before the file descriptor to use is actually known. Later, in section 19.4, we'll encounter a situation where this feature is used. In order to save some space, the successful operation of various calls was not checked. In `real life' implementations these checks should of course not be omitted. Here is the class ofdnstreambuf:

    #ifndef _OFDNSTREAMBUF_H_
    #define _OFDNSTREAMBUF_H_
    
    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    
    class ofdnstreambuf: public std::streambuf
    {
        public:
            ofdnstreambuf()
            :
                bufsize(0),
                buffer(0)
            {}
    
            ofdnstreambuf(int fd, unsigned bufsize = 1)
            {
                open(fd, bufsize);
            }
    
            ~ofdnstreambuf()
            {
                if (buffer)
                {
                    sync();                                 // 23
                    delete[] buffer;
                }
            }
    
            void open(int xfd, unsigned xbufsize = 1)
            {
                fd = xfd;
                bufsize = xbufsize == 0 ? 1 : xbufsize;
            
                buffer = new char[bufsize];
                setp(buffer, buffer + bufsize);             // 34
            }
    
            int sync()
            {
                if (pptr() > pbase())
                {
                    write(fd, buffer, pptr() - pbase());   // 41
                    setp(buffer, buffer + bufsize);         // 42
                }
                return 0;
            }
    
            int overflow(int c)
            {
                sync();                                     // 49
                if (c != EOF)
                {
                    *pptr() = static_cast<char>(c);         // 52
                    pbump(1);
                }
                return c;
            }
    
        private:
            unsigned bufsize;
            int     fd;
            char    *buffer;
    };

    #endif
Depending on the number of arguments, the following program uses the ofdnstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
#include "fdout.h"
#include <string>
#include <iostream>
#include <istream>

using namespace std;

int main(int argc)
{
    ofdnstreambuf
        fds(STDOUT_FILENO, 500);
    ostream
        os(&fds);

    switch (argc)
    {
        case 1:
            os << "COPYING cin LINE BY LINE\n";
            for (string  s; getline(cin, s); )
                os << s << endl;
        break;        

        case 2:
            os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";

            cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
        break;

        case 3:    
            os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
            os << cin.rdbuf();
        break;
    }
}

19.1.2: Classes for input operations

When a class performing input operations is derived from the class streambuf, it should use an input buffer of at least one character, to allow the use of the member functions istream::putback() or istream::unget(). Stream classes (like istream) normally allow the ungetting of at least one character using these member functions. This is important, as these stream classes usually interface to streambuf objects. Although strictly speaking it is not necessary to implement a buffer in classes derived from streambuf, we strongly suggest using buffers in these cases: the implementation is very simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf a buffer of at least one character will be defined.

19.1.2.1: Using a one-character buffer

When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of at least one character, at the very least the member streambuf::underflow() should be overridden, as this is the member to which all requests for input are eventually directed. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the dimensions of the input buffer, so that it can set up its input buffer pointers in such a way that eback(), gptr(), and egptr() return correct values. Here is the corresponding class definition:

#include <unistd.h>
#include <streambuf>
#include <ios>

class ifdstreambuf: public std::streambuf
{
    public:
        ifdstreambuf(int fd)
        :
            fd(fd)
        {
            setg(buffer, buffer + 1, buffer + 1);   // 12
        }

        int 
        underflow() 
        {
            if (gptr() < egptr())                   // 18
                return *gptr();                     // 19
    
            if (read(fd, buffer, 1) <= 0)           // 21
                return EOF;
    
            setg(buffer, buffer, buffer + 1);       // 24
            return *gptr();
        }
    protected:                                      // 27
        int     fd;
        char    buffer[1];
};
The above ifdstreambuf class is used in the following program:
#include "ifdbuf.h"
#include <iostream>
#include <istream>

using namespace std;

int main(int argc)
{
    ifdstreambuf
        fds(0);
    istream
        is(&fds);

    cout << is.rdbuf();
}        

19.1.2.2: Using an n-character buffer

How complex would things get if we decided to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, and is basically the same class as ifdstreambuf from the previous section. To make things a bit more interesting, in the current class ifdnstreambuf the member streambuf::xsgetn() was also overridden, to optimize the reading of series of characters at once. Also, a default constructor is provided, which can be used in combination with the added open() member in cases where we want to construct an istream object before we know the file descriptor that must be used. Later, in section 19.4, we'll encounter such a situation. In order to save some space, the successful operation of various calls was not checked. In `real life' implementations these checks should of course not be omitted. Here is the new class ifdnstreambuf:

    #ifndef _IFDNBUF_H_
    #define _IFDNBUF_H_

    #include <unistd.h>
    #include <cstring>
    #include <streambuf>
    #include <ios>
    
    class ifdnstreambuf: public std::streambuf
    {
        public:
            ifdnstreambuf()
            :
                bufsize(0),
                buffer(0)
            {}

            ifdnstreambuf(int fd, unsigned bufsize = 1)
            {
                open(fd, bufsize);
            }

            ~ifdnstreambuf()
            {
                if (bufsize)
                {
                    close(fd);
                    delete[] buffer;                    // 27
                }
            }

            void open(int xfd, unsigned xbufsize = 1)
            {
                fd = xfd;
                bufsize = xbufsize;
                buffer = new char[bufsize];
                setg(buffer, buffer + bufsize, buffer + bufsize);
            }
    
            int underflow()
            {
                if (gptr() < egptr())
                    return *gptr();
    
                int nread = read(fd, buffer, bufsize);
    
                if (nread <= 0)
                    return EOF;
        
                setg(buffer, buffer, buffer + nread);   // 49
                return *gptr();
            }
    
            std::streamsize
            xsgetn(char *dest, std::streamsize n)       // 54
            {
                int nread = 0;
    
                while (n)
                {
                    if (!in_avail())
                    {
                        if (underflow() == EOF)         // 62
                            break;
                    }
    
                    int avail = in_avail();
    
                    if (avail > n)
                        avail = n;
    
                    memcpy(dest + nread, gptr(), avail);// 71
                    gbump(avail);                       // 72
    
                    nread += avail;
                    n -= avail;
                }
    
                return nread;
            }                                           // 79
    
        protected:
            int         fd;
            unsigned    bufsize;
            char*       buffer;
    };

    #endif
The member function xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
    #include "ifdnbuf.h"
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdnstreambuf fds(0, 30);   // internally: 30 char buffer
        char buf[80];               // main() reads blocks of 80 
                                    // chars
        while (true)
        {
            unsigned n = fds.sgetn(buf, 80); 
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }        

19.1.2.3: Seeking positions in `streambuf' objects

When devices support seek operations, classes derived from streambuf should override streambuf::seekoff() and streambuf::seekpos(). In the following class ifdseek, which is derived from our previously constructed class ifdstreambuf, we assume that the device whose file descriptor we have available supports these seek operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. The seek facilities that are added to our new class ifdseek make sure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class ifdnstreambuf; in that case the arguments to reset the input buffer must be adapted such that its second and third parameters point beyond the available input buffer. Here is the class definition:

#include "ifdbuf.h"
#include <unistd.h>

class ifdseek: public ifdstreambuf
{
    typedef std::streambuf::pos_type        pos_type;   // 6
    typedef std::streambuf::off_type        off_type;
    typedef std::ios::seekdir               seekdir;
    typedef std::ios::openmode              openmode;   // 9

    public:
        ifdseek(int fd)                                 // 12
        :
            ifdstreambuf(fd)
        {}

        pos_type                                        // 17
        seekoff(off_type offset, seekdir dir, openmode)
        {
            pos_type pos = 
                lseek
                (
                    fd, offset, 
                    (dir ==  std::ios::beg) ? SEEK_SET :
                    (dir ==  std::ios::cur) ? SEEK_CUR :
                                              SEEK_END
                );

            if (pos < 0)
                return -1;

            setg(buffer, buffer + 1, buffer + 1);       // 32
            return pos;
        }

        pos_type                                        // 36
        seekpos(pos_type offset, openmode mode)
        {
            return seekoff(offset, std::ios::beg, mode);
        }
};
An example of a program using the class ifdseek follows. If this program is given its own source file via input redirection, seeking is supported, and apart from the first line, every line is shown twice:
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>
    
    using namespace std;
    
    int main(int argc)
    {
        ifdseek fds(0);
        istream is(&fds);
        string  s;
    
        while (true)
        {
            if (!getline(is, s))
                break;
    
            streampos pos = is.tellg();
    
            cout << setw(5) << pos << ": `" << s << "'\n";
    
            if (!getline(is, s))
                break;
    
            streampos pos2 = is.tellg();
    
            cout << setw(5) << pos2 << ": `" << s << "'\n";
    
            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }        


19.1.2.4: Multiple `unget()' calls in `streambuf' objects

As stated earlier, streambuf classes and classes derived from streambuf should at least support ungetting the character that was last read. Special care must be taken when series of unget() calls are to be supported. In this section we show how to construct a class, derived from streambuf, that allows a configurable number of istream::unget() or istream::putback() calls. Support for multiple (say, n) unget() calls is realized by setting aside an initial section of the input buffer, which is gradually filled with the last n characters read. Here is an implementation of a class supporting series of unget() calls:

#include <unistd.h>
#include <cstring>
#include <streambuf>
#include <ios>
#include <algorithm>

class fdunget: public std::streambuf
{
    public:
        fdunget (int fd, unsigned bufsz, unsigned unget)    // 9
        :
            fd(fd),
            reserved(unget)                                 // 12
        {
            unsigned allocate =                             // 14
                    bufsz > reserved ?
                        bufsz
                    :
                        reserved + 1;

            buffer = new char [allocate];                   // 20

            base = buffer + reserved;                       // 22
            setg(base, base, base);                         // 23

            bufsize = allocate - reserved;                  // 25
        }

        ~fdunget()
        {
            delete[] buffer;
        }

        int 
        underflow()             
        {
            if (gptr() < egptr())
                return *gptr();                             

            unsigned consumed =                             // 39
                static_cast<unsigned>(gptr() - eback());    

            unsigned move = std::min(consumed, reserved);   // 41    

            memcpy(base - move, egptr() - move, move);      // 43

            int nread = read(fd, base, bufsize);            // 45
            if (nread <= 0)       // none read -> 
                return EOF;
    
            setg(base - move, base, base + nread);          // 50

            return *gptr();
        }

    private:
        int         fd;
        unsigned    bufsize;
        unsigned    reserved;
        char*       buffer;
        char*       base;
};
Here is an example of a program illustrating the use of the class fdunget. The program reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined, within a buffer of 3 characters. Just before reading a character, the program tries to unget up to 6 characters. This is of course never possible, but the program neatly ungets as many characters as possible, considering the actual number of characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    
    using namespace std;
    
    int main(int argc)
    {
        fdunget
            fds(0, 3, 2);
        istream
            is(&fds);
        char
            c;
    
        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout 
                    << "\tunget failed at attempt " << (ug + 1) << "\n" 
                    << "\trereading: '";
    
                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }
    
            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }    
            cout << "Next character: " << c << endl;
        }
    }        
    /*
        Generated output after 'echo abcde | program':
    after reading 0 characters:
            unget failed at attempt 1
            rereading: ''
    Next character: a
    after reading 1 characters:
            unget failed at attempt 2
            rereading: 'a'
    Next character: b
    after reading 2 characters:
            unget failed at attempt 3
            rereading: 'ab'
    Next character: c
    after reading 3 characters:
            unget failed at attempt 4
            rereading: 'abc'
    Next character: d
    after reading 4 characters:
            unget failed at attempt 4
            rereading: 'bcd'
    Next character: e
    after reading 5 characters:
            unget failed at attempt 4
            rereading: 'cde'
    Next character: 
    
    after reading 6 characters:
            unget failed at attempt 4
            rereading: 'de
    '
     reached
    */

19.2: Using form() with ostream objects

The ANSI/ISO standard does not include the Gnu extensions form() (and scan()) that were previously available for stream objects. In this section the construction of a class oformstream is described, which is derived from ostream and supports form(). Analogously to oformstream a class iscanstream can be developed, supporting scan(). The construction of this latter class is left as an exercise to the reader.

Here is the class interface and definition. It is annotated below:

    #include <iostream>
    
    class oformstream: public std::ostream
    {
        public:
            oformstream(std::ostream &ostr)
            :
                std::ostream(ostr.rdbuf())
            {}

            std::ostream &form(char const *fmt, ...);
            
    };
A program using the class oformstream is given in the next example. It prints a well-known string and some marker text:
    #include "oformstream.h"
    #include <cstdarg>
    #include <cstdio>
    #include <vector>

    std::ostream &oformstream::form(char const *fmt, ...)
    {
        va_list list;
        va_start(list, fmt);

        va_list copy;                   // the list is traversed twice,
        va_copy(copy, list);            // so a copy is needed
        unsigned n = vsnprintf(0, 0, fmt, copy);
        va_end(copy);

        std::vector<char> buf(n + 1);   // room for the text and a 0-byte
        vsnprintf(&buf[0], n + 1, fmt, list);
        va_end(list);

        return *this << &buf[0];
    }

    using namespace std;

    int main()
    {
        oformstream
            of(cout);

        of << "Hello world\n";
        cout << "Ok, ";
        of << "That's all, folks\n";

        of.form("%s\n", "Hello world of C++") << endl;
    }

19.3: Redirection revisited

Earlier, in section 5.8.3 it was noted that within a C++ program streams can be redirected using the ios::rdbuf() member function. By passing the streambuf of one stream to another stream's rdbuf() member, both stream objects access the same streambuf, thus realizing redirection at the level of the programming language itself.

It should be realized that this is fine within the context of the C++ program, but if that context is left, the redirection is terminated, as the operating system does not know about streambuf objects. This happens, e.g., when a program uses a system() call to start a subprogram. The following program uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")
to echo a well-known line of text. Since echo writes its information to the standard output, this would be written to the program's redirected file if C++'s redirection were recognized by the operating system. However, this is not the case, so hello world still appears at the program's standard output and not in the redirected file. A solution to this problem involves redirection at the operating system level, for which system calls like dup() and dup2() are available in some operating systems (e.g., Unix and friends). Examples of the use of these system calls are given in section 19.4 in this chapter.

Here is the example of the failing redirection at the system level following C++ redirection using streambuf redirection:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>

    using namespace std;

    int main()
    {
        ofstream
            of("outfile");

        cout.rdbuf(of.rdbuf());
        cout << "To the of stream" << endl;
        system("echo hello world");
        cout << "To the of stream" << endl;
    }
    /*
        Generated output: on the file `outfile'
    To the of stream
    To the of stream
        On standard output:
    hello world
    */

19.4: The fork() system call

From the C programming language, the fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().

In this section we will see how C++ can be used to wrap classes around a complex system call like fork(). Much of what follows in this section directly applies to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities. The following discussion is heavily based on the notion of design patterns, as published by Gamma et al. (1995).

When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork(), which differs between the original process (called the parent process) and the newly created process (called the child process): in the parent process fork() returns the process id of the child process, while in the child process it returns 0.

A basic Fork class should hide the bookkeeping details of a system call like fork() from its users. The Fork class developed below will do just that. The class itself only needs to take care of the proper execution of the fork() system call. Normally, fork() is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process is. However, a Fork object should be able to activate the child process.

In the child process created by the Fork class developed here, two subtasks are performed: first, the necessary (standard) streams are redirected; second, the actual child process is started.

Similarly, two subtasks are performed in the parent process: here istream objects and an ostream object can be made available, followed by the activation of the actual parent process. However, the Fork class has no responsibilities here, but defers these tasks to separate objects. This design pattern is called the Chain of Responsibility pattern by Gamma et al. (1995). Using this design pattern, the sender of a request is decoupled from its receiver.

Normally, the child process does not return: once it has completed its task it should terminate as a separate process. The parent (main) process usually waits for its child process to complete its task. The terminating child process informs its parent that it is about to terminate by sending a signal to the parent, which should be caught by the parent. If the child terminates and the parent process does not catch the signal generated by the child process, then the child process remains visible as a so-called zombie process.

There is one situation where the child process continues to live while the parent dies. In nature this happens all the time: parents tend to die before their children do. In our context this represents a daemon program: the parent process dies and the child program continues, in fact, as a child of the basic init process. Again, when the child eventually dies a signal is sent to its `step-parent' init. No zombie is created here, as init catches the termination signals of its (step-)children.

The construction of a daemon process is so simple that no special Fork object is needed (see section 19.4.12 for an example).

19.4.1: The `ChildProcess' abstract base class

The child process will be activated from a method offered by an abstract ChildProcess class, offering a protected pure virtual method doProcess() which must be implemented in a concrete implementation of the abstract class. Depending on the problem at hand, the child process may assume that it can read from its standard input, and that it can write to its standard output and/or standard error. Here is the definition of the abstract ChildProcess class:
    #ifndef _CHILD_PROCESS_
    #define _CHILD_PROCESS_

    #include <cstdlib>

    class ChildProcess
    {
        public:
            void process()
            {
                doProcess();
                exit(-1);                   // doProcess itself should stop
            }
    
        protected:
            virtual void doProcess() = 0;   // implemented in subclasses
    };
    
    #endif
In this class, note that the method process() is the only public member that is offered. This method performs two tasks: it calls doProcess(), which is implemented in a derived class; and then, as a safeguard, it calls exit(-1): this ensures that the child process is properly terminated if, for whatever reason, the doProcess() member itself fails to terminate the child process properly (see also section 19.4.12 for the reverse case, where the child continues to live and the parent dies).

The design pattern in which a generic class defines a member (an action) to be implemented in concrete subclasses is called a Command design pattern by Gamma et al. (1995). The process() method defines a skeleton algorithm, deferring the implementation of the doProcess() step to a subclass. This design pattern is called the Template Method by Gamma et al.

19.4.2: The `ParentProcess' abstract base class

The parent process acts comparably to the child process. The parent process will be activated from a method offered by an abstract ParentProcess class, offering a public pure virtual method process() which must be implemented in a concrete implementation of the abstract class. This process() method is always called with the following arguments: the process id of the child process, an ostream that is connected to the child's standard input, and two istreams that are connected to, respectively, the child's standard output and standard error streams. Here is the definition of the abstract class ParentProcess:
    #ifndef _PARENT_PROCESS_
    #define _PARENT_PROCESS_
    
    #include <istream>
    
    class ParentProcess
    {
        public:
            virtual void process        // pure virtual function
            (
                int pid,
                std::ostream &out,
                std::istream &in,
                std::istream &inerr
            ) = 0;
    };
    
    #endif

Again, this is an example of the Command design pattern as identified by Gamma et al. (1995).

19.4.3: The `Redirector' abstract base class

Before the child process can receive information from the parent process and before the parent process can receive information from the child process, the proper communication channels must be constructed. The Fork class itself has no business here.

Redirection at the system level (and not at the C++ level, see section 19.3) is required to connect the child's standard streams to the streams defined as parameters in the ParentProcess::process() member function. The abstract class Redirector defines the interface for this redirection. Depending on the requirements of the child process its standard input, standard output and/or standard error may have to be redirected. Here is the definition of the abstract class Redirector:

    #ifndef _REDIRECTOR_H_
    #define _REDIRECTOR_H_
    
    #include <istream>
    
    class Redirector
    {
        public:
            enum
            {
                R,
                W
            };
    
            enum bad_stream
            {};
        
            virtual ~Redirector()
            {}
    
            virtual void child() = 0;   // sets up redirections in the child
            virtual void parent() = 0;  // initializes in, out, err
    
            virtual std::istream &in()
            { 
                throw bad_stream();
                return *reinterpret_cast<std::istream *>(0); 
            }
    
            virtual std::istream &inErr()
            { return in(); }
        
            virtual std::ostream &out()
            { 
                throw bad_stream();
                return *reinterpret_cast<std::ostream *>(0); 
            }
    };
    
    #endif
This class offers the following methods: child() sets up the redirections in the child process; parent() initializes the streams used by the parent; and in(), inErr() and out() give access to those streams. The stream accessors have default implementations throwing a bad_stream exception, so a concrete Redirector only needs to override the accessors it actually supports.

The Redirector class represents yet another example of Gamma's (1995) Command design pattern.

19.4.4: The `Fork' class: implementation

Now that Fork's support classes have been defined, let's have a look at the Fork class itself. The class defines but one member: fork(), performing the actual fork() system call. An object of the class Fork is configured by providing the appropriate Redirector, ChildProcess and ParentProcess objects. In the current definition the Fork constructor defines such parameters and the class defines corresponding reference data members, but an alternative approach, in which fork() itself defines these reference parameters, could have been followed as well. It is up to the reader to experiment with this latter approach.

Here is the interface of the class Fork, together with the implementation of the fork() member function:

    #ifndef _FORK_H_
    #define _FORK_H_
    
    class Redirector;
    class ChildProcess;
    class ParentProcess;
    
    class Fork
    {
        public:
            Fork(Redirector &red, ChildProcess &cp, ParentProcess &pp)
            :
                redirector(red),
                child(cp),
                parent(pp)
            {}
    
            void fork() const;
    
        protected:
            Redirector &redirector;
            ChildProcess &child;
            ParentProcess &parent;
    };
    
    #endif
    #include "fork.i"
    
    void Fork::fork() const
    {
        int pid = ::fork();
        if (pid == 0)           // child
        {
            redirector.child();
            child.process();
        }
    
        redirector.parent();
        parent.process(pid,
                        redirector.out(),
                        redirector.in(),
                        redirector.inErr());
    }
Looking at the implementation of the fork() member, note that its child and parent sections are implemented rather symmetrically: first the redirector is asked to set up the appropriate redirection, then the relevant (child or parent) object is asked to do its thing.

Again we meet a design pattern as identified by Gamma et al. (1995), this time the Chain of Responsibility pattern (see also section 19.4).

19.4.5: Support classes

Now that the groundwork has been completed, we turn our attention to the construction of concrete examples. In order to use the Fork class, the abstract base classes Redirector, ChildProcess and ParentProcess must be implemented in concrete subclasses; the classes IORedirector, ChildPlain, ParentCmd and ParentSlurp will be developed below. Redirection at the system level is accomplished using the dup2() system call, operating on file descriptors. One of these file descriptors is the end of a pipe, as created by the pipe() system call. A pipe is a one-way channel through which information may flow: typically one process writes information to the pipe, while another process reads information from the pipe.

The pipe() system call will fill an int array having two elements with two file descriptors: element 0 represents the end of the pipe from which information can be read, element 1 represents the end to which information can be written.

The Redirector subclasses must set up the redirection, and this often amounts to a rather standard initialization and redirection of the file descriptors created by a pipe() system call. Because this is so common, it is useful to have a set of objects available doing just that. So, as an extra support to assist the implementation of Redirector subclasses, the Pipe set of classes was developed as well.

The abovementioned classes are now constructed: first the Pipe set of classes is developed, followed by the construction of the IORedirector, ChildPlain, ParentCmd and ParentSlurp classes. Eventually, these classes are used in some examples.

Although the story may seem a long one, it is hoped that the reader will be able to appreciate the beauty of the design patterns underlying the construction of the Fork class and its support classes discussed here. As we will see, once the design is clear, the implementations are surprisingly short.

19.4.6: The `Pipe' classes

Three Pipe support classes were constructed: Pipe, IPipe and OPipe. Pipe is the base class, defining and initializing the file descriptor array. Its constructor is only used by its subclasses, so it was given the protected access modifier. Here is the definition of the class Pipe:
    #ifndef _PIPE_H_
    #define _PIPE_H_
    
    #include <unistd.h>
    
    class Pipe
    {
        public:
            enum bad_pipe {};

            int getReadFd()
            {
                return fd[R];
            }
            int getWriteFd()
            {
                return fd[W];
            }
        protected:
            enum  { R, W };
    
            Pipe()
            {
                if (pipe(fd))
                    throw bad_pipe();
            }
            int fd[2];
    };
    
    #endif

The IPipe class defines a pipe that can be used by a parent process to read from an ifdnstreambuf object (introduced in section 19.1.2.2). The redirection of a (standard) file descriptor to the file descriptor corresponding to the writing end of the pipe is implemented in its child() members. Note that the unused ends of the pipe (the reading end in the child, the writing end in the parent) are closed: open descriptors would prevent the pipe from closing when only one of the processes using it is done with it. Also, once the redirection has been performed (to save space, the successful completion of dup2() is not checked here; it should be checked in production code), the file descriptor associated with the pipe can be closed as well, as the alternate file descriptor will be used instead. Finally, note the use of the open() member function of the ifdnstreambuf object. Here is the definition of the IPipe class:

    #ifndef _IPIPE_H_
    #define _IPIPE_H_
    
    #include "pipe.h"
    #include "../../examples/ifdnbuf.h"
    
    class IPipe: public Pipe             // parent reads
    {
        public:
            void child(int std_fd)
            {
                close(fd[R]);
                dup2(fd[W], std_fd);
                close(fd[W]);          
            }
        
            void child(int std_fd1, int std_fd2)
            {
                close(fd[R]);
                dup2(fd[W], std_fd1);
                dup2(fd[W], std_fd2);
                close(fd[W]);          
            }
        
            void parent(ifdnstreambuf &fdbuf, unsigned bufsize = 500)
            {
                close(fd[W]);
                fdbuf.open(fd[R], bufsize);
            }
    };
    
    #endif

The OPipe class defines a pipe that can be used by a parent process to write to an ofdnstreambuf object (introduced in section 19.1.1). This information reaches the reading end of the pipe in the child process. The implementation of OPipe closely mimics that of the class IPipe. Since there is only one standard input stream, the child() member needs no parameter, but can immediately redirect STDIN_FILENO. Here is the definition of the class OPipe:

    #ifndef _OPIPE_H_
    #define _OPIPE_H_
    
    #include "pipe.h"
    #include "../../examples/fdout.h"
    
    class OPipe: public Pipe             // parent writes
    {
        public:
            void child()
            {
                close(fd[W]);
                dup2(fd[R], STDIN_FILENO);
                close(fd[R]);            
            }
        
            void parent(ofdnstreambuf &fdbuf, unsigned bufsize = 500)
            {
                close(fd[R]);
                fdbuf.open(fd[W], bufsize);
            }
    };
    
    #endif

19.4.7: The `IORedirector'

The class IORedirector, derived from the Redirector abstract base class, redirects information written in a parent process to the writing end of a pipe (accessed through its out() ostream reference) to the standard input stream of a child process, which uses the reading end of that pipe. Furthermore, any information written by the child process to either its standard output or its standard error stream may be read in the parent process from the reading end of a second pipe, accessed through the in() istream reference. Here is the class definition:
    #ifndef _IOREDIRECTOR_H_
    #define _IOREDIRECTOR_H_
    
    #include "../abc/redirector.h"
    #include "../pipe/ipipe.h"
    #include "../pipe/opipe.h"
    
    class IORedirector: public Redirector      // defines in() out()
    {
        public:
            IORedirector()
            :
                ins(&ifdb),
                outs(&ofdb)
            {}

            std::istream &in()
            {
                return ins;
            }

            std::ostream &out()
            {
                return outs;
            }

            void child()
            {
                ip.child(STDOUT_FILENO, STDERR_FILENO);
                op.child();
            }

            void parent()
            {
                ip.parent(ifdb);
                op.parent(ofdb);
            }

        private:
            ifdnstreambuf ifdb;
            std::istream ins;
            IPipe   ip;

            ofdnstreambuf ofdb;
            std::ostream outs;
            OPipe   op;
    };
            
    #endif
Note that IORedirector can be used in many situations. If a process only writes information, the reading facilities are simply not used; if a process only reads information, the writing facilities are not used. The class does not distinguish information written by the child process to its standard output stream from information written to its standard error stream. If these streams must be distinguished, another subclass of Redirector must be constructed (probably requiring the use of the select() system call in the parent). This situation is left as an exercise to the reader.

19.4.8: The `ChildPlain' class

The class ChildPlain, derived from the ChildProcess abstract base class, merely has to define our child process. The current implementation (see below) defines a public constructor, expecting the full path of the program to execute as the child process. In the current context it is assumed that the program to be executed needs no arguments, so the program could be, e.g., /bin/sh or /bin/ls.

The member doProcess() is never used outside of the ChildProcess class hierarchy itself, so it needs no public access rights: here it is defined as a private member.

At the very minimum, only doProcess() could have been defined, executing a fixed program. For illustration purposes ChildPlain was constructed as a slightly more flexible class. Here is the class definition itself:

    #ifndef _CHILD_PLAIN_
    #define _CHILD_PLAIN_
    
    #include "../abc/childprocess.h"
    
    #include <unistd.h>
    
    class ChildPlain: public ChildProcess
    {
        public:
            ChildPlain(char const *cmd)
            :
                cmd(cmd)
            {}
        private:
            void doProcess()        
            {                               
                execl(cmd, cmd, static_cast<char *>(0));
            }
            char const *cmd;
    };
    
    #endif
        

19.4.9: The `ParentCmd' class

The class ParentCmd, derived from the ParentProcess abstract base class, only needs to implement the process() member, defining the parent's side of the communication with the child process. ParentCmd was constructed with a child process executing /bin/sh in mind. The process() member implements the following simple protocol: a command is read from the standard input stream; it is written to the child process, followed by the extra command echo EOC; the child's output is then read back and echoed until the line EOC appears, marking the end of the command's output. Once the standard input stream is exhausted, the child process is terminated and waited for. Here is the definition of the ParentCmd class:
    #ifndef _PARENT_CMD_
    #define _PARENT_CMD_
    
    #include "../abc/parentprocess.h"
    
    #include <iostream>
    #include <string>
    #include <sys/wait.h>
    #include <signal.h>
    
    using namespace std;
    
    class ParentCmd: public ParentProcess
    {
        public:
            void process 
            (
                int pid, 
                std::ostream &out, 
                std::istream &in, 
                std::istream &inerr     // not used
            )
            {
                string
                    s;
    
                while (true)
                {
                    cerr << "? ";
                    if (!getline(cin, s))
                        break;
    
                    cout << "Cmd: " << s << endl;
                    out << s << endl << "echo EOC" << endl;

                    while (getline(in, s))
                    {
                        cout << "Read: " << s << endl;
                        if (s == "EOC")
                            break;
                    }
                }
                kill(pid, SIGTERM);
                kill(pid, SIGKILL);
    
                waitpid(pid, 0, 0);
            }
    };
    
    #endif
        

19.4.10: The `ParentSlurp' class

The class ParentSlurp, derived from the ParentProcess abstract base class, only needs to implement the process() member, defining the parent's side of the communication with the child process. ParentSlurp was constructed with a child process merely producing output (like /bin/ls) in mind. The process() member reads all information appearing at its istream &in parameter (note how in's buffer is simply inserted into cout), and then waits for the child process to complete.

Here is the definition of the ParentSlurp class:

    #ifndef _PARENT_SLURP_
    #define _PARENT_SLURP_
    
    #include "../abc/parentprocess.h"
    
    #include <iostream>
    #include <string>
    #include <sys/wait.h>
    #include <signal.h>
    
    using namespace std;
    
    class ParentSlurp: public ParentProcess
    {
        public:
            void process 
            (
                int pid, 
                std::ostream &out,      // not used    
                std::istream &in,       
                std::istream &inerr     // not used
            )
            {
                std::cout << in.rdbuf();
                waitpid(pid, 0, 0);
            }
    };
    
    #endif

19.4.11: The `Fork' class: example of use

Now that concrete classes have been derived from the abstract base classes, it's time to construct some examples using the Fork class. Two examples are presented here. Both examples use the IORedirector object (introduced in section 19.4.7).

In the first example, /bin/ls is called as child process, and a ParentSlurp object is used to consume the generated output:

    #include "fork/fork.h"
    #include "ioredirector/ioredirector.h"
    #include "childplain/childplain.h"
    #include "parentslurp/parentslurp.h"
    
    int main()
    {
        IORedirector    io;
        ChildPlain      binls("/bin/ls");
        ParentSlurp     slurp;
    
        Fork(io, binls, slurp).fork();
    }
    /*
        Generated output (partially shown):
    abc
    binls
    binsh
    ...
    parentcmd
    parentslurp
    pipe
    */
Note the combined construction of the Fork object as an anonymous object and the immediate call of its fork() member. Alternatively, the construction of the Fork object and the activation of its fork() member could have been separated:
    Fork    f(io, binls, slurp);
    f.fork();

In the second example, /bin/sh is called to execute several commands, and a ParentCmd object is used to interface between the user of the program and the /bin/sh program. Here is the program and an example of a resulting conversation:

    #include "fork/fork.h"
    #include "ioredirector/ioredirector.h"
    #include "childplain/childplain.h"
    #include "parentcmd/parentcmd.h"
    
    int main()
    {
        IORedirector    io;
        ChildPlain      binsh("/bin/sh");
        ParentCmd       cmd;
    
        Fork(io, binsh, cmd).fork();
    }
    /*
        Example of conversation:
    ? cd tmp
    Cmd: cd tmp
    Read: /bin/sh: cd: tmp: No such file or directory
    Read: EOC
    ? mkdir tmp
    Cmd: mkdir tmp
    Read: EOC
    ? cd tmp
    Cmd: cd tmp
    Read: EOC
    ? cd ../
    Cmd: cd ../
    Read: EOC
    ? rmdir tmp
    Cmd: rmdir tmp
    Read: EOC
    ? cd tmp
    Cmd: cd tmp
    Read: /bin/sh: cd: tmp: No such file or directory
    Read: EOC
    */

19.4.12: The `Daemon' program

Applications exist in which the only purpose of fork() is to start a child process. The parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.

Although the following example can easily be constructed as a plain C program, it was included in the C++ Annotations because it is so closely related to the current discussion of the Fork class. I thought about adding a daemon() member to that class, but eventually decided against it because the construction of a daemon program is very simple and uses none of the features currently offered by the Fork class.

Here is an example illustrating the construction of a daemon program:

    #include <iostream>
    #include <unistd.h>

    int main()
    {
        int pid = fork();
        if (pid != 0)
            return pid < 0;     // 1: failing fork, 0: daemon started

                                // child process: the daemon
        sleep(3);               // wait 3 seconds
        std::cout << "Hello from the child process\n";
    }
    /*
        Generated output:
    The next command prompt, then after 3 seconds:
    Hello from the child process
    */

19.5: Function objects performing bitwise operations

In section 17.1 several types of predefined function objects were introduced. Predefined function objects exist performing arithmetic, relational, and logical operations, corresponding to a multitude of binary and unary operators.

For whatever reason, some operators are missing: there are no predefined function objects corresponding to bitwise operations. However, given the available predefined function objects, their construction is not difficult. The following examples show a template class implementing a function object calling the bitwise and operator (operator&()), and a template class implementing a function object calling the bitwise complement operator (operator~()). Other missing operators can be constructed similarly; this is, however, left as an exercise to the reader.

Here is the implementation of a function object calling the bitwise operator&():

    #include <functional>

    template <typename _Tp>
    struct bit_and : public std::binary_function<_Tp, _Tp, _Tp>
    {
        _Tp operator()(const _Tp& __x, const _Tp& __y) const 
        { 
            return __x & __y; 
        }
    };

Here is the implementation of a function object calling operator~():

    #include <functional>

    template <typename _Tp>
    struct bit_not : public std::unary_function<_Tp, _Tp>
    {
        _Tp operator()(const _Tp& __x) const 
        { 
            return ~__x; 
        }
    };

These and other missing predefined function objects are also implemented in the file bitfunctional, which is found in one of the cplusplus zip-archives.

An example using bit_and() to remove all odd numbers from a vector of int values, using the remove_if() generic algorithm is:

    #include <iostream>
    #include <algorithm>
    #include <iterator>
    #include <vector>
    #include "bitand.h"

    using namespace std;

    int main()
    {
        vector<int>
            vi;
        
        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy
        (
            vi.begin(), 
            remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
            ostream_iterator<int>(cout, " ")
        );
        cout << endl;
    }
    /*
        Generated output:
    0 2 4 6 8
    */
    

19.6: Implementing a reverse_iterator

In this section we will extend the class Vector, introduced in section 18.3, with a reverse_iterator: an iterator of the RandomAccessIterator category visiting the elements in reversed order.

First, we note that applying the ordinary iterators of the Vector class (made available by its begin() and end() member functions) to the generic algorithms poses no problem, as the pointers returned by these members are accepted by the generic algorithms:

    #include <iostream>
    #include <algorithm>
    #include <iterator>
    #include "vector.h"

    using namespace std;
    
    int main()
    {
        Vector<int>
            vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        random_shuffle(vi.begin(), vi.end());

        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }
    /*
        generated output, e.g.,
    1 7 9 3 5 6 0 4 8 2 
    */

However, the use of a reverse iterator is not without problems: the reverse iterator shows rather complex behavior, in that it visits the elements in its range backwards, resulting in a reversal of the meanings of the operators that are used to step forward and step backward in a range. If a reverse iterator rit points to element 11 in a range, then it will point to element 10, rather than 12 after ++rit. Because of this complex behavior, the reverse iterator will be implemented as a class reverse_iterator, which can do the required bookkeeping for us.

Furthermore, in the current context reverse_iterator is a reverse iterator only for the elements of the class Vector, so a nested class seems appropriate here. Since Vector has already been defined, we decide to leave it untouched: instead of modifying Vector, a template class Vector2 is derived from it, and Vector2 is given its nested class reverse_iterator. This latter class can be derived from random_access_iterator to realize a RandomAccessIterator for the class Vector2:

    template <typename Type>
    class Vector2: public Vector<Type>
    {
        public:
            class reverse_iterator: 
                public random_access_iterator<Type, ptrdiff_t>
            {
                friend class Vector2<Type>;     // see the text, below
                ...
            };
        ...
    };
The ptrdiff_t type that is mentioned here is ordinarily defined as an int (e.g., on the Linux operating system it is defined as a __kernel_ptrdiff_t in types.h, which in turn is defined as an int in posix_types.h).

The reverse_iterator class is now defined as a nested template class, derived from the template class random_access_iterator. Its surrounding class is Vector2, which itself is a template class that was derived from the template class Vector.

Within reverse_iterator we store two Type pointers, called begin and end, which will be used to define the reversed range [end - 1, begin - 1). In the private section of reverse_iterator we declare:

    Type
        *begin,
        *end;
Begin and end receive their initial values from the begin() and end() members of the Vector object. However, the class reverse_iterator is closely tied to Vector2: only Vector2 objects should be allowed to construct their own reverse iterators. This policy is enforced by making Vector2<Type> a bound friend of reverse_iterator, and by declaring its constructor private. The constructor expects the two Type pointers, defining the range [begin(), end()). Internally begin() - 1 will be used as endpoint of the reversed range, while end() - 1 will be used as the beginning of the reversed range. The constructor, defined in the private section of the reverse_iterator class, is therefore:
    reverse_iterator(Type *begin, Type *end)
    :
        begin(begin),
        end(end)
    {}
In the implementation of reverse_iterator, end will gradually walk towards begin, which keeps its value. The dereferencing operator does not return *begin, but rather end[-1]: the element just before the element to which end points. All this is allowed, as Vector::begin() and Vector::end() return random access iterators themselves, for which pointer arithmetic operations are defined.

Since the reverse_iterator class declares Vector2<Type> as a friend, the Vector2 class can implement two members, rbegin() and rend() returning reverse_iterator objects:

    reverse_iterator rbegin()
    {
        return reverse_iterator(begin(), end());
    }
    reverse_iterator rend()
    {
        return reverse_iterator(begin(), begin());
    }
Apart from these two members, the Vector2 class could define two constructors, analogous to the constructors in the Vector class. E.g.,
    Vector2()
    :
        Vector<Type>()
    {}

In order to allow programs to use these reverse iterators, a public copy constructor and an overloaded assignment operator are needed in the class reverse_iterator. Since the Type pointers in reverse_iterator are merely used for pointing to existing information, a destructor is not required. Of course, this implies that the usefulness of a reverse_iterator terminates when the object from which it was obtained goes out of scope. It is the responsibility of the programmer to make sure this doesn't cause disaster. In the public section of reverse_iterator we add the following members (implementations given as inline code):

    reverse_iterator(reverse_iterator const &other)
    :
        begin(other.begin),
        end(other.end)
    {}

    reverse_iterator &operator=(reverse_iterator const &other)
    {
        begin = other.begin;
        end = other.end;
        return *this;
    }
The implementation shown below is fairly Spartan: illegal pointer values are not detected. As the generic algorithms do not require these checks we've left them out. Of course, one could modify the implementation and incorporate checks at several locations. It's up to the reader.

Random access iterators require us to implement a series of member functions. Here they are, as they should appear in the public section of the reverse_iterator nested class. Most implementations will be self-explanatory, and consist of single lines of code.

    Type &operator*() const
    {
        return end[-1];             // reference to the previous element
    }        

    bool operator==(reverse_iterator const &other) const
    {
        return other.end == end;    // comparing two iterators for equality
    }

    reverse_iterator &operator++()
    {
        --end;                      // increment is: to the previous element
        return *this;
    }

    reverse_iterator &operator--()
    {
        ++end;                      // decrement is: to the next element
        return *this;
    }

                                    // remaining members: see the text    
    bool operator<(reverse_iterator const &other) const;

    reverse_iterator operator++(int);  
    reverse_iterator operator--(int);

    int operator-(reverse_iterator const &other) const;
    reverse_iterator operator+(int step) const;
    reverse_iterator operator-(int step) const;
The following members deserve special attention: the ordering operator (operator<()), the postfix increment and decrement operators, and the pointer-arithmetic members (operator-() and operator+()), all of which must take the reversed direction of iteration into account. For illustration purposes we show how to implement some of these member functions outside of the template class interface: these implementations could be included in the header file defining the Vector2 interface, if the corresponding member functions are only declared:
    template <typename Type>                // this is a template function
                                            // the class in which the
    Vector2<Type>::reverse_iterator::       // constructor is defined
    reverse_iterator(Type *begin, Type *end)// the constructor itself
    :
        begin(begin),
        end(end)
    {}

    template <typename Type>                 // this is a template function
    typename Vector2<Type>::reverse_iterator // this is its return type
                                             // the class in which the function
    Vector2<Type>::reverse_iterator::        // is defined
    operator+(int step) const                // this is the function itself
    {
        return reverse_iterator(begin, end - step);
    }
Finally, an example of a program using the reverse_iterator:
    #include <iostream>
    #include <algorithm>
    #include <iterator>
    #include "vector3.h"

    using namespace std;
    
    int main()
    {
        Vector2<int>
            vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        random_shuffle(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        sort(vi.rbegin(), vi.rend());          
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        reverse(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }
    /*
        Generated output:
    0 1 2 3 4 5 6 7 8 9 
    7 1 5 9 3 4 6 0 2 8 
    9 8 7 6 5 4 3 2 1 0 
    0 1 2 3 4 5 6 7 8 9 
    9 8 7 6 5 4 3 2 1 0 
    */

19.7: Using Bison and Flex

The example discussed in this section digs into the peculiarities of using a parser- and scanner generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The current example assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex are described in several books, e.g. in O'Reilly's book `lex & yacc'.

However, scanner- and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex can be obtained from ftp://prep.ai.mit.edu/pub/gnu. Flex will create a C++ class when called as flex++, or when the -+ flag is used. With bison the situation is a bit more complex. Scattered over the Internet several bison++ archives can be found (e.g., in ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or rzbsdi01.uni-trier.de). The information in these archives usually dates back to 1993, irrespective of the version number mentioned with the archive itself.

Using flex++ and bison++ a class-based scanner and parser can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become a bit cleaner than without the class interface.

Below two examples are given. In the first example only a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the information. The second example uses both a scanner and a parser to transform standard arithmetic expressions to their postfix notation, commonly encountered in code generated by compilers and in HP-calculators. The second example focuses on the parser.

19.7.1: Using Flex++ to create a scanner

In this example a lexical scanner is used to monitor the production of a file from several subfiles. This example focuses on the lexical scanner, and on switching files while churning through the information. The setup is as follows: The input-language knows of an #include statement, which is followed by a string indicating the file which should be included at the location of the #include.

In order to avoid complexities that are irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program should terminate using a proper error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the stack of filenames, indicating which file was included where, should be printed.

One small extra feature is that comment-lines should be recognized: include directives in comment-lines should be ignored, comment being the standard C++ comment-types.

The program is created in the following steps:

19.7.1.1: The flex++ specification file

The organization of the lexical scanner specification file is similar to the one used with flex. However, flex++ creates a class (yyFlexLexer) from which the class Scanner will be derived.

The code associated with the regular expression rules will be located inside the class yyFlexLexer. However, it would be handy to access the member functions of the derived class within that code. Fortunately, inheritance helps us to realize this. In the FlexLexer.h header file we notice that yylex() is declared as a virtual function (virtual int yylex()):

    class yyFlexLexer: public FlexLexer 
    {
        public:
            yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );
    
            virtual ~yyFlexLexer();
    
            void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
            struct yy_buffer_state* yy_create_buffer( istream* s, int size );
            void yy_delete_buffer( struct yy_buffer_state* b );
            void yyrestart( istream* s );
    
            virtual int yylex();
            virtual void switch_streams( istream* new_in, ostream* new_out );
    };
Consequently, if yylex() is defined in a derived class, then this function of the derived class will be called from a base class (i.e., yyFlexLexer) pointer. Since the yylex() function of the derived class is called, that function will have access to the members of its class, and to the public and protected members of its base class.

The context in which the generated scanner is placed is (by default) the function yyFlexLexer::yylex(). However, this context can be changed by defining the YY_DECL-macro. This macro, if defined, determines the context in which the generated scanner will be placed. So, in order to make the generated scanner part of the derived class function yylex(), three things must be done:

The definition of the YY_DECL macro and the yyFlexLexer::yylex() function can conveniently be placed in the lexer specification file, as shown below.

Looking at the regular expressions themselves, notice that we'll need rules for the recognition of the comment, for the #include directive, and for the remaining characters. This is all fairly standard practice. When an #include directive is detected, the directive is parsed by the scanner. This is common practice: preprocessor directives are normally not analyzed by a parser, but by the lexical scanner. The scanner uses a mini scanner to extract the filename from the directive, throwing a Scanner::Error value (invalidInclude) if this fails. If the filename could be extracted, it is stored in nextSource, and on completion of the reading of the #include directive performSwitch() is called to perform the switch to another file.

When the end of the file (EOF) is detected, the derived class' member function popSource() is called, which will pop the previously pushed file, returning true. Once the file-stack is empty, the function will return false, resulting in the call of yyterminate(), which will terminate the scanner.

The lexical scanner specification file has three sections: a C++ preamble, containing code which can be used in the code defining the actions to be performed once a regular expression is matched; a Flex++ symbol area, used for the definition of symbols, like a mini scanner, or options, like %option yylineno when the lexical scanner should keep track of the line numbers of the files it is scanning; and, finally, a rules section, in which the regular expressions and their actions are given. In the current example, the main task of the lexer is to copy information from the istream *yyin to the ostream *yyout, for which the predefined macro ECHO can be used. The complete and annotated lexical scanner specification file to be used with flex++ is:

%{
/* ----------------------------------------------------------------------------
                                 C++ -preamble.
   Include header files, other than those generated by flex++ and bison++.
      E.g., include the interface to the class derived from yyFlexLexer
----------------------------------------------------------------------------*/

                            // the yylex() function that's actually
                            // used
#define YY_DECL int Scanner::yylex()

#include "scanner.h"        // The interface of the derived class

int yyFlexLexer::yylex()    // not called: overruled by
{                           // Scanner::yylex()
    return 0;
}
  
%}

/* ----------------------------------------------------------------------------
                              Flex++ symbol area
                              ~~~~~~~~~~~~~~~~~~
      The symbols mentioned here are used for defining e.g., a mini scanner
---------------------------------------------------------------------------- */
%x comment 
%x include
%option yylineno

eolnComment     "//".*
anyChar         .|\n

/* ----------------------------------------------------------------------------
                               Flex rules area:
                               ~~~~~~~~~~~~~~~~
     Regular expressions below here define what the lexer will recognize.
---------------------------------------------------------------------------- */
%%
    /*
        The comment-rules: comment lines are ignored.    
    */
{eolnComment}
"/*"                    BEGIN comment;
<comment>{anyChar}
<comment>"*/"           BEGIN INITIAL;

    /*                
        File switching: #include <filepath>
    */
#include[ \t]+"<"       BEGIN include;
<include>[^ \t>]+       nextSource = yytext;
<include>">"[ \t]*\n    {
                            BEGIN INITIAL;
                            performSwitch();
                        }
<include>{anyChar}      throw invalidInclude;

    /* 
        The default rules: eating all the rest, echoing it to output    
    */ 
{anyChar}               ECHO;

    /*
        The <<EOF>> rule: pop a pushed file, or terminate the lexer
    */
<<EOF>>                 {
                            if (!popSource())
                                yyterminate();
                        }
Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), much of the processing can be done by the derived class' member functions. This results in a very clean setup of the lexer specification file, which requires hardly any code in the preamble.

19.7.1.2: The derived class: Scanner

The class Scanner is derived from the class yyFlexLexer, generated by flex++. The derived class has access to the data controlled by the lexical scanner. In particular, the derived class has access to the following data members:

Other members are available as well, but we feel they are less often used. Details can be found in the file FlexLexer.h, which is part of the flex distribution.

The class Scanner performs two tasks: It pushes file information about the current file to a file stack, and pops the information pushed last once EOF is detected in a file.

Several member functions are needed to accomplish these tasks. As they are auxiliary to the scanner, they are private members. In practice, these private members are developed once the need for them arises. The final header file, containing the interface of the Scanner class, is given below. Note that, apart from the private member functions, several private data members are used as well. These members are initialized by the Scanner() constructor and are used in the private member functions. They are discussed below, in the context of the member functions that use them. Here is the scanner.h header file:

#include <FlexLexer.h>  // provides yyFlexLexer interface
#include <fstream>
#include <string>
#include <stack>
#include <vector>

using namespace std;    // FBB: needed for g++ >= 3.0

class Scanner: public yyFlexLexer
{
    public:         
        enum Error
        {
            invalidInclude,
            circularInclusion,
            nestingTooDeep,
            cantRead,
        };
                
        Scanner(istream *yyin, string const &initialName);
        string const &lastFile()
        {
            return fileName.back();
        }
        void stackTrace();  // dumps filename stack to cerr
        int yylex();        // overruling yyFlexLexer's yylex()
    private:
        Scanner(Scanner const &other);      // no Scanner copy-initialization
        Scanner &operator=(Scanner const &other);   // no assignment either

        bool popSource();   // true: source popped
        void performSwitch();
        void throwOnCircularInclusion();

        stack<yy_buffer_state *>
            state;
        vector<string>
            fileName;
        string
            nextSource;

        static unsigned const
            sizeof_buffer = 16384,
            maxDepth = 10;
};
If the scanner can extract a filename from an #include directive, a switch to another file is performed by performSwitch(). If the filename could not be extracted, the scanner throws an invalidInclude exception value. The performSwitch() member and the matching function popSource() handle the file switch. In particular, note that yylineno is not updated when a file switch is performed. If line numbers are to be monitored, then the current value of yylineno should be pushed on a stack, and yylineno should be reset by performSwitch(), while popSource() should reinstate a former value of yylineno by popping a previous value from the stack. In the current implementation of Scanner a simple stack of yy_buffer_state pointers is maintained. Changing that into a stack of pair<yy_buffer_state *, unsigned> elements allows you to save the line numbers. This modification is left as an exercise to the reader.

As mentioned, performSwitch() performs the actual file-switching. The yyFlexLexer class provides a series of member functions that can be used for file switching purposes. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the file that is currently scanned by the lexical scanner. This buffer is pushed on the state stack when an #include is encountered. Then this buffer is replaced by the buffer that is created for the file mentioned in the #include directive. The actual file switching is realized as follows:

The sources for the member functions performSwitch() and throwOnCircularInclusion() are:
#include "scanner.h"

void Scanner::performSwitch()
{   
    if (state.size() == maxDepth)
        throw nestingTooDeep;

    fileName.push_back(nextSource);
    throwOnCircularInclusion();
    
    ifstream
        *newStream = new ifstream(nextSource.c_str());
        
    if (!*newStream)
        throw cantRead;

    state.push(yy_current_buffer);
    yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer));
}
#include "scanner.h"

void Scanner::throwOnCircularInclusion()
{   
    vector<string>::iterator
        it = find(fileName.begin(), fileName.end(), nextSource);

    if (it != fileName.end())
        throw circularInclusion;
}
The member function popSource() is called to pop the previously pushed buffer off the stack, so that the scanner can continue its scan just beyond the #include directive it just processed. The popSource() function first inspects the size of the state stack: if it is empty, false is returned and the function terminates. If it is not empty, then the current buffer is deleted, to be replaced by the state waiting on top of the stack. The file switch is performed by the yyFlexLexer members yy_delete_buffer() and yy_switch_to_buffer(). Note that yy_delete_buffer() takes care of closing the ifstream and of deleting the memory allocated for this stream in performSwitch(). Furthermore, the filename that was last entered in the fileName vector is removed. Having done all this, the function returns true:
#include "scanner.h"

bool Scanner::popSource()
{       
    if (state.empty())
        return false;

    yy_delete_buffer(yy_current_buffer);
    yy_switch_to_buffer(state.top());
    state.pop();
    fileName.pop_back();
    return true;
}
These functions complete the implementation of the lexical scanner. The scanner itself is stored in the Scanner::yylex() function. The Scanner object only has three public member functions and a constructor. One member function is yylex(); the other two allow the caller to dump a stack trace and to obtain the name of the file that was last processed by the scanner. The constructor is a simple piece of code. Here is its source:
#include "scanner.h"

Scanner::Scanner(istream *yyin, string const &initialName)
{
    switch_streams(yyin, yyout);
    fileName.push_back(initialName);
}

19.7.1.3: The main() function

The main program is very simple. As the program expects a filename to start the scanning process with, the number of arguments is checked first. If at least one argument was given, an ifstream object is created. If this object can be created, a Scanner object is constructed, receiving the address of the ifstream object and the name of the file as its arguments. Then the yylex() member function of the Scanner object is called. This function is inherited from the Scanner's base class yyFlexLexer. If anything fails inside the scanner, an exception is thrown, which is caught at the end of the main() function:

/*                              lexer.cc

   A C++ main()-frame generated by C++ for lexer.cc

*/

#include "lexer.h"           /* program header file */


int main(int argc, char **argv)
{       
    if (argc == 1)
    {
        cerr << "Filename argument required\n";
        exit (1);
    }

    ifstream
        yyin(argv[1]);

    if (!yyin)
    {
        cerr << "Can't read " << argv[1] << endl;
        exit(1);
    }
     
    Scanner
        scanner(&yyin, argv[1]);

    try
    {
        scanner.yylex();
    }
    catch (Scanner::Error err)
    {
        char const *msg[] =
        {
            "Include specification",
            "Circular Include",
            "Nesting",
            "Read",
        };

        cerr << msg[err] << " error in " << scanner.lastFile() << 
                            ", line " << scanner.lineno() << endl;
        scanner.stackTrace();
        return 1;
    }
}

19.7.1.4: Building the scanner-program

The final program is constructed in two steps. These steps are given for a Unix system, on which flex++ and the Gnu C++ compiler g++ have been installed:

For the purpose of debugging a lexical scanner, the rules that are matched and the tokens that are returned are useful information. When flex++ is called with the -d flag, debugging code will be part of the generated scanner. Apart from that, the debugging code must be activated. Assuming the scanner object is called scanner, the statement
scanner.set_debug(1);

must be given following the construction of the scanner object.

19.7.2: Using both bison++ and flex++

When an input language exceeds a certain level of complexity, a parser is generally needed to control the complexity of the input language. In these cases, a parser generator is used to generate the code that is required to determine the grammatical correctness of the input. The function of the lexical scanner is to provide chunks of the input, called tokens, for the parser to work with.

Starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer). Finally, auxiliary code is provided to `fill in the blanks': the actions which are performed by the parser and the lexer are not normally specified in the grammatical rules or lexical regular expressions, but are executed by functions, which are called from within the parser's rules or associated with the lexer's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex++. In the current section we concern ourselves with the parser. The parser can be generated from a grammar specified for the program bison++. The specification file for bison++ is similar to the one required for bison, but a class is generated, rather than a single function. In the next sections we'll develop a program converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written behind their operands. A comparable situation holds for the unary operators - and +: we ignore the + operator, but the - will be converted to a unary minus.

Our calculator will recognize a minimal set of operations: multiplication, addition, grouping parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program using a parser and a lexer, and not to construct a full-fledged calculator.

In the next few sections we'll develop the grammar in a bison++ specification file. Then, the regular expressions for the scanner are specified according to the requirements of flex++. Finally the program is constructed.

The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.

19.7.2.1: The bison++ specification file

The grammar specification file used with bison++ is comparable to the specification file used with bison. Differences are related to the class nature of the resulting parser. Our calculator will distinguish real numbers from integers, and will support the basic set of arithmetic operators.

The bison++ specification file consists of the following sections:

19.7.2.2: The bison++ token section

The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:

19.7.2.3: The bison++ grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.

    lines:
        lines
        line
    |
        line
    ;
    
    line:
        intExpr
        '\n'
        {
            cerr << "int: " << $1 << endl;
        }
    |
        doubleExpr
        '\n'
        {
            cerr << "double: " << $1 << endl;
        }
    |
        '\n'
        {
            cout << "Good bye\n";
            YYACCEPT;
        }
    |
        error
        '\n'
    ;
    
    intExpr:
        intExpr '*' intExpr
        {
            $$ = $1 * $3;
        }
    |
        intExpr '+' intExpr
        {
            $$ = $1 + $3;
        }
    |
        '(' intExpr ')'
        {
            $$ = $2;
        }
    |
        '-' intExpr         %prec UnaryMinus
        {
            $$ = -$2;
        }
    |
        INT
        {
            $$ = atoi(lexer.YYText());
        }
    ;
    
    doubleExpr:
        doubleExpr '*' doubleExpr
        {
            $$ = $1 * $3;
        }
    |
        doubleExpr '+' doubleExpr
        {
            $$ = $1 + $3;
        }
    |
        doubleExpr '*' intExpr
        {
            $$ = $1 * $3;
        }
    |
        doubleExpr '+' intExpr
        {
            $$ = $1 + $3;
        }
    |
        intExpr '*' doubleExpr
        {
            $$ = $1 * $3;
        }
    |
        intExpr '+' doubleExpr
        {
            $$ = $1 + $3;
        }
    |
        '(' doubleExpr ')'
        {
            $$ = $2;
        }
    |
        '-' doubleExpr         %prec UnaryMinus
        {
            $$ = -$2;
        }
    |
        DOUBLE
        {
            $$ = atof(lexer.YYText());
        }
    ;
Using these rules a simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association are defined in the token section of the parser specification file, which is:
    %name  Parser                
    %union 
    {
        int i;
        double d;
    };
    %token      INT
                DOUBLE
                DONE
    %type   <i> intExpr
            <d> doubleExpr
    
    %left   '+'
    %left   '*'
    %right  UnaryMinus
    
    %define MEMBERS                                         \ 
        virtual ~Parser()   {}                              \ 
        private:                                            \ 
            yyFlexLexer lexer;              
    %define LEX_BODY {return lexer.yylex();}
    
    %define ERROR_BODY { cerr << "error encountered\n"; }
In the token section we see the use of the %type specifiers, connecting intExpr to the i-field of the semantic-value union, and connecting doubleExpr to the d-field. At first sight it looks complex, since the expression rules must be included for each individual return type. On the other hand, if the union itself had been used, we would have had to specify somewhere in the returned semantic values which field to use: fewer rules, but more complex and error-prone code.

Also, note that the lexical scanner is included as a member of the parser. There is no need to define the scanner outside of the parser, as it's not used outside of the parser object. The virtual destructor is included as a member to prevent the compiler from complaining about the parser having a non-virtual destructor.

19.7.2.4: The flex++ specification file

The flex-specification file that is used by our calculator is simple: blanks are skipped, single characters are returned, numerical values are returned as either Parser::INT or Parser::DOUBLE tokens, and the words exit and quit are returned as the Parser::DONE token. Here is the complete flex++ specification file:

%{ 
#include "parser.h"
%}
%option yylineno
%%

[ \t]                       ;
[0-9]+                      return(Parser::INT);

"."[0-9]*                   |                                        
[0-9]+("."[0-9]*)?          return(Parser::DOUBLE);

exit                        |
quit                        return (Parser::DONE);

.|\n                        return (*yytext);

19.7.2.5: The generation of the code

The code is generated in the same way as with bison and flex. To order bison++ to generate the files parser.cc and parser.h, give the command:

bison++ -d -o parser.cc parser

Next, flex++ will generate code on lexer.cc using the command

flex++ -I -olexer.cc lexer

Note here that flex++ expects no blanks between the -o flag and lexer.cc.

On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:

g++ -o calc -Wall *.cc -lfl -s

Note that the libfl.a library is mentioned here. If it's not mentioned, the linker will complain about unresolved references like yywrap().

A source in which the main() function, the lexical scanner and the parser objects are defined is, finally:

#include "parser.h"

int main()
{
    Parser
        parser;

    cout << "Enter (nested) expressions containing ints, doubles, *, + and "
            "unary -\n"
            "operators. Enter an empty line, exit or quit to exit.\n";

    parser.yyparse();

    return (0);
}