We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken. Please state the document version you're referring to, as found in the title (in this document: 5.1.1a), and please state the paragraph you're referring to.
All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't mistake this for a lack of appreciation.
This chapter presents a number of concrete examples of programming in C++. Items from this document such as virtual functions, static members, etc. are rediscussed. Examples of container classes are shown.

With the advent of the ANSI/ISO standard, classes supporting streams based on file descriptors are no longer available, including the Gnu procbuf extension. These classes were frequently used in older C++ programs. This section of the C++ Annotations develops an alternative: classes extending streambuf, allowing the use of file descriptors, and classes around the fork() system call.

Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tools in a C++ environment.
This section uses the class streambuf as the starting point for the construction of classes interfacing file descriptors. We will discuss the construction of a class which may be used to write to a device defined by its file descriptor: it may be a file, but it could also be a pipe or socket. Section 19.1.2 will discuss the matter of reading from devices that are known by their file descriptors, while section 19.3 will reconsider the question of redirection, which was discussed earlier in section 5.8.3.
Basically, deriving a class for output operations is very simple. The only member function that must be overridden is the virtual int overflow(int c) member. This member is responsible for writing characters to the device when there is no buffer or no more room in the buffer. If fd represents the file descriptor to write to, and if we decide against using a buffer, the member overflow() can be as simple as:

    class UnbufferedFD: public std::streambuf
    {
        public:
            int overflow(int c)
            {
                if (c != EOF)
                {
                    char ch = static_cast<char>(c);
                    if (write(fd, &ch, 1) != 1)
                        return EOF;
                }
                return c;
            }
            ...
    };

The received argument is either written as a char to the file descriptor, or EOF is returned. However, this approach does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), we will discuss the construction of a class using an output buffer in somewhat greater detail.
When an output buffer is used, the overflow() member becomes slightly more complex, as it is now called only when the buffer is full. So, first we have to flush the buffer, for which the (virtual) function streambuf::sync() should be used. Defining sync() and using it in overflow() is not all that must be done: the information to write might be less than the size of the buffer. So, at the end of the lifetime of our special streambuf object, it might find itself holding a buffer that is only partially full. We must therefore make sure that the buffer is flushed by the time our object goes out of scope. This is of course very simple: sync() should be called in the destructor as well.
Now that we've considered the consequences of using an output buffer, we're almost ready to construct our derived class. We add a couple of extra features. First, we permit the specification of the size of the output buffer by the user of the class. Second, we permit the construction of an object of the class before the file descriptor to use is actually known. Later, in section 19.4, we'll encounter a situation where this feature is used.
In order to save some space, the successful operation of various calls was not checked. In `real life' implementations these checks should of course not be omitted. Here is the class ofdnstreambuf:
    #ifndef _OFDNSTREAMBUF_H_
    #define _OFDNSTREAMBUF_H_

    #include <unistd.h>
    #include <streambuf>
    #include <ios>

    class ofdnstreambuf: public std::streambuf
    {
        public:
            ofdnstreambuf()
            :
                bufsize(0),
                buffer(0)
            {}
            ofdnstreambuf(int fd, unsigned bufsize = 1)
            {
                open(fd, bufsize);
            }
            ~ofdnstreambuf()
            {
                if (buffer)
                {
                    sync();                             // 23
                    delete[] buffer;
                }
            }
            void open(int xfd, unsigned xbufsize = 1)
            {
                fd = xfd;
                bufsize = xbufsize == 0 ? 1 : xbufsize;

                buffer = new char[bufsize];
                setp(buffer, buffer + bufsize);         // 34
            }
            int sync()
            {
                if (pptr() > pbase())
                {
                    write(fd, buffer, pptr() - pbase());// 41
                    setp(buffer, buffer + bufsize);     // 42
                }
                return 0;
            }
            int overflow(int c)
            {
                sync();                                 // 49
                if (c != EOF)
                {
                    *pptr() = static_cast<char>(c);     // 52
                    pbump(1);
                }
                return c;
            }

        private:
            unsigned bufsize;
            int fd;
            char *buffer;
    };
    #endif
At line 23, in the destructor, sync() is called to write any unprocessed characters in the output buffer to the device.
At line 34, in the open() member, the buffer is initialized. Using setp() the begin and end points of the buffer are passed to the streambuf base class, allowing it to set up pbase(), pptr() and epptr().
At line 41, in sync(), the unflushed characters in the buffer are written to the device. Then, at line 42, the buffer is reinitialized: the buffer pointers of streambuf are reset to their initial values by calling setp() once again. Note that sync() should return 0 after a successful flush operation.
At line 49, in overflow(), sync() is called to flush the now filled-up output buffer to the device. As this recreates an empty buffer, the character c which could not be written to the buffer is entered into the buffer at line 52 and the next, using the member functions pptr() and pbump(). Notice that entering a character into the buffer is realized using available member functions of streambuf.
The following program uses the ofdnstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
    #include "fdout.h"
    #include <string>
    #include <iostream>
    #include <istream>

    using namespace std;

    int main(int argc, char **)
    {
        ofdnstreambuf fds(STDOUT_FILENO, 500);
        ostream os(&fds);

        switch (argc)
        {
            case 1:
                os << "COPYING cin LINE BY LINE\n";
                for (string s; getline(cin, s); )
                    os << s << endl;
            break;

            case 2:
                os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";
                cin >> os.rdbuf();  // Alternatively, use: cin >> &fds;
            break;

            case 3:
                os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
                os << cin.rdbuf();
            break;
        }
    }
When deriving a class from streambuf, it should use an input buffer of at least one character, to allow the use of the member functions istream::putback() or istream::ungetc(). Stream classes (like istream) normally allow the ungetting of at least one character using their member functions putback() or ungetc(). This is important, as these stream classes usually interface to streambuf objects. Although strictly speaking it is not necessary to implement a buffer in classes derived from streambuf, we strongly suggest using buffers in these cases: the implementation is very simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf at least a buffer of one character will be defined.
19.1.2.1: Using a one-character buffer
When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of at least one character, at the very least the member streambuf::underflow() should be overridden, as this is the member to which all requests for input eventually degenerate. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the dimensions of the input buffer, allowing it to set up its input buffer pointers such that eback(), gptr(), and egptr() return correct values. Here is the corresponding class definition:
    #include <unistd.h>
    #include <streambuf>
    #include <ios>

    class ifdstreambuf: public std::streambuf
    {
        public:
            ifdstreambuf(int fd)
            :
                fd(fd)
            {
                setg(buffer, buffer + 1, buffer + 1);   // 12
            }

            int underflow()
            {
                if (gptr() < egptr())                   // 18
                    return *gptr();                     // 19

                if (read(fd, buffer, 1) <= 0)           // 21
                    return EOF;

                setg(buffer, buffer, buffer + 1);       // 24
                return *gptr();
            }

        protected:                                      // 27
            int fd;
            char buffer[1];
    };

Please note the following:
In the class's protected section a small array of one character is defined. In the constructor (line 12), setg() sets gptr() equal to egptr(). Since this implies that the buffer is empty, underflow() will be called to refill the buffer.
At lines 18 and 19, underflow() first checks whether the buffer is really empty. If not, the character to be retrieved next is returned.
If the read at line 21 fails, EOF is returned. More sophisticated implementations could react more intelligently here, of course.
Following a successful read, setg() is called once again (line 24) to set up streambuf's buffer pointers correctly.
The data members are protected (line 27) so that derived classes (e.g., see section 19.1.2.3) can access them.
The ifdstreambuf class is used in the following program:
    #include "ifdbuf.h"
    #include <iostream>
    #include <istream>

    using namespace std;

    int main()
    {
        ifdstreambuf fds(0);
        istream is(&fds);

        cout << is.rdbuf();
    }
19.1.2.2: Using an n-character buffer
How complex would things get if we decided to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, and it is basically the same class as ifdstreambuf from the previous section. To make things a bit more interesting, in the current class ifdnstreambuf the member streambuf::xsgetn() was also overridden, to optimize the reading of series of characters at once. Also, a default constructor is provided which can be used, in combination with the added open() member, in cases where we want to construct an istream object before we know the file descriptor that must be used. Later, in section 19.4, we'll encounter such a situation. In order to save some space, the successful operation of various calls was not checked. In `real life' implementations these checks should of course not be omitted. Here is the new class ifdnstreambuf:
    #ifndef _IFDNBUF_H_
    #define _IFDNBUF_H_

    #include <unistd.h>
    #include <cstring>
    #include <streambuf>
    #include <ios>

    class ifdnstreambuf: public std::streambuf
    {
        public:
            ifdnstreambuf()
            :
                bufsize(0),
                buffer(0)
            {}
            ifdnstreambuf(int fd, unsigned bufsize = 1)
            {
                open(fd, bufsize);
            }
            ~ifdnstreambuf()
            {
                if (bufsize)
                {
                    close(fd);
                    delete[] buffer;                    // 27
                }
            }
            void open(int xfd, unsigned xbufsize = 1)
            {
                fd = xfd;
                bufsize = xbufsize;

                buffer = new char[bufsize];
                setg(buffer, buffer + bufsize, buffer + bufsize);
            }
            int underflow()
            {
                if (gptr() < egptr())
                    return *gptr();

                int nread = read(fd, buffer, bufsize);
                if (nread <= 0)
                    return EOF;

                setg(buffer, buffer, buffer + nread);   // 49
                return *gptr();
            }
            std::streamsize xsgetn(char *dest, std::streamsize n)   // 54
            {
                int nread = 0;

                while (n)
                {
                    if (!in_avail())
                    {
                        if (underflow() == EOF)         // 62
                            break;
                    }

                    int avail = in_avail();

                    if (avail > n)
                        avail = n;

                    memcpy(dest + nread, gptr(), avail);// 71
                    gbump(avail);                       // 72

                    nread += avail;
                    n -= avail;
                }
                return nread;
            }                                           // 79

        protected:
            int fd;
            unsigned bufsize;
            char *buffer;
    };
    #endif

With this class, please note:
In the destructor, the device is closed and the buffer is deleted (line 27) once the ifdnstreambuf object goes out of scope.
Following a successful read in underflow(), the input buffer pointers of streambuf are reset (line 49) in accordance with the actual number of characters read from the device.
At line 54 xsgetn() is overridden. In a loop, n is reduced until 0, at which point the function terminates. Alternatively (line 62), the member returns if underflow() fails to obtain more characters. In lines 71 and 72 reading is optimized: instead of calling streambuf::sbumpc() n times, a block of avail characters is copied to the destination, using streambuf::gbump() to consume avail characters from the buffer using one function call.
xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
    #include "ifdnbuf.h"
    #include <iostream>
    #include <istream>

    using namespace std;

    int main()
    {
        ifdnstreambuf fds(0, 30);   // internally: 30 char buffer

        char buf[80];               // main() reads blocks of 80 chars

        while (true)
        {
            unsigned n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }
19.1.2.3: Seeking positions in `streambuf' objects
When devices support seek operations, classes derived from streambuf should override streambuf::seekoff() and streambuf::seekpos(). In the following class ifdseek, which is derived from our previously constructed class ifdstreambuf, we assume that the device whose file descriptor we have available supports these seek operations. The class ifdseek is derived from ifdstreambuf, so it uses a character buffer of just one character. The facilities to perform seek operations, which are added to our new class ifdseek, make sure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class ifdnstreambuf, in which case the arguments to reset the input buffer must be adapted such that its second and third parameters point beyond the available input buffer. Here is the class definition:
    #include "ifdbuf.h"
    #include <unistd.h>

    class ifdseek: public ifdstreambuf
    {
        typedef std::streambuf::pos_type pos_type;      // 6
        typedef std::streambuf::off_type off_type;
        typedef std::ios::seekdir seekdir;
        typedef std::ios::openmode openmode;            // 9

        public:
            ifdseek(int fd)                             // 12
            :
                ifdstreambuf(fd)
            {}

            pos_type                                    // 17
            seekoff(off_type offset, seekdir dir, openmode)
            {
                pos_type pos =
                    lseek
                    (
                        fd, offset,
                        (dir == std::ios::beg) ? SEEK_SET :
                        (dir == std::ios::cur) ? SEEK_CUR :
                                                 SEEK_END
                    );

                if (pos < 0)
                    return -1;

                setg(buffer, buffer + 1, buffer + 1);   // 32
                return pos;
            }

            pos_type                                    // 36
            seekpos(pos_type offset, openmode mode)
            {
                return seekoff(offset, std::ios::beg, mode);
            }
    };
At lines 6 through 9 several types of the streambuf and ios classes are redefined, to save us from specifying std::streambuf and std::ios again and again when referring to these types.
In seekoff(), the system call lseek() is used to seek a new position in a file whose file descriptor is known.
If seeking succeeds, setg() is called (line 32): the streambuf class resets its input buffer pointers to an empty buffer, so that underflow() will refill the buffer at the next request for input.
At line 36, seekpos() is overridden as well: it is simply defined as a call to seekoff().
An example of a program using the class ifdseek is the following. If this program is given its own source file using input redirection, then seeking is supported, and apart from the first line, every other line is shown twice:
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>

    using namespace std;

    int main()
    {
        ifdseek fds(0);
        istream is(&fds);
        string s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();
            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();
            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }
19.1.2.4: Multiple `unget()' calls in `streambuf' objects
As stated earlier, streambuf classes and classes derived from streambuf should at least support ungetting the character that was last read. Special care must be taken when series of unget() calls are to be supported. In this section we will show how to construct a class which is derived from streambuf and which allows a configurable number of istream::unget() or istream::putback() calls. Support for multiple (say `n') unget() calls is realized by setting aside an initial section of the input buffer, which is gradually filled with the last n characters read. Here is an implementation of a class supporting series of unget() calls:
    #include <unistd.h>
    #include <cstring>
    #include <streambuf>
    #include <ios>
    #include <algorithm>

    class fdunget: public std::streambuf
    {
        public:
            fdunget(int fd, unsigned bufsz, unsigned unget)     // 9
            :
                fd(fd),
                reserved(unget)                                 // 12
            {
                unsigned allocate =                             // 14
                        bufsz > reserved ?
                            bufsz
                        :
                            reserved + 1;

                buffer = new char[allocate];                    // 20

                base = buffer + reserved;                       // 22
                setg(base, base, base);                         // 23

                bufsize = allocate - reserved;                  // 25
            }

            ~fdunget()
            {
                delete[] buffer;
            }

            int underflow()
            {
                if (gptr() < egptr())
                    return *gptr();

                unsigned consumed =                             // 39
                    static_cast<unsigned>(gptr() - eback());
                unsigned move = std::min(consumed, reserved);   // 41

                memcpy(base - move, egptr() - move, move);      // 43

                int nread = read(fd, base, bufsize);            // 45
                if (nread <= 0)     // none read ->
                    return EOF;

                setg(base - move, base, base + nread);          // 50
                return *gptr();
            }

        private:
            int fd;
            unsigned bufsize;
            unsigned reserved;
            char *buffer;
            char *base;
    };
At line 9, the constructor's parameter unget determines the size of a reserved area, occupying the first reserved bytes of the buffer that is used.
The buffer allocated at lines 14 and 20 is always larger than reserved. So, a certain number of bytes can be read, and once reserved bytes have been read, at least reserved bytes can be ungot.
At line 22, base is initialized: it is located reserved bytes into the buffer area. This is always the point where refills of the buffer start.
At line 23, streambuf's buffer pointers are set up. As no characters have been read yet, all pointers are set to base. At the next read request underflow() is called; until then, unget() will fail immediately.
The refilling of the buffer is handled by underflow(). Up to line 39 everything is standard; there we determine how many characters were consumed. Remember that at line 23 we ascertained that the pointers indicate 0 bytes in the buffer, so consumed is either 0 or the actual number of bytes read into the buffer after a successful refill.
The number of characters to move into the reserved area is normally reserved, but it is less if fewer characters were available. So, at line 41 move is set to the minimum of the number of consumed characters and reserved.
At line 43 the most recently read characters are copied into the area just before base.
At line 45 the buffer is refilled. Note that reading starts at base, not at buffer.
At line 50, streambuf's read buffer pointers are set up: eback() is set to move locations before base, thus defining the guaranteed unget-area; gptr() is set to base, since that's the location of the first character read after a refill; and egptr() is set just beyond the location of the last character read.
The following program illustrates the use of the class fdunget. The program reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined, in a buffer of 3 characters. Just before reading a character, an attempt is made to unget at most 6 characters. This is of course never possible, but the program will neatly unget as many characters as possible, considering the actual number of characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>

    using namespace std;

    int main()
    {
        fdunget fds(0, 3, 2);
        istream is(&fds);
        char c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                        << "\tunget failed at attempt " << (ug + 1) << "\n"
                        << "\trereading: '";

                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }

            if (!is.get(c))
            {
                cout << "EOF reached\n";
                break;
            }
            cout << "Next character: " << c << endl;
        }
    }
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character:
        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
        EOF reached
    */
Some older stream libraries supported the members form() (and scan()) for stream objects. In this section the construction of a class oformstream is described, which is derived from ostream and does support form() and vform(). Analogously to oformstream, a class iscanstream can be developed, supporting scan() and vscan(). The construction of this latter class is left as an exercise to the reader.
Here is the class interface and definition. It is annotated below:
    #include <iostream>

    class oformstream: public std::ostream
    {
        public:
            oformstream(std::ostream &ostr)
            :
                std::ostream(ostr.rdbuf())
            {}

            std::ostream &form(char const *fmt, ...);
    };
The class oformstream is derived from ostream, so it inherits ostream's capabilities.
Its constructor expects an ostream, whose streambuf is used to initialize the ostream base class. Thus the streambuf could be obtained (as shown below) from another ostream object.
Next, the member form() is declared. As is customary for this kind of function, it doesn't do the formatting itself, but delegates this to a `v-function' companion, in this case vform().
In the implementation, shown below, the formatted text is produced in two steps:
An auto_ptr is used to manage the dynamically allocated buffer.
vsnprintf() returns the number of bytes the formatted text requires, not counting the final ASCII-Z character.
Finally the buffer is inserted into the ostream base class, and the oformstream object is returned.
A program using the class oformstream is given in the next example. It prints a well-known string and some marker text:
    #include "oformstream.h"
    #include <memory>
    #include <cstdarg>
    #include <cstdio>

    std::ostream &oformstream::form(char const *fmt, ...)
    {
        va_list list;
        va_start(list, fmt);

        char c;
        unsigned n = vsnprintf(&c, 0, fmt, list);   // required length
        va_end(list);

        std::auto_ptr<char> buf(new char[n + 1]);

        va_start(list, fmt);                        // restart the list
        vsprintf(buf.get(), fmt, list);             // for the real run
        va_end(list);

        return *this << buf.get();
    }

    using namespace std;

    int main()
    {
        oformstream of(cout);

        of << "Hello world\n";
        cout << "Ok, ";
        of << "That's all, folks\n";

        of.form("%s\n", "Hello world of C++") << endl;
    }
Redirection at the level of C++ streams can be implemented using the ios::rdbuf() member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus realizing redirection at the level of the programming language itself.
It should be realized that this is fine within the context of the C++ program, but as soon as that context is left, the redirection is terminated, as the operating system does not know about streambuf objects. This happens, e.g., when a program uses a system() call to start a subprogram. The following program uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")

to echo a well-known line of text. Since echo writes its information to the standard output, this would be the program's redirected file if C++'s redirection were recognized by the operating system. However, this is not the case, so hello world still appears at the program's standard output and not on the redirected file. A solution to this problem involves redirection at the operating system level, for which system calls like dup() and dup2() are available in some operating systems (e.g., Unix and friends). Examples of the use of these system calls are given in section 19.4 in this chapter.
Here is the example showing the failure of redirection at the system level following C++ streambuf redirection:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>

    using namespace std;

    int main()
    {
        ofstream of("outfile");
        cout.rdbuf(of.rdbuf());

        cout << "To the of stream" << endl;
        system("echo hello world");
        cout << "To the of stream" << endl;
    }
    /*
        Generated output: on the file `outfile'
            To the of stream
            To the of stream
        On standard output:
            hello world
    */
To Unix programmers the fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().
In this section we will see how C++ can be used to wrap classes around a complex system call like fork(). Much of what follows directly applies to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities. The following discussion is based heavily on the notion of design patterns, as published by Gamma et al. (1995).
When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork(), which differs between the original process (called the parent process) and the newly created process (called the child process):
In the parent process, fork() returns the process ID of the child process that was created by the fork() system call. This is a positive integer value.
In the child process, fork() returns 0.
If fork() fails, -1 is returned.
A basic Fork class should hide the bookkeeping details of a system call like fork() from its users. The Fork class developed below does just that. The class itself only takes care of the proper execution of the fork() system call. Normally, fork() is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process is. However, a Fork object should be able to activate the child process.
In the child process created by the Fork class developed here, two subtasks are performed: first, the necessary (standard) streams are redirected; second, the actual child process is started.
Comparably, in the parent process two subtasks are performed: here istreams and an ostream object can be made available, followed by the activation of the actual parent process. However, the Fork process has no responsibilities here, but defers these tasks to separate objects. This design pattern is called the Chain of Responsibility pattern by Gamma et al. (1995). Using this design pattern the sender of a request is decoupled from its receiver.
Normally, the child process does not return: once it has completed its task it should terminate as a separate process. The parent (main) process usually waits for its child process to complete its task. The terminating child process informs its parent that it is about to terminate by sending a signal to the parent, which should be caught by the parent. If the child terminates and the parent process does not catch the signal generated by the child process, then the child process remains visible as a so-called zombie process.
There is one situation where the child process continues to live while the parent dies. In nature this happens all the time: parents tend to die before their children do. In our context this represents a daemon program, where the parent process dies and the child program continues, in fact, as a child of the basic init process. Again, when the child eventually dies, a signal is sent to its `step-parent' init. No zombie is created here, as init catches the termination signals of its (step-)children.
The construction of a daemon process is so simple that no special Fork object is needed (see section 19.4.12 for an example).
The child process is represented by the abstract ChildProcess class, offering a protected pure virtual method doProcess() which must be implemented in a concrete implementation of the abstract class. Depending on the problem at hand, the child process may assume that it can read from its standard input, and that it can write to its standard output and/or standard error. Here is the definition of the abstract ChildProcess class:
    #ifndef _CHILD_PROCESS_
    #define _CHILD_PROCESS_

    #include <cstdlib>

    class ChildProcess
    {
        public:
            void process()
            {
                doProcess();
                exit(-1);       // doProcess itself should stop
            }

        protected:
            virtual void doProcess() = 0;   // implemented in subclasses
    };
    #endif

In this class, note that the method process() is the only public method offered. This method performs two tasks: it calls doProcess(), which is implemented in a derived class; and then, as a safeguard, it calls exit(-1): this ensures that the child process is properly terminated if, for whatever reason, the doProcess() member itself fails to terminate the child process properly (see also section 19.4.12 for the reverse case, where the child continues to live and the parent dies).
The design pattern in which a generic class defines a member (an action) to be implemented in concrete subclasses is called a Command design pattern by Gamma et al. (1995). The process() method moreover defines a skeleton algorithm, deferring the implementation of the doProcess() step to a subclass. This latter design pattern is called the Template Method by Gamma et al.
The parent process is represented by the abstract ParentProcess class, offering a public pure virtual method process() which must be implemented in a concrete implementation of the abstract class. This process() method is always called with the following arguments:
int pid, being the process ID of the child process;
std::ostream &out, which may be used to write information to the child process (appearing at the child process' standard input stream);
std::istream &in, which may be used to read information from the child process (e.g., written to the child process' standard output stream);
std::istream &inErr, which may be used to read information from the child process (e.g., written to the child process' standard error stream).
Here is the definition of the abstract class ParentProcess:
    #ifndef _PARENT_PROCESS_
    #define _PARENT_PROCESS_

    #include <istream>

    class ParentProcess
    {
        public:
            virtual void process            // pure virtual function
            (
                int pid,
                std::ostream &out,
                std::istream &in,
                std::istream &inerr
            ) = 0;
    };
    #endif
Again, this is an example of the Command design pattern as identified by Gamma et al. (1995).
Redirecting the child's standard streams is a task of its own, in which the Fork class itself has no business. Redirection at the system level (and not at the C++ level, see section 19.3) is required to connect the child's standard streams to the streams defined as parameters of the ParentProcess::process() member function. The abstract class Redirector defines the interface for this redirection. Depending on the requirements of the child process, its standard input, standard output and/or standard error may have to be redirected. Here is the definition of the abstract class Redirector:
    #ifndef _REDIRECTOR_H_
    #define _REDIRECTOR_H_

    #include <istream>

    class Redirector
    {
        public:
            enum { R, W };
            enum bad_stream {};

            virtual ~Redirector()
            {}

            virtual void child() = 0;   // sets up redirections in the
                                        // child
            virtual void parent() = 0;  // initializes in, out, err

            virtual std::istream &in()
            {
                throw bad_stream();
            }
            virtual std::istream &inErr()
            {
                return in();
            }
            virtual std::ostream &out()
            {
                throw bad_stream();
            }
    };
    #endif

This class offers the following methods:
child() defines the redirection of the necessary streams in the child process.
parent() defines the redirection of the necessary streams in the parent process.
in(), inErr() and out() by default throw Redirector::bad_stream exceptions. When a derived class actually defines a certain kind of redirection, it can override the default implementations to make the redirected streams available: in() should be made available to read the (standard) output of the child process, inErr() should be made available if the child's standard error must be read on a separate stream, and out() should be made available to write information to the child's standard input.
The Redirector
class represents yet another example of Gamma's (1995)
Command design pattern.
Now that Fork's support classes have been defined, let's have a look at the Fork class itself. The class defines but one member: fork(), performing the actual fork() system call. An object of the class Fork can be configured by providing the appropriate Redirector, ChildProcess and ParentProcess objects. In the current definition the Fork constructor defines such parameters and the class defines corresponding reference variables, but an alternative approach, where fork() itself defines these reference parameters, could have been followed as well. It is up to the reader to experiment with this latter approach.
Here is the interface of the class Fork, together with the implementation of the fork() member function:
    #ifndef _FORK_H_
    #define _FORK_H_

    class Redirector;
    class ChildProcess;
    class ParentProcess;

    class Fork
    {
        public:
            Fork(Redirector &red, ChildProcess &cp, ParentProcess &pp)
            :
                redirector(red),
                child(cp),
                parent(pp)
            {}

            void fork() const;

        protected:
            Redirector &redirector;
            ChildProcess &child;
            ParentProcess &parent;
    };
    #endif
#include "fork.i"

void Fork::fork() const
{
    int pid = ::fork();

    if (pid == 0)               // child
    {
        redirector.child();
        child.process();
        return;                 // only reached if process() doesn't exec
    }
    redirector.parent();
    parent.process(pid, redirector.out(), redirector.in(),
                        redirector.inErr());
}

Looking at the implementation of the
fork()
member, note that its
child- and parent- sections are implemented rather symmetrically: first the
redirector is called to set up the appropriate redirection, then the relevant
(child or parent) object is asked to do its thing.
Again we meet a design pattern as identified by Gamma et al. (1995), this time the Chain of Responsibility pattern (see also section 19.4).
To use the Fork
class, the
abstract base classes Redirector, ChildProcess
and ParentProcess
must
be implemented in concrete subclasses. The following classes will be
developed:
IORedirector
implements the redirection of the standard
input stream, standard output stream and standard error stream in a Fork
object. Rather than redirecting the standard error and standard output to
different streams, they will be merged to one stream, which will be read by
the parent via the redirector.in()
istream.
ChildPlain
allows us to activate a simple child
process, not expecting any arguments, producing output at its standard
output and/or standard error stream and maybe also expecting input at its
standard input stream.
ParentCmd
is used to communicate with a child process
by sending commands to the child process (arriving at the child's standard
input stream) and reading the child's standard output or error in return.
ParentSlurp
is used just to read the child's standard
output or error.
The redirection of the streams is realized by the dup2()
system call, operating on
file descriptors. One of these file
descriptors is the end of a
pipe, as created by the
pipe()
system
call. A pipe is a one-way channel through which information may
flow. Typically one process writes information to the pipe, another process
reads information from the pipe.
The pipe()
system call will fill an int
array having two elements
with two file descriptors: element 0 represents the end of the pipe from which
information can be read, element 1 represents the end to which information
can be written.
The Redirector
subclasses must set up the redirection, and this usually amounts to a
rather standard initialization and redirection of the file descriptors
created by a pipe()
system call. Since this is so common, it is
useful to have a set of objects available doing just that. So, as extra
support for the implementation of Redirector
subclasses, the
Pipe
set of classes was developed too.
The abovementioned classes are now constructed. First the Pipe
set of
classes is developed, followed by the construction of the IORedirector,
the ChildPlain
, the ParentCmd
and the ParentSlurp
classes. Eventually, these classes are used in some examples.
Although the story seems like a long one, it is hoped that the reader will
be able to appreciate the beauty of the design patterns underlying the
construction of the Fork
- and its support-classes as discussed here. As we
will see, once the design is clear, the implementations are surprisingly
short.
Several Pipe
support classes were constructed: Pipe
is the base
class, defining and initializing the file descriptor array. Its constructor is
used only by its subclasses, so it was given the protected
access
modifier. Here is the definition of the class Pipe
:
#ifndef _PIPE_H_
#define _PIPE_H_
#include <unistd.h>

class Pipe
{
    public:
        enum bad_pipe {};

        int getReadFd()
        {
            return fd[R];
        }
        int getWriteFd()
        {
            return fd[W];
        }

    protected:
        enum { R, W };

        Pipe()
        {
            if (pipe(fd))
                throw bad_pipe();
        }
        int fd[2];
};
#endif
The
IPipe
class defines a pipe which can be used by a parent process to
read from an ifdnstreambuf
object (introduced in section
19.1.2.2). The redirection of a (standard) file descriptor to the file
descriptor corresponding to the writing end of the pipe is implemented in its
child()
members. Note that the unused ends of the pipes (the reading end
in the child, the writing end in the parent) are closed: open descriptors will
prevent the corresponding pipe to close if only one of the processes using the
pipe is done with it. Also, once the redirection has been performed (to save
space the successful completion of
dup2()
has not been checked: it should
be checked in production code), the file descriptor associated with the pipe
can be closed as well, as the alternate file descriptor will now be used
instead. Also note the use of the open()
member function of the
ifdnstreambuf
object. Here is the definition of the IPipe
class:
#ifndef _IPIPE_H_
#define _IPIPE_H_
#include "pipe.h"
#include "../../examples/ifdnbuf.h"

class IPipe: public Pipe                // parent reads
{
    public:
        void child(int std_fd)
        {
            close(fd[R]);
            dup2(fd[W], std_fd);
            close(fd[W]);
        }
        void child(int std_fd1, int std_fd2)
        {
            close(fd[R]);
            dup2(fd[W], std_fd1);
            dup2(fd[W], std_fd2);
            close(fd[W]);
        }
        void parent(ifdnstreambuf &fdbuf, unsigned bufsize = 500)
        {
            close(fd[W]);
            fdbuf.open(fd[R], bufsize);
        }
};
#endif
The
OPipe
class defines a pipe which can be used by a parent process to
write to an ofdnstreambuf
object (introduced in section
19.1.1). This information will reach the reading end of the pipe in the
child process. The implementation of OPipe
closely mimics the
implementation of the class IPipe
. Since there is only one standard input
stream, the child()
member needs no parameter, but can immediately
stream, the child()
member needs no parameter, but can immediately
redirect
STDIN_FILENO
. Here is the definition of the class OPipe
:
#ifndef _OPIPE_H_
#define _OPIPE_H_
#include "pipe.h"
#include "../../examples/fdout.h"

class OPipe: public Pipe                // parent writes
{
    public:
        void child()
        {
            close(fd[W]);
            dup2(fd[R], STDIN_FILENO);
            close(fd[R]);
        }
        void parent(ofdnstreambuf &fdbuf, unsigned bufsize = 500)
        {
            close(fd[R]);
            fdbuf.open(fd[W], bufsize);
        }
};
#endif
The class IORedirector
, derived from the Redirector
abstract base
class, redirects information written in a parent process to the writing end of
a pipe, accessed through its out() ostream
to the standard input stream of
a child process that uses the reading end of the pipe. Furthermore, any
information that's written in the child process to either the standard output
stream or the standard error stream may be read in the parent process from the
reading end of a pipe, which is accessed through the istream in()
reference. First the class definition is shown, then it will be
annotated. Here is the class definition:
#ifndef _IOREDIRECTOR_H_
#define _IOREDIRECTOR_H_
#include "../abc/redirector.h"
#include "../pipe/ipipe.h"
#include "../pipe/opipe.h"

class IORedirector: public Redirector   // defines in() out()
{
    public:
        IORedirector()                  // 11
        :
            ins(&ifdb),
            outs(&ofdb)
        {}
        std::istream &in()              // 17
        {
            return ins;
        }
        std::ostream &out()             // 22
        {
            return outs;
        }
        void child()                    // 27
        {
            ip.child(STDOUT_FILENO, STDERR_FILENO);
            op.child();
        }
        void parent()                   // 33
        {
            ip.parent(ifdb);
            op.parent(ofdb);
        }

    private:
        ifdnstreambuf ifdb;             // 40
        std::istream ins;
        IPipe ip;

        ofdnstreambuf ofdb;
        std::ostream outs;
        OPipe op;                       // 47
};
#endif
In the private
data section, we see (lines 40 to 47) an
ifdnstreambuf
(see section 19.1.2.2), an istream
and an IPipe
input-pipe (see section 19.4.6), which are used for reading information from
the child process. A comparable set of objects is defined for writing
information to the child process.
The constructor (line 11) associates the istream
and
ostream
objects with their respective streambuf objects; the pipes themselves are
constructed by their default constructors.
The in()
and out()
members (lines 17 and 22) override the
exception throwing default Redirector
implementations.
In the child process the IPipe ip
redirects the standard output and standard error streams to its pipe, and
the OPipe op redirects the standard input stream. The
input and output pipes are initialized in the child()
member defined
in line 27.
In the parent process the pipes are connected to the streambuf
objects. This is realized by the parent()
member, defined in
line 33.
An IORedirector
can be used in many situations. If a process
only writes information, the reading facilities are simply not used. If a
process only reads information, the writing facilities are not used. The class
does not distinguish information written by the child process
to its standard output stream from information written to its standard error
stream. If these streams must be distinguished, another subclass of
Redirector
must be constructed (probably requiring the use of the
select()
system call in the parent). This situation is left as an
exercise to the reader.
The class ChildPlain
, derived from the ChildProcess
abstract base
class, only has to define our child process. The current implementation (see
below) defines a public constructor, expecting the full path of the program to
execute as child process. In the current context, it is assumed that the
program to be executed needs no arguments. So, the program could be
/bin/sh
or
/bin/ls
.
The member doProcess()
is never seen outside of the ChildProcess
class
itself, so it needs no public access rights. Here, it is defined as a
private
member.
At the very minimum, only doProcess()
could have been defined, using a
fixed call of the process. For illustration purposes ChildPlain
was
constructed as a slightly more flexible class. Here is the class definition
itself:
#ifndef _CHILD_PLAIN_
#define _CHILD_PLAIN_
#include "../abc/childprocess.h"
#include <unistd.h>

class ChildPlain: public ChildProcess
{
    public:
        ChildPlain(char const *cmd)
        :
            cmd(cmd)
        {}

    private:
        void doProcess()
        {
            execl(cmd, cmd, static_cast<char *>(0));
        }
        char const *cmd;
};
#endif
The class ParentCmd
, derived from the ParentProcess
abstract base
class, only needs to implement the process()
member, defining the parent's
side of the communication with the child process. ParentCmd
was
constructed with the child process executing
/bin/sh
in mind. The
process()
member implements the following simple protocol:
cin
, reading stops at an empty line.
ostream
objected referenced by
the out
parameter. An echo EOC
(End Of Command) is appended, thus
defining a very simple protocol.
istream
object referenced by the
in
parameter. Once EOC
is sensed, reading stops.
cin
, the child process may
terminate. This is realized by sending a
SIGTERM
signal to the child
process, followed (to be sure the process terminates) by the sending of a
SIGKILL
signal.
waitpid()
system call.
ParentCmd
class.
#ifndef _PARENT_CMD_
#define _PARENT_CMD_
#include "../abc/parentprocess.h"
#include <iostream>
#include <string>
#include <sys/wait.h>
#include <signal.h>

using namespace std;

class ParentCmd: public ParentProcess
{
    public:
        void process
        (
            int pid,
            std::ostream &out,
            std::istream &in,
            std::istream &inerr         // not used
        )
        {
            string s;

            while (true)
            {
                cerr << "? ";
                if (!getline(cin, s))
                    break;
                cout << "Cmd: " << s << endl;
                out << s << endl << "echo EOC" << endl;
                while (getline(in, s))
                {
                    cout << "Read: " << s << endl;
                    if (s == "EOC")
                        break;
                }
            }
            kill(pid, SIGTERM);
            kill(pid, SIGKILL);
            waitpid(pid, 0, 0);
        }
};
#endif
The class ParentSlurp
, derived from the ParentProcess
abstract base
class, only needs to implement the process()
member, defining the parent's
side of the communication with the child process. ParentSlurp
was
constructed with the child process executing a program merely producing output
(like
/bin/ls
) in mind. The process()
member reads all information
appearing at its istream &in
parameter (note how the stream's buffer, obtained by in.rdbuf(), is
inserted into cout), and then waits for the child process to complete.
Here is the definition of the ParentSlurp
class:
#ifndef _PARENT_SLURP_
#define _PARENT_SLURP_
#include "../abc/parentprocess.h"
#include <iostream>
#include <string>
#include <sys/wait.h>
#include <signal.h>

using namespace std;

class ParentSlurp: public ParentProcess
{
    public:
        void process
        (
            int pid,
            std::ostream &out,          // not used
            std::istream &in,
            std::istream &inerr         // not used
        )
        {
            std::cout << in.rdbuf();
            waitpid(pid, 0, 0);
        }
};
#endif
The Fork class can now be used. Two examples
are presented here. Both examples use the IORedirector
object (introduced
in section 19.4.7).
In the first example,
/bin/ls
is called as child process, and a
ParentSlurp
object is used to consume the generated output:
#include "fork/fork.h"
#include "ioredirector/ioredirector.h"
#include "childplain/childplain.h"
#include "parentslurp/parentslurp.h"

int main()
{
    IORedirector io;
    ChildPlain binls("/bin/ls");
    ParentSlurp slurp;

    Fork(io, binls, slurp).fork();
}
/*
    Generated output (partially shown):
    abc
    binls
    binsh
    ...
    parentcmd
    parentslurp
    pipe
*/

Note the combined construction of the
Fork
object as an anonymous
object, and the way fork()
is immediately called for that temporary
object. Alternatively, the construction of the Fork
object and the
activation of the fork()
member could have been separated:
    Fork f(io, binls, slurp);
    f.fork();
In the second example,
/bin/sh
is called to execute several commands, and
a ParentCmd
object is used to interface between the user of the program
and the /bin/sh
program. Here is the program and an example of a resulting
conversation:
#include "fork/fork.h"
#include "ioredirector/ioredirector.h"
#include "childplain/childplain.h"
#include "parentcmd/parentcmd.h"

int main()
{
    IORedirector io;
    ChildPlain binsh("/bin/sh");
    ParentCmd cmd;

    Fork(io, binsh, cmd).fork();
}
/*
    Example of conversation:
    ? cd tmp
    Cmd: cd tmp
    Read: /bin/sh: cd: tmp: No such file or directory
    Read: EOC
    ? mkdir tmp
    Cmd: mkdir tmp
    Read: EOC
    ? cd tmp
    Cmd: cd tmp
    Read: EOC
    ? cd ../
    Cmd: cd ../
    Read: EOC
    ? rmdir tmp
    Cmd: rmdir tmp
    Read: EOC
    ? cd tmp
    Cmd: cd tmp
    Read: /bin/sh: cd: tmp: No such file or directory
    Read: EOC
*/
A frequently used application of fork()
is to start a
child process, with the parent process terminating immediately after spawning
it. If this happens, the child process continues to run as a child
process of
init
, the always running first process on
Unix systems. Such
a process is often called a
daemon, running as a
background process.
Although the following example can easily be constructed as a plain C
program, it was included in the C++ Annotations because it is so closely
related to the current discussion of the Fork
class. I thought about
adding a daemon()
member to that class, but eventually decided against it
because the construction of a daemon program is very simple and uses none of
the features currently offered by the Fork
class.
Here is an example illustrating the construction of a daemon program:
#include <iostream>
#include <unistd.h>

int main()
{
    int pid = fork();

    if (pid != 0)
        return pid < 0;     // 1: failing fork, 0: daemon started

    // child process: the daemon
    sleep(3);               // wait 3 seconds
    std::cout << "Hello from the child process\n";
}
/*
    Generated output:
    The next command prompt, then after 3 seconds:
    Hello from the child process
*/
For whatever reason, some operators are missing: there are no predefined
function objects corresponding to
bitwise operations. However, their
construction is, given the available predefined function objects, not
difficult. The following examples show a
template class implementing a
function object calling the
bitwise and (
operator&()
), and a template
class implementing a function object calling the
unary not
(
operator~()
). Other missing operators can be constructed similarly. This
is, however, left as an
exercise to the reader.
Here is the implementation of a function object calling the
bitwise
operator&()
:
#include <functional>

template <typename _Tp>
struct bit_and : public binary_function<_Tp, _Tp, _Tp>
{
    _Tp operator()(const _Tp& __x, const _Tp& __y) const
    {
        return __x & __y;
    }
};
Here is the implementation of a function object calling operator~()
:
#include <functional>

template <typename _Tp>
struct bit_not : public unary_function<_Tp, _Tp>
{
    _Tp operator()(const _Tp& __x) const
    {
        return ~__x;
    }
};
These and other
missing predefined function objects
are also implemented in the
file
bitfunctional
, which is found in one of the cplusplus
zip-archives.
An example using bit_and()
to remove all odd numbers from a vector of
int
values, using the
remove_if()
generic algorithm is:
#include <iostream>
#include <algorithm>
#include <vector>
#include "bitand.h"

int main()
{
    vector<int> vi;

    for (int idx = 0; idx < 10; ++idx)
        vi.push_back(idx);

    copy
    (
        vi.begin(),
        remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
        ostream_iterator<int>(cout, " ")
    );
    cout << endl;
}
/*
    Generated output:
    0 2 4 6 8
*/
The class Vector, introduced in section
18.3, will now be extended to contain a RandomAccessIterator
reverse_iterator.
First, we note that applying the ordinary iterators of the Vector
class
(made available by the begin()
and end()
member functions) to the
generic algorithms is without any problem, as the pointers that are returned
by these members are accepted by the generic algorithms:
#include <iostream>
#include "vector.h"

int main()
{
    Vector<int> vi;

    for (int idx = 0; idx < 10; ++idx)
        vi.push_back(idx);

    random_shuffle(vi.begin(), vi.end());
    copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
    cout << endl;
}
/*
    generated output, e.g.,
    1 7 9 3 5 6 0 4 8 2
*/
However, the use of a reverse iterator is not without problems: the
reverse iterator shows rather complex behavior, in that it visits the elements
in its range backwards, resulting in a reversal of the meanings of the
operators that are used to step forward and step backward in a range. If a
reverse iterator rit
points to element 11 in a range, then it will point
to element 10, rather than 12 after ++rit
. Because of this complex
behavior, the reverse iterator will be implemented as a class
reverse_iterator
, which can do the required bookkeeping for us.
Furthermore, in the current context reverse_iterator
is a reverse
iterator only for the elements of the class Vector
. So, a
nested class seems to be appropriate here. Since Vector
has
already been defined, we will decide to leave it untouched. Instead of
modifying Vector
, a template class Vector2
is derived of it, and
Vector2
will be given its nested class reverse_iterator
. This latter
class can be derived from random_access_iterator
to realize a
RandomAccessIterator for the class Vector2
:
template <typename Type>
class Vector2: public Vector<Type>
{
    public:
        class reverse_iterator:
            public random_access_iterator<Type, ptrdiff_t>
        {
            friend class Vector2<Type>;     // see the text, below
            ...
        };
        ...
};

The
ptrdiff_t
type that is mentioned here is ordinarily defined as an
int
(e.g., with the
Linux
operating system it is defined as a
__kernel_ptrdiff_t
in
types.h
, which is defined as an int
in
posix_types.h
).
The reverse_iterator
class is now defined as a nested template class,
derived from the template class random_access_iterator
. Its surrounding
class is Vector2
, which itself is a template class that was derived from
the template class Vector
.
Within reverse_iterator
we store two Type
pointers, called
begin
and end
, which will be used to define the reversed range
[end - 1, begin - 1)
. In the
private section of reverse_iterator
we declare:
Type *begin, *end;
begin and end receive their initial values from the begin()
and end
receive their initial values from the begin()
and end()
members of the Vector
object. However, the class
reverse_iterator
is closely tied to Vector2
: only Vector2
objects
should be allowed to construct their own reverse iterators. This policy is
enforced by making Vector2<Type>
a
bound friend of
reverse_iterator
, and by declaring its constructor private
. The
constructor expects the two Type
pointers, defining the range
[begin(), end())
. Internally begin() - 1
will be used as
endpoint of the reversed range, while end() - 1
will be used as the
beginning of the reversed range. The constructor, defined in the private
section of the reverse_iterator
class, is therefore:
reverse_iterator(Type *begin, Type *end)
:
    begin(begin),
    end(end)
{}

In the implementation of
reverse_iterator
, end
will gradually walk
towards begin
, which will keep its value. The dereferencing operator
will not return *begin
, but rather end[-1]
: the element just before
the element to which end
points. All this is allowed, as
Vector::begin()
and Vector::end() return random access iterators
return random access iterators
themselves, for which
pointer arithmetic operations are defined.
Since the reverse_iterator
class declares Vector2<Type>
as a
friend, the Vector2
class can implement two members, rbegin()
and
rend()
returning reverse_iterator
objects:
reverse_iterator rbegin()
{
    return reverse_iterator(begin(), end());
}
reverse_iterator rend()
{
    return reverse_iterator(begin(), begin());
}

Apart from these two members, the
Vector2
class could define two
constructors, analogous to the constructors in the Vector
class. E.g.,
Vector2() : Vector<Type>() {}
In order to allow programs to use these reverse iterators, a public
copy constructor and overloaded
assignment operator is needed in the
class reverse_iterator
. Since the Type
pointers in
reverse_iterator
are merely used for pointing to existing information, a
destructor
is not required. Of course, this
implies that the usefulness of a reverse_iterator
terminates if the object
from which it was obtained goes
out of scope. It is the
responsibility of the programmer to make sure this doesn't cause
disaster. In the public section of reverse_iterator
we add the following
members (implementations given as inline code):
reverse_iterator(reverse_iterator const &other)
:
    begin(other.begin),
    end(other.end)
{}
reverse_iterator &operator=(reverse_iterator const &other)
{
    begin = other.begin;
    end = other.end;
    return *this;
}

The implementation shown below is fairly Spartan: illegal pointer values are not detected. As the generic algorithms do not require these checks we've left them out. Of course, one could modify the implementation and incorporate checks at several locations. It's up to the reader.
Random access iterators require us to implement a series of member
functions. Here they are, as they should appear in the
public section of
the reverse_iterator
nested class. Most implementations will be
self-explanatory, and consist of single lines of code.
Type &operator*() const
{
    return end[-1];     // reference to the previous element
}
bool operator==(reverse_iterator const &other) const
{
    return other.end == end;    // comparing two iterators for equality
}
reverse_iterator &operator++()
{
    --end;              // increment is: to the previous element
    return *this;
}
reverse_iterator &operator--()
{
    ++end;              // decrement is: to the next element
    return *this;
}

// remaining members: see the text
bool operator<(reverse_iterator const &other) const;
reverse_iterator operator++(int);
reverse_iterator operator--(int);
int operator-(reverse_iterator const &other) const;
reverse_iterator operator+(int step) const;
reverse_iterator operator-(int step) const;

The following members deserve special attention:
bool operator<(reverse_iterator const &other) const
This function compares two iterators. The operator<() must be interpreted as `the lvalue iterator points to an element earlier in the range than the rvalue iterator'. Since we're talking reversed iterators here, this means that the lvalue operator points to a later element than the rvalue operator in the normal iterator range. So, the overloaded operator should return true if this->end > other.end, implying that the current iterator points to an element that is located earlier in the (reversed) range than the other iterator. So, we implement: return end > other.end;
reverse_iterator operator++(int)
The postfix increment operator must return a reverse_iterator pointing to the current element, incrementing the pointer afterwards. Again, this implies that end is decremented. Since we have a constructor expecting two (normal) iterators, the implementation again is a one-liner: return reverse_iterator(begin, end--); Using the postfix decrement with end, a reverse_iterator object is returned representing the iterator's position at the time the postfix increment operator was called, incrementing (so, decrementing end) the reversed iterator as a side effect.
reverse_iterator operator--(int)
The postfix decrement operator is implemented analogously: return reverse_iterator(begin, end++);
int operator-(reverse_iterator const &other) const
This is the pointer arithmetic operator: it returns the difference between two pointers, representing the number of steps needed to reach the lvalue operand starting from the rvalue operand. Normally we would implement this as end - other.end. However, since reversed iterators reduce their values to reach elements later in the range, the algebraic sign must be negated with reversed iterators: -(end - other.end), or, using simple algebraic manipulation: return other.end - end;
reverse_iterator operator-(int step) const
This operator allows us to take random steps backward using a
random access iterator. With a reversed iterator this means that we must
increase the value of the iterator. So we return the following
reverse_iterator
object:
return reverse_iterator(begin, end + step);
reverse_iterator operator+(int step) const
This operator does the opposite. So we implement it accordingly: return operator-(-step);
The implementations of the remaining members can be placed beyond the
Vector2 interface, if the
corresponding member functions are only declared within it:
template <typename Type>                // this is a template function
                                        // the class in which the
Vector2<Type>::reverse_iterator::       // constructor is defined
reverse_iterator(Type *begin, Type *end)// the constructor itself
:
    begin(begin),
    end(end)
{}

template <typename Type>                // this is a template function
typename Vector2<Type>::reverse_iterator// this is its return type
                                        // the class in which the function
Vector2<Type>::reverse_iterator::       // is defined
operator+(int step) const               // this is the function itself
{
    return reverse_iterator(begin, end - step);
}

Finally, an example of a program using the
reverse_iterator
:
#include <iostream>
#include "vector3.h"

int main()
{
    Vector2<int> vi;

    for (int idx = 0; idx < 10; ++idx)
        vi.push_back(idx);

    copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
    cout << endl;

    random_shuffle(vi.rbegin(), vi.rend());
    copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
    cout << endl;

    sort(vi.rbegin(), vi.rend());
    copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
    cout << endl;

    reverse(vi.rbegin(), vi.rend());
    copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
    cout << endl;

    copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " "));
    cout << endl;
}
/*
    Generated output:
    0 1 2 3 4 5 6 7 8 9
    7 1 5 9 3 4 6 0 2 8
    9 8 7 6 5 4 3 2 1 0
    0 1 2 3 4 5 6 7 8 9
    9 8 7 6 5 4 3 2 1 0
*/
This section describes the use of the lexical scanner generator flex
and the
parser generator
bison
. Both
bison
and flex
are well documented elsewhere. The original
predecessors of bison
and flex
, called
yacc
and
lex
are
described in several books, e.g. in
O'Reilly's book `lex & yacc'.
However, scanner- and parser generators are also (and maybe even more
commonly, nowadays) available as free software. Both bison
and flex
can be obtained from
ftp://prep.ai.mit.edu/pub/gnu. Flex
will create a C++
class when called as flex++
, or when the
-+
flag
is used. With bison
the situation is a bit more complex. Scattered over
the Internet several bison++
archives can be found (e.g., in
ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or
rzbsdi01.uni-trier.de). The
information in these archives usually dates back to 1993, irrespective of the
version number mentioned with the archive itself.
Using flex++
and bison++
a class
-based scanner and parser can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become a bit cleaner than without the class
interface.
Below two examples are given. In the first example only a lexical scanner is
used to monitor the production of a file from several parts. This example
focuses on the lexical scanner, and on switching files while churning through
the information. The second example uses both a scanner and a parser to
transform standard arithmetic expressions to their postfix notation, commonly
encountered in code generated by compilers and in HP
-calculators. The
second example focuses on the parser.
The input may contain an #include
statement, which is
followed by a string indicating the file which should be included at the
location of the #include
.
In order to avoid complexities that are irrelevant to the current
example, the format of the #include
statement is restricted to the
form
#include <filepath>
.
The file specified between the pointed brackets should be available at
the location indicated by filepath
. If the file is not available,
the program should terminate using a proper error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the
standard output stream cout
. Otherwise, the output is written to
the stream whose name is given as the program's second argument.
The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filename stack indicating where which file was included should be printed.
One small extra feature is that
comment-lines should be recognized:
include
directives in comment-lines should be ignored, comment being the
standard C++ comment-types.
The program is created in the following steps:
First, a lexical scanner specification file lexer
is constructed, containing the
specifications of the input-language.
Next, from the specifications in lexer
the requirements for the
class Scanner
evolve. The Scanner
class is a
wrapper around the
class
yyFlexLexer
generated by
flex++
. The requirements result in the
specification of the
interface for the class Scanner
.
Finally, the main()
function is constructed. A Startup
object
is created to inspect the
command-line arguments. If successful,
the scanner
's member yylex()
is called to construct the
output file.
19.7.1.1: The flex++ specification file
The organization of the
lexical scanner specification file is similar
to the one used with flex
. However, flex++
creates a class
(
yyFlexLexer
) from which the class Scanner
will be derived.
The code associated with the
regular expression rules will be located
inside the class yyFlexLexer
. However, it would be handy to access the
member functions of the derived class within that code. Fortunately,
inheritance helps us to realize this. In the specification of the class
yyFlexLexer()
, we notice that the function
yylex()
is a
virtual
function. In the FlexLexer.h
header file we see virtual int yylex()
:
class yyFlexLexer: public FlexLexer
{
    public:
        yyFlexLexer(istream *arg_yyin = 0, ostream *arg_yyout = 0);
        virtual ~yyFlexLexer();

        void yy_switch_to_buffer(struct yy_buffer_state *new_buffer);
        struct yy_buffer_state *yy_create_buffer(istream *s, int size);
        void yy_delete_buffer(struct yy_buffer_state *b);
        void yyrestart(istream *s);

        virtual int yylex();
        virtual void switch_streams(istream *new_in, ostream *new_out);
};

Consequently, if
yylex()
is defined in a derived class, then this
function of the derived class will be called from a
base class (i.e.,
yyFlexLexer
) pointer. Since the yylex()
function of the
derived class is called, that function will have access to the members
of its class, and to the public and protected members of its base class.
The context in which the generated scanner is placed is (by
default)
the function
yyFlexLexer::yylex()
. However, this context can be changed by
defining the
YY_DECL
-macro. This macro, if defined, determines the context
in which the generated scanner will be placed. So, in order to make the
generated scanner part of the derived class function yylex()
, three
things must be done:
YY_DECL
must be defined in the lexer specification
file. It must define the derived class function yylex()
as the scanner
function. For example:
#define YY_DECL int Scanner::yylex()
yylex()
must be declared in the class definition of
the derived class.
yyFlexLexer::yylex()
is a virtual function, it
must still be defined. It is not called, though, so its definition may be a
simple
int yyFlexLexer::yylex() { return 0; }
YY_DECL
macro and the yyFlexLexer::yylex()
function can conveniently be placed in the lexer specification file, as shown
below.
Looking at the regular expressions themselves, notice that we'll need rules
for the recognition of the comment, for the
#include
directive, and for the remaining characters. This is all fairly
standard practice. When an #include
directive is detected, the directive
is parsed by the scanner. This is
common practice:
preprocessor directives are normally not
analyzed by a
parser, but by the
lexical scanner. The scanner uses a
mini scanner to extract the filename from the directive, throwing a
Scanner::Error
value (invalidInclude
) if this fails. If the filename
could be extracted, it is stored in nextSource
, and on completion of the
reading of the #include
directive switchSource()
is called to perform
the switch to another file.
When the end of the file (EOF
) is detected, the derived class'
member function popSource()
is called, which will pop the previously
pushed file, returning
true
. Once the file-stack is empty, the function
will return
false
, resulting in the call of yyterminate()
, which will
terminate the scanner.
The
lexical scanner specification file has three sections: a C++
preamble, containing code which can be used in the code defining the
actions to be performed once a regular expression is matched, a Flex++
symbol area, which is used for the definition of symbols, like a
mini scanner, or
options, like
%option yylineno
when the
lexical scanner should keep track of the
line numbers of the files it is
scanning and, finally, a
rules section, in which the regular expressions
and their actions are given. In the current example, the main task of the
lexer is to
copy information from the
istream
*yyin
to the
ostream
*yyout
, for which the predefined macro
ECHO
can
be used. The complete and annotated lexical scanner specification file to be
used with
flex++
is:
%{
/* ----------------------------------------------------------------------------
    C++ -preamble.
    Include header files, other than those generated by flex++ and
    bison++. E.g., include the interface to the class derived from
    yyFlexLexer
---------------------------------------------------------------------------- */

                                        // the yylex() function that's
                                        // actually used
#define YY_DECL int Scanner::yylex()

#include "scanner.H"                    // The interface of the derived class

int yyFlexLexer::yylex()                // not called: overruled by
{                                       // Scanner::yylex()
    return 0;
}
%}

/* ----------------------------------------------------------------------------
    Flex++ symbol area
    ~~~~~~~~~~~~~~~~~~
    The symbols mentioned here are used for defining e.g., a mini scanner
---------------------------------------------------------------------------- */
%x comment
%x include

%option yylineno

eolnComment     "//".*
anyChar         .|\n

/* ----------------------------------------------------------------------------
    Flex rules area:
    ~~~~~~~~~~~~~~~~
    Regular expressions below here define what the lexer will recognize.
---------------------------------------------------------------------------- */
%%

    /* The comment-rules: comment lines are ignored. */

{eolnComment}
"/*"                    BEGIN comment;
<comment>{anyChar}
<comment>"*/"           BEGIN INITIAL;

    /* File switching: #include <filepath> */

#include[ \t]+"<"       BEGIN include;
<include>[^ \t>]+       nextSource = yytext;
<include>">"[ \t]*\n    {
                            BEGIN INITIAL;
                            performSwitch();
                        }
<include>{anyChar}      throw invalidInclude;

    /* The default rules: eating all the rest, echoing it to output */

{anyChar}               ECHO;

    /* The <<EOF>> rule: pop a pushed file, or terminate the lexer */

<<EOF>>                 {
                            if (!popSource())
                                yyterminate();
                        }

Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of
yyFlexLexer
are
protected, and thus
accessible to derived classes), much of the processing can be done by the
derived class' member functions. This results in a very clean setup of the
lexer specification file, which requires hardly any code in the
preamble.
19.7.1.2: The derived class: Scanner
The class Scanner
is derived from the class
yyFlexLexer
, generated by
flex++
. The derived class has access to the data controlled by the lexical
scanner. In particular, the derived class has access to the following data
members:
char *yytext
: contains the
text matched by
a
regular expression (accessible outside of the scanner from the
YYText()
member);
int yyleng
: the
length of the text
in yytext
(accessible outside of the scanner from the
YYLeng()
member);
int yylineno
: the current
line number (only if
%option yylineno
was specified in the lexer specification file,
accessible outside of the scanner from the
lineno()
member);
These data members are declared in the file FlexLexer.h
, which is part of the
flex
distribution.
The class Scanner
performs two tasks: It pushes file information about the
current file to a
file stack, and pops the information pushed last once
EOF
is detected in a file.
Several member functions are needed to accomplish these tasks. As they are
auxiliary to the scanner, they are
private members. In practice, these
private members are developed once the need for them arises. In the following
interface of the Scanner
class the final
header file is given. Note
that, apart from the private member functions, several private data members
are used as well. These members are initialized by the Scanner()
constructor and are used in the private member functions. They are
discussed below, in the context of the member functions that are using
them. Here is the scanner.h
header file:
#include <FlexLexer.h>     // provides yyFlexLexer interface
#include <algorithm>       // for find(), used by the scanner's members
#include <fstream>
#include <string>
#include <stack>
#include <vector>

using namespace std;       // FBB: needed for g++ >= 3.0

class Scanner: public yyFlexLexer
{
    public:
        enum Error
        {
            invalidInclude,
            circularInclusion,
            nestingTooDeep,
            cantRead,
        };

        Scanner(istream *yyin, string const &initialName);

        string const &lastFile()
        {
            return fileName.back();
        }

        void stackTrace();          // dumps filename stack to cerr
        int yylex();                // overruling yyFlexLexer's yylex()

    private:
        Scanner(Scanner const &other);              // no Scanner
                                                    // copy-initialization
        Scanner &operator=(Scanner const &other);   // no assignment either

        bool popSource();           // true: source popped
        void performSwitch();
        void throwOnCircularInclusion();

        stack<yy_buffer_state *> state;
        vector<string> fileName;
        string nextSource;

        static unsigned const sizeof_buffer = 16384,
                              maxDepth = 10;
};

If the scanner can extract a filename from an
#include
directive, a
switch to another file is performed by performSwitch()
. If the filename
could not be extracted, the scanner throws an invalidInclude
exception
value. The performSwitch()
member and the matching function
popSource()
handle the file switch. In particular, note that
yylineno
is not updated when a
file switch is performed. If line numbers are to be
monitored, then the current value of yylineno
should be pushed on a stack,
and yylineno
should be reset by performSwitch()
, while popSource()
should reinstate a former value of yylineno
by popping a previous value
from the stack. In the current implementation of Scanner
a simple
stack
of
yy_buffer_state
pointers is maintained. Changing that into a stack of
pair<yy_buffer_state *, unsigned>
elements allows you to save
the line numbers. This modification is left as an
exercise to the reader.
As mentioned, performSwitch()
performs the actual file-switching. The
yyFlexLexer
class provides a series of member functions that can be used
for file switching purposes. The file-switching capability of a
yyFlexLexer
object is founded on the struct yy_buffer_state
,
containing the state of the scan-buffer of the file that is currently scanned
by the lexical scanner. This buffer is pushed on the state
stack when an
#include
is encountered. Then this buffer is replaced by the buffer that
will be created for the file that is mentioned in the #include
directive.
The actual
file switching is realized as follows:
First, the current level of
include
-nesting is inspected. If
maxDepth
is reached, the stack is considered full, and the scanner throws
a nestingTooDeep
exception.
Next, the
fileName
vector is inspected, to avoid
circular inclusions. If nextSource
is encountered in the
fileName
vector, the inclusion is refused, and the scanner throws a
circularInclusion
exception. The member function
throwOnCircularInclusion()
is called for this task.
Then, a new
ifstream
object is created, for the filename
in nextSource
. If this fails, the scanner throws a cantRead
exception.
Finally, a new
yy_buffer_state
is created for the newly opened
stream, and the lexical scanner is instructed to switch to that
stream using yyFlexLexer
's member function
yy_switch_to_buffer()
.
The implementations of
performSwitch()
and
throwOnCircularInclusion()
are:
#include "scanner.H" void Scanner::performSwitch() { if (state.size() == maxDepth) throw nestingTooDeep; fileName.push_back(nextSource); throwOnCircularInclusion(); ifstream *newStream = new ifstream(nextSource.c_str()); if (!*newStream) throw cantRead; state.push(yy_current_buffer); yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer)); }
#include "scanner.H" void Scanner::throwOnCircularInclusion() { vector<string>::iterator it = find(fileName.begin(), fileName.end(), nextSource); if (it != fileName.end()) throw circularInclusion; }The member function
popSource()
is called to pop the previously pushed
buffer from the stack, so that the scan continues just beyond the processed
#include
directive. The popSource()
function first inspects
the size of the state
stack: if empty, false
is returned and the
function terminates. If not empty, then the current buffer is
deleted, to be replaced by the state waiting on top of the stack. The file
switch is performed by the yyFlexLexer
members
yy_delete_buffer()
and
yy_switch_to_buffer
. Note that yy_delete_buffer()
takes care of the
closing of the ifstream
and of deleting the memory allocated for this
stream in performSwitch()
. Furthermore, the filename that was last entered
in the fileName
vector is removed. Having done all this, the function
returns true
:
#include "scanner.H" bool Scanner::popSource() { if (state.empty()) return false; yy_delete_buffer(yy_current_buffer); yy_switch_to_buffer(state.top()); state.pop(); fileName.pop_back(); return true; }These functions complete the implementation of the complete lexical scanner. the lexical scanner itself is stored in the
Scanner::yylex()
function. The Scanner
object itself only has three public member
functions and a constructor. One member function is yylex()
, the other two
allow the caller to dump a stack trace and to obtain the name of the file that
was last processed by the scanner. The constructor is a simple piece of
code. Here is its source:
#include "scanner.h" Scanner::Scanner(istream *yyin, string const &initialName) { switch_streams(yyin, yyout); fileName.push_back(initialName); }
The main program is very simple. Since the program expects the name of a file
at which to start scanning, the number of arguments is checked first. If at
least one argument was given, then an ifstream
object is created. If this
object can be created, then a Scanner
object is constructed, receiving the
address of the ifstream
object and the name of the file as its
arguments. Then the yylex()
member function of the Scanner
object is
called. This function is inherited from the Scanner
's base class
yyFlexLexer
. If anything fails inside the scanner, an exception is thrown,
which is caught at the end of the main()
function:
/*                          lexer.cc

    A C++ main()-frame generated by C++ for lexer.cc
*/

#include "lexer.h"          /* program header file */

int main(int argc, char **argv)
{
    if (argc == 1)
    {
        cerr << "Filename argument required\n";
        exit(1);
    }

    ifstream yyin(argv[1]);
    if (!yyin)
    {
        cerr << "Can't read " << argv[1] << endl;
        exit(1);
    }

    Scanner scanner(&yyin, argv[1]);
    try
    {
        scanner.yylex();
    }
    catch (Scanner::Error err)
    {
        char const *msg[] =
        {
            "Include specification",
            "Circular Include",
            "Nesting",
            "Read",
        };
        cerr << msg[err] << " error in " << scanner.lastFile() <<
                ", line " << scanner.lineno() << endl;
        scanner.stackTrace();
        return 1;
    }
}
19.7.1.4: Building the scanner-program
The final program is constructed in two steps. These steps are given
for a
Unix system, on which
flex++
and the
Gnu C++
compiler
g++
have been installed:
First, the lexical scanner source is generated using
flex++
. For
this the command
flex++ lexer
can be given.
Next, all sources are compiled, and the program is linked against the
libfl.a
library. The appropriate command here is
g++ -o scanner *.cc -lfl
If
flex++
is called with the
-d
flag, debugging code will be part of the
generated scanner. Apart from that, the debugging code must be
activated. Assuming the scanner object is called scanner
, the statement
scanner.set_debug(1);
must be given following the construction of the scanner
object.
The starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer). Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and the lexer are normally not specified in the grammatical rules or lexical regular expressions themselves, but in functions, which are called from within the parser's rules or are associated with the lexer's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex++
. In the current section we concern ourselves with the parser. The
parser can be generated from a grammar specified for the program
bison++
. The specification of bison++
is similar to the specifications
required for
bison
, but a class is generated, rather than a single
function. In the next sections we'll develop a program converting
infix expressions, in which binary operators are written between
their operands, to
postfix expressions, in which binary operators are
written behind their operands. A comparable situation holds true for the
unary operators -
and +
: We ignore the +
operator, but the
-
will be converted to a
unary minus.
Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.
In the next few sections we'll develop the grammar in a bison++
specification file. Then, the regular expressions for the scanner are
specified according to the requirements of flex++
. Finally the program is
constructed.
The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.
19.7.2.1: The bison++ specification file
The
grammar specification file used with
bison++
is comparable to the
specification file used with
bison
. Differences are related to the class
nature of the resulting parser. Our calculator will distinguish real numbers
from integers, and will support the basic set of arithmetic operators.
The bison++
specification file consists of the following sections:
A header section, comparable to the declaration section used with
bison
. The difference being the
%header{...}
opening. In this section we'll encounter mainly
declarations: header files are included, and the
yyFlexLexer
object is
declared.
A token section, in which
bison++
has
several extra items that can be declared here. They are important and warrant
a section of their own.
A grammar rules section, comparable to the rules section of a standard
bison
program.
19.7.2.2: The bison++ token section
The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:
%name ParserName
. The name ParserName
will be
the name of the parser's class. This entry should be the first entry of the
token section. It is used in cases where
multiple grammars are used, to
make sure that the different parser-classes use unique identifiers. By
default the name
parse
is used.
%define name content
. The %define
has the same
function as the #define
statement for the C++
preprocessor. It can be
used to define, e.g., a
macro. Internally, the defined symbol will be the
concatenation of
YY_
, the parser's class name, and the name of the
macro. E.g.,
YY_ParserName_name
Several symbols will normally be defined here. Of all the definitions that can be given here, the following two are required:
%define LEX_BODY { inline-code }
:
here the body of the call to the
lexer is defined. It can be defined as
= 0
for an abstract parser-class, but otherwise it must contain the code
(including surrounding curly braces) representing the call to the lexer. For
example, if the lexer object generated by flex++
is called lexer
, this
declaration should be
%define LEX_BODY { return lexer.yylex(); }
%define ERROR_BODY { inline-code }
:
similarly, the body of the code of the call to the error-function is defined
here. It can be defined as = 0
, in which case the parser's class will
again become abstract. Otherwise, it is used to specify the inner workings of
the error function, including surrounding braces. E.g.,
%define ERROR_BODY { cerr << "syntax Error\n"; }
If the
LEX_BODY
and
ERROR_BODY
definitions are omitted, then the
compiler is not able to complete the
virtual table of the parser class, and
the linking phase will report an error like
undefined reference to `Parser virtual table'
The remaining symbols are optional, and can be (re)defined as needed:
%define DEBUG 1
: if non-0 debugging
code will be included in the parser's source.
%define ERROR_VERBOSE
: if
defined, the parser's stack will be dumped when an error occurs.
%define LVAL yylval
: the default
variable name is shown here: the variable name containing the parser's
semantic value is by
default
yylval
, but its name may be redefined
here.
%define INHERIT : public ClassA,
public ClassB
: the
inheritance list for the parser's class. Note that it
starts with a
colon. The space character between INHERIT
and the colon may be omitted. The %define
itself should be omitted if the
parser class is not derived from another class.
%define MEMBERS member-prototypes
: if
the parser
contains extra members, they must be declared
here. Note that there is only one %define MEMBERS
definition allowed. So,
if multiple members are to be declared, they must all be declared at this
point. To prevent
very long lines in
the specification file, the \ (
backslash) can be used at the end of a line,
to indicate that it continues on the next line of the source-text. E.g.,
%define MEMBERS \
    void lookup(); \
    void lookdown();

The
MEMBERS
section starts as a
public
section. If
private
members are required too, a private:
directive can be part of the
MEMBERS
section.
For the parser's constructor the following
%defines
can be used:
%define
CONSTRUCTOR_PARAM parameterlist
: this defines the parameterlist for the
parser's constructor. Here the types and names of the parameters of the
parser's constructor should be given. The surrounding
parentheses of the
parameter list are not part of the CONSTRUCTOR_PARAM
definition.
%define CONSTRUCTOR_INIT
: initializer(s)
: this defines the
base class and
member initializers of the constructor. Note the
initial
colon following CONSTRUCTOR_INIT
, which is required. The colon
may be given immediately after the CONSTRUCTOR_INIT
statement, or blanks
may be used to separate the symbol from the colon.
%define CONSTRUCTOR_CODE
{ code }
: this defines the code of the parser's constructor (including
surrounding curly braces).
parse parser;
%union
. This starts the definition of the
semantical value union. It replaces the
#define YYSTYPE
definition seen
with bison
. An example of a %union
declaration is
%union
{
    int     i;
    double  d;
};

The union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a
string
cannot be a member of the
union. A string *
, however, is a possible union member. As a side
line: the
lexical scanner does not have to know about this union. The
scanner can simply pass its scanned text to the parser through its
YYText()
member function. For example, a
statement like:
$$.i = atoi(scanner.YYText());
can be used to convert matched text to a value of an appropriate type.
Once these associations are available,
$$
or the
generic return values $1, $2
, etc, that are associated with components of
rules can be used, rather than $$.i, $3.d
, etc. To associate a
non-terminal or a token with a union field, the
<fieldname>
specification is
used. E.g.,
%token  <i>     INT         // token association (deprecated)
        <d>     DOUBLE
%type   <i>     intExpr     // non-terminal association

In this example, note that both the tokens and the non-terminals can be associated with a field of the union. However, as noted earlier, the lexical scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser knows what the input is all about, and may convert strings like
"123"
to an integer
value. Consequently, we are discouraging the association of a union field and
a
token. In the upcoming
description of the rules of the grammar this will be further illustrated.
Next to the
%union
specification, the
%token
and
%type
specifications should be noted. They are used for
the specification of the tokens (
terminal symbols) that can be returned by
the lexical scanner, and for the specification of the return types of
non-terminals. Apart from %token
the
token indicators
%left
,
%right
and
%nonassoc
may be used to specify the
associativity of operators. The tokens
mentioned at these indicators are interpreted as tokens indicating operators,
associating in the indicated direction. The
precedence of operators is given
by their order: the first specification has the lowest priority. To overrule a
certain precedence in a certain context,
%prec
can be
used. As all this is standard bison
practice, it isn't further discussed
in this context. The documentation provided with the bison
distribution
should be consulted for further reference.
19.7.2.3: The bison++ grammar rules
The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.
lines:
    lines line
|
    line
;

line:
    intExpr '\n'        { cerr << "int: " << $1 << endl; }
|
    doubleExpr '\n'     { cerr << "double: " << $1 << endl; }
|
    '\n'                { cout << "Good bye\n"; YYACCEPT; }
|
    error '\n'
;

intExpr:
    intExpr '*' intExpr             { $$ = $1 * $3; }
|   intExpr '+' intExpr             { $$ = $1 + $3; }
|   '(' intExpr ')'                 { $$ = $2; }
|   '-' intExpr %prec UnaryMinus    { $$ = -$2; }
|   INT                             { $$ = atoi(lexer.YYText()); }
;

doubleExpr:
    doubleExpr '*' doubleExpr       { $$ = $1 * $3; }
|   doubleExpr '+' doubleExpr       { $$ = $1 + $3; }
|   doubleExpr '*' intExpr          { $$ = $1 * $3; }
|   doubleExpr '+' intExpr          { $$ = $1 + $3; }
|   intExpr '*' doubleExpr          { $$ = $1 * $3; }
|   intExpr '+' doubleExpr          { $$ = $1 + $3; }
|   '(' doubleExpr ')'              { $$ = $2; }
|   '-' doubleExpr %prec UnaryMinus { $$ = -$2; }
|   DOUBLE                          { $$ = atof(lexer.YYText()); }
;

Using these rules a simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols:
doubleExpr
is linked to real (double)
values, intExpr
is linked to integer values. Precedence and type
association is defined in the token section of the parser specification file,
which is:
%name Parser

%union
{
    int     i;
    double  d;
};

%token          INT
                DOUBLE
%type   <i>     intExpr
        <d>     doubleExpr

%left   '+'
%left   '*'
%right  UnaryMinus

%define MEMBERS \
    virtual ~Parser() {} \
    private: \
        yyFlexLexer lexer;

%define LEX_BODY { return lexer.yylex(); }
%define ERROR_BODY { cerr << "error encountered\n"; }

In the token section we see the use of the
%type
specifiers,
connecting intExpr
to the i
-field of the semantic-value union, and
connecting doubleExpr
to the d
-field. At first sight it looks
complex, since the expression rules must be included for each individual
return type. On the other hand, if the union itself would have been used, we
would have had to specify somewhere in the returned semantic values what field
to use: less rules, but more complex and error-prone code.
Also, note that the lexical scanner is included as a member of the parser. There is no need to define the scanner outside of the parser, as it's not used outside of the parser object. The virtual destructor is included as a member to prevent the compiler from complaining about the parser having a non-virtual destructor.
19.7.2.4: The flex++ specification file
The flex-specification file that is used by our calculator is
simple: blanks are skipped, single characters are returned, and
numerical values are returned as either Parser::INT
or
Parser::DOUBLE
tokens. Here is the complete flex++
specification file:
%{ #include "parser.h" %} %option yylineno %% [ \t] ; [0-9]+ return(Parser::INT); "."[0-9]* | [0-9]+("."[0-9]*)? return(Parser::DOUBLE); exit | quit return (Parser::DONE); .|\n return (*yytext);
19.7.2.5: The generation of the code
The code
is generated in the
same way as with bison
and flex
. To order bison++
to generate the
files parser.cc
and parser.h
, give the command:
bison++ -d -o parser.cc parser
Next,
flex++
will generate code on
lexer.cc
using the command
flex++ -I -olexer.cc lexer
Note here that flex++
expects no blanks between the -o
flag
and lexer.cc
.
On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:
g++ -o calc -Wall *.cc -lfl -s
Note that the
libfl.a
library is mentioned here. If it's not
mentioned, the
linker will complain about
unresolved references like
yywrap()
.
A source in which the main()
function, the lexical
scanner and the parser objects are defined is, finally:
#include "parser.h" int main() { Parser parser; cout << "Enter (nested) expressions containing ints, doubles, *, + and " "unary -\n" "operators. Enter an empty line, exit or quit to exit.\n"; parser.yyparse(); return (0); }