We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken. Please state the document version you're referring to, as found in the title (in this document: 5.1.0d), and please state the paragraph you're referring to.
All mail received is seriously considered, and new (sub)releases of the Annotations will normally reflect your suggestions for improvements. Except for the incidental case I will not otherwise acknowledge the receipt of suggestions for improvements. Please don't misinterpret this as a lack of appreciation.
This chapter presents a number of concrete examples of programming in
C++. Items from this document such as virtual functions, static
members,
etc. are rediscussed. Examples of container classes are shown.
Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tools in a C++ environment.
The class streambuf can be used as the starting point for the construction of classes interfacing file descriptors. In this section we will discuss the construction of a class which may be used to write to a device defined by its file descriptor: it may be a file, but it could also be a pipe or a socket. Section 19.1.2 will discuss the matter of reading from devices that are known by their file descriptors, while section 19.3 will reconsider the question of redirection, which was discussed earlier in section 5.8.3.
Basically, deriving a class for output operations is very simple. The only member function that must be overridden is the virtual member int overflow(int c). This member is responsible for writing characters to the device when there is no buffer or no more room in the buffer. If fd represents the file descriptor to write to, and if we decide against using a buffer, the member overflow() can be as simple as:
    class UnbufferedFD: public std::streambuf
    {
        public:
            int overflow(int c)
            {
                if (c != EOF)
                {
                    char ch = static_cast<char>(c); // write() needs an lvalue
                    if (write(fd, &ch, 1) != 1)
                        return EOF;
                }
                return c;
            }
            ...
    };

The received argument is either written as a char to the file descriptor, or EOF is returned. However, this approach does not use an output buffer. As the use of a buffer is strongly advised (see also the next section), we will discuss the construction of a class using an output buffer in somewhat greater detail.
When an output buffer is to be used, the overflow() member gets a bit more complex, since it is now called only when the buffer is full. So, first we will have to flush the buffer, for which the (virtual) function streambuf::sync() should be used. Defining sync() and using it in overflow() is not all: the information to write might be less than the size of the buffer. So, at the end of the lifetime of our special streambuf object, it might find itself holding a buffer that is only partially full. We must therefore make sure that the buffer is flushed by the time our object goes out of scope. This is of course very simple: sync() should be called in the destructor as well.
Now that we've considered the consequences of using an output buffer, we're ready to construct our derived class. Here it is:
    #include <unistd.h>
    #include <streambuf>
    #include <ios>

    class ofdstreambuf: public std::streambuf
    {
        public:
            ofdstreambuf(int fd)
            :
                fd(fd)
            {
                setp(buffer, buffer + BUFSIZE);         // 12
            }
            ~ofdstreambuf()
            {
                sync();                                 // 17
            }
            int sync()
            {
                if (pptr() > pbase())
                {
                    write(fd, buffer, pptr() - pbase());// 25
                    setp(buffer, buffer + BUFSIZE);     // 26
                }
                return 0;
            }
            int overflow(int c)
            {
                sync();                                 // 34
                if (c != EOF)
                {
                    *pptr() = static_cast<char>(c);     // 37
                    pbump(1);
                }
                return c;
            }
        private:
            static unsigned const BUFSIZE = 20;
            int fd;
            char buffer[BUFSIZE];
    };
- At line 12, using setp() the begin and end points of the buffer are passed to the streambuf base class, allowing it to set up pbase(), pptr() and epptr().
- At line 17 (the destructor), sync() is called to write any unprocessed characters in the output buffer to the device.
- At line 25, in sync(), the unflushed characters in the buffer are written to the device. Then, at the next line, the buffer is reinitialized. Note that sync() should return 0 after a successful flush operation.
- At line 26 the buffer pointers of streambuf are reset to their initial values by calling setp() once again.
- At line 34, in overflow(), sync() is called to flush the now filled-up output buffer to the device. As this recreates an empty buffer, the character c which could not be written to the buffer is now entered into the buffer using the member functions pptr() and pbump(). Notice that entering a character into the buffer is realized using the available member functions of streambuf.
Depending on the number of arguments, the following program uses the ofdstreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output. Here is the program:
    #include "fdout.h"
    #include <string>
    #include <iostream>
    #include <istream>

    using namespace std;

    int main(int argc)
    {
        ofdstreambuf fds(STDOUT_FILENO);
        ostream os(&fds);

        switch (argc)
        {
            case 1:
                os << "COPYING cin LINE BY LINE\n";
                for (string s; getline(cin, s); )
                    os << s << endl;
                break;
            case 2:
                os << "COPYING cin BY EXTRACTING TO os.rdbuf()\n";
                cin >> os.rdbuf();  // Alternatively, use: cin >> &fds;
                break;
            case 3:
                os << "COPYING cin BY INSERTING cin.rdbuf() into os\n";
                os << cin.rdbuf();
                break;
        }
    }
When deriving a class from streambuf, the class should use an input buffer of at least one character, to allow the use of the member functions istream::putback() or istream::unget(). Stream classes (like istream) normally allow the ungetting of at least one character using their member functions putback() or unget(). This is important, as these stream classes usually interface to streambuf objects. Although strictly speaking it is not necessary to implement a buffer in classes derived from streambuf, we strongly suggest using buffers in these cases: the implementation is very simple and straightforward, and the applicability of such classes is greatly improved. Therefore, in all our classes derived from the class streambuf at least a buffer of one character will be defined.
19.1.2.1: Using a buffer of one character
When deriving a class (e.g., ifdstreambuf) from streambuf using a buffer of at least one character, at the very least the member streambuf::underflow() should be overridden, as this is the member to which all requests for input are eventually directed. Since a buffer is also needed, the member streambuf::setg() is used to inform the streambuf base class of the dimensions of the input buffer, so that it is able to set up its input buffer pointers in such a way that eback(), gptr(), and egptr() return correct values. Here is the corresponding class definition:
    #include <unistd.h>
    #include <streambuf>
    #include <ios>

    class ifdstreambuf: public std::streambuf
    {
        public:
            ifdstreambuf(int fd)
            :
                fd(fd)
            {
                setg(buffer, buffer + 1, buffer + 1);   // 12
            }
            int underflow()
            {
                if (gptr() < egptr())                   // 18
                    return *gptr();                     // 19

                if (read(fd, buffer, 1) <= 0)           // 21
                    return EOF;

                setg(buffer, buffer, buffer + 1);       // 24
                return *gptr();
            }
        protected:                                      // 27
            int fd;
            char buffer[1];
    };

Please note the following:
- A small array of one character is defined as a data member, and in the constructor (line 12) setg() sets gptr() equal to egptr(). Since this implies that the buffer is empty, underflow() will be called to refill the buffer.
- At lines 18-19, underflow() first checks whether the buffer is really empty. If not, the character to be retrieved next is returned.
- If the buffer is really empty, it is refilled by the read() call at line 21. If this fails, EOF is returned. More sophisticated implementations could react more intelligently here, of course.
- At line 24, setg() is called once again to set up streambuf's buffer pointers correctly.
- The data members are defined in a protected section (line 27), so that derived classes (e.g., see section 19.1.2.3) can access them.
The above ifdstreambuf
class is used in the following program:
    #include "ifdbuf.h"
    #include <iostream>
    #include <istream>

    using namespace std;

    int main(int argc)
    {
        ifdstreambuf fds(0);
        istream is(&fds);

        cout << is.rdbuf();
    }
19.1.2.2: Using an n-character buffer
How complex would things get if we decide to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, and it is basically the same class as ifdstreambuf from the previous section. To make things a bit more interesting, in the current class ifdnstreambuf the member streambuf::xsgetn() was also overridden, to optimize the reading of series of characters at once. Here is the class ifdnstreambuf:
    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    #include <cstring>

    class ifdnstreambuf: public std::streambuf
    {
        public:
            ifdnstreambuf(int fd, unsigned bufsize = 1)
            :
                fd(fd),
                bufsize(bufsize),
                buffer(new char[bufsize])
            {
                setg(buffer, buffer + bufsize, buffer + bufsize);
            }
            ~ifdnstreambuf()
            {
                delete[] buffer;                        // 19
            }
            int underflow()
            {
                if (gptr() < egptr())
                    return *gptr();

                int nread = read(fd, buffer, bufsize);
                if (nread <= 0)
                    return EOF;

                setg(buffer, buffer, buffer + nread);   // 34
                return *gptr();
            }
            std::streamsize xsgetn(char *dest, std::streamsize n)   // 39
            {
                int nread = 0;

                while (n)
                {
                    if (!in_avail())
                    {
                        if (underflow() == EOF)         // 47
                            break;
                    }

                    int avail = in_avail();
                    if (avail > n)
                        avail = n;

                    memcpy(dest + nread, gptr(), avail);// 56
                    gbump(avail);                       // 57

                    nread += avail;
                    n -= avail;
                }
                return nread;
            }                                           // 64
        private:
            int fd;
            unsigned bufsize;
            char *buffer;
    };

With this class, please note:
- The buffer is allocated dynamically, so it is deleted (line 19) when the ifdnstreambuf object goes out of scope.
- The buffer is refilled by underflow(). The input buffer pointers of streambuf are reset (line 34) in accordance with the actual number of characters read from the device.
- To optimize the reading of series of characters, xsgetn() is overridden (line 39). In a loop, n is reduced until 0, at which point the function terminates. Alternatively (line 47), the member returns if underflow() fails to obtain more characters. In lines 56 and 57 the reading optimization takes place: instead of calling streambuf::sbumpc() n times, a block of avail characters is copied to the destination, using streambuf::gbump() to consume avail characters from the buffer in one function call.
The member function xsgetn() is called by streambuf::sgetn(), which is a streambuf member. The following example illustrates the use of this member function with an ifdnstreambuf object:
    #include "ifdnbuf.h"
    #include <iostream>
    #include <istream>

    using namespace std;

    int main(int argc)
    {
        ifdnstreambuf fds(0, 30);   // internally: 30 char buffer

        char buf[80];               // main() reads blocks of 80 chars

        while (true)
        {
            unsigned n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }
19.1.2.3: Seeking positions in `streambuf' objects
When devices support seek operations, classes derived from
streambuf
should override
streambuf::seekoff()
and
streambuf::seekpos()
. In the following class ifdseek, which is derived from our previously constructed class ifdstreambuf, we assume that the device for which we have a file descriptor available supports these seeking operations. The class
ifdseek
is derived from ifdstreambuf
, so it uses a character buffer of
just one character. The facilities to perform seek operations, which are added
to our new class ifdseek
will make sure that the input buffer is reset
when a seek operation is requested. The class could also be derived from the
class ifdnstreambuf
, in which case the arguments to reset the input
buffer must be adapted such that its second and third parameter point beyond
the available input buffer. Here is the class definition:
    #include "ifdbuf.h"
    #include <unistd.h>

    class ifdseek: public ifdstreambuf
    {
        typedef std::streambuf::pos_type pos_type;      // 6
        typedef std::streambuf::off_type off_type;
        typedef std::ios::seekdir seekdir;
        typedef std::ios::openmode openmode;            // 9

        public:
            ifdseek(int fd)                             // 12
            :
                ifdstreambuf(fd)
            {}

            pos_type                                    // 17
            seekoff(off_type offset, seekdir dir, openmode)
            {
                pos_type pos =
                    lseek
                    (
                        fd, offset,
                        (dir == std::ios::beg) ? SEEK_SET :
                        (dir == std::ios::cur) ? SEEK_CUR :
                                                 SEEK_END
                    );
                if (pos < 0)
                    return -1;

                setg(buffer, buffer + 1, buffer + 1);   // 32
                return pos;
            }

            pos_type                                    // 36
            seekpos(pos_type offset, openmode mode)
            {
                return seekoff(offset, std::ios::beg, mode);
            }
    };
- At lines 6 through 9, several types defined in the streambuf and ios classes are redefined, to prevent us from specifying std::streambuf and std::ios again and again when referring to these types.
- In seekoff() the lseek() system call is used to seek a new position in a file whose file descriptor is known.
- At line 32 the streambuf base class's input buffer pointers are reset to an empty buffer, so that underflow() will refill the buffer at the next request for input.
- At line 36, seekpos is overridden as well: it is actually defined by a call to seekoff().
An example of a program using the class ifdseek
is the following. If
this program is given its own source file using input redirection, then
seeking is supported, and apart from the first line, every other line is shown
twice:
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>

    using namespace std;

    int main(int argc)
    {
        ifdseek fds(0);
        istream is(&fds);
        string s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();
            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();
            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }
19.1.2.4: Multiple `unget()' calls in `streambuf' objects
As stated earlier, streambuf classes and classes derived from streambuf should at least support ungetting the character that was last read. Special care must be taken when series of unget() calls are to be supported. In this section we will show how to construct a class which is derived from streambuf and which allows a configurable number of istream::unget() or istream::putback() calls. Support for multiple (say `n') unget() calls is realized by setting aside an initial section of the input buffer, which is gradually filled up with the last n characters read. Here is an implementation of a class supporting series of unget() calls:
    #include <unistd.h>
    #include <streambuf>
    #include <ios>
    #include <algorithm>
    #include <cstring>

    class fdunget: public std::streambuf
    {
        public:
            fdunget(int fd, unsigned bufsz, unsigned unget)     // 9
            :
                fd(fd),
                reserved(unget)                                 // 12
            {
                unsigned allocate =                             // 14
                        bufsz > reserved ?
                            bufsz
                        :
                            reserved + 1;

                buffer = new char[allocate];                    // 20

                base = buffer + reserved;                       // 22
                setg(base, base, base);                         // 23

                bufsize = allocate - reserved;                  // 25
            }
            ~fdunget()
            {
                delete[] buffer;
            }
            int underflow()
            {
                if (gptr() < egptr())
                    return *gptr();

                unsigned consumed =                             // 39
                    static_cast<unsigned>(gptr() - eback());
                unsigned move = std::min(consumed, reserved);   // 41

                memcpy(base - move, egptr() - move, move);      // 43

                int nread = read(fd, base, bufsize);            // 45
                if (nread <= 0)     // none read ->
                    return EOF;

                setg(base - move, base, base + nread);          // 50
                return *gptr();
            }
        private:
            int fd;
            unsigned bufsize;
            unsigned reserved;
            char *buffer;
            char *base;
    };
- The constructor's parameter unget (line 9) determines the size of a reserved area, which consists of the first reserved bytes of the used buffer (line 12).
- At line 14 the number of bytes to allocate is computed: it is at least one byte larger than reserved. So, a certain number of bytes will be read, and once reserved bytes have been read, at least reserved bytes can be ungot.
- At line 22 the location where refills of the buffer start is set to base, which is located reserved bytes into the buffer area. This is always the point where refills of the buffer start.
- At line 23 streambuf's buffer pointers are set up. As no characters have been read as yet, all pointers are set to base. At the next read request, underflow() will now know that no characters are available, so unget() will fail immediately.
- Refilling the buffer is the task of underflow() (line 39). Up to this point everything is standard, and now we determine how many characters were consumed. Remember that at line 23 we ascertained that the pointers indicated 0 bytes in the buffer, so consumed is either 0 or the actual number of bytes read into the buffer after a successful refill.
- The number of characters to move into the reserved area is normally equal to reserved, but it is less if fewer characters were available. So, move (line 41) is set to the minimum of the number of consumed characters and reserved.
- At line 43 these characters are copied into the area just before base.
- Note (line 45) that reading starts from base, not from buffer.
- At line 50 streambuf's read buffer pointers are set up: eback() is set to move locations before base, thus defining the guaranteed unget-area; gptr() is set to base, since that's the location of the first character read after a refill; and egptr() is set just beyond the location of the last character read.
Here is an example of a program illustrating the use of the class fdunget. The program reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined, in a buffer of 3 characters. Just before reading a character, an attempt is made to unget at most 6 characters. This is of course never possible, but the program will neatly unget as many characters as possible, considering the actual number of characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>

    using namespace std;

    int main(int argc)
    {
        fdunget fds(0, 3, 2);
        istream is(&fds);
        char c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                        << "\tunget failed at attempt " << (ug + 1) << "\n"
                        << "\trereading: '";
                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }
            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }
            cout << "Next character: " << c << endl;
        }
    }
    /*
        Generated output after 'echo abcde | program':
        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character:
        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
    */
The standard stream classes do not support facilities like form() (and scan()) for stream objects. In this section the construction of a class oformstream is described, which is derived from ostream and does support form() and vform(). Analogously to oformstream, a class iscanstream can be developed, supporting scan() and vscan(). The construction of this latter class is left as an exercise to the reader.
Here is the class interface and definition. It is annotated below:
    #include <iostream>

    class oformstream: public std::ostream
    {
        public:
            oformstream(std::ostream &ostr)
            :
                std::ostream(ostr.rdbuf())
            {}

            std::ostream &form(char const *fmt, ...);
    };
- The class oformstream is derived from ostream, so it inherits ostream's capabilities.
- Its ostream base class is initialized with a streambuf, which could be obtained (as shown below) from another ostream object.
- The member function form() is declared. As is customary for this kind of function, it doesn't do the formatting itself, but delegates this to a `v-function' companion, in this case vform().
- In form()'s implementation, vform()'s functionality is realized using vsnprintf(), and an auto_ptr is used to manage the dynamically allocated buffer.
- vsnprintf() returns the number of bytes needed for the formatted string, not counting the final ASCII-Z character.
- Finally, the formatted text is inserted into the ostream base class, and the oformstream object is returned.
A program using oformstream is given in the next example. It prints a well-known string and some marker text:
    #include "oformstream.h"
    #include <memory>
    #include <cstdarg>
    #include <cstdio>

    std::ostream &oformstream::form(char const *fmt, ...)
    {
        va_list list;
        va_start(list, fmt);

        char c;
        unsigned n = vsnprintf(&c, 0, fmt, list);   // determine the length
        va_end(list);

        std::auto_ptr<char> buf(new char[n + 1]);

        va_start(list, fmt);                // process the arguments again
        vsnprintf(buf.get(), n + 1, fmt, list);
        va_end(list);

        return *this << buf.get();
    }

    using namespace std;

    int main()
    {
        oformstream of(cout);

        of << "Hello world\n";
        cout << "Ok, ";
        of << "That's all, folks\n";

        of.form("%s\n", "Hello world of C++") << endl;
    }
Within a C++ program, streams can be redirected using the ios::rdbuf() member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus realizing redirection at the level of the programming language itself.
It should be realized that this is fine within the context of the C++ program, but if that context is left, the redirection is terminated, as the operating system does not know about streambuf objects. This happens, e.g., when a program uses a system() call to start a subprogram. The following program uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")

to echo a well-known line of text. Since echo writes its information to the standard output, this would be the program's redirected file if C++'s redirection were recognized by the operating system. However, this is not the case, so hello world still appears at the program's standard output and not on the redirected file. A solution to this problem involves redirection at the operating system level, for which system calls like dup() and dup2() are available in some operating systems (e.g., Unix and friends). Examples of the use of these system calls are given in section 19.4.3 in this chapter.
Here is the example of the failing redirection after a C++ redirection:
    #include <iostream>
    #include <fstream>
    #include <cstdlib>

    using namespace std;

    int main()
    {
        ofstream of("outfile");

        cout.rdbuf(of.rdbuf());     // redirect cout to `outfile'
        cout << "To the of stream" << endl;
        system("echo hello world");
        cout << "To the of stream" << endl;
    }
    /*
        Generated output: on the file `outfile':
            To the of stream
            To the of stream
        On standard output:
            hello world
    */
From the Unix operating system, the fork() system call is well known. When a program needs to start a new process, system() can be used, but this requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to call fork().
In this section we will see how C++ can be used to wrap a class around a complex system call like fork(). Much of what follows in this section directly applies to the Unix operating system, and the discussion will therefore focus on that operating system. However, other systems usually provide comparable facilities.
When fork() is called, the current program is duplicated in memory, thus creating a new process, and both processes continue their execution just below the fork() system call. The two processes may, however, inspect the return value of fork(), which differs between the original process (called the parent process) and the newly created process (called the child process):

- In the parent process, fork() returns the process id of the child process that was created by the fork() system call. This is a positive integer value.
- In the child process, fork() returns 0.
- If fork() fails, -1 is returned.
A Fork class should hide the bookkeeping details of a system call like fork() from its users. We will first construct a basic Fork class doing just that. The class itself is derived from ForkExceptions, a class merely defining empty enum types (see section 16.4.1), representing several error conditions which may be encountered while using the various Fork classes. The ForkExceptions class is defined as follows:
    class ForkExceptions
    {
        public:
            enum bad_fork {};
            enum bad_pipe {};
            enum bad_redirection {};
            enum bad_istream {};
            enum bad_ostream {};
    };
Then Fork will be used as a base class for other classes, derived from Fork, performing special functions. The design consideration here is that the construction of a derived class is appropriate if the nature of the task to be performed by the derived class is different from the tasks performed by existing classes. This results in relatively slim classes, whose names should provide clear indications of what may be expected of them.
The Fork class simply allows us to perform the fork() system call. Its main functionality is that fork() throws an exception on failure, and the definition of an enumeration indicating the process we're in after the Fork::fork() member has been called: Fork::CHILD or Fork::PARENT. This process type is the return value of Fork::fork().
If fork() fails, a Fork::bad_fork exception is thrown, bad_fork being an empty enum defined in the class ForkExceptions. When a child process is created, the parent process must be prepared to handle the signal generated by the child process should the child process terminate before the parent process does. If this situation occurs and the parent process does not catch the signal generated by the child process, then the child process remains visible as a zombie process. To prevent zombie processes, the Fork class defines a default signal handler, which merely catches the signal generated by the child process and stores the exit value of the process in the static variable childExitValue.
The default signal handler is installed by the static function Fork::defaultSignalHandler(). If the default signal handler is considered inappropriate, an alternate signal handler can be designed and installed. It is the responsibility of the programmer to make sure a SIGCHLD signal handler has been activated. The default signal handler is:
    #include "fork.H"
    #include <sys/wait.h>
    #include <csignal>

    void Fork::sig_child(int signalnumber)
    {
        wait(&childExitValue);          // obtain the child's exit value
        signal(SIGCHLD, &sig_child);    // reinstall the handler
    }
Here is the basic class Fork
, together with the implementation of the
Fork::fork()
member function. Note the
protected
nature of the
p_id
data member, which is thus made available for all classes derived
from Fork
:
    #ifndef __FORK_H__
    #define __FORK_H__

    #include <sys/types.h>
    #include "forkexceptions.h"

    class Fork: public ForkExceptions
    {
        public:
            enum ProcessType
            {
                CHILD,
                PARENT
            };

            ProcessType fork();
            ProcessType processType()
            {
                return p_id ? PARENT : CHILD;
            }
            pid_t pid()
            {
                return p_id;
            }

            static void defaultSignalHandler();
            static int childExitValue;

        protected:
            pid_t p_id;

        private:
            static void sig_child(int signalnumber);
    };

    #endif
    #include "fork.H"

    Fork::ProcessType Fork::fork()
    {
        p_id = ::fork();

        if (p_id < 0)
            throw bad_fork();

        return processType();
    }
One use of fork() is to start a child process, where the parent process terminates immediately after spawning the child process. This can be considered a natural event: in real life, parents tend to die before their children die. If this happens in a program, the child process continues to run, now as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process. Thus our next class is Daemon. Daemon redefines fork() to end the parent process, allowing the child process to continue as a child of init. Since we know that we're in the child process after Daemon::fork() returns, it can be a void function. Here is the class interface and the definition of Daemon::fork():
    #include "fork.h"

    class Daemon: public Fork
    {
        public:
            void fork();
    };
    #include "daemon.h"
    #include <cstdlib>

    void Daemon::fork()
    {
        if (Fork::fork() == PARENT)
            exit(0);            // the parent dies
    }

Here is an example illustrating the use of the Daemon class:
Daemon
class:
    #include <iostream>
    #include <unistd.h>
    #include "daemon.h"

    using namespace std;

    int main()
    {
        Daemon daemon;

        daemon.fork();
        sleep(5);               // wait 5 seconds
        cout << "Hello from the child process\n";
        return 0;
    }
    /*
        Generated output: the next command prompt, then after 5 seconds:
        Hello from the child process
    */
Our next class offers functionality comparable to the procbuf class. The class IFork is constructed, which allows us to read information from the IFork object using, e.g., the extraction operator. The extracted information is made available by a function or external process that writes to its standard output.
The IFork::fork() call now comes in two flavors:

- When fork() is called without arguments, the output generated by the virtual function IFork::child() can be extracted from the IFork object. To use this variant of IFork::fork(), a class should be derived from IFork, overriding the IFork::child() member.
- When fork() is called with a char *const * argument, the argument is interpreted as an array of character strings, defining the name and optionally the arguments of an external process whose standard output can be extracted from the IFork object. If this variant is used, it is the responsibility of the programmer to make sure that the last element of the array that is passed as argument is 0 (zero).

The IFork class interface has several special characteristics, which will be discussed following the interface, which is shown here:
    #include <fstream>
    #include <unistd.h>
    #include "fork.h"

    class IFork: public Fork
    {
        public:
            typedef int (IFork::*memberPtr)() const;

            IFork();
            virtual ~IFork();

            operator std::istream &() const
            {
                return *istreamPtr;
            }

            void fork()
            {
                fork(&IFork::child);
            }
            void fork(char *const *argv);

            virtual int child() const;

        protected:
            void fork(memberPtr ptr);
            virtual void parentSetup();
            virtual void childSetup();

        private:
            int externalCommand() const
            {
                return execvp(argv[0], argv);
            }

            IFork(IFork const &other);              // Not Implemented
            IFork &operator=(IFork const &other);   // Not Implemented

            char *const *argv;
            std::ifstream *istreamPtr;
            int io[2];
    };

The interface has several characteristics deserving extra attention:
- Pointers to member functions are used to select the function the child process should execute; these are pointers to members of IFork. To reduce typing and clutter, the typedef memberPtr is defined to indicate a pointer to a const member function of IFork, returning an int.
- The class might have been derived from both Fork and an ifstream object. However, by the time the IFork object is constructed, the stream from which information must be read is not yet known: it will be a stream that is constructed around a file descriptor. Unfortunately it is not possible to use ifstream::open() with a file descriptor. So, the ifstream object will have to be constructed later, and IFork can only be derived from the Fork base class.
- To read from the IFork object, we use a conversion operator, so that we're able to interpret the object as an istream. The conversion operator doesn't change the IFork object, so it can be a const member function.
- The void fork() member function and the void fork(char *const *argv) member are almost identical functions. So it was decided to put their common code in another (protected) fork() member, expecting a memberPtr as its argument: void fork(memberPtr ptr). At this point in our discussion there is no real need for protected members, but later on, in section 19.4.5, we will use the protected members of IFork.
IFork::fork() calls fork(memberPtr) using the address of int IFork::child() as its argument; IFork::fork(char *const *argv) stores the value of its argument in IFork::argv, and then calls the protected IFork::fork(memberPtr) member:

    #include "ifork.H"

    void IFork::fork(char *const *arg)
    {
        argv = arg;
        fork(&IFork::externalCommand);
    }

Now we have accomplished what we want: as the member functions externalCommand() and child() have identical prototypes, we can use their addresses as arguments for IFork::fork(memberPtr) (discussed below).
Returning to the interface of IFork, we see that the class uses an int io[2] array. This array is used to set up the communication between the parent and the child process. This communication is realized using a pipe. A pipe is a one-way channel through which information may flow: one process writes information to the pipe, another process may read information from the pipe.
The pipe() system call can be used to create a pipe, and it will fill io[] with two file descriptors: io[0] represents the end of the pipe from which information may be read, io[1] represents the end to which information may be written. Constructing a pipe may fail: in those cases a bad_pipe exception will be thrown.
The protected IFork::fork() member forks, and calls childSetup() to set up the child part of the communication and parentSetup() to set up the parent part of the communication. The function is constructed such that it can be called only once during the lifetime of an IFork object. After the setup, the function to which member points is called. This is either child() or externalCommand(), the latter executing the command defined in the elements of the argv data member. Neither child() nor externalCommand() should return, but if they do, their return values become the exit values of the child process. If an attempt is made to call fork() multiple times, bad_fork is thrown. Here is its code:
    #include "ifork.H"

    void IFork::fork(memberPtr member)
    {
        if (istreamPtr)                 // already initialized?
            throw bad_fork();

        if (Fork::fork() == CHILD)
        {
            childSetup();
            exit((this->*member)());
        }

        parentSetup();
    }
Following the fork, io has been duplicated: one copy is available in
the parent process, the other copy in the child process. As the pipe is used
for the information flow from child to parent, the read end of the pipe
(io[0]
) in the child is not used and the write end of the pipe (io[1]
)
in the parent is not used. So they can be closed. The handling of the
io[]
array is done separately for the parent and the child process in,
respectively, parentSetup()
and childSetup()
.
In parentSetup() the unused
writing end of the pipe is closed, and an ifstream
object is allocated
using the read section of the io
pipe:
    #include "ifork.H"

    void IFork::parentSetup()
    {
        close(io[1]);       // no writing done by the parent process

        istreamPtr = new std::ifstream(io[0]);
        if (!*istreamPtr)   // construction failed ?
            throw bad_istream();
    }

This function, again, was given protected access rights for the benefit of the
IOFork
class, discussed in section 19.4.5. IOFork
will not
only use this member, but it will also
override it, so it is also defined
as a
virtual member function
.
The member childSetup() will make sure that
information written to standard output will actually be written to the write
section of the pipe. This is realized using
redirection: information
written to cout
is eventually written to the
STDOUT_FILENO
file
descriptor. Using the
dup2()
system call the write section of the pipe is
`copied' to STDOUT_FILENO
. From now on, all information written by the
child process to cout
will eventually appear at the write section of the
pipe. Like parentSetup()
and for the same reasons, childSetup()
is
defined as a protected virtual member function.
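The redirection performed by childSetup() can be sketched in isolation. The following hypothetical function (not IFork code; its name is made up) forks, uses dup2() in the child to copy the pipe's write end onto STDOUT_FILENO, and lets the parent read what the child writes to its standard output:

```cpp
#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>
#include <string>

// The child's standard output is redirected into the pipe; the
// parent reads the child's words from the pipe's read end.
std::string childSays(char const *text)
{
    int io[2];
    if (pipe(io))
        return "bad_pipe";

    if (fork() == 0)                    // child process
    {
        close(io[0]);                   // read end unused in the child
        dup2(io[1], STDOUT_FILENO);     // `copy' write end onto stdout
        close(io[1]);
        printf("%s", text);             // now ends up in the pipe
        fflush(stdout);
        _exit(0);
    }

    close(io[1]);                       // write end unused in the parent
    char buf[100];
    ssize_t n = read(io[0], buf, sizeof(buf));
    close(io[0]);
    wait(0);                            // reap the child
    return std::string(buf, n > 0 ? n : 0);
}
```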
The child() member function does not do anything
impressive. It can be overridden in a class derived from IFork
, as
shown below. The default implementation just writes its name to cout
:
    #include "ifork.H"

    int IFork::child() const
    {
        std::cout << "IFork::child()\n";
        return 0;
    }
The copy constructor and the overloaded assignment operator are not intended to be used with IFork objects. As they won't be used by the IFork member functions either, they will not be implemented.
The IFork() constructor creates the pipe, and initializes
istreamPtr
to zero:
    #include "ifork.H"

    IFork::IFork()
    :
        istreamPtr(0)
    {
        if (pipe(io))
            throw bad_pipe();
    }
The destructor closes the ifstream and deletes the allocated memory:

    #include "ifork.H"

    IFork::~IFork()
    {
        if (istreamPtr)
        {
            istreamPtr->close();
            delete istreamPtr;
        }
    }

An example illustrating the use of
IFork()
to call an external command
is:
    #include <string>
    #include "ifork.h"

    int main()
    {
        IFork ifork;
        char *const argv[] = {"ls", "-Fla", 0};

        ifork.fork(argv);

        string s;
        while (getline(ifork, s))       // shows a list of files
            std::cout << s << endl;
    }

To use
IFork::child()
it should be overridden in a class that is
derived from IFork
. Here is an example:
    #include <string>
    #include "ifork.h"

    class IF2: public IFork
    {
        public:
            int child() const
            {
                std::cout << "Hello from:\n"
                             "IF2::child()\n";
                return 0;
            }
    };

    int main()
    {
        IF2 ifork;

        ifork.fork();

        string s;
        while (getline(ifork, s))
            std::cout << s << endl;
    }
The OFork class does the opposite of the IFork class. Their implementations are highly comparable. The implementation of OFork is left as an exercise to the reader. The suggested class interface is:
    #include <fstream>
    #include <unistd.h>
    #include "fork.h"

    class OFork: public Fork
    {
        public:
            OFork();
            virtual ~OFork();

            operator ostream &() const
            {
                return *ostreamPtr;
            }
            void fork()
            {
                fork(&OFork::child);
            }
            void fork(char *const *argv);
            virtual int child() const
            {
                return 0;
            }
        protected:
            void parentSetup();
            void childSetup();
        private:
            int externalCommand() const
            {
                return execvp(argv[0], argv);
            }
            void fork(int (OFork::*member)() const);

            OFork(OFork const &other);              // NI
            OFork &operator=(OFork const &other);   // NI

            ofstream *ostreamPtr;
            char *const *argv;
            int io[2];
    };
With the IFork and OFork classes, we primarily
realized a reconstruction of the facilities already provided by the
procbuf
class. One of the limiting characteristics of a procbuf is that
is that
its communication is always one-way: either information is read from the
external process, or it is written to an external process. The class
IOFork
, developed below, tries to overcome that restriction by implementing
a
two-way communication between parent and child processes.
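The idea can be sketched with plain system calls: two pipes provide the two one-way channels. The function below is a hypothetical simplification (not IOFork code; the name roundTrip is made up) in which the parent sends a message to a child process and reads back its reply:

```cpp
#include <unistd.h>
#include <sys/wait.h>
#include <string>

// Two pipes: one carries parent-to-child traffic, the other
// child-to-parent traffic, together forming a two-way channel.
std::string roundTrip(std::string const &msg)
{
    int toChild[2], toParent[2];
    if (pipe(toChild) || pipe(toParent))
        return "bad_pipe";

    if (fork() == 0)                    // child: read, transform, reply
    {
        char buf[100];
        ssize_t n = read(toChild[0], buf, sizeof(buf));
        std::string reply = "child saw: " + std::string(buf, n > 0 ? n : 0);
        write(toParent[1], reply.c_str(), reply.length());
        _exit(0);
    }

    write(toChild[1], msg.c_str(), msg.length());
    close(toChild[1]);                  // end-of-information for the child

    char buf[100];
    ssize_t n = read(toParent[0], buf, sizeof(buf));
    wait(0);                            // reap the child
    return std::string(buf, n > 0 ? n : 0);
}
```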
Since IOFork
combines the characteristics of IFork
and OFork
,
it can be constructed as a class using
multiple derivation: it is derived
from both IFork
and OFork
. This implies that on the one hand, some
members are available through inheritance, and on the other hand, that
ambiguity is introduced, because IOFork
harbors two Fork
objects
and because several members of IFork
and OFork
have identical
names. If this is considered impractical, then IOFork
could also be
constructed as a separate class. However, it is probably instructive to see
how the introduced ambiguity is handled here. To make matters concrete, we
show the final interface, annotating it below. Here is the interface:
    #include "ifork.h"
    #include "ofork.h"

    class IOFork: public IFork, public OFork
    {
        public:
            void fork()
            {
                IFork::fork();          // uses &IFork::child
            }
            void fork(char *const *argv)
            {
                IFork::fork(argv);
            }
        private:
            void parentSetup();
            void childSetup();
    };
The copy constructors and overloaded assignment operators were left private in IFork and OFork. As these base class members are unavailable (as they are private), it is again not possible for applications to copy or assign IOFork objects using existing IOFork objects.
The conversion operators returning an istream & and an ostream & inherit nicely from their respective base classes, and require no further effort.
The IFork base class has several protected and virtual members. IFork was specifically designed with the use that IOFork
can make of these features in mind. The IFork::fork(memberPtr)
member
function calls IFork::parentSetup()
and IFork::childSetup()
. As these
latter two functions are protected virtual members, they can be easily
overridden here. No ambiguity with OFork
's members occurs, since a
derived class is simply allowed to redefine members of its base
class. As the members are virtual in IFork
, the compiler also constructs
the redefined IOFork::parentSetup()
and IOFork::childSetup()
as
virtual members. Now, by calling IFork::fork(memberPtr)
from an IOFork
object, the IOFork::parentSetup()
and IOFork::childSetup()
members
will be used, rather than the IFork::parentSetup()
and
IFork::childSetup()
. This is what we want, as the IOFork
object must
do redirection of
standard input and redirection of
standard output
for its child process.
In fork(), we now call IFork::fork()
: this member function
will call IFork::child()
. Since IFork::child()
is also a virtual
function, we can derive a class from IOFork
in which a child()
member
function is defined. This latter function overrides IFork::child()
and
will consequently be called by IFork::fork()
. Analogously,
IOFork::fork(char *const *argv)
calls IFork::fork(char *const *argv)
.
Since the IFork section of IOFork is doing the forking, the Fork base class of IFork contains the information about the process id. To obtain the child's process id with an IOFork object obj, call obj.IFork::pid().
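Resolving such ambiguity by qualifying the member with the name of one of the base classes is standard practice with multiple inheritance. A minimal, self-contained illustration (the classes A, B and AB are made up for this sketch):

```cpp
#include <string>

// Both bases define name(); the derived class inherits both, so a
// plain obj.name() would be ambiguous.
struct A
{
    std::string name() const { return "A"; }
};
struct B
{
    std::string name() const { return "B"; }
};
struct AB: public A, public B
{};

std::string pick()
{
    AB obj;
    // Qualification selects the intended base class member, just
    // like obj.IFork::pid() selects the Fork data inside IFork.
    return obj.A::name() + obj.B::name();
}
```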
Here is an example using a class that is derived from IOFork
:
    #include <string>
    #include "iofork.h"

    class IOF2: public IOFork
    {
        public:
            int child() const
            {
                string s;

                while (getline(cin, s))
                {
                    cerr << "Child process: got `" << s << "'\n";
                    s = "Received: " + s;
                    std::cout << s << endl;
                    cerr << "Child process: sent `" << s << "'\n";
                }
                return 0;
            }
    };

    int main()
    {
        IOF2 obj;

        obj.fork();

        string s;
        std::cout << "Child process id: " << obj.IFork::pid() << endl;

        while (getline(cin, s))
        {
            obj << s << endl;
            std::cout << "Parent: sent `" << s << "'\n";
            getline(obj, s);
            std::cout << "Parent: got `" << s << "'\n";
        }
    }
    /*
        Example of generated output:

        Child process id: 32279
        Hello world
        Parent: sent `Hello world'
        Child process: got `Hello world'
        Child process: sent `Received: Hello world'
        Parent: got `Received: Hello world'
        That's all, folks!
        Parent: sent `That's all, folks!'
        Child process: got `That's all, folks!'
        Child process: sent `Received: That's all, folks!'
        Parent: got `Received: That's all, folks!'
    */
For whatever reason, some operators are missing: there are no predefined
function objects corresponding to
bitwise operations. However, their
construction is, given the available predefined function objects, not
difficult. The following examples show a
template class implementing a
function object calling the
bitwise and (
operator&()
), and a template
class implementing a function object calling the
unary not
(
operator~()
). Other missing operators can be constructed similarly. This
is, however, left as an
exercise to the reader.
Here is the implementation of a function object calling the
bitwise
operator&()
:
    #include <functional>

    template <typename _Tp>
    struct bit_and: public binary_function<_Tp, _Tp, _Tp>
    {
        _Tp operator()(const _Tp &__x, const _Tp &__y) const
        {
            return __x & __y;
        }
    };
Here is the implementation of a function object calling operator~()
:
    #include <functional>

    template <typename _Tp>
    struct bit_not: public unary_function<_Tp, _Tp>
    {
        _Tp operator()(const _Tp &__x) const
        {
            return ~__x;
        }
    };
These and other
missing predefined function objects
are also implemented in the
file
bitfunctional
, which is found in one of the cplusplus
zip-archives.
An example using bit_and()
to remove all odd numbers from a vector of
int
values, using the
remove_if()
generic algorithm is:
    #include <iostream>
    #include <algorithm>
    #include <vector>
    #include "bitand.h"

    int main()
    {
        vector<int> vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy
        (
            vi.begin(),
            remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
            ostream_iterator<int>(cout, " ")
        );
        cout << endl;
    }
    /*
        Generated output:

        0 2 4 6 8
    */
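Similarly, the unary bit_not function object can be combined with the transform() generic algorithm. The sketch below is self-contained: it redefines the function object under the hypothetical name BitNot (omitting the unary_function base class, so the sketch compiles on its own) and complements every element of a vector:

```cpp
#include <algorithm>
#include <vector>

// Self-contained variant of the bit_not function object shown in
// the text; its operator() returns the bitwise complement.
template <typename T>
struct BitNot
{
    T operator()(T const &x) const
    {
        return ~x;
    }
};

// Replace every element of v by its bitwise complement.
std::vector<int> complementAll(std::vector<int> v)
{
    std::transform(v.begin(), v.end(), v.begin(), BitNot<int>());
    return v;
}
```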
Next, we extend the class Vector, introduced in section 18.3, to contain a RandomAccessIterator reverse_iterator.
First, we note that applying the ordinary iterators of the Vector
class
(made available by the begin()
and end()
member functions) to the
generic algorithms is without any problem, as the pointers that are returned
by these members are accepted by the generic algorithms:
    #include <iostream>
    #include "vector.h"

    int main()
    {
        Vector<int> vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        random_shuffle(vi.begin(), vi.end());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }
    /*
        generated output, e.g.,

        1 7 9 3 5 6 0 4 8 2
    */
However, the use of a reverse iterator is not without problems: the
reverse iterator shows rather complex behavior, in that it visits the elements
in its range backwards, resulting in a reversal of the meanings of the
operators that are used to step forward and step backward in a range. If a
reverse iterator rit
points to element 11 in a range, then it will point
to element 10, rather than 12 after ++rit
. Because of this complex
behavior, the reverse iterator will be implemented as a class
reverse_iterator
, which can do the required bookkeeping for us.
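The standard library's own reverse iterators show exactly this behavior, which a small sketch can verify (the function name is made up for the illustration):

```cpp
#include <vector>

// Incrementing a reverse iterator steps backward through the
// underlying range: rbegin() refers to the last element, and
// ++rit then yields the one-but-last element.
int secondFromBack(std::vector<int> const &v)
{
    std::vector<int>::const_reverse_iterator rit = v.rbegin();
    ++rit;              // `forward' for rit is backward in v
    return *rit;
}
```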
Furthermore, in the current context reverse_iterator
is a reverse
iterator only for the elements of the class Vector
. So, a
nested class seems to be appropriate here. Since Vector
has
already been defined, we will decide to leave it untouched. Instead of
modifying Vector
, a template class Vector2
is derived from it, and
Vector2
will be given its nested class reverse_iterator
. This latter
class can be derived from random_access_iterator
to realize a
RandomAccessIterator for the class Vector2
:
    template <typename Type>
    class Vector2: public Vector<Type>
    {
        public:
            class reverse_iterator:
                public random_access_iterator<Type, ptrdiff_t>
            {
                friend class Vector2<Type>;     // see the text, below
                ...
            };
            ...
    };

The
ptrdiff_t
type that is mentioned here is ordinarily defined as an
int
(e.g., with the
Linux
operating system it is defined as a
__kernel_ptrdiff_t
in
types.h
, which is defined as an int
in
posix_types.h
).
The reverse_iterator
class is now defined as a nested template class,
derived from the template class random_access_iterator
. Its surrounding
class is Vector2
, which itself is a template class that was derived from
the template class Vector
.
Within reverse_iterator
we store two Type
pointers, called
begin
and end
, which will be used to define the reversed range
[end - 1, begin - 1)
. In the
private section of reverse_iterator
we declare:
Type *begin, *end;
The pointers begin
and end
receive their initial values from the begin()
and end()
members of the Vector
object. However, the class
reverse_iterator
is closely tied to Vector2
: only Vector2
objects
should be allowed to construct their own reverse iterators. This policy is
enforced by making Vector2<Type>
a
bound friend of
reverse_iterator
, and by declaring its constructor private
. The
constructor expects the two Type
pointers, defining the range
[begin(), end())
. Internally begin() - 1
will be used as
endpoint of the reversed range, while end() - 1
will be used as the
beginning of the reversed range. The constructor, defined in the private
section of the reverse_iterator
class, is therefore:
    reverse_iterator(Type *begin, Type *end)
    :
        begin(begin),
        end(end)
    {}

In the implementation of
reverse_iterator
, end
will gradually walk
towards begin
, which will keep its value. The dereferencing operator
will not return *begin
, but rather end[-1]
: the element just before
the element to which end
points. All this is allowed, as
Vector::begin()
and Vecotr::end()
return random access iterators
themselves, for which
pointer arithmetic operations are defined.
Since the reverse_iterator
class declares Vector2<Type>
as a
friend, the Vector2
class can implement two members, rbegin()
and
rend()
returning reverse_iterator
objects:
    reverse_iterator rbegin()
    {
        return reverse_iterator(begin(), end());
    }
    reverse_iterator rend()
    {
        return reverse_iterator(begin(), begin());
    }

Apart from these two members, the
Vector2
class could define two
constructors, analogous to the constructors in the Vector
class. E.g.,
Vector2() : Vector<Type>() {}
In order to allow programs to use these reverse iterators, a public
copy constructor and overloaded
assignment operator is needed in the
class reverse_iterator
. Since the Type
pointers in
reverse_iterator
are merely used for pointing to existing information, a
destructor
is not required. Of course, this
implies that the usefulness of a reverse_iterator
terminates if the object
from which it was obtained goes
out of scope. It is the
responsibility of the programmer to make sure this doesn't cause
disaster. In the public section of reverse_iterator
we add the following
members (implementations given as inline code):
    reverse_iterator(reverse_iterator const &other)
    :
        begin(other.begin),
        end(other.end)
    {}
    reverse_iterator &operator=(reverse_iterator const &other)
    {
        begin = other.begin;
        end = other.end;
        return *this;
    }

The implementation shown below is fairly Spartan: illegal pointer values are not detected. As the generic algorithms do not require these checks we've left them out. Of course, one could modify the implementation and incorporate checks at several locations. It's up to the reader.
Random access iterators require us to implement a series of member
functions. Here they are, as they should appear in the
public section of
the reverse_iterator
nested class. Most implementations will be
self-explanatory, and consist of single lines of code.
    Type &operator*() const
    {
        return end[-1];     // reference to the previous element
    }
    bool operator==(reverse_iterator const &other) const
    {
        return other.end == end;    // comparing two iterators for equality
    }
    reverse_iterator &operator++()
    {
        --end;              // increment is: to the previous element
        return *this;
    }
    reverse_iterator &operator--()
    {
        ++end;              // decrement is: to the next element
        return *this;
    }
    // remaining members: see the text
    bool operator<(reverse_iterator const &other) const;
    reverse_iterator operator++(int);
    reverse_iterator operator--(int);
    int operator-(reverse_iterator const &other) const;
    reverse_iterator operator+(int step) const;
    reverse_iterator operator-(int step) const;

The following members deserve special attention:
bool operator<(reverse_iterator const &other) const
This function compares two iterators. The operator<() must be interpreted as `the lvalue iterator points to an element earlier in the range than the rvalue iterator'. Since we're talking reversed iterators here, this means that the lvalue operand points to a later element than the rvalue operand in the normal iterator range. So, the overloaded operator should return true if this->end > other.end, implying that the current iterator points to an element that is located earlier in the range than the other iterator. So, we implement:

    return end > other.end;
reverse_iterator operator++(int)
The postfix increment operator must return a reverse_iterator pointing to the current element, incrementing the iterator as a side effect. Again, this implies that end is decremented. Since we have a constructor expecting two (normal) iterators, the implementation again is a one-liner:

    return reverse_iterator(begin, end--);

Using the postfix decrement with end, a reverse_iterator object is returned representing the iterator's position at the time the postfix increment operator was called, incrementing (so, decrementing end) the reversed iterator as a side effect.
reverse_iterator operator--(int)
Implemented analogously:

    return reverse_iterator(begin, end++);
int operator-(reverse_iterator const &other) const
This is the pointer arithmetic operator: it returns the difference between two pointers, representing the number of steps to make to reach the lvalue operand starting from the rvalue operand. Normally we would implement this as end - other.end. However, since reversed iterators reduce their values to reach elements later in the range, the algebraic sign must be negated with reversed iterators: -(end - other.end), or, using simple algebraic manipulation:

    return other.end - end;
reverse_iterator operator-(int step) const
This operator allows us to make random steps backward using a random access iterator. With a reversed iterator this means that we must increase the value of the iterator. So we return the following reverse_iterator object:

    return reverse_iterator(begin, end + step);
reverse_iterator operator+(int step) const
This operator does the opposite. So we implement it accordingly:

    return operator-(-step);

For illustration purposes we show how to implement some of these member functions outside of the template class interface: these implementations could be included in the header file defining the
Vector2
interface, if the
corresponding member functions are only declared:
    template <typename Type>            // this is a template function
                                        // the class in which the
    Vector2<Type>::reverse_iterator::   // constructor is defined
    reverse_iterator(Type *begin, Type *end)    // the constructor itself
    :
        begin(begin),
        end(end)
    {}

    template <typename Type>            // this is a template function
    Vector2<Type>::reverse_iterator     // this is its return type
                                        // the class in which the function
    Vector2<Type>::reverse_iterator::   // is defined
    operator+(int step) const           // this is the function itself
    {
        return reverse_iterator(begin, end - step);
    }

Finally, an example of a program using the
reverse_iterator
:
    #include <iostream>
    #include "vector3.h"

    int main()
    {
        Vector2<int> vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        random_shuffle(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        sort(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        reverse(vi.rbegin(), vi.rend());
        copy(vi.begin(), vi.end(), ostream_iterator<int>(cout, " "));
        cout << endl;

        copy(vi.rbegin(), vi.rend(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }
    /*
        Generated output:

        0 1 2 3 4 5 6 7 8 9
        7 1 5 9 3 4 6 0 2 8
        9 8 7 6 5 4 3 2 1 0
        0 1 2 3 4 5 6 7 8 9
        9 8 7 6 5 4 3 2 1 0
    */
The tools used here are the scanner generator flex and the parser generator bison. Both
bison
and flex
are well documented elsewhere. The original
predecessors of bison
and flex
, called
yacc
and
lex
are
described in several books, e.g. in
O'Reilly's book `lex & yacc'.
However, scanner- and parser generators are also (and maybe even more
commonly, nowadays) available as free software. Both bison
and flex
can be obtained from
ftp://prep.ai.mit.edu/pub/gnu. Flex
will create a C++
class when called as flex++
, or when the
-+
flag
is used. With bison
the situation is a bit more complex. Scattered over
the Internet several bison++
archives can be found (e.g., in
ftp://ftp.rug.nl/contrib/frank/software/linux/bison/ or
rzbsdi01.uni-trier.de). The
information in these archives usually dates back to 1993, irrespective of the
version number mentioned with the archive itself.
Using flex++
and bison++
a class
-based scanner and parser can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become a bit cleaner than without the class
interface.
Below two examples are given. In the first example only a lexical scanner is
used to monitor the production of a file from several parts. This example
focuses on the lexical scanner, and on switching files while churning through
the information. The second example uses both a scanner and a parser to
transform standard arithmetic expressions to their postfix notation, commonly
encountered in code generated by compilers and in HP-calculators. The
second example focuses on the parser.
The input may contain an #include statement, which is followed by a string indicating the file that should be included at the location of the #include.
In order to avoid complexities that are irrelevant to the current
example, the format of the #include
statement is restricted to the
form
#include <filepath>
.
The file specified between the pointed brackets should be available at
the location indicated by filepath
. If the file is not available,
the program should terminate using a proper error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the
standard output stream cout
. Otherwise, the output is written to
the stream whose name is given as the program's second argument.
The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filename stack indicating where which file was included should be printed.
One small extra feature is that
comment-lines should be recognized:
include
directives in comment-lines should be ignored, comment being the
standard C++ comment-types.
The program is created in the following steps:
First, a lexer is constructed, containing the specifications of the input-language.
From the lexer the requirements for the class Scanner evolve. The Scanner class is a wrapper around the class yyFlexLexer generated by flex++. The requirements result in the specification of the interface for the class Scanner.
Finally, the main() function is constructed. A Startup object is created to inspect the command-line arguments. If successful, the scanner's member yylex() is called to construct the output file.
19.7.1.1: The flex++ specification file
The organization of the
lexical scanner specification file is similar
to the one used with flex
. However, flex++
creates a class
(
yyFlexLexer
) from which the class Scanner
will be derived.
The code associated with the
regular expression rules will be located
inside the class yyFlexLexer
. However, it would be handy to access the
member functions of the derived class within that code. Fortunately,
inheritance helps us to realize this. In the specification of the class
yyFlexLexer()
, we notice that the function
yylex()
is a
virtual
function. In the FlexLexer.h
header file we see virtual int yylex()
:
    class yyFlexLexer: public FlexLexer
    {
        public:
            yyFlexLexer(istream *arg_yyin = 0, ostream *arg_yyout = 0);
            virtual ~yyFlexLexer();

            void yy_switch_to_buffer(struct yy_buffer_state *new_buffer);
            struct yy_buffer_state *yy_create_buffer(istream *s, int size);
            void yy_delete_buffer(struct yy_buffer_state *b);
            void yyrestart(istream *s);

            virtual int yylex();
            virtual void switch_streams(istream *new_in, ostream *new_out);
    };

Consequently, if
yylex()
is defined in a derived class, then this
function of the derived class will be called from a
base class (i.e.,
yyFlexLexer
) pointer. Since the yylex()
function of the
derived class is called, that function will have access to the members
of its class, and to the public and protected members of its base class.
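The mechanism flex++ relies on here can be illustrated outside of flex++ with a minimal sketch (the class names Base and Derived are made up for the illustration):

```cpp
#include <string>

// A virtual function redefined in a derived class is reached
// through a base class pointer: this is how the derived class'
// yylex() gets called via the yyFlexLexer base class.
struct Base
{
    virtual ~Base()
    {}
    virtual std::string yylex()
    {
        return "Base";
    }
};

struct Derived: public Base
{
    std::string yylex()             // overrides Base::yylex()
    {
        return "Derived";
    }
};

std::string callThroughBase()
{
    Derived scanner;
    Base *bp = &scanner;
    return bp->yylex();             // virtual dispatch selects Derived's
}
```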
The context in which the generated scanner is placed is (by
default)
the function
yyFlexLexer::yylex()
. However, this context can be changed by
defining the
YY_DECL
-macro. This macro, if defined, determines the context
in which the generated scanner will be placed. So, in order to make the
generated scanner part of the derived class function yylex()
, three
things must be done:
First, YY_DECL must be defined in the lexer specification file. It must define the derived class function yylex() as the scanner
as the scanner
function. For example:
#define YY_DECL int Scanner::yylex()
Second, the function yylex() must be declared in the class definition of the derived class.
Third, since yyFlexLexer::yylex() is a virtual function, it
must still be defined. It is not called, though, so its definition may be a
simple
    int yyFlexLexer::yylex()
    {
        return 0;
    }

The definition of the
YY_DECL
macro and the yyFlexLexer::yylex()
function can conveniently be placed in the lexer specification file, as shown
below.
Looking at the regular expressions themselves, notice that we'll need rules
for the recognition of the comment, for the
#include
directive, and for the remaining characters. This is all fairly
standard practice. When an #include
directive is detected, the directive
is parsed by the scanner. This is
common practice:
preprocessor directives are normally not
analyzed by a
parser, but by the
lexical scanner. The scanner uses a
mini scanner to extract the filename from the directive, throwing a
Scanner::Error
value (invalidInclude
) if this fails. If the filename
could be extracted, it is stored in nextSource
, and on completion of the
reading of the #include
directive switchSource()
is called to perform
the switch to another file.
When the end of the file (EOF
) is detected, the derived class'
member function popSource()
is called, which will pop the previously
pushed file, returning
true
. Once the file-stack is empty, the function
will return
false
, resulting in the call of yyterminate()
, which will
terminate the scanner.
The
lexical scanner specification file has three sections: a C++
preamble, containing code which can be used in the code defining the
actions to be performed once a regular expression is matched, a Flex++
symbol area, which is used for the definition of symbols, like a
mini scanner, or
options, like
%option yylineno
when the
lexical scanner should keep track of the
line numbers of the files it is
scanning and, finally a
rules section, in which the regular expressions
and their actions are given. In the current example, the main task of the
lexer is to
copy information from the
istream
*yyin
to the
ostream
*yyout
, for which the predefined macro
ECHO
can
be used. The complete and annotated lexical scanner specification file to be
used with
flex++
is:
    %{
    /* ------------------------------------------------------------------------
        C++ -preamble.
        Include header files, other than those generated by flex++ and
        bison++. E.g., include the interface to the class derived from
        yyFlexLexer
    ------------------------------------------------------------------------ */

                                    // the yylex() function that's actually
                                    // used
    #define YY_DECL int Scanner::yylex()

    #include "scanner.H"            // The interface of the derived class

    int yyFlexLexer::yylex()        // not called: overruled by
    {                               // Scanner::yylex()
        return 0;
    }
    %}

    /* ------------------------------------------------------------------------
        Flex++ symbol area
        ~~~~~~~~~~~~~~~~~~
        The symbols mentioned here are used for defining e.g., a mini scanner
    ------------------------------------------------------------------------ */
    %x comment
    %x include

    %option yylineno

    eolnComment     "//".*
    anyChar         .|\n

    /* ------------------------------------------------------------------------
        Flex rules area:
        ~~~~~~~~~~~~~~~~
        Regular expressions below here define what the lexer will recognize.
    ------------------------------------------------------------------------ */
    %%

        /* The comment-rules: comment lines are ignored. */

    {eolnComment}
    "/*"                    BEGIN comment;
    <comment>{anyChar}
    <comment>"*/"           BEGIN INITIAL;

        /* File switching: #include <filepath> */

    #include[ \t]+"<"       BEGIN include;
    <include>[^ \t>]+       nextSource = yytext;
    <include>">"[ \t]*\n    {
                                BEGIN INITIAL;
                                performSwitch();
                            }
    <include>{anyChar}      throw invalidInclude;

        /* The default rules: eating all the rest, echoing it to output */

    {anyChar}               ECHO;

        /* The <<EOF>> rule: pop a pushed file, or terminate the lexer */

    <<EOF>>                 {
                                if (!popSource())
                                    yyterminate();
                            }

Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of
yyFlexLexer
are
protected, and thus
accessible to derived classes), very much processing can be done by the
derived class' member functions. This results in a very clean setup of the
lexer specification file, which requires hardly any code in the
preamble.
19.7.1.2: The derived class: Scanner
The class Scanner
is derived from the class
yyFlexLexer
, generated by
flex++
. The derived class has access to the data controlled by the lexical
scanner. In particular, the derived class has access to the following data
members:
char *yytext
: contains the
text matched by
a
regular expression (accessible outside of the scanner from the
YYText()
member);
int yyleng
: the
length of the text
in yytext
(accessible outside of the scanner from the
YYLeng()
member);
int yylineno
: the current
line number (only if
%option yylineno
was specified in the lexer specification file,
accessible outside of the scanner from the
lineno()
member);
Other members are available as well, but we feel they are less often
used. Details can be found in the file
FlexLexer.h
, which is part of the
flex
distribution.
The class Scanner
performs two tasks: it pushes file information about the
current file to a
file stack, and pops the information pushed last once
EOF
is detected in a file.
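The bookkeeping these two tasks require can be sketched in isolation. The following FileStack class is a hypothetical simplification of what Scanner's members implement (the name is made up, and the real class additionally stores yy_buffer_state pointers): push on an #include, refuse circular inclusion, pop at EOF.

```cpp
#include <string>
#include <vector>
#include <algorithm>

// Simplified file-stack bookkeeping: fileName mirrors the chain of
// nested #include directives currently being processed.
class FileStack
{
    std::vector<std::string> fileName;
  public:
    bool push(std::string const &name)      // false: circular inclusion
    {
        if (std::find(fileName.begin(), fileName.end(), name)
                != fileName.end())
            return false;                   // already being processed
        fileName.push_back(name);
        return true;
    }
    bool pop()                              // false once the stack is empty
    {
        fileName.pop_back();
        return !fileName.empty();
    }
};
```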
Several member functions are needed to accomplish these tasks. As they are
auxiliary to the scanner, they are
private members. In practice, these
private members are developed once the need for them arises. Below, the final header file for the class Scanner is given. Note
that, apart from the private member functions, several private data members
are used as well. These members are initialized by the Scanner()
constructor and are used in the private member functions. They are
discussed below, in the context of the member functions that are using
them. Here is the scanner.h
header file:
    #include <FlexLexer.h>  // provides yyFlexLexer interface
    #include <fstream>
    #include <string>
    #include <stack>
    #include <vector>

    using namespace std;    // FBB: needed for g++ >= 3.0

    class Scanner: public yyFlexLexer
    {
        public:
            enum Error
            {
                invalidInclude,
                circularInclusion,
                nestingTooDeep,
                cantRead,
            };

            Scanner(istream *yyin, string const &initialName);

            string const &lastFile()
            {
                return fileName.back();
            }
            void stackTrace();      // dumps filename stack to cerr
            int yylex();            // overruling yyFlexLexer's yylex()

        private:
            Scanner(Scanner const &other);              // no Scanner
                                                        // copy-initialization
            Scanner &operator=(Scanner const &other);   // no assignment either

            bool popSource();       // true: source popped
            void performSwitch();
            void throwOnCircularInclusion();

            stack<yy_buffer_state *> state;
            vector<string> fileName;
            string nextSource;

            static unsigned const
                sizeof_buffer = 16384,
                maxDepth = 10;
    };

If the scanner can extract a filename from an
#include
directive, a
switch to another file is performed by performSwitch()
. If the filename
could not be extracted, the scanner throws an invalidInclude
exception
value. The performSwitch()
member and the matching function
popSource()
handle the file switch. In particular, note that
yylineno
is not updated when a
file switch is performed. If line numbers are to be
monitored, then the current value of yylineno
should be pushed on a stack,
and yylineno
should be reset by performSwitch()
, while popSource()
should reinstate a former value of yylineno
by popping a previous value
from the stack. In the current implementation of Scanner
a simple
stack
of
yy_buffer_state
pointers is maintained. Changing that into a stack of
pair<yy_buffer_state *, unsigned>
elements allows you to save
the line numbers. This modification is left as an
exercise to the reader.
As mentioned, performSwitch()
performs the actual file-switching. The
yyFlexLexer
class provides a series of member functions that can be used
for file switching purposes. The file-switching capability of a
yyFlexLexer
object is founded on the struct yy_buffer_state
,
containing the state of the scan-buffer of the file that is currently scanned
by the lexical scanner. This buffer is pushed on the state
stack when an
#include
is encountered. Then this buffer is replaced by the buffer that
will be created for the file that is mentioned in the #include
directive.
The actual
file switching is realized as follows:
- First the include-nesting depth is inspected. If maxDepth
is reached, the stack is considered full, and the scanner throws
a nestingTooDeep exception.
- Next the fileName vector is inspected, to avoid
circular inclusions. If nextSource is encountered in the
fileName vector, the inclusion is refused, and the scanner throws a
circularInclusion exception. The member function
throwOnCircularInclusion() is called for this task.
- Then a new ifstream object is created, for the filename
in nextSource. If this fails, the scanner throws a cantRead
exception.
- Finally, a new yy_buffer_state is created for the newly opened
stream, and the lexical scanner is instructed to switch to that
stream using yyFlexLexer's member function
yy_switch_to_buffer().
The sources for the member functions performSwitch()
and
throwOnCircularInclusion()
are:
    #include "scanner.h"

    void Scanner::performSwitch()
    {
        if (state.size() == maxDepth)
            throw nestingTooDeep;

        fileName.push_back(nextSource);
        throwOnCircularInclusion();

        ifstream *newStream = new ifstream(nextSource.c_str());
        if (!*newStream)
            throw cantRead;

        state.push(yy_current_buffer);
        yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer));
    }
    #include "scanner.h"
    #include <algorithm>        // for find()

    void Scanner::throwOnCircularInclusion()
    {
        vector<string>::iterator it = find(fileName.begin(), fileName.end(),
                                           nextSource);
        if (it != fileName.end())
            throw circularInclusion;
    }

The member function
popSource()
is called to pop the previously pushed
buffer from the stack, to continue its scan just beyond the just processed
#include
directive. The popSource()
function first inspects
the size of the state
stack: if empty, false
is returned and the
function terminates. If not empty, then the current buffer is
deleted, to be replaced by the state waiting on top of the stack. The file
switch is performed by the yyFlexLexer
members
yy_delete_buffer()
and
yy_switch_to_buffer
. Note that yy_delete_buffer()
takes care of the
closing of the ifstream
and of deleting the memory allocated for this
stream in performSwitch()
. Furthermore, the filename that was last entered
in the fileName
vector is removed. Having done all this, the function
returns true
:
    #include "scanner.h"

    bool Scanner::popSource()
    {
        if (state.empty())
            return false;

        yy_delete_buffer(yy_current_buffer);
        yy_switch_to_buffer(state.top());
        state.pop();
        fileName.pop_back();

        return true;
    }

These functions complete the implementation of the lexical scanner. The lexical scanner itself is stored in the
Scanner::yylex()
function. The Scanner
object itself only has three public member
functions and a constructor. One member function is yylex()
, the other two
allow the caller to dump a stack trace and to obtain the name of the file that
was last processed by the scanner. The constructor is a simple piece of
code. Here is its source:
    #include "scanner.h"

    Scanner::Scanner(istream *yyin, string const &initialName)
    {
        switch_streams(yyin, yyout);
        fileName.push_back(initialName);
    }
The main program is very simple. As the program expects a filename to start
the scanning process with, the number of arguments is checked first. If at
least one argument was given, then an ifstream
object is created. If this
object can be created, then a Scanner
object is constructed, receiving the
address of the ifstream
object and the name of the file as its
arguments. Then the yylex()
member function of the Scanner
object is
called. This function is inherited from the Scanner
's base class
yyFlexLexer
. If anything fails inside the scanner, an exception is thrown,
which is caught at the end of the main()
function:
    /*  lexer.cc
        A C++ main()-frame generated by C++ for lexer.cc
    */

    #include "lexer.h"              /* program header file */

    int main(int argc, char **argv)
    {
        if (argc == 1)
        {
            cerr << "Filename argument required\n";
            exit(1);
        }

        ifstream yyin(argv[1]);
        if (!yyin)
        {
            cerr << "Can't read " << argv[1] << endl;
            exit(1);
        }

        Scanner scanner(&yyin, argv[1]);
        try
        {
            scanner.yylex();
        }
        catch (Scanner::Error err)
        {
            char const *msg[] =
            {
                "Include specification",
                "Circular Include",
                "Nesting",
                "Read",
            };
            cerr << msg[err] << " error in " << scanner.lastFile()
                 << ", line " << scanner.lineno() << endl;
            scanner.stackTrace();
            return 1;
        }
    }
19.7.1.4: Building the scanner-program
The final program is constructed in two steps. These steps are given
for a
Unix system, on which
flex++
and the
Gnu C++
compiler
g++
have been installed:
- The lexical scanner source is created by flex++. For
this the command

      flex++ lexer

  can be given.
- The program is compiled and linked against the
libfl.a library. The appropriate command here is

      g++ -o scanner *.cc -lfl
For the purpose of
debugging a lexical scanner the rules that are
matched and the tokens that are returned are useful information. When
flex++
is called with the
-d
flag, debugging code will be part of the
generated scanner. Apart from that, the debugging code must be
activated. Assuming the scanner object is called scanner
, the statement
scanner.set_debug(1);
must be given following the construction of the scanner
object.
The starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer). Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and the lexer are normally not specified in the grammatical rules or lexical regular expressions themselves, but are executed by functions, which are called from within the parser's rules or are associated with the lexer's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex++
. In the current section we concern ourselves with the parser. The
parser can be generated from a grammar specified for the program
bison++
. The specification of bison++
is similar to the specifications
required for
bison
, but a class is generated, rather than a single
function. In the next sections we'll develop a program converting
infix expressions, in which binary operators are written between
their operands, to
postfix expressions, in which binary operators are
written behind their operands. A comparable situation holds true for the
unary operators -
and +
: We ignore the +
operator, but the
-
will be converted to a
unary minus.
Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.
In the next few sections we'll develop the grammar in a bison++
specification file. Then, the regular expressions for the scanner are
specified according to the requirements of flex++
. Finally the program is
constructed.
The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained at ftp://ftp.rug.nl/contrib/frank/software/linux/bison/bison++2.20.tar.gz.
19.7.2.1: The bison++ specification file
The
grammar specification file used with
bison++
is comparable to the
specification file used with
bison
. Differences are related to the class
nature of the resulting parser. Our calculator will distinguish real numbers
from integers, and will support the basic set of arithmetic operators.
The bison++
specification file consists of the following sections:
- The header section, comparable to the header section used with
bison. The difference being the %header{...}
opening. In this section we'll encounter mainly
declarations: header files are included, and the yyFlexLexer object is
declared.
- The token section. Like bison's token section, but bison++ has
several extra items that can be declared here. They are important and warrant
a section of their own.
- The rules section, comparable to the rules section of a bison
program.
19.7.2.2: The bison++ token section
The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:
%name ParserName
. The name ParserName
will be
the name of the parser's class. This entry should be the first entry of the
token section. It is used in cases where
multiple grammars are used, to
make sure that the different parser-classes use unique identifiers. By
default the name
parse
is used.
%define name content
. The %define
has the same
function as the #define
statement for the C++
preprocessor. It can be
used to define, e.g., a
macro. Internally, the defined symbol will be the
concatenation of
YY_
, the parser's class name, and the name of the
macro. E.g.,
YY_ParserName_name
Several symbols will normally be defined here. Of all the definitions that can be given here, the following two are required:
%define LEX_BODY { inline-code }
:
here the body of the call to the
lexer is defined. It can be defined as
= 0
for an abstract parser-class, but otherwise it must contain the code
(including surrounding curly braces) representing the call to the lexer. For
example, if the lexer object generated by flex++
is called lexer
, this
declaration should be
%define LEX_BODY { return lexer.yylex(); }
%define ERROR_BODY { inline-code }
:
similarly, the body of the code of the call to the error-function is defined
here. It can be defined as = 0
, in which case the parser's class will
again become abstract. Otherwise, it is used to specify the inner workings of
the error function, including surrounding braces. E.g.,
%define ERROR_BODY { cerr << "syntax Error\n"; }
When the
LEX_BODY
and
ERROR_BODY
definitions are omitted, then the
compiler is not able to complete the
virtual table of the parser class, and
the linking phase will report an error like
undefined reference to `Parser virtual table'
The remaining symbols are optional, and can be (re)defined as needed:
%define DEBUG 1
: if non-0 debugging
code will be included in the parser's source.
%define ERROR_VERBOSE
: if
defined, the parser's stack will be dumped when an error occurs.
%define LVAL yylval
: the default
variable name is shown here: the variable name containing the parser's
semantic value is by
default
yylval
, but its name may be redefined
here.
%define INHERIT : public ClassA,
public ClassB
: the
inheritance list for the parser's class. Note that it
starts with a
colon. The space character between INHERIT
and the colon may be omitted. The %define
itself should be omitted if the
parser class is not derived from another class.
%define MEMBERS member-prototypes
: if
the parser
contains extra members, they must be declared
here. Note that there is only one %define MEMBERS
definition allowed. So,
if multiple members are to be declared, they must all be declared at this
point. To prevent
very long lines in
the specification file, the \ (
backslash) can be used at the end of a line,
to indicate that it continues on the next line of the source-text. E.g.,
    %define MEMBERS void lookup();      \
                    void lookdown();

The
MEMBERS
section starts as a
public
section. If
private
members are required too, a private:
directive can be part of the
MEMBERS
section.
To specify a constructor for the parser class, the following
%defines can be used:
%define
CONSTRUCTOR_PARAM parameterlist
: this defines the parameterlist for the
parser's constructor. Here the types and names of the parameters of the
parser's constructor should be given. The surrounding
parentheses of the
parameter list are not part of the CONSTRUCTOR_PARAM
definition.
%define CONSTRUCTOR_INIT
: initializer(s)
: this defines the
base class and
member initializers of the constructor. Note the
initial
colon following CONSTRUCTOR_INIT
, which is required. The colon
may be given immediately after the CONSTRUCTOR_INIT
statement, or blanks
may be used to separate the symbol from the colon.
%define CONSTRUCTOR_CODE
{ code }
: this defines the code of the parser's constructor (including
surrounding curly braces).
When the parser doesn't need special effects, a constructor
will not be needed. In those cases the parser can be created as
follows (using the default parser-name):
parse parser;
%union
. This starts the definition of the
semantical value union. It replaces the
#define YYSTYPE
definition seen
with bison
. An example of a %union
declaration is
    %union
    {
        int i;
        double d;
    };

The union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a
string
cannot be a member of the
union. A string *
, however, is a possible union member. As a side
line: the
lexical scanner does not have to know about this union. The
scanner can simply pass its scanned text to the parser through its
YYText()
member function. For example, a
statement like:
$$.i = atoi(scanner.YYText());
can be used to convert matched text to a value of an appropriate type.
Once a non-terminal or a token has been associated with a union field, $$
or the generic return values $1, $2, etc., that are associated with
components of rules can be used, rather than $$.i, $3.d
, etc. To associate a
non-terminal or a token with a union field, the
<fieldname>
specification is
used. E.g.,
    %token  <i> INT         // token association (deprecated)
            <d> DOUBLE
    %type   <i> intExpr     // non-terminal association

In this example, note that both the tokens and the non-terminals can be associated with a field of the union. However, as noted earlier, the lexical scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser knows what the input is all about, and may convert strings like
"123"
to an integer
value. Consequently, we are discouraging the association of a union field and
a
token. In the upcoming
description of the rules of the grammar this will be further illustrated.
Following the %union discussion, the %token and
%type specifications should be noted. They are used for
the specification of the tokens (
terminal symbols) that can be returned by
the lexical scanner, and for the specification of the return types of
non-terminals. Apart from %token
the
token indicators
%left
,
%right
and
%nonassoc
may be used to specify the
associativity of operators. The tokens
mentioned at these indicators are interpreted as tokens indicating operators,
associating in the indicated direction. The
precedence of operators is given
by their order: the first specification has the lowest priority. To overrule a
certain precedence in a certain context,
%prec
can be
used. As all this is standard bison
practice, it isn't further discussed
in this context. The documentation provided with the bison
distribution
should be consulted for further reference.
19.7.2.3: The bison++ grammar rules
The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of non-terminals associated with value-types.
    lines:
        lines line
    |
        line
    ;

    line:
        intExpr '\n'
        {
            cerr << "int: " << $1 << endl;
        }
    |
        doubleExpr '\n'
        {
            cerr << "double: " << $1 << endl;
        }
    |
        '\n'
        {
            cout << "Good bye\n";
            YYACCEPT;
        }
    |
        error '\n'
    ;

    intExpr:
        intExpr '*' intExpr
        {
            $$ = $1 * $3;
        }
    |
        intExpr '+' intExpr
        {
            $$ = $1 + $3;
        }
    |
        '(' intExpr ')'
        {
            $$ = $2;
        }
    |
        '-' intExpr %prec UnaryMinus
        {
            $$ = -$2;
        }
    |
        INT
        {
            $$ = atoi(lexer.YYText());
        }
    ;

    doubleExpr:
        doubleExpr '*' doubleExpr
        {
            $$ = $1 * $3;
        }
    |
        doubleExpr '+' doubleExpr
        {
            $$ = $1 + $3;
        }
    |
        doubleExpr '*' intExpr
        {
            $$ = $1 * $3;
        }
    |
        doubleExpr '+' intExpr
        {
            $$ = $1 + $3;
        }
    |
        intExpr '*' doubleExpr
        {
            $$ = $1 * $3;
        }
    |
        intExpr '+' doubleExpr
        {
            $$ = $1 + $3;
        }
    |
        '(' doubleExpr ')'
        {
            $$ = $2;
        }
    |
        '-' doubleExpr %prec UnaryMinus
        {
            $$ = -$2;
        }
    |
        DOUBLE
        {
            $$ = atof(lexer.YYText());
        }
    ;

Using these rules a simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols:
doubleExpr
is linked to real (double)
values, intExpr
is linked to integer values. Precedence and type
association is defined in the token section of the parser specification file,
which is:
    %name Parser

    %union
    {
        int i;
        double d;
    };

    %token  INT
            DOUBLE
    %type   <i> intExpr
            <d> doubleExpr

    %left   '+'
    %left   '*'
    %right  UnaryMinus

    %define MEMBERS                             \
        virtual ~Parser() {}                    \
        private:                                \
            yyFlexLexer lexer;
    %define LEX_BODY { return lexer.yylex(); }
    %define ERROR_BODY { cerr << "error encountered\n"; }

In the token section we see the use of the
%type
specifiers,
connecting intExpr
to the i
-field of the semantic-value union, and
connecting doubleExpr
to the d
-field. At first sight it looks
complex, since the expression rules must be included for each individual
return type. On the other hand, if the union itself would have been used, we
would have had to specify somewhere in the returned semantic values what field
to use: less rules, but more complex and error-prone code.
Also, note that the lexical scanner is included as a member of the parser. There is no need to define the scanner outside of the parser, as it is not used outside of the parser object. The virtual destructor is included as a member to prevent the compiler from complaining about the parser having a non-virtual destructor.
19.7.2.4: The flex++ specification file
The flex-specification file that is used by our calculator is
simple: blanks are skipped, single characters are returned, and
numerical values are returned as either Parser::INT
or
Parser::DOUBLE
tokens. Here is the complete flex++
specification file:
    %{
    #include "parser.h"
    %}

    %option yylineno

    %%

    [ \t]                       ;
    [0-9]+                      return(Parser::INT);
    "."[0-9]* |
    [0-9]+("."[0-9]*)?          return(Parser::DOUBLE);
    exit |
    quit                        return(Parser::DONE);
    .|\n                        return(*yytext);
19.7.2.5: The generation of the code
The code
is generated in the
same way as with bison
and flex
. To order bison++
to generate the
files parser.cc
and parser.h
, give the command:
bison++ -d -o parser.cc parser
Next,
flex++
will generate code on
lexer.cc
using the command
flex++ -I -olexer.cc lexer
Note here that flex++
expects no blanks between the -o
flag
and lexer.cc
.
On Unix systems, linking and compiling the generated sources and the source for the main program (given below) is realized by the command:
g++ -o calc -Wall *.cc -lfl -s
Note the fact that the
libfl.a
library is mentioned here. If it's not
mentioned, the
linker will complain about
unresolved references like
yywrap()
.
A source in which the main()
function, the lexical
scanner and the parser objects are defined is, finally:
    #include "parser.h"

    int main()
    {
        Parser parser;

        cout << "Enter (nested) expressions containing ints, doubles, *, + and "
                "unary -\n"
                "operators. Enter an empty line, exit or quit to exit.\n";

        parser.yyparse();
        return 0;
    }