A typical C library contains a struct and some associated
functions to act on that struct. So far,
you've seen how C++ takes functions that
are conceptually associated and makes them literally associated by
putting the function declarations inside
the scope of the struct, changing the way functions are called for the
struct, eliminating the passing of the structure address as the first
argument, and adding a new type name to the program (so you don’t have to
create a typedef for the struct tag).
These are all convenient – they
help you organize your code and make it easier to write and read. However, there
are other important issues when making libraries easier in C++, especially the
issues of safety and control. This chapter looks at the subject of boundaries in
structures.
In any relationship it’s important
to have boundaries that are respected by all parties involved. When you create a
library, you establish a relationship with the client
programmer who uses that
library to build an application or another library.
In a C
struct, as with most
things in C, there are no rules. Client programmers can do anything they want
with that struct, and there’s no way to force any particular
behaviors. For example, even though you saw in the last chapter the importance
of the functions named initialize( ) and cleanup( ), the
client programmer has the option not to call those functions. (We’ll look
at a better approach in the next chapter.) And even though you would really
prefer that the client programmer not directly manipulate some of the members of
your struct, in C there’s no way to prevent it. Everything’s
naked to the world.
There are two reasons for controlling access to
members. The first is to keep the client programmer’s hands off tools they
shouldn’t touch, tools that are necessary for the internal machinations of
the data type, but not part of the interface the client programmer needs to
solve their particular problems. This is actually a service to client
programmers because they can easily see what’s important to them and what
they can ignore.
The second reason for access control is
to allow the library designer to change the internal workings of the structure
without worrying about how it will affect the client programmer. In the
Stack example in the last chapter, you might want to allocate the storage
in big chunks, for speed, rather than creating new storage each time an element
is added. If the interface and implementation are clearly separated and
protected, you can accomplish this and require only a relink by the client
programmer.
C++ introduces three new keywords to set
the boundaries in a structure: public, private, and
protected. Their use and meaning are remarkably straightforward. These
access specifiers
are used
only in a structure declaration, and they change the boundary for all the
declarations that follow them. Whenever you use an access specifier, it must be
followed by a colon.
public
means all member declarations that follow are available
to everyone. public members are like struct members. For example,
the following struct declarations are identical:
//: C05:Public.cpp
// Public is just like C's struct

struct A {
  int i;
  char j;
  float f;
  void func();
};

void A::func() {}

struct B {
public:
  int i;
  char j;
  float f;
  void func();
};

void B::func() {}

int main() {
  A a; B b;
  a.i = b.i = 1;
  a.j = b.j = 'c';
  a.f = b.f = 3.14159;
  a.func();
  b.func();
} ///:~
The
private keyword, on the
other hand, means that no one can access that member except you, the creator of
the type, inside function members of that type. private is a brick wall
between you and the client programmer; if someone tries to access a
private member, they’ll get a compile-time error. In struct
B in the example above, you may want to make portions of the representation
(that is, the data members) hidden, accessible only to you:
//: C05:Private.cpp
// Setting the boundary

struct B {
private:
  char j;
  float f;
public:
  int i;
  void func();
};

void B::func() {
  i = 0;
  j = '0';
  f = 0.0;
}

int main() {
  B b;
  b.i = 1;    // OK, public
//!  b.j = '1';  // Illegal, private
//!  b.f = 1.0;  // Illegal, private
} ///:~
Although func( ) can access
any member of B (because func( ) is a member of B,
thus automatically granting it permission), an ordinary global function like
main( ) cannot. Of course, neither can member functions of other
structures. Only the functions that are clearly stated in the structure
declaration (the “contract”) can have access to private
members.
There is no required order for access
specifiers,
and they may appear more than once. They affect all the members declared after
them and before the next access
specifier.
The last access specifier is
protected.
protected acts just like private, with one exception that we
can’t really talk about right now: “Inherited” structures
(which cannot access private members) are granted access to
protected members. This will become clearer in Chapter 14 when
inheritance is introduced. For current purposes, consider protected to
be just like
private.
What if you want to explicitly grant
access to a function that isn’t a member of the current structure? This is
accomplished by declaring that function a friend
inside the structure declaration. It’s
important that the friend declaration occurs inside the structure
declaration because you (and the compiler) must be able to read the structure
declaration and see every rule about the size and behavior of that data type.
And a very important rule in any relationship is, “Who can access my
private implementation?”
The class controls which code has access
to its members. There’s no magic way to “break in” from the
outside if you aren’t a friend; you can’t declare a new class
and say, “Hi, I’m a friend of Bob!” and expect to see
the private and protected members of Bob.
You can declare a global function as a
friend,
and you can also declare a member function of another
structure,
or even an entire structure, as a friend. Here’s an example:
//: C05:Friend.cpp
// Friend allows special access

// Declaration (incomplete type specification):
struct X;

struct Y {
  void f(X*);
};

struct X { // Definition
private:
  int i;
public:
  void initialize();
  friend void g(X*, int); // Global friend
  friend void Y::f(X*);   // Struct member friend
  friend struct Z;        // Entire struct is a friend
  friend void h();
};

void X::initialize() {
  i = 0;
}

void g(X* x, int i) {
  x->i = i;
}

void Y::f(X* x) {
  x->i = 47;
}

struct Z {
private:
  int j;
public:
  void initialize();
  void g(X* x);
};

void Z::initialize() {
  j = 99;
}

void Z::g(X* x) {
  x->i += j;
}

void h() {
  X x;
  x.i = 100; // Direct data manipulation
}

int main() {
  X x;
  Z z;
  x.initialize(); // Give i and j known values before use
  z.initialize();
  z.g(&x);
} ///:~
struct Y has a member function
f( ) that will modify an object of type X. This is a bit of a
conundrum because the C++ compiler requires you to declare everything before you
can refer to it, so struct Y must be declared before its member
Y::f(X*) can be declared as a friend in struct X. But for
Y::f(X*) to be declared, struct X must be declared
first!
Here’s the solution. Notice that
Y::f(X*) takes the address of an X
object. This is critical because
the compiler always knows how to pass an address, which is of a fixed size
regardless of the object being passed, even if it doesn’t have full
information about the size of the type. If you try to pass the whole object,
however, the compiler must see the entire structure definition of X, to
know the size and how to pass it, before it allows you to define or call a function
such as Y::g(X).
Because Y::f(X*) takes only the address of an X,
the compiler allows you to make an incomplete type specification
of
X prior to declaring Y::f(X*). This is accomplished in the
declaration:
struct X;
This declaration simply tells the
compiler there’s a struct by that name, so it’s OK to refer
to it as long as you don’t require any more knowledge than the
name.
Now, in struct X, the function
Y::f(X*) can be declared as a friend with no problem. If you tried
to declare it before the compiler had seen the full specification for Y,
it would have given you an error. This is a safety feature to ensure consistency
and eliminate bugs.
Notice the two other friend
functions. The first declares an ordinary global function g( ) as a
friend. But g( ) has not been previously declared at the
global scope! It turns out that friend can be used this way to
simultaneously declare the function and give it friend status.
This extends to entire structures:
friend struct Z;
is an incomplete type specification for Z, and it gives the entire structure friend status.
Under the original C++ rules, making a structure nested doesn’t
automatically give it access to private members. (Since C++11, a nested
structure is a member and has the same access rights as any other member, so
the form shown here is needed only for older compilers.) To accomplish this, you
must follow a particular form: first, declare (without defining) the nested
structure, then declare it as a friend, and finally define the structure.
The structure definition must be separate from the friend declaration,
otherwise it would be seen by the compiler as a non-member. Here’s an
example:
//: C05:NestFriend.cpp
// Nested friends
#include <iostream>
#include <cstring> // memset()
using namespace std;
const int sz = 20;

struct Holder {
private:
  int a[sz];
public:
  void initialize();
  struct Pointer;
  friend struct Pointer;
  struct Pointer {
  private:
    Holder* h;
    int* p;
  public:
    void initialize(Holder* h);
    // Move around in the array:
    void next();
    void previous();
    void top();
    void end();
    // Access values:
    int read();
    void set(int i);
  };
};

void Holder::initialize() {
  memset(a, 0, sz * sizeof(int));
}

void Holder::Pointer::initialize(Holder* rv) {
  h = rv;
  p = rv->a;
}

void Holder::Pointer::next() {
  if(p < &(h->a[sz - 1])) p++;
}

void Holder::Pointer::previous() {
  if(p > &(h->a[0])) p--;
}

void Holder::Pointer::top() {
  p = &(h->a[0]);
}

void Holder::Pointer::end() {
  p = &(h->a[sz - 1]);
}

int Holder::Pointer::read() {
  return *p;
}

void Holder::Pointer::set(int i) {
  *p = i;
}

int main() {
  Holder h;
  Holder::Pointer hp, hp2;
  int i;

  h.initialize();
  hp.initialize(&h);
  hp2.initialize(&h);
  for(i = 0; i < sz; i++) {
    hp.set(i);
    hp.next();
  }
  hp.top();
  hp2.end();
  for(i = 0; i < sz; i++) {
    cout << "hp = " << hp.read()
         << ", hp2 = " << hp2.read() << endl;
    hp.next();
    hp2.previous();
  }
} ///:~
Once Pointer is declared, it is
granted access to the private members of Holder by
saying:
friend struct Pointer;
The struct Holder contains an
array of ints and the Pointer allows you to access them. Because
Pointer is strongly associated with Holder, it’s sensible to
make it a member structure of Holder. But because Pointer is a
separate class from Holder, you can make more than one of them in
main( ) and use them to select different parts of the array.
Pointer is a structure instead of a raw C pointer, so you can guarantee
that it will always safely point inside the Holder.
The Standard C library function
memset( ) (in
<cstring>) is used
for convenience in the program above. It sets all memory starting at a
particular address (the first argument) to a particular value (the second
argument) for n bytes past the starting address (n is the third
argument). Of course, you could have simply used a loop to iterate through all
the memory, but memset( ) is available, well-tested (so it’s
less likely you’ll introduce an error), and probably more efficient than
if you coded it by
hand.
The class definition gives you an audit
trail, so you can see from looking at the class which functions have permission
to modify the private parts of the class. If a function is a friend, it
means that it isn’t a member, but you want to give permission to modify
private data anyway, and it must be listed in the class definition so everyone
can see that it’s one of the privileged functions.
C++
is a hybrid object-oriented language, not a pure one, and friend was
added to get around practical problems that crop up. It’s fine to point
out that this makes the language less “pure,” because C++ is
designed to be pragmatic, not to aspire to an abstract
ideal.
Chapter 4 stated that a struct
written for a C compiler and later compiled with C++ would be unchanged. This
referred primarily to the object layout of the struct, that is, where the
storage for the individual variables is positioned in the memory allocated for
the object. If the C++ compiler changed the layout
of C
structs, then any C code you wrote that inadvisably took advantage of
knowledge of the positions of variables in the struct would
break.
When you start using access specifiers,
however, you’ve moved completely into the C++ realm, and things change a
bit. Within a particular “access block” (a
group of declarations delimited by access specifiers), the variables are
guaranteed to be laid out contiguously, as in C. However, the access blocks may
not appear in the object in the order that you declare them. Although the
compiler will usually lay the blocks out exactly as you see them, there
is no rule about it, because a particular machine architecture and/or operating
environment may have explicit support for
private and
protected that might
require those blocks to be placed in special memory locations. The language
specification doesn’t want to restrict this kind of
advantage.
Access specifiers are part of the
structure and don’t affect the objects created from the structure. All of
the access specification information disappears before the program is run;
generally this happens during compilation. In a running program, objects become
“regions of storage” and nothing more. If you really want to, you
can break all the rules and access the memory directly, as you can in C. C++ is
not designed to prevent you from doing unwise things. It just provides you with
a much easier, highly desirable alternative.
In general, it’s not a good idea to
depend on anything that’s implementation-specific when you’re
writing a program. When you must have implementation-specific dependencies,
encapsulate them inside a structure so that any porting changes are focused in
one
place.
Access control is often referred to as
implementation hiding.
Including functions within structures (often referred to as
encapsulation[36])
produces a data type with characteristics and behaviors, but access control puts
boundaries within that data type, for two important reasons. The first is to
establish what the client programmers can and can’t use. You can build
your internal mechanisms into the structure without worrying that client
programmers will think that these mechanisms are part of the interface they
should be using.
This feeds directly into the second
reason, which is to separate the interface from the implementation.
If the
structure is used in a set of programs, but the client programmers can’t
do anything but send messages to the public interface, then you can
change anything that’s private without requiring modifications to
their code.
Encapsulation and access control, taken
together, produce something more than a C struct. We’re now in the
world of object-oriented programming, where a structure is describing a class of
objects as you would describe a class of fishes or a class of birds: Any object
belonging to this class will share these characteristics and behaviors.
That’s what the structure declaration has become, a description of the way
all objects of this type will look and act.
In the original OOP
language, Simula-67, the keyword
class was used to
describe a new data type. This apparently inspired Stroustrup to choose the same
keyword for C++, to emphasize that this was the focal point of the whole
language: the creation of new data types that are more than just C
structs with functions. This certainly seems like adequate justification
for a new keyword.
However, class comes close to being an unnecessary keyword
in C++. It’s identical to the
struct keyword in every way except one: class defaults
to private, whereas struct defaults to public. Here are two
structures that produce the same result:
//: C05:Class.cpp
// Similarity of struct and class

struct A {
private:
  int i, j, k;
public:
  int f();
  void g();
};

int A::f() {
  return i + j + k;
}

void A::g() {
  i = j = k = 0;
}

// Identical results are produced with:

class B {
  int i, j, k;
public:
  int f();
  void g();
};

int B::f() {
  return i + j + k;
}

void B::g() {
  i = j = k = 0;
}

int main() {
  A a;
  B b;
  // Call g() first so i, j, and k have values before f() reads them:
  a.g(); a.f();
  b.g(); b.f();
} ///:~
The class is the fundamental OOP
concept in C++. It is one of the keywords that will not be set in bold in
this book – it becomes annoying with a word repeated as often as
“class.” The shift to classes is so important that I suspect
Stroustrup’s preference would have been to throw struct out
altogether, but the need for backwards compatibility with C wouldn’t allow
that.
Many people prefer a style of creating
classes that is more struct-like than class-like, because you override
the “default-to-private” behavior of the class by starting
out with public elements:
class X {
public:
  void interface_function();
private:
  void private_function();
  int internal_representation;
};
The logic behind this is that it makes
more sense for the reader to see the members of interest first, so they can
ignore anything that says private. Indeed, the only reasons all the other
members must be declared in the class at all are so the compiler knows how big
the objects are and can allocate them properly, and so it can guarantee
consistency.
The examples in this book, however, will
put the private members first, like this:
class X {
  void private_function();
  int internal_representation;
public:
  void interface_function();
};
Some people even go to the trouble of
decorating their own private names:
class Y {
public:
  void f();
private:
  int mX;  // "Self-decorated" name
};
Because mX is already hidden in
the scope of Y, the m (for “member”) is unnecessary.
However, in projects with many global variables (something you should strive to
avoid, but which is sometimes inevitable in existing projects), it is helpful to
be able to distinguish inside a member function definition which data is global
and which is a
member.
It makes sense to take the examples from
Chapter 4 and modify them to use classes and access control. Notice how the
client programmer portion of the interface is now clearly distinguished, so
there’s no possibility of client programmers accidentally manipulating a
part of the class that they shouldn’t.
//: C05:Stash.h
// Converted to use access control
#ifndef STASH_H
#define STASH_H

class Stash {
  int size;      // Size of each space
  int quantity;  // Number of storage spaces
  int next;      // Next empty space
  // Dynamically allocated array of bytes:
  unsigned char* storage;
  void inflate(int increase);
public:
  void initialize(int size);
  void cleanup();
  int add(void* element);
  void* fetch(int index);
  int count();
};
#endif // STASH_H ///:~
The inflate( ) function has
been made private because it is used only by the add( )
function and is thus part of the underlying implementation, not the interface.
This means that, sometime later, you can change the underlying implementation to
use a different system for memory management.
Other than the name of the include file,
the header above is the only thing that’s been changed for this example.
The implementation file and test file are the
same.
As a second example, here’s the
Stack turned into a class. Now the nested data structure is
private, which is nice because it ensures that the client programmer will
neither have to look at it nor be able to depend on the internal representation
of the Stack:
//: C05:Stack2.h
// Nested structs via linked list
#ifndef STACK2_H
#define STACK2_H

class Stack {
  struct Link {
    void* data;
    Link* next;
    void initialize(void* dat, Link* nxt);
  }* head;
public:
  void initialize();
  void push(void* dat);
  void* peek();
  void* pop();
  void cleanup();
};
#endif // STACK2_H ///:~
As before, the implementation
doesn’t change and so it is not repeated here. The test, too, is
identical. The only thing that’s been changed is the robustness of the
class interface. The real value of access control is to prevent you from
crossing boundaries during development. In fact, the
compiler is the only thing that knows about the protection level of class
members. There is no access control information mangled into the member name
that carries through to the linker. All the protection checking is done by the
compiler; it has vanished by
runtime.
Notice that the interface presented to
the client programmer is now truly that of a push-down
stack. It happens to be
implemented as a linked list,
but you can change that without affecting what the client programmer interacts
with, or (more importantly) a single line of client
code.
Access control in C++ allows you to
separate interface from implementation, but the implementation hiding
is only partial. The compiler
must still see the declarations for all parts of an object in order to create
and manipulate it properly. You could imagine a programming language that
requires only the public interface of an object and allows the private
implementation to be hidden, but C++ performs type checking statically (at
compile time) as much as possible. This means that you’ll learn as early
as possible if there’s an error. It also means that your program is more
efficient. However, including the private implementation has two effects: the
implementation is visible even if you can’t easily access it, and it can
cause needless
recompilation.
Some projects cannot afford to have their
implementation visible to the client programmer. It may show strategic
information in a library header file that the company doesn’t want
available to competitors. You may be working on a system where
security is an issue – an encryption algorithm,
for example – and you don’t want to expose any clues in a header
file that might help people to crack the code. Or you may be putting your
library in a “hostile” environment, where the
programmers will directly access the private components
anyway, using pointers and
casting. In all these situations, it’s valuable to
have the actual structure compiled inside an implementation file rather than
exposed in a header
file.
The project manager in your programming
environment will cause a recompilation of a file if that file is touched (that
is, modified) or if another file it’s dependent upon – that
is, an included header file – is touched. This means that any time you
make a change to a class, whether it’s to the public interface or to the
private member declarations, you’ll force a recompilation of anything that
includes that header file. This is often referred to as the
fragile
base-class problem. For a large project in its early stages this can be very
unwieldy because the underlying implementation may change often; if the project
is very big, the time for compiles can prohibit rapid
turnaround.
The technique to solve this is sometimes
called handle classes or the “Cheshire
cat”[37]
– everything about the implementation disappears except for a single
pointer, the “smile.” The pointer refers to a structure whose
definition is in the implementation file along with all the member function
definitions. Thus, as long as the interface is unchanged, the header file is
untouched. The implementation can change at will, and only the implementation
file needs to be recompiled and relinked with the project.
Here’s a simple example
demonstrating the technique. The header file contains only the public interface
and a single pointer of an incompletely specified class:
//: C05:Handle.h
// Handle classes
#ifndef HANDLE_H
#define HANDLE_H

class Handle {
  struct Cheshire; // Class declaration only
  Cheshire* smile;
public:
  void initialize();
  void cleanup();
  int read();
  void change(int);
};
#endif // HANDLE_H ///:~
This is all the client programmer is able
to see. The line
struct Cheshire;
is an incomplete type specification, or a
class declaration. (A class definition includes
the body of the class.) It tells the compiler that Cheshire is a
structure name, but it doesn’t give any details about the struct.
This is only enough information to create a pointer to the struct; you
can’t create an object until the structure body has been provided. In this
technique, that structure body is hidden away in the implementation
file:
//: C05:Handle.cpp {O}
// Handle implementation
#include "Handle.h"
#include "../require.h"

// Define Handle's implementation:
struct Handle::Cheshire {
  int i;
};

void Handle::initialize() {
  smile = new Cheshire;
  smile->i = 0;
}

void Handle::cleanup() {
  delete smile;
}

int Handle::read() {
  return smile->i;
}

void Handle::change(int x) {
  smile->i = x;
} ///:~
Cheshire is a nested structure, so it must be defined with scope resolution:
struct Handle::Cheshire {
In Handle::initialize( ),
storage is allocated for a Cheshire structure, and in
Handle::cleanup( ) this storage is released. This storage is used in
lieu of all the data elements you’d normally put into the private
section of the class. When you compile Handle.cpp, this structure
definition is hidden away in the object file where no one can see it. If you
change the elements of Cheshire, the only file that must be recompiled is
Handle.cpp because the header file is untouched.
The use of Handle is like the use
of any class: include the header, create objects, and send
messages.
//: C05:UseHandle.cpp
//{L} Handle
// Use the Handle class
#include "Handle.h"

int main() {
  Handle u;
  u.initialize();
  u.read();
  u.change(1);
  u.cleanup();
} ///:~
The only thing the client programmer can
access is the public interface, so as long as the implementation is the only
thing that changes, the file above never needs recompilation. Thus, although
this isn’t perfect implementation hiding, it’s a big
improvement.
Access control in C++ gives valuable
control to the creator of a class. The users of the class can clearly see
exactly what they can use and what to ignore. More important, though, is the
ability to ensure that no client programmer becomes dependent on any part of the
underlying implementation of a class. If you know this as the creator of the
class, you can change the underlying implementation with the knowledge that no
client programmer will be affected by the changes because they can’t
access that part of the class.
When you have the ability to change the
underlying implementation, you can not only improve your design
at some later time, but you also have the freedom to
make mistakes. No matter how carefully you plan and
design, you’ll make mistakes. Knowing that it’s relatively safe to
make these mistakes means you’ll be more experimental, you’ll learn
faster, and you’ll finish your project sooner.
The public interface to a class is what
the client programmer does see, so that is the most important part of the
class to get “right” during analysis and design. But even that
allows you some leeway for change. If you don’t get the interface right
the first time, you can add more functions, as
long as you don’t remove any that client programmers have already used in
their
code.
Solutions to selected exercises
can be found in the electronic document The Thinking in C++ Annotated
Solution Guide, available for a small fee from
www.BruceEckel.com.
[36]
As noted before, sometimes access control is referred to as
encapsulation.
[37]
This name is attributed to John Carolan, one of the early pioneers in C++, and
of course, Lewis Carroll. This technique can also be seen as a form of the
“bridge” design pattern, described in Volume 2.