Inside the CPP object model - huku-/research GitHub Wiki
Here is my list of notes on an excellent book written by a talented person, Stanley B. Lippman. A great companion for a reverse engineer working on C++ reversing. An online version can be found here.
Lippman worked with Bjarne Stroustrup on Cfront while at Bell Labs. This link contains some interesting resources which might be of great help while studying this book.
I know there's a ton of useful information in it. However, if you think an important point is missing from the following bullets, please let me know.
-
Page 17
The data members within a single access section are guaranteed within C++ to be laid out in the order of their declaration.
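A minimal sketch of how this guarantee can be checked, assuming a hypothetical `Point3d` class whose members all live in one access section. Only the ordering is asserted, since padding may still separate the members:

```cpp
#include <cstddef>  // offsetof

// Hypothetical class used only for this illustration.
struct Point3d {
    float x;
    float y;
    float z;
};

// Declaration order shows up as increasing offsets within the object.
static_assert(offsetof(Point3d, x) < offsetof(Point3d, y), "x precedes y");
static_assert(offsetof(Point3d, y) < offsetof(Point3d, z), "y precedes z");

// Same check at run time, comparing the addresses of the members of `p`.
bool members_in_declaration_order(const Point3d &p) {
    const char *px = reinterpret_cast<const char *>(&p.x);
    const char *py = reinterpret_cast<const char *>(&p.y);
    const char *pz = reinterpret_cast<const char *>(&p.z);
    return px < py && py < pz;
}
```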
-
Page 18
Composition, rather than inheritance, is the only portable method of combining C and C++ portions of a class (the conversion operator provides a handy extraction method).
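The idea can be sketched as follows. The names `c_point`, `c_point_sum` and `Point3d` are made up for this example. The C++ class holds the C struct as a member instead of deriving from it, and a conversion operator extracts it for C code:

```cpp
// C side: a plain struct and a function as a C header might declare them
// (both hypothetical).
extern "C" {
struct c_point {
    float x, y, z;
};

float c_point_sum(const struct c_point *p) {
    return p->x + p->y + p->z;
}
}

// C++ side: composition keeps the C layout intact even though the class
// has virtual functions (and therefore a vptr somewhere in the object).
class Point3d {
public:
    Point3d(float x, float y, float z) : _c_point{x, y, z} {}
    virtual ~Point3d() {}

    // Conversion operator: hands a copy of the C portion to C code.
    operator c_point() const { return _c_point; }

private:
    c_point _c_point;
};
```

A caller can then write `c_point c = some_point3d;` and pass `&c` to the C function without caring where the compiler placed the vptr.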
-
Page 18-19, 29
According to the author, the C++ programming model directly supports three programming paradigms.
-
The procedural model as programmed in C (e.g. using character arrays and the family of strxxx() functions defined in the standard C library).
-
The abstract data type (ADT) model in which users of the abstraction are provided with a set of operations (the public interface), while the implementation remains hidden. The author presents the String class as a representative example of this paradigm.
-
The object-oriented (OO) model in which a collection of related types are encapsulated through an abstract base class providing a common interface.
-
On page 29, the author refers to the object-based (OB) model as a fourth programming paradigm. This paradigm refers to classes that provide a public interface and a private implementation, but do not support type extension. The result is a faster (function invocations are resolved at compile time) and more compact (no virtual mechanism overhead) design.
-
-
Page 33
An implicitly declared default constructor is trivial unless the class falls into one of the cases below.
Each of the subsections in this chapter presents one of the four cases under which the implicit default constructor is considered nontrivial.
-
Member class object with default constructor (Page 34)
If a class without any constructors contains a member object of a class with a default constructor, the implicit default constructor of the class is nontrivial and the compiler needs to synthesize a default constructor for the containing class. This synthesis, however, takes place only if the constructor actually needs to be invoked.
-
Base class with default constructor (Page 37)
Similarly, if a class without any constructors is derived from a base class containing a default constructor, the default constructor for the derived class is considered nontrivial and so needs to be synthesized. The synthesized default constructor of the derived class invokes the default constructor of each of its immediate base classes in the order of their declaration.
-
Class with a virtual function (Page 37)
There are two additional cases in which a synthesized default constructor is needed:
-
The class either declares (or inherits) a virtual function
-
The class is derived from an inheritance chain in which one or more base classes are virtual
On page 38 the author states:
In classes that do not declare any constructors, the compiler synthesizes a default constructor in order to correctly initialize the vptr of each class object.
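On a modern compiler the four cases can be checked with the standard `<type_traits>` header. This is only a sketch: the class names are hypothetical, and the trait reports the Standard's notion of triviality, not what a particular compiler actually emits.

```cpp
#include <string>
#include <type_traits>

struct Plain { int value; };                  // none of the four cases apply

struct WithMember { std::string name; };      // member object with a default constructor

struct Base { Base() {} };
struct Derived : Base { int value; };         // base class with a default constructor

struct WithVirtual { virtual void f() {} };   // declares a virtual function

struct VBase { int value; };
struct WithVirtualBase : virtual VBase {};    // has a virtual base class

static_assert(std::is_trivially_default_constructible<Plain>::value,
              "nothing to synthesize");
static_assert(!std::is_trivially_default_constructible<WithMember>::value,
              "member constructor must be invoked");
static_assert(!std::is_trivially_default_constructible<Derived>::value,
              "base constructor must be invoked");
static_assert(!std::is_trivially_default_constructible<WithVirtual>::value,
              "vptr must be initialized");
static_assert(!std::is_trivially_default_constructible<WithVirtualBase>::value,
              "virtual base access must be set up");
```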
-
-
-
Page 40
Programmers new to C++ often have two common misunderstandings:
-
That a default constructor is synthesized for every class that does not define one
-
That the compiler-synthesized default constructor provides explicit default initializers for each data member declared within the class
[...] neither of these is true.
-
-
Page 45
Cases when bitwise copy semantics are not exhibited by a class:
-
When the class contains a member object of a class for which a copy constructor exists (either explicitly declared by the class designer, [...] or synthesized by the compiler [...])
-
When the class is derived from a base class for which a copy constructor exists ([...] explicitly declared or synthesized)
-
When the class declares one or more virtual functions
-
When the class is derived from an inheritance chain in which one or more base classes are virtual
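The same four cases can be probed for copy construction with `std::is_trivially_copy_constructible`. Again a sketch with hypothetical class names; "trivially copy constructible" is the Standard's closest equivalent to the book's "bitwise copy semantics".

```cpp
#include <string>
#include <type_traits>

struct Word { int count; const char *text; };      // bitwise copy is enough

struct WithMember { std::string text; };           // member has a copy constructor

struct Base { Base() {} Base(const Base &) {} };
struct Derived : Base { int count; };              // base has a copy constructor

struct WithVirtual { virtual void f() {} };        // vptr must be set, not copied

struct VBase { int count; };
struct WithVirtualBase : virtual VBase {};         // virtual base must be located

static_assert(std::is_trivially_copy_constructible<Word>::value, "");
static_assert(!std::is_trivially_copy_constructible<WithMember>::value, "");
static_assert(!std::is_trivially_copy_constructible<Derived>::value, "");
static_assert(!std::is_trivially_copy_constructible<WithVirtual>::value, "");
static_assert(!std::is_trivially_copy_constructible<WithVirtualBase>::value, "");
```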
-
-
Page 56 mentions the Named Return Value (NRV) optimization which is too complex to be described here. Page 57 shows some interesting benchmarks when NRV is applied and when it's not.
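As a rough, hand-written approximation of the idea (not what any particular compiler emits): with NRV the function builds its named local directly in storage supplied by the caller, so no copy constructor call is needed on return. `bar_transformed` and the `copies` counter are hypothetical names used only here.

```cpp
#include <new>  // placement new

struct X {
    int value;
    static int copies;  // counts copy constructor calls
    X() : value(0) {}
    X(const X &other) : value(other.value) { ++copies; }
};
int X::copies = 0;

// What the programmer writes: a named local object returned by value.
X bar() {
    X xx;
    xx.value = 1024;
    return xx;
}

// Approximation of the NRV-transformed function: the caller passes raw
// storage for the result and the "local" is constructed directly in it.
X *bar_transformed(void *result_storage) {
    X *result = new (result_storage) X;  // default constructor, in place
    result->value = 1024;                // work happens on the result itself
    return result;
}
```

Whether `bar()` itself triggers a copy depends on the compiler and its settings, which is why the benchmarks on page 57 differ so much.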
-
Page 62
You must use the member initialization list in the following cases in order for your program to compile:
-
When initializing a reference member
-
When initializing a const member
-
When invoking a base or member class constructor with a set of arguments
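A small example covering all three cases, with hypothetical classes `Named` and `Counter`. Removing any entry from the initialization list below makes the constructor ill-formed:

```cpp
#include <string>

class Named {
public:
    explicit Named(const std::string &name) : _name(name) {}
    const std::string &name() const { return _name; }

private:
    std::string _name;
};

class Counter : public Named {
public:
    Counter(const std::string &name, int &target, int step)
        : Named(name),      // base class constructor taking arguments
          _target(target),  // reference member
          _step(step)       // const member
    {}

    void bump() { _target += _step; }

private:
    int &_target;
    const int _step;
};
```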
-
-
Page 69
For an empty class X (class X {};), sizeof(X) yields 1 rather than 0, so that different objects of class X can be allocated at unique addresses in memory.
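A quick check of this for an empty class, using a hypothetical helper `distinct_addresses`. The Standard itself only requires a nonzero size, so the sketch asserts just that:

```cpp
class X {};

// An empty class still occupies storage.
static_assert(sizeof(X) >= 1, "empty class has nonzero size");

// Two distinct objects of the empty class never share an address.
bool distinct_addresses() {
    X a, b;
    return &a != &b;
}
```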
-
Page 76, 82
The Standard requires within an access section (the private, public, or protected section of a class declaration) only that the members be set down such that "later members have higher addresses within a class object" (Section 9.2 of the Standard)
However notice that members are not required to be set down contiguously as padding may intervene between them.
The actual ordering of the derived and base class parts is left unspecified by the Standard. [...] In practice, the base class members always appear first [...]
-
Page 88
The virtual mechanism affects destructors as well:
Augmentation of the destructor to reset the vptr to the associated virtual table of the class. (It is likely to have been set to address the virtual table of the derived class within the destructor of the derived class. Remember, the order of destructor calls is in reverse: derived class and then base class.)
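The observable effect of this augmentation can be seen by calling a virtual function from each destructor. In the sketch below (hypothetical classes and a hypothetical `trace` log), the call made inside `~Base()` resolves to `Base::name()` even though the object started life as a `Derived`:

```cpp
#include <string>
#include <vector>

std::vector<std::string> trace;  // records which override each destructor saw

struct Base {
    virtual ~Base() { trace.push_back(name()); }  // runs second
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    ~Derived() override { trace.push_back(name()); }  // runs first
    std::string name() const override { return "Derived"; }
};

// Creates and destroys one Derived object so both destructors run.
void destroy_one() {
    Derived d;
}
```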
-
Page 98
MetaWare and other compilers still using cfront's original implementation model solve the second problem by promoting (by copying) all nested virtual base class pointers into the derived class objects.
[...]
Microsoft's compiler introduced the virtual base class table. Each class object with one or more virtual base classes has a pointer to the virtual base class table inserted within it. The actual virtual base class pointers, of course, are placed within the table.
[...]
The second solution, and the one preferred by Bjarne [...], is to place not the address but the offset of the virtual base class within the virtual function table. [...] In the recent Sun compiler, the virtual function table is indexed by both positive and negative indices. The positive indices, as previously, index into the set of virtual functions; the negative indices retrieve the virtual base class offsets.
-
Page 106, Section 3.6, has an interesting discussion on pointers to data members. Also interesting is the pointer arithmetic trick on page 81.
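For readers who have not used them, a minimal example of a pointer to data member, with a hypothetical `Point3d` and helper `read_member`. The pointer names a member of the class and only yields an address once it is bound to an object:

```cpp
struct Point3d {
    float x, y, z;
};

// `member` selects which field to read; `p` supplies the object.
float read_member(const Point3d &p, float Point3d::*member) {
    return p.*member;
}
```

For example, `read_member(p, &Point3d::z)` reads `p.z`. In compiled code such a pointer typically shows up as a plain offset into the object.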
-
The pros and cons of placing the virtual function table pointer at the beginning or the end of a class are spread across several pages of Chapter 3.
-
vptr at the beginning of a class:
-
(+) Efficient in supporting some virtual function invocations through pointers to class members under multiple inheritance (Section 4.4)
-
(-) Loss in C language interoperability
-
(-) Breaks the natural polymorphism of single inheritance in the special case of a base class without virtual functions and a derived class with them
-
-
vptr at the end of a class:
-
(+) Preserves the object layout of the base class C struct, thus permitting its use within C code
-
(-) Its offset must be made available at run-time
-
-
-
Page 121
Recall that objects do not support polymorphism (see Section 1.3). [...] The invocation of a virtual function through a class object should always be resolved by your compiler as an ordinary nonstatic member function. [...] An additional benefit of this optimization is that an inline instance of the virtual function can be expanded, thus providing significant performance benefit.
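A short illustration with hypothetical `ZooAnimal` and `Bear` classes. A call through an object always reaches the function of the object's own class, so the compiler is free to bind it statically, while a call through a reference goes through the virtual mechanism:

```cpp
#include <string>

struct ZooAnimal {
    virtual ~ZooAnimal() {}
    virtual std::string kind() const { return "ZooAnimal"; }
};

struct Bear : ZooAnimal {
    std::string kind() const override { return "Bear"; }
};

// Passed by value: the argument is sliced to a ZooAnimal, so the call
// can only ever reach ZooAnimal::kind().
std::string through_object(ZooAnimal za) { return za.kind(); }

// Passed by reference: the dynamic type is preserved and the call is
// dispatched virtually.
std::string through_reference(const ZooAnimal &za) { return za.kind(); }
```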
-
Page 136
Interesting figure (Figure 4.2) showing virtual table layout in multiple inheritance.
-
Page 160-161
A pure virtual function may be defined and then invoked as long as it is invoked statically. The exception is a pure virtual destructor which must be defined.
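A minimal example with hypothetical classes `Abstract` and `Concrete`. The pure virtual function has a body that is reached only through an explicitly qualified (static) call, and the pure virtual destructor needs its definition because every derived destructor calls it:

```cpp
struct Abstract {
    virtual ~Abstract() = 0;            // pure virtual destructor
    virtual int interface() const = 0;  // pure virtual, yet defined below
};

// Required: derived class destructors invoke this statically.
Abstract::~Abstract() {}

// Optional: only reachable through a qualified call.
int Abstract::interface() const { return 1; }

struct Concrete : Abstract {
    int interface() const override {
        return Abstract::interface() + 41;  // static invocation
    }
};
```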
-
Page 162
Not declaring a function const means the function cannot be called by a const reference or const pointer argument - at least not without resorting to a casting away of the const.
-
Page 172-173
Interesting bullets regarding constructor augmentations under inheritance.
-
Page 177, 189
In case of virtual inheritance, constructors are augmented with a boolean variable indicating whether virtual base class constructor(s) should be invoked.
The same problem manifests itself in the case of assignment operators; however, many compilers don't try to fix it. More specifically, the Standard states:
It is unspecified whether subobjects representing virtual base classes are assigned more than once by the implicitly defined copy assignment operator (Section 12.8).
In any case this looks like a performance problem to me.
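The hidden flag is not visible at source level, but its effect is. In this sketch (a hypothetical diamond hierarchy with a construction counter), the virtual base constructor runs exactly once per complete object, because only the most derived constructor is told to invoke it:

```cpp
struct Point {
    static int constructions;  // counts Point constructor calls
    Point() { ++constructions; }
};
int Point::constructions = 0;

struct Point3d : virtual Point {};
struct Vertex : virtual Point {};
struct Vertex3d : Point3d, Vertex {};

// Builds one Vertex3d and reports how often Point::Point() ran.
int point_constructions_for_vertex3d() {
    Point::constructions = 0;
    Vertex3d v;
    (void)v;
    return Point::constructions;
}
```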
-
Page 182
The general algorithm of constructor execution is as follows:
-
Within the derived class constructor, all virtual base class and the immediate base class constructors are invoked.
-
That done, the object's vptr(s) are initialized to address the associated virtual table(s).
-
The member initialization list, if present, is expanded within the body of the constructor. This must be done after the vptr is set in the case a virtual member function is called.
-
The explicit user-supplied code is executed.
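The steps above can be made visible with a small trace. All class names and the `trace` log are hypothetical. The virtual call made from the member initialization list already reaches `Derived::name()`, which is consistent with the vptr being set before the list is expanded:

```cpp
#include <string>
#include <vector>

std::vector<std::string> trace;  // records construction steps in order

struct Logger {
    explicit Logger(const std::string &msg) { trace.push_back(msg); }
};

struct VBase {
    VBase() { trace.push_back("VBase"); }
};

struct Base : virtual VBase {
    Base() { trace.push_back("Base"); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    // The member initializer calls a virtual function of this object.
    Derived() : _member("member sees " + name()) { trace.push_back("body"); }
    std::string name() const override { return "Derived"; }
    Logger _member;
};

// Constructs one Derived and returns the recorded order of events.
std::vector<std::string> build_trace() {
    trace.clear();
    Derived d;
    return trace;
}
```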
-
-
Page 183
There are two conditions under which the vptr must be set:
-
When a complete object is being constructed. [...]
-
When, within the construction of a subobject, a virtual function call is made either directly or indirectly.
-
-
Page 198
As with constructors, current thinking on the best implementation strategy for the destructor is to maintain two destructor instances:
-
A complete object instance that always sets the vptr(s) and invokes the virtual base class destructors.
-
A base class subobject instance that never invokes the virtual base class destructors and sets the vptr(s) only if a virtual function may be invoked from within the body of the destructor.
-
-
Page 205
int v1 = 1024; int v2;
In C, v2 is a tentative definition; in C++, it is not. Both v1 and v2 are allocated in the program's data segment, and v2 is set to an initial value of 0.
-
Page 209
There are a number of drawbacks to using statically initialized objects. [...] these objects cannot be placed within try blocks.
Consequently any exception will terminate the program.
Another drawback is the complexity involved in controlling order dependency of objects that require static initialization across modules.
-
Page 212
If you have been doing C++ reversing, you have probably come across the following function for initializing arrays of objects:
void *vec_new(void *array, size_t elem_size, int elem_count, void (* constructor)(void *), void (* destructor)(void *, char));
According to the author:
More recent implementations, including Borland, Microsoft and Sun, provide two instances - one to handle classes without virtual base classes, one to handle classes containing virtual base classes, the latter usually named vec_vnew().
The corresponding function for deleting arrays of objects is the following:
void *vec_delete(void *array, size_t elem_size, int elem_count, void (* destructor)(void *, char));
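To give an idea of what to expect in a disassembly, here is a simplified sketch of what such helpers might do. It is an assumption on my part, not the code of any real runtime: the names `vec_new_sketch` and `vec_delete_sketch` are made up, the destructor argument of `vec_new()` (needed for unwinding when a constructor throws) is left out, and the destructor thunk takes no flag argument.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Constructs elem_count objects of elem_size bytes each. A null `array`
// means the storage must be allocated first (assumed behavior).
void *vec_new_sketch(void *array, std::size_t elem_size, int elem_count,
                     void (*constructor)(void *)) {
    if (array == nullptr)
        array = std::malloc(elem_size * static_cast<std::size_t>(elem_count));
    if (array == nullptr)
        return nullptr;
    char *elem = static_cast<char *>(array);
    for (int i = 0; i < elem_count; ++i, elem += elem_size)
        constructor(elem);  // one call per element, lowest address first
    return array;
}

// Destroys the elements in reverse order; freeing is left to the caller.
void vec_delete_sketch(void *array, std::size_t elem_size, int elem_count,
                       void (*destructor)(void *)) {
    char *elem = static_cast<char *>(array) + elem_size * elem_count;
    for (int i = elem_count; i > 0; --i) {
        elem -= elem_size;
        destructor(elem);
    }
}

// Hypothetical element type plus the thunks handed to the helpers.
struct Point {
    static int live;  // number of currently constructed Points
    Point() { ++live; }
    ~Point() { --live; }
};
int Point::live = 0;

void construct_point(void *p) { new (p) Point; }
void destroy_point(void *p) { static_cast<Point *>(p)->~Point(); }
```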
-
Page 220
Regarding delete[array_size] p_array and delete[] p_array:
Concern over the impact of searching for the array dimension on the performance of the delete operator led to the following compromise. The compiler searches for a dimension size only if the bracket is present. Otherwise, it assumes a single object is being deleted.
[...]
How is this caching of the element count implemented? One obvious way is to allocate an additional word of memory with each chunk of memory returned by the vec_new() operator, tucking the element count in that word (generally, the value tucked away is called a cookie). However, Jonathan and the Sun implementation chose to keep an associative array of pointer values and the array size. (Sun also tucks away the address of the destructor - see [CLAM93n].)
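The cookie approach can be sketched in a few lines. This is an illustration under my own assumptions (hypothetical function names, a header padded to `alignof(std::max_align_t)` so the array stays aligned), not the layout used by any specific compiler:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Size of the hidden header placed in front of the array.
const std::size_t kCookieSize = alignof(std::max_align_t);
static_assert(kCookieSize >= sizeof(std::size_t), "cookie must fit in header");

// Allocates the array plus header and stores the element count in front.
void *alloc_with_cookie(std::size_t elem_size, std::size_t elem_count) {
    void *raw = std::malloc(kCookieSize + elem_size * elem_count);
    if (raw == nullptr)
        return nullptr;
    std::memcpy(raw, &elem_count, sizeof(elem_count));  // write the cookie
    return static_cast<char *>(raw) + kCookieSize;       // caller sees the array
}

// Recovers the element count from the header preceding `array`.
std::size_t count_from_cookie(const void *array) {
    std::size_t count = 0;
    std::memcpy(&count, static_cast<const char *>(array) - kCookieSize,
                sizeof(count));
    return count;
}

// Releases the whole block, header included.
void free_with_cookie(void *array) {
    std::free(static_cast<char *>(array) - kCookieSize);
}
```

When reversing, a read from a small negative offset relative to an array pointer right before a destructor loop is a typical sign of such a cookie.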
-
Page 247-250
There is a distinction between the program site at which a template is defined (called in the Standard the scope of the template definition) and the program site at which a template is actually instantiated (called the scope of the template instantiation).
    // Scope of the template definition.
    extern double foo(double);

    template<class type>
    class ScopeRules
    {
    public:
        void invariant() { _member = foo(_val); }
        type type_dependent() { return foo(_member); }

    private:
        int _val;
        type _member;
    };

    // Scope of the template instantiation.
    extern int foo(int);

    ScopeRules<int> sr;
Notice that there are two invocations of foo() in this template.
The program site of the resolution of a nonmember name within a template is determined by whether the use of the name is dependent on the parameter types used to instantiate the template. If the use is not dependent, then the scope of the template declaration determines the resolution of the name. If the use is dependent, then the scope of the template instantiation determines the resolution of the name.
The first call to foo() is not type dependent and so foo(double) is used. The second is type dependent and, consequently, foo(int) is used.
-
Page 259 (1)
The recommended idiom for handling these sorts of resource management is to encapsulate the resource acquisition within a class object, the destructor of which frees the resource.
    void mumble(void *arena)
    {
        auto_ptr<Point> ph(new Point);
        SMLock sm(arena);

        // No problem now if an exception is thrown here.
        // [...]

        // No need to explicitly unlock and delete. Local destructors invoked
        // here:
        //
        // sm.SMLock::~SMLock();
        // ph.auto_ptr<Point>::~auto_ptr<Point>();
    }
-
Page 259 (2)
EH support complicates the constructors of classes with member class and base class subobjects with constructors. A class that is partially constructed must apply the destructors for only these subobjects and/or member objects that have been constructed. [...] Providing for all these contingencies is the compiler's responsibility.
Windows reverse engineers may have noticed how the MS compiler increments locals after object construction. Maybe the clause above clarifies the rationale behind this design choice.
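The guarantee quoted from the book can be observed at source level. In this sketch (hypothetical classes and a hypothetical `trace` log), the second member's constructor throws, so only the first member, which was fully constructed, gets destroyed. The third member and the enclosing object's destructor are never touched:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> trace;  // records constructor and destructor calls

struct First {
    First() { trace.push_back("First()"); }
    ~First() { trace.push_back("~First()"); }
};

struct Second {
    Second() { throw std::runtime_error("Second failed"); }
    ~Second() { trace.push_back("~Second()"); }
};

struct Third {
    Third() { trace.push_back("Third()"); }
    ~Third() { trace.push_back("~Third()"); }
};

struct Holder {
    First first;
    Second second;  // throws during construction
    Third third;    // never constructed
    ~Holder() { trace.push_back("~Holder()"); }
};

// Returns true only if a Holder could be fully constructed.
bool try_build() {
    try {
        Holder h;
        (void)h;
    } catch (const std::runtime_error &) {
        return false;
    }
    return true;
}
```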
-
Page 267
Pointer to RTTI occupies a slot in the class' virtual function table.
-
Page 269 - References Are Not Pointers
The dynamic_cast of a class pointer type provides a true/false pair of alternative pathways during program execution:
-
A return of an actual address means the dynamic type of the object is confirmed and type-dependent actions may proceed.
-
A return of 0, the universal address of no object, means alternative logic can be applied to an object of uncertain dynamic type.
The dynamic_cast operator can also be applied to a reference. The result of a non-type-safe cast, however, cannot be the same as for a pointer. Why? A reference cannot refer to "no object" the way a pointer does by having its value be set to 0.
[...]
Rather the following occurs:
-
If the reference is actually referring to the appropriate derived class or an object of a class subsequently derived from that class, the downcast is performed and the program may proceed.
-
If the reference is not actually a kind of the derived class, then because returning 0 is not viable, a bad_cast exception is thrown.
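Both pathways side by side, using hypothetical classes `Type` and `Fct`. The pointer form signals failure with a null pointer, while the reference form throws `std::bad_cast`:

```cpp
#include <typeinfo>  // std::bad_cast

struct Type {
    virtual ~Type() {}
};

struct Fct : Type {
    int arity() const { return 2; }
};

// Pointer form: a null result selects the alternative path.
int arity_via_pointer(Type *t) {
    if (Fct *f = dynamic_cast<Fct *>(t))
        return f->arity();
    return -1;
}

// Reference form: a failed cast throws, so the alternative path is a handler.
int arity_via_reference(Type &t) {
    try {
        return dynamic_cast<Fct &>(t).arity();
    } catch (const std::bad_cast &) {
        return -1;
    }
}
```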
-