C producing an executable - rFronteddu/general_wiki GitHub Wiki

The C++ compiler sequentially goes through each source code (.cpp) file in your program and does two important tasks:

  • Checks the code to make sure it follows the rules of the C++ language. If it does not, the compiler will give you an error to help pinpoint what needs fixing. The compilation process will also be aborted until the error is fixed.
  • Translates your code into machine language instructions. These instructions are stored in an intermediate file called an object file. The object file also contains other data that is required or useful in subsequent steps (such as information used for linking and debugging).

Object files are typically named name.o or name.obj, where name is the same name as the .cpp file it was produced from.

After compiling, another program called the linker combines the object files and produces the desired output file (such as an executable file). This process is called linking. If any step in the linking process fails, the linker will generate an error message describing the issue and then abort.

  • First, the linker reads in each of the object files generated by the compiler and makes sure they are valid.
  • Second, the linker ensures all cross-file dependencies are resolved properly. For example, if you define something in one .cpp file, and then use it in a different .cpp file, the linker connects the two together. If the linker is unable to connect a reference to something with its definition, you’ll get a linker error, and the linking process will abort.
  • Third, the linker typically links in one or more library files, which are collections of precompiled code that have been “packaged up” for reuse in other programs.

Finally, the linker outputs the desired output file. Typically this will be an executable file that can be launched.

Forward declarations

#include <iostream>

int main()
{
    std::cout << "The sum of 3 and 4 is: " << add(3, 4) << '\n';
    return 0;
}

int add(int x, int y)
{
    return x + y;
}

This program doesn’t compile because the compiler processes the contents of a code file sequentially: when it reaches the call to add inside main, it has not yet seen any declaration of add, so it doesn’t know what that identifier refers to.

Note that a single error commonly produces many redundant or related errors or warnings. It can be hard to tell whether an error or warning beyond the first is a consequence of the first issue or an independent issue that needs to be resolved separately. When addressing compilation errors or warnings in your programs, resolve the first issue listed and then compile again.

There are two common ways to address the issue.

  • Option 1: Reorder the function definitions
  • Option 2: Use a forward declaration

A forward declaration allows us to tell the compiler about the existence of an identifier before actually defining the identifier. In the case of functions, this allows us to tell the compiler about the existence of a function before we define the function’s body. This way, when the compiler encounters a call to the function, it’ll understand that we’re making a function call, and can check to ensure we’re calling the function correctly, even if it doesn’t yet know how or where the function is defined.

To write a forward declaration for a function, we use a function declaration statement (also called a function prototype). The function declaration consists of the function’s return type, name, and parameter types, terminated with a semicolon. The names of the parameters can be optionally included. The function body is not included in the declaration.

int add(int x, int y); 

Function declarations do not need to specify the names of the parameters, as the names are not considered part of the function declaration. The declaration above could therefore also be written as int add(int, int);.

Most often, forward declarations are used to tell the compiler about the existence of a function that has been defined in a different code file. Reordering isn’t possible in this scenario because the caller and the callee are in completely different files.

Forward declarations are also needed when two functions call each other. Reordering isn’t possible in this case either, as there is no way to order the functions such that each comes before the other. Forward declarations give us a way to resolve such circular dependencies.

If a forward declaration is made, but the function is never called, the program will compile and run fine. However, if a forward declaration is made and the function is called, but the program never defines the function, the program will compile okay, but the linker will complain that it can’t resolve the function call.

A declaration tells the compiler about the existence of an identifier and its associated type information.

A definition is a declaration that actually implements (for functions and types) or instantiates (for variables) the identifier. In C++, all definitions are declarations. Conversely, not all declarations are definitions. Declarations that aren’t definitions are called pure declarations.

The one definition rule (or ODR for short) is a well-known rule in C++. The ODR has three parts:

  • Within a file, each function, variable, type, or template in a given scope can only have one definition. Definitions occurring in different scopes (e.g. local variables defined inside different functions, or functions defined inside different namespaces) do not violate this rule.
  • Within a program, each function or variable in a given scope can only have one definition. This rule exists because programs can have more than one file. Functions and variables not visible to the linker are excluded from this rule.
  • Types, templates, inline functions, and inline variables are allowed to have duplicate definitions in different files, so long as each definition is identical.

Violating part 1 of the ODR will cause the compiler to issue a redefinition error. Violating ODR part 2 will cause the linker to issue a redefinition error. Violating ODR part 3 will cause undefined behavior.

Multiple code files

If we want to use a function that is defined in another file, we need to forward declare it in the file that calls it. The declaration lets the compiler accept the call; the linker then connects the call to the definition after all the files have been compiled (individually and separately).

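The compiler’s limited visibility (it sees only the file it is currently compiling) and short memory (it does not remember what it saw in previously compiled files) are intentional, for a few reasons: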

  • It allows the source files of a project to be compiled in any order.
  • When we change a source file, only that source file needs to be recompiled.
  • It reduces the possibility of naming conflicts between identifiers in different files.

Our options for a solution here are the same as before: place the definition of function add before function main, or satisfy the compiler with a forward declaration. When add lives in another file, reordering isn’t possible, so the forward declaration is the only option. Using this method, we can give files access to functions that live in another file.

When an identifier is used in an expression, the identifier must be connected to its definition.

  • If the compiler has seen neither a forward declaration nor a definition for the identifier in the file being compiled, it will error at the point where the identifier is used.
  • Otherwise, if a definition exists in the same file, the compiler will connect the use of the identifier to its definition.
  • Otherwise, if a definition exists in a different file (and is visible to the linker), the linker will connect the use of the identifier to its definition.
  • Otherwise, the linker will issue an error indicating that it couldn’t find a definition for the identifier.

C++ is designed so that each source file can be compiled independently, with no knowledge of what is in other files. Therefore, the order in which files are actually compiled should not be relevant.

Angled brackets vs double quotes

When we use angled brackets, we’re telling the preprocessor that this is a header file we didn’t write ourselves. The preprocessor will search for the header only in the directories specified by the include directories. The include directories are configured as part of your project/IDE settings/compiler settings, and typically default to the directories containing the header files that come with your compiler and/or OS. The preprocessor will not search for the header file in your project’s source code directory.

When we use double-quotes, we’re telling the preprocessor that this is a header file that we wrote. The preprocessor will first search for the header file in the current directory. If it can’t find a matching header there, it will then search the include directories.

When C++ was first created, all of the headers in the standard library ended in a .h suffix.

When the language was standardized by the ANSI committee, they decided to move all of the names used in the standard library into the std namespace to help avoid naming conflicts with user-declared identifiers. However, this presented a problem: if they moved all the names into the std namespace, none of the old programs (that included iostream.h) would work anymore!

To work around this issue, C++ introduced new header files that lack the .h extension. These new header files declare all names inside the std namespace. This way, older programs that #include <iostream.h> do not need to be rewritten, and newer programs can #include <iostream>.

Another common question involves how to include header files from other directories.

One (bad) way to do this is to include a relative path to the header file you want to include as part of the #include line. A better method is to tell your compiler or IDE that you have a bunch of header files in some other location, so that it will look there when it can’t find them in the current directory. This can generally be done by setting an include path or search directory in your IDE project settings (for example, with gcc you use the -I option to specify alternate include directories).

The nice thing about this approach is that if you ever change your directory structure, you only have to change a single compiler or IDE setting instead of every code file.

When your source (.cpp) file #includes a header file, you’ll also get any other header files that are #included by that header (and any header files those include, and so on). These additional header files are sometimes called transitive includes, as they’re included implicitly rather than explicitly. However, you generally should not rely on the content of headers that are included transitively. The implementation of header files may change over time, or be different across different systems.

To maximize the chance that missing includes will be flagged by the compiler, order your #includes as follows:

  • The paired header file for this code file (e.g. add.cpp should #include "add.h")
  • Other headers from the same project (e.g. #include "mymath.h")
  • 3rd party library headers (e.g. #include <boost/tuple/tuple.hpp>)
  • Standard library headers (e.g. #include <iostream>)

Naming collisions and an introduction to namespaces

C++ requires that all identifiers be non-ambiguous. If two identical identifiers are introduced into the same program in a way that the compiler or linker can’t tell them apart, the compiler or linker will produce an error. This error is generally referred to as a naming collision (or naming conflict).

If the colliding identifiers are introduced into the same file, the result will be a compiler error. If the colliding identifiers are introduced into separate files belonging to the same program, the result will be a linker error.

Most naming collisions occur in two cases:

  • Two (or more) identically named functions (or global variables) are introduced into separate files belonging to the same program. This will result in a linker error.
  • Two (or more) identically named functions (or global variables) are introduced into the same file. This will result in a compiler error.

A scope region is an area of source code where all declared identifiers are considered distinct from names declared in other scopes. Two identifiers with the same name can be declared in separate scope regions without causing a naming conflict. However, within a given scope region, all identifiers must be unique, otherwise a naming collision will result.

A namespace provides another type of scope region (called namespace scope) that allows you to declare or define names inside of it for the purpose of disambiguation. The names declared in a namespace are isolated from names declared in other scopes, allowing such names to exist without conflict.

A namespace may only contain declarations and definitions. Executable statements are only allowed as part of a definition (e.g. of a function).

In C++, any name that is not defined inside a class, function, or a namespace is considered to be part of an implicitly-defined namespace called the global namespace (sometimes also called the global scope).

  • Identifiers declared inside the global scope are in scope from the point of declaration to the end of the file.
  • Although variables can be defined in the global namespace, this should generally be avoided.

When you use an identifier that is defined inside a non-global namespace (e.g. the std namespace), you need to tell the compiler that the identifier lives inside the namespace. There are a few different ways to do this.

  • Explicit namespace qualifier (for example std::). The :: symbol is an operator called the scope resolution operator. If no identifier to the left of the :: symbol is provided, the global namespace is assumed. When an identifier includes a namespace prefix, the identifier is called a qualified name.
  • Using-directive. Another way to access identifiers inside a namespace is to use a using-directive. A using-directive allows us to access the names in a namespace without using a namespace prefix. Avoid using-directives (such as using namespace std;) at the top of your program or in header files. They defeat the purpose for which namespaces were added in the first place.