CC‐Structure of a program - rFronteddu/general_wiki GitHub Wiki
- Statements: A computer program is a sequence of instructions that tell the computer what to do. A statement is a type of instruction that causes the program to perform some action. Statements are the smallest independent unit of computation in the C++ language. Most (but not all) statements in C++ end in a semicolon.
- Function: In C++, statements are typically grouped into units called functions. A function is a collection of statements that get executed sequentially (in order, from top to bottom).
Every C++ program must have a special function named main (all lower case letters). When the program is run, the statements inside of main are executed in sequential order. Programs typically terminate (finish running) after the last statement inside function main has been executed (though programs may abort early in some circumstances, or do some cleanup afterwards).
- Identifier: The name of a function (or of an object, type, or other kind of item) is called its identifier.
- Data value: A single piece of data, such as a number (5, -6), a character between single quotes ('a', 'H'), or text between double quotes ("hello", "H").
- Literals: Values that are placed directly into the source code. Literals are read-only values.
When we run a program, the OS loads the program into RAM. Any data that is hardcoded into the program itself is loaded at this point. The OS also reserves some RAM for the program to use while running (for example, to store values entered by the user or data read from the network) so that those values can be used again later.
- Objects and variables: In C++, direct memory access is discouraged. Instead, we access memory indirectly through an object. An object represents a region of storage (typically RAM or a CPU register) that can hold a value. Objects also have associated properties. By using objects, we can avoid worrying about where in memory those objects are placed. Objects can be unnamed (anonymous), but often they are named using an identifier. An object with a name is called a variable.
- Definition: To use a variable, we need to tell the compiler we want one through a declaration statement called a definition (for example: int x;). At compile time, the compiler makes a note that we want a variable with a certain identifier, and whenever we use that identifier later on, the compiler knows which variable to retrieve or update. A variable created via a definition statement is said to be defined at the point where the definition statement is placed.
- Allocation: At runtime each object is given an actual storage location that it can use to store values. The process of reserving storage for an object's use is called allocation.
- Data Type: Determines what kind of value the object will store. In C++ the type of an object must be known at compile-time and cannot be changed without recompiling the program (so the compiler knows how much memory that object requires).
- Assignment: After a variable has been defined, you can give it a value using the = operator. This process is called assignment, = is the assignment operator. By default, assignment copies the value of the right-hand side of the = operator to the variable on the left-hand side of the operator. This is called copy-assignment.
- Initialization: The process of specifying an initial value for an object. The syntax used to initialize an object is called an initializer. Informally, the initial value is often called an initializer as well. For example in int width { 5 }; width is the variable, { 5 } is the initializer, and 5 is the initial value.
Initialization is not straightforward. Here are 5 common forms of initialization:
int a; // default-initialization (no initializer)
// traditional initialization forms:
int b = 5; // copy-initialization
int c ( 6 ); // direct-initialization
// modern initialization
int d { 7 }; // direct-list-initialization
int e {}; // value-initialization
As of C++17, copy-initialization, direct-initialization, and direct-list-initialization behave identically in most cases.
Instantiation
The term instantiation is a fancy word that means a variable has been created (allocated) and initialized (this includes default initialization). An instantiated object is sometimes called an instance. Most often, this term is applied to class type objects, but it is occasionally applied to objects of other types as well.
Uninitialized variables and Undefined Behavior
Unlike some programming languages, C/C++ does not automatically initialize most variables to a given value. When a variable that is not initialized is given a memory address to use to store data, the default value of that variable is whatever (garbage) value happens to already be in that memory address! A variable that has not been given a known value (through initialization or assignment) is called an uninitialized variable.
To recap:
- Initialized = The object is given a known value at the point of definition.
- Assignment = The object is given a known value beyond the point of definition.
- Uninitialized = The object has not been given a known value yet.
Using the values of uninitialized variables can lead to unexpected results. Some compilers, such as Visual Studio, will initialize the contents of memory to some preset value when you’re using a debug build configuration. This will not happen when using a release build configuration. Most modern compilers will attempt to detect if a variable is being used without being given a value. If they are able to detect this, they will generally issue a compile-time warning or error.
Using the value from an uninitialized variable is our first example of undefined behavior. Undefined behavior (often abbreviated UB) is the result of executing code whose behavior is not well-defined by the C++ language.
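Implementation-defined behavior means that the behavior of some syntax is left up to the implementation (the compiler and its standard library) to define. Such behavior must be documented and must be consistent for a given implementation. Unspecified behavior is almost identical to implementation-defined behavior in that the behavior is left up to the implementation to define, but the implementation is not required to document the behavior. We generally want to avoid implementation-defined and unspecified behavior, as it means our program may not work as expected if compiled on a different compiler (or even on the same compiler if we change project settings that affect how the implementation behaves!)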
Keywords and naming identifiers
C++ reserves a set of 92 words (as of C++23) for its own use. These words are called keywords (or reserved words), and each of these keywords has a special meaning within the C++ language.
C++ also defines special identifiers: override, final, import, and module. These have a specific meaning when used in certain contexts but are not reserved otherwise.
You have already run across some of these keywords, including int and return. Along with a set of operators, these keywords and special identifiers define the entire language of C++ (preprocessor commands excluded).
Identifier Naming Rules
As a reminder, the name of a variable (or function, type, or other kind of item) is called an identifier. C++ gives you a lot of flexibility to name identifiers as you wish. However, there are a few rules that must be followed when naming identifiers:
- The identifier can not be a keyword. Keywords are reserved.
- The identifier can only be composed of letters (lower or upper case), numbers, and the underscore character. That means the name can not contain symbols (except the underscore) nor whitespace (spaces or tabs).
- The identifier must begin with a letter (lower or upper case) or an underscore. It can not start with a number.
- C++ is case sensitive, and thus distinguishes between lower and upper case letters. nvalue is different than nValue is different than NVALUE.
Best practices:
- It is conventional in C++ that variable names should begin with a lowercase letter. If the variable name is a single word or acronym, the whole thing should be written in lowercase letters. Most often, function names are also started with a lowercase letter. Identifier names that start with a capital letter are typically used for user-defined types (such as structs, classes, and enumerations, all of which we will cover later).
- If the variable or function name is multi-word, there are two common conventions: words separated by underscores (sometimes called snake_case), or intercapped (sometimes called camelCase, since the capital letters stick up like the humps on a camel).
- It’s worth noting that if you’re working in someone else’s code, it’s generally considered better to match the style of the code you are working in than to rigidly follow the naming conventions laid out above.
- Avoid naming your identifiers starting with an underscore. Although syntactically legal, these names are typically reserved for OS, library, and/or compiler use.
- The name of your identifiers should make clear what the value they are holding means (particularly if the units aren’t obvious). Identifiers should be named in a way that would help someone who has no idea what your code does be able to figure it out as quickly as possible.
- A good rule of thumb is to make the length of an identifier proportional to how specific and accessible the identifier is:
  - An identifier that exists for only a few statements (e.g. in the body of a short function) can have a shorter name.
  - An identifier that is accessible from anywhere might benefit from a longer name.
  - An identifier that represents a non-specific number (e.g. anything the user provides) can have a shorter name.
  - An identifier that represents a specific value (e.g. the length of an inseam in millimeters) should have a longer name.
- Avoid abbreviations, except when they are common and unambiguous.
- For variable declarations, it can be useful to use a comment to describe what a variable is going to be used for, or to explain anything else that might not be obvious.
Literals and operators
std::cout << "Hello world!";
int x { 5 };
- Literal: "Hello world!" and 5 are literals. A literal (also known as a literal constant) is a fixed value that has been inserted directly into the source code. Literals and variables both have a value (and a type). Unlike a variable (whose value can be set and changed through initialization and assignment respectively), the value of a literal is fixed and cannot be changed. The literal 5 always has value 5. This is why literals are called constants. A literal’s value is placed directly in the executable, and the executable itself can’t be changed after it is created. A variable’s value is placed in memory, and the value of memory can be changed while the executable is running.
- Operators: In mathematics, an operation is a process involving zero or more input values (called operands) that produces a new value (called an output value). The specific operation to be performed is denoted by a symbol called an operator. For operators that are symbols, it is common to append the operator’s symbol to the word operator. For example, the plus operator would be written operator+, and the extraction operator would be written operator>>. The number of operands that an operator takes as input is called the operator’s arity. Note that some operators have more than one meaning depending on how they are used. For example, operator- has two contexts.
Operators in C++ come in four different arities:
- Unary operators act on one operand. An example of a unary operator is the - operator. For example, given -5, operator- takes literal operand 5 and flips its sign to produce new output value -5.
- Binary operators act on two operands (often called left and right, as the left operand appears on the left side of the operator, and the right operand appears on the right side of the operator). An example of a binary operator is the + operator. For example, given 3 + 4, operator+ takes the left operand 3 and the right operand 4 and applies mathematical addition to produce new output value 7.
- Ternary operators act on three operands. There is only one of these in C++ (the conditional operator).
- Nullary operators act on zero operands. There is also only one of these in C++ (the throw operator).
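Arithmetic operators execute in the same order as they do in standard mathematics: Parentheses first, then Exponents, then Multiplication & Division, then Addition & Subtraction. This is PEMDAS, sometimes expanded to the mnemonic “Please Excuse My Dear Aunt Sally”. Note that C++ has no exponentiation operator (the ^ symbol is bitwise XOR), so the Exponents step does not correspond to any C++ operator.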
- Side Effects: Most operators in C++ just use their operands to calculate a return value. For example, -5 produces return value -5 and 2 + 3 produces return value 5. There are a few operators that do not produce return values (such as delete and throw). Some operators have additional behaviors. An operator (or function) that has some observable effect beyond producing a return value is said to have a side effect.
Expressions
In general programming, an expression is a non-empty sequence of literals, variables, operators, and function calls that calculates a value. The process of executing an expression is called evaluation, and the resulting value produced is called the result of the expression (also sometimes called the return value).
In C++, the result of an expression is one of the following:
- A value
- An object or function
- Nothing
For example, you can chain std::cout calls because std::cout << x evaluates to std::cout. Similarly x = 2 + 3 evaluates to x.
int x { 2 + 3 };
Can be seen as
type identifier { expression };
Expressions cannot be compiled by themselves, but an expression can be converted to an **expression statement** by following it with a semicolon. For example, x = 5 is an expression, and x = 5; is an expression statement that can be compiled.
A **subexpression** is an expression used as an operand. A **full expression** is an expression that is not a subexpression. A **compound expression** is an expression that contains two or more uses of operators (for example x = 4 + 5 because it contains two operators).
Statements are used when we want the program to perform an action. Expressions are used when we want the program to calculate a value.
Functions
A function is a reusable sequence of statements designed to do a particular job. A program will be executing statements sequentially inside one function when it encounters a function call. A function call tells the CPU to interrupt the current function and execute another function. The CPU essentially “puts a bookmark” at the current point of execution, executes the function named in the function call, and then returns to the point it bookmarked and resumes execution.
returnType functionName() // This is the function header (tells the compiler about the existence of the function)
{
// This is the function body (tells the compiler what the function does)
}
When the return statement is executed:
- The return expression is evaluated to produce a value.
- The value produced by the return expression is copied back to the caller. This copy is called the return value of the function.
- The function exits, and control returns to the caller.
The process of returning a copied value back to the caller is named return by value.
A value-returning function that does not return a value will produce undefined behavior. The only exception to the rule that a value-returning function must return a value via a return statement is for function main(). The function main() will implicitly return the value 0 if no return statement is provided.
A function parameter is a variable used in the header of a function. Function parameters work almost identically to variables defined inside the function, but with one difference: they are initialized with a value provided by the caller of the function. An argument is a value that is passed from the caller to the function when a function call is made.
When a function is called, all of the parameters of the function are created as variables, and the value of each of the arguments is copied into the matching parameter (using copy initialization). This process is called pass by value. Function parameters that utilize pass by value are called value parameters.
In certain cases, you will encounter functions that have parameters that are not used in the body of the function. These are called unreferenced parameters.
A parameter without a name is called an unnamed parameter.
Variables defined inside the body of a function are called local variables. Function parameters are also generally considered to be local variables. Local variables are destroyed in the opposite order of creation at the end of the set of curly braces in which they are defined (or for a function parameter, at the end of the function). An object’s lifetime is defined to be the time between its creation and destruction. Objects may be created earlier, or destroyed later, for optimization purposes. Most often, local variables are created when the function is entered, and destroyed in the opposite order of creation when the function is exited. Any use of an object after it has been destroyed will result in undefined behavior. At some point after destruction, the memory used by the object will be deallocated (freed up for reuse).
An identifier’s scope determines where the identifier can be seen and used within the source code. When an identifier can be seen and used, we say it is in scope. When an identifier can not be seen, we can not use it, and we say it is out of scope. Scope is a compile-time property, and trying to use an identifier when it is not in scope will result in a compile error. The identifier of a local variable has local scope. An identifier with local scope (technically called block scope) is usable from the point of definition to the end of the innermost pair of curly braces containing the identifier (or for function parameters, at the end of the function). This ensures local variables cannot be used before the point of definition (even if the compiler opts to create them before then) or after they are destroyed. Local variables defined in one function are also not in scope in other functions that are called. An identifier is out of scope anywhere it cannot be accessed within the code. We say an object goes out of scope at the end of the scope (the end curly brace) in which the object was instantiated.
Temporary Objects
A temporary object (also sometimes called an anonymous object) is an unnamed object that is used to hold a value that is only needed for a short period of time. Temporary objects are generated by the compiler when they are needed. Return by value returns a temporary object (that holds a copy of the return value) to the caller. Temporary objects have no scope and no identifier. They are destroyed at the end of the full expression in which they are created, always before the next statement executes.
In modern C++ (especially since C++17), the compiler has many tricks to avoid generating temporaries where previously it would have needed to. For example, when we use a return value to initialize a variable, this would normally result in the creation of a temporary holding the return value, and then using the temporary to initialize the variable. However, in modern C++, the compiler will often skip creating the temporary and just initialize the variable directly with the return value.
Header Files
As programs grow larger (and make use of more files and functions), having to manually add a large number of (possibly different) forward declarations to the top of each file becomes extremely tedious.
Header files usually have a .h extension, but you will occasionally see them with a .hpp extension or no extension at all. Conventionally, header files are used to propagate a bunch of related forward declarations into a code file. Header files allow us to put declarations in one place and then import them wherever we need them. This can save a lot of typing in multi-file programs.
When you #include a file, the content of the included file is inserted at the point of inclusion. This provides a useful way to pull in declarations from another file.
Header files typically consist of only two parts:
- A header guard.
- The actual content of the header file, which should be the forward declarations for all of the identifiers we want other files to be able to see.
Unlike source files, header files should not be added to your compile command (they are implicitly included by #include statements and compiled as part of your source files). If a header file is paired with a code file (e.g. add.h with add.cpp), they should both have the same base name (add). In C++, it is a best practice for code files to #include their paired header file (if one exists). This allows the compiler to catch certain kinds of errors, such as a mismatched return type, at compile time instead of link time, which saves a lot of time. Unfortunately, this does not catch a mismatched parameter type: because C++ supports overloaded functions (functions with the same name but different parameter types), the compiler will assume a function with a mismatched parameter type is a different overload.
Although the preprocessor will happily do so, you should generally not #include .cpp files. These should be added to your project and compiled.
There are a number of reasons for this:
- Doing so can cause naming collisions between source files.
- In a large project it can be hard to avoid one-definition rule (ODR) issues.
- Any change to such a .cpp file will cause both the .cpp file and any other .cpp file that includes it to recompile, which can take a long time. Headers tend to change less often than source files.
- It is non-conventional to do so.
Including definitions in a header file results in a violation of the ODR
Avoid putting function or variable definitions in header files. Doing so will generally result in a violation of the one-definition rule (ODR) in cases where the header file is included into more than one source file. The linker will see the repeated definition and trigger an error. An exception to this are inline constructs and templates.
Header Guards
It’s quite easy to end up in a situation where a definition in a header file gets included more than once. This can happen when a header file #includes another header file (which is common). We can avoid the above problem via a mechanism called a header guard (also called an include guard). Header guards are conditional compilation directives that take the following form:
#ifndef SOME_UNIQUE_NAME_HERE
#define SOME_UNIQUE_NAME_HERE
// your declarations (and certain types of definitions) here
#endif
When this header is #included, the preprocessor will check whether SOME_UNIQUE_NAME_HERE has been previously defined in this translation unit. If this is the first time we’re including the header, SOME_UNIQUE_NAME_HERE will not have been defined. Consequently, it #defines SOME_UNIQUE_NAME_HERE and includes the contents of the file. If the header is included again into the same file, SOME_UNIQUE_NAME_HERE will already have been defined from the first time the contents of the header were included, and the contents of the header will be ignored (thanks to the #ifndef).
All of your header files should have header guards on them. SOME_UNIQUE_NAME_HERE can be any name you want, but by convention is set to the full filename of the header file, typed in all caps, using underscores for spaces or punctuation.
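Two headers (for example, from different directories or libraries) can end up with the same filename and therefore the same guard name. If both are included into one file, the contents of the second will be silently skipped. Because of this possibility for guard name conflicts, many developers recommend using a more complex/unique name in your header guards. Some good suggestions are a naming convention of PROJECT_PATH_FILE_H, FILE_LARGE-RANDOM-NUMBER_H, or FILE_CREATION-DATE_H.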
Header guards prevent duplicate inclusions because the first time a guard is encountered, the guard macro isn’t defined, so the guarded content is included. Past that point, the guard macro is defined, so any subsequent copies of the guarded content are excluded.
Note that the goal of header guards is to prevent a code file from receiving more than one copy of a guarded header. By design, header guards do not prevent a given header file from being included (once) into separate code files.
Modern compilers support a simpler, alternate form of header guards using the #pragma preprocessor directive:
#pragma once
// your code here
There is one known case where #pragma once will typically fail. If a header file is copied so that it exists in multiple places on the file system, and both copies somehow get included, header guards will successfully de-dupe the identical headers, but #pragma once won’t (because the compiler won’t realize they are actually identical content).
With the exception of #pragma once, do not expect a pragma that works on one compiler to be supported by another.
Header guards are designed to ensure that the contents of a given header file are not copied more than once into any single file, in order to prevent duplicate definitions.
Duplicate declarations are fine -- but even if your header file is composed of all declarations (no definitions) it’s still a best practice to include header guards.
Note that header guards do not prevent the contents of a header file from being copied (once) into separate project files. This is a good thing, because we often need to reference the contents of a given header from different project files.
The Call Stack
When your program calls a function, you already know that it bookmarks the current location, makes the function call, and then returns. How does it know where to return to? The answer is that it keeps track in the call stack.
The call stack is a list of all the active functions that have been called to get to the current point of execution. The call stack includes an entry for each function called, as well as which line of code will be returned to when the function returns. Whenever a new function is called, that function is added to the top of the call stack. When the current function returns to the caller, it is removed from the top of the call stack, and control returns to the function just below it.
The call stack window is a debugger window that shows the current call stack. If you don’t see the call stack window, you will need to tell the IDE to show it.