CPP‐Fundamental Data Types - rFronteddu/general_wiki GitHub Wiki
Variables are names for a piece of memory that can be used to store information. When a variable is defined, a piece of RAM is set aside for that variable.
- The smallest unit of memory is a binary digit (also called a bit).
- Memory is organized into sequential units called memory addresses (or addresses for short).
- Each memory address holds 1 byte of data. A byte is a group of bits that are operated on as a unit. The modern standard is that a byte is comprised of 8 sequential bits.
- We use a data type (often called a type for short) to tell the compiler how to interpret the contents of memory in some meaningful way.
When you give an object a value, the compiler and CPU take care of encoding your value into the appropriate sequence of bits for that data type, which are then stored in memory (remember: memory can only store bits). For example, if you assign an integer object the value 65, that value is converted to the sequence of bits 0100 0001 and stored in the memory assigned to the object. Conversely, when the object is evaluated to produce a value, that sequence of bits is reconstituted back into the original value. Meaning that 0100 0001 is converted back into the value 65.
The C++ language comes with many predefined data types available for your use. The most basic of these types are called the fundamental data types (informally sometimes called basic types or primitive types).
- Floating point (numbers with a fractional part): float, double, long double
- Integral types: bool, char, wchar_t, char8_t, char16_t, char32_t, short int, int, long int, long long int
- Null pointer: std::nullptr_t
- void
The C++ standard defines the following terms:
- The standard integer types are short, int, long, long long (including their signed and unsigned variants). The integral types are bool, the various char types, and the standard integer types.
All integral types are stored in memory as integer values, but only the standard integer types will display as an integer value when output.
Void is our first example of an incomplete type. An incomplete type is a type that has been declared but not yet defined. The compiler knows about the existence of such types, but does not have enough information to determine how much memory to allocate for objects of that type. void is intentionally incomplete since it represents the lack of a type, and thus cannot be defined. Incomplete types cannot be instantiated.
Most objects take up more than 1 byte of memory; how many bytes depends on their data type. When we access some variable x in our source code, the compiler knows how many bytes of data need to be retrieved (based on the type of variable x), and will output the appropriate machine language code to handle that detail for us.
The C++ standard does not define the exact size (in bits) of any of the fundamental types. Instead, the standard says the following:
- An object must occupy at least 1 byte (so that each object has a distinct memory address).
- A byte must be at least 8 bits.
- The integral types char, short, int, long, and long long have a minimum size of 8, 16, 16, 32, and 64 bits respectively.
- char and char8_t are exactly 1 byte (at least 8 bits).
In order to determine the size of data types on a particular machine, C++ provides an operator named sizeof. The sizeof operator is a unary operator that takes either a type or a variable, and returns the size of an object of that type (in bytes).
```cpp
#include <climits>  // for CHAR_BIT
#include <iomanip>  // for std::setw (which sets the width of the subsequent output)
#include <iostream>

int main()
{
    std::cout << "A byte is " << CHAR_BIT << " bits\n\n";

    std::cout << std::left; // left justify output

    std::cout << std::setw(16) << "bool:" << sizeof(bool) << " bytes\n";
    std::cout << std::setw(16) << "char:" << sizeof(char) << " bytes\n";
    std::cout << std::setw(16) << "short:" << sizeof(short) << " bytes\n";
    std::cout << std::setw(16) << "int:" << sizeof(int) << " bytes\n";
    std::cout << std::setw(16) << "long:" << sizeof(long) << " bytes\n";
    std::cout << std::setw(16) << "long long:" << sizeof(long long) << " bytes\n";
    std::cout << std::setw(16) << "float:" << sizeof(float) << " bytes\n";
    std::cout << std::setw(16) << "double:" << sizeof(double) << " bytes\n";
    std::cout << std::setw(16) << "long double:" << sizeof(long double) << " bytes\n";

    return 0;
}
```
Trying to use sizeof on an incomplete type (such as void) will result in a compilation error. You can also use the sizeof operator on a variable name. sizeof does not include dynamically allocated memory used by an object.
On modern machines, objects of the fundamental data types are fast, so performance while using or copying these types should generally not be a concern. CPUs are often optimized to process data of a certain size (e.g. 32 bits), and types that match that size may be processed quicker. On such a machine, a 32-bit int could be faster than a 16-bit short or an 8-bit char.
For signed types, prefer the shorthand names that omit the redundant int suffix (e.g. prefer short over short int) and the redundant signed prefix (e.g. prefer short over signed short).
We call the set of specific values that a data type can hold its range. The range of an integer variable is determined by two factors: its size (in bits), and whether it is signed or not. For example, an 8-bit signed integer has a range of -128 to 127. This means an 8-bit signed integer can store any integer value between -128 and 127 (inclusive). In general, an n-bit signed variable has a range of -(2^(n-1)) to 2^(n-1) - 1.
The above ranges assume “two’s complement” binary representation. This representation is the de-facto standard for modern architectures (as it is easier to implement in hardware), and is now required by the C++20 standard.
The C++20 standard makes this blanket statement: “If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined”. Colloquially, this is called overflow. Therefore, assigning the value 140 to an 8-bit signed integer results in undefined behavior.
If an arithmetic operation (such as addition or multiplication) attempts to create a value outside the range that can be represented, this is called integer overflow (or arithmetic overflow). For signed integers, integer overflow will result in undefined behavior.
When doing division with two integers (called integer division), C++ always produces an integer result. Since integers can’t hold fractional values, any fractional portion is simply dropped (not rounded!).
C++ also supports unsigned integers, which can only hold non-negative whole numbers. A 1-byte unsigned integer has a range of 0 to 255. Compare this to the 1-byte signed integer range of -128 to 127. Both can store 256 different values, but signed integers use half of their range for negative numbers, whereas unsigned integers can store positive numbers that are twice as large. When no negative numbers are required, unsigned integers are well-suited for networking and systems with little memory, because they can store larger positive numbers without taking up extra memory.

If an unsigned value is out of range, it is divided by one greater than the largest number of the type, and only the remainder is kept. For example, the number 280 is too big to fit in our 1-byte range of 0 to 255. One greater than the largest number of the type is 256, so we divide 280 by 256, getting 1 remainder 24. The remainder, 24, is what is stored. This behavior is called wrap-around (sometimes called modulo wrapping).

It's possible to wrap around in the other direction as well. 0 is representable in a 2-byte unsigned integer, so that's fine. -1 is not representable, so it wraps around to the top of the range, producing the value 65535. -2 wraps around to 65534. And so forth.
Many developers (and some large development houses, such as Google) believe that developers should generally avoid unsigned integers.
This is largely because of two behaviors that can cause problems.
First, with signed values, it takes a little work to accidentally overflow the top or bottom of the range because those values are far from 0. With unsigned numbers, it is much easier to overflow the bottom of the range, because the bottom of the range is 0, which is close to where the majority of our values are.
Another common unwanted wrap-around happens when an unsigned integer is repeatedly decremented by 1, until it tries to decrement to a negative number.
Second, and more insidiously, unexpected behavior can result when you mix signed and unsigned integers. In C++, if a mathematical operation (e.g. arithmetic or comparison) has one signed integer and one unsigned integer, the signed integer will usually be converted to an unsigned integer. And the result will thus be unsigned.
The fixed-width integers are defined in the <cstdint> header as follows: std::int8_t/std::uint8_t, std::int16_t/std::uint16_t, std::int32_t/std::uint32_t, and std::int64_t/std::uint64_t.
Due to an oversight in the C++ specification, modern compilers typically treat std::int8_t and std::uint8_t (and the corresponding fast and least fixed-width types, which we’ll introduce in a moment) the same as signed char and unsigned char respectively.
The fixed-width integers actually don’t define new types -- they are just aliases for existing integral types with the desired size. For each fixed-width type, the implementation (the compiler and standard library) gets to determine which existing type is aliased. As an example, on a platform where int is 32-bits, std::int32_t will be an alias for int. On a system where int is 16-bits (and long is 32-bits), std::int32_t will be an alias for long instead.
So what about the 8-bit fixed-width types?
In most cases, std::int8_t is an alias for signed char because it is the only available 8-bit signed integral type (bool and char are not considered to be signed integral types). And when this is the case, std::int8_t will behave just like a char on that platform.
However, in rare cases, if a platform has an implementation-specific 8-bit signed integral type, the implementation may decide to make std::int8_t an alias for that type instead. In that case, std::int8_t will behave like that type, which may be more like an int than a char.
std::uint8_t behaves similarly.
The fixed-width integers are not guaranteed to be defined on all architectures.
If you use a fixed-width integer, it may be slower than a wider type on some architectures. For example, if you need an integer that is guaranteed to be 32-bits, you might decide to use std::int32_t, but your CPU might actually be faster at processing 64-bit integers. However, just because your CPU can process a given type faster doesn’t mean your program will be faster overall -- modern programs are often constrained by memory usage rather than CPU, and the larger memory footprint may slow your program more than the faster CPU processing accelerates it. It’s hard to know without actually measuring.
sizeof returns a value of type std::size_t. std::size_t is an alias for an implementation-defined unsigned integral type. In other words, the compiler decides if std::size_t is an unsigned int, an unsigned long, an unsigned long long, etc. std::size_t is actually a typedef.
std::size_t is defined in a number of different headers. If you need to use std::size_t, <cstddef> is the best header to include, as it contains the fewest other defined identifiers. Much like an integer can vary in size depending on the system, std::size_t also varies in size. std::size_t is guaranteed to be unsigned and at least 16 bits, but on most systems will be equivalent to the address-width of the application. That is, for 32-bit applications, std::size_t will typically be a 32-bit unsigned integer, and for a 64-bit application, std::size_t will typically be a 64-bit unsigned integer.
The sizeof operator must be able to return the byte-size of an object as a value of type std::size_t. Therefore, the byte-size of an object can be no larger than the largest value std::size_t can hold. If it were possible to create a larger object, sizeof would not be able to return its byte-size, as it would be outside the range that a std::size_t could hold. Thus, creating an object with a size (in bytes) larger than the largest value an object of type std::size_t can hold is invalid (and will cause a compile error). The size of std::size_t imposes a strict mathematical upper limit on an object’s size. In practice, the largest creatable object may be smaller than this amount (perhaps significantly so).
Some compilers limit the largest creatable object to half the maximum value of std::size_t (an explanation for this can be found here).
Other factors may also play a role, such as how much contiguous memory your computer has available for allocation.
When 8-bit and 16-bit applications were the norm, this limit imposed a significant constraint on the size of objects. In the 32-bit and 64-bit era, this is rarely an issue, and therefore not something you generally need to worry about.
The char data type was designed to hold a single character. A character can be a single letter, number, symbol, or whitespace. The char data type is an integral type, meaning the underlying value is stored as an integer. Similar to how a Boolean value 0 is interpreted as false and non-zero is interpreted as true, the integer stored by a char variable is interpreted as an ASCII character.
ASCII stands for American Standard Code for Information Interchange, and it defines a particular way to represent English characters (plus a few other symbols) as numbers between 0 and 127 (called an ASCII code or code point). For example, ASCII code 97 is interpreted as the character ‘a’.
Character literals are always placed between single quotes (e.g. 'g', '1', ' ').