Basic Data Type - eecse4750/e4750_2024Fall_students_repo GitHub Wiki
Introduction
In the C
programming language, data types are declarations for memory locations or variables that determine the characteristics of the data that may be stored and the methods (operations) of processing that are permitted involving them.
Data types are important. Arithmetic operations usually require the operands to share the same data type to produce a meaningful result. In Python, programmers tend to ignore data types since compiler will convert the data type for them, and this also happens in language C.
However, data types are extremely important in GPU parallel computing because compilers will not convert data types when one transmits data between CPU and GPU. One have to carefully check that the data declarations in both C and Python conform to each other, otherwise data will be lost and thread/block indexes will be in a mess.
Built-in Data Types
We care built-in data types most since in our course we will not create compound data type. (No structure, no class)
Here are main basic data types:
Type | Explanations |
---|---|
char | Smallest addressable unit of the machine that can contain basic character set. It is an integer type. Actual type can be either signed or unsigned. Usually It contains 8 bits. |
signed char | Of the same size as char, but guaranteed to be signed. Capable of containing at least the [−127, +127] range. |
unsigned char | Of the same size as char, but guaranteed to be unsigned. Contains at least the [0, 255] range. |
short int | Short signed integer type. Capable of containing at least the [−32,767, +32,767] range; thus, it is at least 16 bits in size. The negative value is −32767 (not −32768) due to the one's-complement and sign-magnitude representations allowed by the standard, though the two's-complement representation is much more common. |
unsigned short | Short unsigned integer type. Contains at least the [0, 65,535] range. |
int | Basic signed integer type. Capable of containing at least the [−32,767, +32,767] range; thus, it is at least 16 bits in size. |
unsigned int | Basic unsigned integer type. Contains at least the [0, 65,535] range. |
long int | Long signed integer type. Capable of containing at least the [−2,147,483,647, +2,147,483,647] range; thus, it is at least 32 bits in size. |
unsigned long | Long unsigned integer type. Capable of containing at least the [0, 4,294,967,295] range. |
long long int | Long long signed integer type. Capable of containing at least the [−9,223,372,036,854,775,807, +9,223,372,036,854,775,807] range;thus, it is at least 64 bits in size. |
unsigned long long | Long long unsigned integer type. Contains at least the [0, +18,446,744,073,709,551,615] range. |
float | Real floating-point type, usually referred to as a single-precision floating-point type. Actual properties unspecified (except minimum limits), however on most systems this is the IEEE 754 single-precision binary floating-point format (32 bits). This format is required by the optional Annex F "IEC 60559 floating-point arithmetic". |
double | Real floating-point type, usually referred to as a double-precision floating-point type. Actual properties unspecified (except minimum limits), however on most systems this is the IEEE 754 double-precision binary floating-point format (64 bits). This format is required by the optional Annex F "IEC 60559 floating-point arithmetic". |
long double | Real floating-point type, usually mapped to an extended precision floating-point number format. Actual properties unspecified. It can be either x86 extended-precision floating-point format (80 bits, but typically 96 bits or 128 bits in memory with padding bytes), the non-IEEE "double-double" (128 bits), IEEE 754 quadruple-precision floating-point format (128 bits), or the same as double. See the article on long double for details. |
Data Type Match
Parallel computing requires exact data type matching. That is, for example, when a variable in GPU is declared as int
, the CPU has to transmit a exact int
data with the same bits length to it. short int
, long int
, unsigned int
data will not work.
However, the bits length of data type is not strictly defined in C standard. Take int
for example, it could be 32 bits, or in modern CPU, it could be 64 bits. In GPU, usually it is 32 bits. For float
, it is usually 32 bits in both CPU and GPU in C, but it is 64 bits in Python. It is unnecessary to remember all this details, the important thing is to make sure the data type is the same. Always use sizeof
function to check the number of bits of a built-in data type before you run your program.
What if data type is not matched?
If bit length is mismatched, then the threads have to be re-indexed. Take int
again for example. If a variable in GPU is defined as int
(say it's 32 bits), but CPU transmit long int
data (say it's 64 bits) to it, then two threads represent this data instead of one, with one thread represents the lower 32 bits and other represents higher 32 bits. However, this is danger because the number of threads may not be enough to hold the long int
data (after all, you told GPU it is a int
) and leading to 'out of bounds' error. If a variable in GPU is defined as long int
but CPU transmit int
data to it, then a thread will represent two variables, with the higher 32 bits repenting one variable and lower 32 bits representing another.
If unsigned/signed is mismatched, then the data has to be re-organized. Take int
for example, if CPU transmits a int
to an unsigned int
, then you have to negative each bit in that variable and then plus 1 (if singed number follows one's complement). Similar transform is needed if CPU transmits a unsigned int
to a int
.
However, you may find a way to correct int
mismatching, but for float
mismatch, it is much more complicated, let aloe complete mismatch (e.g., int
matches with float
). Therefore always confirm the data type before transmitting data from CPU to GPU.