Word - peregrineshahin/ChessProgrammingWiki GitHub Wiki
A Word, or Computer Word, is the natural unit of data used by a particular computer architecture. Modern computers usually have a word size that is a power-of-two multiple of the unit of address resolution, typically a Byte: two, four, or eight Bytes, that is 16, 32, or 64 bits. Many other sizes have been used in the past, including 8 (a Byte), 9, 12, 18, 24, 36, 39, 40, 48, and 60 bits. Some early computers were decimal rather than binary, with a word size of 10 or 12 decimal digits, and some had no fixed word length at all.
On recent 32-bit and 64-bit processors, most compilers implement the primitive C datatypes short and unsigned short as 16-bit words. In Java, short is guaranteed to be 16 bits wide. A signed short in C is in practice represented in Two's Complement, although C standards before C23 did not strictly require it. A Word type explicitly type-defined in C is therefore usually unsigned, which also avoids arithmetic right shift issues:
typedef unsigned char BYTE;
typedef unsigned short WORD;
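The right shift issue can be illustrated with a minimal sketch. The function names shiftRightLogical and shiftRightSigned are hypothetical and not part of the original text, and the sketch assumes a 16-bit unsigned short:

```c
#include <assert.h>
#include <limits.h>

typedef unsigned short WORD;

/* Assumption of this sketch: unsigned short is exactly 16 bits wide. */
static_assert(USHRT_MAX == 0xFFFF, "this sketch assumes a 16-bit unsigned short");

/* Unsigned operand: zeros enter from the left (logical shift).
   n must be in the range 0..15. */
WORD shiftRightLogical(WORD w, unsigned n) {
    return (WORD)(w >> n);
}

/* Signed operand: the result for a negative s is implementation-defined.
   Most compilers replicate the sign bit (arithmetic shift), but the
   C standard does not promise it. n must be in the range 0..15. */
int shiftRightSigned(short s, unsigned n) {
    return s >> n;
}
```

With the unsigned type, shiftRightLogical(0x8000, 15) is 1 on every conforming compiler. The signed variant is only fully specified for non-negative values.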
| language | type | min | max |
| --- | --- | --- | --- |
| C, C++ | unsigned short | 0 | 65535 |
| | hexadecimal | 0x0000 | 0xFFFF |
| | #include <limits.h> | | USHRT_MAX |
| C, C++, Java | short | -32768 | 32767 |
| | hexadecimal | 0x8000 | 0x7FFF |
| | #include <limits.h> | SHRT_MIN | SHRT_MAX |
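As a rough sketch, the ranges in the table can be checked at compile time, and the wrap-around at the upper bound of the unsigned type can be made explicit. The helper names addWords and addOverflows are hypothetical, and the static assertions encode the assumption that short is 16 bits wide and uses Two's Complement:

```c
#include <assert.h>
#include <limits.h>

typedef unsigned short WORD;

/* Assumptions of this sketch: 8-bit bytes and a 16-bit short. */
static_assert(CHAR_BIT == 8 && USHRT_MAX == 0xFFFF,
              "this sketch assumes a 16-bit unsigned short");
static_assert(SHRT_MIN == -32768 && SHRT_MAX == 32767,
              "this sketch assumes a 16-bit two's complement short");

/* Unsigned 16-bit addition wraps modulo 65536. */
WORD addWords(WORD a, WORD b) {
    return (WORD)(a + b);
}

/* Returns 1 if a + b does not fit into 16 bits, 0 otherwise. */
int addOverflows(WORD a, WORD b) {
    return a > USHRT_MAX - b;
}
```

For example, addWords(65535, 1) yields 0, and addOverflows(65535, 1) reports the lost carry.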
Words stored in memory should be aligned, that is, a 16-bit word should be stored at an even byte address. Otherwise, accessing it at run time causes a misalignment exception on some processors and a considerable performance penalty on others.
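When a word has to be read from or written to an address that may be odd, for instance inside a packed byte buffer, one common approach is to copy it bytewise with memcpy instead of dereferencing a WORD pointer. The following is a minimal sketch under that assumption. The names isWordAligned, loadWord and storeWord are hypothetical, and the alignment test relies on the usual, but implementation-defined, conversion of a pointer to uintptr_t:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned short WORD;

/* Assumption of this sketch: WORD occupies exactly two bytes. */
static_assert(sizeof(WORD) == 2, "this sketch assumes a two-byte WORD");

/* Returns 1 if p is a suitable (even) address for a 16-bit word. */
int isWordAligned(const void *p) {
    return (uintptr_t)p % sizeof(WORD) == 0;
}

/* Reads a word in host byte order from any address, aligned or not. */
WORD loadWord(const void *p) {
    WORD w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Writes a word in host byte order to any address, aligned or not. */
void storeWord(void *p, WORD w) {
    memcpy(p, &w, sizeof w);
}
```

Compilers typically turn such a fixed-size memcpy into a single load or store where the target allows unaligned access, so the portable form usually costs little.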
Main article: Endianness. An issue with words consisting of two or more bytes is the order in which the bytes appear in memory. According to their arithmetical significance, a 16-bit word has a low and a high byte, either of which may be stored at the lower byte address. Intel processors have always been so-called little-endian machines: the least significant byte (LSB) is at the lowest address. Other processors, including the IBM 370 family, the PDP-10 (36 bit), the Motorola microprocessor families, and most of the various RISC designs, are big-endian and store the most significant byte, the 'big end', first.
The following C union, used to extract bytes from words or to synthesize words from bytes, is not portable and should be avoided:
union {
    BYTE b[2];
    WORD s;
} u;
u.s = 0xaa55;
assert (u.b[0] == 0x55); // fails, if big-endian
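Whether the assertion above holds depends only on the byte order of the host. If that order has to be known at run time, it can be probed by copying a word into a byte array. This is a sketch only, with the hypothetical function name hostIsLittleEndian, and it assumes a two-byte WORD with 8-bit bytes:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

typedef unsigned char BYTE;
typedef unsigned short WORD;

/* Assumptions of this sketch: 8-bit bytes and a two-byte WORD. */
static_assert(CHAR_BIT == 8 && sizeof(WORD) == 2,
              "this sketch assumes a 16-bit WORD made of two 8-bit bytes");

/* Returns 1 if the low byte is stored at the lower address
   (little-endian host), 0 otherwise. */
int hostIsLittleEndian(void) {
    WORD s = 0xaa55;
    BYTE b[sizeof s];
    memcpy(b, &s, sizeof s);   /* inspect the in-memory byte layout */
    return b[0] == 0x55;
}
```

Code that merely extracts or combines bytes does not need this test, since the shift-based approach works the same on either kind of host.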
The portable way in C uses inline functions or C preprocessor macros. Extraction divides by 256 or takes the remainder modulo 256, implemented as a shift and as a mask by bitwise 'and'. Synthesis multiplies the high byte by 256 and adds the low byte:
BYTE lowByte (WORD s) {return (BYTE)(s & 255);} // mod 256
BYTE highByte(WORD s) {return (BYTE)(s >> 8);} // div 256
WORD makeWORD (BYTE high, BYTE low) {
    WORD s = high;
    return (s << 8) + low; // high * 256 + low
}
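The same shift-and-mask idea extends to reading and writing 16-bit words in a byte buffer with a fixed byte order, for instance in a file format, independent of the host's endianness. The sketch below is one possible arrangement. The names readWordLE, readWordBE, writeWordLE and writeWordBE are hypothetical and not taken from the original text:

```c
#include <assert.h>
#include <limits.h>

typedef unsigned char BYTE;
typedef unsigned short WORD;

/* Assumptions of this sketch: 8-bit bytes and a 16-bit WORD. */
static_assert(CHAR_BIT == 8 && USHRT_MAX == 0xFFFF,
              "this sketch assumes 8-bit bytes and a 16-bit WORD");

/* Little-endian buffer layout: low byte first. */
WORD readWordLE(const BYTE *p) {
    return (WORD)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

/* Big-endian buffer layout: high byte first. */
WORD readWordBE(const BYTE *p) {
    return (WORD)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

void writeWordLE(BYTE *p, WORD s) {
    p[0] = (BYTE)(s & 255);   /* low byte,  mod 256 */
    p[1] = (BYTE)(s >> 8);    /* high byte, div 256 */
}

void writeWordBE(BYTE *p, WORD s) {
    p[0] = (BYTE)(s >> 8);    /* high byte, div 256 */
    p[1] = (BYTE)(s & 255);   /* low byte,  mod 256 */
}
```

Given the bytes 0x55, 0xaa in this order, readWordLE returns 0xaa55 and readWordBE returns 0x55aa on any host, because only arithmetic on values is involved and never the in-memory layout of a WORD.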