
Computer Architecture:

Chapter 1

1.1:

In addition, two significant changes in the computer marketplace made it easier than ever before to succeed commercially with a new architecture.

First, the virtual elimination of assembly language programming reduced the need for object-code compatibility. Second, the creation of standardized, vendor-independent operating systems, such as UNIX and its clone, Linux, lowered the cost and risk of bringing out a new architecture. These changes made it possible to develop successfully a new set of architectures with simpler instructions, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s. The RISC-based machines focused the attention of designers on two critical performance techniques: the exploitation of instruction-level parallelism (initially through pipelining and later through multiple instruction issue) and the use of caches (initially in simple forms and later using more sophisticated organizations and optimizations).

Since 2003, single-processor performance improvement has dropped to less than 22% per year due to the twin hurdles of maximum power dissipation of air-cooled chips and the lack of more instruction-level parallelism to exploit efficiently.

1.2:

Classes of Computers:

A. Personal Mobile Devices (PMD):

Cost is a prime concern given that the consumer price for the whole product is a few hundred dollars. Although the emphasis on energy efficiency is frequently driven by the use of batteries, the need to use less expensive packaging (plastic versus ceramic) and the absence of a fan for cooling also limit total power consumption.

Responsiveness and predictability are key characteristics for media applications.

Other key characteristics in many PMD applications are

the need to minimize memory and the need to use energy efficiently.

B. Desktop Computing

Throughout this range in price and capability, the desktop market tends to be driven to optimize price-performance.

C. Server

For servers, different characteristics are important.

First, availability is critical. A second key feature of server systems is scalability. Finally, servers are designed for efficient throughput; that is, the overall performance of the server, in terms of transactions per minute or Web pages served per second, is what is crucial.

D. Clusters/Warehouse-scale computers

Clusters are collections of desktop computers or servers connected by local area networks to act as a single larger computer.

Price-performance and power are critical to WSCs since they are so large. As Chapter 6 explains, 80% of the cost of a $90M warehouse is associated with power and cooling of the computers inside.

Supercomputers are related to WSCs in that they are equally expensive, costing hundreds of millions of dollars, but supercomputers differ by emphasizing floating-point performance and by running large, communication-intensive batch programs that can run for weeks at a time.

WSCs emphasize interactive applications, large-scale storage, dependability, and high Internet bandwidth.

E. Embedded Computers

We use the ability to run third-party software as the dividing line between non-embedded and embedded computers.

Although the range of computing power in the embedded computing market is very large, price is a key factor in the design of computers for this space. Performance requirements do exist, of course, but the primary goal is often meeting the performance need at a minimum price, rather than achieving higher performance at a higher price.

Classes of Parallelism and Parallel Architectures

Parallelism at multiple levels is now the driving force of computer design across all four classes of computers, with energy and cost being the primary constraints. There are basically two kinds of parallelism in applications:

Data-Level Parallelism (DLP) arises because there are many data items that can be operated on at the same time.

Task-Level Parallelism (TLP) arises because tasks of work are created that can operate independently and largely in parallel.
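
For concreteness, here is a small C sketch (my own illustration, not from the text) of the two kinds of parallelism: a loop whose iterations are all independent (DLP), and two unrelated functions that touch different data and could run as separate tasks (TLP).

```c
#include <stddef.h>

/* Data-level parallelism (DLP): one operation over many independent
 * data items -- every iteration of this loop could run at the same time. */
void scale(float *a, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= k;                      /* no iteration depends on another */
}

/* Task-level parallelism (TLP): two unrelated pieces of work operating
 * on different data; they could be handed to different cores. */
void task_checksum(const unsigned char *buf, size_t n, unsigned *sum)
{
    *sum = 0;
    for (size_t i = 0; i < n; i++)
        *sum += buf[i];
}

void task_count_lines(const char *text, size_t n, size_t *lines)
{
    *lines = 0;
    for (size_t i = 0; i < n; i++)
        if (text[i] == '\n')
            (*lines)++;
}
```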

Computer hardware in turn can exploit these two kinds of application parallelism in four major ways:

Instruction-Level Parallelism exploits data-level parallelism at modest levels with compiler help using ideas like pipelining and at medium levels using ideas like speculative execution.
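
As a rough illustration (my own sketch, assuming a pipelined or multiple-issue core), the second loop below exposes more instruction-level parallelism than the first because its accumulators are independent; this is the kind of opportunity compilers and hardware look for.

```c
/* Serial dependence chain: each add must wait for the previous one,
 * so there is little ILP for the pipeline to overlap. */
double chain_sum(const double *a, long n)
{
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled with four independent accumulators (assumes n is a multiple
 * of 4 just to keep the sketch short): the four adds per iteration have
 * no dependences among them, so they can overlap in the pipeline or
 * fill multiple issue slots. */
double unrolled_sum(const double *a, long n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (long i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return s0 + s1 + s2 + s3;
}
```

Note that the unrolled version reassociates the floating-point additions, which is why compilers typically perform this transformation automatically only under relaxed floating-point settings.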

Vector Architectures and Graphic Processor Units (GPUs) exploit data-level parallelism by applying a single instruction to a collection of data in parallel.
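
A minimal sketch of the SIMD idea, assuming an x86 machine with SSE and using the `<immintrin.h>` intrinsics (this example is mine, not from the text): one instruction adds four floats at a time.

```c
#include <immintrin.h>   /* x86 SSE intrinsics */
#include <stddef.h>

/* c[i] = a[i] + b[i]; each _mm_add_ps performs four float additions with
 * a single instruction, which is the data-level parallelism SIMD exploits.
 * Assumes n is a multiple of 4 to keep the sketch short. */
void add_simd(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
    }
}
```

With optimization enabled, compilers can often generate equivalent vector instructions automatically from the plain scalar loop (auto-vectorization).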

Thread-Level Parallelism exploits either data-level parallelism or task-level parallelism in a tightly coupled hardware model that allows for interaction among parallel threads.
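
A minimal POSIX-threads sketch (my own example, not from the text): two threads sum halves of a shared array in parallel and the main thread combines their results, i.e., thread-level parallelism used to exploit data-level parallelism.

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

struct part { int lo, hi; double sum; };

/* Each thread sums its own slice of the shared array. */
static void *partial_sum(void *arg)
{
    struct part *p = arg;
    p->sum = 0.0;
    for (int i = p->lo; i < p->hi; i++)
        p->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = 1.0;

    struct part a = { 0, N / 2, 0.0 }, b = { N / 2, N, 0.0 };
    pthread_t ta, tb;

    /* The two threads can run on different cores at the same time. */
    pthread_create(&ta, NULL, partial_sum, &a);
    pthread_create(&tb, NULL, partial_sum, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    printf("sum = %f\n", a.sum + b.sum);   /* threads interact at the joins */
    return 0;
}
```

Built with something like `gcc -pthread`, the two partial sums proceed concurrently; the joins are the point of interaction between the parallel threads.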

Request-Level Parallelism exploits parallelism among largely decoupled tasks specified by the programmer or the operating system.

All computers can be put into one of four categories:

  1. Single instruction stream, single data stream (SISD):

This category is the uniprocessor. The programmer thinks of it as the standard sequential computer, but it can exploit instruction-level parallelism.

  2. Single instruction stream, multiple data streams (SIMD):

The same instruction is executed by multiple processors using different data streams. SIMD computers exploit data-level parallelism by applying the same operations to multiple items of data in parallel. Each processor has its own data memory (hence the MD of SIMD), but there is a single instruction memory and control processor, which fetches and dispatches instructions.

  3. Multiple instruction streams, single data stream (MISD):

No commercial multiprocessor of this type has been built to date.

  4. Multiple instruction streams, multiple data streams (MIMD):

Each processor fetches its own instructions and operates on its own data, and it targets task-level parallelism. In general, MIMD is more flexible than SIMD and thus more generally applicable, but it is inherently more expensive than SIMD. For example, MIMD computers can also exploit data-level parallelism, although the overhead is likely to be higher than would be seen in an SIMD computer. This overhead means that grain size must be sufficiently large to exploit the parallelism efficiently.

The task the computer designer faces is a complex one:

  1. Determine what attributes are important for a new computer, then
  2. Design a computer to maximize performance and energy efficiency while staying within cost, power, and availability constraints.

This task has many aspects, including instruction set design, functional organization, logic design, and implementation. The implementation may encompass integrated circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging.

Instruction Set Architecture

The ISA is the abstract model of a computer. The instruction set is the set of all machine-code instructions that the CPU can execute. There are seven dimensions of an ISA:

  1. Class of ISA: Almost all ISAs today are general-purpose register architectures, where the operands are either registers or memory locations. The two popular versions are register-memory ISAs (x86), where memory can be accessed as part of many instructions, and load-store ISAs (ARM, MIPS), where memory can be accessed only with load or store instructions.

  2. Memory addressing: Virtually all architectures use byte addressing, but MIPS and ARM require objects to be word aligned, while x86 does not require alignment (see the sketch after this list).

  3. Addressing modes: MIPS has register, immediate (constant), and displacement (register plus offset) addressing modes; x86 has many more.

  4. Types and sizes of operands: 8-bit (ASCII), 16-bit (Unicode), 32-bit (integer or word), and 64-bit (double word); x86 also has 80-bit floating point.

  5. Operations: the general categories are data transfer, arithmetic/logical, control, and floating point; x86 has a much richer set of operations than MIPS.

  6. Control flow instructions: MIPS branches by testing register values, while x86 and ARM branch on condition-code bits set as side effects of arithmetic operations.

  7. Encoding an ISA: ARM and MIPS have a fixed 32-bit instruction length; x86 has variable-length instructions.
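
To make the alignment point concrete, here is a small C sketch (mine, not from the text): reading a 32-bit value from an arbitrary byte offset works on x86, but on an alignment-strict ISA such as classic MIPS a direct unaligned load can trap, so portable code goes through memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Byte addressing lets us point at any offset in a buffer, but only some
 * ISAs (e.g. x86) allow a 32-bit load from an address that is not a
 * multiple of 4; alignment-strict ISAs can fault on such a load. */
uint32_t load_u32_portable(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* lets the compiler emit safe loads */
    return v;
}

uint32_t load_u32_cast(const unsigned char *p)
{
    /* Fine on x86; may trap (and is formally undefined behavior)
     * where the ISA requires aligned accesses. */
    return *(const uint32_t *)p;
}
```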