Computer Systems Organization

                                   

 

·        A digital computer consists of an interconnected system of processors, memories and I/O devices.

 

           

Processors

 

 

·        The CPU (Central Processing Unit) is typically known as the "brain" of the computer. Its main function is to fetch instructions from main memory, examine them and execute them sequentially.

 

·        The CPU itself is composed of several distinct parts. The control unit fetches instructions from main memory and determines their type. The arithmetic logic unit (ALU) performs the operations, such as addition and logical operations (AND, OR, etc.), required to carry out the instructions.

 

·        The CPU also contains small, high-speed memory consisting of registers used to store data temporarily. The most important register is the PC (program counter), which points to the next instruction to be executed. The IR (instruction register) holds the instruction currently being executed.

 

 

CPU Organization

 

·        The internal organization of most CPUs is based on the data path of the classical von Neumann machine.

 

·        The data path consists of the registers (typically 1 to 32 of them) and an Arithmetic Logic Unit (ALU).

 

·        The registers feed into two ALU input registers, which hold the ALU input while the ALU is computing.

 

·        The ALU performs addition, subtraction and other simple operations on the input data and places the result in the output register, from where it can be written back into a register or stored back into memory.

 

·        Instructions are divided into two categories:

 

o       register-memory instructions allow memory words to be fetched into registers, where they can be used as ALU inputs.

 

o       register-register instructions fetch two operands from registers, bring them into the ALU input registers, perform some operation on them, and store the result back into a register.

 

o       A third category can be derived from the two above: memory-memory instructions fetch their operands from memory into the ALU input registers, perform an operation on them, and write the result back into memory via a register.
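
·        As a rough illustration only (the register names, memory layout and "instructions" below are invented, not a real instruction set), the following sketch models the data path as a small register file, a memory array and an ALU, and shows where each category's operands come from and where its result goes:

# Toy model of the data path: a register file, a memory and an ALU.
# Purely illustrative; not any real machine's instruction set.
registers = [0] * 8          # small register file, R0..R7
memory = [10, 20, 30, 0]     # a few words of main memory

def alu(op, a, b):
    # The ALU operates on its two input registers and produces one result.
    return a + b if op == "ADD" else a - b

# register-memory: fetch memory words into registers for later use.
registers[1] = memory[0]                                # LOAD R1, mem[0]
registers[2] = memory[1]                                # LOAD R2, mem[1]

# register-register: both operands come from registers, result to a register.
registers[3] = alu("ADD", registers[1], registers[2])   # ADD R3, R1, R2

# memory-memory: operands come from memory and the result goes back to
# memory, passing through the ALU input and output registers on the way.
memory[3] = alu("ADD", memory[1], memory[2])            # ADD mem[3], mem[1], mem[2]

print(registers[3], memory[3])                          # 30 50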

 

 

Instruction Execution (Fetch-Decode-Execute) cycle

 

·        Fetch next instruction from memory into IR

·        Increment the PC

·        Determine the type of instruction just fetched

·        If the instruction requires data from memory, determine the memory address

·        Fetch the data into the CPU registers

·        Execute the instruction

·        Store the results in the proper (specified) place

·        Go to step 1 and begin executing the next instruction
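
·        A minimal sketch of this cycle as an interpreter loop for an invented toy machine (the opcode names, single accumulator and memory layout are assumptions made up for illustration):

# Fetch-decode-execute loop for an invented toy machine.
# Each instruction is an (opcode, address) pair; ACC is a single accumulator.
memory = [("LOAD", 5), ("ADD", 6), ("STORE", 9), ("HALT", 0), 0, 7, 3, 0, 0, 0]

pc = 0            # program counter: address of the next instruction
acc = 0           # accumulator register
running = True

while running:
    ir = memory[pc]              # 1. fetch the next instruction into the IR
    pc += 1                      # 2. increment the PC
    opcode, address = ir         # 3. decode: determine the instruction type
    if opcode == "LOAD":         # 4-6. fetch the data, then execute
        acc = memory[address]
    elif opcode == "ADD":
        acc += memory[address]
    elif opcode == "STORE":      # 7. store the result in the specified place
        memory[address] = acc
    elif opcode == "HALT":
        running = False
                                 # 8. loop back and handle the next instruction

print(acc, memory[9])            # 10 10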

 

 

 

Design Principles for Modern Computers

 

 

·        Computer architects try to follow a general set of design principles called the RISC (Reduced Instruction Set Computer) design principles.

 

 

All Instructions are Directly Executed by Hardware

 

 

·        By executing instructions directly in hardware, rather than interpreting them with microinstructions, we increase the speed of execution.

 

 

Maximize the Rate at Which Instructions Are Issued

 

·        The main objective here is to increase the number of instructions per second.

 

·        This is measured in MIPS (Millions of Instructions Per Second).

 

·        One way of achieving this is to execute multiple instructions in parallel (parallelism).
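
·        A small worked example of the MIPS measure (the figures below are invented, not benchmark results):

# MIPS = instructions executed / (elapsed time in seconds * 1,000,000).
instructions_executed = 400_000_000   # assumed instruction count
elapsed_seconds = 2.0                 # assumed elapsed time

mips = instructions_executed / (elapsed_seconds * 1_000_000)
print(mips)                           # 200.0 -> the CPU sustained 200 MIPS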

 

 

 

Instructions Should Be Easy to Decode

 

 

·        If we can decode instructions faster, we can determine in advance what resources will be needed and acquire them.

 

 

Only Loads and Stores Should Reference Memory

 

·        Memory accesses can take up a lot of time and therefore increase execution time.

 

·        The idea is to load operands from memory into registers and manipulate them there.

 

·        Then use separate instructions to store the results back into memory.
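
·        Continuing the earlier toy-machine sketch (still invented, not a real instruction set), a load/store style expresses an addition as explicit loads, a register-only operation and a store:

# Load/store style: only the LOAD and STORE steps touch memory;
# the arithmetic itself works purely on registers.
registers = [0] * 8
memory = [10, 20, 0, 0]

registers[1] = memory[0]                    # LOAD  R1, mem[0]
registers[2] = memory[1]                    # LOAD  R2, mem[1]
registers[3] = registers[1] + registers[2]  # ADD   R3, R1, R2  (registers only)
memory[2] = registers[3]                    # STORE mem[2], R3

print(memory[2])                            # 30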

 

 

Provide Plenty of Registers

 

 

·        Plenty of registers means that operands and intermediate results can be kept available quickly without resorting to slow memory accesses.

 

 

 

Instruction-Level Parallelism

 

 

·        This type of parallelism gets more instructions/sec using a single CPU.

 

 

Pipelining

 

·        Pipelining involves fetching instructions while others are being executed.

 

·        A number of stages can be employed in the pipelining process.

 

·        For example, in a five-stage pipeline, stage 1 fetches an instruction, stage 2 decodes it, stage 3 fetches the operands, stage 4 executes the instruction and stage 5 writes the result back into a register.

 

·        The main advantage here is that the overall time to execute a sequence of instructions is reduced considerably, since several instructions are in different stages of execution at once.

 

·        Pipelining allows a trade-off between latency (how long it takes to execute an instruction) and processor bandwidth (MIPS).

 

·        With a cycle time of T nsec, and n stages in the pipeline, the latency is nT nsec and the bandwidth is 1000/T MIPS.
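
·        A worked example of these formulas with assumed numbers (T = 2 nsec, n = 5 stages):

T = 2                       # cycle time in nsec (assumed)
n = 5                       # number of pipeline stages (assumed)

latency_ns = n * T          # time for one instruction to pass through the pipeline
bandwidth_mips = 1000 / T   # instructions completed per second, expressed in MIPS

print(latency_ns)           # 10 nsec latency per instruction
print(bandwidth_mips)       # 500.0 MIPS once the pipeline is full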

 

 

Superscalar Architectures

 

·        Increasing the number of pipelines will increase the performance even more.

 

·        In a dual pipeline CPU, a single instruction fetch unit fetches pairs of instructions together and feeds each one into a separate pipeline, complete with a separate ALU for parallel operation.

 

·        The Pentium used a main pipeline, called the u pipeline, that could execute an arbitrary Pentium instruction.

 

·        The second pipeline, called the v pipeline, could execute only simple integer instructions.

 

·        Complex rules were used to determine whether a pair of instructions was compatible and would not cause resource conflicts.

 

·        If a conflict was detected, only the first instruction was executed and the second one was held over to be paired with a later instruction.
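
·        A very rough sketch of the kind of compatibility check a dual-pipeline fetch unit performs (the encoding and the single rule below are invented for illustration; the Pentium's actual pairing rules were far more involved):

# Toy dual-issue check: an instruction is (destination register, source registers).
def can_issue_together(first, second):
    first_dest, _ = first
    second_dest, second_sources = second
    # Conflict if the second instruction reads or overwrites the register
    # that the first instruction writes.
    return first_dest not in second_sources and first_dest != second_dest

i1 = ("R1", ["R2", "R3"])            # R1 <- R2 op R3
i2 = ("R4", ["R1", "R5"])            # R4 <- R1 op R5  (reads R1)
i3 = ("R6", ["R7", "R8"])            # R6 <- R7 op R8  (independent)

print(can_issue_together(i1, i2))    # False: issue i1 alone, hold i2 over
print(can_issue_together(i1, i3))    # True: both go down the pipelines together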

 

 

 

 

 

Processor-Level Parallelism

 

·        From the earliest days of computing, the objective of engineers has been to enhance the performance of their machines, i.e. to make them faster.

 

·        To some extent, machines can be accelerated by just speeding up the hardware. However, various physical limits are being rapidly approached.

 

·        The laws of physics (as we understand them today) state that nothing can travel faster than the speed of light, which is about 30 cm/nanosecond in vacuum and 20 cm/nanosecond in copper wire.

 

·        This means that to build a computer with a 1 nsec instruction time, the total distance that the electrical signals can travel, within the CPU, to memory, and back, cannot be more than 20 cm.

 

·        Therefore very fast computers have to be very small. Unfortunately, fast computers produce more heat than slow ones, and packaging all the components into a small volume makes heat dissipation difficult.

 

·        Consequently, supercomputers are sometimes submerged in liquid Freon in order to transfer the heat out as quickly as possible.

 

·        Another approach to achieving high speed is therefore taken: instead of one high-speed CPU, it is possible to build a machine with several slower (cheaper) ALUs or complete CPUs to obtain the same computing power at a lower cost.

 

 

 

Array Computers

 

·        A single instruction is applied to multiple data sets.

 

·        These machines operate on multiple data sets in parallel. A typical application for this machine is weather forecasting where the daily temperature average from several sites has to be computed from the 24 hourly averages.

 

·        Note that for each site, exactly the same computation must be done, but with different data.

 

·        An array processor consists of a large number of identical processors that perform the same sequence of instructions on different sets of data.

 

·        A vector processor appears similar to an array processor except that all of the addition operations are performed in a single, heavily pipelined adder.
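
·        A sketch of the weather-forecasting example above: exactly the same averaging computation is applied to each site's 24 hourly readings (the temperature values are made up; a real array or vector processor would process the sites in lock-step rather than one after another):

# Same operation, different data: average the 24 hourly temperatures per site.
hourly_temps = {
    "site_a": [15 + h % 5 for h in range(24)],
    "site_b": [18 + h % 3 for h in range(24)],
    "site_c": [12 + h % 7 for h in range(24)],
}

daily_averages = {site: sum(temps) / 24 for site, temps in hourly_temps.items()}
print(daily_averages)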

 

 

Multiprocessors

 

·        This is a system with multiple full-blown CPUs.

 

·        Different CPUs execute different programs, sometimes sharing some common memory. An application of this is an airline reservation system, where multiple instruction and data streams are required.

 

·        This scheme, however, results in bus conflicts, simply because several fast processors are constantly trying to access memory over the same bus.

 

·        Several techniques have been designed to reduce bus contention and improve performance. These range from giving each processor some private memory of its own to caching, where each processor keeps frequently used words in its own local memory.
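
·        As a loose analogy only (threads standing in for CPUs, a Python dictionary standing in for the common memory), the sketch below shows several workers updating one shared data structure, which is the essential property of a multiprocessor; the lock coordinates the concurrent accesses:

import threading

shared_counts = {"bookings": 0}           # the "common memory"
lock = threading.Lock()

def make_bookings(n):
    # Each worker (CPU) runs its own program but updates the shared memory.
    for _ in range(n):
        with lock:
            shared_counts["bookings"] += 1

workers = [threading.Thread(target=make_bookings, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(shared_counts["bookings"])          # 4000: every update landed in shared memory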

 

 

Multicomputers

 

·        Multiprocessors with a large number of CPUs are difficult to build because of the problem of connecting every CPU to the shared memory.

 

·        It is much easier to design systems that consist of a number of autonomous, interconnected (networked) computers.

 

·        The various CPUs coordinate tasks by sending each other messages.
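
·        Again only as an analogy (separate operating-system processes standing in for networked computers), the sketch below has workers with no shared memory that coordinate purely by exchanging messages through queues:

from multiprocessing import Process, Queue

def worker(task_queue, result_queue):
    x = task_queue.get()                  # receive a message describing the task
    result_queue.put(x * x)               # send the result back as a message

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(3)]
    for p in procs:
        p.start()
    for n in (2, 3, 4):
        tasks.put(n)                      # distribute work by message passing
    print(sorted(results.get() for _ in range(3)))   # [4, 9, 16]
    for p in procs:
        p.join()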