Datapath
Execution
- Fetch instruction from PC
- Decode instruction
- Read source registers
- Perform the operation with the ALU
  - Arithmetic and logic instructions: ALU computes the result
  - Memory-reference instructions: ALU calculates the address
  - Conditional branch instructions: ALU performs the comparison
- Memory access (lw and sw only)
- Write back the result
- Update the PC: PC + 4, or the branch target address if the branch is taken
  - The PC is a flip-flop triggered on the falling clock edge, so it only updates at the end of the cycle
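The steps above can be sketched as a small single-cycle model. This is only an illustration under stated assumptions, not a full MIPS implementation: the names `step`, `encode_r`, and `encode_i` and the dict-based memories are hypothetical, and only add, sub, lw, sw, and beq are handled.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def encode_r(rs, rt, rd, funct):
    """Hypothetical helper: pack an R-format instruction (opcode 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def encode_i(opcode, rs, rt, imm):
    """Hypothetical helper: pack an I-format instruction."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def step(pc, regs, imem, dmem):
    """Execute one instruction and return the next PC.

    imem and dmem are dicts keyed by byte address (a simplification).
    """
    instr = imem[pc]                          # fetch instruction from PC
    opcode = (instr >> 26) & 0x3F             # decode the fields
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    funct = instr & 0x3F
    imm = sign_extend16(instr & 0xFFFF)
    a, b = regs[rs], regs[rt]                 # read source registers
    next_pc = pc + 4                          # default: sequential execution

    if opcode == 0x00:                        # R-type (add and sub only here)
        if funct == 0x20:
            result = a + b
        elif funct == 0x22:
            result = a - b
        else:
            raise ValueError("funct not modelled in this sketch")
        regs[rd] = result & 0xFFFFFFFF        # write back the result
    elif opcode == 0x23:                      # lw: ALU computes the address
        regs[rt] = dmem[(a + imm) & 0xFFFFFFFF]
    elif opcode == 0x2B:                      # sw: ALU computes the address
        dmem[(a + imm) & 0xFFFFFFFF] = b
    elif opcode == 0x04:                      # beq: ALU compares by subtraction
        if (a - b) == 0:
            next_pc = pc + 4 + (imm << 2)
    else:
        raise ValueError("opcode not modelled in this sketch")
    return next_pc                            # the PC changes last


regs = [0] * 32
regs[16] = 0x100                              # $s0 holds an example base address
dmem = {0x104: 7}
imem = {
    0: encode_i(0x23, 16, 8, 4),              # lw  $t0, 4($s0)
    4: encode_r(8, 8, 9, 0x20),               # add $t1, $t0, $t0
    8: encode_i(0x04, 9, 9, -3),              # beq $t1, $t1, -3 (back to address 0)
}
pc = 0
for _ in range(3):
    pc = step(pc, regs, imem, dmem)
```

Returning `next_pc` at the end of `step` mirrors the note that the PC only updates once the rest of the cycle has finished.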
Operations
- R-format
  - Read 2 register operands
  - ALU performs the arithmetic/logical operation
  - Write the result to the destination register
- I-format load/store (e.g. lw $t0, 4($s0))
  - Read base register operand ($s0)
  - ALU adds the base address to the 16-bit sign-extended offset (4)
  - Load: read memory and update the register
  - Store: write the register value to memory
- I-format branch
  - Read register operands
  - ALU compares the operands by subtraction and checks the zero output (branch or not)
  - Calculate the branch target address
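The address arithmetic in these operations can be worked through numerically. The sketch below assumes 32-bit wrap-around and the usual MIPS convention that a branch offset counts words relative to PC + 4; the function names and the example register values are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Load/store: base register plus the sign-extended 16-bit offset."""
    return (base + sign_extend16(offset16)) & MASK32


def branch_target(pc, offset16):
    """Branch: PC + 4 plus the sign-extended offset shifted left by 2."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def branch_taken(a, b):
    """beq: subtract the operands and test the ALU zero output."""
    return ((a - b) & MASK32) == 0


# lw $t0, 4($s0) with $s0 = 0x10000000 (example value)
addr = effective_address(0x10000000, 4)
# The same base with offset field 0xFFFC, which sign-extends to -4
addr_negative = effective_address(0x10000000, 0xFFFC)
# A branch at PC = 0x40 with offset 3 lands 3 words past PC + 4
target = branch_target(0x40, 3)
```

With these example values, `addr` is 0x10000004, `addr_negative` is 0x0FFFFFFC, and `target` is 0x50.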
Control
ALU Control
- load/store: add
- branch: subtract
- R-type: depends on funct
| ALU Control Input | Function |
|---|---|
| 0000 | and |
| 0001 | or |
| 0010 | add |
| 0110 | subtract |
| 0111 | set on less than |
| 1100 | nor |
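The table above maps directly onto a small function. This is a sketch assuming 32-bit operands held as unsigned Python integers and a signed comparison for set on less than; the name `alu` and the returned `(result, zero)` pair are illustrative choices.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """View a 32-bit pattern as a signed two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU control input."""
    if control == 0b0000:                 # and
        result = a & b
    elif control == 0b0001:               # or
        result = a | b
    elif control == 0b0010:               # add
        result = (a + b) & MASK32
    elif control == 0b0110:               # subtract
        result = (a - b) & MASK32
    elif control == 0b0111:               # set on less than (signed)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:               # nor
        result = ~(a | b) & MASK32
    else:
        raise ValueError("ALU control input not in the table")
    return result, result == 0
```

For example, `alu(0b0110, 9, 9)` returns `(0, True)`, which is the zero output a branch-equal comparison relies on.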
- 1-level decoding
  - More input bits
  - 6-bit opcode + 6-bit funct = 12 bits
  - 2^12 = 4096 input combinations
- 2-level decoding
  - Fewer input bits, so less complicated and faster logic
  - 6-bit opcode → 2-bit ALUOp; 2-bit ALUOp + 6-bit funct = 8 bits
  - 2^8 = 256 input combinations
| opcode | ALUOp | Operation | funct | ALU function | ALU control |
|---|---|---|---|---|---|
| lw | 00 | load word | XXXXXX | add | 0010 |
| sw | 00 | store word | XXXXXX | add | 0010 |
| beq | 01 | branch equal | XXXXXX | subtract | 0110 |
| R-type | 10 | add | 100000 | add | 0010 |
| R-type | 10 | subtract | 100010 | subtract | 0110 |
| R-type | 10 | and | 100100 | and | 0000 |
| R-type | 10 | or | 100101 | or | 0001 |
| R-type | 10 | set on less than | 101010 | set on less than | 0111 |

| ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation | Instruction (ALU function) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | X | X | X | X | X | X | 0010 | lw (add) |
| X | 1 | X | X | X | X | X | X | 0110 | beq (sub) |
| 1 | X | X | X | 0 | 0 | 0 | 0 | 0010 | add |
| 1 | X | X | X | 0 | 0 | 1 | 0 | 0110 | sub |
| 1 | X | X | X | 0 | 1 | 0 | 0 | 0000 | and |
| 1 | X | X | X | 0 | 1 | 0 | 1 | 0001 | or |
| 1 | X | X | X | 1 | 0 | 1 | 0 | 0111 | slt |
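The second decoding level can be sketched as a function of the 2-bit ALUOp and the 6-bit funct field. The name `alu_control` is hypothetical, and don't-care inputs are handled by testing only the bits that matter: for R-type instructions only the low four funct bits are inspected.

```python
def alu_control(alu_op, funct):
    """Map 2-bit ALUOp plus 6-bit funct to the 4-bit ALU control input."""
    if alu_op == 0b00:                    # lw / sw: always add
        return 0b0010
    if alu_op & 0b01:                     # beq: always subtract
        return 0b0110
    # R-type (ALUOp1 = 1): decode using funct bits F3..F0 only
    by_funct = {
        0b0000: 0b0010,                   # add
        0b0010: 0b0110,                   # subtract
        0b0100: 0b0000,                   # and
        0b0101: 0b0001,                   # or
        0b1010: 0b0111,                   # set on less than
    }
    return by_funct[funct & 0b1111]
```

For example, `alu_control(0b10, 0b101010)` returns `0b0111`, and for a load `alu_control(0b00, funct)` returns `0b0010` regardless of `funct`.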
Control Signals
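| Signal | Deasserted (0) | Asserted (1) |
|---|---|---|
| RegDst | write register number comes from rt | write register number comes from rd |
| RegWrite | none | write the result to the register file |
| ALUSrc | second ALU operand is the second register file output | second ALU operand is the sign-extended immediate |
| PCSrc | next PC is PC + 4 | next PC is the branch target |
| MemRead | none | read data memory |
| MemWrite | none | write data memory |
| MemtoReg | register write data comes from the ALU output | register write data comes from memory |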
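Four of these signals act as 2-to-1 multiplexer selects, which can be sketched as below. The function names are hypothetical. The `pc_src` helper assumes the usual textbook arrangement in which PCSrc is the AND of a Branch control line and the ALU zero output; that derivation is not stated in the table above.

```python
def select_write_register(reg_dst, rt, rd):
    """RegDst: choose which instruction field names the destination register."""
    return rd if reg_dst else rt


def select_alu_operand(alu_src, read_data2, sign_ext_imm):
    """ALUSrc: choose the second ALU operand."""
    return sign_ext_imm if alu_src else read_data2


def select_write_data(mem_to_reg, alu_result, mem_data):
    """MemtoReg: choose the value written back to the register file."""
    return mem_data if mem_to_reg else alu_result


def select_next_pc(pc_src, pc_plus_4, branch_target):
    """PCSrc: choose between sequential execution and the branch target."""
    return branch_target if pc_src else pc_plus_4


def pc_src(branch, zero):
    """Assumed derivation: take the branch only if it is a branch and zero is set."""
    return bool(branch and zero)
```

For a load such as lw $t0, 4($s0), the selects would be RegDst = 0 (destination is rt), ALUSrc = 1 (use the immediate), and MemtoReg = 1 (write back the memory data).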
Pipelining
- Multiple tasks operate simultaneously
- The overlapped tasks are independent of each other
- Does not help the latency of a single task
- Helps the throughput of the whole workload
- Potential speedup = number of pipeline stages
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced stage lengths reduce the speedup
- Have to ensure no overlap
- The pipelined clock period is limited by the slowest stage, whereas the single-cycle clock period is limited by the sum of all stages
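The timing claims above can be checked with a short calculation. The stage latencies below are hypothetical example values in picoseconds, and the model assumes an ideal pipeline with no hazards or stalls.

```python
def single_cycle_time(stage_times, n_instructions):
    """Each instruction takes one long cycle equal to the sum of all stages."""
    return n_instructions * sum(stage_times)


def pipelined_time(stage_times, n_instructions):
    """The clock is set by the slowest stage; the pipeline fills once, then
    completes one instruction per cycle."""
    clock = max(stage_times)
    return (len(stage_times) + n_instructions - 1) * clock


def speedup(stage_times, n_instructions):
    """Ratio of single-cycle time to pipelined time for the same workload."""
    return (single_cycle_time(stage_times, n_instructions)
            / pipelined_time(stage_times, n_instructions))


unbalanced = [200, 100, 200, 200, 100]    # hypothetical stage latencies (ps)
balanced = [160] * 5                      # same total work, evenly split

one_instruction = speedup(unbalanced, 1)            # latency of a single task
many_unbalanced = speedup(unbalanced, 1_000_000)    # throughput, unbalanced
many_balanced = speedup(balanced, 1_000_000)        # throughput, balanced
```

With these numbers a single instruction is not faster (the ratio is 0.8), the unbalanced pipeline approaches a speedup of 4, and only the balanced pipeline approaches the stage count of 5.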
Structural Hazards
