Datapath

Execution

  1. Fetch instruction from PC
  2. Decode instruction
    • Read source registers
  3. Perform the operation with the ALU
    1. Arithmetic & logic instructions: compute the result
    2. Memory-reference instructions: address calculation
    3. Conditional branch instructions: comparison
  4. Memory access
    1. lw, sw
  5. Write back the result
    1. Update PC: PC + 4, or the branch target address if the branch is taken
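The five steps can be traced in a small Python sketch. It assumes a hypothetical pre-decoded instruction format (tuples such as `("lw", rt, offset, rs)`) instead of real 32-bit encodings. It covers only add, lw, sw and beq, and it ignores 32-bit overflow and the hardwired `$zero` register. `step` is an illustrative name, not part of any tool.

```python
def step(pc, imem, regs, dmem):
    """Run one instruction through the five steps; return the next PC."""
    # 1. Fetch: imem is a list of pre-decoded tuples, indexed by word (pc // 4).
    op, *fields = imem[pc // 4]

    # 2. Decode: pick out the fields and read the source registers.
    if op == "add":                      # add rd, rs, rt
        rd, rs, rt = fields
        a, b = regs[rs], regs[rt]
    elif op in ("lw", "sw"):             # lw/sw rt, offset(rs)
        rt, offset, rs = fields
        a, b = regs[rs], offset
    elif op == "beq":                    # beq rs, rt, offset
        rs, rt, offset = fields
        a, b = regs[rs], regs[rt]
    else:
        raise ValueError(f"unsupported instruction: {op}")

    # 3. ALU: add for arithmetic and address calculation, subtract for beq.
    alu_result = a - b if op == "beq" else a + b

    # 4. Memory access: only lw and sw touch data memory.
    if op == "lw":
        mem_data = dmem.get(alu_result, 0)
    elif op == "sw":
        dmem[alu_result] = regs[rt]

    # 5. Write back, then choose the next PC.
    if op == "add":
        regs[rd] = alu_result
    elif op == "lw":
        regs[rt] = mem_data
    if op == "beq" and alu_result == 0:  # zero result -> operands equal -> taken
        return pc + 4 + (offset << 2)
    return pc + 4


# Usage: register numbers follow MIPS ($s0 = 16, $t0 = 8, $t1 = 9).
regs = [0] * 32
regs[16] = 100                           # $s0 = 100
dmem = {104: 7}                          # data word at byte address 104
imem = [
    ("lw", 8, 4, 16),                    # lw  $t0, 4($s0)
    ("add", 9, 8, 8),                    # add $t1, $t0, $t0
    ("sw", 9, 8, 16),                    # sw  $t1, 8($s0)
]
pc = 0
while pc // 4 < len(imem):
    pc = step(pc, imem, regs, dmem)
print(regs[9], dmem[108])                # 14 14
```

Real hardware performs all five steps within one clock cycle; the sequential code only mirrors the order in which data flows through the datapath.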

PC is an edge-triggered register (flip-flops, falling edge here): it updates only at the end of the clock cycle
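A minimal sketch of this behavior under a simplified model: the hypothetical `EdgeTriggeredRegister` class holds its output steady for the whole cycle and copies its input only when `clock_edge()` is called. Whether the real edge is rising or falling does not change the model.

```python
class EdgeTriggeredRegister:
    """Simplified state element (e.g. the PC) that changes only on a clock edge."""

    def __init__(self, value=0):
        self.q = value        # current output, visible for the whole cycle
        self._d = value       # input, driven by combinational logic

    def set_input(self, d):
        self._d = d           # new input does NOT change q yet

    def clock_edge(self):
        self.q = self._d      # latch the input at the edge (end of cycle)


pc = EdgeTriggeredRegister(0)
pc.set_input(pc.q + 4)        # next-PC logic computes PC + 4 during the cycle
assert pc.q == 0              # PC still holds the old value within the cycle
pc.clock_edge()               # clock edge: the PC updates
assert pc.q == 4
```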

Operations

  • R-format
    • Read 2 register operands
    • ALU performs the arithmetic/logical operation
    • Write the result to a register
  • I-format load/store (e.g. lw $t0, 4($s0))
    • Read base register operand ($s0)
    • ALU adds the base address to the 16-bit sign-extended offset (4)
    • Load: read memory and write the value to the register ($t0)
    • Store: Write register value to memory
  • I-format branch
    • Read register operands
    • ALU compares the operands by subtraction; the Zero output decides whether to branch
    • Calculate the branch target address: PC + 4 + (sign-extended offset << 2)
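The address and branch arithmetic can be sketched at the bit level. The helper names are hypothetical. The sketch assumes 32-bit unsigned register values and treats the branch offset as a word offset, which is shifted left by 2 and added to PC + 4 (standard MIPS behavior).

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Sign-extend a 16-bit immediate field to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base, imm16):
    """lw/sw: base register + sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & MASK32

def beq_taken(rs_val, rt_val):
    """beq: subtract the operands and test whether the result is zero."""
    return ((rs_val - rt_val) & MASK32) == 0

def branch_target(pc, imm16):
    """Branch target = (PC + 4) + (sign-extended offset << 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


print(effective_address(100, 4))         # lw $t0, 4($s0) with $s0 = 100 -> 104
print(hex(branch_target(0x1000, 3)))     # 0x1004 + 12 -> 0x1010
```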

Control

ALU Control

  • load/store: add
  • branch: subtract
  • R-type: depends on funct
ALU control input   Function
0000                and
0001                or
0010                add
0110                subtract
0111                set on less than
1100                nor
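A behavioral sketch of an ALU driven by the 4-bit control input in the table, assuming 32-bit unsigned operands (slt compares them as signed). The `alu` function is hypothetical; it also returns the Zero output that beq relies on.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU control input."""
    if control == 0b0000:        # and
        result = a & b
    elif control == 0b0001:      # or
        result = a | b
    elif control == 0b0010:      # add
        result = a + b
    elif control == 0b0110:      # subtract
        result = a - b
    elif control == 0b0111:      # set on less than (signed comparison)
        result = 1 if to_signed32(a) < to_signed32(b) else 0
    elif control == 0b1100:      # nor
        result = ~(a | b)
    else:
        raise ValueError(f"unknown ALU control input {control:04b}")
    result &= MASK32             # keep 32 bits
    return result, result == 0   # Zero output is 1 when the result is 0
```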
  • 1-level decoding
    • more input bits
    • 6-bit opcode + 6-bit funct = 12 bits
    • 2^12 = 4096 input combinations
  • 2-level decoding
    • fewer input bits, simpler and faster logic
    • main control decodes the 6-bit opcode into a 2-bit ALUOp
    • ALU control: 2-bit ALUOp + 6-bit funct = 8 bits
    • 2^8 = 256 input combinations
Opcode   ALUOp  Operation          funct    ALU function       ALU control input
lw       00     load word          XXXXXX   add                0010
sw       00     store word         XXXXXX   add                0010
beq      01     branch equal       XXXXXX   subtract           0110
R-type   10     add                100000   add                0010
R-type   10     subtract           100010   subtract           0110
R-type   10     and                100100   and                0000
R-type   10     or                 100101   or                 0001
R-type   10     set on less than   101010   set on less than   0111
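ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation  Instruction
0       0       X   X   X   X   X   X   0010       lw, sw (add)
X       1       X   X   X   X   X   X   0110       beq (sub)
1       X       X   X   0   0   0   0   0010       add
1       X       X   X   0   0   1   0   0110       sub
1       X       X   X   0   1   0   0   0000       and
1       X       X   X   0   1   0   1   0001       or
1       X       X   X   1   0   1   0   0111       slt
X = don't care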
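The truth table is the second decoding level: ALUOp plus funct bits F3..F0 select the 4-bit ALU control input. A sketch of that mapping in Python; the function name `alu_control` is hypothetical and, like the table, it ignores F5 and F4.

```python
# R-type: funct bits F3..F0 -> ALU control input (from the truth table).
R_TYPE_LOW_FUNCT = {
    0b0000: 0b0010,  # add (funct 100000)
    0b0010: 0b0110,  # sub (funct 100010)
    0b0100: 0b0000,  # and (funct 100100)
    0b0101: 0b0001,  # or  (funct 100101)
    0b1010: 0b0111,  # slt (funct 101010)
}

def alu_control(alu_op, funct):
    """2-bit ALUOp + 6-bit funct -> 4-bit ALU control input."""
    alu_op1 = (alu_op >> 1) & 1
    alu_op0 = alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:    # row 00: lw/sw -> add
        return 0b0010
    if alu_op0 == 1:                     # row X1: beq -> subtract
        return 0b0110
    low = funct & 0b1111                 # rows 1X: only F3..F0 matter
    if low not in R_TYPE_LOW_FUNCT:
        raise ValueError(f"unsupported funct {funct:06b}")
    return R_TYPE_LOW_FUNCT[low]
```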


Control Signals

Signal     Deasserted                     Asserted
RegDst     write to rt                    write to rd
RegWrite   none                           register write
ALUSrc     second register file output    sign-extended immediate
PCSrc      PC + 4                         branch target
MemRead    none                           memory read
MemWrite   none                           memory write
MemtoReg   write ALU output to register   write memory output to register
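The main control unit sets these signals from the 6-bit opcode. A sketch follows; it assumes the textbook (Patterson & Hennessy) signal values and opcodes (R-format 0, lw 0x23, sw 0x2B, beq 0x04) and uses `None` for don't care. It also assumes that PCSrc is produced by ANDing a separate Branch signal with the ALU Zero output, which goes beyond the table above. `ControlSignals`, `MAIN_CONTROL` and `pc_src` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlSignals:
    reg_dst: Optional[int]      # None means don't care (X)
    alu_src: int
    mem_to_reg: Optional[int]
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int                 # assumed: combined with Zero to form PCSrc
    alu_op: int                 # 2 bits, passed to the ALU control unit


# Opcode -> control signals (assumed textbook values).
MAIN_CONTROL = {
    0x00: ControlSignals(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-format
    0x23: ControlSignals(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    0x2B: ControlSignals(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    0x04: ControlSignals(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def pc_src(signals, zero):
    """PCSrc is asserted only for a branch whose ALU Zero output is 1."""
    return signals.branch & int(zero)
```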

Pipelining

  • Multiple tasks operate simultaneously, each using different (independent) resources

  • Does not reduce the latency of a single task

  • Improves the throughput of the whole workload

  • Potential speedup = number of pipeline stages

  • Pipeline rate is limited by the slowest pipeline stage

  • Unbalanced stage lengths reduce the speedup

  • Have to ensure no overlap in resource use (two instructions must not need the same hardware in the same cycle)

  • Pipelined clock cycle is limited by the slowest stage, vs. single-cycle clock limited by the sum of all stages
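A quick timing comparison, using hypothetical stage delays (200, 100, 200, 200, 100 ps for IF, ID, EX, MEM, WB) and ignoring pipeline-register overhead and hazards. The sketch assumes the single-cycle clock equals the sum of all stage delays, while the pipelined clock equals the slowest stage and n instructions take (k + n - 1) cycles for k stages.

```python
def single_cycle_time(stage_delays, n):
    """Each instruction takes one long cycle = sum of all stage delays."""
    return n * sum(stage_delays)


def pipelined_time(stage_delays, n):
    """Cycle = slowest stage; fill takes k cycles, then one instruction per cycle."""
    k = len(stage_delays)
    return (k + n - 1) * max(stage_delays)


delays = [200, 100, 200, 200, 100]   # hypothetical ps values: IF, ID, EX, MEM, WB
n = 1_000_000
speedup = single_cycle_time(delays, n) / pipelined_time(delays, n)
print(f"speedup = {speedup:.3f}")    # approaches 800 / 200 = 4, not 5
```

With these unbalanced stages the speedup tends toward 4 rather than the 5 stages, and a single instruction takes longer when pipelined (1000 ps vs 800 ps), consistent with pipelining helping throughput but not latency.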

Structural Hazards

https://stackoverflow.com/a/77893282