vortex/docs/microarchitecture.md
2024-10-23 05:18:53 -04:00

83 lines
No EOL
2.7 KiB
Markdown

# Vortex Microarchitecture
### Vortex GPGPU Execution Model
Vortex uses the SIMT (Single Instruction, Multiple Threads) execution model with a single warp issued per cycle.
- **Threads**
- Smallest unit of computation
- Each thread has its own register file (32 int + 32 fp registers)
- Threads execute in parallel
- **Warps**
- A logical clster of threads
- Each thread in a warp execute the same instruction
- The PC is shared; maintain thread mask for Writeback
- Warp's execution is time-multiplexed at log steps
- Ex. warp 0 executes at cycle 0, warp 1 executes at cycle 1
### Vortex RISC-V ISA Extension
- **Thread Mask Control**
- Control the number of warps to activate during execution
- `TMC` *count*: activate count threads
- **Warp Scheduling**
- Control the number of warps to activate during execution
- `WSPAWN` *count, addr*: activate count warps and jump to addr location
- **Control-Flow Divergence**
- Control threads activation when a branch diverges
- `SPLIT` *taken, predicate*: apply predicate thread mask and save current state into IPDOM stack
- `JOIN`: pop IPDOM stack to restore thread mask
- `PRED` *predicate, restore_mask*: thread predicate instruction
- **Warp Synchronization**
- `BAR` *id, count*: stall warps entering barrier *id* until count is reached
### Vortex Pipeline/Datapath
![Image of Vortex Microarchitecture](./assets/img/vortex_microarchitecture.png)
Vortex has a 6-stage pipeline:
- **Schedule**
- Warp Scheduler
- Schedule the next PC into the pipeline
- Track stalled, active warps
- IPDOM Stack
- Save split/join states for divergent threads
- Inflight Tracker
- Track in-flight instructions
- **Fetch**
- Retrieve instructions from memory
- Handle I-cache requests/responses
- **Decode**
- Decode fetched instructions
- Notify warp scheduler on control instructions
- **Issue**
- IBuffer
- Store decoded instructions in separate per-warp queues
- Scoreboard
- Track in-use registers
- Check register use for decoded instructions
- Operands Collector
- Fetch the operands for issued instructions from the register file
- **Execute**
- ALU Unit
- Handle arithmetic and branch operations
- FPU Unit
- Handle floating-point operations
- LSU Unit
- Handle load/store operations
- SFU Unit
- Handle warp control operations
- Handle Control Status Registers (CSRs) operations
- **Commit**
- Write result back to the register file and update the Scoreboard.
### Vortex clustering architecture
- Sockets
- Grouping multiple cores sharing L1 cache
- Clusters
- Grouping of sockets sharing L2 cache
### Vortex Cache Subsystem
More details about the cache subsystem are provided [here](./cache_subsystem.md).