Added codebase and microarch guides and updated vortex and simulation guides slightly

This commit is contained in:
Malik Aki Burton 2021-04-23 18:31:52 -04:00
parent 2c908b4a07
commit e3e5c178ff
4 changed files with 133 additions and 5 deletions

doc/Codebase.md

@@ -0,0 +1,35 @@
# Vortex Codebase
The directory/file layout of the Vortex codebase is as follows:
- `benchmark`: contains OpenCL, RISC-V, and vector tests
- `opencl`: contains basic kernel operation tests (i.e. vector add, transpose, dot product)
- `riscv`: contains official riscv tests which are pre-compiled into binaries
- `vector`: tests for vector instructions (not yet implemented)
- `ci`: contains tests to be run during continuous integration (Travis CI)
- driver, opencl, riscv_isa, and runtime tests
- `driver`: contains driver software implementation (software that is run on the host to communicate with the vortex processor)
- `opae`: contains code for driver that runs on FPGA
- `rtlsim`: contains code for the driver that runs on the local machine (driver built using Verilator, which converts the RTL into a C++ binary)
- `simx`: contains code for the driver that runs on the local machine (using the SimX simulator)
- `include`: contains vortex.h which has the vortex API that is used by the drivers
- `runtime`: contains software used inside kernel programs to expose GPGPU capabilities
- `include`: contains vortex API needed for runtime
- `linker`: contains linker file for compiling kernels
- `src`: contains implementation of vortex API (from include folder)
- `tests`: contains runtime tests
- `simple`: contains test for GPGPU functionality allowed in vortex
- `simx`: contains simX, the cycle approximate simulator for vortex
- `miscs`: contains old code that is no longer used
- `hw`:
- `unit_tests`: contains unit tests for the cache and queue RTL
- `syn`: contains all synthesis scripts (quartus and yosys)
- `quartus`: contains code to synthesize the cache, core, pipeline, top, and vortex modules stand-alone
- `simulate`: contains RTL simulator (verilator)
- `testbench.cpp`: runs either the riscv, runtime, or opencl tests
- `opae`: contains source code for the accelerator functional unit (AFU) and code which programs the fpga
- `rtl`: contains rtl source code
- `cache`: contains cache subsystem code
- `fp_cores`: contains floating point unit code
- `interfaces`: contains code that handles communication for each of the units of the microarchitecture
- `libs`: contains general-purpose modules (i.e., buffers, encoders, arbiters, pipe registers)

doc/Microarchitecture.md

@@ -0,0 +1,94 @@
# Vortex Microarchitecture
### Vortex GPGPU Execution Model
Vortex uses the SIMT (Single Instruction, Multiple Threads) execution model with a single warp issued per cycle.
- **Threads**
- Smallest unit of computation
- Each thread has its own register file (32 int + 32 fp registers)
- Threads execute in parallel
- **Warps**
- A logical cluster of threads
- Each thread in a warp executes the same instruction
- The PC is shared; a thread mask is maintained for writeback
- Warp execution is time-multiplexed, one warp issued per cycle
- Ex. warp 0 executes at cycle 0, warp 1 executes at cycle 1
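As a concrete illustration, here is a minimal Python sketch of this execution model (all names are illustrative, not taken from the Vortex source): each warp shares one PC and carries a per-thread mask, and the scheduler time-multiplexes warps by issuing one per cycle in round-robin order.

```python
# Illustrative sketch of SIMT warp time-multiplexing (not Vortex RTL):
# one warp issues per cycle, round-robin across all warps.
NUM_WARPS, NUM_THREADS = 4, 4

class Warp:
    def __init__(self, wid):
        self.wid = wid
        self.pc = 0                        # PC is shared by all threads in the warp
        self.tmask = [True] * NUM_THREADS  # per-thread active mask

def issue_order(warps, cycles):
    """Return which warp issues on each cycle (round-robin)."""
    return [warps[c % len(warps)].wid for c in range(cycles)]

warps = [Warp(w) for w in range(NUM_WARPS)]
print(issue_order(warps, 6))  # [0, 1, 2, 3, 0, 1]
```

With four warps, warp 0 issues again on cycle 4, matching the interleaving described above.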
### Vortex RISC-V ISA Extension
- **Thread Mask Control**
- Control the number of threads to activate during execution
- `TMC` *count*: activate *count* threads
- **Warp Scheduling**
- Control the number of warps to activate during execution
- `WSPAWN` *count, addr*: activate *count* warps and jump to *addr* location
- **Control-Flow Divergence**
- Control threads to activate when a branch diverges
- `SPLIT` *predicate*: apply the 'taken' predicate as the thread mask and save the 'not-taken' mask onto the IPDOM stack
- `JOIN`: restore the 'not-taken' thread mask
- **Warp Synchronization**
- `BAR` *id, count*: stall warps entering barrier *id* until *count* is reached
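The SPLIT/JOIN mechanism can be sketched in a few lines of Python (a simplified model under assumed semantics, not the Vortex RTL): SPLIT pushes the original mask and the 'not-taken' mask onto the IPDOM stack, so the first JOIN resumes the 'not-taken' path and the second JOIN reconverges to the original mask.

```python
# Simplified IPDOM-stack model of control-flow divergence (illustrative only).
def split(tmask, predicate, ipdom_stack):
    """Apply the 'taken' predicate as the new thread mask; save the rest."""
    taken = [t and p for t, p in zip(tmask, predicate)]
    not_taken = [t and not p for t, p in zip(tmask, predicate)]
    ipdom_stack.append(tmask)      # restored by the second JOIN (reconvergence)
    ipdom_stack.append(not_taken)  # restored by the first JOIN
    return taken

def join(ipdom_stack):
    """Restore the thread mask saved on the IPDOM stack."""
    return ipdom_stack.pop()

stack = []
mask = split([True] * 4, [True, False, True, False], stack)
print(mask)         # [True, False, True, False]  -> 'taken' path
print(join(stack))  # [False, True, False, True]  -> 'not-taken' path
print(join(stack))  # [True, True, True, True]    -> reconverged
```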
### Vortex Pipeline/Datapath
![Image of Vortex Microarchitecture](vortex_microarchitecture_v2.png)
Vortex has a 5-stage pipeline: Fetch | Decode | Issue | Execute | Commit/Writeback.
- **Fetch**
- Warp Scheduler
- Track stalled & active warps, resolve branches and barriers, maintain split/join IPDOM stack
- Instruction Cache
- Retrieve instruction from cache, issue I-cache requests/responses
- **Decode**
- Decode fetched instructions, notify warp scheduler when the following instructions are decoded:
- Branch, TMC, SPLIT/JOIN, WSPAWN
- Precompute used_regs mask (needed for Issue stage)
- **Issue**
- Scheduling
- In-order issue (operands/execute unit ready), out-of-order commit
- IBuffer
- Stores fetched instructions in separate per-warp queues; selects the next warp through round-robin scheduling
- Scoreboard
- Track in-use registers
- GPRs (General-Purpose Registers) stage
- Fetch issued instruction operands and send operands to execute unit
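The scoreboard's role in in-order issue can be sketched as follows (a minimal model with illustrative names, not the Vortex implementation): each issued instruction reserves its destination register, and a later instruction may issue only when none of its registers are still in flight.

```python
# Simplified per-warp scoreboard (illustrative, not Vortex source):
# tracks in-use registers between issue and writeback.
class Scoreboard:
    def __init__(self):
        self.in_use = set()  # (warp_id, reg) pairs pending writeback

    def can_issue(self, wid, rd, rs_list):
        """An instruction may issue only if none of its registers are pending."""
        return all((wid, r) not in self.in_use for r in [rd] + rs_list)

    def issue(self, wid, rd):
        self.in_use.add((wid, rd))      # reserve destination until writeback

    def writeback(self, wid, rd):
        self.in_use.discard((wid, rd))  # release register, unblocking dependents

sb = Scoreboard()
sb.issue(0, 5)                   # warp 0 writes r5
print(sb.can_issue(0, 6, [5]))   # False: r5 still in flight
sb.writeback(0, 5)
print(sb.can_issue(0, 6, [5]))   # True
```

Note that registers are tracked per warp, since each warp (and each thread) has its own register file.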
- **Execute**
- ALU Unit
- Single-cycle operations (+,-,>>,<<,&,|,^), Branch instructions (Share ALU resources)
- MULDIV Unit
- Multiplier - done in 2 cycles
- Divider - division and remainder, done in 32 cycles
- Implements a serial algorithm (stalls the pipeline)
- FPU Unit
- Multi-cycle operations, uses `FPnew` Library on ASIC, uses hard DSPs on FPGA
- CSR Unit
- Stores control/status registers - device caps, FPU status flags, performance counters
- Handle external CSR requests (requests from host CPU)
- LSU Unit
- Handle load/store operations, issue D-cache requests, handle D-cache responses
- Commit load responses - saves storage, Scoreboard tracks completion
- GPGPU Unit
- Handle GPGPU instructions
- TMC, WSPAWN, SPLIT, BAR
- JOIN is handled by Warp Scheduler (upon SPLIT response)
- **Commit**
- Commit
- Update CSR flags, update performance counters
- Writeback
- Write result back to GPRs, notify Scoreboard (release in-use register), select candidate instruction (ALU unit has highest priority)
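The writeback selection can be modeled as a fixed-priority arbiter. In the sketch below only the ALU-first rule comes from the text; the order of the remaining units is an assumption for illustration.

```python
# Fixed-priority writeback arbiter (illustrative). Only "ALU has highest
# priority" is stated in the doc; the rest of this order is assumed.
PRIORITY = ["ALU", "LSU", "FPU", "MULDIV", "CSR", "GPGPU"]

def select_writeback(ready_units):
    """Pick the highest-priority execute unit with a result ready."""
    for unit in PRIORITY:
        if unit in ready_units:
            return unit
    return None  # nothing to write back this cycle

print(select_writeback({"FPU", "ALU"}))  # ALU always wins
print(select_writeback(set()))           # None
```

Giving the single-cycle ALU top priority keeps its results from queueing up behind long-latency units.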
- **Clustering**
- Group multiple cores into clusters (optionally share L2 cache)
- Group multiple clusters (optionally share L3 cache)
- Configurable at build time
- Default configuration:
- #Clusters = 1
- #Cores = 4
- #Warps = 4
- #Threads = 4
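Multiplying out the default configuration gives the total number of concurrent hardware threads:

```python
# Total hardware threads in the default build (values from the list above).
clusters, cores, warps, threads = 1, 4, 4, 4
total = clusters * cores * warps * threads
print(total)  # 64
```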
- **FPGA AFU Interface**
- Manage CPU-GPU communication
- Query device caps, load kernel instructions and resource buffers, start kernel execution, read destination buffers
- Local Memory - GPU access to local DRAM
- Reserved I/O addresses - redirect to host CPU, console output


@@ -24,10 +24,9 @@ Running tests under specific drivers (rtlsim, simx, fpga) is done using the script
- *L3cache* - used to enable the shared l3cache among the Vortex clusters.
- *Driver* - used to specify which driver to run the Vortex simulation (either rtlsim, vlsim, fpga, or simx).
- *Debug* - used to enable debug mode for the Vortex simulation.
- *Scope* -
- *Perf* - is used to enable the detailed performance counters within the Vortex simulation.
- *App* - is used to specify which test/benchmark to run in the Vortex simulation. The main choices are vecadd, sgemm, basic, demo, and dogfood. Other tests/benchmarks are located in the `/benchmarks/opencl` folder though not all of them work with the current version of Vortex.
- *Args* -
- *Perf* - used to enable the detailed performance counters within the Vortex simulation.
- *App* - used to specify which test/benchmark to run in the Vortex simulation. The main choices are vecadd, sgemm, basic, demo, and dogfood. Other tests/benchmarks are located in the `/benchmarks/opencl` folder though not all of them work with the current version of Vortex.
- *Args* - used to pass additional arguments to the application.
Example use of command line arguments: Run the sgemm benchmark using the vlsim driver with a Vortex configuration of 1 cluster, 4 cores, 4 warps, and 4 threads.


@@ -2,7 +2,7 @@
### Table of Contents
- Vortex Architecture
- Vortex Microarchitecture
- Vortex Software
- [Vortex Simulation](https://github.com/vortexgpgpu/vortex-dev/blob/master/doc/Simulation.md)
- [FPGA](https://github.com/vortexgpgpu/vortex-dev/blob/master/doc/Flubber_FPGA_Startup_Guide.md)