bht2lvl containts a coding pattern that is not supported in Verilator, despite being legal
Unsupported: Delayed assignment to array inside for loops (non-delayed is ok - see docs)
This issue is not always triggered, as it popped up for cv64 but not for cv32.
Fix TLB bug in the MMU when H-Mode is active:
TLB: correct a race condition in lsu_gpaddr_o in which only the highest PPN were checked for match, which may result in invalid translation
Also:
improve coding style / comments of cva6_tlb.sv
---------
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Break-down of #2933 : PTW changes only:
walks on intermediate nodes during G-translation do no generate faults if R/D/X bits is not set and request is a store (fixes [BUG] Wrong permission check walking HGATP #2910)
walks on intermediate nodes during G-translation do no generate faults if X bits is not set and request is an execute
Code style:
gather requests to the TLB in a single process instead of scattered signals
add more comments
---------
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Prior to this change, the MMU only captured LSU misalignment faults when
Hypervisor extensions (RVH) were enabled. If RVH was disabled, misaligned
loads/stores would silently bypass exception handling, violating the
RISC-V privilege spec. This patch moves the [misaligned_ex_q] update
outside the (RVH) guard so that misaligned exceptions are always registered
and prioritized over translation.
Fixes rv32mi-ma_addr & rv32i-I-MISALIGN_LDST_01 tests.
Fix fence.i synchronization issue with HPDCache write-back Data
Add [halt_frontend_o] signal to stall instruction fetch during fence.i
Introduce [fence_i_active] state to track ICache+DCache flush progress
Gate ICache requests and NPC updates when fence.i is active
Resolves stale instruction fetch in self-modifying code scenarios (riscv-tests/rv32ui/fence_i)
Enables Linux boot with HPDCache write-back data by ensuring ICache/DCache coherency
Note: This was not seen with [std_dcache] in standard [fence_i] tests, since it normally finishes write‑back before the ICache flush completes. However, for large self‑modifying code regions, the write‑back can lag behind the ICache flush, leading to stale instruction fetches even with [std_dcache].
Problem:
When executing self-modifying code or booting Linux with a write-back D$, fence.i flushes the ICache but allows the frontend to fetch new instructions before the DCache flush completes. This causes the core to execute stale instructions from the ICache that were modified in the DCache (Dirty data may not be written back because the DCache is still flushing).
Root Cause:
The frontend resumes fetching immediately after ICache flush, even if the DCache is still flushing.
NPC (Next PC) continues advancing after ICache flush, leading to incorrect instruction fetch.
Solution:
Introduce halt_frontend_o signal to freeze frontend during fence.i.
Add fence_i_active state in controller to track combined ICache+DCache flush.
Gate ICache requests (icache_dreq_o.req) and NPC updates (if_ready) during fence_i_active.
This commit makes two changes to the commit stage:
- It fixes incorrect parenthesis, causing dirty_fp_state_o not to be set
on floating point instructions such as fld.
- It adds a condition for dirty_fp_state_o that only asserts the flag
for floating point instructions that change any FPU registers.
Co-authored-by: Guillaume Chauvon <94678394+Gchauvon@users.noreply.github.com>
Fix port connection width mismatch warning in the instruction tracer when using 32-bit version of the core. As the tracer is hardcoded to 64 bit operands and it could be useful to have the same trace format for both 64 and 32 bit cores, the most convenient solution is to pad the data when needed.
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
* Introduction
This PR adds support for Zbkx, Zkne, Zknd and Zknh extensions in the CVA6 core. It also adds the documentation and tests for these extensions. These changes have been tested with self-written single instruction tests and with the riscv-arch-tests. This PR will complete the Zkn - NIST Algorithm Suite extension.
* Implementation
Zbkx Extension:
Added support for the Zbkx instruction set. It essentially expands the Bitmanip extension with additional instructions useful in cryptography. These instructions are xperm8, xperm4.
Zkne Extension:
Added support for the Zkne instruction set. It essentially adds AES encryption support for scalar cryptography. These instructions are aes32esi, aes32esmi, aes64es, aes64esm, aes64ks1i, aes64ks2.
Zknd Extension:
Added support for the Zknd instruction set. It adds AES decryption support for scalar cryptography. These instructions are aes32dsi, aes32dsmi, aes64ds, aes64dsm, aes64im, aes64ks1i, aes64ks2.
Note:
The aes64ks1i and aes64ks2 instructions are present in both the Zknd and Zkne extensions.
Zknh Extension:
Added support for the Zknh instruction set. It adds the hash function instructions support for scalar cryptography. These instructions are sha256sig0, sha256sig1, sha256sum0, sha256sum1, sha512sig0h, sha512sig0l, sha512sig1h, sha512sig1l, sha512sum0r, sha512sum1r, sha512sig0, sha512sig1, sha512sum0, sha512sum1.
* Modifications
Updated the ALU and decoder to recognize and handle Zbkx instructions. For Zkne, Zknd & Zknh, the decoder will now select the AES unit as functional unit instead of the ALU.
The complete Zkn extension is added under the ZKN bit for ease of use. This configuration will also require the RVB (bitmanip) bit to be set.
Note:
The Zkn extension does not require the use of vectorial fpu.
* AES Functional Unit
A new functional unit was created inside the execute stage that will handle all AES and Hashing instructions (Zkne, Zknd, Zknh).
A new package "aes_pkg" handles all AES functions such as sbox substitution, mix columns, etc.
aes_unit
* Documentation and Reference
The official RISC-V Cryptography Extensions Volume I was followed to ensure alignment with ratification. The relevant documentation for Zbkx, Zkne, Zknd and Zknh instructions was also added.
* Verification
Assembly Tests:
The instructions were tested and verified with the K module of both 32 bit and 64 bit versions of the riscv-arch-tests to ensure proper functionality. These tests check for ISA compliance, edge cases and use assertions to ensure expected behavior.
Currently, cvxif_off_instr_n is assigned to orig_instr[i], with i being
0 or 1 depending on the issue port.
This assigns the original instruction value (which will be propagated,
e.g., into tval in an exception) to the last bit of the instruction on
issue port 0.
This commit assigns it to the full instruction on the corresponding
issue port instead.
Adds support for Trace Interface or Trace Ingress Port (TIP) on CVA6
TIP is Interface between a RISC-V hart and the trace encoder
It generates information about the instruction retired.
The implementation is compliant with the Efficient Trace for RISC-V standard Version 2.0.2(https://github.com/riscv-non-isa/riscv-trace-spec/releases/download/v2.0.2/riscv-trace-spec-asciidoc.pdf), specifically:
Chapter 4.1: Instruction Trace Interface Requirements
Chapter 4.2: Instruction Trace Interface
The current implementation supports the following TIP signals: iretire, itype, cause, tval, priv, iaddr, and time. For Instruction Type (itype) encoding, it supports the following: Exception, Interrupt, Exception or interrupt return, Nontaken branch, Taken branch, Uninferable jump.
What I have been able to test so far:
Simulation: Executed C binaries and observed the waveform of TIP.
---------
Co-authored-by: root <darshak.sheladiya@sysgo.com>
Co-authored-by: CHAUVON Guillaume <guillaume.chauvon@thalesgroup.com>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
This PR introduces a new RAW hazard detection mechanism to eliminate WAW hazards in CVA6 issue stage.
It first checks for hazards in all scoreboard entries in parallel.
Then it filters found hazards before vs after the current issue pointer.
It then finds the index of the last hazard before (resp. after) the issue pointer.
Finally, it gives precedence to a hazard before the issue pointer over the one after the issue pointer.
---------
Co-authored-by: Junheng Zheng <junheng.zheng@thalesgroup.com>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
This macro is not required and makes the file harder to parse.
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Following what was done in branch_unit, I set up a default value for hypervisor exception fields in cvxif_fu.
Should fix issue #2831
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
This PR assigns 0 to tinst by default.
Even though tinst is only used when CVA6Cfg.RVH is enabled, I chose to assign it a default value in all configurations, since the signal is defined for all configurations.
Fixes#2803
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Verification Plan provided in VP_TOOL for the PMP. The verification plan should be complete, however only a partial set of the tests is available. This is not included in the CI but a bash script is available to run the test.
This code hits verilator/verilator#5829 due to the use of partial assignments to dcache_rtrn_o in this always block, while reading other bits of the same packed struct elsewhere in the block.
The actual effect of this is that with a Verilator simulation, invalidation requests incoming from the coherence network are sometimes ignored breaking AMOs.
Moving the assignments to the bits read in the always block into the same always block avoids this issue.
---------
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR adds a new two-level BHT predictor with private history. The new BPType parameters allow choosing between the original BHT and the new one.
Co-authored-by: Gianmarco Ottavi <ottavig91@gmail.com>
Adds to #2798. Sorry for noticing this only now. Together with #2798, this reverts a bug that was introduced in #2528.
Signed-off-by: Nils Wistoff <nwistoff@iis.ee.ethz.ch>
A new configuration file and core v target is added to start working on a 64 bit CVA6 core.
---------
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
The load and store units sample the MMU exception one cycle after
`dtlb_hit` is asserted. However, misaligned exceptions are currently fed
through the MMU, potentially attributing a misaligned exception to the
*preceding* instruction. Fix this by latching the misaligned exception.
Signed-off-by: Nils Wistoff <nwistoff@iis.ee.ethz.ch>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Use separate parametrization for RVF and RVD support. In the cv32a6_imafc_sv32 configuration RVD is currently enabled, leading to compilation errors.
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Hi! Some tools like morty struggle with this expression. I suggest this very simple rewrite. No need for fancy constructs here.
Thanks @flaviens for this contribution
Add parameter CoproType to select which coprocessor to instantiate when CvxifEn == 1
---------
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Follow-up to the discussion on extending Linux support to the Ara vector processor.
* Main changes:
Add:
Add external MMU interface to share the MMU with the external accelerator.
Add avoid_neg() function used to clip negative numbers to zero. Useful for parametric array sizes and vector multipliers.
Modifications:
2 commit ports by default in cv64a6_imafdcv_config_pkg.
Change exception_t from localparam to param in cva6.sv.
Add parameters accelerator_req_t, accelerator_resp_t, acc_mmu_req_t, and acc_mmu_resp_t to cva6.sv.
Replace the fall-through register with a spill register in acc_dispatcher to decouple timing with the accelerator.
Decrease cache sizes in cv64a6_imafdcv_sv39_config_pkg.
Modify Bender.yml package name from ariane to cva6.
Add harmless code to prevent synthesizer tool from crashing when compiling csr_regfile.
* Collateral changes:
Fixes:
Guard some X-IF code lines with correct parameter in cva6.sv.
Parametrize the tracer interface with NrCommitPorts.
Add missing local dependencies to Bender.yml.
---------
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Add support for Superscalar with ZCMP, ZCMT and CVXIF.
ZCMP decoder, ZCMT decoder and CVXIF interface driver are using port 0.
Standard RVC and 32 bits instruction can take port 0 or 1.
Use CVA6Cfg.DebugEn in an outer test instead of in inner tests
Signed-off-by: André Sintzoff <andre.sintzoff@thalesgroup.com>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
is_rvc is redundant with riscv::OpcodeC1, riscv::OpcodeC2
Signed-off-by: André Sintzoff <andre.sintzoff@thalesgroup.com>
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
## Introduction
This PR implements the ZCMT extension in the CVA6 core, targeting the 32-bit embedded-class platforms. ZCMT is a code-size reduction feature that utilizes compressed table jump instructions (cm.jt and cm.jalt) to reduce code size for embedded systems
**Note:** Due to implementation complexity, ZCMT extension is primarily targeted at embedded class CPUs. Additionally, it is not compatible with architecture class profiles.(Ref. [Unprivilege spec 27.20](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view))
## Key additions
- Added zcmt_decoder module for compressed table jump instructions: cm.jt (jump table) and cm.jalt (jump-and-link table)
- Implemented the Jump Vector Table (JVT) CSR to store the base address of the jump table in csr_reg module
- Implemented a return address stack, enabling cm.jalt to behave equivalently to jal ra (jump-and-link with return address), by pushing the return address onto the stack in zcmt_decoder module
## Implementation in CVA6
The implementation of the ZCMT extension involves the following major modifications:
### compressed decoder
The compressed decoder scans and identifies the cm.jt and cm.jalt instructions, and generates signals indicating that the instruction is both compressed and a ZCMT instruction.
### zcmt_decoder
A new zcmt_decoder module was introduced to decode the cm.jt and cm.jalt instructions, fetch the base address of the JVT table from JVT CSR, extract the index and construct jump instructions to ensure efficient integration of the ZCMT extension in embedded platforms. Table.1 shows the IO port connection of zcmt_decoder module. High-level block diagram of zcmt implementation in CVA6 is shown in Figure 1.
_Table. 1 IO port connection with zcmt_decoder module_
Signals | IO | Description | Connection | Type
-- | -- | -- | -- | --
clk_i | in | Subsystem Clock | SUBSYSTEM | logic
rst_ni | in | Asynchronous reset active low | SUBSYSTEM | logic
instr_i | in | Instruction in | compressed_decoder | logic [31:0]
pc_i | in | Current PC | PC from FRONTEND | logic [CVA6Cfg.VLEN-1:0]
is_zcmt_instr_i | in | Is instruction a zcmt instruction | compressed_decoder | logic
illegal_instr_i | in | Is instruction a illegal instruction | compressed_decoder | logic
is_compressed_i | in | Is instruction a compressed instruction | compressed_decoder | logic
jvt_i | in | JVT struct from CSR | CSR | jvt_t
req_port_i | in | Handshake between CACHE and FRONTEND (fetch) | Cache | dcache_req_o_t
instr_o | out | Instruction out | cvxif_compressed_if_driver | logic [31:0]
illegal_instr_o | out | Is the instruction is illegal | cvxif_compressed_if_driver | logic
is_compressed_o | out | Is the instruction is compressed | cvxif_compressed_if_driver | logic
fetch_stall_o | out | Stall siganl | cvxif_compressed_if_driver | logic
req_port_o | out | Handshake between CACHE and FRONTEND (fetch) | Cache | dcache_req_i_t
### branch unit condition
A condition is implemented in the branch unit to ensure that ZCMT instructions always cause a misprediction, forcing the program to jump to the calculated address of the newly constructed jump instruction.
### JVT CSR
A new JVT csr is implemented in csr_reg which holds the base address of the JVT table. The base address is fetched from the JVT CSR, and combined with the index value to calculate the effective address.
### No MMU
Embedded platform does not utilize the MMU, so zcmt_decoder is connected with cache through port 0 of the Dcache module for implicit read access from the memory.

_Figure. 1 High level block diagram of ZCMT extension implementation_
## Known Limitations
The implementation targets 32-bit instructions for embedded-class platforms without an MMU. Since the core does not utilize an MMU, it is leveraged to connect the zcmt_decoder to the cache via port 0.
## Testing and Verification
- Developed directed test cases to validate cm.jt and cm.jalt instruction functionality
- Verified correct initialization and updates of JVT CSR
### Test Plan
A test plan is developed to test the functionality of ZCMT extension along with JVT CSR. Directed Assembly test executed to check the functionality.
_Table. 2 Test plan_
S.no | Features | Description | Pass/Fail Criteria | Test Type | Test status
-- | -- | -- | -- | ---- | --
1 | cm.jt | Simple assembly test to validate the working of cm.jt instruction in CV32A60x. | Check against Spike's ref. model | Directed | Pass
2 | cm.jalt | Simple assembly test to validate the working of cm.jalt instruction in both CV32A60x. | Check against Spike's ref. model | Directed | Pass
3 | cm.jalt with return address stack | Simple assembly test to validate the working of cm.jalt instruction with return address stack in both CV32A60x. It works as jump and link ( j ra, imm) | Check against Spike's ref. model | Directed | Pass
4 | JVT CSR | Read and write base address of Jump table to JVT CSR | Check against Spike's ref. model | Directed | Pass
**Note**: Please find the test under CVA6_REPO_DIR/verif/tests/custom/zcmt"
The AXI AW channel in the HPDcache is shared by three components:
Write Buffer
Flush Controller
Uncached Controller
The ID for each transaction is generated based on its source as follows:
Write Buffer: {1'b0, write_buffer_entry_index}
Flush Controller: {1'b1, flush_controller_index}
Uncached Controller: '1
To distinguish between flush transactions and uncached transactions, the flush transaction ID must include at least one 0.
Currently, the AXI ID is limited to 4 bits, while the flush controller supports 8 entries. As a result, when a transaction is sent from the 8th entry of the flush controller, all bits of the ID are set to 1. This causes the HPDcache to misroute the response to the uncached controller instead of the flush controller.
The parameter CVA6ConfigWtDcacheWbufDepth is used in the WB cache to set the number of flush entries. To avoid modifying the ID width, the number of flush entries must be less than 8. Non-power-of-two values are supported.
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Fix decoding of some bitmanip instruction where decoding differs between rv32 and rv64.
Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>