Commit graph

101 commits

Author SHA1 Message Date
Geza Lore
c511b21911
Workaround for Verilator ordering issue in OpenPiton cache adapter (#2809)
Some checks failed
bender-up-to-date / bender-up-to-date (push) Has been cancelled
ci / build-riscv-tests (push) Has been cancelled
ci / execute-riscv64-tests (push) Has been cancelled
ci / execute-riscv32-tests (push) Has been cancelled
This code hits verilator/verilator#5829 due to the use of partial assignments to dcache_rtrn_o in this always block, while reading other bits of the same packed struct elsewhere in the block.

The actual effect of this is that with a Verilator simulation, invalidation requests incoming from the coherence network are sometimes ignored breaking AMOs.

Moving the assignments to the bits read in the always block into the same always block avoids this issue.

---------

Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-03-06 17:16:13 +01:00
Nils Wistoff
abf21ee221
cva6_icache: Fix formatting (#2770)
Run verible verilog format to fix upstream formatting.

Signed-off-by: Nils Wistoff <nwistoff@iis.ee.ethz.ch>
2025-02-13 21:24:31 +01:00
Matteo Perotti
1bc415391a
[RVV] CVA6 re-parametrization and MMU interface (#2652)
Follow-up to the discussion on extending Linux support to the Ara vector processor.

* Main changes:
Add:
Add external MMU interface to share the MMU with the external accelerator.
Add avoid_neg() function used to clip negative numbers to zero. Useful for parametric array sizes and vector multipliers.

Modifications:
2 commit ports by default in cv64a6_imafdcv_config_pkg.
Change exception_t from localparam to param in cva6.sv.
Add parameters accelerator_req_t, accelerator_resp_t, acc_mmu_req_t, and acc_mmu_resp_t to cva6.sv.
Replace the fall-through register with a spill register in acc_dispatcher to decouple timing with the accelerator.
Decrease cache sizes in cv64a6_imafdcv_sv39_config_pkg.
Modify Bender.yml package name from ariane to cva6.
Add harmless code to prevent synthesizer tool from crashing when compiling csr_regfile.

* Collateral changes:
Fixes:
Guard some X-IF code lines with correct parameter in cva6.sv.
Parametrize the tracer interface with NrCommitPorts.
Add missing local dependencies to Bender.yml.

---------

Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
2025-02-11 07:22:31 +01:00
Farhan Ali Shah
542fe39adc
Adding support for ZCMT Extension for Code-Size Reduction in CVA6 (#2659)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
## Introduction
This PR implements the ZCMT extension in the CVA6 core, targeting the 32-bit embedded-class platforms. ZCMT is a code-size reduction feature that utilizes compressed table jump instructions (cm.jt and cm.jalt) to reduce code size for embedded systems
**Note:** Due to implementation complexity, ZCMT extension is primarily targeted at embedded class CPUs. Additionally, it is not compatible with architecture class profiles.(Ref. [Unprivilege spec 27.20](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view))

## Key additions

- Added zcmt_decoder module for compressed table jump instructions: cm.jt (jump table) and cm.jalt (jump-and-link table)

- Implemented the Jump Vector Table (JVT) CSR to store the base address of the jump table in csr_reg module

- Implemented a return address stack, enabling cm.jalt to behave equivalently to jal ra (jump-and-link with return address), by pushing the return address onto the stack in zcmt_decoder module

## Implementation in CVA6
The implementation of the ZCMT extension involves the following major modifications:

### compressed decoder 
The compressed decoder scans and identifies the cm.jt and cm.jalt instructions, and generates signals indicating that the instruction is both compressed and a ZCMT instruction.

### zcmt_decoder
A new zcmt_decoder module was introduced to decode the cm.jt and cm.jalt instructions, fetch the base address of the JVT table from JVT CSR, extract the index and construct jump instructions to ensure efficient integration of the ZCMT extension in embedded platforms. Table.1 shows the IO port connection of zcmt_decoder module. High-level block diagram of zcmt implementation in CVA6 is shown in Figure 1.

_Table. 1 IO port connection with zcmt_decoder module_
Signals | IO | Description | Connection | Type
-- | -- | -- | -- | --
clk_i | in | Subsystem Clock | SUBSYSTEM | logic
rst_ni | in | Asynchronous reset active low | SUBSYSTEM | logic
instr_i | in | Instruction in | compressed_decoder | logic [31:0]
pc_i | in | Current PC | PC from FRONTEND | logic [CVA6Cfg.VLEN-1:0]
is_zcmt_instr_i | in | Is instruction a zcmt instruction | compressed_decoder | logic
illegal_instr_i | in | Is instruction a illegal instruction | compressed_decoder | logic
is_compressed_i | in | Is instruction a compressed instruction | compressed_decoder | logic
jvt_i | in | JVT struct from CSR | CSR | jvt_t
req_port_i | in | Handshake between CACHE and FRONTEND (fetch) | Cache | dcache_req_o_t
instr_o | out | Instruction out | cvxif_compressed_if_driver | logic [31:0]
illegal_instr_o | out | Is the instruction is illegal | cvxif_compressed_if_driver | logic
is_compressed_o | out | Is the instruction is compressed | cvxif_compressed_if_driver | logic
fetch_stall_o | out | Stall siganl | cvxif_compressed_if_driver | logic
req_port_o | out | Handshake between CACHE and FRONTEND (fetch) | Cache | dcache_req_i_t

### branch unit condition
A condition is implemented in the branch unit to ensure that ZCMT instructions always cause a misprediction, forcing the program to jump to the calculated address of the newly constructed jump instruction.

### JVT CSR
A new JVT csr is implemented in csr_reg which holds the base address of the JVT table. The base address is fetched from the JVT CSR, and combined with the index value to calculate the effective address.

### No MMU
Embedded platform does not utilize the MMU, so zcmt_decoder is connected with cache through port 0 of the Dcache module for implicit read access from the memory.

![zcmt_block drawio](https://github.com/user-attachments/assets/ac7bba75-4f56-42f4-9f5e-0c18f00d4dae)
_Figure. 1 High level block diagram of ZCMT extension implementation_

## Known Limitations
The implementation targets 32-bit instructions for embedded-class platforms without an MMU. Since the core does not utilize an MMU, it is leveraged to connect the zcmt_decoder to the cache via port 0.

## Testing and Verification

- Developed directed test cases to validate cm.jt and cm.jalt instruction functionality
- Verified correct initialization and updates of JVT CSR

### Test Plan 
A test plan is developed to test the functionality of ZCMT extension along with JVT CSR. Directed Assembly test executed to check the functionality. 

_Table. 2 Test plan_
S.no | Features | Description | Pass/Fail Criteria | Test Type | Test status
-- | -- | -- | -- | ---- | --
1 | cm.jt | Simple assembly test to validate the working of cm.jt instruction in  CV32A60x. | Check against Spike's ref. model | Directed | Pass
2 | cm.jalt | Simple assembly test to validate the working of cm.jalt instruction in both CV32A60x. | Check against Spike's ref. model | Directed | Pass
3 | cm.jalt with return address stack | Simple assembly test to validate the working of cm.jalt instruction with return address stack in both CV32A60x. It works as jump and link ( j ra, imm) | Check against Spike's ref. model | Directed | Pass
4 | JVT CSR | Read and write base address of Jump table to JVT CSR | Check against Spike's ref. model | Directed | Pass


**Note**: Please find the test under CVA6_REPO_DIR/verif/tests/custom/zcmt"
2025-01-27 13:23:26 +01:00
Cesar Fuguet
db568f3e1d
Fully support the Write-Back mode of the HPDcache in the CVA6 (#2691)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
This PR modifies some components in the CVA6 to fully support the WB mode of the HPDcache.

When on WB mode, there may be coherency issues between the Instruction Cache and the Data Cache. This may happen when the software writes on instruction segments (e.g. to relocate a code in memory).

This PR contains the following modifications:

The CVA6 controller module rises the flush signal to the caches when executing a fence or fence.i instruction.
The HPDcache cache subsystem translates this fence signal to a FLUSH request to the cache (when the HPDcache is in WB mode).
Add new parameters in the CVA6 configuration packages:
DcacheFlushOnInvalidate: It changes the behavior of the CVA6 controller. When this parameter is set, the controller rises the Flush signal on fence instructions.
DcacheInvalidateOnFlush: It changes the behavior of the HPDcache request adapter. When issuing a flush, it also asks the HPDcache to invalidate the cachelines.
Add additional values to the DcacheType enum: HPDCACHE_WT, HPDCACHE_WB, HPDCACHE_WT_WB
In addition, it also fixes some issues with the rvfi_mem_paddr signal from the store_buffer.
2025-01-10 17:57:32 +01:00
Nils Wistoff
71f96d4329
wt_axi_adapter: Remove redundant, parameterization-breaking zero extensions (#2697)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
zero-extended the paddrs to match the axi_addr width and thus fix lint warnings. However, this breaks elaboration if AxiAddrWidth <= PLEN. To fix lint warnings without breaking parametrisation, use explicit casts to pad/truncate as required.
2025-01-10 08:27:10 +01:00
Nils Wistoff
ee58bfab94
wt_dcache_buffer: Avoid out-of-range user signal access (#2698)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
If the data user signal is disabled and the user bus width is reduced,
the slice operator into the user field will cause elaboration errors.
Since the faulty else block is anyways without effect, just remove it.
2025-01-09 14:42:41 +01:00
Matteo Perotti
5a484fce42
cache_subsystem: 🐛 Fix AXI read len (#2696)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
AxiRdBlenDcache -> AxiRdBlenIcache
2025-01-08 23:06:53 +01:00
AngelaGonzalezMarino
9877af5eb6
fix size of vectors when AxiNumWords=1 (#2639)
in wt_axi_adapter, axi_rd_blen and axi_wr_blen are defined like this:

logic [$clog2(AxiNumWords)-1:0] axi_rd_blen, axi_wr_blen;

However, if AxiNumWords=1, this gives a synthesis error. This happens if the cache line is set to 64 bits (same as AXI width).

It can be fixed by changing to:
logic [AxiNumWords > 1 ? $clog2(AxiNumWords) : AxiNumWords-1:0] axi_rd_blen, axi_wr_blen;
2024-12-03 07:14:29 +01:00
AngelaGonzalezMarino
ba8ac715d8
use dcache_assoc_width (#2640)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
cva6/core/cache_subsystem/wt_dcache_missunit.sv

Line 202 in b718824
 .OutWidth ($clog2(CVA6Cfg.DCACHE_SET_ASSOC)) 

Better to use the width parameter which already contemplates the case of 0 to avoid issues if associativity is set to 1
cva6/core/include/build_config_pkg.sv

Line 134 in b718824
 cfg.DCACHE_SET_ASSOC_WIDTH = CVA6Cfg.DcacheSetAssoc > 1 ? $clog2(CVA6Cfg.DcacheSetAssoc) : CVA6Cfg.DcacheSetAssoc;
2024-12-02 17:40:38 +01:00
Yan
25f2f3190d
Fix $fatal system task incorrect usage (#2619)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
To fix  #2618
2024-11-20 22:22:50 +01:00
AngelaGonzalezMarino
33c5d77bd8
Altera opt 1 (#2592)
Some checks failed
bender-up-to-date / bender-up-to-date (push) Has been cancelled
ci / build-riscv-tests (push) Has been cancelled
ci / execute-riscv64-tests (push) Has been cancelled
ci / execute-riscv32-tests (push) Has been cancelled
The first optimization for Altera FPGA is to move the instruction queue to LUTRAM. The reason why the optimization previously done for Xilinx is not working, is that in that case asynchronous RAM primitives are used, and Altera does not support asynchronous RAM. Therefore, this optimization consists in using synchronous RAM for the instruction queue and FIFOs inside wt axi adapter.

The main changes to the existing code are:

New RAM module to infer synchronous RAM in altera with independent read and write ports (SyncDpRam_ind_r_w.sv)

Changes inside cva6_fifo_v3 to adapt to the use of synchronous RAM instead of asynchronous:

When the FIFO is not empty, next data is always read and available at the output hiding the reading latency introduced by synchronous RAM (similar to fall-through approach). This is a simplification that is possible because in a FIFO we always know what is the next address to be read.

When data is read right after write, we can’t use the previous method because there is a latency to first write the data in the FIFO, and then to read it. For this reason, in the new design there is an auxiliary register used to hide this latency. This is used only if the FIFO is empty, so we detect when the word written is first word, and keep it in this register. If the next cycle comes a read, the data out is taken from the aux register. Afterwards the data is already available in the RAM and can be read continuously as in the first case.

All this is only used inf FpgaAlteraEn parameter is enabled, otherwise the previous implementation with asynchronous RAM applies (when FpgaEn is set), or the register based implementation (when FpgaEn is not set).
2024-11-15 14:34:15 +01:00
Côme
4619a67fc6
expand glob port maps (#2585)
Some checks failed
bender-up-to-date / bender-up-to-date (push) Has been cancelled
ci / build-riscv-tests (push) Has been cancelled
ci / execute-riscv64-tests (push) Has been cancelled
ci / execute-riscv32-tests (push) Has been cancelled
Expands all glob port maps in the core/ directory of this repository except the core/cache_subsystem/ directory, despite the glob port maps in core/cache_subsystem/miss_handler.sv and core/cache_subsystem/std_nbdcache.sv.

Also reorders port maps to keep the same order as port declarations.
2024-11-07 16:51:46 +01:00
Cesar Fuguet
6bbc1e6d35
update the hpdcache to its latest version (#2579)
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
2024-11-05 23:57:20 +01:00
Nils Wistoff
aeb0b646bf
cache_ctrl: Generalise AXI offset generation (#2573)
For `XLEN = 64`, some tools (e.g. VCS) still elaborate the offset generation block for `XLEN = 32`, throwing an elaboration error (illegal bit access). Fix this by generating the AXI offset in an equivalent, parameter-agnostic and tool-friendly way.
2024-11-04 09:24:57 +01:00
Riccardo Tedeschi
53472eb026
Move timing statement outside of always_comb block (#2552)
Fix following requirement:

The assertion included in the always_comb block apparently violates the requirements in [section 9.2.2.2.2 of the SystemVerilog standard](https://ieeexplore.ieee.org/document/10458102):

Statements in an always_comb shall not include those that block, have blocking timing or event
controls, or fork-join statements.
2024-10-23 07:32:49 +02:00
Riccardo Tedeschi
164d7c7fc9
Add AW lock register to handle W FIFO push signal (#2461) 2024-09-24 08:42:16 +02:00
dependabot[bot]
ea3a55450b
Bump core/cache_subsystem/hpdcache from 25ffa34 to b4519e7 (#2466) 2024-08-31 08:51:52 +02:00
CoralieAllioux
335c91cc08
[Xcelium flow] Xrun compile fixes (#2389) 2024-07-25 07:37:43 +02:00
Cesar Fuguet
9df64701bd
Update submodule core/cache_subsystem/hpdcache (#2265) 2024-06-18 11:54:35 +02:00
Akiho Kawada
bc7149adc7
refactor hpdcache_cache_subsystem module code to ease reutilization (#2173) 2024-06-11 23:12:30 +02:00
Côme
eac60af1a9
superscalar: add a second issue port (#2209) 2024-06-09 20:47:09 +02:00
Guillaume Chauvon
a5152b03a5
Add support for cv32a65x dedicated synthesis (#2178) 2024-06-04 10:58:09 +02:00
Cyprien Heusse
46e9d5a7fc
32 bits WB cache (#2170) 2024-05-30 18:47:39 +02:00
JeanRochCoulon
8630458370
Parametrization: Use CVA6Cfg.WtDcacheWbufDepth in place of DCACHE_WBUF_DEPTH (#2166) 2024-05-30 12:26:58 +02:00
dependabot[bot]
691c480aea
Bump core/cache_subsystem/hpdcache from 57c82d3 to 32407cb (#2157) 2024-05-27 23:06:50 +02:00
Cyprien Heusse
e823d836f3
Fix bug when killing WB cache request (#2142) 2024-05-22 23:40:11 +02:00
Cesar Fuguet
cd241cb387
hpdcache: update HPDcache to support parametrization (#2059) 2024-05-15 12:28:36 +02:00
JeanRochCoulon
5df5a5c247
Define InstrTlbEntries, DataTlbEntries, cfg.NrLoadPipeRegs, NrStorePipeRegs, DcacheIdWidth as CVA6 parameters (#2034) 2024-04-12 09:06:35 +02:00
Florian Zaruba
ecd6ed6b6b
Move DCacheType to config struct (#2025) 2024-04-10 23:26:21 +02:00
JeanRochCoulon
80e6d7cffc
Verible reformat (#2014) 2024-04-08 11:26:08 +02:00
Côme
ec44b22920
superscalar: fetch 64 bits (#2013) 2024-04-08 11:25:39 +02:00
Cesar Fuguet
83a5b05752
hpdcache: update submodule (#2009) 2024-04-05 18:52:10 +02:00
Florian Zaruba
38e8c059b2
Parameterization and other fixes for downstream project (#1950)
* Bender fixes and switch to `cva6_fifo_v3`
* cfg: Fix verilator warnings
* Bender: Fix yml
* acc_dispatcher: Add `csr_addr_i`
* parameterization: Fox AXI_USER_EN warning
* wb_cache: Fix Verilator Lint warnings
* cva6_fifo_v3: Add to Flist
* parameterization: Address review concerns
* Switch to `cva6_fifo_v3`
* tracer: Remove tracer interface

The interface made a bunch of problems with the
typedefs so I've removed it.

---------

Co-authored-by: JeanRochCoulon <jean-roch.coulon@thalesgroup.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-05 13:02:18 +02:00
JeanRochCoulon
4423feb06a
Rename ZiCondExtEn and FPGA_EN parameters (#1992) 2024-04-02 15:37:58 +02:00
dependabot[bot]
d0f411d178
Bump core/cache_subsystem/hpdcache from 8a13ec4 to 645e422 (#1942) 2024-03-18 20:30:40 +01:00
Côme
bd4b57cc64
Parametrization step 3 part 3 (last) (#1940) 2024-03-18 16:19:52 +01:00
Côme
4817575de9
Parametrization step 3 part 2 (#1939) 2024-03-18 12:06:55 +01:00
Côme
987c645bb7
Parametrization step 3 (#1935)
This is the third step for #1451. Many values are moved but not all values are moved yet

* move NR_SB_ENTRIES & TRANS_ID_BITS
* remove default rvfi_instr_t from spike.sv
* fifo_v3: ariane_pkg::FPGA_EN becomes a param
* move FPGA_EN
* inline wt_cache_pkg::L15_SET_ASSOC
* move wt_cache_pkg::L15_WAY_WIDTH
* inline wt_cache_pkg::L1I_SET_ASSOC
* inline wt_cache_pkg::L1D_SET_ASSOC
* move wt_cache_pkg::DCACHE_CL_IDX_WIDTH
* move ICACHE_TAG_WIDTH
* move DCACHE_TAG_WIDTH
* move ICACHE_INDEX_WIDTH
* move ICACHE_SET_ASSOC
* use ICACHE_SET_ASSOC_WIDTH instead of $clog2(ICACHE_SET_ASSOC)
* move DCACHE_NUM_WORDS
* move DCACHE_INDEX_WIDTH
* move DCACHE_OFFSET_WIDTH
* move DCACHE_BYTE_OFFSET
* move DCACHE_DIRTY_WIDTH
* move DCACHE_SET_ASSOC_WIDTH
* move DCACHE_SET_ASSOC
* move CONFIG_L1I_SIZE
* move CONFIG_L1D_SIZE
* move DCACHE_LINE_WIDTH
* move ICACHE_LINE_WIDTH
* move ICACHE_USER_LINE_WIDTH
* move DCACHE_USER_LINE_WIDTH
* DATA_USER_WIDTH = DCACHE_USER_WIDTH
* move DCACHE_USER_WIDTH
* move FETCH_USER_WIDTH
* move FETCH_USER_EN
* move LOG2_INSTR_PER_FETCH
* move INSTR_PER_FETCH
* move FETCH_WIDTH
* transform SSTATUS_SD and SMODE_STATUS_READ_MASK into functions
* move [SM]_{SW,TIMER,EXT}_INTERRUPT into a structure
* move SV
* move vm_mode_t to config_pkg
* move MODE_SV
* move VPN2
* move PPNW
* move ASIDW
* move ModeW
* move XLEN_ALIGN_BYTES
* move DATA_USER_EN
* format: apply verible
2024-03-15 17:21:34 +00:00
Côme
aed4ed7c23
move functions into modules (#1926) 2024-03-13 17:46:33 +01:00
JeanRochCoulon
57f062bd85
Add Caches submodule description in Design Doc (#1923) 2024-03-12 17:40:05 +01:00
Côme
32a3cd56ee
Parametrization step 2 (#1908) 2024-03-08 22:53:42 +01:00
Cesar Fuguet
9267d14f2e
hpdcache: update submodule, interface and parameters (#1893) 2024-03-05 22:24:44 +01:00
dependabot[bot]
5dceb0d57a
Bump core/cache_subsystem/hpdcache from 019e04f to 5dea9e0 (#1877) 2024-02-26 21:24:56 +01:00
dependabot[bot]
e70bcbd6e7
Bump core/cache_subsystem/hpdcache from 38b9318 to 019e04f (#1857) 2024-02-21 13:14:03 +01:00
Cesar Fuguet
5de7c6003a
hpdcache: bump new version of the submodule (#1845) 2024-02-19 18:17:40 +01:00
Cesar Fuguet
45ffb59980
fix: support of AMOs in cv32 configurations (#1841) 2024-02-18 23:30:41 +01:00
Cesar Fuguet
00c0ff083a
hpdcache: bump new version of the submodule (#1830) 2024-02-13 18:19:16 +01:00
Nils Wistoff
6e8e2652b8
miss_handler: Fix AMO AXI ID mapping (#1821) 2024-02-09 23:14:47 +01:00
CoralieAllioux
48ea9a1675
[Bugfix hpdcache] axi struct usage (#1802) 2024-02-05 18:51:44 +01:00