The first optimization for Altera FPGAs is to move the instruction queue to LUTRAM. The optimization previously done for Xilinx does not work here because it relies on asynchronous RAM primitives, which Altera does not support. This optimization therefore uses synchronous RAM for the instruction queue and for the FIFOs inside the wt axi adapter.
The main changes to the existing code are:
- New RAM module to infer synchronous RAM on Altera with independent read and write ports (SyncDpRam_ind_r_w.sv).
- Changes inside cva6_fifo_v3 to adapt to synchronous RAM instead of asynchronous RAM:
  - When the FIFO is not empty, the next data is always read ahead and available at the output, which hides the read latency introduced by synchronous RAM (similar to a fall-through approach). This simplification is possible because in a FIFO the next read address is always known.
  - When data is read right after being written, the previous method cannot be used because the data must first be written to the RAM and only then read back. For this reason, the new design uses an auxiliary register to hide this latency. It is used only when the FIFO is empty: the design detects that the written word is the first word and keeps it in this register. If a read comes in the next cycle, the output data is taken from the auxiliary register. Afterwards, the data is available in the RAM and can be read continuously as in the first case.
All of this is used only if the FpgaAlteraEn parameter is enabled. Otherwise, the previous implementation with asynchronous RAM applies (when FpgaEn is set), or the register-based implementation (when FpgaEn is not set).
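
The read-ahead and auxiliary-register behaviour described above can be sketched as a small cycle-level model. This is an illustrative Python model, not the RTL in cva6_fifo_v3: the class name, field names and the exact "first word" condition (which here also covers a simultaneous pop of the last word and push of a new one) are assumptions made for the sketch.

```python
class SyncRamFifo:
    """Cycle-level sketch of a FIFO backed by a synchronous-read RAM."""

    def __init__(self, depth=4):
        self.depth = depth
        self.mem = [None] * depth   # RAM contents
        self.rd_ptr = 0
        self.wr_ptr = 0
        self.count = 0
        self.ram_q = None           # registered RAM read output
        self.aux = None             # auxiliary register for the first word
        self.use_aux = False        # output mux select

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    @property
    def data_out(self):
        # The first word after empty comes from the auxiliary register,
        # everything else from the read-ahead RAM output.
        return self.aux if self.use_aux else self.ram_q

    def clock(self, push=False, data=None, pop=False):
        """Advance the model by one clock edge."""
        do_pop = pop and not self.empty
        do_push = push and not self.full
        # Read ahead: always present the address of the next word to read.
        next_rd = (self.rd_ptr + 1) % self.depth if do_pop else self.rd_ptr
        # Synchronous read samples the RAM contents *before* this edge's write.
        next_ram_q = self.mem[next_rd]
        # The written word is the "first word" if the FIFO holds nothing else.
        first_word = do_push and (self.count - int(do_pop)) == 0
        if do_push:
            self.mem[self.wr_ptr] = data
            self.wr_ptr = (self.wr_ptr + 1) % self.depth
        if first_word:
            self.aux = data
        self.use_aux = first_word
        self.ram_q = next_ram_q
        self.rd_ptr = next_rd
        self.count += int(do_push) - int(do_pop)
```

In this model, a word pushed into an empty FIFO is visible on `data_out` in the very next cycle through the auxiliary register; one cycle later the same word is served from the RAM output, so the mux can fall back to the read-ahead path.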
* Fill docs/design/design-manual/source/cva6_issue_stage.adoc
* Add variables to docs/design/design-manual/source/design.adoc
* Update port doc comments in core/issue_stage.sv, core/issue_read_operands.sv and core/scoreboard.sv
Expands all glob port maps in the core/ directory of this repository, except in the core/cache_subsystem/ directory, where the glob port maps in core/cache_subsystem/miss_handler.sv and core/cache_subsystem/std_nbdcache.sv are left as they are.
Also reorders port maps to keep the same order as port declarations.
The former kind of signal initialization generates compilation errors when simulating the design with VCS, because those signals end up with multiple drivers. Since these signals are handled inside the always_ff block, they can simply be reset there.
This gate count increase was introduced by #2555. The root cause has not been found, but the deviation is small, and since it impacts the merge process (the CI is red), I prefer to fix the CI.
For `XLEN = 64`, some tools (e.g. VCS) still elaborate the offset generation block for `XLEN = 32`, throwing an elaboration error (illegal bit access). Fix this by generating the AXI offset in an equivalent, parameter-agnostic and tool-friendly way.
The controller flushes the pipeline and all the unissued instructions in the presence of instructions with side effects (e.g., fence).
The accelerator dispatcher buffer (now used with the Ara RVV Vector processor) is flushed when this happens and avoids accepting a new instruction in that cycle, but it does not prevent the actual issuing of instructions during a flush cycle.
This fix prevents instructions from being issued during a flush cycle.
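
The intent of the fix can be summarised as gating the issue condition with the flush signal. The snippet below is only a boolean sketch of that idea; the function and argument names are hypothetical and do not correspond to signal names in the accelerator dispatcher.

```python
def can_issue(buffer_valid: bool, acc_ready: bool, flush: bool) -> bool:
    """Return True if an instruction may be issued to the accelerator.

    Hypothetical names: `buffer_valid` means the dispatcher buffer holds an
    instruction, `acc_ready` means the accelerator can accept one, and
    `flush` means the controller is flushing the pipeline this cycle.
    """
    # Without the `not flush` term, an instruction could still be issued
    # in the same cycle in which the buffer is being flushed.
    return buffer_valid and acc_ready and not flush


# Enumerate all input combinations to show the effect of the flush gate.
for buffer_valid in (False, True):
    for acc_ready in (False, True):
        for flush in (False, True):
            print(buffer_valid, acc_ready, flush,
                  can_issue(buffer_valid, acc_ready, flush))
```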
Both the ISA and design documentation use some content generated from the RTL (ports, parameters).
As of now, they are committed to the repository and can be out of sync with the code.
This PR removes them from the repository and freshly generates them from the code when building HTML files.
This PR also removes prebuilt HTML files (design & ISA docs) and generates them when building the top-level Read the Docs documentation (make -C docs).
- FIX: Replace riscv_pkg:VLEN by CVA6Cfg.VLEN
- Declare VLEN as new CVA6 parameter
- smoke-hwconfig: run with vcs-uvm and use return0 tests to speed up the CI light stage
- Use dedicated linker scripts for 65x configuration.
- Use dedicated spike.yaml for 65x configuration.
- Set BHTEntries=128, cache=WT, scoreboard entries=8 to improve Coremark and Dhrystone results
- Run 4 iterations of coremark to improve results
As a reminder, the option --issrun_opts="+tb_performance_mode" disables UVM features such as assertions and log generation to reduce simulation time.
Fix a violation of the following requirement:
The assertion included in the always_comb block apparently violates the requirements in [section 9.2.2.2.2 of the SystemVerilog standard](https://ieeexplore.ieee.org/document/10458102):
> Statements in an always_comb shall not include those that block, have blocking timing or event controls, or fork-join statements.
This modification allows:
- printing the results in the terminal
- running the script from the terminal (without the environment variables from CI)
The YAML report is only built in CI, but the results are always printed.
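
A minimal sketch of this behaviour, assuming the CI exposes the report location through an environment variable. The variable name `REPORT_DIR`, the file name `report.yaml` and the function `report` are hypothetical and only illustrate the pattern of always printing while making the report optional; they are not the names used by the actual script.

```python
import os


def report(results, env=None):
    """Print results, and write a YAML report only when running in CI.

    `results` is a dict of metric name to value. `env` defaults to the
    process environment and can be overridden for testing.
    """
    env = os.environ if env is None else env
    lines = [f"{name}: {value}" for name, value in results.items()]

    # Results are always printed, so the script is useful from a terminal.
    print("\n".join(lines))

    # Hypothetical variable: only set by the CI, absent on a developer machine.
    report_dir = env.get("REPORT_DIR")
    if report_dir is None:
        return None

    # Flat "key: value" lines form a valid YAML mapping for simple scalars.
    path = os.path.join(report_dir, "report.yaml")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

Reading the variable with `env.get` instead of indexing `os.environ` directly is what lets the script run outside CI without raising a `KeyError`.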