This has no immediate effect, but it means that the "mem_err_shift"
value in the memory agent's config can be changed in the middle of the
test, rather than being fixed in the build_phase.
It turns out that you don't actually need the separate test class for
this, so this commit gets rid of it. The advantage of doing this is
that we can now chain this vseq with others.
Since we are binding in an interface anyway, we can add some SV
assertions to make sure nothing too strange is happening.
Note that they aren't as strong as you might expect: we don't check
that rdata isn't X, for example. This is because the cache makes
speculative reads, which it (hopefully) ignores if the data is
invalid.
It seems that dvsim.py doesn't actually use fusesoc to do things like
pass parameters. Instead, we have to set the tool-specific options in
the hjson file by hand.
Fixes issue #964.
If window_range_hi = 32'hfffffffe and window_range_lo =
32'h00000000 (quite possible if we wrap), we were overflowing the
32-bit int.
The other way to write this would be something like
((window_range_hi - window_range_lo) / 4 +
(((window_range_hi - window_range_lo) & 3) != 0))
which avoids needing the extra bit, but that feels very
cumbersome.
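A minimal Python sketch of the overflow. It assumes the quantity being computed is the number of 32-bit words needed to cover (window_range_hi - window_range_lo) bytes, rounded up, and that the old code rounded up by adding 3 before dividing. Neither detail is spelled out above, and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def words_truncated_32bit(hi, lo):
    """Round-up divide by 4 with every intermediate truncated to 32 bits.

    Hypothetical model of the overflowing calculation: the "+ 3" can wrap.
    """
    span = (hi - lo) & MASK32
    return ((span + 3) & MASK32) // 4


def words_extra_bit(hi, lo):
    """Same calculation without truncating the sum (models the extra bit)."""
    span = (hi - lo) & MASK32
    return (span + 3) // 4


def words_no_extra_bit(hi, lo):
    """The alternative from the message: divide first, then add the round-up."""
    span = (hi - lo) & MASK32
    return span // 4 + int((span & 3) != 0)


hi, lo = 0xFFFFFFFE, 0x00000000
print(hex(words_truncated_32bit(hi, lo)))  # wrapped result
print(hex(words_extra_bit(hi, lo)))
print(hex(words_no_extra_bit(hi, lo)))
```

The last two forms agree for every span. The second one needs a 33-bit intermediate, which is the "extra bit" mentioned above.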
This is supposed to spot when the valid signal drops without a ready
signal from the core. This is only allowed to happen if the core sends
a branch. The previous sequence was bogus: it didn't work for
back-to-back accesses (because it required $rose(valid)) and it didn't
check that valid actually dropped (which doesn't always happen). The
new one is simpler, and correct!
Note that we still don't see coverage of the sequence. I'll fix that
in the next patch.
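A rough Python model of what the sequence is looking for, written over per-cycle traces rather than as an SV sequence. It assumes that branch is sampled in the same cycle as the un-acknowledged valid. That timing is a guess for illustration, not something taken from the interface code, and the function names are made up.

```python
def cancelled_valid_cycles(valid, ready):
    """Cycles t where valid is high, ready is low, and valid is low at t + 1.

    No rising edge of valid is required, so back-to-back accesses are
    handled, and cycles where valid stays high are not reported.
    """
    return [
        t
        for t in range(len(valid) - 1)
        if valid[t] and not ready[t] and not valid[t + 1]
    ]


def illegal_drops(valid, ready, branch):
    """Cancelled-valid cycles that are not excused by a branch (assumed timing)."""
    return [t for t in cancelled_valid_cycles(valid, ready) if not branch[t]]


valid = [1, 1, 0, 1, 0]
ready = [0, 1, 0, 0, 0]
branch = [0, 0, 0, 1, 0]
print(cancelled_valid_cycles(valid, ready))
print(illegal_drops(valid, ready, branch))
```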
This doesn't actually have any effect (since the branch has priority
over whether the core is ready), but it's possible in the spec, so we
should do it sometimes.
This hits some coverpoints that are defined at interface-level in the
core agent. The point is that you want to make sure address wrapping
works correctly (what's the next instruction after 0xfffffffe?).
Note that we now also constrain the base address to be even. This was
technically wrong before, but would only have been a problem if you
picked a base address of 0xffffffff (with a probability of 1 in 4
billion).
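To make the wrapping question concrete, here is a small Python sketch. It assumes fetch addresses are 32 bits wide and simply wrap modulo 2**32, with a step of 2 bytes for a compressed instruction and 4 bytes otherwise. The helper names are hypothetical and are not taken from the testbench.

```python
import random

ADDR_MASK = 0xFFFFFFFF


def next_fetch_addr(addr, step=2):
    """Address of the next instruction, wrapping at the top of the address space."""
    return (addr + step) & ADDR_MASK


def random_even_base(rng):
    """Pick a 32-bit base address with bit 0 clear (so never 0xffffffff)."""
    return rng.getrandbits(31) << 1


print(hex(next_fetch_addr(0xFFFFFFFE)))       # wraps to the bottom
print(hex(next_fetch_addr(0xFFFFFFFE, 4)))    # wraps past the bottom
print(hex(random_even_base(random.Random(0))))
```

An unconstrained 32-bit base address hits any one particular value, such as 0xffffffff, with probability 1 in 2**32, which is the "1 in 4 billion" above.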
A few of these messages get printed out just before an error. It's
much more helpful for debugging if you see them with the default
verbosity. They only appear when something goes wrong, so let's just
turn them on.
- The testbench probes signals that are unqualified by instr_valid
- This causes events to trigger due to instructions that are not
actually executed, leading to false timeout failures
- Note this fix alone doesn't eliminate such failures due to another
issue which will be addressed separately
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The agent controls an ibex_icache_ecc_if interface, which is bound
into each prim_badbit_ram_1p module. There's a ton of painful wiring
in the environment to create an agent for each of these interfaces and
connect everything up properly.
By default, these agents don't have associated sequences (so they
don't inject read errors). You can switch them on by setting
enable_ecc_errors on the top-level virtual sequence. The patch adds a
vseq to do so (ibex_icache_ecc_vseq).
Note that we don't currently collect any specific coverage for ECC
checks. We'll probably pick this up in the future, either with uarch
functional coverage points or through an alert output, if the cache
gets one.
This does nothing by default, just wrapping up a prim_generic_ram_1p.
But we can bind an interface into it to inject bit errors by forcing
the bad_bit_mask signal.
Note that the icache uses ECC RAMs in a reasonably unusual way (ORing
together inputs and outputs from its data RAMs), so we have to do this
ourselves, rather than piggy-backing on the implementation or testing
done for e.g. OpenTitan's prim_ram_1p_adv.
This signal already got driven (to 1) when signalling a branch with
the interface's branch_to task. This patch now drives the branch_spec
line occasionally even if we don't actually do a branch. (One cycle in
64, for now).
These cover points were extracted by reading through the icache
documentation (icache.rst). There aren't yet cover points to check
that the targets of the testplan were executed properly, nor are there
any uarch coverpoints (which would be bound into the design, rather
than the interface).
The rather elaborate flow of
sequence -> function -> trigger -> task -> covergroup
for cancelled_valid_cg follows a skeleton described in Doug Smith, "A
Practical Look @ SystemVerilog Coverage" (slides from a Doulos
course). I'm not completely convinced it's worth the effort, but I
guess it shows how to extract information from a temporal sequence in
the interface and shove it in a covergroup properly via the monitor.
This should have no functional change - it's still set iff branch is
set - but the logic now lies in the UVM code, rather than the
structural code in tb.sv.
This turns out to be reasonably easy to plumb in: derive from the core
sequence base class, overriding its run_req method (once I've
remembered to make it virtual). Then pick the right core sequence by
adding a factory override in the vseq.
This is an entry in the testplan. Renaming it to "oldval", because
suffixing every class name with "disable_without_invalidation" was
getting ridiculous.
When the existing code in drive_pmp() decided that an error needed
signalling, it waited until the request was dropped, or the address
changed, before clearing the PMP error.
This is fine, unless the memory seed is changed (by magical means!)
under our feet. The monitor spots a new request, but the driver needs
to know to clear the PMP error. This patch forcibly tells the driver
to drop the existing item if a new one comes in.
Without this, you get test failures if there are two back-to-back
branches to the same address that happen at the same time as a seed
update. The problem is that you only see one request transaction (with
the first seed), and the two memory responses both come back with the
first seed, when the second should have had the second seed.
One aspect of (i)cache design that I didn't know about before writing
test code for this block is the problem of multi-way hits. The icache,
as implemented, stores data to parallel ways and it's possible for a
fetch to match more than one way. The data from matching ways all gets
ORed together, which doesn't matter so long as it never
changes (because V | V == V for all V).
Of course, things go poorly if you have two different values, V and W,
at an address which are both stored in the cache. Then the result is V
| W, which isn't necessarily equal to either instruction.
Avoiding this needs priority encoders, which are rather large, so it
seems the usual approach is to disallow branching to modified code
before flushing the cache. This patch teaches the testbench to do this
properly.
Sadly, this means there's now a connection between the core agent and
the memory agent: the memory agent can no longer generate new seeds
whenever it pleases.
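A toy Python model of the multi-way hit problem described above. Each way is just a dict from address to data, and the values V and W are arbitrary 32-bit words chosen for illustration. The real cache works on tags and lines, so this only shows the ORing behaviour.

```python
from functools import reduce


def read_cache(ways, addr):
    """OR together the data from every way that hits, as the icache does."""
    hits = [way[addr] for way in ways if addr in way]
    if not hits:
        return None  # miss in every way
    return reduce(lambda a, b: a | b, hits)


V = 0x00000013
W = 0x00008067
ADDR = 0x100

# Same value in two ways: harmless, because V | V == V.
print(hex(read_cache([{ADDR: V}, {ADDR: V}], ADDR)))

# Memory changed between the two fills: the result is V | W.
print(hex(read_cache([{ADDR: V}, {ADDR: W}], ADDR)))
```

In the second case the fetch returns a word that was never in memory at that address, which is why the testbench must not branch to modified code before invalidating the cache.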
The test is the same, but the reordering means that if we see an error
that we weren't expecting, we'll complain about that, rather than
about the instruction data itself.
In practice, this check will only trigger if you constrain your core
to fetch in a tight loop for a while and you don't invalidate the
cache very often.
The check has an assumption about the cache size (at least 1kB), but
that only has an effect on the tightness of the loop needed before we
do any checking.
- test_en_i is a DFT feature that shouldn't be enabled for normal
runtime testing
- Only really affects the clock gate in the design, but is needed for
running tests with the latch-based register file
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Instead of using copies of primitives from OpenTitan, vendor the files
in directly from OpenTitan, and use them.
Benefits:
- Less potential for diverging code between OpenTitan and Ibex, causing
problems when importing Ibex into OT.
- Use of the abstract primitives instead of the generic ones. The
abstract primitives are replaced at synthesis time with
target-dependent implementations. For simulation, nothing changes. For
synthesis for a given target technology (e.g. a specific ASIC or FPGA
technology), the primitives system can be instructed to choose
optimized versions (if available).
This is most relevant for the icache, which hard-coded the generic
SRAM primitive before. This primitive is always implemented as
registers. By using the abstract primitive (prim_ram_1p) instead, the
RAMs can be replaced with memory-compiler-generated ones if necessary.
There are no real drawbacks, but a couple of points to be aware of:
- Our ram_1p and ram_2p implementations are kept as wrappers around the
primitives, since their interface deviates slightly from the one in
prim_ram*. This also includes a rather unfortunate naming confusion
around rvalid, which means "read data valid" in the OpenTitan advanced
RAM primitives (prim_ram_1p_adv for example), but means "ack" in
PULP-derived IP and in our bus implementation.
- The core_ibex UVM DV doesn't use FuseSoC to generate its file list,
but uses a hard-coded list in `ibex_files.f` instead. Since the
dynamic primitives system requires the use of FuseSoC we need to
provide a stop-gap until this file is removed. Issue #893 tracks
progress on that.
- Dynamic primitives depend on a not-yet-merged feature of FuseSoC
(https://github.com/olofk/fusesoc/pull/391). We depend on the same
functionality in OpenTitan and have instructed users to use a patched
branch of FuseSoC for a long time through `python-requirements.txt`,
so no action is needed for users who are either successfully
interacting with the OpenTitan source code, or have followed our
instructions. All other users will see a reasonably descriptive error
message during a FuseSoC run.
- This commit is massive, but there are no good ways to split it into
bisectable, yet small, chunks. I'm sorry. Reviewers can safely ignore
all code in `vendor/lowrisc_ip`; it's an import from OpenTitan.
- The check_tool_requirements tooling isn't easily vendor-able from
OpenTitan at the moment. I've filed
https://github.com/lowRISC/opentitan/issues/2309 to get that sorted.
- The LFSR primitive doesn't have its own core file, forcing us to include
the catch-all `lowrisc:prim:all` core. I've filed
https://github.com/lowRISC/opentitan/issues/2310 to get that sorted.
https://github.com/lowRISC/opentitan/pull/2311 added the Verilator
memutils to OpenTitan as upstream. This commit is the second part of the
story, removing the code from the Ibex repository, and vendoring it back
in from OpenTitan.
This also supersedes #844, which has now been included through
OpenTitan.
The rdata driven by the cache is undefined when there is an error. There
are therefore no requirements on stability.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This commit implements the Bit Manipulation Extension ZBR instruction
group: crc32[c].[bhw].
CRC-32 (CRC-32/ISO-HDLC) and CRC-32C (CRC-32/ISCSI) are directly
implemented. The CRC operation solves the following equation using
binary polynomial arithmetic:
    rev(rd)(x) = rev(rs1)(x) * x**n mod {1, P}(x),
where {1, P}(x) denotes the CRC polynomial. Using Barrett reduction, one
can write this as
    rd = (rs1 >> n) ^ rev(rev( (rs1 << (32-n)) cx rev(mu)) cx P)
                          ^-- cycle 0 -------------------^
         ^-- cycle 1 ------------------------------------------^
where cx denotes carry-less multiplication and mu = polydiv(x**64,
{1, P}), omitting the MSB (bit 32).
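For reference, here is a bit-serial Python model of what the crc32[c].[bhw] instructions compute. It follows the usual shift-and-XOR formulation with the bit-reversed polynomials, not the two-cycle Barrett datapath described above, so it is only useful as a golden model to compare against. It assumes n is 8, 16 or 32 for the .b, .h and .w variants. The helper names are made up for this sketch.

```python
import zlib

CRC32_POLY = 0xEDB88320   # CRC-32/ISO-HDLC polynomial, bit-reversed
CRC32C_POLY = 0x82F63B78  # CRC-32C (Castagnoli) polynomial, bit-reversed


def crc_step(x, nbits, poly):
    """Model of one crc32[c].[bhw] instruction: nbits shift-and-XOR steps."""
    x &= 0xFFFFFFFF
    for _ in range(nbits):
        x = (x >> 1) ^ (poly if x & 1 else 0)
    return x


def crc_bytes(data, poly=CRC32_POLY):
    """Full CRC of a byte string, built from the .b variant one byte at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc_step(crc ^ byte, 8, poly)
    return crc ^ 0xFFFFFFFF


msg = b"123456789"
print(hex(crc_bytes(msg)), hex(zlib.crc32(msg)))
print(hex(crc_bytes(msg, CRC32C_POLY)))
```

The .w variant consumes four bytes at once: XOR a little-endian 32-bit word into the running CRC and call crc_step with nbits=32.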
The implementation increases area consumption by ~0.6kGE for synthesis
with relaxed timing constraints and by ~1.6kGE with tight timing
constraints. There is no significant impact on frequency.
Signed-off-by: ganoam <gnoam@live.com>