- Add SECDED ECC checking to the register file when SecureIbex is
enabled
- No correction is attempted, but an alert is raised for the system to
intervene
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Add a major and minor alert output which can be used by the system to
react to fault injection attacks
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
But if you are using a precompiled UVM, it may have been compiled with a
different timescale, depending on the compilation options used when it was
compiled or on the tool's default timescale (UVM does not set a timescale in
the code).
In this case our precompiled UVM timescale is 1ps/1ps - so UVM gets
1000000000 in set_timeout but interprets it as ps. As a result the timeout is
1000 times smaller than you expect. That is why we are getting timeouts.
It is hard to find a perfect solution. One of them is to recompile the UVM
with -timescale 1ns/1ps (or whatever you will use for your design).
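For illustration, a minimal standalone sketch (a hypothetical module, not part
of the testbench) of how the same raw delay value means different absolute
times under different timescales:

```systemverilog
`timescale 1ns/1ps
module timescale_demo;
  initial begin
    // Compiled under 1ns/1ps, 1_000_000_000 time units is 1s.
    // The same raw value inside a package precompiled with 1ps/1ps is
    // only 1_000_000_000 ps = 1ms, i.e. 1000x shorter than intended.
    #1_000_000_000;
    $display("elapsed: %0t", $time);
  end
endmodule
```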
As a result of lowRISC/opentitan#2405 and lowRISC/ibex#928 (reporting
that interrupts that came in while a load instruction was in the ID
stage caused some incorrect behavior in Ibex), this PR adds some new
directed interrupt and debug tests to check that the core behaves
properly during execution of each supported instruction when some
external irq/debug stimulus comes in.
To do this, we use the two new functions `decode_instr(...)` and
`decode_compressed_instr(...)` in `core_ibex_test_list.sv` to "decode"
every instruction that the `core_ibex_instr_monitor_if` sees in the ID
stage of the pipeline. Once the testbench decodes an instruction that
we have not seen before, it can then drive interrupt or debug stimulus
into the core.
Once any given instruction has been detected by the testbench (and
stimulus driven), it will no longer drive stimulus if this instruction
is seen in the decode pipeline (e.g. if we have previously detected a
`c.addi` instruction in the ID stage and have driven irq/debug stimulus,
we will no longer drive stimulus if we see another `c.addi` instruction,
no matter the operands). This is to avoid driving irq/debug stimulus
after every single instruction as this will add a huge unwanted amount
of simulation latency.
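The dedup logic is roughly the following (a simplified sketch with assumed
names, not the actual code in `core_ibex_test_list.sv`):

```systemverilog
class instr_stimulus_tracker;
  // One entry per decoded mnemonic that has already triggered stimulus,
  // so e.g. a second `c.addi` (whatever its operands) is ignored.
  protected bit seen_instr[string];

  // Called for every instruction the monitor decodes in the ID stage.
  // Returns 1 only the first time a given mnemonic is seen.
  function bit should_drive_stimulus(string mnemonic);
    // Always stimulate on wfi, otherwise the core would wait forever.
    if (mnemonic == "wfi") return 1'b1;
    if (seen_instr.exists(mnemonic)) return 1'b0;
    seen_instr[mnemonic] = 1'b1;
    return 1'b1;
  endfunction
endclass
```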
A few notes:
- We drive irq/debug stimulus into the core every time we see a
`wfi` instruction, as otherwise we will timeout as the core waits
infinitely for some stimulus from the outside world.
- We ignore some system-level instructions (ebreak/mret/dret) and
illegal instructions for now, as driving stimulus during these
instructions will result in a nested trap, which requires special
handling.
- The interrupt agent was modified slightly to drive stimulus by
default on the falling edge of the clock, so that we can "catch"
instructions that are in the ID pipeline for only a single cycle.
- The duration for which the testbench raises `debug_req_i` for the core
is also increased to avoid edge cases where we lower the debug line
too early (e.g. while long multicycle instructions like `div` are
executing in the ID stage).
- The "PINCONNECTEMPTY" waiver is part of our normal waiver file, no need
to add it to the tool invocation.
- Recent versions of Verilator choose good defaults for MAKE_OPTS;
passing it explicitly overrides those settings.
- All Verilator code is now lint clean, so we can remove `-Wno-fatal`.
- FST traces are not much slower than VCD traces any more in recent
Verilator versions, so remove the respective comment.
- Align comment about the compile/sim time for tracing with other files
and OpenTitan.
This commit contains some final optimizations regarding the bit
manipulation extension as well as the parametrization into a balanced
version and a full performance version.
Balanced Version:
* Supports ZBB, ZBS, ZBF and ZBT extensions
* Dual cycle instructions:
ror[i], rol, cmov, cmix, fsl, fsr[i]
* Everything else completes in a single cycle.
Full Version:
* Supports all 32b sub extensions.
* Dual cycle instructions:
ror[i], rol, cmov, cmix, fsl, fsr[i], crc32[c], bext, bdep
* Everything else completes in a single cycle.
Notable Changes:
* bext/bdep are now multi-cycle: they share an additional register
with the multiplier module
* grev/gorc instructions are implemented in separate structures
rather than sharing the shifter or butterfly network.
* Speed up the decision on using rs1 or rs3 for alu_operand_a by
introducing a single-bit register to identify ternary
instructions in their first cycle.
* Introduce an enumerated parameter to choose the bit manipulation
implementation (see the sketch below)
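The selection mechanism looks roughly like this (a sketch; the exact enum and
parameter names in the RTL may differ):

```systemverilog
typedef enum integer {
  RV32BNone,      // no bit manipulation support
  RV32BBalanced,  // ZBB, ZBS, ZBF and ZBT only
  RV32BFull       // all 32b sub extensions
} rv32b_e;

module alu_sketch #(
  parameter rv32b_e RV32B = RV32BBalanced
) ();
  // Only the full configuration generates the extra logic shared with
  // the multiplier for multi-cycle bext/bdep and the crc32 instructions.
  if (RV32B == RV32BFull) begin : g_full_bitmanip
    // ...
  end
endmodule
```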
Signed-off-by: ganoam <gnoam@live.com>
This PR adds clocking blocks to all major Ibex interfaces and updates
all corresponding interface accesses to use these clocking blocks.
A few notes:
- `ibex_mem_intf` has two driver clocking blocks, one for host side and
one for device side.
This is because our Ibex testbench currently provides both host and
device agents for both I/D interfaces (of course we only use the
reactive device agents in the main testbench).
- `csr_if` and `dut_if` only have one clocking block each, as all
signals in each will only be either sampled or driven, never both.
- Some utility tasks have been added to some interfaces to wait for a
specified number of clock cycles.
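For reference, the general shape of these clocking blocks (signal names here
are placeholders rather than the real `ibex_mem_intf` contents):

```systemverilog
interface mem_if (input logic clk);
  logic        request;
  logic        grant;
  logic [31:0] addr;

  // Host-side driver clocking block: drives requests, samples grants.
  clocking host_driver_cb @(posedge clk);
    output request, addr;
    input  grant;
  endclocking

  // Device-side driver clocking block: the reactive device agent drives
  // the grant back towards the core.
  clocking device_driver_cb @(posedge clk);
    input  request, addr;
    output grant;
  endclocking

  // Utility task: wait for a given number of clock cycles.
  task automatic wait_clks(int num);
    repeat (num) @(posedge clk);
  endtask
endinterface
```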
This is like the stress_all test, picking other sequences at random
and running them back-to-back. The difference is in the reset
behaviour, where we randomly pull the reset line at unexpected times
to try to trigger any strange glitches this might cause.
This requires slight changes to the core and memory drivers, which
need to learn to stop and return early from the current item when they
see a reset.
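The driver-side change follows this pattern (a sketch; the item and interface
types are placeholders):

```systemverilog
import uvm_pkg::*;

class mem_driver_sketch extends uvm_driver #(uvm_sequence_item);
  virtual mem_if vif;  // assumed interface handle with a reset signal

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual task run_phase(uvm_phase phase);
    forever begin
      seq_item_port.get_next_item(req);
      fork begin : isolation_fork
        fork
          drive_item(req);            // normal driving of the item
          wait (vif.reset === 1'b1);  // stop and return early on reset
        join_any
        disable fork;
      end join
      seq_item_port.item_done();
    end
  endtask

  virtual task drive_item(uvm_sequence_item item);
    // ... drive the transaction onto the interface ...
  endtask
endclass
```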
When we chain sequences together, we are careful to pass seeds between
neighbouring sequences. However, I didn't think to check
mem_err_shift. Before this patch, you see problems if you have a
"caching" sequence followed by a "many_errors" sequence with no reset
and no change of seed and they both happen to pick the same address
range.
The problem is that if the data at address A is cached in the first
sequence, the icache will merrily return it when address A comes up in
the second. However, the change to mem_err_shift might mean that this
would cause a memory error if it hadn't been cached, causing the
scoreboard to get upset.
This patch ensures that we always start a sequence with an
invalidation if there was a previous sequence with a different value
of mem_err_shift.
To do this cleanly, the patch also moves some of the "grab the guts of
the old sequence and put it in the new one" logic from
ibex_icache_combo_vseq and into the underlying sequence classes. The
trick is that a sequence now has a handle to the previous sequence (if
there was one), and can use that to extract whatever information it
needs.
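In sketch form (field and method names simplified):

```systemverilog
class icache_seq_sketch;
  // Handle to the sequence that ran before this one (null for the first).
  icache_seq_sketch prev_sequence;
  int unsigned      mem_err_shift;

  task body();
    // If the previous sequence ran with a different mem_err_shift, start
    // with an invalidation so stale cached lines cannot bypass the new
    // memory error behaviour.
    if (prev_sequence != null &&
        prev_sequence.mem_err_shift != mem_err_shift) begin
      invalidate_cache();
    end
    // ... rest of the sequence body ...
  endtask

  task invalidate_cache();
    // placeholder for the actual invalidate-and-wait behaviour
  endtask
endclass
```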
This fixes several problems. Firstly, the window_reset function was
switching off tracking until it next saw busy_o go low, which is
correct at the start of time, but not what we want after we've
started. This patch splits that behaviour into a new tracking_reset
function (which calls window_reset). This is called on reset or
invalidate.
Secondly, this check was occasionally failing where we'd have an ECC
sequence (which should disable the check) immediately followed by a
caching sequence with similar addresses. If the window ended in the
caching sequence, we'd see a high fetch ratio and conclude that
something had gone wrong.
Now we clear the window completely whenever we fetch an instruction
when the check is disabled, which should avoid the problem (at worst,
you might get 1 instruction overlap, which is unlikely to matter).
Finally, we move the call to tracking_reset up to the end of the reset
sequence. It doesn't usually matter, but if there's a pending item
from the core monitor with busy = 0, we need to make sure that item
comes in before we set not_invalidating = 1. Otherwise, the scoreboard
incorrectly thinks it's seen the end of the invalidation
sequence (before it's even started) and starts tracking fetch ratios
too early.
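Roughly (a sketch; the window counter names are assumptions):

```systemverilog
class scoreboard_sketch;
  int unsigned insns_in_window, fetches_in_window;
  bit          not_invalidating;

  // Per-window state only: cleared at the start of each tracking window.
  function void window_reset();
    insns_in_window   = 0;
    fetches_in_window = 0;
  endfunction

  // Called on reset or on a cache invalidation. Fetch-ratio tracking
  // stays off until busy_o has next been seen low.
  function void tracking_reset();
    not_invalidating = 1'b0;
    window_reset();
  endfunction
endclass
```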
This runs sequences back-to-back, occasionally resetting between
sequences.
Because our virtual sequences are composed of several smaller
sequences, we have to stop them when the core sequence finishes (see
the calls to kill() in ibex_icache_base_vseq). We also have to make
sure that we don't drop items in the memory sequence, which can be
pre-empted as part of sending a response (see the peek/get code
there).
Finally, the memory sequence also has a current seed and a list of
pending grants: this patch has to copy those across between sequences
to make everything work correctly.
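The memory sequence's peek/get pattern looks roughly like this (a sketch;
the FIFO plumbing and item handling are simplified):

```systemverilog
import uvm_pkg::*;

class mem_resp_seq_sketch extends uvm_sequence #(uvm_sequence_item);
  // Requests forwarded from the monitor; set up by the environment
  // before the sequence starts (assumed plumbing).
  uvm_tlm_analysis_fifo #(uvm_sequence_item) request_fifo;

  function new(string name = "mem_resp_seq_sketch");
    super.new(name);
  endfunction

  virtual task body();
    uvm_sequence_item item;
    forever begin
      // Peek rather than get: if the sequence is pre-empted while sending
      // the response, the request stays in the FIFO instead of being lost.
      request_fifo.peek(item);
      start_item(item);
      finish_item(item);
      // Only consume the request once its response has really gone out.
      void'(request_fifo.try_get(item));
    end
  endtask
endclass
```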
This virtual sequence controls what sequence we use in the core agent
with a factory override. We need to make sure that we "tidy up" after
starting it, otherwise every sequence afterwards will use the wrong
core sequence.
This will have no effect for now: we just move the "pick a number in
the range 800..1000" logic to the virtual sequence.
The reason to do this is for tests that combine sequences: we want to
be able to shorten each component sequence so that the combined test
isn't way longer than the original ones were.
As with the ECC sequence, it turns out that you don't actually need
the separate test class for this, so this commit gets rid of it. The
advantage of doing this is that we can now chain this vseq with
others.
This has no immediate effect, but it means that the memory agent's
config's "mem_err_shift" value can be changed in the middle of the
test, rather than being fixed in the build_phase.
It turns out that you don't actually need the separate test class for
this, so this commit gets rid of it. The advantage of doing this is
that we can now chain this vseq with others.
Since we are binding in an interface anyway, we can add some SV
assertions to make sure nothing too strange is happening.
Note that they aren't as strong as you might expect: we don't check
that rdata isn't X, for example. This is because the cache makes
speculative reads, which it (hopefully) ignores if the data is
invalid.
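For example, a bound checker of roughly this shape (signal names are
placeholders; the real assertions differ):

```systemverilog
module mem_if_checker (
  input logic        clk_i,
  input logic        rst_ni,
  input logic        req_o,
  input logic        gnt_i,
  input logic [31:0] rdata_i
);
  // The request must never be X/Z once we are out of reset.
  ReqKnown_A: assert property (
    @(posedge clk_i) disable iff (!rst_ni) !$isunknown(req_o))
    else $error("req_o is X/Z");

  // Deliberately no !$isunknown(rdata_i) check here: the cache makes
  // speculative reads and may legitimately ignore invalid data.
endmodule

// Bound into the design alongside the monitoring interface, e.g.:
// bind ibex_icache mem_if_checker u_mem_if_checker (.*);
```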
It seems that dvsim.py doesn't actually use fusesoc to do things like
pass parameters. Instead, we have to set the tool-specific options in
the hjson file by hand.
Fixes issue #964.
If window_range_hi = 32'hfffffffe and window_range_lo =
32'h00000000 (quite possible if we wrap), we were overflowing the
32-bit int.
The other way to write this would be something like
((window_range_hi - window_range_lo) / 4 +
(((window_range_hi - window_range_lo) & 3) != 0))
which avoids needing the extra bit, but that feels very
cumbersome.
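For reference, a sketch of the widened arithmetic (the function name and
exact usage are simplified):

```systemverilog
// Do the window arithmetic with an extra bit so that
// window_range_hi - window_range_lo + 3 cannot wrap at 32 bits.
function automatic int unsigned fetches_in_window(bit [31:0] lo, bit [31:0] hi);
  bit [32:0] diff = {1'b0, hi} - {1'b0, lo};
  return (diff + 33'd3) / 4;
endfunction
```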
This is supposed to spot when the valid signal drops without a ready
signal from the core. This is only allowed to happen if the core sends
a branch. The previous sequence was bogus: it didn't work for
back-to-back accesses (because it required $rose(valid)) and it didn't
check that valid actually dropped (which doesn't always happen). The
new one is simpler, and correct!
Note that we still don't see coverage of the sequence. I'll fix that
in the next patch.
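The new check looks something like this (a simplified sketch, not the exact
property in the agent):

```systemverilog
module core_if_protocol_checker (
  input logic clk,
  input logic rst_n,
  input logic valid,
  input logic ready,
  input logic branch
);
  // Once asserted, valid must stay high until ready, unless a branch
  // arrives (either in the cycle before the drop or in the drop cycle).
  NoSpuriousValidDrop_A: assert property (
    @(posedge clk) disable iff (!rst_n)
    (valid && !ready && !branch) |=> (valid || branch));
endmodule
```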
This doesn't actually have any effect (since the branch has priority
over whether the core is ready), but it's possible in the spec, so we
should do it sometimes.
This hits some coverpoints that are defined at interface-level in the
core agent. The point is that you want to make sure address wrapping
works correctly (what's the next instruction after 0xfffffffe?).
Note that we now also constrain the base address to be even. This was
technically wrong before, but would only have been a problem if you
picked a base address of 0xffffffff (with a probability of 1 in 4
billion).
A few of these messages get printed out just before an error. It's
much more helpful for debugging if you see them with the default
verbosity. They only appear when something goes wrong, so let's just
turn them on.
- The testbench probes signals that are unqualified by instr_valid
- This causes events to trigger due to instructions that are not
actually executed, leading to false timeout failures
- Note this fix alone doesn't eliminate such failures due to another
issue which will be addressed separately
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The agent controls an ibex_icache_ecc_if interface, which is bound
into each prim_badbit_ram_1p module. There's a ton of painful wiring
in the environment to create an agent for each of these interfaces and
connect everything up properly.
By default, these agents don't have associated sequences (so they
don't inject read errors). You can switch them on by setting
enable_ecc_errors on the top-level virtual sequence. The patch adds a
vseq to do so (ibex_icache_ecc_vseq).
Note that we don't currently collect any specific coverage for ECC
checks. We'll probably add some uarch functional coverage points,
which will pick it up in the future, or we'll also pick it up if the
cache gets an alert output.
This does nothing by default, just wrapping up a prim_generic_ram_1p.
But we can bind an interface into it to inject bit errors by forcing
the bad_bit_mask signal.
Note that the icache uses ECC RAMs in a reasonably unusual way (ORing
together inputs and outputs from its data RAMs), so we have to do this
ourselves, rather than piggy-backing on the implementation or testing
done for e.g. OpenTitan's prim_ram_1p_adv.
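The masking itself is simple; something along these lines (widths and names
simplified, and the real module wraps an actual prim_generic_ram_1p instance):

```systemverilog
module badbit_ram_sketch #(
  parameter int Width = 39
) (
  input  logic [Width-1:0] rdata_from_ram,  // read data out of the wrapped RAM
  output logic [Width-1:0] rdata_o
);
  // Normally all zeros, so the wrapper is transparent. A bound interface
  // can force this signal to corrupt chosen bits of the read data.
  logic [Width-1:0] bad_bit_mask = '0;

  assign rdata_o = rdata_from_ram ^ bad_bit_mask;
endmodule
```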
This signal already got driven (to 1) when signalling a branch with
the interface's branch_to task. This patch now drives the branch_spec
line occasionally even if we don't actually do a branch. (One cycle in
64, for now).
These cover points were extracted by reading down the icache
documentation (icache.rst). There aren't yet cover points to check
that the targets of the testplan were executed properly, nor are there
any uarch coverpoints (which would be bound into the design, rather
than the interface).
The rather elaborate flow of
sequence -> function -> trigger -> task -> covergroup
for cancelled_valid_cg follows a skeleton described in Doug Smith, "A
Practical Look @ SystemVerilog Coverage" (slides from a Doulos
course). I'm not completely convinced it's worth the effort, but I
guess it shows how to extract information from a temporal sequence in
the interface and shove it in a covergroup properly via the monitor.
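Compressed into one place, the chain is roughly as follows (a heavily
simplified sketch; in the real code the sampling goes via the monitor):

```systemverilog
interface cancelled_valid_cov_if (input logic clk, valid, ready);
  covergroup cancelled_valid_cg with function sample(bit cancelled);
    coverpoint cancelled;
  endgroup
  cancelled_valid_cg cg = new();

  event cancelled_valid_ev;

  // A temporal sequence spotted on the interface...
  sequence cancelled_valid_s;
    (valid && !ready) ##1 !valid;
  endsequence

  // ...fires a named event from the cover statement's action block...
  cover property (@(posedge clk) cancelled_valid_s) -> cancelled_valid_ev;

  // ...and a background task turns that into a covergroup sample.
  initial forever begin
    @(cancelled_valid_ev);
    cg.sample(1'b1);
  end
endinterface
```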
This should have no functional change - it's still set iff branch is
set - but the logic now lies in the UVM code, rather than the
structural code in tb.sv.
This turns out to be reasonably easy to plumb in: derive from the core
sequence base class, overriding its run_req method (once I've
remembered to make it virtual). Then pick the right core sequence by
adding a factory override in the vseq.
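In outline (class and type names here are placeholders, not the real ones):

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Minimal stand-in for the existing core sequence base class.
class core_seq_base_sketch extends uvm_sequence #(uvm_sequence_item);
  `uvm_object_utils(core_seq_base_sketch)
  function new(string name = "core_seq_base_sketch");
    super.new(name);
  endfunction
  // Made virtual so derived sequences can swap in their own behaviour.
  virtual task run_req(uvm_sequence_item req);
    // default: send the request unmodified
  endtask
endclass

class core_err_seq_sketch extends core_seq_base_sketch;
  `uvm_object_utils(core_err_seq_sketch)
  function new(string name = "core_err_seq_sketch");
    super.new(name);
  endfunction
  virtual task run_req(uvm_sequence_item req);
    // ... drive the request, occasionally flagging an error ...
  endtask
endclass

// The vseq then installs a factory override so this class is created
// wherever the base sequence was requested:
// core_seq_base_sketch::type_id::set_type_override(core_err_seq_sketch::get_type());
```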