Once the cache has passed an error to the core, we now allow it to
wiggle its valid, addr, rdata, err and err_plus2 lines however it sees
fit until the core issues a new branch.
Since the core isn't allowed to assert ready until then, these values
will never be read, so this doesn't matter.
This was exposed by
make -C dv/uvm/icache/dv run SEED=1314810947 WAVES=1
This assertion is supposed to say "the core may not request more data
from the cache when there's no valid address".
Unfortunately, I'd represented "requesting more data" by req being
high, rather than ready being high. This is wrong: req just says "the
core isn't currently asleep"; ready (the consumer half of the
ready/valid pair) is the signal I actually wanted.
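Roughly, the corrected check has this shape (the module and signal
names here are only illustrative, not the real checker):

  module core_fetch_check_sketch (
    input logic clk_i,
    input logic rst_ni,
    input logic ready_i,    // core will consume a fetch this cycle
    input logic valid_addr  // the cache has a valid fetch address
  );
    // "The core may not request more data from the cache when there's
    // no valid address": requesting more data means ready, not req.
    NoFetchWithoutAddr_A: assert property (
      @(posedge clk_i) disable iff (!rst_ni) ready_i |-> valid_addr
    );
  endmodule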
The flow for a memory fetch is:
1. Cache requests data for a memory address
2. Agent spots the request, maybe signalling a PMP error
3. The grant line goes high, at which point the request is granted.
4. Sometime later (in-order pipeline), agent sends a response
Occasionally, we need to pick a new seed for the backing memory.
Before this patch, we picked these seeds at point (3).
Unfortunately this was wrong in the following case:
1. We're switching from seed S0 to seed S1.
2. The request is spotted with seed S0 and doesn't signal a PMP error
3. The request is granted and we switch to seed S1.
4. We respond with data from memory based on S1, with no memory
error either
If S1 would have caused a PMP error, the resulting fetch (no error,
but data from S1) doesn't match any possible seed and the scoreboard
gets confused.
To solve the problem, this patch changes to picking new seeds at
point (2). This isn't quite enough by itself: if a request is granted
on a clock edge, a new request address might appear on that same
edge, and the simulation gives no guaranteed ordering between the new
request and the old grant (both happen at the same time). To fix this, the
response sequence now maintains a queue of requests and their
corresponding seeds to make sure that all the checks for a fetch are
done with a single seed.
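As a sketch (the class and field names here are illustrative, not the
real testbench code), the sequence now does something like this:

  class mem_resp_seq_sketch;
    typedef struct {
      bit [31:0] addr;
      bit [31:0] seed;
    } pending_req_t;

    bit [31:0]    cur_seed;
    pending_req_t pending_reqs[$];

    // Point (2): the request is spotted. This is now where a new seed
    // may be installed, and the (address, seed) pair is queued.
    function void on_request(bit [31:0] addr, bit take_new_seed,
                             bit [31:0] new_seed);
      if (take_new_seed) cur_seed = new_seed;
      pending_reqs.push_back('{addr: addr, seed: cur_seed});
    endfunction

    // Point (4): the response is built with the seed captured at
    // request time, so a request/grant race on a clock edge can't mix
    // two seeds within a single fetch.
    function pending_req_t next_response();
      return pending_reqs.pop_front();
    endfunction
  endclass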
The patch also gets rid of the seed state in the memory model. It
turns out that this didn't really help: the scoreboard is always
asking "what would I get with this seed?", and now the sequence is
doing something similar.
Firstly, the pulse shouldn't be zero length (since that wouldn't
actually do anything).
Also, 1 time in 500 is too rare for either invalidations or "long
invalidations", I think, so this patch also increases how often we see
each.
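For illustration only (the field names and weights below are made up,
not the exact values now used), the knobs look something like this:

  class invalidation_knobs_sketch;
    rand bit          invalidate;    // start an invalidation this item?
    rand int unsigned pulse_cycles;  // length of the invalidation pulse

    // Never a zero-length pulse: that wouldn't actually do anything.
    constraint pulse_cycles_c { pulse_cycles inside {[1:20]}; }

    // Weight invalidations to happen rather more often than 1 in 500.
    constraint invalidate_c { invalidate dist { 1'b1 := 1, 1'b0 := 49 }; }
  endclass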
Now that we have a working framework, let's drive some more items
through (a bit more efficient than running more tests, and also less
skewed by the initial cache invalidation).
This correctly tracks fetch addresses and fetched data. It understands
changing memory seeds, errors, invalidations and enable/disable.
Most of the complexity is in checking whether a fetch got the right
answer, given the set of memory seeds that it might have used. This
isn't conceptually hard (use a local memory model; check each seed and
see whether it matches), but is a bit fiddly in practice. In
particular, a misaligned 4-byte load might actually correspond to two
different seeds: note the nested loops in check_compatible_2.
The general flow of these checks is:
  check_compatible
    -> check_compatible_<i>        (loop over plausible seeds)
      -> is_seed_compatible_<i>    (ask the memory what data to expect)
        -> is_fetch_compatible_<i> (compare seen/expected data)
Note that the check_compatible_<i> functions have a "chatty"
parameter. This is to help with debugging when something goes wrong:
if the check fails then the check_compatible function calls it again
with chatty=1, which dumps a list of the seeds we checked and why they
didn't match.
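As a rough sketch with the _1/_2 variants collapsed into one and the
memory model replaced by a placeholder, the flow looks like this:

  // Stand-in for the memory-model lookup and comparison; the real code
  // asks the memory what data this seed would produce for addr.
  function automatic bit is_seed_compatible(bit [31:0] addr,
                                            bit [31:0] seen_data,
                                            bit [31:0] seed,
                                            bit        chatty);
    bit [31:0] expected = addr ^ seed;  // placeholder, not the real model
    bit        match    = (seen_data == expected);
    if (chatty && !match)
      $display("Seed 0x%08h: expected 0x%08h, saw 0x%08h",
               seed, expected, seen_data);
    return match;
  endfunction

  // Try every plausible seed; with chatty=1, explain each mismatch.
  function automatic bit check_seeds(bit [31:0] addr, bit [31:0] seen_data,
                                     bit [31:0] seeds[$], bit chatty);
    foreach (seeds[i]) begin
      if (is_seed_compatible(addr, seen_data, seeds[i], chatty)) return 1'b1;
    end
    return 1'b0;
  endfunction

  // Quiet pass first; on failure, repeat noisily to dump what was tried.
  function automatic void check_compatible(bit [31:0] addr,
                                           bit [31:0] seen_data,
                                           bit [31:0] seeds[$]);
    if (!check_seeds(addr, seen_data, seeds, 1'b0)) begin
      void'(check_seeds(addr, seen_data, seeds, 1'b1));
      $error("Fetch at 0x%08h doesn't match any plausible seed", addr);
    end
  endfunction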
Other than the scoreboard itself, this patch also adds a "seed" field
to ibex_icache_mem_bus_item. This is used by the monitor to inform the
scoreboard when a new memory seed has been set. Obviously, this
doesn't correspond to any actual monitored signals, but we need some
sort of back channel, and this looked like a sensible way to do it.
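A trimmed-down sketch of the item (the real ibex_icache_mem_bus_item
carries more fields; only the point of the seed back channel is shown):

  `include "uvm_macros.svh"
  import uvm_pkg::*;

  class mem_bus_item_sketch extends uvm_sequence_item;
    // Fields observed on the bus (heavily trimmed for the sketch)
    bit [31:0] address;
    bit [31:0] rdata;
    bit        err;

    // Not a monitored signal: the monitor sets this to tell the
    // scoreboard that the backing memory switched to a new seed.
    bit [31:0] seed;

    `uvm_object_utils_begin(mem_bus_item_sketch)
      `uvm_field_int(address, UVM_DEFAULT)
      `uvm_field_int(rdata,   UVM_DEFAULT)
      `uvm_field_int(err,     UVM_DEFAULT)
      `uvm_field_int(seed,    UVM_DEFAULT)
    `uvm_object_utils_end

    function new(string name = "");
      super.new(name);
    endfunction
  endclass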
The patch also stops reporting PMP responses to the scoreboard: it
turns out the scoreboard doesn't need to care about them, so we can
simplify things slightly this way.
At the moment, the scoreboard doesn't check that fetched data isn't
stored when the cache is disabled. You could expose that gap by
disabling the cache, fetching from an address, enabling the cache and
changing the memory seed, and then fetching from the same address
again. I think it would be
reasonably easy to make an imprecise version of the check, where a
seed gets discarded from the queue if its live period is completely
within a period where the cache was disabled, but I want to wait until
we've got some tests that actually get cache hits before I implement
this.
There's also a slight imprecision in the busy line check that needs
tightening up.
Both of these to-do items have TODO comments in the code.
If the cache returns an error to the core, the core must then issue a
branch before it tries to fetch again from the cache. This patch makes
this work by getting the core driver to respond with a (newly defined)
ibex_icache_core_rsp_item containing an error flag.
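Roughly, the driver side looks like the sketch below (the class and
field names are illustrative stand-ins, not the real code):

  `include "uvm_macros.svh"
  import uvm_pkg::*;

  class core_rsp_item_sketch extends uvm_sequence_item;
    bit saw_err;  // did the fetch driven for the matching request error?

    `uvm_object_utils_begin(core_rsp_item_sketch)
      `uvm_field_int(saw_err, UVM_DEFAULT)
    `uvm_object_utils_end

    function new(string name = "");
      super.new(name);
    endfunction
  endclass

  class core_driver_sketch extends uvm_driver #(uvm_sequence_item,
                                                core_rsp_item_sketch);
    `uvm_component_utils(core_driver_sketch)

    function new(string name, uvm_component parent);
      super.new(name, parent);
    endfunction

    virtual task run_phase(uvm_phase phase);
      forever begin
        seq_item_port.get_next_item(req);
        // ... drive the fetch described by req, noting any error seen ...
        rsp = core_rsp_item_sketch::type_id::create("rsp");
        rsp.set_id_info(req);  // lets the sequence match rsp to its request
        rsp.saw_err = 1'b0;    // would be set from the observed error line
        seq_item_port.item_done(rsp);
      end
    endtask
  endclass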
The actual core might signal ready at the same time as branching. If
it does so, it will ignore any data that comes back on that
cycle (since it's redirecting the fetch either way).
The testbench monitor wasn't ignoring that fetch. In practice, this
doesn't usually matter, because the cache works correctly(!) and the
data is what we expect. However, it isn't quite right if we saw an
error on the previous cycle. In this case, the cache can carry on
responding with errors, which all gets a bit confusing, and also
confuses my prototype scoreboard.
The previous approach didn't work because of possible ordering
problems that make it difficult to convert between "combinatorial
signals" and transactions.
When things are aligned with a clock edge, you solve this problem with
a clocking block (which, by default, guarantees to sample "just after"
the clock edge, once the signals have settled). Since this signal is
not aligned with a clock, we have to be a little more careful.
Before and after this patch, the code in the monitor is "eager" and
might send glitchy transactions through to the sequence, which ends up
passing the data back to the driver. This patch teaches the driver to
cope with that, by passing the address that caused a PMP error as part
of the memory response. Armed with that address, the driver can figure
out whether the error flag applies to what's currently on the bus, or
whether it's a few delta cycles behind, in which case it can safely
just drop the glitch.
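The check itself is tiny; as a sketch (names illustrative):

  // Returns 1 if a PMP error reported for err_addr applies to the
  // request currently on the bus; a stale, delta-cycle-old glitch
  // returns 0 and can safely be dropped.
  function automatic bit pmp_err_applies(bit        bus_req,
                                         bit [31:0] bus_addr,
                                         bit [31:0] err_addr);
    return bus_req && (bus_addr == err_addr);
  endfunction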
This is enough to report transactions up to the scoreboard. At the
moment, there's nothing listening, but you can see them going through
with
make run VERBOSITY=h
and then looking at the dumps in run.log.
This adds SVA properties for the memory <-> cache interface. Rather
than pollute the actual interface, these are wrapped in their own
if ("ibex_icache_mem_protocol_checker").
This is enough to report transactions up to the scoreboard. At the
moment, there's nothing listening, but you can see them going through
with
make run VERBOSITY=h
and then looking at the dumps in run.log.
This adds SVA properties for the cache <-> core interface. Rather than
pollute the actual interfaces, these are wrapped in their own
interface ("ibex_icache_core_protocol_checker").
This pretty much follows various UVM tutorials, except that we use the
sequencers (ibex_icache_core_sequencer, ibex_icache_mem_sequencer) to
poke the heartbeat objection object if there is one.
Doing this means that we can dispense with the explicit test timeout
in the hjson file: the test will fail if there is no activity on
either interface for 2000 cycles (this number is chosen because the
longest "wait a bit" time is 1200 cycles in the core driver).
This allows tests to run to completion. Unfortunately, the core agent
currently has a bug in how it handles errors (if a sequence item
terminates with an error and is followed by a 'req' sequence item, the
test hangs).
I'll fix that in a separate patch soon (and revert this commit), but
don't want the memory agent changes to depend on the fix.
As with the core agent, this doesn't yet have any proper monitoring,
and the testbench has no scoreboard, so we're not actually checking
anything.
However, it *does* implement a slave agent which can respond to
instruction fetches from the icache. This lets a test run to
completion; try:
make -C dv/uvm/icache/dv run WAVES=1
Note that the timeout in the hjson file was too small, so this bumps
it to a larger value. A later patch will replace the timeout with a
heartbeat monitor, which should be much cleaner.
The exact dance for the UVM reactive slave is complicated. See
dv/uvm/icache/dv/ibex_icache_mem_agent/README.md for the details.
Since this agent doesn't currently do any monitoring (will be
addressed in a later patch), the monitor_cb clocking block doesn't do
very much at the moment.
The driver_cb clocking block *is* used, though. The input lines are just
those needed to drive things correctly (ready is needed to do
ready/valid signalling properly; err is needed to abort instruction
fetches and do a branch after an error).
I've marked the output signals as negedge: this doesn't really make
any difference to simulation results, since the design samples
everything on posedge, but it makes the dumped waves rather easier
to read.
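For reference, the clocking block has roughly this shape (the signal
names are illustrative):

  interface agent_if_sketch (input logic clk);
    logic        ready;  // sampled: ready/valid handshake
    logic        err;    // sampled: branch after an error
    logic        req;    // driven by the driver
    logic [31:0] addr;   // driven by the driver

    clocking driver_cb @(posedge clk);
      // Inputs are sampled just after posedge as usual; outputs are
      // driven at negedge. The design only samples at posedge, so the
      // results are unchanged, but driven values settle half a cycle
      // early and the dumped waves are much easier to read.
      default input #1step output negedge;
      input  ready, err;
      output req, addr;
    endclocking
  endinterface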
There is more than one icache-specific agent that we need for the
icache testbench, so "ibex_icache_agent" isn't a very helpful name.
This commit was pretty much automated, except for a few spacing
cleanups, with commands like:
git grep -l ibex_icache_agent | \
xargs sed -i 's!ibex_icache_agent!ibex_icache_core_agent!g'
(and then rename the directory and files).
This fills in the sequencer, driver etc. to actually drive signals.
You can "run" a test with
make -C dv/uvm/icache/dv run
This won't do anything useful (it will stop with a timeout) because
there is no memory agent yet.
This patch also includes a hacky test timeout. We'll remove this (or
at least make it bigger) when we start actually running data through
the tests, but this is handy for now because it means simulations
finish without having to pkill them.