- The controller state machine could only progress to FLUSH to handle an
exception if instr_valid_i was set
- When the exception comes from a load/store in the Writeback stage, and
no new instruction has been driven into the ID stage, this could cause
the exception to be missed
- The instr_valid_i qualification is therefore removed from the state
machine as all relevant signals inside that if block are already
qualified by instr_valid_i anyway
- Fixes #849
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Change the default to 4 rather than 0. This makes no difference when
PMPEnable==0 and gets rid of lint failures caused by referencing
zero-sized arrays (0 is an unsupported value for this parameter).
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This commit implements the Bit Manipulation Extension sign-extend
instructions: sext.b (sign-extend byte) and sext.h (sign-extend half
word).
The implementation is basically a one-liner, duplicating the msb of the
byte / half-word into the upper bits of the output register.
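For illustration, a minimal behavioural sketch of the operation (not the
actual Ibex ALU code; the module and signal names here are made up):

  module sext_sketch (
    input  logic [31:0] operand_a,
    output logic [31:0] sext_b_result,
    output logic [31:0] sext_h_result
  );
    // Replicate the msb of the byte / half-word into the upper bits.
    assign sext_b_result = {{24{operand_a[7]}},  operand_a[7:0]};
    assign sext_h_result = {{16{operand_a[15]}}, operand_a[15:0]};
  endmodule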
Signed-off-by: ganoam <gnoam@live.com>
This commit implements the Bit Manipulation Extension ZBF instruction
group, which consists only of the one instruction bfp (bit-field
place).
This instruction places a field of length len < 16 from rs2 in rs1 at
offset off.
Architectural details:
The implementation works exactly the same as proposed by Claire
Wolf in her reference implementation.
1. bfp_mask = slo(0, len)
2. bfp_result =
     (rs1 & ~(bfp_mask << off)) | (rs2 & bfp_mask) << off
             ^------ shifter-^
The existing shifter structure is shared for the indicated
operation.
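As a point of reference, a behavioural sketch of the formula above (not
the actual RTL; bfp_ref is a made-up name, and len / off are shown as
plain inputs rather than being decoded from rs2):

  function automatic logic [31:0] bfp_ref(logic [31:0] rs1, logic [31:0] rs2,
                                          logic [4:0]  len, logic [4:0]  off);
    logic [31:0] bfp_mask;
    bfp_mask = ~(32'hFFFF_FFFF << len);  // slo(0, len): len ones in the LSBs
    return (rs1 & ~(bfp_mask << off)) | ((rs2 & bfp_mask) << off);
  endfunction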
Impact on area:
* When synthesizing without the B-extension, the 2-stage
design seems to move the timing bottleneck, leading to
optimizations which result in an area increase of 1 kGE
when synthesized with tight timing constraints. For the
3-stage configuration there is no change.
When synthesized with relaxed timing constraints there is no
significant change in either configuration.
* With the B-extension enabled, the area increase for tight
timing constraints is 1.1-1.2 kGE. For relaxed timing
constraints it is ~0.4 kGE.
Impact on timing: No significant impact.
Signed-off-by: ganoam <gnoam@live.com>
This commit implements the Bit Manipulation Extension ZBE instruction
group: bext (bit extract) and bdep (bit deposit).
Architectural details:
* bext/bdep: A new butterfly and inverse butterfly network is
implemented. The generation of its control bits depends on a
parallel-prefix bit count of the deposit / extract mask. (A
behavioural sketch of the two instructions is given after this
list.)
* bitcounter: The path for bext / bdep instructions traverses
the bit counter and the butterfly network, resulting in both a
larger delay and area. To mitigate this, the bit counter has
been changed from a serial bit counter to a radix-2 tree
structure.
* grev/gorc: The Zbp instructions general reverse and general
or-combine have so far shared the shifter's reversal
structure. It has proven beneficial to area and timing to
reuse the new butterfly network instead.
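For reference, a behavioural sketch of the two instructions (the serial
reference formulation, not the butterfly implementation described above;
function names are made up):

  function automatic logic [31:0] bext_ref(logic [31:0] rs1, logic [31:0] rs2);
    logic [31:0] rd = '0;
    int unsigned j = 0;
    for (int unsigned i = 0; i < 32; i++) begin
      if (rs2[i]) begin            // rs2 is the extract mask
        if (rs1[i]) rd[j] = 1'b1;  // pack selected bits towards the LSB
        j++;
      end
    end
    return rd;
  endfunction

  function automatic logic [31:0] bdep_ref(logic [31:0] rs1, logic [31:0] rs2);
    logic [31:0] rd = '0;
    int unsigned j = 0;
    for (int unsigned i = 0; i < 32; i++) begin
      if (rs2[i]) begin            // rs2 is the deposit mask
        if (rs1[j]) rd[i] = 1'b1;  // spread LSB-packed bits to mask positions
        j++;
      end
    end
    return rd;
  endfunction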
The butterfly network itself consumes ~3.5kGE and ~1.1kGE for synthesis
with tight and relaxed timing constraints respectively. Including the
optimizations of the bitcounter and grev/gorc, the overall change in
area consumption is +4.6kGE (+1.2kGE) and +3.3kGE (+1.1kGE) for
synthesis with tight (relaxed) timing constraints for 2- and 3-stage
configurations respectively. For tight timing constraints that is a
growth by around ~10%, for relaxed ~5%.
The impact on the maximum frequency is negligible.
Signed-off-by: ganoam <gnoam@live.com>
Once the cache has passed an error to the core, we now allow it to
wiggle its valid, addr, rdata, err and err_plus2 lines however it sees
fit until the core issues a new branch.
Since the core isn't allowed to assert ready until then, the values
will not be read and this won't matter.
This was exposed by
make -C dv/uvm/icache/dv run SEED=1314810947 WAVES=1
This assertion is supposed to say "the core may not request more data
from the cache when there's no valid address".
Unfortunately, I'd represented "requesting more data" by req being
high, rather than ready being high. This is wrong: req is a signal
saying "the core isn't currently asleep". ready (of a ready/valid
pair) is the one I wanted.
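A minimal sketch of the intended check (the module and signal names here
are illustrative, not the actual testbench names):

  module core_fetch_check (
    input logic clk_i,
    input logic rst_ni,
    input logic core_ready,  // core consumes data this cycle (of ready/valid)
    input logic addr_valid   // a branch has supplied a valid fetch address
  );
    // The core may only consume data from the cache when it has given the
    // cache a valid address; req alone just means "not asleep".
    property ready_implies_valid_addr;
      @(posedge clk_i) disable iff (!rst_ni) core_ready |-> addr_valid;
    endproperty

    assert property (ready_implies_valid_addr);
  endmodule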
Our hjson-based logic for constructing VCS commands always passes
-elfile, but this doesn't work if the following list of arguments is
empty.
It seems difficult to figure out how to teach dvsim.py to do something
like "prepend X to Y if Y is nonempty", so let's just add an empty
file for now.
- Only specifying the signals that the TB cares about means people will
no longer have to update this file every time the cs_regs port list
changes
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
As discussed in issue #845, this tells VCS to wait for signals to
settle in combinatorial blocks before checking uniqueness in
constructs like unique case.
Otherwise things like this can cause spurious warnings:
  always_comb b = ~in;
  always_comb c = in;
  always_comb begin
    unique case (1'b1)
      b:       x = 1;
      c:       x = 0;
      default: x = 0; // not that it matters, but this won't happen
    endcase
  end
For example, on a falling edge of the in signal, if the processes are
executed in the order 1, 3, 2 then the unique case block will appear
to see both b and c true at the same time.
- All primitives the icache uses are specified in distinct core files
with names that match those existing (or about to exist) in OpenTitan
- When vendoring-in Ibex, none of those primitives need to be copied
across, since OpenTitan will use its own versions
- Relates to lowRISC/opentitan#1231
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The flow for a memory fetch is:
1. Cache requests data for a memory address
2. Agent spots the request, maybe signalling a PMP error
3. Grant line goes high, at which point the request is granted.
4. Sometime later (in-order pipeline), agent sends a response
Occasionally, we need to pick a new seed for the backing memory.
Before this patch, we picked these seeds at point (3).
Unfortunately this was wrong in the following case:
1. We're switching from seed S0 to seed S1.
2. The request is spotted with seed S0 and doesn't signal a PMP error
3. The request is granted and we switch to seed S1.
4. We respond with data from memory based on S1, with no memory
error either
If S1 would have caused a PMP error, the resulting fetch (no error,
but data from S1) doesn't match any possible seed and the scoreboard
gets confused.
To solve the problem, this patch picks new seeds at point (2) instead.
This isn't quite enough by itself, because if a request is granted on
a clock-edge, a new request address might appear and there isn't a
guaranteed ordering in the simulation between the new request and the
old grant (both things happen at the same time). To fix this, the
response sequence now maintains a queue of requests and their
corresponding seeds to make sure that all the checks for a fetch are
done with a single seed.
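A sketch of that idea (the type and class names are illustrative, not
the actual sequence code):

  typedef struct packed {
    logic [31:0] addr;
    logic [31:0] seed;
  } pending_req_t;

  class seed_tracker;
    pending_req_t pending[$];
    logic [31:0]  cur_seed;

    // Point (2): the request is spotted, so pin the seed it must use.
    function void on_request(logic [31:0] addr);
      pending.push_back('{addr: addr, seed: cur_seed});
    endfunction

    // Grant / response time: every check for this fetch uses that seed.
    function pending_req_t next_request();
      return pending.pop_front();
    endfunction
  endclass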
The patch also gets rid of the seed state in the memory model. It
turns out that this didn't really help: the scoreboard is always
asking "what would I get with this seed?" and now the sequence is
doing something similar.
- Data valid should only be signalled when the current beat is
signalling an error
- PMP errors for future beats can sneak in while waiting for the
current beat
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Stalls due to preceding memory accesses in the WB stage shouldn't
cause the jump signal to remain high.
- The jump signal being stuck high causes repeated memory accesses to
the same address, and unnecessary stalling.
- Fixes lowRISC/opentitan#2099
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Firstly, the pulse shouldn't be zero length (since that wouldn't
actually do anything).
Also, 1 time in 500 is too rare for either invalidations or "long
invalidations", I think, so this patch also increases how often we see
each.
Now that we have a working framework, let's drive some more items
through (a bit more efficient than running more tests, and also less
skewed by the initial cache invalidation).
This correctly tracks fetch addresses and fetched data. It understands
changing memory seeds, errors, invalidations and enable/disable.
Most of the complexity is in checking whether a fetch got the right
answer, given the set of memory seeds that it might have used. This
isn't conceptually hard (use a local memory model; check each seed and
see whether it matches), but is a bit fiddly in practice. In
particular, a misaligned 4-byte load might actually correspond to two
different seeds: note the nested loops in check_compatible_2.
The general flow of these checks is:

  check_compatible
  -> check_compatible_<i>      (loop over plausible seeds)
  -> is_seed_compatible_<i>    (ask the memory what data to expect)
  -> is_fetch_compatible_<i>   (compare seen/expected data)
Note that the check_compatible_<i> functions have a "chatty"
parameter. This is to help with debugging when something goes wrong:
if the check fails then the check_compatible function calls it again
with chatty=1, which dumps a list of the seeds we checked and why they
didn't match.
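A sketch of the chatty re-check pattern (function names and the
expected-data calculation here are purely illustrative):

  function automatic bit seed_ok(bit [31:0] seed, bit [31:0] seen_data,
                                 bit chatty);
    bit [31:0] exp_data = seed ^ 32'hdeadbeef;  // stand-in for the memory model
    if ((exp_data != seen_data) && chatty)
      $display("seed 0x%08h: expected 0x%08h, saw 0x%08h",
               seed, exp_data, seen_data);
    return exp_data == seen_data;
  endfunction

  function automatic void check_compatible(bit [31:0] seeds[$],
                                           bit [31:0] seen_data);
    foreach (seeds[i]) if (seed_ok(seeds[i], seen_data, 1'b0)) return;
    // Nothing matched: run the checks again with chatty=1 so the log shows
    // each seed that was tried and why it didn't match.
    foreach (seeds[i]) void'(seed_ok(seeds[i], seen_data, 1'b1));
    $error("fetched data is not compatible with any plausible seed");
  endfunction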
Other than the scoreboard itself, this patch also adds a "seed" field
to ibex_icache_mem_bus_item. This is used by the monitor to inform the
scoreboard when a new memory seed has been set. Obviously, this
doesn't correspond to any actual monitored signals, but we need some
sort of back channel, and this looked like a sensible way to do it.
The patch also stops reporting PMP responses to the scoreboard: it
turns out the scoreboard doesn't need to care about them, so we can
simplify things slightly this way.
At the moment, the scoreboard doesn't check that fetched data isn't
stored when the cache is disabled. You could see this by disabling the
cache, fetching from an address, enabling the cache and changing the
memory seed, and fetching from the address again. I think it would be
reasonably easy to make an imprecise version of the check, where a
seed gets discarded from the queue if its live period is completely
within a period where the cache was disabled, but I want to wait until
we've got some tests that actually get cache hits before I implement
this.
There's also a slight imprecision in the busy line check that needs
tightening up.
Both of these to-do items have TODO comments in the code.
This commit implements the Bit Manipulation Extension ZBP instruction
group: grev[i] (generalized reverse), gorc[i] (generalized or-combine)
and [un]shfl[i] (generalized shuffle) and all of their
pseudo-instructions.
Architectural details:
* grev / gorc: The shifter features only a right-shift
structure. In order to perform a left shift, the operand
therefore needs to be reversed, shifted and reversed again.
The back-reversal is implemented in stages which are
activated by the general-reverse / or-combine operand, or by
a signal marking left shifts. (A behavioural sketch of grev
is given after this list.)
* shfl / unshfl: Also known as the zip / unzip or interlace /
uninterlace operations. These instructions are implemented
in their own structure using a permutation network of 6
stages. 4 of those stages implement the shuffle permutations;
the first and last stages are flip stages, which effectively
reverse the order of the inner stages for unshuffle
operations.
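For reference, a behavioural sketch of grev (the usual staged reference
formulation, not the Ibex RTL itself; grev_ref is a made-up name):

  function automatic logic [31:0] grev_ref(logic [31:0] rs1, logic [4:0] shamt);
    logic [31:0] x = rs1;
    // Each set bit of shamt swaps neighbouring groups of 1/2/4/8/16 bits;
    // shamt = 5'b11000 gives byte reverse, 5'b11111 gives a full bit reverse.
    if (shamt[0]) x = ((x & 32'h5555_5555) <<  1) | ((x & 32'hAAAA_AAAA) >>  1);
    if (shamt[1]) x = ((x & 32'h3333_3333) <<  2) | ((x & 32'hCCCC_CCCC) >>  2);
    if (shamt[2]) x = ((x & 32'h0F0F_0F0F) <<  4) | ((x & 32'hF0F0_F0F0) >>  4);
    if (shamt[3]) x = ((x & 32'h00FF_00FF) <<  8) | ((x & 32'hFF00_FF00) >>  8);
    if (shamt[4]) x = ((x & 32'h0000_FFFF) << 16) | ((x & 32'hFFFF_0000) >> 16);
    return x;
  endfunction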
Signed-off-by: ganoam <gnoam@live.com>
If the cache returns an error to the core, the core must then issue a
branch before it tries to fetch again from the cache. This patch makes
this work by getting the core driver to respond with a (newly defined)
ibex_icache_core_rsp_item containing an error flag.
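A sketch of what such a response item might look like (the class name
comes from the text above; the field name and field macros are
illustrative, not the actual DV code):

  `include "uvm_macros.svh"
  import uvm_pkg::*;

  class ibex_icache_core_rsp_item extends uvm_sequence_item;
    // Set when the fetch this response answers saw a cache error, telling
    // the sequence that the core must branch before fetching again.
    bit saw_error;

    `uvm_object_utils_begin(ibex_icache_core_rsp_item)
      `uvm_field_int(saw_error, UVM_DEFAULT)
    `uvm_object_utils_end

    function new(string name = "ibex_icache_core_rsp_item");
      super.new(name);
    endfunction
  endclass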
Update code from subdir hw/dv/sv/dv_lib in upstream repository
https://github.com/lowRISC/opentitan to revision
1d17b1225d324c81da522c69317335a83edd5ddb
* [dv] Add excl for rstmgr, pwrmgr and fix top-level csr test (Weicai
Yang)
* [dv] Allow dv_lib-based sequences to have different RSP/REQ types
(Rupert Swarbrick)
* [dv] Support WO, RO type for mem (Weicai Yang)
* [dv,sw] SW -> DV tb self-checking mechanism - SV (Srikrishna Iyer)
* [dv/top] Fix csr rw test (Cindy Chen)
Signed-off-by: Rupert Swarbrick <rswarbrick@lowrisc.org>
The actual core might signal ready at the same time as branching. If
it does so, it will ignore any data that comes back on that
cycle (since it's redirecting the fetch either way).
The test code monitor wasn't ignoring the fetch. In practice, this
doesn't usually matter, because the cache works correctly(!) and the
data is what we expect. However, it isn't quite right if we saw an
error on the previous cycle. In this case, the cache can carry on
responding with errors, which all gets a bit confusing, and also
confuses my prototype scoreboard.
The previous approach didn't work because of possible ordering
problems that make it difficult to convert between "combinatorial
signals" and transactions.
When things are aligned with a clock edge, you solve this problem with
a clocking block (which, by default, guarantees to sample "just after"
the clock edge, once the signals have settled). Since this signal is
not aligned with a clock, we have to be a little more careful.
Before and after this patch, the code in the monitor is "eager" and
might send glitchy transactions through to the sequence, which ends up
passing the data back to the driver. This patch teaches the driver to
cope with that, by passing the address that caused a PMP error as part
of the memory response. Armed with that address, the driver can figure
out whether the error flag applies to what's currently on the bus, or
whether it's a few delta cycles behind, in which case it can safely
just drop the glitch.
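A tiny sketch of the resulting check in the driver (illustrative names,
not the actual code):

  // Only apply a PMP error flag if it was computed for the address that is
  // currently on the bus; a mismatch means the flag is a delta-cycle glitch
  // from a previous address and can safely be dropped.
  function automatic bit apply_pmp_err(logic [31:0] err_addr,
                                       logic [31:0] bus_addr,
                                       bit          err_flag);
    return err_flag && (err_addr == bus_addr);
  endfunction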
This is enough to report transactions up to the scoreboard. At the
moment, there's nothing listening, but you can see them going through
with
make run VERBOSITY=h
and then looking at the dumps in run.log.
This adds SVA properties for the memory <-> cache interface. Rather
than pollute the actual interface, these are wrapped in their own
interface (ibex_icache_mem_protocol_checker).