This refactors the invalidation control logic into an explicit state
machine. The top-level icache_invalid_o signal is also removed.
Replaced with an explicit scramble key request instead.
This has all been done to better deal with corner cases around a new
invalidation being requested whilst another is still going on.
Previously there was a bug wher an invalidation request in the final
cycle of an ongoing invalidation didn't restart the invalidation but did
rotate the scrambling key producing an ECC failure and an alert.
This commit includes switching to a scrambling RAM primitive for
ICache data and tag RAMs. Also introduces minor changes to ICache
to handle scrambling key valid signal.
It also includes a minor bug fix regarding not initializing
`fill_way_q` signal without ResetAll parameter. When the parameter
is not set and we have our first hit right after ICache enables,
the signal hangs.
Signed-off-by: Canberk Topal <ctopal@lowrisc.org>
- Instruction addresses are now checked in the IF stage, after the cache
and after the prefetch buffer.
- To deal with unaligned instructions, the PMP logic checks the current
address and the next in parallel.
- The spec_branch timing hack has been removed as it's no longer
relevant with the PMP logic moved.
- Various updates made to the icache testbench to account for the
changes.
- Relates to #1471
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This parameter forces a reset of all registers inside the core. This is
required to guarantee a common starting point for lockstep and thus
prevent spurious lockstep failure alerts.
Another minor change in this commit rearranges the writeback stage
multiplexing to gate incoming lsu write data when not valid. This stops
any X values from the data bus propagating to the register file
signalling (and thus to the lockstep comparison) which would cause the
lockstep alert to be X. It has the side effect of possibly reducing
power consumption in the register file.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This commit creates a new top level wrapping the core, register file and
icache RAMs. The tracing top level is also renamed to ibex_top_tracing
to match. This new top level is intended to enable a dual core lockstep
implementation of Ibex.
There are no functional changes in this commit, only wiring.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
No functional change. These parameters are effectively fixed. Moving
them to the pkg eases top-level wiring of RAM signals.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Changes the ECC granularity in the data RAMs from 64bit to 32bit. This
is to align with an upcoming change in bus ECC. Relates to
lowRISC/opentitan#5450
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
These changes correspond to similar changes in the prefetch buffer to
support branch prediction. A registered version of fill_ext_done was
required to prevent a combinational loop from branch_i in to valid_o
out.
Multiplexing priorities for fifo_addr have been swapped to match
fetch_addr_d in the same module and all similar multiplexing in the
icache (prioritize incoming branch_i over branch_mispredict_i). Note
however that it is not expected that these conditions will actually
occur together, and an assertion has been added to check that.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Remove the SpeculativeRequest parameter, and replace it with a
policy. If the cache is disabled, make the request on the basis that we
definitely won't hit in the cache and so should make the bus request
asap.
Stop fill buffers from fetching complete cache lines when the cache is
disabled. When the core branches into the middle of a line, the lower
words would normally be fetched to complete the line. This is
unnecessary when the cache is disabled since those words will just be
thrown away and the core will stall while they are being fetched.
These two changes make the performance using the disabled icache the
same as using non-icache prefetch buffer.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This turns out to be quite complicated, because the icache has a lot
of different counters that all track addresses or fill buffer state.
For an inductive proof to go through, you need to make the relations
between them explicit, which takes lots of assertions.
All of the signals defined in the formal_tb are prefixed with 'f_'.
This isn't strictly necessary, but it makes it much easier to see what
came from the design (since we are "bound in", our ports don't have _o
or _i suffixes).
We add a couple of protocol assumptions:
ICache <-> Core:
- Branch target addresses are half-word aligned
- The branch_spec signal is driven if branch is
ICache <-> Memory:
- The bus doesn't respond unless there is an outstanding request
- The bus doesn't grant more than 4 billion outstanding requests(!)
There's also some protocol state tracking:
- f_addr_valid tracks whether the ICache currently has an
architectural address. It goes high with branch_i (which gives the
cache an address) and goes low when the cache completes a
transaction with err_o set (since the data is bad, there's no
notion of a "next address").
- f_reqs_on_bus tracks the number of requests currently outstanding
on the bus. This is used for the ICache <-> Memory assumptions
above. We have some internal assertions that check this equals the
sum of the "ext" counters minus the sum of the "rvd" counters.
With these assumptions, we can prove:
- Once asserted, valid_o stays high until a transaction is completed
by ready_i or until branch_i is asserted (which cancels the
transaction).
- While the transaction is pending, addr_o remains stable.
- While the transaction is pending, err_o remains stable and, if
err_o is asserted, so does err_plus2_o.
- While the transaction is pending, if err_o and err_plus2_o are
high then bottom 16 bits of the returned instruction data imply an
uncompressed instruction.
- While the transaction is pending, if err_o is low then the bottom
16 bits of the returned instruction remain stable.
- While the transaction is pending, if err_o is low and the bottom
16 bits of the returned instruction imply an uncompressed
instruction then the top 16 bits of the returned instruction
remain stable.
To get this working, you need a corresponding patch in Edalize, which
adds SymbiYosys as an EDA tool.
At the moment, this proves a couple of simple bus assertions. Later
patches will add more.
There are currently some rough edges to this flow:
(1) We use a hacky pre_build hook to run sv2v and edit the files in
the work tree. Among other problems, this means that the any
failure messages that come out of sby have bogus line numbers.
(2) Since we haven't yet got bind support in Yosys, we have to
include a fragment from the design itself.
- invalidate all ways on a tag error to prevent xprop from the data
error signal and reduce the likelyhood of multi-way allocations
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Since we are binding in an interface anyway, we can add some SV
assertions to make sure nothing too strange is happening.
Note that they aren't as strong as you might expect: we don't check
that rdata isn't X, for example. This is because the cache makes
speculative reads, which it (hopefully) ignores if the data is
invalid.
- Remove the timing optimisations that delay the factoring-in of ecc
errors into valid_o.
- Optimisations are probably unnecessary here due to the minimal logic
hanging off valid_o, and the optimisations cause protocol checker
violations.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The RAM primitive provides a way to specify the granularity of the write
mask (wmask) signal, which can be used to select an appropriate
implementation (e.g. a SRAM with only byte selects, or no subselects at
all).
Instead of using copies of primitives from OpenTitan, vendor the files
in directly from OpenTitan, and use them.
Benefits:
- Less potential for diverging code between OpenTitan and Ibex, causing
problems when importing Ibex into OT.
- Use of the abstract primitives instead of the generic ones. The
abstract primitives are replaced during synthesis time with
target-dependent implementations. For simulation, nothing changes. For
synthesis for a given target technology (e.g. a specific ASIC or FPGA
technology), the primitives system can be instructed to choose
optimized versions (if available).
This is most relevant for the icache, which hard-coded the generic
SRAM primitive before. This primitive is always implemented as
registers. By using the abstract primitive (prim_ram_1p) instead, the
RAMs can be replaced with memory-compiler-generated ones if necessary.
There are no real draw-backs, but a couple points to be aware of:
- Our ram_1p and ram_2p implementations are kept as wrapper around the
primitives, since their interface deviates slightly from the one in
prim_ram*. This also includes a rather unfortunate naming confusion
around rvalid, which means "read data valid" in the OpenTitan advanced
RAM primitives (prim_ram_1p_adv for example), but means "ack" in
PULP-derived IP and in our bus implementation.
- The core_ibex UVM DV doesn't use FuseSoC to generate its file list,
but uses a hard-coded list in `ibex_files.f` instead. Since the
dynamic primitives system requires the use of FuseSoC we need to
provide a stop-gap until this file is removed. Issue #893 tracks
progress on that.
- Dynamic primitives depend no a not-yet-merged feature of FuseSoC
(https://github.com/olofk/fusesoc/pull/391). We depend on the same
functionality in OpenTitan and have instructed users to use a patched
branch of FuseSoC for a long time through `python-requirements.txt`,
so no action is needed for users which are either successfully
interacting with the OpenTitan source code, or have followed our
instructions. All other users will see a reasonably descriptive error
message during a FuseSoC run.
- This commit is massive, but there are no good ways to split it into
bisectable, yet small, chunks. I'm sorry. Reviewers can safely ignore
all code in `vendor/lowrisc_ip`, it's an import from OpenTitan.
- The check_tool_requirements tooling isn't easily vendor-able from
OpenTitan at the moment. I've filed
https://github.com/lowRISC/opentitan/issues/2309 to get that sorted.
- The LFSR primitive doesn't have a own core file, forcing us to include
the catch-all `lowrisc:prim:all` core. I've filed
https://github.com/lowRISC/opentitan/issues/2310 to get that sorted.
- Drive a speculative version of the branch signal into the IF stage to
drive address muxing
- The speculative signal is the same as the regular branch signal but
assumes all conditional branches are taken
- This breaks the timing path from branch condition calculation into
address muxing (and therefore PMP error calculation)
- When the branch is not taken, any external request we might otherwise
have made is suppressed
- This has a minor performance cost (0.8% without I$, ~0% with I$)
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Remove any ready -> valid dependency by allowing the skid buffer to
accept data when the core is not ready
- Tighten-up behaviour around invalidations and cache enable/disable
- Remove xprop through output_compressed from invalid data when driving errors
- Make behaviour more consistent where speculative requests return
different data/error conditions to existing cache hit
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Data valid should only be signalled when the current beat is
signalling an error
- PMP errors for future beats can sneak in while waiting for the
current beat
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- valid_o needs qualifying with output_error to mask X's in the output
data.
- err_plus2_o should not depend on the output data, only the alignment and
error status are relevant. Also needs skid_valid_q to mask
skid_error_q which is not reset.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Requests receiving a PMP error need to output a valid indicator, even
though they will not have received any beats of data
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- valid_o could be asserted for one cycle then dropped when receiving
rvalid data for a request which has branched into the middle of a
line.
- This fix keeps valid_o asserted by using the offset version of
fill_rvd_cnt_q (fill_rvd_beat) to compare against fill_out_cnt_q
(which is also offset by the branch).
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Speculative requests observing a PMP error shouldn't increment the
external request counter
- Remove redundant logic on fill_rvd_exp
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Add parameters and actual instantiation of icache
- Add a custom CSR in the M-mode custom RW range to enable the cache
- Wire up the cache invalidation signal to trigger on fence.i
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
mtval should record which half of the instruction caused the error
rather than just recording the PC.
An extra signal is added in the IF stage to indicate when an error is
caused by the second half of an unaligned instruction. This signal is
then used to increment the PC by 2 for mtval capture on an error.
Fixes#709
Instruction requests triggering PMP errors have their external request
suppressed. The beat counting logic therefore needs to know that these
requests will never receive any rvalid data responses.
This fix stops the external request counter from incrementing, and marks
all external requests complete as soon as any error is received.
The data in the cache line beyond the error is not required since the
core cannot access it without consuming the error first.
- Bring in a version of ram primitive with configurable width similar to
the OT RAM primitive.
- Change the RAM banking structure to be a single bank of LineSize (64
bits) to match the upcoming ECC granularity.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The design currently relies on fill_done remaining set in the cycle
after the fill buffer completes to ensure the fill_older_q entry gets
cleared (when a fill buffer completes in the same cycle that one is
allocated). This fix makes the behaviour a bit more consistent and easy
to reason about.