This turns out to be quite complicated, because the icache has a lot
of different counters that all track addresses or fill buffer state.
For an inductive proof to go through, you need to make the relations
between them explicit, which takes lots of assertions.
All of the signals defined in the formal_tb are prefixed with 'f_'.
This isn't strictly necessary, but it makes it much easier to see what
came from the design (since we are "bound in", our ports don't have _o
or _i suffixes).
We add a couple of protocol assumptions:
ICache <-> Core:
- Branch target addresses are half-word aligned
- The branch_spec signal is driven if branch is
ICache <-> Memory:
- The bus doesn't respond unless there is an outstanding request
- The bus doesn't grant more than 4 billion outstanding requests(!)
There's also some protocol state tracking:
- f_addr_valid tracks whether the ICache currently has an
architectural address. It goes high with branch_i (which gives the
cache an address) and goes low when the cache completes a
transaction with err_o set (since the data is bad, there's no
notion of a "next address").
- f_reqs_on_bus tracks the number of requests currently outstanding
on the bus. This is used for the ICache <-> Memory assumptions
above. We have some internal assertions that check this equals the
sum of the "ext" counters minus the sum of the "rvd" counters.
With these assumptions, we can prove:
- Once asserted, valid_o stays high until a transaction is completed
by ready_i or until branch_i is asserted (which cancels the
transaction).
- While the transaction is pending, addr_o remains stable.
- While the transaction is pending, err_o remains stable and, if
err_o is asserted, so does err_plus2_o.
- While the transaction is pending, if err_o and err_plus2_o are
high then bottom 16 bits of the returned instruction data imply an
uncompressed instruction.
- While the transaction is pending, if err_o is low then the bottom
16 bits of the returned instruction remain stable.
- While the transaction is pending, if err_o is low and the bottom
16 bits of the returned instruction imply an uncompressed
instruction then the top 16 bits of the returned instruction
remain stable.
To get this working, you need a corresponding patch in Edalize, which
adds SymbiYosys as an EDA tool.
At the moment, this proves a couple of simple bus assertions. Later
patches will add more.
There are currently some rough edges to this flow:
(1) We use a hacky pre_build hook to run sv2v and edit the files in
the work tree. Among other problems, this means that the any
failure messages that come out of sby have bogus line numbers.
(2) Since we haven't yet got bind support in Yosys, we have to
include a fragment from the design itself.
- invalidate all ways on a tag error to prevent xprop from the data
error signal and reduce the likelyhood of multi-way allocations
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Since we are binding in an interface anyway, we can add some SV
assertions to make sure nothing too strange is happening.
Note that they aren't as strong as you might expect: we don't check
that rdata isn't X, for example. This is because the cache makes
speculative reads, which it (hopefully) ignores if the data is
invalid.
- Remove the timing optimisations that delay the factoring-in of ecc
errors into valid_o.
- Optimisations are probably unnecessary here due to the minimal logic
hanging off valid_o, and the optimisations cause protocol checker
violations.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The RAM primitive provides a way to specify the granularity of the write
mask (wmask) signal, which can be used to select an appropriate
implementation (e.g. a SRAM with only byte selects, or no subselects at
all).
Instead of using copies of primitives from OpenTitan, vendor the files
in directly from OpenTitan, and use them.
Benefits:
- Less potential for diverging code between OpenTitan and Ibex, causing
problems when importing Ibex into OT.
- Use of the abstract primitives instead of the generic ones. The
abstract primitives are replaced during synthesis time with
target-dependent implementations. For simulation, nothing changes. For
synthesis for a given target technology (e.g. a specific ASIC or FPGA
technology), the primitives system can be instructed to choose
optimized versions (if available).
This is most relevant for the icache, which hard-coded the generic
SRAM primitive before. This primitive is always implemented as
registers. By using the abstract primitive (prim_ram_1p) instead, the
RAMs can be replaced with memory-compiler-generated ones if necessary.
There are no real draw-backs, but a couple points to be aware of:
- Our ram_1p and ram_2p implementations are kept as wrapper around the
primitives, since their interface deviates slightly from the one in
prim_ram*. This also includes a rather unfortunate naming confusion
around rvalid, which means "read data valid" in the OpenTitan advanced
RAM primitives (prim_ram_1p_adv for example), but means "ack" in
PULP-derived IP and in our bus implementation.
- The core_ibex UVM DV doesn't use FuseSoC to generate its file list,
but uses a hard-coded list in `ibex_files.f` instead. Since the
dynamic primitives system requires the use of FuseSoC we need to
provide a stop-gap until this file is removed. Issue #893 tracks
progress on that.
- Dynamic primitives depend no a not-yet-merged feature of FuseSoC
(https://github.com/olofk/fusesoc/pull/391). We depend on the same
functionality in OpenTitan and have instructed users to use a patched
branch of FuseSoC for a long time through `python-requirements.txt`,
so no action is needed for users which are either successfully
interacting with the OpenTitan source code, or have followed our
instructions. All other users will see a reasonably descriptive error
message during a FuseSoC run.
- This commit is massive, but there are no good ways to split it into
bisectable, yet small, chunks. I'm sorry. Reviewers can safely ignore
all code in `vendor/lowrisc_ip`, it's an import from OpenTitan.
- The check_tool_requirements tooling isn't easily vendor-able from
OpenTitan at the moment. I've filed
https://github.com/lowRISC/opentitan/issues/2309 to get that sorted.
- The LFSR primitive doesn't have a own core file, forcing us to include
the catch-all `lowrisc:prim:all` core. I've filed
https://github.com/lowRISC/opentitan/issues/2310 to get that sorted.
- Drive a speculative version of the branch signal into the IF stage to
drive address muxing
- The speculative signal is the same as the regular branch signal but
assumes all conditional branches are taken
- This breaks the timing path from branch condition calculation into
address muxing (and therefore PMP error calculation)
- When the branch is not taken, any external request we might otherwise
have made is suppressed
- This has a minor performance cost (0.8% without I$, ~0% with I$)
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Remove any ready -> valid dependency by allowing the skid buffer to
accept data when the core is not ready
- Tighten-up behaviour around invalidations and cache enable/disable
- Remove xprop through output_compressed from invalid data when driving errors
- Make behaviour more consistent where speculative requests return
different data/error conditions to existing cache hit
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Data valid should only be signalled when the current beat is
signalling an error
- PMP errors for future beats can sneak in while waiting for the
current beat
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- valid_o needs qualifying with output_error to mask X's in the output
data.
- err_plus2_o should not depend on the output data, only the alignment and
error status are relevant. Also needs skid_valid_q to mask
skid_error_q which is not reset.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Requests receiving a PMP error need to output a valid indicator, even
though they will not have received any beats of data
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- valid_o could be asserted for one cycle then dropped when receiving
rvalid data for a request which has branched into the middle of a
line.
- This fix keeps valid_o asserted by using the offset version of
fill_rvd_cnt_q (fill_rvd_beat) to compare against fill_out_cnt_q
(which is also offset by the branch).
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Speculative requests observing a PMP error shouldn't increment the
external request counter
- Remove redundant logic on fill_rvd_exp
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Add parameters and actual instantiation of icache
- Add a custom CSR in the M-mode custom RW range to enable the cache
- Wire up the cache invalidation signal to trigger on fence.i
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
mtval should record which half of the instruction caused the error
rather than just recording the PC.
An extra signal is added in the IF stage to indicate when an error is
caused by the second half of an unaligned instruction. This signal is
then used to increment the PC by 2 for mtval capture on an error.
Fixes#709
Instruction requests triggering PMP errors have their external request
suppressed. The beat counting logic therefore needs to know that these
requests will never receive any rvalid data responses.
This fix stops the external request counter from incrementing, and marks
all external requests complete as soon as any error is received.
The data in the cache line beyond the error is not required since the
core cannot access it without consuming the error first.
- Bring in a version of ram primitive with configurable width similar to
the OT RAM primitive.
- Change the RAM banking structure to be a single bank of LineSize (64
bits) to match the upcoming ECC granularity.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The design currently relies on fill_done remaining set in the cycle
after the fill buffer completes to ensure the fill_older_q entry gets
cleared (when a fill buffer completes in the same cycle that one is
allocated). This fix makes the behaviour a bit more consistent and easy
to reason about.