PAC instruction writes to the register file twice,
so when not taking stall_pa into consideration,
instr_id_done_o is activated twice during the execution of an PAC.
Unfortunately it interferes with the implementation of counters and
tracers for PAC, so now we are using stall_pa to calculate instr_id_done_o.
With the writeback stage enabled we execute the PAC/AUT instruction
before the required data is written to the register file.
For example, when the load instruction precedes AUT instruction,
AUT instruction is started before the loaded data is written
to the register file. It is a problem that the hazard detection
(stall_ld_hz) using rf_ren_a/b_o was not active for PAC/AUT instruction.
Also, I change codes not to activate pa_pac_en or pa_aut_en
when load hazard occurs.
- Add SECDED ECC checking to the register file when SecureIbex is
enabled
- No correction is attempted, but an alert is raised for the system to
intervene
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Add a major and minor alert output which can be used by the system to
react to fault injection attacks
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Fix a lint error reported by AscentLint:
```
E ALWAYS_SPEC: ibex_counter.sv:59 Edge triggered block may be more accurately modeled as always_ff New
```
Fix all remaining issues reported by Verible lint.
It turns out that #965 undid some of the fixes in `ibex_alu.sv`
that were done in #980 around the `SHUFFLE_*`/`FLIP_*` signals.
This turns out to be quite complicated, because the icache has a lot
of different counters that all track addresses or fill buffer state.
For an inductive proof to go through, you need to make the relations
between them explicit, which takes lots of assertions.
All of the signals defined in the formal_tb are prefixed with 'f_'.
This isn't strictly necessary, but it makes it much easier to see what
came from the design (since we are "bound in", our ports don't have _o
or _i suffixes).
We add a couple of protocol assumptions:
ICache <-> Core:
- Branch target addresses are half-word aligned
- The branch_spec signal is driven if branch is
ICache <-> Memory:
- The bus doesn't respond unless there is an outstanding request
- The bus doesn't grant more than 4 billion outstanding requests(!)
There's also some protocol state tracking:
- f_addr_valid tracks whether the ICache currently has an
architectural address. It goes high with branch_i (which gives the
cache an address) and goes low when the cache completes a
transaction with err_o set (since the data is bad, there's no
notion of a "next address").
- f_reqs_on_bus tracks the number of requests currently outstanding
on the bus. This is used for the ICache <-> Memory assumptions
above. We have some internal assertions that check this equals the
sum of the "ext" counters minus the sum of the "rvd" counters.
With these assumptions, we can prove:
- Once asserted, valid_o stays high until a transaction is completed
by ready_i or until branch_i is asserted (which cancels the
transaction).
- While the transaction is pending, addr_o remains stable.
- While the transaction is pending, err_o remains stable and, if
err_o is asserted, so does err_plus2_o.
- While the transaction is pending, if err_o and err_plus2_o are
high then bottom 16 bits of the returned instruction data imply an
uncompressed instruction.
- While the transaction is pending, if err_o is low then the bottom
16 bits of the returned instruction remain stable.
- While the transaction is pending, if err_o is low and the bottom
16 bits of the returned instruction imply an uncompressed
instruction then the top 16 bits of the returned instruction
remain stable.
To get this working, you need a corresponding patch in Edalize, which
adds SymbiYosys as an EDA tool.
At the moment, this proves a couple of simple bus assertions. Later
patches will add more.
There are currently some rough edges to this flow:
(1) We use a hacky pre_build hook to run sv2v and edit the files in
the work tree. Among other problems, this means that the any
failure messages that come out of sby have bogus line numbers.
(2) Since we haven't yet got bind support in Yosys, we have to
include a fragment from the design itself.
This commit contains some final optimizations regarding the bit
manipulation extension as well as the parametrization into a balanced
version and a full performance version.
Balanced Version:
* Supports ZBB, ZBS, ZBF and ZBT extensions
* Dual cycle instructions:
ror[i], rol, cmov, cmix fsl, fsr[i]
* Everything else completes in a single cycle.
Full Version:
* Supports all 32b sub extensions.
* Dual cycle instructions:
ror[i], rol, cmov, cmix fsl, fsr[i], crc32[c], bext, bdep
* Everything else completes in a single cycle.
Notable Changes:
* bext/bdep are now multi-cycle: Sharing additional register
with multiplier module
* grev/gorc instructions are implemented in separate structures
rather than sharing the shifter or butterfly network.
* Speed up decision on using rs1 or rs3 for alu_operand_a by
introducing single-bit register, to identify ternary
instructions in their first cycle.
* Introduce enumerated parameter to chose bit manipulation
implementation
Signed-off-by: ganoam <gnoam@live.com>
- invalidate all ways on a tag error to prevent xprop from the data
error signal and reduce the likelyhood of multi-way allocations
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Although CSR_SECURESEED is unreadable, an attacker can write a new seed,
which brings convenience to the attack. This patch is used to XOR the
historical seed, so that even if the attacker writes a new value to
CSR_SECURESEED , he cannot know the value of the seed.
Signed-off-by: Xiang Wang <merle@hardenedlinux.org>
Since we are binding in an interface anyway, we can add some SV
assertions to make sure nothing too strange is happening.
Note that they aren't as strong as you might expect: we don't check
that rdata isn't X, for example. This is because the cache makes
speculative reads, which it (hopefully) ignores if the data is
invalid.
- Remove the timing optimisations that delay the factoring-in of ecc
errors into valid_o.
- Optimisations are probably unnecessary here due to the minimal logic
hanging off valid_o, and the optimisations cause protocol checker
violations.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This matches the style used in the FF based register file. It gives each
register its own always block with a single enable rather than having
multiple registers with enables in a single always block.
Signed-off-by: Greg Chadwick <gac@lowrisc.org>
By giving each register its own always_ff block clock gating is more
obvious to synthesis tools.
This also includes some minor naming tweaks to make use of the _q
convention for flops.
- If an interrupt arrives at the same time as a load/store instruction
is in ID stage, the interrupt must wait until load/store completes.
Without the WB stage this happens naturally as the core stalls. With
the WB stage, we need to allow the load/store to progress to the WB
stage (and clear the ID stage) then hold back the interrupt until it
completes.
- Also cleaned up some lsu related stalling terms and signal naming.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Protocol-wise data_err_i is notionally X when !data_rvalid_i
- In addition, the design does not appear to rely on the asserted
behaviour
- Removing as it is firing in chip-level OT simulations
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
When a potential exception occurs in ID/EX controller must wait for any
outstanding instruction in WB to complete before resolving it. The
instruction in WB may also have an exception which takes priority over
ID/EX.
Signed-off-by: Greg Chadwick <gac@lowrisc.org>
With the writeback stage enabled the controller can see a load or store
error from the writeback stage whilst seeing some other fault/exception
from ID/EX (e.g. an instruction fetch error). The writeback stage fault
must take priority, however without the writeback stage the
priortisation changes.
This introduces more explicit prioritisation logic for faults/exceptions
and gives the correct prioritisation for configurations both with and
without a writeback stage.
Fixes#912
Signed-off-by: Greg Chadwick <gac@lowrisc.org>
- The speculative branch behaviour causes a performance degradation of
around 3% in the max config. This change enables that behaviour only
the maximum PMP config, which is where it is most needed for timing
closure.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Split out address matching into less than, greater than and equals to
correctly match NAPOT, NA4 and TOR modes.
- Relates to #902
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- The prefetch buffer needs to know when space is available in the fetch
FIFO to accept a new external request.
- This change updates that logic to look at what is in the FIFO and what
is outstanding on the bus to decide when space is available rather
than always assuming the maximum number of requests are outstanding.
- This improves the usage efficiency of the FIFO and fixes#574
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The RAM primitive provides a way to specify the granularity of the write
mask (wmask) signal, which can be used to select an appropriate
implementation (e.g. a SRAM with only byte selects, or no subselects at
all).
Instead of using copies of primitives from OpenTitan, vendor the files
in directly from OpenTitan, and use them.
Benefits:
- Less potential for diverging code between OpenTitan and Ibex, causing
problems when importing Ibex into OT.
- Use of the abstract primitives instead of the generic ones. The
abstract primitives are replaced during synthesis time with
target-dependent implementations. For simulation, nothing changes. For
synthesis for a given target technology (e.g. a specific ASIC or FPGA
technology), the primitives system can be instructed to choose
optimized versions (if available).
This is most relevant for the icache, which hard-coded the generic
SRAM primitive before. This primitive is always implemented as
registers. By using the abstract primitive (prim_ram_1p) instead, the
RAMs can be replaced with memory-compiler-generated ones if necessary.
There are no real draw-backs, but a couple points to be aware of:
- Our ram_1p and ram_2p implementations are kept as wrapper around the
primitives, since their interface deviates slightly from the one in
prim_ram*. This also includes a rather unfortunate naming confusion
around rvalid, which means "read data valid" in the OpenTitan advanced
RAM primitives (prim_ram_1p_adv for example), but means "ack" in
PULP-derived IP and in our bus implementation.
- The core_ibex UVM DV doesn't use FuseSoC to generate its file list,
but uses a hard-coded list in `ibex_files.f` instead. Since the
dynamic primitives system requires the use of FuseSoC we need to
provide a stop-gap until this file is removed. Issue #893 tracks
progress on that.
- Dynamic primitives depend no a not-yet-merged feature of FuseSoC
(https://github.com/olofk/fusesoc/pull/391). We depend on the same
functionality in OpenTitan and have instructed users to use a patched
branch of FuseSoC for a long time through `python-requirements.txt`,
so no action is needed for users which are either successfully
interacting with the OpenTitan source code, or have followed our
instructions. All other users will see a reasonably descriptive error
message during a FuseSoC run.
- This commit is massive, but there are no good ways to split it into
bisectable, yet small, chunks. I'm sorry. Reviewers can safely ignore
all code in `vendor/lowrisc_ip`, it's an import from OpenTitan.
- The check_tool_requirements tooling isn't easily vendor-able from
OpenTitan at the moment. I've filed
https://github.com/lowRISC/opentitan/issues/2309 to get that sorted.
- The LFSR primitive doesn't have a own core file, forcing us to include
the catch-all `lowrisc:prim:all` core. I've filed
https://github.com/lowRISC/opentitan/issues/2310 to get that sorted.
- Drive a speculative version of the branch signal into the IF stage to
drive address muxing
- The speculative signal is the same as the regular branch signal but
assumes all conditional branches are taken
- This breaks the timing path from branch condition calculation into
address muxing (and therefore PMP error calculation)
- When the branch is not taken, any external request we might otherwise
have made is suppressed
- This has a minor performance cost (0.8% without I$, ~0% with I$)
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- For TOR matching, match should be range_low <= addr < range_high
- Adapt masking so TOR matching can still be reused for NAPOT matching
- Relates to #864
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Remove any ready -> valid dependency by allowing the skid buffer to
accept data when the core is not ready
- Tighten-up behaviour around invalidations and cache enable/disable
- Remove xprop through output_compressed from invalid data when driving errors
- Make behaviour more consistent where speculative requests return
different data/error conditions to existing cache hit
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>