Update code from upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
Since we deleted the lock file in the previous commit, this has no
shortlog, but note that the remote SHA matches that in the grandparent
commit (so we haven't missed anything).
Signed-off-by: Rupert Swarbrick <rswarbrick@lowrisc.org>
This commit was generated by running
    for hj in $(grep -l opentitan vendor/*.vendor.hjson); do
        $opentitan/util/vendor.py -U -c $hj
    done
and then squashing together all the resulting commits. It will be
followed by a patch that combines these vendor.hjson files (using the
vendor tool's new "mapping" functionality), but we need a patch first
to get everything in sync before squashing together.
Individual commit messages below:
*****
Update common_ifs to lowRISC/opentitan@249b4c31
Update code from subdir hw/dv/sv/common_ifs in upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
* [dv] This fixes a padctrl reset issue in the chip level tb (Michael
Schaffner)
*****
Update csr_utils to lowRISC/opentitan@249b4c31
Update code from subdir hw/dv/sv/csr_utils in upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
* [dv] csr_excl_item printed msg cleanup (Srikrishna Iyer)
* [dv] Fix top-level mem test (Weicai Yang)
* [doc] Fix typo in CSR exclusions (Michael Schaffner)
* [dv] Fix failures in test csr_mem_rw_with_rand_reset (Weicai Yang)
*****
Update dv_lib to lowRISC/opentitan@249b4c31
Update code from subdir hw/dv/sv/dv_lib in upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
* [dv/chip] fix csr_hw_reset X assertion issue (Cindy Chen)
* [dv] Use phase_ready_to_end to handle end of test (Weicai Yang)
* [dv] Fix failures in test csr_mem_rw_with_rand_reset (Weicai Yang)
*****
Update dv_utils to lowRISC/opentitan@249b4c31
Update code from subdir hw/dv/sv/dv_utils in upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
* [dv] Use uvm_config_db to control tlul_assert (Weicai Yang)
* [dv] Add begin...end around if statement in macro (Weicai Yang)
* [dv] Fix timeout due to too many non-blocking TL accesses (Weicai
Yang)
* [spi_device/dv] Add interrupt seq (Weicai Yang)
*****
Update dvsim to lowRISC/opentitan@249b4c31
Update code from subdir util/dvsim in upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
* [dvsim] Enable round-trip of env variables into log (Philipp Wagner)
* [dvsim] Support for running pre-built SW tests (Srikrishna Iyer)
* [dvsim] Print what cmd is executed in the log (Srikrishna Iyer)
* [dvsim] Specify encoding of opened files as UTF-8 (Philipp Wagner)
* [dvsim] Simplify factory methods for FlowCfg (Rupert Swarbrick)
* [dvsim] small fix on css style (Cindy Chen)
* [dvsim] support css format for email (Cindy Chen)
* [doc] Rename Hardware -> Development Stages (Sam Elliott)
*****
Update uvmdvgen to lowRISC/opentitan@249b4c31
Update code from subdir util/uvmdvgen in upstream repository
https://github.com/lowRISC/opentitan to revision
249b4c316cd6626d13e17edd8a52ca60c004af96
* [uvmdvgen] Minor env gen fix (Srikrishna Iyer)
* [doc] Rename Hardware -> Development Stages (Sam Elliott)
* [dv] Use uvm_config_db to control tlul_assert (Weicai Yang)
* [uvmdvgen] Automate checklist gen, fixes (Srikrishna Iyer)
* [doc] Unify dashboard, manual spec table (Srikrishna Iyer)
* [dvsim] Added fusesoc generator for RAL (Srikrishna Iyer)
- Drive a speculative version of the branch signal into the IF stage to
drive address muxing
- The speculative signal is the same as the regular branch signal but
assumes all conditional branches are taken
- This breaks the timing path from branch condition calculation into
address muxing (and therefore PMP error calculation)
- When the branch is not taken, any external request we might otherwise
have made is suppressed
- This has a minor performance cost (0.8% without I$, ~0% with I$)
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- For TOR matching, match should be range_low <= addr < range_high
- Adapt masking so TOR matching can still be reused for NAPOT matching
- Relates to #864
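
As an illustration of the intended TOR comparison, here is a minimal Python
sketch. The function name and arguments are hypothetical and are not taken
from the RTL; it only models the half-open range check described above and
says nothing about how the masking is shared with NAPOT matching.

```python
def tor_match(addr: int, range_low: int, range_high: int) -> bool:
    """Return True if addr lies in the half-open region [range_low, range_high)."""
    # The lower bound is inclusive, the upper bound is exclusive.
    return range_low <= addr < range_high
```

With this model, an address equal to range_low matches and an address equal
to range_high does not.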
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The rdata driven by the cache is undefined when there is an error. There
are therefore no requirements on stability.
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
- Remove any ready -> valid dependency by allowing the skid buffer to
accept data when the core is not ready
- Tighten-up behaviour around invalidations and cache enable/disable
- Remove xprop through output_compressed from invalid data when driving errors
- Make behaviour more consistent where speculative requests return
different data/error conditions to existing cache hit
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Instead of accessing the signals via the module instance, use the
signals connected to the output port of the module.
Only set the values for RS1/2 if they are used.
Without this addition an instruction without a valid encoding for a
register would reuse invalid data as the address of the register.
Certain checks require that the data must match the register content if
the address is non-zero.
Reuse the signal from the instruction decoder to set the registers to
non-zero values only if the instruction contains a valid encoding for
the register.
The next program counter is not always the program counter of the
fetched instruction. When updating the counter, the actual next
instruction is given by the branch target.
This commit implements the Bit Manipulation Extension ZBR instruction
group: crc32[c].[bhw].
CRC-32 (CRC-32/ISO-HDLC) and CRC-32C (CRC-32/ISCSI) are directly
implemented. The CRC operation solves the following equation using
binary polynomial arithmetic:
rev(rd)(x) = rev(rs1)(x) * x**n mod {1, P}(x),
where {1,P}(x) denotes the CRC polynomial. Using Barrett reduction one
can write this as
rd = (rs1 >> n) ^ rev(rev( (rs1 << (32-1)) cx rev(mu)) cx P)
^-- cycle 0--------------------^
^-- cycle 1 ------------------------------------------^
Where cx denotes carry-less multiplication and mu = polydiv(x**64,
{1,P}), omitting the MSB (bit 32).
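
For comparison, the architectural result of these instructions can be
modelled bit-serially. The Python sketch below is a software reference
only, assuming the usual bit-reversed polynomial constants for CRC-32 and
CRC-32C; it does not model the two-cycle Barrett reduction datapath, and
all function names are illustrative.

```python
CRC32_POLY = 0xEDB88320   # bit-reversed CRC-32 (ISO-HDLC) polynomial
CRC32C_POLY = 0x82F63B78  # bit-reversed CRC-32C (iSCSI) polynomial


def crc_step(x: int, nbits: int, poly: int) -> int:
    """Advance the CRC state x by nbits bits (8, 16 or 32)."""
    x &= 0xFFFFFFFF
    for _ in range(nbits):
        # Shift right by one and fold in the polynomial if a 1 fell out.
        x = (x >> 1) ^ (poly if x & 1 else 0)
    return x


def crc32_b(x: int) -> int:
    return crc_step(x, 8, CRC32_POLY)


def crc32_w(x: int) -> int:
    return crc_step(x, 32, CRC32_POLY)


def crc_bytes(data: bytes, poly: int = CRC32_POLY) -> int:
    """Checksum a byte string one byte at a time, as software using the
    byte-wide instruction might do."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc_step(crc ^ byte, 8, poly)
    return crc ^ 0xFFFFFFFF
```

The standard check string "123456789" is a convenient way to compare such a
model against an existing CRC implementation.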
The implementation increases area consumption by ~0.6kGE for synthesis
with relaxed timing constraints. With tight timing constraints that is
~1.6kGE. There is no significant impact on frequency.
Signed-off-by: ganoam <gnoam@live.com>
ram_1p is almost a copy of the single-port RAM primitive we have in
OpenTitan, called prim_ram_1p, with its generic implementation
prim_generic_ram_1p. Instead of having a copy of that file in Ibex,
consistently use the OpenTitan one.
Unfortunately, ram_1p has slightly different semantics around some
signals, especially rvalid. This commit adjusts the meanings of the
signals for now, since I don't have a way to test the Arty board
which also uses this primitive (together with the compliance test
suite). With the testing in the compliance suite I'm reasonably certain
that the Arty board will work as well.
DPI access is suggested and more generic than Verilator direct signal
access. This changes the access to the performance counters from the
Verilator testbench to use DPI instead of directly accessing the
array.
Signed-off-by: Stefan Wallentowitz <stefan.wallentowitz@hm.edu>
To allow other modules to reference the simple system, it must provide
default files. In particular this is useful in DV settings where bind
is used.
Signed-off-by: Stefan Wallentowitz <stefan.wallentowitz@hm.edu>
This test constrains the address range (giving the cache a chance to
do some caching), but leaves the cache disabled. Seed changes are more
frequent than usual, to give us a good chance to spot any caching that
shouldn't have happened.
- Adds a new module in the IF stage to inject dummy instructions into
the pipeline
- Control / frequency of insertion is governed by configuration CSRs
- Extra CSR added to allow reseed of the internal LFSR used for
randomizing insertion
- Extra logic added to the register file to make dummy instruction
writebacks look like real instructions (via the zero register)
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The code before this patch maintained a mailbox, where it would add an
item for each request it saw, and then pop items off until finding the
right address whenever it saw a grant.
Most of the time, you might expect to see a sequence like this:
request 100
grant 100
request 104
grant 104
request 108
grant 108
This scheme is also resilient when glitches (to do with the
delta-cycle scheduling in the simulator) mean you actually see
something like:
request 999
request 100
grant 100
request 104
grant 104
...
However, there's another source of "mismatch" possible too: the cache
can change the request address if the request hasn't been granted (as
opposed to a ready/valid interface, where this sort of tomfoolery is
not allowed!).
When the cache is branching all over the place, as in the sanity
sequence, this doesn't really matter. But if the branch destinations
are constrained, as in the passthru sequence, you can see things like
this:
request 100 (1)
request 120 (2)
request 100 (3)
grant 100 (4)
request 104
grant 104
...
Note that the mailbox has two entries for address 100 when searching
at point (4). This might be ok, but will cause failures if we get a
new seed at (2) or (3).
This patch replaces the mailbox with a queue. New requests get
inserted at the end, as before, but grants search from the end, rather
than the start. This means that when we get to (4) in the example
above, we'll pick the latest seed (and duplicate entries disappear
quickly).
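
The lookup can be sketched in Python as follows. This is a simplified
model, not the testbench code: the class and method names are made up, and
dropping every entry up to and including the match is an assumption about
how stale entries get cleared.

```python
class PendingRequests:
    """Tracks (address, seed) pairs for requests that have not been granted."""

    def __init__(self):
        self.pending = []  # oldest entry first

    def on_request(self, addr, seed):
        # New requests are appended at the end, as before.
        self.pending.append((addr, seed))

    def on_grant(self, addr):
        # Search from the newest entry backwards, so that the most recent
        # request for this address (and hence the latest seed) wins.
        for idx in range(len(self.pending) - 1, -1, -1):
            if self.pending[idx][0] == addr:
                seed = self.pending[idx][1]
                # Assumption: everything at or before the match is stale.
                del self.pending[: idx + 1]
                return seed
        raise LookupError(f"grant for address {addr} with no matching request")
```

Replaying the example above with a different seed at each of (1), (2) and
(3), the grant at (4) returns the seed recorded at (3).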
When the memory model sees a new fetch on the bus, it might decide to
pick a new seed for the backing memory. Before this patch, the seed
applied to every fetch strictly after this one. Now, it applies to
this fetch too.
This is what the scoreboard expects. In particular, you can trigger
problems here by disabling the cache and branching lots: things will
go wrong if we pick a new seed at the same time as handling the
branch.
To fix things, we either have to teach the scoreboard to "look one
seed backwards" when the cache is disabled, which is ugly and not as
sensitive to errors in the cache, or we have to apply the new seed
immediately. This is a little painful, because we end up having to
randomize the response item and then calculate a field based on a
possible new seed (see the logic between start_item and end_item in
take_req), but I think it's cleaner than the alternative.
As part of the patch, I've also split the "req" and "grant" handling
code into separate tasks. There's no real change there, except to get
rid of a level of indentation, but I think it makes the code a bit
easier to understand.
The --seed argument has kept its original meaning: Run the one and
only iteration of the test with this seed. We've added another
argument, --start_seed to riscv-dv's run.py and our sim.py which says
"run the first iteration with this seed, and count up for later
iterations".
This should fix issue #859.
If --iterations is 1, this is equivalent to the existing --seed
argument (which we're keeping unchanged). If --iterations is
0 (reading iteration counts from the config) or positive, successive
test iterations use successive seeds. So if you pass --start_seed 123
and run ten iterations, they will run with seeds 123, 124, ... through
132.
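
The seed assignment amounts to counting up from the start seed. A minimal
Python sketch, using a hypothetical helper that is not part of run.py or
sim.py:

```python
def iteration_seeds(start_seed: int, iterations: int) -> list:
    """Return the seed used by each iteration, counting up from start_seed."""
    # Iteration i (zero-based) runs with seed start_seed + i.
    return [start_seed + i for i in range(iterations)]
```

With a single iteration the list contains only start_seed, which is why
that case behaves like --seed.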
Lots of the added code is to check that you don't do something silly
like --seed=123 --iterations=10. Since the next patch will convert the
Makefile which runs this script to using --start_seed, that's all dead
code. Maybe we should get rid of that argument at some point.
This commit implements the Bit Manipulation Extension ZBC instruction
group: clmul[rh] (carry-less multiply [reverse][high])
Carry-less multiplication can be understood as multiplication based on
the addition interpreted as the bit-wise xor operation.
Example: 1101 X 1011 = 1111111:
    1011 X 1101
    -----------
           1101
    xor   1101
    -----------
          10111
    xor  0000
    -----------
         010111
    xor 1101
    -----------
        1111111
Architectural details:
A 32 x 32-bit array
[ operand_b[i] ? (operand_a << i) : '0 for i in 0 ... 31 ]
is generated. The entries of the array are pairwise 'xor-ed'
together in a 5-stage binary tree.
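
The structure can be modelled in Python as below. This is a behavioural
sketch only: the function names are illustrative, and the bit slices used
for the high and reversed variants follow my reading of the draft
bit-manipulation specification rather than anything stated in this
message.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def partial_products(a: int, b: int) -> list:
    """One entry per bit of b: a shifted left by i if b[i] is set, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(XLEN)]


def xor_tree(values: list) -> int:
    """Reduce the entries pairwise; 32 entries need 5 levels."""
    while len(values) > 1:
        values = [values[i] ^ values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def clmul_full(a: int, b: int) -> int:
    """Full (up to 63-bit) carry-less product of two 32-bit operands."""
    return xor_tree(partial_products(a & MASK, b & MASK))


def clmul(a: int, b: int) -> int:
    return clmul_full(a, b) & MASK                   # low 32 bits


def clmulh(a: int, b: int) -> int:
    return clmul_full(a, b) >> XLEN                  # bits 63:32


def clmulr(a: int, b: int) -> int:
    return (clmul_full(a, b) >> (XLEN - 1)) & MASK   # bits 62:31
```

Calling clmul(0b1101, 0b1011) reproduces the worked example above.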
The area increase when synthesized with relaxed timing constraints is
1.6-1.7kGE.
Timing figures improve by 0.1 ns for the 3-stage configuration and
worsen by 0.04 ns for the 2-stage configuration. This suggests
fluctuations due to the heuristic nature of the synthesis tools.
Signed-off-by: ganoam <gnoam@live.com>
I think these represent the test cases we discussed. I've also removed
non-existent entries from the "tests" keys: I didn't really understand
how dvsim.py worked when I wrote the original version and they just
cause irritating warnings.
The previous code kind of worked, but we were making the "should I
make a new seed" decision in the monitor, rather than the sequence.
The problem is that this is difficult to customize with other test
sequences (they sit adjacent to the monitor in the class hierarchy,
not above it).
The new code seems a little cleaner. We generate new seeds in the
sequence (which is in charge of keeping track of the current seed
anyway). These new seeds get passed to the driver, which has an
analysis port by which it can tell the scoreboard about them. Note
that we have to pass them from the driver, rather than the monitor,
because the new seed doesn't directly appear on the interface.
The rest of the changes are simplifying the ibex_icache_mem_bus_item
class, which now only has two modes and removing the seed field from
the ibex_icache_mem_req_item class.
- The controller state machine could only progress to FLUSH to handle an
exception if instr_valid_i was set
- When the exception comes from a load/store in the Writeback stage, and
no new instruction has been driven into the ID stage, this could cause
the exception to be missed
- The instr_valid_i qualification is therefore removed from the state
machine as all relevant signals inside that if block are already
qualified by instr_valid_i anyway
- Fixes #849
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
Change default to 4 rather than 0. Makes no difference when PMPEnable==0
and gets rid of lint failures due to 0 array referencing (0 is an
unsupported value for this parameter).
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
This commit implements the Bit Manipulation Extension sign-extend
instructions: sext.b (sign-extend byte) and sext.h (sign-extend half
word).
The implementation is basically a one-liner, duplicating the msb of the
byte / half-word into the msb of the output register.
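
A minimal Python model of the two operations on a 32-bit register, with
illustrative function names:

```python
def sext_b(x: int) -> int:
    """Sign-extend the low byte of x to 32 bits."""
    byte = x & 0xFF
    # Replicate bit 7 into bits 31:8.
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def sext_h(x: int) -> int:
    """Sign-extend the low half-word of x to 32 bits."""
    half = x & 0xFFFF
    # Replicate bit 15 into bits 31:16.
    return (half | 0xFFFF0000) if half & 0x8000 else half
```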
Signed-off-by: ganoam <gnoam@live.com>
This commit implements the Bit Manipulation Extension ZBF instruction
group, which consists only of the one instruction bfp (bit-field
place).
This instruction places a field of length len < 16 from rs2 in rs1 at
offset off.
Architectural details:
The implementation works exactly the same as proposed by Claire
Wolf in her reference implementation.
1. bfp_mask = slo(0, len)
2. bfp_result =
(rs1 & ~(bfp_mask << off)) | (rs2 & bfp_mask) << off
^------ shifter-^
The existing shifter structure is shared for the indicated
operation.
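
The two steps can be modelled in Python as follows. This sketch takes the
field, len and off as separate arguments; how len and off are decoded from
rs2 is deliberately left out, and the function names are illustrative.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def slo(x: int, shamt: int) -> int:
    """Shift-left-ones: shift x left by shamt, filling with ones."""
    return ((x << shamt) | ((1 << shamt) - 1)) & MASK


def bfp_place(rs1: int, field: int, length: int, off: int) -> int:
    """Place the low `length` bits of `field` into rs1 at bit offset `off`."""
    bfp_mask = slo(0, length)                    # step 1: `length` ones
    cleared = rs1 & ~(bfp_mask << off)           # clear the destination bits
    placed = (field & bfp_mask) << off           # position the new field
    return (cleared | placed) & MASK             # step 2
```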
Impact on area:
* When synthesizing without the B-extension, the 2 stage
design seems to move the timing bottleneck, leading to
optimizations which result in an area increase by 1 kGE,
when synthesized with tight timing constraints. For the
3 stage configuration there is no change.
When synthesized with relaxed timing constraints there is no
significant change in either configuration.
* With the B-extension enabled, the area increase for tight
timing constraints is 1.1-1.2 kGE. For relaxed timing
constraints that is ~0.4kGE
Impact on timing: No significant impact.
Signed-off-by: ganoam <gnoam@live.com>
This commit implements the Bit Manipulation Extension ZBE instruction
group: bext (bit extract) and bdep (bit deposit).
Architectural details:
* bext/bdep: A new butterfly and inverse butterfly network is
implemented. The generation of its control bits depends on a
parallel prefix bitcount of the deposit / extract mask.
* bitcounter: The path for bext / bdep instructions traverses
the bit counter and the butterfly network, resulting in both a
larger delay and area. To mitigate this, the bitcounter has been
changed from a serial bit counter to a radix-2 tree structure.
* grev/gorc: The Zbp instructions general reverse and general
or-combine have until now shared the shifter's reversal
structure. It has proven beneficial to area and timing to reuse
the novel butterfly network instead.
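
For reference, the architectural behaviour of the two instructions can be
written as a bit-serial Python model. This is a functional sketch with
illustrative names; it does not model the butterfly network or the
prefix-count control logic.

```python
XLEN = 32


def bext(rs1: int, mask: int) -> int:
    """Gather the bits of rs1 selected by mask into the low bits of the result."""
    result, j = 0, 0
    for i in range(XLEN):
        if (mask >> i) & 1:
            if (rs1 >> i) & 1:
                result |= 1 << j
            j += 1  # next free position in the packed result
    return result


def bdep(rs1: int, mask: int) -> int:
    """Scatter the low bits of rs1 to the positions selected by mask."""
    result, j = 0, 0
    for i in range(XLEN):
        if (mask >> i) & 1:
            if (rs1 >> j) & 1:
                result |= 1 << i
            j += 1  # next source bit to deposit
    return result
```

The two are inverse in the sense that bdep(bext(x, m), m) equals x & m.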
The butterfly network itself consumes ~3.5kGE and ~1.1kGE for synthesis
with tight and relaxed timing constraints respectively. Including the
optimizations of the bitcounter and grev/gorc, the overall change in
area consumption is +4.6kGE (+1.2kGE) and +3.3kGE (+1.1kGE) for
synthesis with tight (relaxed) timing constraints for 2- and 3-stage
configurations respectively. For tight timing constraints that is a
growth of around 10%, for relaxed around 5%.
The impact on the maximum frequency is negligible.
Signed-off-by: ganoam <gnoam@live.com>