Commit graph

373 commits

Author SHA1 Message Date
Tobias Wölfel
12f4f3b9ae [rtl] Forward register data with status
Instead of accessing the signals via the module instance, use the
signals connected to the output port of the module.

Only set the values for RS1/2 if they are used.

Without this addition an instruction without a valid encoding for a
register would reuse invalid data as the address of the register.
Certain checks require that the data must match the register content if
the address is non-zero.

Reuse the signal from the instruction decoder to set the registers to
non-zero values only if the instruction contains a valid encoding for
the register.
2020-05-25 16:47:25 +01:00
Tobias Wölfel
f45b3eca99 [rtl] Set RVFI program counter
The next program counter is not always the program counter of the
fetched instruction. When updating the counter, the actual next
instruction is given by the branch target.
2020-05-25 16:47:25 +01:00
Tobias Wölfel
4e7b981911 [rtl] Add RVFI IXL interface
Following the RISC-V Formal Interface (RVFI) specification the output is
added to set the value of MXL/SXL/UXL of the current privilege level.
2020-05-25 16:47:25 +01:00
ganoam
66687e927c [bitmanip] Add ZBR instruction group
This commit implements the Bit Manipulation Extension ZBR instruction
group: crc32[c].[bhw].

CRC-32 (CRC-32/ISO-HDLC) and CRC-32C (CRC-32/ISCSI) are directly
implemented. The CRC operation solves the following equation using
binary polynomial arithmetic:

rev(rd)(x) = rev(rs1)(x) * x**n mod {1, P}(x),

where {1,P}(x) denotes the crc polynomial. Using barret reduction one
can write this as

rd = (rs1 >> n) ^ rev(rev( (rs1 << (32-1)) cx rev(mu)) cx P)
                      ^-- cycle 0--------------------^
     ^-- cycle 1 ------------------------------------------^

Where cx denotes carry-less multiplication and mu = polydiv(x**64,
{1,P}), omitting the MSB (bit 32).

The implementation increases area consumption by ~0.6kGE for synthesis
with relaxed timing constraints. With tight timing constraints that is
~1.6kGE. There is no significant impact on frequency.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-22 17:21:03 +02:00
Tom Roberts
d5ee96fff6 [rtl] Add dummy instruction insertion
- Adds a new module in the IF stage to inject dummy instructions into
  the pipeline
- Control / frequency of insertion is governed by configuration CSRs
- Extra CSR added to allow reseed of the internal LFSR useed for
  randomizing insertion
- Extra logic added to the register file to make dummy instruction
  writebacks look like real intructions (via the zero register)

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-21 13:58:01 +01:00
ganoam
f173e2baba [bitmanip] Add ZBC instruction group
This commit implements the Bit Manipulation Extension ZBC instruction
group: clmul[rh] (carry-less multiply [reverse][high])

Carry-less multiplication can be understood as multiplication based on
the addition interpreted as the bit-wise xor operation.

Example: 1101 X 1011 = 1111111:

      1011 X 1101
      -----------
             1101
        xor 1101
        ---------
            10111
       xor 0000
       ----------
           010111
      xor 1101
      -----------
          1111111

Architectural details:
        A 32 x 32-bit array
        [ operand_b[i] ? (operand_a << i) : '0 for i in 0 ... 31 ]
        is generated. The entries of the array are pairwise 'xor-ed'
        together in a 5-stage binary tree.

The area increase when synthesized with relaxed timing constraints is
1.6-1.7kGE.

Timing figures are improve by 0.1 ns for the 3-stage configuration and
worsen by 0.04ns for the 2-stage implementation. This suggests
fluctuations due to the heuristic nature of the synthesis tools.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-19 10:38:38 +02:00
Tom Roberts
8934267c78 [rtl] Fix instr_valid_i exception issue
- The controller state machine could only progress to FLUSH to handle an
  exception if instr_valid_i was set
- When the exception comes from a load/store in the Writeback stage, and
  no new instruction has been driven into the ID stage, this could cause
  exception to be missed
- The instr_valid_i qualification is therefore removed from the state
  machine as all relevant signals inside that if block are already
  qualified by instr_valid_i anyway
- Fixes #849

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-15 11:13:20 +01:00
Tom Roberts
a5ae9f4995 [rtl] Add data-independent timing to multdiv_fast
- No early return on divide by zero

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-15 10:19:55 +01:00
Tom Roberts
d19189ba43 [rtl] data-independent execution for multdiv_slow
- Remove all early exit's from multiply and divide operations when in
  fixed time execution mode.

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-15 10:19:55 +01:00
Tom Roberts
0ba0ad5a43 [rtl] multdiv_slow general tidy-up
- Correct some typos and fix various lint / style guide issues
- No functional changes

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-15 10:19:55 +01:00
ganoam
9bd3350bb3 [bitmanip] Add sext.b/h instructions
This commit implements the Bit Manipulation Extension sign-extend
instructions: sext.b (sign-extend byte) and sext.h (sign-extend half
word).

The implementation is basically a one-liner, duplicating the msb of the
byte / half-word into the msb of the output register.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-14 22:03:45 +02:00
ganoam
fac404a6f3 [bitmanip] Add ZBF instruction group
This commit implements the Bit Manipulation Extension ZBF instruction
group, which consists only of the one instruction bfp (bit-field
place).
This instruction places a field of length len < 16 from rs2 in rs1 at
offset off.

Architectureal details:
        The implementation works exactly the same as proposed by Claire
        Wolf in her reference implementation.
        1. bfp_mask = slo(o, len)
        2. bfp_result =
                (rs1 & ~(bfp_mask << off)) | (rs2 & bfp_mask) << off
                        ^------ shifter-^
        The existing shifter structure is shared for the indicated
        operation.

Impact on area:

        * When synthesizing without the B-extension, the 2 stage
        design seems to move the timing bottleneck, leading to
        optimizations which result in an area increase by 1 kGE,
        when synthesized with tight timing constraints. For the
        3 stage configuration there is no change.
        When synthesized with relaxed timing constraints there is no
        significant change in either configuration.

        * With the B-extension enabled, the area increase for tight
        timing constraints is 1.1-1.2 kGE. For relaxed timing
        constraints that is ~0.4kGE

Impact on timing: No significant impact.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-14 21:34:49 +02:00
ganoam
0afd000a09 [bitmanip] Add ZBE Instruction Group
This commit implements the Bit Manipulation Extension ZBE instruction
group: bext (bit extract) and bdep (bit deposit).

Architectural details:
        * bext/bdep: A new butterfly and inverse butterfly network is
        implemented. The generation of its controlbits depend on a
        parallel prefix bitcount of the deposit / extract mask.

        * bitcounter: The path for bext / bdep instructions traverses
        the bit counter and the butterfly network, resulting in both a
        larger delay and area. To mitigate the bitcounter has been
        changed from a serial bit counter to a radix-2 tree structure.

        * grev/gorc: Zbp instructions general reverse and general
        or-combine have as of yet shared the shifters reversal
        structure. It has proven benefitial to area and timing to reuse
        the novel butterfly network instead

The butterfly network itself consumes ~3.5kGE and ~1.1kGE for synthesis
with tight and relaxed timing constraints respectively. Including the
optimizations of the bitcounter and grev/gorc, the overall change in
area consumption is +4.6kGE (+1.2kGE) and +3.3kGE (+1.1kGE) for
synthesis with tight (relaxed) timing constraints for 2- and 3-stage
configurations respectively. For tight timing constraints that is a
growth by around ~10%, for relaxed ~5%.

The impact on the maximum frequency is negligable.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-14 16:43:19 +02:00
Tom Roberts
15ab023e25 [rtl] Stop regfile writeback for load errors
- A data or PMP error should stop the register file from being updated
- Fixes #832

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-05 09:26:55 +01:00
Tom Roberts
c862f104af [rtl] icache error signalling fix
- Data valid should only be signalled when the current beat is
  signalling an error
- PMP errors for future beats can sneak in while waiting for the
  current beat

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-04 09:16:55 +01:00
Tom Roberts
5bcacb876d [rtl] Fix jump signal stuck high during stall
- Stalls due to preceding memory accesses in the WB stage shouldn't
  cause the jump signal to remain high.
- The jump signal being stuck high causes repeated memory accesses to
  the same address, and unnecessary stalling.
- Fixes lowRISC/opentitan#2099

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-05-04 08:28:59 +01:00
Pirmin Vogel
3922b2582f [rtl] Rework generation and use of mult/div_sel/en
Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2020-05-01 17:29:59 +02:00
Pirmin Vogel
511c59db18 [rtl] Switch multdiv_en to multdiv_sel where possible
Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2020-05-01 17:29:59 +02:00
ganoam
a68923a404 [bitmanip] Add ZBP Instruction Group
This commit implements the Bit Manipulation Extension ZBP instruction
group: grev[i] (generalized reverse), gorc[i] (generalized or-combine)
and [un]shfl[i] (generalized shuffle) and all of their
pseudo-instructions.

Architectural details:
        * grev / gorc: The shifter structure features only a right
        shift structure. In order to perform a left shift therefore the
        operand needs to be reversed, shifted and reversed again. The
        architecture of the back-reversal is implemented in stages
        which are activated using the general reverse / orcombine
        operand, or a signal marking left-shifts.

        * shfl / unshfl: Also known as zip / unzip or interlace /
        uninterlace operation. These instructions are implemented
        in their own structure using a permutation networ of 6 stages.
        4 stages thereof implement the shuffle permutations. the first
        and last stage is the flip stage, which effectively reverse s
        the order of the inner stages, for unshuffle operations.

Signed-off-by: ganoam <gnoam@live.com>
2020-04-29 11:10:44 +02:00
Tom Roberts
36b05e2ebf [rtl/icache] Fix typo in ram req
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-04-27 13:52:37 +01:00
Tom Roberts
eb4913c8f0 [rtl/icache] Fix some issues in icache
- valid_o needs qualifying with output_error to mask X's in the output
  data.
- err_plus2_o should not depend on the output data, only the alignment and
  error status are relevant. Also needs skid_valid_q to mask
  skid_error_q which is not reset.

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-04-24 14:11:34 +01:00
ganoam
edd8e1d228 [bitmanip] Fix: Oversight in Tracer ZBT DV fail
Failing to read rd in tracer for fsri caused bitmanip DV test to fail.

Signed-off-by: ganoam <gnoam@live.com>
2020-04-24 12:50:42 +02:00
Tom Roberts
73155145c0 [rtl/lsu] Fix bug in pmp error clearing
- pmp_err_q needs to clear once the LSU state machine returns to idle,
  otherwise it will remain set indefinely.
- Relates to #808

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-04-24 09:06:05 +01:00
ganoam
133fef2c2f [bitmanip] Add ZBS Instruction Group
This commit implements the Bit Manipulation Extension SBS instruction
group: sbset[i], sbclr[i], sbinv[i] and sbext[i]. These instructions
set, clear, invert or extract bit rs1[rs2] or rs1[imm] for reg-reg and
reg-imm instructions respectively.

Archtectural details:
        * A multiplexer is added to the shifter structure in order to
          chose between 32'h1, used for the single-bit instructions as
          summarized below, and regular operand_b input.

        * Dedicated bitwise-logic blocks are introduced for multicycle
          shifts and cmix instructions (fsr, fsl, ror, rol),
          single-bit instructions (sbset, sbclr, sbinv, sbext), and
          stanard-ALU and zbb instructions (or, and xor, orn, andn,
          xnor).

Instruction details: All of the zbs instructions rely on sharing the
        existing shifter structure. The instructions are carried out in
        one cycle.

        * sbset, sbclr, sbinv:
                shift_result = 32'h1 << rs2[4:0];
                singlebit_result = rs1 [|, ^ , &~] shift_result;

        * sbext:
                shift_result = rs1 >> rs2[4:0];
                singlebit_result = {31'0,shift_resutl[0]};

Signed-off-by: ganoam <gnoam@live.com>
2020-04-24 08:32:30 +02:00
Pirmin Vogel
dcf18d86c3 Add missing include
Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2020-04-23 15:44:56 +02:00
Jonathan Kimmitt
bd45e9a355 Reorder statements to prevent latch inference 2020-04-23 12:18:20 +01:00
ganoam
999735568e Use Shared Imd Val Reg with Multdiv Slow
This commit adds support for the shared immediate value register in the
id_stage for the slow implementation of the multdiv module.

Register accum_window_q is now stored in the intermediate value
register.

Signed-off-by: ganoam <gnoam@live.com>
2020-04-23 12:14:45 +02:00
Tom Roberts
a3a1f9f40a [rtl] Fix icache PMP error handling
- Requests receiving a PMP error need to output a valid indicator, even
  though they will not have received any beats of data

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-04-20 16:05:19 +01:00
ganoam
06f50ddeac Bugfix: Generate Erroneous Illegal Insn
This commit fixes three possible cases for erroneous generation of
illegal instruction signals. Also, the bit-slices considered for
decoding ALU instructions are corrected to better reflect their
encoding specifications.

* Fix decoding of orc_b in illegal_insn generation.

* Insn[31] is no longer checked for generation of illegal instructions:
        This bit is part of the rs3 register adress for ternary
        bitmanipulation instructions (zbt).

* Correct bit-slicing for ALU reg-immediate instructions according
        to specification: immediates are encoded in the range
        insn[26:20] in all cases. Where a shift-amount is encoded, bits
        [26:25] will have no effect, but will no longer generate
        illegal instructions.

Signed-off-by: ganoam <gnoam@live.com>
2020-04-17 13:39:38 +02:00
ganoam
4cb77b8121 [bitmanip] Add ZBT Instruction Group
This commits implements the Bit Manipulateion Extension ZBT instruction
group: cmix, cmov, fsr[i] and fsl. Those are instructions depend on
three ALU operands. Completeion of these instructions takes 2 clock
cycles. Additionally, the rotation shifts rol and ror are made
multicycle instructions.

All multicycle instructions take exactly two cycles to complete.

Architectural additions:

        * Multicycle Stage Register in ID stage.
                multicycle_op_stage_reg

        * Decoder generates alu_multicycle signal, to stall pipeline

        * For all ternary instructions:
                1. cycle: connect alu operands a and b to rs1 and rs2
                          respectively
                2. cycle: connect operands a and be to rs3 and rs2
                          respectively

        * Reduce the physical size of the shifter from 64 bit to 63
                bit: 32-bit operand + 1 bit for arithmetic / one-shift

        * Make rotation shifts multicycle instructions.

Instruction Details:
        * cmov:
                1. store operand a (rs1) in stage reg.
                2. return stage reg output (rs2)  or rs3.

                if rs2 != 0 the output (rs1) is already known in the
                  first cycle. -> variable latency implementation is
                  possible.

        * cmix:
                1. store rs1 & rs2 in stage reg
                2. return stage_reg_q | (rs2 & ~rs3)

                reusing bwlogic from zbb

        * rol/ror: (here: ror)
              shift_amt       = rs2 & 31;
              shift_amt_compl = (32 - shift_amt) & 31
              1. store (rs1 >> shift_amt) in stage reg
              2. return (rs1 << shift_amt_compl) | stage_reg_q

        * fsl/fsr:
        For funnel shifts, the order of applying the shift
        amount or its complement is determined by bit [5] of
        shift_amt. Pseudocode for fsr:

              shift_amt       = rs2 & 63
              shift_amt_compl = (32 - shift_amt[4:0])

              1. if (shift_amt >= 33):
                    store (rs1 >> shift_amt_compl[4:0]) in stage reg
                 else if (shift_amt <0 && shift_amt <= 31):
                    store (rs1 << shift_amt[4:0]) in stage reg
                 else if (shift_amt == 32 || shift_amt == 0):
                    store rs1 in stage reg

              2. if (shift_amt >= 33):
                    return stage_reg_q | (rs3 << shift_amt[4:0])
                 else if (shift_amt <0 && shift_amt <= 31):
                    return stage_reg_q | (rs3 >> shift_amt_compl[4:0])
                 else if (shift_amt == 32):
                    return rs3
                 else if (shift_amt == 0):
                    return rs1

Signed-off-by: ganoam <gnoam@live.com>
2020-04-16 14:03:35 +02:00
Tom Roberts
97a50d7f12 [rtl] Add fixed time execution of branches
- A new parameter and a run-time control bit (DataIndTiming and
  data_ind_timing) enabling different behaviour for running security critical
  code sections.
- In the new mode, all branches act as if taken, with not-taken
  branches executing as a branch to the next instruction.
- This should give similar execution time/power characteristics
  regardless of the branch condition.
- Note that with the BranchTargetALU, branches stall an extra cycle in
  secure mode to avoid factoring the branch-taken decision into the
  branch target address mux.

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-04-13 14:27:40 +01:00
Greg Chadwick
0852120462 [rtl] Fix xprop issues in cs_registers
Alter how logic for csr_wreq was written to not propagate X when using
certain more pessimistic modes of x propagation in simulation.
2020-04-07 09:08:26 +01:00
Greg Chadwick
cb8afccb7c [rtl] Modify ASSERT_KNOWN uses to work with xprop
When xprop is enabled various case and if/else constructs will propagate
X leading to failures in ASSERT_KNOWN. This introduces enable terms to
various ASSERT_KNOWN uses that would otherwise fail without them.

prim_assert.sv changes copied across from OpenTitan respository.
2020-04-07 09:08:26 +01:00
Greg Chadwick
04ceda5267 [rtl] Add RV32B to various core files & top-levels 2020-03-31 16:49:08 +01:00
Tom Roberts
08f2271c75 [rtl] Unify top-level parameter declaration
- Make parameter declaration order and default values in
  ibex_core_tracing.sv match the documentation

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-27 17:28:15 +00:00
Tom Roberts
ea2ffe82f1 [rtl/pmp] Fix PMP error prioritization
- Region matches should be prioritized from 0 - N as stated in the RISCV
  Privileged Spec v1.11

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-27 16:43:35 +00:00
ganoam
8a26111f40 [bitmanip] Add ZBB Instruction Group
This commit implements the Bit Manipulation Extension ZBB instruction
group: clz, ctz, pcnt, slo, sro, rol, ror, rev, rev8, orcb, pack
packu, packh, min, max, andn, orn, and xnor.

* Bit counting instructions clz, ctz and pcnt can be implemented to
        share much of the architecture:

        clz: Count Leading Zeros. Counts the number of 0 bits at the
                MSB end of the argument.
        ctz: Count Trailing Zeros. Counts the number of 0 bits at the
                LSB end of the argument.
        pcnt: Counts the number of set bits of the argument.

        The implementation uses:

        - 32 one bit adders, counting the set bits of a signal
                bitcnt_bits, starting from the LSB end.

        - For pcnt the argument is fed directly into bitcnt_bits.

        - For clz, the operand is reversed such that leading zeros are
                located at the LSB end of bitcnt_bits.

        - For ctz and clz: counter enable signal for 1-bit counter i
                is high, if the previous enable signal, and
                its corresponting bitcnt_bit was high.

* Instructions sll[i], srl[i],slo[i], sro[i], rol, ror[i], rev, rev8
        and orc.b are summarized as shifting instructions and related:

        The following instructions are slight variations of the
        existing base spec's sll, srl and sra instructions.

        - slo[i] and sro[i]: shift left/right ones: similar to
                shift-logical operations from base spec, but shifting
                in ones instead of zeros.

        - rol and ror[i]: rotate left/right ones: circular shift
                operations. shifting in values from the oposite end
                of the operand instead of zeros.

        Those instructions can be implemented, sharing the base spec's
        shifting structure. In order to support rotate operations, a
        64-bit shifting structure is needed.

        In the existing ALU, hardware is described only for right
        shifts. For left shifts the operand is initially reversed,
        right shifted and the result is reversed back. This gives rise
        to an additional resource sharing oportunity for some more
        zbb operations:

        - rev: bitwise reversal.

        - rev8: byte-order swap.

        - orc.b: byte-wise reverse and or-combine.

* Instructions min, max:
        For the B-extension's min/max instructions, we can share the
        existing comparison operations. The result is obtained by
        activating the comparison structure accordingly and
        multiplexing the operands using the comparison result.

* Logic-with-negate instructions andn, orn, xnor:
        For the B-extension's logic-with-negate instructions we can
        share the structures of the base spec's logic structures
        already present for 'xnor', 'or' and 'and' instructions as
        well as the conditionally negated b operand generated for
        subtraction operations.

* Instructions pack, packu, packh:
        For the pack, packh and packu instructions I don't see any
        opportunities for resource sharing. However, the architecture
        is quite simple.

        - pack: pack the lower halves of rs1 and rs2 into rd, with rs1
                in the lower half and rs2 in the upper half.

        - packu: pack the upper halves of rs1 and rs2 into rd, with
                rs1 in the lower half and rs2 in the upper half.

        - packh: pack the LSB bytes of rs1 and rs2 into rd, with rs1
                in the lower half and rs2 in the upper half.

Signed-off-by: ganoam <gnoam@live.com>
2020-03-27 17:13:26 +01:00
Greg Chadwick
e1aac0735c [rtl] Lint fixes 2020-03-27 10:30:46 +00:00
Philipp Wagner
cef7b4d154
Remove unused ibex_pkg from tracer (#737)
Signed-off-by: Philipp Wagner <phw@lowrisc.org>
2020-03-26 14:14:24 -07:00
Tom Roberts
b897300cbd [rtl] Branch signal timing fix
- Before this fix, the branch signal was qualified by the illegal
  instruction signal and the illegal csr signal.
- This patch removes both of these since the decoder already masks
  branches with illegal isntruction, and a branch cannot be a CSR op.
- This improves the worst path in the design significantly without the
  branch target ALU.

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-25 15:26:02 +00:00
Tom Roberts
624ef41462 [rtl] Extend BT ALU to be used for all jumps
- Create separate operand muxes for the branch/jump target ALU
- Complete jump instructions in one cycle when BT ALU configured

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-25 15:25:22 +00:00
Tom Roberts
3c561e4106 [rtl/icache] Fix an inconsistency in data output
- valid_o could be asserted for one cycle then dropped when receiving
  rvalid data for a request which has branched into  the middle of a
  line.
- This fix keeps valid_o asserted by using the offset version of
  fill_rvd_cnt_q (fill_rvd_beat) to compare against fill_out_cnt_q
  (which is also offset by the branch).

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-24 13:47:57 +00:00
Tom Roberts
f8f8560563 [rtl] Remove stall cycle with BT ALU
- Branches don't need to stall with the branch target ALU

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-24 13:47:25 +00:00
Tom Roberts
0a540cffb3 [rtl/icache] Fix a couple of icache bugs
- Speculative requests observing a PMP error shouldn't increment the
  external request counter
- Remove redundant logic on fill_rvd_exp

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-23 12:57:54 +00:00
Tom Roberts
c054a63c3d [rtl] Instantiate instruction cache
- Add parameters and actual instantiation of icache
- Add a custom CSR in the M-mode custom RW range to enable the cache
- Wire up the cache invalidation signal to trigger on fence.i

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-23 12:57:31 +00:00
Tom Roberts
42aa761c5d [rtl] Fix mtval for unaligned instr errors
mtval should record which half of the instruction caused the error
rather than just recording the PC.
An extra signal is added in the IF stage to indicate when an error is
caused by the second half of an unaligned instruction. This signal is
then used to increment the PC by 2 for mtval capture on an error.

Fixes #709
2020-03-18 12:53:35 +00:00
Tom Roberts
8bb649e4ab [rtl/icache] Fix PMP error logic
Instruction requests triggering PMP errors have their external request
suppressed. The beat counting logic therefore needs to know that these
requests will never receive any rvalid data responses.

This fix stops the external request counter from incrementing, and marks
all external requests complete as soon as any error is received.

The data in the cache line beyond the error is not required since the
core cannot access it without consuming the error first.
2020-03-18 12:53:09 +00:00
Tom Roberts
ef17d4fcc2 [rtl] Add Icache ECC
- Add modules for ecc generation and checking
- Add supporting logic to icache module

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-18 11:28:06 +00:00
Tom Roberts
fe00eb46e9 [rtl] Icache RAM primitive changes
- Bring in a version of ram primitive with configurable width similar to
  the OT RAM primitive.
- Change the RAM banking structure to be a single bank of LineSize (64
  bits) to match the upcoming ECC granularity.

Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
2020-03-18 11:28:06 +00:00
Tom Roberts
854faeda39 [rtl/icache] Make age matrix more consistent
The design currently relies on fill_done remaining set in the cycle
after the fill buffer completes to ensure the fill_older_q entry gets
cleared (when a fill buffer completes in the same cycle that one is
allocated). This fix makes the behaviour a bit more consistent and easy
to reason about.
2020-03-16 13:12:19 +00:00