29 commits

Author SHA1 Message Date
Hailin
13bbaf72d9 [test] Connect FPU subsystem 2023-03-31 15:58:32 +02:00
Pirmin Vogel
c78acac8cc [rtl, bitmanip] Add xperm.[nbh] instruction (Zbp, draft v.0.93)
Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2021-12-06 11:14:49 +01:00
Pirmin Vogel
40dab87448 [rtl, bitmanip] Clarify situation around zext.[bh] pseudo-instructions
This is related to lowRISC/Ibex#1228.

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2021-12-03 22:43:05 +01:00
Pirmin Vogel
16d6f5ea2b [rtl, bitmanip] Align Zbb implementation with draft v.0.93 and v.1.0.0
This involves the following changes:
- Rename pcnt to cpop
- Switch encoding of max and minu
- Remove rev from Balanced version, only available in Full version via
  grev (Zbp)
- Include sext.b/h (previously in Zb_tmp)
- Remove slo[i] and sro[i] from Balanced version, only available in Full
  version (Zbp)

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2021-12-03 22:43:05 +01:00
Pirmin Vogel
e765b4dfec [rtl, bitmanip] Align Zbs implementation with draft v.0.93 and v.1.0.0
This only involves dropping the `s` from the instruction names, i.e.,
sbext becomes bext etc.

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2021-12-03 22:43:05 +01:00
Pirmin Vogel
71b43a83e2 [rtl, bitmanip] Rename bext/bdep to bcompress/bdecompress
This change is related to the bitmanip draft version 0.94. It is needed
because in draft version 0.93 as well as in version 1.00, sbext from Zbs
is renamed to bext, which would lead to two completely different
instructions having the same name.

Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
2021-12-03 22:43:05 +01:00
Michael Munday
c35472abb9 [bitmanip][zba] Add support for Zba (address calculation) extension
Add support for the Zba extension added in v0.93 of the bit manipulation
specification (unchanged in v1.0.0). The new instructions added are:

  - sh1add: rd = (rs1 << 1) + rs2
  - sh2add: rd = (rs1 << 2) + rs2
  - sh3add: rd = (rs1 << 3) + rs2

The instructions are single cycle and have been implemented using the
adder in the ALU.
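
For illustration only, a minimal behavioural sketch of the three
instructions in Python. The helper name `shnadd` and the 32-bit
truncation (RV32) are assumptions of this sketch, not part of the RTL.

```python
MASK32 = 0xFFFFFFFF  # assumption: RV32, results wrap at 32 bits


def shnadd(n: int, rs1: int, rs2: int) -> int:
    """Model rd = (rs1 << n) + rs2 for n in {1, 2, 3}."""
    return (((rs1 & MASK32) << n) + (rs2 & MASK32)) & MASK32


def sh1add(rs1: int, rs2: int) -> int:
    return shnadd(1, rs1, rs2)


def sh2add(rs1: int, rs2: int) -> int:
    return shnadd(2, rs1, rs2)


def sh3add(rs1: int, rs2: int) -> int:
    return shnadd(3, rs1, rs2)


if __name__ == "__main__":
    # Address of element 5 in an array of 4-byte words at base 0x1000.
    print(hex(sh2add(5, 0x1000)))
```

The typical use is array indexing: the index goes in rs1, the base
address in rs2.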

Signed-off-by: Michael Munday <mike.munday@lowrisc.org>
2021-11-01 09:58:01 +00:00
Philipp Wagner
b99da424ff [style] Indent package bodies
The style guide requires the package body to be indented with two
spaces.
2021-08-31 15:30:28 +02:00
Rupert Swarbrick
53926b5fb9 [rtl] Break long lines in Ibex tracer
These go over the 100 character limit in our style guide (and will
cause Verible lint warnings when vendored into OpenTitan).
2021-04-22 12:30:47 +01:00
Pirmin Vogel
c69fc8b6f2 [rtl] Fix overlapping encodings of immediate instructions in tracer package
This commit modifies the encoding of SROI, RORI, SBEXTI, GREVI and GORCI by
forcing Bit 26 to zero to prevent overlapping encodings with FSRI.

The bitmanip draft spec doesn't explicitly state that Bit 26 for those
instructions must be zero. However, those instructions only ever use
log2(XLEN) LSBs of the immediate. This means they don't use Bit 26 in RV32.
Instead, whenever Bit 26 is set, these instructions are instead decoded as
FSRI.
2021-01-21 17:11:47 +01:00
Pirmin Vogel
e64f94e798 [rtl] Fix encoding of ZIP/UNZIP pseudo-instructions in tracer package
Just like for the corresponding base instructions SHFLI/UNSHFLI the MSBs of
all these pseudo-instructions must be 6'b0000_10.
2021-01-21 17:11:47 +01:00
Pirmin Vogel
760baa1eb2 [rtl] Fix encoding for ORC16/REV16 instructions in tracer package
This bug was originally found by @micprog.
2021-01-19 15:05:07 +01:00
Tobias Wölfel
3371732f94 [rtl] Disable definition of unused instructions
The parameters are not used as the instructions are not yet widely
supported.
Keep definitions so they can be easily activated later.

Tracked in issue lowrisc/ibex#1228
2021-01-11 16:20:33 +01:00
Tobias Wölfel
90258b6d07 [rtl] Remove unused tracer branch instruction
`INSTR_BALL` was introduced in 47b713fd as a vector instruction.
This is not used and is probably a leftover so can be removed.
2021-01-11 16:20:33 +01:00
Philipp Wagner
67e7417749 Fix Verible lint issues
Fix all remaining issues reported by Verible lint.

It turns out that #965 undid some of the fixes in `ibex_alu.sv`
that were done in #980 around the `SHUFFLE_*`/`FLIP_*` signals.
2020-07-03 12:20:32 +01:00
ganoam
66687e927c [bitmanip] Add ZBR instruction group
This commit implements the Bit Manipulation Extension ZBR instruction
group: crc32[c].[bhw].

CRC-32 (CRC-32/ISO-HDLC) and CRC-32C (CRC-32/ISCSI) are directly
implemented. The CRC operation solves the following equation using
binary polynomial arithmetic:

rev(rd)(x) = rev(rs1)(x) * x**n mod {1, P}(x),

where {1,P}(x) denotes the CRC polynomial. Using Barrett reduction, one
can write this as

rd = (rs1 >> n) ^ rev(rev( (rs1 << (32-1)) cx rev(mu)) cx P)
                      ^-- cycle 0--------------------^
     ^-- cycle 1 ------------------------------------------^

Where cx denotes carry-less multiplication and mu = polydiv(x**64,
{1,P}), omitting the MSB (bit 32).

The implementation increases area consumption by ~0.6kGE for synthesis
with relaxed timing constraints. With tight timing constraints that is
~1.6kGE. There is no significant impact on frequency.
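
As a cross-check of the instruction semantics (not of the two-cycle
Barrett datapath above), here is a bit-serial reference sketch in
Python. The function names and the byte-wise driver `crc32_bytes` are
assumptions of this sketch; the reflected polynomial constants are the
standard ones for CRC-32/ISO-HDLC and CRC-32/ISCSI.

```python
MASK32 = 0xFFFFFFFF
CRC32_POLY = 0xEDB88320   # CRC-32/ISO-HDLC, bit-reflected
CRC32C_POLY = 0x82F63B78  # CRC-32/ISCSI, bit-reflected


def crc_step(x: int, nbits: int, poly: int) -> int:
    """Advance the CRC state by nbits, one bit per iteration."""
    x &= MASK32
    for _ in range(nbits):
        x = (x >> 1) ^ (poly if x & 1 else 0)
    return x


def crc32_b(x: int) -> int:
    return crc_step(x, 8, CRC32_POLY)


def crc32c_b(x: int) -> int:
    return crc_step(x, 8, CRC32C_POLY)


def crc32_bytes(data: bytes, step=crc32_b) -> int:
    """Software CRC over a byte string using the byte-wide step."""
    crc = MASK32
    for byte in data:
        crc = step(crc ^ byte)
    return crc ^ MASK32


if __name__ == "__main__":
    print(hex(crc32_bytes(b"123456789")))
```

The hardware computes the same function in two cycles; this loop is
only a slow golden model to compare against.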

Signed-off-by: ganoam <gnoam@live.com>
2020-05-22 17:21:03 +02:00
ganoam
f173e2baba [bitmanip] Add ZBC instruction group
This commit implements the Bit Manipulation Extension ZBC instruction
group: clmul[rh] (carry-less multiply [reverse][high])

Carry-less multiplication can be understood as multiplication based on
the addition interpreted as the bit-wise xor operation.

Example: 1101 X 1011 = 1111111:

      1011 X 1101
      -----------
             1101
        xor 1101
        ---------
            10111
       xor 0000
       ----------
           010111
      xor 1101
      -----------
          1111111

Architectural details:
        A 32 x 32-bit array
        [ operand_b[i] ? (operand_a << i) : '0 for i in 0 ... 31 ]
        is generated. The entries of the array are pairwise 'xor-ed'
        together in a 5-stage binary tree.
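
The array-plus-tree structure can be sketched in Python as follows.
The function names are assumptions of this sketch, and the selection
of result bits for clmulh and clmulr follows my reading of the draft
specification rather than the RTL.

```python
MASK32 = 0xFFFFFFFF


def partial_products(a: int, b: int) -> list:
    """One entry per bit of b: (a << i) if b[i] is set, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(32)]


def xor_tree(values: list) -> tuple:
    """Pairwise xor reduction; returns (result, number of stages)."""
    stages = 0
    while len(values) > 1:
        values = [values[i] ^ values[i + 1] for i in range(0, len(values), 2)]
        stages += 1
    return values[0], stages


def clmul_full(a: int, b: int) -> int:
    """Full (up to 63-bit) carry-less product of two 32-bit operands."""
    product, _ = xor_tree(partial_products(a & MASK32, b & MASK32))
    return product


def clmul(a: int, b: int) -> int:
    return clmul_full(a, b) & MASK32


def clmulh(a: int, b: int) -> int:
    return clmul_full(a, b) >> 32


def clmulr(a: int, b: int) -> int:
    return (clmul_full(a, b) >> 31) & MASK32


if __name__ == "__main__":
    print(bin(clmul(0b1101, 0b1011)))
```

With 32 entries the pairwise reduction takes exactly five stages,
matching the tree depth stated above.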

The area increase when synthesized with relaxed timing constraints is
1.6-1.7kGE.

Timing figures improve by 0.1 ns for the 3-stage configuration and
worsen by 0.04 ns for the 2-stage configuration. This suggests
fluctuations due to the heuristic nature of the synthesis tools.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-19 10:38:38 +02:00
ganoam
9bd3350bb3 [bitmanip] Add sext.b/h instructions
This commit implements the Bit Manipulation Extension sign-extend
instructions: sext.b (sign-extend byte) and sext.h (sign-extend half
word).

The implementation is basically a one-liner, replicating the MSB of the
byte / half-word into all upper bits of the output register.
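
A minimal Python sketch of that behaviour, assuming a 32-bit register
width (function names are illustrative):

```python
def sext_b(rs1: int) -> int:
    """Sign-extend the low byte of rs1 to 32 bits."""
    byte = rs1 & 0xFF
    return byte | (0xFFFFFF00 if byte & 0x80 else 0)


def sext_h(rs1: int) -> int:
    """Sign-extend the low half-word of rs1 to 32 bits."""
    half = rs1 & 0xFFFF
    return half | (0xFFFF0000 if half & 0x8000 else 0)


if __name__ == "__main__":
    print(hex(sext_b(0x80)), hex(sext_h(0x7FFF)))
```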

Signed-off-by: ganoam <gnoam@live.com>
2020-05-14 22:03:45 +02:00
ganoam
fac404a6f3 [bitmanip] Add ZBF instruction group
This commit implements the Bit Manipulation Extension ZBF instruction
group, which consists only of the one instruction bfp (bit-field
place).
This instruction places a field of length len < 16 from rs2 in rs1 at
offset off.

Architectural details:
        The implementation works exactly the same as proposed by Claire
        Wolf in her reference implementation.
        1. bfp_mask = slo(0, len)
        2. bfp_result =
                (rs1 & ~(bfp_mask << off)) | (rs2 & bfp_mask) << off
                        ^------ shifter-^
        The existing shifter structure is shared for the indicated
        operation.
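
        The two steps can be sketched in Python. This is a sketch under
        assumptions: `length` and `off` are passed as explicit
        arguments because how they are packed into a register is not
        described here, and `data` stands for the field taken from rs2.

```python
MASK32 = 0xFFFFFFFF


def slo(x: int, shamt: int) -> int:
    """Shift left, shifting in ones instead of zeros."""
    return ((x << shamt) | ((1 << shamt) - 1)) & MASK32


def bfp(rs1: int, data: int, length: int, off: int) -> int:
    """Place the low `length` bits of `data` into rs1 at bit offset `off`."""
    bfp_mask = slo(0, length)                         # step 1
    placed_mask = (bfp_mask << off) & MASK32          # shifter use
    placed_data = ((data & bfp_mask) << off) & MASK32
    return (rs1 & ~placed_mask & MASK32) | placed_data  # step 2


if __name__ == "__main__":
    # Replace bits [11:8] of an all-ones word with the value 0x5.
    print(hex(bfp(0xFFFFFFFF, 0x5, 4, 8)))
```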

Impact on area:

        * When synthesizing without the B-extension, the 2 stage
        design seems to move the timing bottleneck, leading to
        optimizations which result in an area increase by 1 kGE,
        when synthesized with tight timing constraints. For the
        3 stage configuration there is no change.
        When synthesized with relaxed timing constraints there is no
        significant change in either configuration.

        * With the B-extension enabled, the area increase for tight
        timing constraints is 1.1-1.2 kGE. For relaxed timing
        constraints that is ~0.4kGE

Impact on timing: No significant impact.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-14 21:34:49 +02:00
ganoam
0afd000a09 [bitmanip] Add ZBE Instruction Group
This commit implements the Bit Manipulation Extension ZBE instruction
group: bext (bit extract) and bdep (bit deposit).

Architectural details:
        * bext/bdep: A new butterfly and inverse butterfly network is
        implemented. The generation of its control bits depends on a
        parallel prefix bit count of the deposit / extract mask.

        * bitcounter: The path for bext / bdep instructions traverses
        the bit counter and the butterfly network, resulting in both a
        larger delay and area. To mitigate this, the bit counter has
        been changed from a serial bit counter to a radix-2 tree
        structure.

        * grev/gorc: The Zbp instructions general reverse and general
        or-combine have so far shared the shifter's reversal
        structure. It has proven beneficial to area and timing to reuse
        the new butterfly network instead.
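
For reference, a behavioural Python sketch of what bext and bdep
compute, plus a radix-2 tree bit count. This is a bit-serial model
written from my reading of the draft specification; it does not model
the butterfly network or its control-bit generation, and the function
names are illustrative.

```python
def bext(rs1: int, mask: int) -> int:
    """Gather the bits of rs1 selected by mask into the low bits."""
    result, j = 0, 0
    for i in range(32):
        if (mask >> i) & 1:
            if (rs1 >> i) & 1:
                result |= 1 << j
            j += 1
    return result


def bdep(rs1: int, mask: int) -> int:
    """Scatter the low bits of rs1 to the positions selected by mask."""
    result, j = 0, 0
    for i in range(32):
        if (mask >> i) & 1:
            if (rs1 >> j) & 1:
                result |= 1 << i
            j += 1
    return result


def popcount_tree(x: int) -> int:
    """Radix-2 tree bit count: five pairwise-add levels for 32 bits."""
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555)
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F)
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF)
    return (x & 0x0000FFFF) + (x >> 16)


if __name__ == "__main__":
    print(bin(bext(0b10110100, 0b11110000)))
```

Depositing an extracted value through the same mask returns the
masked input, which makes a convenient self-check.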

The butterfly network itself consumes ~3.5kGE and ~1.1kGE for synthesis
with tight and relaxed timing constraints respectively. Including the
optimizations of the bitcounter and grev/gorc, the overall change in
area consumption is +4.6kGE (+1.2kGE) and +3.3kGE (+1.1kGE) for
synthesis with tight (relaxed) timing constraints for 2- and 3-stage
configurations respectively. For tight timing constraints that is a
growth of around 10%, for relaxed around 5%.

The impact on the maximum frequency is negligible.

Signed-off-by: ganoam <gnoam@live.com>
2020-05-14 16:43:19 +02:00
ganoam
a68923a404 [bitmanip] Add ZBP Instruction Group
This commit implements the Bit Manipulation Extension ZBP instruction
group: grev[i] (generalized reverse), gorc[i] (generalized or-combine)
and [un]shfl[i] (generalized shuffle) and all of their
pseudo-instructions.

Architectural details:
        * grev / gorc: The shifter structure features only a right
        shift structure. In order to perform a left shift therefore the
        operand needs to be reversed, shifted and reversed again. The
        architecture of the back-reversal is implemented in stages
        which are activated using the general reverse / or-combine
        operand, or a signal marking left-shifts.

        * shfl / unshfl: Also known as zip / unzip or interlace /
        uninterlace operation. These instructions are implemented
        in their own structure using a permutation network of 6 stages.
        Four of these stages implement the shuffle permutations. The
        first and last stages are flip stages, which effectively reverse
        the order of the inner stages for unshuffle operations.
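
A behavioural Python sketch of these operations. The stage masks
follow my reading of the draft specification, the function names are
illustrative, and the model applies the stages sequentially rather
than mirroring the RTL structure.

```python
MASK32 = 0xFFFFFFFF
GREV_STAGES = [(1, 0x55555555), (2, 0x33333333), (4, 0x0F0F0F0F),
               (8, 0x00FF00FF), (16, 0x0000FFFF)]
SHFL_STAGES = [(8, 0x00FF0000, 0x0000FF00), (4, 0x0F000F00, 0x00F000F0),
               (2, 0x30303030, 0x0C0C0C0C), (1, 0x44444444, 0x22222222)]


def rev32(x: int) -> int:
    return int(f"{x & MASK32:032b}"[::-1], 2)


def sll_via_right_shifter(x: int, shamt: int) -> int:
    """Left shift built from reverse, right shift, reverse."""
    return rev32(rev32(x) >> (shamt & 31))


def _swap(x: int, width: int, low: int) -> int:
    return ((x & low) << width) | ((x >> width) & low)


def grev(rs1: int, rs2: int) -> int:
    x = rs1 & MASK32
    for width, low in GREV_STAGES:
        if rs2 & width:
            x = _swap(x, width, low)
    return x


def gorc(rs1: int, rs2: int) -> int:
    x = rs1 & MASK32
    for width, low in GREV_STAGES:
        if rs2 & width:
            x |= _swap(x, width, low)
    return x


def _shuffle(x: int, rs2: int, stages) -> int:
    for n, mask_l, mask_r in stages:
        if rs2 & n:
            keep = x & ~(mask_l | mask_r) & MASK32
            x = keep | ((x << n) & mask_l) | ((x >> n) & mask_r)
    return x


def shfl(rs1: int, rs2: int) -> int:
    return _shuffle(rs1 & MASK32, rs2 & 15, SHFL_STAGES)


def unshfl(rs1: int, rs2: int) -> int:
    # Same stages in reverse order.
    return _shuffle(rs1 & MASK32, rs2 & 15, SHFL_STAGES[::-1])


if __name__ == "__main__":
    print(hex(grev(0x12345678, 24)), hex(shfl(0x0000FFFF, 15)))
```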

Signed-off-by: ganoam <gnoam@live.com>
2020-04-29 11:10:44 +02:00
ganoam
133fef2c2f [bitmanip] Add ZBS Instruction Group
This commit implements the Bit Manipulation Extension ZBS instruction
group: sbset[i], sbclr[i], sbinv[i] and sbext[i]. These instructions
set, clear, invert or extract bit rs1[rs2] or rs1[imm] for reg-reg and
reg-imm instructions respectively.

Architectural details:
        * A multiplexer is added to the shifter structure in order to
          choose between 32'h1, used for the single-bit instructions as
          summarized below, and the regular operand_b input.

        * Dedicated bitwise-logic blocks are introduced for multicycle
          shifts and cmix instructions (fsr, fsl, ror, rol),
          single-bit instructions (sbset, sbclr, sbinv, sbext), and
          standard-ALU and zbb instructions (or, and, xor, orn, andn,
          xnor).

Instruction details: All of the zbs instructions rely on sharing the
        existing shifter structure. The instructions are carried out in
        one cycle.

        * sbset, sbclr, sbinv:
                shift_result = 32'h1 << rs2[4:0];
                singlebit_result = rs1 [|, &~, ^] shift_result;

        * sbext:
                shift_result = rs1 >> rs2[4:0];
                singlebit_result = {31'b0, shift_result[0]};
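
The same data flow as a Python sketch. `shared_shifter` is a
hypothetical stand-in for the existing shifter structure, and the
function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def shared_shifter(operand: int, shamt: int, left: bool) -> int:
    """Stand-in for the ALU shifter; only rs2[4:0] is used."""
    shamt &= 31
    if left:
        return (operand << shamt) & MASK32
    return (operand & MASK32) >> shamt


def sbset(rs1: int, rs2: int) -> int:
    return (rs1 | shared_shifter(1, rs2, left=True)) & MASK32


def sbclr(rs1: int, rs2: int) -> int:
    return rs1 & ~shared_shifter(1, rs2, left=True) & MASK32


def sbinv(rs1: int, rs2: int) -> int:
    return (rs1 ^ shared_shifter(1, rs2, left=True)) & MASK32


def sbext(rs1: int, rs2: int) -> int:
    return shared_shifter(rs1, rs2, left=False) & 1


if __name__ == "__main__":
    print(hex(sbset(0, 5)), hex(sbclr(0xFF, 0)), sbext(0x20, 5))
```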

Signed-off-by: ganoam <gnoam@live.com>
2020-04-24 08:32:30 +02:00
ganoam
06f50ddeac Bugfix: Generate Erroneous Illegal Insn
This commit fixes three possible cases for erroneous generation of
illegal instruction signals. Also, the bit-slices considered for
decoding ALU instructions are corrected to better reflect their
encoding specifications.

* Fix decoding of orc_b in illegal_insn generation.

* Insn[31] is no longer checked for generation of illegal instructions:
        This bit is part of the rs3 register address for ternary
        bitmanipulation instructions (zbt).

* Correct bit-slicing for ALU reg-immediate instructions according
        to specification: immediates are encoded in the range
        insn[26:20] in all cases. Where a shift-amount is encoded, bits
        [26:25] will have no effect, but will no longer generate
        illegal instructions.

Signed-off-by: ganoam <gnoam@live.com>
2020-04-17 13:39:38 +02:00
ganoam
4cb77b8121 [bitmanip] Add ZBT Instruction Group
This commit implements the Bit Manipulation Extension ZBT instruction
group: cmix, cmov, fsr[i] and fsl. These instructions depend on
three ALU operands and take 2 clock cycles to complete. Additionally,
the rotation shifts rol and ror are made multicycle instructions.

All multicycle instructions take exactly two cycles to complete.

Architectural additions:

        * Multicycle Stage Register in ID stage.
                multicycle_op_stage_reg

        * Decoder generates alu_multicycle signal, to stall pipeline

        * For all ternary instructions:
                1. cycle: connect alu operands a and b to rs1 and rs2
                          respectively
                2. cycle: connect operands a and b to rs3 and rs2
                          respectively

        * Reduce the physical size of the shifter from 64 bit to 33
                bit: 32-bit operand + 1 bit for arithmetic / one-shift

        * Make rotation shifts multicycle instructions.

Instruction Details:
        * cmov:
                1. store operand a (rs1) in stage reg.
                2. return stage reg output (rs1) if rs2 != 0, else rs3.

                if rs2 != 0 the output (rs1) is already known in the
                  first cycle. -> variable latency implementation is
                  possible.

        * cmix:
                1. store rs1 & rs2 in stage reg
                2. return stage_reg_q | (rs3 & ~rs2)

                reusing bwlogic from zbb

        * rol/ror: (here: ror)
              shift_amt       = rs2 & 31;
              shift_amt_compl = (32 - shift_amt) & 31
              1. store (rs1 >> shift_amt) in stage reg
              2. return (rs1 << shift_amt_compl) | stage_reg_q

        * fsl/fsr:
        For funnel shifts, the order of applying the shift
        amount or its complement is determined by bit [5] of
        shift_amt. Pseudocode for fsl:

              shift_amt       = rs2 & 63
              shift_amt_compl = (32 - shift_amt[4:0])

              1. if (shift_amt >= 33):
                    store (rs1 >> shift_amt_compl[4:0]) in stage reg
                 else if (shift_amt > 0 && shift_amt <= 31):
                    store (rs1 << shift_amt[4:0]) in stage reg
                 else if (shift_amt == 32 || shift_amt == 0):
                    store rs1 in stage reg

              2. if (shift_amt >= 33):
                    return stage_reg_q | (rs3 << shift_amt[4:0])
                 else if (shift_amt > 0 && shift_amt <= 31):
                    return stage_reg_q | (rs3 >> shift_amt_compl[4:0])
                 else if (shift_amt == 32):
                    return rs3
                 else if (shift_amt == 0):
                    return rs1

Signed-off-by: ganoam <gnoam@live.com>
2020-04-16 14:03:35 +02:00
ganoam
8a26111f40 [bitmanip] Add ZBB Instruction Group
This commit implements the Bit Manipulation Extension ZBB instruction
group: clz, ctz, pcnt, slo, sro, rol, ror, rev, rev8, orcb, pack
packu, packh, min, max, andn, orn, and xnor.

* Bit counting instructions clz, ctz and pcnt can be implemented to
        share much of the architecture:

        clz: Count Leading Zeros. Counts the number of 0 bits at the
                MSB end of the argument.
        ctz: Count Trailing Zeros. Counts the number of 0 bits at the
                LSB end of the argument.
        pcnt: Counts the number of set bits of the argument.

        The implementation uses:

        - 32 one bit adders, counting the set bits of a signal
                bitcnt_bits, starting from the LSB end.

        - For pcnt the argument is fed directly into bitcnt_bits.

        - For clz, the operand is reversed such that leading zeros are
                located at the LSB end of bitcnt_bits.

        - For ctz and clz: counter enable signal for 1-bit counter i
                is high, if the previous enable signal, and
                its corresponding bitcnt_bit was high.

* Instructions sll[i], srl[i], slo[i], sro[i], rol, ror[i], rev, rev8
        and orc.b are summarized as shifting instructions and related:

        The following instructions are slight variations of the
        existing base spec's sll, srl and sra instructions.

        - slo[i] and sro[i]: shift left/right ones: similar to
                shift-logical operations from base spec, but shifting
                in ones instead of zeros.

        - rol and ror[i]: rotate left/right: circular shift
                operations, shifting in values from the opposite end
                of the operand instead of zeros.

        Those instructions can be implemented, sharing the base spec's
        shifting structure. In order to support rotate operations, a
        64-bit shifting structure is needed.

        In the existing ALU, hardware is described only for right
        shifts. For left shifts the operand is initially reversed,
        right shifted and the result is reversed back. This gives rise
        to an additional resource sharing opportunity for some more
        zbb operations:

        - rev: bitwise reversal.

        - rev8: byte-order swap.

        - orc.b: byte-wise reverse and or-combine.

* Instructions min, max:
        For the B-extension's min/max instructions, we can share the
        existing comparison operations. The result is obtained by
        activating the comparison structure accordingly and
        multiplexing the operands using the comparison result.

* Logic-with-negate instructions andn, orn, xnor:
        For the B-extension's logic-with-negate instructions we can
        share the structures of the base spec's logic structures
        already present for 'xnor', 'or' and 'and' instructions as
        well as the conditionally negated b operand generated for
        subtraction operations.

* Instructions pack, packu, packh:
        For the pack, packh and packu instructions I don't see any
        opportunities for resource sharing. However, the architecture
        is quite simple.

        - pack: pack the lower halves of rs1 and rs2 into rd, with rs1
                in the lower half and rs2 in the upper half.

        - packu: pack the upper halves of rs1 and rs2 into rd, with
                rs1 in the lower half and rs2 in the upper half.

        - packh: pack the LSB bytes of rs1 and rs2 into the lower half
                of rd, with rs1 in the lower byte and rs2 in the upper
                byte.
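
A behavioural Python sketch of the bit counting, shift-ones and pack
instructions described above. Two assumptions are made here: for clz
and ctz the operand is inverted before it is fed to the counter, so
that zeros become the counted set bits, and packh zero-extends its
16-bit result. Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rev32(x: int) -> int:
    return int(f"{x & MASK32:032b}"[::-1], 2)


def count_bits(bitcnt_bits: int, chained_enable: bool) -> int:
    """Sum 32 one-bit inputs from the LSB end.

    With chained_enable, counter i only counts while every lower bit
    was set, so the count stops at the first 0.
    """
    count, enable = 0, True
    for i in range(32):
        bit = (bitcnt_bits >> i) & 1
        if chained_enable:
            enable = enable and bit == 1
            count += 1 if enable else 0
        else:
            count += bit
    return count


def pcnt(x: int) -> int:
    return count_bits(x & MASK32, chained_enable=False)


def ctz(x: int) -> int:
    return count_bits(~x & MASK32, chained_enable=True)


def clz(x: int) -> int:
    return count_bits(~rev32(x) & MASK32, chained_enable=True)


def slo(x: int, shamt: int) -> int:
    s = shamt & 31
    return ((x << s) | ((1 << s) - 1)) & MASK32


def sro(x: int, shamt: int) -> int:
    s = shamt & 31
    return ((x & MASK32) >> s) | (~(MASK32 >> s) & MASK32)


def pack(rs1: int, rs2: int) -> int:
    return (rs1 & 0xFFFF) | ((rs2 & 0xFFFF) << 16)


def packu(rs1: int, rs2: int) -> int:
    return ((rs1 >> 16) & 0xFFFF) | (rs2 & 0xFFFF0000)


def packh(rs1: int, rs2: int) -> int:
    return (rs1 & 0xFF) | ((rs2 & 0xFF) << 8)


if __name__ == "__main__":
    print(clz(1), ctz(8), pcnt(0xF0F0), hex(pack(0x11112222, 0x33334444)))
```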

Signed-off-by: ganoam <gnoam@live.com>
2020-03-27 17:13:26 +01:00
Philipp Wagner
74780e7e17 Implement Verilator-compatible tracer, and use it
The ibex_tracer module implements an execution tracer, observing the
execution flow and writing a human-readable execution trace. The trace
information is coming from the RVFI signals, as specified at
https://github.com/SymbioticEDA/riscv-formal/blob/master/docs/rvfi.md.

The existing implementation was tailored for use in ModelSim and other
commercial simulators, and used SystemVerilog features which are not
supported in Verilator or Icarus Verilog, such as classes, queues and
non-standard format specifiers (e.g. the `-` specifier for right-aligned
output). Being unable to see an execution trace when using Verilator
significantly reduced productivity and its usefulness.

This commit refactors the tracer to only use SystemVerilog constructs
which are supported in Verilator. While doing so, multiple improvements
were made for correctness and style.

Major changes:

- Improve compatibility with Verilator. Remove many non-synthesizable
  SystemVerilog constructs, such as classes and queues.
  Use casez instead of casex for better Verilator support (Verilator
  doesn't support X).
- Make the decoded output of the tracer match objdump from binutils
  exactly. Doing so is beneficial for two reasons: we can easily
  cross-check the decoded output from the tracer against the disassembly
  produced by objdump (and we did that), and users don't need to get
  used to another slightly different disassembly format.
- A plusarg "+ibex_tracer_file_base=ibex_my_trace" can be used to set a
  different basename for the trace log file.

Smaller cleanups:

- Remove decoding of reg-reg loads, which were leftover from a PULP
  extension.
- Make better use of the data available on the RVFI. Pass all of RVFI
  to the tracer, and use the provided data instead of manually
  recreating it, e.g. to get register data or the jump target.
- Rename all "instr" abbreviations to "insn". "insn" is what RVFI uses
  (and we cannot change that), so for consistency we now always use this
  abbreviation across the file.

All CSR names have been imported from binutils' riscv-opc.h file, available at
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=include/opcode/riscv-opc.h
using this small C program:

  #include <stdio.h>

  #define STR(s) #s

  int main(int argc, char **argv) {
    printf("unique case (csr_addr)\n");
  #define DECLARE_CSR(name, csraddr) \
    printf("  12'd%d: return \"%s\";\n", csraddr, STR(name));
  #include "riscv-opc.h"
    printf("  default: return $sformatf(\"0x%%x\", csr_addr);\n");
    printf("endcase\n");
    return 0;
  }

The RISC-V compliance test suite for the RV32 I, M, and C extensions has
been executed and traced. The disassembly of all traces has been
compared against traces produced by objdump to ensure identical output.

This PR is based on work by Rahul Behl <raulbehl@gmail.com> in #280.
Thank you Rahul for providing a great starting point for this work!
2019-10-02 18:28:26 +01:00
Rahul Behl
60de915d6b Adding Compressed Instruction support in tracer
Added compressed instruction decoder in the tracer to correctly
trace compressed instructions with their mnemonics. Fixes #197
2019-09-06 15:43:53 +01:00
Philipp Wagner
7eee24c094 Mention CREDITS.md in license header 2019-08-27 18:10:02 +01:00
Philipp Wagner
428d057c4a Rename ibex_[tracer_]define to ibex_[tracer_]pkg
This file doesn't contain defines any more, but a normal SV package.

The diff is best viewed without whitespace changes, as the reindents
cause a lot of diff noise.

Fixes lowrisc/ibex#173
2019-07-19 11:34:40 +01:00
Renamed from rtl/ibex_tracer_defines.sv