[bitmanip] Optimizations and Parametrization

This commit contains some final optimizations regarding the bit
manipulation extension as well as the parametrization into a balanced
version and a full performance version.

Balanced Version:
        * Supports ZBB, ZBS, ZBF and ZBT extensions
        * Dual cycle instructions:
          ror[i], rol, cmov, cmix, fsl, fsr[i]
        * Everything else completes in a single cycle.

Full Version:
        * Supports all 32-bit sub-extensions.
        * Dual cycle instructions:
          ror[i], rol, cmov, cmix, fsl, fsr[i], crc32[c], bext, bdep
        * Everything else completes in a single cycle.

Notable Changes:
        * bext/bdep are now multi-cycle: Sharing additional register
          with multiplier module
        * grev/gorc instructions are implemented in separate structures
          rather than sharing the shifter or butterfly network.
        * Speed up decision on using rs1 or rs3 for alu_operand_a by
          introducing single-bit register, to identify ternary
          instructions in their first cycle.
        * Introduce enumerated parameter to choose bit manipulation
          implementation

Signed-off-by: ganoam <gnoam@live.com>
ganoam 2020-06-01 14:55:49 +02:00 committed by Pirmin Vogel
parent 71b3474781
commit 1aa4d5a32b
24 changed files with 1137 additions and 880 deletions
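
As an illustration of the two variants described in the commit message, the following Python sketch records which mnemonics take two cycles in each configuration. It is only a reading aid: the `RV32B` Python enum and the helper `dual_cycle_instructions` are hypothetical names that do not exist in the Ibex sources, and `crc32`/`crc32c` stand for the `crc32[c]` instruction families as written in the message.

```python
from enum import Enum


class RV32B(Enum):
    """Hypothetical Python mirror of the ibex_pkg::rv32b_e values named in this commit."""
    NONE = "ibex_pkg::RV32BNone"
    BALANCED = "ibex_pkg::RV32BBalanced"
    FULL = "ibex_pkg::RV32BFull"


# Dual-cycle instructions as listed in the commit message.
# "rori"/"fsri" expand the ror[i]/fsr[i] shorthand.
_DUAL_BALANCED = frozenset({"ror", "rori", "rol", "cmov", "cmix", "fsl", "fsr", "fsri"})
_DUAL_FULL_ONLY = frozenset({"crc32", "crc32c", "bext", "bdep"})


def dual_cycle_instructions(version):
    """Return the set of mnemonics the commit message lists as dual cycle."""
    if version is RV32B.NONE:
        return frozenset()          # no B-extension, nothing to list
    if version is RV32B.BALANCED:
        return _DUAL_BALANCED
    return _DUAL_BALANCED | _DUAL_FULL_ONLY
```

According to the message, every supported instruction that is not in the returned set completes in a single cycle.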

View file

@ -20,7 +20,7 @@ The options include different choices for the architecture of the multiplier uni
The table below indicates performance, area and verification status for a few selected configurations. The table below indicates performance, area and verification status for a few selected configurations.
These are configurations on which lowRISC is focusing for performance evaluation and design verification (see [supported configs](ibex_configs.yaml)). These are configurations on which lowRISC is focusing for performance evaluation and design verification (see [supported configs](ibex_configs.yaml)).
| Config | "small" | "maxperf" | "maxperf-pmp-bm" | | Config | "small" | "maxperf" | "maxperf-pmp-bmfull" |
| ------ | ------- | --------- | ---------------- | | ------ | ------- | --------- | ---------------- |
| Features | RV32IMC, 3 cycle mult | RV32IMC, 1 cycle mult, Branch target ALU, Writeback stage | RV32IMCB, 1 cycle mult, Branch target ALU, Writeback stage, 16 PMP regions | | Features | RV32IMC, 3 cycle mult | RV32IMC, 1 cycle mult, Branch target ALU, Writeback stage | RV32IMCB, 1 cycle mult, Branch target ALU, Writeback stage, 16 PMP regions |
| Performance (Coremark/MHz) | 2.44 | 3.09 | 3.09 | | Performance (Coremark/MHz) | 2.44 | 3.09 | 3.09 |

View file

@ -159,4 +159,4 @@ jobs:
ibex_configs: ibex_configs:
- small - small
- experimental-maxperf-pmp - experimental-maxperf-pmp
- experimental-maxperf-pmp-bm - experimental-maxperf-pmp-bmfull

View file

@ -64,10 +64,46 @@ Other blocks use the ALU for the following tasks:
* It computes memory addresses for loads and stores with a Reg + Imm calculation * It computes memory addresses for loads and stores with a Reg + Imm calculation
* The LSU uses it to increment addresses when performing two accesses to handle an unaligned access * The LSU uses it to increment addresses when performing two accesses to handle an unaligned access
Support for the RISC-V Bitmanipulation Extension (Document Version 0.92, November 8, 2019) is enabled via the parameter ``RV32B``. Bit Manipulation Extension
This feature is *EXPERIMENTAL* and the details of its impact are not yet documented here. Support for the `RISC-V Bit Manipulation Extension (Document Version 0.92, November 8, 2019) <https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf>`_ is enabled via the enumerated parameter ``RV32B`` defined in :file:`rtl/ibex_pkg.sv`.
Currently the Zbb, Zbs, Zbp, Zbe, Zbf, Zbc, Zbr and Zbt sub-extensions are implemented. This feature is *Experimental*.
The rotate instructions `ror` and `rol` (Zbb), ternary instructions `cmov`, `cmix`, `fsl` and `fsr` as well as cyclic redundancy checks `crc32[c]` (Zbr) are completed in 2 cycles. All remaining instructions complete in one cycle.
There are two versions of the bit manipulation extension available:
The balanced implementation comprises a set of sub-extensions aiming for good benefits at a reasonable area overhead.
The full implementation comprises all 32-bit instructions defined in the extension.
The following table lists the implemented instructions in each version.
Multi-cycle instructions are completed in 2 cycles.
All remaining instructions complete in a single cycle.
+---------------------------+---------------+--------------------------+
| Z-Extension | Version | Multi-Cycle Instructions |
+===========================+===============+==========================+
| Zbb (Base) | Balanced/Full | rol, ror[i] |
+---------------------------+---------------+--------------------------+
| Zbs (Single-bit) | Balanced/Full | None |
+---------------------------+---------------+--------------------------+
| Zbp (Permutation) | Full | None |
+---------------------------+---------------+--------------------------+
| Zbe (Bit extract/deposit) | Full | All |
+---------------------------+---------------+--------------------------+
| Zbf (Bit-field place) | Balanced/Full | All |
+---------------------------+---------------+--------------------------+
| Zbc (Carry-less multiply) | Full | None |
+---------------------------+---------------+--------------------------+
| Zbr (CRC) | Full | All |
+---------------------------+---------------+--------------------------+
| Zbt (Ternary) | Balanced/Full | All |
+---------------------------+---------------+--------------------------+
| Zb_tmp (Temporary)* | Balanced/Full | None |
+---------------------------+---------------+--------------------------+
* The sign-extend instructions `sext.b/sext.h` are defined but not yet classified in version 0.92 of the extension proposal.
Temporarily, they are assigned a separate Z-extension.
The implementation of the B-extension comes with an area overhead of 1.8 to 3.0 kGE for the balanced version and 6.0 to 8.7 kGE for the full version.
That corresponds to an approximate percentage increase in area of 9 to 14 % and 25 to 30 % for the balanced and full versions respectively.
The ranges correspond to synthesis results generated using relaxed and maximum frequency targets respectively.
The designs have been synthesized using Synopsys Design Compiler targeting TSMC 65 nm technology.
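
For readers unfamiliar with the bit extract/deposit instructions listed in the table, the following is a minimal behavioural sketch in Python. It follows the usual definition of `bext` (gather the bits selected by a mask into the low end) and `bdep` (scatter the low bits to the positions selected by a mask). It is a bit-serial reference model only and says nothing about how the ALU computes the result; the function names and the `xlen` argument are illustrative.

```python
def bext(rs1, rs2, xlen=32):
    """Gather the bits of rs1 selected by mask rs2 into the low bits of the result."""
    result = 0
    j = 0  # next free position in the result
    for i in range(xlen):
        if (rs2 >> i) & 1:
            if (rs1 >> i) & 1:
                result |= 1 << j
            j += 1
    return result


def bdep(rs1, rs2, xlen=32):
    """Scatter the low bits of rs1 to the bit positions selected by mask rs2."""
    result = 0
    j = 0  # next source bit to consume from rs1
    for i in range(xlen):
        if (rs2 >> i) & 1:
            if (rs1 >> j) & 1:
                result |= 1 << i
            j += 1
    return result
```

With the 8-bit example from the ALU comment removed in this commit (operand `abcd_efgh`, deposit mask `1010_1101`, result `d0e0fg0h`), `bdep(0b0001_0101, 0b1010_1101)` gives `0b1000_1001`, and `bext` with the same mask maps that value back to `0b0001_0101`.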
.. _mult-div: .. _mult-div:

View file

@ -19,7 +19,7 @@ Instantiation Template
.MHPMCounterWidth ( 40 ), .MHPMCounterWidth ( 40 ),
.RV32E ( 0 ), .RV32E ( 0 ),
.RV32M ( 1 ), .RV32M ( 1 ),
.RV32B ( 0 ), .RV32B ( ibex_pkg::RV32BNone ),
.MultiplierImplementation ( "fast" ), .MultiplierImplementation ( "fast" ),
.ICache ( 0 ), .ICache ( 0 ),
.ICacheECC ( 0 ), .ICacheECC ( 0 ),
@ -74,55 +74,55 @@ Instantiation Template
Parameters Parameters
---------- ----------
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| Name | Type/Range | Default | Description | | Name | Type/Range | Default | Description |
+==============================+=============+============+=================================================================+ +==============================+===================+============+=================================================================+
| ``PMPEnable`` | bit | 0 | Enable PMP support | | ``PMPEnable`` | bit | 0 | Enable PMP support |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``PMPGranularity`` | int (0..31) | 0 | Minimum granularity of PMP address matching | | ``PMPGranularity`` | int (0..31) | 0 | Minimum granularity of PMP address matching |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``PMPNumRegions`` | int (1..16) | 4 | Number implemented PMP regions (ignored if PMPEnable == 0) | | ``PMPNumRegions`` | int (1..16) | 4 | Number implemented PMP regions (ignored if PMPEnable == 0) |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``MHPMCounterNum`` | int (0..10) | 0 | Number of performance monitor event counters | | ``MHPMCounterNum`` | int (0..10) | 0 | Number of performance monitor event counters |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``MHPMCounterWidth`` | int (64..1) | 40 | Bit width of performance monitor event counters | | ``MHPMCounterWidth`` | int (64..1) | 40 | Bit width of performance monitor event counters |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``RV32E`` | bit | 0 | RV32E mode enable (16 integer registers only) | | ``RV32E`` | bit | 0 | RV32E mode enable (16 integer registers only) |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``RV32M`` | bit | 1 | M(ultiply) extension enable | | ``RV32M`` | bit | 1 | M(ultiply) extension enable |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``RV32B`` | bit | 0 | *EXPERIMENTAL* - B(itmanipulation) extension enable: | | ``RV32B`` | ibex_pkg::rv32b_e | RV32BNone | *EXPERIMENTAL* - B(itmanipulation) extension select: |
| | | | Currently supported Z-extensions: Zbb (base), Zbs (single-bit) | | | | | "RV32BNone": No B-extension |
| | | | Zbp (bit permutation), Zbe (bit extract/deposit), | | | | | "RV32BBalanced": Sub-extensions Zbb, Zbs, Zbf and |
| | | | Zbf (bit-field place) Zbc (carry-less multiplication) | | | | | Zbt |
| | | | Zbr (cyclic redundancy check) and Zbt (ternary) | | | | | "RV32BFull": All sub-extensions |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``BranchTargetALU`` | bit | 0 | *EXPERIMENTAL* - Enables branch target ALU removing a stall | | ``BranchTargetALU`` | bit | 0 | *EXPERIMENTAL* - Enables branch target ALU removing a stall |
| | | | cycle from taken branches | | | | | cycle from taken branches |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``WritebackStage`` | bit | 0 | *EXPERIMENTAL* - Enables third pipeline stage (writeback) | | ``WritebackStage`` | bit | 0 | *EXPERIMENTAL* - Enables third pipeline stage (writeback) |
| | | | improving performance of loads and stores | | | | | improving performance of loads and stores |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``MultiplierImplementation`` | string | "fast" | Multiplicator type: | | ``MultiplierImplementation`` | string | "fast" | Multiplicator type: |
| | | | "slow": multi-cycle slow, | | | | | "slow": multi-cycle slow, |
| | | | "fast": multi-cycle fast, | | | | | "fast": multi-cycle fast, |
| | | | "single-cycle": single-cycle | | | | | "single-cycle": single-cycle |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``ICache`` | bit | 0 | *EXPERIMENTAL* Enable instruction cache instead of prefetch | | ``ICache`` | bit | 0 | *EXPERIMENTAL* Enable instruction cache instead of prefetch |
| | | | buffer | | | | | buffer |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``ICacheECC`` | bit | 0 | *EXPERIMENTAL* Enable SECDED ECC protection in ICache (if | | ``ICacheECC`` | bit | 0 | *EXPERIMENTAL* Enable SECDED ECC protection in ICache (if |
| | | | ICache == 1) | | | | | ICache == 1) |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``SecureIbex`` | bit | 0 | *EXPERIMENTAL* Enable various additional features targeting | | ``SecureIbex`` | bit | 0 | *EXPERIMENTAL* Enable various additional features targeting |
| | | | secure code execution. | | | | | secure code execution. |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``DbgTriggerEn`` | bit | 0 | Enable debug trigger support (one trigger only) | | ``DbgTriggerEn`` | bit | 0 | Enable debug trigger support (one trigger only) |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``DmHaltAddr`` | int | 0x1A110800 | Address to jump to when entering Debug Mode | | ``DmHaltAddr`` | int | 0x1A110800 | Address to jump to when entering Debug Mode |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
| ``DmExceptionAddr`` | int | 0x1A110808 | Address to jump to when an exception occurs while in Debug Mode | | ``DmExceptionAddr`` | int | 0x1A110808 | Address to jump to when an exception occurs while in Debug Mode |
+------------------------------+-------------+------------+-----------------------------------------------------------------+ +------------------------------+-------------------+------------+-----------------------------------------------------------------+
Any parameter marked *EXPERIMENTAL* when enabled is not verified to the same standard as the rest of the Ibex core. Any parameter marked *EXPERIMENTAL* when enabled is not verified to the same standard as the rest of the Ibex core.

View file

@ -46,6 +46,10 @@ In addition, the following instruction set extensions are available.
- 2.0 - 2.0
- optional - optional
* - **B**: *EXPERIMENTAL* Standard Extension for Bit Manipulation Instructions
- 0.92
- optional
* - **Zicsr**: Control and Status Register Instructions * - **Zicsr**: Control and Status Register Instructions
- 2.0 - 2.0
- always enabled - always enabled

View file

@ -37,10 +37,10 @@ parameters:
description: "Enable the E ISA extension (reduced register set) [0/1]" description: "Enable the E ISA extension (reduced register set) [0/1]"
RV32B: RV32B:
datatype: int datatype: str
paramtype: vlogparam default: ibex_pkg::RV32BNone
default: 0 paramtype: vlogdefine
description: "Enable the B ISA extension (bit manipulation EXPERIMENTAL) [0/1]" description: "Bitmanip implementation parameter enum. See ibex_pkg.sv (EXPERIMENTAL)"
SRAM_INIT_FILE: SRAM_INIT_FILE:
datatype: str datatype: str

View file

@ -65,7 +65,7 @@ PMP_REGIONS := 16
# PMP Granularity # PMP Granularity
PMP_GRANULARITY := 0 PMP_GRANULARITY := 0
IBEX_CONFIG := experimental-maxperf-pmp-bm IBEX_CONFIG := experimental-maxperf-pmp-bmfull
# TODO(udinator) - might need options for SAIL/Whisper/Spike # TODO(udinator) - might need options for SAIL/Whisper/Spike
ifeq (${ISS},ovpsim) ifeq (${ISS},ovpsim)

View file

@ -643,12 +643,22 @@
+pmp_allow_addr_overlap=1 +pmp_allow_addr_overlap=1
rtl_test: core_ibex_base_test rtl_test: core_ibex_base_test
- test: riscv_bitmanip_test - test: riscv_bitmanip_full_test
desc: > desc: >
Random instruction test with supported B extension instructions Random instruction test with supported B extension instructions in full configuration
iterations: 10 iterations: 10
gen_test: riscv_rand_instr_test gen_test: riscv_rand_instr_test
gen_opts: > gen_opts: >
+enable_b_extension=1 +enable_b_extension=1
+enable_bitmanip_groups=zbb,zbt,zbs,zbp,zbf,zbe,zbc,zbr +enable_bitmanip_groups=zbb,zb_tmp,zbt,zbs,zbp,zbf,zbe,zbc,zbr
rtl_test: core_ibex_base_test
- test: riscv_bitmanip_balanced_test
desc: >
Random instruction test with supported B extension instructions in balanced configuration
iterations: 10
gen_test: riscv_rand_instr_test
gen_opts: >
+enable_b_extension=1
+enable_bitmanip_groups=zbb,zb_tmp,zbt,zbs,zbf
rtl_test: core_ibex_base_test rtl_test: core_ibex_base_test
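
The two test entries above differ only in the list passed to `+enable_bitmanip_groups`. As a sketch of that relationship, the hypothetical Python helper below (not part of the Ibex DV flow) rebuilds the `gen_opts` plusargs from a per-version group list copied from the new side of the diff.

```python
# Group lists copied from the two test entries in this diff.
BITMANIP_GROUPS = {
    "balanced": ["zbb", "zb_tmp", "zbt", "zbs", "zbf"],
    "full": ["zbb", "zb_tmp", "zbt", "zbs", "zbp", "zbf", "zbe", "zbc", "zbr"],
}


def gen_opts(version):
    """Return the generator plusargs for the given bitmanip version (hypothetical helper)."""
    groups = ",".join(BITMANIP_GROUPS[version])
    return ["+enable_b_extension=1", f"+enable_bitmanip_groups={groups}"]
```

Keeping the lists in one place like this would make it harder for the balanced and full tests to drift apart, but that is a design suggestion rather than something this commit does.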

View file

@ -32,12 +32,16 @@ module core_ibex_tb_top;
`define IBEX_MULTIPLIER_IMPLEMENTATION fast `define IBEX_MULTIPLIER_IMPLEMENTATION fast
`endif `endif
`ifndef IBEX_CFG_RV32B
`define IBEX_CFG_RV32B ibex_pkg::RV32BNone
`endif
parameter bit PMPEnable = 1'b0; parameter bit PMPEnable = 1'b0;
parameter int unsigned PMPGranularity = 0; parameter int unsigned PMPGranularity = 0;
parameter int unsigned PMPNumRegions = 4; parameter int unsigned PMPNumRegions = 4;
parameter bit RV32E = 1'b0; parameter bit RV32E = 1'b0;
parameter bit RV32M = 1'b1; parameter bit RV32M = 1'b1;
parameter bit RV32B = 1'b0; parameter ibex_pkg::rv32b_e RV32B = `IBEX_CFG_RV32B;
parameter bit BranchTargetALU = 1'b0; parameter bit BranchTargetALU = 1'b0;
parameter bit WritebackStage = 1'b0; parameter bit WritebackStage = 1'b0;

View file

@ -36,10 +36,10 @@ parameters:
description: "Enable the E ISA extension (reduced register set) [0/1]" description: "Enable the E ISA extension (reduced register set) [0/1]"
RV32B: RV32B:
datatype: int datatype: str
paramtype: vlogparam default: ibex_pkg::RV32BNone
default: 0 paramtype: vlogdefine
description: "Enable the B ISA extension (bit manipulation EXPERIMENTAL) [0/1]" description: "Bitmanip implementation parameter enum. See ibex_pkg.sv (EXPERIMENTAL)"
SRAM_INIT_FILE: SRAM_INIT_FILE:
datatype: str datatype: str

View file

@ -2,6 +2,10 @@
// Licensed under the Apache License, Version 2.0, see LICENSE for details. // Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
`ifndef RV32B
`define RV32B ibex_pkg::RV32BNone
`endif
/** /**
* Ibex simple system * Ibex simple system
* *
@ -24,7 +28,7 @@ module ibex_simple_system (
parameter int unsigned PMPNumRegions = 4; parameter int unsigned PMPNumRegions = 4;
parameter bit RV32E = 1'b0; parameter bit RV32E = 1'b0;
parameter bit RV32M = 1'b1; parameter bit RV32M = 1'b1;
parameter bit RV32B = 1'b0; parameter ibex_pkg::rv32b_e RV32B = `RV32B;
parameter bit BranchTargetALU = 1'b0; parameter bit BranchTargetALU = 1'b0;
parameter bit WritebackStage = 1'b0; parameter bit WritebackStage = 1'b0;
parameter MultiplierImplementation = "fast"; parameter MultiplierImplementation = "fast";

View file

@ -10,7 +10,7 @@
small: small:
RV32E : 0 RV32E : 0
RV32M : 1 RV32M : 1
RV32B : 0 RV32B : "ibex_pkg::RV32BNone"
BranchTargetALU : 0 BranchTargetALU : 0
WritebackStage : 0 WritebackStage : 0
MultiplierImplementation : "fast" MultiplierImplementation : "fast"
@ -28,7 +28,7 @@ small:
experimental-maxperf: experimental-maxperf:
RV32E : 0 RV32E : 0
RV32M : 1 RV32M : 1
RV32B : 0 RV32B : "ibex_pkg::RV32BNone"
BranchTargetALU : 1 BranchTargetALU : 1
WritebackStage : 1 WritebackStage : 1
MultiplierImplementation : "single-cycle" MultiplierImplementation : "single-cycle"
@ -40,7 +40,7 @@ experimental-maxperf:
experimental-maxperf-pmp: experimental-maxperf-pmp:
RV32E : 0 RV32E : 0
RV32M : 1 RV32M : 1
RV32B : 0 RV32B : "ibex_pkg::RV32BNone"
BranchTargetALU : 1 BranchTargetALU : 1
WritebackStage : 1 WritebackStage : 1
MultiplierImplementation : "single-cycle" MultiplierImplementation : "single-cycle"
@ -48,14 +48,27 @@ experimental-maxperf-pmp:
PMPGranularity : 0 PMPGranularity : 0
PMPNumRegions : 16 PMPNumRegions : 16
# experimental-maxperf-pmp config above with bitmanip extension # experimental-maxperf-pmp config above with balanced bitmanip extension
experimental-maxperf-pmp-bm: experimental-maxperf-pmp-bmbalanced:
RV32E : 0 RV32E : 0
RV32M : 1 RV32M : 1
RV32B : 1 RV32B : "ibex_pkg::RV32BBalanced"
BranchTargetALU : 1 BranchTargetALU : 1
WritebackStage : 1 WritebackStage : 1
MultiplierImplementation : "single-cycle" MultiplierImplementation : "single-cycle"
PMPEnable : 1 PMPEnable : 1
PMPGranularity : 0 PMPGranularity : 0
PMPNumRegions : 16 PMPNumRegions : 16
# experimental-maxperf-pmp config above with full bitmanip extension
experimental-maxperf-pmp-bmfull:
RV32E : 0
RV32M : 1
RV32B : "ibex_pkg::RV32BFull"
BranchTargetALU : 1
WritebackStage : 1
MultiplierImplementation : "single-cycle"
PMPEnable : 1
PMPGranularity : 0
PMPNumRegions : 16

View file

@ -72,9 +72,10 @@ parameters:
paramtype: vlogparam paramtype: vlogparam
RV32B: RV32B:
datatype: int datatype: str
default: 0 default: ibex_pkg::RV32BNone
paramtype: vlogparam paramtype: vlogdefine
description: "Bitmanip implementation parameter enum. See ibex_pkg.sv (EXPERIMENTAL)"
MultiplierImplementation: MultiplierImplementation:
datatype: str datatype: str

View file

@ -43,9 +43,10 @@ parameters:
paramtype: vlogparam paramtype: vlogparam
RV32B: RV32B:
datatype: int datatype: str
default: 0 default: ibex_pkg::RV32BNone
paramtype: vlogparam paramtype: vlogdefine
description: "Bitmanip implementation parameter enum. See ibex_pkg.sv (EXPERIMENTAL)"
MultiplierImplementation: MultiplierImplementation:
datatype: str datatype: str

View file

@ -37,12 +37,18 @@ lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'shift_amt_compl'[5]*"
// cleaner to write all bits even if not all are used // cleaner to write all bits even if not all are used
lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'shift_result_ext'[32]*" lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'shift_result_ext'[32]*"
// Signal is not used for RV32B == 0: imd_val_q_i // Signal is not used for RV32B == RV32BNone: imd_val_q_i
// //
// No ALU multicycle instructions exist to use the intermediate value register, // No ALU multicycle instructions exist to use the intermediate value register,
// if bitmanipulation extension is not enabled. // if bitmanipulation extension is not enabled.
lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'imd_val_q_i'" lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'imd_val_q_i'"
// Signal is not used for RV32B == RV32BNone: butterfly_result, invbutterfly_result
//
// Need to be declared; referenced in unused if-generate block
lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'butterfly_result'"
lint_off -rule UNUSED -file "*/rtl/ibex_alu.sv" -match "*'invbutterfly_result'"
// Bits of signal are not used: fetch_addr_n[0] // Bits of signal are not used: fetch_addr_n[0]
// cleaner to write all bits even if not all are used // cleaner to write all bits even if not all are used
lint_off -rule UNUSED -file "*/rtl/ibex_if_stage.sv" -match "*'fetch_addr_n'[0]*" lint_off -rule UNUSED -file "*/rtl/ibex_if_stage.sv" -match "*'fetch_addr_n'[0]*"

View file

@ -7,7 +7,7 @@
* Arithmetic logic unit * Arithmetic logic unit
*/ */
module ibex_alu #( module ibex_alu #(
parameter bit RV32B = 1'b0 parameter ibex_pkg::rv32b_e RV32B = ibex_pkg::RV32BNone
) ( ) (
input ibex_pkg::alu_op_e operator_i, input ibex_pkg::alu_op_e operator_i,
input logic [31:0] operand_a_i, input logic [31:0] operand_a_i,
@ -20,9 +20,9 @@ module ibex_alu #(
input logic multdiv_sel_i, input logic multdiv_sel_i,
input logic [31:0] imd_val_q_i, input logic [31:0] imd_val_q_i[2],
output logic [31:0] imd_val_d_o, output logic [31:0] imd_val_d_o[2],
output logic imd_val_we_o, output logic [1:0] imd_val_we_o,
output logic [31:0] adder_result_o, output logic [31:0] adder_result_o,
output logic [33:0] adder_result_ext_o, output logic [33:0] adder_result_ext_o,
@ -241,16 +241,16 @@ module ibex_alu #(
logic [31:0] bfp_result; logic [31:0] bfp_result;
// bfp: shares the shifter structure to compute bfp_mask << bfp_off // bfp: shares the shifter structure to compute bfp_mask << bfp_off
assign bfp_op = RV32B ? (operator_i == ALU_BFP) : 1'b0; assign bfp_op = (RV32B != RV32BNone) ? (operator_i == ALU_BFP) : 1'b0;
assign bfp_len = {~(|operand_b_i[27:24]), operand_b_i[27:24]}; // len = 0 encodes for len = 16 assign bfp_len = {~(|operand_b_i[27:24]), operand_b_i[27:24]}; // len = 0 encodes for len = 16
assign bfp_off = operand_b_i[20:16]; assign bfp_off = operand_b_i[20:16];
assign bfp_mask = RV32B ? ~(32'hffff_ffff << bfp_len) : '0; assign bfp_mask = (RV32B != RV32BNone) ? ~(32'hffff_ffff << bfp_len) : '0;
for (genvar i=0; i<32; i++) begin : gen_rev_bfp_mask for (genvar i=0; i<32; i++) begin : gen_rev_bfp_mask
assign bfp_mask_rev[i] = bfp_mask[31-i]; assign bfp_mask_rev[i] = bfp_mask[31-i];
end end
assign bfp_result = assign bfp_result =(RV32B != RV32BNone) ?
RV32B ? (~shift_result & operand_a_i) | ((operand_b_i & bfp_mask) << bfp_off) : '0; (~shift_result & operand_a_i) | ((operand_b_i & bfp_mask) << bfp_off) : '0;
// bit shift_amt[5]: word swap bit: only considered for FSL/FSR. // bit shift_amt[5]: word swap bit: only considered for FSL/FSR.
// if set, reverse operations in first and second cycle. // if set, reverse operations in first and second cycle.
@ -267,9 +267,8 @@ module ibex_alu #(
end end
end end
// single-bit mode: shift // single-bit mode: shift
assign shift_sbmode = RV32B ? assign shift_sbmode = (RV32B != RV32BNone) ?
(operator_i == ALU_SBSET) | (operator_i == ALU_SBCLR) | (operator_i == ALU_SBINV) : 1'b0; (operator_i == ALU_SBSET) | (operator_i == ALU_SBCLR) | (operator_i == ALU_SBINV) : 1'b0;
// left shift if this is: // left shift if this is:
@ -284,13 +283,13 @@ module ibex_alu #(
unique case (operator_i) unique case (operator_i)
ALU_SLL: shift_left = 1'b1; ALU_SLL: shift_left = 1'b1;
ALU_SLO, ALU_SLO,
ALU_BFP: shift_left = RV32B ? 1'b1 : 1'b0; ALU_BFP: shift_left = (RV32B != RV32BNone) ? 1'b1 : 1'b0;
ALU_ROL: shift_left = RV32B ? instr_first_cycle_i : 0; ALU_ROL: shift_left = (RV32B != RV32BNone) ? instr_first_cycle_i : 0;
ALU_ROR: shift_left = RV32B ? ~instr_first_cycle_i : 0; ALU_ROR: shift_left = (RV32B != RV32BNone) ? ~instr_first_cycle_i : 0;
ALU_FSL: shift_left = ALU_FSL: shift_left = (RV32B != RV32BNone) ?
RV32B ? (shift_amt[5] ? ~instr_first_cycle_i : instr_first_cycle_i) : 1'b0; (shift_amt[5] ? ~instr_first_cycle_i : instr_first_cycle_i) : 1'b0;
ALU_FSR: shift_left = ALU_FSR: shift_left = (RV32B != RV32BNone) ?
RV32B ? (shift_amt[5] ? instr_first_cycle_i : ~instr_first_cycle_i) : 1'b0; (shift_amt[5] ? instr_first_cycle_i : ~instr_first_cycle_i) : 1'b0;
default: shift_left = 1'b0; default: shift_left = 1'b0;
endcase endcase
if (shift_sbmode) begin if (shift_sbmode) begin
@ -299,25 +298,25 @@ module ibex_alu #(
end end
assign shift_arith = (operator_i == ALU_SRA); assign shift_arith = (operator_i == ALU_SRA);
assign shift_ones = RV32B ? (operator_i == ALU_SLO) | (operator_i == ALU_SRO) : 1'b0; assign shift_ones =
assign shift_funnel = RV32B ? (operator_i == ALU_FSL) | (operator_i == ALU_FSR) : 1'b0; (RV32B != RV32BNone) ? (operator_i == ALU_SLO) | (operator_i == ALU_SRO) : 1'b0;
assign shift_funnel =
(RV32B != RV32BNone) ? (operator_i == ALU_FSL) | (operator_i == ALU_FSR) : 1'b0;
// shifter structure. // shifter structure.
always_comb begin always_comb begin
// select shifter input // select shifter input
// for bfp, sbmode and shift_left the corresponding bit-reversed input is chosen. // for bfp, sbmode and shift_left the corresponding bit-reversed input is chosen.
if (shift_sbmode) begin if (RV32B == RV32BNone) begin
shift_result = 32'h8000_0000; // rev(32'h1) shift_result = shift_left ? operand_a_rev : operand_a_i;
end else begin end else begin
unique case (1'b1) unique case (1'b1)
bfp_op: shift_result = bfp_mask_rev; bfp_op: shift_result = bfp_mask_rev;
shift_left: shift_result = operand_a_rev; shift_sbmode: shift_result = 32'h8000_0000;
default: shift_result = operand_a_i; default: shift_result = shift_left ? operand_a_rev : operand_a_i;
endcase endcase
end end
shift_result_ext = shift_result_ext =
$signed({shift_ones | (shift_arith & shift_result[31]), shift_result}) >>> shift_amt[4:0]; $signed({shift_ones | (shift_arith & shift_result[31]), shift_result}) >>> shift_amt[4:0];
@ -350,8 +349,8 @@ module ibex_alu #(
// Logic-with-negate OPs (RV32B Ops) // Logic-with-negate OPs (RV32B Ops)
ALU_XNOR, ALU_XNOR,
ALU_ORN, ALU_ORN,
ALU_ANDN: bwlogic_op_b_negate = RV32B ? 1'b1 : 1'b0; ALU_ANDN: bwlogic_op_b_negate = (RV32B != RV32BNone) ? 1'b1 : 1'b0;
ALU_CMIX: bwlogic_op_b_negate = RV32B ? ~instr_first_cycle_i : 1'b0; ALU_CMIX: bwlogic_op_b_negate = (RV32B != RV32BNone) ? ~instr_first_cycle_i : 1'b0;
default: bwlogic_op_b_negate = 1'b0; default: bwlogic_op_b_negate = 1'b0;
endcase endcase
end end
@ -373,19 +372,19 @@ module ibex_alu #(
endcase endcase
end end
logic [5:0] bitcnt_result;
logic [31:0] minmax_result;
logic [31:0] pack_result;
logic [31:0] sext_result;
logic [31:0] singlebit_result;
logic [31:0] rev_result;
logic [31:0] shuffle_result; logic [31:0] shuffle_result;
logic [31:0] butterfly_result; logic [31:0] butterfly_result;
logic [31:0] invbutterfly_result; logic [31:0] invbutterfly_result;
logic [31:0] minmax_result;
logic [5:0] bitcnt_result;
logic [31:0] pack_result;
logic [31:0] sext_result;
logic [31:0] multicycle_result;
logic [31:0] singlebit_result;
logic [31:0] clmul_result; logic [31:0] clmul_result;
logic [31:0] multicycle_result;
if (RV32B) begin : g_alu_rvb if (RV32B != RV32BNone) begin : g_alu_rvb
///////////////// /////////////////
// Bitcounting // // Bitcounting //
@ -404,6 +403,8 @@ module ibex_alu #(
logic [31:0] bitcnt_mask_op; logic [31:0] bitcnt_mask_op;
logic [31:0] bitcnt_bit_mask; logic [31:0] bitcnt_bit_mask;
logic [ 5:0] bitcnt_partial [32]; logic [ 5:0] bitcnt_partial [32];
logic [31:0] bitcnt_partial_lsb_d;
logic [31:0] bitcnt_partial_msb_d;
assign bitcnt_ctz = operator_i == ALU_CTZ; assign bitcnt_ctz = operator_i == ALU_CTZ;
@ -427,6 +428,8 @@ module ibex_alu #(
bitcnt_bit_mask = ~bitcnt_bit_mask; bitcnt_bit_mask = ~bitcnt_bit_mask;
end end
assign zbe_op = (operator_i == ALU_BEXT) | (operator_i == ALU_BDEP);
always_comb begin always_comb begin
case(1'b1) case(1'b1)
zbe_op: bitcnt_bits = operand_b_i; zbe_op: bitcnt_bits = operand_b_i;
@ -518,207 +521,118 @@ module ibex_alu #(
end end
/////////////// ///////////////
// Butterfly // // Min / Max //
/////////////// ///////////////
// The butterfly / inverse butterfly network is shared between bext/bdep (zbe)instructions assign minmax_result = cmp_result ? operand_a_i : operand_b_i;
// respectively and grev / gorc instructions (zbp).
// For bdep, the control bits mask of a local left region is generated by
// the inverse of a n-bit left rotate and complement upon wrap (LROTC) operation by the number
// of ones in the deposit bitmask to the right of the segment. n hereby denotes the width
// of the according segment. The bitmask for a pertaining local right region is equal to the
// corresponding local left region. Bext uses an analogous inverse process.
// Consider the following 8-bit example. For details, see Hilewitz et al. "Fast Bit Gather,
// Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors", (2008).
// 8-bit example: (Hilewitz et al.) //////////
// Consider the instruction bdep operand_a_i deposit_mask // Pack //
// Let operand_a_i = 8'babcd_efgh //////////
// deposit_mask = 8'b1010_1101
//
// control bitmask for stage 1:
// - number of ones in the right half of the deposit bitmask: 3
// - width of the segment: 4
// - control bitmask = ~LROTC(4'b0, 3)[3:0] = 4'b1000
//
// control bitmask: c3 c2 c1 c0 c3 c2 c1 c0
// 1 0 0 0 1 0 0 0
// <- L -----> <- R ----->
// operand_a_i a b c d e f g h
// :\ | | | /: | | |
// : +|---|--|-+ : | | |
// :/ | | | \: | | |
// stage 1 e b c d a f g h
// <L-> <R-> <L-> <R->
// control bitmask: c3 c2 c3 c2 c1 c0 c1 c0
// 1 1 1 1 1 0 1 0
// :\ :\ /: /: :\ | /: |
// : +:-+-:+ : : +|-+ : |
// :/ :/ \: \: :/ | \: |
// stage 2 c d e b g f a h
// L R L R L R L R
// control bitmask: c3 c3 c2 c2 c1 c1 c0 c0
// 1 1 0 0 1 1 0 0
// :\/: | | :\/: | |
// : : | | : : | |
// :/\: | | :/\: | |
// stage 3 d c e b f g a h
// & deposit bitmask: 1 0 1 0 1 1 0 1
// result: d 0 e 0 f g 0 h
assign zbe_op = (operator_i == ALU_BEXT) | (operator_i == ALU_BDEP); logic packu;
logic packh;
assign packu = operator_i == ALU_PACKU;
assign packh = operator_i == ALU_PACKH;
logic [31:0] butterfly_mask_l[5]; always_comb begin
logic [31:0] butterfly_mask_r[5]; unique case (1'b1)
logic [31:0] butterfly_mask_not[5]; packu: pack_result = {operand_b_i[31:16], operand_a_i[31:16]};
logic [31:0] lrotc_stage [5]; // left rotate and complement upon wrap packh: pack_result = {16'h0, operand_b_i[7:0], operand_a_i[7:0]};
default: pack_result = {operand_b_i[15:0], operand_a_i[15:0]};
endcase
end
// bext / bdep //////////
logic [31:0] butterfly_zbe_mask_l[5]; // Sext //
logic [31:0] butterfly_zbe_mask_r[5]; //////////
logic [31:0] butterfly_zbe_mask_not[5];
// grev / gorc assign sext_result = (operator_i == ALU_SEXTB) ?
logic [31:0] butterfly_zbp_mask_l[5]; { {24{operand_a_i[7]}}, operand_a_i[7:0]} : { {16{operand_a_i[15]}}, operand_a_i[15:0]};
logic [31:0] butterfly_zbp_mask_r[5];
logic [31:0] butterfly_zbp_mask_not[5];
logic grev_op; /////////////////////////////
// Single-bit Instructions //
/////////////////////////////
always_comb begin
unique case (operator_i)
ALU_SBSET: singlebit_result = operand_a_i | shift_result;
ALU_SBCLR: singlebit_result = operand_a_i & ~shift_result;
ALU_SBINV: singlebit_result = operand_a_i ^ shift_result;
default: singlebit_result = {31'h0, shift_result[0]}; // ALU_SBEXT
endcase
end
////////////////////////////////////
// General Reverse and Or-combine //
////////////////////////////////////
// Only a subset of the General reverse and or-combine instructions are implemented in the
// balanced version of the B extension. Currently rev, rev8 and orc.b are supported in the
// base extension.
logic [4:0] zbp_shift_amt;
logic gorc_op; logic gorc_op;
logic zbp_op;
// number of bits in local r = 32 / 2**(stage + 1) = 16/2**stage assign gorc_op = (operator_i == ALU_GORC);
`define _N(stg) (16 >> stg) assign zbp_shift_amt[2:0] = (RV32B == RV32BFull) ? shift_amt[2:0] : {3{&shift_amt[2:0]}};
assign zbp_shift_amt[4:3] = (RV32B == RV32BFull) ? shift_amt[4:3] : {2{&shift_amt[4:3]}};
// bext / bdep control bit generation
for (genvar stg=0; stg<5; stg++) begin : gen_stage
// number of segs: 2** stg
for (genvar seg=0; seg<2**stg; seg++) begin : gen_segment
assign lrotc_stage[stg][2*`_N(stg)*(seg+1)-1 : 2*`_N(stg)*seg] =
{{`_N(stg){1'b0}},{`_N(stg){1'b1}}} <<
bitcnt_partial[`_N(stg)*(2*seg+1)-1][$clog2(`_N(stg)):0];
assign butterfly_zbe_mask_l[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)]
= ~lrotc_stage[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)];
assign butterfly_zbe_mask_r[stg][`_N(stg)*(2*seg+1)-1 : `_N(stg)*(2*seg)]
= ~lrotc_stage[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)];
assign butterfly_zbe_mask_l[stg][`_N(stg)*(2*seg+1)-1 : `_N(stg)*(2*seg)] = '0;
assign butterfly_zbe_mask_r[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)] = '0;
end
end
`undef _N
for (genvar stg=0; stg<5; stg++) begin : gen_zbe_mask
assign butterfly_zbe_mask_not[stg] =
~(butterfly_zbe_mask_l[stg] | butterfly_zbe_mask_r[stg]);
end
// grev / gorc control bit generation
assign butterfly_zbp_mask_l[0] = shift_amt[4] ? 32'hffff_0000 : 32'h0000_0000;
assign butterfly_zbp_mask_r[0] = shift_amt[4] ? 32'h0000_ffff : 32'h0000_0000;
assign butterfly_zbp_mask_not[0] =
!shift_amt[4] || (shift_amt[4] && gorc_op) ? 32'hffff_ffff : 32'h0000_0000;
assign butterfly_zbp_mask_l[1] = shift_amt[3] ? 32'hff00_ff00 : 32'h0000_0000;
assign butterfly_zbp_mask_r[1] = shift_amt[3] ? 32'h00ff_00ff : 32'h0000_0000;
assign butterfly_zbp_mask_not[1] =
!shift_amt[3] || (shift_amt[3] && gorc_op) ? 32'hffff_ffff : 32'h0000_0000;
assign butterfly_zbp_mask_l[2] = shift_amt[2] ? 32'hf0f0_f0f0 : 32'h0000_0000;
assign butterfly_zbp_mask_r[2] = shift_amt[2] ? 32'h0f0f_0f0f : 32'h0000_0000;
assign butterfly_zbp_mask_not[2] =
!shift_amt[2] || (shift_amt[2] && gorc_op) ? 32'hffff_ffff : 32'h0000_0000;
assign butterfly_zbp_mask_l[3] = shift_amt[1] ? 32'hcccc_cccc : 32'h0000_0000;
assign butterfly_zbp_mask_r[3] = shift_amt[1] ? 32'h3333_3333 : 32'h0000_0000;
assign butterfly_zbp_mask_not[3] =
!shift_amt[1] || (shift_amt[1] && gorc_op) ? 32'hffff_ffff : 32'h0000_0000;
assign butterfly_zbp_mask_l[4] = shift_amt[0] ? 32'haaaa_aaaa : 32'h0000_0000;
assign butterfly_zbp_mask_r[4] = shift_amt[0] ? 32'h5555_5555 : 32'h0000_0000;
assign butterfly_zbp_mask_not[4] =
!shift_amt[0] || (shift_amt[0] && gorc_op) ? 32'hffff_ffff : 32'h0000_0000;
// grev / gorc instructions
assign grev_op = RV32B ? (operator_i == ALU_GREV) : 1'b0;
assign gorc_op = RV32B ? (operator_i == ALU_GORC) : 1'b0;
assign zbp_op = grev_op | gorc_op;
// select set of masks:
assign butterfly_mask_l = zbp_op ? butterfly_zbp_mask_l : butterfly_zbe_mask_l;
assign butterfly_mask_r = zbp_op ? butterfly_zbp_mask_r : butterfly_zbe_mask_r;
assign butterfly_mask_not = zbp_op ? butterfly_zbp_mask_not : butterfly_zbe_mask_not;
always_comb begin always_comb begin
butterfly_result = operand_a_i; rev_result = operand_a_i;
butterfly_result = butterfly_result & butterfly_mask_not[0] | if (zbp_shift_amt[0]) begin
((butterfly_result & butterfly_mask_l[0]) >> 16)| rev_result = (gorc_op ? rev_result : 32'h0) |
((butterfly_result & butterfly_mask_r[0]) << 16); ((rev_result & 32'h5555_5555) << 1) |
((rev_result & 32'haaaa_aaaa) >> 1);
end
butterfly_result = butterfly_result & butterfly_mask_not[1] | if (zbp_shift_amt[1]) begin
((butterfly_result & butterfly_mask_l[1]) >> 8)| rev_result = (gorc_op ? rev_result : 32'h0) |
((butterfly_result & butterfly_mask_r[1]) << 8); ((rev_result & 32'h3333_3333) << 2) |
((rev_result & 32'hcccc_cccc) >> 2);
end
butterfly_result = butterfly_result & butterfly_mask_not[2] | if (zbp_shift_amt[2]) begin
((butterfly_result & butterfly_mask_l[2]) >> 4)| rev_result = (gorc_op ? rev_result : 32'h0) |
((butterfly_result & butterfly_mask_r[2]) << 4); ((rev_result & 32'h0f0f_0f0f) << 4) |
((rev_result & 32'hf0f0_f0f0) >> 4);
end
butterfly_result = butterfly_result & butterfly_mask_not[3] | if (zbp_shift_amt[3]) begin
((butterfly_result & butterfly_mask_l[3]) >> 2)| rev_result = (gorc_op & (RV32B == RV32BFull) ? rev_result : 32'h0) |
((butterfly_result & butterfly_mask_r[3]) << 2); ((rev_result & 32'h00ff_00ff) << 8) |
((rev_result & 32'hff00_ff00) >> 8);
end
butterfly_result = butterfly_result & butterfly_mask_not[4] | if (zbp_shift_amt[4]) begin
((butterfly_result & butterfly_mask_l[4]) >> 1)| rev_result = (gorc_op & (RV32B == RV32BFull) ? rev_result : 32'h0) |
((butterfly_result & butterfly_mask_r[4]) << 1); ((rev_result & 32'h0000_ffff) << 16) |
((rev_result & 32'hffff_0000) >> 16);
if (!zbp_op) begin
butterfly_result = butterfly_result & operand_b_i;
end end
end end
always_comb begin logic crc_hmode;
invbutterfly_result = operand_a_i & operand_b_i; logic crc_bmode;
logic [31:0] clmul_result_rev;
invbutterfly_result = invbutterfly_result & butterfly_mask_not[4] | if (RV32B == RV32BFull) begin : gen_alu_rvb_full
((invbutterfly_result & butterfly_mask_l[4]) >> 1)|
((invbutterfly_result & butterfly_mask_r[4]) << 1);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[3] |
((invbutterfly_result & butterfly_mask_l[3]) >> 2)|
((invbutterfly_result & butterfly_mask_r[3]) << 2);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[2] |
((invbutterfly_result & butterfly_mask_l[2]) >> 4)|
((invbutterfly_result & butterfly_mask_r[2]) << 4);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[1] |
((invbutterfly_result & butterfly_mask_l[1]) >> 8)|
((invbutterfly_result & butterfly_mask_r[1]) << 8);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[0] |
((invbutterfly_result & butterfly_mask_l[0]) >> 16)|
((invbutterfly_result & butterfly_mask_r[0]) << 16);
end
///////////////////////// /////////////////////////
// Shuffle / Unshuffle // // Shuffle / Unshuffle //
///////////////////////// /////////////////////////
localparam logic [31:0] SHUFFLE_MASK_L [4] = localparam logic [31:0] SHUFFLE_MASK_L [0:3] =
'{32'h4444_4444, 32'h3030_3030, 32'h0f00_0f00, 32'h00ff_0000}; '{32'h00ff_0000, 32'h0f00_0f00, 32'h3030_3030, 32'h4444_4444};
localparam logic [31:0] SHUFFLE_MASK_R [4] = localparam logic [31:0] SHUFFLE_MASK_R [0:3] =
'{32'h2222_2222, 32'h0c0c_0c0c, 32'h00f0_00f0, 32'h0000_ff00}; '{32'h0000_ff00, 32'h00f0_00f0, 32'h0c0c_0c0c, 32'h2222_2222};
localparam logic [31:0] FLIP_MASK_L [4] = localparam logic [31:0] FLIP_MASK_L [0:3] =
'{32'h1100_0000, 32'h4411_0000, 32'h0044_0000, 32'h2200_1100}; '{32'h2200_1100, 32'h0044_0000, 32'h4411_0000, 32'h1100_0000};
localparam logic [31:0] FLIP_MASK_R [4] = localparam logic [31:0] FLIP_MASK_R [0:3] =
'{32'h0000_0088, 32'h0000_8822, 32'h0000_2200, 32'h0088_0044}; '{32'h0088_0044, 32'h0000_2200, 32'h0000_8822, 32'h0000_0088};
logic [31:0] SHUFFLE_MASK_NOT [4]; logic [31:0] SHUFFLE_MASK_NOT [0:3];
for(genvar i = 0; i < 4; i++) begin : gen_shuffle_mask_not for(genvar i = 0; i < 4; i++) begin : gen_shuffle_mask_not
assign SHUFFLE_MASK_NOT[i] = ~(SHUFFLE_MASK_L[i] | SHUFFLE_MASK_R[i]); assign SHUFFLE_MASK_NOT[i] = ~(SHUFFLE_MASK_L[i] | SHUFFLE_MASK_R[i]);
end end
@ -776,8 +690,199 @@ module ibex_alu #(
((shuffle_result << 15) & FLIP_MASK_L[2]) | ((shuffle_result >> 15) & FLIP_MASK_R[2]) | ((shuffle_result << 15) & FLIP_MASK_L[2]) | ((shuffle_result >> 15) & FLIP_MASK_R[2]) |
((shuffle_result << 21) & FLIP_MASK_L[3]) | ((shuffle_result >> 21) & FLIP_MASK_R[3]); ((shuffle_result << 21) & FLIP_MASK_L[3]) | ((shuffle_result >> 21) & FLIP_MASK_R[3]);
end end
end end
///////////////
// Butterfly //
///////////////
// The butterfly / inverse butterfly network executing bext/bdep (zbe) instructions.
// For bdep, the control bits mask of a local left region is generated by
// the inverse of a n-bit left rotate and complement upon wrap (LROTC) operation by the number
// of ones in the deposit bitmask to the right of the segment. n hereby denotes the width
// of the according segment. The bitmask for a pertaining local right region is equal to the
// corresponding local left region. Bext uses an analogous inverse process.
// Consider the following 8-bit example. For details, see Hilewitz et al. "Fast Bit Gather,
// Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors", (2008).
//
// The bext/bdep instructions are completed in 2 cycles. In the first cycle, the control
// bitmask is prepared by executing the parallel prefix bit count. In the second cycle,
// the bit swapping is executed according to the control masks.
// 8-bit example: (Hilewitz et al.)
// Consider the instruction bdep operand_a_i deposit_mask
// Let operand_a_i = 8'babcd_efgh
// deposit_mask = 8'b1010_1101
//
// control bitmask for stage 1:
// - number of ones in the right half of the deposit bitmask: 3
// - width of the segment: 4
// - control bitmask = ~LROTC(4'b0, 3)[3:0] = 4'b1000
//
// control bitmask: c3 c2 c1 c0 c3 c2 c1 c0
// 1 0 0 0 1 0 0 0
// <- L -----> <- R ----->
// operand_a_i a b c d e f g h
// :\ | | | /: | | |
// : +|---|--|-+ : | | |
// :/ | | | \: | | |
// stage 1 e b c d a f g h
// <L-> <R-> <L-> <R->
// control bitmask: c3 c2 c3 c2 c1 c0 c1 c0
// 1 1 1 1 1 0 1 0
// :\ :\ /: /: :\ | /: |
// : +:-+-:+ : : +|-+ : |
// :/ :/ \: \: :/ | \: |
// stage 2 c d e b g f a h
// L R L R L R L R
// control bitmask: c3 c3 c2 c2 c1 c1 c0 c0
// 1 1 0 0 1 1 0 0
// :\/: | | :\/: | |
// : : | | : : | |
// :/\: | | :/\: | |
// stage 3 d c e b f g a h
// & deposit bitmask: 1 0 1 0 1 1 0 1
// result: d 0 e 0 f g 0 h
logic [ 5:0] bitcnt_partial_q [32];
// first cycle
// Store partial bitcnts
for (genvar i=0; i<32; i++) begin : gen_bitcnt_reg_in_lsb
assign bitcnt_partial_lsb_d[i] = bitcnt_partial[i][0];
end
for (genvar i=0; i<16; i++) begin : gen_bitcnt_reg_in_b1
assign bitcnt_partial_msb_d[i] = bitcnt_partial[2*i+1][1];
end
for (genvar i=0; i<8; i++) begin : gen_bitcnt_reg_in_b2
assign bitcnt_partial_msb_d[16+i] = bitcnt_partial[4*i+3][2];
end
for (genvar i=0; i<4; i++) begin : gen_bitcnt_reg_in_b3
assign bitcnt_partial_msb_d[24+i] = bitcnt_partial[8*i+7][3];
end
for (genvar i=0; i<2; i++) begin : gen_bitcnt_reg_in_b4
assign bitcnt_partial_msb_d[28+i] = bitcnt_partial[16*i+15][4];
end
assign bitcnt_partial_msb_d[30] = bitcnt_partial[31][5];
assign bitcnt_partial_msb_d[31] = 1'b0; // unused
// Second cycle
// Load partial bitcnts
always_comb begin
bitcnt_partial_q = '{default: '0};
for (int unsigned i=0; i<32; i++) begin : gen_bitcnt_reg_out_lsb
bitcnt_partial_q[i][0] = imd_val_q_i[0][i];
end
for (int unsigned i=0; i<16; i++) begin : gen_bitcnt_reg_out_b1
bitcnt_partial_q[2*i+1][1] = imd_val_q_i[1][i];
end
for (int unsigned i=0; i<8; i++) begin : gen_bitcnt_reg_out_b2
bitcnt_partial_q[4*i+3][2] = imd_val_q_i[1][16+i];
end
for (int unsigned i=0; i<4; i++) begin : gen_bitcnt_reg_out_b3
bitcnt_partial_q[8*i+7][3] = imd_val_q_i[1][24+i];
end
for (int unsigned i=0; i<2; i++) begin : gen_bitcnt_reg_out_b4
bitcnt_partial_q[16*i+15][4] = imd_val_q_i[1][28+i];
end
bitcnt_partial_q[31][5] = imd_val_q_i[1][30];
end
logic [31:0] butterfly_mask_l[5];
logic [31:0] butterfly_mask_r[5];
logic [31:0] butterfly_mask_not[5];
logic [31:0] lrotc_stage [5]; // left rotate and complement upon wrap
// number of bits in local r = 32 / 2**(stage + 1) = 16/2**stage
`define _N(stg) (16 >> stg)
// bext / bdep control bit generation
for (genvar stg=0; stg<5; stg++) begin : gen_butterfly_ctrl_stage
// number of segs: 2** stg
for (genvar seg=0; seg<2**stg; seg++) begin : gen_butterfly_ctrl
assign lrotc_stage[stg][2*`_N(stg)*(seg+1)-1 : 2*`_N(stg)*seg] =
{{`_N(stg){1'b0}},{`_N(stg){1'b1}}} <<
bitcnt_partial_q[`_N(stg)*(2*seg+1)-1][$clog2(`_N(stg)):0];
assign butterfly_mask_l[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)]
= ~lrotc_stage[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)];
assign butterfly_mask_r[stg][`_N(stg)*(2*seg+1)-1 : `_N(stg)*(2*seg)]
= ~lrotc_stage[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)];
assign butterfly_mask_l[stg][`_N(stg)*(2*seg+1)-1 : `_N(stg)*(2*seg)] = '0;
assign butterfly_mask_r[stg][`_N(stg)*(2*seg+2)-1 : `_N(stg)*(2*seg+1)] = '0;
end
end
`undef _N
for (genvar stg=0; stg<5; stg++) begin : gen_butterfly_not
assign butterfly_mask_not[stg] =
~(butterfly_mask_l[stg] | butterfly_mask_r[stg]);
end
always_comb begin
butterfly_result = operand_a_i;
butterfly_result = butterfly_result & butterfly_mask_not[0] |
((butterfly_result & butterfly_mask_l[0]) >> 16)|
((butterfly_result & butterfly_mask_r[0]) << 16);
butterfly_result = butterfly_result & butterfly_mask_not[1] |
((butterfly_result & butterfly_mask_l[1]) >> 8)|
((butterfly_result & butterfly_mask_r[1]) << 8);
butterfly_result = butterfly_result & butterfly_mask_not[2] |
((butterfly_result & butterfly_mask_l[2]) >> 4)|
((butterfly_result & butterfly_mask_r[2]) << 4);
butterfly_result = butterfly_result & butterfly_mask_not[3] |
((butterfly_result & butterfly_mask_l[3]) >> 2)|
((butterfly_result & butterfly_mask_r[3]) << 2);
butterfly_result = butterfly_result & butterfly_mask_not[4] |
((butterfly_result & butterfly_mask_l[4]) >> 1)|
((butterfly_result & butterfly_mask_r[4]) << 1);
butterfly_result = butterfly_result & operand_b_i;
end
always_comb begin
invbutterfly_result = operand_a_i & operand_b_i;
invbutterfly_result = invbutterfly_result & butterfly_mask_not[4] |
((invbutterfly_result & butterfly_mask_l[4]) >> 1)|
((invbutterfly_result & butterfly_mask_r[4]) << 1);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[3] |
((invbutterfly_result & butterfly_mask_l[3]) >> 2)|
((invbutterfly_result & butterfly_mask_r[3]) << 2);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[2] |
((invbutterfly_result & butterfly_mask_l[2]) >> 4)|
((invbutterfly_result & butterfly_mask_r[2]) << 4);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[1] |
((invbutterfly_result & butterfly_mask_l[1]) >> 8)|
((invbutterfly_result & butterfly_mask_r[1]) << 8);
invbutterfly_result = invbutterfly_result & butterfly_mask_not[0] |
((invbutterfly_result & butterfly_mask_l[0]) >> 16)|
((invbutterfly_result & butterfly_mask_r[0]) << 16);
end
/////////////////////////////////////////////////// ///////////////////////////////////////////////////
// Carry-less Multiply + Cyclic Redundancy Check // // Carry-less Multiply + Cyclic Redundancy Check //
/////////////////////////////////////////////////// ///////////////////////////////////////////////////
@ -851,7 +956,6 @@ module ibex_alu #(
logic [31:0] clmul_xor_stage4[2]; logic [31:0] clmul_xor_stage4[2];
logic [31:0] clmul_result_raw; logic [31:0] clmul_result_raw;
logic [31:0] clmul_result_rev;
for (genvar i=0; i<32; i++) begin: gen_rev_operand_b for (genvar i=0; i<32; i++) begin: gen_rev_operand_b
assign operand_b_rev[i] = operand_b_i[31-i]; assign operand_b_rev[i] = operand_b_i[31-i];
@ -868,8 +972,6 @@ module ibex_alu #(
localparam logic [31:0] CRC32C_MU_REV = 32'hdea7_13f1; localparam logic [31:0] CRC32C_MU_REV = 32'hdea7_13f1;
logic crc_op; logic crc_op;
logic crc_hmode;
logic crc_bmode;
logic crc_cpoly; logic crc_cpoly;
@ -902,7 +1004,7 @@ module ibex_alu #(
// Select clmul input // Select clmul input
always_comb begin always_comb begin
if (crc_op) begin if (crc_op) begin
clmul_op_a = instr_first_cycle_i ? crc_operand : imd_val_q_i; clmul_op_a = instr_first_cycle_i ? crc_operand : imd_val_q_i[0];
clmul_op_b = instr_first_cycle_i ? crc_mu_rev : crc_poly; clmul_op_b = instr_first_cycle_i ? crc_mu_rev : crc_poly;
end else begin end else begin
clmul_op_a = clmul_rmode | clmul_hmode ? operand_a_rev : operand_a_i; clmul_op_a = clmul_rmode | clmul_hmode ? operand_a_rev : operand_a_i;
@ -945,135 +1047,126 @@ module ibex_alu #(
default: clmul_result = clmul_result_raw; default: clmul_result = clmul_result_raw;
endcase endcase
end end
end else begin
assign shuffle_result = '0;
assign butterfly_result = '0;
assign invbutterfly_result = '0;
assign clmul_result = '0;
// support signals
assign bitcnt_partial_lsb_d = '0;
assign bitcnt_partial_msb_d = '0;
assign clmul_result_rev = '0;
assign crc_bmode = '0;
assign crc_hmode = '0;
end
////////////////////////////////////// //////////////////////////////////////
// Multicycle Bitmanip Instructions // // Multicycle Bitmanip Instructions //
////////////////////////////////////// //////////////////////////////////////
// Ternary instructions + Shift Rotations + CRC // Ternary instructions + Shift Rotations + Bit extract/deposit + CRC
// For ternary instructions (zbt), operand_a_i is tied to rs1 in the first cycle and rs3 in the // For ternary instructions (zbt), operand_a_i is tied to rs1 in the first cycle and rs3 in the
// second cycle. operand_b_i is always tied to rs2. // second cycle. operand_b_i is always tied to rs2.
always_comb begin always_comb begin
unique case (operator_i) unique case (operator_i)
ALU_CMOV: begin ALU_CMOV: begin
imd_val_d_o = operand_a_i; multicycle_result = (operand_b_i == 32'h0) ? operand_a_i : imd_val_q_i[0];
multicycle_result = (operand_b_i == 32'h0) ? operand_a_i : imd_val_q_i; imd_val_d_o = '{operand_a_i, 32'h0};
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
imd_val_we_o = 1'b1; imd_val_we_o = 2'b01;
end else begin end else begin
imd_val_we_o = 1'b0; imd_val_we_o = 2'b00;
end end
end end
ALU_CMIX: begin ALU_CMIX: begin
multicycle_result = imd_val_q_i | bwlogic_and_result; multicycle_result = imd_val_q_i[0] | bwlogic_and_result;
imd_val_d_o = bwlogic_and_result; imd_val_d_o = '{bwlogic_and_result, 32'h0};
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
imd_val_we_o = 1'b1; imd_val_we_o = 2'b01;
end else begin end else begin
imd_val_we_o = 1'b0; imd_val_we_o = 2'b00;
end end
end end
ALU_FSR, ALU_FSL, ALU_FSR, ALU_FSL,
ALU_ROL, ALU_ROR: begin ALU_ROL, ALU_ROR: begin
if (shift_amt[4:0] == 5'h0) begin if (shift_amt[4:0] == 5'h0) begin
multicycle_result = shift_amt[5] ? operand_a_i : imd_val_q_i; multicycle_result = shift_amt[5] ? operand_a_i : imd_val_q_i[0];
end else begin end else begin
multicycle_result = imd_val_q_i | shift_result; multicycle_result = imd_val_q_i[0] | shift_result;
end end
imd_val_d_o = shift_result; imd_val_d_o = '{shift_result, 32'h0};
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
imd_val_we_o = 1'b1; imd_val_we_o = 2'b01;
end else begin end else begin
imd_val_we_o = 1'b0; imd_val_we_o = 2'b00;
end end
end end
ALU_CRC32_W, ALU_CRC32C_W, ALU_CRC32_W, ALU_CRC32C_W,
ALU_CRC32_H, ALU_CRC32C_H, ALU_CRC32_H, ALU_CRC32C_H,
ALU_CRC32_B, ALU_CRC32C_B: begin ALU_CRC32_B, ALU_CRC32C_B: begin
imd_val_d_o = clmul_result_rev; if (RV32B == RV32BFull) begin
unique case(1'b1) unique case(1'b1)
crc_bmode: multicycle_result = clmul_result_rev ^ (operand_a_i >> 8); crc_bmode: multicycle_result = clmul_result_rev ^ (operand_a_i >> 8);
crc_hmode: multicycle_result = clmul_result_rev ^ (operand_a_i >> 16); crc_hmode: multicycle_result = clmul_result_rev ^ (operand_a_i >> 16);
default: multicycle_result = clmul_result_rev; default: multicycle_result = clmul_result_rev;
endcase endcase
imd_val_d_o = '{clmul_result_rev, 32'h0};
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
imd_val_we_o = 1'b1; imd_val_we_o = 2'b01;
end else begin end else begin
imd_val_we_o = 1'b0; imd_val_we_o = 2'b00;
end
end else begin
imd_val_d_o = '{operand_a_i, 32'h0};
imd_val_we_o = 2'b00;
multicycle_result = '0;
end
end
ALU_BEXT, ALU_BDEP: begin
if (RV32B == RV32BFull) begin
multicycle_result = (operator_i == ALU_BDEP) ? butterfly_result : invbutterfly_result;
imd_val_d_o = '{bitcnt_partial_lsb_d, bitcnt_partial_msb_d};
if (instr_first_cycle_i) begin
imd_val_we_o = 2'b11;
end else begin
imd_val_we_o = 2'b00;
end
end else begin
imd_val_d_o = '{operand_a_i, 32'h0};
imd_val_we_o = 2'b00;
multicycle_result = '0;
end end
end end
default: begin default: begin
imd_val_d_o = operand_a_i; imd_val_d_o = '{operand_a_i, 32'h0};
imd_val_we_o = 1'b0; imd_val_we_o = 2'b00;
multicycle_result = operand_a_i; multicycle_result = '0;
end end
endcase endcase
end end
/////////////////////////////
// Single-bit Instructions //
/////////////////////////////
always_comb begin
unique case (operator_i)
ALU_SBSET: singlebit_result = operand_a_i | shift_result;
ALU_SBCLR: singlebit_result = operand_a_i & ~shift_result;
ALU_SBINV: singlebit_result = operand_a_i ^ shift_result;
default: singlebit_result = {31'h0, shift_result[0]}; // ALU_SBEXT
endcase
end
///////////////
// Min / Max //
///////////////
assign minmax_result = cmp_result ? operand_a_i : operand_b_i;
//////////
// Pack //
//////////
logic packu;
logic packh;
assign packu = operator_i == ALU_PACKU;
assign packh = operator_i == ALU_PACKH;
always_comb begin
unique case (1'b1)
packu: pack_result = {operand_b_i[31:16], operand_a_i[31:16]};
packh: pack_result = {16'h0, operand_b_i[7:0], operand_a_i[7:0]};
default: pack_result = {operand_b_i[15:0], operand_a_i[15:0]};
endcase
end
//////////
// Sext //
//////////
assign sext_result = (operator_i == ALU_SEXTB) ?
{ {24{operand_a_i[7]}}, operand_a_i[7:0]} : { {16{operand_a_i[15]}}, operand_a_i[15:0]};
end else begin : g_no_alu_rvb end else begin : g_no_alu_rvb
// RV32B result signals // RV32B result signals
assign minmax_result = '0;
assign bitcnt_result = '0; assign bitcnt_result = '0;
assign minmax_result = '0;
assign pack_result = '0; assign pack_result = '0;
assign sext_result = '0; assign sext_result = '0;
assign multicycle_result = '0;
assign singlebit_result = '0; assign singlebit_result = '0;
assign rev_result = '0;
assign shuffle_result = '0; assign shuffle_result = '0;
assign butterfly_result = '0; assign butterfly_result = '0;
assign invbutterfly_result = '0; assign invbutterfly_result = '0;
assign clmul_result = '0; assign clmul_result = '0;
assign multicycle_result = '0;
// RV32B support signals // RV32B support signals
assign imd_val_d_o = '0; assign imd_val_d_o = '{default: '0};
assign imd_val_we_o = '0; assign imd_val_we_o = '{default: '0};
end end
//////////////// ////////////////
@ -1130,18 +1223,16 @@ module ibex_alu #(
// Cyclic Redundancy Checks (RV32B) // Cyclic Redundancy Checks (RV32B)
ALU_CRC32_W, ALU_CRC32C_W, ALU_CRC32_W, ALU_CRC32C_W,
ALU_CRC32_H, ALU_CRC32C_H, ALU_CRC32_H, ALU_CRC32C_H,
ALU_CRC32_B, ALU_CRC32C_B: result_o = multicycle_result; ALU_CRC32_B, ALU_CRC32C_B,
// Bit Extract / Deposit (RV32B)
ALU_BEXT, ALU_BDEP: result_o = multicycle_result;
// Single-Bit Bitmanip Operations (RV32B) // Single-Bit Bitmanip Operations (RV32B)
ALU_SBSET, ALU_SBCLR, ALU_SBSET, ALU_SBCLR,
ALU_SBINV, ALU_SBEXT: result_o = singlebit_result; ALU_SBINV, ALU_SBEXT: result_o = singlebit_result;
// Bit Extract / Deposit (RV32B)
ALU_BDEP: result_o = butterfly_result;
ALU_BEXT: result_o = invbutterfly_result;
// General Reverse / Or-combine (RV32B) // General Reverse / Or-combine (RV32B)
ALU_GREV, ALU_GORC: result_o = butterfly_result; ALU_GREV, ALU_GORC: result_o = rev_result;
// Bit Field Place (RV32B) // Bit Field Place (RV32B)
ALU_BFP: result_o = bfp_result; ALU_BFP: result_o = bfp_result;
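
Review note on the new `rev_result` datapath: the staged swap network for grev/gorc, together with the shift-amount collapse used when `RV32B != RV32BFull`, can be cross-checked against a small software model. The sketch below is Python rather than RTL, and the names `STAGES`, `balanced_shamt` and `rev_model` are illustrative only; they do not appear in the commit. It assumes that stage k is enabled by bit k of the shift amount and that, outside the full configuration, the or-combine term is dropped in the 8-bit and 16-bit stages, as the ternaries on `gorc_op & (RV32B == RV32BFull)` suggest.

```python
MASK32 = 0xFFFF_FFFF

# (mask shifted left, mask shifted right, distance) for stages 0..4,
# mirroring the constants used in the rev_result always_comb block.
STAGES = [
    (0x5555_5555, 0xAAAA_AAAA, 1),
    (0x3333_3333, 0xCCCC_CCCC, 2),
    (0x0F0F_0F0F, 0xF0F0_F0F0, 4),
    (0x00FF_00FF, 0xFF00_FF00, 8),
    (0x0000_FFFF, 0xFFFF_0000, 16),
]


def balanced_shamt(shamt):
    """Model of zbp_shift_amt when RV32B != RV32BFull.

    Bits [2:0] are all set only if shamt[2:0] == 3'b111, and bits [4:3]
    are all set only if shamt[4:3] == 2'b11 (AND-reduction, replicated).
    """
    low = 0b00111 if (shamt & 0b00111) == 0b00111 else 0
    high = 0b11000 if (shamt & 0b11000) == 0b11000 else 0
    return high | low


def rev_model(x, shamt, gorc=False, full=True):
    """Software model of rev_result for grev (gorc=False) or gorc (gorc=True)."""
    if not full:
        shamt = balanced_shamt(shamt)
    for k, (mask_l, mask_r, dist) in enumerate(STAGES):
        if (shamt >> k) & 1:
            # gorc ORs the unswapped value back in; stages 3 and 4 only do
            # so in the full configuration.
            keep = x if (gorc and (full or k < 3)) else 0
            x = (keep | ((x & mask_l) << dist) | ((x & mask_r) >> dist)) & MASK32
    return x
```

With this model, `rev_model(0x12345678, 24)` returns `0x78563412` (rev8), and the three shift amounts the decoder accepts in the balanced configuration (31 for rev, 24 for rev8, 7 for orc.b) pass through `balanced_shamt` unchanged, so the reduced network produces the same results as the full one for exactly those encodings.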


@ -9,6 +9,10 @@
`include "prim_assert.sv" `include "prim_assert.sv"
`ifndef RV32B
`define RV32B ibex_pkg::RV32BNone
`endif
/** /**
* Top level module of the ibex RISC-V core * Top level module of the ibex RISC-V core
*/ */
@ -20,7 +24,7 @@ module ibex_core #(
parameter int unsigned MHPMCounterWidth = 40, parameter int unsigned MHPMCounterWidth = 40,
parameter bit RV32E = 1'b0, parameter bit RV32E = 1'b0,
parameter bit RV32M = 1'b1, parameter bit RV32M = 1'b1,
parameter bit RV32B = 1'b0, parameter ibex_pkg::rv32b_e RV32B = `RV32B,
parameter bit BranchTargetALU = 1'b0, parameter bit BranchTargetALU = 1'b0,
parameter bit WritebackStage = 1'b0, parameter bit WritebackStage = 1'b0,
parameter MultiplierImplementation = "fast", parameter MultiplierImplementation = "fast",
@ -129,9 +133,9 @@ module ibex_core #(
logic [31:0] pc_if; // Program counter in IF stage logic [31:0] pc_if; // Program counter in IF stage
logic [31:0] pc_id; // Program counter in ID stage logic [31:0] pc_id; // Program counter in ID stage
logic [31:0] pc_wb; // Program counter in WB stage logic [31:0] pc_wb; // Program counter in WB stage
logic [33:0] imd_val_d_ex; // Intermediate register for multicycle Ops logic [33:0] imd_val_d_ex[2]; // Intermediate register for multicycle Ops
logic [33:0] imd_val_q_ex; // Intermediate register for multicycle Ops logic [33:0] imd_val_q_ex[2]; // Intermediate register for multicycle Ops
logic imd_val_we_ex; logic [1:0] imd_val_we_ex;
logic data_ind_timing; logic data_ind_timing;
logic dummy_instr_en; logic dummy_instr_en;
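
Review note on the second intermediate register: `imd_val_d_ex` and `imd_val_q_ex` become two-entry arrays because bext/bdep store the partial bit counts across the cycle boundary. The ALU packs the LSB of all 32 partial counts into entry 0 and a selection of higher bits into entry 1 (31 bits used, bit 31 tied to zero). The Python sketch below models that packing and checks that every bit the butterfly control generation reads in the second cycle survives the round trip. It is an illustration, not part of the commit: the function names are made up, and it assumes that `bitcnt_partial[i]` holds the number of set bits in positions `i..0` of the mask operand, since the prefix-count logic itself is outside the hunks shown here.

```python
def prefix_counts(mask):
    """Assumed meaning of bitcnt_partial: ones in mask[i:0] for i = 0..31."""
    counts, total = [], 0
    for i in range(32):
        total += (mask >> i) & 1
        counts.append(total)
    return counts


# (bit position within the count, index stride, index offset, base bit in entry 1)
MSB_LAYOUT = [(1, 2, 1, 0), (2, 4, 3, 16), (3, 8, 7, 24), (4, 16, 15, 28), (5, 32, 31, 30)]


def pack(counts):
    """First cycle: build the two 32-bit values written to imd_val[0] and imd_val[1]."""
    lsb = 0
    for i in range(32):
        lsb |= (counts[i] & 1) << i
    msb = 0
    for bit, stride, offset, base in MSB_LAYOUT:
        for i in range(32 // stride):
            msb |= ((counts[stride * i + offset] >> bit) & 1) << (base + i)
    return lsb, msb


def unpack(lsb, msb):
    """Second cycle: rebuild bitcnt_partial_q; bits that were not stored read as 0."""
    q = [(lsb >> i) & 1 for i in range(32)]
    for bit, stride, offset, base in MSB_LAYOUT:
        for i in range(32 // stride):
            q[stride * i + offset] |= ((msb >> (base + i)) & 1) << bit
    return q


def control_bits_preserved(mask):
    """Check the bits read as bitcnt_partial_q[N*(2*seg+1)-1][$clog2(N):0]."""
    counts = prefix_counts(mask)
    q = unpack(*pack(counts))
    for stg in range(5):
        n = 16 >> stg
        width = n.bit_length()  # $clog2(n) + 1 bits
        for seg in range(1 << stg):
            idx = n * (2 * seg + 1) - 1
            if (counts[idx] ^ q[idx]) & ((1 << width) - 1):
                return False
    return True
```

Under the stated assumption, `control_bits_preserved` holds for any 32-bit mask, which is consistent with storing 63 bits in two registers instead of all 32 six-bit counts.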


@ -2,10 +2,14 @@
// Licensed under the Apache License, Version 2.0, see LICENSE for details. // Licensed under the Apache License, Version 2.0, see LICENSE for details.
// SPDX-License-Identifier: Apache-2.0 // SPDX-License-Identifier: Apache-2.0
`ifndef RV32B
`define RV32B ibex_pkg::RV32BNone
`endif
/** /**
* Top level module of the ibex RISC-V core with tracing enabled * Top level module of the ibex RISC-V core with tracing enabled
*/ */
module ibex_core_tracing #( module ibex_core_tracing #(
parameter bit PMPEnable = 1'b0, parameter bit PMPEnable = 1'b0,
parameter int unsigned PMPGranularity = 0, parameter int unsigned PMPGranularity = 0,
@ -14,7 +18,7 @@ module ibex_core_tracing #(
parameter int unsigned MHPMCounterWidth = 40, parameter int unsigned MHPMCounterWidth = 40,
parameter bit RV32E = 1'b0, parameter bit RV32E = 1'b0,
parameter bit RV32M = 1'b1, parameter bit RV32M = 1'b1,
parameter bit RV32B = 1'b0, parameter ibex_pkg::rv32b_e RV32B = `RV32B,
parameter bit BranchTargetALU = 1'b0, parameter bit BranchTargetALU = 1'b0,
parameter bit WritebackStage = 1'b0, parameter bit WritebackStage = 1'b0,
parameter MultiplierImplementation = "fast", parameter MultiplierImplementation = "fast",


@ -16,8 +16,8 @@
module ibex_decoder #( module ibex_decoder #(
parameter bit RV32E = 0, parameter bit RV32E = 0,
parameter bit RV32M = 1, parameter bit RV32M = 1,
parameter bit RV32B = 0, parameter bit BranchTargetALU = 0,
parameter bit BranchTargetALU = 0 parameter ibex_pkg::rv32b_e RV32B = ibex_pkg::RV32BNone
) ( ) (
input logic clk_i, input logic clk_i,
input logic rst_ni, input logic rst_ni,
@ -112,7 +112,8 @@ module ibex_decoder #(
logic [4:0] instr_rs3; logic [4:0] instr_rs3;
logic [4:0] instr_rd; logic [4:0] instr_rd;
logic use_rs3; logic use_rs3_d;
logic use_rs3_q;
csr_op_e csr_op; csr_op_e csr_op;
@ -139,11 +140,20 @@ module ibex_decoder #(
// immediate for CSR manipulation (zero extended) // immediate for CSR manipulation (zero extended)
assign zimm_rs1_type_o = { 27'b0, instr_rs1 }; // rs1 assign zimm_rs1_type_o = { 27'b0, instr_rs1 }; // rs1
// the use of rs3 is known one cycle ahead.
always_ff @(posedge clk_i or negedge rst_ni) begin
if (!rst_ni) begin
use_rs3_q <= 1'b0;
end else begin
use_rs3_q <= use_rs3_d;
end
end
// source registers // source registers
assign instr_rs1 = instr[19:15]; assign instr_rs1 = instr[19:15];
assign instr_rs2 = instr[24:20]; assign instr_rs2 = instr[24:20];
assign instr_rs3 = instr[31:27]; assign instr_rs3 = instr[31:27];
assign rf_raddr_a_o = use_rs3 ? instr_rs3 : instr_rs1; // rs3 / rs1 assign rf_raddr_a_o = (use_rs3_q & ~instr_first_cycle_i) ? instr_rs3 : instr_rs1; // rs3 / rs1
assign rf_raddr_b_o = instr_rs2; // rs2 assign rf_raddr_b_o = instr_rs2; // rs2
// destination register // destination register
@ -342,9 +352,9 @@ module ibex_decoder #(
5'b0_0100, // sloi 5'b0_0100, // sloi
5'b0_1001, // sbclri 5'b0_1001, // sbclri
5'b0_0101, // sbseti 5'b0_0101, // sbseti
5'b0_1101: illegal_insn = RV32B ? 1'b0 : 1'b1; // sbinvi 5'b0_1101: illegal_insn = (RV32B != RV32BNone) ? 1'b0 : 1'b1; // sbinvi
5'b0_0001: if (instr[26] == 1'b0) begin 5'b0_0001: if (instr[26] == 1'b0) begin
illegal_insn = RV32B ? 1'b0 : 1'b1; // shfl illegal_insn = (RV32B == RV32BFull) ? 1'b0 : 1'b1; // shfl
end else begin end else begin
illegal_insn = 1'b1; illegal_insn = 1'b1;
end end
@ -354,13 +364,13 @@ module ibex_decoder #(
7'b000_0001, // ctz 7'b000_0001, // ctz
7'b000_0010, // pcnt 7'b000_0010, // pcnt
7'b000_0100, // sext.b 7'b000_0100, // sext.b
7'b000_0101, // sext.h 7'b000_0101: illegal_insn = (RV32B != RV32BNone) ? 1'b0 : 1'b1; // sext.h
7'b001_0000, // crc32.b 7'b001_0000, // crc32.b
7'b001_0001, // crc32.h 7'b001_0001, // crc32.h
7'b001_0010, // crc32.w 7'b001_0010, // crc32.w
7'b001_1000, // crc32c.b 7'b001_1000, // crc32c.b
7'b001_1001, // crc32c.h 7'b001_1001, // crc32c.h
7'b001_1010: illegal_insn = RV32B ? 1'b0 : 1'b1; // crc32c.w 7'b001_1010: illegal_insn = (RV32B == RV32BFull) ? 1'b0 : 1'b1; // crc32c.w
default: illegal_insn = 1'b1; default: illegal_insn = 1'b1;
endcase endcase
@ -371,7 +381,7 @@ module ibex_decoder #(
3'b101: begin 3'b101: begin
if (instr[26]) begin if (instr[26]) begin
illegal_insn = RV32B ? 1'b0 : 1'b1; // fsri illegal_insn = (RV32B != RV32BNone) ? 1'b0 : 1'b1; // fsri
end else begin end else begin
unique case (instr[31:27]) unique case (instr[31:27])
5'b0_0000, // srli 5'b0_0000, // srli
@ -379,15 +389,34 @@ module ibex_decoder #(
5'b0_0100, // sroi 5'b0_0100, // sroi
5'b0_1100, // rori 5'b0_1100, // rori
5'b0_1001: illegal_insn = RV32B ? 1'b0 : 1'b1; // sbexti 5'b0_1001: illegal_insn = (RV32B != RV32BNone) ? 1'b0 : 1'b1; // sbexti
5'b0_1101, // grevi 5'b0_1101: begin
5'b0_0101: illegal_insn = RV32B ? 1'b0 : 1'b1; // gorci if ((RV32B == RV32BFull)) begin
5'b0_0001: if (instr[26] == 1'b0) begin illegal_insn = 1'b0; // grevi
illegal_insn = RV32B ? 1'b0 : 1'b1; // unshfl end else begin
unique case (instr[24:20])
5'b11111, // rev
5'b11000: illegal_insn = (RV32B == RV32BBalanced) ? 1'b0 : 1'b1; // rev8
default: illegal_insn = 1'b1;
endcase
end
end
5'b0_0101: begin
if ((RV32B == RV32BFull)) begin
illegal_insn = 1'b0; // gorci
end else if (instr[24:20] == 5'b00111) begin
illegal_insn = (RV32B == RV32BBalanced) ? 1'b0 : 1'b1; // orc.b
end
end
5'b0_0001: begin
if (instr[26] == 1'b0) begin
illegal_insn = (RV32B == RV32BFull) ? 1'b0 : 1'b1; // unshfl
end else begin end else begin
illegal_insn = 1'b1; illegal_insn = 1'b1;
end end
end
default: illegal_insn = 1'b1; default: illegal_insn = 1'b1;
endcase endcase
@ -403,7 +432,7 @@ module ibex_decoder #(
rf_ren_b_o = 1'b1; rf_ren_b_o = 1'b1;
rf_we = 1'b1; rf_we = 1'b1;
if ({instr[26], instr[13:12]} == {1'b1, 2'b01}) begin if ({instr[26], instr[13:12]} == {1'b1, 2'b01}) begin
illegal_insn = RV32B ? 1'b0 : 1'b1; // cmix / cmov / fsl / fsr illegal_insn = (RV32B != RV32BNone) ? 1'b0 : 1'b1; // cmix / cmov / fsl / fsr
end else begin end else begin
unique case ({instr[31:25], instr[14:12]}) unique case ({instr[31:25], instr[14:12]})
// RV32I ALU operations // RV32I ALU operations
@ -438,6 +467,8 @@ module ibex_decoder #(
{7'b001_0100, 3'b001}, // sbset {7'b001_0100, 3'b001}, // sbset
{7'b011_0100, 3'b001}, // sbinv {7'b011_0100, 3'b001}, // sbinv
{7'b010_0100, 3'b101}, // sbext {7'b010_0100, 3'b101}, // sbext
// RV32B zbf
{7'b010_0100, 3'b111}: illegal_insn = (RV32B != RV32BNone) ? 1'b0 : 1'b1; // bfp
// RV32B zbe // RV32B zbe
{7'b010_0100, 3'b110}, // bdep {7'b010_0100, 3'b110}, // bdep
{7'b000_0100, 3'b110}, // bext {7'b000_0100, 3'b110}, // bext
@ -446,12 +477,10 @@ module ibex_decoder #(
{7'b001_0100, 3'b101}, // gorc {7'b001_0100, 3'b101}, // gorc
{7'b000_0100, 3'b001}, // shfl {7'b000_0100, 3'b001}, // shfl
{7'b000_0100, 3'b101}, // unshfl {7'b000_0100, 3'b101}, // unshfl
// RV32B zbf
{7'b010_0100, 3'b111}, // bfp
// RV32B zbc // RV32B zbc
{7'b000_0101, 3'b001}, // clmul {7'b000_0101, 3'b001}, // clmul
{7'b000_0101, 3'b010}, // clmulr {7'b000_0101, 3'b010}, // clmulr
{7'b000_0101, 3'b011}: illegal_insn = RV32B ? 1'b0 : 1'b1; // clmulh {7'b000_0101, 3'b011}: illegal_insn = (RV32B == RV32BFull) ? 1'b0 : 1'b1; // clmulh
// RV32M instructions // RV32M instructions
{7'b000_0001, 3'b000}: begin // mul {7'b000_0001, 3'b000}: begin // mul
@ -627,7 +656,7 @@ module ibex_decoder #(
opcode_alu = opcode_e'(instr_alu[6:0]); opcode_alu = opcode_e'(instr_alu[6:0]);
use_rs3 = 1'b0; use_rs3_d = 1'b0;
alu_multicycle_o = 1'b0; alu_multicycle_o = 1'b0;
mult_sel_o = 1'b0; mult_sel_o = 1'b0;
div_sel_o = 1'b0; div_sel_o = 1'b0;
@ -774,7 +803,7 @@ module ibex_decoder #(
3'b111: alu_operator_o = ALU_AND; // And with Immediate 3'b111: alu_operator_o = ALU_AND; // And with Immediate
3'b001: begin 3'b001: begin
if (RV32B) begin if (RV32B != RV32BNone) begin
unique case (instr_alu[31:27]) unique case (instr_alu[31:27])
5'b0_0000: alu_operator_o = ALU_SLL; // Shift Left Logical by Immediate 5'b0_0000: alu_operator_o = ALU_SLL; // Shift Left Logical by Immediate
5'b0_0100: alu_operator_o = ALU_SLO; // Shift Left Ones by Immediate 5'b0_0100: alu_operator_o = ALU_SLO; // Shift Left Ones by Immediate
@ -791,29 +820,41 @@ module ibex_decoder #(
7'b000_0100: alu_operator_o = ALU_SEXTB; // sext.b 7'b000_0100: alu_operator_o = ALU_SEXTB; // sext.b
7'b000_0101: alu_operator_o = ALU_SEXTH; // sext.h 7'b000_0101: alu_operator_o = ALU_SEXTH; // sext.h
7'b001_0000: begin 7'b001_0000: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_CRC32_B; // crc32.b alu_operator_o = ALU_CRC32_B; // crc32.b
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end
7'b001_0001: begin 7'b001_0001: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_CRC32_H; // crc32.h alu_operator_o = ALU_CRC32_H; // crc32.h
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end
7'b001_0010: begin 7'b001_0010: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_CRC32_W; // crc32.w alu_operator_o = ALU_CRC32_W; // crc32.w
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end
7'b001_1000: begin 7'b001_1000: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_CRC32C_B; // crc32c.b alu_operator_o = ALU_CRC32C_B; // crc32c.b
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end
7'b001_1001: begin 7'b001_1001: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_CRC32C_H; // crc32c.h alu_operator_o = ALU_CRC32C_H; // crc32c.h
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end
7'b001_1010: begin 7'b001_1010: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_CRC32C_W; // crc32c.w alu_operator_o = ALU_CRC32C_W; // crc32c.w
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end
default: ; default: ;
endcase endcase
end end
@ -826,14 +867,14 @@ module ibex_decoder #(
end end
3'b101: begin 3'b101: begin
if (RV32B) begin if (RV32B != RV32BNone) begin
if (instr_alu[26] == 1'b1) begin if (instr_alu[26] == 1'b1) begin
alu_operator_o = ALU_FSR; alu_operator_o = ALU_FSR;
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
use_rs3 = 1'b0; use_rs3_d = 1'b1;
end else begin end else begin
use_rs3 = 1'b1; use_rs3_d = 1'b0;
end end
end else begin end else begin
unique case (instr_alu[31:27]) unique case (instr_alu[31:27])
@ -848,7 +889,11 @@ module ibex_decoder #(
5'b0_1101: alu_operator_o = ALU_GREV; // General Reverse with Imm Control Val 5'b0_1101: alu_operator_o = ALU_GREV; // General Reverse with Imm Control Val
5'b0_0101: alu_operator_o = ALU_GORC; // General Or-combine with Imm Control Val 5'b0_0101: alu_operator_o = ALU_GORC; // General Or-combine with Imm Control Val
// Unshuffle with Immediate Control Value // Unshuffle with Immediate Control Value
5'b0_0001: if (instr_alu[26] == 1'b0) alu_operator_o = ALU_UNSHFL; 5'b0_0001: begin
if (RV32B == RV32BFull) begin
if (instr_alu[26] == 1'b0) alu_operator_o = ALU_UNSHFL;
end
end
default: ; default: ;
endcase endcase
end end
@ -871,42 +916,42 @@ module ibex_decoder #(
alu_op_b_mux_sel_o = OP_B_REG_B; alu_op_b_mux_sel_o = OP_B_REG_B;
if (instr_alu[26]) begin if (instr_alu[26]) begin
if (RV32B) begin if (RV32B != RV32BNone) begin
unique case ({instr_alu[26:25], instr_alu[14:12]}) unique case ({instr_alu[26:25], instr_alu[14:12]})
{2'b11, 3'b001}: begin {2'b11, 3'b001}: begin
alu_operator_o = ALU_CMIX; // cmix alu_operator_o = ALU_CMIX; // cmix
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
use_rs3 = 1'b0; use_rs3_d = 1'b1;
end else begin end else begin
use_rs3 = 1'b1; use_rs3_d = 1'b0;
end end
end end
{2'b11, 3'b101}: begin {2'b11, 3'b101}: begin
alu_operator_o = ALU_CMOV; // cmov alu_operator_o = ALU_CMOV; // cmov
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
use_rs3 = 1'b0; use_rs3_d = 1'b1;
end else begin end else begin
use_rs3 = 1'b1; use_rs3_d = 1'b0;
end end
end end
{2'b10, 3'b001}: begin {2'b10, 3'b001}: begin
alu_operator_o = ALU_FSL; // fsl alu_operator_o = ALU_FSL; // fsl
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
use_rs3 = 1'b0; use_rs3_d = 1'b1;
end else begin end else begin
use_rs3 = 1'b1; use_rs3_d = 1'b0;
end end
end end
{2'b10, 3'b101}: begin {2'b10, 3'b101}: begin
alu_operator_o = ALU_FSR; // fsr alu_operator_o = ALU_FSR; // fsr
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
if (instr_first_cycle_i) begin if (instr_first_cycle_i) begin
use_rs3 = 1'b0; use_rs3_d = 1'b1;
end else begin end else begin
use_rs3 = 1'b1; use_rs3_d = 1'b0;
end end
end end
default: ; default: ;
@ -927,56 +972,67 @@ module ibex_decoder #(
{7'b010_0000, 3'b101}: alu_operator_o = ALU_SRA; // Shift Right Arithmetic {7'b010_0000, 3'b101}: alu_operator_o = ALU_SRA; // Shift Right Arithmetic
// RV32B ALU Operations // RV32B ALU Operations
{7'b001_0000, 3'b001}: if (RV32B) alu_operator_o = ALU_SLO; // slo {7'b001_0000, 3'b001}: if (RV32B != RV32BNone) alu_operator_o = ALU_SLO; // slo
{7'b001_0000, 3'b101}: if (RV32B) alu_operator_o = ALU_SRO; // sro {7'b001_0000, 3'b101}: if (RV32B != RV32BNone) alu_operator_o = ALU_SRO; // sro
{7'b011_0000, 3'b001}: begin {7'b011_0000, 3'b001}: begin
if (RV32B) begin if (RV32B != RV32BNone) begin
alu_operator_o = ALU_ROL; // rol alu_operator_o = ALU_ROL; // rol
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end end
{7'b011_0000, 3'b101}: begin {7'b011_0000, 3'b101}: begin
if (RV32B) begin if (RV32B != RV32BNone) begin
alu_operator_o = ALU_ROR; // ror alu_operator_o = ALU_ROR; // ror
alu_multicycle_o = 1'b1; alu_multicycle_o = 1'b1;
end end
end end
{7'b000_0101, 3'b100}: if (RV32B) alu_operator_o = ALU_MIN; // min {7'b000_0101, 3'b100}: if (RV32B != RV32BNone) alu_operator_o = ALU_MIN; // min
{7'b000_0101, 3'b101}: if (RV32B) alu_operator_o = ALU_MAX; // max {7'b000_0101, 3'b101}: if (RV32B != RV32BNone) alu_operator_o = ALU_MAX; // max
{7'b000_0101, 3'b110}: if (RV32B) alu_operator_o = ALU_MINU; // minu {7'b000_0101, 3'b110}: if (RV32B != RV32BNone) alu_operator_o = ALU_MINU; // minu
{7'b000_0101, 3'b111}: if (RV32B) alu_operator_o = ALU_MAXU; // maxu {7'b000_0101, 3'b111}: if (RV32B != RV32BNone) alu_operator_o = ALU_MAXU; // maxu
{7'b000_0100, 3'b100}: if (RV32B) alu_operator_o = ALU_PACK; // pack {7'b000_0100, 3'b100}: if (RV32B != RV32BNone) alu_operator_o = ALU_PACK; // pack
{7'b010_0100, 3'b100}: if (RV32B) alu_operator_o = ALU_PACKU; // packu {7'b010_0100, 3'b100}: if (RV32B != RV32BNone) alu_operator_o = ALU_PACKU; // packu
{7'b000_0100, 3'b111}: if (RV32B) alu_operator_o = ALU_PACKH; // packh {7'b000_0100, 3'b111}: if (RV32B != RV32BNone) alu_operator_o = ALU_PACKH; // packh
{7'b010_0000, 3'b100}: if (RV32B) alu_operator_o = ALU_XNOR; // xnor {7'b010_0000, 3'b100}: if (RV32B != RV32BNone) alu_operator_o = ALU_XNOR; // xnor
{7'b010_0000, 3'b110}: if (RV32B) alu_operator_o = ALU_ORN; // orn {7'b010_0000, 3'b110}: if (RV32B != RV32BNone) alu_operator_o = ALU_ORN; // orn
{7'b010_0000, 3'b111}: if (RV32B) alu_operator_o = ALU_ANDN; // andn {7'b010_0000, 3'b111}: if (RV32B != RV32BNone) alu_operator_o = ALU_ANDN; // andn
// RV32B zbp
{7'b011_0100, 3'b101}: if (RV32B) alu_operator_o = ALU_GREV; // grev
{7'b001_0100, 3'b101}: if (RV32B) alu_operator_o = ALU_GORC; // gorc
{7'b000_0100, 3'b001}: if (RV32B) alu_operator_o = ALU_SHFL; // shfl
{7'b000_0100, 3'b101}: if (RV32B) alu_operator_o = ALU_UNSHFL; // unshfl
// RV32B zbs // RV32B zbs
{7'b010_0100, 3'b001}: if (RV32B) alu_operator_o = ALU_SBCLR; // sbclr {7'b010_0100, 3'b001}: if (RV32B != RV32BNone) alu_operator_o = ALU_SBCLR; // sbclr
{7'b001_0100, 3'b001}: if (RV32B) alu_operator_o = ALU_SBSET; // sbset {7'b001_0100, 3'b001}: if (RV32B != RV32BNone) alu_operator_o = ALU_SBSET; // sbset
{7'b011_0100, 3'b001}: if (RV32B) alu_operator_o = ALU_SBINV; // sbinv {7'b011_0100, 3'b001}: if (RV32B != RV32BNone) alu_operator_o = ALU_SBINV; // sbinv
{7'b010_0100, 3'b101}: if (RV32B) alu_operator_o = ALU_SBEXT; // sbext {7'b010_0100, 3'b101}: if (RV32B != RV32BNone) alu_operator_o = ALU_SBEXT; // sbext
// RV32B zbf
{7'b010_0100, 3'b111}: if (RV32B != RV32BNone) alu_operator_o = ALU_BFP; // bfp
// RV32B zbp
{7'b011_0100, 3'b101}: if (RV32B != RV32BNone) alu_operator_o = ALU_GREV; // grev
{7'b001_0100, 3'b101}: if (RV32B != RV32BNone) alu_operator_o = ALU_GORC; // gorc
{7'b000_0100, 3'b001}: if (RV32B == RV32BFull) alu_operator_o = ALU_SHFL; // shfl
{7'b000_0100, 3'b101}: if (RV32B == RV32BFull) alu_operator_o = ALU_UNSHFL; // unshfl
// RV32B zbc // RV32B zbc
{7'b000_0101, 3'b001}: if (RV32B) alu_operator_o = ALU_CLMUL; // clmul {7'b000_0101, 3'b001}: if (RV32B == RV32BFull) alu_operator_o = ALU_CLMUL; // clmul
{7'b000_0101, 3'b010}: if (RV32B) alu_operator_o = ALU_CLMULR; // clmulr {7'b000_0101, 3'b010}: if (RV32B == RV32BFull) alu_operator_o = ALU_CLMULR; // clmulr
{7'b000_0101, 3'b011}: if (RV32B) alu_operator_o = ALU_CLMULH; // clmulh {7'b000_0101, 3'b011}: if (RV32B == RV32BFull) alu_operator_o = ALU_CLMULH; // clmulh
// RV32B zbe // RV32B zbe
{7'b010_0100, 3'b110}: if (RV32B) alu_operator_o = ALU_BDEP; // bdep {7'b010_0100, 3'b110}: begin
{7'b000_0100, 3'b110}: if (RV32B) alu_operator_o = ALU_BEXT; // bext if (RV32B == RV32BFull) begin
// RV32B zbf alu_operator_o = ALU_BDEP; // bdep
{7'b010_0100, 3'b111}: if (RV32B) alu_operator_o = ALU_BFP; // bfp alu_multicycle_o = 1'b1;
end
end
{7'b000_0100, 3'b110}: begin
if (RV32B == RV32BFull) begin
alu_operator_o = ALU_BEXT; // bext
alu_multicycle_o = 1'b1;
end
end
// RV32M instructions, all use the same ALU operation // RV32M instructions, all use the same ALU operation
{7'b000_0001, 3'b000}: begin // mul {7'b000_0001, 3'b000}: begin // mul
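The decoder hunks above gate each bit manipulation instruction on the enumerated `RV32B` parameter: instructions available in both implementations test `RV32B != RV32BNone`, while those reserved for the full implementation test `RV32B == RV32BFull`. The following is a minimal stand-alone sketch of that pattern, not part of the commit. The package `rv32b_sketch_pkg`, the module `rv32b_gate_sketch` and its two classification inputs are hypothetical names introduced for illustration; the real decoder derives the classification from the instruction bits and also has Balanced-specific cases (rev, rev8, orc.b) that are not modelled here.

```systemverilog
// Hypothetical local copy of the enum so that the sketch is self-contained.
package rv32b_sketch_pkg;
  typedef enum integer {
    RV32BNone,
    RV32BBalanced,
    RV32BFull
  } rv32b_e;
endpackage

// Illustrative only: names and ports are assumptions, not Ibex signals.
module rv32b_gate_sketch #(
  parameter rv32b_sketch_pkg::rv32b_e RV32B = rv32b_sketch_pkg::RV32BNone
) (
  input  logic is_common_b_insn_i,  // assumed: instruction offered by Balanced and Full
  input  logic is_full_only_insn_i, // assumed: instruction offered by Full only (e.g. crc32, clmul)
  output logic illegal_insn_o
);
  import rv32b_sketch_pkg::*;

  always_comb begin
    // Anything not recognised is illegal by default.
    illegal_insn_o = 1'b1;
    if (is_common_b_insn_i) begin
      illegal_insn_o = (RV32B != RV32BNone) ? 1'b0 : 1'b1;
    end
    if (is_full_only_insn_i) begin
      illegal_insn_o = (RV32B == RV32BFull) ? 1'b0 : 1'b1;
    end
  end
endmodule
```

Because `RV32B` is a parameter, both comparisons are constant at elaboration time, so a synthesis tool can be expected to remove the logic for unsupported instructions.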

@ -10,7 +10,7 @@
*/ */
module ibex_ex_block #( module ibex_ex_block #(
parameter bit RV32M = 1, parameter bit RV32M = 1,
parameter bit RV32B = 0, parameter ibex_pkg::rv32b_e RV32B = ibex_pkg::RV32BNone,
parameter bit BranchTargetALU = 0, parameter bit BranchTargetALU = 0,
parameter MultiplierImplementation = "fast" parameter MultiplierImplementation = "fast"
) ( ) (
@ -41,9 +41,9 @@ module ibex_ex_block #(
input logic data_ind_timing_i, input logic data_ind_timing_i,
// intermediate val reg // intermediate val reg
output logic imd_val_we_o, output logic [1:0] imd_val_we_o,
output logic [33:0] imd_val_d_o, output logic [33:0] imd_val_d_o[2],
input logic [33:0] imd_val_q_i, input logic [33:0] imd_val_q_i[2],
// Outputs // Outputs
output logic [31:0] alu_adder_result_ex_o, // to LSU output logic [31:0] alu_adder_result_ex_o, // to LSU
@ -63,10 +63,11 @@ module ibex_ex_block #(
logic alu_cmp_result, alu_is_equal_result; logic alu_cmp_result, alu_is_equal_result;
logic multdiv_valid; logic multdiv_valid;
logic multdiv_sel; logic multdiv_sel;
logic [31:0] alu_imd_val_d; logic [31:0] alu_imd_val_q[2];
logic alu_imd_val_we; logic [31:0] alu_imd_val_d[2];
logic [33:0] multdiv_imd_val_d; logic [ 1:0] alu_imd_val_we;
logic multdiv_imd_val_we; logic [33:0] multdiv_imd_val_d[2];
logic [ 1:0] multdiv_imd_val_we;
/* /*
The multdiv_i output is never selected if RV32M=0 The multdiv_i output is never selected if RV32M=0
@ -80,9 +81,12 @@ module ibex_ex_block #(
end end
// Intermediate Value Register Mux // Intermediate Value Register Mux
assign imd_val_d_o = multdiv_sel ? multdiv_imd_val_d : {2'b0, alu_imd_val_d}; assign imd_val_d_o[0] = multdiv_sel ? multdiv_imd_val_d[0] : {2'b0, alu_imd_val_d[0]};
assign imd_val_d_o[1] = multdiv_sel ? multdiv_imd_val_d[1] : {2'b0, alu_imd_val_d[1]};
assign imd_val_we_o = multdiv_sel ? multdiv_imd_val_we : alu_imd_val_we; assign imd_val_we_o = multdiv_sel ? multdiv_imd_val_we : alu_imd_val_we;
assign alu_imd_val_q = '{imd_val_q_i[0][31:0], imd_val_q_i[1][31:0]};
assign result_ex_o = multdiv_sel ? multdiv_result : alu_result; assign result_ex_o = multdiv_sel ? multdiv_result : alu_result;
// branch handling // branch handling
@ -117,7 +121,7 @@ module ibex_ex_block #(
.operand_a_i ( alu_operand_a_i ), .operand_a_i ( alu_operand_a_i ),
.operand_b_i ( alu_operand_b_i ), .operand_b_i ( alu_operand_b_i ),
.instr_first_cycle_i ( alu_instr_first_cycle_i ), .instr_first_cycle_i ( alu_instr_first_cycle_i ),
.imd_val_q_i ( imd_val_q_i[31:0] ), .imd_val_q_i ( alu_imd_val_q ),
.imd_val_we_o ( alu_imd_val_we ), .imd_val_we_o ( alu_imd_val_we ),
.imd_val_d_o ( alu_imd_val_d ), .imd_val_d_o ( alu_imd_val_d ),
.multdiv_operand_a_i ( multdiv_alu_operand_a ), .multdiv_operand_a_i ( multdiv_alu_operand_a ),
@ -218,6 +222,6 @@ module ibex_ex_block #(
// Multiplier/divider may require multiple cycles. The ALU output is valid in the same cycle // Multiplier/divider may require multiple cycles. The ALU output is valid in the same cycle
// unless the intermediate result register is being written (which indicates this isn't the // unless the intermediate result register is being written (which indicates this isn't the
// final cycle of ALU operation). // final cycle of ALU operation).
assign ex_valid_o = multdiv_sel ? multdiv_valid : !alu_imd_val_we; assign ex_valid_o = multdiv_sel ? multdiv_valid : ~(|alu_imd_val_we);
endmodule endmodule
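The `ibex_ex_block` changes above widen the shared intermediate value register to two entries, each with its own write enable, and treat the ALU as finished only when neither entry is being written. The following is a minimal sketch of that arrangement under stated assumptions, not the commit's code. The module name `imd_val_share_sketch` and its port names are hypothetical, and the sketch folds the register itself (which the commit keeps in `ibex_id_stage`) into the same module for brevity.

```systemverilog
// Hypothetical stand-alone sketch; module and port names are illustrative,
// not taken from the Ibex sources.
module imd_val_share_sketch (
  input  logic        clk_i,
  input  logic        rst_ni,
  input  logic        multdiv_sel_i,           // 1: multiplier/divider owns the registers
  input  logic [33:0] multdiv_imd_val_d_i[2],
  input  logic [ 1:0] multdiv_imd_val_we_i,
  input  logic [31:0] alu_imd_val_d_i[2],
  input  logic [ 1:0] alu_imd_val_we_i,
  output logic [33:0] imd_val_q_o[2],
  output logic        alu_done_o               // high when the ALU writes neither entry
);
  logic [33:0] imd_val_d[2];
  logic [ 1:0] imd_val_we;

  for (genvar i = 0; i < 2; i++) begin : gen_imd_val
    // ALU values are 32 bits wide and are zero-extended to the 34 bit register width.
    assign imd_val_d[i]  = multdiv_sel_i ? multdiv_imd_val_d_i[i] : {2'b0, alu_imd_val_d_i[i]};
    assign imd_val_we[i] = multdiv_sel_i ? multdiv_imd_val_we_i[i] : alu_imd_val_we_i[i];

    // One register per entry, updated only when its own write enable is set.
    always_ff @(posedge clk_i or negedge rst_ni) begin
      if (!rst_ni) begin
        imd_val_q_o[i] <= '0;
      end else if (imd_val_we[i]) begin
        imd_val_q_o[i] <= imd_val_d[i];
      end
    end
  end

  // A pending write to either entry means this is not the final ALU cycle.
  assign alu_done_o = ~(|alu_imd_val_we_i);
endmodule
```

The reduction OR matters once the write enable is a vector: a plain logical negation of a 2-bit signal would behave the same here, but the explicit `~(|...)` form states the intent that any set bit keeps the operation in flight.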

@ -19,7 +19,7 @@
module ibex_id_stage #( module ibex_id_stage #(
parameter bit RV32E = 0, parameter bit RV32E = 0,
parameter bit RV32M = 1, parameter bit RV32M = 1,
parameter bit RV32B = 0, parameter ibex_pkg::rv32b_e RV32B = ibex_pkg::RV32BNone,
parameter bit DataIndTiming = 1'b0, parameter bit DataIndTiming = 1'b0,
parameter bit BranchTargetALU = 0, parameter bit BranchTargetALU = 0,
parameter bit SpecBranch = 0, parameter bit SpecBranch = 0,
@ -68,9 +68,9 @@ module ibex_id_stage #(
output logic [31:0] alu_operand_b_ex_o, output logic [31:0] alu_operand_b_ex_o,
// Multicycle Operation Stage Register // Multicycle Operation Stage Register
input logic imd_val_we_ex_i, input logic [1:0] imd_val_we_ex_i,
input logic [33:0] imd_val_d_ex_i, input logic [33:0] imd_val_d_ex_i[2],
output logic [33:0] imd_val_q_ex_o, output logic [33:0] imd_val_q_ex_o[2],
// Branch target ALU // Branch target ALU
output logic [31:0] bt_a_operand_o, output logic [31:0] bt_a_operand_o,
@ -247,7 +247,7 @@ module ibex_id_stage #(
logic alu_multicycle_dec; logic alu_multicycle_dec;
logic stall_alu; logic stall_alu;
logic [33:0] imd_val_q; logic [33:0] imd_val_q[2];
op_a_sel_e bt_a_mux_sel; op_a_sel_e bt_a_mux_sel;
imm_b_sel_e bt_b_mux_sel; imm_b_sel_e bt_b_mux_sel;
@ -379,11 +379,13 @@ module ibex_id_stage #(
// Multicycle Operation Stage Register // // Multicycle Operation Stage Register //
///////////////////////////////////////// /////////////////////////////////////////
for (genvar i=0; i<2; i++) begin : gen_intermediate_val_reg
always_ff @(posedge clk_i or negedge rst_ni) begin : intermediate_val_reg always_ff @(posedge clk_i or negedge rst_ni) begin : intermediate_val_reg
if (!rst_ni) begin if (!rst_ni) begin
imd_val_q <= '0; imd_val_q[i] <= '0;
end else if (imd_val_we_ex_i) begin end else if (imd_val_we_ex_i[i]) begin
imd_val_q <= imd_val_d_ex_i; imd_val_q[i] <= imd_val_d_ex_i[i];
end
end end
end end

@ -35,9 +35,9 @@ module ibex_multdiv_fast #(
output logic [32:0] alu_operand_a_o, output logic [32:0] alu_operand_a_o,
output logic [32:0] alu_operand_b_o, output logic [32:0] alu_operand_b_o,
input logic [33:0] imd_val_q_i, input logic [33:0] imd_val_q_i[2],
output logic [33:0] imd_val_d_o, output logic [33:0] imd_val_d_o[2],
output logic imd_val_we_o, output logic [1:0] imd_val_we_o,
input logic multdiv_ready_id_i, input logic multdiv_ready_id_i,
@ -99,13 +99,11 @@ module ibex_multdiv_fast #(
if (!rst_ni) begin if (!rst_ni) begin
div_counter_q <= '0; div_counter_q <= '0;
md_state_q <= MD_IDLE; md_state_q <= MD_IDLE;
op_denominator_q <= '0;
op_numerator_q <= '0; op_numerator_q <= '0;
op_quotient_q <= '0; op_quotient_q <= '0;
div_by_zero_q <= '0; div_by_zero_q <= '0;
end else if (div_en_internal) begin end else if (div_en_internal) begin
div_counter_q <= div_counter_d; div_counter_q <= div_counter_d;
op_denominator_q <= op_denominator_d;
op_numerator_q <= op_numerator_d; op_numerator_q <= op_numerator_d;
op_quotient_q <= op_quotient_d; op_quotient_q <= op_quotient_d;
md_state_q <= md_state_d; md_state_q <= md_state_d;
@ -113,18 +111,24 @@ module ibex_multdiv_fast #(
end end
end end
`ASSERT_KNOWN(DivEnKnown, div_en_internal); `ASSERT_KNOWN(DivEnKnown, div_en_internal);
`ASSERT_KNOWN(MultEnKnown, mult_en_internal); `ASSERT_KNOWN(MultEnKnown, mult_en_internal);
`ASSERT_KNOWN(MultDivEnKnown, multdiv_en); `ASSERT_KNOWN(MultDivEnKnown, multdiv_en);
assign multdiv_en = mult_en_internal | div_en_internal; assign multdiv_en = mult_en_internal | div_en_internal;
assign imd_val_d_o = div_sel_i ? op_remainder_d : mac_res_d; // Intermediate value register shared with ALU
assign imd_val_we_o = multdiv_en; assign imd_val_d_o[0] = div_sel_i ? op_remainder_d : mac_res_d;
assign imd_val_we_o[0] = multdiv_en;
assign imd_val_d_o[1] = {2'b0, op_denominator_d};
assign imd_val_we_o[1] = div_en_internal;
assign op_denominator_q = imd_val_q_i[1][31:0];
logic [1:0] unused_imd_val;
assign unused_imd_val = imd_val_q_i[1][33:32];
assign signed_mult = (signed_mode_i != 2'b00); assign signed_mult = (signed_mode_i != 2'b00);
assign multdiv_result_o = div_sel_i ? imd_val_q_i[31:0] : mac_res_d[31:0]; assign multdiv_result_o = div_sel_i ? imd_val_q_i[0][31:0] : mac_res_d[31:0];
// The single cycle multiplier uses three 17 bit multipliers to compute MUL instructions in a // The single cycle multiplier uses three 17 bit multipliers to compute MUL instructions in a
// single cycle and MULH instructions in two cycles. // single cycle and MULH instructions in two cycles.
@ -170,8 +174,8 @@ module ibex_multdiv_fast #(
assign mult2_op_b = op_b_i[`OP_H]; assign mult2_op_b = op_b_i[`OP_H];
// used in MULH // used in MULH
assign accum[17:0] = imd_val_q_i[33:16]; assign accum[17:0] = imd_val_q_i[0][33:16];
assign accum[33:18] = {16{signed_mult & imd_val_q_i[33]}}; assign accum[33:18] = {16{signed_mult & imd_val_q_i[0][33]}};
always_comb begin always_comb begin
// Default values == MULL // Default values == MULL
@ -268,7 +272,7 @@ module ibex_multdiv_fast #(
mult_op_b = op_b_i[`OP_L]; mult_op_b = op_b_i[`OP_L];
sign_a = 1'b0; sign_a = 1'b0;
sign_b = 1'b0; sign_b = 1'b0;
accum = imd_val_q_i; accum = imd_val_q_i[0];
mac_res_d = mac_res; mac_res_d = mac_res;
mult_state_d = mult_state_q; mult_state_d = mult_state_q;
mult_valid = 1'b0; mult_valid = 1'b0;
@ -293,10 +297,10 @@ module ibex_multdiv_fast #(
mult_op_b = op_b_i[`OP_H]; mult_op_b = op_b_i[`OP_H];
sign_a = 1'b0; sign_a = 1'b0;
sign_b = signed_mode_i[1] & op_b_i[31]; sign_b = signed_mode_i[1] & op_b_i[31];
// result of AL*BL (in imd_val_q_i) always unsigned with no carry, so carries_q always 00 // result of AL*BL (in imd_val_q_i[0]) always unsigned with no carry, so carries_q always 00
accum = {18'b0, imd_val_q_i[31:16]}; accum = {18'b0, imd_val_q_i[0][31:16]};
if (operator_i == MD_OP_MULL) begin if (operator_i == MD_OP_MULL) begin
mac_res_d = {2'b0, mac_res[`OP_L], imd_val_q_i[`OP_L]}; mac_res_d = {2'b0, mac_res[`OP_L], imd_val_q_i[0][`OP_L]};
end else begin end else begin
// MD_OP_MULH // MD_OP_MULH
mac_res_d = mac_res; mac_res_d = mac_res;
@ -311,15 +315,15 @@ module ibex_multdiv_fast #(
sign_a = signed_mode_i[0] & op_a_i[31]; sign_a = signed_mode_i[0] & op_a_i[31];
sign_b = 1'b0; sign_b = 1'b0;
if (operator_i == MD_OP_MULL) begin if (operator_i == MD_OP_MULL) begin
accum = {18'b0, imd_val_q_i[31:16]}; accum = {18'b0, imd_val_q_i[0][31:16]};
mac_res_d = {2'b0, mac_res[15:0], imd_val_q_i[15:0]}; mac_res_d = {2'b0, mac_res[15:0], imd_val_q_i[0][15:0]};
mult_valid = 1'b1; mult_valid = 1'b1;
// Note no state transition will occur if mult_hold is set // Note no state transition will occur if mult_hold is set
mult_state_d = ALBL; mult_state_d = ALBL;
mult_hold = ~multdiv_ready_id_i; mult_hold = ~multdiv_ready_id_i;
end else begin end else begin
accum = imd_val_q_i; accum = imd_val_q_i[0];
mac_res_d = mac_res; mac_res_d = mac_res;
mult_state_d = AHBH; mult_state_d = AHBH;
end end
@ -332,8 +336,8 @@ module ibex_multdiv_fast #(
mult_op_b = op_b_i[`OP_H]; mult_op_b = op_b_i[`OP_H];
sign_a = signed_mode_i[0] & op_a_i[31]; sign_a = signed_mode_i[0] & op_a_i[31];
sign_b = signed_mode_i[1] & op_b_i[31]; sign_b = signed_mode_i[1] & op_b_i[31];
accum[17: 0] = imd_val_q_i[33:16]; accum[17: 0] = imd_val_q_i[0][33:16];
accum[33:18] = {16{signed_mult & imd_val_q_i[33]}}; accum[33:18] = {16{signed_mult & imd_val_q_i[0][33]}};
// result of AH*BL is not signed only if signed_mode_i == 2'b00 // result of AH*BL is not signed only if signed_mode_i == 2'b00
mac_res_d = mac_res; mac_res_d = mac_res;
mult_valid = 1'b1; mult_valid = 1'b1;
@ -366,7 +370,7 @@ module ibex_multdiv_fast #(
// Divider // Divider
assign res_adder_h = alu_adder_ext_i[33:1]; assign res_adder_h = alu_adder_ext_i[33:1];
assign next_remainder = is_greater_equal ? res_adder_h[31:0] : imd_val_q_i[31:0]; assign next_remainder = is_greater_equal ? res_adder_h[31:0] : imd_val_q_i[0][31:0];
assign next_quotient = is_greater_equal ? {1'b0, op_quotient_q} | {1'b0, one_shift} : assign next_quotient = is_greater_equal ? {1'b0, op_quotient_q} | {1'b0, one_shift} :
{1'b0, op_quotient_q}; {1'b0, op_quotient_q};
@ -376,10 +380,10 @@ module ibex_multdiv_fast #(
// Remainder - Divisor. If Remainder - Divisor >= 0, is_greater_equal is equal to 1, // Remainder - Divisor. If Remainder - Divisor >= 0, is_greater_equal is equal to 1,
// the next Remainder is Remainder - Divisor contained in res_adder_h and the // the next Remainder is Remainder - Divisor contained in res_adder_h and the
always_comb begin always_comb begin
if ((imd_val_q_i[31] ^ op_denominator_q[31]) == 1'b0) begin if ((imd_val_q_i[0][31] ^ op_denominator_q[31]) == 1'b0) begin
is_greater_equal = (res_adder_h[31] == 1'b0); is_greater_equal = (res_adder_h[31] == 1'b0);
end else begin end else begin
is_greater_equal = imd_val_q_i[31]; is_greater_equal = imd_val_q_i[0][31];
end end
end end
@ -391,7 +395,7 @@ module ibex_multdiv_fast #(
always_comb begin always_comb begin
div_counter_d = div_counter_q - 5'h1; div_counter_d = div_counter_q - 5'h1;
op_remainder_d = imd_val_q_i; op_remainder_d = imd_val_q_i[0];
op_quotient_d = op_quotient_q; op_quotient_d = op_quotient_q;
md_state_d = md_state_q; md_state_d = md_state_q;
op_numerator_d = op_numerator_q; op_numerator_d = op_numerator_q;
@ -457,13 +461,13 @@ module ibex_multdiv_fast #(
op_quotient_d = next_quotient[31:0]; op_quotient_d = next_quotient[31:0];
md_state_d = (div_counter_q == 5'd1) ? MD_LAST : MD_COMP; md_state_d = (div_counter_q == 5'd1) ? MD_LAST : MD_COMP;
// Division // Division
alu_operand_a_o = {imd_val_q_i[31:0], 1'b1}; // it contains the remainder alu_operand_a_o = {imd_val_q_i[0][31:0], 1'b1}; // it contains the remainder
alu_operand_b_o = {~op_denominator_q[31:0], 1'b1}; // -denominator two's complement alu_operand_b_o = {~op_denominator_q[31:0], 1'b1}; // -denominator two's complement
end end
MD_LAST: begin MD_LAST: begin
if (operator_i == MD_OP_DIV) begin if (operator_i == MD_OP_DIV) begin
// this time we save the quotient in op_remainder_d (i.e. imd_val_q_i) since // this time we save the quotient in op_remainder_d (i.e. imd_val_q_i[0]) since
// we no longer need the remainder // we no longer need the remainder
op_remainder_d = {1'b0, next_quotient}; op_remainder_d = {1'b0, next_quotient};
end else begin end else begin
@ -471,7 +475,7 @@ module ibex_multdiv_fast #(
op_remainder_d = {2'b0, next_remainder[31:0]}; op_remainder_d = {2'b0, next_remainder[31:0]};
end end
// Division // Division
alu_operand_a_o = {imd_val_q_i[31:0], 1'b1}; // it contains the remainder alu_operand_a_o = {imd_val_q_i[0][31:0], 1'b1}; // it contains the remainder
alu_operand_b_o = {~op_denominator_q[31:0], 1'b1}; // -denominator two's complement alu_operand_b_o = {~op_denominator_q[31:0], 1'b1}; // -denominator two's complement
md_state_d = MD_CHANGE_SIGN; md_state_d = MD_CHANGE_SIGN;
@ -480,13 +484,13 @@ module ibex_multdiv_fast #(
MD_CHANGE_SIGN: begin MD_CHANGE_SIGN: begin
md_state_d = MD_FINISH; md_state_d = MD_FINISH;
if (operator_i == MD_OP_DIV) begin if (operator_i == MD_OP_DIV) begin
op_remainder_d = (div_change_sign) ? {2'h0, alu_adder_i} : imd_val_q_i; op_remainder_d = (div_change_sign) ? {2'h0, alu_adder_i} : imd_val_q_i[0];
end else begin end else begin
op_remainder_d = (rem_change_sign) ? {2'h0, alu_adder_i} : imd_val_q_i; op_remainder_d = (rem_change_sign) ? {2'h0, alu_adder_i} : imd_val_q_i[0];
end end
// ABS(Quotient) = 0 - Quotient (or Remainder) // ABS(Quotient) = 0 - Quotient (or Remainder)
alu_operand_a_o = {32'h0 , 1'b1}; alu_operand_a_o = {32'h0 , 1'b1};
alu_operand_b_o = {~imd_val_q_i[31:0], 1'b1}; alu_operand_b_o = {~imd_val_q_i[0][31:0], 1'b1};
end end
MD_FINISH: begin MD_FINISH: begin

@ -31,9 +31,9 @@ module ibex_multdiv_slow
output logic [32:0] alu_operand_a_o, output logic [32:0] alu_operand_a_o,
output logic [32:0] alu_operand_b_o, output logic [32:0] alu_operand_b_o,
input logic [33:0] imd_val_q_i, input logic [33:0] imd_val_q_i[2],
output logic [33:0] imd_val_d_o, output logic [33:0] imd_val_d_o[2],
output logic imd_val_we_o, output logic [1:0] imd_val_we_o,
input logic multdiv_ready_id_i, input logic multdiv_ready_id_i,
@ -50,7 +50,8 @@ module ibex_multdiv_slow
md_fsm_e md_state_q, md_state_d; md_fsm_e md_state_q, md_state_d;
logic [32:0] accum_window_q, accum_window_d; logic [32:0] accum_window_q, accum_window_d;
logic unused_imd_val; logic unused_imd_val0;
logic [ 1:0] unused_imd_val1;
logic [32:0] res_adder_l; logic [32:0] res_adder_l;
logic [32:0] res_adder_h; logic [32:0] res_adder_h;
@ -81,11 +82,16 @@ module ibex_multdiv_slow
// ALU Operand MUX // // ALU Operand MUX //
///////////////////// /////////////////////
// Use shared intermediate value register in id_stage for accum_window // Intermediate value register shared with ALU
assign imd_val_d_o = {1'b0,accum_window_d}; assign imd_val_d_o[0] = {1'b0,accum_window_d};
assign imd_val_we_o = ~multdiv_hold; assign imd_val_we_o[0] = ~multdiv_hold;
assign accum_window_q = imd_val_q_i[32:0]; assign accum_window_q = imd_val_q_i[0][32:0];
assign unused_imd_val = imd_val_q_i[33]; assign unused_imd_val0 = imd_val_q_i[0][33];
assign imd_val_d_o[1] = {2'b00, op_numerator_d};
assign imd_val_we_o[1] = multdiv_en;
assign op_numerator_q = imd_val_q_i[1][31:0];
assign unused_imd_val1 = imd_val_q_i[1][33:32];
always_comb begin always_comb begin
alu_operand_a_o = accum_window_q; alu_operand_a_o = accum_window_q;
@ -328,14 +334,12 @@ module ibex_multdiv_slow
multdiv_count_q <= 5'h0; multdiv_count_q <= 5'h0;
op_b_shift_q <= 33'h0; op_b_shift_q <= 33'h0;
op_a_shift_q <= 33'h0; op_a_shift_q <= 33'h0;
op_numerator_q <= 32'h0;
md_state_q <= MD_IDLE; md_state_q <= MD_IDLE;
div_by_zero_q <= 1'b0; div_by_zero_q <= 1'b0;
end else if (multdiv_en) begin end else if (multdiv_en) begin
multdiv_count_q <= multdiv_count_d; multdiv_count_q <= multdiv_count_d;
op_b_shift_q <= op_b_shift_d; op_b_shift_q <= op_b_shift_d;
op_a_shift_q <= op_a_shift_d; op_a_shift_q <= op_a_shift_d;
op_numerator_q <= op_numerator_d;
md_state_q <= md_state_d; md_state_q <= md_state_d;
div_by_zero_q <= div_by_zero_d; div_by_zero_q <= div_by_zero_d;
end end

@ -8,6 +8,15 @@
*/ */
package ibex_pkg; package ibex_pkg;
//////////////////////////
// RV32B Parameter Enum //
//////////////////////////
typedef enum integer {
RV32BNone,
RV32BBalanced,
RV32BFull
} rv32b_e;
///////////// /////////////
// Opcodes // // Opcodes //