This commit replaces the previous combination of `RV32M` bit parameter
used to en/disable the M extension and the `MultiplierImplementation`
used to select the multiplier implementation by a single enum parameter.
Signed-off-by: Pirmin Vogel <vogelpi@lowrisc.org>
This commit contains some final optimizations regarding the bit
manipulation extension as well as the parametrization into a balanced
version and a full performance version.
Balanced Version:
* Supports ZBB, ZBS, ZBF and ZBT extensions
* Dual cycle instructions:
ror[i], rol, cmov, cmix fsl, fsr[i]
* Everything else completes in a single cycle.
Full Version:
* Supports all 32b sub extensions.
* Dual cycle instructions:
ror[i], rol, cmov, cmix fsl, fsr[i], crc32[c], bext, bdep
* Everything else completes in a single cycle.
Notable Changes:
* bext/bdep are now multi-cycle: Sharing additional register
with multiplier module
* grev/gorc instructions are implemented in separate structures
rather than sharing the shifter or butterfly network.
* Speed up decision on using rs1 or rs3 for alu_operand_a by
introducing single-bit register, to identify ternary
instructions in their first cycle.
* Introduce enumerated parameter to chose bit manipulation
implementation
Signed-off-by: ganoam <gnoam@live.com>
This commit adds support for the shared immediate value register in the
id_stage for the slow implementation of the multdiv module.
Register accum_window_q is now stored in the intermediate value
register.
Signed-off-by: ganoam <gnoam@live.com>
This commits implements the Bit Manipulateion Extension ZBT instruction
group: cmix, cmov, fsr[i] and fsl. Those are instructions depend on
three ALU operands. Completeion of these instructions takes 2 clock
cycles. Additionally, the rotation shifts rol and ror are made
multicycle instructions.
All multicycle instructions take exactly two cycles to complete.
Architectural additions:
* Multicycle Stage Register in ID stage.
multicycle_op_stage_reg
* Decoder generates alu_multicycle signal, to stall pipeline
* For all ternary instructions:
1. cycle: connect alu operands a and b to rs1 and rs2
respectively
2. cycle: connect operands a and be to rs3 and rs2
respectively
* Reduce the physical size of the shifter from 64 bit to 63
bit: 32-bit operand + 1 bit for arithmetic / one-shift
* Make rotation shifts multicycle instructions.
Instruction Details:
* cmov:
1. store operand a (rs1) in stage reg.
2. return stage reg output (rs2) or rs3.
if rs2 != 0 the output (rs1) is already known in the
first cycle. -> variable latency implementation is
possible.
* cmix:
1. store rs1 & rs2 in stage reg
2. return stage_reg_q | (rs2 & ~rs3)
reusing bwlogic from zbb
* rol/ror: (here: ror)
shift_amt = rs2 & 31;
shift_amt_compl = (32 - shift_amt) & 31
1. store (rs1 >> shift_amt) in stage reg
2. return (rs1 << shift_amt_compl) | stage_reg_q
* fsl/fsr:
For funnel shifts, the order of applying the shift
amount or its complement is determined by bit [5] of
shift_amt. Pseudocode for fsr:
shift_amt = rs2 & 63
shift_amt_compl = (32 - shift_amt[4:0])
1. if (shift_amt >= 33):
store (rs1 >> shift_amt_compl[4:0]) in stage reg
else if (shift_amt <0 && shift_amt <= 31):
store (rs1 << shift_amt[4:0]) in stage reg
else if (shift_amt == 32 || shift_amt == 0):
store rs1 in stage reg
2. if (shift_amt >= 33):
return stage_reg_q | (rs3 << shift_amt[4:0])
else if (shift_amt <0 && shift_amt <= 31):
return stage_reg_q | (rs3 >> shift_amt_compl[4:0])
else if (shift_amt == 32):
return rs3
else if (shift_amt == 0):
return rs1
Signed-off-by: ganoam <gnoam@live.com>
This commit implements the Bit Manipulation Extension ZBB instruction
group: clz, ctz, pcnt, slo, sro, rol, ror, rev, rev8, orcb, pack
packu, packh, min, max, andn, orn, and xnor.
* Bit counting instructions clz, ctz and pcnt can be implemented to
share much of the architecture:
clz: Count Leading Zeros. Counts the number of 0 bits at the
MSB end of the argument.
ctz: Count Trailing Zeros. Counts the number of 0 bits at the
LSB end of the argument.
pcnt: Counts the number of set bits of the argument.
The implementation uses:
- 32 one bit adders, counting the set bits of a signal
bitcnt_bits, starting from the LSB end.
- For pcnt the argument is fed directly into bitcnt_bits.
- For clz, the operand is reversed such that leading zeros are
located at the LSB end of bitcnt_bits.
- For ctz and clz: counter enable signal for 1-bit counter i
is high, if the previous enable signal, and
its corresponting bitcnt_bit was high.
* Instructions sll[i], srl[i],slo[i], sro[i], rol, ror[i], rev, rev8
and orc.b are summarized as shifting instructions and related:
The following instructions are slight variations of the
existing base spec's sll, srl and sra instructions.
- slo[i] and sro[i]: shift left/right ones: similar to
shift-logical operations from base spec, but shifting
in ones instead of zeros.
- rol and ror[i]: rotate left/right ones: circular shift
operations. shifting in values from the oposite end
of the operand instead of zeros.
Those instructions can be implemented, sharing the base spec's
shifting structure. In order to support rotate operations, a
64-bit shifting structure is needed.
In the existing ALU, hardware is described only for right
shifts. For left shifts the operand is initially reversed,
right shifted and the result is reversed back. This gives rise
to an additional resource sharing oportunity for some more
zbb operations:
- rev: bitwise reversal.
- rev8: byte-order swap.
- orc.b: byte-wise reverse and or-combine.
* Instructions min, max:
For the B-extension's min/max instructions, we can share the
existing comparison operations. The result is obtained by
activating the comparison structure accordingly and
multiplexing the operands using the comparison result.
* Logic-with-negate instructions andn, orn, xnor:
For the B-extension's logic-with-negate instructions we can
share the structures of the base spec's logic structures
already present for 'xnor', 'or' and 'and' instructions as
well as the conditionally negated b operand generated for
subtraction operations.
* Instructions pack, packu, packh:
For the pack, packh and packu instructions I don't see any
opportunities for resource sharing. However, the architecture
is quite simple.
- pack: pack the lower halves of rs1 and rs2 into rd, with rs1
in the lower half and rs2 in the upper half.
- packu: pack the upper halves of rs1 and rs2 into rd, with
rs1 in the lower half and rs2 in the upper half.
- packh: pack the LSB bytes of rs1 and rs2 into rd, with rs1
in the lower half and rs2 in the upper half.
Signed-off-by: ganoam <gnoam@live.com>
- Create separate operand muxes for the branch/jump target ALU
- Complete jump instructions in one cycle when BT ALU configured
Signed-off-by: Tom Roberts <tomroberts@lowrisc.org>
The third pipeline stage is a new writeback stage. Ibex can now be
configured as the original two stage design or the new three stage
design using the `WritebackStage` parameter in ibex_core. This defaults
to 0 (giving the original two stage design).
The three stage design is *EXPERIMENTAL*
In the three stage design all register write back occurs in the third,
final stage. This allows a cycle for responses to loads and stores so
when the memory system can respond in a single cycle there will be no
stall. This offers significant performance benefits.
Documentation of the three stage design is still to be written so
existing documentation applies to the two stage design only as various
aspects of Ibex behaviour will change in the three stage design.
Signed-off-by: Greg Chadwick <gac@lowrisc.org>
* Integrate option to implement a multiplier using 3 parallel 17 bit
multipliers in order to compute MUL instructions in 1 cycle
MULH in 2 cycles.
* Add parameter SingleCycleMultiply to select single cycle
multiplication.
The single cycle multiplication capability is intended for FPGA
targets. Using three parallel multiplication units improves performance
of multiplication operations at the cost of DSP primitives. For ASIC
targets, the area consumed by the multiplication structure will grow
approximately 3-4x.
The functionality is selected within the module using the parameter
`SingleCycleMultiply`. From the top level it can be chosen by setting
the parameter `MultiplierImplementation` to 'single_cc'.
Signed-off-by: ganoam <gnoam@live.com>
multdiv_sel signals the mult/div operand should be selected for the ALU
inputs. Previously the mult_en/div_en signals were used but these factor
in whether the instruction is actually happening which is not relevant
for the mux select. The dedicated select signal gives better timing.
We currently have a documentation block at the beginning of each file,
containing author credits and module-level documentation. The
module-level documentation is retained for historic reasons and
duplicated with the newer comments below it.
For the authors, maintaining author credits in the file is error-prone,
as this information gets outdated very soon. A more reliable way to see
who modified a file is to use the history information in git.
Additionally, we now have the CREDITS.md file, which lists all
contributors, even the ones which don't appear in the git history (e.g.
because the code was copied and commited by someone else).
This file doesn't contain defines any more, but a normal SV package.
The diff is best viewed without whitespace changes, as the reindents
cause a lof of diff noise.
Fixeslowrisc/ibex#173
The EX block actually signals when its output is valid, and not when it
is ready to accept new input. The LSU valid signal is not needed inside
the EX block and can thus be fed directly to the ID stage.
- Move ibex_tracer_defines.sv and ibex_defines.sv out of the 'include'
directory, since these files are not actually included.
- Remove ibex_config.sv, it's mostly unused code. The remaining defines,
SYNTHESIS, ASIC_SYNTHESIS, TRACE_EXECUTION, and CHECK_MISALIGNED should
be set through command-line flags to the simulation/synthesis tools.
Initial version by Nils Gräf.
The code base made extensive use of ASCII art headings/subheadings in
comments to delineate code. Switch to a more space efficient and easier
to edit format:
/////////
// Foo //
/////////
This change has been informed by advice from the lowRISC legal
committee.
The Solderpad 0.51 license states "the Licensor permits any Work
licensed under this License, at the option of the Licensee, to be
treated as licensed under the Apache License Version 2.0". We use this
freedom to convert license markings to Apache 2.0. This commit ensures
that we retain all authorship and copyright attribution information.