ibex/doc/pipeline_details.rst
ganoam 48c4b6a5ea [rtl] Add Single Cycle Multiplier targeting FPGA
* Integrate option to implement a multiplier using 3 parallel 17 bit
        multipliers in order to compute MUL instructions in 1 cycle
        MULH in 2 cycles.

* Add parameter SingleCycleMultiply to select single cycle
        multiplication.

The single cycle multiplication capability is intended for FPGA
targets. Using three parallel multiplication units improves performance
of multiplication operations at the cost of DSP primitives. For ASIC
targets, the area consumed by the multiplication structure will grow
approximately 3-4x.

The functionality is selected within the module using the parameter
`SingleCycleMultiply`. From the top level it can be chosen by setting
the parameter `MultiplierImplementation` to 'single_cc'.

Signed-off-by: ganoam <gnoam@live.com>
2020-02-11 16:09:41 +01:00

89 lines
8.6 KiB
ReStructuredText

.. _pipeline-details:
.. figure:: images/blockdiagram.svg
:name: ibex-pipeline
:align: center
Ibex Pipeline
Pipeline Details
================
Ibex has a 2-stage pipeline, the 2 stages are:
Instruction Fetch (IF)
Fetches instructions from memory via a prefetch buffer, capable of fetching 1 instruction per cycle if the instruction side memory system allows. See :ref:`instruction-fetch` for details.
Instruction Decode and Execute (ID/EX)
Decodes fetched instruction and immediately executes it, register read and write all occurs in this stage.
Multi-cycle instructions will stall this stage until they are complete See :ref:`instruction-decode-execute` for details.
All instructions require two cycles minimum to pass down the pipeline.
One cycle in the IF stage and one in the ID/EX stage.
Not all instructions can complete in the ID/EX stage in one cycle so will stall there until they complete.
This means the maximum IPC (Instructions per Cycle) Ibex can achieve is 1 when multi-cycle instructions aren't used.
See Multi- and Single-Cycle Instructions below for the details.
Multi- and Single-Cycle Instructions
------------------------------------
In the table below when an instruction stalls for X cycles X + 1 cycles pass before a new instruction enters the ID/EX stage.
Some instructions stall for a variable time, this is indicated as a range e.g. 1 - N means the instruction stalls a minimum of 1 cycle with an indeterminate maximum cycles.
Read the description for more information.
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Instruction Type | Stall Cycles | Description |
+=======================+======================================+=============================================================+
| Integer Computational | 0 | Integer Computational Instructions are defined in the |
| | | RISCV-V RV32I Base Integer Instruction Set. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| CSR Access | 0 | CSR Access Instruction are defined in 'Zicsr' of the |
| | | RISC-V specification. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Load/Store | 1 - N | Both loads and stores stall for at least one cycle to await |
| | | a response. For loads this response is the load data |
| | | (which is written directly to the register file the same |
| | | cycle it is received). For stores this is whether an error |
| | | was seen or not. The longer the data side memory interface |
| | | takes to receive a response the longer loads and stores |
| | | will stall. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Multiplication | 0/1 (Single-Cycle Multiplier) | 0 for MUL, 1 for MULH. |
| | | |
| | 2/3 (Fast Multi-Cycle Multiplier) | 2 for MUL, 3 for MULH. |
| | | |
| | clog2(``op_b``)/32 (Slow Multi-Cycle | clog2(``op_b``) for MUL, 32 for MULH. |
| | Multiplier) | See details in :ref:`mult-div`. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Division | 1 or 37 | 1 stall cycle if divide by 0, otherwise full long division. |
| | | See details in :ref:`mult-div` |
| Remainder | | |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Jump | 1 - N | Minimum one cycle stall to flush the prefetch counter and |
| | | begin fetching from the new Program Counter (PC). The new |
| | | PC request will appear on the instruction-side memory |
| | | interface the same cycle the jump instruction enters ID/EX. |
| | | The longer the instruction-side memory interface takes to |
| | | receive data the longer the jump will stall. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Branch (Not-Taken) | 0 | Any branch where the condition is not met will |
| | | not stall. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Branch (Taken) | 2 - N | Any branch where the condition is met will stall for 2 |
| | | cycles as in the first cycle the branch is in ID/EX the ALU |
| | 1 - N (Branch Target | is used to calculate the branch condition. The following |
| | ALU enabled) | cycle the ALU is used again to calculate the branch target |
| | | where it proceeds as Jump does above (Flush IF stage and |
| | | prefetch buffer, new PC on instruction-side memory |
| | | interface the same cycle it is calculated). The longer the |
| | | instruction-side memory interface takes to receive data the |
| | | longer the branch will stall. With the parameter |
| | | ``BranchTargetALU`` set to ``1`` a seperate ALU calculates |
| | | the branch target simultaneously to calculating the branch |
| | | condition with the main ALU so 1 less stall cycle is |
| | | required. |
+-----------------------+--------------------------------------+-------------------------------------------------------------+
| Instruction Fence | 1 - N | The FENCE.I instruction as defined in 'Zifencei' of the |
| | | RISC-V specification. Internally it is implemented as a |
| | | jump (which does the required flushing) so it has the same |
| | | stall characteristics (see above). |
+-----------------------+--------------------------------------+-------------------------------------------------------------+