[docs/userguide] reworked section "Adding Custom Hardware Modules"

stnolting 2022-01-29 15:00:02 +01:00
parent 67449f4981
commit 0e3a715e17


@ -4,6 +4,7 @@
In resemblance to the RISC-V ISA, the NEORV32 processor was designed to ease customization and _extensibility_.
The processor provides several predefined options to add application-specific custom hardware modules and accelerators.
A <<_comparative_summary>> is given at the end of this section.
=== Standard (_External_) Interfaces
@ -15,47 +16,84 @@ https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_an
https://stnolting.github.io/neorv32/#_serial_peripheral_interface_controller_spi[SPI] and
https://stnolting.github.io/neorv32/#_two_wire_serial_interface_controller_twi[TWI].
The SPI and (especially) the GPIO interfaces might be the most straightforward approaches since they
have a minimal protocol overhead. Device-specific interrupt capabilities can be added using the
The SPI and especially the GPIO interfaces might be the most straightforward approaches since they
have a minimal protocol overhead. Device-specific interrupt capabilities could be added using the
https://stnolting.github.io/neorv32/#_external_interrupt_controller_xirq[External Interrupt Controller (XIRQ)].
Beyond simplicity, these interfaces only provide a very limited bandwidth and require more sophisticated
software handling ("bit-banging" for the GPIO).
software handling ("bit-banging" for the GPIO). Hence, it is not recommended to use them for _chip-internal_ communication.
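To illustrate the software overhead of bit-banging, the following host-side sketch shifts one byte out over two GPIO pins. The pin assignment, the `gpio_write` helper and the receiver model are hypothetical and exist only for this example; on real hardware `gpio_write` would be a store to the GPIO output register.

```c
#include <stdint.h>

#define PIN_CLK  (1u << 0)  /* hypothetical pin assignment */
#define PIN_DATA (1u << 1)

static uint32_t gpio_out;   /* stand-in for the GPIO output register */
static uint8_t  rx_shift;   /* model of the receiving device's shift register */

/* Hypothetical helper: updates the output port. The receiver model samples
 * DATA on every rising edge of CLK. */
static void gpio_write(uint32_t value) {
    if (!(gpio_out & PIN_CLK) && (value & PIN_CLK)) {
        rx_shift = (uint8_t)((rx_shift << 1) | ((value & PIN_DATA) ? 1u : 0u));
    }
    gpio_out = value;
}

/* Send one byte, MSB first. Every bit costs two port writes. */
static void bitbang_send_byte(uint8_t byte) {
    for (int bit = 7; bit >= 0; bit--) {
        uint32_t data = ((byte >> bit) & 1u) ? PIN_DATA : 0u;
        gpio_write(data);            /* set data line, clock low */
        gpio_write(data | PIN_CLK);  /* rising edge: receiver samples */
    }
    gpio_write(0u);                  /* return both lines to idle */
}
```

Each transmitted bit requires several CPU instructions, which is why the achievable bandwidth stays low compared to a dedicated hardware interface.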
=== External Bus Interface
The https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface]
provides the classic approach to connect to custom IP. By default, the bus interface implements the widely adopted
Wishbone interface standard. However, this project also includes wrappers to bridge to other protocol standards like ARM's
AXI4-Lite or Intel's Avalon. By using a full-featured bus protocol, complex SoC structures can be implemented (including
several modules and even multi-core architectures). Many FPGA EDA tools provide graphical editors to build and customize
whole SoC architectures and even include pre-defined IP libraries.
provides the classic approach for attaching custom IP. By default, the bus interface implements the widely adopted
Wishbone interface standard. This project also includes wrappers to convert to other protocol standards like ARM's
AXI4-Lite or Intel's Avalon protocols. By using a full-featured bus protocol, complex SoC designs can be implemented
including several modules and even multi-core architectures. Many FPGA EDA tools provide graphical editors to build
and customize whole SoC architectures and even include pre-defined IP libraries.
.Example AXI SoC using Xilinx Vivado
image::neorv32_axi_soc.png[]
The bus interface uses a memory-mapped approach. All data transfers are handled by simple load/store operations since the
external bus interface is mapped into the processor's https://stnolting.github.io/neorv32/#_address_space[address space].
This allows very simple yet high-bandwidth communication.
This allows very simple yet high-bandwidth communication. However, high bus traffic may increase access latencies.
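The memory-mapped access scheme can be sketched in C as follows. The register layout and the names `dev_write` and `dev_read` are assumptions made for this example. A plain array stands in for the device so that the sketch also runs on a host; on the target, `DEVICE_BASE` would instead be a fixed address inside the external bus region.

```c
#include <stdint.h>

/* Hypothetical register layout of a custom bus-attached module. */
enum { REG_CTRL = 0, REG_DATA = 1, REG_COUNT = 2 };

/* Host-side stand-in for the device. On real hardware this would be
 * something like ((volatile uint32_t *)SOME_BUS_ADDRESS). */
static volatile uint32_t sim_device[REG_COUNT];
#define DEVICE_BASE (sim_device)

/* A register write is an ordinary store to the mapped address. */
static inline void dev_write(unsigned reg, uint32_t value) {
    DEVICE_BASE[reg] = value;
}

/* A register read is an ordinary load from the mapped address. */
static inline uint32_t dev_read(unsigned reg) {
    return DEVICE_BASE[reg];
}
```

The `volatile` qualifier keeps the compiler from caching or eliminating the accesses, so every call results in an actual bus transfer.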
=== Stream Link Interface
The NEORV32 https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface] provides
point-to-point, unidirectional and parallel data channels that can be used to transfer streaming data. In
contrast to the external bus interface, the streaming data does not provide any kind of "direction" control,
so it can be seen as "constant address bursts". The stream link interface provides less protocol overhead
and less latency than the bus interface. Furthermore, FIFOs can be configured for each direction (RX/TX) to
allow more CPU-independent operation.
The https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface (SLINK)] provides a
point-to-point, unidirectional and parallel data interface that can be used to transfer _streaming_ data. In
contrast to the external bus interface, the streaming interface does not provide any kind of advanced control,
so it can be seen as "constant address bursts" where data is transmitted _sequentially_ (no random accesses).
The stream link interface provides less protocol overhead and less latency than the bus interface. Furthermore,
FIFOs can be configured for each direction (RX/TX) to allow more CPU-independent operation.
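The sequential, address-free nature of a stream link with a FIFO can be modeled in a few lines of C. This is a behavioral sketch only: the type `link_fifo_t`, the functions `link_put` and `link_get`, and the depth of four entries are hypothetical and do not reflect the actual SLINK register interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4u  /* hypothetical FIFO depth */

typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;   /* next slot to write */
    unsigned tail;   /* next slot to read */
    unsigned count;  /* number of buffered words */
} link_fifo_t;

/* Non-blocking send: returns false if the FIFO is full. */
static bool link_put(link_fifo_t *f, uint32_t word) {
    if (f->count == FIFO_DEPTH) {
        return false;
    }
    f->data[f->head] = word;
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* Non-blocking receive: returns false if the FIFO is empty. */
static bool link_get(link_fifo_t *f, uint32_t *word) {
    if (f->count == 0u) {
        return false;
    }
    *word = f->data[f->tail];
    f->tail = (f->tail + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

Neither function takes an address: words leave the link in exactly the order they entered it. The FIFO lets the sender queue several words before the receiver has to react.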
=== Custom Functions Subsystem
The NEORV32 https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem] is
an "empty" template for a processor-internal module. It provides 32 32-bit memory-mapped interface
registers that can be used to communicate with any arbitrary custom design logic. The intention of this
subsystem is to provide a simple base, where the user can concentrate on implementing the actual design logic
rather than taking care of the communication between the CPU/software and the design logic. The interface
registers are already allocated within the processor's address space and are supported by the software framework
via low-level hardware access mechanisms. Additionally, the CFS provides a direct pre-defined interrupt channel to
the CPU, which is also supported by the NEORV32 runtime environment.
The https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem (CFS)] is
an "empty" template for a memory-mapped, processor-internal module.
The basic idea of this subsystem is to provide a convenient, simple and flexible platform, where the user can
concentrate on implementing the actual design logic rather than taking care of the communication between the
CPU/software and the design logic. Note that the CFS does not have direct access to memory. All data (and control
instructions) have to be sent by the CPU.
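Because the CFS cannot fetch operands itself, the CPU has to copy every word from memory into the interface registers. The sketch below shows this pattern on a host. The array `cfs_reg` stands in for the memory-mapped registers, and `cfs_model_write` imitates a hypothetical accumulator as the custom logic; neither is part of the NEORV32 software framework.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the memory-mapped CFS interface registers. */
static volatile uint32_t cfs_reg[4];

/* Software model of hypothetical custom logic: every write to register 0
 * is added to an accumulator that is visible in register 1. */
static void cfs_model_write(uint32_t value) {
    cfs_reg[0] = value;
    cfs_reg[1] = cfs_reg[1] + value;
}

/* The CPU loads each operand from memory and stores it to the CFS. */
static uint32_t cfs_sum(const uint32_t *buf, size_t n) {
    cfs_reg[1] = 0u;                 /* clear the accumulator */
    for (size_t i = 0; i < n; i++) {
        cfs_model_write(buf[i]);     /* one load plus one store per word */
    }
    return cfs_reg[1];               /* read back the result */
}
```

The loop makes the cost visible: throughput is bounded by how fast the CPU can move data, not by the custom logic itself.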
=== Custom Functions Unit
The https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu[Custom Functions Unit (CFU)] is a functional
unit that is integrated right into the CPU's pipeline. It allows implementing custom RISC-V instructions.
This extension option is intended for rather small logic that implements operations that cannot be emulated
efficiently in pure software. Since the CFU has direct access to the core's register file, it can operate
with minimal data latency.
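A custom instruction is an ordinary 32-bit RISC-V instruction word. The sketch below assembles an R-type word from its fields. The RISC-V base specification reserves the `custom-0` opcode (`0001011`) for such extensions; whether the CFU uses exactly this opcode and format is an assumption here, and `encode_r_type` is a hypothetical helper.

```c
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu  /* 0001011: RISC-V "custom-0" opcode space */

/* Pack the fields of a RISC-V R-type instruction into one 32-bit word:
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
static uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                              uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7Fu) << 25) |
           ((rs2    & 0x1Fu) << 20) |
           ((rs1    & 0x1Fu) << 15) |
           ((funct3 & 0x07u) << 12) |
           ((rd     & 0x1Fu) <<  7) |
           (opcode  & 0x7Fu);
}
```

For example, `encode_r_type(0, 12, 11, 0, 10, OPCODE_CUSTOM0)` yields `0x00C5850B`, a custom operation that reads `a1` (x11) and `a2` (x12) and writes `a0` (x10). Both operands come straight from the register file, so no memory access is needed to hand data to the unit.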
=== Comparative Summary
The following table gives a comparative summary of the most important factors when choosing one of the
chip-internal extension options:
* https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu[Custom Functions Unit] for CPU-internal custom RISC-V instructions
* https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem] for tightly-coupled processor-internal co-processors
* https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface] for processor-external streaming modules
* https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface] for processor-external memory-mapped modules
.Comparison of On-Chip Extension Options
[cols="<1,^1,^1,^1,^1"]
[options="header",grid="rows"]
|=======================
| | Custom Functions Unit | Custom Functions Subsystem | Stream Link Interface | External Bus Interface
| **HW complexity/size** | low/small | medium | unlimited | unlimited
| **CPU-independent operation** | no | mostly | no | completely
| **CPU interface** | inside CPU pipeline | memory-mapped | memory-mapped | memory-mapped
| **Low-level CPU access scheme** | custom instructions | load/store | load/store | load/store
| **Random access** | yes | yes | sequential | yes
| **Access latency** | minimal | low | low | medium to high
| **External IO interfaces** | no | yes, but limited | yes | yes
| **Interrupt capable** | no | yes | yes | yes
|=======================