[docs/userguide] reworked section "Adding Custom Hardware Modules"

2025-04-24 22:27:21 -04:00 · 2022-01-29 15:00:02 +01:00 · 2022-01-29 15:00:02 +01:00 · 0e3a715e17
commit 0e3a715e17
parent 67449f4981
1 changed files with 61 additions and 23 deletions
--- a/docs/userguide/adding_custom_hw_modules.adoc
+++ b/docs/userguide/adding_custom_hw_modules.adoc
@@ -4,6 +4,7 @@

 Like the RISC-V ISA, the NEORV32 processor was designed to ease customization and _extensibility_.
 The processor provides several predefined options to add application-specific custom hardware modules and accelerators.
+A <<_comparative_summary>> is given at the end of this section.


 === Standard (_External_) Interfaces
@@ -15,47 +16,84 @@ https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_an
 https://stnolting.github.io/neorv32/#_serial_peripheral_interface_controller_spi[SPI] and
 https://stnolting.github.io/neorv32/#_two_wire_serial_interface_controller_twi[TWI].

-The SPI and (especially) the GPIO interfaces might be the most straightforward approaches since they
-have a minimal  protocol overhead. Device-specific interrupt capabilities can be added using the
+The SPI and especially the GPIO interfaces might be the most straightforward approaches since they
+have minimal protocol overhead. Device-specific interrupt capabilities could be added using the
 https://stnolting.github.io/neorv32/#_external_interrupt_controller_xirq[External Interrupt Controller (XIRQ)].
+
 Although simple, these interfaces provide only very limited bandwidth and require more sophisticated
-software handling ("bit-banging" for the GPIO).
+software handling ("bit-banging" for the GPIO). Hence, it is not recommended to use them for _chip-internal_ communication.


 === External Bus Interface

 The https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface]
-provides the classic approach to connect to custom IP. By default, the bus interface implements the widely adopted
-Wishbone interface standard. However, this project also includes wrappers to bridge to other protocol standards like ARM's
-AXI4-Lite or Intel's Avalon. By using a full-featured bus protocol, complex SoC structures can be implemented (including
-several modules and even multi-core architectures). Many FPGA EDA tools provide graphical editors to build and customize
-whole SoC architectures and even include pre-defined IP libraries.
+provides the classic approach for attaching custom IP. By default, the bus interface implements the widely adopted
+Wishbone interface standard. This project also includes wrappers to convert to other protocol standards such as ARM's
+AXI4-Lite or Intel's Avalon. By using a full-featured bus protocol, complex SoC designs can be implemented,
+including several modules and even multi-core architectures. Many FPGA EDA tools provide graphical editors to build
+and customize whole SoC architectures and even include pre-defined IP libraries.

 .Example AXI SoC using Xilinx Vivado
 image::neorv32_axi_soc.png[]

 The bus interface uses a memory-mapped approach. All data transfers are handled by simple load/store operations since the
 external bus interface is mapped into the processor's https://stnolting.github.io/neorv32/#_address_space[address space].
-This allows a very simple still high-bandwidth communications.
+This allows very simple yet high-bandwidth communication. However, high bus traffic may increase access latencies.


 === Stream Link Interface

-The NEORV32 https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface] provides
-point-to-point, unidirectional and parallel data channels that can be used to transfer streaming data. In
-contrast to the external bus interface, the streaming data does not provide any kind of "direction" control,
-so it can be seen as "constant address bursts". The stream link interface provides less protocol overhead
-and less latency than the bus interface. Furthermore, FIFOs can be be configured to each direction (RX/TX) to
-allow more CPU-independent operation.
+The https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface (SLINK)] provides a
+point-to-point, unidirectional and parallel data interface that can be used to transfer _streaming_ data. In
+contrast to the external bus interface, the streaming interface does not provide any kind of advanced control,
+so it can be seen as "constant address bursts" where data is transmitted _sequentially_ (no random accesses).
+
+The stream link interface provides less protocol overhead and less latency than the bus interface. Furthermore,
+FIFOs can be configured for each direction (RX/TX) to allow more CPU-independent operation.


 === Custom Functions Subsystem

-The NEORV32 https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem] is
-an "empty" template for a processor-internal module. It provides 32 32-bit memory-mapped interface
-registers that can be used to communicate with any arbitrary custom design logic. The intentions of this
-subsystem is to provide a simple base, where the user can concentrate on implementing the actual design logic
-rather than taking care of the communication between the CPU/software and the design logic. The interface
-registers are already allocated within the processor's address space and are supported by the software framework
-via low-level hardware access mechanisms. Additionally, the CFS provides a direct pre-defined interrupt channel to
-the CPU, which is also supported by the NEORV32 runtime environment.
+The https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem (CFS)] is
+an "empty" template for a memory-mapped, processor-internal module.
+
+The basic idea of this subsystem is to provide a convenient, simple and flexible platform, where the user can
+concentrate on implementing the actual design logic rather than taking care of the communication between the
+CPU/software and the design logic. Note that the CFS does not have direct access to memory. All data (and control
+instructions) have to be sent by the CPU.
+
+
+=== Custom Functions Unit
+
+The https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu[Custom Functions Unit (CFU)] is a functional
+unit that is integrated right into the CPU's pipeline. It allows custom RISC-V instructions to be implemented.
+This extension option is intended for rather small logic that implements operations that cannot be emulated
+efficiently in pure software. Since the CFU has direct access to the core's register file, it can operate
+with minimal data latency.
+
+
+=== Comparative Summary
+
+The following table gives a comparative summary of the most important factors when choosing one of the
+chip-internal extension options:
+
+* https://stnolting.github.io/neorv32/#_custom_functions_unit_cfu[Custom Functions Unit] for CPU-internal custom RISC-V instructions
+* https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem] for tightly-coupled processor-internal co-processors
+* https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface] for processor-external streaming modules
+* https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface] for processor-external memory-mapped modules
+
+.Comparison of On-Chip Extension Options
+[cols="<1,^1,^1,^1,^1"]
+[options="header",grid="rows"]
+|=======================
+|                                 | Custom Functions Unit | Custom Functions Subsystem | Stream Link Interface  | External Bus Interface
+| **HW complexity/size**          | low/small             | medium                     | unlimited              | unlimited
+| **CPU-independent operation**   | no                    | mostly                     | no                     | completely
+| **CPU interface**               | inside CPU pipeline   | memory-mapped              | memory-mapped          | memory-mapped
+| **Low-level CPU access scheme** | custom instructions   | load/store                 | load/store             | load/store
+| **Random access**               | yes                   | yes                        | sequential             | yes
+| **Access latency**              | minimal               | low                        | low                    | medium to high
+| **External IO interfaces**      | no                    | yes, but limited           | yes                    | yes
+| **Interrupt capable**           | no                    | yes                        | yes                    | yes
+|=======================
+