:sectnums:
== Overview

The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] Processor
is a customizable microcontroller-like system on chip (SoC) that is based on the
RISC-V NEORV32 CPU. The processor is intended as a ready-to-go auxiliary processor within larger SoC
designs or as a stand-alone custom microcontroller. Its top entity can be directly synthesized for any target
technology without modifications.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and peripherals.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs, including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
the default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[a prebuilt toolchain is also available on GitHub]).

The project's change log is available in the https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md[CHANGELOG.md]
file in the root directory of the NEORV32 repository.


:sectnums!:
=== Structure

Chapter <<_neorv32_central_processing_unit_cpu>>

* instruction set(s) and extensions, instruction timing, control and status registers, traps, exceptions and interrupts,
hardware execution safety, native bus interface

Chapter <<_neorv32_processor_soc>>

* top entity signals and configuration generics, address space layout, internal peripheral devices and interrupts, internal
memories and caches, internal bus architecture, external bus interface

Chapter <<_on_chip_debugger_ocd>>

* on-chip debugging compatible with the "Minimal RISC-V Debug Specification Version 0.13.2"

Chapter <<_software_framework>>

* core libraries, bootloader, makefiles, runtime environment

Chapter <<_lets_get_it_started>>

* toolchain installation and setup, hardware setup, software setup, application compilation, simulating the processor

[TIP]
Links in this document are <<_structure,highlighted>>.



<<<
// ####################################################################################################################
:sectnums:
=== Project Key Features

* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU - passes the official RISC-V architecture tests
* official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
* optional RISC-V CPU extensions:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
* **Software framework**
** GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
** internal bootloader with serial user interface
** core libraries for high-level usage of the provided functions and peripherals
** runtime environment and several example programs
** doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
** FreeRTOS port + demos available
* **NEORV32 Processor**: highly-configurable full-scale microcontroller-like processor system / SoC based on the NEORV32 CPU with optional standard peripherals:
** serial interfaces (UARTs, TWI, SPI)
** timers and counters (WDT, MTIME, NCO)
** general purpose IO, PWM and native NeoPixel (TM) compatible smart LED interface
** embedded memories / caches for data, instructions and bootloader
** external memory interface (Wishbone or AXI4-Lite)
* on-chip debugger compatible with OpenOCD and gdb
* fully synchronous design, no latches, no gated clocks
* completely described in behavioral, platform-independent VHDL
* small hardware footprint and high operating frequency


<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32                - Project home folder
├.ci                   - Scripts for continuous integration
├boards                - Example setups for various FPGA boards
├CHANGELOG.md          - Project change log
├docs                  - Project documentation
│├doxygen_build        - Software framework documentation (generated by doxygen)
│├src_adoc             - AsciiDoc sources for this document
│├references           - Data sheets and RISC-V specs.
│└figures              - Figures and logos
├riscv-arch-test       - Port files for the official RISC-V architecture tests
├rtl                   - VHDL sources
│├core                 - Sources of the CPU & SoC
│└top_templates        - Alternate/additional top entities/wrappers
├sim                   - Simulation files
│├ghdl                 - Simulation scripts for GHDL
│├rtl_modules          - Processor modules for simulation-only
│└vivado               - Pre-configured Xilinx ISIM waveform
└sw                    - Software framework
 ├bootloader           - Sources and scripts for the NEORV32 internal bootloader
 ├common               - Linker script and crt0.S start-up code
 ├example              - Various example programs
 │└...
 ├ocd_firmware         - source code for on-chip debugger's "park loop"
 ├openocd              - OpenOCD on-chip debugger configuration files
 ├image_gen            - Helper program to generate NEORV32 executables
 └lib                  - Processor core library
  ├include             - Header files (*.h)
  └source              - Source files (*.c)
...................................

[NOTE]
There are further files and folders starting with a dot which – for example – contain
data/configurations only relevant for git or for the continuous integration framework (`.ci`).


<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**.

[IMPORTANT]
All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional
files, like alternative top entities, can be assigned to any library.

...................................
neorv32_top.vhd                   - NEORV32 Processor top entity
├neorv32_boot_rom.vhd             - Bootloader ROM
│└neorv32_bootloader_image.vhd    - Bootloader boot ROM memory image
├neorv32_busswitch.vhd            - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd           - Processor-internal bus monitor
├neorv32_icache.vhd               - Processor-internal instruction cache
├neorv32_cfs.vhd                  - Custom functions subsystem
├neorv32_cpu.vhd                  - NEORV32 CPU top entity
│├neorv32_package.vhd             - Processor/CPU main VHDL package file
│├neorv32_cpu_alu.vhd             - Arithmetic/logic unit
│├neorv32_cpu_bus.vhd             - Bus interface unit + physical memory protection
│├neorv32_cpu_control.vhd         - CPU control, exception/IRQ system and CSRs
││└neorv32_cpu_decompressor.vhd   - Compressed instructions decoder
│├neorv32_cpu_cp_bitmanip.vhd     - Bit manipulation co-processor (B extension)
│├neorv32_cpu_cp_fpu.vhd          - Floating-point co-processor (Zfinx extension)
│├neorv32_cpu_cp_muldiv.vhd       - Mul/Div co-processor (M extension)
│└neorv32_cpu_regfile.vhd         - Data register file
├neorv32_debug_dm.vhd             - on-chip debugger: debug module
├neorv32_debug_dtm.vhd            - on-chip debugger: debug transfer module
├neorv32_dmem.vhd                 - Processor-internal data memory
├neorv32_gpio.vhd                 - General purpose input/output port unit
├neorv32_imem.vhd                 - Processor-internal instruction memory
│└neorv32_application_image.vhd   - IMEM application initialization image
├neorv32_mtime.vhd                - Machine system timer
├neorv32_nco.vhd                  - Numerically-controlled oscillator
├neorv32_neoled.vhd               - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                  - Pulse-width modulation controller
├neorv32_spi.vhd                  - Serial peripheral interface controller
├neorv32_sysinfo.vhd              - System configuration information memory
├neorv32_trng.vhd                 - True random number generator
├neorv32_twi.vhd                  - Two wire serial interface controller
├neorv32_uart.vhd                 - Universal async. receiver/transmitter
├neorv32_wdt.vhd                  - Watchdog timer
└neorv32_wb_interface.vhd         - External (Wishbone) bus interface
...................................


<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note that
the provided results are just a relative measure, as logic functions of different modules might be merged
across entity boundaries, so the actual utilization results might vary a bit.

:sectnums:
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.5`
| Top entity: | `rtl/core/neorv32_cpu.vhd`
|=======================

[cols="<5,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU | LEs | FFs | MEM bits | DSPs | _f~max~_
| `rv32i` | 980 | 409 | 1024 | 0 | 123 MHz
| `rv32i_Zicsr` | 1835 | 856 | 1024 | 0 | 124 MHz
| `rv32im_Zicsr` | 2443 | 1134 | 1024 | 0 | 124 MHz
| `rv32imc_Zicsr` | 2669 | 1149 | 1024 | 0 | 125 MHz
| `rv32imac_Zicsr` | 2685 | 1156 | 1024 | 0 | 124 MHz
| `rv32imac_Zicsr` + `debug_mode` | 3058 | 1225 | 1024 | 0 | 120 MHz
| `rv32imac_Zicsr` + `u` | 2698 | 1162 | 1024 | 0 | 124 MHz
| `rv32imac_Zicsr_Zifencei` + `u` | 2715 | 1162 | 1024 | 0 | 122 MHz
| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024 | 7 | 121 MHz
|=======================


:sectnums:
==== Processor Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.9`
| Top entity: | `rtl/core/neorv32_top.vhd`
|=======================

.Hardware utilization by the processor modules (mandatory core modules in **bold**)
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module | Description | LEs | FFs | MEM bits | DSPs
| Boot ROM | Bootloader ROM (4kB) | 3 | 1 | 32768 | 0
| **BUSKEEPER** | Processor-internal bus monitor | 11 | 6 | 0 | 0
| **BUSSWITCH** | Bus mux for CPU instr. and data interface | 49 | 8 | 0 | 0
| CFS | Custom functions subsystem | - | - | - | -
| DMEM | Processor-internal data memory (8kB) | 18 | 2 | 65536 | 0
| DM | On-chip debugger - debug module | 493 | 240 | 0 | 0
| DTM | On-chip debugger - debug transfer module (JTAG) | 254 | 218 | 0 | 0
| GPIO | General purpose input/output ports | 67 | 65 | 0 | 0
| iCACHE | Instruction cache (1x4 blocks, 256 bytes per block) | 220 | 154 | 8192 | 0
| IMEM | Processor-internal instruction memory (16kB) | 6 | 2 | 131072 | 0
| MTIME | Machine system timer | 289 | 200 | 0 | 0
| NCO | Numerically-controlled oscillator | 254 | 226 | 0 | 0
| NEOLED | Smart LED Interface (NeoPixel/WS2812) [4xFIFO] | 347 | 309 | 0 | 0
| PWM | Pulse-width modulation controller | 71 | 69 | 0 | 0
| SPI | Serial peripheral interface | 138 | 124 | 0 | 0
| **SYSINFO** | System configuration information memory | 10 | 10 | 0 | 0
| TRNG | True random number generator | 132 | 105 | 0 | 0
| TWI | Two-wire interface | 77 | 44 | 0 | 0
| UART0/1 | Universal asynchronous receiver/transmitter 0/1 | 176 | 132 | 0 | 0
| WDT | Watchdog timer | 60 | 45 | 0 | 0
| WISHBONE | External memory interface | 129 | 104 | 0 | 0
|=======================
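
The _MEM bits_ column for the memory modules can be cross-checked against the sizes given in the descriptions. The following is a minimal sketch, assuming 1 kB = 1024 bytes and counting data bits only (no cache tag or status bits); the function names are hypothetical and not part of the NEORV32 software framework.

```python
# Sketch: derive the "MEM bits" values from the sizes named in the table.
# Assumptions (not stated in the table itself): 1 kB = 1024 bytes,
# 8 bits per byte, and only data storage is counted for the cache.

BITS_PER_BYTE = 8
BYTES_PER_KB = 1024


def kb_to_bits(size_kb: int) -> int:
    """Return the number of memory bits for a memory size given in kB."""
    return size_kb * BYTES_PER_KB * BITS_PER_BYTE


def cache_data_bits(num_blocks: int, bytes_per_block: int) -> int:
    """Return the data bits of a cache with the given block layout."""
    return num_blocks * bytes_per_block * BITS_PER_BYTE


if __name__ == "__main__":
    print("Boot ROM (4kB):", kb_to_bits(4))
    print("DMEM (8kB):", kb_to_bits(8))
    print("IMEM (16kB):", kb_to_bits(16))
    # "1x4 blocks, 256 bytes per block" is read here as 4 blocks in total.
    print("iCACHE:", cache_data_bits(1 * 4, 256))
```

Under these assumptions the computed values match the table entries for the Boot ROM, DMEM, IMEM and iCACHE.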


<<<
:sectnums:
==== Exemplary Setups

[TIP]
Exemplary setups for different technologies and various FPGA boards can be found in the `boards` folder
(https://github.com/stnolting/neorv32/tree/master/boards).

The following table shows exemplary NEORV32 processor implementation results for different FPGA
platforms. Most setups use the default peripheral configuration (like no CFS, no caches and no
TRNG), no external memory interface and only internal instruction and data memories (IMEM uses 16kB
and DMEM uses 8kB memory space).

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.0`
|=======================

.Hardware utilization for exemplary NEORV32 setups
[cols="<4,<5,<4,<4,<3,<3,<3,<4,<4,<3"]
[options="header",grid="rows"]
|=======================
| Vendor | FPGA | Board | Toolchain | CPU | LUT | FF | DSP | Memory | _f_
| Intel | Cyclone IV `EP4CE22F17-C6N` | Terasic DE0-Nano | Quartus Prime Lite 20.1 | `rv32imcu_Zicsr_Zifencei` + `PMP` | 3813 (17%) | 1890 (8%) | 0 (0%) | Memory bits: 231424 (38%) | 119 MHz
| Lattice | iCE40 UltraPlus `iCE40UP5KSG48I` | Upduino v3.0 | Radiant 2.1 | `rv32icu_Zicsr_Zifencei` | 5123 (97%) | 1972 (37%) | 0 (0%) | EBR: 12 (40%) SPRAM: 4 (100%) | 24 MHz
| Xilinx | Artix-7 `XC7A35TICSG324-1L` | Arty A7-35T | Vivado 2019.2 | `rv32imcu_Zicsr_Zifencei` + `PMP` | 2465 (12%) | 1912 (5%) | 0 (0%) | BRAM: 8 (16%) | 100 MHz
|=======================

**Notes**

* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kB).
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32 bootloader to store and automatically boot an application program after reset (both tested successfully).
* The setups with PMP implement 2 regions with a minimal granularity of 64kB.
* No HPM counters are used.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

:sectnums:
==== CoreMark Benchmark

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware: | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock
| CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler: | RISCV32-GCC 10.1.0
| Peripherals: | UART for printing the results
| Compiler flags: | default, see makefile
|=======================

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark]. This
benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
system. The corresponding source code and the SW project can be found in the `sw/example/coremark` folder.

The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU's clock frequency in MHz.
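
The two definitions above can be written out as a short calculation. This is only an illustrative sketch: the function names are hypothetical, and the cycle count used in the example is a made-up round value, not a measured result.

```python
# Sketch of the score definitions given in the text.
# The function names and the example cycle count are assumptions for illustration.


def coremark_score(iterations: int, cycles: int, clock_hz: float) -> float:
    """CoreMark score: iterations per second of execution time."""
    seconds = cycles / clock_hz  # execution time derived from the cycle counter
    return iterations / seconds


def relative_score(score: float, clock_hz: float) -> float:
    """Relative score: CoreMark score divided by the clock frequency in MHz."""
    return score / (clock_hz / 1e6)


if __name__ == "__main__":
    clock_hz = 100e6               # 100 MHz clock, as in the configuration table
    iterations = 2000              # as in the configuration table
    cycles = 5_500_000_000         # hypothetical cycle count (55 s at 100 MHz)

    score = coremark_score(iterations, cycles, clock_hz)
    print(f"CoreMark score: {score:.2f}")
    print(f"CoreMarks/MHz:  {relative_score(score, clock_hz):.4f}")
```

With a 100 MHz clock, the relative score is simply the absolute score divided by 100.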

:sectnums!:
===== Results

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark results
[cols="<4,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`) | Executable size | CoreMark Score | CoreMarks/MHz
| `rv32i` | 28756 bytes | 36.36 | **0.3636**
| `rv32im` | 27516 bytes | 68.97 | **0.6897**
| `rv32imc` | 22008 bytes | 68.97 | **0.6897**
| `rv32imc` + _FAST_MUL_EN_ | 22008 bytes | 86.96 | **0.8696**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ | 22008 bytes | 90.91 | **0.9091**
|=======================

[NOTE]
All executables were generated using maximum optimization `-O3`.
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).


<<<
:sectnums:
==== Instruction Timing

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.

The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
the available CPU extensions. The following table shows the performance results for successfully (!) running
2000 CoreMark iterations.

The average CPI is computed by dividing the total number of required clock cycles (only the timed core to
avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The
executables were generated using optimization `-O3`.

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark instruction timing
[cols="<4,>2,>2,>2"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`) | Required clock cycles | Executed instructions | Average CPI
| `rv32i` | 5595750503 | 1466028607 | **3.82**
| `rv32im` | 2966086503 | 598651143 | **4.95**
| `rv32imc` | 2981786734 | 611814918 | **4.87**
| `rv32imc` + _FAST_MUL_EN_ | 2399234734 | 611814918 | **3.92**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ | 2265135174 | 611814948 | **3.70**
|=======================
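
The _Average CPI_ column can be reproduced from the cycle and instruction counts in the table above. The following sketch only restates the division described in the text; the dictionary layout and the function name are hypothetical.

```python
# Sketch: recompute the average CPI from the table's raw counts.
# Keys are the CPU configurations; values are
# (required clock cycles, executed instructions) copied from the table.

RESULTS = {
    "rv32i": (5595750503, 1466028607),
    "rv32im": (2966086503, 598651143),
    "rv32imc": (2981786734, 611814918),
    "rv32imc + FAST_MUL_EN": (2399234734, 611814918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2265135174, 611814948),
}


def average_cpi(cycles: int, instructions: int) -> float:
    """Average cycles per instruction: total cycles / executed instructions."""
    return cycles / instructions


if __name__ == "__main__":
    for config, (cycles, instructions) in RESULTS.items():
        print(f"{config}: {average_cpi(cycles, instructions):.2f}")
```

Rounded to two decimal places, the results agree with the CPI values listed in the table.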

[TIP]
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).

[TIP]
More information regarding the execution time of each implemented instruction can be found in
chapter <<_instruction_timing>>.