Mirror of https://github.com/stnolting/neorv32.git (synced 2025-04-23 21:57:33 -04:00)
updated documentary for new processor version 1.0.0.0
Parent: 28d4d3f6ef
Commit: d99ea599c5
3 changed files with 60 additions and 87 deletions
README.md (147 changed lines)
@@ -56,9 +56,14 @@ For more information take a look a the [
The processor is synthesizable (tested with Intel Quartus Prime, Xilinx Vivado and Lattice Radiant/LSE) and can successfully execute
all the [provided example programs](https://github.com/stnolting/neorv32/tree/master/sw/example) including the CoreMark benchmark.

The processor passes the `rv32i`, `rv32im`, `rv32imc` and `rv32Zicsr` *RISC-V compliance tests*.

[RISC-V compliance test](https://github.com/stnolting/neorv32_compliance_test):
[](https://travis-ci.com/stnolting/neorv32_riscv_compliance)

The processor is synthesizable (tested with Intel Quartus Prime and Lattice Radiant/Synplify) and can successfully execute all the [provided example programs](https://github.com/stnolting/neorv32/tree/master/sw/example).

## Features

@@ -73,7 +78,7 @@ The processor is synthesizable (tested with Intel Quartus Prime and Lattice Radi
- Completely described in behavioral, platform-independent VHDL – no primitives, macros, etc.
- Fully synchronous design, no latches, no gated clocks
- Small hardware footprint and high operating frequency
- Highly customizable processor configuration
- Customizable processor configuration
- Optional processor-internal data and instruction memories (DMEM/IMEM)
- Optional internal bootloader with UART console and automatic SPI flash boot option
- Optional machine system timer (MTIME), RISC-V-compliant

@@ -135,8 +140,8 @@ The CPU is compliant to the [official RISC-V specifications](https://raw.githubu
* Machine external interrupt (via `CLIC` unit)

**General**:
* No hardware support of unaligned accesses (except for instructions in `C` extension that still have to be aligned on 16-bit boundaries)
* Multi-cycle in-order instruction execution
* No hardware support of unaligned accesses - they will trigger and exception
* Two stages in-order pipeline (FETCH, EXECUTE); each stage uses a multi-cycle execution

More information including a detailed list of the available CSRs can be found in
the [ NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).

@@ -144,11 +149,11 @@ the [
- No exception is triggered in `E`-mode when using reg >x15 yet
- Port Dhrystone benchmark
- Implement atomic extensions (`A` extension)
- Implement co-processor for single-precision floating-point (`F` extension)
- Implement atomic operations (`A` extension)
- Implement `Zifence` extension
- Implement co-processor for single-precision floating-point operations (`F` extension)
- Implement user mode (`U` extension)
- Make a 64-bit branch
- Maybe port an RTOS (like [freeRTOS](https://www.freertos.org/) or [RIOT](https://www.riot-os.org/))

@@ -162,38 +167,45 @@ a DE0-nano board. The design was synthesized using **Intel Quartus Prime Lite 19
information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration
of the processor's generics is assumed. No constraints were used.

Results generated for hardware version: `0.0.2.3`
Results generated for hardware version: `1.0.0.0`

### CPU

| CPU Configuration | LEs | FFs | Memory bits | DSPs | f_max |
|:--------------------|:----------:|:--------:|:-----------:|:------:|:-------:|
| `rv32i` | 852 (4%) | 326 (1%) | 2048 (>1%) | 0 (0%) | 111 MHz |
| `rv32i` + `Zicsr` | 1488 (7%) | 694 (3%) | 2048 (>1%) | 0 (0%) | 107 MHz |
| `rv32im` + `Zicsr` | 2057 (9%) | 941 (4%) | 2048 (>1%) | 0 (0%) | 102 MHz |
| `rv32imc` + `Zicsr` | 2209 (10%) | 958 (4%) | 2048 (>1%) | 0 (0%) | 102 MHz |
| `rv32e` | 848 (4%) | 326 (1%) | 1024 (>1%) | 0 (0%) | 111 MHz |
| `rv32e` + `Zicsr` | 1316 (6%) | 594 (3%) | 1024 (>1%) | 0 (0%) | 106 MHz |
| `rv32em` + `Zicsr` | 1879 (8%) | 841 (4%) | 1024 (>1%) | 0 (0%) | 101 MHz |
| `rv32emc` + `Zicsr` | 2065 (9%) | 858 (4%) | 1024 (>1%) | 0 (0%) | 100 MHz |
| `rv32i` | 1027 | 474 | 2048 | 0 (0%) | 111 MHz |
| `rv32i` + `Zicsr` | 1721 | 868 | 2048 | 0 (0%) | 104 MHz |
| `rv32im` + `Zicsr` | 2298 | 1115 | 2048 | 0 (0%) | 103 MHz |
| `rv32imc` + `Zicsr` | 2557 | 1138 | 2048 | 0 (0%) | 103 MHz |
| `rv32emc` + `Zicsr` | 2342 | 1005 | 1024 | 0 (0%) | 100 MHz |

### Peripherals / Others
### Processor-Internal Peripherals and Memories

| Module | Description | LEs | FFs | Memory bits | DSPs |
|:---------|:------------------------------------------------|:---:|:---:|:-----------:|:----:|
| BOOT ROM | Bootloader ROM (4kB) | 3 | 1 | 32 768 | 0 |
| DEVNULL | Dummy device | 2 | 1 | 0 | 0 |
| DEVNULL | Dummy device | 3 | 1 | 0 | 0 |
| DMEM | Processor-internal data memory (8kB) | 12 | 2 | 65 536 | 0 |
| GPIO | General purpose input/output ports | 37 | 33 | 0 | 0 |
| GPIO | General purpose input/output ports | 38 | 33 | 0 | 0 |
| IMEM | Processor-internal instruction memory (16kb) | 7 | 2 | 131 072 | 0 |
| MTIME | Machine system timer | 369 | 168 | 0 | 0 |
| PWM | Pulse-width modulation controller | 77 | 69 | 0 | 0 |
| SPI | Serial peripheral interface | 198 | 125 | 0 | 0 |
| TRNG | True random number generator | 103 | 93 | 0 | 0 |
| TWI | Two-wire interface | 76 | 44 | 0 | 0 |
| UART | Universal asynchronous receiver/transmitter | 154 | 108 | 0 | 0 |
| MTIME | Machine system timer | 270 | 167 | 0 | 0 |
| PWM | Pulse-width modulation controller | 76 | 69 | 0 | 0 |
| SPI | Serial peripheral interface | 206 | 125 | 0 | 0 |
| TRNG | True random number generator | 104 | 93 | 0 | 0 |
| TWI | Two-wire interface | 78 | 44 | 0 | 0 |
| UART | Universal asynchronous receiver/transmitter | 151 | 108 | 0 | 0 |
| WDT | Watchdog timer | 57 | 45 | 0 | 0 |

### CPU + Peripheral

The following table shows the implementation results for an _Intel Cyclone IV EP4CE22F17C6N_ FPGA.
The design was synthesized using Intel Quartus Prime Lite 19.1 (“balanced implementation”).
IMEM uses 16kB and DMEM uses 8kB memory space.

| CPU Configuration | LEs | REGs | DSPs | Memory Bits | f_max |
|:--------------------|:----------:|:---------:|:------:|:------------:|:-------:|
| `rv32imc` + `Zicsr` | 3724 (17%) | 1899 (9%) | 0 (0%) | 231424 (38%) | 103 MHz |
### Lattice iCE40 UltraPlus 5k

@@ -205,11 +217,9 @@ instruction and data memories (each 64kB) based on SPRAM primitives. The FPGA-sp
Place & route reports generated with **Lattice Radiant 2.1** using Lattice LSE. The clock frequency
is constrained and generated via the PLL from the internal HF oscillator running at 12 MHz.

Results generated for hardware version: `0.0.2.5`

| CPU Configuration | Slices | LUT | REG | DSPs | SPRAM | EBR | f |
|:--------------------|:----------:|:----------:|:----------:|:------:|:--------:|:--------:|:---------:|
| `rv32imc` + `Zicsr` | 2405 (91%) | 4642 (87%) | 1810 (34%) | 0 (0%) | 4 (100%) | 12 (40%) | 20.25 MHz |
| CPU Configuration | LUTs | REGs | DSPs | SPRAM | EBR | f |
|:--------------------|:----------:|:----------:|:------:|:--------:|:--------:|:---------:|
| `rv32imc` + `Zicsr` | 4985 (94%) | 1982 (38%) | 0 (0%) | 4 (100%) | 12 (40%) | 20.25 MHz |
## Performance

@@ -220,7 +230,7 @@ The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the
[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark
tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.

Results generated for hardware version: `0.0.2.3`
Results generated for hardware version: `1.0.0.0`

~~~
**Configuration**

@@ -232,12 +242,12 @@ Used peripherals: MTIME for time measurement, UART for printing the results

| __Configuration__ | __Optimization__ | __Executable Size__ | __CoreMark Score__ | __CoreMarks/MHz__ |
|:------------------|:----------------:|:-------------------:|:------------------:|:-----------------:|
| `rv32i` | `-Os` | 17 944 bytes | 23.26 | 0.232 |
| `rv32i` | `-O2` | 20 264 bytes | 25.64 | 0.256 |
| `rv32im` | `-Os` | 16 880 bytes | 40.81 | 0.408 |
| `rv32im` | `-O2` | 19 312 bytes | 47.62 | 0.476 |
| `rv32imc` | `-Os` | 13 000 bytes | 32.78 | 0.327 |
| `rv32imc` | `-O2` | 15 004 bytes | 37.04 | 0.370 |
| `rv32i` | `-Os` | 18 044 bytes | 21.98 | 0.21 |
| `rv32i` | `-O2` | 20 388 bytes | 25 | 0.25 |
| `rv32im` | `-Os` | 16 980 bytes | 40 | 0.40 |
| `rv32im` | `-O2` | 19 436 bytes | 51.28 | 0.51 |
| `rv32imc` | `-Os` | 13 076 bytes | 39.22 | 0.39 |
| `rv32imc` | `-O2` | 15 208 bytes | 50 | 0.50 |
### Instruction Cycles

@@ -250,31 +260,16 @@ CPU extensions.
Please note that the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the
`M` extension use a bit-serial approach and require several cycles for completion.
The following table shows the performance results for successfully (!) running 2000 CoreMark
The following table shows the performance results for successfully running 2000 CoreMark
iterations. The average CPI is computed by dividing the total number of required clock cycles (all of CoreMark
– not only the timed core) by the number of executed instructions (`instret[h]` CSRs). The executables
were generated using optimization `-O2`.

| CPU / Toolchain Config. | Required Clock Cycles | Executed Instructions | Average CPI |
|:------------------------|----------------------:|----------------------:|:-----------:|
| `rv32i` | 10 385 023 697 | 1 949 310 506 | 5.3 |
| `rv32im` | 6 276 943 488 | 995 011 883 | 6.3 |
| `rv32imc` | 7 340 734 652 | 934 952 588 | 7.6 |

### Evaluation

Based on the provided performance measurement and the hardware utilization for the
different CPU configurations, the following configurations are suggested:

| Design Goal | NEORV32 CPU Config. |
|:-------------------------------|:--------------------|
| Highest performance: | `rv32im` |
| Lowest memory requirements: | `rv32imc` |
| Lowest hardware requirements*: | `rv32ec` |

*) Including on-chip memory hardware requirements.
| `rv32i` | 19 355 607 369 | 2 995 064 579 | 6.5 |
| `rv32im` | 5 809 384 583 | 867 377 291 | 6.7 |
| `rv32imc` | 5 560 220 723 | 825 898 407 | 6.7 |
@@ -294,6 +289,7 @@ entity neorv32_top is
CLOCK_FREQUENCY : natural := 0; -- clock frequency of clk_i in Hz
HART_ID : std_ulogic_vector(31 downto 0) := x"00000000"; -- custom hardware thread ID
BOOTLOADER_USE : boolean := true; -- implement processor-internal bootloader?
CSR_COUNTERS_USE : boolean := true; -- implement RISC-V perf. counters ([m]instret[h], [m]cycle[h], time[h])?
-- RISC-V CPU Extensions --
CPU_EXTENSION_RISCV_C : boolean := false; -- implement compressed extension?
CPU_EXTENSION_RISCV_E : boolean := false; -- implement embedded RF extension?

@@ -347,9 +343,9 @@ entity neorv32_top is
uart_txd_o : out std_ulogic; -- UART send data
uart_rxd_i : in std_ulogic := '0'; -- UART receive data
-- SPI (available if IO_SPI_USE = true) --
spi_sclk_o : out std_ulogic; -- serial clock line
spi_mosi_o : out std_ulogic; -- serial data line out
spi_miso_i : in std_ulogic := '0'; -- serial data line in
spi_sck_o : out std_ulogic; -- serial clock line
spi_sdo_o : out std_ulogic; -- serial data line out
spi_sdi_i : in std_ulogic := '0'; -- serial data line in
spi_csn_o : out std_ulogic_vector(07 downto 0); -- SPI CS
-- TWI (available if IO_TWI_USE = true) --
twi_sda_io : inout std_logic := 'H'; -- twi serial data line

@@ -412,7 +408,7 @@ Now its time to get the most recent version the NEORV32 Processor project from G

$ git clone https://github.com/stnolting/neorv32.git

Create a new HW project with your FPGA synthesis tool of choice. Add all files from the [`rtl/core`](https://github.com/stnolting/neorv32/blob/master/rtl)
Create a new HW project with your FPGA design tool of choice. Add all files from the [`rtl/core`](https://github.com/stnolting/neorv32/blob/master/rtl)
folder to this project and add them to a **new library** called `neorv32`.

You can either instantiate the [processor's top entity](https://github.com/stnolting/neorv32#top-entity) in you own project, or you

@@ -464,30 +460,6 @@ uses the following default UART configuration:

Use the bootloader console to upload and execute your application image.

```
<< NEORV32 Bootloader >>

BLDV: Jun 22 2020
HWV: 0.0.2.3
CLK: 0x0134FD90 Hz
MISA: 0x42801104
CONF: 0x01FF0015
IMEM: 0x00010000 bytes @ 0x00000000
DMEM: 0x00010000 bytes @ 0x80000000

Autoboot in 8s. Press key to abort.
Aborted.

Available commands:
h: Help
r: Restart
u: Upload
s: Store to flash
l: Load from flash
e: Execute
CMD:>
```
Going further: Take a look at the [ NEORV32 datasheet](https://raw.githubusercontent.com/stnolting/neorv32/master/docs/NEORV32.pdf).

@@ -512,6 +484,7 @@ If you are using the NEORV32 Processor in some kind of publication, please cite
## Legal

This is a hobby project released under the BSD 3-Clause license. No copyright infringement intended.
Other implied/used projects might have different licensing - see their documentation to get more information.

**BSD 3-Clause License**

@@ -561,4 +534,4 @@ Continous integration provided by [Travis CI](https://travis-ci.com/stnolting/ne
This project is not affiliated with or endorsed by the Open Source Initiative (https://www.oshwa.org / https://opensource.org).

Made with :heart: in Hannover, Germany.
Made with :coffee: in Hannover, Germany.

BIN docs/NEORV32.pdf
Binary file not shown.
Size before: 37 KiB, after: 42 KiB