mirror of
https://github.com/stnolting/neorv32.git
synced 2025-04-19 11:55:05 -04:00
120 lines
7.2 KiB
Text
120 lines
7.2 KiB
Text
<<<
|
|
:sectnums:
|
|
=== Rationale
|
|
|
|
[discrete]
|
|
==== Why did you make this?
|
|
|
|
For me, processor and CPU architecture designs are fascinating things: they are the magic frontier where software meets hardware.
|
|
This project started as something like a _journey_ into this realm to understand how things actually work
|
|
down on the very low level and evolved over time to a quite capable system-on-chip.
|
|
|
|
When I started to dive into the emerging RISC-V ecosystem I felt overwhelmed by the complexity.
|
|
As a beginner it is hard to get an overview - especially when you want to setup a minimal platform to tinker with...
|
|
Which core to use? How to get the right toolchain? What features do I need? How does booting work? How do I
|
|
create an actual executable? How to get that into the hardware? How to customize things? _Where to start???_
|
|
|
|
This project aims to provide a _simple to understand_ and _easy to use_ yet _powerful_ and _flexible_ platform
|
|
that targets FPGA and RISC-V beginners as well as advanced users.
|
|
|
|
|
|
[discrete]
|
|
==== Why a _soft-core_ processor?
|
|
|
|
As a matter of fact soft-core processors _cannot_ compete with discrete (ASIC) processors in terms
|
|
of performance, energy efficiency and size. But they do fill a niche in the design space: for example, soft-core
|
|
processors allow to implement the _control flow part_ of certain applications (like communication protocol handling)
|
|
using software like plain C. This provides high flexibility as software can be easily changed, re-compiled and
|
|
re-uploaded again.
|
|
|
|
Furthermore, the concept of flexibility applies to all aspects of a soft-core processor. The user can add
|
|
_exactly_ the features that are required by the application: additional memories, custom interfaces, specialized
|
|
co-processors and even user-defined instructions. These application-specific optimization capabilities compensate
|
|
for many of the limitations of soft-core processors.
|
|
|
|
|
|
[discrete]
|
|
==== Why RISC-V?
|
|
|
|
image::riscv_logo.png[width=250,align=left]
|
|
|
|
[quote, RISC-V International, https://riscv.org/about/]
|
|
____
|
|
RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration.
|
|
____
|
|
|
|
Open-source is a great thing!
|
|
While open-source has already become quite popular in _software_, hardware-focused projects still need to catch up.
|
|
Although processors and CPUs are the heart of almost every digital system, having a true open-source platform is still
|
|
a rarity. RISC-V aims to change that - and even it is _just one approach_, it helps paving the road for future development.
|
|
|
|
Furthermore, I highly appreciate the community aspect of RISC-V. The ISA and everything beyond is developed in direct
|
|
contact with the community: this includes businesses and professionals but also hobbyist, amateurs and enthusiasts.
|
|
Everyone can join discussions and contribute to RISC-V in their very own way.
|
|
|
|
Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal and "intuitive" ISA that
|
|
resembles with the basic concepts of RISC: _simple yet effective_.
|
|
|
|
|
|
[discrete]
|
|
==== Yet another RISC-V core? What makes it special?
|
|
|
|
The NEORV32 is not based on another (RISC-V) core. It was build entirely from ground up just following the official
|
|
ISA specs. The project does not intend to replace certain RISC-V cores or beat existing ones in terms of _performance_
|
|
or _size_. It was build having a different design goal in mind.
|
|
|
|
The project aims to provide _another option_ in the RISC-V / soft-core design space with a different performance
|
|
vs. size trade-off and a different focus: embrace concepts like documentation, platform-independence / portability,
|
|
RISC-V compatibility, extensibility & customization and - last but not least - ease of use.
|
|
|
|
Furthermore, the NEORV32 pays special focus on _execution safety_ using <<_full_virtualization>>. The CPU aims to
|
|
provide fall-backs for _everything that could go wrong_. This includes malformed instruction words, privilege escalations
|
|
and even memory accesses that are checked for address space holes and deterministic response times of memory-mapped
|
|
devices. Precise exceptions allow a defined and fully-synchronized state of the CPU at every time an in every situation.
|
|
|
|
To summarize, this project pursues the following objectives (in rough order of importance):
|
|
|
|
[start=1]
|
|
. RISC-V-compliance and -compatibility
|
|
. Functionality and features
|
|
. Extensibility
|
|
. Safety and security
|
|
. Minimal area
|
|
. Short critical paths, high operating clock
|
|
. Simplicity / easy to understand
|
|
. Low-power design
|
|
. High overall performance
|
|
|
|
|
|
[discrete]
|
|
==== A multi-cycle architecture?!
|
|
|
|
The primary goal of many mainstream CPUs is pure performance. Deep pipelines and out-of-order execution are some concepts
|
|
to boost performance, while also increasing complexity. In contrast, most CPUs used for teaching are single-cycle designs
|
|
since they are probably the most easiest to understand. But what about something in-between?
|
|
|
|
In terms of energy, throughput, area and maximal clock frequency, multi-cycle architectures are somewhere in between
|
|
single-cycle and fully-pipelined designs: they provide higher throughput and clock speed when compared to their
|
|
single-cycle counterparts while having less hardware complexity (= area) and thus, less performance, then a fully-pipelined
|
|
designs. So I decided to use the multi-cycle-approach because of the following reasons:
|
|
|
|
* Multi-cycle architectures are quite small! There is no need for pipeline hazard detection/resolution logic
|
|
(e.g. forwarding). Furthermore, you can "re-use" parts of the core to do several tasks (e.g. the ALU is used for
|
|
actual data processing and also for address generation, branch condition check and branch target computation).
|
|
* Single-cycle architectures require memories that can be read asynchronously - a thing that is not feasible to implement
|
|
in real-world applications (i.e. FPGA block RAM is entirely synchronous). Furthermore, such designs usually have a very
|
|
long critical path tremendously reducing maximal operating frequency.
|
|
* Pipelined designs increase performance by having several instruction "in fly" at the same time. But this also means
|
|
there is some kind of "out-of-order" behavior: if an instruction at the end of the pipeline causes an exception
|
|
all the instructions in earlier stages have to be invalidated. Potential architectural state changes have to be made _undone_
|
|
requiring additional logic (Spectre and Meltdown...). In a multi-cycle architecture this situation cannot occur since only a
|
|
single instruction is being processed ("in-fly") at a time.
|
|
* Having only a single instruction in fly does not only reduce hardware costs, it also simplifies
|
|
simulation/verification/debugging, state preservation/restoring during exceptions and extensibility (no need to care
|
|
about pipeline hazards) - but of course at the cost of reduced throughput.
|
|
|
|
To counteract the loss of performance implied by a _pure_ multi-cycle architecture, the NEORV32 CPU uses a _mixed_
|
|
approach: instruction-fetch (front-end) and instruction-execution (back-end) are de-coupled to operate independently
|
|
of each other. Data is interchanged via a queue building a simple 2-stage pipeline. Each "pipeline" stage in terms is
|
|
implemented as multi-cycle architecture to simplify the hardware and to provide _precise_ state control (for example
|
|
during exceptions).
|