mirror of
https://github.com/stnolting/neorv32.git
synced 2025-04-24 22:27:21 -04:00
[docs] extended section "Rational"
Why a multi-cycle architecture?
parent
eed2be4bca
commit
d84f56c581
1 changed files with 35 additions and 0 deletions
@ -68,3 +68,38 @@ Furthermore, the NEORV32 pays special focus on _execution safety_ using <<_full_
provide fall-backs for _everything that could go wrong_. This includes malformed instruction words, privilege escalations
and even memory accesses that are checked for address space holes and deterministic response times of memory-mapped
devices. Precise exceptions allow a defined and fully-synchronized state of the CPU at all times and in every situation.

**A multi-cycle architecture?!?**

Most mainstream CPUs out there are pipelined architectures to increase throughput. In contrast, most CPUs used for
teaching are single-cycle designs since they are probably the easiest to understand. But what about
multi-cycle architectures?

In terms of energy, throughput, area and maximal clock frequency, multi-cycle architectures are somewhere in between
single-cycle and fully-pipelined designs: they provide higher throughput and clock speed when compared to their
single-cycle counterparts and have less complexity (= area) than fully-pipelined designs. I decided to use the
multi-cycle approach for the following reasons:

* Multi-cycle architectures are damn small! There is no need for pipeline hazard detection and resolution logic
(e.g. forwarding), plus you can "re-use" parts of the core to do several tasks (e.g. the ALU is used for the actual data
processing, but also for address generation, branch condition check and branch target computation).
* Single-cycle architectures require memories that can be read asynchronously, which is not feasible for larger memories
in real-world applications (e.g. FPGA block RAM is entirely synchronous). Furthermore, such designs usually have a very
(very!) long critical path that tremendously reduces the maximal operating frequency.
* Pipelined designs increase performance by having several instructions "in flight" at the same time. But this also means
there is some kind of "out-of-order" behavior: if an instruction at the end of the pipeline causes an exception, all
instructions in earlier stages have to be invalidated. Potential architecture state changes have to be _undone_,
requiring additional (exception-handling) logic. In a multi-cycle architecture this situation cannot occur because only a
single instruction is "in flight" at a time.
* Having only a single instruction in flight not only reduces hardware costs, it also simplifies simulation/verification/debugging,
state preservation/restoration during exceptions and extensibility (no need to care about pipeline hazards), but of course at the
cost of reduced throughput.
* To partly counteract this loss of performance, the NEORV32 CPU uses a _mixed_ approach: instruction fetch (front-end) and
instruction execution (back-end) are decoupled to operate independently of each other. Data is exchanged via a queue,
forming a simple 2-stage pipeline. Each "pipeline" stage in turn is implemented as a multi-cycle architecture to simplify
the hardware and to provide _precise_ state control (e.g. during exceptions).

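The trade-offs listed above can be made concrete with a toy, cycle-level model. The following Python sketch is hypothetical and is not derived from the NEORV32 RTL: the `Instr` type, the `simulate()` function, the fetch latency, the queue depth and the per-instruction cycle counts are all illustrative assumptions. It shows two effects. First, a queue between a fetch front-end and a multi-cycle execute back-end lets fetching overlap with execution. Second, because the back-end holds only one instruction at a time, a trap only requires discarding queued instructions that never touched the architectural state.

```python
from collections import deque
from typing import Deque, List, NamedTuple, Optional, Tuple

# Hypothetical toy model, NOT the NEORV32 hardware: all latencies below
# (fetch_cycles, queue_depth, exec_cycles) are illustrative assumptions.


class Instr(NamedTuple):
    name: str
    exec_cycles: int     # cycles the multi-cycle back-end needs for it
    traps: bool = False  # True if it raises a synchronous exception


def simulate(program: List[Instr], fetch_cycles: int = 2,
             queue_depth: int = 2,
             decoupled: bool = True) -> Tuple[int, List[str], bool]:
    """Return (cycles, names of retired instructions, trapped?)."""
    queue: Deque[Instr] = deque()    # front-end -> back-end instruction queue
    pc = 0                           # index of the next instruction to fetch
    fetch_left = 0                   # cycles left for the current fetch (0 = idle)
    current: Optional[Instr] = None  # the single instruction in the back-end
    exec_left = 0
    retired: List[str] = []
    cycles = 0

    while pc < len(program) or fetch_left or queue or current is not None:
        cycles += 1

        # Back-end: holds at most one instruction, executed over several cycles.
        if current is None and queue:
            current = queue.popleft()
            exec_left = current.exec_cycles
        if current is not None:
            exec_left -= 1
            if exec_left == 0:
                if current.traps:
                    # Precise trap: younger instructions are only sitting in
                    # the queue and have not modified any state, so they are
                    # simply discarded.
                    queue.clear()
                    return cycles, retired, True
                retired.append(current.name)
                current = None

        # Front-end: in decoupled mode it fetches ahead while the back-end is
        # busy; in coupled mode it may only start a fetch when the back-end
        # is idle and the queue is empty.
        may_start = decoupled or (current is None and not queue)
        if pc < len(program) and len(queue) < queue_depth and (fetch_left or may_start):
            if fetch_left == 0:
                fetch_left = fetch_cycles
            fetch_left -= 1
            if fetch_left == 0:
                queue.append(program[pc])
                pc += 1

    return cycles, retired, False


if __name__ == "__main__":
    prog = [Instr("addi", 2), Instr("lw", 3), Instr("add", 2)]
    print(simulate(prog))                   # fetch overlaps execution
    print(simulate(prog, decoupled=False))  # fetch and execute alternate
    print(simulate([Instr("addi", 2), Instr("ecall", 3, traps=True),
                    Instr("add", 2)]))      # "add" is never retired
```

With these assumed latencies, the decoupled run of `prog` needs fewer cycles than the coupled run. In the trapping program, the instruction after `ecall` has already been fetched into the queue, but it is never retired.
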
.CPU Architecture Details
[TIP]
Want to know more? Check out the description in the CPU's <<_architecture>> section.