[docs] extended section "Rational"

Why a multi-cycle architecture?
stnolting 2022-07-21 21:31:01 +02:00
parent eed2be4bca
commit d84f56c581

@ -68,3 +68,38 @@ Furthermore, the NEORV32 pays special focus on _execution safety_ using <<_full_
provide fall-backs for _everything that could go wrong_. This includes malformed instruction words, privilege escalations
and even memory accesses that are checked for address space holes and deterministic response times of memory-mapped
devices. Precise exceptions allow a defined and fully-synchronized state of the CPU at all times and in every situation.

**A multi-cycle architecture?!?**

Most mainstream CPUs are pipelined architectures, since pipelining increases throughput. In contrast, most CPUs used for
teaching are single-cycle designs, since these are probably the easiest to understand. But what about
multi-cycle architectures?

In terms of energy, throughput, area and maximum clock frequency, multi-cycle architectures are somewhere in between
single-cycle and fully-pipelined designs: they provide higher throughput and clock speed than their
single-cycle counterparts and have less complexity (= area) than fully-pipelined designs. I decided to use the
multi-cycle approach for the following reasons:

* Multi-cycle architectures are damn small! There is no need for pipeline hazard detection and resolution logic
(e.g. forwarding), plus parts of the core can be "re-used" for several tasks (e.g. the ALU is used for the actual data
processing, but also for address generation, branch condition checks and branch target computation).
* Single-cycle architectures require memories that can be read asynchronously, which is not feasible to implement
in real-world applications (e.g. FPGA block RAM is entirely synchronous). Furthermore, such designs usually have a very (very!)
long critical path that tremendously reduces the maximum operating frequency.
* Pipelined designs increase performance by having several instructions "in flight" at the same time. But this also means
there is some kind of "out-of-order" behavior: if an instruction at the end of the pipeline causes an exception,
all instructions in earlier stages have to be invalidated. Potential architectural state changes have to be _undone_,
which requires additional (exception-handling) logic. In a multi-cycle architecture this situation cannot occur because only a
single instruction is "in flight" at a time.
* Having only a single instruction in flight not only reduces hardware costs, it also simplifies simulation/verification/debugging,
state preservation/restoring during exceptions and extensibility (no need to care about pipeline hazards), but of course at the
cost of reduced throughput.
* To partly counteract this loss of performance, the NEORV32 CPU uses a _mixed_ approach: instruction fetch (front-end) and
instruction execution (back-end) are decoupled and operate independently of each other. The two are connected via a queue,
forming a simple 2-stage pipeline. Each "pipeline" stage in turn is implemented as a multi-cycle architecture to simplify
the hardware and to provide _precise_ state control (e.g. during exceptions).

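
The decoupled front-end/back-end scheme described above can be illustrated with a small behavioral model. This is only a
sketch under stated assumptions, not the NEORV32 hardware (which is written in VHDL): it uses a hypothetical toy ISA
(`addi`, `beq` and an "illegal" opcode), and all names (`ToyCpu`, `alu`, `queue_depth`, the state names) are made up for
illustration. A front-end prefetches one instruction word per cycle into a queue, while a multi-cycle back-end executes
exactly one instruction at a time and re-uses a single `alu()` function for data processing, branch condition checks and
branch target computation (address generation is not modeled since the toy ISA has no loads/stores).

```python
from collections import deque

# Behavioral sketch only (hypothetical toy ISA, NOT the NEORV32 VHDL):
#   ("addi", rd, rs1, imm)   rd <- rs1 + imm
#   ("beq", rs1, rs2, off)   if rs1 == rs2: pc <- pc + off
#   any other opcode         illegal instruction -> exception
MASK32 = 0xFFFFFFFF


def alu(op, a, b):
    """Single shared ALU: data processing, branch compare and branch target."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    raise ValueError(f"unknown ALU op: {op}")


class ToyCpu:
    def __init__(self, program, queue_depth=2):
        self.program = program      # one instruction tuple per 32-bit word
        self.regs = [0] * 8         # x0 is hard-wired to zero
        self.queue = deque()        # front-end -> back-end instruction queue
        self.queue_depth = queue_depth
        self.fetch_pc = 0           # front-end program counter
        self.pc = 0                 # PC of the instruction in the back-end
        self.ir = None              # back-end instruction register
        self.state = "DISPATCH"     # back-end multi-cycle state machine
        self.result = self.rd = self.taken = None
        self.trap = None            # (faulting PC, cause) after an exception
        self.cycles = 0

    def _fetch_in_range(self):
        return 0 <= self.fetch_pc // 4 < len(self.program)

    def _front_end(self):
        # Prefetch one word per cycle, independently of the back-end
        if len(self.queue) < self.queue_depth and self._fetch_in_range():
            self.queue.append((self.fetch_pc, self.program[self.fetch_pc // 4]))
            self.fetch_pc += 4

    def _back_end(self):
        if self.state == "DISPATCH" and self.queue:
            self.pc, self.ir = self.queue.popleft()
            self.state = "EXECUTE"
        elif self.state == "EXECUTE":
            if self.ir[0] == "addi":
                _, self.rd, rs1, imm = self.ir
                # ALU use #1: data processing
                self.result = alu("add", self.regs[rs1], imm)
                self.state = "WRITEBACK"
            elif self.ir[0] == "beq":
                _, rs1, rs2, _ = self.ir
                # ALU use #2: branch condition check
                self.taken = alu("sub", self.regs[rs1], self.regs[rs2]) == 0
                self.state = "BRANCH"
            else:
                # Only this instruction is in flight: no architectural state
                # to undo, just drop the prefetched words and stop
                self.trap = (self.pc, "illegal instruction")
                self.queue.clear()
                self.state = "HALTED"
        elif self.state == "WRITEBACK":
            if self.rd != 0:
                self.regs[self.rd] = self.result
            self.state = "DISPATCH"
        elif self.state == "BRANCH":
            if self.taken:
                # ALU use #3: branch target computation
                self.fetch_pc = alu("add", self.pc, self.ir[3])
                self.queue.clear()  # discard wrong-path prefetches
            self.state = "DISPATCH"

    def cycle(self):
        self.cycles += 1
        self._back_end()
        if self.state != "HALTED":
            self._front_end()

    def run(self, max_cycles=1000):
        while self.cycles < max_cycles and self.state != "HALTED":
            if self.state == "DISPATCH" and not self.queue and not self._fetch_in_range():
                break  # program finished
            self.cycle()
        return self
```

A short usage example:

```python
prog = [
    ("addi", 1, 0, 5),  # pc 0:  x1 = 5
    ("addi", 2, 0, 5),  # pc 4:  x2 = 5
    ("beq", 1, 2, 8),   # pc 8:  taken, continue at pc 16
    ("addi", 3, 0, 1),  # pc 12: prefetched, then discarded
    ("addi", 4, 0, 7),  # pc 16: x4 = 7
]
print(ToyCpu(prog).run().regs[1:5])  # [5, 5, 0, 7]

bad = [("addi", 1, 0, 1), ("ill",), ("addi", 2, 0, 9)]
cpu = ToyCpu(bad).run()
print(cpu.trap, cpu.regs[1:3])  # (4, 'illegal instruction') [1, 0]
```

When the branch is taken, the already-prefetched word at PC 12 is simply dropped from the queue. When the illegal
instruction faults, the `addi` at PC 8 is also sitting in the queue, yet no register has to be restored: it never reached
the back-end, so the architectural state is exactly the one after the last completed instruction.
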
.CPU Architecture Details
[TIP]
Want to know more? Check out the description in the CPU's <<_architecture>> section.