[docs] rework, update and cleanup entire datasheet

2025-04-24 14:17:51 -04:00 · 2023-03-15 22:09:19 +01:00 · 2023-03-15 22:09:19 +01:00 · 33a729ca01
commit 33a729ca01
parent 01f0bb9d31
32 changed files with 1824 additions and 3318 deletions
--- a/docs/attrs.adoc
+++ b/docs/attrs.adoc
@ -1,4 +1,4 @@
-:author: Stephan Nolting (M.Sc.)
+:author: by Stephan Nolting (M.Sc.)
 :keywords: neorv32, risc-v, riscv, rv32, fpga, soft-core, vhdl, microcontroller, cpu, soc, processor, gcc, openocd, gdb
 :description: A size-optimized, customizable and highly extensible MCU-class 32-bit RISC-V soft-core CPU and microcontroller-like SoC written in platform-independent VHDL.
 :revnumber: v1.8.2
@ -7,6 +7,6 @@
 :stem:
 :reproducible:
 :listing-caption: Listing
-:toclevels: 4
+:toclevels: 3
 :title-logo-image: neorv32_logo_riscv.png[pdfwidth=6.25in,align=center]
 :favicon: img/icon.png
--- a/docs/datasheet/cpu.adoc
+++ b/docs/datasheet/cpu.adoc
--- a/docs/datasheet/cpu_cfu.adoc
+++ b/docs/datasheet/cpu_cfu.adoc
@ -2,9 +2,8 @@
 :sectnums:
 === Custom Functions Unit (CFU)

-The Custom Functions Unit is the central part of the <<_zxcfu_custom_instructions_extension_cfu>> and represents
-the actual hardware module, which is used to implement _custom RISC-V instructions_. The concept of the NEORV32
-CFU has been highly inspired by https://github.com/google/CFU-Playground[Google's CFU-Playground].
+The Custom Functions Unit is the central part of the <<_zxcfu_isa_extension>> and represents
+the actual hardware module, which can be used to implement _custom RISC-V instructions_.

 The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or
 program memory requirements when implemented entirely in software. Some potential application fields and exemplary
@ -19,8 +18,8 @@ use-cases might include:
 [NOTE]
 The CFU is not intended for complex and _CPU-independent_ functional units that implement complete accelerators
 (like block-based AES encryption). These kinds of accelerators should be implemented as memory-mapped
-<<_custom_functions_subsystem_cfs>>.
-A comparison of all NEORV32-specific chip-internal hardware extension options is provided in the user guide section
+<<_custom_functions_subsystem_cfs>>. A comparison of all NEORV32-specific chip-internal hardware extension
+options is provided in the user guide section
 https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules].


@ -28,34 +27,24 @@ https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding C
 ==== CFU Instruction Formats

 The custom instructions executed by the CFU utilize a specific opcode space in the `rv32` 32-bit instruction
-space that has been explicitly reserved for user-defined extensions by the RISC-V specifications ("_Guaranteed Non-Standard
-Encoding Space_"). The NEORV32 CFU uses the `custom-x` opcodes to identify the instructions implemented
-by the CFU and to differentiate between the different instruction formats.
-The according binary encoding of these opcodes is shown below:
+space that has been explicitly reserved for user-defined extensions by the RISC-V specifications ("Guaranteed
+Non-Standard Encoding Space"). The NEORV32 CFU uses the `custom` opcodes to identify the instructions implemented
+by the CFU and to differentiate between the different instruction formats. The according binary encoding of these
+opcodes is shown below:

-* `custom-0`: `0001011` (R3-type instructions, RISC-V standard)
-* `custom-1`: `0101011` (R4-type instructions, RISC-V standard)
-* `custom-2`: `1011011` (R5-type instruction A, NEORV32-specific)
-* `custom-3`: `1111011` (R5-type instruction B, NEORV32-specific)
-
-.CFU Instructions - Exceptions
-[IMPORTANT]
-The CPU control logic only analyzes the opcode of the custom instructions to check if the _entire_
-instruction word is valid. All remaining bit-fields are **not checked** at all.
-This also means that the MSBs of the register fields are **not checked** even if the `E` ISA extension
-is enabled (for standard RISC-V instructions this would cause an exception).
-Hence, a custom CFU instruction can never raise an illegal instruction exception. If the CFU is not
-implemented at all (`Zxcfu` ISA extension is not enabled) any instruction with `custom-x` opcode
-will raise an illegal instruction exception.
+* `custom-0`: `0001011` RISC-V standard, used for CFU R3-type instructions
+* `custom-1`: `0101011` RISC-V standard, used for CFU R4-type instructions
+* `custom-2`: `1011011` NEORV32-specific, used for CFU R5-type instruction A
+* `custom-3`: `1111011` NEORV32-specific, used for CFU R5-type instruction B
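
The opcode-to-format mapping listed above can be sketched in C as a small decoder. This is only an illustration: the type `cfu_format_t` and the helper `cfu_decode_format` are hypothetical names that are not part of the NEORV32 software framework; only the opcode values are taken from the list.

```c
#include <stdint.h>

/* CFU instruction formats selected by the 7-bit opcode (values from the list above). */
typedef enum {
  CFU_FMT_NONE = 0, /* not a custom opcode */
  CFU_FMT_R3,       /* custom-0 */
  CFU_FMT_R4,       /* custom-1 */
  CFU_FMT_R5_A,     /* custom-2 */
  CFU_FMT_R5_B      /* custom-3 */
} cfu_format_t;

/* Hypothetical helper (not a NEORV32 API): map an instruction word to its CFU format. */
static cfu_format_t cfu_decode_format(uint32_t instr) {
  switch (instr & 0x7Fu) {           /* opcode = instr[6:0] */
    case 0x0Bu: return CFU_FMT_R3;   /* 0001011 */
    case 0x2Bu: return CFU_FMT_R4;   /* 0101011 */
    case 0x5Bu: return CFU_FMT_R5_A; /* 1011011 */
    case 0x7Bu: return CFU_FMT_R5_B; /* 1111011 */
    default:    return CFU_FMT_NONE;
  }
}
```

Only the opcode field is inspected here; all other bit fields of the instruction word are left to the format-specific logic.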


 :sectnums:
 ==== CFU R3-Type Instructions

-The R3-type CFU instructions operate on two source registers and return the processing result to the destination register.
-The actual operation can be defined by using the `funct7` and `funct3` bit fields. These immediates can also be used to
-pass additional data to the CFU like offsets, look-up-tables addresses or shift-amounts. However, the actual
-functionality is entirely user-defined.
+The R3-type CFU instructions operate on two source registers `rs1` and `rs2` and return the processing result to
+the destination register `rd`. The actual operation can be defined by using the `funct7` and `funct3` bit fields.
+These immediates can also be used to pass additional data to the CFU like offsets, look-up table addresses or
+shift-amounts. However, the actual functionality is entirely user-defined.

 Example operation: `rd <= rs1 xnor rs2`

@ -75,17 +64,17 @@ The CFU R3-type instruction format is compliant to the RISC-V ISA specification.

 .Instruction encoding space
 [NOTE]
-By using the `funct7` and `funct3` bit fields entirely for selecting the actual operation a total of 1024 custom R3-type
-instructions can be implemented (7-bit + 3-bit = 10 bit -> 1024 different values).
+By using the `funct7` and `funct3` bit fields entirely for selecting the actual operation a total of 1024 custom
+R3-type instructions can be implemented (7-bit + 3-bit = 10 bit -> 1024 different values).
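
The 10-bit encoding space can be illustrated with a short C sketch that assembles an R3-type instruction word. It assumes the standard RISC-V R-type field layout (`funct7` in bits 31:25, `rs2` in 24:20, `rs1` in 19:15, `funct3` in 14:12, `rd` in 11:7) together with the `custom-0` opcode; the helper names are hypothetical and not part of the NEORV32 software framework.

```c
#include <stdint.h>

#define CFU_OPCODE_CUSTOM0 0x0Bu /* 0001011 */

/* Hypothetical helper: assemble an R3-type instruction word (standard R-type layout).
 * rd, rs1 and rs2 are register indices (0..31), not register values. */
static uint32_t cfu_encode_r3(uint32_t funct7, uint32_t funct3,
                              uint32_t rd, uint32_t rs1, uint32_t rs2) {
  return ((funct7 & 0x7Fu) << 25) |
         ((rs2    & 0x1Fu) << 20) |
         ((rs1    & 0x1Fu) << 15) |
         ((funct3 & 0x07u) << 12) |
         ((rd     & 0x1Fu) <<  7) |
         CFU_OPCODE_CUSTOM0;
}

/* Combine funct7 and funct3 into a single 10-bit operation selector (0..1023). */
static uint32_t cfu_r3_selector(uint32_t funct7, uint32_t funct3) {
  return ((funct7 & 0x7Fu) << 3) | (funct3 & 0x07u);
}
```

How the selector bits are actually split between operation select and immediate data is up to the user logic; placing `funct7` in the upper selector bits is just one possible convention.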


 :sectnums:
 ==== CFU R4-Type Instructions

-The R4-type CFU instructions operate on three source registers and return the processing result to the destination register.
-The actual operation can be defined by using the `funct3` bit field. Alternatively, this immediate can also be used to
-pass additional data to the CFU like offsets, look-up-tables addresses or shift-amounts. However, the actual
-functionality is entirely user-defined.
+The R4-type CFU instructions operate on three source registers `rs1`, `rs2` and `rs3` and return the processing
+result to the destination register `rd`. The actual operation can be defined by using the `funct3` bit field.
+Alternatively, this immediate can also be used to pass additional data to the CFU like offsets, look-up table
+addresses or shift-amounts. However, the actual functionality is entirely user-defined.

 Example operation: `rd <= (rs1 * rs2 + rs3)[31:0]`

@ -105,23 +94,24 @@ The CFU R4-type instruction format is compliant to the RISC-V ISA specification.

 .Unused instruction bits
 [NOTE]
-The RISC-V ISA specification defines bits [26:25] of the R4-type instruction word to be all-zero. These bits are ignored
-by the hardware (CFU and illegal instruction check logic) and should be set to all-zero to preserve compatibility with
-future implementations.
+The RISC-V ISA specification defines bits [26:25] of the R4-type instruction word to be all-zero. These bits
+are ignored by the hardware (CFU and illegal instruction check logic) and should be set to all-zero to preserve
+compatibility with future ISA spec. versions.

 .Instruction encoding space
 [NOTE]
-By using the `funct3` bit field entirely for selecting the actual operation a total of 8 custom R4-type instructions
-can be implemented (3-bit -> 8 different values).
+By using the `funct3` bit field entirely for selecting the actual operation a total of 8 custom R4-type
+instructions can be implemented (3-bit -> 8 different values).
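
A minimal C sketch of the R4-type encoding, assuming the standard RISC-V R4-type field layout (`rs3` in bits 31:27, bits 26:25 kept at zero as recommended above, remaining fields as in the R-type format) and the `custom-1` opcode. The helper name is hypothetical and not part of the NEORV32 software framework.

```c
#include <stdint.h>

#define CFU_OPCODE_CUSTOM1 0x2Bu /* 0101011 */

/* Hypothetical helper: assemble an R4-type instruction word.
 * rd, rs1, rs2 and rs3 are register indices (0..31); bits [26:25] are always zero. */
static uint32_t cfu_encode_r4(uint32_t funct3, uint32_t rd,
                              uint32_t rs1, uint32_t rs2, uint32_t rs3) {
  return ((rs3    & 0x1Fu) << 27) |
         ((rs2    & 0x1Fu) << 20) |
         ((rs1    & 0x1Fu) << 15) |
         ((funct3 & 0x07u) << 12) |
         ((rd     & 0x1Fu) <<  7) |
         CFU_OPCODE_CUSTOM1;
}
```

With only `funct3` available as a selector, the function exposes exactly the 8 operations mentioned in the note.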


 :sectnums:
 ==== CFU R5-Type Instructions

-The R5-type CFU instructions operate on three source registers and return the processing result to the destination register.
-As all bits of the instruction word are used to encode the five registers and the opcode, no further immediate bits
-are available to specify the actual operation. There are two different R5-type instruction with two different opcodes
-available. Hence, only two R5-type operations can be implemented out of the box.
+The R5-type CFU instructions operate on four source registers `rs1`, `rs2`, `rs3` and `rs4` and return the
+processing result to the destination register `rd`. As all bits of the instruction word are used to encode the
+five registers and the opcode, no further immediate bits are available to specify the actual operation. There
+are two different R5-type instructions with two different opcodes available. Hence, only two R5-type operations
+can be implemented out of the box.

 Example operation: `rd <= rs1 & rs2 & rs3 & rs4`

@ -146,7 +136,7 @@ decoding logic as the location of the remaining register fields is identical to
 .RISC-V compatibility
 [IMPORTANT]
 The RISC-V ISA specification does not specify an R5-type instruction format. Hence, this instruction
-layout is NEORV32-specific.
+format is NEORV32-specific.

 .Instruction encoding space
 [IMPORTANT]
@ -160,9 +150,9 @@ writing operation information to a CFU-internal "command" register.
 ==== Using Custom Instructions in Software

 The custom instructions provided by the CFU can be used in plain C code by using **intrinsics**. Intrinsics
-behave like "normal" functions but under the hood they are a set of macros that hide the complexity of inline assembly.
-Using intrinsics removes the need to modify the compiler, built-in libraries or the assembler when including custom
-instructions. Each intrinsic will result in a single 32-bit instruction word providing maximum code efficiency.
+behave like "normal" C functions but under the hood they are a set of macros that hide the complexity of inline assembly.
+Using intrinsics removes the need to modify the compiler, built-in libraries or the assembler when using custom
+instructions. Each intrinsic will be compiled into a single 32-bit instruction word providing maximum code efficiency.

 The NEORV32 software framework provides four pre-defined prototypes for custom instructions, which are defined in
 `sw/lib/include/neorv32_cpu_cfu.h`:
@ -177,18 +167,18 @@ neorv32_cfu_r5_instr_b(rs1, rs2, rs3, rs4)     // R5-type instruction B
 ----

 The intrinsic functions always return a 32-bit value of type `uint32_t` (the processing result), which can be discarded
-when not needed. Each intrinsic function requires several arguments depending on the instruction type/format:
+if not needed. Each intrinsic function requires several arguments depending on the instruction type/format:

 * `funct7` - 7-bit immediate (R3-type only)
 * `funct3` - 3-bit immediate (R3-type, R4-type)
 * `rs1` - source operand 1, 32-bit (R3-type, R4-type, R5-type)
 * `rs2` - source operand 2, 32-bit (R3-type, R4-type, R5-type)
-* `rs3` - source operand 2, 32-bit (R3-type, R4-type, R5-type)
-* `rs4` - source operand 2, 32-bit (R4-type, R4-type, R5-type)
+* `rs3` - source operand 3, 32-bit (R4-type, R5-type)
+* `rs4` - source operand 4, 32-bit (R5-type)

-The `funct3` and `funct7` bit-fields are used to pass 3-bit or 7-bit literals to the CFU. The `rs1`, `rs2` and `rs3`
-arguments pass the actual data to the CFU. These register arguments can be populated with variables or literals.
-The following example shows how to pass arguments when executing both CFU instruction types:
+The `funct3` and `funct7` bit-fields are used to pass 3-bit or 7-bit literals to the CFU. The `rs1`, `rs2`, `rs3`
+and `rs4` arguments pass the actual data to the CFU. These register arguments can be populated with variables or
+literals. The following example shows how to pass arguments when executing all exemplary CFU instruction types:

 .CFU instruction usage example
 [source,c]
@ -196,7 +186,7 @@ The following example shows how to pass arguments when executing both CFU instru
 uint32_t tmp = some_function();
 ...
 uint32_t res = neorv32_cfu_r3_instr(0b0000000, 0b101, tmp, 123);
-uint32_t foo = neorv32_cfu_r4_instr(0b011, tmp, res, some_array[i]);
+uint32_t foo = neorv32_cfu_r4_instr(0b011, tmp, res, (uint32_t)some_array[i]);
 uint32_t bar = neorv32_cfu_r5_instr_a(tmp, res, foo, tmp);
 ----
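
When testing such calls it can help to compare the CFU results against plain-C reference models. The sketch below models the three example operations given in the instruction format sections (`xnor`, multiply-add, 4-input AND). The function names are hypothetical, and whether the default CFU hardware implements exactly these operations for a given `funct3`/`funct7` value is an assumption that has to be checked against the CFU's VHDL file.

```c
#include <stdint.h>

/* R3-type example operation: rd <= rs1 xnor rs2 */
static uint32_t cfu_ref_xnor(uint32_t rs1, uint32_t rs2) {
  return ~(rs1 ^ rs2);
}

/* R4-type example operation: rd <= (rs1 * rs2 + rs3)[31:0] */
static uint32_t cfu_ref_madd(uint32_t rs1, uint32_t rs2, uint32_t rs3) {
  /* compute in 64 bit, then keep only the lower 32 bits */
  return (uint32_t)((uint64_t)rs1 * rs2 + rs3);
}

/* R5-type example operation: rd <= rs1 & rs2 & rs3 & rs4 */
static uint32_t cfu_ref_and4(uint32_t rs1, uint32_t rs2, uint32_t rs3, uint32_t rs4) {
  return rs1 & rs2 & rs3 & rs4;
}
```

A test program could then compare, for example, the value returned by an R5-type intrinsic with `cfu_ref_and4()` for the same operands.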

@ -212,6 +202,11 @@ This example program is located in `sw/example/demo_cfu`.
 The actual functionality of the CFU's custom instructions is defined by the user-defined logic inside
 the CFU hardware module `rtl/core/neorv32_cpu_cp_cfu.vhd`.

+CFU operations can be entirely combinatorial (like bit-reversal) so the result is available at the end of
+the current clock cycle. Operations can also take several clock cycles to complete (like multiplications)
+and may also include internal states and memories. The CFU's internal control unit takes care of
+interfacing the custom user logic to the CPU pipeline.
+
 .CFU Hardware Example & More Details
 [TIP]
 The default CFU hardware module already implements some exemplary instructions that are used for illustration
@ -224,13 +219,14 @@ Enabling the CFU and actually implementing R4-type and/or R5-type instructions (
 the according operands for the CFU hardware) will add one or two additional read ports to the core's
 register file increasing resource requirements.

-CFU operations can be entirely combinatorial (like bit-reversal) so the result is available at the end of
-the current clock cycle. Operations can also take several clock cycles to complete (like multiplications)
-and may also include internal states and memories. The CFU's internal control/proxy unit takes care of
-interfacing the custom user logic to the CPU pipeline.
-
 .CFU Execution Time
 [NOTE]
 The CFU has to complete computation within a **bound time window**. Otherwise, the CFU operation is terminated
 by the hardware and an illegal instruction exception is raised. See section <<_cpu_arithmetic_logic_unit>>
 for more information.
+
+.CFU Exception
+[NOTE]
+The CFU can intentionally raise an illegal instruction exception by not asserting the "done" signal within
+a bound time window. For example, this can be used to signal invalid configurations/operations to the runtime
+environment. See the CFU's VHDL file for more information.
--- a/docs/datasheet/cpu_csr.adoc
+++ b/docs/datasheet/cpu_csr.adoc
--- a/docs/datasheet/on_chip_debugger.adoc
+++ b/docs/datasheet/on_chip_debugger.adoc
@ -2,9 +2,9 @@
 :sectnums:
 == On-Chip Debugger (OCD)

-The NEORV32 Processor features an _on-chip debugger_ (OCD) implementing **execution-based debugging** compatible
-to the **Minimal RISC-V Debug Specification Version 1.0**. Please refer to this spec for in-deep information.
-A copy of the specification is available in `docs/references`.
+The NEORV32 Processor features an _on-chip debugger_ (OCD) implementing the **execution-based debugging** scheme,
+which is compatible to the **Minimal RISC-V Debug Specification Version 1.0**. A copy of the specification is
+available in `docs/references`.

 **Section Structure**

@ -13,9 +13,9 @@ A copy of the specification is available in `docs/references`.
 * <<_cpu_debug_mode>>
 * <<_trigger_module>>

-The NEORV32 OCD provides the following key features:
+**Key Features**

-* JTAG access port
+* standard JTAG access port
 * run-control of the CPU: halting, single-stepping and resuming
 * executing arbitrary programs during debugging
 * indirect access to all core registers (via program buffer)
@ -25,15 +25,13 @@ The NEORV32 OCD provides the following key features:

 .OCD Security Note
 [NOTE]
-Access via the OCD is _always authenticated_ (`dmstatus.authenticated = 1`). Hence, the
-_whole system_ can always be accessed via the on-chip debugger. Currently, there is no option
-to disable the OCD via software - the OCD can only be disabled by disabling implementation
-(setting <<_on_chip_debugger_en>> generic to _false_).
+Access via the OCD is **always authenticated** (`dmstatus.authenticated = 1`). Hence, the entire system can always
+be accessed via the on-chip debugger.

 .Hands-On Tutorial
 [TIP]
-A simple example on how to use NEORV32 on-chip debugger in combination with OpenOCD and the GNU debugger
-is shown in section https://stnolting.github.io/neorv32/ug/#_debugging_using_the_on_chip_debugger[Debugging using the On-Chip Debugger]
+A simple example on how to use the NEORV32 on-chip debugger in combination with OpenOCD and the GNU debugger is shown in
+section https://stnolting.github.io/neorv32/ug/#_debugging_using_the_on_chip_debugger[Debugging using the On-Chip Debugger]
 of the User Guide.

 The NEORV32 on-chip debugger complex is based on four hardware modules:
@ -43,19 +41,17 @@ image::neorv32_ocd_complex.png[align=center]

 [start=1]
 . <<_debug_transport_module_dtm>> (`rtl/core/neorv32_debug_dtm.vhd`): JTAG access tap to allow an external
-  adapter to interface with the _debug module(DM)_ using the _debug module interface (dmi)_ - this interface is compatible to
-  the interface description shown in Appendix 3 of the "RISC-V debug stable" specification.
-. <<_debug_module_dm>> (`rtl/core/neorv32_debug_tm.vhd`): Debugger control unit that is configured by the DTM via the
-  the _dmi_. From the CPU's "point of view" this module behaves as another memory-mapped "peripheral" that can be accessed
-  via the processor-internal bus. The memory-mapped registers provide an internal _data buffer_ for data transfer
-  from/to the DM, a _code ROM_ containing the "park loop" code, a _program buffer_ to allow the debugger to
-  execute small programs defined by the DM and a _status register_ that is used to communicate
-  _exception, _halt_, _resume_ and _execute_ requests/acknowledges from/to the DM.
-. CPU <<_cpu_debug_mode>> extension (part of `rtl/core/neorv32_cpu_control.vhd`):
-  This extension provides the "debug execution mode" which executes the "park loop" code from the DM.
-  The mode also provides additional CSRs.
-. CPU <<_trigger_module>> (also part of `rtl/core/neorv32_cpu_control.vhd`):
-  This module provides a single _hardware_ breakpoint, which allows to debug code executed from ROM.
+adapter to interface with the _debug module (DM)_ using the _debug module interface (dmi)_ - this interface is compatible to
+the interface description shown in Appendix 3 of the "RISC-V debug" specification.
+. <<_debug_module_dm>> (`rtl/core/neorv32_debug_dm.vhd`): Debugger control unit that is configured by the DTM via the _dmi_.
+From the CPU's "point of view" this module behaves as another memory-mapped "peripheral" that can be accessed via the
+processor-internal bus. The memory-mapped registers provide an internal _data buffer_ for data transfer from/to the DM, a
+_code ROM_ containing the "park loop" code, a _program buffer_ to allow the debugger to execute small programs defined by the
+DM and a _status register_ that is used to communicate _exception_, _halt_, _resume_ and _execute_ requests/acknowledges from/to the DM.
+. CPU <<_cpu_debug_mode>> extension (part of `rtl/core/neorv32_cpu_control.vhd`): This extension provides the "debug execution mode"
+which executes the "park loop" code from the DM. The mode also provides additional CSRs.
+. CPU <<_trigger_module>> (also part of `rtl/core/neorv32_cpu_control.vhd`): This module provides a single _hardware_ breakpoint,
+which allows debugging code executed from ROM.

 **Theory of Operation**

@ -65,12 +61,11 @@ state of the system/CPU is "frozen" so the debugger can monitor if without inter
 However, the OCD can also modify the entire architectural state at any time.

 While in debug mode, the CPU executes the "park loop" code from the _code ROM_ of the DM.
-This park loop implements an endless loop, in which the CPU polls the memory-mapped _status register_ that is
+This park loop implements an endless loop, where the CPU polls the memory-mapped _status register_ that is
 controlled by the _debug module (DM)_. The flags in this register are used to communicate requests from
 the DM and to acknowledge them by the CPU: trigger execution of the program buffer or resume the halted
 application. Furthermore, the CPU uses this register to signal that the CPU has halted after a halt request
-and to signal that an exception has fired while in debug mode.
-
+and to signal that an exception has fired while being in debug mode.


 <<<
@ -79,8 +74,6 @@ and to signal that an exception has fired while in debug mode.
 === Debug Transport Module (DTM)

 The debug transport module (VHDL module: `rtl/core/neorv32_debug_dtm.vhd`) provides a JTAG test access port (TAP).
-The DTM is the first entity in the debug system, which connects and external debugger via JTAG to the next debugging
-entity - the debug module (DM).
 External JTAG access is provided by the following top-level ports.

 .JTAG top level signals
@ -88,7 +81,7 @@ External JTAG access is provided by the following top-level ports.
 [options="header",grid="rows"]
 |=======================
 | Name          | Width | Direction | Description
-| `jtag_trst_i` | 1     | in        | TAP reset (low-active); this signal is optional, make sure to pull it _high_ if it is not used by the JTAG adapter
+| `jtag_trst_i` | 1     | in        | TAP reset (low-active); this signal is optional, make sure to pull it _high_ if not used
 | `jtag_tck_i`  | 1     | in        | serial clock
 | `jtag_tdi_i`  | 1     | in        | serial data input
 | `jtag_tdo_o`  | 1     | out       | serial data output
@ -97,25 +90,14 @@ External JTAG access is provided by the following top-level ports.

 .Maximum JTAG Clock
 [IMPORTANT]
-All JTAG signals are synchronized to the processor clock domain by oversampling them in the DTM. Hence, no additional
-clock domain is required for the DTM. However, this constraints the maximal JTAG clock frequency (`jtag_tck_i`) to be less
-than or equal to **1/5** of the processor clock frequency (`clk_i`).
+All JTAG signals are synchronized to the processor's clock domain. Hence, no additional clock domain is required for the DTM.
+However, this constrains the maximum JTAG clock frequency (`jtag_tck_i`) to be less than or equal to **1/5** of the processor
+clock frequency (`clk_i`).
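
The 1/5 rule can be checked with a few lines of C before configuring a JTAG adapter. This is only a sketch of the arithmetic; the helper names are hypothetical and not part of the NEORV32 software framework.

```c
#include <stdint.h>

/* Highest permitted jtag_tck_i frequency for a given clk_i frequency (both in Hz). */
static uint32_t jtag_max_tck_hz(uint32_t clk_hz) {
  return clk_hz / 5u;
}

/* Returns 1 if tck_hz is less than or equal to 1/5 of clk_hz, otherwise 0. */
static int jtag_tck_is_valid(uint32_t clk_hz, uint32_t tck_hz) {
  /* multiply in 64 bit to avoid overflow and integer-division rounding */
  return ((uint64_t)tck_hz * 5u) <= (uint64_t)clk_hz;
}
```

For a processor clock of 100 MHz this yields a maximum JTAG clock of 20 MHz.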

 [NOTE]
-If the on-chip debugger is disabled (<<_on_chip_debugger_en>> = false) the JTAG serial input `jtag_tdi_i` is directly
+If the on-chip debugger is disabled the JTAG serial input `jtag_tdi_i` is directly
 connected to the JTAG serial output `jtag_tdo_o` to maintain the JTAG chain.

-[NOTE]
-The NEORV32 JTAG TAP does not provide a _boundary check_ function (yet?). Hence, physical device pins cannot be accessed.
-
-The DTM uses the "debug module interface (dmi)" to access the actual debug module (DM).
-The accesses are controlled by TAP-internal registers, which are selected by the JTAG instruction register (`IR`)
-and accessed through the JTAG data register (`DR`).
-
-[NOTE]
-The DTM's instruction and data registers can be accessed using OpenOCD's `irscan` and `drscan` commands.
-OpenOCD also provides low-level RISC-V-specific commands for direct DMI accesses (`riscv dmi_read` & `riscv dmi_write`).
-
 JTAG accesses are based on a single _instruction register_ `IR`, which is 5 bit wide, and several _data registers_ `DR`
 with different sizes. The individual data registers are accessed by writing the according address to the instruction
 register. The following table shows the available data registers and their addresses:
@ -146,16 +128,9 @@ register. The following table shows the available data registers and their addre
 | 3:0    | `version`      | r/- | `0001` = DTM is compatible to spec. version 0.13 & 1.0
 |=======================

-[INFO]
-See the https://github.com/riscv/riscv-debug-spec[RISC-V debug specification] for more information regarding the data
-registers and operations. A local copy can be found in `docs/references`.
-
 [NOTE]
-Most FPGAs are programmed over a JTAG connection itself and support the use of it in user designs with instantiation of
-platform-specific entities. So instead of two JTAG connections, one to program the FPGA and one to debug the core,
-only one connection is needed. See the setups in [`neorv32-setups`](https://github.com/stnolting/neorv32-setups)
-for example implementations.
-
+The DTM's instruction and data registers can be accessed using OpenOCD's `irscan` and `drscan` commands.
+OpenOCD also provides low-level RISC-V-specific commands for direct DMI accesses (`riscv dmi_read` & `riscv dmi_write`).


 <<<
@ -163,9 +138,9 @@ for example implementations.
 :sectnums:
 === Debug Module (DM)

-According to the RISC-V debug specification, the DM (VHDL module: `rtl/core/neorv32_debug_dm.vhd`)
-acts as a translation interface between abstract operations issued by the debugger (application) and the
-platform-specific debugger (circuit) implementation. It supports the following features (excerpt from the debug spec):
+The debug module "DM" (VHDL module: `rtl/core/neorv32_debug_dm.vhd`) acts as a translation interface between abstract
+operations issued by the debugger (application) and the platform-specific debugger (circuit) implementation.
+It supports the following features:

 * Gives the debugger necessary information about the implementation.
 * Allows the hart to be halted and resumed and provides status of the current state.
@ -175,10 +150,9 @@ platform-specific debugger (circuit) implementation. It supports the following f
 * Provides a Program Buffer to force the hart to execute arbitrary instructions.
 * Allows memory access from a hart's point of view.

-The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging
-capabilities while keeping resource/area requirements at a minimum level.
-It implements the **execution based debugging scheme** for a single hart and provides the following
-hardware features:
+The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging capabilities while
+keeping resource/area requirements at a minimum level. It implements the **execution based debugging scheme** for a
+single hart and provides the following hardware features:

 * program buffer with 2 entries and implicit `ebreak` instruction afterwards
 * no _direct_ bus access; indirect bus access via the CPU using the program buffer
@ -195,9 +169,7 @@ debugging control and status (<<_dm_cpu_access>>).
 :sectnums:
 ==== DM Registers

-The DM is controlled via a set of registers that are accessed via the DTM's _dmi_.
-The "Minimal RISC-V Debug Specification" requires only a subset of the registers specified in the spec.
-The following registers are implemented:
+The DM is controlled via a set of registers that are accessed via the DTM's _dmi_. The following registers are implemented:

 [NOTE]
 Write accesses to registers that are not implemented are simply ignored and read accesses will always return zero.
@ -219,7 +191,7 @@ their functionality.
 |  `0x1d` | (`nextdm`)     | Base address of _next_ DM; reads as zero to indicate there is only _one_ DM
 |  `0x20` | `progbuf0`     | Program buffer 0
 |  `0x21` | `progbuf1`     | Program buffer 1
-|  `0x38` | (`sbcs`)       | System bus access control and status; reads as zero to indicate there is no _direct_ system bus access
+|  `0x38` | (`sbcs`)       | System bus access control and status; reads as zero to indicate there is **no** direct system bus access
 |=======================


@ -247,7 +219,7 @@ their functionality.
 are configured as "zero" and are read-only. Writing '1' to these bits/fields will be ignored.
 |======

-.`dmcontrol` - debug module control register bits
+.`dmcontrol` Register Bits
 [cols="^1,^2,^1,<8"]
 [options="header",grid="rows"]
 |=======================
@ -255,7 +227,7 @@ are configured as "zero" and are read-only. Writing '1' to these bits/fields wil
 | 31  | `haltreq`      | -/w | set/clear hart halt request
 | 30  | `resumereq`    | -/w | request hart to resume
 | 28  | `ackhavereset` | -/w | write `1` to clear `*havereset` flags
-|  1  | `ndmreset`     | r/w | put whole processor into reset when `1`
+|  1  | `ndmreset`     | r/w | put whole processor into reset state when `1`
 |  0  | `dmactive`     | r/w | DM enable; writing `0`-`1` will reset the DM
 |=======================
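
The bit positions from the table above can be expressed as C masks, for example to compose values for a host-side debug script. The macro and function names are hypothetical; keeping `dmactive` set in every request is an assumption based on its description as the DM enable bit.

```c
#include <stdint.h>

/* dmcontrol bit masks (bit positions taken from the table above) */
#define DMCONTROL_HALTREQ      (1u << 31) /* set/clear hart halt request */
#define DMCONTROL_RESUMEREQ    (1u << 30) /* request hart to resume */
#define DMCONTROL_ACKHAVERESET (1u << 28) /* clear *havereset flags */
#define DMCONTROL_NDMRESET     (1u <<  1) /* put whole processor into reset state */
#define DMCONTROL_DMACTIVE     (1u <<  0) /* DM enable */

/* Hypothetical helpers: dmcontrol values for common run-control requests. */
static uint32_t dmcontrol_halt_request(void) {
  return DMCONTROL_DMACTIVE | DMCONTROL_HALTREQ;
}

static uint32_t dmcontrol_resume_request(void) {
  return DMCONTROL_DMACTIVE | DMCONTROL_RESUMEREQ;
}
```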

@ -271,7 +243,7 @@ are configured as "zero" and are read-only. Writing '1' to these bits/fields wil
 3+| Current status of the overall debug module and the hart. The entire register is read-only.
 |======

-.`dmstatus` - debug module status register bits
+.`dmstatus` Register Bits
 [cols="^1,^2,<10"]
 [options="header",grid="rows"]
 |=======================
@ -310,7 +282,7 @@ are configured as "zero" and are read-only. Writing '1' to these bits/fields wil
 3+| This register gives information about the hart. The entire register is read-only.
 |======

-.`hartinfo` - hart information register bits
+.`hartinfo` Register Bits
 [cols="^1,^2,<8"]
 [options="header",grid="rows"]
 |=======================
@ -335,7 +307,7 @@ are configured as "zero" and are read-only. Writing '1' to these bits/fields wil
 3+| Command execution info and status.
 |======

-.`abstracts` - abstract control and status register bits
+.`abstractcs` Register Bits
 [cols="^1,^2,^1,<8"]
 [options="header",grid="rows"]
 |=======================
@ -358,10 +330,6 @@ Error codes in `cmderr` (highest priority first):
 * `010` - unsupported command
 * `001` - invalid DM register read/write while command is/was executing

-.PMP Rules
-[NOTE]
-When in debug-mode all PMP rules are ignored making the debugger have maximum access rights.
-

 :sectnums!:
 ===== **`command`**
@ -379,7 +347,7 @@ When in debug-mode all PMP rules are ignored making the debugger have maximum ac
 The NEORV32 DM only supports **Access Register** abstract commands. These commands can only access the
 hart's GPRs (abstract command register index `0x1000` - `0x101f`).

-.`command` - abstract command register - "access register" commands only
+.`command` Register Bits
 [cols="^1,^2,^1,<8"]
 [options="header",grid="rows"]
 |=======================
@ -406,7 +374,7 @@ hart's GPRs (abstract command register index `0x1000` - `0x101f`).
 3+| Register to configure when a read/write access to a DM repeats execution of the last abstract command.
 |======

-.`abstractauto` - Abstract command auto-execution register bits
+.`abstractauto` Register Bits
 [cols="^1,^2,^1,<8"]
 [options="header",grid="rows"]
 |=======================
@ -439,13 +407,11 @@ From the CPU's perspective, the DM behaves as a memory-mapped peripheral that in
 * a program buffer populated by the debugger host to execute small programs
 * a data buffer to transfer data between the processor and the debugger host
 * a status register to communicate debugging requests and status
-(see `sw/ocd-firmware/README.md`).

 The DM occupies 256 bytes of the CPU's address space starting at address `dm_base_c` (see table below).
 This address space is divided into four sections of 64 bytes each to provide access to the _park loop code ROM_,
 the _program buffer_, the _data buffer_ and the _status register_. The program buffer, the data buffer and the
-status register do not fully occupy the 64-byte-wide sections. However, the according registers are mirrored
-to fill the entire section.
+status register do not fully occupy the 64-byte-wide sections and are mirrored to fill the entire section.
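
The division of the 256-byte DM address space into four 64-byte sections can be sketched as follows. The section order used here (code ROM, program buffer, data buffer, status register) simply follows the order of the sentence above and is an assumption; the authoritative layout is the address map table. The base address is passed in as a parameter because the value of `dm_base_c` is not given here, and all names are hypothetical.

```c
#include <stdint.h>

#define DM_SIZE_BYTES    256u /* total DM address space */
#define DM_SECTION_BYTES  64u /* size of each of the four sections */

typedef enum {
  DM_SEC_INVALID  = -1, /* address is outside of the DM address space */
  DM_SEC_CODE_ROM =  0, /* park loop code ROM (assumed order) */
  DM_SEC_PROGBUF  =  1, /* program buffer */
  DM_SEC_DATABUF  =  2, /* data buffer */
  DM_SEC_STATUS   =  3  /* status register */
} dm_section_t;

/* Hypothetical helper: find the DM section an address falls into. */
static dm_section_t dm_section_of(uint32_t dm_base, uint32_t addr) {
  uint32_t offset = addr - dm_base; /* wraps for addr < dm_base, rejected below */
  if (offset >= DM_SIZE_BYTES) {
    return DM_SEC_INVALID;
  }
  return (dm_section_t)(offset / DM_SECTION_BYTES);
}
```

Because the registers are mirrored within their section, every address inside a section maps to the same section identifier.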

 .DM CPU access - address map (divided into four sections)
 [cols="^2,^4,^2,<7"]
@ -488,11 +454,11 @@ has occurred _while executing_ the park loop itself.
 |=======================

 When the CPU enters or re-enters debug mode (for example via an `ebreak` in the DM's program buffer), it jumps to
-address of the _normal entry point_ for the park loop code defined by the <<_cpu_debug_park_addr>> generic.
-By default, this generic is set to `dm_park_entry_c`, which is defined in main package file.
-If an exception is encountered during debug mode, the CPU jumps to the address of the _exception entry point_
-defined  by the <<_cpu_debug_exc_addr>> generic. By default, this generic is set to `dm_exc_entry_c`, which is
-also defined in main package file.
+the address of the _normal entry point_ for the park loop code defined by the `CPU_DEBUG_PARK_ADDR` generic
+(<<_cpu_top_entity_generics>>). By default, this generic is set to `dm_park_entry_c`, which is defined in the main
+package file. If an exception is encountered during debug mode, the CPU jumps to the address of the _exception
+entry point_ defined by the `CPU_DEBUG_EXC_ADDR` generic (<<_cpu_top_entity_generics>>). By default, this generic
+is set to `dm_exc_entry_c`, which is also defined in the main package file.


 :sectnums:
@ -522,12 +488,6 @@ and faster execution.
        | `sreg_execute_ack` | write      <| Set by the CPU if an exception occurs while being in debug mode
 |=======================

-.Access Details
-[NOTE]
-The underlaying hardware to implement the CPU access to the status register is highly optimized to provide
-fastest access times while requiring minimal code and hardware size: the actual data written by the CPU is irrelevant
-as only the sub-byte accesses (so, the actual bus transactions) are tracked by the status register hardware.
-

 <<<
 // ####################################################################################################################
@ -535,23 +495,25 @@ as only the sub-byte accesses (so, the actual bus transactions) are tracked by t
 === CPU Debug Mode

 The NEORV32 CPU Debug Mode `DB` or `DEBUG` is compatible to the **Minimal RISC-V Debug Specification 1.0**
-`Sdext` (external debug) ISA extension. When enabled via the <<_cpu_extension_riscv_sdext>> generic (CPU) and/or
-the <<_on_chip_debugger_en>> (Processor) it adds a new CPU operation mode ("debug mode"), three additional CSRs
+`Sdext` (external debug) ISA extension. When enabled via the <<_sdext_isa_extension>> generic (CPU) and/or
+the `ON_CHIP_DEBUGGER_EN` generic (Processor), it adds a new CPU operation mode ("debug mode"), three additional CSRs
 (section <<_cpu_debug_mode_csrs>>) and one additional instruction (`dret`) to the core.

 [IMPORTANT]
 The CPU debug mode requires the `Zicsr` and `Zifencei` CPU extensions to be implemented (top generics
-<<_cpu_extension_riscv_zicsr>> and <<_cpu_extension_riscv_zifencei>> = true).
+<<_zicsr_isa_extension>> and <<_zifencei_isa_extension>> = true).

-The CPU debug-mode is entered when one of the following events appear:
+The CPU debug-mode is entered on any of the following events:

 [start=1]
-. executed `ebreak` instruction (when in machine-mode and `dcsr.ebreakm` is set OR when in user-mode and `dcsr.ebreaku` is set)
+. executed `ebreak` instruction (when in machine-mode and <<_dcsr>>`.ebreakm` is set OR when in user-mode and <<_dcsr>>`.ebreaku` is set)
 . debug halt request from external DM (via CPU signal `db_halt_req_i`, high-active, triggering on rising-edge)
-. finished executing of a single instruction while in single-step debugging mode (enabled via `dcsr.step`)
+. finished execution of a single instruction while in single-step debugging mode (enabled via <<_dcsr>>`.step`)
 . hardware trigger by the <<_trigger_module>>

-From a hardware point of view, these "entry conditions" are special traps that are handled transparently by the control logic.
+[NOTE]
+From a hardware point of view, these "entry conditions" are special traps that are handled transparently by
+the control logic.
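
The `ebreak` entry condition from the list above can be sketched as a small predicate. The bit positions of `ebreakm` (15) and `ebreaku` (12) are taken from the RISC-V debug specification and the privilege encodings from the RISC-V privileged specification. They are not stated in this section, so check them against the `dcsr` bits table. The function and constant names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the RISC-V debug specification. */
#define DCSR_EBREAKM_BIT 15u
#define DCSR_EBREAKU_BIT 12u

/* Privilege level encodings from the RISC-V privileged specification. */
enum priv_mode { PRIV_USER = 0, PRIV_MACHINE = 3 };

/* Returns true if executing `ebreak` at the given privilege level
   enters debug mode instead of raising a breakpoint exception. */
static inline bool ebreak_enters_debug_mode(uint32_t dcsr, enum priv_mode priv)
{
    bool ebreakm = ((dcsr >> DCSR_EBREAKM_BIT) & 1u) != 0u;
    bool ebreaku = ((dcsr >> DCSR_EBREAKU_BIT) & 1u) != 0u;
    return (priv == PRIV_MACHINE && ebreakm) ||
           (priv == PRIV_USER && ebreaku);
}
```

Under these assumptions, neither bit is set in the `dcsr` reset value `0x40000413`, so an `ebreak` does not enter debug mode until a debugger sets `ebreakm` or `ebreaku`.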

 **Whenever the CPU enters debug-mode it performs the following operations:**

@ -560,7 +522,7 @@ From a hardware point of view, these "entry conditions" are special traps that a
 * copy the hart's current privilege level to `dcsr.prv`
 * set `dcsr.cause` according to the cause why debug mode is entered
 * **no update** of the `mepc`, `mcause`, `mtval` and `mstatus` CSRs
-* load the address configured via the CPU's <<_cpu_debug_park_addr>> generic to the `pc` to jump to the
+* load the address configured via the CPU's `CPU_DEBUG_PARK_ADDR` (<<_cpu_top_entity_generics>>) generic to the `pc` to jump to the
 "debugger park loop" code stored in the debug module (DM)

 **When the CPU is in debug-mode the following things are important:**
@ -570,8 +532,8 @@ From a hardware point of view, these "entry conditions" are special traps that a
 * the `wfi` instruction acts as a `nop` (also during single-stepping)
 * if an exception occurs while being in debug mode:
 ** if the exception was caused by any debug-mode entry action the CPU jumps to the _normal entry point_
-   (defined by <<_cpu_debug_park_addr>> generic) of the park loop again (for example when executing `ebreak` while in debug-mode)
-** for all other exception sources the CPU jumps to the _exception entry point_ (defined by <<_cpu_debug_exc_addr>> generic)
+   (defined by `CPU_DEBUG_PARK_ADDR` generic of the <<_cpu_top_entity_generics>>) of the park loop again (for example when executing `ebreak` while in debug-mode)
+** for all other exception sources the CPU jumps to the _exception entry point_ (defined by `CPU_DEBUG_EXC_ADDR` generic of the <<_cpu_top_entity_generics>>)
   to signal an exception to the DM; the CPU restarts the park loop again afterwards
 * interrupts are disabled; however, they will remain pending and will get executed after the CPU has left debug mode
 * if the DM makes a resume request, the park loop exits and the CPU leaves debug mode (executing `dret`)
@ -579,8 +541,8 @@ From a hardware point of view, these "entry conditions" are special traps that a
 <<_machine_system_timer_mtime>> keep running as well as its shadowed copies in the `[m]time[h]` CSRs
 * all <<_hardware_performance_monitors_hpm_csrs>> are stopped

-Debug mode is left either by executing the `dret` instruction or by performing
-a hardware reset of the CPU. Executing `dret` outside of debug mode will raise an illegal instruction exception.
+Debug mode is left either by executing the `dret` instruction or by performing a hardware reset of the CPU.
+Executing `dret` outside of debug mode will raise an illegal instruction exception.

 **Whenever the CPU leaves debug mode it performs the following operations:**

@ -610,7 +572,6 @@ an illegal instruction exception is raised.
 | 0x7b0 | **Debug control and status register** | `dcsr`
 3+<| Reset value: `0x40000413`
 3+<| The `dcsr` CSR is compatible to the RISC-V debug spec. It is used to configure debug mode and provides additional status information.
-The following bits are implemented. The reaming bits are read-only and always read as zero.
 |======

 .Debug control and status register `dcsr` bits
@ -637,7 +598,7 @@ The following bits are implemented. The reaming bits are read-only and always re

 Cause codes in `dcsr.cause` (highest priority first):

-* `010` - trigger by hardware <<_trigger_module>>
+* `010` - triggered by hardware <<_trigger_module>>
 * `001` - executed `EBREAK` instruction
 * `011` - external halt request (from DM)
 * `100` - return from single-stepping
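
A debugger-side helper could decode these cause codes as sketched below. The field position (`dcsr` bits 8:6) is taken from the RISC-V debug specification and is an assumption with respect to this section. The function names and returned strings are hypothetical.

```c
#include <stdint.h>

/* Extract the 3-bit cause field; position assumed from the
   RISC-V debug specification (dcsr bits 8:6). */
static inline uint32_t dcsr_cause(uint32_t dcsr)
{
    return (dcsr >> 6) & 0x7u;
}

/* Translate a cause code from the list above into a readable string. */
static inline const char *dcsr_cause_name(uint32_t cause)
{
    switch (cause) {
        case 0x2: return "hardware trigger";       /* 010 */
        case 0x1: return "ebreak instruction";     /* 001 */
        case 0x3: return "external halt request";  /* 011 */
        case 0x4: return "single-stepping";        /* 100 */
        default:  return "unknown";                /* not listed above */
    }
}
```

The priority order from the list decides which code the hardware reports when several causes coincide. The decoder itself only maps the reported value.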
@ -674,7 +635,7 @@ debug mode is entered. The `dret` instruction will return to `dpc` by moving `dp
 === Trigger Module

 The RISC-V `Sdtrig` ISA extension adds a programmable _trigger module_ to the processor when enabled
-(via the <<_cpu_extension_riscv_sdtrig>>). The NEORV32 trigger module implements a subset of the features
+(via the <<_sdtrig_isa_extension>>). The NEORV32 trigger module implements a subset of the features
 described in the "RISC-V Debug Specification / Trigger Module".

 The trigger module only provides a _single_ trigger supporting only the "instruction address match" type. This limitation
@ -720,7 +681,7 @@ for implementing a "hardware breakpoint"
 Write attempts to the hardwired bits are ignored.
 |======

-.Match control CSR (`tdata1`) bits
+.Match Control CSR (`tdata1`) Bits
 [cols="^1,^2,^1,<8"]
 [options="header",grid="rows"]
 |=======================
--- a/docs/datasheet/overview.adoc
+++ b/docs/datasheet/overview.adoc
@ -1,9 +1,9 @@
 :sectnums:
 == Overview

-The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] is an open-source
-RISC-V compatible processor system that is intended as *ready-to-go* auxiliary processor within a larger SoC
-designs or as stand-alone custom / customizable microcontroller.
+The NEORV32 RISC-V Processor is an open-source RISC-V compatible processor system that is intended as
+*ready-to-go* auxiliary processor within a larger SoC design or as a stand-alone custom / customizable
+microcontroller.

 The system is highly configurable and provides optional common peripherals like embedded memories,
 timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
@ -63,24 +63,24 @@ include::rationale.adoc[]

 * all-in-one package: **CPU** + **SoC** + **Software Framework & Tooling**
 * completely described in behavioral, platform-independent VHDL - no vendor- or technology-specific primitives, attributes, macros, libraries, etc. are used at all
-* all-Verilog "version" https://github.com/stnolting/neorv32-verilog[available] (auto-generated netlist)
+* all-Verilog "version" available (auto-generated netlist)
 * extensive configuration options for adapting the processor to the requirements of the application
-* highly https://stnolting.github.io/neorv32/ug/#_comparative_summary[extensible hardware] - on CPU, SoC and system level
+* highly extensible hardware - on CPU, SoC and system level
 * aims to be as small as possible while being as RISC-V-compliant as possible - with a reasonable area-vs-performance trade-off
-* FPGA friendly (e.g. _all_ internal memories can be mapped to block RAM - including the register file)
+* FPGA friendly (e.g. all internal memories can be mapped to block RAM - including the register file)
 * optimized for high clock frequencies to ease timing closure and integration
 * from zero to _"hello world!"_ - completely open source and documented
 * easy to use even for FPGA/RISC-V starters – intended to _work out of the box_

 **NEORV32 CPU (the core)**

-* 32-bit `rv32i` RISC-V CPU
-* fully RISC-V ISA compatible - checked by the https://github.com/stnolting/neorv32-riscof[official RISCOF architecture tests]
+* 32-bit RISC-V CPU
+* fully compatible to the RISC-V ISA specs. - checked by the https://github.com/stnolting/neorv32-riscof[official RISCOF architecture tests]
 * base ISA + privileged ISA + several optional standard and custom ISA extensions
-* option to add custom RISC-V instructions as custom ISA extension
-* rich set of customization options (ISA extensions, design goal: performance / area (/ energy), ...)
-* aims to support <<_full_virtualization>> capabilities to increase execution safety
-* official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]: decimal **19**; hexadecimal `0x00000013`
+* option to add user-defined RISC-V instructions as custom ISA extension
+* rich set of customization options (ISA extensions, design goal: performance / area / energy, tuning options, ...)
+* <<_full_virtualization>> capabilities to increase execution safety
+* official RISC-V open source architecture ID

 **NEORV32 Processor (the SoC)**

@ -90,8 +90,8 @@ include::rationale.adoc[]
 * optional timers and counters (watchdog, system timer)
 * optional general purpose IO and PWM; a native NeoPixel(c)-compatible smart LED interface
 * optional embedded memories / caches for data, instructions and bootloader
-* optional external memory interface (Wishbone / AXI4-Lite) and stream link interface (AXI4-Stream) for custom connectivity
-* optional execute_in_place (XIP) module to execute code _directly_ form external SPI flash
+* optional external memory interface for custom connectivity
+* optional execute in-place (XIP) module to execute code directly from an external SPI flash
 * on-chip debugger compatible with OpenOCD and gdb including hardware trigger module

 **Software framework**
@ -103,17 +103,14 @@ include::rationale.adoc[]
 * doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
 * FreeRTOS port + demos available

-[TIP]
-For more in-depth details regarding the feature provided by he hardware see the according sections:
-<<_neorv32_central_processing_unit_cpu>> and <<_neorv32_processor_soc>>.

 **Extensibility and Customization**

-The NEORV32 processor was designed to ease customization and extensibility and provides several options for adding
+The NEORV32 processor is designed to ease customization and extensibility and provides several options for adding
 application-specific custom hardware modules and accelerators. The three most common options for adding custom
 on-chip modules are listed below.

-* <<_processor_external_memory_interface_wishbone_axi4_lite>> for processor-external modules
+* <<_processor_external_memory_interface_wishbone>> to attach processor-external IP modules
 * <<_custom_functions_subsystem_cfs>> for tightly-coupled processor-internal co-processors
 * <<_custom_functions_unit_cfu>> for custom RISC-V instructions

@ -169,11 +166,11 @@ neorv32                - Project home folder
 === VHDL File Hierarchy

 All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
-of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**.
+of the entire processor including all the required configuration generics is `neorv32_top.vhd`.

+.NEORV32 VHDL Library
 [IMPORTANT]
-All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional
-files, like alternative top entities, can be assigned to any library.
+All core VHDL files from the list below have to be assigned to a new design library named `neorv32`.

 ...................................
 neorv32_top.vhd                  - NEORV32 Processor top entity
@ -230,7 +227,7 @@ neorv32_top.vhd                  - NEORV32 Processor top entity
 The processor-internal instruction and data memories (IMEM and DMEM) are split into two design files each:
 a plain entity definition (`neorv32_*mem.entity.vhd`) and the actual architecture definition
 (`mem/neorv32_*mem.default.vhd`). The `*.default.vhd` architecture definitions from `rtl/core/mem` provide a _generic_ and
-_platform independent_ memory design that (should) infers embedded memory blocks. You can replace/modify the architecture
+_platform independent_ memory design (inferring embedded memory blocks). You can replace/modify the architecture
 source file in order to use platform-specific features (like advanced memory resources) or to improve technology mapping
 and/or timing.

@ -240,12 +237,10 @@ and/or timing.
 :sectnums:
 === FPGA Implementation Results

-This section shows _exemplary_ FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.
-Note that certain configuration options might also have an impact on other configuration options. Furthermore,
-this report cannot cover all possible option combinations. Hence, the presented implementation results are
-just _exemplary_. If not otherwise mentioned all implementations use the default generic configurations.
+[NOTE]
+This section shows **exemplary** FPGA implementation results for the NEORV32 CPU and NEORV32 Processor modules.

-:sectnums:
+[discrete]
 ==== CPU

 [cols="<2,<8"]
@ -273,19 +268,14 @@ just _exemplary_. If not otherwise mentioned all implementations use the default
 | `rv32imcbu_Zicsr_Zicntr_Zifencei_Zfinx_DebugMode` | 4825 | 2018 |     1024 |    7 | 123 MHz
 |=======================

-[NOTE]
-The table above does not show _all_ CPU ISA extensions. More sophisticated and application-specific
-options like PMP and HMP are not included in this overview.
-
 .Goal-Driven Optimization
 [TIP]
-The CPU provides further options to reduce the area footprint (for example by constraining the CPU-internal
-counter sizes) or to increase performance (for example by using a barrel-shifter; at cost of extra hardware).
+The CPU provides further options to reduce the area footprint or to increase performance.
 See section <<_processor_top_entity_generics>> for more information. Also, take a look at the User Guide section
 https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configuration[Application-Specific Processor Configuration].


-:sectnums:
+[discrete]
 ==== Processor - Modules

 [cols="<2,<8"]
@ -330,13 +320,9 @@ https://stnolting.github.io/neorv32/ug/#_application_specific_processor_configur
 | XIRQ          | External interrupt controller (32 channels)                    | 245 | 200 |        0 |    0
 |=======================

-[NOTE]
-Note that not all IOs were actually connected to FPGA pins (for example some GPIO inputs and outputs)
-when generating these reports.

-
-:sectnums:
-==== Processor - Exemplary Setups
+[discrete]
+==== Processor - Exemplary Setup

 [cols="<2,<8"]
 [grid="topbot"]
@ -356,7 +342,7 @@ when generating these reports.
 | 2488  | 1807 |     7 |    4 | 150 MHz
 |=======================

-.Exemplary Setups
+.Exemplary Processor Setups
 [TIP]
 Check out the `neorv32-setups` repository (on GitHub: https://github.com/stnolting/neorv32-setups),
 which provides several demo setups and community projects for various FPGA boards and toolchains.
@ -368,17 +354,13 @@ which provides several demo setups and community projects for various FPGA board
 === CPU Performance

 The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark].
-This benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
-system. The according sources can be found in the `sw/example/coremark` folder.
+The corresponding sources can be found in the `sw/example/coremark` folder.
+The resulting CoreMark score is defined as CoreMark iterations per second per MHz.
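
The score definition can be written as a small helper. This is a sketch only: the function name is hypothetical and the numbers in the note below are made up for illustration, not measured.

```c
/* CoreMark score as defined above: iterations per second per MHz.
   Hypothetical helper; all inputs are supplied by the caller. */
static inline double coremark_per_mhz(double iterations,
                                      double seconds,
                                      double clock_hz)
{
    double iterations_per_second = iterations / seconds;
    double clock_mhz = clock_hz / 1.0e6;
    return iterations_per_second / clock_mhz;
}
```

For example, 9523 iterations completed in 100 seconds at a 100 MHz clock correspond to 95.23 iterations per second, or 0.9523 CoreMark/MHz.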

-.Dhrystone
+.Dhrystone Benchmark
 [TIP]
 A very simple port of the Dhrystone benchmark is also available in `sw/example/dhrystone`.

-The resulting CoreMark score is defined as CoreMark iterations per second.
-The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
-defined as CoreMark score divided by the CPU's clock frequency in MHz.
-
 .Configuration
 [cols="<2,<8"]
 [grid="topbot"]
@ -400,13 +382,7 @@ defined as CoreMark score divided by the CPU's clock frequency in MHz.
 | _performance_ (`rv32imc_Zicsr` + perf. options) |          95.23 | **0.9523**    | **3.54**
 |=======================

-[NOTE]
-The "_performance_" CPU configuration uses the <<_fast_mul_en>> and <<_fast_shift_en>> options.
-
 The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
-several consecutive micro operations.
-The average CPI (cycles per instruction) depends on the instruction mix of a specific applications and also on
-the available CPU extensions. The average CPI is computed by dividing the total number of required clock cycles
-(only the timed core to avoid distortion due to IO wait cycles) by the number of executed instructions
-(`[m]instret[h]` CSRs). More information regarding the execution time of each implemented instruction can be found in
-chapter <<_instruction_timing>>.
+several consecutive micro operations. The average CPI (cycles per instruction) depends on the instruction
+mix of a specific application and also on the available CPU extensions. More information regarding the execution
+time of each implemented instruction can be found in section <<_instruction_sets_and_extensions>>.
--- a/docs/datasheet/rationale.adoc
+++ b/docs/datasheet/rationale.adoc
@ -1,7 +1,7 @@
 :sectnums:
 === Rationale

-:sectnums!:
+[discrete]
 ==== Why did you make this?

 Processor and CPU architecture designs are fascinating things: they are the magic frontier where software meets hardware.
@ -17,13 +17,13 @@ This project aims to provide a _simple to understand_ and _easy to use_ yet _pow
 that targets FPGA and RISC-V beginners as well as advanced users.


-:sectnums!:
+[discrete]
 ==== Why a _soft-core_ processor?

 As a matter of fact soft-core processors _cannot_ compete with discrete (like FPGA hard-macro) processors in terms
-of performance, energy efficiency and size. But they do fill a niche in FPGA design space: for example, soft-core processors
-allow to implement the _control flow part_ of certain applications (e.g. communication protocol handling) using
-software like plain C. This provides high flexibility as software can be easily changed, re-compiled and
+of performance, energy efficiency and size. But they do fill a niche in FPGA design space: for example, soft-core
+processors allow to implement the _control flow part_ of certain applications (e.g. communication protocol handling)
+using software like plain C. This provides high flexibility as software can be easily changed, re-compiled and
 re-uploaded again.

 Furthermore, the concept of flexibility applies to all aspects of a soft-core processor. The user can add
@ -31,7 +31,7 @@ _exactly_ the features that are required by the application: additional memories
 co-processors and even user-defined instructions.


-:sectnums!:
+[discrete]
 ==== Why RISC-V?

 image::riscv_logo.png[width=250,align=left]
@ -56,7 +56,7 @@ Finally, I really like the RISC-V ISA itself. It aims to be a clean, orthogonal
 resembles with the basic concepts of _RISC_: simple yet effective.


-:sectnums!:
+[discrete]
 ==== Yet another RISC-V core? What makes it special?

 The NEORV32 is not based on another RISC-V core. It was built entirely from the ground up (just following the official
@ -74,7 +74,7 @@ and even memory accesses that are checked for address space holes and determinis
 devices. Precise exceptions allow a defined and fully-synchronized state of the CPU at any time and in every situation.


-:sectnums!:
+[discrete]
 ==== A multi-cycle architecture?!?!

 Most mainstream CPUs out there are pipelined architectures to increase throughput. In contrast, most CPUs used for
@ -83,29 +83,26 @@ multi-cycle architectures?

 In terms of energy, throughput, area and maximal clock frequency multi-cycle architectures are somewhere in between
 single-cycle and fully-pipelined designs: they provide higher throughput and clock speed when compared to their
-single-cycle counterparts while having less hardware complexity (= area) then a fully-pipelined designs. I decided to use the
-multi-cycle approach because of the following reasons:
+single-cycle counterparts while having less hardware complexity (= area) than fully-pipelined designs. I decided to
+use the multi-cycle approach because of the following reasons:

 * Multi-cycle architectures are quite small! There is no need for pipeline hazard detection and resolution logic
-(e.g. forwarding). Furthermore, you can "re-use" parts of the core to do several tasks (e.g. the ALU is used for the actual data
-processing, but also for address generation, branch condition check and branch target computation).
+(e.g. forwarding). Furthermore, you can "re-use" parts of the core to do several tasks (e.g. the ALU is used for the
+actual data processing, but also for address generation, branch condition check and branch target computation).
 * Single-cycle architectures require memories that can be read asynchronously - a thing that is not feasible to implement
-in real world applications (i.e. FPGA block RAM is entirely synchronous). Furthermore, such design usually have a very (very!!!)
+in real world applications (i.e. FPGA block RAM is entirely synchronous). Furthermore, such designs usually have a very
 long critical path tremendously reducing maximal operating frequency.
 * Pipelined designs increase performance by having several instructions "in flight" at the same time. But this also means
 there is some kind of "out-of-order" behavior: if an instruction at the end of the pipeline causes an exception
 all the instructions in earlier stages have to be invalidated. Potential architecture state changes have to be made _undone_
 requiring additional (exception-handling) logic. In a multi-cycle architecture this situation cannot occur because only a
 single instruction is "in flight" at a time.
-* Having only a single instruction in fly does not only reduce hardware costs, it also simplifies simulation/verification/debugging,
-state preservation/restoring during exceptions and extensibility (no need to care about pipeline hazards) - but of course at the
-cost of reduced throughput.
+* Having only a single instruction in flight not only reduces hardware costs, it also simplifies
+simulation/verification/debugging, state preservation/restoring during exceptions and extensibility (no need to care
+about pipeline hazards) - but of course at the cost of reduced throughput.

-To counteract the loss of performance implied by a _pure_ multi-cycle architecture, the NEORV32 CPU uses a _mixed_ approach: instruction fetch
-(front-end) and instruction execution (back-end) are de-coupled to operate independently of each other. Data is interchanged via a queue
-building a simple 2-stage pipeline. Each "pipeline" stage in terms is implemented as multi-cycle architecture to simplify
-the hardware and to provide _precise_ state control (e.g. during exceptions).
-
-.CPU Architecture Details
-[TIP]
-Want to know more? Check out the description in the CPU's <<_architecture>> section.
+To counteract the loss of performance implied by a _pure_ multi-cycle architecture, the NEORV32 CPU uses a _mixed_
+approach: instruction fetch (front-end) and instruction execution (back-end) are de-coupled to operate independently
+of each other. Data is interchanged via a queue building a simple 2-stage pipeline. Each "pipeline" stage in turn is
+implemented as multi-cycle architecture to simplify the hardware and to provide _precise_ state control (e.g. during
+exceptions).
--- a/docs/datasheet/soc.adoc
+++ b/docs/datasheet/soc.adoc
--- a/docs/datasheet/soc_bootrom.adoc
+++ b/docs/datasheet/soc_bootrom.adoc
@ -6,44 +6,23 @@
 [frame="topbot",grid="none"]
 |=======================
 | Hardware source file(s): | neorv32_boot_rom.vhd | 
-| Software driver file(s): | none             | _implicitly used_
-| Top entity port:         | none             | 
-| Configuration generics:  | _INT_BOOTLOADER_EN_ | implement processor-internal bootloader when _true_
-| CPU interrupts:          | none             | 
+| Software driver file(s): | none                 | 
+| Top entity port:         | none                 | 
+| Configuration generics:  | `INT_BOOTLOADER_EN`  | implement processor-internal bootloader when `true`
+| CPU interrupts:          | none                 | 
 |=======================

+This boot ROM module provides a read-only memory that contains the executable image of the default NEORV32
+<<_bootloader>>. If the internal bootloader is enabled via the `INT_BOOTLOADER_EN` generic the CPU's boot address
+is automatically set to the beginning of the bootloader ROM. See section <<_boot_configuration>> for more
+information regarding the processor's different boot scenarios.
+
+.Address Configuration
 [NOTE]
-The default `neorv32_boot_rom.vhd` HDL source file provides a _generic_ memory design that infers embedded
-memory for _larger_ memory configurations. You might need to replace/modify the source file in order to use
-platform-specific features (like advanced memory resources) or to improve technology mapping and/or timing.
-
-This HDL modules provides a read-only memory that contain the executable code image of the bootloader.
-If the <<_int_bootloader_en>> generic is _true_ this module will be implemented and the CPU boot address
-is modified to directly execute the code from the bootloader ROM after reset.
-
-The bootloader ROM is located at address `0xFFFF0000` and can occupy a address space of up to 32kB. The base
-address as well as the maximum address space size are fixed and cannot (should not!) be modified as this
-might address collision with other processor modules.
-
-The bootloader memory is _read-only_ and is automatically initialized with the bootloader executable image
-`rtl/core/neorv32_bootloader_image.vhd` during synthesis. The actual _physical_ size of the ROM is also
-determined via synthesis and expanded to the next power of two. For example, if the bootloader code requires
-10kB of storage, a ROM with 16kB will be generated. The maximum size must not exceed 32kB.
-
-.Access Latency
-[NOTE]
-By default, the bootloader ROM has a fixed access latency of one clock cycle (like all other processor-internal
-modules). However, custom versions of this module may also have higher access latency. See section <<_bus_interface>>
-for more information.
+The bootloader ROM is located at address `0xFFFF0000` and can occupy an address space of up to 32kB. The base
+address as well as the maximum address space size are fixed and cannot be modified as this might cause address
+collisions with other processor modules.

 .Read-Only Access
 [NOTE]
 Any write access to the BOOTROM will raise a _store access fault_ exception.
-
-.Bootloader - Software
-[TIP]
-See section <<_bootloader>> for more information regarding the actual bootloader software/executable itself.
-
-.Boot Configuration
-[TIP]
-See section <<_boot_configuration>> for more information regarding the processor's different boot scenarios.
--- a/docs/datasheet/soc_buskeeper.adoc
+++ b/docs/datasheet/soc_buskeeper.adoc
@ -9,7 +9,7 @@
 | Software driver file(s): | none | 
 | Top entity port:         | none | 
 | Configuration generics:  | none | 
-| Package constants:       | `max_proc_int_response_time_c` | Access time window (#cycles)
+| Package constants:       | `max_proc_int_response_time_c` | Access time window (maximum number of cycles)
 | CPU interrupts:          | none | 
 |=======================

@ -17,10 +17,10 @@
 **Theory of Operation**

 The Bus Keeper is a fundamental component of the processor's internal bus system that ensures correct bus operations
-to maintain execution safety. The Bus Keeper monitors every single bus transactions that is intimated by the CPU.
-If an accessed device responds with an error condition or do not respond within a specific _access time window_,
-the according bus access fault exception is raised. The following exceptions can be raised by the Bus Keeper
-(see section <<_traps_exceptions_and_interrupts>> for all available CPU traps):
+while maintaining execution safety. It monitors every single bus transaction that is initiated by the CPU.
+If an accessed device responds with an error condition or does not respond at all within a specific _access time window_,
+a corresponding bus access fault exception is raised. The following exceptions can be raised by the Bus Keeper
+(see section <<_traps_exceptions_and_interrupts>> for a list of all available bus access-related exceptions):

 * `TRAP_CODE_I_ACCESS`: error during instruction fetch bus access
 * `TRAP_CODE_S_ACCESS`: error during data store bus access
@ -30,20 +30,15 @@ The **access time window**, in which an accessed device has to respond, is defin
 constant from the processor's VHDL package file (`rtl/neorv32_package.vhd`). The default value is **15 clock cycles**.

 In case of a bus access fault exception application software can evaluate the Bus Keeper's control register
-`NEORV32_BUSKEEPER.CTRL` to retrieve further details of the bus exception. The _BUSKEEPER_ERR_FLAG_ bit indicates
+`CTRL` to retrieve further details regarding the bus exception. The `BUSKEEPER_ERR_FLAG` bit indicates
 that an actual bus access fault has occurred. The bit is sticky once set and is automatically cleared when reading or
-writing the `NEORV32_BUSKEEPER.CTRL` register. The _BUSKEEPER_ERR_TYPE_ bit defines the type of the bus fault:
+writing the `NEORV32_BUSKEEPER.CTRL` register. The `BUSKEEPER_ERR_TYPE` bit defines the type of the bus fault:

 * `0` - "Device Error": The bus access exception was caused by the memory-mapped device that
 has been accessed (the device asserted its `err_o`).
 * `1` - "Timeout Error": The bus access exception was caused by the Bus Keeper because the
 accessed memory-mapped device did not respond within the access time window. Note that this error type can also be raised
-by the optional timeout feature of the <<_processor_external_memory_interface_wishbone_axi4_lite>>).
-
-[NOTE]
-Bus access fault exceptions are also raised if a physical memory protection (PMP) rule is violated. In this case
-the _BUSKEEPER_ERR_FLAG_ bit remains zero (since the error signal is not triggered by the BUSKEEPER but by
-the CPU's PMP logic).
+by the optional timeout feature of the external bus interface.


 **Register Map**
@ -53,7 +48,7 @@ the CPU's PMP logic).
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.2+<| `0xffffff78` .2+<| `CTRL` <|`0`  _BUSKEEPER_ERR_TYPE_ ^| r/- <| Bus error type, valid if _BUSKEEPER_ERR_FLAG_
-                                <|`31` _BUSKEEPER_ERR_FLAG_ ^| r/c <| Sticky error flag, clears after read or write access
+.2+<| `0xffffff78` .2+<| `CTRL` <|`0`  `BUSKEEPER_ERR_TYPE` ^| r/- <| Bus error type, valid if `BUSKEEPER_ERR_FLAG` is set
+                                <|`31` `BUSKEEPER_ERR_FLAG` ^| r/c <| Sticky error flag, clears after read or write access
 | `0xffffff7c` | - | _reserved_ | r/c | _reserved_ (mirrored access to `CTRL`)
 |=======================
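
The fault evaluation described in the Bus Keeper section can be sketched in C. This is only a minimal illustration, not part of the NEORV32 software framework: the function name `buskeeper_decode` and the `bus_fault_t` type are hypothetical. Only the bit positions (bit 0 = `BUSKEEPER_ERR_TYPE`, bit 31 = `BUSKEEPER_ERR_FLAG`) are taken from the register map. Since any access to `CTRL` clears the sticky flag, the sketch assumes that the register is read exactly once and that the copy is decoded afterwards.

```c
#include <stdint.h>

// Bit positions taken from the Bus Keeper register map.
#define BUSKEEPER_ERR_TYPE 0   // 0 = device error, 1 = timeout error
#define BUSKEEPER_ERR_FLAG 31  // sticky error flag

// Hypothetical result type, for illustration only.
typedef enum {
  BUS_FAULT_NONE    = 0, // error flag not set, no fault recorded
  BUS_FAULT_DEVICE  = 1, // the accessed device signaled an error
  BUS_FAULT_TIMEOUT = 2  // no response within the access time window
} bus_fault_t;

// Decode a value that was read ONCE from the CTRL register.
bus_fault_t buskeeper_decode(uint32_t ctrl) {
  if (((ctrl >> BUSKEEPER_ERR_FLAG) & 1u) == 0u) {
    return BUS_FAULT_NONE; // the type bit is only valid if the flag is set
  }
  return ((ctrl >> BUSKEEPER_ERR_TYPE) & 1u) ? BUS_FAULT_TIMEOUT : BUS_FAULT_DEVICE;
}
```

A trap handler could pass a single read of `NEORV32_BUSKEEPER.CTRL` to this function. A result of `BUS_FAULT_NONE` would then indicate that the exception was not recorded by the Bus Keeper.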
--- a/docs/datasheet/soc_cfs.adoc
+++ b/docs/datasheet/soc_cfs.adoc
@ -10,25 +10,21 @@
 |                          | neorv32_cfs.h |
 | Top entity port:         | `cfs_in_i`  | custom input conduit
 |                          | `cfs_out_o` | custom output conduit
-| Configuration generics:  | _IO_CFS_EN_ | implement CFS when _true_
-|                          | _IO_CFS_CONFIG_    | custom generic conduit
-|                          | _IO_CFS_IN_SIZE_   | size of `cfs_in_i`
-|                          | _IO_CFS_OUT_SIZE_  | size of `cfs_out_o`
+| Configuration generics:  | `IO_CFS_EN`        | implement CFS when `true`
+|                          | `IO_CFS_CONFIG`    | custom generic conduit
+|                          | `IO_CFS_IN_SIZE`   | size of `cfs_in_i`
+|                          | `IO_CFS_OUT_SIZE`  | size of `cfs_out_o`
 | CPU interrupts:          | fast IRQ channel 1 | CFS interrupt (see <<_processor_interrupts>>)
 |=======================


 **Theory of Operation**

-The custom functions subsystem is meant for implementing custom and application-specific logic.
-The CFS provides up to 64 32-bit memory-mapped read/write
-registers (`REG`, see register map below) that can be accessed by the CPU via normal load/store operations.
-The actual functionality of these register has to be defined by the hardware designer. Furthermore, the CFS
-provides two IO conduits to implement custom on-chip or off-chip interfaces.
-
-In contrast to connecting custom hardware accelerators via external memory interfaces (like SPI or the processor's
-external bus interface), the CFS provide a convenient, low-latency and tightly-coupled extension and
-customization option.
+The custom functions subsystem is meant for implementing custom tightly-coupled co-processors or interfaces.
+It provides up to 64 32-bit memory-mapped read/write registers (`REG`, see register map below) that can be
+accessed by the CPU via normal load/store operations. The actual functionality of these registers has to be
+defined by the hardware designer. Furthermore, the CFS provides two IO conduits to implement custom on-chip
+or off-chip interfaces.

 Just like any other externally-connected IP, logic implemented within the custom functions subsystem can operate
 _independently_ of the CPU providing true parallel processing capabilities. Potential use cases might include
@ -38,7 +34,7 @@ or real-time data transport (I2S).

 [TIP]
 If you like to implement _custom instructions_ that are executed right within the CPU's ALU
-see the <<_zxcfu_custom_instructions_extension_cfu>> and the according <<_custom_functions_unit_cfu>>.
+see the <<_zxcfu_isa_extension>> and the according <<_custom_functions_unit_cfu>>.

 [TIP]
 Take a look at the template CFS VHDL source file (`rtl/core/neorv32_cfs.vhd`). The file is highly
@ -51,7 +47,7 @@ The CFS can also be used to _replicate_ existing NEORV32 modules - for example t
 **CFS Software Access**

 The CFS memory-mapped registers can be accessed by software using the provided C-language aliases (see
-register map table below). Note that all interface registers are declared as 32-bit words of type `uint32_t`.
+register map table below). Note that all interface registers are defined as 32-bit words of type `uint32_t`.

 .CFS Software Access Example
 [source,c]
@ -61,9 +57,6 @@ NEORV32_CFS->REG[0] = (uint32_t)some_data_array(i); // write to CFS register 0
 int temp = (int)NEORV32_CFS->REG[20]; // read from CFS register 20
 ----

-[TIP]
-A very simple example program that uses the _default_ CFS hardware module can be found in `sw/example/cfs_demo`.
-

 **CFS Interrupt**

@ -74,7 +67,7 @@ writing zero to the according <<_mip>> CSR bit. See section <<_processor_interru

 **CFS Configuration Generic**

-By default, the CFS provides a single 32-bit `std_(u)logic_vector` configuration generic _IO_CFS_CONFIG_
+By default, the CFS provides a single 32-bit `std_ulogic_vector` configuration generic `IO_CFS_CONFIG`
 that is available in the processor's top entity. This generic can be used to pass custom configuration options
 from the top entity directly down to the CFS. The actual definition of the generic and its usage inside the
 CFS is left to the hardware designer.
@ -87,10 +80,10 @@ These signals are directly propagated to the processor's top entity. These condu
 application-specific interfaces like memory or peripheral connections. The actual use case of these signals
 has to be defined by the hardware designer.

-The size of the input signal conduit `cfs_in_i` is defined via the top's _IO_CFS_IN_SIZE_ configuration
+The size of the input signal conduit `cfs_in_i` is defined via the top's `IO_CFS_IN_SIZE` configuration
 generic (default = 32-bit). The size of the output signal conduit `cfs_out_o` is defined via the top's
-_IO_CFS_OUT_SIZE_ configuration generic (default = 32-bit). If the custom function subsystem is not implemented
-(_IO_CFS_EN_ = false) the `cfs_out_o` signal is tied to all-zero.
+`IO_CFS_OUT_SIZE` configuration generic (default = 32-bit). If the custom functions subsystem is not implemented
+(`IO_CFS_EN` = `false`) the `cfs_out_o` signal is tied to all-zero.


 **Register Map**
--- a/docs/datasheet/soc_dmem.adoc
+++ b/docs/datasheet/soc_dmem.adoc
@ -9,21 +9,21 @@
 |                          | mem/neorv32_dmem.default.vhd | default _platform-agnostic_ memory architecture
 | Software driver file(s): | none                         | _implicitly used_
 | Top entity port:         | none                         | 
-| Configuration generics:  | _MEM_INT_DMEM_EN_            | implement processor-internal DMEM when _true_
-|                          | _MEM_INT_DMEM_SIZE_          | DMEM size in bytes
+| Configuration generics:  | `MEM_INT_DMEM_EN`            | implement processor-internal DMEM when `true`
+|                          | `MEM_INT_DMEM_SIZE`          | DMEM size in bytes
 | CPU interrupts:          | none                         | 
 |=======================

-Implementation of the processor-internal data memory is enabled via the processor's _MEM_INT_DMEM_EN_
-generic. The size in bytes is defined via the _MEM_INT_DMEM_SIZE_ generic. If the DMEM is implemented,
+Implementation of the processor-internal data memory is enabled via the processor's `MEM_INT_DMEM_EN`
+generic. The size in bytes is defined via the `MEM_INT_DMEM_SIZE` generic. If the DMEM is implemented,
 the memory is mapped into the data memory space and located right at the beginning of the data memory
 space (default `dspace_base_c` = 0x80000000). The DMEM is always implemented as true RAM.

 .Access Latency
 [NOTE]
 By default, the DMEM has a fixed access latency of one clock cycle (like all other processor-internal
-modules). However, custom versions of this module may also have higher access latency. See section <<_bus_interface>>
-for more information.
+modules). However, custom versions of this module may also have higher access latency. See section
+<<_bus_interface>> for more information.

 .VHDL Source File
 [NOTE]
--- a/docs/datasheet/soc_gpio.adoc
+++ b/docs/datasheet/soc_gpio.adoc
@ -10,16 +10,18 @@
 |                          | neorv32_gpio.h |
 | Top entity port:         | `gpio_o` | 64-bit parallel output port
 |                          | `gpio_i` | 64-bit parallel input port
-| Configuration generics:  | _IO_GPIO_NUM_ | number of input/output pairs (0..64)
+| Configuration generics:  | `IO_GPIO_NUM` | number of input/output pairs to implement (0..64)
 | CPU interrupts:          | none |
 |=======================

-The general purpose parallel IO unit provides a simple parallel input and output port. These ports can be used chip-externally
-(for example to drive status LEDs, connect buttons, etc.) or chip-internally to provide control signals for other IP modules.
-The actual number of input/output pairs is defined by the _IO_GPIO_NUM_ generic. When set to zero, the GPIO module is excluded
-from synthesis and the output port `gpio_o` is tied to all-zero. If _IO_GPIO_NUM_ is less than the maximum value of 64
-only the LSB-aligned bits in `gpio_o` and `gpio_i` are actually connected while the remaining bits are unconnected or tied
-to zero, respectively.
+The general purpose parallel IO unit provides a simple parallel input and output port. These ports can be used
+chip-externally (for example to drive status LEDs, connect buttons, etc.) or chip-internally to provide control
+signals for other IP modules.
+
+The actual number of input/output pairs is defined by the `IO_GPIO_NUM` generic. When set to zero, the GPIO module
+is excluded from synthesis and the output port `gpio_o` is tied to all-zero. If `IO_GPIO_NUM` is less than the
+maximum value of 64, only the LSB-aligned bits in `gpio_o` and `gpio_i` are actually connected while the remaining
+bits are tied to zero or are left unconnected, respectively.

 .Access Atomicity
 [NOTE]
--- a/docs/datasheet/soc_gptmr.adoc
+++ b/docs/datasheet/soc_gptmr.adoc
@ -9,7 +9,7 @@
 | Software driver file(s): | neorv32_gptmr.c |
 |                          | neorv32_gptmr.h |
 | Top entity port:         | none | 
-| Configuration generics:  | _IO_GPTMR_EN_ | implement general purpose timer when _true_
+| Configuration generics:  | `IO_GPTMR_EN`       | implement general purpose timer when `true`
 | CPU interrupts:          | fast IRQ channel 12 | timer interrupt (see <<_processor_interrupts>>)
 |=======================

@ -17,13 +17,13 @@
 **Theory of Operation**

 The general purpose timer module provides a simple yet universal 32-bit timer. The timer is implemented if
-_IO_GPTMR_EN_ top generic is set _true_. It provides a 32-bit counter register (`COUNT`) and a 32-bit threshold
+the `IO_GPTMR_EN` top generic is set to `true`. It provides a 32-bit counter register (`COUNT`) and a 32-bit threshold
 register (`THRES`). An interrupt is generated whenever the value of the counter register matches the one from
 the threshold register.

-The timer is enabled by setting the _GPTMR_CTRL_EN_ bit in the device's control register `CTRL`. The `COUNT`
+The timer is enabled by setting the `GPTMR_CTRL_EN` bit in the device's control register `CTRL`. The `COUNT`
 register will start incrementing at a programmable rate, which scales the main processor clock. The
-pre-scaler value is configured via the three _GPTMR_CTRL_PRSCx_ control register bits:
+pre-scaler value is configured via the three `GPTMR_CTRL_PRSCx` control register bits:

 .GPTMR prescaler configuration
 [cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
@ -33,10 +33,10 @@ pre-scaler value is configured via the three _GPTMR_CTRL_PRSCx_ control register
 | Resulting `clock_prescaler` |       2 |       4 |       8 |      64 |     128 |    1024 |    2048 |    4096
 |=======================

-The timer provides two operation modes that are configured by the _GPTMR_CTRL_MODE_ control register bit:
-if _GPTMR_CTRL_MODE_ is cleared (`0`) the timer operates in _single-shot mode_. As soon as `COUNT` matches
-`THRES` an interrupt request is generated and the timer stops operation (i.e. it stops incrementing). If
-_GPTMR_CTRL_MODE_ is set (`1`) the timer operates in _continuous mode_. When `COUNT` matches `THRES` an interrupt
+The timer provides two operation modes that are configured via the `GPTMR_CTRL_MODE` control register bit:
+
+* If `GPTMR_CTRL_MODE` is cleared (`0`) the timer operates in _single-shot mode_. As soon as `COUNT` matches
+`THRES` an interrupt request is generated and the timer stops operation (i.e. it stops incrementing).
+* If `GPTMR_CTRL_MODE` is set (`1`) the timer operates in _continuous mode_. When `COUNT` matches `THRES` an interrupt
 request is generated and `COUNT` is automatically reset to all-zero before continuing to increment.

 [NOTE]
@ -57,11 +57,10 @@ remains pending inside the CPU until it explicitly cleared by writing zero to th
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.5+<| `0xffffff60` .5+<| `CTRL` <|`0` _GPTMR_CTRL_EN_    ^| r/w <| Timer enable flag
-                                <|`1` _GPTMR_CTRL_PRSC0_ ^| r/w .3+| 3-bit clock prescaler select
-                                <|`2` _GPTMR_CTRL_PRSC1_ ^| r/w 
-                                <|`3` _GPTMR_CTRL_PRSC2_ ^| r/w 
-                                <|`4` _GPTMR_CTRL_MODE_  ^| r/w <| Counter mode: `0`=single-shot, `1`=continuous
+.4+<| `0xffffff60` .4+<| `CTRL` <|`0`    `GPTMR_CTRL_EN`                       ^| r/w <| Timer enable flag
+                                <|`3:1`  `GPTMR_CTRL_PRSC2 : GPTMR_CTRL_PRSC0` ^| r/w <| 3-bit clock prescaler select
+                                <|`4`    `GPTMR_CTRL_MODE`                     ^| r/w <| Counter mode: `0`=single-shot, `1`=continuous
+                                <|`31:5` -                                     ^| r/- <| _reserved_, read as zero
 | `0xffffff64` | `THRES` |`31:0` | r/w | Threshold value register
 | `0xffffff68` | `COUNT` |`31:0` | r/w | Counter register
 |=======================
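
The relation between the main clock, the prescaler and the threshold register can be sketched as follows. This is a hedged illustration only: the helper names are hypothetical and are not part of `neorv32_gptmr.h`. It assumes that the prescaler select values `0b000` to `0b111` map to the `clock_prescaler` values of the prescaler table in ascending order. The bit positions are taken from the register map. Whether a match occurs after exactly `THRES` or `THRES + 1` counter increments is not covered here and should be checked against the hardware.

```c
#include <assert.h>
#include <stdint.h>

// Prescaler values from the "GPTMR prescaler configuration" table.
// Assumption: select values 0b000..0b111 map to these entries in order.
static const uint32_t gptmr_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

// Counter increment frequency in Hz for a given main clock and prescaler select.
uint32_t gptmr_tick_hz(uint32_t f_main_hz, unsigned prsc) {
  assert(prsc < 8);
  return f_main_hz / gptmr_prescaler[prsc];
}

// Number of counter ticks that span `millis` milliseconds (candidate THRES value).
uint32_t gptmr_ticks_for_ms(uint32_t f_main_hz, unsigned prsc, uint32_t millis) {
  uint64_t ticks = ((uint64_t)gptmr_tick_hz(f_main_hz, prsc) * millis) / 1000u;
  assert(ticks <= UINT32_MAX); // THRES is a 32-bit register
  return (uint32_t)ticks;
}

// Compose a CTRL value: enable (bit 0), prescaler (bits 3:1), mode (bit 4).
uint32_t gptmr_ctrl_word(unsigned prsc, int continuous) {
  assert(prsc < 8);
  return 1u | ((uint32_t)prsc << 1) | ((continuous ? 1u : 0u) << 4);
}
```

For a 100 MHz main clock and prescaler select `0b011` (divide by 64) the counter increments at 1.5625 MHz, so a one-second interval corresponds to 1562500 ticks.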
--- a/docs/datasheet/soc_icache.adoc
+++ b/docs/datasheet/soc_icache.adoc
@ -6,45 +6,34 @@
 [frame="topbot",grid="none"]
 |=======================
 | Hardware source file(s): | neorv32_icache.vhd | 
-| Software driver file(s): | none             | _implicitly used_
-| Top entity port:         | none             | 
-| Configuration generics:  | _ICACHE_EN_ | implement processor-internal instruction cache when _true_
-|                          | _ICACHE_NUM_BLOCKS_ | number of cache blocks (pages/lines)
-|                          | _ICACHE_BLOCK_SIZE_ | size of a cache block in bytes
-|                          | _ICACHE_ASSOCIATIVITY_ | associativity / number of sets
-| CPU interrupts:          | none             | 
+| Software driver file(s): | none               | _implicitly used_
+| Top entity port:         | none               | 
+| Configuration generics:  | `ICACHE_EN`            | implement processor-internal instruction cache when `true`
+|                          | `ICACHE_NUM_BLOCKS`    | number of cache blocks (pages/lines)
+|                          | `ICACHE_BLOCK_SIZE`    | size of a cache block in bytes
+|                          | `ICACHE_ASSOCIATIVITY` | associativity / number of sets
+| CPU interrupts:          | none | 
 |=======================

-The processor features an optional cache for instructions to improve performance when using memories with high
+The processor features an optional instruction cache to improve performance when using memories with high
 access latencies. The cache is directly connected to the CPU's instruction fetch interface and provides
 fully transparent buffering of instruction fetch accesses to the entire address space.

-The cache is implemented if the _ICACHE_EN_ generic is true. The size of the cache memory is defined via
-_ICACHE_BLOCK_SIZE_ (the size of a single cache block/page/line in bytes; has to be a power of two and >=
-4 bytes), _ICACHE_NUM_BLOCKS_ (the total amount of cache blocks; has to be a power of two and >= 1) and
-the actual cache associativity _ICACHE_ASSOCIATIVITY_ (number of sets; 1 = direct-mapped, 2 = 2-way set-associative,
-has to be a power of two and >= 1). If the cache associativity (_ICACHE_ASSOCIATIVITY_) is greater than one
-the LRU replacement policy (least recently used) is used.
+The cache is implemented if the `ICACHE_EN` generic is `true`. The size of the cache memory is defined via
+`ICACHE_BLOCK_SIZE` (the size of a single cache block/page/line in bytes; has to be a power of two and greater than or
+equal to 4 bytes), `ICACHE_NUM_BLOCKS` (the total number of cache blocks; has to be a power of two and greater than or
+equal to 1) and the actual cache associativity `ICACHE_ASSOCIATIVITY` (number of sets; 1 = direct-mapped, 2 = 2-way set-associative).
+If the cache associativity is greater than one the LRU replacement policy (least recently used) is used.

-.Cache Memory HDL
-[NOTE]
-The default `neorv32_icache.vhd` HDL source file provides a _generic_ memory design that infers embedded
-memory. You might need to replace/modify the source file in order to use platform-specific features
-(like advanced memory resources) or to improve technology mapping and/or timing. Also, keep the features
-of the targeted FPGA's memory resources (block RAM) in mind when configuring
-the cache size/layout to maximize and optimize resource utilization.

 .Caching Internal Memories
 [NOTE]
 The instruction cache is intended to accelerate instruction fetches from _processor-external_ memories
 (via the external bus interface or via the XIP module).
-Since all processor-internal memories provide an access latency of one cycle (by default), caching
-internal memories does not bring a relevant performance gain. However, it will slightly reduce traffic on the
-processor-internal bus.

 .Manual Cache Clear/Reload
 [NOTE]
-By executing the `ifence.i` instruction (`Zifencei` CPU extension) the cache is cleared and a reload from
+By executing the `fence.i` instruction (<<_zifencei_isa_extension>>) the cache is cleared and a reload from
 main memory is triggered. This also allows to implement self-modifying code.

 .Retrieve Cache Configuration from Software
@ -54,7 +43,7 @@ Software can retrieve the cache configuration/layout from the <<_sysinfo_cache_c

 **Bus Access Fault Handling**

-The cache always loads a complete cache block (_ICACHE_BLOCK_SIZE_ bytes; aligned to the block size) every time a
+The cache always loads a complete cache block (aligned to the block size) every time a
 cache miss is detected. Each cached word from this block provides a single status bit that indicates if the
 according bus access was successful or caused a bus error. Hence, the whole cache block remains valid even
 if certain addresses inside caused a bus error. If the CPU accesses any of the faulty cache words, an
--- a/docs/datasheet/soc_mtime.adoc
+++ b/docs/datasheet/soc_mtime.adoc
@ -8,15 +8,15 @@
 | Hardware source file(s): | neorv32_mtime.vhd | 
 | Software driver file(s): | neorv32_mtime.c |
 |                          | neorv32_mtime.h |
-| Top entity port:         | `mtime_irq_i` | RISC-V machine timer IRQ if internal MTIME is **not** implemented
-| Configuration generics:  | _IO_MTIME_EN_ | implement MTIME when _true_
+| Top entity port:         | `mtime_irq_i` | RISC-V machine timer IRQ if internal one is **not** implemented
+| Configuration generics:  | `IO_MTIME_EN` | implement machine timer when `true`
 | CPU interrupts:          | `MTI` | machine timer interrupt (see <<_processor_interrupts>>)
 |=======================

 The MTIME module implements a memory-mapped MTIME machine system timer that is compatible with the RISC-V
 privileged specifications. The 64-bit system time is accessed via the memory-mapped `TIME_LO` and
 `TIME_HI` registers. A 64-bit time compare register, which is accessible via the memory-mapped `TIMECMP_LO`
-and `TIMECMP_HI` registers, can be used to configure the CPU's MTI (machine timer interrupt). The interrupt
+and `TIMECMP_HI` registers, can be used to configure the CPU's `MTI` (machine timer interrupt). The interrupt
 is triggered whenever `TIME` (high & low part) is greater than or equal to `TIMECMP` (high & low part).
 The interrupt remains active (=pending) until `TIME` becomes less than `TIMECMP` again (either by modifying
 `TIME` or `TIMECMP`).
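
The interrupt condition can be modeled in a few lines of C. This is a minimal sketch for illustration only: the function names are hypothetical, and the code models the comparison on plain values instead of accessing the actual memory-mapped registers.

```c
#include <stdbool.h>
#include <stdint.h>

// Combine the high and low 32-bit register halves into one 64-bit value.
static uint64_t join64(uint32_t hi, uint32_t lo) {
  return ((uint64_t)hi << 32) | lo;
}

// Model of the interrupt condition: pending while TIME >= TIMECMP.
bool mtime_irq_pending(uint32_t time_hi, uint32_t time_lo,
                       uint32_t timecmp_hi, uint32_t timecmp_lo) {
  return join64(time_hi, time_lo) >= join64(timecmp_hi, timecmp_lo);
}
```

The comparison always covers the full 64-bit values, so a large low word cannot trigger the interrupt while the high word of `TIME` is still below the high word of `TIMECMP`.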
@ -27,9 +27,9 @@ After a hardware reset the `TIME` and `TIMECMP` register are reset to all-zero.

 .External MTIME Interrupt
 [IMPORTANT]
+If the internal MTIME module is disabled (`IO_MTIME_EN` = `false`) the machine timer interrupt becomes available as an external signal.
 The `mtime_irq_i` signal is level-triggered and high-active. Once set the signal has to stay high until
-the interrupt request is explicitly acknowledged (e.g. writing to a memory-mapped register). All RISC-V standard interrupts
-can **NOT** be acknowledged by writing zero to the according <<_mip>> CSR bit. +
+the interrupt request is explicitly acknowledged (e.g. writing to a memory-mapped register).


 **Register Map**
--- a/docs/datasheet/soc_neoled.adoc
+++ b/docs/datasheet/soc_neoled.adoc
@ -9,8 +9,8 @@
 | Software driver file(s): | neorv32_neoled.c |
 |                          | neorv32_neoled.h |
 | Top entity port:         | `neoled_o` | 1-bit serial data output
-| Configuration generics:  | _IO_NEOLED_EN_      | implement NEOLED when _true_
-|                          | _IO_NEOLED_TX_FIFO_ | TX FIFO depth, has to be a power of 2, min 1
+| Configuration generics:  | `IO_NEOLED_EN`      | implement NEOLED controller when `true`
+|                          | `IO_NEOLED_TX_FIFO` | TX FIFO depth, has to be a power of 2, min 1
 | CPU interrupts:          | fast IRQ channel 9  | configurable NEOLED data FIFO interrupt (see <<_processor_interrupts>>)
 |=======================

@ -19,17 +19,13 @@

 The NEOLED module provides a dedicated interface for "smart RGB LEDs" like WS2812, WS2811 or any other compatible
 LEDs. These LEDs provide a single-wire interface that uses an asynchronous serial protocol for transmitting color
-data. Basically, data is transferred via LED-internal shift registers, which allows to cascade an unlimited
-number of smart LEDs. The protocol provides a RESET command to strobe the transmitted data into the
-LED PWM driver registers after data has shifted throughout all LEDs in a chain.
-
-Using the NEOLED module allows CPU-independent operation of an arbitrary number of smart LEDs. A configurable data
+data. Using the NEOLED module allows CPU-independent operation of an arbitrary number of smart LEDs. A configurable data
 buffer (FIFO) allows block transfers without requiring the CPU.

 [NOTE]
 The NEOLED interface is compatible to the "Adafruit Industries NeoPixel(TM)" products, which feature
-WS2812 (or older WS2811) smart LEDs (see link:https://learn.adafruit.com/adafruit-neopixel-uberguide).
-Other LEDs might be compatible as well when adjusting the controller's programmable timing configuration.
+WS2812 (or older WS2811) smart LEDs. Other LEDs might be compatible as well when adjusting the controller's programmable
+timing configuration.

 The interface provides a single 1-bit output `neoled_o` to drive an arbitrary number of cascaded LEDs. Since the
 NEOLED module provides 24-bit and 32-bit operating modes, a mixed setup with RGB LEDs (24-bit color)
@ -40,8 +36,8 @@ and RGBW LEDs (32-bit color including a dedicated white LED chip) is possible.

 The NEOLED module provides two accessible interface registers: the control register `CTRL` and the write-only
 TX data register `DATA`. The NEOLED module is globally enabled via the control register's
-_NEOLED_CTRL_EN_ bit. Clearing this bit will terminate any current operation, clear the TX buffer, reset the module
-and set the `neoled_o` output to zero. The precise timing (implementing the **WS2812** protocol) and transmission
+`NEOLED_CTRL_EN` bit. Clearing this bit will terminate any current operation, clear the TX buffer, reset the module
+and set the `neoled_o` output to zero. The precise timing (e.g. implementing the **WS2812** protocol) and transmission
 mode are fully programmable via the `CTRL` register to provide maximum flexibility.


@ -52,9 +48,9 @@ four chips providing RGB color plus a dedicated white LED chip (= RGBW). Since t
 LED chip is defined via an 8-bit value the RGB LEDs require a frame of 24-bit per module and the RGBW
 LEDs require a frame of 32-bit per module.

-The data transfer quantity of the NEOLED module can be programmed via the _NEOLED_MODE_EN_ control
+The data transfer quantity of the NEOLED module can be programmed via the `NEOLED_MODE_EN` control
 register bit. If this bit is cleared, the NEOLED interface operates in 24-bit mode and will transmit bits `23:0` of
-the data written to `DATA` to the LEDs. If _NEOLED_MODE_EN_ is set, the NEOLED interface operates in 32-bit
+the data written to `DATA` to the LEDs. If `NEOLED_MODE_EN` is set, the NEOLED interface operates in 32-bit
 mode and will transmit bits `31:0` of the data written to `DATA` to the LEDs.

 The mode bit can be reconfigured before writing a new data word to `DATA` in order to support an arbitrary setup/mixture
@ -86,10 +82,11 @@ image::neopixel.png[align=center]

 **Timing Configuration**

-The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler (_NEOLED_CTRL_PRSCx_, see table below)
-that scales the main processor clock f~main~ and a 5-bit cycle multiplier _NEOLED_CTRL_T_TOT_x_.
+The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler
+(`NEOLED_CTRL_PRSC*`, see table below) that scales the main processor clock f~main~ and a 5-bit cycle
+multiplier `NEOLED_CTRL_T_TOT_*`.

-.NEOLED prescaler configuration
+.NEOLED Prescaler Configuration
 [cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
 [options="header",grid="rows"]
 |=======================
@ -98,7 +95,7 @@ that scales the main processor clock f~main~ and a 5-bit cycle multiplier _NEOLE
 |=======================

 The duty-cycles (or more precisely: the high- and low-times for sending either a '1' bit or a '0' bit) are
-defined via the 5-bit _NEOLED_CTRL_T_ONE_H_x_ and _NEOLED_CTRL_T_ZERO_H_x_ values, respectively. These programmable
+defined via the 5-bit `NEOLED_CTRL_T_ONE_H_*` and `NEOLED_CTRL_T_ZERO_H_*` values, respectively. These programmable
 timing constants allow adapting the interface to a wide variety of smart LED protocols (for example WS2812 vs.
 WS2811).

@ -108,7 +105,7 @@ WS2811).
 Generate the base clock f~TX~ for the NEOLED TX engine:

 * processor clock f~main~ = 100 MHz
-* _NEOLED_CTRL_PRSCx_ = `0b001` = f~main~ / 4
+* `NEOLED_CTRL_PRSCx` = `0b001` = f~main~ / 4

 _**f~TX~**_ = _f~main~[Hz]_ / `clock_prescaler` = 100MHz / 4 = 25MHz

@ -116,15 +113,15 @@ _**T~TX~**_ = 1 / _**f~TX~**_ = 40ns

 Generate carrier period (T~carrier~) and *high-times* (duty cycle) for sending `0` (T~0H~) and `1` (T~1H~) bits:

-* _NEOLED_CTRL_T_TOT_ = `0b11110` (= decimal 30)
-* _NEOLED_CTRL_T_ZERO_H_ = `0b01010` (= decimal 10)
-* _NEOLED_CTRL_T_ONE_H_ = `0b10100` (= decimal 20)
+* `NEOLED_CTRL_T_TOT` = `0b11110` (= decimal 30)
+* `NEOLED_CTRL_T_ZERO_H` = `0b01010` (= decimal 10)
+* `NEOLED_CTRL_T_ONE_H` = `0b10100` (= decimal 20)

-_**T~carrier~**_ = _**T~TX~**_ * _NEOLED_CTRL_T_TOT_ = 40ns * 30 = 1.4µs
+_**T~carrier~**_ = _**T~TX~**_ * `NEOLED_CTRL_T_TOT` = 40ns * 30 = 1.2µs

-_**T~0H~**_ = _**T~TX~**_ * _NEOLED_CTRL_T_ZERO_H_ = 40ns * 10 = 0.4µs
+_**T~0H~**_ = _**T~TX~**_ * `NEOLED_CTRL_T_ZERO_H` = 40ns * 10 = 0.4µs

-_**T~1H~**_ = _**T~TX~**_ * _NEOLED_CTRL_T_ONE_H_ = 40ns * 20 = 0.8µs
+_**T~1H~**_ = _**T~TX~**_ * `NEOLED_CTRL_T_ONE_H` = 40ns * 20 = 0.8µs
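
The timing arithmetic of this example can be reproduced with a small C sketch. It is an illustration under stated assumptions, not the driver's implementation: the helper names are hypothetical, the prescaler is passed as the resulting `clock_prescaler` divisor (4 in the example) rather than as the 3-bit select value, and all durations are rounded down to whole nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

// Period of one TX engine clock tick (T_TX) in nanoseconds.
// `clock_prescaler` is the resulting divisor (e.g. 4), not the 3-bit select value.
uint32_t neoled_t_tx_ns(uint32_t f_main_hz, uint32_t clock_prescaler) {
  assert(clock_prescaler > 0u && f_main_hz >= clock_prescaler);
  uint32_t f_tx_hz = f_main_hz / clock_prescaler;
  return 1000000000u / f_tx_hz;
}

// Duration in nanoseconds of a 5-bit tick count (T_TOT, T_ZERO_H or T_ONE_H).
uint32_t neoled_time_ns(uint32_t t_tx_ns, unsigned ticks) {
  assert(ticks < 32u); // the timing fields are 5 bits wide
  return t_tx_ns * ticks;
}
```

With a 100 MHz main clock and a divisor of 4, one tick lasts 40 ns. The high-times for 10 and 20 ticks are then 400 ns and 800 ns, and a carrier period of 30 ticks lasts 30 * 40 ns = 1200 ns.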

 [TIP]
 The NEOLED SW driver library (`neorv32_neoled.h`) provides a simplified configuration
@ -135,21 +132,21 @@ clock frequency.
 **TX Data FIFO**

 The interface features a configurable TX data buffer (a FIFO) to allow more CPU-independent operation. The buffer
-depth is configured via the _IO_NEOLED_TX_FIFO_ top generic (default = 1 entry). The FIFO size configuration can be
-read via the _NEOLED_CTRL_BUFS_x_ control register bits, which result log2(_IO_NEOLED_TX_FIFO_).
+depth is configured via the `IO_NEOLED_TX_FIFO` top generic (default = 1 entry). The FIFO size configuration can be
+read via the `NEOLED_CTRL_BUFS_x` control register bits, which return log2(`IO_NEOLED_TX_FIFO`).

 When writing data to the `DATA` register the data is automatically written to the TX buffer. Whenever
 data is available in the buffer the serial transmission engine will take and transmit it to the LEDs.
-The data transfer size (_NEOLED_MODE_EN_) can be modified at any time since this control register bit is also buffered
+The data transfer size (`NEOLED_MODE_EN`) can be modified at any time since this control register bit is also buffered
 in the FIFO. This allows an arbitrary mix of RGB and RGBW LEDs in the chain.

-Software can check the FIFO fill level via the control register's _NEOLED_CTRL_TX_EMPTY_, _NEOLED_CTRL_TX_HALF_
-and _NEOLED_CTRL_TX_FULL_ flags. The _NEOLED_CTRL_TX_BUSY_ flags provides additional information if the the serial
+Software can check the FIFO fill level via the control register's `NEOLED_CTRL_TX_EMPTY`, `NEOLED_CTRL_TX_HALF`
+and `NEOLED_CTRL_TX_FULL` flags. The `NEOLED_CTRL_TX_BUSY` flag additionally indicates whether the serial
 transmit engine is still busy sending data.

 [WARNING]
-Please note that the timing configurations (_NEOLED_CTRL_PRSCx_, _NEOLED_CTRL_T_TOT_x_,
-_NEOLED_CTRL_T_ONE_H_x_ and _NEOLED_CTRL_T_ZERO_H_x_) are **NOT** stored to the buffer. Changing
+Please note that the timing configurations (`NEOLED_CTRL_PRSCx`, `NEOLED_CTRL_T_TOT_x`,
+`NEOLED_CTRL_T_ONE_H_x` and `NEOLED_CTRL_T_ZERO_H_x`) are **NOT** stored to the buffer. Changing
 these values while the buffer is not empty or the TX engine is still busy will cause data corruption.


@ -160,11 +157,11 @@ registers when the data line is low for 50μs ("RESET" command, see table above)
 using busy-wait for at least 50μs. Obviously, this concept wastes a lot of processing power.

 To circumvent this, the NEOLED module provides an option to automatically issue an idle time for creating the RESET
-command. If the _NEOLED_CTRL_STROBE_ control register bit is set, _all_ data written to the data FIFO (via `DATA`,
+command. If the `NEOLED_CTRL_STROBE` control register bit is set, _all_ data written to the data FIFO (via `DATA`,
 the actually written data is irrelevant) will trigger an idle phase (`neoled_o` = zero) of 127 periods (= _**T~carrier~**_).
 This idle time will cause the LEDs to strobe the color data into the PWM driver registers.

-Since the _NEOLED_CTRL_STROBE_ flag is also buffered in the TX buffer, the RESET command is treated just as another
+Since the `NEOLED_CTRL_STROBE` flag is also buffered in the TX buffer, the RESET command is treated just as another
 data word being written to the TX buffer making busy wait concepts obsolete and allowing maximum refresh rates.


@ -172,11 +169,11 @@ data word being written to the TX buffer making busy wait concepts obsolete and

 The NEOLED module features a single interrupt that triggers based on the current TX buffer fill level.
 The interrupt can only become pending if the NEOLED module is enabled. The specific interrupt condition
-is configured via the _NEOLED_CTRL_IRQ_CONF_ bit in the unit's control register.
+is configured via the `NEOLED_CTRL_IRQ_CONF` bit in the unit's control register.

-If _NEOLED_CTRL_IRQ_CONF_ is set, the module's interrupt is generated whenever the TX FIFO is less than half-full.
-In this case software can write up to _IO_NEOLED_TX_FIFO_/2 new data words to `DATA` without checking the FIFO
-status flags. If _NEOLED_CTRL_IRQ_CONF_ is cleared, an interrupt is generated when the TX FIFO is empty.
+If `NEOLED_CTRL_IRQ_CONF` is set, the module's interrupt is generated whenever the TX FIFO is less than half-full.
+In this case software can write up to `IO_NEOLED_TX_FIFO`/2 new data words to `DATA` without checking the FIFO
+status flags. If `NEOLED_CTRL_IRQ_CONF` is cleared, an interrupt is generated when the TX FIFO is empty.
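
The refill rule described above can be sketched in C. This is a minimal host-side sketch that only decodes a value read from `CTRL`; the helper names are hypothetical and not part of the NEORV32 driver. It assumes that an empty FIFO can accept a full `IO_NEOLED_TX_FIFO` worth of words.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the NEOLED control register table. */
#define NEOLED_EN_BIT        0u   /* module enable */
#define NEOLED_BUFS_LSB      6u   /* 4-bit field: log2(IO_NEOLED_TX_FIFO) */
#define NEOLED_IRQ_CONF_BIT  27u  /* 0 = IRQ if empty, 1 = IRQ if less than half-full */

/* Hypothetical helper: FIFO depth in entries, decoded from a CTRL read value. */
uint32_t neoled_fifo_depth(uint32_t ctrl)
{
  return 1u << ((ctrl >> NEOLED_BUFS_LSB) & 0xFu);
}

/* Hypothetical helper: number of words that may be written to DATA right
 * after the interrupt has fired, without polling the FIFO status flags. */
uint32_t neoled_irq_refill_words(uint32_t ctrl)
{
  /* The interrupt can only become pending while the module is enabled. */
  assert(((ctrl >> NEOLED_EN_BIT) & 1u) != 0u);

  uint32_t depth = neoled_fifo_depth(ctrl);
  if ((ctrl >> NEOLED_IRQ_CONF_BIT) & 1u) {
    return depth / 2u;  /* FIFO is less than half-full */
  }
  return depth;         /* FIFO is empty (assumption: all entries are free) */
}
```

An interrupt handler could call `neoled_irq_refill_words()` once and then push that many words to `DATA` in a plain loop.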

 Once the NEOLED interrupt has been triggered and has become pending, it has to be explicitly cleared again by
 writing zero to the according <<_mip>> CSR bit.
@ -189,18 +186,18 @@ writing zero to according <<_mip>> CSR bit.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.13+<| `0xffffffd8` .13+<| `CTRL` <|`0`     _NEOLED_CTRL_EN_                                    ^| r/w <| NEOLED enable
-                                  <|`1`     _NEOLED_CTRL_MODE_                                  ^| r/w <| data transfer size; `0`=24-bit; `1`=32-bit
-                                  <|`2`     _NEOLED_CTRL_STROBE_                                ^| r/w <| `0`=send normal color data; `1`=send RESET command on data write access
-                                  <|`5:3`   _NEOLED_CTRL_PRSC2_ : _NEOLED_CTRL_PRSC0_           ^| r/w <| 3-bit clock prescaler, bit 0
-                                  <|`9:6`   _NEOLED_CTRL_BUFS3_ : _NEOLED_CTRL_BUFS0_           ^| r/- <| 4-bit log2(_IO_NEOLED_TX_FIFO_)
-                                  <|`14:10` _NEOLED_CTRL_T_TOT_4_ : _NEOLED_CTRL_T_TOT_0_       ^| r/w <| 5-bit pulse clock ticks per total single-bit period (T~total~)
-                                  <|`19:15` _NEOLED_CTRL_T_ZERO_H_4_ : _NEOLED_CTRL_T_ZERO_H_0_ ^| r/w <| 5-bit pulse clock ticks per high-time for sending a zero-bit (T~0H~)
-                                  <|`24:20` _NEOLED_CTRL_T_ONE_H_4_ : _NEOLED_CTRL_T_ONE_H_0_   ^| r/w <| 5-bit pulse clock ticks per high-time for sending a one-bit (T~1H~)
-                                  <|`27`    _NEOLED_CTRL_IRQ_CONF_                              ^| r/w <| TX FIFO interrupt configuration: `0`=IRQ if FIFO is empty, `1`=IRQ if FIFO is less than half-full
-                                  <|`28`    _NEOLED_CTRL_TX_EMPTY_                              ^| r/- <| TX FIFO is empty
-                                  <|`29`    _NEOLED_CTRL_TX_HALF_                               ^| r/- <| TX FIFO is _at least_ half full
-                                  <|`30`    _NEOLED_CTRL_TX_FULL_                               ^| r/- <| TX FIFO is full
-                                  <|`31`    _NEOLED_CTRL_TX_BUSY_                               ^| r/- <| TX serial engine is busy when set
+.13+<| `0xffffffd8` .13+<| `CTRL` <|`0`     `NEOLED_CTRL_EN`                                  ^| r/w <| NEOLED enable
+                                  <|`1`     `NEOLED_CTRL_MODE`                                ^| r/w <| data transfer size; `0`=24-bit; `1`=32-bit
+                                  <|`2`     `NEOLED_CTRL_STROBE`                              ^| r/w <| `0`=send normal color data; `1`=send RESET command on data write access
+                                  <|`5:3`   `NEOLED_CTRL_PRSC2 : NEOLED_CTRL_PRSC0`           ^| r/w <| 3-bit clock prescaler, bit 0
+                                  <|`9:6`   `NEOLED_CTRL_BUFS3 : NEOLED_CTRL_BUFS0`           ^| r/- <| 4-bit log2(_IO_NEOLED_TX_FIFO_)
+                                  <|`14:10` `NEOLED_CTRL_T_TOT_4 : NEOLED_CTRL_T_TOT_0`       ^| r/w <| 5-bit pulse clock ticks per total single-bit period (T~total~)
+                                  <|`19:15` `NEOLED_CTRL_T_ZERO_H_4 : NEOLED_CTRL_T_ZERO_H_0` ^| r/w <| 5-bit pulse clock ticks per high-time for sending a zero-bit (T~0H~)
+                                  <|`24:20` `NEOLED_CTRL_T_ONE_H_4 : NEOLED_CTRL_T_ONE_H_0`   ^| r/w <| 5-bit pulse clock ticks per high-time for sending a one-bit (T~1H~)
+                                  <|`27`    `NEOLED_CTRL_IRQ_CONF`                            ^| r/w <| TX FIFO interrupt configuration: `0`=IRQ if FIFO is empty, `1`=IRQ if FIFO is less than half-full
+                                  <|`28`    `NEOLED_CTRL_TX_EMPTY`                            ^| r/- <| TX FIFO is empty
+                                  <|`29`    `NEOLED_CTRL_TX_HALF`                             ^| r/- <| TX FIFO is _at least_ half full
+                                  <|`30`    `NEOLED_CTRL_TX_FULL`                             ^| r/- <| TX FIFO is full
+                                  <|`31`    `NEOLED_CTRL_TX_BUSY`                             ^| r/- <| TX serial engine is busy when set
 | `0xffffffdc` | `DATA` <|`31:0` / `23:0` ^| -/w <| TX data (32- or 24-bit, depending on _NEOLED_CTRL_MODE_ bit)
 |=======================
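
As an illustration of the bit layout in the table above, the following sketch packs the writable configuration fields into a single `CTRL` word. The function name and its parameters are hypothetical; only the bit positions are taken from the table. It leaves `NEOLED_CTRL_STROBE` cleared, so data written afterwards is treated as normal color data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: build a CTRL value that enables the module and
 * programs prescaler, timing and interrupt configuration in one write. */
uint32_t neoled_ctrl_pack(bool mode_32bit, unsigned prsc, unsigned t_total,
                          unsigned t_zero_h, unsigned t_one_h, bool irq_half)
{
  assert(prsc < 8u);                                        /* 3-bit field */
  assert(t_total < 32u && t_zero_h < 32u && t_one_h < 32u); /* 5-bit fields */

  uint32_t ctrl = 1u << 0;                 /* NEOLED_CTRL_EN */
  ctrl |= (mode_32bit ? 1u : 0u) << 1;     /* NEOLED_CTRL_MODE */
  ctrl |= (uint32_t)prsc     << 3;         /* NEOLED_CTRL_PRSC2:0 */
  ctrl |= (uint32_t)t_total  << 10;        /* NEOLED_CTRL_T_TOT_4:0 */
  ctrl |= (uint32_t)t_zero_h << 15;        /* NEOLED_CTRL_T_ZERO_H_4:0 */
  ctrl |= (uint32_t)t_one_h  << 20;        /* NEOLED_CTRL_T_ONE_H_4:0 */
  ctrl |= (irq_half ? 1u : 0u) << 27;      /* NEOLED_CTRL_IRQ_CONF */
  return ctrl;
}
```

Since the timing fields are not buffered, such a word should only be written while the TX FIFO is empty and the TX engine is idle.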
--- a/docs/datasheet/soc_onewire.adoc
+++ b/docs/datasheet/soc_onewire.adoc
@ -10,7 +10,7 @@
 |                          | neorv32_onewire.h |
 | Top entity port:         | `onewire_i` | 1-bit 1-wire bus sense input
 |                          | `onewire_o` | 1-bit 1-wire bus output (pull low only)
-| Configuration generics:  | _IO_ONEWIRE_EN_ | implement ONEWIRE interface controller when _true_
+| Configuration generics:  | `IO_ONEWIRE_EN`     | implement ONEWIRE interface controller when `true`
 | CPU interrupts:          | fast IRQ channel 13 | operation done interrupt (see <<_processor_interrupts>>)
 |=======================

@ -20,30 +20,25 @@
 The NEORV32 ONEWIRE module implements a single-wire interface controller that is compatible to the
 _Dallas/Maxim 1-Wire_ protocol, which is an asynchronous half-duplex bus requiring only a single signal wire
 connected to `onewire_io` (plus ground).
-The 1-Wire protocol allows an (nearly) arbitrary number of devices but only a single controller that initiates all transfers.

-The bus is based on a single tristate signal. The controller and all the devices can only pull-down the bus actively.
+The bus is based on a single open-drain signal. The controller and all the devices can only pull-down the bus actively.
 Hence, an external pull-up resistor is required. Recommended values are between 1kΩ and 4kΩ depending on the bus
 characteristics (wire length, number of devices, etc.). Furthermore, a series resistor (~100Ω) at the controller side
 is recommended to control the slew rate and to reduce signal reflections. Also, additional external ESD protection clamp diodes
-should be added to the `onewire_io` bus line.
-
-[TIP]
-For more information regarding the 1-Wire bus and the device access mechanism
-see the Application Notes provided by Maxim Integrated.
+should be added to the bus line.


 **Tri-State Drivers**

-The ONEWIRE module requires a tri-state driver for the 1-wire bus line, which has to be implemented
+The ONEWIRE module requires a tri-state driver (actually, open-drain) for the 1-wire bus line, which has to be implemented
 in the top module of the setup. A generic VHDL example is given below (`onewire` is the actual 1-wire
 bus signal, which is of type `std_logic`).

 .ONEWIRE VHDL tri-state driver example
 [source,VHDL]
 ----
-onewire   <= '0' when (onewire_o = '0') else 'Z';
-onewire_i <= std_ulogic(onewire);
+onewire   <= '0' when (onewire_o = '0') else 'Z'; -- drive
+onewire_i <= std_ulogic(onewire); -- sense
 ----


@ -53,18 +48,18 @@ The ONEWIRE controller provides two interface registers: `CTRL` and `DATA.` The
 is used to configure the module, to trigger bus transactions and to monitor the current state of the module.
 The `DATA` register is used to read/write data from/to the bus.

-The module is enabled by setting the _ONEWIRE_CTRL_EN_ bit in the control register. If this bit is cleared, the
-module is automatically reset and the bus is brought to high-impedance (tristate) state.
-The basic timing configuration is programmed via the clock prescaler bits _ONEWIRE_CTRL_PRSCx_ and the
-clock divider bits _ONEWIRE_CTRL_CLKDIVx_ (see next section).
+The module is enabled by setting the `ONEWIRE_CTRL_EN` bit in the control register. If this bit is cleared, the
+module is automatically reset and the bus is brought to high-level (due to the external pull-up resistor).
+The basic timing configuration is programmed via the clock prescaler bits `ONEWIRE_CTRL_PRSCx` and the
+clock divider bits `ONEWIRE_CTRL_CLKDIVx` (see next section).

 The controller can execute three basic bus operations, which are triggered by setting one out of three specific
 control register bits (the bits auto-clear):

 [start=1]
-. generate reset pulse and check for device presence; triggered when setting _ONEWIRE_CTRL_TRIG_RST_
-. transfer a single-bit (read-while-write); triggered when setting _ONEWIRE_CTRL_TRIG_BIT_
-. transfer a full-byte (read-while-write); triggered when setting _ONEWIRE_CTRL_TRIG_BYTE_
+. generate reset pulse and check for device presence; triggered when setting `ONEWIRE_CTRL_TRIG_RST`
+. transfer a single-bit (read-while-write); triggered when setting `ONEWIRE_CTRL_TRIG_BIT`
+. transfer a full-byte (read-while-write); triggered when setting `ONEWIRE_CTRL_TRIG_BYTE`

 [IMPORTANT]
 Only one trigger bit may be set at once, otherwise undefined behavior might occur.
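
The "exactly one trigger bit" rule can be enforced in software with a small helper such as the one sketched below. The names are hypothetical; the bit positions follow the ONEWIRE register map. The helper clears all three trigger bits in the given configuration word before setting the requested one.

```c
#include <assert.h>
#include <stdint.h>

/* Trigger bit positions from the ONEWIRE control register. */
#define ONEWIRE_TRIG_RST_BIT   11u
#define ONEWIRE_TRIG_BIT_BIT   12u
#define ONEWIRE_TRIG_BYTE_BIT  13u
#define ONEWIRE_TRIG_MASK      ((1u << ONEWIRE_TRIG_RST_BIT) | \
                                (1u << ONEWIRE_TRIG_BIT_BIT) | \
                                (1u << ONEWIRE_TRIG_BYTE_BIT))

typedef enum { ONEWIRE_OP_RESET, ONEWIRE_OP_BIT, ONEWIRE_OP_BYTE } onewire_op_t;

/* Hypothetical helper: return the CTRL value to write for one operation.
 * ctrl_config holds the enable, prescaler and divider settings. */
uint32_t onewire_ctrl_trigger(uint32_t ctrl_config, onewire_op_t op)
{
  uint32_t ctrl = ctrl_config & ~ONEWIRE_TRIG_MASK; /* drop stale triggers */
  switch (op) {
    case ONEWIRE_OP_RESET: ctrl |= 1u << ONEWIRE_TRIG_RST_BIT;  break;
    case ONEWIRE_OP_BIT:   ctrl |= 1u << ONEWIRE_TRIG_BIT_BIT;  break;
    case ONEWIRE_OP_BYTE:  ctrl |= 1u << ONEWIRE_TRIG_BYTE_BIT; break;
    default: assert(0 && "unknown ONEWIRE operation");          break;
  }
  return ctrl;
}
```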
@ -72,7 +67,7 @@ Only one trigger bit may be set at once, otherwise undefined behavior might occu
 When a single-bit operation has been triggered, the data previously written to `DATA[0]` will be sent to the bus
 and `DATA[7]` will be sampled from the bus. Accordingly, a full-byte transmission will send the byte
 previously written to `DATA[7:0]` to the bus and will update `DATA[7:0]` with the data read from the bus (LSB-first).
-The triggered operation has completed when the module's busy flag _ONEWIRE_CTRL_BUSY_ has cleared again.
+The triggered operation has completed when the module's busy flag `ONEWIRE_CTRL_BUSY` has cleared again.

 .Read from Bus
 [NOTE]
@ -80,19 +75,17 @@ In order to read a single bit from the bus `DATA[0]` has to set to `1` before tr
 operation to allow the accessed device to pull-down the bus. Accordingly, `DATA` has to be set to `0xFF` before
 triggering the byte transmission operation when the controller shall read a byte from the bus.

-The _ONEWIRE_CTRL_PRESENCE_ bit gets set if at least one device has send a "presence" signal right after the
+The `ONEWIRE_CTRL_PRESENCE` bit gets set if at least one device has sent a "presence" signal right after the
 reset pulse. 


 **Bus Timing**

-The control register provides a 2-bit clock prescaler select (_ONEWIRE_CTRL_PRSCx_) and a 8-bit clock divider
-(_ONEWIRE_CTRL_CLKDIVx_) for timing configuration. Both are used to define the elementary **base time T~base~**.
+The control register provides a 2-bit clock prescaler select (`ONEWIRE_CTRL_PRSCx`) and an 8-bit clock divider
+(`ONEWIRE_CTRL_CLKDIVx`) for timing configuration. Both are used to define the elementary **base time T~base~**.
 All bus operations are timed using _multiples_ of this elementary base time.

-The following clock prescalers are available:
-
-.ONEWIRE clock prescaler configurations
+.ONEWIRE Clock Prescaler Configurations
 [cols="<4,^1,^1,^1,^1"]
 [options="header",grid="rows"]
 |=======================
@ -100,7 +93,7 @@ The following clock prescalers are available:
 | Resulting `clock_prescaler` |      2 |      4 |      8 |     64
 |=======================

-Together with the clock divider value (_ONEWIRE_CTRL_PRSCx_ bits = `clock_divider`) the base time is defined by the
+Together with the clock divider value (`ONEWIRE_CTRL_CLKDIVx` bits = `clock_divider`) the base time is defined by the
 following formula:

 _**T~base~**_ = (1 / _f~main~[Hz]_) * `clock_prescaler` * (`clock_divider` + 1)
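
The formula can be evaluated with integer arithmetic as sketched below. The function names are hypothetical, and the sketch assumes that the prescaler select values 0 to 3 map to the `clock_prescaler` values 2, 4, 8 and 64 in table order.

```c
#include <assert.h>
#include <stdint.h>

/* Main clock cycles per T_base: clock_prescaler * (clock_divider + 1). */
uint32_t onewire_tbase_cycles(unsigned prsc_sel, unsigned clkdiv)
{
  /* Assumption: select values 0..3 correspond to the table columns in order. */
  static const uint32_t prescaler[4] = {2u, 4u, 8u, 64u};
  assert(prsc_sel < 4u);   /* 2-bit prescaler select */
  assert(clkdiv < 256u);   /* 8-bit clock divider */
  return prescaler[prsc_sel] * ((uint32_t)clkdiv + 1u);
}

/* T_base in nanoseconds (rounded down) for a main clock given in Hz. */
uint64_t onewire_tbase_ns(uint32_t f_main_hz, unsigned prsc_sel, unsigned clkdiv)
{
  assert(f_main_hz > 0u);
  return ((uint64_t)onewire_tbase_cycles(prsc_sel, clkdiv) * 1000000000ull) / f_main_hz;
}
```

For example, a 100 MHz main clock with prescaler select 0 and a divider value of 49 gives 2 * 50 = 100 clock cycles, i.e. a base time of 1 µs.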
@ -183,15 +176,15 @@ according <<_mip>> CSR FIRQ bit.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.10+<| `0xffffff70` .10+<| `CTRL` <|`0`     _ONEWIRE_CTRL_EN_                               ^| r/w <| ONEWIRE enable, reset if cleared
-                                  <|`2:1`   _ONEWIRE_CTRL_PRSC1_   : _ONEWIRE_CTRL_PRSC0_   ^| r/w <| 2-bit clock prescaler select
-                                  <|`10:3`  _ONEWIRE_CTRL_CLKDIV7_ : _ONEWIRE_CTRL_CLKDIV0_ ^| r/w <| 8-bit clock divider value
-                                  <|`11`    _ONEWIRE_CTRL_TRIG_RST_                         ^| -/w <| trigger reset pulse, auto-clears
-                                  <|`12`    _ONEWIRE_CTRL_TRIG_BIT_                         ^| -/w <| trigger single bit transmission, auto-clears
-                                  <|`13`    _ONEWIRE_CTRL_TRIG_BYTE_                        ^| -/w <| trigger full-byte transmission, auto-clears
-                                  <|`28:14` -                                               ^| r/- <| _reserved_, read as zero
-                                  <|`29`    _ONEWIRE_CTRL_SENSE_                            ^| r/- <| current state of the bus line
-                                  <|`30`    _ONEWIRE_CTRL_PRESENCE_                         ^| r/- <| device presence detected after reset pulse
-                                  <|`31`    _ONEWIRE_CTRL_BUSY_                             ^| r/- <| operation in progress when set
-| `0xffffff74` | `DATA` |`7:0` _ONEWIRE_DATA_MSB_ : _ONEWIRE_DATA_LSB_ | r/w | receive/transmit data (8-bit)
+.10+<| `0xffffff70` .10+<| `CTRL` <|`0`     `ONEWIRE_CTRL_EN`                             ^| r/w <| ONEWIRE enable, reset if cleared
+                                  <|`2:1`   `ONEWIRE_CTRL_PRSC1 : ONEWIRE_CTRL_PRSC0`     ^| r/w <| 2-bit clock prescaler select
+                                  <|`10:3`  `ONEWIRE_CTRL_CLKDIV7 : ONEWIRE_CTRL_CLKDIV0` ^| r/w <| 8-bit clock divider value
+                                  <|`11`    `ONEWIRE_CTRL_TRIG_RST`                       ^| -/w <| trigger reset pulse, auto-clears
+                                  <|`12`    `ONEWIRE_CTRL_TRIG_BIT`                       ^| -/w <| trigger single bit transmission, auto-clears
+                                  <|`13`    `ONEWIRE_CTRL_TRIG_BYTE`                      ^| -/w <| trigger full-byte transmission, auto-clears
+                                  <|`28:14` -                                             ^| r/- <| _reserved_, read as zero
+                                  <|`29`    `ONEWIRE_CTRL_SENSE`                          ^| r/- <| current state of the bus line
+                                  <|`30`    `ONEWIRE_CTRL_PRESENCE`                       ^| r/- <| device presence detected after reset pulse
+                                  <|`31`    `ONEWIRE_CTRL_BUSY`                           ^| r/- <| operation in progress when set
+| `0xffffff74` | `DATA` |`7:0` `ONEWIRE_DATA_MSB : ONEWIRE_DATA_LSB` | r/w | receive/transmit data (8-bit)
 |=======================
--- a/docs/datasheet/soc_pwm.adoc
+++ b/docs/datasheet/soc_pwm.adoc
@ -9,35 +9,33 @@
 | Software driver file(s): | neorv32_pwm.c |
 |                          | neorv32_pwm.h |
 | Top entity port:         | `pwm_o` | PWM output channels (12-bit)
-| Configuration generics:  | _IO_PWM_NUM_CH_ | number of PWM channels to implement (0..12)
+| Configuration generics:  | `IO_PWM_NUM_CH` | number of PWM channels to implement (0..12)
 | CPU interrupts:          | none | 
 |=======================

-The PWM controller implements a pulse-width modulation controller with up to 12 independent channels and an
-8-bit resolution per channel. The actual number of implemented channels is defined by the _IO_PWM_NUM_CH_ generic.
+
+**Overview**
+
+The PWM module implements a pulse-width modulation controller with up to 12 independent channels providing
+8-bit resolution per channel. The actual number of implemented channels is defined by the `IO_PWM_NUM_CH` generic.
 Setting this generic to zero will completely remove the PWM controller from the design.

 [NOTE]
 The `pwm_o` port has a static size of 12 bits. If fewer than 12 PWM channels are configured, only the LSB-aligned channel
 bits are used while the remaining bits are hardwired to zero.

-The PWM controller is based on an 8-bit base counter with a programmable threshold comparators for each channel
-that defines the actual duty cycle. The controller can be used to drive fancy RGB-LEDs with 24-
-bit true color, to dim LCD back-lights or even for "analog" control. An external integrator (RC low-pass filter)
-can be used to smooth the generated "analog" signals.
-

 **Theory of Operation**

-The PWM controller is activated by setting the _PWM_CTRL_EN_ bit in the module's control register `CTRL`. When this
-bit is cleared, the unit is reset and all PWM output channels are set to zero.
-The 8-bit duty cycle for each channel, which represents the channel's "intensity", is defined via an 8-bit value. The module
-provides up to 3 duty cycle registers `DC[0]` to `DC[2]` (depending on the number of implemented channels).
-Each register contains the duty cycle configuration for 4 consecutive channels. For example, the duty cycle of channel 0
-is defined via bits 7:0 in `DC[0]`. The duty cycle of channel 2 is defined via bits 15:0 in `DC[0]` and so on.
+The PWM controller is activated by setting the `PWM_CTRL_EN` bit in the module's control register `CTRL`. When this
+bit is cleared, the unit is reset and all PWM output channels are set to zero. The module
+provides three duty cycle registers `DC[0]` to `DC[2]`. Each register contains the duty cycle configuration for four
+consecutive channels. For example, the duty cycle of channel 0 is defined via bits 7:0 in `DC[0]`. The duty cycle of
+channel 1 is defined via bits 15:8 in `DC[0]` and so on.
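
The channel-to-register mapping follows the register map: channel `n` uses register `DC[n / 4]` and the byte at bit offset `(n % 4) * 8`. The sketch below shows this computation on plain 32-bit values; the helper names are hypothetical and no hardware is accessed.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the duty cycle register that holds the given channel. */
unsigned pwm_dc_reg_index(unsigned channel)
{
  assert(channel < 12u);  /* at most 12 channels */
  return channel / 4u;
}

/* Return dc_reg with the 8-bit duty cycle of the given channel replaced. */
uint32_t pwm_dc_insert(uint32_t dc_reg, unsigned channel, uint8_t duty)
{
  assert(channel < 12u);
  unsigned shift = (channel % 4u) * 8u;  /* byte lane inside the register */
  dc_reg &= ~((uint32_t)0xFFu << shift); /* clear the old duty cycle */
  dc_reg |= (uint32_t)duty << shift;     /* insert the new duty cycle */
  return dc_reg;
}
```

A driver would read `DC[pwm_dc_reg_index(ch)]`, pass the value through `pwm_dc_insert()` and write the result back.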

 [NOTE]
-Regardless of the configuration of _IO_PWM_NUM_CH_ all module registers can be accessed without raising an exception.
+Regardless of the configuration of `IO_PWM_NUM_CH` all module registers can be accessed without raising an exception.
 Software can discover the number of available channels by writing 0xff to all duty cycle configuration bytes and
 reading those values back. The duty-cycle of channels that were not implemented always reads as zero.

@ -46,8 +44,8 @@ Based on the configured duty cycle the according intensity of the channel can be
 _**Intensity~x~**_ = `DC[y](i*8+7 downto i*8)` / (2^8^)

 The base frequency of the generated PWM signals is defined by the PWM core clock. This clock is derived
-from the main processor clock and divided by a prescaler via the 3-bit PWM_CTRL_PRSCx in the unit's control
-register. The following pre-scalers are available:
+from the main processor clock and divided by a prescaler via the 3-bit `PWM_CTRL_PRSCx` in the unit's control
+register.

 .PWM prescaler configuration
 [cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
@ -69,10 +67,9 @@ _**f~PWM~**_ = _f~main~[Hz]_ / (2^8^ * `clock_prescaler`)
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.4+<| `0xffffff50` .4+<| `CTRL`  <|`0` _PWM_CTRL_EN_    ^| r/w | PWM enable
-                                 <|`1` _PWM_CTRL_PRSC0_ ^| r/w .3+<| 3-bit clock prescaler select
-                                 <|`2` _PWM_CTRL_PRSC1_ ^| r/w
-                                 <|`3` _PWM_CTRL_PRSC2_ ^| r/w
+.3+<| `0xffffff50` .3+<| `CTRL`  <|`0`    `PWM_CTRL_EN`                     ^| r/w <| PWM enable
+                                 <|`3:1`  `PWM_CTRL_PRSC2 : PWM_CTRL_PRSC0` ^| r/w <| 3-bit clock prescaler select
+                                 <|`31:4` -                                 ^| r/- <| _reserved_, read as zero
 .4+<| `0xffffff54` .4+<| `DC[0]` <|`7:0`   ^| r/w <| 8-bit duty cycle for channel 0
                                 <|`15:8`  ^| r/w <| 8-bit duty cycle for channel 1
                                 <|`23:16` ^| r/w <| 8-bit duty cycle for channel 2
--- a/docs/datasheet/soc_sdi.adoc
+++ b/docs/datasheet/soc_sdi.adoc
@ -12,8 +12,8 @@
 |                          | `sdi_dat_o` | 1-bit serial data output
 |                          | `sdi_dat_i` | 1-bit serial data input
 |                          | `sdi_csn_i` | 1-bit chip-select input (low-active)
-| Configuration generics:  | _IO_SDI_EN_   | implement SDI controller when _true_
-|                          | _IO_SDI_FIFO_ | data FIFO size, has to be at least 1 or a power of two
+| Configuration generics:  | `IO_SDI_EN`   | implement SDI controller when `true`
+|                          | `IO_SDI_FIFO` | data FIFO size, has to be a power of two, min 1
 | CPU interrupts:          | fast IRQ channel 11 | configurable SDI interrupt (see <<_processor_interrupts>>)
 |=======================

@ -29,7 +29,7 @@ transmissions without CPU interaction.
 [NOTE]
 The NEORV32 SDI module only supports _device mode_. Transmissions are initiated by an external host and not by
 the processor itself. If you are looking for a _host-mode_ serial peripheral interface (transactions
-initiated by the NEORV32) check out the <<_serial_peripheral_interface_spi>> module.
+initiated by the NEORV32) check out the <<_serial_peripheral_interface_controller_spi>>.

 The SDI module provides a single control register `CTRL` to configure the module and to check its status
 and a single data register `DATA` for receiving/transmitting data.
@ -37,18 +37,23 @@ and a single data register `DATA` for receiving/transmitting data.

 **Theory of Operation**

-The SDI module is enabled by setting the _SDI_CTRL_EN_ bit in the `CTRL` control register. Clearing this bit
+The SDI module is enabled by setting the `SDI_CTRL_EN` bit in the `CTRL` control register. Clearing this bit
 resets the entire module including the RX and TX FIFOs.

 The SDI operates on byte-level only. Data written to the `DATA` register will be pushed to the TX FIFO. Received
 data can be retrieved by reading the RX FIFO via the `DATA` register. The current state of these FIFOs is available
-via the control register's _SDI_CTRL_RX_*_ and _SDI_CTRL_TX_*_ flags. The RX FIFO can be manually cleared at any time
-by setting the _SDI_CTRL_CLR_RX_ bit.
+via the control register's `SDI_CTRL_RX_*` and `SDI_CTRL_TX_*` flags. The RX FIFO can be manually cleared at any time
+by setting the `SDI_CTRL_CLR_RX` bit.

 .MSB-first Only
 [NOTE]
 The NEORV32 SDI module only supports MSB-first mode.

+.Transmission Abort
+[NOTE]
+If the external SPI controller aborts a transmission (by setting the chip-select signal high again) _before_
+8 data bits have been transferred, no data is written to the RX FIFO.
+

 **SDI Clocking**

@ -61,11 +66,11 @@ clock domain to simplify timing behavior. However, the clock synchronization req
 **SDI Interrupt**

 The SDI module provides a set of programmable interrupt conditions based on the level of the RX & TX FIFOs. The different
-interrupt sources are enabled by setting the according control register's _SDI_CTRL_IRQ_ bits. All enabled interrupt
+interrupt sources are enabled by setting the according control register's `SDI_CTRL_IRQ` bits. All enabled interrupt
 conditions are logically OR-ed so any enabled interrupt source will trigger the module's interrupt signal.

 Once the SDI interrupt has fired it will remain active until the actual cause of the interrupt is resolved; for
-example if just the _SDI_CTRL_IRQ_RX_AVAIL_ bit is set, the interrupt will keep firing until the RX FIFO is empty again.
+example if just the `SDI_CTRL_IRQ_RX_AVAIL` bit is set, the interrupt will keep firing until the RX FIFO is empty again.
 Furthermore, an active SDI interrupt has to be explicitly cleared again by writing zero to the according
 <<_mip>> CSR bit.
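
The OR-combination of the enabled interrupt conditions can be modeled in software as sketched below. This is only a host-side model of the behavior described above, not the hardware implementation; the function names are hypothetical and the bit positions are taken from the SDI register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read a single bit of a CTRL value. */
static bool sdi_ctrl_bit(uint32_t ctrl, unsigned pos)
{
  assert(pos < 32u);
  return ((ctrl >> pos) & 1u) != 0u;
}

/* Model: true while at least one enabled interrupt condition is fulfilled. */
bool sdi_irq_condition_active(uint32_t ctrl)
{
  bool rx_avail = sdi_ctrl_bit(ctrl, 15) && sdi_ctrl_bit(ctrl, 23); /* IRQ_RX_AVAIL & RX_AVAIL */
  bool rx_half  = sdi_ctrl_bit(ctrl, 16) && sdi_ctrl_bit(ctrl, 24); /* IRQ_RX_HALF  & RX_HALF  */
  bool rx_full  = sdi_ctrl_bit(ctrl, 17) && sdi_ctrl_bit(ctrl, 25); /* IRQ_RX_FULL  & RX_FULL  */
  bool tx_empty = sdi_ctrl_bit(ctrl, 18) && sdi_ctrl_bit(ctrl, 26); /* IRQ_TX_EMPTY & TX_EMPTY */
  return rx_avail || rx_half || rx_full || tx_empty;
}
```

An interrupt handler can use such a check to decide whether draining the RX FIFO (or refilling the TX FIFO) has resolved the cause before it clears the pending bit.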

@ -77,21 +82,21 @@ Furthermore, an active SDI interrupt has to be explicitly cleared again by writi
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.16+<| `0xfffffff0` .16+<| `CTRL` <|`0`     _SDI_CTRL_EN_                             ^| r/w <| SDI module enable
-                                  <|`1`     _SDI_CTRL_CLR_RX_                         ^| -/w <| clear RX FIFO when set, bit auto-clears
-                                  <|`3:2`   _reserved_                                ^| r/- <| reserved, read as zero
-                                  <|`7:4`   _SDI_CTRL_FIFO_MSB_ : _SDI_CTRL_FIFO_LSB_ ^| r/- <| FIFO depth; log2(_IO_SDI_FIFO_)
-                                  <|`14:8`  _reserved_                                ^| r/- <| reserved, read as zero
-                                  <|`15`    _SDI_CTRL_IRQ_RX_AVAIL_                   ^| r/w <| fire interrupt if RX FIFO is not empty
-                                  <|`16`    _SDI_CTRL_IRQ_RX_HALF_                    ^| r/w <| fire interrupt if RX FIFO is at least half full
-                                  <|`17`    _SDI_CTRL_IRQ_RX_FULL_                    ^| r/w <| fire interrupt if if RX FIFO is full
-                                  <|`18`    _SDI_CTRL_IRQ_TX_EMPTY_                   ^| r/w <| fire interrupt if TX FIFO is empty
-                                  <|`22:19` _reserved_                                ^| r/- <| reserved, read as zero
-                                  <|`23`    _SDI_CTRL_RX_AVAIL_                       ^| r/- <| RX FIFO data available (RX FIFO not empty)
-                                  <|`24`    _SDI_CTRL_RX_HALF_                        ^| r/- <| RX FIFO at least half full
-                                  <|`25`    _SDI_CTRL_RX_FULL_                        ^| r/- <| RX FIFO full
-                                  <|`26`    _SDI_CTRL_TX_EMPTY_                       ^| r/- <| TX FIFO empty
-                                  <|`27`    _SDI_CTRL_TX_FULL_                        ^| r/- <| TX FIFO full
-                                  <|`31:28` _reserved_                                ^| r/- <| reserved, read as zero
+.16+<| `0xfffffff0` .16+<| `CTRL` <|`0`     `SDI_CTRL_EN`                           ^| r/w <| SDI module enable
+                                  <|`1`     `SDI_CTRL_CLR_RX`                       ^| -/w <| clear RX FIFO when set, bit auto-clears
+                                  <|`3:2`   _reserved_                              ^| r/- <| reserved, read as zero
+                                  <|`7:4`   `SDI_CTRL_FIFO_MSB : SDI_CTRL_FIFO_LSB` ^| r/- <| FIFO depth; log2(_IO_SDI_FIFO_)
+                                  <|`14:8`  _reserved_                              ^| r/- <| reserved, read as zero
+                                  <|`15`    `SDI_CTRL_IRQ_RX_AVAIL`                 ^| r/w <| fire interrupt if RX FIFO is not empty
+                                  <|`16`    `SDI_CTRL_IRQ_RX_HALF`                  ^| r/w <| fire interrupt if RX FIFO is at least half full
+                                  <|`17`    `SDI_CTRL_IRQ_RX_FULL`                  ^| r/w <| fire interrupt if RX FIFO is full
+                                  <|`18`    `SDI_CTRL_IRQ_TX_EMPTY`                 ^| r/w <| fire interrupt if TX FIFO is empty
+                                  <|`22:19` _reserved_                              ^| r/- <| reserved, read as zero
+                                  <|`23`    `SDI_CTRL_RX_AVAIL`                     ^| r/- <| RX FIFO data available (RX FIFO not empty)
+                                  <|`24`    `SDI_CTRL_RX_HALF`                      ^| r/- <| RX FIFO at least half full
+                                  <|`25`    `SDI_CTRL_RX_FULL`                      ^| r/- <| RX FIFO full
+                                  <|`26`    `SDI_CTRL_TX_EMPTY`                     ^| r/- <| TX FIFO empty
+                                  <|`27`    `SDI_CTRL_TX_FULL`                      ^| r/- <| TX FIFO full
+                                  <|`31:28` _reserved_                              ^| r/- <| reserved, read as zero
 | `0xfffffff4` | `DATA` |`7:0` | r/w | receive/transmit data (FIFO)
 |=======================
--- a/docs/datasheet/soc_spi.adoc
+++ b/docs/datasheet/soc_spi.adoc
@ -12,71 +12,62 @@
 |                          | `spi_dat_o` | 1-bit serial data output
 |                          | `spi_dat_i` | 1-bit serial data input
 |                          | `spi_csn_o` | 8-bit dedicated chip select output (low-active)
-| Configuration generics:  | _IO_SPI_EN_   | implement SPI controller when _true_
-|                          | _IO_SPI_FIFO_ | FIFO depth, has to be a power of two, min 1
+| Configuration generics:  | `IO_SPI_EN`   | implement SPI controller when `true`
+|                          | `IO_SPI_FIFO` | FIFO depth, has to be a power of two, min 1
 | CPU interrupts:          | fast IRQ channel 6 | configurable SPI interrupt (see <<_processor_interrupts>>)
 |=======================


 **Overview**

-SPI is a common synchronous serial transmission interface for fast on-board communications.
 The NEORV32 SPI transceiver module operates on 8-bit base, supports all 4 standard clock modes
 and provides up to 8 dedicated chip select signals via the top entity's `spi_csn_o` signal.
-An receive/transmit FIFO can be configured via the _IO_SPI_FIFO_ generic to support block-based
+A receive/transmit FIFO can be configured via the `IO_SPI_FIFO` generic to support block-based
 transmissions without CPU interaction.

+The SPI module provides a single control register `CTRL` to configure the module and to check its status
+and a single data register `DATA` for receiving/transmitting data.
+
 .Host-Mode Only
 [NOTE]
 The NEORV32 SPI module only supports _host mode_. Transmissions are initiated only by the processor's SPI module
 and not by an external SPI module. If you are looking for a _device-mode_ serial peripheral interface (transactions
-initiated by an external host) check out the <<_serial_data_interface_sdi>> module..
+initiated by an external host) check out the <<_serial_data_interface_controller_sdi>>.

-The SPI module provides a single control register `CTRL` to configure the module and to check it's status
-and a single data register `DATA` for receiving/transmitting data.


 **Theory of Operation**

-The SPI module is enabled by setting the _SPI_CTRL_EN_ bit in the `CTRL` control register. No transfer can be initiated
+The SPI module is enabled by setting the `SPI_CTRL_EN` bit in the `CTRL` control register. No transfer can be initiated
 and no interrupt request will be triggered if this bit is cleared. Clearing this bit will reset the module, clear
 the FIFO and terminate any transfer in progress.

 The data quantity to be transferred within a single data transmission is fixed to 8 bits. However, the
-total transmission length is left to the user: after asserting chip-select an arbitrary amount of transmission
+total transmission length is left to the user: after asserting chip-select an arbitrary number of 8-bit transmissions
 can be made before de-asserting chip-select again.

 A transmission is started when writing data to the transmitter FIFO via the `DATA` register. Note that data is always
-transferred MSB-first. The SPI operation is completed as soon as the _SPI_CTRL_BUSY_ flag clears. Received data can
-be retrieved by reading the RX FIFO also via the `DATA` register. The control register's SPI_CTRL_RX_AVAIL_,
-_SPI_CTRL_TX_EMPTY_, _SPI_CTRL_TX_NHALF_ and _SPI_CTRL_TX_FULL_ flags provide information regarding the FIFO levels.
+transferred MSB-first. The SPI operation is completed as soon as the `SPI_CTRL_BUSY` flag clears. Received data can
+be retrieved by reading the RX FIFO also via the `DATA` register. The control register's `SPI_CTRL_RX_AVAIL`,
+`SPI_CTRL_TX_EMPTY`, `SPI_CTRL_TX_NHALF` and `SPI_CTRL_TX_FULL` flags provide information regarding the RX/TX FIFO levels.

 The SPI controller features 8 dedicated chip-select lines. These lines are controlled via the control register's
-_SPI_CTRL_CS_SELx_ and _SPI_CTRL_CS_EN_ bits. The 3-bit _SPI_CTRL_CSx_ bits are used to select one out of the eight
-dedicated chip select lines. As soon as _SPI_CTRL_CS_EN_ is _set_ the selected chip select line is activated (driven _low_).
+`SPI_CTRL_CS_SELx` and `SPI_CTRL_CS_EN` bits. The 3-bit `SPI_CTRL_CS_SELx` bits are used to select one out of the eight
+dedicated chip select lines. As soon as `SPI_CTRL_CS_EN` is _set_ the selected chip select line is activated (driven _low_).
 Note that disabling the SPI module via the _SPI_CTRL_EN_ bit will also deactivate any currently activated chip select line.


 **SPI Clock Configuration**

 The SPI module supports all standard SPI clock modes (0, 1, 2, 3), which are configured via the two control register bits
-_SPI_CTRL_CPHA_ and _SPI_CTRL_CPOL_. The _SPI_CTRL_CPHA_ bit defines the _clock phase_ and the _SPI_CTRL_CPOL_
+`SPI_CTRL_CPHA` and `SPI_CTRL_CPOL`. The `SPI_CTRL_CPHA` bit defines the _clock phase_ and the `SPI_CTRL_CPOL`
 bit defines the _clock polarity_.

 .SPI clock modes; image from https://en.wikipedia.org/wiki/File:SPI_timing_diagram2.svg (license: (Wikimedia) https://en.wikipedia.org/wiki/Creative_Commons[Creative Commons] https://creativecommons.org/licenses/by-sa/3.0/deed.en[Attribution-Share Alike 3.0 Unported])
 image::SPI_timing_diagram2.wikimedia.png[]

-.SPI standard clock modes
-[cols="<2,^1,^1,^1,^1"]
-[options="header",grid="rows"]
-|=======================
-|                 | Mode 0 | Mode 1 | Mode 2 | Mode 3
-| _SPI_CTRL_CPOL_ |    `0` |    `0` |    `1` |    `1` 
-| _SPI_CTRL_CPHA_ |    `0` |    `1` |    `0` |    `1` 
-|=======================
-
-The SPI clock frequency (`spi_clk_o`) is programmed by the 3-bit _SPI_CTRL_PRSCx_ clock prescaler for a coarse clock selection
-and a 4-bit clock divider _SPI_CTRL_CDIVx_ for a fine clock configuration.
+The SPI clock frequency (`spi_clk_o`) is programmed by the 3-bit `SPI_CTRL_PRSCx` clock prescaler for a coarse clock selection
+and a 4-bit clock divider `SPI_CTRL_CDIVx` for a fine clock configuration.

 The following clock prescalers (_SPI_CTRL_PRSCx_) are available:

@ -88,10 +79,10 @@ The following clock prescalers (_SPI_CTRL_PRSCx_) are available:
 | Resulting `clock_prescaler` |       2 |       4 |       8 |      64 |     128 |    1024 |    2048 |    4096
 |=======================

-Based on the _SPI_CTRL_PRSCx_ and _SPI_CTRL_CDIVx_ configuration, the actual SPI clock frequency f~SPI~ is derived
+Based on the programmed clock configuration, the actual SPI clock frequency f~SPI~ is derived
 from the processor's main clock f~main~ according to the following equation:

-_**f~SPI~**_ = _f~main~[Hz]_ / (2 * `clock_prescaler` * (1 + _SPI_CTRL_CDIVx_))
+_**f~SPI~**_ = _f~main~[Hz]_ / (2 * `clock_prescaler` * (1 + `SPI_CTRL_CDIVx`))

 Hence, the maximum SPI clock is f~main~ / 4 and the lowest SPI clock is f~main~ / 131072. The SPI clock is always
 symmetric having a duty cycle of 50%.
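
The clock equation above can be checked with a small host-side calculation. The following is a minimal sketch, not part of the NEORV32 software library: the function name `spi_clock_hz` and the lookup table are hypothetical, and the table assumes that the prescaler select values 0..7 map to the `clock_prescaler` values in ascending order.

```c
#include <stdint.h>

/* Assumed mapping: prescaler select 0..7 -> clock_prescaler (ascending order). */
static const uint32_t spi_prescaler_lut[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Hypothetical helper: f_SPI = f_main / (2 * clock_prescaler * (1 + CDIV)).
 * prsc is the 3-bit prescaler select, cdiv the 4-bit clock divider.
 * Returns 0 if an argument is out of range. */
uint32_t spi_clock_hz(uint32_t f_main_hz, uint32_t prsc, uint32_t cdiv) {
  if ((prsc > 7u) || (cdiv > 15u)) {
    return 0u;
  }
  return f_main_hz / (2u * spi_prescaler_lut[prsc] * (1u + cdiv));
}
```

With a 100 MHz main clock, `spi_clock_hz(100000000, 0, 0)` gives the fastest setting (f~main~ / 4) and `spi_clock_hz(100000000, 7, 15)` the slowest one (f~main~ / 131072, rounded down by the integer division).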
@ -100,11 +91,11 @@ symmetric having a duty cycle of 50%.
 **SPI Interrupt**

 The SPI module provides a set of programmable interrupt conditions based on the level of the RX/TX FIFO. The different
-interrupt sources are enabled by setting the according control register's _SPI_CTRL_IRQ_ bits. All enabled interrupt
+interrupt sources are enabled by setting the according control register's `SPI_CTRL_IRQ_*` bits. All enabled interrupt
 conditions are logically OR-ed so any enabled interrupt source will trigger the module's interrupt signal.

 Once the SPI interrupt has fired it remains pending until the actual cause of the interrupt is resolved; for
-example if just the _SPI_CTRL_IRQ_RX_AVAIL_ bit is set, the interrupt will keep firing until the RX FIFO is empty again.
+example if just the `SPI_CTRL_IRQ_RX_AVAIL` bit is set, the interrupt will keep firing until the RX FIFO is empty again.
 Furthermore, an active SPI interrupt has to be explicitly cleared again by writing zero to the according
 <<_mip>> CSR bit.

@ -116,23 +107,23 @@ Furthermore, an active SPI interrupt has to be explicitly cleared again by writi
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.18+<| `0xffffffa8` .18+<| `CTRL` <|`0`     _SPI_CTRL_EN_                             ^| r/w <| SPI module enable
-                                  <|`1`     _SPI_CTRL_CPHA_                           ^| r/w <| clock phase
-                                  <|`2`     _SPI_CTRL_CPOL_                           ^| r/w <| clock polarity
-                                  <|`5:3`   _SPI_CTRL_CS_SEL2_ : _SPI_CTRL_CS_SEL0_   ^| r/w <| Direct chip-select 0..7
-                                  <|`6`     _SPI_CTRL_CS_EN_                          ^| r/w <| Direct chip-select enable: setting `spi_csn_o(SPI_CTRL_CS_SEL)` low when set
-                                  <|`9:7`   _SPI_CTRL_PRSC2_ : _SPI_CTRL_PRSC0_       ^| r/w <| 3-bit clock prescaler select
-                                  <|`13:10` _SPI_CTRL_CDIV2_ : _SPI_CTRL_CDIV0_       ^| r/w <| 4-bit clock divider
-                                  <|`15:14`    _reserved_                             ^| r/- <| reserved, read as zero
-                                  <|`16`   _SPI_CTRL_RX_AVAIL_                        ^| r/- <| RX FIFO data available (RX FIFO not empty)
-                                  <|`17`   _SPI_CTRL_TX_EMPTY_                        ^| r/- <| TX FIFO empty
-                                  <|`18`   _SPI_CTRL_TX_NHALF_                        ^| r/- <| TX FIFO _not_ at least half full
-                                  <|`19`   _SPI_CTRL_TX_FULL_                         ^| r/- <| TX FIFO full
-                                  <|`20`   _SPI_CTRL_IRQ_RX_AVAIL_                    ^| r/w <| Trigger IRQ if RX FIFO not empty
-                                  <|`21`   _SPI_CTRL_IRQ_TX_EMPTY_                    ^| r/w <| Trigger IRQ if TX FIFO empty
-                                  <|`22`   _SPI_CTRL_IRQ_TX_NHALF_                    ^| r/w <| Trigger IRQ if TX FIFO _not_ at least half full
-                                  <|`26:23` _SPI_CTRL_FIFO_MSB_ : _SPI_CTRL_FIFO_LSB_ ^| r/- <| FIFO depth; log2(_IO_SPI_FIFO_)
-                                  <|`30:27` _reserved_                                ^| r/- <| reserved, read as zero
-                                  <|`31`   _SPI_CTRL_BUSY_                            ^| r/- <| SPI module busy when set (serial engine operation in progress and TX FIFO not empty yet)
+.18+<| `0xffffffa8` .18+<| `CTRL` <|`0`     `SPI_CTRL_EN`                           ^| r/w <| SPI module enable
+                                  <|`1`     `SPI_CTRL_CPHA`                         ^| r/w <| clock phase
+                                  <|`2`     `SPI_CTRL_CPOL`                         ^| r/w <| clock polarity
+                                  <|`5:3`   `SPI_CTRL_CS_SEL2 : SPI_CTRL_CS_SEL0`   ^| r/w <| Direct chip-select 0..7
+                                  <|`6`     `SPI_CTRL_CS_EN`                        ^| r/w <| Direct chip-select enable: setting `spi_csn_o(SPI_CTRL_CS_SEL)` low when set
+                                  <|`9:7`   `SPI_CTRL_PRSC2 : SPI_CTRL_PRSC0`       ^| r/w <| 3-bit clock prescaler select
+                                  <|`13:10` `SPI_CTRL_CDIV3 : SPI_CTRL_CDIV0`       ^| r/w <| 4-bit clock divider
+                                  <|`15:14`  _reserved_                             ^| r/- <| reserved, read as zero
+                                  <|`16`    `SPI_CTRL_RX_AVAIL`                     ^| r/- <| RX FIFO data available (RX FIFO not empty)
+                                  <|`17`    `SPI_CTRL_TX_EMPTY`                     ^| r/- <| TX FIFO empty
+                                  <|`18`    `SPI_CTRL_TX_NHALF`                     ^| r/- <| TX FIFO _not_ at least half full
+                                  <|`19`    `SPI_CTRL_TX_FULL`                      ^| r/- <| TX FIFO full
+                                  <|`20`    `SPI_CTRL_IRQ_RX_AVAIL`                 ^| r/w <| Trigger IRQ if RX FIFO not empty
+                                  <|`21`    `SPI_CTRL_IRQ_TX_EMPTY`                 ^| r/w <| Trigger IRQ if TX FIFO empty
+                                  <|`22`    `SPI_CTRL_IRQ_TX_NHALF`                 ^| r/w <| Trigger IRQ if TX FIFO _not_ at least half full
+                                  <|`26:23` `SPI_CTRL_FIFO_MSB : SPI_CTRL_FIFO_LSB` ^| r/- <| FIFO depth; log2(_IO_SPI_FIFO_)
+                                  <|`30:27`  _reserved_                             ^| r/- <| reserved, read as zero
+                                  <|`31`    `SPI_CTRL_BUSY`                         ^| r/- <| SPI module busy when set (serial engine operation in progress and TX FIFO not empty yet)
 | `0xffffffac` | `DATA` |`7:0` | r/w | receive/transmit data (FIFO)
 |=======================
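
To illustrate how the `CTRL` bit fields fit together, the following sketch assembles a control word from an SPI clock mode, a prescaler select and a clock divider. It is an assumption-laden example rather than the library's driver code: `spi_ctrl_word` and the `BIT_*` constants are hypothetical names, and only the bit positions are taken from the register table above. The clock mode follows the standard SPI convention (mode = 2 * CPOL + CPHA).

```c
#include <stdint.h>

/* Bit positions copied from the CTRL register table; names are local to this sketch. */
enum {
  BIT_EN       = 0,  /* SPI module enable */
  BIT_CPHA     = 1,  /* clock phase */
  BIT_CPOL     = 2,  /* clock polarity */
  BIT_PRSC_LSB = 7,  /* 3-bit clock prescaler select, bits 9:7 */
  BIT_CDIV_LSB = 10  /* 4-bit clock divider, bits 13:10 */
};

/* Hypothetical helper: build a CTRL value that enables the module with the
 * given clock mode (0..3), prescaler select (0..7) and clock divider (0..15).
 * Out-of-range arguments are masked to their field width. */
uint32_t spi_ctrl_word(uint32_t mode, uint32_t prsc, uint32_t cdiv) {
  uint32_t ctrl = 1u << BIT_EN;
  ctrl |= (mode & 1u) << BIT_CPHA;          /* CPHA is the low bit of the mode */
  ctrl |= ((mode >> 1) & 1u) << BIT_CPOL;   /* CPOL is the high bit of the mode */
  ctrl |= (prsc & 0x7u) << BIT_PRSC_LSB;
  ctrl |= (cdiv & 0xFu) << BIT_CDIV_LSB;
  return ctrl;
}
```

On hardware, the resulting value would be written to the `CTRL` register; the chip-select fields are left at zero here and would be set separately before starting a transfer.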
--- a/docs/datasheet/soc_sysinfo.adoc
+++ b/docs/datasheet/soc_sysinfo.adoc
@ -12,10 +12,11 @@
 | CPU interrupts:          | none | 
 |=======================

-**Theory of Operation**

-The SYSINFO allows the application software to determine the setting of most of the processor's top entity
-generics that are related to processor/SoC configuration. All registers of this unit are read-only.
+**Overview**
+
+The SYSINFO allows the application software to determine the setting of most of the <<_processor_top_entity_generics>>
+that are related to processor/SoC configuration. All registers of this unit are read-only.

 This device is always implemented - regardless of the actual hardware configuration. The bootloader as well
 as the NEORV32 software runtime environment require information from this device (like memory layout
@ -33,14 +34,14 @@ will signal a "DEVICE ERROR" in this case.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Function
-| `0xffffffe0` | `CLK`         | clock speed in Hz (via top's <<_clock_frequency>> generic)
-| `0xffffffe4` | `CUSTOM_ID    | custom user-defined ID (via top's <<_custom_id>> generic)
-| `0xffffffe8` | `SOC`         | specific SoC configuration (see <<_sysinfo_soc_configuration>>)
+| `0xffffffe0` | `CLK`         | clock speed in Hz (via top's `CLOCK_FREQUENCY` generic)
+| `0xffffffe4` | `CUSTOM_ID`   | custom user-defined ID (via top's `CUSTOM_ID` generic)
+| `0xffffffe8` | `SOC`         | specific SoC configuration (see <<_sysinfo_soc_configuration>>)
 | `0xffffffec` | `CACHE`       | cache configuration information (see <<_sysinfo_cache_configuration>>)
 | `0xfffffff0` | `ISPACE_BASE` | instruction address space base (via package's `ispace_base_c` constant)
-| `0xfffffff4` | `IMEM_SIZE`   | internal IMEM size in bytes (via top's <<_mem_int_imem_size>> generic)
+| `0xfffffff4` | `IMEM_SIZE`   | internal IMEM size in bytes (via top's `MEM_INT_IMEM_SIZE` generic)
 | `0xfffffff8` | `DSPACE_BASE` | data address space base (via package's `dspace_base_c` constant)
-| `0xfffffffc` | `DMEM_SIZE`   | internal DMEM size in bytes (via top's <<_mem_int_dmem_size>> generic)
+| `0xfffffffc` | `DMEM_SIZE`   | internal DMEM size in bytes (via top's `MEM_INT_DMEM_SIZE` generic)
 |=======================


@ -51,32 +52,32 @@ will signal a "DEVICE ERROR" in this case.
 [options="header",grid="all"]
 |=======================
 | Bit | Name [C] | Function
-| `0`    | _SYSINFO_SOC_BOOTLOADER_       | set if the processor-internal bootloader is implemented (via top's <<_int_bootloader_en>> generic)
-| `1`    | _SYSINFO_SOC_MEM_EXT_          | set if the external Wishbone bus interface is implemented (via top's <<_mem_ext_en>> generic)
-| `2`    | _SYSINFO_SOC_MEM_INT_IMEM_     | set if the processor-internal DMEM implemented (via top's <<_mem_int_dmem_en>> generic)
-| `3`    | _SYSINFO_SOC_MEM_INT_DMEM_     | set if the processor-internal IMEM is implemented (via top's <<_mem_int_imem_en>> generic)
-| `4`    | _SYSINFO_SOC_MEM_EXT_ENDIAN_   | set if external bus interface uses BIG-endian byte-order (via top's <<_mem_ext_big_endian>> generic)
-| `5`    | _SYSINFO_SOC_ICACHE_           | set if processor-internal instruction cache is implemented (via top's <<_icache_en>> generic)
-| `12:6` | -                              | _reserved_, read as zero
-| `13`   | _SYSINFO_SOC_IS_SIM_           | set if processor is being **simulated** (⚠️ not guaranteed)
-| `14`   | _SYSINFO_SOC_OCD_              | set if on-chip debugger implemented (via top's <<_on_chip_debugger_en>> generic)
-| `15`   | -                              | _reserved_, read as zero
-| `16`   | _SYSINFO_SOC_IO_GPIO_          | set if the GPIO is implemented (via top's <<_io_gpio_en>> generic)
-| `17`   | _SYSINFO_SOC_IO_MTIME_         | set if the MTIME is implemented (via top's <<_io_mtime_en>> generic)
-| `18`   | _SYSINFO_SOC_IO_UART0_         | set if the primary UART0 is implemented (via top's <<_io_uart0_en>> generic)
-| `19`   | _SYSINFO_SOC_IO_SPI_           | set if the SPI is implemented (via top's <<_io_spi_en>> generic)
-| `20`   | _SYSINFO_SOC_IO_TWI_           | set if the TWI is implemented (via top's <<_io_twi_en>> generic)
-| `21`   | _SYSINFO_SOC_IO_PWM_           | set if the PWM is implemented (via top's <<_io_pwm_num_ch>> generic)
-| `22`   | _SYSINFO_SOC_IO_WDT_           | set if the WDT is implemented (via top's <<_io_wdt_en>> generic)
-| `23`   | _SYSINFO_SOC_IO_CFS_           | set if the custom functions subsystem is implemented (via top's <<_io_cfs_en>> generic)
-| `24`   | _SYSINFO_SOC_IO_TRNG_          | set if the TRNG is implemented (via top's _IO_TRNG_EN_ generic)
-| `25`   | _SYSINFO_SOC_IO_SDI_           | set if the SDI is implemented (via top's <<_io_sdi_en>> generic)
-| `26`   | _SYSINFO_SOC_IO_UART1_         | set if the secondary UART1 is implemented (via top's <<_io_uart1_en>> generic)
-| `27`   | _SYSINFO_SOC_IO_NEOLED_        | set if the NEOLED is implemented (via top's <<_io_neoled_en>> generic)
-| `28`   | _SYSINFO_SOC_IO_XIRQ_          | set if the XIRQ is implemented (via top's <<_xirq_num_ch>> generic)
-| `29`   | _SYSINFO_SOC_IO_GPTMR_         | set if the GPTMR is implemented (via top's <<_io_gptmr_en>> generic)
-| `30`   | _SYSINFO_SOC_IO_XIP_           | set if the XIP module is implemented (via top's <<_io_xip_en>> generic)
-| `31`   | _SYSINFO_SOC_IO_ONEWIRE_       | set if the ONEWIRE interface is implemented (via top's <<_io_onewire_en>> generic)
+| `0`    | `SYSINFO_SOC_BOOTLOADER`     | set if the processor-internal bootloader is implemented (via top's `INT_BOOTLOADER_EN` generic)
+| `1`    | `SYSINFO_SOC_MEM_EXT`        | set if the external Wishbone bus interface is implemented (via top's `MEM_EXT_EN` generic)
+| `2`    | `SYSINFO_SOC_MEM_INT_IMEM`   | set if the processor-internal IMEM is implemented (via top's `MEM_INT_IMEM_EN` generic)
+| `3`    | `SYSINFO_SOC_MEM_INT_DMEM`   | set if the processor-internal DMEM is implemented (via top's `MEM_INT_DMEM_EN` generic)
+| `4`    | `SYSINFO_SOC_MEM_EXT_ENDIAN` | set if external bus interface uses BIG-endian byte-order (via top's `MEM_EXT_BIG_ENDIAN` generic)
+| `5`    | `SYSINFO_SOC_ICACHE`         | set if processor-internal instruction cache is implemented (via top's `ICACHE_EN` generic)
+| `12:6` | -                            | _reserved_, read as zero
+| `13`   | `SYSINFO_SOC_IS_SIM`         | set if processor is being **simulated** (⚠️ not guaranteed)
+| `14`   | `SYSINFO_SOC_OCD`            | set if the on-chip debugger is implemented (via top's `ON_CHIP_DEBUGGER_EN` generic)
+| `15`   | -                            | _reserved_, read as zero
+| `16`   | `SYSINFO_SOC_IO_GPIO`        | set if the GPIO is implemented (via top's `IO_GPIO_EN` generic)
+| `17`   | `SYSINFO_SOC_IO_MTIME`       | set if the MTIME is implemented (via top's `IO_MTIME_EN` generic)
+| `18`   | `SYSINFO_SOC_IO_UART0`       | set if the primary UART0 is implemented (via top's `IO_UART0_EN` generic)
+| `19`   | `SYSINFO_SOC_IO_SPI`         | set if the SPI is implemented (via top's `IO_SPI_EN` generic)
+| `20`   | `SYSINFO_SOC_IO_TWI`         | set if the TWI is implemented (via top's `IO_TWI_EN` generic)
+| `21`   | `SYSINFO_SOC_IO_PWM`         | set if the PWM is implemented (via top's `IO_PWM_NUM_CH` generic)
+| `22`   | `SYSINFO_SOC_IO_WDT`         | set if the WDT is implemented (via top's `IO_WDT_EN` generic)
+| `23`   | `SYSINFO_SOC_IO_CFS`         | set if the custom functions subsystem is implemented (via top's `IO_CFS_EN` generic)
+| `24`   | `SYSINFO_SOC_IO_TRNG`        | set if the TRNG is implemented (via top's `IO_TRNG_EN` generic)
+| `25`   | `SYSINFO_SOC_IO_SDI`         | set if the SDI is implemented (via top's `IO_SDI_EN` generic)
+| `26`   | `SYSINFO_SOC_IO_UART1`       | set if the secondary UART1 is implemented (via top's `IO_UART1_EN` generic)
+| `27`   | `SYSINFO_SOC_IO_NEOLED`      | set if the NEOLED is implemented (via top's `IO_NEOLED_EN` generic)
+| `28`   | `SYSINFO_SOC_IO_XIRQ`        | set if the XIRQ is implemented (via top's `XIRQ_NUM_CH` generic)
+| `29`   | `SYSINFO_SOC_IO_GPTMR`       | set if the GPTMR is implemented (via top's `IO_GPTMR_EN` generic)
+| `30`   | `SYSINFO_SOC_IO_XIP`         | set if the XIP module is implemented (via top's `IO_XIP_EN` generic)
+| `31`   | `SYSINFO_SOC_IO_ONEWIRE`     | set if the ONEWIRE interface is implemented (via top's `IO_ONEWIRE_EN` generic)
 |=======================
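
Software typically uses these flags to check whether a module exists before touching it. The sketch below shows one way to test individual bits of a value read from the `SOC` register. The helper `soc_has` and the `SOC_BIT_*` constants are hypothetical names introduced for this example; only the bit positions come from the table above.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few bit positions from the SOC register table; names are local to this sketch. */
enum {
  SOC_BIT_BOOTLOADER = 0,
  SOC_BIT_IS_SIM     = 13,
  SOC_BIT_IO_SPI     = 19,
  SOC_BIT_IO_TWI     = 20,
  SOC_BIT_IO_TRNG    = 24
};

/* Hypothetical helper: returns true if the given feature bit is set in a
 * value previously read from the SYSINFO SOC register. */
bool soc_has(uint32_t soc_value, unsigned bit) {
  if (bit > 31u) {
    return false;
  }
  return ((soc_value >> bit) & 1u) != 0u;
}
```

On the target, `soc_value` would be obtained by reading the `SOC` register at `0xffffffe8`; the helper itself only operates on the value and can therefore also be exercised on a host machine.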


@ -90,9 +91,9 @@ Bit fields in this register are set to all-zero if the according cache is not im
 [options="header",grid="all"]
 |=======================
 | Bit      | Name [C] | Function
-| `3:0`    | _SYSINFO_CACHE_IC_BLOCK_SIZE_3_ : _SYSINFO_CACHE_IC_BLOCK_SIZE_0_       | _log2_(i-cache block size in bytes), via top's <<_icache_block_size>> generic
-| `7:4`    | _SYSINFO_CACHE_IC_NUM_BLOCKS_3_ : _SYSINFO_CACHE_IC_NUM_BLOCKS_0_       | _log2_(i-cache number of cache blocks), via top's <<_icache_num_blocks>> generic
-| `11:9`   | _SYSINFO_CACHE_IC_ASSOCIATIVITY_3_ : _SYSINFO_CACHE_IC_ASSOCIATIVITY_0_ | _log2_(i-cache associativity), via top's <<_icache_associativity>> generic
-| `15:12`  | _SYSINFO_CACHE_IC_REPLACEMENT_3_ : _SYSINFO_CACHE_IC_REPLACEMENT_0_     | i-cache replacement policy (`0001` = LRU if associativity > 0)
-| `32:16`  | -                                                                       | zero, reserved for d-cache
+| `3:0`    | `SYSINFO_CACHE_IC_BLOCK_SIZE_3 : SYSINFO_CACHE_IC_BLOCK_SIZE_0`       | _log2_(i-cache block size in bytes), via top's `ICACHE_BLOCK_SIZE` generic
+| `7:4`    | `SYSINFO_CACHE_IC_NUM_BLOCKS_3 : SYSINFO_CACHE_IC_NUM_BLOCKS_0`       | _log2_(i-cache number of cache blocks), via top's `ICACHE_NUM_BLOCKS` generic
+| `11:8`   | `SYSINFO_CACHE_IC_ASSOCIATIVITY_3 : SYSINFO_CACHE_IC_ASSOCIATIVITY_0` | _log2_(i-cache associativity), via top's `ICACHE_ASSOCIATIVITY` generic
+| `15:12`  | `SYSINFO_CACHE_IC_REPLACEMENT_3 : SYSINFO_CACHE_IC_REPLACEMENT_0`     | i-cache replacement policy (`0001` = LRU if associativity > 0)
+| `31:16`  | -                                                                     | zero, reserved for d-cache
 |=======================
--- a/docs/datasheet/soc_trng.adoc
+++ b/docs/datasheet/soc_trng.adoc
@ -9,13 +9,13 @@
 | Software driver file(s): | neorv32_trng.c |
 |                          | neorv32_trng.h |
 | Top entity port:         | none | 
-| Configuration generics:  | _IO_TRNG_EN_   | implement TRNG when _true_
-|                          | _IO_TRNG_FIFO_ | data FIFO depth, min 1, has to be a power of two
+| Configuration generics:  | `IO_TRNG_EN`   | implement TRNG when `true`
+|                          | `IO_TRNG_FIFO` | data FIFO depth, min 1, has to be a power of two
 | CPU interrupts:          | none | 
 |=======================


-**Theory of Operation**
+**Overview**

 The NEORV32 true random number generator provides _physically_ true random numbers.
 Instead of using a pseudo RNG like an LFSR, the TRNG uses a simple, straight-forward ring
@ -23,48 +23,38 @@ oscillator concept as physical entropy source. Hence, voltage, thermal and also
 fluctuations are used to provide a true physical entropy source.

 The TRNG features a platform independent architecture without FPGA-specific primitives, macros or
-attributes so it can be synthesized for _any_ FPGA. Ir is based on the **neoTRNG V2**, which is a "spin-off project" of the
+attributes so it can be synthesized for _any_ FPGA. It is based on the **neoTRNG V2**, which is a "spin-off project" of the
 NEORV32 processor. More detailed information about the neoTRNG, its architecture and a
 detailed evaluation of the random number quality can be found in the neoTRNG repository: https://github.com/stnolting/neoTRNG

 .Inferring Latches
 [NOTE]
-The synthesis tool might emit a warning like _"inferring latches for ... neorv32_trng ..."_. This is no problem
+The synthesis tool might emit a warning like "inferring latches for ... neorv32_trng ...". This is no problem
 as this is what we actually want: the TRNG is based on latches, which implement the inverters of the ring oscillators.

 .Simulation
 [IMPORTANT]
 When simulating the processor the NEORV32 TRNG is automatically set to "simulation mode". In this mode, the physical entropy
 sources (= the ring oscillators) are replaced by a simple **pseudo RNG (LFSR)** providing weak pseudo-random data only.
-The _TRNG_CTRL_SIM_MODE_ flag of the control register is set if simulation mode is active.
+The `TRNG_CTRL_SIM_MODE` flag of the control register is set if simulation mode is active.


-**Using the TRNG**
+**Theory of Operation**

-The TRNG features a single control register `CTRL` for control, status check and data access. When the _TRNG_CTRL_EN_
-bit is set, the TRNG is enabled and starts operation.
+The TRNG features a single control register `CTRL` for control, status check and data access. When the `TRNG_CTRL_EN`
+bit is set, the TRNG is enabled and starts operation. As soon as the `TRNG_CTRL_VALID` bit is set a new random data byte
+is available and can be obtained from the lowest 8 bits of the `CTRL` register. If this bit is cleared, there is no
+valid data available and the lowest 8 bits of the `CTRL` register are set to all-zero.

-.TRNG Reset
-[NOTE]
-The TRNG core does not provide a dedicated reset. In order to ensure correct operations, the TRNG should be
-disabled (=reset) by clearing the _TRNG_CTRL_EN_ and waiting some 1000s clock cycles before re-enabling it.
-
-As soon as the _TRNG_CTRL_VALID_ bit is set a new random data byte is available and can be obtained from the lowest 8 bits
-of the `CTRL` register (_TRNG_CTRL_DATA_MSB_ : _TRNG_CTRL_DATA_LSB_). If this bit is cleared, there is no valid data available
-and the lowest 8 bit of the `CTRL` register are set to all-zero.
-
-.Read Access Security
-[NOTE]
-The random data byte (_TRNG_CTRL_DATA_) in the control register is automatically cleared after each read access
-to prevent software from reading the _same_ random data byte more than once.
-
-An optional random data FIFO can be configured using the <<_io_trng_fifo>> generic. This FIFO automatically samples
+An internal entropy FIFO can be configured using the `IO_TRNG_FIFO` generic. This FIFO automatically samples
 new random data from the TRNG to provide some kind of _random data pool_ for applications, which require a large number
-of RND data in a short time. The minimal and default value for <<_io_trng_fifo>> is 1 (implementing a register rather
-than a real FIFO); the generic has to be a power of two.
+of RND data in a short time. The random data FIFO can be cleared at any time either by disabling the TRNG or by
+setting the `TRNG_CTRL_FIFO_CLR` flag.

-The random data FIFO can be cleared at any time either by disabling the TRNG via the _TRNG_CTRL_EN_ flag or by
-setting the _TRNG_CTRL_FIFO_CLR_ flag. Note that this flag is write-only and auto clears after being set. 
+.Data Gating
+[NOTE]
+The TRNG data bits `TRNG_CTRL_DATA_MSB : TRNG_CTRL_DATA_LSB` are set to zero if `TRNG_CTRL_VALID` is low.
+This prevents a random byte being read twice.


 **Register Map**
@ -74,9 +64,9 @@ setting the _TRNG_CTRL_FIFO_CLR_ flag. Note that this flag is write-only and aut
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.5+<| `0xffffffb8` .5+<| `CTRL` <|`7:0` _TRNG_CTRL_DATA_MSB_ : _TRNG_CTRL_DATA_MSB_ ^| r/- <| 8-bit random data
-                                <|`28` _TRNG_CTRL_FIFO_CLR_                         ^| -/w <| flush random data FIFO when set (auto clears)
-                                <|`29` _TRNG_CTRL_SIM_MODE_                         ^| r/- <| simulation mode (PRNG!)
-                                <|`30` _TRNG_CTRL_EN_                               ^| r/w <| TRNG enable
-                                <|`31` _TRNG_CTRL_VALID_                            ^| r/- <| random data is valid when set
+.5+<| `0xffffffb8` .5+<| `CTRL` <|`7:0` `TRNG_CTRL_DATA_MSB : TRNG_CTRL_DATA_LSB` ^| r/- <| 8-bit random data
+                                <|`28`  `TRNG_CTRL_FIFO_CLR`                      ^| -/w <| flush random data FIFO when set (auto-clears)
+                                <|`29`  `TRNG_CTRL_SIM_MODE`                      ^| r/- <| simulation mode (PRNG!)
+                                <|`30`  `TRNG_CTRL_EN`                            ^| r/w <| TRNG enable
+                                <|`31`  `TRNG_CTRL_VALID`                         ^| r/- <| random data is valid when set
 |=======================
--- a/docs/datasheet/soc_twi.adoc
+++ b/docs/datasheet/soc_twi.adoc
@ -12,20 +12,15 @@
 |                          | `twi_sda_o` | 1-bit serial data line output (pull low only)
 |                          | `twi_scl_i` | 1-bit serial clock line sense input
 |                          | `twi_scl_o` | 1-bit serial clock line output (pull low only)
-| Configuration generics:  | _IO_TWI_EN_ | implement TWI controller when _true_
+| Configuration generics:  | `IO_TWI_EN` | implement TWI controller when `true`
 | CPU interrupts:          | fast IRQ channel 7 | transmission done interrupt (see <<_processor_interrupts>>)
 |=======================


-**Theory of Operation**
+**Overview**

-The two wire interface - also called "I²C" - is a quite famous interface for connecting several on-board
-components. Since this interface only needs two signals (the serial data line SDA and the serial
-clock line SCL) for an arbitrarily number of devices it allows easy interconnections of
-several peripheral nodes.
-
-The NEORV32 TWI implements a **TWI controller**. Currently, **no multi-controller
-support** is available. Furthermore, the NEORV32 TWI unit cannot operate in peripheral mode.
+The NEORV32 TWI implements a **TWI controller**. Currently, **no multi-controller support** is available.
+Furthermore, the NEORV32 TWI unit cannot operate in peripheral mode.

 [IMPORTANT]
 The serial clock (SCL) and the serial data (SDA) lines can only be actively driven low by the
@ -34,26 +29,24 @@ controller. Hence, external pull-up resistors are required for these lines.

 **Tri-State Drivers**

-The TWI module requires two tri-state drivers for the SDA and SCL lines, which have to be implemented
-in the top module of the setup. A generic VHDL example is given below (`sda` and `scl` are the actual TWI
+The TWI module requires two tri-state drivers (actually: open-drain) for the SDA and SCL lines, which have to be
+implemented in the top module of the setup. A generic VHDL example is given below (`sda` and `scl` are the actual TWI
 bus signals, which are of type `std_logic`).

 .TWI VHDL tri-state driver example
 [source,VHDL]
 ----
-sda       <= '0' when (twi_sda_o = '0') else 'Z';
-scl       <= '0' when (twi_scl_o = '0') else 'Z';
-twi_sda_i <= std_ulogic(sda);
-twi_scl_i <= std_ulogic(scl);
+sda       <= '0' when (twi_sda_o = '0') else 'Z'; -- drive
+scl       <= '0' when (twi_scl_o = '0') else 'Z'; -- drive
+twi_sda_i <= std_ulogic(sda); -- sense
+twi_scl_i <= std_ulogic(scl); -- sense
 ----


 **TWI Clock Speed**

-The TWI clock frequency is programmed by the 3-bit _TWI_CTRL_PRSCx_ clock prescaler for a coarse selection
-and a 4-bit clock divider _TWI_CTRL_CDIVx_ for a fine selection.
-
-The following pre-scalers (_TWI_CTRL_PRSCx_) are available:
+The TWI clock frequency is programmed by the 3-bit `TWI_CTRL_PRSCx` clock prescaler for a coarse selection
+and a 4-bit clock divider `TWI_CTRL_CDIVx` for a fine selection.

 .TWI prescaler configuration
 [cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
@ -63,7 +56,7 @@ The following pre-scalers (_TWI_CTRL_PRSCx_) are available:
 | Resulting `clock_prescaler` |       2 |       4 |       8 |      64 |     128 |    1024 |    2048 |    4096
 |=======================

-Based on the _TWI_CTRL_PRSCx_ and _TWI_CTRL_CDIVx_ configuration, the actual TWI clock frequency f~SCL~ is derived
+Based on the clock configuration, the actual TWI clock frequency f~SCL~ is derived
 from the processor's main clock f~main~ according to the following equation:

 _**f~SCL~**_ = _f~main~[Hz]_ / (4 * `clock_prescaler` * (1 + TWI_CTRL_CDIV))
@ -71,59 +64,20 @@ _**f~SCL~**_ = _f~main~[Hz]_ / (4 * `clock_prescaler` * (1 + TWI_CTRL_CDIV))
 Hence, the maximum TWI clock is f~main~ / 8 and the lowest TWI clock is f~main~ / 262144. The generated TWI clock is
 always symmetric having a duty cycle of exactly 50%. However, an accessed peripheral can "slow down" the bus clock
 by using **clock stretching** (= actively driving the SCL line low). The controller will pause operation in this case
-if clock stretching is enabled via the _TWI_CTRL_CSEN_ bit of the unit's control register `CTRL`
+if clock stretching is enabled via the `TWI_CTRL_CSEN` bit of the unit's control register `CTRL`.
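
The TWI clock equation can be evaluated the same way on a host machine, for example to find a setting close to a desired bus speed. This is a minimal sketch under stated assumptions: `twi_clock_hz` and the lookup table are hypothetical names, the prescaler select values 0..7 are assumed to map to the `clock_prescaler` values in ascending order, and clock stretching by a peripheral is not modeled.

```c
#include <stdint.h>

/* Assumed mapping: prescaler select 0..7 -> clock_prescaler (ascending order). */
static const uint32_t twi_prescaler_lut[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Hypothetical helper: f_SCL = f_main / (4 * clock_prescaler * (1 + CDIV)).
 * prsc is the 3-bit prescaler select, cdiv the 4-bit clock divider.
 * Returns 0 if an argument is out of range. */
uint32_t twi_clock_hz(uint32_t f_main_hz, uint32_t prsc, uint32_t cdiv) {
  if ((prsc > 7u) || (cdiv > 15u)) {
    return 0u;
  }
  return f_main_hz / (4u * twi_prescaler_lut[prsc] * (1u + cdiv));
}
```

For a 100 MHz main clock, prescaler select 3 (`clock_prescaler` = 64) with a divider of 3 yields roughly 97.6 kHz, which is just below the common 100 kHz I²C standard-mode rate.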


 **TWI Transfers**

-The TWI is enabled via the _TWI_CTRL_EN_ bit in the `CTRL` control register. The user program can start / stop a
+The TWI is enabled via the `TWI_CTRL_EN` bit in the `CTRL` control register. The user program can start / stop a
 transmission by issuing a START or STOP condition. These conditions are generated by setting the
-according bits (_TWI_CTRL_START_ or _TWI_CTRL_STOP_) in the control register.
+according bits (`TWI_CTRL_START` or `TWI_CTRL_STOP`) in the control register.

 Data is transferred via the TWI bus by writing a byte to the `DATA` register. The written byte is sent via the TWI bus
 and the received byte from the bus is also available in this register after the transmission is completed. 

 The TWI operation (transmitting data or performing a START or STOP condition) is in progress as long as the
-control register's _TWI_CTRL_BUSY_ bit is set.
-
-
-**TWI ACK/NACK and MACK**
-
-An accessed TWI peripheral has to acknowledge each transferred byte. When the _TWI_CTRL_ACK_ bit is set after a
-completed transmission the accessed peripheral has send an _acknowledge_. If this bit is cleared after a completed
-transmission, the peripheral has send a _not-acknowledge_ (NACK).
-
-The NEORV32 TWI controller can also send an ACK generated by itself ("controller acknowledge _MACK_") right after
-transmitting a byte by driving SDA low during the ACK time slot. Some TWI modules require this MACK to acknowledge
-certain data movement operations.
-
-The control register's _TWI_CTRL_MACK_ bit has to be set to make the TWI module automatically generate a MACK after
-the byte transmission has been completed. If this bit is cleared, the ACK/NACK generated by the peripheral is sampled
-in this time slot instead (normal mode).
-
-
-**TWI Bus Status**
-
-The TWI controller can check if the TWI bus is currently claimed (SCL and SDA both low). The bus can be claimed by the
-NEORV32 TWI itself or by any other controller. Bit _TWI_CTRL_CLAIMED_ of the control register will be set if the bus
-is currently claimed.
-
-
-**Summary**
-
-In summary, a complete TWI transfer is based on the following elementary operation:
-
-[start=1]
-. generate START condition by setting _TWI_CTRL_START_
-. wait until _TWI_CTRL_BUSY_ has cleared (start condition completed)
-. transfer one byte while also sampling one byte from the bus (this also samples ACK/NACK or generates a
-controller ACK "MACK" if _TWI_CTRL_MACK_ is set) by writing data to `NEORV32_TWI.DATA`; this step can be repeated to
-send/receive an arbitrary number of bytes
-. wait until _TWI_CTRL_BUSY_ has cleared (data transfer completed)
-. optionally generate another START condition (as REPEATED-START condition) by setting _TWI_CTRL_START_ again
-. wait until _TWI_CTRL_BUSY_ has cleared (repeated-start condition completed)
-. generate STOP condition by setting _TWI_CTRL_STOP_
-. wait until _TWI_CTRL_BUSY_ has cleared (stop condition completed)
+control register's `TWI_CTRL_BUSY` bit is set.

 [TIP]
 A transmission can be terminated at any time by disabling the TWI module
@ -134,6 +88,28 @@ When reading data from a device, an all-one byte (`0xFF`) has to be written to T
 so the accessed device can actively pull-down SDA when required.


+**TWI ACK/NACK and MACK**
+
+An accessed TWI peripheral has to acknowledge each transferred byte. When the `TWI_CTRL_ACK` bit is set after a
+completed transmission the accessed peripheral has sent an ACKNOWLEDGE (ACK). If this bit is cleared after a completed
+transmission, the peripheral has sent a NOT-ACKNOWLEDGE (NACK).
+
+The NEORV32 TWI controller can also send an ACK generated by itself ("controller acknowledge `MACK`") right after
+transmitting a byte by driving SDA low during the ACK time slot. Some TWI modules require this MACK to acknowledge
+certain data movement operations.
+
+The control register's `TWI_CTRL_MACK` bit has to be set to make the TWI module automatically generate a MACK after
+the byte transmission has been completed. If this bit is cleared, the ACK/NACK generated by the peripheral is sampled
+in this time slot instead (normal mode).
+
+
+**TWI Bus Status**
+
+The TWI controller can check if the TWI bus is currently claimed (SCL and SDA both low). The bus can be claimed by the
+NEORV32 TWI itself or by any other controller. Bit `TWI_CTRL_CLAIMED` of the control register will be set if the bus
+is currently claimed.
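
The status flags described above can be decoded from a raw `CTRL` read value. The following is only a minimal sketch and not the official driver code: the bit positions are taken from the register table of this section, while the helper names `twi_ctrl_flag` and `twi_transfer_result` as well as the `twi_result_t` type are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the TWI CTRL status flags (from the register table). */
enum {
  TWI_CTRL_CLAIMED = 29, /* set if the bus is claimed by any controller */
  TWI_CTRL_ACK     = 30, /* ACK received when set, NACK when cleared */
  TWI_CTRL_BUSY    = 31  /* transfer/START/STOP in progress when set */
};

/* Hypothetical result type for a polled byte transfer. */
typedef enum { TWI_RESULT_BUSY, TWI_RESULT_ACK, TWI_RESULT_NACK } twi_result_t;

/* Hypothetical helper: true if bit 'pos' is set in a CTRL read value. */
bool twi_ctrl_flag(uint32_t ctrl, unsigned pos) {
  assert(pos < 32u);
  return ((ctrl >> pos) & 1u) != 0u;
}

/* Hypothetical helper: the ACK flag is only meaningful once BUSY has cleared. */
twi_result_t twi_transfer_result(uint32_t ctrl) {
  if (twi_ctrl_flag(ctrl, TWI_CTRL_BUSY)) {
    return TWI_RESULT_BUSY;
  }
  return twi_ctrl_flag(ctrl, TWI_CTRL_ACK) ? TWI_RESULT_ACK : TWI_RESULT_NACK;
}
```

In a real application the `ctrl` argument would be the value read from the TWI control register after writing a byte to `DATA`. The caller would poll until the result is no longer `TWI_RESULT_BUSY`.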
+
+
 **TWI Interrupt**

 The TWI module provides a single interrupt to signal "transmission done" to the CPU. Whenever the TWI
@ -149,16 +125,16 @@ explicitly cleared again by writing zero to the according <<_mip>> CSR bit.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.10+<| `0xffffffb0` .10+<| `CTRL` <|`0`     _TWI_CTRL_EN_                       ^| r/w <| TWI enable, reset if cleared
-                                  <|`1`     _TWI_CTRL_START_                    ^| -/w <| generate START condition, auto-clears
-                                  <|`2`     _TWI_CTRL_STOP_                     ^| -/w <| generate STOP condition, auto-clears
-                                  <|`3`     _TWI_CTRL_MACK_                     ^| r/w <| generate controller-ACK for each transmission ("MACK")
-                                  <|`4`     _TWI_CTRL_CSEN_                     ^| r/w <| allow clock stretching when set
-                                  <|`7:5`   _TWI_CTRL_PRSC2_ : _TWI_CTRL_PRSC0_ ^| r/w <| 3-bit clock prescaler select
-                                  <|`11:8`  _TWI_CTRL_CDIV3_ : _TWI_CTRL_CDIV0_ ^| r/w <| 4-bit clock divider
-                                  <|`28:12` -                                   ^| r/- <| _reserved_, read as zero
-                                  <|`29`    _TWI_CTRL_CLAIMED_                  ^| r/- <| set if the TWI bus is claimed by any controller
-                                  <|`30`    _TWI_CTRL_ACK_                      ^| r/- <| ACK received when set, NACK received when cleared
-                                  <|`31`    _TWI_CTRL_BUSY_                     ^| r/- <| transfer/START/STOP in progress when set
-| `0xffffffb4` | `DATA` |`7:0` _TWI_DATA_MSB_ : _TWI_DATA_LSB_ | r/w | receive/transmit data
+.11+<| `0xffffffb0` .11+<| `CTRL` <|`0`     `TWI_CTRL_EN`                     ^| r/w <| TWI enable, reset if cleared
+                                  <|`1`     `TWI_CTRL_START`                  ^| -/w <| generate START condition, auto-clears
+                                  <|`2`     `TWI_CTRL_STOP`                   ^| -/w <| generate STOP condition, auto-clears
+                                  <|`3`     `TWI_CTRL_MACK`                   ^| r/w <| generate controller-ACK for each transmission ("MACK")
+                                  <|`4`     `TWI_CTRL_CSEN`                   ^| r/w <| allow clock stretching when set
+                                  <|`7:5`   `TWI_CTRL_PRSC2 : TWI_CTRL_PRSC0` ^| r/w <| 3-bit clock prescaler select
+                                  <|`11:8`  `TWI_CTRL_CDIV3 : TWI_CTRL_CDIV0` ^| r/w <| 4-bit clock divider
+                                  <|`28:12` -                                 ^| r/- <| _reserved_, read as zero
+                                  <|`29`    `TWI_CTRL_CLAIMED`                ^| r/- <| set if the TWI bus is claimed by any controller
+                                  <|`30`    `TWI_CTRL_ACK`                    ^| r/- <| ACK received when set, NACK received when cleared
+                                  <|`31`    `TWI_CTRL_BUSY`                   ^| r/- <| transfer/START/STOP in progress when set
+| `0xffffffb4` | `DATA` |`7:0` | r/w | receive/transmit data
 |=======================
--- a/docs/datasheet/soc_uart.adoc
+++ b/docs/datasheet/soc_uart.adoc
@ -12,9 +12,9 @@
 |                          | `uart0_rxd_i` | serial receiver input
 |                          | `uart0_rts_o` | flow control: RX ready to receive, low-active
 |                          | `uart0_cts_i` | flow control: RX ready to receive, low-active
-| Configuration generics:  | _IO_UART0_EN_   | implement UART0 when _true_
-|                          | _UART0_RX_FIFO_ | RX FIFO depth (power of 2, min 1)
-|                          | _UART0_TX_FIFO_ | TX FIFO depth (power of 2, min 1)
+| Configuration generics:  | `IO_UART0_EN`   | implement UART0 when `true`
+|                          | `UART0_RX_FIFO` | RX FIFO depth (power of 2, min 1)
+|                          | `UART0_TX_FIFO` | TX FIFO depth (power of 2, min 1)
 | CPU interrupts:          | fast IRQ channel 2 | RX interrupt
 |                          | fast IRQ channel 3 | TX interrupt (see <<_processor_interrupts>>)
 |=======================
@ -22,8 +22,8 @@

 **Overview**

-The NEORV32 UART provides a standard serial interface with independent transmitter and receiver channel, each
-quipped with a configurable FIFO. The transmission frame is fixed to **8N1**: 8 data bits, no parity bit, 1 stop
+The NEORV32 UART provides a standard serial interface with independent transmitter and receiver channels, each
+equipped with a configurable FIFO. The transmission frame is fixed to **8N1**: 8 data bits, no parity bit, 1 stop
 bit. The actual transmission rate (Baud rate) is programmable via software. The module features two memory-mapped
 registers: `CTRL` and `DATA`. These are used for configuration, status check and data transfer.

@ -36,11 +36,11 @@ is used to implement the "standard consoles" (`STDIN`, `STDOUT` and `STDERR`).

 **Theory of Operation**

-UART0 is enabled by setting the _UART_CTRL_EN_ bit in the UART0 control register `CTRL`. The Baud rate
-is configured via a 10-bit _UART_CTRL_BAUDx_ baud divisor (`baud_div`) and a 3-bit _UART_CTRL_PRSCx_
+The module is enabled by setting the `UART_CTRL_EN` bit in the UART0 control register `CTRL`. The Baud rate
+is configured via a 10-bit `UART_CTRL_BAUDx` baud divisor (`baud_div`) and a 3-bit `UART_CTRL_PRSCx`
 clock prescaler (`clock_prescaler`).

-.UART0 clock configuration
+.UART0 Clock Configuration
 [cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
 [options="header",grid="rows"]
 |=======================
@ -50,12 +50,12 @@ clock prescaler (`clock_prescaler`).

 _**Baud rate**_ = (_f~main~[Hz]_ / `clock_prescaler`) / (`baud_div` + 1)
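
The Baud rate formula can be evaluated in software to pick a suitable configuration. The following is a minimal sketch under stated assumptions: `clock_prescaler` is passed as the actual division factor selected via the clock configuration table, integer division is used throughout, and the function names `uart_baud_rate` and `uart_baud_div` are hypothetical helpers that are not part of the software framework.

```c
#include <assert.h>
#include <stdint.h>

/* Resulting Baud rate for a given configuration (formula from this section). */
uint32_t uart_baud_rate(uint32_t f_main_hz, uint32_t clock_prescaler, uint32_t baud_div) {
  assert(clock_prescaler != 0u);
  assert(baud_div <= 1023u); /* baud_div is a 10-bit value */
  return (f_main_hz / clock_prescaler) / (baud_div + 1u);
}

/* Hypothetical helper: baud_div for a target Baud rate.
 * Returns -1 if the result does not fit into the 10-bit divisor field,
 * in which case a different clock prescaler has to be selected. */
int32_t uart_baud_div(uint32_t f_main_hz, uint32_t clock_prescaler, uint32_t target_baud) {
  assert(clock_prescaler != 0u && target_baud != 0u);
  uint32_t ticks = (f_main_hz / clock_prescaler) / target_baud; /* equals baud_div + 1 */
  if ((ticks == 0u) || (ticks > 1024u)) {
    return -1;
  }
  return (int32_t)(ticks - 1u);
}
```

For example, with a 100 MHz main clock, a division factor of 8 and a target of 19200 Baud, this sketch yields `baud_div` = 650, which results in an actual rate of 19201 Baud.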

-The control register's _UART_CTRL_RX_ and _UART_CTRL_TX_ flags provide information about the RX and TX FIFO fill level.
-Disabling the module via the _UART_CTRL_EN_ bit will also clear these FIFOs.
+The control register's `UART_CTRL_RX_*` and `UART_CTRL_TX_*` flags provide information about the RX and TX FIFO fill level.
+Disabling the module via the `UART_CTRL_EN` bit will also clear these FIFOs.

-A new TX transmission is started by writing the data byte to be send to the lowest byte of the `DATA` register. The
-transfer is completed when the _UART_CTRL_TX_BUSY_ control register flag returns to zero. Rx data is available when
-the _UART_CTRL_RX_NEMPTY_ flag becomes set. The _UART_CTRL_RX_OVER_ will be set if the RX FIFO overflows. This flag
+A new TX transmission is started by writing data to the lowest byte of the `DATA` register. The
+transfer is completed when the `UART_CTRL_TX_BUSY` control register flag returns to zero. RX data is available when
+the `UART_CTRL_RX_NEMPTY` flag becomes set. The `UART_CTRL_RX_OVER` flag will be set if the RX FIFO overflows. This flag
 is cleared by reading the `DATA` register or by disabling the module.


@ -63,18 +63,18 @@ is cleared by reading the `DATA` register or by disabling the module.

 The UART module provides independent interrupt channels for RX and TX. These interrupts are triggered by certain RX and TX
 FIFO levels. The actual configuration is programmed independently for the RX and TX interrupt channel via the control register's
-_UART_CTRL_IRQ_RX_ and _UART_CTRL_IRQ_TX_ bits:
+`UART_CTRL_IRQ_RX_*` and `UART_CTRL_IRQ_TX_*` bits:

-. **RX IRQ** The RX interrupt can be triggered by three different RX FIFO level states: If _UART_CTRL_IRQ_RX_NEMPTY_ is set the
-interrupt fires if the RX FIFO is _not_ empty (e.g. when incoming data is available). If _UART_CTRL_IRQ_RX_HALF_ is set the RX IRQ
-fires if the RX FIFO is at least half-full. If _UART_CTRL_IRQ_RX_FULL_ the interrupt fires if the RX FIFO is full. Note that all
+. **RX IRQ** The RX interrupt can be triggered by three different RX FIFO level states: If `UART_CTRL_IRQ_RX_NEMPTY` is set the
+interrupt fires if the RX FIFO is _not_ empty (e.g. when incoming data is available). If `UART_CTRL_IRQ_RX_HALF` is set the RX IRQ
+fires if the RX FIFO is at least half-full. If `UART_CTRL_IRQ_RX_FULL` is set the interrupt fires if the RX FIFO is full. Note that all
 these programmable conditions are logically OR-ed (interrupt fires if any enabled condition is true).
-. **TX IRQ** The TX interrupt can be triggered by two different TX FIFO level states: If _UART_CTRL_IRQ_TX_EMPTY_ is set the
-interrupt fires if the TX FIFO is empty. If _UART_CTRL_IRQ_TX_NHALF_ is set the interrupt fires if the TX FIFO is _not_ at least
+. **TX IRQ** The TX interrupt can be triggered by two different TX FIFO level states: If `UART_CTRL_IRQ_TX_EMPTY` is set the
+interrupt fires if the TX FIFO is empty. If `UART_CTRL_IRQ_TX_NHALF` is set the interrupt fires if the TX FIFO is _not_ at least
 half full. Note that all these programmable conditions are logically OR-ed (interrupt fires if any enabled condition is true).

 Once a UART interrupt has fired it remains pending until the actual cause of the interrupt is resolved; for
-example if just the _UART_CTRL_IRQ_RX_NEMPTY_ bit is set, the RX interrupt will keep firing until the RX FIFO is empty again.
+example if just the `UART_CTRL_IRQ_RX_NEMPTY` bit is set, the RX interrupt will keep firing until the RX FIFO is empty again.
 Furthermore, a pending UART interrupt has to be explicitly cleared again by writing zero to the according <<_mip>> CSR bit.


@ -82,15 +82,15 @@ Furthermore, a pending UART interrupt has to be explicitly cleared again by writ

 The NEORV32 UART supports optional hardware flow control using the standard CTS `uart0_cts_i` ("clear to send") and RTS
 `uart0_rts_o` ("ready to send" / "ready to receive (RTR)") signals. Both signals are low-active.
-Hardware flow control is enabled by setting the _UART_CTRL_HWFC_EN_ bit in the modules control register `CTRL`.
+Hardware flow control is enabled by setting the `UART_CTRL_HWFC_EN` bit in the module's control register `CTRL`.

 When hardware flow control is enabled:

 . The UART's transmitter will not start a new transmission until the `uart0_cts_i` signal goes low.
-During this time, the UART busy flag _UART_CTRL_TX_BUSY_ remains set.
+During this time, the UART busy flag `UART_CTRL_TX_BUSY` remains set.
 . The UART will set `uart0_rts_o` signal low if the RX FIFO is **less than half full** (to have a wide safety margin).
 As long as this signal is low, the connected device can send new data. `uart0_rts_o` is always low if the hardware flow-control
-is disabled. Disabling the UART (setting _UART_CTRL_EN_ low) while having hardware flow-control enabled, will set `uart0_rts_o`
+is disabled. Disabling the UART (setting `UART_CTRL_EN` low) while having hardware flow-control enabled, will set `uart0_rts_o`
 high to signal that the UART is not capable of receiving new data.

 [NOTE]
@ -101,7 +101,7 @@ unconnected. If the CTS handshake is not required it has to be tied to zero.
 **Simulation Mode**

 The UART provides a _simulation-only_ mode to dump console data as well as raw data directly to a file. When the simulation
-mode is enabled (by setting the _UART_CTRL_SIM_MODE_ bit) there will be **no** physical transaction on the `uart0_txd_o` signal.
+mode is enabled (by setting the `UART_CTRL_SIM_MODE` bit) there will be **no** physical transaction on the `uart0_txd_o` signal.
 Instead, all data written to the `DATA` register is immediately dumped to a file.

 . Data written to `DATA[7:0]` will be dumped as ASCII chars to a file named `neorv32.uart0.sim_mode.text.out`. Additionally,
@ -118,24 +118,25 @@ Both file are created in the simulation's home folder.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.18+<| `0xffffffa0` .18+<| `CTRL` <|`0`    _UART_CTRL_EN_                        ^| r/w <| UART enable
-                                  <|`1`    _UART_CTRL_SIM_MODE_                  ^| r/w <| enable **simulation mode**
-                                  <|`2`    _UART_CTRL_HWFC_EN_                   ^| r/w <| enable RTS/CTS hardware flow-control
-                                  <|`5:3`  _UART_CTRL_PRSC2_ : _UART_CTRL_PRSC0_ ^| r/w <| Baud rate clock prescaler select
-                                  <|`15:6` _UART_CTRL_BAUD9_ : _UART_CTRL_BAUD0_ ^| r/w <| 12-bit Baud value configuration value
-                                  <|`16`   _UART_CTRL_RX_NEMPTY_                 ^| r/- <| RX FIFO not empty
-                                  <|`17`   _UART_CTRL_RX_HALF_                   ^| r/- <| RX FIFO at least half-full
-                                  <|`18`   _UART_CTRL_RX_FULL_                   ^| r/- <| RX FIFO full
-                                  <|`19`   _UART_CTRL_TX_EMPTY_                  ^| r/- <| TX FIFO empty
-                                  <|`20`   _UART_CTRL_TX_NHALF_                  ^| r/- <| TX FIFO not at least half-full
-                                  <|`21`   _UART_CTRL_TX_FULL_                   ^| r/- <| TX FIFO full
-                                  <|`22`   _UART_CTRL_IRQ_RX_NEMPTY_             ^| r/w <| fire IRQ if RX FIFO not empty
-                                  <|`23`   _UART_CTRL_IRQ_RX_HALF_               ^| r/w <| fire IRQ if RX FIFO at least half-full
-                                  <|`24`   _UART_CTRL_IRQ_RX_FULL_               ^| r/w <| fire IRQ if RX FIFO full
-                                  <|`25`   _UART_CTRL_IRQ_TX_EMPTY_              ^| r/w <| fire IRQ if TX FIFO empty
-                                  <|`26`   _UART_CTRL_IRQ_TX_NHALF_              ^| r/w <| fire IRQ if TX not at least half full
-                                  <|`30`   _UART_CTRL_RX_OVER_                   ^| r/- <| RX FIFO overflow
-                                  <|`31`   _UART_CTRL_TX_BUSY_                   ^| r/- <| TX busy or TX FIFO not empty
+.19+<| `0xffffffa0` .19+<| `CTRL` <|`0`     `UART_CTRL_EN`                      ^| r/w <| UART enable
+                                  <|`1`     `UART_CTRL_SIM_MODE`                ^| r/w <| enable **simulation mode**
+                                  <|`2`     `UART_CTRL_HWFC_EN`                 ^| r/w <| enable RTS/CTS hardware flow-control
+                                  <|`5:3`   `UART_CTRL_PRSC2 : UART_CTRL_PRSC0` ^| r/w <| Baud rate clock prescaler select
+                                  <|`15:6`  `UART_CTRL_BAUD9 : UART_CTRL_BAUD0` ^| r/w <| 10-bit Baud rate divisor value
+                                  <|`16`    `UART_CTRL_RX_NEMPTY`               ^| r/- <| RX FIFO not empty
+                                  <|`17`    `UART_CTRL_RX_HALF`                 ^| r/- <| RX FIFO at least half-full
+                                  <|`18`    `UART_CTRL_RX_FULL`                 ^| r/- <| RX FIFO full
+                                  <|`19`    `UART_CTRL_TX_EMPTY`                ^| r/- <| TX FIFO empty
+                                  <|`20`    `UART_CTRL_TX_NHALF`                ^| r/- <| TX FIFO not at least half-full
+                                  <|`21`    `UART_CTRL_TX_FULL`                 ^| r/- <| TX FIFO full
+                                  <|`22`    `UART_CTRL_IRQ_RX_NEMPTY`           ^| r/w <| fire IRQ if RX FIFO not empty
+                                  <|`23`    `UART_CTRL_IRQ_RX_HALF`             ^| r/w <| fire IRQ if RX FIFO at least half-full
+                                  <|`24`    `UART_CTRL_IRQ_RX_FULL`             ^| r/w <| fire IRQ if RX FIFO full
+                                  <|`25`    `UART_CTRL_IRQ_TX_EMPTY`            ^| r/w <| fire IRQ if TX FIFO empty
+                                  <|`26`    `UART_CTRL_IRQ_TX_NHALF`            ^| r/w <| fire IRQ if TX FIFO not at least half-full
+                                  <|`29:27` -                                   ^| r/- <| _reserved_, read as zero
+                                  <|`30`    `UART_CTRL_RX_OVER`                 ^| r/- <| RX FIFO overflow
+                                  <|`31`    `UART_CTRL_TX_BUSY`                 ^| r/- <| TX busy or TX FIFO not empty
 .3+<| `0xffffffa4` .3+<| `DATA` <|`7:0`  ^| r/w <| receive/transmit data
                                <|`31:8` ^| r/- <| _reserved_, read as zero
                                <|`31:0` ^| -/w <| **simulation data output**
@ -158,9 +159,9 @@ Both file are created in the simulation's home folder.
 |                          | `uart1_rxd_i` | serial receiver input
 |                          | `uart1_rts_o` | flow control: RX ready to receive, low-active
 |                          | `uart1_cts_i` | flow control: RX ready to receive, low-active
-| Configuration generics:  | _IO_UART1_EN_   | implement UART1 when _true_
-|                          | _UART1_RX_FIFO_ | RX FIFO depth (power of 2, min 1)
-|                          | _UART1_TX_FIFO_ | TX FIFO depth (power of 2, min 1)
+| Configuration generics:  | `IO_UART1_EN`   | implement UART1 when `true`
+|                          | `UART1_RX_FIFO` | RX FIFO depth (power of 2, min 1)
+|                          | `UART1_TX_FIFO` | TX FIFO depth (power of 2, min 1)
 | CPU interrupts:          | fast IRQ channel 4 | RX interrupt
 |                          | fast IRQ channel 5 | TX interrupt (see <<_processor_interrupts>>)
 |=======================
--- a/docs/datasheet/soc_wdt.adoc
+++ b/docs/datasheet/soc_wdt.adoc
@ -9,7 +9,7 @@
 | Software driver file(s): | neorv32_wdt.c |
 |                          | neorv32_wdt.h |
 | Top entity port:         | none | 
-| Configuration generics:  | _IO_WDT_EN_ | implement watchdog when _true_
+| Configuration generics:  | `IO_WDT_EN` | implement watchdog when `true`
 | CPU interrupts:          | fast IRQ channel 0 | watchdog timeout (see <<_processor_interrupts>>)
 |=======================

@ -25,54 +25,55 @@ program every now and then to prevent a timeout.

 **Configuration**

-The watchdog is enabled by setting the control register's `WDT_CTRL_EN_ bit. When this bit is cleared, the internal
+The watchdog is enabled by setting the control register's `WDT_CTRL_EN` bit. When this bit is cleared, the internal
 timeout counter is reset to zero and no interrupt and no system reset can be triggered.

 The internal 32-bit timeout counter is clocked at 1/4096th of the processor's main clock (f~WDT~[Hz] = f~main~[Hz] / 4096).
-Whenever this counter reaches the programmed timeout value (_WDT_CTRL_TIMEOUT_ bits in the control register) a
+Whenever this counter reaches the programmed timeout value (`WDT_CTRL_TIMEOUT` bits in the control register) a
 hardware reset is triggered. In order to inform the application of an imminent timeout, an optional CPU interrupt is
-triggered when the timeout counter reaches **half** of the programmed timeout value.
+triggered when the timeout counter reaches _half_ of the programmed timeout value.

-The watchdog is "fed" by writing `1` to the _WDT_CTRL_RESET_ control register bit, which
+The watchdog is "fed" by writing `1` to the `WDT_CTRL_RESET` control register bit, which
 will reset the internal timeout counter back to zero.
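
The relation between the main clock and the programmed timeout value can be sketched in software. This is only an illustration under stated assumptions and not the official driver code: the function names `wdt_timeout_ticks` and `wdt_ctrl_word` are hypothetical, the timeout counter is assumed to expire once it equals the programmed value, and the bit positions follow the register table of this section (enable at bit 0, 24-bit timeout at bits 31:8).

```c
#include <assert.h>
#include <stdint.h>

#define WDT_TIMEOUT_MAX 0x00FFFFFFu /* largest 24-bit timeout value */

/* Hypothetical helper: timeout value for a desired period in milliseconds.
 * The timeout counter runs at f_main / 4096. The result is clamped to
 * 1..WDT_TIMEOUT_MAX because an all-zero timeout forces an immediate reset. */
uint32_t wdt_timeout_ticks(uint32_t f_main_hz, uint32_t timeout_ms) {
  uint64_t ticks = ((uint64_t)(f_main_hz / 4096u) * timeout_ms) / 1000u;
  if (ticks < 1u) {
    ticks = 1u;
  }
  if (ticks > WDT_TIMEOUT_MAX) {
    ticks = WDT_TIMEOUT_MAX;
  }
  return (uint32_t)ticks;
}

/* Hypothetical helper: control word with the watchdog enabled and the
 * timeout value placed in bits 31:8. All other bits are left cleared. */
uint32_t wdt_ctrl_word(uint32_t timeout_ticks) {
  assert((timeout_ticks >= 1u) && (timeout_ticks <= WDT_TIMEOUT_MAX));
  return (timeout_ticks << 8) | (1u << 0);
}
```

With a 100 MHz main clock, a timeout of 1000 ms corresponds to a timeout value of 24414 in this sketch. The optional interrupt would then fire after roughly half of that period.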

+.Forced Reset
 [NOTE]
-Writing all-zero to the _WDT_CTRL_TIMEOUT_ bits will immediately trigger a system-wide reset.
+Writing all-zero to the `WDT_CTRL_TIMEOUT` bits will immediately trigger a system-wide reset.

 .Watchdog Interrupt
 [NOTE]
 A watchdog interrupt occurs when the watchdog is enabled and the internal counter reaches _exactly_ half of the programmed
-timeout value. Hence, the interrupt only fires once. Howeer, a triggered WDT interrupt has to be explicitly cleared by
+timeout value. Hence, the interrupt only fires once. However, a triggered WDT interrupt has to be explicitly cleared by
 writing zero to the according <<_mip>> CSR bit.

 .Watchdog Operation during Debugging
 [IMPORTANT]
-By default the watchdog stops operation when the CPU enters debug mode and will resume normal operation after
+By default, the watchdog stops operation when the CPU enters debug mode and will resume normal operation after
 the CPU has left debug mode again. This will prevent an unintended watchdog timeout during a debug session. However,
 the watchdog can also be configured to keep operating even when the CPU is in debug mode by setting the control
-register's _WDT_CTRL_DBEN_ bit.
+register's `WDT_CTRL_DBEN` bit.

 .Watchdog Operation during CPU Sleep
 [IMPORTANT]
-By default the watchdog stops operating when the CPU enters sleep mode. However, the watchdog can also be configured
-to keep operating even when the CPU is in sleep mode by setting the control register's _WDT_CTRL_SEN_ bit.
+By default, the watchdog stops operating when the CPU enters sleep mode. However, the watchdog can also be configured
+to keep operating even when the CPU is in sleep mode by setting the control register's `WDT_CTRL_SEN` bit.


 **Configuration Lock**

-The watchdog control register can be locked to protect the current configuration from being modified. The lock is
-activated by setting the _WDT_CTRL_LOCK_ bit. In the locked state any write access to the control register is entirely
+The watchdog control register can be _locked_ to protect the current configuration from being modified. The lock is
+activated by setting the `WDT_CTRL_LOCK` bit. In the locked state any write access to the control register is entirely
 ignored (see table below, "writable if locked"). Read accesses to the control register as well as watchdog resets
-(by setting the _WDT_CTRL_RESET_ flag) are not affected.
+(by setting the `WDT_CTRL_RESET` flag) are not affected.

-The lock bit can only be set if the WDT is already enabled (_WDT_CTRL_EN_ is set).
-The lock bit can only be cleared again by a system-wide hardware reset.
+The lock bit can only be set if the WDT is already enabled (`WDT_CTRL_EN` is set). Furthermore, the lock bit can
+only be cleared again by a system-wide hardware reset.


 **Cause of last Hardware Reset**

-The cause of the last system hardware reset can be determined via the _WDT_CTRL_RCAUSE_ flag. If this flag is
-zero, the processor has been reset via the external reset signal (or the on-chip debugger). If this flag is set,
+The cause of the last system hardware reset can be determined via the `WDT_CTRL_RCAUSE` flag. If this flag is
+cleared, the processor has been reset via the external reset signal (or the on-chip debugger). If this flag is set,
 the last system reset was caused by the watchdog itself.


@ -83,12 +84,12 @@ the last system reset was caused by the watchdog itself.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Reset value | Writable if locked | Function
-.8+<| `0xffffffbc` .8+<| `CTRL` <|`0` _WDT_CTRL_EN_     ^| r/w ^| `0` ^| no  <| watchdog enable
-                                <|`1  _WDT_CTRL_LOCK_   ^| r/w ^| `0` ^| no  <| lock configuration when set, clears only on system reset, can only be set if enable bit is set already
-                                <|`2` _WDT_CTRL_DBEN_   ^| r/w ^| `0` ^| no  <| set to allow WDT to continue operation even when CPU is in debug mode
-                                <|`3` _WDT_CTRL_SEN_    ^| r/w ^| `0` ^| no  <| set to allow WDT to continue operation even when CPU is in sleep mode
-                                <|`4` _WDT_CTRL_RESET_  ^| -/w ^| -   ^| yes <| reset watchdog when set, auto-clears
-                                <|`5` _WDT_CTRL_RCAUSE_ ^| r/- ^| `0` ^| -   <| cause of last system reset: `0`=caused by external reset signal, `1`=caused by watchdog
+.8+<| `0xffffffbc` .8+<| `CTRL` <|`0` `WDT_CTRL_EN`     ^| r/w ^| `0` ^| no  <| watchdog enable
+                                <|`1` `WDT_CTRL_LOCK`   ^| r/w ^| `0` ^| no  <| lock configuration when set, clears only on system reset, can only be set if enable bit is set already
+                                <|`2` `WDT_CTRL_DBEN`   ^| r/w ^| `0` ^| no  <| set to allow WDT to continue operation even when CPU is in debug mode
+                                <|`3` `WDT_CTRL_SEN`    ^| r/w ^| `0` ^| no  <| set to allow WDT to continue operation even when CPU is in sleep mode
+                                <|`4` `WDT_CTRL_RESET`  ^| -/w ^| -   ^| yes <| reset watchdog when set, auto-clears
+                                <|`5` `WDT_CTRL_RCAUSE` ^| r/- ^| `0` ^| -   <| cause of last system reset: `0`=caused by external reset signal, `1`=caused by watchdog
                                <|`7:6` -               ^| r/- ^| -   ^| -   <| _reserved_, reads as zero
-                                <|`31:8` _WDT_CTRL_TIMEOUT_MSB_ : _WDT_CTRL_TIMEOUT_LSB_ ^| r/w ^| 0 ^| no <| 24-bit watchdog timeout value
+                                <|`31:8` `WDT_CTRL_TIMEOUT_MSB : WDT_CTRL_TIMEOUT_LSB` ^| r/w ^| 0 ^| no <| 24-bit watchdog timeout value
 |=======================
--- a/docs/datasheet/soc_wishbone.adoc
+++ b/docs/datasheet/soc_wishbone.adoc
@ -1,6 +1,6 @@
 <<<
 :sectnums:
-==== Processor-External Memory Interface (WISHBONE) (AXI4-Lite)
+==== Processor-External Memory Interface (WISHBONE)

 [cols="<3,<3,<4"]
 [frame="topbot",grid="none"]
@ -19,28 +19,28 @@
 |                          | `wb_err_i`  | bus error (1-bit)
 |                          | `fence_o`   | an executed `fence` instruction
 |                          | `fencei_o`  | an executed `fence.i` instruction
-| Configuration generics:  | _MEM_EXT_EN_         | enable external memory interface when _true_
-|                          | _MEM_EXT_TIMEOUT_    | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
-|                          | _MEM_EXT_PIPE_MODE_  | when _false_ (default): classic/standard Wishbone protocol; when _true_: pipelined Wishbone protocol
-|                          | _MEM_EXT_BIG_ENDIAN_ | byte-order (Endianness) of external memory interface; true=BIG, false=little (default)
-|                          | _MEM_EXT_ASYNC_RX_   | use registered RX path when _false_ (default); use async/direct RX path when _true_
-|                          | _MEM_EXT_ASYNC_TX_   | use registered TX path when _false_ (default); use async/direct TX path when _true_
+| Configuration generics:  | `MEM_EXT_EN`         | enable external memory interface when `true`
+|                          | `MEM_EXT_TIMEOUT`    | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
+|                          | `MEM_EXT_PIPE_MODE`  | when `false` (default): classic/standard Wishbone protocol; when `true`: pipelined Wishbone protocol
+|                          | `MEM_EXT_BIG_ENDIAN` | byte-order (Endianness) of external memory interface; `true`=BIG, `false`=little (default)
+|                          | `MEM_EXT_ASYNC_RX`   | use registered RX path when `false` (default); use async/direct RX path when `true`
+|                          | `MEM_EXT_ASYNC_TX`   | use registered TX path when `false` (default); use async/direct TX path when `true`
 | CPU interrupts:          | none |
 |=======================


 The external memory interface provides a Wishbone b4-compatible on-chip bus interface. The bus interface is
-implemented if the _MEM_EXT_EN_ generic is _true_. This interface can be used to attach external memories,
+implemented if the `MEM_EXT_EN` generic is `true`. This interface can be used to attach external memories,
 custom hardware accelerators, additional IO devices or all other kinds of IP blocks to the processor.

-The external interface is _not_ mapped to a _specific_ address space region. Instead, all CPU memory accesses that
+The external interface is not mapped to a specific address space. Instead, all CPU memory accesses that
 do not target a processor-internal module are delegated to the external memory interface. In summary, a CPU load/store
 access is delegated via the external bus interface if...

-* it does not target the internal instruction memory IMEM (if implemented at all).
-* **and** it does not target the internal data memory DMEM (if implemented at all).
-* **and** it does not target the internal bootloader ROM or any of the IO devices - regardless if one or more of these components are
-actually implemented or not.
+* the access does not target the internal instruction memory IMEM (if implemented at all)
+* **and** the access does not target the internal data memory DMEM (if implemented at all)
+* **and** the access does not target the internal bootloader ROM or any of the internal modules - regardless
+if one or more of these components are actually implemented or not.
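
The delegation rule above can be modeled as a simple address check. The following is a sketch only: the `region_t` type, the function names and in particular the address ranges in `example_map` are hypothetical placeholders and do not describe the actual NEORV32 address space layout. A region with a size of zero models an optional memory (IMEM or DMEM) that is not implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a processor-internal address region. */
typedef struct {
  uint32_t base; /* first address of the region */
  uint32_t size; /* size in bytes, 0 = region not implemented */
} region_t;

/* Hypothetical example values, NOT the actual NEORV32 address map. */
const region_t example_map[] = {
  { 0x00000000u, 0x00004000u }, /* placeholder for IMEM */
  { 0x80000000u, 0x00002000u }  /* placeholder for DMEM */
};

/* True if 'addr' lies inside an implemented region. */
bool region_hit(region_t r, uint32_t addr) {
  return (r.size != 0u) && (addr >= r.base) && ((addr - r.base) < r.size);
}

/* True if an access is delegated to the external memory interface,
 * i.e. if it does not target any of the given internal regions. */
bool is_external_access(uint32_t addr, const region_t *internal, unsigned num) {
  for (unsigned i = 0; i < num; i++) {
    if (region_hit(internal[i], addr)) {
      return false;
    }
  }
  return true;
}
```

Regions that are reserved regardless of the actual configuration (bootloader ROM and the internal modules) would be listed with their full size in such a model, so accesses to them are never forwarded.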

 .Address Space Layout
 [TIP]
@ -48,18 +48,18 @@ See section <<_address_space>> for more information.

 .Execute-in-Place Module
 [NOTE]
-If the Execute In Place module (XIP) is implemented accesses targeting the XIP memory-mapped-region will not be forwarded to the
+If the Execute In Place module (XIP) is implemented, accesses targeting the XIP memory-mapped-region will not be forwarded to the
 external memory interface. See section <<_execute_in_place_module_xip>> for more information.


 **Wishbone Bus Protocol**

 The external memory interface either uses the **standard** (also called "classic") Wishbone protocol (default) or
-**pipelined** Wishbone protocol. The protocol to be used is configured via the <<_mem_ext_pipe_mode>> generic:
+**pipelined** Wishbone protocol. The protocol to be used is configured via the `MEM_EXT_PIPE_MODE` generic:

-* If _MEM_EXT_PIPE_MODE_ is _false_, all bus control signals including `wb_stb_o` are active and remain stable until the
+* If `MEM_EXT_PIPE_MODE` is `false`, all bus control signals including `wb_stb_o` are active and remain stable until the
 transfer is acknowledged/terminated.
-* If _MEM_EXT_PIPE_MODE_ is _true_, all bus control except `wb_stb_o` are active and remain until the transfer is
+* If `MEM_EXT_PIPE_MODE` is `true`, all bus control signals except `wb_stb_o` are active and remain stable until the transfer is
 acknowledged/terminated. In this case, `wb_stb_o` is asserted only during the very first bus clock cycle.

 .Exemplary Wishbone bus accesses using "classic" and "pipelined" protocol
@ -81,33 +81,22 @@ project.

 **Bus Access**

-The NEORV32 Wishbone gateway does not support burst transfer yet, so there is always just a single transfer in "in fly".
+The NEORV32 Wishbone gateway does not support burst transfers yet, so there is always just a single transfer "in flight".
 Hence, the Wishbone `STALL` signal is not implemented. An accessed Wishbone device does not have to respond immediately to a bus
 request by sending an ACK. Instead, there is a _time window_ where the device has to acknowledge the transfer. This time window
-id configured by the _MEM_EXT_TIMEOUT_ top generic that defines the maximum time (in clock cycles) a bus access can be pending
-before it is automatically terminated with an error condition. If _MEM_EXT_TIMEOUT_ is set to zero, the timeout disabled
-an a bus access can take an arbitrary number of cycles to complete (this is **not recommended**!).
+is configured by the `MEM_EXT_TIMEOUT` generic that defines the maximum time (in clock cycles) a bus access can be pending
+before it is automatically terminated with an error condition. If `MEM_EXT_TIMEOUT` is set to zero, the timeout is disabled
+and a bus access can take an arbitrary number of cycles to complete (this is not recommended!).

-When _MEM_EXT_TIMEOUT_ is greater than zero, the Wishbone gateway starts an internal countdown whenever the CPU
+When `MEM_EXT_TIMEOUT` is greater than zero, the Wishbone gateway starts an internal countdown whenever the CPU
 accesses an address via the external memory interface. If the accessed device does not acknowledge (via `wb_ack_i`)
-or terminate (via `wb_err_i`) the transfer within _MEM_EXT_TIMEOUT_ clock cycles, the bus access is automatically canceled
+or terminate (via `wb_err_i`) the transfer within `MEM_EXT_TIMEOUT` clock cycles, the bus access is automatically canceled
 setting `wb_cyc_o` low again and a CPU load/store/instruction fetch bus access fault exception is raised.

-.External "Address Space Holes"
-[IMPORTANT]
-Setting _MEM_EXT_TIMEOUT_ to zero will permanently stall the CPU if the targeted Wishbone device never responds. Hence,
-_MEM_EXT_TIMEOUT_ should be always set to a value greater than zero. +
- +
-This feature can be used as **safety guard** if the external memory system does not check for "address space holes". That means
-that accessing addresses, which do not belong to a certain memory or device, do not permanently stall the processor due to an
-unacknowledged/unterminated bus access. If the external memory system can guarantee to acknowledge **any** bus accesses
-(even if targeting an unimplemented address) the timeout feature can be safely disabled (_MEM_EXT_TIMEOUT_ = 0).
-

 **Wishbone Tag**

-The 3-bit wishbone `wb_tag_o` signal provides additional information regarding the access type. This signal
-is compatible to the AXI4 `AxPROT` signal.
+The 3-bit wishbone `wb_tag_o` signal provides additional information regarding the access type:

 * `wb_tag_o(0)` 1: privileged access (CPU is in machine mode); 0: unprivileged access (CPU is not in machine mode)
 * `wb_tag_o(1)` always zero (indicating "secure access")
@ -117,8 +106,8 @@ is compatible to the AXI4 `AxPROT` signal.
 **Endianness**

 The NEORV32 CPU and the Processor setup are *little-endian* architectures. To allow direct connection
-to a big-endian memory system the external bus interface provides an _Endianness configuration_. The
-Endianness of the external memory interface can be configured via the _MEM_EXT_BIG_ENDIAN_ generic.
+to a big-endian memory system the external bus interface provides an Endianness configuration. The
+Endianness of the external memory interface can be configured via the `MEM_EXT_BIG_ENDIAN` generic.
 By default, the external memory interface uses little-endian byte-order.

 Application software can check the Endianness configuration of the external bus interface via the
@ -133,7 +122,7 @@ via Wishbone requires 2 additional clock cycles. This can ease timing closure wh
 interconnection networks.

 Optionally, the latency of the Wishbone gateway can be reduced by removing the input and output register stages.
-Enabling the _MEM_EXT_ASYNC_RX_ option will remove the input register stage; enabling _MEM_EXT_ASYNC_TX_ option will
+Enabling the `MEM_EXT_ASYNC_RX` option will remove the input register stage; enabling `MEM_EXT_ASYNC_TX` option will
 remove the output register stages. Each enabled option reduces access latency by 1 cycle.

 .Output Gating
@ -141,17 +130,4 @@ remove the output register stages. Each enabled option reduces access latency by
 All outgoing Wishbone signals use a "gating mechanism" so they only change if there is an actual Wishbone transaction in
 progress. This can reduce dynamic switching activity in the external bus system and also simplifies simulation-based
 inspection of the Wishbone transactions. Note that this output gating is only available if the output register buffer is not
-disabled (_MEM_EXT_ASYNC_TX_ = _false_).
-
-
-**AXI4-Lite Connectivity**
-
-The AXI4-Lite wrapper (`rtl/system_integration/neorv32_SystemTop_axi4lite.vhd`) provides a Wishbone-to-
-AXI4-Lite bridge, compatible with Xilinx Vivado (IP packager and block design editor). All entity signals of
-this wrapper are of type _std_logic_ or _std_logic_vector_, respectively. See The USer Guide for more
-information: https://stnolting.github.io/neorv32/ug/#_packaging_the_processor_as_ip_block_for_xilinx_vivado_block_designer
-
-[WARNING]
-Using the auto-termination timeout feature (_MEM_EXT_TIMEOUT_ greater than zero) is **not AXI4 compliant** as
-the AXI protocol does not support "aborting" bus transactions. Therefore, the NEORV32 top wrapper with AXI4-Lite interface
-(`rtl/system_integration/neorv32_SystemTop_axi4lite`) configures _MEM_EXT_TIMEOUT_ = 0 by default.
+disabled (`MEM_EXT_ASYNC_TX` = `false`).
--- a/docs/datasheet/soc_xip.adoc
+++ b/docs/datasheet/soc_xip.adoc
@ -12,18 +12,17 @@
 |                          | `xip_clk_o` | 1-bit serial clock output
 |                          | `xip_dat_i` | 1-bit serial data input
 |                          | `xip_dat_o` | 1-bit serial data output
-| Configuration generics:  | _IO_XIP_EN_ | implement XIP module when _true_
+| Configuration generics:  | `IO_XIP_EN` | implement XIP module when `true`
 | CPU interrupts:          | none | 
 |=======================


 **Overview**

-The execute in place (XIP) module is probably one of the more complicated modules of the NEORV32. The module
-allows to execute code (and read constant data) directly from a SPI flash memory. Hence, it uses the standard
-serial peripheral interface (SPI) as transfer protocol under the hood.
+The execute in-place (XIP) module allows executing code (and reading constant data) directly from an SPI flash memory.
+Hence, it uses the standard serial peripheral interface (SPI) as transfer protocol under the hood.

-The XIP flash is not mapped to a specific region of the processor's address space. Instead, the XIP module
+The XIP flash is not mapped to a specific region of the processor's address space. Instead, it
 provides a programmable mapping scheme to allow a flexible user-defined mapping of the flash to _any section_
 of the address space.

@ -34,26 +33,13 @@ Note that this interface is read-only. Any write access will raise a bus error e
 The second interface is mapped to the processor's IO space and allows data accesses to the XIP module's
 configuration registers.

-.Example Program
-[TIP]
-An example program for the XIP module is available in `sw/example/demo_xip`.
-
-.Compiling a Program for XIP Execution
-[NOTE]
-If you want to compile a program that shall be executed using the XIP module, the default NEORV32 linker script
-(`sw/common/neorv32.ld`) has to be modified: the `ORIGIN` attribute of the `rom` section needs to be adapted to
-the XIP page base address and the flash base address. For example if the XIP page is set to `0x20000000` and the
-executable is placed in the flash as offset `0x00400000` the `ORIGIN` attribute has to be set to the sum of both
-address offsets (`0x20000000 + 0x00400000 = 0x20400000` -> `rom  (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x20400000`).
-See sections <<_linker_script>>, <<_application_makefile>> and <<_executable_image_generator>> for more information.
-

 **SPI Protocol**

 The XIP module accesses external flash using the standard SPI protocol. The module always sends data MSB-first and
-provides all of the standard four clock modes (0..3), which are configured via the _XIP_CTRL_CPOL_ (clock polarity)
-and _XIP_CTRL_CPHA_ (clock phase) control register bits, respectively. The clock speed of the interface (`xip_clk_o`)
-is defined by a three-bit clock pre-scaler configured using the _XIP_CTRL_PRSCx_ bits:
+provides all of the standard four clock modes (0..3), which are configured via the `XIP_CTRL_CPOL` (clock polarity)
+and `XIP_CTRL_CPHA` (clock phase) control register bits, respectively. The clock speed of the interface (`xip_clk_o`)
+is defined by a three-bit clock pre-scaler configured using the `XIP_CTRL_PRSCx` bits:

 .XIP prescaler configuration
 [cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
@ -63,7 +49,7 @@ is defined by a three-bit clock pre-scaler configured using the _XIP_CTRL_PRSCx_
 | Resulting `clock_prescaler` |       2 |       4 |       8 |      64 |     128 |    1024 |    2048 |    4096
 |=======================

-Based on the _XIP_CTRL_PRSCx_ configuration the actual XIP SPI clock frequency f~XIP~ is derived from the processor's
+Based on the clock configuration the actual XIP SPI clock frequency f~XIP~ is derived from the processor's
 main clock f~main~ and is determined by:

 _**f~XIP~**_ = _f~main~[Hz]_ / (2 * `clock_prescaler`)
@ -72,36 +58,36 @@ Hence, the maximum XIP clock speed is f~main~ / 4.

 .High-Speed SPI mode
 [TIP]
-The module provides a "high-speed" SPI mode. In this mode the clock prescaler configuration (_XIP_CTRL_PRSCx_) is ignored
+The module provides a "high-speed" SPI mode. In this mode the clock prescaler configuration (`XIP_CTRL_PRSCx`) is ignored
 and the SPI clock operates at f~main~ / 2 (half of the processor's main clock). High speed SPI mode is enabled by setting
-the control register's _XIP_CTRL_HIGHSPEED_ bit.
+the control register's `XIP_CTRL_HIGHSPEED` bit.
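
The clock relation above can be sketched in C as follows. This is only an illustration and not part of the NEORV32 HAL: the helper name `xip_spi_clock_hz` is hypothetical, and the lookup table assumes that the columns of the prescaler table are ordered from `XIP_CTRL_PRSCx` = `0b000` to `0b111`.

```c
#include <assert.h>
#include <stdint.h>

/* clock_prescaler values from the "XIP prescaler configuration" table,
 * indexed by the 3-bit XIP_CTRL_PRSCx setting (assumed order: 0b000..0b111). */
static const uint32_t xip_clock_prescaler[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Hypothetical helper: resulting SPI clock in Hz for a given main clock. */
uint32_t xip_spi_clock_hz(uint32_t f_main_hz, unsigned prsc, int highspeed)
{
    assert(prsc < 8); /* only three prescaler select bits exist */

    if (highspeed) {
        return f_main_hz / 2; /* prescaler setting is ignored in high-speed mode */
    }
    return f_main_hz / (2 * xip_clock_prescaler[prsc]);
}
```

For a 100 MHz main clock, the smallest prescaler yields 25 MHz (f~main~ / 4), while high-speed mode yields 50 MHz (f~main~ / 2).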

-The flash's "read command", which initiates a read access, is defined by the _XIP_CTRL_RD_CMD_ control register bits.
+The flash's "read command", which initiates a read access, is defined by the `XIP_CTRL_RD_CMD` control register bits.
 For most SPI flash memories this is `0x03` for normal SPI mode.


 **Direct SPI Access**

 The XIP module allows to initiate _direct_ SPI transactions. This feature can be used to configure the attached SPI
-flash or to perform direct read and write accesses to the flash memory. Two data registers `NEORV32_XIP.DATA_LO` and
-`NEORV32_XIP.DATA_HI` are provided to send up to 64-bit of SPI data. The `NEORV32_XIP.DATA_HI` register is write-only,
-so a total of 32-bit receive data is provided. Note that the module handles the chip-select
+flash or to perform direct read and write accesses to the flash memory. Two data registers `DATA_LO` and
+`DATA_HI` are provided to send up to 64 bits of SPI data. The `DATA_HI` register is write-only,
+so only 32 bits of receive data are available. Note that the module handles the chip-select
 line (`xip_csn_o`) by itself so it is not possible to construct larger consecutive transfers.

-The actual data transmission size in bytes is defined by the control register's _XIP_CTRL_SPI_NBYTES_ bits.
+The actual data transmission size in bytes is defined by the control register's `XIP_CTRL_SPI_NBYTES` bits.
 Any configuration from 1 byte to 8 bytes is valid. Other values will result in unpredictable behavior.

 Since data is always transferred MSB-first, the data in `DATA_HI:DATA_LO` also has to be MSB-aligned. Receive data is
-available in `DATA_LO` only - `DATA_HI` is write-only. Writing to `DATA_HI` triggers the actual SPI transmission.
-The _XIP_CTRL_PHY_BUSY_ control register flag indicates a transmission being in progress.
+available in `DATA_LO` only since `DATA_HI` is write-only. Writing to `DATA_HI` triggers the actual SPI transmission.
+The `XIP_CTRL_PHY_BUSY` control register flag indicates a transmission being in progress.

 The chip-select line of the XIP module (`xip_csn_o`) will only become asserted (enabled, pulled low) if the
-_XIP_CTRL_SPI_CSEN_ control register bit is set. If this bit is cleared, `xip_csn_o` is always disabled
+`XIP_CTRL_SPI_CSEN` control register bit is set. If this bit is cleared, `xip_csn_o` is always disabled
 (pulled high).

 [NOTE]
-Direct SPI mode is only possible when the module is enabled (setting _XIP_CTRL_EN_) but **before** the actual
-XIP mode is enabled via _XIP_CTRL_XIP_EN_.
+Direct SPI mode is only possible when the module is enabled (setting `XIP_CTRL_EN`) but **before** the actual
+XIP mode is enabled via `XIP_CTRL_XIP_EN`.

 [TIP]
 When the XIP mode is not enabled, the XIP module can also be used as additional general purpose SPI controller
@ -113,28 +99,16 @@ with a transfer size of up to 64 bits per transmission.
 The address mapping of the XIP flash is not fixed by design. It can be mapped to _any section_ within the processor's
 address space. A _section_ refers to one out of 16 naturally aligned 256MB wide memory segments. This segment
 is defined by the four most significant bits of the address (`31:28`) and the XIP's segment is programmed by the
-four _XIP_CTRL_XIP_PAGE_ bits in the unit's control register. All accesses within this page will be mapped to the XIP flash.
+four `XIP_CTRL_XIP_PAGE` bits in the unit's control register. All accesses within this page will be mapped to the XIP flash.

 [NOTE]
 Care must be taken when programming the page mapping to prevent access collisions with other modules (like internal memories
 or modules attached to the external memory interface).
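
The segment selection described above can be illustrated with a small C sketch. Both helper names are hypothetical and not part of the NEORV32 HAL. Treating the lower 28 address bits as the offset inside the segment is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if 'addr' lies in the 256MB segment selected by 'page' (0..15),
 * i.e. if address bits 31:28 match the programmed XIP page. */
int xip_addr_in_page(uint32_t addr, unsigned page)
{
    assert(page < 16); /* the page field is four bits wide */
    return (addr >> 28) == page;
}

/* Offset of 'addr' inside its 256MB segment (assumption: lower 28 bits). */
uint32_t xip_segment_offset(uint32_t addr)
{
    return addr & 0x0FFFFFFFu;
}
```

With the page set to 2, every access within `0x20000000 .. 0x2fffffff` matches, and `0x20008000` corresponds to offset `0x8000` inside the segment.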

-Example: to map the XIP flash to the address space starting at `0x20000000` write a "2" (`0b0010`) to the _XIP_CTRL_XIP_PAGE_
-control register bits. Any access within `0x20000000 .. 0x2fffffff` will be forwarded to the XIP flash.
-Note that the SPI access address might wrap around.
-
-.Using the FPGA Bitstream Flash also for XIP
-[TIP]
-You can also use the FPGA's bitstream SPI flash for storing XIP programs. To prevent overriding the bitstream,
-a certain offset needs to be added to the executable (which might require linker script modifications).
-To execute the program stored in the SPI flash simply jump to the according base address. For example
-if the executable starts at flash offset `0x8000` and the XIP flash is mapped to the base address `0x20000000`
-then add the offset to the base address and use that as jump/call destination (=`0x20008000`).
-

 **Using the XIP Mode**

-The XIP module is globally enabled by setting the _XIP_CTRL_EN_ bit in the device's `CTRL` control register.
+The XIP module is globally enabled by setting the `XIP_CTRL_EN` bit in the device's `CTRL` control register.
 Clearing this bit will reset the whole module and will also terminate any pending SPI transfer.

 Since there is a wide variety of SPI flash components with different sizes, the XIP module allows to specify
@ -145,11 +119,11 @@ address bytes (**minus one**). For example for a SPI flash with 24-bit addresses

 The transparent XIP accesses are transformed into SPI transmissions with the following format (starting with the MSB):

-* 8-bit command: configured by the _XIP_CTRL_RD_CMD_ control register bits ("SPI read command")
-* 8 to 32 bits address: defined by the _XIP_CTRL_XIP_ABYTES_ control register bits ("number of address bytes")
+* 8-bit command: configured by the `XIP_CTRL_RD_CMD` control register bits ("SPI read command")
+* 8 to 32 bits address: defined by the `XIP_CTRL_XIP_ABYTES` control register bits ("number of address bytes")
 * 32-bit data: sending zeros and receiving the according flash word (32-bit)

-Hence, the maximum XIP transmission size is 72-bit, which has to be configured via the _XIP_CTRL_SPI_NBYTES_
+Hence, the maximum XIP transmission size is 72-bit, which has to be configured via the `XIP_CTRL_SPI_NBYTES`
 control register bits. Note that the 72-bit transmission size is only available in XIP mode. The transmission
 size of the direct SPI accesses is limited to 64-bit.

@ -166,8 +140,8 @@ The XIP mode requires the 4-byte data words in the flash to be ordered in **litt

 After the SPI properties (including the amount of address bytes **and** the total amount of SPI transfer bytes)
 and XIP address mapping are configured, the actual XIP mode can be enabled by setting
-the control register's _XIP_CTRL_XIP_EN_ bit. This will enable the "transparent SPI access port" of the module and thus,
-the _transparent_ conversion of access requests into proper SPI flash transmissions. Make sure _XIP_CTRL_SPI_CSEN_
+the control register's `XIP_CTRL_XIP_EN` bit. This will enable the "transparent SPI access port" of the module and thus,
+the _transparent_ conversion of access requests into proper SPI flash transmissions. Make sure `XIP_CTRL_SPI_CSEN`
 is also set so the module can actually select/enable the attached SPI flash.
 No more direct SPI accesses via `DATA_HI:DATA_LO` are possible when the XIP mode is enabled. However, the
 XIP mode can be disabled at any time.
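
As a rough sketch of the setup sequence above, the following C helper composes a `CTRL` value from the bit positions listed in the register map of this section. The function `xip_ctrl_compose` is hypothetical and the macros are local to this example (they are not taken from the HAL headers). The total byte count is derived from the transmission format (one command byte, the address bytes and four data bytes), which is an interpretation of the text rather than a confirmed driver behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed in the CTRL register map (local definitions). */
#define XIP_CTRL_EN             0
#define XIP_CTRL_PRSC0          1
#define XIP_CTRL_CPOL           4
#define XIP_CTRL_CPHA           5
#define XIP_CTRL_SPI_NBYTES_LSB 6
#define XIP_CTRL_XIP_EN         10
#define XIP_CTRL_XIP_ABYTES_LSB 11
#define XIP_CTRL_RD_CMD_LSB     13
#define XIP_CTRL_XIP_PAGE_LSB   21
#define XIP_CTRL_SPI_CSEN       25

/* Hypothetical helper: module enabled, chip-select allowed, XIP mode still off. */
uint32_t xip_ctrl_compose(unsigned prsc, int cpol, int cpha,
                          unsigned addr_bytes, uint8_t rd_cmd, unsigned page)
{
    assert(prsc < 8 && page < 16);
    assert(addr_bytes >= 1 && addr_bytes <= 4);

    /* command byte + address bytes + 32-bit data word (at most 9 bytes) */
    uint32_t nbytes = 1 + addr_bytes + 4;

    uint32_t ctrl = 0;
    ctrl |= 1u << XIP_CTRL_EN;
    ctrl |= (uint32_t)prsc << XIP_CTRL_PRSC0;
    ctrl |= (uint32_t)(cpol != 0) << XIP_CTRL_CPOL;
    ctrl |= (uint32_t)(cpha != 0) << XIP_CTRL_CPHA;
    ctrl |= nbytes << XIP_CTRL_SPI_NBYTES_LSB;
    ctrl |= (uint32_t)(addr_bytes - 1) << XIP_CTRL_XIP_ABYTES_LSB; /* "minus one" */
    ctrl |= (uint32_t)rd_cmd << XIP_CTRL_RD_CMD_LSB;
    ctrl |= (uint32_t)page << XIP_CTRL_XIP_PAGE_LSB;
    ctrl |= 1u << XIP_CTRL_SPI_CSEN;
    return ctrl;
}
```

For a flash with 24-bit addresses, read command `0x03` and page 2, the composed value is `0x02407201`. Setting bit `XIP_CTRL_XIP_EN` afterwards (giving `0x02407601`) would then enable the actual XIP mode.
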
@ -190,7 +164,7 @@ data bytes. Obviously, this introduces a certain transmission overhead. To reduc
 the flash's _incremental read_ function, which will return consecutive bytes when continuing to send clock cycles after a read command.
 Hence, the XIP module provides an optional "burst mode" to accelerate consecutive read accesses.

-The XIP burst mode is enabled by setting the _XIP_CTRL_BURST_EN_ bit in the module's control register. The burst mode only affects
+The XIP burst mode is enabled by setting the `XIP_CTRL_BURST_EN` bit in the module's control register. The burst mode only affects
 the actual XIP mode and _not_ the direct SPI mode. Hence, it should be enabled right before enabling XIP mode only.
 By using the XIP burst mode flash read accesses can be accelerated by up to 50%.

@ -202,23 +176,21 @@ By using the XIP burst mode flash read accesses can be accelerated by up to 50%.
 [options="header",grid="all"]
 |=======================
 | Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.17+<| `0xffffff40` .17+<| `CTRL` <|`0`  _XIP_CTRL_EN_    ^| r/w <| XIP module enable
-                                  <|`1`  _XIP_CTRL_PRSC0_ ^| r/w .3+| 3-bit SPI clock prescaler select
-                                  <|`2`  _XIP_CTRL_PRSC1_ ^| r/w
-                                  <|`3`  _XIP_CTRL_PRSC2_ ^| r/w
-                                  <|`4`  _XIP_CTRL_CPOL_  ^| r/w <| SPI clock polarity
-                                  <|`5`  _XIP_CTRL_CPHA_  ^| r/w <| SPI clock phase
-                                  <|`9:6`  _XIP_CTRL_SPI_NBYTES_MSB_ : _XIP_CTRL_SPI_NBYTES_LSB_ ^| r/w <| Number of bytes in SPI transaction (1..9)
-                                  <|`10` _XIP_CTRL_XIP_EN_ ^| r/w <| XIP mode enable
-                                  <|`12:11` _XIP_CTRL_XIP_ABYTES_MSB_ : _XIP_CTRL_XIP_ABYTES_LSB_ ^| r/w <| Number of address bytes for XIP flash (minus 1)
-                                  <|`20:13` _XIP_CTRL_RD_CMD_MSB_ : _XIP_CTRL_RD_CMD_LSB_ ^| r/w <| Flash read command
-                                  <|`24:21` _XIP_CTRL_XIP_PAGE_MSB_ : _XIP_CTRL_XIP_PAGE_LSB_ ^| r/w <| XIP memory page
-                                  <|`25` _XIP_CTRL_SPI_CSEN_  ^| r/w <| Allow SPI chip-select to be actually asserted when set
-                                  <|`26` _XIP_CTRL_HIGHSPEED_ ^| r/w <| enable SPI high-speed mode (ignoring _XIP_CTRL_PRSC_)
-                                  <|`27` _XIP_CTRL_BURST_EN_  ^| r/w <| Enable XIP burst mode
-                                  <|`29:28`                   ^| r/- <| _reserved_, read as zero
-                                  <|`30` _XIP_CTRL_PHY_BUSY_  ^| r/- <| SPI PHY busy when set
-                                  <|`31` _XIP_CTRL_XIP_BUSY_  ^| r/- <| XIP access in progress when set
+.15+<| `0xffffff40` .15+<| `CTRL` <|`0`     `XIP_CTRL_EN`                                       ^| r/w <| XIP module enable
+                                  <|`3:1`   `XIP_CTRL_PRSC2 : XIP_CTRL_PRSC0`                   ^| r/w <| 3-bit SPI clock prescaler select
+                                  <|`4`     `XIP_CTRL_CPOL`                                     ^| r/w <| SPI clock polarity
+                                  <|`5`     `XIP_CTRL_CPHA`                                     ^| r/w <| SPI clock phase
+                                  <|`9:6`   `XIP_CTRL_SPI_NBYTES_MSB : XIP_CTRL_SPI_NBYTES_LSB` ^| r/w <| Number of bytes in SPI transaction (1..9)
+                                  <|`10`    `XIP_CTRL_XIP_EN`                                   ^| r/w <| XIP mode enable
+                                  <|`12:11` `XIP_CTRL_XIP_ABYTES_MSB : XIP_CTRL_XIP_ABYTES_LSB` ^| r/w <| Number of address bytes for XIP flash (minus 1)
+                                  <|`20:13` `XIP_CTRL_RD_CMD_MSB : XIP_CTRL_RD_CMD_LSB`         ^| r/w <| Flash read command
+                                  <|`24:21` `XIP_CTRL_XIP_PAGE_MSB : XIP_CTRL_XIP_PAGE_LSB`     ^| r/w <| XIP memory page
+                                  <|`25`    `XIP_CTRL_SPI_CSEN`                                 ^| r/w <| Allow SPI chip-select to be actually asserted when set
+                                  <|`26`    `XIP_CTRL_HIGHSPEED`                                ^| r/w <| Enable SPI high-speed mode (ignoring `XIP_CTRL_PRSCx`)
+                                  <|`27`    `XIP_CTRL_BURST_EN`                                 ^| r/w <| Enable XIP burst mode
+                                  <|`29:28` -                                                   ^| r/- <| _reserved_, read as zero
+                                  <|`30`   `XIP_CTRL_PHY_BUSY`                                  ^| r/- <| SPI PHY busy when set
+                                  <|`31`   `XIP_CTRL_XIP_BUSY`                                  ^| r/- <| XIP access in progress when set
 | `0xffffff44` | _reserved_ |`31:0` | r/- | _reserved_, read as zero
 | `0xffffff48` | `DATA_LO`  |`31:0` | r/w | Direct SPI access - data register low
 | `0xffffff4C` | `DATA_HI`  |`31:0` | -/w | Direct SPI access - data register high; write access triggers SPI transfer
--- a/docs/datasheet/soc_xirq.adoc
+++ b/docs/datasheet/soc_xirq.adoc
@ -6,31 +6,38 @@
 [frame="topbot",grid="none"]
 |=======================
 | Hardware source file(s): | neorv32_xirq.vhd |
-| Software driver file(s): | neorv32_xirq.c |
-|                          | neorv32_xirq.h |
-| Top entity port:         | `xirq_i` | IRQ input (32-bit, fixed)
-| Configuration generics:  | _XIRQ_NUM_CH_           | Number of IRQs to implement (0..32)
-|                          | _XIRQ_TRIGGER_TYPE_     | IRQ trigger type configuration
-|                          | _XIRQ_TRIGGER_POLARITY_ | IRQ trigger polarity configuration
-| CPU interrupts:          | fast IRQ channel 8 | XIRQ (see <<_processor_interrupts>>)
+| Software driver file(s): | neorv32_xirq.c   |
+|                          | neorv32_xirq.h   |
+| Top entity port:         | `xirq_i`                | External interrupts input (32-bit)
+| Configuration generics:  | `XIRQ_NUM_CH`           | Number of external IRQ channels to implement (0..32)
+|                          | `XIRQ_TRIGGER_TYPE`     | IRQ trigger type configuration
+|                          | `XIRQ_TRIGGER_POLARITY` | IRQ trigger polarity configuration
+| CPU interrupts:          | fast IRQ channel 8      | XIRQ (see <<_processor_interrupts>>)
 |=======================

-The eXternal interrupt controller provides a simple mechanism to implement up to 32 processor-external interrupt
+
+**Overview**
+
+The external interrupt controller provides a simple mechanism to implement up to 32 processor-external interrupt
 request signals. The external IRQ requests are prioritized, queued and signaled to the CPU via a
-single _CPU fast interrupt request_.
+_single_ CPU fast interrupt request.


 **Theory of Operation**

-The XIRQ provides up to 32 interrupt _channels_ (configured via the _XIRQ_NUM_CH_ generic). Each bit in the `xirq_i`
-input signal vector represents one interrupt channel. If less than 32 channels are configure, only the LSB-aligned channels
-are used while the remaining bits are left unconnected. An interrupt channel is enabled by setting the according bit in the
-interrupt enable register `IER`.
+The XIRQ provides up to 32 interrupt channels (configured via the `XIRQ_NUM_CH` generic). Each bit in the `xirq_i`
+input signal vector represents one interrupt channel. If fewer than 32 channels are configured, only the LSB-aligned channels
+are used while the remaining bits are left unconnected.

-If the configured trigger (see below) of an enabled channel fires, the request is stored into an internal buffer.
-This buffer is available via the interrupt pending register `IPR`. A `1` in this register indicates that the
-corresponding interrupt channel has fired but has not yet been serviced (so it is pending). An interrupt channel can
-become pending if the according `IER` bit is set. Pending IRQs can be cleared by writing `0` to the according `IPR`
+The actual interrupt trigger (low-level, high-level, rising-edge, falling-edge) can be configured independently for each channel
+using the `XIRQ_TRIGGER_TYPE` and `XIRQ_TRIGGER_POLARITY` generics. `XIRQ_TRIGGER_TYPE` is used to define the general trigger type.
+This can be either _level-triggered_ (`0`) or _edge-triggered_ (`1`). `XIRQ_TRIGGER_POLARITY` is used to configure the polarity of
+the selected trigger: a `0` defines low-level or falling-edge and a `1` defines high-level or rising-edge trigger polarity.
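
The encoding of the two generics can be illustrated with a small C decoder. This is a sketch only: the type `xirq_trigger_t` and the function `xirq_decode_trigger` are hypothetical names, and the code assumes that both generics are 32-bit vectors with one bit per interrupt channel (bit 0 for channel 0).

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    XIRQ_TRIG_LOW_LEVEL,
    XIRQ_TRIG_HIGH_LEVEL,
    XIRQ_TRIG_FALLING_EDGE,
    XIRQ_TRIG_RISING_EDGE
} xirq_trigger_t;

/* Hypothetical helper: decodes the trigger of one channel from the
 * XIRQ_TRIGGER_TYPE and XIRQ_TRIGGER_POLARITY configuration vectors. */
xirq_trigger_t xirq_decode_trigger(uint32_t trigger_type,
                                   uint32_t trigger_polarity,
                                   unsigned channel)
{
    assert(channel < 32);

    int edge = (trigger_type >> channel) & 1u;     /* 0: level, 1: edge */
    int high = (trigger_polarity >> channel) & 1u; /* 0: low/falling, 1: high/rising */

    if (edge) {
        return high ? XIRQ_TRIG_RISING_EDGE : XIRQ_TRIG_FALLING_EDGE;
    }
    return high ? XIRQ_TRIG_HIGH_LEVEL : XIRQ_TRIG_LOW_LEVEL;
}
```

With `XIRQ_TRIGGER_TYPE = x"00000001"` and `XIRQ_TRIGGER_POLARITY = x"ffffffff"`, channel 0 decodes to rising-edge and channels 1 to 31 decode to high-level.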
+
+Each channel can be independently enabled or disabled via the `IER` register. If the configured trigger (see above) of an
+enabled channel fires, the request is stored into an internal buffer. This buffer is available via the interrupt pending register `IPR`.
+A `1` in this register indicates that the corresponding interrupt channel has fired but has not yet been serviced (so it is pending).
+An interrupt channel can only become pending if the corresponding `IER` bit is set. Pending IRQs can be cleared by writing `0` to the corresponding `IPR`
 bit. As soon as there is at least one pending interrupt in the buffer, an interrupt request is sent to the CPU.

 [NOTE]
@ -39,7 +46,7 @@ A disabled interrupt channel can still be pending if it has been triggered befor
 The CPU can determine active external interrupt request either by checking the bits in the `IPR` register, which show all
 pending interrupt channels, or by reading the interrupt source register `SCR`.
 This register provides a 5-bit wide ID (0..31) that shows the interrupt request with _highest priority_.
-Interrupt channel `xirq_i(0)` has highest priority and `xirq_i(_XIRQ_NUM_CH_-1)` has lowest priority.
+Interrupt channel `xirq_i(0)` has highest priority and `xirq_i(XIRQ_NUM_CH-1)` has lowest priority.
 This priority assignment is fixed and cannot be altered by software.
 The CPU can use the ID from `SCR` to service IRQ according to their priority. To acknowledge the according
 interrupt the CPU can write `1 << SCR` to `IPR`.
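
The fixed priority scheme can be modeled in software as a search for the lowest set bit among the pending channels. The function below is a hypothetical illustration of that selection rule. It is not the hardware implementation and not a HAL function.

```c
#include <stdint.h>

/* Hypothetical model: returns the ID (0..31) of the pending channel with the
 * highest priority (lowest channel number), or -1 if no channel is pending. */
int xirq_highest_priority(uint32_t pending)
{
    for (int ch = 0; ch < 32; ch++) {
        if ((pending >> ch) & 1u) {
            return ch; /* channel 0 wins over all higher-numbered channels */
        }
    }
    return -1;
}
```

For example, if channels 2 and 4 are pending at the same time, the model selects channel 2.
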
@ -53,25 +60,6 @@ An interrupt handler should clear the interrupt pending bit that caused the inte
 acknowledging the interrupt by writing the `SCR` register.


-**IRQ Trigger Configuration**
-
-The controller does not provide a configuration option to define the IRQ triggers _during runtime_. Instead, two
-generics are provided to configure the trigger of each interrupt channel before synthesis: the _XIRQ_TRIGGER_TYPE_
-and _XIRQ_TRIGGER_POLARITY_ generic. Both generics are 32 bit wide representing one bit per interrupt channel. If
-less than 32 interrupt channels are implemented the remaining configuration bits are ignored.
-
-_XIRQ_TRIGGER_TYPE_ is used to define the general trigger type. This can be either _level-triggered_ (`0`) or
-_edge-triggered_ (`1`). _XIRQ_TRIGGER_POLARITY_ is used to configure the polarity of the trigger: a `0` defines
-low-level or falling-edge and a `1` defines high-level or rising-edge.
-
-.Example trigger configuration: channel 0 for rising-edge, IRQ channels 1 to 31 for high-level
-[source, vhdl]
----
-XIRQ_TRIGGER_TYPE     => x"00000001";  
-XIRQ_TRIGGER_POLARITY => x"ffffffff";  
----
-
-
 **Register Map**

 .XIRQ register map (`struct NEORV32_XIRQ`)
--- a/docs/datasheet/software.adoc
+++ b/docs/datasheet/software.adoc
@ -1,8 +1,8 @@
 :sectnums:
 == Software Framework

-To make actual use of the NEORV32 processor, the project comes with a complete software ecosystem. This
-ecosystem is based on the RISC-V port of the GCC GNU Compiler Collection and consists of the following elementary parts:
+The NEORV32 project comes with a complete software ecosystem called the "software framework", which
+is based on the C-language RISC-V GCC port and consists of the following parts:

 * <<_compiler_toolchain>>
 * <<_core_libraries>>
@ -18,7 +18,7 @@ ecosystem is based on the RISC-V port of the GCC GNU Compiler Collection and con
 A summarizing list of the most important elements of the software framework and their according
 files and folders is shown below:

-[cols="<6,<4"]
+[cols="<5,<5"]
 [grid="none"]
 |=======================
 | Application start-up code               | `sw/common/crt0.S`
@ -26,7 +26,7 @@ files and folders is shown below:
 | Core hardware driver libraries ("HAL")  | `sw/lib/include/` & `sw/lib/source/`
 | Central application makefile            | `sw/common/common.mk`
 | Tool for generating NEORV32 executables | `sw/image_gen/`
-| Default bootloader                      | `sw/bootloader/bootloader.c`
+| Default bootloader                      | `sw/bootloader`
 | Example programs                        | `sw/example`
 |=======================

@ -46,7 +46,7 @@ and peripheral/IO modules, can be found in `sw/example`.
 :sectnums:
 === Compiler Toolchain

-The toolchain for this project is based on the free RISC-V GCC-port. You can find the compiler sources and
+The toolchain for this project is based on the free and open RISC-V GCC-port. You can find the compiler sources and
 build instructions on the official RISC-V GNU toolchain GitHub page: https://github.com/riscv/riscv-gnutoolchain.

 The NEORV32 implements a 32-bit RISC-V architecture and uses a 32-bit integer and soft-float ABI by default.
@ -56,11 +56,11 @@ Make sure the toolchain / toolchain build is configured accordingly.
 * `MABI=ilp32`
 * `RISCV_PREFIX=riscv32-unknown-elf-`

-These default configurations can be override at any times using <<_application_makefile>> variables.
+These default configurations can be overridden at any time using <<_application_makefile>> variables.

 [TIP]
-More information regarding the toolchain (building from scratch or downloading the prebuilt ones)
-can be found in the user guides' section https://stnolting.github.io/neorv32/ug/#_software_toolchain_setup[Software Toolchain Setup].
+More information regarding the toolchain (building from scratch or downloading prebuilt ones) can be found in the
+user guide section https://stnolting.github.io/neorv32/ug/#_software_toolchain_setup[Software Toolchain Setup].


 <<<
@ -77,10 +77,11 @@ The NEORV32 project provides a set of pre-defined C libraries that allow an easy
 #include <neorv32.h> // NEORV32 HAL, core and runtime libraries
 ----

-[cols="<3,<4,<8"]
+.NEORV32 HAL File List
+[cols="<3,<3,<6"]
 [options="header",grid="rows"]
 |=======================
-| C source file | C header file | Description
+| C source file       | C header file          | Description
 | -                   | `neorv32.h`            | main NEORV32 definitions and library file
 | -                   | `neorv32_buskeeper.h`  | HW driver functions for the bus keeper
 | `neorv32_cfs.c`     | `neorv32_cfs.h`        | HW driver (stubs) functions for custom functions subsystem
@ -94,7 +95,7 @@ The NEORV32 project provides a set of pre-defined C libraries that allow an easy
 | `neorv32_neoled.c`  | `neorv32_neoled.h`     | HW driver functions for the smart LED interface
 | `neorv32_onewire.c` | `neorv32_onewire.h`    | HW driver functions for the 1-wire interface
 | `neorv32_pwm.c`     | `neorv32_pwm.h`        | HW driver functions for the pulse-width modulation controller
-| `neorv32_rte.c`     | `neorv32_rte.h`        | NEORV32 runtime environment and helper functions
+| `neorv32_rte.c`     | `neorv32_rte.h`        | <<_neorv32_runtime_environment>>
 | `neorv32_sdi.c`     | `neorv32_sdi.h`        | HW driver functions for the serial data interface
 | `neorv32_spi.c`     | `neorv32_spi.h`        | HW driver functions for the serial peripheral interface
 | -                   | `neorv32_sysinfo.h`    | HW driver functions for the system information memory
@ -123,8 +124,8 @@ A CMSIS-SVD-compatible **System View Description (SVD)** file including all peri
 :sectnums:
 === Application Makefile

-Application compilation is based on a single, centralized **GNU makefile** (`sw/common/common.mk`). Each project in the
-`sw/example` folder provides a makefile that just includes this central makefile.
+Application compilation is based on a single, centralized GNU makefile (`sw/common/common.mk`). Each project in the
+`sw/example` folder provides a makefile that just _includes_ this central makefile.

 [TIP]
 When creating a new project, copy an existing project folder or at least the makefile to the new project folder.
@ -133,18 +134,12 @@ dependencies can be manually configured via makefile variables if the new projec

 [NOTE]
 Before the makefile can be used to compile applications, the RISC-V GCC toolchain needs to be installed and
-the compiler's `bin` folder has to be added to the system's `PATH` variable. More information can be found in
-https://stnolting.github.io/neorv32/ug/#_software_toolchain_setup[User Guide: Software Toolchain Setup].
+the compiler's `bin` folder has to be added to the system's `PATH` environment variable. More information can be
+found in https://stnolting.github.io/neorv32/ug/#_software_toolchain_setup[User Guide: Software Toolchain Setup].

-The makefile is invoked by simply executing `make` in the console. For example:
-
-[source,bash]
----
-neorv32/sw/example/demo_blink_led$ make
----

 :sectnums:
-==== Targets
+==== Makefile Targets

 Just executing `make` (or executing `make help`) will show the help menu listing all available targets.

@ -185,16 +180,15 @@ Make sure to add the bin folder of RISC-V GCC to your PATH variable.


 :sectnums:
-==== Configuration
+==== Makefile Configuration

 The compilation flow is configured via variables right at the beginning of the central
 makefile (`sw/common/common.mk`):

+.Customizing Makefile Variables
 [TIP]
 The makefile configuration variables can be overridden or extended directly when invoking the makefile. For
 example `$ make MARCH=rv32ic_zicsr clean_all exe` overrides the default `MARCH` variable definitions.
-Permanent modifications/definitions can be made in the project-local makefile
-(e.g., `sw/example/demo_blink_led/makefile`).

 .Default Makefile Configuration
 [source,makefile]
@ -224,22 +218,18 @@ NEORV32_HOME ?= ../../..
 ----

 .Variables Description
-[cols="<3,<10"]
+[cols="<2,<8"]
 [grid="none"]
 |=======================
-| `APP_SRC`      | The source files of the application (`*.c`, `*.cpp`, `*.S` and `*.s` files are allowed;
-files of these types in the project folder are automatically added via wild cards). Additional files can be added separated by white spaces
+| `APP_SRC`      | The source files of the application (`*.c`, `*.cpp`, `*.S` and `*.s` files are allowed; files of these types in the project folder are automatically added via wild cards). Additional files can be added separated by white spaces
 | `APP_INC`      | Include file folders; separated by white spaces; must be defined with `-I` prefix
 | `ASM_INC`      | Include file folders that are used only for the assembly source files (`*.S`/`*.s`).
 | `EFFORT`       | Optimization level, optimize for size (`-Os`) is default; legal values: `-O0`, `-O1`, `-O2`, `-O3`, `-Os`, `-Ofast`, ...
 | `RISCV_PREFIX` | The toolchain prefix to be used; follows the triplet naming convention `[architecture]-[host_system]-[output]-...`
-| `MARCH`        | The targeted RISC-V architecture/ISA; enable compiler support of optional CPU extension by adding the according extension
-name (e.g. `rv32im_zicsr` for `M` CPU extension; see https://stnolting.github.io/neorv32/ug/#_enabling_risc_v_cpu_extensions[User Guide: Enabling RISC-V CPU Extensions]
-for more information
+| `MARCH`        | The targeted RISC-V architecture/ISA
 | `MABI`         | Application binary interface (default: 32-bit integer ABI `ilp32`)
 | `USER_FLAGS`   | Additional flags that will be forwarded to the compiler tools
-| `NEORV32_HOME` | Relative or absolute path to the NEORV32 project home folder; adapt this if the makefile/project is not in the project's
-default `sw/example` folder
+| `NEORV32_HOME` | Relative or absolute path to the NEORV32 project home folder; adapt this if the makefile/project is not in the project's default `sw/example` folder
 |=======================

 :sectnums:
@ -266,12 +256,11 @@ The following default compiler flags are used for compiling an application. Thes
 ==== Custom (Compiler) Flags

 Custom flags can be _appended_ to the `USER_FLAGS` variable. This allows to customize the entire software framework while
-calling `make` without the need to change the makefile(s) or the linker script.
+calling `make` without the need to change the makefile(s) or the linker script. The following example will add debug symbols
+to the executable (`-g`) and will also re-define the linker script's `__neorv32_heap_size` variable setting the maximal heap
+size to 4096 bytes (see sections <<_linker_script>> and <<_ram_layout>>):

-The following example will add debug symbols to the executable (`-g`) and will also define the linker script's
-`__neorv32_heap_size` setting the maximal heap size to 4096 bytes:
-
-.Example: using the `USER_FLAGS` variable for customization
+.Using the `USER_FLAGS` Variable for Customization
 [source,bash]
 ----
 $ make USER_FLAGS+="-g -Wl,--defsym,__neorv32_heap_size=4096" clean_all exe
@ -283,8 +272,8 @@ $ make USER_FLAGS+="-g -Wl,--defsym,__neorv32_heap_size=4096" clean_all exe
 :sectnums:
 === Executable Image Format

-In order to generate an executable for th processors all source files have to be compiled, linked
-and packed into a _final executable_.
+In order to generate an executable for the processor, all source files have to be compiled, linked
+and packed into a final executable.

 :sectnums:
 ==== Linker Script
@ -326,8 +315,7 @@ __neorv32_rom_base = DEFINED(__neorv32_rom_base) ? __neorv32_rom_base : 0x000000
 __neorv32_ram_base = DEFINED(__neorv32_ram_base) ? __neorv32_ram_base : 0x80000000; /* = VHDL package's "dspace_base_c" */
 ----

-Only the region **sizes** should be modified by the user. The base addresses are defined by the processor's hardware (see section
-<<_address_space>>) and should not be altered at all. The size (and base) configuration can be edited by the user - either by explicitly
+The region size and base address configuration can be edited by the user - either by explicitly
 changing the default values in the linker script or by overriding them when invoking `make`:
 
 .Overriding default rom size configuration (setting 4096 bytes)
@ -337,18 +325,19 @@ $ make USER_FLAGS+="-Wl,--defsym,__neorv32_rom_size=4096" clean_all exe
 ----

 [IMPORTANT]
-`neorv32_rom_base` (= `ORIGIN` of the `ram` section) has to be always identical to the processor's `dspace_base_c` hardware configuration.
-Also, `neorv32_ram_base` (= `ORIGIN` of the `rom` section) has to be always identical to the processor's `ispace_base_c` hardware configuration.
+`__neorv32_rom_base` (= `ORIGIN` of the `rom` section) always has to be identical to the processor's `ispace_base_c` hardware
+configuration. Also, `__neorv32_ram_base` (= `ORIGIN` of the `ram` section) always has to be identical to the processor's
+`dspace_base_c` hardware configuration.

 [NOTE]
-The default configuration for the `rom` section assumes a maximum of 2GB _logical_ memory address space. This size does not have
-to reflect the _actual_ physical size of the instruction memory (internal IMEM and/or processor-external memory). It just provides a maximum
-limit. When uploading a new executable via the bootloader, the bootloader itself checks if sufficient _physical_ instruction memory is available.
-If a new executable is embedded right into the internal-IMEM the synthesis tool will check, if the configured instruction memory size
-is sufficient (e.g., via the <<_mem_int_imem_size>> generic).
+The default configuration for the `rom` section assumes a maximum of 2GB _logical_ memory address space. This size does not
+have to reflect the _actual_ physical size of the entire instruction memory. It just provides a maximum limit. When uploading
+a new executable via the bootloader, the bootloader itself checks if sufficient _physical_ instruction memory is available.
+If a new executable is embedded right into the internal IMEM, the synthesis tool will check whether the configured instruction memory
+size is sufficient.

-The linker maps all the regions from the compiled object files into five final sections: `.text`, `.rodata`, `.data`, `.bss` and `.heap`.
-These regions contain everything required for the application to run:
+The linker maps all the regions from the compiled object files into five final sections: `.text`,
+`.rodata`, `.data`, `.bss` and `.heap`:

 .Linker script - memory regions
 [cols="<1,<9"]
@ -368,7 +357,7 @@ sections are extracted and concatenated into a single file `main.bin`.

 .Section Alignment
 [NOTE]
-The default NEORV32 linker script aligns _all_ regions so they start and end on a 32-bit (word) boundary. The default
+The default NEORV32 linker script aligns _all_ regions so they start and end on 32-bit (word) boundaries. The default
 NEORV32 start-up code (crt0) makes use of this alignment by using word-level memory instructions to initialize the `.data`
 section and to clear the `.bss` section (faster!).

@ -411,8 +400,7 @@ using dynamic memory allocation.
 :sectnums:
 ==== C Standard Library

-The NEORV32 is a processor for _embedded_ applications, which is not capable of running desktop OSs like Linux
-(at least not without emulation). Hence, the default software framework relies on **newlib** as default C standard library.
+The default software framework relies on **newlib** as default C standard library.

 .RTOS Support
 [NOTE]
@ -424,9 +412,9 @@ for more information.
 Newlib provides stubs for common "system calls" (like file handling and standard input/output) that are used by other
 C libraries like `stdio`. These stubs are available in `sw/lib/source/syscalls.c` and were adapted for the NEORV32 processor.

-.Standard Console(s)
+.Standard Consoles
 [NOTE]
-<<_primary_universal_asynchronous_receiver_and_transmitter_uart0, UART0>>
+The <<_primary_universal_asynchronous_receiver_and_transmitter_uart0, UART0>>
 is used to implement all the standard input, output and error consoles (`STDIN`, `STDOUT` and `STDERR`).

 .Constructors and Destructors
@ -446,7 +434,7 @@ is available in `sw/example/demo_newlib`
 The `main.bin` file is packed by the NEORV32 image generator (`sw/image_gen`) to generate the final executable file.
 The image generator can generate several types of executables selected by a flag when calling the generator:

-[cols="<1,<9"]
+[cols="<2,<8"]
 [grid="none"]
 |=======================
 | `-app_bin` | Generates an executable binary file `neorv32_exe.bin` (including header) for UART uploading via the bootloader.
@ -456,8 +444,8 @@ The image generator can generate several types of executables selected by a flag
 | `-bld_img` | Generates an executable VHDL memory initialization image (no header) for the processor-internal BOOT ROM. This option generates the `rtl/core/neorv32_bootloader_image.vhd` file.
 |=======================

-All these options are managed by the makefile. The "normal application2 compilation flow will generate the `neorv32_exe.bin`
-executable for uploading via UART to the default NEORV32 bootloader.
+All these options are managed by the makefile. The normal application compilation flow will generate the `neorv32_exe.bin`
+executable designated for uploading via the default NEORV32 bootloader.

 .Image Generator Compilation
 [NOTE]
@ -465,12 +453,12 @@ The sources of the image generator are automatically compiled when invoking the

 .Executable Header
 [NOTE]
-The image generator add a small header to the `neorv32_exe.bin` executable, which consists of three 32-bit words located right at the
-beginning of the file. The first word of the executable is the signature word and is always `0x4788cafe`. Based on this word the bootloader
-can identify a valid image file. The next word represents the size in bytes of the actual program
-image in bytes. A simple "complement" checksum of the actual program image is given by the third word. This
-provides a simple protection against data transmission or storage errors. **Note that this executable format cannot be used for _direct_
-execution (e.g. via XIP or direct memory access).**
+The image generator adds a small header to the `neorv32_exe.bin` executable, which consists of three 32-bit words located right
+at the beginning of the file. The first word of the executable is the signature word and is always `0x4788cafe`. Based on this
+word the bootloader can identify a valid image file. The next word represents the size of the actual program image in
+bytes. A simple "complement" checksum of the actual program image is given by the third word. This provides a simple protection
+against data transmission or storage errors. **Note that this executable format cannot be used for _direct_ execution (e.g. via
+XIP or direct memory access).**
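
The header layout described above can be sketched in C as follows. This is only an illustrative model and not the image generator's actual source code: the names `exe_header_t`, `exe_checksum`, `exe_make_header` and `exe_check` are hypothetical, and the "complement" checksum is _assumed_ here to be the two's complement of the wrapping 32-bit sum of all program image words (so that sum plus checksum equals zero).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXE_SIGNATURE 0x4788cafeu /* signature word as stated in the text above */

/* Hypothetical model of the three-word executable header. */
typedef struct {
  uint32_t signature; /* word 0: signature */
  uint32_t size;      /* word 1: size of the program image in bytes */
  uint32_t checksum;  /* word 2: "complement" checksum of the program image */
} exe_header_t;

/* ASSUMPTION: checksum = two's complement of the wrapping 32-bit sum of all image words. */
uint32_t exe_checksum(const uint32_t *image, size_t num_words) {
  assert(image != NULL || num_words == 0);
  uint32_t sum = 0;
  for (size_t i = 0; i < num_words; i++) {
    sum += image[i]; /* unsigned arithmetic wraps modulo 2^32 */
  }
  return ~sum + 1u;
}

/* Build the header for a program image given as an array of 32-bit words. */
exe_header_t exe_make_header(const uint32_t *image, size_t num_words) {
  exe_header_t hdr;
  hdr.signature = EXE_SIGNATURE;
  hdr.size      = (uint32_t)(num_words * sizeof(uint32_t));
  hdr.checksum  = exe_checksum(image, num_words);
  return hdr;
}

/* Validate header against image: 0 = ok, 1 = bad signature, 2 = bad size, 3 = bad checksum. */
int exe_check(const exe_header_t *hdr, const uint32_t *image, size_t num_words) {
  if (hdr->signature != EXE_SIGNATURE) {
    return 1;
  }
  if (hdr->size != (uint32_t)(num_words * sizeof(uint32_t))) {
    return 2;
  }
  uint32_t sum = 0;
  for (size_t i = 0; i < num_words; i++) {
    sum += image[i];
  }
  if ((uint32_t)(sum + hdr->checksum) != 0u) {
    return 3;
  }
  return 0;
}
```

A loader following this scheme would reject the image whenever `exe_check` returns a non-zero value.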


 :sectnums:
--- a/docs/datasheet/software_bootloader.adoc
+++ b/docs/datasheet/software_bootloader.adoc
@ -2,37 +2,34 @@
 === Bootloader

 [NOTE]
-This section refers to the **default** bootloader from the repository. The bootloader can be customized
-to target application-specific scenarios using pre-defined options (see User Guide section
-https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootloader[Customizing the Internal Bootloader]
-) or it can be completely rewritten/replaced for custom purpose.
+This section refers to the **default** NEORV32 bootloader.

-The NEORV32 bootloader (source code `sw/bootloader/bootloader.c`) provides an optional build-in firmware that
+The NEORV32 bootloader (`sw/bootloader/bootloader.c`) provides an optional built-in firmware that
 allows to upload new application executables at _any time_ without the need to re-synthesize the FPGA's bitstream.
 A UART connection is used to provide a simple text-based user interface that allows to upload executables.

 Furthermore, the bootloader provides options to store an executable to a processor-external SPI flash.
 An "auto boot" feature can optionally fetch this executable right after reset if there is no user interaction
-via UART. This allows to build processor setups with _non-volatile application storage_, which can still be updated at any time. 
+via UART. This allows to build processor setups with _non-volatile application storage_ while maintaining the option
+to update the application software at any time.


 :sectnums:
 ==== Bootloader SoC/CPU Requirements

-The bootloader relies on certain CPU and SoC extensions and modules to be enabled to allow full functionality.
+The bootloader requires certain CPU and SoC extensions and modules to be enabled in order to operate correctly.

-[cols="<3,<12"]
+[cols="^2,<8"]
 [grid="none"]
 |=======================
-| **REQUIRED**  | The bootloader is implemented only if the <<_int_bootloader_en>> is _true_ (default). This will automatically select the CPU's <<_indirect_boot>> boot configuration.
-| **REQUIRED**  | The bootloader requires the privileged architecture CPU extension (<<_zicsr_control_and_status_register_access_privileged_architecture>>) to be enabled.
-| **REQUIRED**  | At least 512 bytes of data memory (processor-internal DMEM or processor-external DMEM) are required for the bootloader's stack.
-| _RECOMMENDED_ | For user interaction via UART (like uploading executables) the primary UART (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>) has to be implemented.
-Without UART0 the auto-boot via SPI is still supported but the bootloader should be customized (see User Guide).
+| **REQUIRED**  | The bootloader is implemented only if the `INT_BOOTLOADER_EN` top generic is `true`. This will automatically select the CPU's <<_indirect_boot>> boot configuration.
+| **REQUIRED**  | The bootloader requires the privileged architecture CPU extension (<<_zicsr_isa_extension>>) to be enabled.
+| **REQUIRED**  | At least 512 bytes of data memory (processor-internal DMEM or processor-external DMEM) are required for the bootloader's stack and global variables.
+| _RECOMMENDED_ | For user interaction via the <<_bootloader_console>> (like uploading executables) the primary UART (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>) is required.
 | _RECOMMENDED_ | The default bootloader uses bit 0 of the <<_general_purpose_input_and_output_port_gpio>> output port to drive a high-active "heart beat" status LED.
-| _RECOMMENDED_ | The <<_machine_system_timer_mtime>> is used to control blinking of the status LED and also to automatically trigger the auto-boot sequence.
-| OPTIONAL      | The SPI controller (<<_serial_peripheral_interface_controller_spi>>) is needed to store/load executable from external flash (for the auto boot feature).
-| OPTIONAL      | The XIP controller (<<_execute_in_place_module_xip>>) is needed to execute code directly from a pre-programmed SPI flash.
+| _RECOMMENDED_ | The <<_machine_system_timer_mtime>> is used to control blinking of the status LED and also to automatically trigger the <<_auto_boot_sequence>>.
+| OPTIONAL      | The SPI controller (<<_serial_peripheral_interface_controller_spi>>) is needed to store/load executables from external flash using the <<_auto_boot_sequence>>.
+| OPTIONAL      | The XIP controller (<<_execute_in_place_module_xip>>) is needed to boot/execute code directly from a pre-programmed SPI flash.
 |=======================


@ -53,17 +50,16 @@ The SPI flash has to support single-byte read and write operations, 24-bit addre
 .Custom Configuration
 [TIP]
 Most properties (like chip select line, flash address width, SPI clock frequency, ...) of the default bootloader can be reconfigured
-without the need to change the source code. Custom configuration can be made using command line switches when recompiling the bootloader.
-See the User Guide https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootloader for more information.
+without the need to change the source code. Custom configuration can be made using command line switches (defines) when recompiling
+the bootloader. See the User Guide https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootloader for more information.


 :sectnums:
 ==== Bootloader Console

-To interact with the bootloader, connect the primary UART (UART0) signals (`uart0_txd_o` and
-`uart0_rxd_o`) of the processor's top entity via a serial port (-adapter) to your computer (hardware flow control is
-not used so the according interface signals can be ignored.), configure your
-terminal program using the following settings and perform a reset of the processor.
+To interact with the bootloader, connect the primary UART (UART0) signals (`uart0_txd_o` and `uart0_rxd_i`) of the processor's top
+entity via a serial port (adapter) to your computer (hardware flow control is not used so the according interface signals can be
+ignored), configure your terminal program using the following settings and perform a reset of the processor.

 Terminal console settings (`19200-8-N-1`):

@ -74,8 +70,9 @@ Terminal console settings (`19200-8-N-1`):
 * newline on `\r\n` (carriage return, newline)
 * no transfer protocol / control flow protocol - just raw bytes

+.Terminal Program
 [IMPORTANT]
-_Any_ terminal program that can connect to a serial port should work. However, make sure the program
+Any terminal program that can connect to a serial port should work. However, make sure the program
 can transfer data in _raw_ byte mode without any protocol overhead (e.g. XMODEM). Some terminal programs struggle with
 transmitting files larger than 4kB (see https://github.com/stnolting/neorv32/pull/215). Try a different terminal program
 if uploading of a binary does not work.
@ -108,13 +105,13 @@ The start-up screen gives some brief information about the bootloader and severa
 |=======================
 | `BLDV` | Bootloader version (built date).
 | `HWV`  | Processor hardware version (the <<_mimpid>> CSR) in BCD format; example: `0x01040606` = v1.4.6.6.
-| `CID`  | Custom user-defined ID (via the `CUSTOM_ID` register from <<_system_configuration_information_memory_sysinfo>>; defined by the <<_custom_id>> generic).
-| `CLK`  | Processor clock speed in Hz (via the `CLK` register from <<_system_configuration_information_memory_sysinfo>>; defined by the <<_clock_frequency>> generic).
+| `CID`  | Custom user-defined ID (via the `CUSTOM_ID` register from <<_system_configuration_information_memory_sysinfo>>).
+| `CLK`  | Processor clock speed in Hz (via the `CLK` register from <<_system_configuration_information_memory_sysinfo>>).
 | `MISA` | RISC-V CPU extensions (<<_misa>> CSR).
 | `XISA` | NEORV32-specific CPU extensions (<<_mxisa>> CSR).
-| `SOC`  | Processor configuration (via the `SOC` register from the <<_system_configuration_information_memory_sysinfo>>; mainly defined by the `IO_*` and `MEM_*` configuration generics).
-| `IMEM` | IMEM memory base address and size in byte (via the `IMEM_SIZE` and `ISPACE_BASE` registers from the <<_system_configuration_information_memory_sysinfo>>; defined by the <<_mem_int_imem_size>> generic).
-| `DMEM` | DMEM memory base address and size in byte (via the `DMEM_SIZE` and `DSPACE_BASE` registers from the <<_system_configuration_information_memory_sysinfo>>; defined by the <<_mem_int_dmem_size>> generic).
+| `SOC`  | Processor configuration (via the `SOC` register from the <<_system_configuration_information_memory_sysinfo>>).
+| `IMEM` | IMEM memory base address and size in bytes (via the `IMEM_SIZE` and `ISPACE_BASE` registers from the <<_system_configuration_information_memory_sysinfo>>).
+| `DMEM` | DMEM memory base address and size in bytes (via the `DMEM_SIZE` and `DSPACE_BASE` registers from the <<_system_configuration_information_memory_sysinfo>>).
 |=======================
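
The BCD encoding of the `HWV` value can be decoded as sketched below. This is a minimal host-side illustration and not part of the bootloader or the HAL: the function names `bcd_byte_to_uint` and `hwv_to_string` are hypothetical, and the sketch assumes one version field per byte, which is consistent with the `0x01040606` = v1.4.6.6 example from the table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one BCD byte (two decimal digits, e.g. 0x12 -> 12). */
unsigned bcd_byte_to_uint(uint8_t bcd) {
  return (unsigned)((bcd >> 4) & 0x0Fu) * 10u + (unsigned)(bcd & 0x0Fu);
}

/* Format a 32-bit BCD version word as "vA.B.C.D" (one field per byte, MSB first).
 * Returns the snprintf() result, i.e. the length of the formatted string. */
int hwv_to_string(uint32_t hwv, char *buf, size_t len) {
  assert(buf != NULL && len > 0);
  return snprintf(buf, len, "v%u.%u.%u.%u",
                  bcd_byte_to_uint((uint8_t)(hwv >> 24)),
                  bcd_byte_to_uint((uint8_t)(hwv >> 16)),
                  bcd_byte_to_uint((uint8_t)(hwv >> 8)),
                  bcd_byte_to_uint((uint8_t)(hwv >> 0)));
}
```

Calling `hwv_to_string(0x01040606, buf, sizeof(buf))` yields the string `v1.4.6.6`.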

 Now you have 8 seconds to press _any_ key. Otherwise, the bootloader starts the <<_auto_boot_sequence>>. When
@ -177,7 +174,7 @@ section https://stnolting.github.io/neorv32/ug/#_customizing_the_internal_bootlo
 .SPI Flash Programming
 [TIP]
 For detailed information on using an SPI flash for application storage see User Guide section
-https://stnolting.github.io/neorv32/ug/#_programming_an_external_spi_flash_via_the_bootloader[Programming an External SPI Flash via the Bootloader].
+https://stnolting.github.io/neorv32/ug/#_programming_an_external_spi_flash_via_the_bootloader[Programming an External SPI Flash via the Bootloader].


 :sectnums:
--- a/docs/datasheet/software_rte.adoc
+++ b/docs/datasheet/software_rte.adoc
@ -1,11 +1,9 @@
 :sectnums:
 === NEORV32 Runtime Environment

-The NEORV32 software framework provides a minimal runtime environment (**RTE**) that takes care of a stable
-and _safe_ execution environment by handling _all_ traps (= exceptions & interrupts). The RTE simplifies trap handling
-by wrapping the CPU's _privileged architecture_ (i.e. trap-related CSRs) into a unified software API.
-The NEORV32 RTE is a software library (`sw/lib/source/neorv32_rte.c`) that is part of the default processor library set.
-It provides public functions via `sw/lib/include/neorv32_rte.h` for application interaction.
+The NEORV32 software framework provides a minimal **runtime environment** (abbreviated "RTE") that takes care of a stable
+and _safe_ execution environment by handling _all_ traps (exceptions & interrupts). The RTE simplifies trap handling
+by wrapping the CPU's privileged architecture (i.e. trap-related CSRs) into a unified software API.

 Once initialized, the RTE provides <<_default_rte_trap_handlers>> that catch all possible traps. These
 default handlers just output a message via UART to inform the user when a certain trap has been triggered. The
@ -14,36 +12,34 @@ default handlers can be overridden by the application code to install applicatio
 [IMPORTANT]
 Using the RTE is **optional but highly recommended**. The RTE provides a simple and comfortable way of delegating
 traps to application-specific handlers while making sure that all traps (even though they are not explicitly used
-by the application) are handled correctly. Performance-optimized applications or embedded operating systems should
-not use the RTE for delegating traps.
-
-[NOTE]
-For the **C standard runtime library** see section <<c_standard_library>>.
+by the application) are handled correctly. Performance-optimized applications or embedded operating systems may
+choose not to use the RTE at all in order to reduce response time.


 ==== RTE Operation

-The RTE handles the trap-related CSRs of the CPU's privileged architecture (<<_machine_trap_handling_csrs>>).
-It initializes the <<_mtvec>> CSR, which provides the base entry point for all trap
-handlers. The address stored to this register reflects the **first-level trap handler**, which is provided by the
-NEORV32 RTE. Whenever an exception or interrupt is triggered this first-level handler is executed.
+The RTE manages the trap-related CSRs of the CPU's privileged architecture (<<_machine_trap_handling_csrs>>).
+It initializes the <<_mtvec>> CSR, which provides the base entry point for all trap handlers. The address
+stored to this register defines the address of the **first-level trap handler**, which is provided by the
+NEORV32 RTE. Whenever an exception or interrupt is triggered this first-level trap handler is executed.

 The first-level handler performs a complete context save, analyzes the source of the trap and
 calls the according **second-level trap handler**, which takes care of the actual exception/interrupt
-handling. For this, the RTE manages a private look-up table to store the addresses of the according trap
-handlers.
+handling. The RTE manages a private look-up table to store the addresses of the according second-level trap handlers.

 After the initial RTE setup, each entry in the RTE's trap handler look-up table is initialized with a
 <<_default_rte_trap_handlers>>. These default handlers do not execute any trap-related operations - they
-just output a message via the *primary UART (UART0)* to inform the user that a trap has occurred, that is not
-handled by the actual application. After sending this message, the RTE tries to continue executing the user program.
+just output a message via the *primary UART (UART0)* to inform the user that a trap has occurred, which is not (yet)
+handled by the actual application. After sending this message, the RTE tries to continue executing the actual program
+by resolving the trap cause.
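
The two-level dispatch scheme described above can be modelled on a host PC with a plain function-pointer table. The following sketch is a simplified illustration and not the RTE source code: all `model_*` names and the table size `MODEL_NUM_TRAPS` are hypothetical, the context save is omitted, and the trap cause is passed in as an argument instead of being read from a CSR.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_TRAPS 16 /* hypothetical table size used only for this sketch */

typedef void (*trap_handler_t)(void);

/* Private look-up table holding one second-level handler per trap ID. */
static trap_handler_t model_handler_table[MODEL_NUM_TRAPS];

unsigned model_default_hits = 0; /* counts invocations of the default handler */
unsigned model_custom_hits  = 0; /* counts invocations of the example custom handler */

/* Default second-level handler; the real one would print a message via UART0. */
void model_default_handler(void) {
  model_default_hits++;
}

/* Example application-specific handler: no arguments, no return value. */
void model_custom_handler(void) {
  model_custom_hits++;
}

/* Setup: every table entry initially points to the default handler. */
void model_rte_setup(void) {
  for (size_t i = 0; i < MODEL_NUM_TRAPS; i++) {
    model_handler_table[i] = model_default_handler;
  }
}

/* Install a handler; returns 0 on success, non-zero on an invalid ID or NULL handler.
 * The table is left untouched if an error is reported. */
int model_rte_handler_install(uint8_t id, trap_handler_t handler) {
  if ((id >= MODEL_NUM_TRAPS) || (handler == NULL)) {
    return 1;
  }
  model_handler_table[id] = handler;
  return 0;
}

/* First-level handler: looks up and calls the second-level handler for `id`. */
void model_first_level_handler(uint8_t id) {
  if ((id < MODEL_NUM_TRAPS) && (model_handler_table[id] != NULL)) {
    model_handler_table[id]();
  }
}
```

In this model, a trap raised before any handler is installed ends up in `model_default_handler`; after `model_rte_handler_install(3, model_custom_handler)` the same trap ID is routed to the custom handler instead.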


 ==== Using the RTE

-The NEORV32 is enabled by calling the RTE's setup function:
+The RTE is part of the default NEORV32 software framework. However, it has to be explicitly enabled by calling
+the RTE's setup function:

-.Function Prototype: RTE Setup
+.RTE Setup (Function Prototype)
 [source,c]
 ----
 void neorv32_rte_setup(void);
@ -52,24 +48,45 @@ void neorv32_rte_setup(void);
 [NOTE]
 The RTE should be enabled right at the beginning of the application's `main` function.

-As mentioned above, _all_ traps will only trigger execution of the RTE's <<_default_rte_trap_handlers>>.
-To use application-specific handlers, which actually _handle_ a trap, the default handlers can be overridden
+As mentioned above, all traps will just trigger execution of the RTE's <<_default_rte_trap_handlers>> at first.
+To use application-specific handlers, which actually "handle" a trap, the default handlers can be overridden
 by installing user-defined ones:

-.Function Prototype: Installing an Application-Specific Trap Handler
+.Installing an Application-Specific Trap Handler (Function Prototype)
 [source,c]
 ----
 int neorv32_rte_handler_install(uint8_t id, void (*handler)(void));
 ----

 The first argument `id` defines the "trap ID" (for example a certain interrupt request) that shall be handled
-by the user-defined handler. The second argument `*handler` is the actual function that implements the trap
-handler. The function return zero on success and a non-zero value if an error occurred (invalid `id`). In this
-case no modifications to the RTE's trap look-up-table will be made.
+by the user-defined handler. These IDs are defined in `sw/lib/include/neorv32_rte.h`:

-The custom handler functions need to have a specific format without any arguments an with no return value:
+.RTE Trap Identifiers (cut-out)
+[source,c]
+----
+enum NEORV32_RTE_TRAP_enum {
+  RTE_TRAP_I_MISALIGNED =  0, /**< Instruction address misaligned */
+  RTE_TRAP_I_ACCESS     =  1, /**< Instruction (bus) access fault */
+  RTE_TRAP_I_ILLEGAL    =  2, /**< Illegal instruction */
+  RTE_TRAP_BREAKPOINT   =  3, /**< Breakpoint (EBREAK instruction) */
+  RTE_TRAP_L_MISALIGNED =  4, /**< Load address misaligned */
+  RTE_TRAP_L_ACCESS     =  5, /**< Load (bus) access fault */
+  RTE_TRAP_S_MISALIGNED =  6, /**< Store address misaligned */
+  RTE_TRAP_S_ACCESS     =  7, /**< Store (bus) access fault */
+  RTE_TRAP_UENV_CALL    =  8, /**< Environment call from user mode (ECALL instruction) */
+  RTE_TRAP_MENV_CALL    =  9, /**< Environment call from machine mode (ECALL instruction) */
+  RTE_TRAP_MSI          = 10, /**< Machine software interrupt */
+  RTE_TRAP_MTI          = 11, /**< Machine timer interrupt */
+  RTE_TRAP_MEI          = 12, /**< Machine external interrupt */
+  RTE_TRAP_FIRQ_0       = 13, /**< Fast interrupt channel 0 */
+  RTE_TRAP_FIRQ_1       = 14, /**< Fast interrupt channel 1 */
+  ...
+----

-.Function Prototype: Custom Trap Handler
+The second argument `*handler` is the actual function that implements the user-defined trap handler.
+The custom handler functions need to have a specific format without any arguments and with no return value:
+
+.Custom Trap Handler (Function Prototype)
 [source,c]
 ----
 void custom_trap_handler_xyz(void) {
@ -80,56 +97,14 @@ void custom_trap_handler_xyz(void) {

 .Custom Trap Handler Attributes
 [WARNING]
-Do NOT use the `((interrupt))` attribute for the application trap handler functions! This
+Do **NOT** use the `((interrupt))` attribute for the application trap handler functions! This
 will place a `mret` instruction to the end of it making it impossible to return to the first-level
 trap handler of the RTE core, which will cause stack corruption.

-The trap identifier `id` specifies the according trap cause. These can be an _asynchronous trap_ like
-an interrupt from one of the processor modules or a _synchronous trap_ triggered by software-caused events
-like an illegal instruction or an environment call instruction. The `sw/lib/include/neorv32_rte.h` library files
-provides aliases for trap events supported by the CPU (see <<_neorv32_trap_listing>>) that can be used when
-installing custom trap handler functions:
-
-.RTE Trap ID List
-[cols="<5,<12"]
-[options="header",grid="rows"]
-|=======================
-| ID alias [C] | Description / trap causing event
-| `RTE_TRAP_I_MISALIGNED` | instruction address misaligned
-| `RTE_TRAP_I_ACCESS`     | instruction (bus) access fault
-| `RTE_TRAP_I_ILLEGAL`    | illegal instruction
-| `RTE_TRAP_BREAKPOINT`   | breakpoint (`ebreak` instruction)
-| `RTE_TRAP_L_MISALIGNED` | load address misaligned
-| `RTE_TRAP_L_ACCESS`     | load (bus) access fault
-| `RTE_TRAP_S_MISALIGNED` | store address misaligned
-| `RTE_TRAP_S_ACCESS`     | store (bus) access fault
-| `RTE_TRAP_MENV_CALL`    | environment call from machine mode (`ecall` instruction)
-| `RTE_TRAP_UENV_CALL`    | environment call from user mode (`ecall` instruction)
-| `RTE_TRAP_MTI`          | machine timer interrupt
-| `RTE_TRAP_MEI`          | machine external interrupt
-| `RTE_TRAP_MSI`          | machine software interrupt
-| `RTE_TRAP_FIRQ_0`       | fast interrupt channel 0
-| `RTE_TRAP_FIRQ_1`       | fast interrupt channel 1
-| `RTE_TRAP_FIRQ_2`       | fast interrupt channel 2
-| `RTE_TRAP_FIRQ_3`       | fast interrupt channel 3
-| `RTE_TRAP_FIRQ_4`       | fast interrupt channel 4
-| `RTE_TRAP_FIRQ_5`       | fast interrupt channel 5
-| `RTE_TRAP_FIRQ_6`       | fast interrupt channel 6
-| `RTE_TRAP_FIRQ_7`       | fast interrupt channel 7
-| `RTE_TRAP_FIRQ_8`       | fast interrupt channel 8
-| `RTE_TRAP_FIRQ_9`       | fast interrupt channel 9
-| `RTE_TRAP_FIRQ_10`      | fast interrupt channel 10
-| `RTE_TRAP_FIRQ_11`      | fast interrupt channel 11
-| `RTE_TRAP_FIRQ_12`      | fast interrupt channel 12
-| `RTE_TRAP_FIRQ_13`      | fast interrupt channel 13
-| `RTE_TRAP_FIRQ_14`      | fast interrupt channel 14
-| `RTE_TRAP_FIRQ_15`      | fast interrupt channel 15
-|=======================
-
 The following example shows how to install a custom handler (`custom_mtime_irq_handler`) for handling
 the RISC-V machine timer (MTIME) interrupt:

-.Example: Installing the MTIME IRQ Handler
+.Installing an MTIME IRQ Handler
 [source,c]
 ----
 neorv32_rte_handler_install(RTE_TRAP_MTI, custom_mtime_irq_handler);
@ -144,10 +119,7 @@ and will re-install the <<_default_rte_trap_handlers>> for the specific trap.
 int neorv32_rte_handler_uninstall(uint8_t id);
 ----

-The argument `id` defines the identifier of the according trap that shall be un-installed. The function return zero
-on success and a non-zero value if an error occurred (invalid `id`). In this case no modifications to the RTE's trap
-look-up-table will be made.
-
+The argument `id` defines the identifier of the trap whose handler shall be un-installed.
 The following example shows how to un-install the custom handler `custom_mtime_irq_handler` from the
 RISC-V machine timer (MTIME) interrupt:

@ -160,19 +132,16 @@ neorv32_rte_handler_uninstall(RTE_TRAP_MTI);

 ==== Default RTE Trap Handlers

-The default RTE trap handlers are executed when a certain trap is triggered that is not (yet) handled by a user-defined
-application-specific trap handler. The default handler will output a message giving additional debug information
-via UART0 to inform the user and will try to resume normal program execution.
-
-.Fast Interrupts (FIRQs)
-[NOTE]
-The RTE default trap handlers will also clear the according <<_mip>> bit if an un-handled fast interrupt occurs.
+The default RTE trap handlers are executed when a certain trap is triggered that is not (yet) handled by an
+application-defined trap handler. The default handler will output a message giving additional debug information
+via the <<_primary_universal_asynchronous_receiver_and_transmitter_uart0>> to inform the user and it will also
+try to resume normal program execution. Some exemplary RTE outputs are shown below.

 .Continuing Execution
 [WARNING]
-In most cases the RTE can successfully continue operation - for example if it catches an **interrupt** request that is not handled
-by the actual application program. However, if the RTE catches an un-handled **trap** like a bus access fault
-continuing execution will most likely fail making the CPU crash.
+In most cases the RTE can successfully continue operation - for example if it catches an **interrupt** request
+that is not handled by the actual application program. However, if the RTE catches an un-handled **trap** like
+a bus access fault exception, continuing execution will most likely fail and crash the CPU.

 .RTE Default Trap Handler Output Examples
 [source]
@ -182,13 +151,14 @@ continuing execution will most likely fail making the CPU crash.
 <RTE> Load address misaligned @ PC=0x00000440, ADDR=0x80000101 </RTE> <3>
 <RTE> Fast IRQ 0x00000003 @ PC=0x00000820 </RTE> <4>
 ----
-<1> Illegal 32-bit instruction at address 0x000002d6.
-<2> Illegal 16-bit instruction at address 0x00000302.
-<3> Misaligned load access at address 0x00000440 (trying to load a full word from 0x80000101).
-<4> Fast interrupt request from channel 3 before executing instruction at address 0x00000820.
+<1> Illegal 32-bit instruction at address `0x000002d6`.
+<2> Illegal 16-bit instruction at address `0x00000302`.
+<3> Misaligned load access at address `0x00000440` (trying to load a full 32-bit word from address `0x80000101`).
+<4> Fast interrupt request from channel 3 before executing instruction at address `0x00000820`.

-The specific _message_ right at the beginning of the debug trap handler message corresponds to the trap code from the
-<<_mcause>> CSR (see <<_neorv32_trap_listing>>). A full list of all messages and the according `mcause` trap codes is shown below.
+The specific message right at the beginning of the debug trap handler message corresponds to the trap code
+obtained from the <<_mcause>> CSR (see <<_neorv32_trap_listing>>). A full list of all messages and the according
+`mcause` trap codes is shown below.

 .RTE Default Trap Handler Messages and According `mcause` Values
 [cols="<5,^5"]
@ -224,12 +194,12 @@ The specific _message_ right at the beginning of the debug trap handler message
 | "Fast IRQ 0x0000000d"            | `0x8000001d`
 | "Fast IRQ 0x0000000e"            | `0x8000001e`
 | "Fast IRQ 0x0000000f"            | `0x8000001f`
-| "Unknown trap cause"             | _unknown_
+| "Unknown trap cause"             | undefined
 |=======================
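
The relation between the message and the `mcause` value can be sketched in a few lines of C. This is an illustrative sketch only; the helper names are hypothetical and not part of the NEORV32 software library. It assumes the standard RISC-V convention that the most significant bit of `mcause` marks an interrupt, and it infers from the table rows above (for example "Fast IRQ 0x0000000f" with `0x8000001f`) that fast interrupt channel _n_ uses interrupt code 16 + _n_.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for decoding a 32-bit mcause value (illustration only). */

/* Bit 31 set: the trap is an interrupt; cleared: it is an exception. */
static inline bool mcause_is_interrupt(uint32_t mcause) {
  return (mcause >> 31) != 0u;
}

/* Trap code without the interrupt flag. */
static inline uint32_t mcause_code(uint32_t mcause) {
  return mcause & 0x7fffffffu;
}

/* Returns the fast interrupt channel (0..15), or -1 if mcause is not a FIRQ.
 * Assumption: FIRQ channel n maps to interrupt code 16 + n. */
static inline int mcause_firq_channel(uint32_t mcause) {
  uint32_t code = mcause_code(mcause);
  if (mcause_is_interrupt(mcause) && (code >= 16u) && (code <= 31u)) {
    return (int)(code - 16u);
  }
  return -1;
}
```

With these helpers, the table entry `0x8000001d` decodes to fast interrupt channel 13, matching the message "Fast IRQ 0x0000000d".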

 ===== Bus Access Faults

-For bus access faults the RTE default trap handlers also outputs an error code from the
+For bus access faults the RTE default trap handler also outputs the error code obtained from the
 <<_internal_bus_monitor_buskeeper>> to show the cause of the bus fault. One example is shown below.

 .RTE Default Trap Handler Output Example (Load Access Bus Fault)
@ -243,8 +213,7 @@ Three different messages are possible here:

 * `[TIMEOUT_ERR]`: The accessed memory-mapped module did not respond within the valid access time window.
 In most cases this is caused by accessing a module that has not been implemented or when accessing
-"address space holes" (unused/unmapped addresses).
-* `[DEVICE_ERR]`: The accesses memory-mapped module asserted it's error signal to indicate an invalid access.
-For example this can be caused by trying to write to read-only registers or by writing data quantities (like a byte)
-to devices that do not support sub-word write accesses.
-* `[PMP_ERR]`: This indicates an access right violation caused by the <<_pmp_physical_memory_protection>>.
+"address space holes" via unused/unmapped addresses (see section <<_bus_interface_protocol>>).
+* `[DEVICE_ERR]`: The accessed memory-mapped module asserted its error signal to indicate an invalid access.
+For example this can be caused by trying to write to read-only registers.
+* `[PMP_ERR]`: This indicates an access right violation caused by the <<_pmp_isa_extension>>.