[docs] update: Zalrsc -> Zaamo

2025-04-24 22:27:21 -04:00 · 2025-01-03 21:37:06 +01:00 · 2025-01-03 21:37:06 +01:00 · d65663e93d
commit d65663e93d
parent 69e82684eb
6 changed files with 65 additions and 109 deletions
--- a/README.md
+++ b/README.md
@@ -106,7 +106,7 @@ setup according to your needs. Note that all of the following SoC modules are en
 [[`B`](https://stnolting.github.io/neorv32/#_b_isa_extension)]
 [[`U`](https://stnolting.github.io/neorv32/#_u_isa_extension)]
 [[`X`](https://stnolting.github.io/neorv32/#_x_isa_extension)]
-[[`Zalrsc`](https://stnolting.github.io/neorv32/#_zalrsc_isa_extension)]
+[[`Zaamo`](https://stnolting.github.io/neorv32/#_zaamo_isa_extension)]
 [[`Zba`](https://stnolting.github.io/neorv32/#_zba_isa_extension)]
 [[`Zbb`](https://stnolting.github.io/neorv32/#_zbb_isa_extension)]
 [[`Zbkb`](https://stnolting.github.io/neorv32/#_zbkb_isa_extension)]
--- a/docs/datasheet/cpu.adoc
+++ b/docs/datasheet/cpu.adoc
@@ -415,7 +415,8 @@ always valid when set.
 | `rw`    |     1 | Access direction (`0` = read, `1` = write)
 | `src`   |     1 | Access source (`0` = instruction fetch, `1` = load/store)
 | `priv`  |     1 | Set if privileged (M-mode) access
-| `rvso`  |     1 | Set if current access is a reservation-set operation (`lr` or `sc` instruction, <<_zalrsc_isa_extension>>)
+| `amo`   |     1 | Set if current access is an atomic memory operation (<<_atomic_memory_access>>)
+| `amoop` |     4 | Type of atomic memory operation (<<_atomic_memory_access>>)
 3+^| **Out-Of-Band Signals**
 | `fence` |     1 | Data/instruction fence request; single-shot
 | `sleep` |     1 | Set if ALL upstream devices are in <<_sleep_mode>>
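To make the new `amo`/`amoop` request fields more concrete, here is a hypothetical C model of how a request for each `Zaamo` instruction could be formed. The struct `bus_req_model_t` and the function `amo_make_request` are illustrative assumptions and are not NEORV32 code. The `funct5` values are the standard RISC-V AMO encodings. The `amoop` values follow the "AMO Operation Type Encoding" table added in this commit, with its don't-care bits assumed to be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C view of the bus request fields relevant for AMOs. */
typedef struct {
    uint32_t addr;  /* access address */
    uint32_t data;  /* write data: an AMO request carries rs2 here */
    bool     rw;    /* 0 = read, 1 = write; AMOs are issued as "loads" */
    bool     amo;   /* set for atomic memory operations */
    uint8_t  amoop; /* 4-bit AMO operation type */
} bus_req_model_t;

/* Translate the funct5 field (instruction bits 31:27) of an RV32 AMO
 * instruction into a request. Don't-care amoop bits are set to 0
 * (assumption). Returns false for funct5 values outside Zaamo. */
bool amo_make_request(uint8_t funct5, uint32_t addr, uint32_t rs2, bus_req_model_t *req)
{
    assert(req);
    uint8_t op;
    switch (funct5) {
    case 0x01: op = 0x0; break; /* amoswap.w -> -000 */
    case 0x00: op = 0x1; break; /* amoadd.w  -> -001 */
    case 0x04: op = 0x2; break; /* amoxor.w  -> -010 */
    case 0x0C: op = 0x3; break; /* amoand.w  -> -011 */
    case 0x08: op = 0x4; break; /* amoor.w   -> -100 */
    case 0x18: op = 0x6; break; /* amominu.w -> 0110 */
    case 0x1C: op = 0x7; break; /* amomaxu.w -> 0111 */
    case 0x10: op = 0xE; break; /* amomin.w  -> 1110 */
    case 0x14: op = 0xF; break; /* amomax.w  -> 1111 */
    default:   return false;    /* e.g. lr.w (0x02) or sc.w (0x03) */
    }
    req->addr  = addr;
    req->data  = rs2;   /* operand for the read-modify-write */
    req->rw    = false; /* plain "load" from the CPU's point of view */
    req->amo   = true;
    req->amoop = op;
    return true;
}
```

For example, an `amoadd.w` with `rs2 = 7` targeting `0x100` yields a read request with `amo` set, `amoop = 0001`, and `data = 7`.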
@@ -463,36 +464,31 @@ additional latency). However, _all_ bus signals (request and response) need to b


 :sectnums:
-==== Atomic Accesses
+==== Atomic Memory Access

-The load-reservate (`lr.w`) and store-conditional (`sc.w`) instructions from the <<_zalrsc_isa_extension>> execute as standard
-load/store bus transactions but with the `rvso` ("reservation set operation") signal being set. It is the task of the
-<<_reservation_set_controller>> to handle these LR/SC bus transactions accordingly. Note that these reservation set operations
-are intended for processor-internal usage only (i.e. the reservation state is not available for processor-external modules yet).
+The <<_zaamo_isa_extension>> adds atomic read-modify-write memory operations. Since the <<_bus_interface_protocol>>
+only supports plain read or write transfers, atomic memory requests are handled by a dedicated module of the bus
+infrastructure: the <<_atomic_memory_operations_controller>>.

-.Reservation Set Controller
-[NOTE]
-See section <<_address_space>> / <<_reservation_set_controller>> for more information.
+From the CPU's point of view, an atomic memory access is handled as a plain "load" operation, but with the `amo` signal
+set and with write data provided as well (see <<_bus_interface>>). The `amoop` signal defines the actual atomic
+operation:

-The figure below shows three exemplary bus accesses (1 to 3 from left to right). The `req` signal record represents
-the CPU-side of the bus interface. For easier understanding the current state of the reservation set is added as `rvs_valid` signal.
-
-[start=1]
-. A load-reservate (LR) instruction using `addr` as address. This instruction returns the loaded data `rdata` via `rsp.data`
-and also registers a reservation for the address `addr` (`rvs_valid` becomes set).
-. A store-conditional (SC) instruction attempts to write `wdata1` to address `addr`. This SC operation **succeeds**, so
-`wdata1` is actually written to address `addr`. The successful operation is indicated by a **0** being returned via
-`rsp.data` together with `ack`. As the LR/SC is completed the registered reservation is invalidated (`rvs_valid` becomes cleared).
-. Another store-conditional (SC) instruction attempts to write `wdata2` to address `addr`. As the reservation set is already
-invalidated (`rvs_valid` is `0`) the store access fails, so `wdata2` is **not** written to address `addr` at all. The failed
-operation is indicated by a **1** being returned via `rsp.data` together with `ack`.
-
-.Three Exemplary LR/SC Bus Transactions (showing only in-band signals)
-image::bus_interface_atomic.png[700]
-
-.Store-Conditional Status
-[NOTE]
-The "normal" load data mechanism is used to return success/failure of the `sc.w` instruction to the CPU (via the LSB of `rsp.data`).
+.AMO Operation Type Encoding
+[cols="<1,<4"]
+[options="header",grid="rows"]
+|=======================
+| `bus_req_t.amoop` (`-` = don't care) | Description
+| `-000` | swap
+| `-001` | unsigned add
+| `-010` | bitwise xor
+| `-011` | bitwise and
+| `-100` | bitwise or
+| `0110` | unsigned minimum
+| `0111` | unsigned maximum
+| `1110` | signed minimum
+| `1111` | signed maximum
+|=======================

 .Cache Coherency
 [IMPORTANT]
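The `amoop` encoding table can also be written as a small C reference model. This can be handy when checking a device that has to process AMO requests. This is a minimal sketch: `amo_compute` is a hypothetical name, and treating the unlisted pattern `-101` as invalid is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model of the bus_req_t.amoop encoding table.
 * Bits 2:0 select the operation; bit 3 selects signed (1) or unsigned (0)
 * comparison and only matters for min/max. */
bool amo_compute(uint8_t amoop, uint32_t mem, uint32_t wdata, uint32_t *result)
{
    assert(result);
    const bool is_signed = (amoop & 0x8u) != 0u;
    /* Two's-complement views of the operands for signed min/max. */
    const int32_t smem   = (int32_t)mem;
    const int32_t swdata = (int32_t)wdata;

    switch (amoop & 0x7u) {
    case 0x0: *result = wdata;       return true; /* swap */
    case 0x1: *result = mem + wdata; return true; /* add, wraps modulo 2^32 */
    case 0x2: *result = mem ^ wdata; return true; /* xor */
    case 0x3: *result = mem & wdata; return true; /* and */
    case 0x4: *result = mem | wdata; return true; /* or */
    case 0x6: /* minimum */
        *result = is_signed ? ((smem < swdata) ? mem : wdata)
                            : ((mem < wdata) ? mem : wdata);
        return true;
    case 0x7: /* maximum */
        *result = is_signed ? ((smem > swdata) ? mem : wdata)
                            : ((mem > wdata) ? mem : wdata);
        return true;
    default: /* -101 is not listed in the table (assumed invalid) */
        return false;
    }
}
```

Bit 3 matters only for the comparison cases. With `mem = 0xFFFFFFFF` and `wdata = 1`, `0110` (unsigned minimum) selects `1`, while `1110` (signed minimum) selects `0xFFFFFFFF`, because that value is -1 when interpreted as signed.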
@@ -521,7 +517,7 @@ This chapter gives a brief overview of all available ISA extensions.
 | <<_m_isa_extension,`M`>>               | Integer multiplication and division instructions              | <<_processor_top_entity_generics, `RISCV_ISA_M`>>
 | <<_u_isa_extension,`U`>>               | Less-privileged _user_ mode extension                         | <<_processor_top_entity_generics, `RISCV_ISA_U`>>
 | <<_x_isa_extension,`X`>>               | Platform-specific / NEORV32-specific extension                | Always enabled
-| <<_zalrsc_isa_extension,`Zalrsc`>>     | Atomic reservation-set instructions                           | <<_processor_top_entity_generics, `RISCV_ISA_Zalrsc`>>
+| <<_zaamo_isa_extension,`Zaamo`>>       | Atomic memory operations                                      | <<_processor_top_entity_generics, `RISCV_ISA_Zaamo`>>
 | <<_zba_isa_extension,`Zba`>>           | Shifted-add bit manipulation instructions                     | <<_processor_top_entity_generics, `RISCV_ISA_Zba`>>
 | <<_zbb_isa_extension,`Zbb`>>           | Basic bit manipulation instructions                           | <<_processor_top_entity_generics, `RISCV_ISA_Zbb`>>
 | <<_zbkb_isa_extension,`Zbkb`>>         | Scalar cryptographic bit manipulation instructions            | <<_processor_top_entity_generics, `RISCV_ISA_Zbkb`>>
@@ -689,37 +685,23 @@ RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.
 * There are <<_neorv32_specific_csrs>>.


-==== `Zalrsc` ISA Extension
+==== `Zaamo` ISA Extension

-The `Zalrsc` ISA extension is a sub-extension of the RISC-V _atomic memory access_ (`A`) ISA extension and includes
-instructions for reservation-set operations (load-reservate `lr` and store-conditional `sc`) only.
-It is enabled by the top's <<_processor_top_entity_generics, `RISCV_ISA_Zalrsc`>> generic.
-
-.AMO / `A` Emulation
-[NOTE]
-The atomic memory access / read-modify-write operations of the `A` ISA extension can be emulated using the
-LR and SC operations (quote from the RISC-V spec.: "_Any AMO can be emulated by an LR/SC pair._").
-The NEORV32 <<_core_libraries>> provide an emulation wrapper for emulating AMO/read-modify-write instructions that is
-based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.
+The `Zaamo` ISA extension is a sub-extension of the RISC-V `A` ISA extension and comprises instructions for read-modify-write
+<<_atomic_memory_access>> operations. It is enabled by the top's <<_processor_top_entity_generics, `RISCV_ISA_Zaamo`>> generic.

 .Instructions and Timing
-[cols="<2,<4,<3"]
+[cols="<2,<4,<1"]
 [options="header", grid="rows"]
 |=======================
 | Class | Instructions | Execution cycles
-| Load-reservate word    | `lr.w` | 5
-| Store-conditional word | `sc.w` | 5
+| Atomic memory operations | `amoswap.w` `amoadd.w` `amoand.w` `amoor.w` `amoxor.w` `amomax[u].w` `amomin[u].w` | 5 + 2 * _memory_latency_
 |=======================

 .`aq` and `rl` Bits
 [NOTE]
 The instruction word's `aq` and `lr` memory ordering bits are not evaluated by the hardware at all.

-.Atomic Memory Access on Hardware Level
-[NOTE]
-More information regarding the atomic memory accesses and the according reservation
-sets can be found in section <<_reservation_set_controller>>.
-

 ==== `Zifencei` ISA Extension

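`Zaamo` provides read-modify-write operations but no LR/SC pair, so compare-and-swap is not available. A lock can, however, be built on swap alone. The following is a minimal test-and-set sketch using C11 atomics. All names are hypothetical. It rests on two assumptions: that the toolchain lowers the 32-bit `atomic_exchange_explicit` to `amoswap.w` for an RV32 target with `Zaamo` (check the disassembly), and that the lock word is not held in the data cache. AMOs bypass the cache while the release is a normal store, so for cached memory a `fence` would be needed, as described in the Cache Coherency notes of this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock that needs only an atomic swap. */
typedef struct {
    atomic_uint locked; /* 0 = free, 1 = taken */
} amo_lock_t;

void amo_lock_init(amo_lock_t *l)
{
    atomic_init(&l->locked, 0u);
}

/* Swap in 1 and inspect the old value: 0 means this caller got the lock.
 * Assumed to map to amoswap.w on an RV32 target with Zaamo. */
bool amo_lock_try(amo_lock_t *l)
{
    return atomic_exchange_explicit(&l->locked, 1u, memory_order_acquire) == 0u;
}

void amo_lock_acquire(amo_lock_t *l)
{
    while (!amo_lock_try(l)) {
        /* spin until the current holder releases the lock */
    }
}

/* Releasing needs no read-modify-write: a plain store of 0 suffices. */
void amo_lock_release(amo_lock_t *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1u); /* must be held */
    atomic_store_explicit(&l->locked, 0u, memory_order_release);
}
```

The memory orders are kept for portability. Any `aq`/`rl` bits they set in the AMO instruction word are ignored by the hardware, as noted in this section.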
--- a/docs/datasheet/soc.adoc
+++ b/docs/datasheet/soc.adoc
@@ -226,7 +226,7 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
 | `RISCV_ISA_E`           | boolean   | false         | Enable <<_e_isa_extension>> (reduced register file size).
 | `RISCV_ISA_M`           | boolean   | false         | Enable <<_m_isa_extension>> (hardware-based integer multiplication and division).
 | `RISCV_ISA_U`           | boolean   | false         | Enable <<_u_isa_extension>> (less-privileged user mode).
-| `RISCV_ISA_Zalrsc`      | boolean   | false         | Enable <<_zalrsc_isa_extension>> (atomic reservation-set operations).
+| `RISCV_ISA_Zaamo`       | boolean   | false         | Enable <<_zaamo_isa_extension>> (atomic memory operations).
 | `RISCV_ISA_Zba`         | boolean   | false         | Enable <<_zba_isa_extension>> (shifted-add bit-manipulation instructions).
 | `RISCV_ISA_Zbb`         | boolean   | false         | Enable <<_zbb_isa_extension>> (basic bit-manipulation instructions).
 | `RISCV_ISA_Zbkb`        | boolean   | false         | Enable <<_zbkb_isa_extension>> (scalar cryptography bit manipulation instructions).
@@ -576,67 +576,41 @@ explicit specific processor generic. See section <<_processor_external_bus_inter


 :sectnums:
-==== Reservation Set Controller
+==== Atomic Memory Operations Controller

-The reservation set controller is responsible for handling the load-reservate and store-conditional bus transaction that
-are triggered by the `lr.w` (LR) and `sc.w` (SC) instructions from the CPU's <<_zalrsc_isa_extension>>.
+The atomic memory operations (AMO) controller is responsible for handling the read-modify-write operations issued by the
+CPU's <<_zaamo_isa_extension>>. For each AMO request, the controller executes an atomic set of three operations:

-A "reservation" defines an address or address range that provides a guarding mechanism to support atomic accesses. A new
-reservation is registered by the LR instruction. The address provided by this instruction defines the memory location
-that is now monitored for atomic accesses. The according SC instruction evaluates the state of this reservation. If
-the reservation is still valid the write access triggered by the SC instruction is finally executed and the instruction
-return a "success" state (`rd` = 0). If the reservation has been invalidated the SC instruction will not write to memory
-and will return a "failed" state (`rd` = 1).
+.Simplified AMO Controller Operation
+[cols="^1,<3,<6"]
+[options="header",grid="rows"]
+|=======================
+| Step | Pseudo Code | Description
+| 1    | `tmp1 <= MEM[address];` | Perform a read operation accessing the addressed memory
+cell and store the loaded data into an internal buffer (`tmp1`).
+| 2    | `tmp2 <= tmp1 OP cpu_wdata;` | The buffered data from the first step is processed
+using the write data provided by the CPU. The result is stored in another internal buffer (`tmp2`).
+| 3    | `MEM[address] <= tmp2;` `cpu_rdata <= tmp1;` | The data from the second buffer (`tmp2`) is
+written to the addressed memory cell. In parallel, the data from the first buffer (`tmp1` = original
+content of the addressed memory cell) is sent back to the requesting CPU.
+|=======================

-.Reservation Set(s) and Granule
-[NOTE]
-The reservation set controller supports only **a single** global reservation set with a **word-aligned 4-byte granule**.
+The controller performs two bus transactions: a read operation and a write operation. Only the acknowledge/error
+handshake of the last (write) transaction is sent back to the CPU.

-The reservation is invalidated if...
-
-* an SC instruction is executed that accesses an address **outside** of the reservation set of the previous LR instruction.
-This SC instruction will **fail** (not writing to memory).
-* an SC instruction is executed that accesses an address **inside** of the reservation set of the previous LR instruction.
-This SC instruction will **succeed** (finally writing to memory).
-* a normal store operation accesses an address **inside** of the current reservation set (by the CPU or by the DMA).
-* a hardware reset is triggered.
-
-.Consecutive LR Instructions
-[NOTE]
-If an LR instruction is followed by another LR instruction the reservation set of the former one is overridden
-by the reservation set of the latter one.
-
-.Bus Access Errors
-[IMPORTANT]
-If the LR operation causes a bus access error (raising a load access exception) the reservation **is registered anyway**.
-If the SC operation causes a bus access error (raising a store access exception) an already registered reservation set
-**is invalidated anyway**.
-
-.Strong Semantic
-[IMPORTANT]
-The LR/SC mechanism follows the _strong semantic_ approach: the LR/SC instruction pair fails only if there is a write
-access to the referenced memory location between the LR and SC instructions (by the CPU itself or by the DMA).
-Context changes, interrupts, traps, etc. do not effect nor invalidate the reservation state at all.
+As the AMO controller is the memory-nearest instance (see <<_bus_system>>), the previously described sequence of operations
+cannot be interrupted. Hence, it executes atomically.

 .Physical Memory Attributes
 [NOTE]
-The reservation set can be set for _any_ address (only constrained by the configured granularity). This also
-includes cached memory, memory-mapped IO devices and processor-external address spaces.
-
-Bus transactions triggered by the LR instruction register a new reservation set and are delegated to the adressed
-memory/device. Bus transactions triggered by the SC remove a reservation set and are forwarded to the adressed
-memory/device only if the SC operations succeeds. Otherwise, the access request is not forwarded and a local ACK is
-generated to terminate the bus transaction.
-
-.LR/SC Bus Protocol
-[NOTE]
-More information regarding the LR/SC bus transactions and the the according protocol can be found in section
-<<_bus_interface>> / <<_atomic_accesses>>.
+Atomic memory operations can be executed for _any_ address. This also includes
+cached memory, memory-mapped IO devices and processor-external address spaces.

 .Cache Coherency
 [IMPORTANT]
-Atomic operations **always bypass** the cache using direct/uncached accesses. Care must be taken
-to maintain data cache coherency (e.g. by using the `fence` instruction).
+Atomic operations **always bypass** the CPU's <<_processor_internal_data_cache_dcache, data cache>>
+using direct/uncached accesses. Care must be taken to maintain data cache coherency when accessing
+cached memory (e.g. by using the `fence` instruction).


 :sectnums:
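The three-step table of the AMO controller can be modeled in C to show the data flow. This is a minimal sketch under stated assumptions. Memory is modeled as a word array, the `OP` of step 2 is passed as a function pointer, and a natural (word-aligned) address is assumed. All names are hypothetical. The model is single-threaded, so it does not reproduce the atomicity itself, which in hardware comes from the controller's position in the bus system. It does show that each request costs one read and one write transaction, which is presumably where the `2 * _memory_latency_` term in the `Zaamo` timing table comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 2 operation: combines the loaded data with the CPU's write data. */
typedef uint32_t (*amo_op_fn)(uint32_t mem_data, uint32_t cpu_wdata);

/* Hypothetical controller model. */
typedef struct {
    uint32_t *mem;      /* modeled memory, one entry per 32-bit word */
    size_t    words;    /* number of modeled words */
    unsigned  bus_txns; /* bus transactions issued (read + write) */
} amo_ctrl_t;

/* Executes one AMO request and returns the data sent back to the CPU. */
uint32_t amo_ctrl_execute(amo_ctrl_t *c, uint32_t address, uint32_t cpu_wdata, amo_op_fn op)
{
    const size_t idx = (size_t)(address >> 2);
    assert((address & 3u) == 0u && idx < c->words); /* word-aligned, in range */

    uint32_t tmp1 = c->mem[idx];          /* step 1: read memory into tmp1 */
    c->bus_txns++;
    uint32_t tmp2 = op(tmp1, cpu_wdata);  /* step 2: tmp2 = tmp1 OP cpu_wdata */
    c->mem[idx] = tmp2;                   /* step 3: write tmp2 back ... */
    c->bus_txns++;
    return tmp1;                          /* ... and return the original data */
}

uint32_t amo_op_swap(uint32_t mem_data, uint32_t cpu_wdata)
{
    (void)mem_data; /* swap ignores the old value */
    return cpu_wdata;
}

uint32_t amo_op_add(uint32_t mem_data, uint32_t cpu_wdata)
{
    return mem_data + cpu_wdata;
}
```

For example, with `mem[1] = 10`, an add request to address `4` with write data `5` returns `10` to the CPU, leaves `15` in memory, and counts two bus transactions.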
--- a/docs/datasheet/soc_dcache.adoc
+++ b/docs/datasheet/soc_dcache.adoc
@@ -19,7 +19,7 @@
 **Overview**

 The processor features an optional data cache to improve performance when using memories with high
-access latencies. The cache is connected directly to the CPU's data access interface and provides
+access latency. The cache is connected directly to the CPU's data access interface and provides
 full-transparent accesses. The cache is direct-mapped and uses "write-allocate" and "write-back" strategies.

 .Cached/Uncached Accesses
@@ -28,8 +28,8 @@ The data cache provides direct accesses (= uncached) to memory in order to acces
 processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
 will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
 cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
-progress / data coherency. Furthermore, atomic load-reservate and store-conditional instructions (<<_zalrsc_isa_extension>>)
-will always **bypass** the cache.
+progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
+always **bypass** the cache.

 .Caching Internal Memories
 [NOTE]
--- a/docs/datasheet/soc_icache.adoc
+++ b/docs/datasheet/soc_icache.adoc
@@ -19,7 +19,7 @@
 **Overview**

 The processor features an optional instruction cache to improve performance when using memories with high
-access latencies. The cache is connected directly to the CPU's instruction fetch interface and provides
+access latency. The cache is connected directly to the CPU's instruction fetch interface and provides
 full-transparent accesses. The cache is direct-mapped and read-only.

 .Cached/Uncached Accesses
@@ -28,8 +28,8 @@ The data cache provides direct accesses (= uncached) to memory in order to acces
 processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
 will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
 cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
-progress / data coherency. Furthermore, atomic load-reservate and store-conditional instructions (<<_zalrsc_isa_extension>>)
-will always **bypass** the cache.
+progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
+always **bypass** the cache.

 .Caching Internal Memories
 [NOTE]
--- a/docs/datasheet/soc_xbus.adoc
+++ b/docs/datasheet/soc_xbus.adoc
@@ -140,5 +140,5 @@ The data cache provides direct accesses (= uncached) to memory in order to acces
 All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
 will not be cached at all (see section <<_address_space>>). Direct/uncached accesses have **lower** priority than
 cache block operations to allow continuous burst transfer and also to maintain logical instruction forward
-progress / data coherency. Furthermore, atomic load-reservate and store-conditional instructions (<<_zalrsc_isa_extension>>)
-will always **bypass** the cache.
+progress / data coherency. Furthermore, the atomic memory operations of the <<_zaamo_isa_extension>> will
+always **bypass** the cache.