mirror of
https://github.com/stnolting/neorv32.git
synced 2025-04-24 22:27:21 -04:00
Update CFU example: use XTEA as "real world" demo application (#855)
This commit is contained in:
commit
c6fa219f5d
5 changed files with 368 additions and 255 deletions
|
@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12
|
|||
|
||||
| Date | Version | Comment | Link |
|
||||
|:----:|:-------:|:--------|:----:|
|
||||
| 18.03.2024 | 1.9.6.9 | :sparkles: update CFU example: now implementing the Extended Tiny Encryption Algorithm (XTEA) | [#855](https://github.com/stnolting/neorv32/pull/855) |
|
||||
| 16.03.2024 | 1.9.6.8 | rework cache system: L1 + L2 caches, all based on the generic cache component | [#853](https://github.com/stnolting/neorv32/pull/853) |
|
||||
| 16.03.2024 | 1.9.6.7 | cache optimizations: add read-only option, add option to disable direct/uncached accesses | [#851](https://github.com/stnolting/neorv32/pull/851) |
|
||||
| 15.03.2024 | 1.9.6.6 | :warning: clean-up configuration generics (remove XBUS endianness configuration; refine JEDED/VENDORID configuration); rearrange SYSINFO.SOC bits | [#850](https://github.com/stnolting/neorv32/pull/850) |
|
||||
|
|
|
@ -2,8 +2,8 @@
|
|||
:sectnums:
|
||||
=== Custom Functions Unit (CFU)
|
||||
|
||||
The Custom Functions Unit is the central part of the <<_zxcfu_isa_extension>> and represents
|
||||
the actual hardware module, which can be used to implement _custom RISC-V instructions_.
|
||||
The Custom Functions Unit (CFU) is the central part of the NEORV32-specific <<_zxcfu_isa_extension>> and
|
||||
represents the actual hardware module that can be used to implement **custom RISC-V instructions**.
|
||||
|
||||
The CFU is intended for operations that are inefficient in terms of performance, latency, energy consumption or
|
||||
program memory requirements when implemented entirely in software. Some potential application fields and exemplary
|
||||
|
@ -13,23 +13,27 @@ use-cases might include:
|
|||
* **Cryptographic:** bit substitution and permutation
|
||||
* **Communication:** conversions like binary to gray-code; multiply-add operations
|
||||
* **Image processing:** look-up-tables for color space transformations
|
||||
* implementing instructions from **other RISC-V ISA extensions** that are not yet supported by the NEORV32
|
||||
* implementing instructions from **other RISC-V ISA extensions** that are not yet supported by NEORV32
|
||||
|
||||
[NOTE]
|
||||
The CFU is not intended for complex and _CPU-independent_ functional units that implement complete accelerators
|
||||
The CFU is not intended for complex and **CPU-independent** functional units that implement complete accelerators
|
||||
(like block-based AES encryption). These kind of accelerators should be implemented as memory-mapped
|
||||
<<_custom_functions_subsystem_cfs>>. A comparison of all NEORV32-specific chip-internal hardware extension
|
||||
options is provided in the user guide section
|
||||
https://stnolting.github.io/neorv32/ug/#_adding_custom_hardware_modules[Adding Custom Hardware Modules].
|
||||
|
||||
.Default CFU Hardware Example
|
||||
[TIP]
|
||||
The default CFU module (`rtl/core/neorv32_cpu_cp_cfu.vhd`) implements the _Extended Tiny Encryption Algorithm (XTEA)_
|
||||
as "real world" application example.
|
||||
|
||||
|
||||
:sectnums:
|
||||
==== CFU Instruction Formats
|
||||
|
||||
The custom instructions executed by the CFU utilize a specific opcode space in the `rv32` 32-bit instruction
|
||||
space that has been explicitly reserved for user-defined extensions by the RISC-V specifications ("Guaranteed
|
||||
Non-Standard Encoding Space"). The NEORV32 CFU uses the `custom` opcodes to identify the instructions implemented
|
||||
by the CFU and to differentiate between the different instruction formats. The according binary encoding of these
|
||||
encoding space that has been explicitly reserved for user-defined extensions by the RISC-V specifications ("Guaranteed
|
||||
Non-Standard Encoding Space"). The NEORV32 CFU uses the `custom-*` opcodes to identify the instructions implemented
|
||||
by the CFU and to differentiate between the available instruction formats. The according binary encoding of these
|
||||
opcodes is shown below:
|
||||
|
||||
* `custom-0`: `0001011` RISC-V standard, used for <<_cfu_r3_type_instructions>>
|
||||
|
@ -44,9 +48,10 @@ opcodes is shown below:
|
|||
The R3-type CFU instructions operate on two source registers `rs1` and `rs2` and return the processing result to
|
||||
the destination register `rd`. The actual operation can be defined by using the `funct7` and `funct3` bit fields.
|
||||
These immediates can also be used to pass additional data to the CFU like offsets, look-up-tables addresses or
|
||||
shift-amounts. However, the actual functionality is entirely user-defined.
|
||||
shift-amounts. However, the actual functionality is entirely user-defined. Note that all immediate values are
|
||||
always compile-time-static.
|
||||
|
||||
Example operation: `rd <= rs1 xnor rs2`
|
||||
Example operation: `rd <= rs1 xnor rs2` (bit-wise XNOR)
|
||||
|
||||
.CFU R3-type instruction format
|
||||
image::cfu_r3type_instruction.png[align=center]
|
||||
|
@ -74,9 +79,10 @@ R3-type instructions can be implemented (7-bit + 3-bit = 10 bit -> 1024 differen
|
|||
The R4-type CFU instructions operate on three source registers `rs1`, `rs2` and `rs2` and return the processing
|
||||
result to the destination register `rd`. The actual operation can be defined by using the `funct3` bit field.
|
||||
Alternatively, this immediate can also be used to pass additional data to the CFU like offsets, look-up-tables
|
||||
addresses or shift-amounts. However, the actual functionality is entirely user-defined.
|
||||
addresses or shift-amounts. However, the actual functionality is entirely user-defined. Note that all immediate
|
||||
values are always compile-time-static.
|
||||
|
||||
Example operation: `rd <= (rs1 * rs2 + rs3)[31:0]`
|
||||
Example operation: `rd <= (rs1 * rs2 + rs3)[31:0]` (multiply-and-accumulate; "MAC")
|
||||
|
||||
.CFU R4-type instruction format
|
||||
image::cfu_r4type_instruction.png[align=center]
|
||||
|
@ -111,9 +117,9 @@ The R5-type CFU instructions operate on four source registers `rs1`, `rs2`, `rs3
|
|||
processing result to the destination register `rd`. As all bits of the instruction word are used to encode the
|
||||
five registers and the opcode, no further immediate bits are available to specify the actual operation. There
|
||||
are two different R5-type instruction with two different opcodes available. Hence, only two R5-type operations
|
||||
can be implemented out of the box.
|
||||
can be implemented by default.
|
||||
|
||||
Example operation: `rd <= rs1 & rs2 & rs3 & rs4`
|
||||
Example operation: `rd <= rs1 & rs2 & rs3 & rs4` (bit-wise AND of 4 operands)
|
||||
|
||||
.CFU R5-type instruction A format
|
||||
image::cfu_r5type_instruction_a.png[align=center]
|
||||
|
@ -207,7 +213,6 @@ neorv32_cpu_csr_write(CSR_CFUREG0, 0xabcdabcd); // write data to CFU CSR 0
|
|||
uint32_t tmp = neorv32_cpu_csr_read(CSR_CFUREG3); // read data from CFU CSR 3
|
||||
----
|
||||
|
||||
|
||||
.Additional CFU-internal CSRs
|
||||
[TIP]
|
||||
If more than four CFU-internal CSRs are required the designer can implement an "indirect access mechanism" based
|
||||
|
@ -215,35 +220,35 @@ on just two of the default CSRs: one CSR is used to configure the index while th
|
|||
data with the indexed CFU-internal CSR - this concept is similar to the RISC-V Indirect CSR Access Extension
|
||||
Specification (`Smcsrind`).
|
||||
|
||||
.Security Considerations
|
||||
[NOTE]
|
||||
The CFU CSRs are mapped to the user-mode CSR space so software running at _any privilege level_ can access these
|
||||
CSRs. However, accesses can be constrained to certain privilege level (see <<_custom_instructions_hardware>>).
|
||||
|
||||
|
||||
:sectnums:
|
||||
==== Custom Instructions Hardware
|
||||
|
||||
The actual functionality of the CFU's custom instructions is defined by the user-defined logic inside
|
||||
the CFU hardware module `rtl/core/neorv32_cpu_cp_cfu.vhd`.
|
||||
the CFU hardware module `rtl/core/neorv32_cpu_cp_cfu.vhd`. This file is highly commented to illustrate the
|
||||
hardware design considerations.
|
||||
|
||||
CFU operations can be entirely combinatorial (like bit-reversal) so the result is available at the end of
|
||||
the current clock cycle. Operations can also take several clock cycles to complete (like multiplications)
|
||||
and may also include internal states and memories. The CFU's internal control unit takes care of
|
||||
interfacing the custom user logic to the CPU pipeline.
|
||||
|
||||
.CFU Hardware Example & More Details
|
||||
[TIP]
|
||||
The default CFU hardware module already implement some exemplary instructions that are used for illustration
|
||||
by the CFU example program. See the CFU's VHDL source file (`rtl/core/neorv32_cpu_cp_cfu.vhd`), which
|
||||
is highly commented to explain the available signals, implementation options and the handshake with the CPU pipeline.
|
||||
|
||||
.CFU Hardware Resource Requirements
|
||||
[NOTE]
|
||||
Enabling the CFU and actually implementing R4-type and/or R5-type instructions (or more precisely, using
|
||||
the according operands for the CFU hardware) will add one or two, respectively, additional read ports to
|
||||
the core's register file significantly increasing resource requirements.
|
||||
|
||||
.CFU Access
|
||||
.CFU Access Privilege Levels
|
||||
[NOTE]
|
||||
The CFU is accessible from all privilege modes (including CFU-internal registers accessed via the indirects CSR
|
||||
access mechanism). It is the task of the CFU designers to add according access-constraining logic if certain CFU
|
||||
states shall not be exposed to all privilege levels (i.e. exncryption keys).
|
||||
states shall not be exposed to all privilege levels (i.e. encryption keys).
|
||||
|
||||
.CFU Execution Time
|
||||
[NOTE]
|
||||
|
|
|
@ -1,5 +1,5 @@
|
|||
-- #################################################################################################
|
||||
-- # << NEORV32 CPU - Co-Processor: Custom (Instructions) Functions Unit >> #
|
||||
-- # << NEORV32 CPU - Co-Processor: Custom (RISC-V Instructions) Functions Unit (CFU) >> #
|
||||
-- # ********************************************************************************************* #
|
||||
-- # For custom/user-defined RISC-V instructions (R3-type, R4-type and R5-type formats). See the #
|
||||
-- # CPU's documentation for more information. Also take a look at the "software-counterpart" of #
|
||||
|
@ -67,7 +67,7 @@ end neorv32_cpu_cp_cfu;
|
|||
|
||||
architecture neorv32_cpu_cp_cfu_rtl of neorv32_cpu_cp_cfu is
|
||||
|
||||
-- CFU Control - do not modify! ----------------------------
|
||||
-- CFU Control ---------------------------------------------
|
||||
-- ------------------------------------------------------------
|
||||
type control_t is record
|
||||
busy : std_ulogic; -- CFU is busy
|
||||
|
@ -85,29 +85,39 @@ architecture neorv32_cpu_cp_cfu_rtl of neorv32_cpu_cp_cfu is
|
|||
constant r5typeA_c : std_ulogic_vector(1 downto 0) := "10"; -- R5-type instruction A (custom-2 opcode)
|
||||
constant r5typeB_c : std_ulogic_vector(1 downto 0) := "11"; -- R5-type instruction B (custom-3 opcode)
|
||||
|
||||
|
||||
-- User-Defined Logic --------------------------------------
|
||||
-- ------------------------------------------------------------
|
||||
-- multiply-add unit (r4-type instruction example) --
|
||||
type madd_t is record
|
||||
sreg : std_ulogic_vector(2 downto 0); -- 3 cycles latency = 3 bits in arbitration shift register
|
||||
done : std_ulogic;
|
||||
--
|
||||
opa : std_ulogic_vector(XLEN-1 downto 0);
|
||||
opb : std_ulogic_vector(XLEN-1 downto 0);
|
||||
opc : std_ulogic_vector(XLEN-1 downto 0);
|
||||
mul : std_ulogic_vector(2*XLEN-1 downto 0);
|
||||
res : std_ulogic_vector(2*XLEN-1 downto 0);
|
||||
end record;
|
||||
signal madd : madd_t;
|
||||
-- xtea instructions (funct3 bit-field) --
|
||||
constant xtea_enc_v0_c : std_ulogic_vector(2 downto 0) := "000";
|
||||
constant xtea_enc_v1_c : std_ulogic_vector(2 downto 0) := "001";
|
||||
constant xtea_dec_v0_c : std_ulogic_vector(2 downto 0) := "010";
|
||||
constant xtea_dec_v1_c : std_ulogic_vector(2 downto 0) := "011";
|
||||
constant xtea_init_c : std_ulogic_vector(2 downto 0) := "100";
|
||||
|
||||
-- custom control and status registers (CSRs) --
|
||||
signal cfu_csr_0, cfu_csr_1 : std_ulogic_vector(XLEN-1 downto 0);
|
||||
-- xtea round-key adjusting --
|
||||
constant xtea_delta_c : std_ulogic_vector(31 downto 0) := x"9e3779b9";
|
||||
|
||||
-- xtea key storage (accessed via CFU CSRs) --
|
||||
type key_mem_t is array (0 to 3) of std_ulogic_vector(31 downto 0);
|
||||
signal key_mem : key_mem_t;
|
||||
|
||||
-- xtea processing logic --
|
||||
type xtea_t is record
|
||||
done : std_ulogic_vector(1 downto 0); -- multi-cycle operation SREG
|
||||
opa : std_ulogic_vector(31 downto 0); -- input operand a
|
||||
opb : std_ulogic_vector(31 downto 0); -- input operand b
|
||||
sum : std_ulogic_vector(31 downto 0); -- round key buffer
|
||||
res : std_ulogic_vector(31 downto 0); -- operation results
|
||||
end record;
|
||||
signal xtea : xtea_t;
|
||||
|
||||
-- xtea helper --
|
||||
signal tmp_a, tmp_b, tmp_x, tmp_y, tmp_z, tmp_r : std_ulogic_vector(31 downto 0);
|
||||
|
||||
begin
|
||||
|
||||
-- **************************************************************************************************************************
|
||||
-- This controller is required to handle the CFU <-> CPU interface. Do not modify!
|
||||
-- This controller is required to handle the CFU <-> CPU interface.
|
||||
-- **************************************************************************************************************************
|
||||
|
||||
-- CFU Controller -------------------------------------------------------------------------
|
||||
|
@ -122,13 +132,13 @@ begin
|
|||
control.busy <= '0';
|
||||
elsif rising_edge(clk_i) then
|
||||
res_o <= (others => '0'); -- default; all CPU co-processor outputs are logically OR-ed
|
||||
if (control.busy = '0') then -- idle
|
||||
if (start_i = '1') then -- trigger new CFU operation
|
||||
control.busy <= '1';
|
||||
if (control.busy = '0') then -- CFU is idle
|
||||
control.busy <= start_i; -- trigger new CFU operation
|
||||
else -- CFU operation in progress
|
||||
res_o <= control.result; -- output result only if CFU is processing; has to be all-zero otherwise
|
||||
if (control.done = '1') or (ctrl_i.cpu_trap = '1') then -- operation done or abort if trap (exception)
|
||||
control.busy <= '0';
|
||||
end if;
|
||||
elsif (control.done = '1') or (ctrl_i.cpu_trap = '1') then -- operation done? abort if trap (exception)
|
||||
res_o <= control.result; -- output result for just one cycle, CFU output has to be all-zero otherwise
|
||||
control.busy <= '0';
|
||||
end if;
|
||||
end if;
|
||||
end process cfu_control;
|
||||
|
@ -143,7 +153,7 @@ begin
|
|||
|
||||
|
||||
-- **************************************************************************************************************************
|
||||
-- CFU Interface Documentation
|
||||
-- CFU Hardware Documentation
|
||||
-- **************************************************************************************************************************
|
||||
|
||||
-- ----------------------------------------------------------------------------------------
|
||||
|
@ -221,7 +231,6 @@ begin
|
|||
--
|
||||
-- [NOTE] If the <control.done> signal is not set within a bound time window (default = 512 cycles) the CFU operation is
|
||||
-- automatically terminated by the hardware and an illegal instruction exception is raised. This feature can also be
|
||||
-- be used to implement custom CFU exceptions (for example to indicate invalid CFU operations).
|
||||
|
||||
-- ----------------------------------------------------------------------------------------
|
||||
-- CFU-Internal Control and Status Registers (CFU-CSRs)
|
||||
|
@ -241,145 +250,130 @@ begin
|
|||
|
||||
|
||||
-- **************************************************************************************************************************
|
||||
-- Actual CFU User Logic Example - replace this with your custom logic
|
||||
-- Actual CFU User Logic Example: XTEA - Extended Tiny Encryption Algorithm (replace this with your custom logic)
|
||||
-- **************************************************************************************************************************
|
||||
|
||||
-- CFU-Internal Control and Status Registers (CFU-CSRs) -----------------------------------
|
||||
-- This CFU example implements the Extended Tiny Encryption Algorithm (XTEA).
|
||||
-- The CFU provides 5 custom instructions to accelerate encryption and decryption using dedicated hardware.
|
||||
-- The RTL code is not optimized (not for area, not for clock speed, not for performance) and was
|
||||
-- implemented according to a software C reference (https://de.wikipedia.org/wiki/Extended_Tiny_Encryption_Algorithm).
|
||||
|
||||
|
||||
-- CFU-Internal Control and Status Registers (CFU-CSRs): 128-Bit Key Storage --------------
|
||||
-- -------------------------------------------------------------------------------------------
|
||||
-- synchronous write access --
|
||||
csr_write_access: process(rstn_i, clk_i)
|
||||
begin
|
||||
if (rstn_i = '0') then
|
||||
cfu_csr_0 <= (others => '0');
|
||||
cfu_csr_1 <= (others => '0');
|
||||
key_mem <= (others => (others => '0'));
|
||||
elsif rising_edge(clk_i) then
|
||||
if (csr_we_i = '1') and (csr_addr_i = "00") then
|
||||
cfu_csr_0 <= csr_wdata_i;
|
||||
end if;
|
||||
if (csr_we_i = '1') and (csr_addr_i = "01") then
|
||||
cfu_csr_1 <= csr_wdata_i;
|
||||
if (csr_we_i = '1') then
|
||||
key_mem(to_integer(unsigned(csr_addr_i))) <= csr_wdata_i;
|
||||
end if;
|
||||
end if;
|
||||
end process csr_write_access;
|
||||
|
||||
-- asynchronous read access --
|
||||
csr_read_access: process(csr_addr_i, cfu_csr_0, cfu_csr_1)
|
||||
begin
|
||||
case csr_addr_i is
|
||||
when "00" => csr_rdata_o <= cfu_csr_0; -- CSR0: simple read/write register
|
||||
when "01" => csr_rdata_o <= cfu_csr_1; -- CSR1: simple read/write register
|
||||
when "10" => csr_rdata_o <= x"1234abcd"; -- CSR2: hardwired/read-only register
|
||||
when others => csr_rdata_o <= (others => '0'); -- CSR3: not implemented
|
||||
end case;
|
||||
end process csr_read_access;
|
||||
csr_rdata_o <= key_mem(to_integer(unsigned(csr_addr_i)));
|
||||
|
||||
|
||||
-- Iterative Multiply-Add Unit ------------------------------------------------------------
|
||||
-- XTEA Processing Core ------------------------------------------------------------------
|
||||
-- -------------------------------------------------------------------------------------------
|
||||
-- iteration control --
|
||||
madd_control: process(rstn_i, clk_i)
|
||||
xtea_core: process(rstn_i, clk_i)
|
||||
begin
|
||||
if (rstn_i = '0') then
|
||||
madd.sreg <= (others => '0');
|
||||
xtea.done <= (others => '0');
|
||||
xtea.opa <= (others => '0');
|
||||
xtea.opb <= (others => '0');
|
||||
xtea.sum <= (others => '0');
|
||||
xtea.res <= (others => '0');
|
||||
elsif rising_edge(clk_i) then
|
||||
-- operation trigger --
|
||||
if (control.busy = '0') and -- CFU is idle (ready for next operation)
|
||||
(start_i = '1') and -- CFU is actually triggered by a custom instruction word
|
||||
(control.rtype = r4type_c) and -- this is a R4-type instruction
|
||||
(control.funct3(2 downto 1) = "00") then -- trigger only for specific funct3 values
|
||||
madd.sreg(0) <= '1';
|
||||
else
|
||||
madd.sreg(0) <= '0';
|
||||
|
||||
-- shift register for computation delay --
|
||||
xtea.done(0) <= '0'; -- default
|
||||
xtea.done(1) <= xtea.done(0);
|
||||
|
||||
-- trigger new operation --
|
||||
if (start_i = '1') and (control.rtype = r3type_c) then -- execution trigger and correct instruction type
|
||||
xtea.opa <= rs1_i; -- buffer input operand rs1 (for improved physical timing)
|
||||
xtea.opb <= rs2_i; -- buffer input operand rs2 (for improved physical timing)
|
||||
xtea.done(0) <= '1'; -- result is available in the 2nd cycle
|
||||
end if;
|
||||
-- simple shift register for tracking operation --
|
||||
madd.sreg(madd.sreg'left downto 1) <= madd.sreg(madd.sreg'left-1 downto 0); -- shift left
|
||||
|
||||
-- data processing --
|
||||
if (xtea.done(0) = '1') then -- second-stage execution trigger
|
||||
-- update "sum" round key --
|
||||
if (control.funct3(2) = '1') then -- initialize
|
||||
xtea.sum <= xtea.opa; -- set initial round key
|
||||
elsif (control.funct3(1 downto 0) = xtea_enc_v0_c(1 downto 0)) then -- encrypt v0
|
||||
xtea.sum <= std_ulogic_vector(unsigned(xtea.sum) + unsigned(xtea_delta_c));
|
||||
elsif (control.funct3(1 downto 0) = xtea_dec_v1_c(1 downto 0)) then -- decrypt v1
|
||||
xtea.sum <= std_ulogic_vector(unsigned(xtea.sum) - unsigned(xtea_delta_c));
|
||||
end if;
|
||||
-- process "v" operands --
|
||||
if (control.funct3(1) = '0') then -- encrypt
|
||||
xtea.res <= std_ulogic_vector(unsigned(tmp_b) + unsigned(tmp_r));
|
||||
else -- decrypt
|
||||
xtea.res <= std_ulogic_vector(unsigned(tmp_b) - unsigned(tmp_r));
|
||||
end if;
|
||||
end if;
|
||||
|
||||
end if;
|
||||
end process madd_control;
|
||||
end process xtea_core;
|
||||
|
||||
-- processing has reached last stage (= done) when sreg's MSB is set --
|
||||
madd.done <= madd.sreg(madd.sreg'left);
|
||||
|
||||
-- arithmetic core --
|
||||
madd_core: process(rstn_i, clk_i)
|
||||
begin
|
||||
if (rstn_i = '0') then
|
||||
madd.opa <= (others => '0');
|
||||
madd.opb <= (others => '0');
|
||||
madd.opc <= (others => '0');
|
||||
madd.mul <= (others => '0');
|
||||
madd.res <= (others => '0');
|
||||
elsif rising_edge(clk_i) then
|
||||
-- stage 0: buffer input operands --
|
||||
madd.opa <= rs1_i;
|
||||
madd.opb <= rs2_i;
|
||||
madd.opc <= rs3_i;
|
||||
-- stage 1: multiply rs1 and rs2 --
|
||||
madd.mul <= std_ulogic_vector(unsigned(madd.opa) * unsigned(madd.opb));
|
||||
-- stage 2: add rs3 to multiplication result --
|
||||
madd.res <= std_ulogic_vector(unsigned(madd.mul) + unsigned(madd.opc));
|
||||
end if;
|
||||
end process madd_core;
|
||||
-- helpers --
|
||||
tmp_a <= xtea.opb when (control.funct3(0) = '0') else xtea.opa; -- v1 / v0 select
|
||||
tmp_b <= xtea.opa when (control.funct3(0) = '0') else xtea.opb; -- v0 / v1 select
|
||||
tmp_x <= xtea.opb(27 downto 0) & "0000" when (control.funct3(0) = '0') else xtea.opa(27 downto 0) & "0000"; -- v << 4
|
||||
tmp_y <= "00000" & xtea.opb(31 downto 5) when (control.funct3(0) = '0') else "00000" & xtea.opa(31 downto 5); -- v >> 5
|
||||
tmp_z <= key_mem(to_integer(unsigned(xtea.sum(1 downto 0)))) when (control.funct3(0) = '0') else -- key[sum & 3]
|
||||
key_mem(to_integer(unsigned(xtea.sum(12 downto 11)))); -- key[(sum >> 11) & 3]
|
||||
tmp_r <= std_ulogic_vector(unsigned(tmp_x xor tmp_y) + unsigned(tmp_a)) xor std_ulogic_vector(unsigned(xtea.sum) + unsigned(tmp_z));
|
||||
|
||||
|
||||
-- Output select --------------------------------------------------------------------------
|
||||
-- Function Result Select -----------------------------------------------------------------
|
||||
-- -------------------------------------------------------------------------------------------
|
||||
out_select: process(control, rs1_i, rs2_i, rs3_i, rs4_i, madd)
|
||||
result_select: process(control, xtea)
|
||||
begin
|
||||
case control.rtype is
|
||||
|
||||
when r3type_c => -- R3-type instructions
|
||||
when r3type_c => -- R3-type instructions; function select via "funct7" and "funct3"
|
||||
-- ----------------------------------------------------------------------
|
||||
-- This is a simple ALU that implements four pure-combinatorial instructions.
|
||||
-- The actual function is selected by the "funct3" bit-field.
|
||||
case control.funct3 is
|
||||
when "000" => -- funct3 = "000": bit-reversal of rs1
|
||||
control.result <= bit_rev_f(rs1_i);
|
||||
case control.funct3 is -- Just check "funct3" here; "funct7" bit-field is ignored
|
||||
when xtea_enc_v0_c | xtea_enc_v1_c | xtea_dec_v0_c | xtea_dec_v1_c => -- encryption/decryption
|
||||
control.result <= xtea.res; -- processing result
|
||||
control.done <= xtea.done(xtea.done'left); -- multi-cycle processing done when set
|
||||
when others => -- initialization and all further unspecified operations
|
||||
control.result <= (others => '0'); -- just output zero
|
||||
control.done <= '1'; -- pure-combinatorial, so we are done "immediately"
|
||||
when "001" => -- funct3 = "001": XNOR input operands
|
||||
control.result <= not (rs1_i xor rs2_i);
|
||||
control.done <= '1'; -- pure-combinatorial, so we are done "immediately"
|
||||
when others => -- not implemented
|
||||
control.result <= (others => '0');
|
||||
control.done <= '0'; -- this will cause an illegal instruction exception after timeout
|
||||
end case;
|
||||
|
||||
when r4type_c => -- R4-type instructions
|
||||
when r4type_c => -- R4-type instructions; function select via "funct3"
|
||||
-- ----------------------------------------------------------------------
|
||||
-- This is an iterative multiply-and-add unit that requires several cycles for processing.
|
||||
-- The actual function is selected by the lowest bit of the "funct3" bit-field.
|
||||
case control.funct3 is
|
||||
when "000" => -- funct3 = "000": multiply-add low-part result: rs1*rs2+r3 [31:0]
|
||||
control.result <= madd.res(31 downto 0);
|
||||
control.done <= madd.done; -- iterative, wait for unit to finish
|
||||
when "001" => -- funct3 = "001": multiply-add high-part result: rs1*rs2+r3 [63:32]
|
||||
control.result <= madd.res(63 downto 32);
|
||||
control.done <= madd.done; -- iterative, wait for unit to finish
|
||||
when others => -- not implemented
|
||||
control.result <= (others => '0');
|
||||
control.done <= '0'; -- this will cause an illegal instruction exception after timeout
|
||||
end case;
|
||||
control.result <= (others => '0'); -- no logic implemented
|
||||
control.done <= '0'; -- this will cause an illegal instruction exception after timeout
|
||||
|
||||
when r5typeA_c => -- R5-type instruction A
|
||||
-- ----------------------------------------------------------------------
|
||||
-- No function/immediate bit-fields are available for this instruction type.
|
||||
-- Hence, there is just one operation that can be implemented.
|
||||
control.result <= rs1_i and rs2_i and rs3_i and rs4_i; -- AND-all
|
||||
control.done <= '1'; -- pure-combinatorial, so we are done "immediately"
|
||||
control.result <= (others => '0'); -- no logic implemented
|
||||
control.done <= '0'; -- this will cause an illegal instruction exception after timeout
|
||||
|
||||
when r5typeB_c => -- R5-type instruction B
|
||||
-- ----------------------------------------------------------------------
|
||||
-- No function/immediate bit-fields are available for this instruction type.
|
||||
-- Hence, there is just one operation that can be implemented.
|
||||
control.result <= rs1_i xor rs2_i xor rs3_i xor rs4_i; -- XOR-all
|
||||
control.done <= '1'; -- pure-combinatorial, so we are done "immediately"
|
||||
control.result <= (others => '0'); -- no logic implemented
|
||||
control.done <= '0'; -- this will cause an illegal instruction exception after timeout
|
||||
|
||||
when others => -- undefined
|
||||
-- ----------------------------------------------------------------------
|
||||
control.result <= (others => '0');
|
||||
control.done <= '0';
|
||||
control.result <= (others => '0'); -- no logic implemented
|
||||
control.done <= '0'; -- this will cause an illegal instruction exception after timeout
|
||||
|
||||
end case;
|
||||
end process out_select;
|
||||
end process result_select;
|
||||
|
||||
|
||||
end neorv32_cpu_cp_cfu_rtl;
|
||||
|
|
|
@ -52,7 +52,7 @@ package neorv32_package is
|
|||
|
||||
-- Architecture Constants -----------------------------------------------------------------
|
||||
-- -------------------------------------------------------------------------------------------
|
||||
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01090608"; -- hardware version
|
||||
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01090609"; -- hardware version
|
||||
constant archid_c : natural := 19; -- official RISC-V architecture ID
|
||||
constant XLEN : natural := 32; -- native data path width
|
||||
|
||||
|
|
|
@ -36,7 +36,7 @@
|
|||
/**********************************************************************//**
|
||||
* @file demo_cfu/main.c
|
||||
* @author Stephan Nolting
|
||||
* @brief Example program showing how to use the CFU's custom instructions.
|
||||
* @brief Example program showing how to use the CFU's custom instructions (XTEA example).
|
||||
* @note Take a look at the highly-commented "hardware-counterpart" of this CFU
|
||||
* example in 'rtl/core/neorv32_cpu_cp_cfu.vhd'.
|
||||
**************************************************************************/
|
||||
|
@ -49,8 +49,59 @@
|
|||
/**@{*/
|
||||
/** UART BAUD rate */
|
||||
#define BAUD_RATE 19200
|
||||
/** Number of test cases per CFU instruction */
|
||||
#define TESTCASES 4
|
||||
/** Number XTEA cycles */
|
||||
#define XTEA_CYCLES 20
|
||||
/** Input data size (in number of 32-bit words), has to be even */
|
||||
#define DATA_NUM 64
|
||||
/**@}*/
|
||||
|
||||
|
||||
/**********************************************************************//**
|
||||
* @name Define macros for easy custom instruction wrapping
|
||||
**************************************************************************/
|
||||
/**@{*/
|
||||
#define xtea_hw_init(sum) neorv32_cfu_r3_instr(0b0000000, 0b100, sum, 0)
|
||||
#define xtea_hw_enc_v0_step(v0, v1) neorv32_cfu_r3_instr(0b0000000, 0b000, v0, v1)
|
||||
#define xtea_hw_enc_v1_step(v0, v1) neorv32_cfu_r3_instr(0b0000000, 0b001, v0, v1)
|
||||
#define xtea_hw_dec_v0_step(v0, v1) neorv32_cfu_r3_instr(0b0000000, 0b010, v0, v1)
|
||||
#define xtea_hw_dec_v1_step(v0, v1) neorv32_cfu_r3_instr(0b0000000, 0b011, v0, v1)
|
||||
/**@}*/
|
||||
|
||||
// The CFU custom instructions can be used as plain C functions as they are simple "intrinsics".
|
||||
// There are 4 "prototype primitives" for the CFU instructions (define in sw/lib/include/neorv32_cfu.h):
|
||||
//
|
||||
// > neorv32_cfu_r3_instr(funct7, funct3, rs1, rs2) - for r3-type instructions (custom-0 opcode)
|
||||
// > neorv32_cfu_r4_instr(funct3, rs1, rs2, rs3) - for r4-type instructions (custom-1 opcode)
|
||||
// > neorv32_cfu_r5_instr_a(rs1, rs2, rs3, rs4) - for r5-type instruction A (custom-2 opcode)
|
||||
// > neorv32_cfu_r5_instr_b(rs1, rs2, rs3, rs4) - for r5-type instruction B (custom-3 opcode)
|
||||
//
|
||||
// Every instance of these functions is converted into a single 32-bit RISC-V instruction word
|
||||
// without any calling overhead at all (see the generated assembly code).
|
||||
//
|
||||
// The "rs*" source operands can be literals, variables, function return values, ... - you name it.
|
||||
// The 7-bit immediate ("funct7") and the 3-bit immediate ("funct3") values can be used to pass
|
||||
// compile-time static literal data to the CFU or to do a fine-grained function selection.
|
||||
//
|
||||
// Each "neorv32_cfu_r*" function returns a 32-bit data word of type uint32_t that represents
|
||||
// the processing result of the according instruction.
|
||||
|
||||
|
||||
/**********************************************************************//**
|
||||
* @name Global variables
|
||||
**************************************************************************/
|
||||
/**@{*/
|
||||
/** XTEA delta (round-key update) */
|
||||
const uint32_t xtea_delta = 0x9e3779b9;
|
||||
/** Encryption/decryption key (128-bit) */
|
||||
const uint32_t key[4] = {0x207230ba, 0x1ffba710, 0xc45271ef, 0xdd01768a};
|
||||
/** Encryption input data */
|
||||
uint32_t input_data[DATA_NUM];
|
||||
/** Encryption results */
|
||||
uint32_t cypher_data_sw[DATA_NUM], cypher_data_hw[DATA_NUM];
|
||||
/** Decryption results */
|
||||
uint32_t plain_data_sw[DATA_NUM], plain_data_hw[DATA_NUM];
|
||||
/** Timing data */
|
||||
uint32_t time_enc_sw, time_enc_hw, time_dec_sw, time_dec_hw;
|
||||
/**@}*/
|
||||
|
||||
|
||||
|
@ -72,173 +123,235 @@ uint32_t xorshift32(void) {
|
|||
|
||||
|
||||
/**********************************************************************//**
|
||||
* Main function
|
||||
* XTEA encryption - software reference
|
||||
*
|
||||
* @note This program requires the CFU and UART0.
|
||||
* Source: https://de.wikipedia.org/wiki/Extended_Tiny_Encryption_Algorithm
|
||||
*
|
||||
* @param[in] num_cycles Number of encryption cycles.
|
||||
* @param[in,out] v Encryption data/result array (2x32-bit).
|
||||
* @param[in] k Encryption key array (4x32-bit).
|
||||
**************************************************************************/
|
||||
void xtea_sw_encipher(uint32_t num_cycles, uint32_t *v, const uint32_t k[4]) {
|
||||
|
||||
uint32_t i = 0;
|
||||
uint32_t v0 = v[0];
|
||||
uint32_t v1 = v[1];
|
||||
uint32_t sum = 0;
|
||||
|
||||
for (i=0; i < num_cycles; i++) {
|
||||
v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
|
||||
sum += xtea_delta;
|
||||
v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum>>11) & 3]);
|
||||
}
|
||||
|
||||
v[0] = v0;
|
||||
v[1] = v1;
|
||||
}
|
||||
|
||||
|
||||
/**********************************************************************//**
|
||||
* XTEA decryption - software reference
|
||||
*
|
||||
* Source: https://de.wikipedia.org/wiki/Extended_Tiny_Encryption_Algorithm
|
||||
*
|
||||
* @param[in] num_cycles Number of encryption cycles.
|
||||
* @param[in,out] v Decryption data/result array (2x32-bit).
|
||||
* @param[in] k Decryption key array (4x32-bit).
|
||||
**************************************************************************/
|
||||
void xtea_sw_decipher(unsigned int num_cycles, uint32_t *v, const uint32_t k[4]) {
|
||||
|
||||
uint32_t i = 0;
|
||||
uint32_t v0 = v[0];
|
||||
uint32_t v1 = v[1];
|
||||
uint32_t sum = xtea_delta * num_cycles;
|
||||
|
||||
for (i=0; i < num_cycles; i++) {
|
||||
v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum>>11) & 3]);
|
||||
sum -= xtea_delta;
|
||||
v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
|
||||
}
|
||||
|
||||
v[0] = v0;
|
||||
v[1] = v1;
|
||||
}
|
||||
|
||||
|
||||
/**********************************************************************//**
|
||||
* Main function: run pure-SW XTEA and compare with HW-XTEA
|
||||
*
|
||||
* @note This program requires the CFU, UART0 and the Zicntr ISA extension.
|
||||
*
|
||||
* @return 0 if execution was successful
|
||||
**************************************************************************/
|
||||
int main() {
|
||||
|
||||
uint32_t i, rs1, rs2, rs3, rs4;
|
||||
uint32_t i, j;
|
||||
uint32_t v[2];
|
||||
|
||||
// initialize NEORV32 run-time environment
|
||||
neorv32_rte_setup();
|
||||
|
||||
// setup UART at default baud rate, no interrupts
|
||||
neorv32_uart0_setup(BAUD_RATE, 0);
|
||||
|
||||
// check if UART0 is implemented
|
||||
if (neorv32_uart0_available() == 0) {
|
||||
return 1; // UART0 not available, exit
|
||||
return -1; // UART0 not available, exit
|
||||
}
|
||||
|
||||
// check if the CFU is implemented at all (the CFU is wrapped in the core's "Zxcfu" ISA extension)
|
||||
// setup UART0 at default baud rate, no interrupts
|
||||
neorv32_uart0_setup(BAUD_RATE, 0);
|
||||
|
||||
// check if the CFU is implemented (the CFU is wrapped in the core's "Zxcfu" ISA extension)
|
||||
if (neorv32_cpu_cfu_available() == 0) {
|
||||
neorv32_uart0_printf("ERROR! CFU ('Zxcfu' ISA extensions) not implemented!\n");
|
||||
return 1;
|
||||
return -1;
|
||||
}
|
||||
|
||||
// check if the CPU base counters are implemented
|
||||
if ((neorv32_cpu_csr_read(CSR_MXISA) & (1 << CSR_MXISA_ZICNTR)) == 0) {
|
||||
neorv32_uart0_printf("ERROR! Base counters ('Zicntr' ISA extensions) not implemented!\n");
|
||||
return -1;
|
||||
}
|
||||
|
||||
// check if data size configuration is even
|
||||
if ((DATA_NUM & 1) != 0) {
|
||||
neorv32_uart0_printf("ERROR! DATA_NUM has to be even!\n");
|
||||
return -1;
|
||||
}
|
||||
|
||||
// intro
|
||||
neorv32_uart0_printf("\n<<< NEORV32 Custom Functions Unit (CFU) - Custom Instructions Example >>>\n\n");
|
||||
|
||||
neorv32_uart0_printf("[NOTE] This program assumes the _default_ CFU hardware module, which\n"
|
||||
" implements simple and exemplary data processing instructions.\n\n");
|
||||
|
||||
/*
|
||||
The CFU custom instructions can be used as plain C functions as they are simple "intrinsics".
|
||||
|
||||
There are 4 "prototype primitives" for the CFU instructions (define in sw/lib/include/neorv32_cfu.h):
|
||||
|
||||
> neorv32_cfu_r3_instr(funct7, funct3, rs1, rs2) - for r3-type instructions (custom-0 opcode)
|
||||
> neorv32_cfu_r4_instr(funct3, rs1, rs2, rs3) - for r4-type instructions (custom-1 opcode)
|
||||
> neorv32_cfu_r5_instr_a(rs1, rs2, rs3, rs4) - for r5-type instruction A (custom-2 opcode)
|
||||
> neorv32_cfu_r5_instr_b(rs1, rs2, rs3, rs4) - for r5-type instruction B (custom-3 opcode)
|
||||
|
||||
Every "call" of these functions is turned into a single 32-bit ISC-V instruction word
|
||||
without any calling overhead at all (see the generated assembly code).
|
||||
|
||||
The "rs*" operands can be literals, variables, function return values, ... - you name it.
|
||||
The 7-bit immediate ("funct7") and the 3-bit immediate ("funct3") values can be used to pass
|
||||
_compile-time static_ literals to the CFU or to do a fine-grained function selection.
|
||||
|
||||
Each "neorv32_cfu_r*" function returns a 32-bit data word of type uint32_t that represents
|
||||
the result of the according instruction.
|
||||
*/
|
||||
neorv32_uart0_printf("[NOTE] This program assumes the default CFU hardware module that\n"
|
||||
" implements the Extended Tiny Encryption Algorithm (XTEA).\n\n");
|
||||
|
||||
|
||||
// ----------------------------------------------------------
|
||||
// R3-type instructions (up to 1024 custom instructions)
|
||||
// XTEA example
|
||||
// ----------------------------------------------------------
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU R3-Type: Bit-Reversal Instruction ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r3_instr( funct7=0b1111111, funct3=0b000, [rs1]=0x%x, [rs2]=0x%x ) = ", i, rs1, 0);
|
||||
// here we are setting the funct7 bit-field to all-one; however, this is not used at all by the default CFU hardware module.
|
||||
neorv32_uart0_printf("0x%x\n", neorv32_cfu_r3_instr(0b1111111, 0b000, rs1, 0));
|
||||
}
|
||||
// set XTEA-CFU key storage (the CFU CSRs)
|
||||
neorv32_cpu_csr_write(CSR_CFUREG0, key[0]);
|
||||
neorv32_cpu_csr_write(CSR_CFUREG1, key[1]);
|
||||
neorv32_cpu_csr_write(CSR_CFUREG2, key[2]);
|
||||
neorv32_cpu_csr_write(CSR_CFUREG3, key[3]);
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU R3-Type: XNOR Instruction ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
rs2 = xorshift32();
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r3_instr( funct7=0b0000000, funct3=0b001, [rs1]=0x%x, [rs2]=0x%x ) = ", i, rs1, rs2);
|
||||
neorv32_uart0_printf("0x%x\n", neorv32_cfu_r3_instr(0b0000000, 0b001, rs1, rs2));
|
||||
// read-back CSRs and print key
|
||||
neorv32_uart0_printf("XTEA key: 0x%x%x%x%x\n\n",
|
||||
neorv32_cpu_csr_read(CSR_CFUREG0),
|
||||
neorv32_cpu_csr_read(CSR_CFUREG1),
|
||||
neorv32_cpu_csr_read(CSR_CFUREG2),
|
||||
neorv32_cpu_csr_read(CSR_CFUREG3));
|
||||
|
||||
// generate "random" data for the plain text
|
||||
for (i=0; i<DATA_NUM; i++) {
|
||||
input_data[i] = xorshift32();
|
||||
}
|
||||
|
||||
|
||||
// ----------------------------------------------------------
|
||||
// R4-type instructions (up to 8 custom instructions)
|
||||
// XTEA encryption
|
||||
// ----------------------------------------------------------
|
||||
|
||||
// You can use macros to simplify the usage of the custom instructions.
|
||||
#define madd_lo(a, b, c) neorv32_cfu_r4_instr(0b000, a, b, c)
|
||||
#define madd_hi(a, b, c) neorv32_cfu_r4_instr(0b001, a, b, c)
|
||||
// encryption using software only
|
||||
neorv32_uart0_printf("XTEA SW encryption (%u rounds, %u words)...\n", 2*XTEA_CYCLES, DATA_NUM);
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU R4-Type: Multiply-Add (Low-Part) Instruction ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
rs2 = xorshift32();
|
||||
rs3 = xorshift32();
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r4_instr( funct3=0b000, [rs1]=0x%x, [rs2]=0x%x, [rs3]=0x%x ) = ", i, rs1, rs2, rs3);
|
||||
neorv32_uart0_printf("0x%x\n", madd_lo(rs1, rs2, rs3));
|
||||
neorv32_cpu_csr_write(CSR_MCYCLE, 0); // start timing
|
||||
for (i=0; i<(DATA_NUM/2); i++) {
|
||||
v[0] = input_data[i*2+0];
|
||||
v[1] = input_data[i*2+1];
|
||||
xtea_sw_encipher(XTEA_CYCLES, v, key);
|
||||
cypher_data_sw[i*2+0] = v[0];
|
||||
cypher_data_sw[i*2+1] = v[1];
|
||||
}
|
||||
time_enc_sw = neorv32_cpu_csr_read(CSR_MCYCLE); // stop timing
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU R4-Type: Multiply-Add (High-Part) Instruction ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
rs2 = xorshift32();
|
||||
rs3 = xorshift32();
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r4_instr( funct3=0b001, [rs1]=0x%x, [rs2]=0x%x, [rs3]=0x%x ) = ", i, rs1, rs2, rs3);
|
||||
neorv32_uart0_printf("0x%x\n", madd_hi(rs1, rs2, rs3));
|
||||
|
||||
// encryption using the XTEA CFU
|
||||
neorv32_uart0_printf("XTEA HW encryption (%u rounds, %u words)...\n", 2*XTEA_CYCLES, DATA_NUM);
|
||||
|
||||
neorv32_cpu_csr_write(CSR_MCYCLE, 0); // start timing
|
||||
for (i=0; i<(DATA_NUM/2); i++) {
|
||||
v[0] = input_data[i*2+0];
|
||||
v[1] = input_data[i*2+1];
|
||||
xtea_hw_init(0);
|
||||
for (j=0; j<XTEA_CYCLES; j++) {
|
||||
v[0] = xtea_hw_enc_v0_step(v[0], v[1]);
|
||||
v[1] = xtea_hw_enc_v1_step(v[0], v[1]);
|
||||
}
|
||||
cypher_data_hw[i*2+0] = v[0];
|
||||
cypher_data_hw[i*2+1] = v[1];
|
||||
}
|
||||
time_enc_hw = neorv32_cpu_csr_read(CSR_MCYCLE); // stop timing
|
||||
|
||||
|
||||
// ----------------------------------------------------------
|
||||
// R5-type instruction A (only 1 custom instruction)
|
||||
// ----------------------------------------------------------
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU R5-Type A: AND-All Instruction ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
rs2 = xorshift32();
|
||||
rs3 = xorshift32();
|
||||
rs4 = xorshift32();
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r5_instr_a( [rs1]=0x%x, [rs2]=0x%x, [rs3]=0x%x, [rs3]=0x%x ) = ", i, rs1, rs2, rs3, rs4);
|
||||
neorv32_uart0_printf("0x%x\n", neorv32_cfu_r5_instr_a(rs1, rs2, rs3, rs4));
|
||||
// compare results
|
||||
neorv32_uart0_printf("Comparing results... ");
|
||||
for (i=0; i<DATA_NUM; i++) {
|
||||
if (cypher_data_sw[i] != cypher_data_hw[i]) {
|
||||
neorv32_uart0_printf("FAILED\n");
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
neorv32_uart0_printf("OK\n");
|
||||
|
||||
|
||||
// ----------------------------------------------------------
|
||||
// R5-type instruction B (only 1 custom instruction)
|
||||
// XTEA decryption
|
||||
// ----------------------------------------------------------
|
||||
neorv32_uart0_printf("\n");
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU R5-Type B: XOR-All Instruction ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
rs2 = xorshift32();
|
||||
rs3 = xorshift32();
|
||||
rs4 = xorshift32();
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r5_instr_b( [rs1]=0x%x, [rs2]=0x%x, [rs3]=0x%x, [rs3]=0x%x ) = ", i, rs1, rs2, rs3, rs4);
|
||||
neorv32_uart0_printf("0x%x\n", neorv32_cfu_r5_instr_b(rs1, rs2, rs3, rs4));
|
||||
// decryption using software only
|
||||
neorv32_uart0_printf("XTEA SW decryption (%u rounds, %u words)...\n", 2*XTEA_CYCLES, DATA_NUM);
|
||||
|
||||
neorv32_cpu_csr_write(CSR_MCYCLE, 0); // start timing
|
||||
for (i=0; i<(DATA_NUM/2); i++) {
|
||||
v[0] = cypher_data_sw[i*2+0];
|
||||
v[1] = cypher_data_sw[i*2+1];
|
||||
xtea_sw_decipher(XTEA_CYCLES, v, key);
|
||||
plain_data_sw[i*2+0] = v[0];
|
||||
plain_data_sw[i*2+1] = v[1];
|
||||
}
|
||||
time_dec_sw = neorv32_cpu_csr_read(CSR_MCYCLE); // stop timing
|
||||
|
||||
|
||||
// ----------------------------------------------------------
|
||||
// Unimplemented R3-type (=illegal) instruction
|
||||
// ----------------------------------------------------------
|
||||
// decryption using the XTEA CFU
|
||||
neorv32_uart0_printf("XTEA HW decryption (%u rounds, %u words)...\n", 2*XTEA_CYCLES, DATA_NUM);
|
||||
|
||||
neorv32_uart0_printf("\n--- CFU Unimplemented (= illegal) R3-Type ---\n");
|
||||
for (i=0; i<TESTCASES; i++) {
|
||||
rs1 = xorshift32();
|
||||
rs2 = xorshift32();
|
||||
// this funct3 is NOT implemented by the default CFU hardware causing an illegal instruction exception
|
||||
// due to a multi-cycle execution timeout (processing does not complete within a bound time)
|
||||
neorv32_uart0_printf("%u: neorv32_cfu_r3_instr( funct7=0b0000000, funct3=0b111, [rs1]=0x%x, [rs2]=0x%x ) = ", i, rs1, rs2);
|
||||
neorv32_uart0_printf("0x%x\n", neorv32_cfu_r3_instr(0b0000000, 0b111, rs1, rs2));
|
||||
neorv32_cpu_csr_write(CSR_MCYCLE, 0); // start timing
|
||||
for (i=0; i<(DATA_NUM/2); i++) {
|
||||
v[0] = cypher_data_hw[i*2+0];
|
||||
v[1] = cypher_data_hw[i*2+1];
|
||||
xtea_hw_init(XTEA_CYCLES * xtea_delta);
|
||||
for (j=0; j<XTEA_CYCLES; j++) {
|
||||
v[1] = xtea_hw_dec_v1_step(v[0], v[1]);
|
||||
v[0] = xtea_hw_dec_v0_step(v[0], v[1]);
|
||||
}
|
||||
plain_data_hw[i*2+0] = v[0];
|
||||
plain_data_hw[i*2+1] = v[1];
|
||||
}
|
||||
time_dec_hw = neorv32_cpu_csr_read(CSR_MCYCLE); // stop timing
|
||||
|
||||
|
||||
// compare results
|
||||
neorv32_uart0_printf("Comparing results... ");
|
||||
for (i=0; i<DATA_NUM; i++) {
|
||||
if (plain_data_sw[i] != plain_data_hw[i]) {
|
||||
neorv32_uart0_printf("FAILED\n");
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
neorv32_uart0_printf("OK\n");
|
||||
|
||||
|
||||
// ----------------------------------------------------------
|
||||
// CFU-internal control and status registers (CFU-CSRs)
|
||||
// Print benchmarking results
|
||||
// ----------------------------------------------------------
|
||||
neorv32_uart0_printf("\n--- CFU CSRs: Control and Status Registers ---\n");
|
||||
neorv32_uart0_printf("\nExecution benchmarking:\n");
|
||||
neorv32_uart0_printf("ENC SW = %u cycles\n", time_enc_sw);
|
||||
neorv32_uart0_printf("ENC HW = %u cycles\n", time_enc_hw);
|
||||
neorv32_uart0_printf("DEC SW = %u cycles\n", time_dec_sw);
|
||||
neorv32_uart0_printf("DEC HW = %u cycles\n", time_dec_hw);
|
||||
|
||||
|
||||
neorv32_cpu_csr_write(CSR_CFUREG0, 0xffffffff); // just write some exemplary data to CSR
|
||||
neorv32_uart0_printf("CFU-CSR 0 = 0x%x\n", neorv32_cpu_csr_read(CSR_CFUREG0)); // read-back data from CSR
|
||||
|
||||
neorv32_cpu_csr_write(CSR_CFUREG1, 0x12345678);
|
||||
neorv32_uart0_printf("CFU-CSR 1 = 0x%x\n", neorv32_cpu_csr_read(CSR_CFUREG1));
|
||||
|
||||
neorv32_cpu_csr_write(CSR_CFUREG2, 0x22334455);
|
||||
neorv32_uart0_printf("CFU-CSR 2 = 0x%x\n", neorv32_cpu_csr_read(CSR_CFUREG2));
|
||||
|
||||
neorv32_cpu_csr_write(CSR_CFUREG3, 0xabcdabcd);
|
||||
neorv32_uart0_printf("CFU-CSR 3 = 0x%x\n", neorv32_cpu_csr_read(CSR_CFUREG3));
|
||||
|
||||
|
||||
neorv32_uart0_printf("\nCFU demo program completed.\n");
|
||||
return 0;
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue