Adding support for Scalar Cryptography Extensions (Zkn -- Zbkx, Zkne, Zknd, Zknh) (#2804)

* Introduction
This PR adds support for the Zbkx, Zkne, Zknd and Zknh extensions to the CVA6 core, together with documentation and tests for these extensions. The changes have been tested with self-written single-instruction tests and with the riscv-arch-tests. This PR completes the Zkn - NIST Algorithm Suite extension.

* Implementation
Zbkx Extension: Added support for the Zbkx instruction set. It extends the Bitmanip extension with additional instructions useful in cryptography. These instructions are xperm8 and xperm4.
Zkne Extension: Added support for the Zkne instruction set. It adds AES encryption support for scalar cryptography. These instructions are aes32esi, aes32esmi, aes64es, aes64esm, aes64ks1i, aes64ks2.
Zknd Extension: Added support for the Zknd instruction set. It adds AES decryption support for scalar cryptography. These instructions are aes32dsi, aes32dsmi, aes64ds, aes64dsm, aes64im, aes64ks1i, aes64ks2.
Note: The aes64ks1i and aes64ks2 instructions are present in both the Zknd and Zkne extensions.
Zknh Extension: Added support for the Zknh instruction set. It adds hash function instruction support for scalar cryptography. These instructions are sha256sig0, sha256sig1, sha256sum0, sha256sum1, sha512sig0h, sha512sig0l, sha512sig1h, sha512sig1l, sha512sum0r, sha512sum1r, sha512sig0, sha512sig1, sha512sum0, sha512sum1.

* Modifications
Updated the ALU and decoder to recognize and handle Zbkx instructions. For Zkne, Zknd and Zknh, the decoder now selects the AES unit as the functional unit instead of the ALU. The complete Zkn extension is enabled by the ZKN bit for ease of use. This configuration also requires the RVB (bitmanip) bit to be set.
Note: The Zkn extension does not require the vectorial FPU.

* AES Functional Unit
A new functional unit was created inside the execute stage to handle all AES and hashing instructions (Zkne, Zknd, Zknh). A new package "aes_pkg" provides the AES functions such as sbox substitution, mix columns, etc.

* Documentation and Reference
The official RISC-V Cryptography Extensions Volume I specification was followed to ensure alignment with the ratified version. The relevant documentation for the Zbkx, Zkne, Zknd and Zknh instructions was also added.

* Verification
Assembly Tests: The instructions were tested and verified with the K module of both the 32-bit and 64-bit versions of the riscv-arch-tests to ensure proper functionality. These tests check for ISA compliance and edge cases, and use assertions to ensure expected behavior.
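The requirement noted under Modifications (ZKN also needs the RVB bit) could be guarded at elaboration time. The following is a minimal sketch and not part of this PR: the module name zkn_cfg_check and its two bit parameters are hypothetical stand-ins for CVA6Cfg.ZKN and CVA6Cfg.RVB.

```systemverilog
// Hypothetical elaboration-time guard (not part of this PR).
// ZKN and RVB stand in for CVA6Cfg.ZKN and CVA6Cfg.RVB.
module zkn_cfg_check #(
    parameter bit ZKN = 1'b1,
    parameter bit RVB = 1'b1
);
  // Elaboration system task (IEEE 1800-2009 and later): reported once,
  // at elaboration, when ZKN is requested without RVB.
  if (ZKN && !RVB) begin : gen_zkn_requires_rvb
    $error("ZKN is enabled but RVB is not; the Zkn logic is only generated when both bits are set");
  end
endmodule
```

One possible placement, again as an assumption, is next to the aes instance: zkn_cfg_check #(.ZKN(CVA6Cfg.ZKN), .RVB(CVA6Cfg.RVB)). Without such a guard, a ZKN-only configuration leaves the generate blocks in aes.sv and alu.sv empty, because both are conditioned on ZKN && RVB.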
2025-06-27 17:00:57 -04:00 · 2025-05-11 21:02:28 +05:00 · 2025-05-11 21:02:28 +05:00 · 6d9b76e560
commit 6d9b76e560
parent 4a3629bff7
17 changed files with 1607 additions and 21 deletions
--- a/core/Flist.cva6
+++ b/core/Flist.cva6
@ -71,6 +71,7 @@ ${CVA6_REPO_DIR}/core/include/wt_cache_pkg.sv
 ${CVA6_REPO_DIR}/core/include/std_cache_pkg.sv
 ${CVA6_REPO_DIR}/core/include/instr_tracer_pkg.sv
 ${CVA6_REPO_DIR}/core/include/build_config_pkg.sv
+${CVA6_REPO_DIR}/core/include/aes_pkg.sv

 //CVXIF
 ${CVA6_REPO_DIR}/core/cvxif_compressed_if_driver.sv
@ -106,6 +107,7 @@ ${CVA6_REPO_DIR}/vendor/pulp-platform/common_cells/src/delta_counter.sv
 ${CVA6_REPO_DIR}/core/cva6.sv
 ${CVA6_REPO_DIR}/core/cva6_rvfi_probes.sv
 ${CVA6_REPO_DIR}/core/alu.sv
+${CVA6_REPO_DIR}/core/aes.sv
 // Note: depends on fpnew_pkg, above
 ${CVA6_REPO_DIR}/core/fpu_wrap.sv
 ${CVA6_REPO_DIR}/core/branch_unit.sv
--- a/core/aes.sv
+++ b/core/aes.sv
@ -0,0 +1,234 @@
+// Licensed under the Solderpad Hardware Licence, Version 2.1 (the "License");
+// you may not use this file except in compliance with the License.
+// SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+// You may obtain a copy of the License at https://solderpad.org/licenses/
+//
+// Author: Munail Waqar, 10xEngineers
+// Date: 03.05.2025
+// Description: The Zkn extension including its subsets accelerates cryptographic workloads by introducing dedicated
+// scalar instructions compliant with the RISC-V Scalar Cryptography specification. The subsets include:
+// Zknd (AES Decryption and related instructions), Zkne (AES Encryption support, including AES rounds and key expansion steps),
+// Zknh (SHA-256 and SHA-512 hash functions for secure hashing operations).
+//
+module aes
+  import ariane_pkg::*;
+  import aes_pkg::*;
+#(
+    parameter config_pkg::cva6_cfg_t CVA6Cfg = config_pkg::cva6_cfg_empty,
+    parameter type fu_data_t = logic
+) (
+    // Subsystem Clock - SUBSYSTEM
+    input  logic                        clk_i,
+    // Asynchronous reset active low - SUBSYSTEM
+    input  logic                        rst_ni,
+    // FU data needed to execute instruction - ISSUE_STAGE
+    input  fu_data_t                    fu_data_i,
+    // Original instruction bits for aes
+    input  logic     [             5:0] orig_instr_aes,
+    // AES result - ISSUE_STAGE
+    output logic     [CVA6Cfg.XLEN-1:0] result_o
+);
+
+  logic [63:0] sr;
+  logic [ 7:0] sbox_in;
+  logic [31:0] aes32esi_gen;
+  logic [31:0] aes32esmi_gen;
+  logic [63:0] aes64es_gen;
+  logic [63:0] aes64esm_gen;
+  logic [31:0] aes32dsi_gen;
+  logic [31:0] aes32dsmi_gen;
+  logic [63:0] sr_inv;
+  logic [63:0] aes64ds_gen;
+  logic [63:0] aes64dsm_gen;
+  logic [63:0] aes64im_gen;
+  logic [63:0] aes64ks1i_gen;
+  logic [63:0] aes64ks2_gen;
+
+  logic [31:0] sha256sig0_gen;
+  logic [31:0] sha256sig1_gen;
+  logic [31:0] sha256sum0_gen;
+  logic [31:0] sha256sum1_gen;
+
+  logic [31:0] sha512sig0h_gen;
+  logic [31:0] sha512sig0l_gen;
+  logic [31:0] sha512sig1h_gen;
+  logic [31:0] sha512sig1l_gen;
+  logic [31:0] sha512sum0r_gen;
+  logic [31:0] sha512sum1r_gen;
+
+  logic [63:0] sha512sig0_gen;
+  logic [63:0] sha512sig1_gen;
+  logic [63:0] sha512sum0_gen;
+  logic [63:0] sha512sum1_gen;
+
+  // AES gen block
+  if (CVA6Cfg.ZKN && CVA6Cfg.RVB) begin : aes_gen_block
+    // SHA256 sigma0 transformation function by rotating, shifting and XORing rs1
+    assign sha256sig0_gen = (fu_data_i.operand_a[31:0] >> 7 | fu_data_i.operand_a[31:0] << 25) ^ (fu_data_i.operand_a[31:0] >> 18 | fu_data_i.operand_a[31:0] << 14) ^ (fu_data_i.operand_a[31:0] >> 3);
+    // SHA256 sigma1 transformation function by rotating, shifting and XORing rs1
+    assign sha256sig1_gen = (fu_data_i.operand_a[31:0] >> 17 | fu_data_i.operand_a[31:0] << 15) ^ (fu_data_i.operand_a[31:0] >> 19 | fu_data_i.operand_a[31:0] << 13) ^ (fu_data_i.operand_a[31:0] >> 10);
+    // SHA256 sum0 transformation function by rotating, shifting and XORing rs1
+    assign sha256sum0_gen = (fu_data_i.operand_a[31:0] >> 2 | fu_data_i.operand_a[31:0] << 30) ^ (fu_data_i.operand_a[31:0] >> 13 | fu_data_i.operand_a[31:0] << 19) ^ (fu_data_i.operand_a[31:0] >> 22 | fu_data_i.operand_a[31:0] << 10);
+    // SHA256 sum1 transformation function by rotating, shifting and XORing rs1
+    assign sha256sum1_gen = (fu_data_i.operand_a[31:0] >> 6 | fu_data_i.operand_a[31:0] << 26) ^ (fu_data_i.operand_a[31:0] >> 11 | fu_data_i.operand_a[31:0] << 21) ^ (fu_data_i.operand_a[31:0] >> 25 | fu_data_i.operand_a[31:0] << 7);
+    if (CVA6Cfg.IS_XLEN32) begin
+      assign sbox_in = fu_data_i.operand_b >> {orig_instr_aes[5:4], 3'b000};
+      // AES 32-bit final round encryption by applying the forward sbox and a rotation to a single byte of rs2, selected by the byte-select bits orig_instr_aes[5:4]
+      assign aes32esi_gen = (fu_data_i.operand_a ^ ({24'b0, aes_sbox_fwd(
+          sbox_in[7:0]
+      )} << {orig_instr_aes[5:4], 3'b000}) | ({24'b0, aes_sbox_fwd(
+          sbox_in[7:0]
+      )} >> (32 - {orig_instr_aes[5:4], 3'b000})));
+      // AES 32-bit middle round encryption by applying the forward sbox, forward mix-columns and a rotation to a single byte of rs2, selected by the byte-select bits orig_instr_aes[5:4]
+      assign aes32esmi_gen = fu_data_i.operand_a ^ ((aes_mixcolumn_fwd(
+          {24'h000000, aes_sbox_fwd(sbox_in[7:0])}
+      ) << {orig_instr_aes[5:4], 3'b000}) | (aes_mixcolumn_fwd(
+          {24'h000000, aes_sbox_fwd(sbox_in[7:0])}
+      ) >> (32 - {orig_instr_aes[5:4], 3'b000})));
+      // AES 32-bit final round decryption by applying the inverse sbox and a rotation to a single byte of rs2, selected by the byte-select bits orig_instr_aes[5:4]
+      assign aes32dsi_gen = (fu_data_i.operand_a ^ ({24'b0, aes_sbox_inv(
+          sbox_in[7:0]
+      )} << {orig_instr_aes[5:4], 3'b000}) | ({24'b0, aes_sbox_inv(
+          sbox_in[7:0]
+      )} >> (32 - {orig_instr_aes[5:4], 3'b000})));
+      // AES 32-bit middle round decryption by applying the inverse sbox, inverse mix-columns and a rotation to a single byte of rs2, selected by the byte-select bits orig_instr_aes[5:4]
+      assign aes32dsmi_gen = fu_data_i.operand_a ^ ((aes_mixcolumn_inv(
+          {24'h000000, aes_sbox_inv(sbox_in[7:0])}
+      ) << {orig_instr_aes[5:4], 3'b000}) | (aes_mixcolumn_inv(
+          {24'h000000, aes_sbox_inv(sbox_in[7:0])}
+      ) >> (32 - {orig_instr_aes[5:4], 3'b000})));
+      // SHA512 32-bit shifting and XORing rs1 and rs2
+      assign sha512sig0h_gen = (fu_data_i.operand_a >> 1) ^ (fu_data_i.operand_a >> 7) ^ (fu_data_i.operand_a >> 8) ^ (fu_data_i.operand_b << 31) ^ (fu_data_i.operand_b << 24);
+      assign sha512sig0l_gen = (fu_data_i.operand_a >> 1) ^ (fu_data_i.operand_a >> 7) ^ (fu_data_i.operand_a >> 8) ^ (fu_data_i.operand_b << 31) ^ (fu_data_i.operand_b << 25) ^ (fu_data_i.operand_b << 24);
+      assign sha512sig1h_gen = (fu_data_i.operand_a << 3) ^ (fu_data_i.operand_a >> 6) ^ (fu_data_i.operand_a >> 19) ^ (fu_data_i.operand_b >> 29) ^ (fu_data_i.operand_b << 13);
+      assign sha512sig1l_gen = (fu_data_i.operand_a << 3) ^ (fu_data_i.operand_a >> 6) ^ (fu_data_i.operand_a >> 19) ^ (fu_data_i.operand_b >> 29) ^ (fu_data_i.operand_b << 26) ^ (fu_data_i.operand_b << 13);
+      assign sha512sum0r_gen = (fu_data_i.operand_a << 25) ^ (fu_data_i.operand_a << 30) ^ (fu_data_i.operand_a >> 28) ^ (fu_data_i.operand_b >> 7) ^ (fu_data_i.operand_b >> 2) ^ (fu_data_i.operand_b << 4);
+      assign sha512sum1r_gen = (fu_data_i.operand_a << 23) ^ (fu_data_i.operand_a >> 14) ^ (fu_data_i.operand_a >> 18) ^ (fu_data_i.operand_b >> 9) ^ (fu_data_i.operand_b << 18) ^ (fu_data_i.operand_b << 14);
+    end else if (CVA6Cfg.IS_XLEN64) begin
+      // AES Shift rows forward and inverse step
+      assign sr = {
+        fu_data_i.operand_a[31:24],
+        fu_data_i.operand_b[55:48],
+        fu_data_i.operand_b[15:8],
+        fu_data_i.operand_a[39:32],
+        fu_data_i.operand_b[63:56],
+        fu_data_i.operand_b[23:16],
+        fu_data_i.operand_a[47:40],
+        fu_data_i.operand_a[7:0]
+      };
+      assign sr_inv = {
+        fu_data_i.operand_b[31:24],
+        fu_data_i.operand_b[55:48],
+        fu_data_i.operand_a[15:8],
+        fu_data_i.operand_a[39:32],
+        fu_data_i.operand_a[63:56],
+        fu_data_i.operand_b[23:16],
+        fu_data_i.operand_b[47:40],
+        fu_data_i.operand_a[7:0]
+      };
+      // AES 64-bit final round encryption by applying forward shift-rows and the forward sbox to each byte
+      assign aes64es_gen = {
+        aes_sbox_fwd(sr[63:56]),
+        aes_sbox_fwd(sr[55:48]),
+        aes_sbox_fwd(sr[47:40]),
+        aes_sbox_fwd(sr[39:32]),
+        aes_sbox_fwd(sr[31:24]),
+        aes_sbox_fwd(sr[23:16]),
+        aes_sbox_fwd(sr[15:8]),
+        aes_sbox_fwd(sr[7:0])
+      };
+      // AES 64-bit middle round encryption by applying forward shift-rows, forward sbox and forward mix-columns to all bytes
+      assign aes64esm_gen = {
+        aes_mixcolumn_fwd(aes64es_gen[63:32]), aes_mixcolumn_fwd(aes64es_gen[31:0])
+      };
+      // AES 64-bit final round decryption by applying inverse shift-rows and the inverse sbox to each byte
+      assign aes64ds_gen = {
+        aes_sbox_inv(sr_inv[63:56]),
+        aes_sbox_inv(sr_inv[55:48]),
+        aes_sbox_inv(sr_inv[47:40]),
+        aes_sbox_inv(sr_inv[39:32]),
+        aes_sbox_inv(sr_inv[31:24]),
+        aes_sbox_inv(sr_inv[23:16]),
+        aes_sbox_inv(sr_inv[15:8]),
+        aes_sbox_inv(sr_inv[7:0])
+      };
+      // AES 64-bit middle round decryption by applying inverse shift-rows, inverse sbox and inverse mix-columns to all bytes
+      assign aes64dsm_gen = {
+        aes_mixcolumn_inv(aes64ds_gen[63:32]), aes_mixcolumn_inv(aes64ds_gen[31:0])
+      };
+      // AES 64-bit decryption key schedule step (aes64im): applies inverse mix-columns to each 32-bit word of rs1
+      assign aes64im_gen = {
+        aes_mixcolumn_inv(fu_data_i.operand_a[63:32]), aes_mixcolumn_inv(fu_data_i.operand_a[31:0])
+      };
+      // AES 64-bit key schedule step (aes64ks2): XORs rs1[63:32] with rs2[31:0] (low word) and additionally with rs2[63:32] (high word)
+      assign aes64ks2_gen = {
+        (fu_data_i.operand_a[63:32] ^ fu_data_i.operand_b[31:0] ^ fu_data_i.operand_b[63:32]),
+        (fu_data_i.operand_a[63:32] ^ fu_data_i.operand_b[31:0])
+      };
+      // AES 64-bit key schedule step (aes64ks1i): rotates rs1[63:32] (skipped when rnum == 0xA), applies the forward subword substitution and XORs the round constant selected by rnum = orig_instr_aes[3:0]; rnum > 0xA yields zero
+      assign aes64ks1i_gen = (orig_instr_aes[3:0] <= 4'hA) ? {((aes_subword_fwd(
+          (orig_instr_aes[3:0] == 4'hA) ? fu_data_i.operand_a[63:32] : ((fu_data_i.operand_a[63:32] >> 8) | (fu_data_i.operand_a[63:32] << 24))
+      )) ^ (aes_decode_rcon(
+          orig_instr_aes[3:0]
+      ))), ((aes_subword_fwd(
+          (orig_instr_aes[3:0] == 4'hA) ? fu_data_i.operand_a[63:32] : ((fu_data_i.operand_a[63:32] >> 8) | (fu_data_i.operand_a[63:32] << 24))
+      )) ^ (aes_decode_rcon(
+          orig_instr_aes[3:0]
+      )))} : 64'h0;
+      // SHA512 64bit rotating, shifting and XORing rs1
+      assign sha512sig0_gen = (fu_data_i.operand_a >> 1 | fu_data_i.operand_a << 63) ^ (fu_data_i.operand_a >> 8 | fu_data_i.operand_a << 56) ^ (fu_data_i.operand_a >> 7);
+      assign sha512sig1_gen = (fu_data_i.operand_a >> 19 | fu_data_i.operand_a << 45) ^ (fu_data_i.operand_a >> 61 | fu_data_i.operand_a << 3) ^ (fu_data_i.operand_a >> 6);
+      assign sha512sum0_gen = (fu_data_i.operand_a >> 28 | fu_data_i.operand_a << 36) ^ (fu_data_i.operand_a >> 34 | fu_data_i.operand_a << 30) ^ (fu_data_i.operand_a >> 39 | fu_data_i.operand_a << 25);
+      assign sha512sum1_gen = (fu_data_i.operand_a >> 14 | fu_data_i.operand_a << 50) ^ (fu_data_i.operand_a >> 18 | fu_data_i.operand_a << 46) ^ (fu_data_i.operand_a >> 41 | fu_data_i.operand_a << 23);
+    end
+  end
+
+  // -----------
+  // Result MUX
+  // -----------
+  always_comb begin
+    result_o = '0;
+    // AES instructions
+    if (CVA6Cfg.ZKN && CVA6Cfg.RVB) begin
+      if (CVA6Cfg.IS_XLEN32) begin
+        unique case (fu_data_i.operation)
+          AES32ESI: result_o = aes32esi_gen;
+          AES32ESMI: result_o = aes32esmi_gen;
+          AES32DSI: result_o = aes32dsi_gen;
+          AES32DSMI: result_o = aes32dsmi_gen;
+          SHA256SIG0: result_o = sha256sig0_gen;
+          SHA256SIG1: result_o = sha256sig1_gen;
+          SHA256SUM0: result_o = sha256sum0_gen;
+          SHA256SUM1: result_o = sha256sum1_gen;
+          SHA512SIG0H: result_o = sha512sig0h_gen;
+          SHA512SIG0L: result_o = sha512sig0l_gen;
+          SHA512SIG1H: result_o = sha512sig1h_gen;
+          SHA512SIG1L: result_o = sha512sig1l_gen;
+          SHA512SUM0R: result_o = sha512sum0r_gen;
+          SHA512SUM1R: result_o = sha512sum1r_gen;
+          default: ;
+        endcase
+      end
+      if (CVA6Cfg.IS_XLEN64) begin
+        unique case (fu_data_i.operation)
+          AES64ES: result_o = aes64es_gen;
+          AES64ESM: result_o = aes64esm_gen;
+          AES64DS: result_o = aes64ds_gen;
+          AES64DSM: result_o = aes64dsm_gen;
+          AES64IM: result_o = aes64im_gen;
+          AES64KS1I: result_o = aes64ks1i_gen;
+          AES64KS2: result_o = aes64ks2_gen;
+          SHA256SIG0: result_o = {{32{sha256sig0_gen[31]}}, sha256sig0_gen};
+          SHA256SIG1: result_o = {{32{sha256sig1_gen[31]}}, sha256sig1_gen};
+          SHA256SUM0: result_o = {{32{sha256sum0_gen[31]}}, sha256sum0_gen};
+          SHA256SUM1: result_o = {{32{sha256sum1_gen[31]}}, sha256sum1_gen};
+          SHA512SIG0: result_o = sha512sig0_gen;
+          SHA512SIG1: result_o = sha512sig1_gen;
+          SHA512SUM0: result_o = sha512sum0_gen;
+          SHA512SUM1: result_o = sha512sum1_gen;
+          default: ;
+        endcase
+      end
+    end
+  end
+endmodule
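The shift-and-OR expressions in aes.sv encode rotations, so they can be cross-checked against an explicit rotate helper. The testbench below is a minimal sketch and not part of this PR: the module and function names (sha256sig0_ref_tb, ror32, sha256sig0_ref, sha256sig0_rtl) are hypothetical, and sha256sig0_rtl copies only the expression shape of sha256sig0_gen.

```systemverilog
// Illustrative self-checking testbench (not part of this PR).
module sha256sig0_ref_tb;

  // Rotate a 32-bit word right by n bits, for 0 < n < 32.
  function automatic logic [31:0] ror32(input logic [31:0] x, input int unsigned n);
    return (x >> n) | (x << (32 - n));
  endfunction

  // Reference form: ROTR(x, 7) ^ ROTR(x, 18) ^ SHR(x, 3).
  function automatic logic [31:0] sha256sig0_ref(input logic [31:0] x);
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3);
  endfunction

  // Same expression shape as sha256sig0_gen in aes.sv, with a = operand_a[31:0].
  function automatic logic [31:0] sha256sig0_rtl(input logic [31:0] a);
    return (a >> 7 | a << 25) ^ (a >> 18 | a << 14) ^ (a >> 3);
  endfunction

  initial begin
    logic [31:0] x;
    logic [31:0] r32;
    logic [63:0] r64;

    // Directed case: only bit 0 set, so the two rotates land on bits 25 and 14.
    if (sha256sig0_rtl(32'h0000_0001) !== 32'h0200_4000) $error("directed case failed");

    // Random cases: both formulations must agree.
    for (int i = 0; i < 1000; i++) begin
      x = $urandom();
      if (sha256sig0_rtl(x) !== sha256sig0_ref(x)) $error("mismatch for x=%h", x);
    end

    // RV64 result mux: bit 31 of the 32-bit result is sign-extended.
    // Bit 6 set: ROTR7 moves it to bit 31, ROTR18 to bit 20, SHR3 to bit 3.
    r32 = sha256sig0_rtl(32'h0000_0040);
    r64 = {{32{r32[31]}}, r32};
    if (r64 !== 64'hFFFF_FFFF_8010_0008) $error("sign extension case failed");

    $display("sha256sig0 reference check done");
    $finish;
  end
endmodule
```

The same pattern extends to sha256sig1, sha256sum0 and sha256sum1 by swapping the rotate and shift amounts used in aes.sv.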
--- a/core/alu.sv
+++ b/core/alu.sv
@ -54,6 +54,9 @@ module alu
  logic [CVA6Cfg.XLEN-1:0] brev8_reversed;
  logic [            31:0] unzip_gen;
  logic [            31:0] zip_gen;
+  logic [CVA6Cfg.XLEN-1:0] xperm8_result;
+  logic [CVA6Cfg.XLEN-1:0] xperm4_result;
+
  // bit reverse operand_a for left shifts and bit counting
  generate
    genvar k;
@ -268,16 +271,22 @@ module alu

  // ZKN gen block
  if (CVA6Cfg.ZKN && CVA6Cfg.RVB) begin : zkn_gen_block
-    genvar i, m, n;
-    // Generate brev8_reversed by reversing bits within each byte
-    for (i = 0; i < (CVA6Cfg.XLEN / 8); i++) begin : brev8_gen
+    genvar i, m, n, q;
+    for (i = 0; i < (CVA6Cfg.XLEN / 8); i++) begin : brev8_xperm8_gen
+      // Generating xperm8_result by extracting bytes from operand a based on indices from operand b
+      assign xperm8_result[i << 3 +: 8] = (fu_data_i.operand_b[i << 3 +: 8] < (CVA6Cfg.XLEN / 8)) ? fu_data_i.operand_a[fu_data_i.operand_b[i << 3 +: 8] << 3 +: 8] : 8'b0;
+      // Generate brev8_reversed by reversing bits within each byte
      for (m = 0; m < 8; m++) begin : reverse_bits
        // Reversing the order of bits within a single byte
        assign brev8_reversed[(i<<3)+m] = fu_data_i.operand_a[(i<<3)+(7-m)];
      end
    end
-    // Generate zip and unzip results
+    for (q = 0; q < (CVA6Cfg.XLEN / 4); q++) begin : xperm4_gen
+      // Generating xperm4_result by extracting nibbles from operand a based on indices from operand b
+      assign xperm4_result[q << 2 +: 4] = (fu_data_i.operand_b[q << 2 +: 4] < (CVA6Cfg.XLEN / 4)) ? fu_data_i.operand_a[{2'b0, fu_data_i.operand_b[q << 2 +: 4]} << 2 +: 4] : 4'b0;
+    end
    if (CVA6Cfg.IS_XLEN32) begin
+      // Generate zip and unzip results
      for (n = 0; n < 16; n++) begin : zip_unzip_gen
        // Assigning lower and upper half of operand into the even and odd positions of result
        assign zip_gen[n<<1] = fu_data_i.operand_a[n];
@ -392,6 +401,8 @@ module alu
        PACK_H:
        result_o = (CVA6Cfg.IS_XLEN32) ? ({16'b0, fu_data_i.operand_b[7:0], fu_data_i.operand_a[7:0]}) : ({48'b0, fu_data_i.operand_b[7:0], fu_data_i.operand_a[7:0]});
        BREV8: result_o = brev8_reversed;
+        XPERM8: result_o = xperm8_result;
+        XPERM4: result_o = xperm4_result;
        default: ;
      endcase
      if (fu_data_i.operation == PACK_W && CVA6Cfg.IS_XLEN64)
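The xperm8 and xperm4 generate loops above treat rs2 as a vector of indices into rs1, with out-of-range indices producing zero. A behavioral model makes that lookup explicit. This is a minimal sketch for XLEN = 32 only and not part of this PR; the names xperm_ref_tb, xperm8_ref and xperm4_ref are hypothetical.

```systemverilog
// Illustrative behavioral model for XLEN = 32 (not part of this PR).
module xperm_ref_tb;

  // Byte i of the result is the byte of rs1 indexed by byte i of rs2,
  // or zero when that index is 4 or larger.
  function automatic logic [31:0] xperm8_ref(input logic [31:0] rs1, input logic [31:0] rs2);
    logic [31:0] res;
    logic [7:0]  idx;
    res = '0;
    for (int i = 0; i < 4; i++) begin
      idx = rs2[8*i+:8];
      if (idx < 4) res[8*i+:8] = rs1[8*idx+:8];
    end
    return res;
  endfunction

  // Nibble i of the result is the nibble of rs1 indexed by nibble i of rs2,
  // or zero when that index is 8 or larger.
  function automatic logic [31:0] xperm4_ref(input logic [31:0] rs1, input logic [31:0] rs2);
    logic [31:0] res;
    logic [3:0]  idx;
    res = '0;
    for (int i = 0; i < 8; i++) begin
      idx = rs2[4*i+:4];
      if (idx < 8) res[4*i+:4] = rs1[4*idx+:4];
    end
    return res;
  endfunction

  initial begin
    // Indices 3,2,1,0 (from byte 0 upwards) reverse the byte order.
    if (xperm8_ref(32'hAABB_CCDD, 32'h0001_0203) !== 32'hDDCC_BBAA) $error("xperm8 reverse failed");
    // Indices 0x04 and 0xFF are out of range and give zero bytes.
    if (xperm8_ref(32'hAABB_CCDD, 32'h04FF_0000) !== 32'h0000_DDDD) $error("xperm8 range failed");
    // Nibble indices 7..0 (from nibble 0 upwards) reverse the nibble order.
    if (xperm4_ref(32'hFEDC_BA98, 32'h0123_4567) !== 32'h89AB_CDEF) $error("xperm4 reverse failed");
    // Index 8 is out of range for XLEN = 32.
    if (xperm4_ref(32'hFEDC_BA98, 32'h8888_8888) !== 32'h0000_0000) $error("xperm4 range failed");

    $display("xperm reference check done");
    $finish;
  end
endmodule
```

For XLEN = 64 the bounds become 8 bytes and 16 nibbles, which matches the CVA6Cfg.XLEN / 8 and CVA6Cfg.XLEN / 4 comparisons in the RTL.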
--- a/core/cva6.sv
+++ b/core/cva6.sv
@ -442,6 +442,8 @@ module cva6
  exception_t flu_exception_ex_id;
  // ALU
  logic [CVA6Cfg.NrIssuePorts-1:0] alu_valid_id_ex;
+  logic [5:0] orig_instr_aes;
+  logic [CVA6Cfg.NrIssuePorts-1:0] aes_valid_id_ex;
  // Branches and Jumps
  logic [CVA6Cfg.NrIssuePorts-1:0] branch_valid_id_ex;

@ -858,6 +860,7 @@ module cva6
      .flu_ready_i             (flu_ready_ex_id),
      // ALU
      .alu_valid_o             (alu_valid_id_ex),
+      .aes_valid_o             (aes_valid_id_ex),
      // Branches and Jumps
      .branch_valid_o          (branch_valid_id_ex),            // branch is valid
      .branch_predict_o        (branch_predict_id_ex),          // branch predict to ex
@ -916,7 +919,8 @@ module cva6
      .rvfi_issue_pointer_o (rvfi_issue_pointer),
      .rvfi_commit_pointer_o(rvfi_commit_pointer),
      .rvfi_rs1_o           (rvfi_rs1),
-      .rvfi_rs2_o           (rvfi_rs2)
+      .rvfi_rs2_o           (rvfi_rs2),
+      .orig_instr_aes_bits  (orig_instr_aes)
  );

  // ---------
@ -958,6 +962,8 @@ module cva6
      .flu_ready_o(flu_ready_ex_id),
      // ALU
      .alu_valid_i(alu_valid_id_ex),
+      .orig_instr_aes_i(orig_instr_aes),
+      .aes_valid_i(aes_valid_id_ex),
      // Branches and Jumps
      .branch_valid_i(branch_valid_id_ex),
      .branch_predict_i(branch_predict_id_ex),  // branch predict to ex
--- a/core/decoder.sv
+++ b/core/decoder.sv
@ -467,7 +467,7 @@ module decoder
          // --------------------------------------------
          // Vectorial Floating-Point Reg-Reg Operations
          // --------------------------------------------
-          if (instr.rvftype.funct2 == 2'b10) begin  // Prefix 10 for all Xfvec ops
+          if (!CVA6Cfg.ZKN && instr.rvftype.funct2 == 2'b10) begin  // Prefix 10 for all Xfvec ops
            // only generate decoder if FP extensions are enabled (static)
            if (CVA6Cfg.FpPresent && CVA6Cfg.XFVec && fs_i != riscv::Off && ((CVA6Cfg.RVH && (!v_i || vfs_i != riscv::Off)) || !CVA6Cfg.RVH)) begin
              automatic logic allow_replication;  // control honoring of replication flag
@ -788,6 +788,18 @@ module decoder
                  if (CVA6Cfg.ZKN) instruction_o.op = ariane_pkg::PACK_H;  //packh
                  else illegal_instr_bm = 1'b1;
                end
+                {
+                  7'b001_0100, 3'b100
+                } : begin
+                  if (CVA6Cfg.ZKN) instruction_o.op = ariane_pkg::XPERM8;  // xperm8
+                  else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b001_0100, 3'b010
+                } : begin
+                  if (CVA6Cfg.ZKN) instruction_o.op = ariane_pkg::XPERM4;  // xperm4
+                  else illegal_instr_bm = 1'b1;
+                end
                // Zero Extend Op RV32 encoding
                {
                  7'b000_0100, 3'b100
@ -797,6 +809,150 @@ module decoder
                  else if (CVA6Cfg.ZKN) instruction_o.op = ariane_pkg::PACK;  // pack
                  else illegal_instr_bm = 1'b1;
                end
+                {
+                  7'b001_1001, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES64ES;  // aes64es
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b001_1011, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES64ESM;  // aes64esm
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b011_1111, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES64KS2;  // aes64ks2
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b0010001, 3'b000
+                }, {
+                  7'b0110001, 3'b000
+                }, {
+                  7'b1010001, 3'b000
+                }, {
+                  7'b1110001, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES32ESI;  // aes32esi
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b0010011, 3'b000
+                }, {
+                  7'b0110011, 3'b000
+                }, {
+                  7'b1010011, 3'b000
+                }, {
+                  7'b1110011, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES32ESMI;  // aes32esmi
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b0010101, 3'b000
+                }, {
+                  7'b0110101, 3'b000
+                }, {
+                  7'b1010101, 3'b000
+                }, {
+                  7'b1110101, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES32DSI;  // aes32dsi
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b0010111, 3'b000
+                }, {
+                  7'b0110111, 3'b000
+                }, {
+                  7'b1010111, 3'b000
+                }, {
+                  7'b1110111, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES32DSMI;  // aes32dsmi
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b001_1101, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES64DS;  // aes64ds
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b001_1111, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::AES64DSM;  // aes64dsm
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b010_1110, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::SHA512SIG0H;  // sha512sig0h
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b010_1010, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::SHA512SIG0L;  // sha512sig0l
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b010_1111, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::SHA512SIG1H;  // sha512sig1h
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b010_1011, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::SHA512SIG1L;  // sha512sig1l
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b010_1000, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::SHA512SUM0R;  // sha512sum0r
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
+                {
+                  7'b010_1001, 3'b000
+                } : begin
+                  if (CVA6Cfg.ZKN) begin
+                    instruction_o.op = ariane_pkg::SHA512SUM1R;  // sha512sum1r
+                    instruction_o.fu = AES;
+                  end else illegal_instr_bm = 1'b1;
+                end
                default: begin
                  illegal_instr_bm = 1'b1;
                end
@ -937,7 +1093,37 @@ module decoder
                  instruction_o.op = ariane_pkg::BSETI;
                else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000010001111)
                  instruction_o.op = ariane_pkg::ZIP;
-                else illegal_instr_bm = 1'b1;
+                else if (CVA6Cfg.ZKN && instr.instr[31:24] == 8'b00110001) begin
+                  instruction_o.op = ariane_pkg::AES64KS1I;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b001100000000) begin
+                  instruction_o.op = ariane_pkg::AES64IM;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000010) begin
+                  instruction_o.op = ariane_pkg::SHA256SIG0;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000011) begin
+                  instruction_o.op = ariane_pkg::SHA256SIG1;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000000) begin
+                  instruction_o.op = ariane_pkg::SHA256SUM0;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000001) begin
+                  instruction_o.op = ariane_pkg::SHA256SUM1;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000110) begin
+                  instruction_o.op = ariane_pkg::SHA512SIG0;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000111) begin
+                  instruction_o.op = ariane_pkg::SHA512SIG1;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000100) begin
+                  instruction_o.op = ariane_pkg::SHA512SUM0;
+                  instruction_o.fu = AES;
+                end else if (CVA6Cfg.ZKN && instr.instr[31:20] == 12'b000100000101) begin
+                  instruction_o.op = ariane_pkg::SHA512SUM1;
+                  instruction_o.fu = AES;
+                end else illegal_instr_bm = 1'b1;
              end
              3'b101: begin
                if (instr.instr[31:20] == 12'b001010000111) instruction_o.op = ariane_pkg::ORCB;
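In the decoder hunk above, the four case labels for each AES32 instruction differ only in funct7[6:5]; the lower five bits are fixed. The check below illustrates that reading for aes32esi. It is a sketch and not part of this PR: the names are hypothetical, and it assumes orig_instr_aes[5:4] carries those two funct7 bits, which this diff does not show.

```systemverilog
// Illustrative check (not part of this PR); names are hypothetical.
module aes32_bs_decode_tb;

  // Matches aes32esi for any byte select: funct7[4:0] = 5'b10001, funct3 = 3'b000.
  function automatic bit is_aes32esi(input logic [6:0] funct7, input logic [2:0] funct3);
    return (funct7[4:0] == 5'b10001) && (funct3 == 3'b000);
  endfunction

  initial begin
    logic [6:0] funct7;
    logic [4:0] shamt;

    for (int bs = 0; bs < 4; bs++) begin
      funct7 = {bs[1:0], 5'b10001};    // 0010001, 0110001, 1010001, 1110001
      shamt  = {funct7[6:5], 3'b000};  // same shape as {orig_instr_aes[5:4], 3'b000}
      if (!is_aes32esi(funct7, 3'b000)) $error("funct7=%b not matched", funct7);
      if (shamt != 8 * bs) $error("unexpected shift amount %0d", shamt);
    end

    // aes64es (funct7 = 7'b0011001) must not match the aes32esi pattern.
    if (is_aes32esi(7'b0011001, 3'b000)) $error("aes64es wrongly matched");

    $display("aes32esi funct7 check done");
    $finish;
  end
endmodule
```

Whether a masked compare like this could replace the enumerated labels depends on the other encodings handled in the same case statement, so it is shown only to explain the pattern.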
--- a/core/ex_stage.sv
+++ b/core/ex_stage.sv
@ -67,6 +67,8 @@ module ex_stage
    output logic flu_valid_o,
    // ALU instruction is valid - ISSUE_STAGE
    input logic [CVA6Cfg.NrIssuePorts-1:0] alu_valid_i,
+    // AES instruction is valid - ISSUE_STAGE
+    input logic [CVA6Cfg.NrIssuePorts-1:0] aes_valid_i,
    // Branch unit instruction is valid - ISSUE_STAGE
    input logic [CVA6Cfg.NrIssuePorts-1:0] branch_valid_i,
    // Information of branch prediction - ISSUE_STAGE
@ -235,7 +237,9 @@ module ex_stage
    // Information dedicated to RVFI - RVFI
    output lsu_ctrl_t rvfi_lsu_ctrl_o,
    // Information dedicated to RVFI - RVFI
-    output [CVA6Cfg.PLEN-1:0] rvfi_mem_paddr_o
+    output [CVA6Cfg.PLEN-1:0] rvfi_mem_paddr_o,
+    // Original instruction AES bits
+    input logic [5:0] orig_instr_aes_i
 );

  // -------------------------
@ -271,14 +275,14 @@ module ex_stage

  // from ALU to branch unit
  logic alu_branch_res;  // branch comparison result
-  logic [CVA6Cfg.XLEN-1:0] alu_result, csr_result, mult_result;
+  logic [CVA6Cfg.XLEN-1:0] alu_result, csr_result, mult_result, aes_result;
  logic [CVA6Cfg.VLEN-1:0] branch_result;
  logic csr_ready, mult_ready;
  logic [CVA6Cfg.TRANS_ID_BITS-1:0] mult_trans_id;
  logic mult_valid;

  logic [CVA6Cfg.NrIssuePorts-1:0] one_cycle_select;
-  assign one_cycle_select = alu_valid_i | branch_valid_i | csr_valid_i;
+  assign one_cycle_select = alu_valid_i | branch_valid_i | csr_valid_i | aes_valid_i;

  fu_data_t one_cycle_data;
  logic [CVA6Cfg.VLEN-1:0] rs1_forwarding;
@ -370,6 +374,8 @@ module ex_stage
    end else if (mult_valid) begin
      flu_result_o   = mult_result;
      flu_trans_id_o = mult_trans_id;
+    end else if (|aes_valid_i) begin
+      flu_result_o = aes_result;
    end
  end

@ -723,4 +729,24 @@ module ex_stage
    assign gpaddr_to_be_flushed               = '0;
  end

+  // ----------------
+  // Scalar Cryptography Unit
+  // ----------------
+  generate
+    if (CVA6Cfg.ZKN) begin : aes_gen
+      aes #(
+          .CVA6Cfg  (CVA6Cfg),
+          .fu_data_t(fu_data_t)
+      ) aes_i (
+          .clk_i,
+          .rst_ni,
+          .fu_data_i     (one_cycle_data),
+          .result_o      (aes_result),
+          .orig_instr_aes(orig_instr_aes_i)
+      );
+    end else begin : no_aes_gen
+      assign aes_result = '0;
+    end
+  endgenerate
+
 endmodule
--- a/core/include/aes_pkg.sv
+++ b/core/include/aes_pkg.sv
@ -0,0 +1,219 @@
+// Licensed under the Solderpad Hardware Licence, Version 2.1 (the "License");
+// you may not use this file except in compliance with the License.
+// SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+// You may obtain a copy of the License at https://solderpad.org/licenses/
+//
+// Author: Munail Waqar, 10xEngineers
+// Date: 03.05.2025
+// Description: The Zkn extension including its subsets accelerates cryptographic workloads by introducing dedicated
+// scalar instructions compliant with the RISC-V Scalar Cryptography specification. The subsets include:
+// Zknd (AES Decryption and related instructions), Zkne (AES Encryption support, including AES rounds and key expansion steps),
+// Zknh (SHA-256 and SHA-512 hash functions for secure hashing operations).
+//
+package aes_pkg;
+
+  // ----------------------
+  // AES functions
+  // ----------------------
+
+  // AES MixColumns Forward
+  function [31:0] aes_mixcolumn_fwd(input [31:0] x);
+    begin
+      aes_mixcolumn_fwd = {
+        (((x[7:0] << 1) ^ ((x[7]) ? 8'h1B : 8'h00)) ^ x[7:0]) ^ x[15:8] ^ x[23:16] ^ ((x[31:24] << 1) ^ ((x[31]) ? 8'h1B : 8'h00)),
+        x[7:0] ^ x[15:8] ^ ((x[23:16] << 1) ^ ((x[23]) ? 8'h1B : 8'h00)) ^ (((x[31:24] << 1) ^ ((x[31]) ? 8'h1B : 8'h00)) ^ x[31:24]),
+        x[7:0] ^ ((x[15:8] << 1) ^ ((x[15]) ? 8'h1B : 8'h00)) ^ (((x[23:16] << 1) ^ ((x[23]) ? 8'h1B : 8'h00)) ^ x[23:16]) ^ x[31:24],
+        ((x[7:0] << 1) ^ ((x[7]) ? 8'h1B : 8'h00)) ^ (((x[15:8] << 1) ^ ((x[15]) ? 8'h1B : 8'h00)) ^ x[15:8]) ^ x[23:16] ^ x[31:24]
+      };
+    end
+  endfunction
+  // AES subword Forward
+  function [31:0] aes_subword_fwd(input [31:0] word);
+    aes_subword_fwd = {
+      aes_sbox_fwd(word[31:24]),
+      aes_sbox_fwd(word[23:16]),
+      aes_sbox_fwd(word[15:8]),
+      aes_sbox_fwd(word[7:0])
+    };
+  endfunction
+  // AES Round Constant
+  function [31:0] aes_decode_rcon(input [3:0] r);
+    case (r)
+      4'h0: aes_decode_rcon = 32'h00000001;
+      4'h1: aes_decode_rcon = 32'h00000002;
+      4'h2: aes_decode_rcon = 32'h00000004;
+      4'h3: aes_decode_rcon = 32'h00000008;
+      4'h4: aes_decode_rcon = 32'h00000010;
+      4'h5: aes_decode_rcon = 32'h00000020;
+      4'h6: aes_decode_rcon = 32'h00000040;
+      4'h7: aes_decode_rcon = 32'h00000080;
+      4'h8: aes_decode_rcon = 32'h0000001b;
+      4'h9: aes_decode_rcon = 32'h00000036;
+      4'hA: aes_decode_rcon = 32'h00000000;
+      4'hB: aes_decode_rcon = 32'h00000000;
+      4'hC: aes_decode_rcon = 32'h00000000;
+      4'hD: aes_decode_rcon = 32'h00000000;
+      4'hE: aes_decode_rcon = 32'h00000000;
+      4'hF: aes_decode_rcon = 32'h00000000;
+      default: aes_decode_rcon = 32'h00000000;
+    endcase
+  endfunction
+  // AES MixColumns Inverse
+  function logic [31:0] aes_mixcolumn_inv(input logic [31:0] x);
+    aes_mixcolumn_inv = {
+      (gfmul(x[7:0], 4'hB) ^ gfmul(x[15:8], 4'hD) ^ gfmul(x[23:16], 4'h9) ^ gfmul(x[31:24], 4'hE)),
+      (gfmul(x[7:0], 4'hD) ^ gfmul(x[15:8], 4'h9) ^ gfmul(x[23:16], 4'hE) ^ gfmul(x[31:24], 4'hB)),
+      (gfmul(x[7:0], 4'h9) ^ gfmul(x[15:8], 4'hE) ^ gfmul(x[23:16], 4'hB) ^ gfmul(x[31:24], 4'hD)),
+      (gfmul(x[7:0], 4'hE) ^ gfmul(x[15:8], 4'hB) ^ gfmul(x[23:16], 4'hD) ^ gfmul(x[31:24], 4'h9))
+    };
+  endfunction
+  // GF multiplication
+  function logic [7:0] gfmul(input logic [7:0] x, input logic [3:0] y);
+    logic [7:0] result, temp;
+    result = 8'h00;
+    if (y[0]) result ^= x;
+    if (y[1]) begin
+      result ^= ((x << 1) ^ ((x[7]) ? 8'h1B : 8'h00));
+    end
+    if (y[2]) begin
+      temp = (x << 1) ^ ((x[7]) ? 8'h1B : 8'h00);
+      result ^= (temp << 1) ^ ((temp[7]) ? 8'h1B : 8'h00);
+    end
+    if (y[3]) begin
+      temp = (x << 1) ^ ((x[7]) ? 8'h1B : 8'h00);
+      temp = (temp << 1) ^ ((temp[7]) ? 8'h1B : 8'h00);
+      result ^= (temp << 1) ^ ((temp[7]) ? 8'h1B : 8'h00);
+    end
+    return result;
+  endfunction
+  // AES Sbox implementation based on https://github.com/riscv/riscv-crypto
+  // AES Sbox Forward
+  function automatic logic [7:0] aes_sbox_fwd(input logic [7:0] in_byte);
+    logic [20:0] expanded;
+    logic [17:0] non_linear;
+    logic [ 7:0] compressed;
+    expanded = linear_top_layer(in_byte);
+    non_linear = non_linear_layer(expanded);
+    compressed = linear_bottom_layer(non_linear);
+    aes_sbox_fwd = compressed;
+  endfunction
+  // AES Sbox Inverse
+  function automatic logic [7:0] aes_sbox_inv(input logic [7:0] in_byte);
+    logic [20:0] expanded;
+    logic [17:0] non_linear;
+    logic [ 7:0] compressed;
+    expanded = aes_sbox_inv_top(in_byte);
+    non_linear = non_linear_layer(expanded);
+    compressed = aes_sbox_inv_out(non_linear);
+    aes_sbox_inv = compressed;
+  endfunction
+  // AES Sbox Forward Top Layer
+  function automatic logic [20:0] linear_top_layer(input logic [7:0] x);
+    return {
+      ((x[7] ^ x[4]) ^ (x[5] ^ x[2])),
+      (((x[7] ^ x[4])  ^ ((x[6] ^ x[5])  ^ (x[4] ^ x[0]))) ^ ((x[0] ^ (x[6] ^ x[5]))  ^ ((x[3] ^ x[1])  ^ (x[5] ^ x[2])))),
+      ((x[7] ^ x[2]) ^ (((x[7] ^ x[4]) ^ (x[3] ^ x[1])) ^ (x[6] ^ x[5]))),
+      ((x[7] ^ x[2]) ^ ((x[6] ^ x[5]) ^ (x[1] ^ x[0]))),
+      ((x[6] ^ x[5]) ^ (x[1] ^ x[0])),
+      ((x[7] ^ x[4]) ^ ((x[6] ^ x[5]) ^ (x[4] ^ x[0]))),
+      ((x[6] ^ x[5]) ^ (x[4] ^ x[0])),
+      ((x[0] ^ (x[6] ^ x[5])) ^ ((x[3] ^ x[1]) ^ (x[5] ^ x[2]))),
+      ((x[3] ^ x[1]) ^ (x[5] ^ x[2])),
+      ((x[3] ^ x[1]) ^ (x[6] ^ x[2])),
+      (((x[7] ^ x[4]) ^ (x[3] ^ x[1])) ^ (x[6] ^ x[2])),
+      ((x[7] ^ x[1]) ^ (x[4] ^ x[2])),
+      (((x[7] ^ x[4]) ^ (x[3] ^ x[1])) ^ (x[6] ^ x[5])),
+      (x[0] ^ (x[6] ^ x[5])),
+      (x[0] ^ ((x[7] ^ x[4]) ^ (x[3] ^ x[1]))),
+      ((x[7] ^ x[4]) ^ (x[3] ^ x[1])),
+      (x[4] ^ x[2]),
+      (x[7] ^ x[1]),
+      (x[7] ^ x[2]),
+      (x[7] ^ x[4]),
+      (x[0])
+    };
+  endfunction
+  // AES Sbox Middle Layer
+  function automatic logic [17:0] non_linear_layer(input logic [20:0] x);
+    logic t1, t2, t3, t4, t5;
+    logic [17:0] y;
+    t1 = (((x[10] ^ (x[9] & x[5])) ^ (x[17] & x[6])) ^ ((x[4] & x[20]) ^ (x[1] & x[11])));
+    t2 = ((((x[14] & x[0]) ^ (x[9] & x[5])) ^ x[18]) ^ ((x[2] & x[8]) ^ (x[1] & x[11])));
+    t3 = ((((x[3] ^ x[12]) ^ (x[3] & x[12])) ^ (x[16] & x[7])) ^ ((x[4] & x[20]) ^ (x[1] & x[11])));
+    t4 = ((((x[15] & x[13]) ^ (x[3] & x[12])) ^ ((x[2] & x[8]) ^ (x[1] & x[11]))) ^ x[19]);
+    t5 = ((((t1 ^ t2) & (t1 & t4)) ^ ((t1 ^ t2) ^ (t3 & t1))) ^ (((t3 ^ t4) & (t2 & t3)) ^ ((t3 ^ t4) ^ (t3 & t1))));
+
+    y[0] = (((t1 ^ t2) & (t1 & t4)) ^ ((t1 ^ t2) ^ (t3 & t1))) & x[7];
+    y[1] = (t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) & x[13];
+    y[2] = ((t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) ^ (t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4)))) & x[11];
+    y[3] = (((t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) ^ (t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4)))) ^ t5) & x[20];
+    y[4] = t5 & x[8];
+    y[5] = ((t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4))) ^ (((t3 ^ t4) & (t2 & t3)) ^ ((t3 ^ t4) ^ (t3 & t1)))) & x[9];
+    y[6] = (((t3 ^ t4) & (t2 & t3)) ^ ((t3 ^ t4) ^ (t3 & t1))) & x[17];
+    y[7] = (t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4))) & x[14];
+    y[8] = ((t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) ^ (((t1 ^ t2) & (t1 & t4)) ^ ((t1 ^ t2) ^ (t3 & t1)))) & x[3];
+    y[9] = (((t1 ^ t2) & (t1 & t4)) ^ ((t1 ^ t2) ^ (t3 & t1))) & x[16];
+    y[10] = (t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) & x[15];
+    y[11] = ((t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) ^ (t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4)))) & x[1];
+    y[12] = (((t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) ^ (t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4)))) ^ t5) & x[4];
+    y[13] = t5 & x[2];
+    y[14] = ((t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4))) ^ (((t3 ^ t4) & (t2 & t3)) ^ ((t3 ^ t4) ^ (t3 & t1)))) & x[5];
+    y[15] = (((t3 ^ t4) & (t2 & t3)) ^ ((t3 ^ t4) ^ (t3 & t1))) & x[6];
+    y[16] = (t4 ^ ((t2 ^ (t3 & t1)) & (t3 ^ t4))) & x[0];
+    y[17] = ((t2 ^ ((t4 ^ (t3 & t1)) & (t1 ^ t2))) ^ (((t1 ^ t2) & (t1 & t4)) ^ ((t1 ^ t2) ^ (t3 & t1)))) & x[12];
+    return y;
+  endfunction
+  // AES Sbox Forward Bottom Layer
+  function automatic logic [7:0] linear_bottom_layer(input logic [17:0] x);
+    logic [7:0] y;
+    y[0] = ((x[12] ^ (x[17] ^ x[11])) ^~ ((x[8] ^ (x[1] ^ x[9])) ^ (x[14] ^ x[16])));
+    y[1] = ((x[0] ^ (x[11] ^ x[12])) ^~ ((x[1] ^ x[9]) ^ (x[3] ^ (x[4] ^ x[8]))));
+    y[2] = (((x[12] ^ (x[17] ^ x[11])) ^ (x[3] ^ (x[4] ^ x[8]))) ^ ((x[10] ^ (x[14] ^ x[16])) ^ (x[7] ^ (x[0] ^ x[6]))));
+    y[3] = (((x[11] ^ x[12]) ^ (x[0] ^ x[6])) ^ ((x[15] ^ x[5]) ^ (x[16] ^ x[1])));
+    y[4] = ((x[12] ^ (x[17] ^ x[11])) ^ ((x[0] ^ x[6]) ^ (x[14] ^ (x[15] ^ x[5]))));
+    y[5] = ((x[13] ^ (x[4] ^ x[8])) ^~ ((x[10] ^ (x[14] ^ x[16])) ^ (x[2] ^ x[11])));
+    y[6] = ((x[6] ^ (x[11] ^ x[12])) ^~ ((x[14] ^ (x[15] ^ x[5])) ^ (x[2] ^ x[3])));
+    y[7] = ((x[12] ^ (x[17] ^ x[11])) ^ ((x[5] ^ (x[0] ^ x[6])) ^ (x[2] ^ x[3])));
+    return y;
+  endfunction
+  // AES Sbox Inverse Top Layer
+  function automatic logic [20:0] aes_sbox_inv_top(input logic [7:0] x);
+    return {
+      ((x[4] ^ x[3]) ^ (x[2] ^~ x[1])),
+      (x[5] ^~ (x[4] ^ x[3])),
+      (x[3] ^~ x[0]),
+      (x[7] ^ x[4]),
+      (x[6] ^~ x[4]),
+      ((x[3] ^~ x[0]) ^ (x[6] ^ x[1])),
+      ((x[6] ^~ x[4]) ^ (x[1] ^ x[0])),
+      (x[5] ^~ ((x[6] ^~ x[4]) ^ (x[1] ^ x[0]))),
+      ((x[6] ^ x[1]) ^ (x[5] ^~ x[3])),
+      (((x[7] ^~ x[6]) ^ (x[3] ^~ x[0])) ^ ((x[4] ^ x[3]) ^ (x[2] ^~ x[1]))),
+      (((x[7] ^~ x[6]) ^ (x[3] ^~ x[0])) ^ (x[2] ^~ x[1])),
+      ((x[7] ^~ x[6]) ^ (x[1] ^ x[0])),
+      ((x[7] ^~ x[6]) ^ (x[3] ^~ x[0])),
+      (x[0] ^~ (x[4] ^ x[3])),
+      (x[6] ^~ (x[7] ^ x[4])),
+      ((x[6] ^~ x[4]) ^ (x[5] ^~ x[2])),
+      (x[3] ^ (x[6] ^~ (x[7] ^ x[4]))),
+      ((x[4] ^ x[3]) ^ (x[1] ^ x[0])),
+      (x[7] ^~ x[6]),
+      (x[4] ^ x[3]),
+      (x[7] ^ (x[5] ^~ x[2]))
+    };
+  endfunction
+  // AES Sbox Inverse Bottom Layer
+  function automatic logic [7:0] aes_sbox_inv_out(input logic [17:0] x);
+    logic [7:0] y;
+    y[0] = ((x[5] ^ x[13]) ^ (x[7] ^ x[11]));
+    y[1] = ((x[17] ^ x[12]) ^ (((x[2] ^ x[11]) ^ (x[8] ^ x[9])) ^ (x[0] ^ x[3])));
+    y[2] = (((x[4] ^ x[12]) ^ (x[15] ^ x[0])) ^ ((x[14] ^ x[1]) ^ ((x[2] ^ x[11]) ^ (x[8] ^ x[9]))));
+    y[3] = ((((x[2] ^ x[11]) ^ (x[8] ^ x[9])) ^ (x[0] ^ x[3])) ^ ((x[7] ^ (x[16] ^ x[6])) ^ (x[13] ^ (x[14] ^ x[1]))));
+    y[4] = ((x[14] ^ x[16]) ^ ((x[4] ^ x[12]) ^ ((x[2] ^ x[11]) ^ (x[8] ^ x[9]))));
+    y[5] = ((x[8] ^ (x[4] ^ x[12])) ^ (((x[2] ^ x[11]) ^ (x[15] ^ x[0])) ^ ((x[17] ^ x[10]) ^ (x[7] ^ (x[16] ^ x[6])))));
+    y[6] = (((x[5] ^ x[13]) ^ ((x[2] ^ x[11]) ^ (x[15] ^ x[0]))) ^ ((x[4] ^ x[9]) ^ ((x[16] ^ x[6]) ^ (x[17] ^ x[10]))));
+    y[7] = ((x[17] ^ x[1]) ^ ((x[4] ^ x[12]) ^ ((x[2] ^ x[11]) ^ (x[8] ^ x[9]))));
+    return y;
+  endfunction
+
+endpackage
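The `gfmul`, `aes_mixcolumn_fwd` and `aes_mixcolumn_inv` functions above can be cross-checked against a small software model. The following is a minimal Python sketch, not part of this PR: the helper names `xtime` and `_mix` are hypothetical, and the byte ordering (byte 0 in bits 7:0) is taken from the package code above.

```python
def xtime(b):
    """Multiply a GF(2^8) element by 2, reducing by the AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b


def gfmul(x, y):
    """Multiply byte x by the 4-bit constant y (same role as gfmul in aes_pkg)."""
    result = 0
    for bit in range(4):
        if (y >> bit) & 1:
            result ^= x
        x = xtime(x)
    return result


def _mix(x, coeffs):
    """Apply a circulant MixColumns matrix whose first row is coeffs (hypothetical helper)."""
    s = [(x >> (8 * i)) & 0xFF for i in range(4)]
    out = 0
    for row in range(4):
        b = 0
        for col in range(4):
            b ^= gfmul(s[col], coeffs[(col - row) % 4])
        out |= b << (8 * row)
    return out


def aes_mixcolumn_fwd(x):
    return _mix(x, [0x2, 0x3, 0x1, 0x1])


def aes_mixcolumn_inv(x):
    return _mix(x, [0xE, 0xB, 0xD, 0x9])
```

Because the inverse matrix undoes the forward one, `aes_mixcolumn_inv(aes_mixcolumn_fwd(x))` should return `x` for any 32-bit input, which makes a convenient property to compare against simulation results.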
--- a/core/include/ariane_pkg.sv
+++ b/core/include/ariane_pkg.sv
@ -196,7 +196,8 @@ package ariane_pkg;
    FPU,        // 7
    FPU_VEC,    // 8
    CVXIF,      // 9
-    ACCEL       // 10
+    ACCEL,      // 10
+    AES         // 11
  } fu_t;

  // Index of writeback ports
@ -496,7 +497,39 @@ package ariane_pkg;
    BREV8,
    // Zip instructions
    UNZIP,
-    ZIP
+    ZIP,
+    // Xperm instructions
+    XPERM8,
+    XPERM4,
+    // AES Encryption instructions
+    AES32ESI,
+    AES32ESMI,
+    AES64ES,
+    AES64ESM,
+    // AES Decryption instructions
+    AES32DSI,
+    AES32DSMI,
+    AES64DS,
+    AES64DSM,
+    AES64IM,
+    // AES Key-Schedule instructions
+    AES64KS1I,
+    AES64KS2,
+    // Hashing instructions
+    SHA256SIG0,
+    SHA256SIG1,
+    SHA256SUM0,
+    SHA256SUM1,
+    SHA512SIG0H,
+    SHA512SIG0L,
+    SHA512SIG1H,
+    SHA512SIG1L,
+    SHA512SUM0R,
+    SHA512SUM1R,
+    SHA512SIG0,
+    SHA512SIG1,
+    SHA512SUM0,
+    SHA512SUM1
  } fu_op;

  function automatic logic op_is_branch(input fu_op op);
--- a/core/include/cv64a6_imafdcv_sv39_config_pkg.sv
+++ b/core/include/cv64a6_imafdcv_sv39_config_pkg.sv
@ -35,7 +35,7 @@ package cva6_config_pkg;
  localparam CVA6ConfigAxiAddrWidth = 64;
  localparam CVA6ConfigAxiDataWidth = 64;
  localparam CVA6ConfigFetchUserEn = 0;
-  localparam CVA6ConfigFetchUserWidth = 1; // Just not to raise warnings
+  localparam CVA6ConfigFetchUserWidth = 1;  // Just not to raise warnings
  localparam CVA6ConfigDataUserEn = 0;
  localparam CVA6ConfigDataUserWidth = CVA6ConfigXlen;

--- a/core/issue_read_operands.sv
+++ b/core/issue_read_operands.sv
@ -64,6 +64,8 @@ module issue_read_operands
    input logic flu_ready_i,
    // ALU output is valid - EX_STAGE
    output logic [CVA6Cfg.NrIssuePorts-1:0] alu_valid_o,
+    // AES output is valid - EX_STAGE
+    output logic [CVA6Cfg.NrIssuePorts-1:0] aes_valid_o,
    // Branch unit is valid - EX_STAGE
    output logic [CVA6Cfg.NrIssuePorts-1:0] branch_valid_o,
    // Transformed trap instruction - EX_STAGE
@ -126,14 +128,15 @@ module issue_read_operands
    // Information dedicated to RVFI - RVFI
    output logic [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.XLEN-1:0] rvfi_rs1_o,
    // Information dedicated to RVFI - RVFI
-    output logic [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.XLEN-1:0] rvfi_rs2_o
-
+    output logic [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.XLEN-1:0] rvfi_rs2_o,
+    // Bits of the original instruction forwarded to the AES unit: {instr[31:30], instr[23:20]}
+    output logic [5:0] orig_instr_aes_bits
 );

  localparam OPERANDS_PER_INSTR = CVA6Cfg.NrRgprPorts / CVA6Cfg.NrIssuePorts;

  typedef struct packed {
-    logic none, load, store, alu, alu2, ctrl_flow, mult, csr, fpu, fpu_vec, cvxif, accel;
+    logic none, load, store, alu, alu2, ctrl_flow, mult, csr, fpu, fpu_vec, cvxif, accel, aes;
  } fus_busy_t;

  logic [CVA6Cfg.NrIssuePorts-1:0] stall_raw, stall_rs1, stall_rs2, stall_rs3;
@ -153,6 +156,7 @@ module issue_read_operands
  logic               [CVA6Cfg.XLEN-1:0] imm_forward_rs3;

  logic [CVA6Cfg.NrIssuePorts-1:0] alu_valid_n, alu_valid_q;
+  logic [CVA6Cfg.NrIssuePorts-1:0] aes_valid_n, aes_valid_q;
  logic [CVA6Cfg.NrIssuePorts-1:0] mult_valid_n, mult_valid_q;
  logic [CVA6Cfg.NrIssuePorts-1:0] fpu_valid_n, fpu_valid_q;
  logic [1:0] fpu_fmt_n, fpu_fmt_q;
@ -271,6 +275,7 @@ module issue_read_operands

  assign fu_data_o = fu_data_q;
  assign alu_valid_o = alu_valid_q;
+  assign aes_valid_o = aes_valid_q;
  assign branch_valid_o = branch_valid_q;
  assign lsu_valid_o = lsu_valid_q;
  assign csr_valid_o = csr_valid_q;
@ -294,6 +299,7 @@ module issue_read_operands
    // Since we can not have two CVXIF instruction on 1st issue port, CVXIF is always ready for the pending instruction.
    if (!flu_ready_i) begin
      fus_busy[0].alu = 1'b1;
+      fus_busy[0].aes = 1'b1;
      fus_busy[0].ctrl_flow = 1'b1;
      fus_busy[0].csr = 1'b1;
      fus_busy[0].mult = 1'b1;
@ -303,6 +309,7 @@ module issue_read_operands
    // otherwise we will get contentions on the fixed latency bus
    if (|mult_valid_q) begin
      fus_busy[0].alu = 1'b1;
+      fus_busy[0].aes = 1'b1;
      fus_busy[0].ctrl_flow = 1'b1;
      fus_busy[0].csr = 1'b1;
    end
@ -401,6 +408,7 @@ module issue_read_operands
        LOAD: fu_busy[i] = fus_busy[i].load;
        STORE: fu_busy[i] = fus_busy[i].store;
        CVXIF: fu_busy[i] = fus_busy[i].cvxif;
+        AES: fu_busy[i] = fus_busy[i].aes;
        default:
        if (CVA6Cfg.FpPresent) begin
          unique case (issue_instr_i[i].fu)
@ -673,6 +681,7 @@ module issue_read_operands

  always_comb begin
    alu_valid_n    = '0;
+    aes_valid_n    = '0;
    lsu_valid_n    = '0;
    mult_valid_n   = '0;
    fpu_valid_n    = '0;
@ -703,6 +712,9 @@ module issue_read_operands
          CSR: begin
            csr_valid_n[i] = 1'b1;
          end
+          AES: begin
+            aes_valid_n[i] = 1'b1;
+          end
          default: begin
            if (issue_instr_i[i].fu == FPU && CVA6Cfg.FpPresent) begin
              fpu_valid_n[i] = 1'b1;
@ -721,6 +733,7 @@ module issue_read_operands
    // functional unit with the wrong inputs
    if (flush_i) begin
      alu_valid_n    = '0;
+      aes_valid_n    = '0;
      lsu_valid_n    = '0;
      mult_valid_n   = '0;
      fpu_valid_n    = '0;
@ -734,6 +747,7 @@ module issue_read_operands
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      alu_valid_q    <= '0;
+      aes_valid_q    <= '0;
      lsu_valid_q    <= '0;
      mult_valid_q   <= '0;
      fpu_valid_q    <= '0;
@ -744,6 +758,7 @@ module issue_read_operands
      branch_valid_q <= '0;
    end else begin
      alu_valid_q    <= alu_valid_n;
+      aes_valid_q    <= aes_valid_n;
      lsu_valid_q    <= lsu_valid_n;
      mult_valid_q   <= mult_valid_n;
      fpu_valid_q    <= fpu_valid_n;
@ -1004,6 +1019,9 @@ module issue_read_operands
      x_transaction_rejected_o <= 1'b0;
    end else begin
      fu_data_q <= fu_data_n;
+      if (CVA6Cfg.ZKN) begin
+        orig_instr_aes_bits <= {orig_instr_i[0][31:30], orig_instr_i[0][23:20]};
+      end
      if (CVA6Cfg.RVH) begin
        tinst_q <= tinst_n;
      end
--- a/core/issue_stage.sv
+++ b/core/issue_stage.sv
@ -70,6 +70,8 @@ module issue_stage
    input logic flu_ready_i,
    // ALU output is valid - EX_STAGE
    output logic [CVA6Cfg.NrIssuePorts-1:0] alu_valid_o,
+    // AES output is valid - EX_STAGE
+    output logic [CVA6Cfg.NrIssuePorts-1:0] aes_valid_o,
    // Branch unit is valid - EX_STAGE
    output logic [CVA6Cfg.NrIssuePorts-1:0] branch_valid_o,
    // Information of branch prediction - EX_STAGE
@ -163,7 +165,9 @@ module issue_stage
    // Information dedicated to RVFI - RVFI
    output logic [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.XLEN-1:0] rvfi_rs1_o,
    // Information dedicated to RVFI - RVFI
-    output logic [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.XLEN-1:0] rvfi_rs2_o
+    output logic [CVA6Cfg.NrIssuePorts-1:0][CVA6Cfg.XLEN-1:0] rvfi_rs2_o,
+    // Original instruction bits for AES
+    output logic [5:0] orig_instr_aes_bits
 );
  // ---------------------------------------------------
  // Scoreboard (SB) <-> Issue and Read Operands (IRO)
@ -265,6 +269,7 @@ module issue_stage
      .is_compressed_instr_o,
      .flu_ready_i             (flu_ready_i),
      .alu_valid_o             (alu_valid_o),
+      .aes_valid_o             (aes_valid_o),
      .branch_valid_o          (branch_valid_o),
      .tinst_o                 (tinst_o),
      .branch_predict_o,
@ -300,7 +305,8 @@ module issue_stage
      .we_fpr_i,
      .stall_issue_o,
      .rvfi_rs1_o              (rvfi_rs1_o),
-      .rvfi_rs2_o              (rvfi_rs2_o)
+      .rvfi_rs2_o              (rvfi_rs2_o),
+      .orig_instr_aes_bits     (orig_instr_aes_bits)
  );

 endmodule
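The 6-bit `orig_instr_aes_bits` signal is built in `issue_read_operands` as `{orig_instr_i[0][31:30], orig_instr_i[0][23:20]}`. The Python sketch below mirrors that concatenation. The field names `bs` and `rnum` are an assumption based on the scalar cryptography instruction encodings (byte select of the aes32* instructions, round number of aes64ks1i); the diff itself does not name them.

```python
def orig_instr_aes_bits(instr):
    """Model of {instr[31:30], instr[23:20]} for a 32-bit instruction word."""
    bs = (instr >> 30) & 0x3     # assumed: byte-select field, bits 31:30
    rnum = (instr >> 20) & 0xF   # assumed: round-number field, bits 23:20
    return (bs << 4) | rnum


# Example: bits 31:30 = 0b10 and bits 23:20 = 0x7 give 0b10_0111.
example = orig_instr_aes_bits((0b10 << 30) | (0x7 << 20))
```

Only issue port 0 is sampled in the diff, so this model takes a single instruction word.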
--- a/docs/01_cva6_user/RISCV_Instructions_RVZbkx.rst
+++ b/docs/01_cva6_user/RISCV_Instructions_RVZbkx.rst
@ -0,0 +1,93 @@
+.. Licensed under the Solderpad Hardware Licence, Version 2.1 (the "License");
+.. you may not use this file except in compliance with the License.
+.. SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+.. You may obtain a copy of the License at https://solderpad.org/licenses/
+
+.. Author: Munail Waqar, 10xEngineers
+.. Date: 03.05.2025
+..
+   Copyright (c) 2023 OpenHW Group
+   Copyright (c) 2023 10xEngineers
+
+   SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+.. Level 1
+   =======
+
+   Level 2
+   -------
+
+   Level 3
+   ~~~~~~~
+
+   Level 4
+   ^^^^^^^
+
+.. _cva6_riscv_instructions_RV32Zbkx:
+
+*Applicability of this chapter to configurations:*
+
+.. csv-table::
+   :widths: auto
+   :align: left
+   :header: "Configuration", "Implementation"
+
+   "CV32A60AX", "Implemented extension"
+   "CV64A6_MMU", "Implemented extension"
+
+=========================================
+RVZbkx: Crossbar permutation instructions
+=========================================
+
+The following instructions comprise the Zbkx extension:
+
+Xperm instructions
+--------------------
+The xperm instructions perform a lookup-based permutation on a register. Each element of rs2 (a byte for xperm8, a nibble for xperm4) is used as an index that selects an element of the same width from rs1. The selected element is written to the destination register (rd) at the position of the index element in rs2. If an index is out of range, the corresponding element of rd is set to 0.
+
+-----------+-----------+-----------------------+
+| RV32      | RV64      | Mnemonic              |
+===========+===========+=======================+
+| ✔         | ✔         | xperm8 rd, rs1, rs2   |
+-----------+-----------+-----------------------+
+| ✔         | ✔         | xperm4 rd, rs1, rs2   |
+-----------+-----------+-----------------------+
+
+
+RV32 and RV64 Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+- **XPERM8**: Crossbar permutation (bytes)
+
+    **Format**: xperm8 rd, rs1, rs2
+
+    **Description**: The xperm8 instruction operates on bytes. The rs1 register contains a vector of XLEN/8 8-bit elements. The rs2 register contains a vector of XLEN/8 8-bit indexes. The result is each element in rs2 replaced by the indexed element in rs1, or zero if the index taken from rs2 is out of bounds.
+
+    **Pseudocode**: foreach (i from 0 to xlen-8 by 8) {
+                        if (rs2[i+:8] < (xlen/8))
+                            X(rd)[i+:8] = rs1[rs2[i+:8]*8+:8];
+                        else
+                            X(rd)[i+:8] = 8'b0;
+                    }
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **XPERM4**: Crossbar permutation (nibbles)
+
+    **Format**: xperm4 rd, rs1, rs2
+
+    **Description**: The xperm4 instruction operates on nibbles. The rs1 register contains a vector of XLEN/4 4-bit elements. The rs2 register contains a vector of XLEN/4 4-bit indexes. The result is each element in rs2 replaced by the indexed element in rs1, or zero if the index taken from rs2 is out of bounds.
+
+    **Pseudocode**: foreach (i from 0 to xlen-4 by 4) {
+                        if (rs2[i+:4] < (xlen/4))
+                            X(rd)[i+:4] = rs1[rs2[i+:4]*4+:4];
+                        else
+                            X(rd)[i+:4] = 4'b0;
+                    }
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
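The xperm8 and xperm4 descriptions above can be illustrated with a short software model. This is a minimal Python sketch under the semantics stated in the Zbkx chapter; the generic helper `xperm` and its `width` parameter are hypothetical names introduced here and do not appear in the PR.

```python
def xperm(rs1, rs2, xlen, width):
    """Crossbar permutation: each width-bit element of rs2 indexes an element of rs1."""
    mask = (1 << width) - 1
    rd = 0
    for i in range(0, xlen, width):
        idx = (rs2 >> i) & mask
        if idx < xlen // width:              # out-of-range indexes leave the element at 0
            rd |= ((rs1 >> (idx * width)) & mask) << i
    return rd


def xperm8(rs1, rs2, xlen=64):
    return xperm(rs1, rs2, xlen, 8)


def xperm4(rs1, rs2, xlen=64):
    return xperm(rs1, rs2, xlen, 4)


# Byte reversal on RV64: index i selects byte 7 - i of rs1.
reversed_bytes = xperm8(0x8877665544332211, 0x0001020304050607)
```

On RV64 every 4-bit index of xperm4 is in range, so the zeroing case for xperm4 can only occur on RV32.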
--- a/docs/01_cva6_user/RISCV_Instructions_RVZknd.rst
+++ b/docs/01_cva6_user/RISCV_Instructions_RVZknd.rst
@ -0,0 +1,161 @@
+.. Licensed under the Solderpad Hardware Licence, Version 2.1 (the "License");
+.. you may not use this file except in compliance with the License.
+.. SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+.. You may obtain a copy of the License at https://solderpad.org/licenses/
+
+.. Author: Munail Waqar, 10xEngineers
+.. Date: 03.05.2025
+..
+   Copyright (c) 2023 OpenHW Group
+   Copyright (c) 2023 10xEngineers
+
+   SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+.. Level 1
+   =======
+
+   Level 2
+   -------
+
+   Level 3
+   ~~~~~~~
+
+   Level 4
+   ^^^^^^^
+
+.. _cva6_riscv_instructions_RV32Zknd:
+
+*Applicability of this chapter to configurations:*
+
+.. csv-table::
+   :widths: auto
+   :align: left
+   :header: "Configuration", "Implementation"
+
+   "CV32A60AX", "Implemented extension"
+   "CV64A6_MMU", "Implemented extension"
+
+==================================
+RVZknd: NIST Suite: AES Decryption
+==================================
+
+The following instructions comprise the Zknd extension:
+
+Decryption instructions
+-----------------------
+The Decryption instructions (Zknd) provide support and acceleration for AES decryption and key expansion.
+
+-----------+-----------+----------------------------+
+| RV32      | RV64      | Mnemonic                   |
+===========+===========+============================+
+| ✔         |           | aes32dsi rd, rs1, rs2, bs  |
+-----------+-----------+----------------------------+
+| ✔         |           | aes32dsmi rd, rs1, rs2, bs |
+-----------+-----------+----------------------------+
+|           | ✔         | aes64ds rd, rs1, rs2       |
+-----------+-----------+----------------------------+
+|           | ✔         | aes64dsm rd, rs1, rs2      |
+-----------+-----------+----------------------------+
+
+RV32 specific instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+- **AES32DSI**: AES final round decryption instruction for RV32
+
+    **Format**: aes32dsi rd, rs1, rs2, bs
+
+    **Description**: This instruction sources a single byte from rs2 according to bs. To this it applies the inverse AES SBox operation, and XORs the result with rs1.
+
+    **Pseudocode**: X(rd) = X(rs1)[31..0] ^ rol32((0x000000 @ aes_sbox_inv((X(rs2)[31..0] >> bs*8)[7..0])), unsigned(bs*8));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **AES32DSMI**: AES middle round decryption instruction for RV32.
+
+    **Format**: aes32dsmi rd, rs1, rs2, bs
+
+    **Description**: This instruction sources a single byte from rs2 according to bs. To this it applies the inverse AES SBox operation, and a partial inverse MixColumn, before XORing the result with rs1.
+
+    **Pseudocode**: X(rd) = X(rs1)[31..0] ^ rol32(aes_mixcolumn_byte_inv(aes_sbox_inv((X(rs2)[31..0] >> bs*8)[7..0])), unsigned(bs*8));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+RV64 specific instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **AES64DS**: AES final round decryption instruction for RV64.
+
+    **Format**: aes64ds rd, rs1, rs2
+
+    **Description**: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the Inverse ShiftRows and SubBytes steps.
+
+    **Pseudocode**: X(rd) = aes_apply_inv_sbox_to_each_byte(aes_rv64_shiftrows_inv(X(rs2)[63..0], X(rs1)[63..0]));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **AES64DSM**: AES middle round decryption instruction for RV64.
+
+    **Format**: aes64dsm rd, rs1, rs2
+
+    **Description**: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the Inverse ShiftRows, SubBytes and MixColumns steps.
+
+    **Pseudocode**: X(rd) = aes_mixcolumn_inv(aes_apply_inv_sbox_to_each_byte(aes_rv64_shiftrows_inv(X(rs2)[63..0], X(rs1)[63..0]))[63..32])
+                            @ 
+                            aes_mixcolumn_inv(aes_apply_inv_sbox_to_each_byte(aes_rv64_shiftrows_inv(X(rs2)[63..0], X(rs1)[63..0]))[31..0]);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+
+Key Schedule instructions
+--------------------------------
+
+-----------+-----------+-----------------------+
+| RV32      | RV64      | Mnemonic              |
+===========+===========+=======================+
+|           | ✔         | aes64ks1i rd, rs      |
+-----------+-----------+-----------------------+
+|           | ✔         | aes64ks2 rd, rs       |
+-----------+-----------+-----------------------+
+
+RV64 specific Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **AES64KS1I**: This instruction implements part of the KeySchedule operation for the AES Block cipher involving the SBox operation.
+
+    **Format**: aes64ks1i rd, rs1, rnum
+
+    **Description**: This instruction implements the rotation, SubBytes and Round Constant addition steps of the AES block cipher Key Schedule. Note that rnum must be in the range 0x0..0xA.
+
+    **Pseudocode**: if (unsigned(rnum) > 0xA) {
+                        X(rd) = 64'b0;
+                    } else {
+                        tmp = if (rnum == 0xA) X(rs1)[63..32]
+                              else ror32(X(rs1)[63..32], 8);
+                        X(rd) = (aes_subword_fwd(tmp) ^ aes_decode_rcon(rnum))
+                                @ (aes_subword_fwd(tmp) ^ aes_decode_rcon(rnum));
+                    }
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **AES64KS2**: This instruction implements part of the KeySchedule operation for the AES Block cipher.
+
+    **Format**: aes64ks2 rd, rs1, rs2
+
+    **Description**: This instruction implements the additional XORing of key words as part of the AES block cipher Key Schedule.
+
+    **Pseudocode**: X(rd) = (X(rs1)[63..32] ^ X(rs2)[31..0] ^ X(rs2)[63..32]) @ (X(rs1)[63..32] ^ X(rs2)[31..0]);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
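The aes64ks1i and aes64ks2 pseudocode above can be checked against the AES-128 key expansion example from FIPS-197. The following Python sketch is a reference model written for illustration, not code from this PR. The S-box is computed from the GF(2^8) inverse and affine transform rather than the gate-level layers used in `aes_pkg`, and the helpers `gf_mul`, `sbox_fwd` and `ror32` are hypothetical names. Returning zero for rnum above 0xA follows the pseudocode in this chapter.

```python
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36, 0x00]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1
        b >>= 1
    return r


def sbox_fwd(x):
    """Forward AES S-box: multiplicative inverse followed by the affine transform."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):              # x^254 is the inverse of x in GF(2^8)
            inv = gf_mul(inv, x)
    out = 0x63
    for s in range(5):
        out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return out


def subword_fwd(w):
    return sum(sbox_fwd((w >> (8 * i)) & 0xFF) << (8 * i) for i in range(4))


def ror32(w, n):
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF


def aes64ks1i(rs1, rnum):
    if rnum > 0xA:
        return 0                          # as documented in the pseudocode above
    hi = (rs1 >> 32) & 0xFFFFFFFF
    tmp = hi if rnum == 0xA else ror32(hi, 8)
    w = subword_fwd(tmp) ^ RCON[rnum]
    return (w << 32) | w


def aes64ks2(rs1, rs2):
    w0 = ((rs1 >> 32) ^ rs2) & 0xFFFFFFFF
    w1 = w0 ^ ((rs2 >> 32) & 0xFFFFFFFF)
    return (w1 << 32) | w0
```

With the FIPS-197 key 2b7e1516 28aed2a6 abf71588 09cf4f3c held as little-endian 32-bit words, `aes64ks2(aes64ks1i(w3_w2, 0), w1_w0)` yields the next two round-key words, a0fafe17 and 88542cb1 in byte order.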
--- a/docs/01_cva6_user/RISCV_Instructions_RVZkne.rst
+++ b/docs/01_cva6_user/RISCV_Instructions_RVZkne.rst
@ -0,0 +1,161 @@
+.. Licensed under the Solderpad Hardware Licence, Version 2.1 (the "License");
+.. you may not use this file except in compliance with the License.
+.. SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+.. You may obtain a copy of the License at https://solderpad.org/licenses/
+
+.. Author: Munail Waqar, 10xEngineers
+.. Date: 03.05.2025
+..
+   Copyright (c) 2023 OpenHW Group
+   Copyright (c) 2023 10xEngineers
+
+   SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+.. Level 1
+   =======
+
+   Level 2
+   -------
+
+   Level 3
+   ~~~~~~~
+
+   Level 4
+   ^^^^^^^
+
+.. _cva6_riscv_instructions_RV32Zkne:
+
+*Applicability of this chapter to configurations:*
+
+.. csv-table::
+   :widths: auto
+   :align: left
+   :header: "Configuration", "Implementation"
+
+   "CV32A60AX", "Implemented extension"
+   "CV64A6_MMU", "Implemented extension"
+
+==================================
+RVZkne: NIST Suite: AES Encryption
+==================================
+
+The following instructions comprise the Zkne extension:
+
+Encryption instructions
+-----------------------
+The Encryption instructions (Zkne) provide support and acceleration for AES encryption and key expansion.
+
+-----------+-----------+----------------------------+
+| RV32      | RV64      | Mnemonic                   |
+===========+===========+============================+
+| ✔         |           | aes32esi rd, rs1, rs2, bs  |
+-----------+-----------+----------------------------+
+| ✔         |           | aes32esmi rd, rs1, rs2, bs |
+-----------+-----------+----------------------------+
+|           | ✔         | aes64es rd, rs1, rs2       |
+-----------+-----------+----------------------------+
+|           | ✔         | aes64esm rd, rs1, rs2      |
+-----------+-----------+----------------------------+
+
+RV32 specific instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+- **AES32ESI**: AES final round encryption instruction for RV32
+
+    **Format**: aes32esi rd, rs1, rs2, bs
+
+    **Description**: This instruction sources a single byte from rs2 according to bs. To this it applies the forward AES SBox operation, before XORing the result with rs1.
+
+    **Pseudocode**: X(rd) = X(rs1)[31..0] ^ rol32((0x000000 @ aes_sbox_fwd((X(rs2)[31..0] >> bs*8)[7..0])), unsigned(bs*8));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **AES32ESMI**: AES middle round encryption instruction for RV32.
+
+    **Format**: aes32esmi rd, rs1, rs2, bs
+
+    **Description**: This instruction sources a single byte from rs2 according to bs. To this it applies the forward AES SBox operation, and a partial forward MixColumn, before XORing the result with rs1.
+
+    **Pseudocode**: X(rd) = X(rs1)[31..0] ^ rol32(aes_mixcolumn_byte_fwd(aes_sbox_fwd((X(rs2)[31..0] >> bs*8)[7..0])), unsigned(bs*8));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+RV64 specific instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **AES64ES**: AES final round encryption instruction for RV64.
+
+    **Format**: aes64es rd, rs1, rs2
+
+    **Description**: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the ShiftRows and SubBytes steps.
+
+    **Pseudocode**: X(rd) = aes_apply_fwd_sbox_to_each_byte(aes_rv64_shiftrows_fwd(X(rs2)[63..0], X(rs1)[63..0]));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **AES64ESM**: AES middle round encryption instruction for RV64.
+
+    **Format**: aes64esm rd, rs1, rs2
+
+    **Description**: Uses the two 64-bit source registers to represent the entire AES state, and produces half of the next round output, applying the ShiftRows, SubBytes and MixColumns steps.
+
+    **Pseudocode**: X(rd) = aes_mixcolumn_fwd(aes_apply_fwd_sbox_to_each_byte(aes_rv64_shiftrows_fwd(X(rs2)[63..0], X(rs1)[63..0]))[63..32])
+                            @ 
+                            aes_mixcolumn_fwd(aes_apply_fwd_sbox_to_each_byte(aes_rv64_shiftrows_fwd(X(rs2)[63..0], X(rs1)[63..0]))[31..0]);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+
+Key Schedule instructions
+--------------------------------
+
++-----------+-----------+-------------------------+
+| RV32      | RV64      | Mnemonic                |
++===========+===========+=========================+
+|           | ✔         | aes64ks1i rd, rs1, rnum |
++-----------+-----------+-------------------------+
+|           | ✔         | aes64ks2 rd, rs1, rs2   |
++-----------+-----------+-------------------------+
+
+RV64 specific Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **AES64KS1I**: This instruction implements part of the KeySchedule operation for the AES Block cipher involving the SBox operation.
+
+    **Format**: aes64ks1i rd, rs1, rnum
+
+    **Description**: This instruction implements the rotation, SubBytes and Round Constant addition steps of the AES block cipher Key Schedule. Note that rnum must be in the range 0x0..0xA.
+
+    **Pseudocode**: if (unsigned(rnum) > 0xA) {
+                        X(rd) = 64'b0;
+                    } else {
+                        tmp = if (rnum == 0xA)
+                                X(rs1)[63..32]
+                               else
+                                ror32(X(rs1)[63..32], 8)
+                        X(rd) = (aes_subword_fwd(tmp) ^ aes_decode_rcon(rnum)) @ (aes_subword_fwd(tmp) ^ aes_decode_rcon(rnum));
+                    }
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **AES64KS2**: This instruction implements part of the KeySchedule operation for the AES Block cipher.
+
+    **Format**: aes64ks2 rd, rs1, rs2
+
+    **Description**: This instruction implements the additional XORing of key words as part of the AES block cipher Key Schedule.
+
+    **Pseudocode**: X(rd) = (X(rs1)[63..32] ^ X(rs2)[31..0] ^ X(rs2)[63..32]) @ (X(rs1)[63..32] ^ X(rs2)[31..0]);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
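The aes64ks2 pseudocode above is a pure XOR network, so it is easy to model in software. The following is a minimal sketch under the assumption that both operands are plain 64-bit integers; the function name simply mirrors the mnemonic and does not refer to any CVA6 source file.

```python
MASK32 = 0xFFFFFFFF


def aes64ks2(rs1: int, rs2: int) -> int:
    """Model of aes64ks2: XOR key words into the two halves of rd."""
    # Low result word: upper word of rs1 XOR lower word of rs2.
    w0 = ((rs1 >> 32) ^ rs2) & MASK32
    # High result word: the same value additionally XORed with the upper word of rs2.
    w1 = (w0 ^ (rs2 >> 32)) & MASK32
    return (w1 << 32) | w0
```

For example, `aes64ks2(0x1111111100000000, 0x2222222244444444)` yields `0x7777777755555555`: the low word is `0x11111111 ^ 0x44444444` and the high word XORs in `0x22222222` as well.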
--- a/docs/01_cva6_user/RISCV_Instructions_RVZknh.rst
+++ b/docs/01_cva6_user/RISCV_Instructions_RVZknh.rst
@@ -0,0 +1,263 @@
+.. Licensed under the Solderpad Hardware Licence, Version 2.1 (the "License");
+.. you may not use this file except in compliance with the License.
+.. SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+.. You may obtain a copy of the License at https://solderpad.org/licenses/
+
+.. Author: Munail Waqar, 10xEngineers
+.. Date: 03.05.2025
+..
+   Copyright (c) 2023 OpenHW Group
+   Copyright (c) 2023 10xEngineers
+
+   SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+.. Level 1
+   =======
+
+   Level 2
+   -------
+
+   Level 3
+   ~~~~~~~
+
+   Level 4
+   ^^^^^^^
+
+.. _cva6_riscv_instructions_RV32Zknh:
+
+*Applicability of this chapter to configurations:*
+
+.. csv-table::
+   :widths: auto
+   :align: left
+   :header: "Configuration", "Implementation"
+
+   "CV32A60AX", "Implemented extension"
+   "CV64A6_MMU", "Implemented extension"
+
+==============================================
+RVZknh: NIST Suite: Hash Function Instructions
+==============================================
+
+The following instructions comprise the Zknh extension:
+
+Hash Function instructions
+--------------------------
+The Hash Function instructions (Zknh) provide acceleration for the SHA2 family of cryptographic hash functions.
+
++-----------+-----------+----------------------------+
+| RV32      | RV64      | Mnemonic                   |
++===========+===========+============================+
+| ✔         | ✔         | sha256sig0 rd, rs1         |
++-----------+-----------+----------------------------+
+| ✔         | ✔         | sha256sig1 rd, rs1         |
++-----------+-----------+----------------------------+
+| ✔         | ✔         | sha256sum0 rd, rs1         |
++-----------+-----------+----------------------------+
+| ✔         | ✔         | sha256sum1 rd, rs1         |
++-----------+-----------+----------------------------+
+| ✔         |           | sha512sig0h rd, rs1, rs2   |
++-----------+-----------+----------------------------+
+| ✔         |           | sha512sig0l rd, rs1, rs2   |
++-----------+-----------+----------------------------+
+| ✔         |           | sha512sig1h rd, rs1, rs2   |
++-----------+-----------+----------------------------+
+| ✔         |           | sha512sig1l rd, rs1, rs2   |
++-----------+-----------+----------------------------+
+| ✔         |           | sha512sum0r rd, rs1, rs2   |
++-----------+-----------+----------------------------+
+| ✔         |           | sha512sum1r rd, rs1, rs2   |
++-----------+-----------+----------------------------+
+|           | ✔         | sha512sig0 rd, rs1         |
++-----------+-----------+----------------------------+
+|           | ✔         | sha512sig1 rd, rs1         |
++-----------+-----------+----------------------------+
+|           | ✔         | sha512sum0 rd, rs1         |
++-----------+-----------+----------------------------+
+|           | ✔         | sha512sum1 rd, rs1         |
++-----------+-----------+----------------------------+
+
+
+RV32 and RV64 Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **SHA256SIG0**: SHA2-256 Sigma0 instruction
+
+    **Format**: sha256sig0 rd, rs1
+
+    **Description**: Implements the Sigma0 transformation function as used in the SHA2-256 hash function. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits.
+
+    **Pseudocode**: X(rd) = EXTS(ror32(X(rs1)[31..0], 7) ^ ror32(X(rs1)[31..0], 18) ^ (X(rs1)[31..0] >> 3));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+
+- **SHA256SIG1**: SHA2-256 Sigma1 instruction
+
+    **Format**: sha256sig1 rd, rs1
+
+    **Description**: Implements the Sigma1 transformation function as used in the SHA2-256 hash function. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits.
+
+    **Pseudocode**: X(rd) = EXTS(ror32(X(rs1)[31..0], 17) ^ ror32(X(rs1)[31..0], 19) ^ (X(rs1)[31..0] >> 10));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+
+- **SHA256SUM0**: SHA2-256 Sum0 instruction
+
+    **Format**: sha256sum0 rd, rs1
+
+    **Description**: Implements the Sum0 transformation function as used in the SHA2-256 hash function. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits.
+
+    **Pseudocode**: X(rd) = EXTS(ror32(X(rs1)[31..0], 2) ^ ror32(X(rs1)[31..0], 13) ^ ror32(X(rs1)[31..0], 22));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+
+- **SHA256SUM1**: SHA2-256 Sum1 instruction
+
+    **Format**: sha256sum1 rd, rs1
+
+    **Description**: Implements the Sum1 transformation function as used in the SHA2-256 hash function. For RV32, the entire XLEN source register is operated on. For RV64, the low 32 bits of the source register are operated on, and the result sign extended to XLEN bits.
+
+    **Pseudocode**: X(rd) = EXTS(ror32(X(rs1)[31..0], 6) ^ ror32(X(rs1)[31..0], 11) ^ ror32(X(rs1)[31..0], 25));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
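The four SHA2-256 entries above differ only in their rotate and shift amounts. As a sketch, they can be modeled as follows; the `xlen` parameter and the `exts` helper are assumptions introduced here to show the RV64 sign extension, and none of the names come from the CVA6 sources.

```python
MASK32 = 0xFFFFFFFF


def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def exts(value32: int, xlen: int) -> int:
    """Sign-extend a 32-bit result to xlen bits (no change when xlen is 32)."""
    value32 &= MASK32
    if xlen > 32 and value32 & 0x80000000:
        value32 |= ((1 << xlen) - 1) & ~MASK32
    return value32


def sha256sig0(rs1: int, xlen: int = 32) -> int:
    x = rs1 & MASK32
    return exts(ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3), xlen)


def sha256sig1(rs1: int, xlen: int = 32) -> int:
    x = rs1 & MASK32
    return exts(ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10), xlen)


def sha256sum0(rs1: int, xlen: int = 32) -> int:
    x = rs1 & MASK32
    return exts(ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22), xlen)


def sha256sum1(rs1: int, xlen: int = 32) -> int:
    x = rs1 & MASK32
    return exts(ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25), xlen)
```

Note that the Sigma functions end in a logical shift while the Sum functions use three rotations. With `xlen=64`, a result whose bit 31 is set gains an all-ones upper word, for instance `sha256sum0(2, xlen=64)` gives `0xFFFFFFFF80100800`.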
+
+
+
+RV32 specific instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **SHA512SIG0H**: SHA2-512 Sigma0 high (RV32)
+
+    **Format**: sha512sig0h rd, rs1, rs2
+
+    **Description**: Implements the high half of the Sigma0 transformation, as used in the SHA2-512 hash function. Used to compute the Sigma0 transform of the SHA2-512 hash function in conjunction with the sha512sig0l instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers.
+
+    **Pseudocode**: X(rd) = EXTS((X(rs1) >> 1) ^ (X(rs1) >> 7) ^ (X(rs1) >> 8) ^ (X(rs2) << 31) ^ (X(rs2) << 24));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SIG0L**: SHA2-512 Sigma0 low (RV32)
+
+    **Format**: sha512sig0l rd, rs1, rs2
+
+    **Description**: Implements the low half of the Sigma0 transformation, as used in the SHA2-512 hash function. Used to compute the Sigma0 transform of the SHA2-512 hash function in conjunction with the sha512sig0h instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers.
+
+    **Pseudocode**: X(rd) = EXTS((X(rs1) >> 1) ^ (X(rs1) >> 7) ^ (X(rs1) >> 8) ^ (X(rs2) << 31) ^ (X(rs2) << 25) ^ (X(rs2) << 24));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SIG1H**: SHA2-512 Sigma1 high (RV32)
+
+    **Format**: sha512sig1h rd, rs1, rs2
+
+    **Description**: Implements the high half of the Sigma1 transformation, as used in the SHA2-512 hash function. Used to compute the Sigma1 transform of the SHA2-512 hash function in conjunction with the sha512sig1l instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers.
+
+    **Pseudocode**: X(rd) = EXTS((X(rs1) << 3) ^ (X(rs1) >> 6) ^ (X(rs1) >> 19) ^ (X(rs2) >> 29) ^ (X(rs2) << 13));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SIG1L**: SHA2-512 Sigma1 low (RV32)
+
+    **Format**: sha512sig1l rd, rs1, rs2
+
+    **Description**: Implements the low half of the Sigma1 transformation, as used in the SHA2-512 hash function. Used to compute the Sigma1 transform of the SHA2-512 hash function in conjunction with the sha512sig1h instruction. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers.
+
+    **Pseudocode**: X(rd) = EXTS((X(rs1) << 3) ^ (X(rs1) >> 6) ^ (X(rs1) >> 19) ^ (X(rs2) >> 29) ^ (X(rs2) << 26) ^ (X(rs2) << 13));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SUM0R**: SHA2-512 Sum0 (RV32)
+
+    **Format**: sha512sum0r rd, rs1, rs2
+
+    **Description**: Implements the Sum0 transformation, as used in the SHA2-512 hash function. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers.
+
+    **Pseudocode**: X(rd) = EXTS((X(rs1) << 25) ^ (X(rs1) << 30) ^ (X(rs1) >> 28) ^ (X(rs2) >> 7) ^ (X(rs2) >> 2) ^ (X(rs2) << 4));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SUM1R**: SHA2-512 Sum1 (RV32)
+
+    **Format**: sha512sum1r rd, rs1, rs2
+
+    **Description**: Implements the Sum1 transformation, as used in the SHA2-512 hash function. The transform is a 64-bit to 64-bit function, so the input and output are each represented by two 32-bit registers.
+
+    **Pseudocode**: X(rd) = EXTS((X(rs1) << 23) ^ (X(rs1) >> 14) ^ (X(rs1) >> 18) ^ (X(rs2) >> 9) ^ (X(rs2) << 18) ^ (X(rs2) << 14));
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
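Because the RV32 instructions each produce one 32-bit half of a 64-bit transform, it helps to see how the halves are paired. The sketch below models sha512sig0h, sha512sig0l and sha512sum0r and recombines them; the wrapper names (`sigma0_from_halves`, `sum0_from_halves`) and the reference functions are hypothetical helpers, and the operand order shown (low result from `(lo, hi)`, high result from `(hi, lo)`) is an assumption derived from the shift amounts in the pseudocode.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64


def sha512sig0h(rs1: int, rs2: int) -> int:
    return ((rs1 >> 1) ^ (rs1 >> 7) ^ (rs1 >> 8)
            ^ (rs2 << 31) ^ (rs2 << 24)) & MASK32


def sha512sig0l(rs1: int, rs2: int) -> int:
    return ((rs1 >> 1) ^ (rs1 >> 7) ^ (rs1 >> 8)
            ^ (rs2 << 31) ^ (rs2 << 25) ^ (rs2 << 24)) & MASK32


def sha512sum0r(rs1: int, rs2: int) -> int:
    return ((rs1 << 25) ^ (rs1 << 30) ^ (rs1 >> 28)
            ^ (rs2 >> 7) ^ (rs2 >> 2) ^ (rs2 << 4)) & MASK32


def sigma0_from_halves(x: int) -> int:
    """Build the 64-bit Sigma0 result from the two RV32 instructions."""
    lo, hi = x & MASK32, (x >> 32) & MASK32
    return (sha512sig0h(hi, lo) << 32) | sha512sig0l(lo, hi)


def sum0_from_halves(x: int) -> int:
    """Build the 64-bit Sum0 result by issuing sha512sum0r twice."""
    lo, hi = x & MASK32, (x >> 32) & MASK32
    return (sha512sum0r(hi, lo) << 32) | sha512sum0r(lo, hi)


def sigma0_ref(x: int) -> int:
    """64-bit reference: the same formula as the RV64 sha512sig0 entry."""
    return ror64(x, 1) ^ ror64(x, 8) ^ ((x & MASK64) >> 7)


def sum0_ref(x: int) -> int:
    """64-bit reference: the same formula as the RV64 sha512sum0 entry."""
    return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39)
```

Sum0 needs only one instruction because a pure rotation is symmetric in the two halves, whereas Sigma0 contains a logical shift, so the high half omits the term that would shift bits in from above bit 63.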
+
+
+
+RV64 specific Instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **SHA512SIG0**: SHA2-512 Sigma0 instruction (RV64)
+
+    **Format**: sha512sig0 rd, rs1
+
+    **Description**: Implements the Sigma0 transformation function as used in the SHA2-512 hash function.
+
+    **Pseudocode**: X(rd) = ror64(X(rs1), 1) ^ ror64(X(rs1), 8) ^ (X(rs1) >> 7);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SIG1**: SHA2-512 Sigma1 instruction (RV64)
+
+    **Format**: sha512sig1 rd, rs1
+
+    **Description**: Implements the Sigma1 transformation function as used in the SHA2-512 hash function.
+
+    **Pseudocode**: X(rd) = ror64(X(rs1), 19) ^ ror64(X(rs1), 61) ^ (X(rs1) >> 6);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SUM0**: SHA2-512 Sum0 instruction (RV64)
+
+    **Format**: sha512sum0 rd, rs1
+
+    **Description**: Implements the Sum0 transformation function as used in the SHA2-512 hash function.
+
+    **Pseudocode**: X(rd) = ror64(X(rs1), 28) ^ ror64(X(rs1), 34) ^ ror64(X(rs1), 39);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
+
+- **SHA512SUM1**: SHA2-512 Sum1 instruction (RV64)
+
+    **Format**: sha512sum1 rd, rs1
+
+    **Description**: Implements the Sum1 transformation function as used in the SHA2-512 hash function.
+
+    **Pseudocode**: X(rd) = ror64(X(rs1), 14) ^ ror64(X(rs1), 18) ^ ror64(X(rs1), 41);
+
+    **Invalid values**: NONE
+
+    **Exception raised**: NONE
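The RV64 entries above operate on the full 64-bit register, so a reference model needs only a 64-bit rotate. This is a minimal sketch with function names that mirror the mnemonics; it is meant for checking expected values in directed tests, not as a description of the hardware implementation.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64


def sha512sig0(rs1: int) -> int:
    x = rs1 & MASK64
    return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7)


def sha512sig1(rs1: int) -> int:
    x = rs1 & MASK64
    return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6)


def sha512sum0(rs1: int) -> int:
    x = rs1 & MASK64
    return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39)


def sha512sum1(rs1: int) -> int:
    x = rs1 & MASK64
    return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41)
```

Feeding in a single set bit makes the rotate amounts visible in the result: `sha512sig0(1)` sets bits 63 and 56 only, since the logical shift by 7 discards the input bit.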
--- a/verif/sim/cva6.py
+++ b/verif/sim/cva6.py
@@ -884,12 +884,12 @@ def load_config(args, cwd):
    if base in ("cv64a6_imafdch_sv39", "cv64a6_imafdch_sv39_wb"):
      args.mabi = "lp64d"
      args.isa  = "rv64gch_zba_zbb_zbs_zbc"
-    elif base in ("cv64a6_imafdc_sv39_wb"):
+    elif base in ("cv64a6_imafdc_sv39_wb",):
      args.mabi = "lp64d"
      args.isa  = "rv64gc_zba_zbb_zbs_zbc"
    elif base in ("cv64a6_imafdc_sv39", "cv64a6_imafdc_sv39_hpdcache", "cv64a6_imafdc_sv39_hpdcache_wb"):
      args.mabi = "lp64d"
-      args.isa  = "rv64gc_zba_zbb_zbs_zbc_zbkb"
+      args.isa  = "rv64gc_zba_zbb_zbs_zbc_zbkb_zbkx_zkne_zknd_zknh"
    elif base == "cv32a60x":
      args.mabi = "ilp32"
      args.isa  = "rv32imc_zba_zbb_zbs_zbc"
@@ -906,7 +906,7 @@ def load_config(args, cwd):
      args.isa  = "rv32imac"
    elif base == "cv32a6_imac_sv32":
      args.mabi = "ilp32"
-      args.isa  = "rv32imac_zbkb"
+      args.isa  = "rv32imac_zbkb_zbkx_zkne_zknd_zknh"
    elif base == "cv32a6_imafc_sv32":
      args.mabi = "ilp32f"
      args.isa  = "rv32imafc"
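The one-character change to the `cv64a6_imafdc_sv39_wb` branch in the first hunk matters because parentheses without a trailing comma do not create a tuple in Python, so `in` performs a substring test on a string. The snippet below is a standalone illustration of that behavior; the variable names are made up for the example and are not part of `cva6.py`.

```python
base = "cv64a6_imafdc_sv39"

# Parentheses alone: this is still a plain str, not a tuple.
as_string = ("cv64a6_imafdc_sv39_wb")
# Trailing comma: this is a one-element tuple.
as_tuple = ("cv64a6_imafdc_sv39_wb",)

# Substring test: succeeds because base is a prefix of the longer name.
substring_hit = base in as_string
# Membership test: fails because no element equals base exactly.
tuple_hit = base in as_tuple

print(type(as_string).__name__, substring_hit)
print(type(as_tuple).__name__, tuple_hit)
```

With the old form, a target such as `cv64a6_imafdc_sv39` would be caught by the `_wb` branch before reaching its own `elif`, and so would not pick up the extended ISA string.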
--- a/verif/tests/testlist_riscv-arch-test-cv64a6_imafdc_sv39.yaml
+++ b/verif/tests/testlist_riscv-arch-test-cv64a6_imafdc_sv39.yaml
@@ -968,6 +968,8 @@ testlist:
    <<: *common_test_config
    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/A/src/amoxor.w-01.S

+
+    #K
  - test: rv64im-pack-01
    <<: *common_test_config
    iterations: 1
@@ -987,3 +989,168 @@ testlist:
    <<: *common_test_config
    iterations: 1
    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/brev8-01.S
+
+  - test: rv64i_m-xperm8-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/xperm8-01.S
+
+  - test: rv64i_m-xperm4-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/xperm4-01.S
+
+  - test: rv64i_m-aes64es-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64es-01.S
+
+  - test: rv64i_m-aes64esm-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64esm-01.S
+
+  - test: rv64i_m-aes64ks2-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64ks2-01.S
+
+  - test: rv64i_m-aes64ks1i-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64ks1i-01.S
+
+  - test: rv64i_m-aes64ds-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64ds-01.S
+
+  - test: rv64i_m-aes64dsm-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64dsm-01.S
+
+  - test: rv64i_m-aes64im-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/aes64im-01.S
+  
+  - test: rv64i_m-sha256sig0-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sig0-01.S
+
+  - test: rv64i_m-sha256sig0-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sig0-rwp1.S
+
+  - test: rv64i_m-sha256sig0-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sig0-rwp2.S
+
+  - test: rv64i_m-sha256sig1-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sig1-01.S
+
+  - test: rv64i_m-sha256sig1-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sig1-rwp1.S
+
+  - test: rv64i_m-sha256sig1-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sig1-rwp2.S
+
+  - test: rv64i_m-sha256sum0-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sum0-01.S
+
+  - test: rv64i_m-sha256sum0-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sum0-rwp1.S
+
+  - test: rv64i_m-sha256sum0-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sum0-rwp2.S
+
+  - test: rv64i_m-sha256sum1-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sum1-01.S
+  
+  - test: rv64i_m-sha256sum1-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sum1-rwp1.S
+
+  - test: rv64i_m-sha256sum1-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha256sum1-rwp2.S
+
+  - test: rv64i_m-sha512sig0-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sig0-01.S
+  
+  - test: rv64i_m-sha512sig0-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sig0-rwp1.S
+
+  - test: rv64i_m-sha512sig0-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sig0-rwp2.S
+
+  - test: rv64i_m-sha512sig1-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sig1-01.S
+  
+  - test: rv64i_m-sha512sig1-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sig1-rwp1.S
+
+  - test: rv64i_m-sha512sig1-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sig1-rwp2.S
+
+  - test: rv64i_m-sha512sum0-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sum0-01.S
+  
+  - test: rv64i_m-sha512sum0-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sum0-rwp1.S
+
+  - test: rv64i_m-sha512sum0-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sum0-rwp2.S
+
+  - test: rv64i_m-sha512sum1-01
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sum1-01.S
+  
+  - test: rv64i_m-sha512sum1-rwp1
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sum1-rwp1.S
+
+  - test: rv64i_m-sha512sum1-rwp2
+    iterations: 1
+    <<: *common_test_config
+    asm_tests: <path_var>/riscv-arch-test/riscv-test-suite/rv64i_m/K/src/sha512sum1-rwp2.S