cva6/core/branch_unit.sv
Farhan Ali Shah 542fe39adc
Some checks are pending
bender-up-to-date / bender-up-to-date (push) Waiting to run
ci / build-riscv-tests (push) Waiting to run
ci / execute-riscv64-tests (push) Blocked by required conditions
ci / execute-riscv32-tests (push) Blocked by required conditions
Adding support for ZCMT Extension for Code-Size Reduction in CVA6 (#2659)
## Introduction
This PR implements the ZCMT extension in the CVA6 core, targeting the 32-bit embedded-class platforms. ZCMT is a code-size reduction feature that utilizes compressed table jump instructions (cm.jt and cm.jalt) to reduce code size for embedded systems
**Note:** Due to implementation complexity, ZCMT extension is primarily targeted at embedded class CPUs. Additionally, it is not compatible with architecture class profiles.(Ref. [Unprivilege spec 27.20](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view))

## Key additions

- Added zcmt_decoder module for compressed table jump instructions: cm.jt (jump table) and cm.jalt (jump-and-link table)

- Implemented the Jump Vector Table (JVT) CSR to store the base address of the jump table in csr_reg module

- Implemented a return address stack, enabling cm.jalt to behave equivalently to jal ra (jump-and-link with return address), by pushing the return address onto the stack in zcmt_decoder module

## Implementation in CVA6
The implementation of the ZCMT extension involves the following major modifications:

### compressed decoder 
The compressed decoder scans and identifies the cm.jt and cm.jalt instructions, and generates signals indicating that the instruction is both compressed and a ZCMT instruction.

### zcmt_decoder
A new zcmt_decoder module was introduced to decode the cm.jt and cm.jalt instructions, fetch the base address of the JVT table from JVT CSR, extract the index and construct jump instructions to ensure efficient integration of the ZCMT extension in embedded platforms. Table.1 shows the IO port connection of zcmt_decoder module. High-level block diagram of zcmt implementation in CVA6 is shown in Figure 1.

_Table. 1 IO port connection with zcmt_decoder module_
Signals | IO | Description | Connection | Type
-- | -- | -- | -- | --
clk_i | in | Subsystem Clock | SUBSYSTEM | logic
rst_ni | in | Asynchronous reset active low | SUBSYSTEM | logic
instr_i | in | Instruction in | compressed_decoder | logic [31:0]
pc_i | in | Current PC | PC from FRONTEND | logic [CVA6Cfg.VLEN-1:0]
is_zcmt_instr_i | in | Is instruction a zcmt instruction | compressed_decoder | logic
illegal_instr_i | in | Is instruction a illegal instruction | compressed_decoder | logic
is_compressed_i | in | Is instruction a compressed instruction | compressed_decoder | logic
jvt_i | in | JVT struct from CSR | CSR | jvt_t
req_port_i | in | Handshake between CACHE and FRONTEND (fetch) | Cache | dcache_req_o_t
instr_o | out | Instruction out | cvxif_compressed_if_driver | logic [31:0]
illegal_instr_o | out | Is the instruction is illegal | cvxif_compressed_if_driver | logic
is_compressed_o | out | Is the instruction is compressed | cvxif_compressed_if_driver | logic
fetch_stall_o | out | Stall siganl | cvxif_compressed_if_driver | logic
req_port_o | out | Handshake between CACHE and FRONTEND (fetch) | Cache | dcache_req_i_t

### branch unit condition
A condition is implemented in the branch unit to ensure that ZCMT instructions always cause a misprediction, forcing the program to jump to the calculated address of the newly constructed jump instruction.

### JVT CSR
A new JVT csr is implemented in csr_reg which holds the base address of the JVT table. The base address is fetched from the JVT CSR, and combined with the index value to calculate the effective address.

### No MMU
Embedded platform does not utilize the MMU, so zcmt_decoder is connected with cache through port 0 of the Dcache module for implicit read access from the memory.

![zcmt_block drawio](https://github.com/user-attachments/assets/ac7bba75-4f56-42f4-9f5e-0c18f00d4dae)
_Figure. 1 High level block diagram of ZCMT extension implementation_

## Known Limitations
The implementation targets 32-bit instructions for embedded-class platforms without an MMU. Since the core does not utilize an MMU, it is leveraged to connect the zcmt_decoder to the cache via port 0.

## Testing and Verification

- Developed directed test cases to validate cm.jt and cm.jalt instruction functionality
- Verified correct initialization and updates of JVT CSR

### Test Plan 
A test plan is developed to test the functionality of ZCMT extension along with JVT CSR. Directed Assembly test executed to check the functionality. 

_Table. 2 Test plan_
S.no | Features | Description | Pass/Fail Criteria | Test Type | Test status
-- | -- | -- | -- | ---- | --
1 | cm.jt | Simple assembly test to validate the working of cm.jt instruction in  CV32A60x. | Check against Spike's ref. model | Directed | Pass
2 | cm.jalt | Simple assembly test to validate the working of cm.jalt instruction in both CV32A60x. | Check against Spike's ref. model | Directed | Pass
3 | cm.jalt with return address stack | Simple assembly test to validate the working of cm.jalt instruction with return address stack in both CV32A60x. It works as jump and link ( j ra, imm) | Check against Spike's ref. model | Directed | Pass
4 | JVT CSR | Read and write base address of Jump table to JVT CSR | Check against Spike's ref. model | Directed | Pass


**Note**: Please find the test under CVA6_REPO_DIR/verif/tests/custom/zcmt"
2025-01-27 13:23:26 +01:00

138 lines
6.6 KiB
Systemverilog

// Copyright 2018 ETH Zurich and University of Bologna.
// Copyright and related rights are licensed under the Solderpad Hardware
// License, Version 0.51 (the "License"); you may not use this file except in
// compliance with the License. You may obtain a copy of the License at
// http://solderpad.org/licenses/SHL-0.51. Unless required by applicable law
// or agreed to in writing, software, hardware and materials distributed under
// this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.
//
// Author: Florian Zaruba, ETH Zurich
// Date: 09.05.2017
// Description: Branch target calculation and comparison
module branch_unit #(
parameter config_pkg::cva6_cfg_t CVA6Cfg = config_pkg::cva6_cfg_empty,
parameter type bp_resolve_t = logic,
parameter type branchpredict_sbe_t = logic,
parameter type exception_t = logic,
parameter type fu_data_t = logic
) (
// Subsystem Clock - SUBSYSTEM
input logic clk_i,
// Asynchronous reset active low - SUBSYSTEM
input logic rst_ni,
// Virtualization mode state - CSR_REGFILE
input logic v_i,
// Debug mode state - CSR_REGFILE
input logic debug_mode_i,
// FU data needed to execute instruction - ISSUE_STAGE
input fu_data_t fu_data_i,
// Instruction PC - ISSUE_STAGE
input logic [CVA6Cfg.VLEN-1:0] pc_i,
// Is zcmt instruction - ISSUE_STAGE
input logic is_zcmt_i,
// Instruction is compressed - ISSUE_STAGE
input logic is_compressed_instr_i,
// Branch unit instruction is valid - ISSUE_STAGE
input logic branch_valid_i,
// ALU branch compare result - ALU
input logic branch_comp_res_i,
// Brach unit result - ISSUE_STAGE
output logic [CVA6Cfg.VLEN-1:0] branch_result_o,
// Information of branch prediction - ISSUE_STAGE
input branchpredict_sbe_t branch_predict_i,
// Signaling that we resolved the branch - ISSUE_STAGE
output bp_resolve_t resolved_branch_o,
// Branch is resolved, new entries can be accepted by scoreboard - ID_STAGE
output logic resolve_branch_o,
// Branch exception out - TO_BE_COMPLETED
output exception_t branch_exception_o
);
logic [CVA6Cfg.VLEN-1:0] target_address;
logic [CVA6Cfg.VLEN-1:0] next_pc;
// here we handle the various possibilities of mis-predicts
always_comb begin : mispredict_handler
// set the jump base, for JALR we need to look at the register, for all other control flow instructions we can take the current PC
automatic logic [CVA6Cfg.VLEN-1:0] jump_base;
// TODO(zarubaf): The ALU can be used to calculate the branch target
jump_base = (fu_data_i.operation == ariane_pkg::JALR) ? fu_data_i.operand_a[CVA6Cfg.VLEN-1:0] : pc_i;
resolve_branch_o = 1'b0;
resolved_branch_o.target_address = {CVA6Cfg.VLEN{1'b0}};
resolved_branch_o.is_taken = 1'b0;
resolved_branch_o.valid = branch_valid_i;
resolved_branch_o.is_mispredict = 1'b0;
resolved_branch_o.cf_type = branch_predict_i.cf;
// calculate next PC, depending on whether the instruction is compressed or not this may be different
// TODO(zarubaf): We already calculate this a couple of times, maybe re-use?
next_pc = pc_i + ((is_compressed_instr_i) ? {{CVA6Cfg.VLEN-2{1'b0}}, 2'h2} : {{CVA6Cfg.VLEN-3{1'b0}}, 3'h4});
// calculate target address simple 64 bit addition
target_address = $unsigned($signed(jump_base) + $signed(fu_data_i.imm[CVA6Cfg.VLEN-1:0]));
// on a JALR we are supposed to reset the LSB to 0 (according to the specification)
if (fu_data_i.operation == ariane_pkg::JALR) target_address[0] = 1'b0;
// we need to put the branch target address into rd, this is the result of this unit
branch_result_o = next_pc;
resolved_branch_o.pc = pc_i;
// There are only three sources of mispredicts:
// 1. Branches
// 2. Jumps to register addresses
// 3. Zcmt instructions
if (branch_valid_i) begin
// write target address which goes to PC Gen or select target address if zcmt
resolved_branch_o.target_address = (branch_comp_res_i) ? target_address : next_pc;
resolved_branch_o.is_taken = branch_comp_res_i;
if (CVA6Cfg.RVZCMT) begin
if (is_zcmt_i) begin
// Unconditional jump handling
resolved_branch_o.is_mispredict = 1'b1; // miss prediction for ZCMT
resolved_branch_o.cf_type = ariane_pkg::JumpR;
end
end
// check the outcome of the branch speculation
if (ariane_pkg::op_is_branch(fu_data_i.operation)) begin
// Set the `cf_type` of the output as `branch`, this will update the BHT.
resolved_branch_o.cf_type = ariane_pkg::Branch;
// If the ALU comparison does not agree with the BHT prediction set the resolution as mispredicted.
resolved_branch_o.is_mispredict = branch_comp_res_i != (branch_predict_i.cf == ariane_pkg::Branch);
end
if (fu_data_i.operation == ariane_pkg::JALR
// check if the address of the jump register is correct and that we actually predicted
&& (branch_predict_i.cf == ariane_pkg::NoCF || target_address != branch_predict_i.predict_address)) begin
resolved_branch_o.is_mispredict = 1'b1;
// update BTB only if this wasn't a return
if (branch_predict_i.cf != ariane_pkg::Return)
resolved_branch_o.cf_type = ariane_pkg::JumpR;
end
// to resolve the branch in ID
resolve_branch_o = 1'b1;
end
end
// use ALU exception signal for storing instruction fetch exceptions if
// the target address is not aligned to a 2 byte boundary
//
logic jump_taken;
always_comb begin : exception_handling
// Do a jump if it is either unconditional jump (JAL | JALR) or `taken` conditional jump
branch_exception_o.cause = riscv::INSTR_ADDR_MISALIGNED;
branch_exception_o.valid = 1'b0;
if (CVA6Cfg.TvalEn)
branch_exception_o.tval = {{CVA6Cfg.XLEN - CVA6Cfg.VLEN{pc_i[CVA6Cfg.VLEN-1]}}, pc_i};
else branch_exception_o.tval = '0;
branch_exception_o.tval2 = {CVA6Cfg.GPLEN{1'b0}};
branch_exception_o.tinst = '0;
branch_exception_o.gva = CVA6Cfg.RVH ? v_i : 1'b0;
// Only throw instruction address misaligned exception if this is indeed a `taken` conditional branch or
// an unconditional jump
if (!CVA6Cfg.RVC) begin
jump_taken = !(ariane_pkg::op_is_branch(fu_data_i.operation)) ||
((ariane_pkg::op_is_branch(fu_data_i.operation)) && branch_comp_res_i);
if (branch_valid_i && (target_address[0] || target_address[1]) && jump_taken) begin
branch_exception_o.valid = 1'b1;
end
end
end
endmodule