Os fixes6 (#29)

* Support for atomic extension A
* Support instruction fence extension Zifencei
* Update CSRs to Version 20240411 and include compliant support for Zihpm, Sstc, and Smstateen extensions
* Support address translation
* Fixes interrupts and exception handling
* Adds interrupt controllers
* Support coherent multicore systems through a new data cache and arbiter
* Multiple bugfixes
* Adds new scripts for example systems in Vivado and LiteX
* Removes legacy, unused, and broken scripts, examples, and files

---------

Co-authored-by: Chris Keilbart <keilbartchris@gmail.com>
Co-authored-by: msa417 <msa417@ensc-rcl-14.engineering.sfu.ca>
Co-authored-by: Rajnesh Joshi <rajnesh.joshi28@gmail.com>
Co-authored-by: Rajnesh Joshi <rajneshj@sfu.ca>
mohammadshahidzade 2025-03-11 16:06:16 -07:00 committed by GitHub
parent f0b92a923a
commit 4efa1e2d03
GPG key ID: B5690EEEBB952194
158 changed files with 8774 additions and 90285 deletions

2
.gitignore vendored Normal file

@ -0,0 +1,2 @@
test_benches/verilator/build
vivado/

0
LICENSE Executable file → Normal file

14
README.md Executable file → Normal file

@ -1,6 +1,5 @@
# CVA5
CVA5 is a 32-bit RISC-V processor designed for FPGAs supporting the Multiply/Divide and Double-precision Floating-Point extensions (RV32IMD). The processor is written in SystemVerilog and has been designed to be both highly extensible and highly configurable.
CVA5 is a 32-bit RISC-V processor designed for FPGAs supporting the Multiply/Divide, Atomic, and Floating-Point extensions (RV32IMAFD). The processor is written in SystemVerilog and has been designed to be both highly extensible and highly configurable.
The CVA5 is derived from the Taiga Project from Simon Fraser University.
@ -8,27 +7,28 @@ The CVA5 is derived from the Taiga Project from Simon Fraser University.
The pipeline has been designed to support parallel, variable-latency execution units and to readily support the inclusion of new execution units.
![CVA5 Block Diagram](examples/zedboard/cva5_small.png)
## Documentation and Project Setup
For up-to-date documentation, as well as an automated build environment setup, refer to [Taiga Project](https://gitlab.com/sfu-rcl/taiga-project)
## License
CVA5 is licensed under the Solderpad License, Version 2.1 ( http://solderpad.org/licenses/SHL-2.1/ ). Solderpad is an extension of the Apache License, and many contributions to CVA5 were made under Apache Version 2.0 ( https://www.apache.org/licenses/LICENSE-2.0 )
## Examples
A zedboard configuration is provided under the examples directory along with tools for running stand-alone applications and providing application level simulation of the system. (See the README in the zedboard directory for details.)
A script to package CVA5 as an IP is available and can be run in Vivado by running `source ./examples/xilinx/package_as_ip.tcl`. A similar script can be executed afterwards to create a system implementing a small hello world application executing from block memory on the Nexys A7 FPGA.
## Publications
C. Keilbart, Y. Gao, M. Chua, E. Matthews, S. J. Wilton, and L. Shannon, “Designing an IEEE-Compliant FPU that Supports Configurable Precision for Soft Processors,” ACM Trans. Reconfigurable Technol. Syst., vol. 17, no. 2, Apr. 2024.
doi: [https://doi.org/10.1145/3650036](https://doi.org/10.1145/3650036)
E. Matthews, A. Lu, Z. Fang and L. Shannon, "Rethinking Integer Divider Design for FPGA-Based Soft-Processors," 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA, 2019, pp. 289-297.
doi: [https://doi.org/10.1109/FCCM.2019.00046](https://doi.org/10.1109/FCCM.2019.00046)
E. Matthews, Z. Aguila and L. Shannon, "Evaluating the Performance Efficiency of a Soft-Processor, Variable-Length, Parallel-Execution-Unit Architecture for FPGAs Using the RISC-V ISA," 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Boulder, CO, 2018, pp. 1-8.
doi: [https://doi.org/10.1109/FCCM.2018.00010](https://doi.org/10.1109/FCCM.2018.00010)
E. Matthews and L. Shannon, "TAIGA: A new RISC-V soft-processor framework enabling high performance CPU architectural features," 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 2017. [https://doi.org/10.23919/FPL.2017.8056766](https://doi.org/10.23919/FPL.2017.8056766)
E. Matthews and L. Shannon, "TAIGA: A new RISC-V soft-processor framework enabling high performance CPU architectural features," 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 2017.
doi: [https://doi.org/10.23919/FPL.2017.8056766](https://doi.org/10.23919/FPL.2017.8056766)

5
TODO.txt Normal file

@ -0,0 +1,5 @@
A list of possible future improvements to CVA5 is as follows:
- Support configurable RAM types (LUTRAM, BRAM, URAM) for each large memory (branch predictor, instruction cache tagbank, instruction cache databank, data cache tagbank, and data cache databank). This allows for better control over resources and frequency.
- Remove atomic memory operation support from peripheral busses and convert them into exceptions. This allows for a cleaner and more optimal implementation of atomic instructions inside of the data cache alone.
- Eliminate the forwarding in the load queue when no data TLB is configured. This load queue bypass is only necessary when the data TLB is present for frequency reasons, and its removal would save resources.
- Implement a more flexible scheme for handling atomic read-modify-write (RMW) memory operations in the multicore arbiter. Currently, as soon as such an operation is accepted, all further memory operations are delayed until the RMW is resolved. The only real obstacle to a different scheme is starvation, since the data cache can already handle snoops during an RMW: if one core writes constantly to an address that another core is trying to RMW, the RMW core must still eventually be allowed to succeed.

256
apu/busses/axi_adapter.sv Normal file

@ -0,0 +1,256 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module axi_adapter
#(
parameter int unsigned NUM_CORES = 1
) (
input logic clk,
input logic rst,
mem_interface.mem_slave mems[NUM_CORES-1:0],
axi_interface.master axi
);
localparam MAX_OUTSTANDING = 8;
localparam MAX_W = $clog2(MAX_OUTSTANDING);
typedef logic[7:0] hash_t;
typedef logic[MAX_W:0] count_t;
typedef logic[MAX_W-1:0] index_t;
////////////////////////////////////////////////////
//Multicore interface
//Arbitrates requests
logic request_pop;
logic request_valid;
logic[31:2] request_addr;
logic request_rnw;
logic[4:0] request_rlen;
logic[31:0] request_wdata;
logic[3:0] request_wbe;
logic[1+$clog2(NUM_CORES):0] request_id;
logic request_rvalid;
logic[31:0] request_rdata;
logic[1+$clog2(NUM_CORES):0] request_rid;
logic[NUM_CORES-1:0] write_outstanding;
multicore_arbiter #(.NUM_CORES(NUM_CORES)) arb (
.mems(mems),
.request_pop(request_pop),
.request_valid(request_valid),
.request_addr(request_addr),
.request_rnw(request_rnw),
.request_rlen(request_rlen),
.request_wdata(request_wdata),
.request_wbe(request_wbe),
.request_id(request_id),
.request_rvalid(request_rvalid),
.request_rdata(request_rdata),
.request_rid(request_rid),
.write_outstanding(write_outstanding),
.*);
////////////////////////////////////////////////////
//AXI interface
//Read bursts and single writes
//Ordering between reads and writes must be enforced
logic addr_blocked;
//AR
assign axi.arvalid = request_valid & request_rnw & ~addr_blocked;
assign axi.araddr = {request_addr[31:7], request_addr[6:2] & ~request_rlen, 2'b00};
assign axi.arlen = {3'b0, request_rlen};
assign axi.arsize = 3'b010; //4 bytes
assign axi.arburst = 2'b01; //Incrementing
assign axi.arcache = 4'b0011; //Bufferable and non-cacheable memory
assign axi.arlock = 0; //Not locked
//R
assign axi.rready = 1;
assign request_rdata = axi.rdata;
assign request_rvalid = axi.rvalid;
//Don't care about rresp or rlast
logic sent_aw;
logic sent_w;
logic write_sent;
assign write_sent = (axi.awvalid & axi.awready | sent_aw) & (axi.wvalid & axi.wready | sent_w);
always_ff @(posedge clk) begin
if (rst | write_sent) begin
sent_aw <= 0;
sent_w <= 0;
end
else begin
sent_aw <= sent_aw | (axi.awvalid & axi.awready);
sent_w <= sent_w | (axi.wvalid & axi.wready);
end
end
//AW
assign axi.awvalid = request_valid & ~request_rnw & ~sent_aw & ~addr_blocked;
assign axi.awaddr = {request_addr, 2'b00};
assign axi.awlen = '0;
assign axi.awsize = 3'b010; //4 bytes
assign axi.awburst = 2'b01; //Incrementing
assign axi.awcache = 4'b0011; //Bufferable and non-cacheable memory
assign axi.awlock = 0; //Not locked
//W
assign axi.wvalid = request_valid & ~request_rnw & ~sent_w;
assign axi.wdata = request_wdata;
assign axi.wstrb = request_wbe;
assign axi.wlast = 1;
//B
assign axi.bready = 1;
//Don't care about bresp
assign request_pop = (axi.arvalid & axi.arready) | write_sent;
////////////////////////////////////////////////////
//Outstanding tracking
//Outstanding requests are tracked using a fully-associative array
//Writes need to wait for outstanding write collisions to finish
//Writes need to wait for outstanding read collisions to finish
//Reads need to wait for outstanding write collisions to finish
//Free slots
logic[MAX_OUTSTANDING-1:0] frees;
index_t free_index;
priority_encoder #(.WIDTH(MAX_OUTSTANDING)) free_enc (
.priority_vector(frees),
.encoded_result(free_index)
);
always_ff @(posedge clk) begin
if (rst)
frees <= '1;
else begin
if (axi.rvalid & axi.rlast)
frees[axi.rid[MAX_W-1:0]] <= 1;
if (axi.bvalid)
frees[axi.bid[MAX_W-1:0]] <= 1;
if ((axi.awvalid & axi.awready) | (axi.arvalid & axi.arready))
frees[free_index] <= 0;
end
end
always_comb begin
axi.arid = '0;
axi.awid = '0;
axi.arid[MAX_W-1:0] = free_index;
axi.awid[MAX_W-1:0] = free_index;
end
//Outstanding storage
hash_t request_hash;
logic[5:0] request_lower;
logic[5:0] request_upper;
hash_t[MAX_OUTSTANDING-1:0] hashes;
logic[MAX_OUTSTANDING-1:0] rnws;
logic[MAX_OUTSTANDING-1:0][5:0] lowers;
logic[MAX_OUTSTANDING-1:0][5:0] uppers;
logic[MAX_OUTSTANDING-1:0][3:0] wbes;
logic[MAX_OUTSTANDING-1:0][1+$clog2(NUM_CORES):0] ids;
always_ff @(posedge clk) begin
if ((axi.awvalid & axi.awready) | (axi.arvalid & axi.arready)) begin
rnws[free_index] <= request_rnw;
lowers[free_index] <= request_lower;
uppers[free_index] <= request_upper;
wbes[free_index] <= request_wbe;
ids[free_index] <= request_id;
hashes[free_index] <= request_hash;
end
end
////////////////////////////////////////////////////
//Hash function
//8-bit hashes can easily be compared using the carry circuitry
//Only hash address bits that do not correspond to a line
assign request_hash[0] = request_addr[8] ^ request_addr[16] ^ request_addr[24];
assign request_hash[1] = request_addr[9] ^ request_addr[17] ^ request_addr[25];
assign request_hash[2] = request_addr[10] ^ request_addr[18] ^ request_addr[26];
assign request_hash[3] = request_addr[11] ^ request_addr[19] ^ request_addr[27];
assign request_hash[4] = request_addr[12] ^ request_addr[20] ^ request_addr[28];
assign request_hash[5] = request_addr[13] ^ request_addr[21] ^ request_addr[29];
assign request_hash[6] = request_addr[14] ^ request_addr[22] ^ request_addr[30];
assign request_hash[7] = request_addr[15] ^ request_addr[23] ^ request_addr[31];
////////////////////////////////////////////////////
//Collision check
//Collisions are checked at byte granularity; this is complicated by burst requests
//Therefore, the lower and upper boundaries of the request within a 64-word (256-byte) "arena" are stored
//Collisions must therefore be within the boundaries
//Writes collide with all outstanding requests
//Reads only collide with outstanding writes
logic[MAX_OUTSTANDING-1:0] range_collision;
logic[MAX_OUTSTANDING-1:0] wbe_collision;
logic[MAX_OUTSTANDING-1:0] hash_collision;
always_comb begin
for (int i = 0; i < MAX_OUTSTANDING; i++) begin
hash_collision[i] = ~frees[i] & request_hash == hashes[i];
if (request_rnw) begin
range_collision[i] = (axi.araddr[7:2] <= lowers[i]) & (axi.araddr[7:2] + {1'b0, request_rlen} >= uppers[i]);
wbe_collision[i] = ~rnws[i];
end
else begin
range_collision[i] = (request_addr[7:2] >= lowers[i]) & (request_addr[7:2] <= uppers[i]);
wbe_collision[i] = rnws[i] | |(request_wbe & wbes[i]);
end
end
end
assign addr_blocked = |(hash_collision & range_collision & wbe_collision) | ~|frees;
assign request_lower = request_rnw ? axi.araddr[7:2] : request_addr[7:2];
assign request_upper = request_rnw ? axi.araddr[7:2] + {1'b0, request_rlen} : request_addr[7:2];
assign request_rid = ids[axi.rid[MAX_W-1:0]];
//Write outstanding check
logic[NUM_CORES-1:0][MAX_OUTSTANDING-1:0] outstanding_is_core;
always_comb begin
write_outstanding = '0;
for (int i = 0; i < NUM_CORES; i++) begin
for (int j = 0; j < MAX_OUTSTANDING; j++)
write_outstanding[i] |= ~frees[j] & ~rnws[j] & outstanding_is_core[i][j];
end
end
generate if (NUM_CORES > 1) begin : gen_id_check
always_comb begin
for (int i = 0; i < NUM_CORES; i++) begin
for (int j = 0; j < MAX_OUTSTANDING; j++)
outstanding_is_core[i][j] = i == ids[j][2+:$clog2(NUM_CORES)];
end
end
end else begin : gen_no_check
assign outstanding_is_core = '1;
end endgenerate
endmodule
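
The hash and collision check in `axi_adapter` can be easier to follow as a software model. The Python sketch below is not part of the commit; it is a behavioral approximation written for illustration, and the names `Outstanding`, `request_hash`, `word_range`, `make_entry`, and `blocked` are hypothetical. It takes full byte addresses, assumes `rlen` is a low-bit mask such as 0, 1, 3, or 7, and leaves out the "no free slot" term of `addr_blocked`.

```python
from dataclasses import dataclass


@dataclass
class Outstanding:
    """One occupied slot of the tracking array (hypothetical model type)."""
    addr_hash: int
    rnw: bool      # True for a read, False for a write
    lower: int     # first word index (address bits 7:2)
    upper: int     # last word index (address bits 7:2)
    wbe: int       # byte enables, meaningful for writes only


def request_hash(addr: int) -> int:
    """XOR-fold byte-address bits 31:8 into 8 bits, as request_hash does."""
    return ((addr >> 8) ^ (addr >> 16) ^ (addr >> 24)) & 0xFF


def word_range(addr: int, rnw: bool, rlen: int = 0):
    """Return (lower, upper) word indices covered by a request."""
    word = (addr >> 2) & 0x3F              # address bits 7:2
    if not rnw:
        return word, word                  # writes are single words
    start = (word & 0x20) | (word & 0x1F & ~rlen)  # burst start is aligned
    return start, start + rlen


def make_entry(addr: int, rnw: bool, rlen: int = 0, wbe: int = 0) -> Outstanding:
    lower, upper = word_range(addr, rnw, rlen)
    return Outstanding(request_hash(addr), rnw, lower, upper, wbe)


def blocked(entries, addr: int, rnw: bool, rlen: int = 0, wbe: int = 0) -> bool:
    """True if the new request must wait for an outstanding one."""
    h = request_hash(addr)
    lower, upper = word_range(addr, rnw, rlen)
    for e in entries:
        if e.addr_hash != h:
            continue
        if rnw:
            # A read waits only for outstanding writes inside its range
            in_range = lower <= e.lower and upper >= e.upper
            conflict = not e.rnw
        else:
            # A write waits for any read, or a write touching the same bytes
            in_range = e.lower <= lower <= e.upper
            conflict = e.rnw or (wbe & e.wbe) != 0
        if in_range and conflict:
            return True
    return False
```

Because only a hash of the upper address bits is compared, two different addresses can share a hash and produce a false collision. In this model that only delays a request and never lets a true conflict through.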


@ -0,0 +1,203 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module multicore_arbiter
#(
parameter int unsigned NUM_CORES = 1
) (
input logic clk,
input logic rst,
mem_interface.mem_slave mems[NUM_CORES-1:0],
//Requests can be returned out of order for different IDs
//However, the original ordering of writes and reads to
//the same address must be preserved
input logic request_pop,
output logic request_valid,
output logic[31:2] request_addr,
output logic request_rnw,
output logic[4:0] request_rlen,
output logic[31:0] request_wdata,
output logic[3:0] request_wbe,
output logic[1+$clog2(NUM_CORES):0] request_id,
input logic request_rvalid,
input logic[31:0] request_rdata,
input logic[1+$clog2(NUM_CORES):0] request_rid,
input logic[NUM_CORES-1:0] write_outstanding
);
//Multiplexes memory requests and submits invalidations
//Write coalescing would be a nice future improvement
localparam FIFO_DEPTH = 32;
typedef logic[$clog2(FIFO_DEPTH):0] count_t;
typedef logic[2+$clog2(NUM_CORES)-1:0] full_id_t;
typedef struct packed {
logic[31:2] addr;
logic[4:0] rlen;
logic rnw;
logic[3:0] wbe;
logic[31:0] wdata;
full_id_t id;
logic rmw;
} request_t;
request_t in_req;
request_t out_req;
fifo_interface #(.DATA_TYPE(request_t)) request_fifo();
logic[NUM_CORES-1:0] requests;
logic[NUM_CORES-1:0] acks;
logic[NUM_CORES-1:0][31:2] addr;
logic[NUM_CORES-1:0][4:0] rlen;
logic[NUM_CORES-1:0] rnw;
logic[NUM_CORES-1:0] rmw;
logic[NUM_CORES-1:0][3:0] wbe;
logic[NUM_CORES-1:0][31:0] wdata;
logic[NUM_CORES-1:0][1:0] id;
full_id_t padded_id;
logic[NUM_CORES-1:0] rvalids;
logic[(NUM_CORES == 1 ? 1 : $clog2(NUM_CORES))-1:0] chosen_port;
logic rvalid;
logic[31:0] rdata;
full_id_t rid;
logic[NUM_CORES-1:0] out_core;
count_t[NUM_CORES-1:0] wcounts;
logic request_push;
logic request_finished;
genvar i;
////////////////////////////////////////////////////
//Implementation
//Unpack interface
generate for (i = 0; i < NUM_CORES; i++) begin : gen_unpack
assign requests[i] = mems[i].request;
assign addr[i] = mems[i].addr;
assign rlen[i] = mems[i].rlen;
assign rnw[i] = mems[i].rnw;
assign rmw[i] = mems[i].rmw;
assign wbe[i] = mems[i].wbe;
assign wdata[i] = mems[i].wdata;
assign id[i] = mems[i].id;
assign acks[i] = request_push & i == int'(chosen_port);
assign mems[i].ack = acks[i];
assign mems[i].inv_addr = in_req.addr;
assign mems[i].inv = request_push & ~in_req.rnw & i != int'(chosen_port);
assign mems[i].rvalid = rvalids[i];
assign mems[i].rdata = request_rdata;
assign mems[i].rid = request_rid[1:0];
assign mems[i].write_outstanding = wcounts[i][$clog2(FIFO_DEPTH)] | write_outstanding[i];
end endgenerate
//Once accepted, stall until the RMW is resolved
logic[(NUM_CORES == 1 ? 1 : $clog2(NUM_CORES))-1:0] rmw_index;
logic rmw_is_on;
logic[NUM_CORES-1:0] accept_request_from_rmw_core;
logic rmw_hit;
always_ff @(posedge clk) begin
if (in_req.rmw & request_push)
rmw_index <= chosen_port;
end
always_ff @(posedge clk) begin
if (rst)
rmw_is_on <= 0;
else if (in_req.rmw & request_push)
rmw_is_on <= 1;
else if (~in_req.rmw)
rmw_is_on <= 0;
end
assign accept_request_from_rmw_core = (rmw_is_on ? (1'b1 << rmw_index) : {NUM_CORES{1'b1}});
assign rmw_hit = |(requests & accept_request_from_rmw_core);
//Request FIFO
assign request_push = ~request_fifo.full & |requests & (~rmw_is_on | rmw_hit);
assign request_fifo.data_in = in_req;
assign out_req = request_fifo.data_out;
assign request_valid = request_fifo.valid;
assign request_addr = out_req.addr;
assign request_rnw = out_req.rnw;
assign request_rlen = out_req.rlen;
assign request_wdata = out_req.wdata;
assign request_wbe = out_req.wbe;
assign request_id = out_req.id;
assign request_fifo.push = request_push;
assign request_fifo.potential_push = request_push;
assign request_fifo.pop = request_pop;
cva5_fifo #(.DATA_TYPE(request_t), .FIFO_DEPTH(FIFO_DEPTH)) fifo_inst (.fifo(request_fifo), .*);
//Arbitration
round_robin #(.NUM_PORTS(NUM_CORES)) rr (
.requests(requests & accept_request_from_rmw_core),
.grant(request_push),
.grantee(chosen_port),
.*);
//Select input
assign in_req = '{
addr : addr[chosen_port],
rlen : rlen[chosen_port],
rnw : rnw[chosen_port],
wbe : wbe[chosen_port],
wdata : wdata[chosen_port],
id : padded_id,
rmw : rmw[chosen_port]
};
generate if (NUM_CORES == 1) begin : gen_no_id
assign padded_id = id[chosen_port];
assign rvalids[0] = request_rvalid;
assign out_core[0] = 1;
end else begin : gen_id
assign padded_id = {chosen_port, id[chosen_port]};
assign rvalids = request_rvalid << request_rid[2+:$clog2(NUM_CORES)];
assign out_core = 1 << request_id[2+:$clog2(NUM_CORES)];
end endgenerate
//Write tracking; tracked for each core
logic[NUM_CORES-1:0] wcount_incr;
logic[NUM_CORES-1:0] wcount_decr;
always_comb begin
for (int j = 0; j < NUM_CORES; j++) begin
wcount_incr[j] = acks[j] & ~rnw[j];
wcount_decr[j] = out_core[j] & ~request_rnw & request_pop;
end
end
always_ff @(posedge clk) begin
if (rst)
wcounts <= '0;
else begin
for (int j = 0; j < NUM_CORES; j++) //Flipped increment / decrement allows the MSB to be used as a nonzero signal
wcounts[j] <= wcounts[j] - count_t'(wcount_incr[j]) + count_t'(wcount_decr[j]);
end
end
endmodule
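
The per-core write counter in `multicore_arbiter` counts downward on an accepted write and upward on a popped write, so that the most significant bit doubles as a "nonzero" flag. The Python sketch below is a hypothetical model of that trick, not code from the commit; `step` and `write_pending` are illustrative names, and the counter width is derived the same way as `count_t` (`$clog2(FIFO_DEPTH)+1` bits).

```python
FIFO_DEPTH = 32
COUNT_W = (FIFO_DEPTH - 1).bit_length() + 1   # 6 bits for a depth of 32
MASK = (1 << COUNT_W) - 1
MSB = 1 << (COUNT_W - 1)


def step(count: int, accepted_write: bool, popped_write: bool) -> int:
    """One clock edge: subtract on accept, add on pop, wrap to COUNT_W bits."""
    return (count - int(accepted_write) + int(popped_write)) & MASK


def write_pending(count: int) -> bool:
    """The MSB is set exactly when 1..FIFO_DEPTH writes are queued."""
    return bool(count & MSB)
```

With n writes queued the counter holds 2^COUNT_W - n, which lies between FIFO_DEPTH and 2^COUNT_W - 1 for any n from 1 to FIFO_DEPTH, so a single bit test replaces a wide OR reduction.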


@ -0,0 +1,103 @@
/*
* Copyright © 2022 Eric Matthews, Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module wishbone_adapter
#(
parameter int unsigned NUM_CORES = 1
) (
input logic clk,
input logic rst,
mem_interface.mem_slave mems[NUM_CORES-1:0],
wishbone_interface.master wishbone
);
////////////////////////////////////////////////////
//Multicore interface
//Arbitrates requests
logic request_pop;
logic request_valid;
logic[31:2] request_addr;
logic request_rnw;
logic[4:0] request_rlen;
logic[31:0] request_wdata;
logic[3:0] request_wbe;
logic[1+$clog2(NUM_CORES):0] request_id;
logic request_rvalid;
logic[31:0] request_rdata;
logic[1+$clog2(NUM_CORES):0] request_rid;
multicore_arbiter #(.NUM_CORES(NUM_CORES)) arb (
.mems(mems),
.request_pop(request_pop),
.request_valid(request_valid),
.request_addr(request_addr),
.request_rnw(request_rnw),
.request_rlen(request_rlen),
.request_wdata(request_wdata),
.request_wbe(request_wbe),
.request_id(request_id),
.request_rvalid(request_rvalid),
.request_rdata(request_rdata),
.request_rid(request_rid),
.write_outstanding('0),
.*);
////////////////////////////////////////////////////
//Wishbone interface
//Read bursts and single writes
logic[4:0] rcount;
logic[4:0] len;
logic last;
always_ff @(posedge clk) begin
if (rst | request_pop)
rcount <= '0;
else
rcount <= rcount + 5'(wishbone.ack);
end
assign request_pop = wishbone.ack & last;
assign len = {5{request_rnw}} & request_rlen;
assign last = rcount == len;
assign wishbone.cyc = request_valid;
assign wishbone.stb = request_valid;
assign wishbone.we = ~request_rnw;
assign wishbone.sel = request_rnw ? '1 : request_wbe;
assign wishbone.dat_w = request_wdata;
assign wishbone.bte = 2'b00; //Incrementing burst
assign wishbone.cti = {last, last, 1'b1}; //End of burst is used even for single-cycle transfers
assign wishbone.adr[29:5] = request_addr[31:7];
assign wishbone.adr[4:0] = (request_addr[6:2] & ~len) | (rcount & len);
//Return data registered for frequency
always_ff @(posedge clk) begin
request_rdata <= wishbone.dat_r;
request_rvalid <= request_rnw & wishbone.ack;
request_rid <= request_id;
end
endmodule
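
The address generation for read bursts in `wishbone_adapter` can be checked with a small model. The Python sketch below is illustrative only and not part of the commit; `burst_word_addresses` is a hypothetical name. It works on word addresses (byte address shifted right by 2) and assumes `rlen` is a low-bit mask such as 0, 1, 3, or 7, which is what makes `rcount & len` behave like an incrementing offset.

```python
def burst_word_addresses(word_addr: int, rnw: bool, rlen: int):
    """Return the word address driven on each beat of one request."""
    length = rlen if rnw else 0        # writes are always single beats
    upper = word_addr >> 5             # request_addr[31:7]
    low = word_addr & 0x1F             # request_addr[6:2]
    beats = []
    for rcount in range(length + 1):   # rcount advances on every ack
        adr_low = (low & ~length & 0x1F) | (rcount & length)
        beats.append((upper << 5) | adr_low)
    return beats
```

For example, an 8-beat read that targets word `0x20000016` starts at the aligned word `0x20000010` and ends at `0x20000017`, while a write to the same word produces that single address.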

78
apu/clint/clint.sv Normal file

@ -0,0 +1,78 @@
/*
* Copyright © 2024 Chris Keilbart, Mohammad Shahidzadeh
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
* Mohammad Shahidzadeh <mohammad_shahidzadeh_asadi@sfu.ca>
*/
module clint
#(
parameter int unsigned NUM_CORES = 1
) (
input logic clk,
input logic rst,
input logic write_mtime,
input logic write_mtimecmp,
input logic write_msip,
input logic write_upper, //Else lower; mtime and mtimecmp only
input logic [(NUM_CORES == 1) ? 0 : ($clog2(NUM_CORES)-1) : 0] write_msip_core,
input logic [(NUM_CORES == 1) ? 0 : ($clog2(NUM_CORES)-1) : 0] write_mtimecmp_core,
input logic[31:0] write_data,
output logic[1:0][31:0] mtime,
output logic[NUM_CORES-1:0][1:0][31:0] mtimecmp,
output logic[NUM_CORES-1:0] msip,
output logic[NUM_CORES-1:0] mtip
);
////////////////////////////////////////////////////
//Core Local INTerrupt unit (CLINT)
//Implements mtime, mtimecmp, mtip, and msip from the RISC-V privileged specification
//mtime increments at a constant frequency
//mtip is set when mtime >= mtimecmp
//mtimecmp and msip are registers
logic[63:0] mtime_p_1;
assign mtime_p_1 = {mtime[1], mtime[0]} + 1;
always_ff @(posedge clk) begin
if (rst) begin
mtime <= '0;
mtimecmp <= '1; //Reset to max to prevent interrupts
msip <= '0;
end
else begin
for (int i = 0; i < NUM_CORES; i++)
mtip[i] <= {mtime[1], mtime[0]} >= {mtimecmp[i][1], mtimecmp[i][0]};
mtime[1] <= write_mtime & write_upper ? write_data : mtime_p_1[63:32];
mtime[0] <= write_mtime & ~write_upper ? write_data : mtime_p_1[31:0];
for (int i = 0; i < NUM_CORES; i++) begin
mtimecmp[i][1] <= write_mtimecmp & write_upper & i == int'(write_mtimecmp_core) ? write_data : mtimecmp[i][1];
mtimecmp[i][0] <= write_mtimecmp & ~write_upper & i == int'(write_mtimecmp_core) ? write_data : mtimecmp[i][0];
end
if (write_msip)
msip[write_msip_core] <= write_data[0]; //LSB
end
end
endmodule
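
The timing of `mtip` relative to `mtime` and `mtimecmp` is worth spelling out, because `mtip` is registered and therefore reflects the values from before the clock edge. The Python sketch below is a hypothetical cycle-level model, not code from the commit; the class `ClintModel` and its `tick` method are illustrative, and only `mtimecmp` writes are modeled.

```python
MASK64 = (1 << 64) - 1


class ClintModel:
    """Cycle-level sketch of mtime, mtimecmp, and mtip (names are illustrative)."""

    def __init__(self, num_cores: int = 1):
        self.mtime = 0
        self.mtimecmp = [MASK64] * num_cores   # reset to max: no interrupts
        self.mtip = [False] * num_cores

    def tick(self, write_mtimecmp=None):
        """Advance one clock. write_mtimecmp is an optional (core, upper, data)."""
        # Non-blocking semantics: compute everything from pre-edge values
        next_mtip = [self.mtime >= cmp for cmp in self.mtimecmp]
        next_mtime = (self.mtime + 1) & MASK64
        if write_mtimecmp is not None:
            core, upper, data = write_mtimecmp
            shift = 32 if upper else 0
            keep = self.mtimecmp[core] & ~(0xFFFFFFFF << shift) & MASK64
            self.mtimecmp[core] = keep | ((data & 0xFFFFFFFF) << shift)
        self.mtip = next_mtip
        self.mtime = next_mtime
```

In this model, software that writes the upper half of `mtimecmp` and then the lower half sees `mtip` rise one cycle after `mtime` first reaches the programmed value.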

220
apu/clint/clint_wrapper.sv Normal file

@ -0,0 +1,220 @@
/*
* Copyright © 2024 Chris Keilbart, Mohammad Shahidzadeh
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
* Mohammad Shahidzadeh <mohammad_shahidzadeh_asadi@sfu.ca>
*/
module clint_wrapper
#(
parameter int unsigned NUM_CORES = 1,
parameter logic AXI = 1'b1 //Else the wishbone bus is used
) (
input logic clk,
input logic rst,
output logic[63:0] mtime,
output logic[NUM_CORES-1:0] msip,
output logic[NUM_CORES-1:0] mtip,
//Address bits 31:16 are ignored, 1:0 assumed to be 00
//Compliant Wishbone classic (not pipelined)
input logic wb_cyc,
input logic wb_stb,
input logic wb_we,
input logic [15:2] wb_adr,
input logic [31:0] wb_dat_i,
output logic [31:0] wb_dat_o,
output logic wb_ack,
//Compliant AXI Lite interface; does not include optional awprot, wstrb, bresp, arprot, and rresp
input logic s_axi_awvalid,
input logic[31:0] s_axi_awaddr,
output logic s_axi_awready,
input logic s_axi_wvalid,
input logic[31:0] s_axi_wdata,
output logic s_axi_wready,
output logic s_axi_bvalid,
input logic s_axi_bready,
input logic s_axi_arvalid,
input logic[31:0] s_axi_araddr,
output logic s_axi_arready,
output logic s_axi_rvalid,
output logic[31:0] s_axi_rdata,
input logic s_axi_rready
);
////////////////////////////////////////////////////
//Core Local INTerrupt unit (CLINT) wrapper
//Handles addressing and bus interface
//16-bit address space
localparam logic [15:0] MSIP_BASE = 16'h0; //Must be 4-byte aligned
localparam logic [15:0] MTIMECMP_BASE = 16'h4000; //Must be 8-byte aligned
localparam logic [15:0] MTIME_BASE = 16'hbff8; //Must be 8-byte aligned
localparam logic [15:0] CORES_MINUS_ONE = 16'(NUM_CORES-1);
localparam CORE_W = NUM_CORES == 1 ? 1 : $clog2(NUM_CORES);
logic[NUM_CORES-1:0][1:0][31:0] mtimecmp;
logic write_mtime;
logic write_mtimecmp;
logic write_msip;
logic write_upper;
logic[CORE_W-1:0] write_msip_core;
logic[CORE_W-1:0] write_mtimecmp_core;
logic[31:0] write_data;
logic[1:0][31:0] mtime_packed;
assign mtime = {mtime_packed[1], mtime_packed[0]};
clint #(.NUM_CORES(NUM_CORES)) core (
.write_mtime(write_mtime),
.write_mtimecmp(write_mtimecmp),
.write_msip(write_msip),
.write_upper(write_upper),
.write_msip_core(write_msip_core),
.write_mtimecmp_core(write_mtimecmp_core),
.write_data(write_data),
.mtime(mtime_packed),
.mtimecmp(mtimecmp),
.msip(msip),
.mtip(mtip),
.*);
//Interface
generate if (AXI) begin : gen_axi_if
//Simple implementation uses state machine for reads and writes
typedef enum logic[2:0] {
IDLE,
RACCEPT,
WACCEPT,
RRESP,
WRESP
} state_t;
state_t state;
state_t next_state;
always_ff @(posedge clk) begin
if (rst)
state <= IDLE;
else
state <= next_state;
end
always_comb begin
unique case (state)
IDLE : begin
if (s_axi_awvalid & s_axi_wvalid)
next_state = WACCEPT;
else if (s_axi_arvalid)
next_state = RACCEPT;
else
next_state = IDLE;
end
RACCEPT : next_state = RRESP;
WACCEPT : next_state = WRESP;
RRESP : next_state = s_axi_rready ? IDLE : RRESP;
WRESP : next_state = s_axi_bready ? IDLE : WRESP;
endcase
end
//Reads
logic doing_read;
assign doing_read = state == RACCEPT;
assign s_axi_arready = doing_read;
assign s_axi_rvalid = state == RRESP;
always_ff @(posedge clk) begin
if (doing_read) begin
case ({s_axi_araddr[15:2], 2'b00}) inside
[MSIP_BASE:MSIP_BASE+4*CORES_MINUS_ONE] : s_axi_rdata <= {31'b0, msip[NUM_CORES == 1 ? '0 : s_axi_araddr[2+:CORE_W]]};
[MTIME_BASE:MTIME_BASE+4] : s_axi_rdata <= mtime_packed[s_axi_araddr[2]];
[MTIMECMP_BASE:MTIMECMP_BASE+4+8*CORES_MINUS_ONE] : s_axi_rdata <= mtimecmp[NUM_CORES == 1 ? '0 : s_axi_araddr[3+:CORE_W]][s_axi_araddr[2]];
default : s_axi_rdata <= '0;
endcase
end
end
//Writes
logic doing_write;
assign doing_write = state == WACCEPT;
assign s_axi_awready = doing_write;
assign s_axi_wready = doing_write;
assign s_axi_bvalid = state == WRESP;
assign write_data = s_axi_wdata;
assign write_upper = s_axi_awaddr[2];
assign write_msip_core = NUM_CORES == 1 ? '0 : s_axi_awaddr[2+:CORE_W];
assign write_mtimecmp_core = NUM_CORES == 1 ? '0 : s_axi_awaddr[3+:CORE_W];
always_comb begin
write_msip = 0;
write_mtime = 0;
write_mtimecmp = 0;
case ({s_axi_awaddr[15:2], 2'b00}) inside
[MSIP_BASE:MSIP_BASE+4*CORES_MINUS_ONE] : write_msip = doing_write;
[MTIME_BASE:MTIME_BASE+4] : write_mtime = doing_write;
[MTIMECMP_BASE:MTIMECMP_BASE+4+8*CORES_MINUS_ONE] : write_mtimecmp = doing_write;
endcase
end
//Not in use
assign wb_ack = 0;
end else begin : gen_wishbone_if
//Combinational response
assign write_data = wb_dat_i;
assign write_upper = wb_adr[2];
assign wb_ack = wb_cyc & wb_stb;
assign write_msip_core = NUM_CORES == 1 ? '0 : wb_adr[2+:CORE_W];
assign write_mtimecmp_core = NUM_CORES == 1 ? '0 : wb_adr[3+:CORE_W];
always_comb begin
write_mtime = 0;
write_mtimecmp = 0;
write_msip = 0;
wb_dat_o = '0;
case ({wb_adr[15:2], 2'b00}) inside
[MSIP_BASE:MSIP_BASE+4*CORES_MINUS_ONE] : begin
write_msip = wb_cyc & wb_stb & wb_we;
wb_dat_o = {31'b0, msip[write_msip_core]};
end
[MTIME_BASE:MTIME_BASE+4] : begin
write_mtime = wb_cyc & wb_stb & wb_we;
wb_dat_o = mtime_packed[write_upper];
end
[MTIMECMP_BASE:MTIMECMP_BASE+4+8*CORES_MINUS_ONE] : begin
write_mtimecmp = wb_cyc & wb_stb & wb_we;
wb_dat_o = mtimecmp[write_mtimecmp_core][write_upper];
end
endcase
end
//Not in use
assign s_axi_awready = 0;
assign s_axi_wready = 0;
assign s_axi_bvalid = 0;
assign s_axi_arready = 0;
assign s_axi_rvalid = 0;
end endgenerate
endmodule
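
The address map implemented by the `case ... inside` ranges in `clint_wrapper` can be summarized with a decode function. The Python sketch below is an illustration under the base addresses given in the module, not code from the commit; `decode` and its return tuple are hypothetical. As in the wrapper, address bits 31:16 and 1:0 are ignored.

```python
MSIP_BASE = 0x0000
MTIMECMP_BASE = 0x4000
MTIME_BASE = 0xBFF8


def decode(addr: int, num_cores: int = 1):
    """Map a byte address to (register, core, upper_half), or None if unmapped."""
    offset = addr & 0xFFFC                     # keep bits 15:2 only
    upper = bool(offset & 0x4)                 # address bit 2 selects the half
    if MSIP_BASE <= offset <= MSIP_BASE + 4 * (num_cores - 1):
        return ("msip", (offset - MSIP_BASE) // 4, None)
    if MTIME_BASE <= offset <= MTIME_BASE + 4:
        return ("mtime", None, upper)
    if MTIMECMP_BASE <= offset <= MTIMECMP_BASE + 4 + 8 * (num_cores - 1):
        return ("mtimecmp", (offset - MTIMECMP_BASE) // 8, upper)
    return None
```

Each core gets one 32-bit `msip` word and one 64-bit `mtimecmp` pair, while `mtime` is shared. Addresses outside these ranges read as zero and ignore writes in the hardware, which the model represents as `None`.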

204
apu/plic/plic.sv Normal file

@ -0,0 +1,204 @@
/*
* Copyright © 2024 Chris Keilbart, Mohammad Shahidzadeh
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
* Mohammad Shahidzadeh <mohammad_shahidzadeh_asadi@sfu.ca>
*/
module plic
#(
parameter int unsigned NUM_SOURCES = 1,
parameter int unsigned NUM_TARGETS = 1,
parameter int unsigned PRIORITY_W = 4,
parameter int unsigned REG_STAGE = 1
) (
input logic clk,
input logic rst,
input logic[NUM_SOURCES:1] irq_srcs,
input logic[NUM_SOURCES-1:0] edge_sensitive, //Both rising and falling edges, else level sensitive and active high
//Memory mapped port
input logic read_reg,
input logic write_reg,
input logic[25:2] addr,
input logic[31:0] wdata,
output logic[31:0] rdata,
output logic[NUM_TARGETS-1:0] eip
);
////////////////////////////////////////////////////
//RISC-V Platform-Level Interrupt Controller (PLIC)
//Supports up to 32 interrupt sources; more would require a different design for frequency reasons anyways
//REG_STAGE determines the location of the registers in the comparison tree
//It might need to be manually tuned depending on FPGA architecture, NUM_SOURCES, NUM_TARGETS, and PRIORITY_W
//Reads therefore have one cycle of latency, but writes do not
localparam PADDED_SOURCES = NUM_SOURCES+1; //Source 0 does not exist
localparam IRQ_ID_W = $clog2(PADDED_SOURCES); //Will always be at least 1
localparam TARGET_ID_W = NUM_TARGETS == 1 ? 1 : $clog2(NUM_TARGETS);
typedef logic[PRIORITY_W-1:0] priority_t;
typedef logic[PADDED_SOURCES-1:0] irq_t;
typedef logic[IRQ_ID_W-1:0] irq_id_t;
typedef logic[TARGET_ID_W-1:0] target_id_t;
irq_t interrupt_pending;
irq_t[NUM_TARGETS-1:0] interrupt_enable;
priority_t[NUM_TARGETS-1:0] target_threshold;
priority_t[PADDED_SOURCES-1:0] interrupt_priority;
logic[PADDED_SOURCES-1:0] claim_ohot;
logic[PADDED_SOURCES-1:0] complete_ohot;
irq_id_t irq_claim_id;
logic is_complete_claim;
logic is_priority;
logic is_target_threshold;
logic is_interrupt_enable;
////////////////////////////////////////////////////
//Implementation
//Gateway registers external interrupts
logic claim;
logic complete;
assign claim = read_reg & is_complete_claim;
assign claim_ohot = {{NUM_SOURCES{1'b0}}, claim} << irq_claim_id;
assign complete = write_reg & is_complete_claim;
assign complete_ohot = {{NUM_SOURCES{1'b0}}, complete} << wdata;
plic_gateway #(.NUM_IRQS(PADDED_SOURCES)) gateway (
.irq({irq_srcs, 1'b0}),
.edge_sensitive({edge_sensitive, 1'b0}),
.claim(claim_ohot),
.complete(complete_ohot),
.ip(interrupt_pending),
.*);
//Interrupts must meet the minimum priority for a target and can be individually suppressed
irq_t[NUM_TARGETS-1:0] filtered_interrupts;
always_comb begin
for (int i = 0; i < NUM_TARGETS; i++) begin
for (int j = 0; j < PADDED_SOURCES; j++)
filtered_interrupts[i][j] = interrupt_pending[j] & interrupt_enable[i][j] & interrupt_priority[j] > target_threshold[i];
eip[i] = |filtered_interrupts[i];
end
end
//ID Priority
irq_t possible_irqs;
target_id_t addr_rid;
assign addr_rid = addr[12+:TARGET_ID_W];
assign possible_irqs = NUM_TARGETS > 1 ? filtered_interrupts[addr_rid] : filtered_interrupts[0];
plic_cmptree #(
.NUM_IRQS(PADDED_SOURCES),
.PRIORITY_W(PRIORITY_W),
.REG_STAGE_W(2**REG_STAGE)
) tree (
.irq_valid(possible_irqs),
.irq_priority(interrupt_priority),
.highest_id(irq_claim_id),
.highest_priority(),
.highest_valid(),
.*);
////////////////////////////////////////////////////
//Read and address decoding
always_comb begin
rdata = '0;
is_priority = 0;
is_target_threshold = 0;
is_interrupt_enable = 0;
is_complete_claim = 0;
//Interrupt priority
for (int i = 0; i < PADDED_SOURCES; i++) begin
if (addr == 24'(i)) begin
is_priority = 1;
rdata[PRIORITY_W-1:0] = interrupt_priority[i];
end
end
//Interrupt pending
if (addr == 24'h400)
rdata[PADDED_SOURCES-1:0] = interrupt_pending;
//Enable bits
for (int i = 0; i < NUM_TARGETS; i++) begin
if (addr == 24'h800+24'h20*24'(i)) begin
is_interrupt_enable = 1;
rdata[PADDED_SOURCES-1:0] = interrupt_enable[i];
end
end
//Target threshold
for (int i = 0; i < NUM_TARGETS; i++) begin
if (addr == 24'h80000+24'h400*24'(i)) begin
is_target_threshold = 1;
rdata[PRIORITY_W-1:0] = target_threshold[i];
end
end
//Complete/Claim
for (int i = 0; i < NUM_TARGETS; i++) begin
if (addr == 24'h80001+24'h400*24'(i)) begin
is_complete_claim = 1;
rdata[IRQ_ID_W-1:0] = irq_claim_id;
end
end
end
//Write logic
always_ff @(posedge clk) begin
if (rst)
interrupt_priority <= '0;
else begin
if (write_reg & is_priority)
interrupt_priority[addr[2+:IRQ_ID_W]] <= wdata[PRIORITY_W-1:0];
interrupt_priority[0] <= '0; //IRQ 0 hard coded to 0
end
end
target_id_t threshold_index;
assign threshold_index = NUM_TARGETS > 1 ? addr[12+:TARGET_ID_W] : 0;
always_ff @(posedge clk) begin
if (rst)
target_threshold <= '0;
else begin
if (write_reg & is_target_threshold)
target_threshold[threshold_index] <= wdata[PRIORITY_W-1:0];
end
end
target_id_t enable_index;
assign enable_index = NUM_TARGETS > 1 ? addr[7+:TARGET_ID_W] : 0;
always_ff @(posedge clk) begin
if (rst)
interrupt_enable <= '0;
else begin
if (write_reg & is_interrupt_enable)
interrupt_enable[enable_index] <= {wdata[PADDED_SOURCES-1:1], 1'b0}; //IRQ 0 hard coded to 0
end
end
endmodule
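
The address decoding above compares against word addresses (`addr[25:2]`), so software sees each register at four times that value. The following is a minimal Python sketch of the resulting byte offsets, not part of the commit; the function name `plic_byte_offsets` and the dictionary keys are hypothetical, and it assumes the bus wrapper drops address bits 1:0 as the port declaration suggests.

```python
def plic_byte_offsets(source: int, target: int) -> dict:
    """Byte offsets (relative to the PLIC base) implied by the word-address decode.

    Hypothetical helper for illustration only; the constants are copied from
    the comparisons in the decode block and shifted left by two because the
    module only sees addr[25:2].
    """
    return {
        "priority": source << 2,                            # addr == i
        "pending": 0x400 << 2,                              # addr == 24'h400
        "enable": (0x800 + 0x20 * target) << 2,             # 24'h800 + 24'h20*i
        "threshold": (0x80000 + 0x400 * target) << 2,       # 24'h80000 + 24'h400*i
        "claim_complete": (0x80001 + 0x400 * target) << 2,  # 24'h80001 + 24'h400*i
    }


if __name__ == "__main__":
    for name, offset in plic_byte_offsets(source=3, target=1).items():
        print(f"{name:15s} 0x{offset:06x}")
```

Because the design is limited to 32 sources, a single pending word and a single enable word per target are enough, which is why only one address per register group is decoded.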

124
apu/plic/plic_cmptree.sv Normal file

@ -0,0 +1,124 @@
/*
* Copyright © 2024 Chris Keilbart, Mohammad Shahidzadeh
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
* Mohammad Shahidzadeh <mohammad_shahidzadeh_asadi@sfu.ca>
*/
module plic_cmptree
#(
parameter int unsigned NUM_IRQS = 2,
parameter int unsigned PRIORITY_W = 4,
parameter int unsigned REG_STAGE_W = 2
) (
input logic clk,
input logic[NUM_IRQS-1:0] irq_valid,
input logic[NUM_IRQS-1:0][PRIORITY_W-1:0] irq_priority,
output logic[$clog2(NUM_IRQS)-1:0] highest_id,
output logic[PRIORITY_W-1:0] highest_priority,
output logic highest_valid
);
localparam ID_W = $clog2(NUM_IRQS);
localparam PADDED_W = 2**ID_W;
////////////////////////////////////////////////////
//PLIC Comparison Tree
//The interrupt with the highest priority must be selected
//Ties are broken by the index, with lower values taking priority
//Implemented as a binary tree, and the outputs of a certain layer are registered
//Pad to next power of two
logic[PADDED_W-1:0] padded_irq_valid;
logic[PADDED_W-1:0][PRIORITY_W-1:0] padded_irq_priority;
always_comb begin
for (int i = 0; i < PADDED_W; i++) begin
padded_irq_valid[i] = i < NUM_IRQS ? irq_valid[i] : 1'b0;
padded_irq_priority[i] = i < NUM_IRQS ? irq_priority[i] : '0;
end
end
//Same no matter the stage
logic left_valid;
logic[PRIORITY_W-1:0] left_priority;
logic right_valid;
logic[PRIORITY_W-1:0] right_priority;
logic chose_valid;
logic chose_left;
logic[PRIORITY_W-1:0] chose_priority;
assign chose_valid = left_valid | right_valid;
assign chose_left = left_valid & (~right_valid | left_priority > right_priority);
assign chose_priority = chose_left ? left_priority : right_priority;
//Recursive or base case
generate if (NUM_IRQS == 2) begin : gen_base_case
assign {left_valid, right_valid} = irq_valid;
assign {left_priority, right_priority} = irq_priority;
if (REG_STAGE_W == 2) begin : gen_ff
always_ff @(posedge clk) begin
highest_id[0] <= chose_left;
highest_priority <= chose_priority;
highest_valid <= chose_valid;
end
end else begin : gen_no_ff
assign highest_id[0] = chose_left;
assign highest_priority = chose_priority;
assign highest_valid = chose_valid;
end
end else begin : gen_recursive_case
logic[ID_W-2:0] left_id;
plic_cmptree #(.NUM_IRQS(PADDED_W/2), .PRIORITY_W(PRIORITY_W), .REG_STAGE_W(REG_STAGE_W)) left_tree (
.irq_valid(padded_irq_valid[PADDED_W-1:PADDED_W/2]),
.irq_priority(padded_irq_priority[PADDED_W-1:PADDED_W/2]),
.highest_id(left_id),
.highest_priority(left_priority),
.highest_valid(left_valid),
.*);
logic[ID_W-2:0] right_id;
plic_cmptree #(.NUM_IRQS(PADDED_W/2), .PRIORITY_W(PRIORITY_W), .REG_STAGE_W(REG_STAGE_W)) right_tree (
.irq_valid(padded_irq_valid[PADDED_W/2-1:0]),
.irq_priority(padded_irq_priority[PADDED_W/2-1:0]),
.highest_id(right_id),
.highest_priority(right_priority),
.highest_valid(right_valid),
.*);
if (REG_STAGE_W == NUM_IRQS) begin : gen_ff
always_ff @(posedge clk) begin
highest_id[ID_W-1] <= chose_left;
highest_id[ID_W-2:0] <= chose_left ? left_id : right_id;
highest_priority <= chose_priority;
highest_valid <= chose_valid;
end
end else begin : gen_no_ff
assign highest_id[ID_W-1] = chose_left;
assign highest_id[ID_W-2:0] = chose_left ? left_id : right_id;
assign highest_priority = chose_priority;
assign highest_valid = chose_valid;
end
end endgenerate
endmodule
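
The selection rule of the comparison tree (highest priority wins, ties go to the lower index) can be checked against a small software model. The sketch below is an assumption-laden Python reference model, not code from the repository: the name `cmptree` is hypothetical, it models only the combinational choice and ignores the register stage, and it uses a one-element base case for convenience where the hardware stops at two.

```python
def cmptree(valid, priority):
    """Return (any_valid, index, priority) of the winning interrupt.

    Mirrors the tree: the "left" subtree holds the upper half of the indices
    and is chosen only when it is valid and either the right subtree is
    invalid or the left priority is strictly greater.
    """
    n = len(valid)
    if n == 1:  # modelling convenience; the hardware base case is two inputs
        return bool(valid[0]), 0, priority[0]
    width = 1 << (n - 1).bit_length()  # pad to the next power of two
    valid = list(valid) + [False] * (width - n)
    priority = list(priority) + [0] * (width - n)
    half = width // 2
    right_valid, right_id, right_prio = cmptree(valid[:half], priority[:half])
    left_valid, left_id, left_prio = cmptree(valid[half:], priority[half:])
    chose_left = left_valid and (not right_valid or left_prio > right_prio)
    if chose_left:
        return True, half + left_id, left_prio
    return (left_valid or right_valid), right_id, right_prio


if __name__ == "__main__":
    # Sources 2 and 3 tie on priority 5; the lower index is selected.
    print(cmptree([False, True, True, True], [0, 3, 5, 5]))
```

When nothing is valid the model returns index 0, which matches the hardware falling through to the right-hand branch at every level.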

66
apu/plic/plic_gateway.sv Normal file

@ -0,0 +1,66 @@
/*
* Copyright © 2024 Chris Keilbart, Mohammad Shahidzadeh
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
* Mohammad Shahidzadeh <mohammad_shahidzadeh_asadi@sfu.ca>
*/
module plic_gateway
#(
parameter int unsigned NUM_IRQS = 1
) (
input logic clk,
input logic rst,
input logic[NUM_IRQS-1:0] irq,
input logic[NUM_IRQS-1:0] edge_sensitive, //Both rising and falling edges, else level sensitive and active high
input logic[NUM_IRQS-1:0] claim,
input logic[NUM_IRQS-1:0] complete,
output logic[NUM_IRQS-1:0] ip
);
////////////////////////////////////////////////////
//PLIC Gateway
//Registers interrupts, for them to be later claimed and completed
//Raising
logic[NUM_IRQS-1:0] irq_r;
always_ff @(posedge clk) irq_r <= irq;
logic[NUM_IRQS-1:0] raise;
always_comb begin
for (int i = 0; i < NUM_IRQS; i++)
raise[i] = edge_sensitive[i] ? irq[i] ^ irq_r[i] : irq[i];
end
//Registering
logic[NUM_IRQS-1:0] in_progress;
always_ff @(posedge clk) begin
if (rst) begin
ip <= '0;
in_progress <= '0;
end
else begin
ip <= (ip | raise) & ~in_progress & ~claim;
in_progress <= (in_progress | claim) & ~complete;
end
end
endmodule
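
The gateway's pending and in-progress bookkeeping is easiest to follow as a cycle-by-cycle model. The Python sketch below is illustrative only: the class name `GatewayModel` is hypothetical, bit vectors are held in plain integers, and `irq_r` is assumed to start at zero even though the hardware register has no reset.

```python
class GatewayModel:
    """Cycle-level model of the ip / in_progress update equations."""

    def __init__(self, num_irqs: int):
        self.mask = (1 << num_irqs) - 1
        self.irq_r = 0          # previous-cycle irq (assumed 0 at start)
        self.ip = 0             # pending bits
        self.in_progress = 0    # claimed but not yet completed

    def step(self, irq: int, edge_sensitive: int = 0, claim: int = 0, complete: int = 0) -> int:
        """Advance one clock edge and return the new pending vector."""
        # Edge-sensitive sources raise on any change, level-sensitive ones while high
        raise_ = (((irq ^ self.irq_r) & edge_sensitive) | (irq & ~edge_sensitive)) & self.mask
        next_ip = (self.ip | raise_) & ~self.in_progress & ~claim & self.mask
        next_in_progress = (self.in_progress | claim) & ~complete & self.mask
        self.irq_r = irq & self.mask
        self.ip, self.in_progress = next_ip, next_in_progress
        return self.ip


if __name__ == "__main__":
    g = GatewayModel(2)
    print(g.step(0b10))                 # source 1 becomes pending
    print(g.step(0b10, claim=0b10))     # claim clears pending, marks in progress
    print(g.step(0b10, complete=0b10))  # still masked during the completion cycle
    print(g.step(0b10))                 # level is still high, so it is pending again
```

The model shows that a level-sensitive source that stays high is masked while in progress and becomes pending again one cycle after completion.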

180
apu/plic/plic_wrapper.sv Normal file

@ -0,0 +1,180 @@
/*
* Copyright © 2024 Chris Keilbart, Mohammad Shahidzadeh
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
* Mohammad Shahidzadeh <mohammad_shahidzadeh_asadi@sfu.ca>
*/
module plic_wrapper
#(
parameter int unsigned NUM_SOURCES = 1,
parameter int unsigned NUM_TARGETS = 1,
parameter int unsigned PRIORITY_W = 4,
parameter int unsigned REG_STAGE = 1, //The stage in the comparison tree to insert registers at, must be 1 <= N <= clog2(NUM_SOURCES+1)
parameter logic AXI = 1'b0 //Else the wishbone bus is used
) (
input logic clk,
input logic rst,
input logic[NUM_SOURCES:1] irq_srcs,
input logic[NUM_SOURCES-1:0] edge_sensitive, //Both rising and falling edges, else level sensitive and active high
output logic[NUM_TARGETS-1:0] eip,
//Address bits 31:26 are ignored, 1:0 assumed to be 00
//Compliant Wishbone classic (not pipelined)
input logic wb_cyc,
input logic wb_stb,
input logic wb_we,
input logic [25:2] wb_adr,
input logic [31:0] wb_dat_i,
output logic [31:0] wb_dat_o,
output logic wb_ack,
//Compliant AXI Lite interface; does not include optional awprot, wstrb, bresp, arprot, and rresp
input logic s_axi_awvalid,
input logic[31:0] s_axi_awaddr,
output logic s_axi_awready,
input logic s_axi_wvalid,
input logic[31:0] s_axi_wdata,
output logic s_axi_wready,
output logic s_axi_bvalid,
input logic s_axi_bready,
input logic s_axi_arvalid,
input logic[31:0] s_axi_araddr,
output logic s_axi_arready,
output logic s_axi_rvalid,
output logic[31:0] s_axi_rdata,
input logic s_axi_rready
);
////////////////////////////////////////////////////
//RISC-V Platform-Level Interrupt Controller (PLIC) wrapper
//Handles bus interface
//26-bit address space
//If the parameter is too large such that the register stage is skipped, force it lower to enforce correct functionality
localparam int unsigned REG_STAGE_CORRECTED = REG_STAGE > $clog2(NUM_SOURCES+1)+1 ? 1 : REG_STAGE;
logic read_reg;
logic write_reg;
logic[25:2] addr;
logic[31:0] rdata;
logic[31:0] wdata;
plic #(
.NUM_SOURCES(NUM_SOURCES),
.NUM_TARGETS(NUM_TARGETS),
.PRIORITY_W(PRIORITY_W),
.REG_STAGE(REG_STAGE_CORRECTED)
) plic_inst (
.irq_srcs(irq_srcs),
.edge_sensitive(edge_sensitive),
.read_reg(read_reg),
.write_reg(write_reg),
.addr(addr),
.wdata(wdata),
.rdata(rdata),
.eip(eip),
.*);
//Interface
generate if (AXI) begin : gen_axi_if
//Simple implementation uses state machine for reads and writes
typedef enum logic [2:0] {
IDLE,
RACCEPT,
WACCEPT,
RRESP,
WRESP
} state_t;
state_t state;
state_t next_state;
always_ff @(posedge clk) begin
if (rst)
state <= IDLE;
else
state <= next_state;
end
always_comb begin
unique case (state) inside
IDLE : begin
if (s_axi_awvalid & s_axi_wvalid)
next_state = WACCEPT;
else if (s_axi_arvalid)
next_state = RACCEPT;
else
next_state = IDLE;
end
RACCEPT : next_state = RRESP;
WACCEPT : next_state = WRESP;
RRESP : next_state = s_axi_rready ? IDLE : RRESP;
WRESP : next_state = s_axi_bready ? IDLE : WRESP;
endcase
end
//Reads
assign read_reg = state == RACCEPT;
assign s_axi_arready = read_reg;
assign s_axi_rvalid = state == RRESP;
always_ff @(posedge clk) begin
if (read_reg)
s_axi_rdata <= rdata;
end
//Writes
assign write_reg = state == WACCEPT;
assign s_axi_awready = write_reg;
assign s_axi_wready = write_reg;
assign wdata = s_axi_wdata;
assign s_axi_bvalid = state == WRESP;
//Must use the read address while in the idle state to ensure read_reg claims the right interrupt
assign addr = write_reg ? s_axi_awaddr[25:2] : s_axi_araddr[25:2];
//Not in use
assign wb_ack = 0;
end else begin : gen_wishbone_if
//Writes are acknowledged combinationally in the same cycle, reads need two cycles
logic read_done;
assign addr = wb_adr[25:2];
assign wdata = wb_dat_i;
assign write_reg = wb_cyc & wb_stb & wb_we;
assign wb_ack = write_reg | read_done;
always_ff @(posedge clk) begin
read_reg <= wb_cyc & wb_stb & ~wb_we & ~read_reg & ~read_done;
read_done <= read_reg;
wb_dat_o <= rdata;
end
//Not in use
assign s_axi_awready = 0;
assign s_axi_wready = 0;
assign s_axi_bvalid = 0;
assign s_axi_arready = 0;
assign s_axi_rvalid = 0;
end endgenerate
endmodule


@ -1,58 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module byte_en_bram
import cva5_config::*;
import cva5_types::*;
import riscv_types::*;
#(
parameter LINES = 4096,
parameter preload_file = "",
parameter USE_PRELOAD_FILE = 0
)
(
input logic clk,
input logic[$clog2(LINES)-1:0] addr_a,
input logic en_a,
input logic[XLEN/8-1:0] be_a,
input logic[XLEN-1:0] data_in_a,
output logic[XLEN-1:0] data_out_a,
input logic[$clog2(LINES)-1:0] addr_b,
input logic en_b,
input logic[XLEN/8-1:0] be_b,
input logic[XLEN-1:0] data_in_b,
output logic[XLEN-1:0] data_out_b
);
generate
if(FPGA_VENDOR == XILINX)
xilinx_byte_enable_ram #(LINES, preload_file, USE_PRELOAD_FILE) ram_block (.*);
else
intel_byte_enable_ram #(LINES, preload_file, USE_PRELOAD_FILE) ram_block (.*);
endgenerate
endmodule

12
core/common_components/cva5_fifo.sv Executable file → Normal file

@ -27,10 +27,6 @@
*/
module cva5_fifo
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter type DATA_TYPE = logic,
parameter FIFO_DEPTH = 4
@ -49,8 +45,10 @@ module cva5_fifo
always_ff @ (posedge clk) begin
if (rst)
fifo.valid <= 0;
else
fifo.valid <= fifo.push | (fifo.valid & ~fifo.pop);
else if (fifo.push & ~fifo.pop)
fifo.valid <= 1;
else if (fifo.pop & ~fifo.push)
fifo.valid <= 0;
end
assign fifo.full = fifo.valid;
@ -134,6 +132,6 @@ module cva5_fifo
fifo_potenial_push_overflow_assertion:
assert property (@(posedge clk) disable iff (rst) fifo.potential_push |-> (~fifo.full | fifo.pop)) else $error("potential push overflow");
fifo_underflow_assertion:
assert property (@(posedge clk) disable iff (rst) fifo.pop |-> fifo.valid) else $error("underflow");
assert property (@(posedge clk) disable iff (rst) fifo.pop |-> (fifo.valid | fifo.push)) else $error("underflow");
endmodule

0
core/common_components/cycler.sv Executable file → Normal file


@ -1,70 +0,0 @@
/*
* Copyright © 2023 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module dual_port_bram
import cva5_config::*;
import cva5_types::*;
import riscv_types::*;
#(
parameter WIDTH = 32,
parameter LINES = 4096
)
(
input logic clk,
input logic en_a,
input logic wen_a,
input logic[$clog2(LINES)-1:0] addr_a,
input logic[WIDTH-1:0] data_in_a,
output logic[WIDTH-1:0] data_out_a,
input logic en_b,
input logic wen_b,
input logic[$clog2(LINES)-1:0] addr_b,
input logic[WIDTH-1:0] data_in_b,
output logic[WIDTH-1:0] data_out_b
);
(* ram_style = "block", ramstyle = "no_rw_check" *) logic [WIDTH-1:0] ram [LINES];
initial ram = '{default: 0};
always_ff @ (posedge clk) begin
if (en_a) begin
if (wen_a)
ram[addr_a] <= data_in_a;
data_out_a <= ram[addr_a];
end
end
always_ff @ (posedge clk) begin
if (en_b) begin
if (wen_b)
ram[addr_b] <= data_in_b;
data_out_b <= ram[addr_b];
end
end
endmodule


@ -68,7 +68,7 @@ module lfsr
logic feedback;
////////////////////////////////////////////////////
//Implementation
generate if (WIDTH == 2) begin : gen_width_two
generate if (WIDTH <= 2) begin : gen_width_one_or_two
assign feedback = ~value[WIDTH-1];
end
else begin : gen_width_three_plus
@ -84,8 +84,10 @@ module lfsr
always_ff @ (posedge clk) begin
if (NEEDS_RESET & rst)
value <= '0;
else if (en)
value <= {value[WIDTH-2:0], feedback};
else if (en) begin
value <= value << 1;
value[0] <= feedback;
end
end
endmodule

0
core/common_components/one_hot_to_integer.sv Executable file → Normal file


@ -35,7 +35,7 @@ module priority_encoder
);
////////////////////////////////////////////////////
//Width Check
if (WIDTH > 12)
if (WIDTH > 14)
$error("Max priority encoder width exceeded!");
//Tool workaround


@ -0,0 +1,87 @@
/*
* Copyright © 2024 Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module sdp_ram
#(
parameter ADDR_WIDTH = 10,
parameter NUM_COL = 4, //Number of independently writeable components
parameter COL_WIDTH = 16, //Width the "byte" enable controls
parameter PIPELINE_DEPTH = 1, //Depth of the output pipeline, is latency in clock cycles
parameter CASCADE_DEPTH = 4 //Maximum depth of the memory block cascade
)
(
input logic clk,
//Port A
input logic a_en,
input logic[NUM_COL-1:0] a_wbe,
input logic[COL_WIDTH*NUM_COL-1:0] a_wdata,
input logic[ADDR_WIDTH-1:0] a_addr,
//Port B
input logic b_en,
input logic[ADDR_WIDTH-1:0] b_addr,
output logic[COL_WIDTH*NUM_COL-1:0] b_rdata
);
localparam DATA_WIDTH = COL_WIDTH*NUM_COL;
(* cascade_height = CASCADE_DEPTH, ramstyle = "no_rw_check", ram_style = "block" *) //Higher depths use fewer resources but are slower
logic[DATA_WIDTH-1:0] mem[(1<<ADDR_WIDTH)-1:0];
initial mem = '{default: '0};
//A write
always_ff @(posedge clk) begin
for (int i = 0; i < NUM_COL; i++) begin
if (a_en & a_wbe[i])
mem[a_addr][i*COL_WIDTH +: COL_WIDTH] <= a_wdata[i*COL_WIDTH +: COL_WIDTH];
end
end
//B read
logic[DATA_WIDTH-1:0] b_ram_output;
always_ff @(posedge clk) begin
if (b_en)
b_ram_output <= mem[b_addr];
end
//B pipeline
generate if (PIPELINE_DEPTH > 0) begin : gen_b_pipeline
logic[DATA_WIDTH-1:0] b_data_pipeline[PIPELINE_DEPTH-1:0];
logic[PIPELINE_DEPTH-1:0] b_en_pipeline;
always_ff @(posedge clk) begin
for (int i = 0; i < PIPELINE_DEPTH; i++) begin
b_en_pipeline[i] <= i == 0 ? b_en : b_en_pipeline[i-1];
if (b_en_pipeline[i])
b_data_pipeline[i] <= i == 0 ? b_ram_output : b_data_pipeline[i-1];
end
end
assign b_rdata = b_data_pipeline[PIPELINE_DEPTH-1];
end
else begin : gen_b_transparent_output
assign b_rdata = b_ram_output;
end endgenerate
endmodule


@ -0,0 +1,88 @@
/*
* Copyright © 2024 Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module sdp_ram_padded
#(
parameter ADDR_WIDTH = 10,
parameter NUM_COL = 4, //Number of independently writeable components
parameter COL_WIDTH = 16, //Width the "byte" enable controls
parameter PIPELINE_DEPTH = 1, //Depth of the output pipeline, is latency in clock cycles
parameter CASCADE_DEPTH = 4 //Maximum depth of the memory block cascade
)
(
input logic clk,
//Port A
input logic a_en,
input logic[NUM_COL-1:0] a_wbe,
input logic[COL_WIDTH*NUM_COL-1:0] a_wdata,
input logic[ADDR_WIDTH-1:0] a_addr,
//Port B
input logic b_en,
input logic[ADDR_WIDTH-1:0] b_addr,
output logic[COL_WIDTH*NUM_COL-1:0] b_rdata
);
localparam DATA_WIDTH = COL_WIDTH*NUM_COL;
//Pad columns to the nearest multiple of 8 or 9 to allow the use of the byte enable
//This results in a more compact BRAM encoding
localparam PAD_WIDTH8 = (8 - (COL_WIDTH % 8)) % 8;
localparam PAD_WIDTH9 = (9 - (COL_WIDTH % 9)) % 9;
localparam PAD_WIDTH = PAD_WIDTH8 <= PAD_WIDTH9 ? PAD_WIDTH8 : PAD_WIDTH9;
localparam PADDED_WIDTH = COL_WIDTH + PAD_WIDTH;
localparam TOTAL_WIDTH = NUM_COL * PADDED_WIDTH;
generate if (PAD_WIDTH == 0 || NUM_COL == 1) begin : gen_no_padding
sdp_ram #(
.ADDR_WIDTH(ADDR_WIDTH),
.NUM_COL(NUM_COL),
.COL_WIDTH(COL_WIDTH),
.PIPELINE_DEPTH(PIPELINE_DEPTH),
.CASCADE_DEPTH(CASCADE_DEPTH)
) mem (.*);
end else begin : gen_padded
logic[TOTAL_WIDTH-1:0] a_padded;
logic[TOTAL_WIDTH-1:0] b_padded;
always_comb begin
a_padded = 'x;
for (int i = 0; i < NUM_COL; i++) begin
a_padded[i*PADDED_WIDTH+:COL_WIDTH] = a_wdata[i*COL_WIDTH+:COL_WIDTH];
b_rdata[i*COL_WIDTH+:COL_WIDTH] = b_padded[i*PADDED_WIDTH+:COL_WIDTH];
end
end
sdp_ram #(
.ADDR_WIDTH(ADDR_WIDTH),
.NUM_COL(NUM_COL),
.COL_WIDTH(PADDED_WIDTH),
.PIPELINE_DEPTH(PIPELINE_DEPTH),
.CASCADE_DEPTH(CASCADE_DEPTH)
) mem (
.a_wdata(a_padded),
.b_rdata(b_padded),
.*);
end endgenerate
endmodule
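
The padding arithmetic in `sdp_ram_padded` picks whichever of the 8-bit or 9-bit byte-enable granularities needs fewer extra bits per column. A small Python sketch of the same calculation follows; the function name `pad_width` is hypothetical and it only reproduces the `localparam` expressions, not any synthesis behaviour.

```python
def pad_width(col_width: int) -> int:
    """Bits added per column so its width is a multiple of 8 or of 9."""
    pad8 = (8 - col_width % 8) % 8   # PAD_WIDTH8
    pad9 = (9 - col_width % 9) % 9   # PAD_WIDTH9
    return pad8 if pad8 <= pad9 else pad9  # PAD_WIDTH


if __name__ == "__main__":
    for width in (7, 9, 10, 16, 17):
        print(width, "->", width + pad_width(width))
```

A zero result corresponds to the `gen_no_padding` branch, where the plain `sdp_ram` is instantiated directly.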


@ -0,0 +1,124 @@
/*
* Copyright © 2024 Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module tdp_ram
#(
parameter ADDR_WIDTH = 10,
parameter NUM_COL = 4, //Number of independently writeable components
parameter COL_WIDTH = 16, //Width the "byte" enable controls
parameter PIPELINE_DEPTH = 1, //Depth of the output pipeline, is latency in clock cycles
parameter CASCADE_DEPTH = 4, //Maximum depth of the memory block cascade
parameter USE_PRELOAD = 0,
parameter PRELOAD_FILE = ""
)
(
input logic clk,
//Port A
input logic a_en,
input logic[NUM_COL-1:0] a_wbe,
input logic[COL_WIDTH*NUM_COL-1:0] a_wdata,
input logic[ADDR_WIDTH-1:0] a_addr,
output logic[COL_WIDTH*NUM_COL-1:0] a_rdata,
//Port B
input logic b_en,
input logic[NUM_COL-1:0] b_wbe,
input logic[COL_WIDTH*NUM_COL-1:0] b_wdata,
input logic[ADDR_WIDTH-1:0] b_addr,
output logic[COL_WIDTH*NUM_COL-1:0] b_rdata
);
localparam DATA_WIDTH = COL_WIDTH*NUM_COL;
(* cascade_height = CASCADE_DEPTH, ramstyle = "no_rw_check", ram_style = "block" *) //Higher depths use fewer resources but are slower
logic[DATA_WIDTH-1:0] mem[(1<<ADDR_WIDTH)-1:0];
initial begin
if (USE_PRELOAD)
$readmemh(PRELOAD_FILE, mem, 0);
end
//A read/write
logic[DATA_WIDTH-1:0] a_ram_output;
always_ff @(posedge clk) begin
if (a_en) begin
for (int i = 0; i < NUM_COL; i++) begin
if (a_wbe[i])
mem[a_addr][i*COL_WIDTH +: COL_WIDTH] <= a_wdata[i*COL_WIDTH +: COL_WIDTH];
end
if (~|a_wbe)
a_ram_output <= mem[a_addr];
end
end
//A pipeline
generate if (PIPELINE_DEPTH > 0) begin : gen_a_pipeline
logic[DATA_WIDTH-1:0] a_data_pipeline[PIPELINE_DEPTH-1:0];
logic[PIPELINE_DEPTH-1:0] a_en_pipeline;
always_ff @(posedge clk) begin
for (int i = 0; i < PIPELINE_DEPTH; i++) begin
a_en_pipeline[i] <= i == 0 ? a_en : a_en_pipeline[i-1];
if (a_en_pipeline[i])
a_data_pipeline[i] <= i == 0 ? a_ram_output : a_data_pipeline[i-1];
end
end
assign a_rdata = a_data_pipeline[PIPELINE_DEPTH-1];
end
else begin : gen_a_transparent_output
assign a_rdata = a_ram_output;
end endgenerate
//B read/write
logic[DATA_WIDTH-1:0] b_ram_output;
always_ff @(posedge clk) begin
if (b_en) begin
for (int i = 0; i < NUM_COL; i++) begin
if (b_wbe[i])
mem[b_addr][i*COL_WIDTH +: COL_WIDTH] <= b_wdata[i*COL_WIDTH +: COL_WIDTH];
end
if (~|b_wbe)
b_ram_output <= mem[b_addr];
end
end
//B pipeline
generate if (PIPELINE_DEPTH > 0) begin : gen_b_pipeline
logic[DATA_WIDTH-1:0] b_data_pipeline[PIPELINE_DEPTH-1:0];
logic[PIPELINE_DEPTH-1:0] b_en_pipeline;
always_ff @(posedge clk) begin
for (int i = 0; i < PIPELINE_DEPTH; i++) begin
b_en_pipeline[i] <= i == 0 ? b_en : b_en_pipeline[i-1];
if (b_en_pipeline[i])
b_data_pipeline[i] <= i == 0 ? b_ram_output : b_data_pipeline[i-1];
end
end
assign b_rdata = b_data_pipeline[PIPELINE_DEPTH-1];
end
else begin : gen_b_transparent_output
assign b_rdata = b_ram_output;
end endgenerate
endmodule


@ -0,0 +1,67 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module round_robin
#(
parameter int unsigned NUM_PORTS = 2
) (
input logic clk,
input logic rst,
input logic[NUM_PORTS-1:0] requests,
input logic grant,
output logic[(NUM_PORTS == 1 ? 1 : $clog2(NUM_PORTS))-1:0] grantee
);
localparam PORT_W = $clog2(NUM_PORTS);
generate if (NUM_PORTS == 1) begin : gen_no_arb
assign grantee = '0;
end else begin : gen_arb
logic[PORT_W-1:0] state;
logic[PORT_W-1:0] muxes [NUM_PORTS-1:0];
//Lowest priority to current state
always_ff @(posedge clk) begin
if (rst)
state <= 0;
else if (grant)
state <= grantee;
end
//ex: state 0, highest priority to PORTS-1
always_comb begin
for (int i = 0; i < NUM_PORTS; i++) begin
muxes[i] = PORT_W'(i);
for (int j = 0; j < NUM_PORTS; j++) begin
if (requests[(i + j) % NUM_PORTS])
muxes[i] = PORT_W'((i + j) % NUM_PORTS);
end
end
end
//Select mux output based on current state
assign grantee = muxes[state];
end endgenerate
endmodule
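
The priority order produced by the mux loop in `round_robin` is not obvious at first glance, because the last matching assignment in the inner loop wins. The Python sketch below models that selection for one value of `state`; the function name `round_robin_grantee` is hypothetical and the model covers only the combinational choice, not the state register update.

```python
def round_robin_grantee(state: int, requests) -> int:
    """Return the port selected by muxes[state] for the given request vector.

    The loop walks ports state, state+1, ... (mod N) and keeps the last one
    that is requesting, so the port just before `state` has the highest
    priority and `state` itself (the previous grantee) has the lowest.
    """
    n = len(requests)
    grantee = state  # default when nothing is requesting
    for j in range(n):
        port = (state + j) % n
        if requests[port]:
            grantee = port
    return grantee


if __name__ == "__main__":
    requests = [True, False, True, False]
    first = round_robin_grantee(0, requests)       # port 2 beats port 0
    second = round_robin_grantee(first, requests)  # port 2 now has lowest priority
    print(first, second)
```

Feeding each grantee back in as the next state, as the `always_ff` block does when `grant` is asserted, is what rotates the priority between requesters.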


@ -49,8 +49,8 @@ module set_clr_reg_with_rst
else
result <= set | (result & ~clr);
end
end else begin
always_ff @ (posedge clk) begin : gen_clear_over_set
end else begin : gen_clear_over_set
always_ff @ (posedge clk) begin
if (rst)
result <= RST_VALUE;
else


@ -22,16 +22,12 @@
module toggle_memory
import cva5_config::*;
import cva5_types::*;
# (
parameter DEPTH = 8,
parameter NUM_READ_PORTS = 2
)
(
input logic clk,
input logic rst,
input logic toggle,
input logic [$clog2(DEPTH)-1:0] toggle_id,


@ -22,9 +22,6 @@
module toggle_memory_set
import cva5_config::*;
import cva5_types::*;
# (
parameter DEPTH = 64,
parameter NUM_WRITE_PORTS = 3,
@ -32,7 +29,6 @@ module toggle_memory_set
)
(
input logic clk,
input logic rst,
input logic init_clear,
input logic toggle [NUM_WRITE_PORTS],
@ -53,7 +49,7 @@ module toggle_memory_set
//counter for indexing through memories for post-reset clearing/initialization
lfsr #(.WIDTH($clog2(DEPTH)), .NEEDS_RESET(0))
lfsr_counter (
.clk (clk), .rst (rst),
.clk (clk), .rst (1'b0),
.en(init_clear),
.value(clear_index)
);
@ -76,7 +72,7 @@ module toggle_memory_set
for (j = 0; j < NUM_WRITE_PORTS+1; j++) begin : write_port_gen
toggle_memory #(.DEPTH(DEPTH), .NUM_READ_PORTS(NUM_READ_PORTS+1))
mem (
.clk (clk), .rst (rst),
.clk (clk),
.toggle(_toggle[j]),
.toggle_id(_toggle_addr[j]),
.read_id(_read_addr),


@ -1,82 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module intel_byte_enable_ram
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter LINES = 8192,
parameter preload_file = "",
parameter USE_PRELOAD_FILE = 0
)
(
input logic clk,
input logic[$clog2(LINES)-1:0] addr_a,
input logic en_a,
input logic[XLEN/8-1:0] be_a,
input logic[XLEN-1:0] data_in_a,
output logic[XLEN-1:0] data_out_a,
input logic[$clog2(LINES)-1:0] addr_b,
input logic en_b,
input logic[XLEN/8-1:0] be_b,
input logic[XLEN-1:0] data_in_b,
output logic[XLEN-1:0] data_out_b
);
(* ramstyle = "no_rw_check" *) logic [3:0][7:0] ram [LINES-1:0];
initial
begin
if(USE_PRELOAD_FILE)
$readmemh(preload_file,ram, 0, LINES-1);
end
always_ff @(posedge clk) begin
if (en_a) begin
if (be_a[0]) ram[addr_a][0] <= data_in_a[7:0];
if (be_a[1]) ram[addr_a][1] <= data_in_a[15:8];
if (be_a[2]) ram[addr_a][2] <= data_in_a[23:16];
if (be_a[3]) ram[addr_a][3] <= data_in_a[31:24];
end
data_out_a <= ram[addr_a];
end
always_ff @(posedge clk) begin
if (en_b) begin
if (be_b[0]) ram[addr_b][0] <= data_in_b[7:0];
if (be_b[1]) ram[addr_b][1] <= data_in_b[15:8];
if (be_b[2]) ram[addr_b][2] <= data_in_b[23:16];
if (be_b[3]) ram[addr_b][3] <= data_in_b[31:24];
end
data_out_b <= ram[addr_b];
end
endmodule


@ -1,134 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module cva5_wrapper_xilinx
import cva5_config::*;
import cva5_types::*;
import l2_config_and_types::*;
(
input logic clk,
input logic rst,
local_memory_interface.master instruction_bram,
local_memory_interface.master data_bram,
l2_requester_interface.master l2,
// AXI SIGNALS - need these to unwrap the interface for packaging //
input logic m_axi_arready,
output logic m_axi_arvalid,
output logic [C_M_AXI_ADDR_WIDTH-1:0] m_axi_araddr,
output logic [7:0] m_axi_arlen,
output logic [2:0] m_axi_arsize,
output logic [1:0] m_axi_arburst,
output logic [3:0] m_axi_arcache,
output logic [5:0] m_axi_arid,
//read data
output logic m_axi_rready,
input logic m_axi_rvalid,
input logic [C_M_AXI_DATA_WIDTH-1:0] m_axi_rdata,
input logic [1:0] m_axi_rresp,
input logic m_axi_rlast,
input logic [5:0] m_axi_rid,
//Write channel
//write address
input logic m_axi_awready,
output logic m_axi_awvalid,
output logic [C_M_AXI_ADDR_WIDTH-1:0] m_axi_awaddr,
output logic [7:0] m_axi_awlen,
output logic [2:0] m_axi_awsize,
output logic [1:0] m_axi_awburst,
output logic [3:0] m_axi_awcache,
output logic [5:0] m_axi_awid,
//write data
input logic m_axi_wready,
output logic m_axi_wvalid,
output logic [C_M_AXI_DATA_WIDTH-1:0] m_axi_wdata,
output logic [(C_M_AXI_DATA_WIDTH/8)-1:0] m_axi_wstrb,
output logic m_axi_wlast,
//write response
output logic m_axi_bready,
input logic m_axi_bvalid,
input logic [1:0] m_axi_bresp,
input logic [5:0] m_axi_bid
);
//Unused outputs
avalon_interface m_avalon ();
wishbone_interface dwishbone ();
wishbone_interface iwishbone ();
logic timer_interrupt;
logic interrupt;
//AXI interface
axi_interface m_axi();
assign m_axi_arready = m_axi.arready;
assign m_axi_arvalid = m_axi.arvalid;
assign m_axi_araddr = m_axi.araddr;
assign m_axi_arlen = m_axi.arlen;
assign m_axi_arsize = m_axi.arsize;
assign m_axi_arburst = m_axi.arburst;
assign m_axi_arcache = m_axi.arcache;
//assign m_axi_arid = m_axi.arid;
assign m_axi_rready = m_axi.rready;
assign m_axi_rvalid = m_axi.rvalid;
assign m_axi_rdata = m_axi.rdata;
assign m_axi_rresp = m_axi.rresp;
assign m_axi_rlast = m_axi.rlast;
//assign m_axi_rid = m_axi.rid;
assign m_axi_awready = m_axi.awready;
assign m_axi_awvalid = m_axi.awvalid;
assign m_axi_awaddr = m_axi.awaddr;
assign m_axi_awlen = m_axi.awlen;
assign m_axi_awsize = m_axi.awsize;
assign m_axi_awburst = m_axi.awburst;
assign m_axi_awcache = m_axi.awcache;
//assign m_axi_awid = m_axi.awid;
//write data
assign m_axi_wready = m_axi.wready;
assign m_axi_wvalid = m_axi.wvalid;
assign m_axi_wdata = m_axi.wdata;
assign m_axi_wstrb = m_axi.wstrb;
assign m_axi_wlast = m_axi.wlast;
//write response
assign m_axi_bready = m_axi.bready;
assign m_axi_bvalid = m_axi.bvalid;
assign m_axi_bresp = m_axi.bresp;
//assign m_axi_bid = m_axi.bid;
cva5 cpu(.*);
endmodule


@ -1,87 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module xilinx_byte_enable_ram
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter LINES = 4096,
parameter preload_file = "",
parameter USE_PRELOAD_FILE = 0
)
(
input logic clk,
input logic[$clog2(LINES)-1:0] addr_a,
input logic en_a,
input logic[XLEN/8-1:0] be_a,
input logic[XLEN-1:0] data_in_a,
output logic[XLEN-1:0] data_out_a,
input logic[$clog2(LINES)-1:0] addr_b,
input logic en_b,
input logic[XLEN/8-1:0] be_b,
input logic[XLEN-1:0] data_in_b,
output logic[XLEN-1:0] data_out_b
);
logic [31:0] ram [LINES-1:0];
initial
begin
if(USE_PRELOAD_FILE)
$readmemh(preload_file,ram, 0, LINES-1);
end
generate begin : gen_xilinx_bram
genvar i;
for (i=0; i < 4; i++) begin
always_ff @(posedge clk) begin
if (en_a) begin
if (be_a[i]) begin
ram[addr_a][8*i+:8] <= data_in_a[8*i+:8];
data_out_a[8*i+:8] <= data_in_a[8*i+:8];
end else begin
data_out_a[8*i+:8] <= ram[addr_a][8*i+:8];
end
end
end
end
for (i=0; i < 4; i++) begin
always_ff @(posedge clk) begin
if (en_b) begin
if (be_b[i]) begin
ram[addr_b][8*i+:8] <= data_in_b[8*i+:8];
data_out_b[8*i+:8] <= data_in_b[8*i+:8];
end else begin
data_out_b[8*i+:8] <= ram[addr_b][8*i+:8];
end
end
end
end
end endgenerate
endmodule

112
core/core_arbiter.sv Normal file

@ -0,0 +1,112 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module core_arbiter
#(
parameter logic INCLUDE_DCACHE = 1'b1,
parameter logic INCLUDE_ICACHE = 1'b1,
parameter logic INCLUDE_MMUS = 1'b1
) (
input logic clk,
input logic rst,
mem_interface.rw_slave dcache,
mem_interface.ro_slave icache,
mem_interface.ro_slave dmmu,
mem_interface.ro_slave immu,
mem_interface.mem_master mem
);
//Multiplexes memory requests and demultiplexes memory responses
//If the MMUs are not present the MSB of the memory ID is always 0
//If the I$ is also not present the entire ID can be set to 0
logic[3:0] request;
logic[3:0][31:2] addr;
logic[3:0][4:0] rlen;
logic[3:0] rnw;
logic[3:0] rmw;
logic[1:0] port;
////////////////////////////////////////////////////
//Implementation
//D$
assign request[0] = INCLUDE_DCACHE ? dcache.request : 0;
assign addr[0] = INCLUDE_DCACHE ? dcache.addr : 'x;
assign rlen[0] = INCLUDE_DCACHE ? dcache.rlen : 'x;
assign rnw[0] = INCLUDE_DCACHE ? dcache.rnw : 'x;
assign rmw[0] = INCLUDE_DCACHE ? dcache.rmw : 'x;
assign mem.wbe = dcache.wbe;
assign mem.wdata = dcache.wdata;
assign dcache.inv = mem.inv;
assign dcache.inv_addr = mem.inv_addr;
assign dcache.write_outstanding = mem.write_outstanding;
assign dcache.ack = mem.ack & port == 2'b00;
assign dcache.rvalid = mem.rvalid & mem.rid == 2'b00;
assign dcache.rdata = mem.rdata;
//I$
assign request[1] = INCLUDE_ICACHE ? icache.request : 0;
assign addr[1] = INCLUDE_ICACHE ? icache.addr : 'x;
assign rlen[1] = INCLUDE_ICACHE ? icache.rlen : 'x;
assign rnw[1] = INCLUDE_ICACHE ? 1 : 'x;
assign rmw[1] = INCLUDE_ICACHE ? 0 : 'x;
assign icache.ack = mem.ack & port == 2'b01;
assign icache.rvalid = mem.rvalid & mem.rid == 2'b01;
assign icache.rdata = mem.rdata;
//DMMU
assign request[2] = INCLUDE_MMUS ? dmmu.request : 0;
assign addr[2] = INCLUDE_MMUS ? dmmu.addr : 'x;
assign rlen[2] = INCLUDE_MMUS ? dmmu.rlen : 'x;
assign rnw[2] = INCLUDE_MMUS ? 1 : 'x;
assign rmw[2] = INCLUDE_MMUS ? 0 : 'x;
assign dmmu.rdata = mem.rdata;
assign dmmu.ack = mem.ack & port == 2'b10;
assign dmmu.rvalid = mem.rvalid & mem.rid == 2'b10;
//IMMU
assign request[3] = INCLUDE_MMUS ? immu.request : 0;
assign addr[3] = INCLUDE_MMUS ? immu.addr : 'x;
assign rlen[3] = INCLUDE_MMUS ? immu.rlen : 'x;
assign rnw[3] = INCLUDE_MMUS ? 1 : 'x;
assign rmw[3] = INCLUDE_MMUS ? 0 : 'x;
assign immu.rdata = mem.rdata;
assign immu.ack = mem.ack & port == 2'b11;
assign immu.rvalid = mem.rvalid & mem.rid == 2'b11;
////////////////////////////////////////////////////
//Arbitration
round_robin #(.NUM_PORTS(4)) rr (
.requests(request),
.grant(mem.request & mem.ack),
.grantee(port),
.*);
assign mem.request = |request;
assign mem.addr = addr[port];
assign mem.rlen = rlen[port];
assign mem.rnw = rnw[port];
assign mem.rmw = rmw[port];
assign mem.id = port;
endmodule
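
The response path in `core_arbiter` above routes `mem.rvalid` to exactly one requester by comparing `mem.rid` against a fixed per-port ID, while `mem.rdata` is broadcast to every port. A minimal, generic sketch of that ID decode follows. The module name `rid_demux_sketch` and its flat ports are hypothetical and stand in for the `mem_interface` signals, whose full definition is not shown in this section; it also assumes `NUM_PORTS` is at least 2.

```systemverilog
//Illustrative sketch only; not part of this commit
module rid_demux_sketch
    #(
        parameter int unsigned NUM_PORTS = 4
    ) (
        input logic rvalid,
        input logic[$clog2(NUM_PORTS)-1:0] rid,
        output logic[NUM_PORTS-1:0] port_rvalid
    );
    localparam PORT_W = $clog2(NUM_PORTS);

    //One-hot decode: only the port whose index matches rid sees the response
    always_comb begin
        for (int i = 0; i < NUM_PORTS; i++)
            port_rvalid[i] = rvalid & (rid == PORT_W'(i));
    end
endmodule
```

The arbiter in the commit writes the four comparisons out explicitly (`2'b00` through `2'b11`); the loop form here is only a parameterized restatement of the same idea.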

221
core/cva5.sv Executable file → Normal file

@ -25,10 +25,10 @@
module cva5
import cva5_config::*;
import l2_config_and_types::*;
import riscv_types::*;
import cva5_types::*;
import fpu_types::*;
import csr_types::*;
#(
parameter cpu_config_t CONFIG = EXAMPLE_CONFIG
@ -46,18 +46,19 @@ module cva5
wishbone_interface.master dwishbone,
wishbone_interface.master iwishbone,
l2_requester_interface.master l2,
mem_interface.mem_master mem,
input logic [63:0] mtime,
input interrupt_t s_interrupt,
input interrupt_t m_interrupt
);
////////////////////////////////////////////////////
//Connecting Signals
l1_arbiter_request_interface l1_request[L1_CONNECTIONS-1:0]();
l1_arbiter_return_interface l1_response[L1_CONNECTIONS-1:0]();
logic sc_complete;
logic sc_success;
mem_interface dcache_mem();
mem_interface icache_mem();
mem_interface dmmu_mem();
mem_interface immu_mem();
branch_predictor_interface bp();
branch_results_t br_results;
@ -90,7 +91,8 @@ module cva5
tlb_interface itlb();
tlb_interface dtlb();
logic tlb_on;
logic instruction_translation_on;
logic data_translation_on;
logic [ASIDLEN-1:0] asid;
//Instruction ID/Metadata
@ -108,11 +110,10 @@ module cva5
fetch_metadata_t fetch_metadata;
//Decode stage
logic decode_advance;
decode_packet_t decode;
decode_packet_t decode;
logic decode_uses_rd;
logic fp_decode_uses_rd;
rs_addr_t decode_rd_addr;
exception_sources_t decode_exception_unit;
logic decode_is_store;
phys_addr_t decode_phys_rd_addr;
phys_addr_t fp_decode_phys_rd_addr;
@ -127,7 +128,6 @@ module cva5
retire_packet_t fp_wb_retire;
retire_packet_t store_retire;
id_t retire_ids [RETIRE_PORTS];
id_t retire_ids_next [RETIRE_PORTS];
logic retire_port_valid [RETIRE_PORTS];
logic [LOG2_RETIRE_PORTS : 0] retire_count;
//Writeback
@ -138,29 +138,33 @@ module cva5
phys_addr_t wb_phys_addr [CONFIG.NUM_WB_GROUPS];
phys_addr_t fp_wb_phys_addr [2];
logic [4:0] fflag_wmask;
//Exception
logic [31:0] oldest_pc;
renamer_interface #(.NUM_WB_GROUPS(CONFIG.NUM_WB_GROUPS), .READ_PORTS(REGFILE_READ_PORTS)) decode_rename_interface ();
renamer_interface #(.NUM_WB_GROUPS(2), .READ_PORTS(3)) fp_decode_rename_interface ();
//Global Control
exception_interface exception [NUM_EXCEPTION_SOURCES]();
logic [$clog2(NUM_EXCEPTION_SOURCES)-1:0] current_exception_unit;
gc_outputs_t gc;
tlb_packet_t sfence;
load_store_status_t load_store_status;
logic [LOG2_MAX_IDS:0] post_issue_count;
logic [1:0] current_privilege;
logic mret;
logic sret;
logic [31:0] epc;
logic [31:0] exception_target_pc;
logic csr_frontend_flush;
logic interrupt_taken;
logic interrupt_pending;
logic processing_csr;
//CSR broadcast info
logic [1:0] current_privilege;
logic tvm;
logic tsr;
envcfg_t menvcfg;
envcfg_t senvcfg;
logic [31:0] mepc;
logic [31:0] sepc;
logic [31:0] exception_target_pc;
//Decode Unit and Fetch Unit
logic issue_stage_ready;
@ -178,18 +182,19 @@ module cva5
//Implementation
////////////////////////////////////////////////////
// Memory Interface
generate if (CONFIG.INCLUDE_S_MODE || CONFIG.INCLUDE_ICACHE || CONFIG.INCLUDE_DCACHE) begin : gen_l1_arbiter
l1_arbiter #(.CONFIG(CONFIG))
generate if (CONFIG.MODES == MSU || CONFIG.INCLUDE_ICACHE || CONFIG.INCLUDE_DCACHE) begin : gen_core_arb
core_arbiter #(.INCLUDE_DCACHE(CONFIG.INCLUDE_DCACHE), .INCLUDE_ICACHE(CONFIG.INCLUDE_ICACHE), .INCLUDE_MMUS(CONFIG.MODES == MSU))
arb(
.clk (clk),
.rst (rst),
.l2 (l2),
.sc_complete (sc_complete),
.sc_success (sc_success),
.l1_request (l1_request),
.l1_response (l1_response)
.dcache (dcache_mem),
.icache (icache_mem),
.dmmu (dmmu_mem),
.immu (immu_mem),
.mem (mem)
);
end
endgenerate
@ -217,7 +222,6 @@ module cva5
.decode_rd_addr (decode_rd_addr),
.decode_phys_rd_addr (decode_phys_rd_addr),
.fp_decode_phys_rd_addr (fp_decode_phys_rd_addr),
.decode_exception_unit (decode_exception_unit),
.decode_is_store (decode_is_store),
.issue (issue),
.instruction_issued (instruction_issued),
@ -231,12 +235,9 @@ module cva5
.fp_wb_retire (fp_wb_retire),
.store_retire (store_retire),
.retire_ids (retire_ids),
.retire_ids_next (retire_ids_next),
.retire_port_valid(retire_port_valid),
.retire_count (retire_count),
.post_issue_count(post_issue_count),
.oldest_pc (oldest_pc),
.current_exception_unit (current_exception_unit)
.post_issue_count(post_issue_count)
);
////////////////////////////////////////////////////
@ -257,18 +258,16 @@ module cva5
.early_branch_flush (early_branch_flush),
.early_branch_flush_ras_adjust (early_branch_flush_ras_adjust),
.if_pc (if_pc),
.fetch_instruction (fetch_instruction),
.instruction_bram (instruction_bram),
.fetch_instruction (fetch_instruction),
.instruction_bram (instruction_bram),
.iwishbone (iwishbone),
.icache_on ('1),
.tlb (itlb),
.l1_request (l1_request[L1_ICACHE_ID]),
.l1_response (l1_response[L1_ICACHE_ID]),
.exception (1'b0)
.tlb (itlb),
.mem (icache_mem)
);
branch_predictor #(.CONFIG(CONFIG))
bp_block (
bp_block (
.clk (clk),
.rst (rst),
.bp (bp),
@ -285,39 +284,33 @@ module cva5
.ras (ras)
);
generate if (CONFIG.INCLUDE_S_MODE) begin : gen_itlb_immu
tlb_lut_ram #(.WAYS(CONFIG.ITLB.WAYS), .DEPTH(CONFIG.ITLB.DEPTH))
i_tlb (
.clk (clk),
.rst (rst),
.gc (gc),
.abort_request (gc.fetch_flush | early_branch_flush),
.asid (asid),
.tlb (itlb),
.mmu (immu)
);
itlb #(.WAYS(CONFIG.ITLB.WAYS), .DEPTH(CONFIG.ITLB.DEPTH))
i_tlb (
.clk (clk),
.rst (rst),
.translation_on (instruction_translation_on),
.sfence (sfence),
.abort_request (gc.fetch_flush | early_branch_flush),
.asid (asid),
.tlb (itlb),
.mmu (immu)
);
generate if (CONFIG.MODES == MSU) begin : gen_immu
mmu i_mmu (
.clk (clk),
.rst (rst),
.mmu (immu) ,
.abort_request (gc.fetch_flush),
.l1_request (l1_request[L1_IMMU_ID]),
.l1_response (l1_response[L1_IMMU_ID])
.mmu (immu),
.abort_request (gc.fetch_flush | early_branch_flush),
.mem (immu_mem)
);
end
else begin
assign itlb.ready = 1;
assign itlb.done = itlb.new_request;
assign itlb.physical_address = itlb.virtual_address;
end
endgenerate
////////////////////////////////////////////////////
//Renamer
renamer #(.NUM_WB_GROUPS(CONFIG.NUM_WB_GROUPS), .READ_PORTS(REGFILE_READ_PORTS), .RENAME_ZERO(0))
renamer #(.NUM_WB_GROUPS(CONFIG.NUM_WB_GROUPS), .READ_PORTS(REGFILE_READ_PORTS), .RENAME_ZERO(0))
renamer_block (
.clk (clk),
.rst (rst),
@ -348,7 +341,6 @@ module cva5
.decode_uses_rd (decode_uses_rd),
.fp_decode_uses_rd (fp_decode_uses_rd),
.decode_rd_addr (decode_rd_addr),
.decode_exception_unit (decode_exception_unit),
.decode_phys_rd_addr (decode_phys_rd_addr),
.fp_decode_phys_rd_addr (fp_decode_phys_rd_addr),
.decode_phys_rs_addr (decode_phys_rs_addr),
@ -395,7 +387,7 @@ module cva5
////////////////////////////////////////////////////
//Execution Units
branch_unit #(.CONFIG(CONFIG))
branch_unit_block (
branch_unit_block (
.clk (clk),
.rst (rst),
.decode_stage (decode),
@ -425,7 +417,7 @@ module cva5
.rf (rf_issue.data),
.constant_alu (constant_alu),
.issue_rs_addr (issue_rs_addr),
.issue (unit_issue[ALU_ID]),
.issue (unit_issue[ALU_ID]),
.wb (unit_wb[ALU_ID])
);
@ -453,20 +445,20 @@ module cva5
.rf (rf_issue.data),
.fp_rf (fp_rf_issue.data),
.issue (unit_issue[LS_ID]),
.dcache_on (1'b1),
.clear_reservation (1'b0),
.dcache_on (1'b1),
.clear_reservation (1'b0),
.tlb (dtlb),
.tlb_on (tlb_on),
.l1_request (l1_request[L1_DCACHE_ID]),
.l1_response (l1_response[L1_DCACHE_ID]),
.sc_complete (sc_complete),
.sc_success (sc_success),
.mem (dcache_mem),
.m_axi (m_axi),
.m_avalon (m_avalon),
.dwishbone (dwishbone),
.dwishbone (dwishbone),
.data_bram (data_bram),
.current_privilege (current_privilege),
.menvcfg (menvcfg),
.senvcfg (senvcfg),
.wb_packet (wb_packet),
.fp_wb_packet (fp_wb_packet),
.retire_id (retire_ids[0]),
.store_retire (store_retire),
.exception (exception[LS_EXCEPTION]),
.load_store_status(load_store_status),
@ -474,32 +466,26 @@ module cva5
.fp_wb (fp_unit_wb[0])
);
generate if (CONFIG.INCLUDE_S_MODE) begin : gen_dtlb_dmmu
tlb_lut_ram #(.WAYS(CONFIG.DTLB.WAYS), .DEPTH(CONFIG.DTLB.DEPTH))
d_tlb (
.clk (clk),
.rst (rst),
.gc (gc),
.abort_request (1'b0),
.asid (asid),
.tlb (dtlb),
.mmu (dmmu)
);
dtlb #(.WAYS(CONFIG.DTLB.WAYS), .DEPTH(CONFIG.DTLB.DEPTH))
d_tlb (
.clk (clk),
.rst (rst),
.translation_on (data_translation_on),
.sfence (sfence),
.asid (asid),
.tlb (dtlb),
.mmu (dmmu)
);
generate if (CONFIG.MODES == MSU) begin : gen_dmmu
mmu d_mmu (
.clk (clk),
.rst (rst),
.mmu (dmmu) ,
.mmu (dmmu),
.abort_request (1'b0),
.l1_request (l1_request[L1_DMMU_ID]),
.l1_response (l1_response[L1_DMMU_ID])
.mem (dmmu_mem)
);
end
else begin
assign dtlb.ready = 1;
assign dtlb.done = dtlb.new_request;
assign dtlb.physical_address = dtlb.virtual_address;
end
endgenerate
generate if (CONFIG.INCLUDE_UNIT.CSR) begin : gen_csrs
@ -515,25 +501,32 @@ module cva5
.uses_rs (unit_uses_rs[CSR_ID]),
.uses_rd (unit_uses_rd[CSR_ID]),
.rf (rf_issue.data),
.issue (unit_issue[CSR_ID]),
.instruction_issued (instruction_issued),
.fp_instruction_issued_with_rd (fp_instruction_issued_with_rd),
.issue (unit_issue[CSR_ID]),
.wb (unit_wb[CSR_ID]),
.current_privilege(current_privilege),
.menvcfg(menvcfg),
.senvcfg(senvcfg),
.fflag_wmask (fflag_wmask),
.dyn_rm (dyn_rm),
.interrupt_taken(interrupt_taken),
.interrupt_pending(interrupt_pending),
.processing_csr(processing_csr),
.tlb_on(tlb_on),
.csr_frontend_flush(csr_frontend_flush),
.instruction_translation_on(instruction_translation_on),
.data_translation_on(data_translation_on),
.asid(asid),
.immu(immu),
.dmmu(dmmu),
.exception(gc.exception),
.exception_pkt(gc.exception),
.exception_target_pc (exception_target_pc),
.mret(mret),
.sret(sret),
.epc(epc),
.mepc(mepc),
.sepc(sepc),
.exception(exception[CSR_EXCEPTION]),
.retire_ids(retire_ids),
.retire_count (retire_count),
.mtime(mtime),
.s_interrupt(s_interrupt),
.m_interrupt(m_interrupt)
);
@ -546,27 +539,30 @@ module cva5
.decode_stage (decode),
.issue_stage (issue),
.issue_stage_ready (issue_stage_ready),
.unit_needed (unit_needed[IEC_ID]),
.uses_rs (unit_uses_rs[IEC_ID]),
.uses_rd (unit_uses_rd[IEC_ID]),
.unit_needed (unit_needed[GC_ID]),
.uses_rs (unit_uses_rs[GC_ID]),
.uses_rd (unit_uses_rd[GC_ID]),
.instruction_issued (instruction_issued),
.constant_alu (constant_alu),
.rf (rf_issue.data),
.issue (unit_issue[IEC_ID]),
.issue (unit_issue[GC_ID]),
.branch_flush (branch_flush),
.local_gc_exception (exception[GC_EXCEPTION]),
.exception (exception),
.exception_target_pc (exception_target_pc),
.current_exception_unit (current_exception_unit),
.csr_frontend_flush (csr_frontend_flush),
.current_privilege (current_privilege),
.tvm (tvm),
.tsr (tsr),
.gc (gc),
.oldest_pc (oldest_pc),
.sfence (sfence),
.mret(mret),
.sret(sret),
.epc(epc),
.retire_ids_next (retire_ids_next),
.mepc(mepc),
.sepc(sepc),
.interrupt_taken(interrupt_taken),
.interrupt_pending(interrupt_pending),
.processing_csr(processing_csr),
.load_store_status(load_store_status),
.post_issue_count (post_issue_count)
.load_store_status(load_store_status)
);
generate if (CONFIG.INCLUDE_UNIT.MUL) begin : gen_mul
@ -599,7 +595,7 @@ module cva5
.uses_rs (unit_uses_rs[DIV_ID]),
.uses_rd (unit_uses_rd[DIV_ID]),
.rf (rf_issue.data),
.issue (unit_issue[DIV_ID]),
.issue (unit_issue[DIV_ID]),
.wb (unit_wb[DIV_ID])
);
end endgenerate
@ -616,7 +612,7 @@ module cva5
.issue_stage (issue),
.issue_stage_ready (issue_stage_ready),
.rf (rf_issue.data),
.issue (unit_issue[CUSTOM_ID]),
.issue (unit_issue[CUSTOM_ID]),
.wb (unit_wb[CUSTOM_ID])
);
end endgenerate
@ -679,7 +675,7 @@ module cva5
.wb_phys_addr (fp_wb_phys_addr)
);
renamer #(.NUM_WB_GROUPS(2), .READ_PORTS(3), .RENAME_ZERO(1))
renamer #(.NUM_WB_GROUPS(2), .READ_PORTS(3), .RENAME_ZERO(1))
fp_renamer_block (
.clk (clk),
.rst (rst),
@ -699,13 +695,6 @@ module cva5
////////////////////////////////////////////////////
//Assertions
//Ensure that reset is held for at least 32 cycles to clear shift regs
// always_ff @ (posedge clk) begin
// assert property(@(posedge clk) $rose (rst) |=> rst[*32]) else $error("Reset not held for long enough!");
// end
////////////////////////////////////////////////////
//Assertions
endmodule

62
core/decode_and_issue.sv Executable file → Normal file

@ -40,7 +40,6 @@ module decode_and_issue
input logic pc_id_available,
input decode_packet_t decode,
output logic decode_advance,
output exception_sources_t decode_exception_unit,
//Renamer
renamer_interface.decode renamer,
@ -190,6 +189,10 @@ module decode_and_issue
////////////////////////////////////////////////////
//Issue
always_ff @(posedge clk) begin
if (instruction_issued) begin
issue.pc_r <= issue.pc;
issue.instruction_r <= issue.instruction;
end
if (issue_stage_ready) begin
issue.pc <= decode.pc;
issue.instruction <= decode.instruction;
@ -208,7 +211,6 @@ module decode_and_issue
fp_issue_rd_wb_group <= fp_decode_wb_group;
issue.is_multicycle <= ~unit_needed[ALU_ID];
issue.id <= decode.id;
issue.exception_unit <= decode_exception_unit;
issue_uses_rs <= decode_uses_rs;
fp_issue_uses_rs <= fp_decode_uses_rs;
issue.uses_rd <= decode_uses_rd;
@ -276,29 +278,23 @@ module decode_and_issue
////////////////////////////////////////////////////
//Illegal Instruction check
generate if (CONFIG.INCLUDE_M_MODE) begin : gen_decode_exceptions
generate if (CONFIG.MODES != BARE) begin : gen_decode_exceptions
logic new_exception;
exception_code_t ecode;
exception_code_t ecall_code;
logic [31:0] tval;
//ECALL and EBREAK captured here, but separated out when ecode is set
assign illegal_instruction_pattern = ~|unit_needed;
//TODO: Consider ways of parameterizing so that any exception generating unit
//can be automatically added to this expression
always_comb begin
unique case (1'b1)
unit_needed[LS_ID] : decode_exception_unit = LS_EXCEPTION;
unit_needed[BR_ID] : decode_exception_unit = BR_EXCEPTION;
default : decode_exception_unit = PRE_ISSUE_EXCEPTION;
endcase
if (~decode.fetch_metadata.ok)
decode_exception_unit = PRE_ISSUE_EXCEPTION;
end
////////////////////////////////////////////////////
//ECALL/EBREAK
//The type of call instruction is dependent on the current privilege level
logic is_ecall;
logic is_ebreak;
assign is_ecall = decode.instruction inside {ECALL};
assign is_ebreak = decode.instruction inside {EBREAK};
always_comb begin
case (current_privilege)
USER_PRIVILEGE : ecall_code = ECALL_U;
@ -310,11 +306,21 @@ module decode_and_issue
always_ff @(posedge clk) begin
if (issue_stage_ready) begin
ecode <=
decode.instruction inside {ECALL} ? ecall_code :
decode.instruction inside {EBREAK} ? BREAK :
illegal_instruction_pattern ? ILLEGAL_INST :
decode.fetch_metadata.error_code; //(~decode.fetch_metadata.ok)
if (~decode.fetch_metadata.ok)
ecode <= decode.fetch_metadata.error_code;
else if (is_ecall)
ecode <= ecall_code;
else if (is_ebreak)
ecode <= BREAK;
else
ecode <= ILLEGAL_INST;
if (~decode.fetch_metadata.ok | is_ebreak)
tval <= decode.pc;
else if (is_ecall)
tval <= '0;
else
tval <= decode.instruction;
end
end
@ -327,22 +333,20 @@ module decode_and_issue
pre_issue_exception_pending <= illegal_instruction_pattern | (~decode.fetch_metadata.ok);
end
assign new_exception = issue.stage_valid & pre_issue_exception_pending & ~(gc.issue_hold | gc.fetch_flush | exception.valid);
assign new_exception = issue.stage_valid & pre_issue_exception_pending & ~(gc.issue_hold | gc.fetch_flush) & ~exception.valid;
always_ff @(posedge clk) begin
if (rst)
exception.valid <= 0;
else
exception.valid <= (exception.valid | new_exception) & ~exception.ack;
exception.valid <= new_exception;
end
always_ff @(posedge clk) begin
if (new_exception) begin
exception.code <= ecode;
exception.tval <= issue.instruction;
exception.id <= issue.id;
end
end
assign exception.possible = 0; //Not needed because occurs before issue
assign exception.code = ecode;
assign exception.tval = tval;
assign exception.pc = issue.pc;
assign exception.discard = 0;
end endgenerate
////////////////////////////////////////////////////

0
core/execution_units/alu_unit.sv Executable file → Normal file

0
core/execution_units/barrel_shifter.sv Executable file → Normal file

15
core/execution_units/branch_unit.sv Executable file → Normal file

@ -65,7 +65,6 @@ module branch_unit
logic [31:0] new_pc;
logic [31:0] new_pc_ex;
logic [31:0] pc_ex;
logic instruction_is_completing;
logic branch_complete;
@ -200,7 +199,7 @@ module branch_unit
////////////////////////////////////////////////////
//Exception support
generate if (CONFIG.INCLUDE_M_MODE) begin : gen_branch_exception
generate if (CONFIG.MODES != BARE) begin : gen_branch_exception
logic new_exception;
assign new_exception = new_pc[1] & branch_taken & issue.new_request;
@ -208,15 +207,14 @@ module branch_unit
if (rst)
exception.valid <= 0;
else
exception.valid <= (exception.valid & ~exception.ack) | new_exception;
exception.valid <= new_exception;
end
always_ff @(posedge clk) begin
if (issue.new_request)
exception.id <= issue.id;
end
assign exception.possible = 0; //Not needed because branch_flush suppresses issue
assign exception.code = INST_ADDR_MISSALIGNED;
assign exception.tval = new_pc_ex;
assign exception.pc = issue_stage.pc_r;
assign exception.discard = 0;
end
endgenerate
@ -228,13 +226,12 @@ module branch_unit
if (issue.possible_issue) begin
is_return_ex <= is_return;
is_call_ex <= is_call;
pc_ex <= issue_stage.pc;
end
end
assign br_results.id = id_ex;
assign br_results.valid = instruction_is_completing;
assign br_results.pc = pc_ex;
assign br_results.pc = issue_stage.pc_r;
assign br_results.target_pc = new_pc_ex;
assign br_results.branch_taken = branch_taken_ex;
assign br_results.is_branch = ~jal_or_jalr_ex;

859
core/execution_units/csr_unit.sv Executable file → Normal file

File diff suppressed because it is too large

2
core/execution_units/div_unit.sv Executable file → Normal file

@ -129,7 +129,7 @@ module div_unit
set_clr_reg_with_rst #(.SET_OVER_CLR(1), .WIDTH(1), .RST_VALUE(0)) prev_div_result_valid_m (
.clk, .rst,
.set(issue.new_request & ~((issue_stage.rd_addr == issue_rs_addr[RS1]) | (issue_stage.rd_addr == issue_rs_addr[RS2]))),
.clr((instruction_issued_with_rd & div_rs_overwrite) | gc.writeback_supress), //No instructions will be issued while gc.writeback_supress is asserted
.clr((instruction_issued_with_rd & div_rs_overwrite) | gc.init_clear), //No instructions will be issued while gc.init_clear is asserted
.result(prev_div_result_valid)
);


@ -44,6 +44,7 @@ module gc_unit
input issue_packet_t issue_stage,
input logic issue_stage_ready,
input logic instruction_issued,
input logic [31:0] constant_alu,
input logic [31:0] rf [REGFILE_READ_PORTS],
@ -52,39 +53,38 @@ module gc_unit
//Branch mispredict
input logic branch_flush,
//exception_interface.unit pre_issue_exception,
//Exception
exception_interface.unit local_gc_exception,
exception_interface.econtrol exception [NUM_EXCEPTION_SOURCES],
input logic [31:0] exception_target_pc,
input logic [31:0] oldest_pc,
output logic mret,
output logic sret,
input logic [31:0] epc,
//Retire
input id_t retire_ids_next [RETIRE_PORTS],
input logic [$clog2(NUM_EXCEPTION_SOURCES)-1:0] current_exception_unit,
input logic [31:0] mepc,
input logic [31:0] sepc,
//CSR Interrupts
input logic interrupt_pending,
output logic interrupt_taken,
input logic processing_csr,
//CSR signals
input logic csr_frontend_flush,
input logic [1:0] current_privilege,
input logic tvm,
input logic tsr,
//Output controls
output gc_outputs_t gc,
output tlb_packet_t sfence,
//Ordering support
input load_store_status_t load_store_status,
input logic [LOG2_MAX_IDS:0] post_issue_count
input load_store_status_t load_store_status
);
//Largest depth for TLBs
localparam int TLB_CLEAR_DEPTH = (CONFIG.DTLB.DEPTH > CONFIG.ITLB.DEPTH) ? CONFIG.DTLB.DEPTH : CONFIG.ITLB.DEPTH;
//For general reset clear, greater of TLB depth or id-flight memory blocks (MAX_IDS)
localparam int INIT_CLEAR_DEPTH = CONFIG.INCLUDE_S_MODE ? (TLB_CLEAR_DEPTH > 64 ? TLB_CLEAR_DEPTH : 64) : 64;
localparam int INIT_CLEAR_DEPTH = CONFIG.MODES == MSU ? (TLB_CLEAR_DEPTH > 64 ? TLB_CLEAR_DEPTH : 64) : 64;
////////////////////////////////////////////////////
//Overview
@ -119,120 +119,157 @@ module gc_unit
//LS exceptions (misaligned, TLB and MMU) (issue stage)
//fetch flush, take exception. If execute or later exception occurs first, exception is overridden
common_instruction_t instruction;//rs1_addr, rs2_addr, fn3, fn7, rd_addr, upper/lower opcode
typedef enum {RST_STATE, PRE_CLEAR_STATE, INIT_CLEAR_STATE, IDLE_STATE, TLB_CLEAR_STATE, POST_ISSUE_DRAIN, PRE_ISSUE_FLUSH, POST_ISSUE_DISCARD} gc_state;
typedef enum {RST_STATE, PRE_CLEAR_STATE, INIT_CLEAR_STATE, IDLE_STATE, TLB_CLEAR_STATE, WAIT_INTERRUPT, PRE_ISSUE_FLUSH, WAIT_WRITE} gc_state;
gc_state state;
gc_state next_state;
logic init_clear_done;
logic tlb_clear_done;
logic post_issue_idle;
logic ifence_in_progress;
logic ret_in_progress;
//GC registered global outputs
logic gc_init_clear;
logic gc_fetch_hold;
logic gc_issue_hold;
logic gc_rename_revert;
logic gc_fetch_flush;
logic gc_writeback_supress;
logic gc_retire_hold;
logic gc_fetch_ifence;
logic gc_tlb_flush;
logic gc_sq_flush;
logic gc_pc_override;
logic [31:0] gc_pc;
typedef struct packed{
logic [31:0] pc_p4;
logic is_ifence;
logic is_mret;
logic is_sret;
} gc_inputs_t;
logic possible_exception;
gc_inputs_t gc_inputs;
gc_inputs_t gc_inputs_r;
////////////////////////////////////////////////////
//Implementation
////////////////////////////////////////////////////
//Decode
logic [31:0] pc_p4;
logic is_ifence;
logic is_sfence;
logic trivial_sfence;
logic asid_sfence;
logic is_mret;
logic is_sret;
logic is_wfi;
assign instruction = decode_stage.instruction;
assign unit_needed =
(CONFIG.INCLUDE_M_MODE & decode_stage.instruction inside {MRET}) |
(CONFIG.INCLUDE_S_MODE & decode_stage.instruction inside {SRET, SFENCE_VMA}) |
(CONFIG.INCLUDE_IFENCE & decode_stage.instruction inside {FENCE_I});
(CONFIG.MODES != BARE & instruction inside {MRET, WFI}) |
(CONFIG.MODES == MSU & instruction inside {SRET, SFENCE_VMA}) |
(CONFIG.INCLUDE_IFENCE & instruction inside {FENCE_I});
always_comb begin
uses_rs = '0;
uses_rs[RS1] = CONFIG.INCLUDE_S_MODE & decode_stage.instruction inside {SFENCE_VMA};
uses_rs[RS1] = CONFIG.MODES == MSU & instruction inside {SFENCE_VMA};
uses_rs[RS2] = CONFIG.MODES == MSU & instruction inside {SFENCE_VMA};
uses_rd = 0;
end
always_ff @(posedge clk) begin
if (issue_stage_ready) begin
is_ifence = (instruction.upper_opcode == FENCE_T) & CONFIG.INCLUDE_IFENCE;
is_mret = (instruction.upper_opcode == SYSTEM_T) & (decode_stage.instruction[31:20] == MRET_imm) & CONFIG.INCLUDE_M_MODE;
is_sret = (instruction.upper_opcode == SYSTEM_T) & (decode_stage.instruction[31:20] == SRET_imm) & CONFIG.INCLUDE_S_MODE;
is_ifence <= CONFIG.INCLUDE_IFENCE & instruction.upper_opcode[2];
is_sfence <= CONFIG.MODES == MSU & ~instruction.upper_opcode[2] & instruction.fn7[0];
trivial_sfence <= |instruction.rs1_addr;
asid_sfence <= |instruction.rs2_addr;
is_wfi <= CONFIG.MODES != BARE & ~instruction.upper_opcode[2] & ~instruction.fn7[0] & ~instruction.rs2_addr[1];
//Ret instructions need exact decoding
is_mret <= CONFIG.MODES != BARE & instruction inside {MRET};
is_sret <= CONFIG.MODES == MSU & instruction inside {SRET};
end
end
assign gc_inputs.pc_p4 = constant_alu;
assign gc_inputs.is_ifence = is_ifence;
assign gc_inputs.is_mret = is_mret;
assign gc_inputs.is_sret = is_sret;
////////////////////////////////////////////////////
//Issue
logic is_ifence_r;
logic is_sfence_r;
logic is_sret_r;
logic trivial_sfence_r;
logic asid_sfence_r;
logic [31:0] sfence_addr_r;
logic [ASIDLEN-1:0] asid_r;
logic new_exception;
//Input registering
always_ff @(posedge clk) begin
if (issue.new_request)
gc_inputs_r <= gc_inputs;
if (rst) begin
is_ifence_r <= 0;
is_sfence_r <= 0;
mret <= 0;
sret <= 0;
end
else begin
is_ifence_r <= issue.new_request & is_ifence & ~new_exception;
is_sfence_r <= issue.new_request & is_sfence & ~new_exception;
mret <= issue.new_request & is_mret & ~new_exception;
sret <= issue.new_request & is_sret & ~new_exception;
end
end
//ret
always_ff @(posedge clk) begin
if (rst)
ret_in_progress <= 0;
else
ret_in_progress <= (ret_in_progress & ~(next_state == PRE_ISSUE_FLUSH)) | (issue.new_request & (gc_inputs.is_mret | gc_inputs.is_sret));
if (issue.new_request) begin
trivial_sfence_r <= trivial_sfence;
asid_sfence_r <= asid_sfence;
sfence_addr_r <= rf[RS1];
asid_r <= rf[RS2][ASIDLEN-1:0];
end
if (rst) begin
trivial_sfence_r <= 0;
asid_sfence_r <= 0;
end
end
//ifence
always_ff @(posedge clk) begin
if (rst)
ifence_in_progress <= 0;
else
ifence_in_progress <= (ifence_in_progress & ~(next_state == PRE_ISSUE_FLUSH)) | (issue.new_request & gc_inputs.is_ifence);
//Exceptions treated like every other unit
generate if (CONFIG.MODES != BARE) begin : gen_gc_exception
always_comb begin
new_exception = 0;
if (issue.new_request) begin
if (current_privilege == USER_PRIVILEGE)
new_exception = is_sfence | is_sret | is_mret;
else if (current_privilege == SUPERVISOR_PRIVILEGE)
new_exception = (is_sfence & tvm) | (is_sret & tsr);
end
end
always_ff @(posedge clk) begin
if (rst)
local_gc_exception.valid <= 0;
else
local_gc_exception.valid <= new_exception;
end
assign local_gc_exception.possible = 0; //Not needed because appears on first cycle
assign local_gc_exception.code = ILLEGAL_INST;
assign local_gc_exception.tval = issue_stage.instruction_r;
assign local_gc_exception.pc = issue_stage.pc_r;
assign local_gc_exception.discard = 0;
end
endgenerate
////////////////////////////////////////////////////
//GC Operation
assign post_issue_idle = (post_issue_count == 0) & load_store_status.sq_empty;
assign gc.fetch_flush = branch_flush | gc_pc_override;
always_ff @ (posedge clk) begin
gc_fetch_hold <= next_state inside {PRE_CLEAR_STATE, INIT_CLEAR_STATE, POST_ISSUE_DRAIN, PRE_ISSUE_FLUSH};
gc_issue_hold <= processing_csr | (next_state inside {PRE_CLEAR_STATE, INIT_CLEAR_STATE, TLB_CLEAR_STATE, POST_ISSUE_DRAIN, PRE_ISSUE_FLUSH, POST_ISSUE_DISCARD});
gc_writeback_supress <= next_state inside {PRE_CLEAR_STATE, INIT_CLEAR_STATE, POST_ISSUE_DISCARD};
gc_retire_hold <= next_state inside {PRE_ISSUE_FLUSH};
gc_fetch_hold <= next_state inside {PRE_CLEAR_STATE, INIT_CLEAR_STATE, PRE_ISSUE_FLUSH, TLB_CLEAR_STATE, WAIT_WRITE};
gc_issue_hold <= next_state inside {PRE_CLEAR_STATE, INIT_CLEAR_STATE, WAIT_INTERRUPT, PRE_ISSUE_FLUSH, TLB_CLEAR_STATE, WAIT_WRITE};
gc_init_clear <= next_state inside {INIT_CLEAR_STATE};
gc_fetch_ifence <= issue.new_request & is_ifence;
gc_tlb_flush <= next_state inside {INIT_CLEAR_STATE, TLB_CLEAR_STATE};
gc_sq_flush <= state inside {POST_ISSUE_DISCARD} & next_state inside {IDLE_STATE};
end
//work-around for verilator BLKANDNBLK signal optimizations
assign gc.fetch_hold = gc_fetch_hold;
assign gc.issue_hold = gc_issue_hold;
assign gc.writeback_supress = CONFIG.INCLUDE_M_MODE & gc_writeback_supress;
assign gc.retire_hold = gc_retire_hold;
assign gc.issue_hold = gc_issue_hold | possible_exception;
assign gc.init_clear = gc_init_clear;
assign gc.tlb_flush = CONFIG.INCLUDE_S_MODE & gc_tlb_flush;
assign gc.sq_flush = CONFIG.INCLUDE_M_MODE & gc_sq_flush;
assign gc.fetch_ifence = CONFIG.INCLUDE_IFENCE & gc_fetch_ifence;
assign sfence = '{
valid : CONFIG.MODES == MSU & gc_tlb_flush,
asid_only : asid_sfence_r,
asid : asid_r,
addr_only : trivial_sfence_r,
addr : sfence_addr_r
};
////////////////////////////////////////////////////
//GC State Machine
always @(posedge clk) begin
@ -249,19 +286,47 @@ module gc_unit
PRE_CLEAR_STATE : next_state = INIT_CLEAR_STATE;
INIT_CLEAR_STATE : if (init_clear_done) next_state = IDLE_STATE;
IDLE_STATE : begin
if (gc.exception.valid)//new pending exception is also oldest instruction
if ((issue.new_request & ~is_wfi & ~new_exception) | gc.exception.valid | csr_frontend_flush)
next_state = PRE_ISSUE_FLUSH;
else if (issue.new_request | interrupt_pending | gc.exception_pending)
next_state = POST_ISSUE_DRAIN;
else if (interrupt_pending)
next_state = WAIT_INTERRUPT;
end
TLB_CLEAR_STATE : if (tlb_clear_done) next_state = IDLE_STATE;
POST_ISSUE_DRAIN : if (((ifence_in_progress | ret_in_progress) & post_issue_idle) | gc.exception.valid | interrupt_pending) next_state = PRE_ISSUE_FLUSH;
PRE_ISSUE_FLUSH : next_state = POST_ISSUE_DISCARD;
POST_ISSUE_DISCARD : if ((post_issue_count == 0) & load_store_status.no_released_stores_pending) next_state = IDLE_STATE;
WAIT_INTERRUPT : begin
if (gc.exception.valid | csr_frontend_flush) //Exception overrides interrupt
next_state = PRE_ISSUE_FLUSH;
else if (~interrupt_pending) //Something cancelled the interrupt
next_state = IDLE_STATE;
else if (~possible_exception & issue_stage.stage_valid & ~branch_flush) //No more possible exceptions and issue stage has correct PC
next_state = PRE_ISSUE_FLUSH;
end
PRE_ISSUE_FLUSH : begin
if (is_sfence_r)
next_state = TLB_CLEAR_STATE;
else if (is_ifence_r)
next_state = WAIT_WRITE;
else //MRET/SRET, exception, interrupt, CSR flush
next_state = IDLE_STATE;
end
//gc.exception will never be set in these states
TLB_CLEAR_STATE : if (tlb_clear_done) next_state = (load_store_status.outstanding_store) ? WAIT_WRITE : IDLE_STATE;
WAIT_WRITE : if (~load_store_status.outstanding_store) next_state = IDLE_STATE;
default : next_state = RST_STATE;
endcase
end
//Will never encounter an exception; no new instruction is issued on the transition to idle, so interrupts can be ignored
//SFENCE: PRE_ISSUE_FLUSH (Override PC) -> TLB_CLEAR -> WAIT_WRITE
//IFENCE: PRE_ISSUE_FLUSH (Override PC) -> WAIT_WRITE
//MRET/SRET: PRE_ISSUE_FLUSH (Override PC)
//Branch/CSR/LS exceptions: PRE_ISSUE_FLUSH (Override PC)
//Fetch/illegal exception: PRE_ISSUE_FLUSH (Override PC)
//Interrupt: WAIT_INTERRUPT (capture next PC) -> PRE_ISSUE_FLUSH (Override PC) <- This can be hijacked by an exception
//Interrupt
//wait until issue/execute exceptions are no longer possible, flush fetch, take exception
////////////////////////////////////////////////////
//State Counter
logic [$clog2(INIT_CLEAR_DEPTH):0] state_counter;
@ -272,63 +337,91 @@ module gc_unit
state_counter <= state_counter + 1;
end
assign init_clear_done = state_counter[$clog2(INIT_CLEAR_DEPTH)];
assign tlb_clear_done = state_counter[$clog2(TLB_CLEAR_DEPTH)];
assign tlb_clear_done = state_counter[$clog2(TLB_CLEAR_DEPTH)] | trivial_sfence_r;
////////////////////////////////////////////////////
//Exception handling
generate if (CONFIG.INCLUDE_M_MODE) begin :gen_gc_m_mode
logic [NUM_EXCEPTION_SOURCES-1:0] exception_valid;
logic [NUM_EXCEPTION_SOURCES-1:0] exception_possible;
//Separated out because possible exceptions from CSR must still stall even without M
generate for (genvar i = 0; i < NUM_EXCEPTION_SOURCES; i++) begin : gen_possible_exceptions
assign exception_possible[i] = exception[i].possible;
end endgenerate
assign possible_exception = |exception_possible;
assign gc.exception.possible = possible_exception;
generate if (CONFIG.MODES != BARE) begin : gen_gc_m_mode
//Re-assigning interface inputs to array types so that they can be dynamically indexed
logic [NUM_EXCEPTION_SOURCES-1:0] exception_pending;
exception_code_t [NUM_EXCEPTION_SOURCES-1:0] exception_code;
id_t [NUM_EXCEPTION_SOURCES-1:0] exception_id;
logic [NUM_EXCEPTION_SOURCES-1:0][31:0] exception_tval;
logic exception_ack;
logic [NUM_EXCEPTION_SOURCES-1:0][31:0] exception_pc;
logic [NUM_EXCEPTION_SOURCES-1:0] exception_discard;
logic [$clog2(NUM_EXCEPTION_SOURCES > 1 ? NUM_EXCEPTION_SOURCES : 2)-1:0] exception_source;
for (genvar i = 0; i < NUM_EXCEPTION_SOURCES; i++) begin
assign exception_pending[i] = exception[i].valid;
for (genvar i = 0; i < NUM_EXCEPTION_SOURCES; i++) begin : gen_unpacking
assign exception_valid[i] = exception[i].valid;
assign exception_code[i] = exception[i].code;
assign exception_id[i] = exception[i].id;
assign exception_tval[i] = exception[i].tval;
assign exception[i].ack = exception_ack;
assign exception_discard[i] = exception[i].discard;
assign exception_pc[i] = exception[i].pc;
end
one_hot_to_integer #(.C_WIDTH(NUM_EXCEPTION_SOURCES)) src_mux (
.one_hot(exception_valid),
.int_out(exception_source)
);
//Exception valid when the oldest instruction is a valid ID. This is done with a level of indirection (through the exception unit table)
//for better scalability, avoiding the need to compare against all exception sources.
always_comb begin
gc.exception_pending = |exception_pending;
gc.exception.valid = (retire_ids_next[0] == exception_id[current_exception_unit]) & exception_pending[current_exception_unit];
gc.exception.pc = oldest_pc;
gc.exception.code = exception_code[current_exception_unit];
gc.exception.tval = exception_tval[current_exception_unit];
assign gc.exception.valid = |exception_valid;
assign gc.exception.code = exception_code[exception_source];
assign gc.exception.tval = exception_tval[exception_source];
assign gc.exception.pc = |exception_valid ? exception_pc[exception_source] : issue_stage.pc;
assign gc.exception.valid = |exception_valid;
assign gc.exception.source = exception_valid;
assign interrupt_taken = interrupt_pending & (next_state == PRE_ISSUE_FLUSH) & ~(gc.exception.valid) & ~csr_frontend_flush;
//Writeback and rename handling
logic gc_writeback_suppress_r;
logic gc_rename_revert;
always_ff @(posedge clk) begin
if (rst) begin
gc_writeback_suppress_r <= 0;
gc_rename_revert <= 0;
end
else begin
gc_writeback_suppress_r <= gc.writeback_suppress;
gc_rename_revert <= gc_writeback_suppress_r;
end
end
assign exception_ack = gc.exception.valid;
assign interrupt_taken = interrupt_pending & (next_state == PRE_ISSUE_FLUSH) & ~(ifence_in_progress | ret_in_progress | gc.exception.valid);
assign mret = gc_inputs_r.is_mret & ret_in_progress & (next_state == PRE_ISSUE_FLUSH);
assign sret = gc_inputs_r.is_sret & ret_in_progress & (next_state == PRE_ISSUE_FLUSH);
end endgenerate
assign gc.writeback_suppress = |(exception_valid & exception_discard);
assign gc.rename_revert = gc_rename_revert;
end endgenerate
//PC determination (trap, flush or return)
//Two cycles: on first cycle the processor front end is flushed,
//on the second cycle the new PC is fetched
generate if (CONFIG.INCLUDE_M_MODE || CONFIG.INCLUDE_IFENCE) begin :gen_gc_pc_override
generate if (CONFIG.MODES != BARE || CONFIG.INCLUDE_IFENCE) begin :gen_gc_pc_override
always_ff @ (posedge clk) begin
gc_pc_override <= next_state inside {PRE_ISSUE_FLUSH, INIT_CLEAR_STATE};
gc_pc <=
(gc.exception.valid | interrupt_taken) ? exception_target_pc :
(gc_inputs_r.is_ifence) ? gc_inputs_r.pc_p4 :
epc; //ret
if (gc.exception.valid | interrupt_taken)
gc_pc <= exception_target_pc;
else if (instruction_issued) begin
if (is_mret)
gc_pc <= mepc;
else if (is_sret)
gc_pc <= sepc;
else //IFENCE, SFENCE, CSR flush
gc_pc <= constant_alu;
end
end
//work-around for verilator BLKANDNBLK signal optimizations
assign gc.pc_override = gc_pc_override;
assign gc.pc = gc_pc;
end endgenerate
end endgenerate
////////////////////////////////////////////////////
//Decode / Write-back Handshaking
//CSR reads are passed through the Load-Store unit
@ -342,12 +435,12 @@ module gc_unit
////////////////////////////////////////////////////
//Assertions
`ifdef ENABLE_SIMULATION_ASSERTIONS
generate if (DEBUG_CONVERT_EXCEPTIONS_INTO_ASSERTIONS) begin
unexpected_exception_assertion:
assert property (@(posedge clk) disable iff (rst) (~gc.exception.valid))
else $error("unexpected exception occurred: %s", gc.exception.code.name());
end endgenerate
`endif
multiple_exceptions_assertion:
assert property (@(posedge clk) disable iff (rst) $onehot0(exception_valid))
else $error("Simultaneous exceptions");
multiple_possible_exceptions_assertion:
assert property (@(posedge clk) disable iff (rst) $onehot0(exception_possible))
else $error("Simultaneous possible exceptions");
endmodule


@ -28,7 +28,7 @@ module addr_hash
parameter logic USE_BIT_3 = 1
)
(
input logic [31:0] addr,
input logic [11:0] addr,
output addr_hash_t addr_hash
);

54
core/execution_units/load_store_unit/amo_alu.sv Executable file → Normal file

@ -1,5 +1,5 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
* Copyright © 2017 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -18,44 +18,46 @@
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module amo_alu
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter int WIDTH = 32
)
(
input amo_alu_inputs_t amo_alu_inputs,
output logic[31:0] result
input amo_t amo_type,
input logic[WIDTH-1:0] rs1,
input logic[WIDTH-1:0] rs2,
output logic[WIDTH-1:0] rd
);
logic signed_op;
logic rs1_smaller_than_rs2;
logic signed [32:0] rs1_ext;
logic signed [32:0] rs2_ext;
//bit 4 for unsigned
assign rs1_ext = {(~amo_alu_inputs.op[4] & amo_alu_inputs.rs1_load[31]), amo_alu_inputs.rs1_load};
assign rs2_ext = {(~amo_alu_inputs.op[4] & amo_alu_inputs.rs2[31]), amo_alu_inputs.rs2};
logic signed [WIDTH:0] rs1_ext;
logic signed [WIDTH:0] rs2_ext;
assign signed_op = amo_type == AMO_MIN_FN5 | amo_type == AMO_MAX_FN5;
assign rs1_ext = {(signed_op & rs1[WIDTH-1]), rs1};
assign rs2_ext = {(signed_op & rs2[WIDTH-1]), rs2};
assign rs1_smaller_than_rs2 = rs1_ext < rs2_ext;
/* verilator lint_off CASEINCOMPLETE */
always_comb begin
case (amo_alu_inputs.op)// <--unique as not all codes are in use
AMO_SWAP_FN5 : result = amo_alu_inputs.rs2;
AMO_ADD_FN5 : result = amo_alu_inputs.rs1_load + amo_alu_inputs.rs2;
AMO_XOR_FN5 : result = amo_alu_inputs.rs1_load ^ amo_alu_inputs.rs2;
AMO_AND_FN5 : result = amo_alu_inputs.rs1_load & amo_alu_inputs.rs2;
AMO_OR_FN5 : result = amo_alu_inputs.rs1_load | amo_alu_inputs.rs2;
AMO_MIN_FN5 : result = rs1_smaller_than_rs2 ? amo_alu_inputs.rs1_load : amo_alu_inputs.rs2;
AMO_MAX_FN5 : result = rs1_smaller_than_rs2 ? amo_alu_inputs.rs2 : amo_alu_inputs.rs1_load;
AMO_MINU_FN5 : result = rs1_smaller_than_rs2 ? amo_alu_inputs.rs1_load : amo_alu_inputs.rs2;
AMO_MAXU_FN5 : result = rs1_smaller_than_rs2 ? amo_alu_inputs.rs2 : amo_alu_inputs.rs1_load;
unique case (amo_type)
AMO_XOR_FN5 : rd = rs1 ^ rs2;
AMO_OR_FN5 : rd = rs1 | rs2;
AMO_AND_FN5 : rd = rs1 & rs2;
AMO_SWAP_FN5 : rd = rs2;
AMO_MIN_FN5 : rd = rs1_smaller_than_rs2 ? rs1 : rs2;
AMO_MAX_FN5 : rd = rs1_smaller_than_rs2 ? rs2 : rs1;
AMO_MINU_FN5 : rd = rs1_smaller_than_rs2 ? rs1 : rs2;
AMO_MAXU_FN5 : rd = rs1_smaller_than_rs2 ? rs2 : rs1;
AMO_ADD_FN5 : rd = rs1 + rs2;
default : rd = 'x; //Default don't care allows some optimization
endcase
end
/* verilator lint_on CASEINCOMPLETE */
endmodule
endmodule
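
The MIN/MAX and MINU/MAXU cases in this ALU share one comparator by widening both operands by a single bit. The following is a minimal, self-contained sketch of that idea, not code from this commit: the module name `minmax_compare_sketch` and all of its signals are hypothetical, and it assumes a simulator that supports immediate assertions.

```systemverilog
//Illustrative sketch only; not part of the CVA5 sources
module minmax_compare_sketch;
    localparam int WIDTH = 32;

    logic signed_op; //1 for MIN/MAX, 0 for MINU/MAXU
    logic [WIDTH-1:0] a;
    logic [WIDTH-1:0] b;
    logic signed [WIDTH:0] a_ext;
    logic signed [WIDTH:0] b_ext;
    logic a_smaller;

    //Signed ops replicate the sign bit; unsigned ops prepend a zero so the
    //widened value is always non-negative
    assign a_ext = {(signed_op & a[WIDTH-1]), a};
    assign b_ext = {(signed_op & b[WIDTH-1]), b};
    assign a_smaller = a_ext < b_ext; //Both operands are signed, so the compare is signed

    initial begin
        a = 32'hFFFF_FFFF;
        b = 32'h0000_0001;

        signed_op = 1; //a is interpreted as -1, so it is the smaller value
        #1;
        assert (a_smaller) else $error("signed compare failed");

        signed_op = 0; //a is interpreted as 4294967295, so it is the larger value
        #1;
        assert (!a_smaller) else $error("unsigned compare failed");

        $finish;
    end
endmodule
```

Widening by one bit means a single signed comparator can serve both the signed and unsigned variants, at the cost of one extra bit in the compare.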


@ -0,0 +1,115 @@
/*
* Copyright © 2024 Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module amo_unit
import riscv_types::*;
#(
parameter int NUM_UNITS = 3,
parameter int RESERVATION_WORDS = 4
) //TODO: reservation shape and size must be discoverable(?)
(
input logic clk,
input logic rst,
amo_interface.amo_unit agents[NUM_UNITS]
);
localparam RESERVATION_WIDTH = 30 - $clog2(RESERVATION_WORDS);
typedef logic[RESERVATION_WIDTH-1:0] reservation_t;
////////////////////////////////////////////////////
//Interface unpacking
logic[NUM_UNITS-1:0] set_reservation;
logic[NUM_UNITS-1:0] clear_reservation;
reservation_t[NUM_UNITS-1:0] reservation;
reservation_t lr_addr;
logic lr_valid;
logic[NUM_UNITS-1:0] rmw_valid;
amo_t[NUM_UNITS-1:0] op;
logic[NUM_UNITS-1:0][31:0] rs1;
logic[NUM_UNITS-1:0][31:0] rs2;
logic[31:0] rd;
generate for (genvar i = 0; i < NUM_UNITS; i++) begin : gen_unpacking
assign set_reservation[i] = agents[i].set_reservation;
assign clear_reservation[i] = agents[i].clear_reservation;
assign reservation[i] = agents[i].reservation[31-:RESERVATION_WIDTH];
assign agents[i].reservation_valid = lr_valid & lr_addr == reservation[i];
assign rmw_valid[i] = agents[i].rmw_valid;
assign op[i] = agents[i].op;
assign rs1[i] = agents[i].rs1;
assign rs2[i] = agents[i].rs2;
assign agents[i].rd = rd;
end endgenerate
////////////////////////////////////////////////////
//Multiplexing
//Shared LR-SC and RMW port across all units
reservation_t set_val;
amo_t selected_op;
logic[31:0] selected_rs1;
logic[31:0] selected_rs2;
logic[$clog2(NUM_UNITS > 1 ? NUM_UNITS : 2)-1:0] reservation_int;
logic[$clog2(NUM_UNITS > 1 ? NUM_UNITS : 2)-1:0] rmw_int;
one_hot_to_integer #(.C_WIDTH(NUM_UNITS)) reservation_conv (
.one_hot(set_reservation),
.int_out(reservation_int)
);
assign set_val = reservation[reservation_int];
one_hot_to_integer #(.C_WIDTH(NUM_UNITS)) rmw_conv (
.one_hot(rmw_valid),
.int_out(rmw_int)
);
assign selected_op = op[rmw_int];
assign selected_rs1 = rs1[rmw_int];
assign selected_rs2 = rs2[rmw_int];
////////////////////////////////////////////////////
//RISC-V LR-SC
//One address is reserved at a time for all units
//The reservation can be set or cleared at any time by any unit, but set has priority over clear on the same cycle
always_ff @(posedge clk) begin
if (rst)
lr_valid <= 0;
else
lr_valid <= (lr_valid & ~|clear_reservation) | |set_reservation;
if (|set_reservation)
lr_addr <= set_val;
end
////////////////////////////////////////////////////
//RISC-V Atomic ALU
//Combinational; results valid in same cycle
amo_alu #(.WIDTH(32)) alu_inst (
.amo_type(selected_op),
.rs1(selected_rs1),
.rs2(selected_rs2),
.rd(rd)
);
endmodule
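
The LR-SC reservation register in this unit gives set priority over clear when both arrive on the same cycle. The following is a minimal sketch of that update rule in isolation, not code from this commit: the module name `reservation_priority_sketch` and its single-bit `set` and `clear` inputs are hypothetical stand-ins for the OR-reduced per-unit signals.

```systemverilog
//Illustrative sketch only; not part of the CVA5 sources
module reservation_priority_sketch;
    logic clk = 0;
    logic rst;
    logic set;
    logic clear;
    logic valid;

    always #5 clk = ~clk;

    //Same shape as the reservation update: clear only removes an existing
    //reservation, and set is ORed in afterwards so it wins on a tie
    always_ff @(posedge clk) begin
        if (rst)
            valid <= 0;
        else
            valid <= (valid & ~clear) | set;
    end

    initial begin
        rst = 1; set = 0; clear = 0;
        @(posedge clk); #1;

        rst = 0; set = 1; clear = 1; //Simultaneous set and clear
        @(posedge clk); #1;
        assert (valid) else $error("set should win over clear");

        set = 0; clear = 1; //Clear alone drops the reservation
        @(posedge clk); #1;
        assert (!valid) else $error("clear should drop the reservation");

        $finish;
    end
endmodule
```

With this ordering, a load-reserved that lands on the same cycle as a clear from another agent still leaves a valid reservation behind.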


@ -1,327 +0,0 @@
/*
* Copyright © 2022 Eric Matthews
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module dcache
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
# (
parameter cpu_config_t CONFIG = EXAMPLE_CONFIG
)
(
input logic clk,
input logic rst,
input logic dcache_on,
l1_arbiter_request_interface.master l1_request,
l1_arbiter_return_interface.master l1_response,
input logic sc_complete,
input logic sc_success,
input logic clear_reservation,
input amo_details_t amo,
input logic uncacheable_load,
input logic uncacheable_store,
input logic is_load,
input logic load_request,
input logic store_request,
output logic load_ready,
output logic store_ready,
input data_access_shared_inputs_t ls_load,
input data_access_shared_inputs_t ls_store,
memory_sub_unit_interface.responder ls
);
localparam derived_cache_config_t SCONFIG = get_derived_cache_params(CONFIG, CONFIG.DCACHE, CONFIG.DCACHE_ADDR);
localparam LOG2_WAYS = (CONFIG.DCACHE.WAYS == 1) ? 1 : $clog2(CONFIG.DCACHE.WAYS);
localparam bit [SCONFIG.SUB_LINE_ADDR_W-1:0] END_OF_LINE_COUNT = SCONFIG.SUB_LINE_ADDR_W'(CONFIG.DCACHE.LINE_W-1);
cache_functions_interface # (.LINE_W(SCONFIG.LINE_ADDR_W), .SUB_LINE_W(SCONFIG.SUB_LINE_ADDR_W)) addr_utils ();
typedef struct packed{
logic [31:0] addr;
logic uncacheable;
} load_stage2_t;
load_stage2_t stage2_load;
typedef struct packed{
logic [31:0] addr;
logic [3:0] be;
logic [31:0] data;
logic cache_op;
logic uncacheable;
} store_stage2_t;
store_stage2_t stage2_store;
logic [CONFIG.DCACHE.WAYS-1:0] load_tag_hit_way;
logic [CONFIG.DCACHE.WAYS-1:0] store_tag_hit_way;
logic [CONFIG.DCACHE.WAYS-1:0] replacement_way;
logic [CONFIG.DCACHE.WAYS-1:0] replacement_way_r;
logic load_tag_check;
logic load_hit;
logic store_hit;
logic [LOG2_WAYS-1:0] tag_hit_index;
logic [LOG2_WAYS-1:0] replacement_index;
logic [LOG2_WAYS-1:0] replacement_index_r;
logic [LOG2_WAYS-1:0] load_sel;
logic is_target_word;
logic [SCONFIG.SUB_LINE_ADDR_W-1:0] word_count;
logic miss_data_valid;
logic line_complete;
logic arb_load_sel;
logic load_l1_arb_ack;
logic store_l1_arb_ack;
logic [31:0] ram_load_data [CONFIG.DCACHE.WAYS-1:0];
typedef enum {
LOAD_IDLE = 0,
LOAD_HIT_CHECK = 1,
LOAD_L1_REQUEST = 2,
LOAD_FILL = 3
} load_path_enum_t;
logic [3:0] load_state, load_state_next;
typedef enum {
STORE_IDLE = 0,
STORE_L1_REQUEST = 1
} store_path_enum_t;
logic [1:0] store_state, store_state_next;
////////////////////////////////////////////////////
//Implementation
////////////////////////////////////////////////////
//Load Path
always_ff @ (posedge clk) begin
if (rst) begin
load_state <= 0;
load_state[LOAD_IDLE] <= 1;
end
else
load_state <= load_state_next;
end
always_comb begin
load_state_next[LOAD_IDLE] = (load_state[LOAD_IDLE] & ~load_request) | ((load_hit & ~load_request) | line_complete);
load_state_next[LOAD_HIT_CHECK] = load_request;
load_state_next[LOAD_L1_REQUEST] = (load_state[LOAD_L1_REQUEST] & ~load_l1_arb_ack) | (load_state[LOAD_HIT_CHECK] & ~load_hit);
load_state_next[LOAD_FILL] = (load_state[LOAD_FILL] & ~line_complete) | (load_state[LOAD_L1_REQUEST] & load_l1_arb_ack);
end
assign load_ready = (load_state[LOAD_IDLE] | load_hit) & (store_state[STORE_IDLE] | store_l1_arb_ack);
always_ff @ (posedge clk) begin
if (load_request) begin
stage2_load.addr <= ls_load.addr;
stage2_load.uncacheable <= uncacheable_load;
end
end
assign load_tag_check = load_request & dcache_on & ~uncacheable_load;
////////////////////////////////////////////////////
//Load Miss
always_ff @ (posedge clk) begin
if (load_request)
word_count <= 0;
else
word_count <= word_count + SCONFIG.SUB_LINE_ADDR_W'(l1_response.data_valid);
end
assign is_target_word = (stage2_load.addr[2 +: SCONFIG.SUB_LINE_ADDR_W] == word_count) | stage2_load.uncacheable;
assign line_complete = l1_response.data_valid & ((word_count == END_OF_LINE_COUNT) | stage2_load.uncacheable);
////////////////////////////////////////////////////
//Store Path
always_ff @ (posedge clk) begin
if (rst) begin
store_state <= 0;
store_state[STORE_IDLE] <= 1;
end
else
store_state <= store_state_next;
end
always_comb begin
store_state_next[STORE_IDLE] = (store_state[STORE_IDLE] & (~store_request | (store_request & ls_store.cache_op))) | (store_l1_arb_ack & ~store_request);
store_state_next[STORE_L1_REQUEST] = (store_state[STORE_L1_REQUEST] & ~store_l1_arb_ack) | (store_request & ~ls_store.cache_op);
end
assign store_ready = (store_state[STORE_IDLE] | store_l1_arb_ack) & (load_state[LOAD_IDLE] | load_hit);
assign ls.ready = is_load ? load_ready : store_ready;
always_ff @ (posedge clk) begin
if (store_request) begin
stage2_store.addr <= ls_store.addr;
stage2_store.uncacheable <= uncacheable_store;
stage2_store.be <= ls_store.be;
stage2_store.data <= ls_store.data_in;
stage2_store.cache_op <= ls_store.cache_op;
end
end
////////////////////////////////////////////////////
//L1 Arbiter Interface
//Priority to oldest request
fifo_interface #(.DATA_TYPE(logic)) request_order();
assign request_order.data_in = load_request;
assign request_order.push = load_request | (store_request & ~ls_store.cache_op);
assign request_order.potential_push = request_order.push;
assign request_order.pop = l1_request.ack | load_hit;
cva5_fifo #(.DATA_TYPE(logic), .FIFO_DEPTH(2))
request_order_fifo (
.clk (clk),
.rst (rst),
.fifo (request_order)
);
assign arb_load_sel = request_order.data_out;
assign l1_request.addr = arb_load_sel ? stage2_load.addr : stage2_store.addr;//Memory interface aligns request to burst size (done there to support AMO line-read word-write)
assign l1_request.data = stage2_store.data;
assign l1_request.rnw = arb_load_sel;
assign l1_request.be = stage2_store.be;
assign l1_request.size = (arb_load_sel & ~stage2_load.uncacheable) ? 5'(CONFIG.DCACHE.LINE_W-1) : 0;//LR and AMO ops are included in load
assign l1_request.is_amo = 0;
assign l1_request.amo = 0;
assign l1_request.request = load_state[LOAD_L1_REQUEST] | store_state[STORE_L1_REQUEST];
assign load_l1_arb_ack = l1_request.ack & arb_load_sel;
assign store_l1_arb_ack = l1_request.ack & ~arb_load_sel;
////////////////////////////////////////////////////
//Replacement policy (free-running one-hot cycler, i.e. pseudo-random)
cycler #(CONFIG.DCACHE.WAYS) replacement_policy (
.clk (clk),
.rst (rst),
.en (1'b1),
.one_hot (replacement_way)
);
////////////////////////////////////////////////////
//Tag banks
dcache_tag_banks #(.CONFIG(CONFIG), .SCONFIG(SCONFIG))
tag_banks (
.clk (clk),
.rst (rst),
.load_addr (ls_load.addr),
.load_req (load_tag_check),
.miss_addr (stage2_load.addr),
.miss_req (load_l1_arb_ack),
.miss_way (replacement_way),
.inv_addr ({l1_response.inv_addr, 2'b0}),
.extern_inv (l1_response.inv_valid),
.extern_inv_complete (l1_response.inv_ack),
.store_addr (ls_store.addr),
.store_addr_r (stage2_store.addr),
.store_req (store_request),
.cache_op_req (ls_store.cache_op),
.load_tag_hit (load_hit),
.load_tag_hit_way (load_tag_hit_way),
.store_tag_hit (store_hit),
.store_tag_hit_way (store_tag_hit_way)
);
////////////////////////////////////////////////////
//Data Bank(s)
logic [SCONFIG.LINE_ADDR_W+SCONFIG.SUB_LINE_ADDR_W-1:0] data_read_addr;
assign data_read_addr = load_state[LOAD_FILL] ? {addr_utils.getTagLineAddr(stage2_load.addr), word_count} : addr_utils.getDataLineAddr(ls_load.addr);
generate for (genvar i=0; i < CONFIG.DCACHE.WAYS; i++) begin : data_bank_gen
byte_en_bram #(CONFIG.DCACHE.LINES*CONFIG.DCACHE.LINE_W) data_bank (
.clk(clk),
.addr_a(data_read_addr),
.addr_b(addr_utils.getDataLineAddr(stage2_store.addr)),
.en_a(load_tag_check | (replacement_way_r[i] & l1_response.data_valid)),
.en_b(store_tag_hit_way[i]),
.be_a({4{(replacement_way_r[i] & l1_response.data_valid)}}),
.be_b(stage2_store.be),
.data_in_a(l1_response.data),
.data_in_b(stage2_store.data),
.data_out_a(ram_load_data[i]),
.data_out_b()
);
end endgenerate
////////////////////////////////////////////////////
//Output
//One-hot tag hit / update logic to binary int
one_hot_to_integer #(CONFIG.DCACHE.WAYS)
hit_way_conv (
.one_hot (load_tag_hit_way),
.int_out (tag_hit_index)
);
one_hot_to_integer #(CONFIG.DCACHE.WAYS)
replacment_way_conv (
.one_hot (replacement_way),
.int_out (replacement_index)
);
always_ff @ (posedge clk) begin
if (load_l1_arb_ack) begin
replacement_way_r <= replacement_way;
replacement_index_r <= replacement_index;
end
end
always_ff @ (posedge clk) miss_data_valid <= l1_response.data_valid & is_target_word;
logic collision;
logic [31:0] saved_data;
logic [3:0] saved_be;
assign collision = store_state[STORE_L1_REQUEST] & (stage2_store.addr[31:2] == ls_load.addr[31:2]);
always_ff @ (posedge clk) begin
if (load_request) begin
saved_data <= stage2_store.data;
saved_be <= {4{collision}} & stage2_store.be;
end
end
assign load_sel = load_state[LOAD_HIT_CHECK] ? tag_hit_index : replacement_index_r;
always_comb for (int i = 0; i < 4; i++)
ls.data_out[8*i+:8] = saved_be[i] ? saved_data[8*i+:8] : ram_load_data[load_sel][8*i+:8];
assign ls.data_valid = load_hit | miss_data_valid;
////////////////////////////////////////////////////
//End of Implementation
////////////////////////////////////////////////////
////////////////////////////////////////////////////
//Assertions
dcache_request_when_not_ready_assertion:
assert property (@(posedge clk) disable iff (rst) load_request |-> load_ready)
else $error("dcache received request when not ready");
dache_suprious_l1_ack_assertion:
assert property (@(posedge clk) disable iff (rst) l1_request.ack |-> (load_state[LOAD_L1_REQUEST] | store_state[STORE_L1_REQUEST]))
else $error("dcache received ack without a request");
endmodule


@ -0,0 +1,572 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module dcache_inv
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
# (
parameter cpu_config_t CONFIG = EXAMPLE_CONFIG
)
(
input logic clk,
input logic rst,
mem_interface.rw_master mem,
output logic write_outstanding,
input logic amo,
input amo_t amo_type,
amo_interface.subunit amo_unit,
input logic cbo,
input logic uncacheable,
memory_sub_unit_interface.responder ls,
input logic load_peek, //If the next request may be a load
input logic[31:0] load_addr_peek //The address in that case
);
localparam derived_cache_config_t SCONFIG = get_derived_cache_params(CONFIG, CONFIG.DCACHE, CONFIG.DCACHE_ADDR);
localparam DB_ADDR_LEN = SCONFIG.LINE_ADDR_W + SCONFIG.SUB_LINE_ADDR_W;
cache_functions_interface # (.TAG_W(SCONFIG.TAG_W), .LINE_W(SCONFIG.LINE_ADDR_W), .SUB_LINE_W(SCONFIG.SUB_LINE_ADDR_W)) addr_utils ();
typedef logic[SCONFIG.TAG_W-1:0] tag_t;
typedef logic[SCONFIG.LINE_ADDR_W-1:0] line_t;
typedef logic[SCONFIG.SUB_LINE_ADDR_W-1:0] block_t;
typedef struct packed {
logic valid;
tag_t tag;
} tb_entry_t;
typedef enum {
WRITE,
CBO,
READ,
AMO_LR,
AMO_SC,
AMO_RMW
} req_type_t;
req_type_t stage0_type;
req_type_t stage1_type;
typedef struct packed {
logic[31:0] addr;
logic[31:0] wdata;
logic[3:0] be;
amo_t amo_type;
logic uncacheable;
} req_t;
req_t stage0;
req_t stage1;
logic stage1_valid;
logic stage1_done;
logic stage0_advance_r;
logic resetting;
////////////////////////////////////////////////////
//Implementation
always_ff @(posedge clk) begin
if (rst) begin
stage0_advance_r <= 0;
stage1_valid <= 0;
stage1_type <= WRITE;
end
else begin
stage0_advance_r <= ls.new_request;
if (ls.new_request) begin
stage1_valid <= 1;
stage1_type <= stage0_type;
end
else if (stage1_done)
stage1_valid <= 0;
end
if (ls.new_request)
stage1 <= stage0;
end
always_comb begin
if (cbo)
stage0_type = CBO;
else if (ls.we)
stage0_type = WRITE;
else if (amo & amo_type == AMO_LR_FN5)
stage0_type = AMO_LR;
else if (amo & amo_type == AMO_SC_FN5)
stage0_type = AMO_SC;
else if (amo)
stage0_type = AMO_RMW;
else
stage0_type = READ;
end
assign stage0 = '{
addr : ls.addr,
wdata : ls.data_in,
be : ls.be,
amo_type : amo_type,
uncacheable : uncacheable
};
////////////////////////////////////////////////////
//Snooping
//Invalidate a line in the tagbank upon a hit
line_t snoop_line;
tag_t snoop_tag;
logic snoop_valid;
tb_entry_t[CONFIG.DCACHE.WAYS-1:0] snoop_rdata;
line_t snoop_line_r;
tag_t snoop_tag_r;
logic[CONFIG.DCACHE.WAYS-1:0] snoop_hit;
logic snoop_write;
//Technically snoop addresses do not need to lie within our addressable space, so their tag should be wider
//But this is a niche scenario and there is no harm in aliasing requests into our address space (beyond performance)
assign {snoop_tag, snoop_line} = mem.inv_addr[2+SCONFIG.SUB_LINE_ADDR_W+:SCONFIG.TAG_W+SCONFIG.LINE_ADDR_W];
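//Worked example of the slice above (a sketch; these widths are assumed for illustration and are not fixed by this module)
//With SUB_LINE_ADDR_W = 2, LINE_ADDR_W = 9, and TAG_W = 15, the slice starts at bit 2+2 = 4 and spans 15+9 = 24 bits
//So {snoop_tag, snoop_line} = mem.inv_addr[27:4], giving snoop_line = mem.inv_addr[12:4] and snoop_tag = mem.inv_addr[27:13]
//Bits [31:28] are dropped under these widths, which is the aliasing described above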
always_ff @(posedge clk) begin
if (rst)
snoop_valid <= 0;
else
snoop_valid <= mem.inv;
snoop_line_r <= snoop_line;
snoop_tag_r <= snoop_tag;
end
//Hit detection
assign snoop_write = snoop_valid & |snoop_hit;
always_comb begin
for (int i = 0; i < CONFIG.DCACHE.WAYS; i++)
snoop_hit[i] = {snoop_rdata[i].valid, snoop_rdata[i].tag} == {1'b1, snoop_tag_r};
end
//Random replacement policy (cycler)
logic[CONFIG.DCACHE.WAYS-1:0] replacement_way;
cycler #(.C_WIDTH(CONFIG.DCACHE.WAYS)) replacement_policy (
.en(ls.new_request),
.one_hot(replacement_way),
.*);
////////////////////////////////////////////////////
//Tagbank
//Snoops are always accepted and cannot be delayed
//Port A therefore handles all requests and snoop writes
//Port B handles snoop reads + resets
logic a_en;
logic[CONFIG.DCACHE.WAYS-1:0] a_wbe;
tb_entry_t a_wdata;
line_t a_addr;
tb_entry_t[CONFIG.DCACHE.WAYS-1:0] a_rdata;
logic stage1_tb_write;
logic stage1_tb_wval;
logic stage1_tb_write_r;
logic inv_matches_stage1;
logic stage1_tb_wval_r;
logic[CONFIG.DCACHE.WAYS-1:0] hit_ohot_r;
assign a_en = snoop_write | stage1_tb_write_r | ls.new_request; // & ~inv_matches_stage1 )
assign a_wbe = ({CONFIG.DCACHE.WAYS{snoop_write}} & snoop_hit) | ({CONFIG.DCACHE.WAYS{stage1_tb_write_r}} & (stage1_type == CBO | (stage1_type == AMO_RMW & hit_r) ? hit_ohot_r : replacement_way));
always_comb begin
if (snoop_write)
a_addr = snoop_line_r;
else if (stage1_tb_write_r)
a_addr = stage1.addr[2+SCONFIG.SUB_LINE_ADDR_W+:SCONFIG.LINE_ADDR_W];
else
a_addr = stage0.addr[2+SCONFIG.SUB_LINE_ADDR_W+:SCONFIG.LINE_ADDR_W];
end
assign a_wdata = '{
valid : stage1_tb_write_r & stage1_tb_wval_r & ~inv_matches_stage1 ,
tag : stage1.addr[2+SCONFIG.SUB_LINE_ADDR_W+SCONFIG.LINE_ADDR_W+:SCONFIG.TAG_W]
};
//Reset routine
logic b_en;
logic[CONFIG.DCACHE.WAYS-1:0] b_wbe;
tb_entry_t b_wdata;
line_t b_addr;
logic rst_invalid;
line_t rst_line;
assign resetting = ~rst_invalid;
assign b_en = mem.inv | resetting;
assign b_wbe = {CONFIG.DCACHE.WAYS{resetting}};
assign b_wdata = '{default: '0};
assign b_addr = resetting ? rst_line : snoop_line;
always_ff @(posedge clk) begin
if (rst) begin
rst_invalid <= 0;
rst_line <= '0;
end
else if (resetting)
{rst_invalid, rst_line} <= rst_line + 1;
end
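//Worked example of the reset walk (a sketch; LINE_ADDR_W = 9 is assumed for illustration)
//rst_line starts at 0 and port B writes an all-zero (invalid) entry to every way of one line per cycle
//When rst_line is 511, rst_line + 1 = 512 carries into rst_invalid, so {rst_invalid, rst_line} becomes {1'b1, 9'd0}
//resetting therefore deasserts once lines 0 through 511 have each been cleared, and ls.ready is held low until then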
tdp_ram #(
.ADDR_WIDTH(SCONFIG.LINE_ADDR_W),
.NUM_COL(CONFIG.DCACHE.WAYS),
.COL_WIDTH($bits(tb_entry_t)),
.PIPELINE_DEPTH(0),
.USE_PRELOAD(0)
) tagbank (
.a_en(a_en),
.a_wbe(a_wbe),
.a_wdata({CONFIG.DCACHE.WAYS{a_wdata}}),
.a_addr(a_addr),
.a_rdata(a_rdata),
.b_en(b_en),
.b_wbe(b_wbe),
.b_wdata({CONFIG.DCACHE.WAYS{b_wdata}}),
.b_addr(b_addr),
.b_rdata(snoop_rdata),
.*);
//Hit detection
logic hit;
logic hit_r;
logic[CONFIG.DCACHE.WAYS-1:0] hit_ohot;
always_comb begin
hit_ohot = '0;
for (int i = 0; i < CONFIG.DCACHE.WAYS; i++)
hit_ohot[i] = a_rdata[i].valid & (a_rdata[i].tag == stage1.addr[2+SCONFIG.SUB_LINE_ADDR_W+SCONFIG.LINE_ADDR_W+:SCONFIG.TAG_W]);
end
assign hit = |hit_ohot;
always_ff @(posedge clk) begin
if (stage0_advance_r) begin
hit_r <= hit;
hit_ohot_r <= hit_ohot;
end
end
////////////////////////////////////////////////////
//Atomic read/modify/write state machine
//Separate from other logic because atomic requests will need to be retried on a snoop invalidation
typedef enum {
RMW_IDLE,
RMW_READ,
RMW_WRITE,
RMW_FILLING
} rmw_state_t;
rmw_state_t current_state;
rmw_state_t next_state;
logic rmw_mem_request;
logic rmw_mem_rnw;
logic rmw_stage1_tb_write;
logic rmw_db_wen;
logic[31:0] rmw_db_wdata;
logic rmw_ls_data_valid;
logic rmw_stage1_done;
logic rmw_retry;
logic force_miss;
logic return_done;
always_ff @(posedge clk) begin
if (rst)
current_state <= RMW_IDLE;
else
current_state <= next_state;
end
always_comb begin
unique case (current_state)
RMW_READ : begin
rmw_mem_request = 1;
rmw_mem_rnw = 1;
rmw_stage1_tb_write = mem.ack & ~stage1.uncacheable;
rmw_db_wen = 0;
rmw_db_wdata = 'x;
rmw_ls_data_valid = 0;
rmw_stage1_done = 0;
next_state = mem.ack ? RMW_FILLING : RMW_READ;
end
RMW_WRITE : begin
rmw_mem_request = ~rmw_retry;
rmw_mem_rnw = 0;
rmw_stage1_tb_write = 0;
rmw_db_wen = ~stage1.uncacheable & mem.ack;
rmw_db_wdata = amo_unit.rd;
rmw_ls_data_valid = mem.ack;
rmw_stage1_done = mem.ack;
if (mem.ack)
next_state = RMW_IDLE;
else if (rmw_retry)
next_state = RMW_READ;
else
next_state = RMW_WRITE;
end
RMW_FILLING : begin
rmw_mem_request = 0;
rmw_mem_rnw = 'x;
rmw_stage1_tb_write = 0;
rmw_db_wen = mem.rvalid & ~stage1.uncacheable;
rmw_db_wdata = mem.rdata;
rmw_ls_data_valid = 0;
rmw_stage1_done = 0;
if (return_done)
next_state = rmw_retry ? RMW_READ : RMW_WRITE;
else
next_state = RMW_FILLING;
end
RMW_IDLE : begin
rmw_mem_request = 0;
rmw_mem_rnw = 'x;
rmw_stage1_tb_write = 0;
rmw_db_wen = 0;
rmw_db_wdata = 'x;
rmw_ls_data_valid = 0;
rmw_stage1_done = 0;
if (stage1_valid & stage1_type == AMO_RMW)
next_state = hit & ~force_miss & ~stage1.uncacheable ? RMW_WRITE : RMW_READ;
else
next_state = RMW_IDLE;
end
endcase
end
////////////////////////////////////////////////////
//Supporting logic
//Various pieces of additional stateful logic supporting stage one requests
//Tagbank write logic; always on ack_r because it is guaranteed that there won't be a conflicting snoop tb write
logic ack_r;
always_ff @(posedge clk) begin
ack_r <= mem.ack;
stage1_tb_write_r <= stage1_tb_write;
stage1_tb_wval_r <= stage1_tb_wval;
end
//Track if a request has been sent in stage 1 to prevent duplicates
logic request_sent;
always_ff @(posedge clk) begin
if (rst | stage1_done)
request_sent <= 0;
else if (mem.ack)
request_sent <= 1;
end
//Atomics that collide with a snoop on stage0 must be treated as a miss
always_ff @(posedge clk) begin
if (ls.new_request)
force_miss <= amo & mem.inv & mem.inv_addr[31:2+SCONFIG.SUB_LINE_ADDR_W] == stage0.addr[31:2+SCONFIG.SUB_LINE_ADDR_W];
end
//RMW requests must be retried if invalidated after the read but before the write
assign inv_matches_stage1 = mem.inv & stage1.addr[31:2+SCONFIG.SUB_LINE_ADDR_W] == mem.inv_addr[31:2+SCONFIG.SUB_LINE_ADDR_W];
always_ff @(posedge clk) begin
case (current_state)
RMW_IDLE, RMW_FILLING, RMW_WRITE : rmw_retry <= stage1_valid & stage1_type == AMO_RMW & (rmw_retry | inv_matches_stage1);
default: rmw_retry <= 0;
endcase
end
//Fill burst word counting
logic correct_word;
block_t word_counter;
assign return_done = mem.rvalid & (stage1.uncacheable | word_counter == SCONFIG.SUB_LINE_ADDR_W'(CONFIG.DCACHE.LINE_W-1));
assign correct_word = mem.rvalid & (stage1.uncacheable | word_counter == stage1.addr[2+:SCONFIG.SUB_LINE_ADDR_W]);
always_ff @(posedge clk) begin
if (rst | stage1_done)
word_counter <= '0;
else
word_counter <= word_counter + block_t'(mem.rvalid);
end
////////////////////////////////////////////////////
//Stage 1 request handling
//Heavily dependent on request type
logic db_wen;
logic[CONFIG.DCACHE.WAYS-1:0] db_way;
logic[31:0] db_wdata;
logic lr_valid;
always_comb begin
unique case (stage1_type)
WRITE : begin
mem.request = stage1_valid;
mem.wdata = stage1.wdata;
mem.rnw = 0;
mem.rmw = 0;
stage1_tb_write = 0;
stage1_tb_wval = 'x;
db_wen = stage0_advance_r & hit & ~stage1.uncacheable;
db_wdata = stage1.wdata;
db_way = hit_ohot;
ls.data_valid = 0;
ls.data_out = 'x;
stage1_done = mem.ack;
end
CBO : begin
mem.request = stage1_valid & ~request_sent;
mem.wdata = 'x;
mem.rnw = 0;
mem.rmw = 0;
stage1_tb_write = ~stage1.uncacheable & mem.ack & (stage0_advance_r ? hit : hit_r);
stage1_tb_wval = 0;
db_wen = 0;
db_wdata = 'x;
db_way = 'x;
ls.data_valid = 0;
ls.data_out = 'x;
stage1_done = request_sent & ~stage1_tb_write_r;
end
AMO_LR, READ : begin
mem.request = stage1_valid & ~stage0_advance_r & (stage1.uncacheable | ~hit_r) & ~request_sent;
mem.wdata = 'x;
mem.rnw = 1;
mem.rmw = 0;
stage1_tb_write = ~stage1.uncacheable & mem.ack;
stage1_tb_wval = 1;
db_wen = mem.rvalid & ~stage1.uncacheable;
db_wdata = mem.rdata;
db_way = replacement_way;
ls.data_valid = stage0_advance_r ? hit & ~stage1.uncacheable : correct_word;
ls.data_out = stage0_advance_r ? db_hit_entry : mem.rdata;
stage1_done = stage0_advance_r ? hit & ~stage1.uncacheable : return_done;
end
AMO_SC : begin
mem.request = stage1_valid & lr_valid;
mem.wdata = stage1.wdata;
mem.rnw = 0;
mem.rmw = 0;
stage1_tb_write = 0;
stage1_tb_wval = 'x;
db_wen = ~stage1.uncacheable & mem.ack;
db_wdata = stage1.wdata;
db_way = stage0_advance_r ? hit_ohot : hit_ohot_r;
ls.data_valid = stage1_valid & (mem.ack | ~lr_valid);
ls.data_out = {31'b0, ~lr_valid};
stage1_done = stage1_valid & (mem.ack | ~lr_valid);
end
AMO_RMW : begin
mem.request = rmw_mem_request;
mem.wdata = amo_unit.rd;
mem.rnw = rmw_mem_rnw;
mem.rmw = 1;
stage1_tb_write = rmw_stage1_tb_write;
stage1_tb_wval = 1;
db_wen = rmw_db_wen;
db_wdata = rmw_db_wdata;
db_way = hit_r ? hit_ohot_r : replacement_way; //Will not write on first cycle so can use registered
ls.data_valid = rmw_ls_data_valid;
ls.data_out = amo_unit.rs1;
stage1_done = rmw_stage1_done;
end
endcase
end
assign mem.addr = stage1.addr[31:2];
assign mem.wbe = stage1.be;
assign mem.rlen = stage1.uncacheable ? '0 : 5'(CONFIG.DCACHE.LINE_W-1);
logic[DB_ADDR_LEN-1:0] db_addr;
assign ls.ready = ~resetting & ~snoop_write & (~stage1_valid | stage1_done) & ~(db_wen & load_peek & load_addr_peek[31:DB_ADDR_LEN+2] == stage1.addr[31:DB_ADDR_LEN+2] & load_addr_peek[2+:DB_ADDR_LEN] == db_addr);
assign write_outstanding = (stage1_valid & ~(stage1_type inside {READ, AMO_LR})) | mem.write_outstanding;
////////////////////////////////////////////////////
//Atomics
logic local_reservation_valid;
//local_reservation_valid tracks snoop invalidations; amo_unit.reservation_valid covers other ports
assign lr_valid = amo_unit.reservation_valid & local_reservation_valid;
always_ff @(posedge clk) begin
if (rst | inv_matches_stage1)
local_reservation_valid <= 0;
else if (amo_unit.set_reservation)
local_reservation_valid <= 1;
end
assign amo_unit.reservation = stage1.addr;
//On a miss, set on ack_r
//On a hit, set as long as ~force_miss & ~inv_matches_stage1
assign amo_unit.set_reservation = stage1_valid & stage1_type == AMO_LR & (stage0_advance_r & hit & ~stage1.uncacheable & ~force_miss & ~inv_matches_stage1 | ack_r);
assign amo_unit.clear_reservation = stage1_done & stage1_type != AMO_LR;
//RMW
assign amo_unit.rs2 = stage1.wdata;
assign amo_unit.rmw_valid = stage1_valid & stage1_type == AMO_RMW;
assign amo_unit.op = stage1.amo_type;
always_ff @(posedge clk) begin
if (stage0_advance_r)
amo_unit.rs1 <= db_hit_entry;
else if (correct_word)
amo_unit.rs1 <= mem.rdata;
end
////////////////////////////////////////////////////
//Databank
logic[CONFIG.DCACHE.WAYS-1:0][31:0] db_entries;
logic[31:0] db_hit_entry;
logic[CONFIG.DCACHE.WAYS-1:0][3:0] db_wbe_full;
logic[$clog2(CONFIG.DCACHE.WAYS > 1 ? CONFIG.DCACHE.WAYS : 2)-1:0] hit_int;
always_comb begin
for (int i = 0; i < CONFIG.DCACHE.WAYS; i++)
db_wbe_full[i] = {4{db_way[i]}} & stage1.be;
end
assign db_addr[SCONFIG.SUB_LINE_ADDR_W+:SCONFIG.LINE_ADDR_W] = stage1.addr[2+SCONFIG.SUB_LINE_ADDR_W+:SCONFIG.LINE_ADDR_W];
assign db_addr[SCONFIG.SUB_LINE_ADDR_W-1:0] = mem.rvalid ? word_counter : stage1.addr[2+:SCONFIG.SUB_LINE_ADDR_W];
sdp_ram #(
.ADDR_WIDTH(DB_ADDR_LEN),
.NUM_COL(4*CONFIG.DCACHE.WAYS),
.COL_WIDTH(8),
.PIPELINE_DEPTH(0)
) databank (
.a_en(db_wen),
.a_wbe(db_wbe_full),
.a_wdata({CONFIG.DCACHE.WAYS{db_wdata}}),
.a_addr(db_addr),
.b_en(ls.new_request),
.b_addr(addr_utils.getDataLineAddr(stage0.addr)),
.b_rdata(db_entries),
.*);
one_hot_to_integer #(.C_WIDTH(CONFIG.DCACHE.WAYS)) hit_conv (
.one_hot(hit_ohot),
.int_out(hit_int)
);
assign db_hit_entry = db_entries[hit_int];
////////////////////////////////////////////////////
//Assertions
dcache_request_when_not_ready_assertion:
assert property (@(posedge clk) disable iff (rst) ls.new_request |-> ls.ready)
else $error("dcache received request when not ready");
dcache_spurious_l1_ack_assertion:
assert property (@(posedge clk) disable iff (rst) mem.ack |-> mem.request)
else $error("dcache received ack without a request");
// dcache_ohot_assertion:
// assert property (@(posedge clk) disable iff (rst) ls.new_request |=> $onehot0(hit_ohot))
// else $error("dcache hit multiple ways");
endmodule

@ -0,0 +1,375 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module dcache_noinv
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
# (
parameter cpu_config_t CONFIG = EXAMPLE_CONFIG
)
(
input logic clk,
input logic rst,
mem_interface.rw_master mem,
output logic write_outstanding,
input logic amo,
input amo_t amo_type,
amo_interface.subunit amo_unit,
input logic cbo,
input logic uncacheable,
memory_sub_unit_interface.responder ls,
input logic load_peek, //If the next request may be a load
input logic[31:0] load_addr_peek //The address in that case
);
localparam derived_cache_config_t SCONFIG = get_derived_cache_params(CONFIG, CONFIG.DCACHE, CONFIG.DCACHE_ADDR);
localparam DB_ADDR_LEN = SCONFIG.LINE_ADDR_W + SCONFIG.SUB_LINE_ADDR_W;
cache_functions_interface # (.TAG_W(SCONFIG.TAG_W), .LINE_W(SCONFIG.LINE_ADDR_W), .SUB_LINE_W(SCONFIG.SUB_LINE_ADDR_W)) addr_utils ();
typedef logic[SCONFIG.TAG_W-1:0] tag_t;
typedef struct packed {
logic valid;
tag_t tag;
} tb_entry_t;
typedef struct packed {
logic[31:0] addr;
logic[31:0] data;
logic[3:0] be;
logic rnw;
logic uncacheable;
logic amo;
amo_t amo_type;
logic cbo;
} req_t;
typedef enum {
IDLE,
FIRST_CYCLE,
REQUESTING_READ,
FILLING,
UNCACHEABLE_WAITING_READ,
AMO_WRITE
} stage1_t;
//Implementation
req_t stage0;
req_t stage1;
logic stage1_done;
logic stage0_advance_r;
stage1_t current_state;
logic[DB_ADDR_LEN-1:0] db_addr;
logic db_wen;
logic stage1_is_lr;
logic stage1_is_sc;
assign write_outstanding = ((current_state != IDLE) & (~stage1.rnw | stage1.amo)) | mem.write_outstanding;
//Peeking avoids circular logic
assign ls.ready = (current_state == IDLE) | (stage1_done & ~stage1.cbo & ~(db_wen & load_peek & load_addr_peek[31:DB_ADDR_LEN+2] == stage1.addr[31:DB_ADDR_LEN+2] & load_addr_peek[2+:DB_ADDR_LEN] == db_addr));
always_ff @(posedge clk) begin
if (rst)
stage0_advance_r <= 0;
else
stage0_advance_r <= ls.new_request;
if (ls.new_request)
stage1 <= stage0;
end
assign stage0 = '{
addr : ls.addr,
data : ls.data_in,
be : ls.be,
rnw : ls.re,
uncacheable : uncacheable,
amo : amo,
amo_type : amo_type,
cbo : cbo
};
//Replacement policy
logic[CONFIG.DCACHE.WAYS-1:0] replacement_way;
cycler #(.C_WIDTH(CONFIG.DCACHE.WAYS)) replacement_policy (
.en(ls.new_request),
.one_hot(replacement_way),
.*);
//Tagbank
tb_entry_t[CONFIG.DCACHE.WAYS-1:0] tb_entries;
tb_entry_t new_entry;
logic[CONFIG.DCACHE.WAYS-1:0] hit_ohot;
logic[CONFIG.DCACHE.WAYS-1:0] hit_ohot_r;
logic hit;
logic hit_r;
logic tb_write;
assign tb_write = stage0_advance_r & ~stage1.uncacheable & ((~hit & stage1.rnw & ~stage1_is_sc) | (stage1.cbo & hit));
assign new_entry = '{
valid : ~stage1.cbo,
tag : addr_utils.getTag(stage1.addr)
};
sdp_ram_padded #(
.ADDR_WIDTH(SCONFIG.LINE_ADDR_W),
.NUM_COL(CONFIG.DCACHE.WAYS),
.COL_WIDTH($bits(tb_entry_t)),
.PIPELINE_DEPTH(0)
) tagbank (
.a_en(tb_write),
.a_wbe(replacement_way),
.a_wdata({CONFIG.DCACHE.WAYS{new_entry}}),
.a_addr(addr_utils.getTagLineAddr(stage1.addr)),
.b_en(ls.new_request),
.b_addr(addr_utils.getTagLineAddr(stage0.addr)),
.b_rdata(tb_entries),
.*);
//Hit detection
always_comb begin
hit_ohot = '0;
for (int i = 0; i < CONFIG.DCACHE.WAYS; i++)
hit_ohot[i] = tb_entries[i].valid & (tb_entries[i].tag == addr_utils.getTag(stage1.addr));
end
assign hit = |hit_ohot;
always_ff @(posedge clk) begin
if (stage0_advance_r) begin
hit_r <= hit;
hit_ohot_r <= hit_ohot;
end
end
//Databank
logic[CONFIG.DCACHE.WAYS-1:0][31:0] db_entries;
logic[31:0] db_hit_entry;
logic[$clog2(CONFIG.DCACHE.WAYS > 1 ? CONFIG.DCACHE.WAYS : 2)-1:0] hit_int;
logic[CONFIG.DCACHE.WAYS-1:0] db_way;
logic[CONFIG.DCACHE.WAYS-1:0][3:0] db_wbe_full;
logic[31:0] db_wdata;
logic[SCONFIG.SUB_LINE_ADDR_W-1:0] word_counter;
always_comb begin
for (int i = 0; i < CONFIG.DCACHE.WAYS; i++)
db_wbe_full[i] = {4{db_way[i]}} & stage1.be;
end
assign db_addr = current_state == FILLING ? {addr_utils.getTagLineAddr(stage1.addr), word_counter} : addr_utils.getDataLineAddr(stage1.addr);
sdp_ram #(
.ADDR_WIDTH(DB_ADDR_LEN),
.NUM_COL(4*CONFIG.DCACHE.WAYS),
.COL_WIDTH(8),
.PIPELINE_DEPTH(0)
) databank (
.a_en(db_wen),
.a_wbe(db_wbe_full),
.a_wdata({CONFIG.DCACHE.WAYS{db_wdata}}),
.a_addr(db_addr),
.b_en(ls.new_request),
.b_addr(addr_utils.getDataLineAddr(stage0.addr)),
.b_rdata(db_entries),
.*);
one_hot_to_integer #(.C_WIDTH(CONFIG.DCACHE.WAYS)) hit_conv (
.one_hot(hit_ohot),
.int_out(hit_int)
);
assign db_hit_entry = db_entries[hit_int];
//Arbiter response
logic correct_word;
logic return_done;
assign return_done = mem.rvalid & word_counter == SCONFIG.SUB_LINE_ADDR_W'(CONFIG.DCACHE.LINE_W-1);
assign correct_word = mem.rvalid & word_counter == stage1.addr[2+:SCONFIG.SUB_LINE_ADDR_W];
always_ff @(posedge clk) begin
if (mem.rvalid)
word_counter <= word_counter+1;
if (ls.new_request)
word_counter <= 0;
end
stage1_t next_state;
always_ff @(posedge clk) begin
if (rst)
current_state <= IDLE;
else
current_state <= next_state;
end
//Have to pull this into its own block to prevent a Verilator circular dependency
always_comb begin
unique case (current_state)
IDLE : stage1_done = 0;
FIRST_CYCLE : stage1_done = ((~stage1.rnw | (stage1_is_sc & amo_unit.reservation_valid)) & mem.ack) | (stage1_is_sc & ~amo_unit.reservation_valid) | (stage1.rnw & hit & (~stage1.amo | stage1_is_lr) & ~stage1.uncacheable) | stage1.cbo;
REQUESTING_READ : stage1_done = 0;
FILLING : stage1_done = return_done & (stage1_is_lr | ~stage1.amo);
UNCACHEABLE_WAITING_READ : stage1_done = mem.rvalid & (stage1_is_lr | ~stage1.amo);
AMO_WRITE : stage1_done = mem.ack;
endcase
end
always_comb begin
unique case (current_state)
IDLE : begin
mem.request = 0;
mem.wdata = 'x;
mem.rnw = 'x;
mem.rlen = 'x;
db_wen = 0;
db_wdata = 'x;
db_way = 'x;
ls.data_valid = 0;
ls.data_out = 'x;
next_state = ls.new_request ? FIRST_CYCLE : IDLE;
end
FIRST_CYCLE : begin //Handles writes, read hits, uncacheable reads, and SC
mem.request = ~stage1.cbo & (~stage1.rnw | (stage1.uncacheable & ~stage1_is_sc) | (stage1_is_sc & amo_unit.reservation_valid));
mem.wdata = stage1.data;
mem.rnw = stage1.rnw & ~stage1_is_sc;
mem.rlen = '0;
db_wen = ~stage1.cbo & hit & ~stage1.uncacheable & (~stage1.rnw | (stage1_is_sc & amo_unit.reservation_valid));
db_wdata = stage1.data;
db_way = hit_ohot;
ls.data_valid = (stage0_advance_r & stage1_is_sc) | (stage1.rnw & ~stage1.uncacheable & hit & ~stage1_is_sc);
ls.data_out = stage1_is_sc ? {31'b0, ~amo_unit.reservation_valid} : db_hit_entry;
if (stage1_done)
next_state = ls.new_request ? FIRST_CYCLE : IDLE;
else if (stage1.uncacheable & mem.ack)
next_state = UNCACHEABLE_WAITING_READ;
else if (stage1.rnw & ~stage1.uncacheable & ~hit & ~stage1_is_sc)
next_state = REQUESTING_READ;
else if (stage1.amo & hit & ~stage1.uncacheable & ~stage1_is_sc)
next_state = AMO_WRITE;
else
next_state = FIRST_CYCLE;
end
REQUESTING_READ : begin
mem.request = 1;
mem.wdata = 'x;
mem.rnw = 1;
mem.rlen = 5'(CONFIG.DCACHE.LINE_W-1);
db_wen = 0;
db_wdata = 'x;
db_way = 'x;
ls.data_valid = 0;
ls.data_out = 'x;
next_state = mem.ack ? FILLING : REQUESTING_READ;
end
FILLING : begin
mem.request = 0;
mem.wdata = 'x;
mem.rnw = 'x;
mem.rlen = 'x;
db_wen = mem.rvalid;
db_wdata = mem.rdata;
db_way = replacement_way;
ls.data_valid = correct_word;
ls.data_out = mem.rdata;
if (return_done) begin
if (stage1.amo & ~stage1_is_lr)
next_state = AMO_WRITE;
else
next_state = ls.new_request ? FIRST_CYCLE : IDLE;
end
else
next_state = FILLING;
end
UNCACHEABLE_WAITING_READ : begin
mem.request = 0;
mem.wdata = 'x;
mem.rnw = 'x;
mem.rlen = 'x;
db_wen = 0;
db_wdata = 'x;
db_way = 'x;
ls.data_valid = mem.rvalid;
ls.data_out = mem.rdata;
if (mem.rvalid) begin
if (stage1.amo & ~stage1_is_lr)
next_state = AMO_WRITE;
else
next_state = ls.new_request ? FIRST_CYCLE : IDLE;
end
else
next_state = UNCACHEABLE_WAITING_READ;
end
AMO_WRITE : begin
mem.request = 1;
mem.wdata = amo_unit.rd;
mem.rnw = 0;
mem.rlen = 'x;
db_wen = ~stage1.uncacheable;
db_wdata = amo_unit.rd;
db_way = hit_r ? hit_ohot_r : replacement_way;
ls.data_valid = 0;
ls.data_out = 'x;
if (mem.ack)
next_state = ls.new_request ? FIRST_CYCLE : IDLE;
else
next_state = AMO_WRITE;
end
endcase
end
//AMO
assign stage1_is_lr = stage1.amo & stage1.amo_type == AMO_LR_FN5;
assign stage1_is_sc = stage1.amo & stage1.amo_type == AMO_SC_FN5;
assign amo_unit.reservation = stage1.addr;
assign amo_unit.rs2 = stage1.data;
assign amo_unit.rmw_valid = (current_state != IDLE) & stage1.amo;
assign amo_unit.op = stage1.amo_type;
assign amo_unit.set_reservation = stage1_is_lr & stage1_done;
assign amo_unit.clear_reservation = stage1_done;
always_ff @(posedge clk) begin
if (stage0_advance_r)
amo_unit.rs1 <= db_hit_entry;
else if (correct_word | (mem.rvalid & stage1.uncacheable))
amo_unit.rs1 <= mem.rdata;
end
assign mem.addr = stage1.addr[31:2];
assign mem.wbe = stage1.be;
assign mem.rmw = 0; //Although this unit can issue RMWs, they do not need special treatment as they are not coherent with other units
////////////////////////////////////////////////////
//Assertions
dcache_request_when_not_ready_assertion:
assert property (@(posedge clk) disable iff (rst) ls.new_request |-> ls.ready)
else $error("dcache received request when not ready");
dcache_spurious_l1_ack_assertion:
assert property (@(posedge clk) disable iff (rst) mem.ack |-> mem.request)
else $error("dcache received ack without a request");
// dcache_ohot_assertion:
// assert property (@(posedge clk) disable iff (rst) ls.new_request |=> $onehot0(hit_ohot))
// else $error("dcache hit multiple ways");
endmodule

@ -1,114 +0,0 @@
/*
* Copyright © 2022 Eric Matthews
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module dcache_tag_banks
import cva5_config::*;
import cva5_types::*;
# (
parameter cpu_config_t CONFIG = EXAMPLE_CONFIG,
parameter derived_cache_config_t SCONFIG = '{LINE_ADDR_W : 9, SUB_LINE_ADDR_W : 2, TAG_W : 15}
)
(
input logic clk,
input logic rst,
//Port A
input logic[31:0] load_addr,
input logic load_req,
input logic[31:0] miss_addr,
input logic miss_req,
input logic[CONFIG.DCACHE.WAYS-1:0] miss_way,
input logic[31:0] inv_addr,
input logic extern_inv,
output logic extern_inv_complete,
//Port B
input logic[31:0] store_addr,
input logic[31:0] store_addr_r,
input logic store_req,
input logic cache_op_req,
output logic load_tag_hit,
output logic store_tag_hit,
output logic[CONFIG.DCACHE.WAYS-1:0] load_tag_hit_way,
output logic[CONFIG.DCACHE.WAYS-1:0] store_tag_hit_way
);
typedef struct packed {
logic valid;
logic [SCONFIG.TAG_W-1:0] tag;
} dtag_entry_t;
cache_functions_interface # (.TAG_W(SCONFIG.TAG_W), .LINE_W(SCONFIG.LINE_ADDR_W), .SUB_LINE_W(SCONFIG.SUB_LINE_ADDR_W)) addr_utils ();
dtag_entry_t tag_line_a [CONFIG.DCACHE.WAYS-1:0];
dtag_entry_t tag_line_b [CONFIG.DCACHE.WAYS-1:0];
dtag_entry_t new_tagline;
logic [SCONFIG.LINE_ADDR_W-1:0] porta_addr;
logic [SCONFIG.LINE_ADDR_W-1:0] portb_addr;
logic external_inv;
logic load_req_r;
logic store_req_r;
////////////////////////////////////////////////////
//Implementation
always_ff @ (posedge clk) load_req_r <= load_req;
always_ff @ (posedge clk) store_req_r <= store_req & ~cache_op_req;
assign external_inv = extern_inv & CONFIG.DCACHE.USE_EXTERNAL_INVALIDATIONS;
assign porta_addr = miss_req ? addr_utils.getTagLineAddr(miss_addr) : external_inv ? addr_utils.getTagLineAddr(inv_addr) : addr_utils.getTagLineAddr(store_addr);
assign portb_addr = addr_utils.getTagLineAddr(load_addr);
assign extern_inv_complete = external_inv & ~miss_req;
assign new_tagline = '{valid: miss_req, tag: addr_utils.getTag(miss_addr)};
////////////////////////////////////////////////////
//Memory instantiation and hit detection
generate for (genvar i = 0; i < CONFIG.DCACHE.WAYS; i++) begin : tag_bank_gen
dual_port_bram #(.WIDTH($bits(dtag_entry_t)), .LINES(CONFIG.DCACHE.LINES)) dtag_bank (
.clk (clk),
.en_a (store_req | (miss_req & miss_way[i]) | external_inv),
.wen_a ((miss_req & miss_way[i]) | external_inv | (store_req & cache_op_req)),
.addr_a (porta_addr),
.data_in_a (new_tagline),
.data_out_a (tag_line_a[i]),
.en_b (load_req),
.wen_b ('0),
.addr_b (portb_addr),
.data_in_b ('0),
.data_out_b(tag_line_b[i])
);
assign store_tag_hit_way[i] = ({store_req_r, 1'b1, addr_utils.getTag(store_addr_r)} == {1'b1, tag_line_a[i]});
assign load_tag_hit_way[i] = ({load_req_r, 1'b1, addr_utils.getTag(miss_addr)} == {1'b1, tag_line_b[i]});
end endgenerate
assign load_tag_hit = |load_tag_hit_way;
assign store_tag_hit = |store_tag_hit_way;
endmodule

@ -49,20 +49,33 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
localparam DOUBLE_MIN_WIDTH = FLEN >= 32 ? 32 : FLEN;
typedef struct packed {
logic [31:0] addr;
logic [11:0] offset;
logic [2:0] fn3;
logic fp;
logic double;
logic amo;
amo_t amo_type;
logic [31:0] amo_wdata;
id_t id;
logic store_collision;
logic [LOG2_SQ_DEPTH-1:0] sq_index;
} lq_entry_t;
typedef struct packed {
logic discard;
logic [19:0] addr;
ls_subunit_t subunit;
} addr_entry_t;
logic [LOG2_SQ_DEPTH-1:0] sq_index;
logic [LOG2_SQ_DEPTH-1:0] sq_oldest;
addr_hash_t addr_hash;
logic potential_store_conflict;
logic lq_addr_discard;
logic sq_addr_discard;
logic load_blocked;
logic load_pop;
logic load_addr_bit_3;
logic [2:0] load_fn3;
@ -72,7 +85,9 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
logic [31:0] store_data;
fifo_interface #(.DATA_TYPE(lq_entry_t)) lq();
fifo_interface #(.DATA_TYPE(addr_entry_t)) lq_addr();
store_queue_interface sq();
fifo_interface #(.DATA_TYPE(addr_entry_t)) sq_addr();
////////////////////////////////////////////////////
//Implementation
@ -85,7 +100,7 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
//Address hash for load-store collision checking
addr_hash #(.USE_BIT_3(~CONFIG.INCLUDE_UNIT.FPU))
lsq_addr_hash (
.addr (lsq.data_in.addr),
.addr (lsq.data_in.offset),
.addr_hash (addr_hash)
);
@ -97,31 +112,49 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
.rst(rst),
.fifo(lq)
);
cva5_fifo #(.DATA_TYPE(addr_entry_t), .FIFO_DEPTH(MAX_IDS))
load_queue_addr_fifo (
.clk(clk),
.rst(rst),
.fifo(lq_addr)
);
//FIFO control signals
assign lq.push = lsq.push & lsq.data_in.load;
assign lq.potential_push = lsq.potential_push;
assign lq.pop = load_pop;
assign lq.pop = load_pop | lq_addr_discard;
assign lq_addr.push = lsq.addr_push & lsq.addr_data_in.rnw;
assign lq_addr.potential_push = lq_addr.push;
assign lq_addr.data_in.addr = lsq.addr_data_in.addr;
assign lq_addr.data_in.subunit = lsq.addr_data_in.subunit;
assign lq_addr.data_in.discard = lsq.addr_data_in.discard;
assign lq_addr.pop = load_pop | lq_addr_discard;
assign lq_addr_discard = lq_addr.valid ? lq_addr.data_out.discard : lsq.addr_push & lsq.addr_data_in.rnw & lsq.addr_data_in.discard;
//FIFO data ports
assign lq.data_in = '{
addr : lsq.data_in.addr,
offset : lsq.data_in.offset,
fn3 : lsq.data_in.fn3,
fp : lsq.data_in.fp,
double : lsq.data_in.double,
amo : lsq.data_in.amo,
amo_type : lsq.data_in.amo_type,
amo_wdata : lsq.data_in.data,
id : lsq.data_in.id,
store_collision : potential_store_conflict,
store_collision : potential_store_conflict | (CONFIG.INCLUDE_AMO & lsq.data_in.amo), //Collision forces sequential consistency
sq_index : sq_index
};
////////////////////////////////////////////////////
//Store Queue
assign sq.push = lsq.push & (lsq.data_in.store | lsq.data_in.cache_op);
assign sq.pop = store_pop;
assign sq.pop = store_pop | sq_addr_discard;
assign sq.data_in = lsq.data_in;
store_queue # (.CONFIG(CONFIG)) sq_block (
.clk (clk),
.rst (rst | gc.sq_flush),
.rst (rst),
.sq (sq),
.store_forward_wb_group (store_forward_wb_group),
.fp_store_forward_wb_group (fp_store_forward_wb_group),
@ -133,6 +166,22 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
.fp_wb_packet (fp_wb_packet),
.store_retire (store_retire)
);
cva5_fifo #(.DATA_TYPE(addr_entry_t), .FIFO_DEPTH(CONFIG.SQ_DEPTH))
store_queue_addr_fifo (
.clk(clk),
.rst(rst),
.fifo(sq_addr)
);
assign sq_addr.push = lsq.addr_push & ~lsq.addr_data_in.rnw;
assign sq_addr.potential_push = sq_addr.push;
assign sq_addr.data_in.addr = lsq.addr_data_in.addr;
assign sq_addr.data_in.subunit = lsq.addr_data_in.subunit;
assign sq_addr.data_in.discard = lsq.addr_data_in.discard;
assign sq_addr.pop = store_pop | sq_addr_discard;
assign sq_addr_discard = sq.valid & (~lq.valid | load_blocked) & (sq_addr.valid ? sq_addr.data_out.discard : lsq.addr_push & ~lsq.addr_data_in.rnw & lsq.addr_data_in.discard);
////////////////////////////////////////////////////
//Output
@ -148,7 +197,7 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
assign load_fp_hold = ~load_p2 & lq.data_out.double;
assign load_pop = lsq.load_pop & ~load_fp_hold;
assign load_addr_bit_3 = load_fp_hold | lq.data_out.addr[2];
assign load_addr_bit_3 = load_fp_hold | lq.data_out.offset[2];
assign load_fn3 = lq.data_out.fp ? LS_W_fn3 : lq.data_out.fn3;
always_comb begin
@ -171,7 +220,7 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
end else begin : gen_no_load_split
//All loads are single cycle (load only the upper word)
assign load_pop = lsq.load_pop;
assign load_addr_bit_3 = lq.data_out.addr[2] | lq.data_out.double;
assign load_addr_bit_3 = lq.data_out.offset[2] | lq.data_out.double;
assign load_fn3 = lq.data_out.fp ? LS_W_fn3 : lq.data_out.fn3;
always_comb begin
if (lq.data_out.double)
@ -194,7 +243,7 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
assign store_fp_hold = ~store_p2 & sq.data_out.double;
assign store_pop = lsq.store_pop & ~store_fp_hold;
assign store_addr_bit_3 = sq.data_out.double ? store_p2 : sq.data_out.addr[2];
assign store_addr_bit_3 = sq.data_out.double ? store_p2 : sq.data_out.offset[2];
always_ff @(posedge clk) begin
if (rst)
@ -217,47 +266,52 @@ module load_store_queue //ID-based input buffer for Load/Store Unit
end else begin : gen_no_fpu
//Plain integer memory operations
assign load_pop = lsq.load_pop;
assign load_addr_bit_3 = lq.data_out.addr[2];
assign load_addr_bit_3 = lq.data_out.offset[2];
assign load_fn3 = lq.data_out.fn3;
assign load_type = INT_DONE;
assign store_pop = lsq.store_pop;
assign store_addr_bit_3 = sq.data_out.addr[2];
assign store_addr_bit_3 = sq.data_out.offset[2];
assign store_data = sq.data_out.data;
end
endgenerate
logic load_blocked;
assign load_blocked = (lq.data_out.store_collision & (lq.data_out.sq_index != sq_oldest));
assign lsq.load_valid = lq.valid & ~load_blocked;
assign lsq.store_valid = sq.valid;
//Requests are only valid if the TLB has returned the physical address and there was no exception
assign lsq.load_valid = lq.valid & ~load_blocked & (lq_addr.valid ? ~lq_addr.data_out.discard : lsq.addr_push & lsq.addr_data_in.rnw & ~lsq.addr_data_in.discard);
assign lsq.store_valid = sq.valid & (sq_addr.valid ? ~sq_addr.data_out.discard : lsq.addr_push & ~lsq.addr_data_in.rnw & ~lsq.addr_data_in.discard);
assign lsq.load_data_out = '{
addr : {lq.data_out.addr[31:3], load_addr_bit_3, lq.data_out.addr[1:0]},
addr : {(lq_addr.valid ? lq_addr.data_out.addr : lsq.addr_data_in.addr), lq.data_out.offset[11:3], load_addr_bit_3, lq.data_out.offset[1:0]},
load : 1,
store : 0,
cache_op : 0,
be : 'x,
amo : lq.data_out.amo,
amo_type : lq.data_out.amo_type,
be : '1,
fn3 : load_fn3,
data_in : 'x,
subunit : lq_addr.valid ? lq_addr.data_out.subunit : lsq.addr_data_in.subunit,
data_in : CONFIG.INCLUDE_AMO ? lq.data_out.amo_wdata : 'x,
id : lq.data_out.id,
fp_op : load_type
};
assign lsq.store_data_out = '{
addr : {sq.data_out.addr[31:3], store_addr_bit_3, sq.data_out.addr[1:0]},
addr : {(sq_addr.valid ? sq_addr.data_out.addr : lsq.addr_data_in.addr), sq.data_out.offset[11:3], store_addr_bit_3, sq.data_out.offset[1:0]},
load : 0,
store : 1,
cache_op : sq.data_out.cache_op,
amo : 0,
amo_type : amo_t'('x),
be : sq.data_out.be,
fn3 : 'x,
subunit : sq_addr.valid ? sq_addr.data_out.subunit : lsq.addr_data_in.subunit,
data_in : store_data,
id : 'x,
fp_op : fp_ls_op_t'('x)
};
assign lsq.sq_empty = sq.empty;
assign lsq.no_released_stores_pending = sq.no_released_stores_pending;
assign lsq.empty = ~lq.valid & sq.empty;
////////////////////////////////////////////////////
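
Editor's sketch, not part of the commit: the new `addr` fields of `lsq.load_data_out` and `lsq.store_data_out` combine a 20-bit physical page number with the 12-bit page offset captured at issue, and bypass the address FIFO when it is empty. A minimal standalone version of that selection might look like the following. The module name and port names are hypothetical; only the bit slicing is taken from the diff.

```systemverilog
// Hypothetical illustration only; names do not exist in the CVA5 sources.
module phys_addr_select_sketch (
    input  logic        fifo_valid,    // address FIFO already holds a translated entry
    input  logic [19:0] fifo_ppn,      // queued physical page number (address bits 31:12)
    input  logic [19:0] incoming_ppn,  // page number arriving from the TLB this cycle
    input  logic [11:0] offset,        // page offset stored in the queue at issue time
    input  logic        word_sel,      // replaces offset[2], e.g. for split double-word accesses
    output logic [31:0] addr
);
    logic [19:0] ppn;

    // Use the queued translation when present, otherwise bypass the incoming one
    assign ppn  = fifo_valid ? fifo_ppn : incoming_ppn;

    // 20 + 9 + 1 + 2 = 32 bits
    assign addr = {ppn, offset[11:3], word_sel, offset[1:0]};
endmodule
```

Because the offset is untranslated under page-based translation, only the upper 20 bits have to wait for the TLB, which is presumably why the diff stores them in a separate FIFO.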

392
core/execution_units/load_store_unit/load_store_unit.sv Executable file → Normal file

@ -26,6 +26,7 @@ module load_store_unit
import riscv_types::*;
import cva5_types::*;
import fpu_types::*;
import csr_types::*;
import opcodes::*;
# (
@ -62,12 +63,8 @@ module load_store_unit
input logic dcache_on,
input logic clear_reservation,
tlb_interface.requester tlb,
input logic tlb_on,
l1_arbiter_request_interface.master l1_request,
l1_arbiter_return_interface.master l1_response,
input sc_complete,
input sc_success,
mem_interface.rw_master mem,
axi_interface.master m_axi,
avalon_interface.master m_avalon,
@ -75,11 +72,17 @@ module load_store_unit
local_memory_interface.master data_bram,
//CSR
input logic [1:0] current_privilege,
input envcfg_t menvcfg,
input envcfg_t senvcfg,
//Writeback-Store Interface
input wb_packet_t wb_packet [CONFIG.NUM_WB_GROUPS],
input fp_wb_packet_t fp_wb_packet [2],
//Retire release
//Retire
input id_t retire_id,
input retire_packet_t store_retire,
exception_interface.unit exception,
@ -99,6 +102,7 @@ module load_store_unit
localparam ATTRIBUTES_DEPTH = 1;
//Subunit signals
amo_interface amo_if[NUM_SUB_UNITS]();
addr_utils_interface #(CONFIG.DLOCAL_MEM_ADDR.L, CONFIG.DLOCAL_MEM_ADDR.H) dlocal_mem_addr_utils ();
addr_utils_interface #(CONFIG.PERIPHERAL_BUS_ADDR.L, CONFIG.PERIPHERAL_BUS_ADDR.H) dpbus_addr_utils ();
addr_utils_interface #(CONFIG.DCACHE_ADDR.L, CONFIG.DCACHE_ADDR.H) dcache_addr_utils ();
@ -111,11 +115,14 @@ module load_store_unit
data_access_shared_inputs_t shared_inputs;
logic [31:0] unit_data_array [NUM_SUB_UNITS-1:0];
logic [NUM_SUB_UNITS-1:0] unit_ready;
logic [NUM_SUB_UNITS-1:0] unit_write_outstanding;
logic write_outstanding;
logic [NUM_SUB_UNITS-1:0] unit_data_valid;
logic [NUM_SUB_UNITS-1:0] last_unit;
logic [NUM_SUB_UNITS_W-1:0] last_unit;
logic sub_unit_ready;
logic [NUM_SUB_UNITS_W-1:0] subunit_id;
ls_subunit_t padded_subunit_id;
logic unit_switch;
logic unit_switch_in_progress;
@ -126,6 +133,7 @@ module load_store_unit
logic sub_unit_load_issue;
logic sub_unit_store_issue;
logic load_response;
logic load_complete;
logic [31:0] virtual_address;
@ -134,10 +142,20 @@ module load_store_unit
logic [31:0] aligned_load_data;
logic [31:0] final_load_data;
logic tlb_request_r;
logic tlb_lq;
logic unaligned_addr;
logic load_exception_complete;
logic exception_is_fp;
logic exception_is_store;
logic nontrivial_fence;
logic fence_hold;
logic illegal_cbo;
logic exception_lsq_push;
logic nomatch_fault;
logic late_exception;
id_t exception_id;
typedef struct packed{
logic is_signed;
@ -166,14 +184,19 @@ module load_store_unit
assign unit_needed = instruction inside {LB, LH, LW, LBU, LHU, SB, SH, SW, FENCE} |
(CONFIG.INCLUDE_CBO & instruction inside {CBO_INVAL, CBO_CLEAN, CBO_FLUSH}) |
(CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, SP_FSW, DP_FLD, DP_FSD});
(CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, SP_FSW, DP_FLD, DP_FSD}) |
(CONFIG.INCLUDE_AMO & instruction inside {AMO_ADD, AMO_XOR, AMO_OR, AMO_AND, AMO_MIN, AMO_MAX, AMO_MINU, AMO_MAXU, AMO_SWAP, AMO_LR, AMO_SC});
always_comb begin
uses_rs = '0;
uses_rs[RS1] = instruction inside {LB, LH, LW, LBU, LHU, SB, SH, SW} |
(CONFIG.INCLUDE_CBO & instruction inside {CBO_INVAL, CBO_CLEAN, CBO_FLUSH}) |
(CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, SP_FSW, DP_FLD, DP_FSD});
uses_rs[RS2] = CONFIG.INCLUDE_FORWARDING_TO_STORES ? 0 : instruction inside {SB, SH, SW};
uses_rd = instruction inside {LB, LH, LW, LBU, LHU};
(CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, SP_FSW, DP_FLD, DP_FSD}) |
(CONFIG.INCLUDE_AMO & instruction inside {AMO_ADD, AMO_XOR, AMO_OR, AMO_AND, AMO_MIN, AMO_MAX, AMO_MINU, AMO_MAXU, AMO_SWAP, AMO_LR, AMO_SC});
if (CONFIG.INCLUDE_AMO)
uses_rs[RS2] = instruction inside {AMO_ADD, AMO_XOR, AMO_OR, AMO_AND, AMO_MIN, AMO_MAX, AMO_MINU, AMO_MAXU, AMO_SWAP, AMO_SC};
if (~CONFIG.INCLUDE_FORWARDING_TO_STORES)
uses_rs[RS2] |= instruction inside {SB, SH, SW};
uses_rd = instruction inside {LB, LH, LW, LBU, LHU} | (CONFIG.INCLUDE_AMO & instruction inside {AMO_ADD, AMO_XOR, AMO_OR, AMO_AND, AMO_MIN, AMO_MAX, AMO_MINU, AMO_MAXU, AMO_SWAP, AMO_LR, AMO_SC});
fp_uses_rs = '0;
fp_uses_rs[RS2] = ~CONFIG.INCLUDE_FORWARDING_TO_STORES & CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FSW, DP_FSD};
fp_uses_rd = CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, DP_FLD};
@ -186,8 +209,13 @@ module load_store_unit
logic is_store;
logic is_fence;
logic is_cbo;
cbo_t cbo_type;
logic is_fpu;
logic is_double;
logic nontrivial_fence;
logic is_amo;
amo_t amo_type;
logic rd_zero;
logic [11:0] offset;
} ls_attr_t;
ls_attr_t decode_attr;
@ -198,17 +226,55 @@ module load_store_unit
assign load_offset = instruction[31:20];
assign store_offset = {instruction[31:25], instruction[11:7]};
//Only a reduced subset of possible fences require stalling, because of the following guarantees:
//The load queue does not reorder loads
//The store queue does not reorder stores
//Earlier loads are always selected before later stores
//The data cache and local memory are sequentially consistent (no reordering)
//All peripheral busses are sequentially consistent across request types
always_comb begin
if (NUM_SUB_UNITS == 3)
nontrivial_fence = (
(instruction[27] & (instruction[22] | instruction[20])) | //Peripheral read before any write
(instruction[26] & (instruction[23] | |instruction[21:20])) | //Peripheral write before anything other than a peripheral write
(instruction[25] & instruction[22]) | //Regular read before peripheral write
(instruction[24]) //Regular write before anything
);
else if (NUM_SUB_UNITS == 2 & ~CONFIG.INCLUDE_PERIPHERAL_BUS)
nontrivial_fence = instruction[24] & |instruction[21:20]; //Regular write before any regular
else if (NUM_SUB_UNITS == 2)
nontrivial_fence = (
(instruction[27] & (instruction[22] | instruction[20])) | //Peripheral read before any write
(instruction[26] & (instruction[23] | |instruction[21:20])) | //Peripheral write before anything other than a peripheral write
(instruction[25] & instruction[22]) | //Memory read before peripheral write
(instruction[24] & |instruction[23:21]) //Memory write before anything other than a memory write
);
else if (NUM_SUB_UNITS == 1 & ~CONFIG.INCLUDE_PERIPHERAL_BUS)
nontrivial_fence = instruction[24] & instruction[21]; //Memory write before memory read
else if (NUM_SUB_UNITS == 1 & CONFIG.INCLUDE_PERIPHERAL_BUS)
nontrivial_fence = (
(instruction[27] & instruction[22]) | //Peripheral read before peripheral write
(instruction[26] & instruction[23]) //Peripheral write before peripheral read
);
else //0 subunits??
nontrivial_fence = 0;
end
assign decode_attr = '{
is_load : instruction inside {LB, LH, LW, LBU, LHU} | CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, DP_FLD},
is_load : ~instruction.upper_opcode[5] & ~instruction.upper_opcode[3],
is_store : instruction inside {SB, SH, SW} | CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FSW, DP_FSD},
is_fence : instruction inside {FENCE},
is_fence : ~instruction.fn3[1] & instruction.upper_opcode[3],
nontrivial_fence : nontrivial_fence,
is_cbo : CONFIG.INCLUDE_CBO & instruction inside {CBO_INVAL, CBO_CLEAN, CBO_FLUSH},
is_fpu : CONFIG.INCLUDE_UNIT.FPU & instruction inside {SP_FLW, SP_FSW, DP_FLD, DP_FSD},
is_double : CONFIG.INCLUDE_UNIT.FPU & instruction inside {DP_FLD, DP_FSD},
offset : instruction[5] ? store_offset : ((CONFIG.INCLUDE_CBO & instruction[2]) ? '0 : load_offset)
cbo_type : cbo_t'(instruction[21:20]),
is_fpu : CONFIG.INCLUDE_UNIT.FPU & instruction.upper_opcode[3:2] == 2'b01,
is_double : CONFIG.INCLUDE_UNIT.FPU & instruction.fn3[1:0] == 2'b11,
is_amo : CONFIG.INCLUDE_AMO & instruction.upper_opcode[3] & instruction.upper_opcode[5],
amo_type : amo_t'(instruction[31:27]),
rd_zero : ~|instruction.rd_addr,
offset : (CONFIG.INCLUDE_CBO | CONFIG.INCLUDE_AMO) & instruction[3] ? '0 : (instruction[5] ? store_offset : load_offset)
};
assign decode_is_store = decode_attr.is_store | decode_attr.is_cbo;
assign decode_is_store = decode_attr.is_store | decode_attr.is_cbo; //Must be exact
always_ff @(posedge clk) begin
if (issue_stage_ready)
@ -238,8 +304,36 @@ module load_store_unit
);
////////////////////////////////////////////////////
//Alignment Exception
generate if (CONFIG.INCLUDE_M_MODE) begin : gen_ls_exceptions
//CSR Permissions
//Can impact fences, atomic instructions, and CBO
logic fiom;
logic fiom_amo_hold;
generate if (CONFIG.MODES inside {MU, MSU}) begin : gen_csr_env
//Fence on IO implies memory; force all fences to be nontrivial for simplicity
always_comb begin
if (CONFIG.MODES == MU)
fiom = current_privilege == USER_PRIVILEGE & menvcfg.fiom;
else
fiom = (current_privilege != MACHINE_PRIVILEGE & menvcfg.fiom) | (current_privilege == USER_PRIVILEGE & senvcfg.fiom);
end
//AMO instructions AQ-RL consider all memory regions; force write drain for simplicity
logic fiom_amo_hold_r;
logic set_fiom_amo_hold;
assign set_fiom_amo_hold = lsq.load_valid & shared_inputs.amo & fiom & write_outstanding;
assign fiom_amo_hold = set_fiom_amo_hold | fiom_amo_hold_r;
always_ff @(posedge clk) begin
if (rst | ~write_outstanding)
fiom_amo_hold_r <= 0;
else
fiom_amo_hold_r <= fiom_amo_hold_r | set_fiom_amo_hold;
end
end endgenerate
////////////////////////////////////////////////////
//Exceptions
generate if (CONFIG.MODES != BARE) begin : gen_ls_exceptions
logic new_exception;
always_comb begin
if (issue_stage.fn3 == LS_H_fn3 | issue_stage.fn3 == L_HU_fn3)
@ -254,53 +348,103 @@ module load_store_unit
unaligned_addr = 0;
end
assign new_exception = unaligned_addr & issue.new_request & ~issue_attr.is_fence;
logic menv_illegal;
logic senv_illegal;
assign menv_illegal = CONFIG.INCLUDE_CBO & (issue_attr.is_cbo & issue_attr.cbo_type == INVAL ? menvcfg.cbie == 2'b00 : ~menvcfg.cbcfe);
assign senv_illegal = CONFIG.INCLUDE_CBO & (issue_attr.is_cbo & issue_attr.cbo_type == INVAL ? senvcfg.cbie == 2'b00 : ~senvcfg.cbcfe);
assign illegal_cbo = CONFIG.MODES == MU ? current_privilege == USER_PRIVILEGE & menv_illegal : (current_privilege != MACHINE_PRIVILEGE & menv_illegal) | (current_privilege == USER_PRIVILEGE & senv_illegal);
assign nomatch_fault = tlb.done & ~|sub_unit_address_match;
assign late_exception = tlb.is_fault | nomatch_fault;
//Hold writeback exceptions until they are ready to retire
logic rd_zero_r;
logic delay_exception;
logic delayed_exception;
assign delay_exception = (
(issue.new_request & unaligned_addr & (issue_attr.is_load | issue_attr.is_amo) & issue.id != retire_id & ~issue_attr.rd_zero) |
(late_exception & tlb_lq & exception_id != retire_id & ~rd_zero_r)
);
always_ff @(posedge clk) begin
if (rst)
delayed_exception <= 0;
else if (delay_exception)
delayed_exception <= 1;
else if (new_exception)
delayed_exception <= 0;
end
assign new_exception = (
(issue.new_request & ((unaligned_addr & issue_attr.is_store) | illegal_cbo)) |
(issue.new_request & unaligned_addr & (issue_attr.is_load | issue_attr.is_amo) & (issue.id == retire_id | issue_attr.rd_zero)) |
(late_exception & ~tlb_lq) |
(late_exception & tlb_lq & (exception_id == retire_id | rd_zero_r)) |
(delayed_exception & exception_id == retire_id)
);
always_ff @(posedge clk) begin
if (rst)
exception.valid <= 0;
else
exception.valid <= (exception.valid & ~exception.ack) | new_exception;
exception.valid <= new_exception;
end
logic is_load;
logic is_load_r;
assign is_load = issue_attr.is_load & ~(issue_attr.is_amo & issue_attr.amo_type != AMO_LR_FN5);
always_ff @(posedge clk) begin
if (rst)
exception_is_fp <= 0;
else if (new_exception)
exception_lsq_push <= issue.new_request & ((unaligned_addr & ~issue_attr.is_fence & ~issue_attr.is_cbo) | illegal_cbo);
if (issue.new_request) begin
rd_zero_r <= issue_attr.rd_zero;
exception_is_fp <= CONFIG.INCLUDE_UNIT.FPU & issue_attr.is_fpu;
end
always_ff @(posedge clk) begin
if (new_exception & ~exception.valid) begin
exception.code <= issue_attr.is_store ? STORE_AMO_ADDR_MISSALIGNED : LOAD_ADDR_MISSALIGNED;
exception.tval <= virtual_address;
exception.id <= issue.id;
is_load_r <= is_load;
if (illegal_cbo) begin
exception.code <= ILLEGAL_INST;
exception.tval <= issue_stage.instruction;
end else begin
exception.code <= is_load ? LOAD_ADDR_MISSALIGNED : STORE_AMO_ADDR_MISSALIGNED;
exception.tval <= virtual_address;
end
exception_id <= issue.id;
end
else if (tlb.is_fault)
exception.code <= is_load_r ? LOAD_PAGE_FAULT : STORE_OR_AMO_PAGE_FAULT;
else if (nomatch_fault)
exception.code <= is_load_r ? LOAD_FAULT : STORE_AMO_FAULT;
end
assign exception.possible = (tlb_request_r & (~tlb.done | ~|sub_unit_address_match)) | exception.valid | delayed_exception; //Must suppress issue for issue-time exceptions too
assign exception.pc = issue_stage.pc_r;
assign exception.discard = tlb_lq & ~rd_zero_r;
always_ff @(posedge clk) begin
if (rst)
load_exception_complete <= 0;
else
load_exception_complete <= exception.valid & exception.ack & (exception.code == LOAD_ADDR_MISSALIGNED);
end
assign exception_is_store = ~tlb_lq;
end endgenerate
////////////////////////////////////////////////////
//Load-Store status
assign load_store_status = '{
sq_empty : lsq.sq_empty,
no_released_stores_pending : lsq.no_released_stores_pending,
idle : lsq.empty & (~load_attributes.valid) & (&unit_ready)
outstanding_store : ~lsq.sq_empty | write_outstanding,
idle : lsq.empty & (~load_attributes.valid) & (&unit_ready) & (~write_outstanding)
};
////////////////////////////////////////////////////
//TLB interface
//Address calculation
assign virtual_address = rf[RS1] + 32'(signed'(issue_attr.offset));
////////////////////////////////////////////////////
//TLB interface
always_ff @(posedge clk) begin
if (rst)
tlb_request_r <= 0;
else if (tlb.new_request)
tlb_request_r <= 1;
else if (tlb.done | tlb.is_fault)
tlb_request_r <= 0;
end
assign tlb.rnw = issue_attr.is_load | (issue_attr.is_amo & issue_attr.amo_type == AMO_LR_FN5) | issue_attr.is_cbo;
assign tlb.virtual_address = virtual_address;
assign tlb.new_request = tlb_on & issue.new_request;
assign tlb.execute = 0;
assign tlb.rnw = issue_attr.is_load & ~issue_attr.is_store;
assign tlb.new_request = issue.new_request & ~issue_attr.is_fence & (~unaligned_addr | issue_attr.is_cbo) & ~illegal_cbo;
////////////////////////////////////////////////////
//Byte enable generation
@ -318,18 +462,22 @@ module load_store_unit
end
default : be = '1;
endcase
if (issue_attr.is_cbo) //Treat CBOM as writes that don't do anything
be = '0;
end
////////////////////////////////////////////////////
//Load Store Queue
assign lsq.data_in = '{
addr : tlb_on ? tlb.physical_address : virtual_address,
offset : virtual_address[11:0],
fn3 : issue_stage.fn3,
be : be,
data : rf[RS2],
load : issue_attr.is_load,
load : issue_attr.is_load | issue_attr.is_amo,
store : issue_attr.is_store,
cache_op : issue_attr.is_cbo,
amo : issue_attr.is_amo,
amo_type : issue_attr.amo_type,
id : issue.id,
id_needed : rd_attributes.id,
fp : issue_attr.is_fpu,
@ -338,7 +486,7 @@ module load_store_unit
};
assign lsq.potential_push = issue.possible_issue;
assign lsq.push = issue.new_request & ~unaligned_addr & (~tlb_on | tlb.done) & ~issue_attr.is_fence;
assign lsq.push = issue.new_request & ~issue_attr.is_fence;
load_store_queue # (.CONFIG(CONFIG)) lsq_block (
.clk (clk),
@ -355,48 +503,67 @@ module load_store_unit
assign lsq.load_pop = sub_unit_load_issue;
assign lsq.store_pop = sub_unit_store_issue;
//Physical address passed separately
assign lsq.addr_push = tlb.done | tlb.is_fault | exception_lsq_push;
assign lsq.addr_data_in = '{
addr : tlb.physical_address[31:12],
rnw : tlb_lq,
discard : late_exception | exception_lsq_push,
subunit : padded_subunit_id
};
always_ff @(posedge clk) begin
if (issue.new_request)
tlb_lq <= ~issue_attr.is_store & ~issue_attr.is_cbo;
end
////////////////////////////////////////////////////
//Unit tracking
always_ff @ (posedge clk) begin
if (load_attributes.push)
last_unit <= sub_unit_address_match;
last_unit <= subunit_id;
end
//When switching units, ensure no outstanding loads so that there can be no timing collisions with results
assign unit_switch = lsq.load_valid & (sub_unit_address_match != last_unit) & load_attributes.valid;
assign unit_switch = lsq.load_valid & (subunit_id != last_unit) & load_attributes.valid;
always_ff @ (posedge clk) begin
unit_switch_in_progress <= (unit_switch_in_progress | unit_switch) & ~load_attributes.valid;
end
assign unit_switch_hold = unit_switch | unit_switch_in_progress;
assign unit_switch_hold = unit_switch | unit_switch_in_progress | fiom_amo_hold;
////////////////////////////////////////////////////
//Primary Control Signals
assign sel_load = lsq.load_valid;
assign sub_unit_ready = unit_ready[subunit_id] & (~unit_switch_hold);
assign load_complete = |unit_data_valid;
assign load_response = |unit_data_valid;
assign load_complete = load_response & (~exception.valid | exception_is_store);
assign issue.ready = (~tlb_on | tlb.ready) & (~lsq.full) & (~fence_hold) & (~exception.valid);
//TLB status and exceptions can be ignored because they will prevent instructions from issuing
assign issue.ready = ~lsq.full & ~fence_hold;
assign sub_unit_load_issue = sel_load & lsq.load_valid & sub_unit_ready & sub_unit_address_match[subunit_id];
assign sub_unit_store_issue = (lsq.store_valid & ~sel_load) & sub_unit_ready & sub_unit_address_match[subunit_id];
assign sub_unit_load_issue = sel_load & lsq.load_valid & sub_unit_ready;
assign sub_unit_store_issue = (lsq.store_valid & ~sel_load) & sub_unit_ready;
assign sub_unit_issue = sub_unit_load_issue | sub_unit_store_issue;
assign write_outstanding = |unit_write_outstanding;
always_ff @ (posedge clk) begin
if (rst)
fence_hold <= 0;
else
fence_hold <= (fence_hold & ~load_store_status.idle) | (issue.new_request & issue_attr.is_fence);
fence_hold <= (fence_hold & ~load_store_status.idle) | (issue.new_request & issue_attr.is_fence & (issue_attr.nontrivial_fence | fiom));
end
////////////////////////////////////////////////////
//Load attributes FIFO
logic [1:0] final_mux_sel;
assign subunit_id = shared_inputs.subunit[NUM_SUB_UNITS_W-1:0];
one_hot_to_integer #(NUM_SUB_UNITS)
sub_unit_select (
.one_hot (sub_unit_address_match),
.int_out (subunit_id)
.int_out (padded_subunit_id[NUM_SUB_UNITS_W-1:0])
);
always_comb begin
@ -431,7 +598,7 @@ module load_store_unit
////////////////////////////////////////////////////
//Unit Instantiation
generate for (genvar i=0; i < NUM_SUB_UNITS; i++) begin : gen_load_store_sources
assign sub_unit[i].new_request = sub_unit_issue & sub_unit_address_match[i];
assign sub_unit[i].new_request = sub_unit_issue & subunit_id == i;
assign sub_unit[i].addr = shared_inputs.addr;
assign sub_unit[i].re = shared_inputs.load;
assign sub_unit[i].we = shared_inputs.store;
@ -445,10 +612,14 @@ module load_store_unit
endgenerate
generate if (CONFIG.INCLUDE_DLOCAL_MEM) begin : gen_ls_local_mem
assign sub_unit_address_match[LOCAL_MEM_ID] = dlocal_mem_addr_utils.address_range_check(shared_inputs.addr);
assign sub_unit_address_match[LOCAL_MEM_ID] = dlocal_mem_addr_utils.address_range_check(tlb.physical_address);
local_mem_sub_unit d_local_mem (
.clk (clk),
.rst (rst),
.write_outstanding (unit_write_outstanding[LOCAL_MEM_ID]),
.amo (shared_inputs.amo),
.amo_type (shared_inputs.amo_type),
.amo_unit (amo_if[LOCAL_MEM_ID]),
.unit (sub_unit[LOCAL_MEM_ID]),
.local_mem (data_bram)
);
@ -456,27 +627,38 @@ module load_store_unit
endgenerate
generate if (CONFIG.INCLUDE_PERIPHERAL_BUS) begin : gen_ls_pbus
assign sub_unit_address_match[BUS_ID] = dpbus_addr_utils.address_range_check(shared_inputs.addr);
if(CONFIG.PERIPHERAL_BUS_TYPE == AXI_BUS)
assign sub_unit_address_match[BUS_ID] = dpbus_addr_utils.address_range_check(tlb.physical_address);
if(CONFIG.PERIPHERAL_BUS_TYPE == AXI_BUS) begin : gen_axi
axi_master axi_bus (
.clk (clk),
.rst (rst),
.write_outstanding (unit_write_outstanding[BUS_ID]),
.m_axi (m_axi),
.size ({1'b0,shared_inputs.fn3[1:0]}),
.amo (shared_inputs.amo),
.amo_type (shared_inputs.amo_type),
.amo_unit (amo_if[BUS_ID]),
.ls (sub_unit[BUS_ID])
); //Lower two bits of fn3 match AXI specification for request size (byte/halfword/word)
else if (CONFIG.PERIPHERAL_BUS_TYPE == WISHBONE_BUS)
wishbone_master wishbone_bus (
end else if (CONFIG.PERIPHERAL_BUS_TYPE == WISHBONE_BUS) begin : gen_wishbone
wishbone_master #(.LR_WAIT(CONFIG.AMO_UNIT.LR_WAIT), .INCLUDE_AMO(CONFIG.INCLUDE_AMO)) wishbone_bus (
.clk (clk),
.rst (rst),
.write_outstanding (unit_write_outstanding[BUS_ID]),
.wishbone (dwishbone),
.amo (shared_inputs.amo),
.amo_type (shared_inputs.amo_type),
.amo_unit (amo_if[BUS_ID]),
.ls (sub_unit[BUS_ID])
);
else if (CONFIG.PERIPHERAL_BUS_TYPE == AVALON_BUS) begin
avalon_master avalon_bus (
end else if (CONFIG.PERIPHERAL_BUS_TYPE == AVALON_BUS) begin : gen_avalon
avalon_master #(.LR_WAIT(CONFIG.AMO_UNIT.LR_WAIT), .INCLUDE_AMO(CONFIG.INCLUDE_AMO)) avalon_bus (
.clk (clk),
.rst (rst),
.m_avalon (m_avalon),
.write_outstanding (unit_write_outstanding[BUS_ID]),
.m_avalon (m_avalon),
.amo (shared_inputs.amo),
.amo_type (shared_inputs.amo_type),
.amo_unit (amo_if[BUS_ID]),
.ls (sub_unit[BUS_ID])
);
end
@ -484,46 +666,53 @@ module load_store_unit
endgenerate
generate if (CONFIG.INCLUDE_DCACHE) begin : gen_ls_dcache
logic load_ready;
logic store_ready;
logic uncacheable_load;
logic uncacheable_store;
logic dcache_load_request;
logic dcache_store_request;
assign sub_unit_address_match[DCACHE_ID] = dcache_addr_utils.address_range_check(shared_inputs.addr);
assign sub_unit_address_match[DCACHE_ID] = dcache_addr_utils.address_range_check(tlb.physical_address);
assign uncacheable_load = CONFIG.DCACHE.USE_NON_CACHEABLE & uncacheable_utils.address_range_check(shared_inputs.addr);
assign uncacheable_store = CONFIG.DCACHE.USE_NON_CACHEABLE & uncacheable_utils.address_range_check(shared_inputs.addr);
assign dcache_load_request = sub_unit_load_issue & sub_unit_address_match[DCACHE_ID];
assign dcache_store_request = sub_unit_store_issue & sub_unit_address_match[DCACHE_ID];
dcache # (.CONFIG(CONFIG))
data_cache (
.clk (clk),
.rst (rst),
.dcache_on (dcache_on),
.l1_request (l1_request),
.l1_response (l1_response),
.sc_complete (sc_complete),
.sc_success (sc_success),
.clear_reservation (clear_reservation),
.amo (),
.uncacheable_load (uncacheable_load),
.uncacheable_store (uncacheable_store),
.is_load (sel_load),
.load_ready (load_ready),
.store_ready (store_ready),
.load_request (dcache_load_request),
.store_request (dcache_store_request),
.ls_load (lsq.load_data_out),
.ls_store (lsq.store_data_out),
.ls (sub_unit[DCACHE_ID])
);
if (CONFIG.DCACHE.USE_EXTERNAL_INVALIDATIONS) begin : gen_full_dcache
dcache_inv #(.CONFIG(CONFIG)) data_cache (
.mem(mem),
.write_outstanding(unit_write_outstanding[DCACHE_ID]),
.amo(shared_inputs.amo),
.amo_type(shared_inputs.amo_type),
.amo_unit(amo_if[DCACHE_ID]),
.uncacheable(uncacheable_load | uncacheable_store),
.cbo(shared_inputs.cache_op),
.ls(sub_unit[DCACHE_ID]),
.load_peek(lsq.load_valid),
.load_addr_peek(lsq.load_data_out.addr),
.*);
end else begin : gen_small_dcache
dcache_noinv #(.CONFIG(CONFIG)) data_cache (
.mem(mem),
.write_outstanding(unit_write_outstanding[DCACHE_ID]),
.amo(shared_inputs.amo),
.amo_type(shared_inputs.amo_type),
.amo_unit(amo_if[DCACHE_ID]),
.uncacheable(uncacheable_load | uncacheable_store),
.cbo(shared_inputs.cache_op),
.ls(sub_unit[DCACHE_ID]),
.load_peek(lsq.load_valid),
.load_addr_peek(lsq.load_data_out.addr),
.*);
end
end
endgenerate
generate if (CONFIG.INCLUDE_AMO) begin : gen_amo
amo_unit #(
.NUM_UNITS(NUM_SUB_UNITS),
.RESERVATION_WORDS(CONFIG.AMO_UNIT.RESERVATION_WORDS)
) amo_inst (
.agents(amo_if),
.*);
end endgenerate
////////////////////////////////////////////////////
//Output Muxing
logic sign_bit_data [4];
@ -581,13 +770,12 @@ module load_store_unit
////////////////////////////////////////////////////
//Output bank
assign wb.rd = final_load_data;
assign wb.done = (load_complete & (~CONFIG.INCLUDE_UNIT.FPU | wb_attr.fp_op == INT_DONE)) | (load_exception_complete & ~exception_is_fp);
//TODO: exceptions seemingly clobber load data if it appears on the same cycle
assign wb.id = load_exception_complete ? exception.id : wb_attr.id;
assign wb.done = (load_complete & (~CONFIG.INCLUDE_UNIT.FPU | wb_attr.fp_op == INT_DONE)) | (exception.valid & ~exception_is_fp & ~exception_is_store);
assign wb.id = exception.valid & ~exception_is_store ? exception_id : wb_attr.id;
assign fp_wb.rd = fp_result;
assign fp_wb.done = (load_complete & (wb_attr.fp_op == SINGLE_DONE | wb_attr.fp_op == DOUBLE_DONE)) | (load_exception_complete & exception_is_fp);
assign fp_wb.id = load_exception_complete ? exception.id : wb_attr.id;
assign fp_wb.done = (load_complete & (wb_attr.fp_op == SINGLE_DONE | wb_attr.fp_op == DOUBLE_DONE)) | (exception.valid & exception_is_fp & ~exception_is_store);
assign fp_wb.id = exception.valid & ~exception_is_store ? exception_id : wb_attr.id;
////////////////////////////////////////////////////
//End of Implementation
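
Editor's sketch, not part of the commit: the `nontrivial_fence` expressions in this file index raw instruction bits. In the RISC-V FENCE encoding, bits 27:24 are the predecessor set (PI, PO, PR, PW) and bits 23:20 are the successor set (SI, SO, SR, SW). The three-subunit case can be restated with named bits as below. The module name and signal names are hypothetical, and the other `NUM_SUB_UNITS` cases from the diff are not covered.

```systemverilog
// Hypothetical illustration only; mirrors the NUM_SUB_UNITS == 3 branch of the diff.
module fence_decode_sketch (
    input  logic [31:0] instruction,
    output logic        nontrivial_fence
);
    logic pred_i, pred_o, pred_r, pred_w;  // predecessor: device input/output, memory read/write
    logic succ_i, succ_o, succ_r, succ_w;  // successor:   device input/output, memory read/write

    assign {pred_i, pred_o, pred_r, pred_w} = instruction[27:24];
    assign {succ_i, succ_o, succ_r, succ_w} = instruction[23:20];

    assign nontrivial_fence =
        (pred_i & (succ_o | succ_w)) |           // peripheral read before any write
        (pred_o & (succ_i | succ_r | succ_w)) |  // peripheral write before anything but a peripheral write
        (pred_r & succ_o) |                      // regular read before peripheral write
        pred_w;                                  // regular write before anything
endmodule
```

Under this restatement, `fence r,r` (encoding `32'h0220000F`) is trivial and needs no stall, while `fence rw,rw` (encoding `32'h0330000F`) is nontrivial because its predecessor set includes memory writes.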

View file

@ -39,6 +39,7 @@ module store_queue
//Address hash (shared by loads and stores)
input addr_hash_t addr_hash,
//hash check on adding a load to the queue
output logic [$clog2(CONFIG.SQ_DEPTH)-1:0] sq_index,
output logic [$clog2(CONFIG.SQ_DEPTH)-1:0] sq_oldest,
@ -73,6 +74,8 @@ module store_queue
logic [CONFIG.SQ_DEPTH-1:0] valid;
logic [CONFIG.SQ_DEPTH-1:0] valid_next;
addr_hash_t [CONFIG.SQ_DEPTH-1:0] hashes;
logic [CONFIG.SQ_DEPTH-1:0] ids_valid;
id_t [CONFIG.SQ_DEPTH-1:0] ids;
//LUTRAM-based memory blocks
sq_entry_t output_entry;
@ -131,7 +134,7 @@ module store_queue
.raddr(sq_oldest_next),
.ram_write(sq.push),
.new_ram_data('{
addr : sq.data_in.addr,
offset : sq.data_in.offset,
be : sq.data_in.be,
cache_op : sq.data_in.cache_op,
data : '0,
@ -151,22 +154,28 @@ module store_queue
.waddr(sq.data_in.id),
.raddr(store_retire.id),
.ram_write(sq.push),
.new_ram_data(sq.data_in.addr[1:0]),
.new_ram_data(sq.data_in.offset[1:0]),
.ram_data_out(retire_alignment)
);
//Compare store addr-hashes against new load addr-hash
//ID collisions also handled to prevent overwriting store data
always_comb begin
potential_store_conflict = 0;
for (int i = 0; i < CONFIG.SQ_DEPTH; i++)
for (int i = 0; i < CONFIG.SQ_DEPTH; i++) begin
potential_store_conflict |= {(valid[i] & ~issued_one_hot[i]), addr_hash} == {1'b1, hashes[i]};
potential_store_conflict |= {(valid[i] & ~issued_one_hot[i] & ids_valid[i]), sq.data_in.id} == {1'b1, ids[i]};
end
end
////////////////////////////////////////////////////
//Register-based storage
//Address hashes
always_ff @ (posedge clk) begin
for (int i = 0; i < CONFIG.SQ_DEPTH; i++) begin
if (new_request_one_hot[i])
if (new_request_one_hot[i]) begin
hashes[i] <= addr_hash;
ids[i] <= sq.data_in.id_needed;
ids_valid[i] <= CONFIG.INCLUDE_UNIT.FPU & sq.data_in.fp ? |fp_store_forward_wb_group : |store_forward_wb_group;
end
end
end
////////////////////////////////////////////////////
@ -178,8 +187,6 @@ module store_queue
released_count <= released_count + (LOG2_SQ_DEPTH + 1)'(store_retire.valid) - (LOG2_SQ_DEPTH + 1)'(sq.pop);
end
assign sq.no_released_stores_pending = ~|released_count;
////////////////////////////////////////////////////
//Forwarding and Store Data
//Forwarding is only needed from multi-cycle writeback ports
@ -308,7 +315,7 @@ module store_queue
assign sq.valid = |released_count;
assign sq.data_out = '{
addr : output_entry_r.addr,
offset : output_entry_r.offset,
be : output_entry_r.be,
cache_op : output_entry_r.cache_op,
data : sq_data_out[31:0],
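
//--------------------------------------------------
//Editorial sketch, not part of this commit.
//The conflict check in the store_queue hunks above packs each comparison into a
//concatenation of the form {qualifier, value} == {1'b1, stored}. The standalone
//module below spells out the same two conditions with explicit equality tests.
//The module name, port names, and parameter defaults are hypothetical and chosen
//only for illustration; the real design takes these signals from CONFIG,
//sq.data_in, and the queue's internal registers.
module sq_conflict_sketch #(
    parameter int DEPTH = 4,  //stand-in for CONFIG.SQ_DEPTH
    parameter int HASH_W = 4, //stand-in for $bits(addr_hash_t)
    parameter int ID_W = 3    //stand-in for $bits(id_t)
) (
    input logic [DEPTH-1:0] valid,
    input logic [DEPTH-1:0] issued_one_hot,
    input logic [DEPTH-1:0] ids_valid,
    input logic [DEPTH-1:0][HASH_W-1:0] hashes,
    input logic [DEPTH-1:0][ID_W-1:0] ids,
    input logic [HASH_W-1:0] addr_hash,
    input logic [ID_W-1:0] new_id,
    output logic potential_store_conflict
);
    always_comb begin
        potential_store_conflict = 0;
        for (int i = 0; i < DEPTH; i++) begin
            //A store that is valid and not being issued matches the new address hash
            potential_store_conflict |= (valid[i] & ~issued_one_hot[i]) & (hashes[i] == addr_hash);
            //A store that is valid, not being issued, and has a tracked ID matches the incoming ID
            potential_store_conflict |= (valid[i] & ~issued_one_hot[i] & ids_valid[i]) & (ids[i] == new_id);
        end
    end
endmodule
//Both forms should synthesize to the same logic; the concatenation form in the
//diff simply folds the qualifier bit into a single comparator.
//--------------------------------------------------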

core/execution_units/mul_unit.sv Executable file → Normal file

core/fetch_stage/branch_predictor.sv Executable file → Normal file

@ -45,7 +45,7 @@ module branch_predictor
localparam longint BUS_RANGE = 64'(CONFIG.IBUS_ADDR.H) - 64'(CONFIG.IBUS_ADDR.L) + 1;
function int get_memory_width();
if(CONFIG.INCLUDE_S_MODE)
if(CONFIG.MODES == MSU)
return 32;
else if (CONFIG.INCLUDE_ICACHE && (
(CONFIG.INCLUDE_ILOCAL_MEM && CACHE_RANGE > SCRATCH_RANGE) ||
@ -66,6 +66,7 @@ module branch_predictor
localparam BTAG_W = get_memory_width() - BRANCH_ADDR_W - 2;
cache_functions_interface #(.TAG_W(BTAG_W), .LINE_W(BRANCH_ADDR_W), .SUB_LINE_W(0)) addr_utils();
typedef logic[1:0] branch_predictor_metadata_t;
typedef struct packed {
logic valid;
logic [BTAG_W-1:0] tag;
@ -102,60 +103,51 @@ module branch_predictor
/////////////////////////////////////////
genvar i;
generate if (CONFIG.INCLUDE_BRANCH_PREDICTOR)
for (i=0; i<CONFIG.BP.WAYS; i++) begin : gen_branch_tag_banks
dual_port_bram #(.WIDTH($bits(branch_table_entry_t)), .LINES(CONFIG.BP.ENTRIES))
tag_bank (
.clk (clk),
.en_a (tag_update_way[i]),
.wen_a (tag_update_way[i]),
.addr_a (addr_utils.getHashedLineAddr(br_results.pc, i)),
.data_in_a (ex_entry),
.data_out_a (),
.en_b (bp.new_mem_request),
.wen_b (0),
.addr_b (addr_utils.getHashedLineAddr(bp.next_pc, i)),
.data_in_b ('0),
.data_out_b (if_entry[i]));
end
endgenerate
generate if (CONFIG.INCLUDE_BRANCH_PREDICTOR) begin : gen_bp
for (i=0; i<CONFIG.BP.WAYS; i++) begin : gen_bp_rams
sdp_ram #(
.ADDR_WIDTH(BRANCH_ADDR_W),
.NUM_COL(1),
.COL_WIDTH($bits(branch_table_entry_t)),
.PIPELINE_DEPTH(0)
) tag_bank (
.a_en(tag_update_way[i]),
.a_wbe(tag_update_way[i]),
.a_wdata(ex_entry),
.a_addr(addr_utils.getHashedLineAddr(br_results.pc, i)),
.b_en(bp.new_mem_request),
.b_addr(addr_utils.getHashedLineAddr(bp.next_pc, i)),
.b_rdata(if_entry[i]),
.*);
generate if (CONFIG.INCLUDE_BRANCH_PREDICTOR)
for (i=0; i<CONFIG.BP.WAYS; i++) begin : gen_branch_table_banks
dual_port_bram #(.WIDTH(32), .LINES(CONFIG.BP.ENTRIES))
addr_table (
.clk (clk),
.en_a (target_update_way[i]),
.wen_a (target_update_way[i]),
.addr_a (addr_utils.getHashedLineAddr(br_results.pc, i)),
.data_in_a (br_results.target_pc),
.data_out_a (),
.en_b (bp.new_mem_request),
.wen_b (0),
.addr_b (addr_utils.getHashedLineAddr(bp.next_pc, i)),
.data_in_b ('0),
.data_out_b (predicted_pc[i])
);
end
endgenerate
sdp_ram #(
.ADDR_WIDTH(BRANCH_ADDR_W),
.NUM_COL(1),
.COL_WIDTH(32),
.PIPELINE_DEPTH(0)
) addr_table (
.a_en(target_update_way[i]),
.a_wbe(target_update_way[i]),
.a_wdata(br_results.target_pc),
.a_addr(addr_utils.getHashedLineAddr(br_results.pc, i)),
.b_en(bp.new_mem_request),
.b_addr(addr_utils.getHashedLineAddr(bp.next_pc, i)),
.b_rdata(predicted_pc[i]),
.*);
generate if (CONFIG.INCLUDE_BRANCH_PREDICTOR)
for (i=0; i<CONFIG.BP.WAYS; i++) begin : gen_branch_hit_detection
assign tag_matches[i] = ({if_entry[i].valid, if_entry[i].tag} == {1'b1, addr_utils.getTag(bp.if_pc)});
end
end
endgenerate
////////////////////////////////////////////////////
//Instruction Fetch Response
generate if (CONFIG.BP.WAYS > 1)
one_hot_to_integer #(CONFIG.BP.WAYS)
hit_way_conv (
.one_hot(tag_matches),
.int_out(hit_way)
);
else
assign hit_way = 0;
endgenerate
one_hot_to_integer #(CONFIG.BP.WAYS)
hit_way_conv (
.one_hot(tag_matches),
.int_out(hit_way)
);
assign tag_match = |tag_matches;
assign use_predicted_pc = CONFIG.INCLUDE_BRANCH_PREDICTOR & tag_match;

core/fetch_stage/fetch.sv Executable file → Normal file

@ -36,7 +36,6 @@ module fetch
input logic branch_flush,
input gc_outputs_t gc,
input logic exception,
//ID Support
input id_t pc_id,
@ -58,8 +57,7 @@ module fetch
local_memory_interface.master instruction_bram,
wishbone_interface.master iwishbone,
input logic icache_on,
l1_arbiter_request_interface.master l1_request,
l1_arbiter_return_interface.master l1_response
mem_interface.ro_master mem
);
localparam NUM_SUB_UNITS = int'(CONFIG.INCLUDE_ILOCAL_MEM) + int'(CONFIG.INCLUDE_ICACHE) + int'(CONFIG.INCLUDE_IBUS);
@ -77,6 +75,7 @@ module fetch
addr_utils_interface #(CONFIG.IBUS_ADDR.L, CONFIG.IBUS_ADDR.H) ibus_addr_utils ();
memory_sub_unit_interface sub_unit[NUM_SUB_UNITS-1:0]();
amo_interface unused();
logic [NUM_SUB_UNITS-1:0] sub_unit_address_match;
logic [NUM_SUB_UNITS-1:0] unit_ready;
@ -89,6 +88,7 @@ module fetch
typedef struct packed{
logic is_predicted_branch_or_jump;
logic is_branch;
logic [31:0] early_flush_pc;
logic address_valid;
logic mmu_fault;
logic [NUM_SUB_UNITS_W-1:0] subunit_id;
@ -102,8 +102,9 @@ module fetch
logic [31:0] pc_plus_4;
logic [31:0] pc_mux [4];
logic [1:0] pc_sel;
logic [31:0] early_flush_pc;
logic [31:0] pc_mux [5];
logic [2:0] pc_sel;
logic [31:0] next_pc;
logic [31:0] pc;
@ -130,15 +131,16 @@ module fetch
assign pc_plus_4 = pc + 4;
priority_encoder #(.WIDTH(4))
priority_encoder #(.WIDTH(5))
pc_sel_encoder (
.priority_vector ({1'b1, (bp.use_prediction & ~early_branch_flush), branch_flush, gc.pc_override}),
.priority_vector ({1'b1, bp.use_prediction, early_branch_flush, branch_flush, gc.pc_override}),
.encoded_result (pc_sel)
);
assign pc_mux[0] = gc.pc;
assign pc_mux[1] = bp.branch_flush_pc;
assign pc_mux[2] = bp.is_return ? ras.addr : bp.predicted_pc;
assign pc_mux[3] = pc_plus_4;
assign pc_mux[2] = early_flush_pc;
assign pc_mux[3] = bp.is_return ? ras.addr : bp.predicted_pc;
assign pc_mux[4] = pc_plus_4;
assign next_pc = pc_mux[pc_sel];
//If an exception occurs here in the fetch logic,
@ -147,7 +149,7 @@ module fetch
always_ff @(posedge clk) begin
if (flush_or_rst)
exception_pending <= 0;
else if (tlb.is_fault | (new_mem_request & ~address_valid))
else if ((tlb.is_fault & ~fetch_attr_fifo.full) | (new_mem_request & ~address_valid))
exception_pending <= 1;
end
@ -170,16 +172,15 @@ module fetch
////////////////////////////////////////////////////
//TLB
assign tlb.virtual_address = pc;
assign tlb.execute = 1;
assign tlb.rnw = 0;
assign tlb.new_request = tlb.ready;
assign tlb.rnw = 1;
assign tlb.new_request = tlb.ready & pc_id_available & ~fetch_attr_fifo.full & (~exception_pending) & (~gc.fetch_hold);
//////////////////////////////////////////////
//Issue Control Signals
assign flush_or_rst = (rst | gc.fetch_flush | early_branch_flush);
assign new_mem_request = tlb.done & pc_id_available & ~fetch_attr_fifo.full & units_ready & (~gc.fetch_hold) & (~exception_pending);
assign pc_id_assigned = new_mem_request | tlb.is_fault;
assign new_mem_request = tlb.done & units_ready & ~gc.fetch_hold & ~fetch_attr_fifo.full;
assign pc_id_assigned = new_mem_request | (tlb.is_fault & ~fetch_attr_fifo.full);
//////////////////////////////////////////////
//Subunit Tracking
@ -192,6 +193,7 @@ module fetch
assign fetch_attr_fifo.data_in = '{
is_predicted_branch_or_jump : bp.use_prediction,
is_branch : (bp.use_prediction & bp.is_branch),
early_flush_pc : pc_plus_4,
address_valid : address_valid,
mmu_fault : tlb.is_fault,
subunit_id : subunit_id
@ -207,19 +209,20 @@ module fetch
.fifo (fetch_attr_fifo)
);
assign fetch_attr = fetch_attr_fifo.data_out;
assign early_flush_pc = fetch_attr.early_flush_pc;
assign inflight_count_next = inflight_count + MAX_OUTSTANDING_REQUESTS_W'(fetch_attr_fifo.push) - MAX_OUTSTANDING_REQUESTS_W'(fetch_attr_fifo.pop);
always_ff @(posedge clk) begin
if (rst)
inflight_count <= 0;
else
inflight_count <= inflight_count_next;
inflight_count <= inflight_count_next;
end
always_ff @(posedge clk) begin
if (rst)
flush_count <= 0;
else if (gc.fetch_flush)
else if (gc.fetch_flush | early_branch_flush)
flush_count <= inflight_count_next;
else if (|flush_count & fetch_attr_fifo.pop)
flush_count <= flush_count - 1;
@ -231,7 +234,7 @@ module fetch
//for any sub unit. That request can either be completed or aborted.
//In either case, data_valid must NOT be asserted.
generate for (i=0; i < NUM_SUB_UNITS; i++) begin : gen_fetch_sources
assign sub_unit[i].new_request = fetch_attr_fifo.push & sub_unit_address_match[i];
assign sub_unit[i].new_request = fetch_attr_fifo.push & sub_unit_address_match[i] & ~tlb.is_fault;
assign sub_unit[i].addr = tlb.physical_address;
assign sub_unit[i].re = 1;
assign sub_unit[i].we = 0;
@ -249,6 +252,10 @@ module fetch
local_mem_sub_unit i_local_mem (
.clk (clk),
.rst (rst),
.write_outstanding (),
.amo (1'b0),
.amo_type ('x),
.amo_unit (unused),
.unit (sub_unit[LOCAL_MEM_ID]),
.local_mem (instruction_bram)
);
@ -260,6 +267,10 @@ module fetch
wishbone_master iwishbone_bus (
.clk (clk),
.rst (rst),
.write_outstanding (),
.amo (1'b0),
.amo_type ('x),
.amo_unit (unused),
.wishbone (iwishbone),
.ls (sub_unit[BUS_ID])
);
@ -267,19 +278,37 @@ module fetch
endgenerate
generate if (CONFIG.INCLUDE_ICACHE) begin : gen_fetch_icache
////////////////////////////////////////////////////
//Instruction fence
//A fence first prevents any new instructions from being issued then waits for inflight fetches to complete
//The fence signal can only be delivered to the icache once it is idle
//This logic will be optimized away when instruction fences aren't enabled as gc.fetch_ifence will be constant 0
logic ifence_pending;
logic ifence_start;
assign ifence_start = ifence_pending & ~|inflight_count_next;
always_ff @(posedge clk) begin
if (rst)
ifence_pending <= 0;
else begin
if (gc.fetch_ifence)
ifence_pending <= 1;
else if (~|inflight_count_next)
ifence_pending <= 0;
end
end
assign sub_unit_address_match[ICACHE_ID] = icache_addr_utils.address_range_check(tlb.physical_address);
icache #(.CONFIG(CONFIG))
i_cache (
.clk (clk),
.rst (rst),
.gc (gc),
.ifence (ifence_start),
.icache_on (icache_on),
.l1_request (l1_request),
.l1_response (l1_response),
.mem (mem),
.fetch_sub (sub_unit[ICACHE_ID])
);
end
endgenerate
end endgenerate
assign units_ready = &unit_ready;
assign address_valid = |sub_unit_address_match;
@ -287,25 +316,25 @@ module fetch
////////////////////////////////////////////////////
//Instruction metadata updates
logic valid_fetch_result;
assign valid_fetch_result = CONFIG.INCLUDE_M_MODE ? (fetch_attr_fifo.valid & fetch_attr.address_valid & (~fetch_attr.mmu_fault)) : 1;
assign valid_fetch_result = CONFIG.MODES != BARE ? (fetch_attr_fifo.valid & fetch_attr.address_valid & (~fetch_attr.mmu_fault)) : 1;
assign if_pc = pc;
assign fetch_metadata.ok = valid_fetch_result;
assign fetch_metadata.error_code = INST_ACCESS_FAULT;
assign fetch_metadata.error_code = fetch_attr.mmu_fault ? INST_PAGE_FAULT : INST_ACCESS_FAULT;
assign fetch_instruction = unit_data_array[fetch_attr.subunit_id];
assign internal_fetch_complete = fetch_attr_fifo.valid & (fetch_attr.address_valid ? |unit_data_valid : ~valid_fetch_result);//allow instruction to propagate to decode if address is invalid
assign internal_fetch_complete = fetch_attr_fifo.valid & (~valid_fetch_result | |unit_data_valid);//allow instruction to propagate to decode if address is invalid
assign fetch_complete = internal_fetch_complete & ~|flush_count;
////////////////////////////////////////////////////
//Branch Predictor corruption check
//Needed if instruction memory is changed after any branches have been executed
generate if (CONFIG.INCLUDE_IFENCE | CONFIG.INCLUDE_S_MODE) begin : gen_branch_corruption_check
generate if (CONFIG.INCLUDE_IFENCE | CONFIG.MODES == MSU) begin : gen_branch_corruption_check
logic is_branch_or_jump;
assign is_branch_or_jump = fetch_instruction[6:2] inside {JAL_T, JALR_T, BRANCH_T};
assign early_branch_flush = (valid_fetch_result & (|unit_data_valid)) & fetch_attr.is_predicted_branch_or_jump & (~is_branch_or_jump);
assign early_branch_flush_ras_adjust = (valid_fetch_result & (|unit_data_valid)) & fetch_attr.is_branch & (~is_branch_or_jump);
assign early_branch_flush = (valid_fetch_result & (|unit_data_valid)) & fetch_attr.is_predicted_branch_or_jump & (~is_branch_or_jump) & (~|flush_count);
assign early_branch_flush_ras_adjust = (valid_fetch_result & (|unit_data_valid)) & fetch_attr.is_branch & (~is_branch_or_jump) & (~|flush_count);
end endgenerate
////////////////////////////////////////////////////
//End of Implementation
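
//--------------------------------------------------
//Editorial sketch, not part of this commit.
//The fetch.sv hunks above widen the next-PC mux from four to five sources by
//giving early_branch_flush its own entry. The module below expresses the same
//selection as an if/else chain. It ASSUMES that priority_encoder returns the
//index of the lowest set bit of priority_vector, so that gc.pc_override (bit 0)
//wins and the constant 1'b1 (bit 4) acts as the pc+4 default; that behaviour is
//not shown in this diff. The module and port names are hypothetical.
module pc_sel_sketch (
    input logic pc_override,        //gc.pc_override
    input logic branch_flush,
    input logic early_branch_flush,
    input logic use_prediction,     //bp.use_prediction
    input logic [31:0] gc_pc,           //pc_mux[0]
    input logic [31:0] branch_flush_pc, //pc_mux[1]
    input logic [31:0] early_flush_pc,  //pc_mux[2]
    input logic [31:0] predicted_pc,    //pc_mux[3], already muxed with the RAS address
    input logic [31:0] pc_plus_4,       //pc_mux[4]
    output logic [31:0] next_pc
);
    always_comb begin
        if (pc_override)
            next_pc = gc_pc;
        else if (branch_flush)
            next_pc = branch_flush_pc;
        else if (early_branch_flush)
            next_pc = early_flush_pc;
        else if (use_prediction)
            next_pc = predicted_pc;
        else
            next_pc = pc_plus_4;
    end
endmodule
//Under the stated assumption, an early branch flush now outranks a prediction
//directly in the encoder, which replaces the old masking term
//(bp.use_prediction & ~early_branch_flush).
//--------------------------------------------------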

core/fetch_stage/icache.sv Executable file → Normal file

@ -33,10 +33,9 @@ module icache
(
input logic clk,
input logic rst,
input gc_outputs_t gc,
input logic ifence,
input logic icache_on,
l1_arbiter_request_interface.master l1_request,
l1_arbiter_return_interface.master l1_response,
mem_interface.ro_master mem,
memory_sub_unit_interface.responder fetch_sub
);
@ -46,6 +45,9 @@ module icache
cache_functions_interface #(.TAG_W(SCONFIG.TAG_W), .LINE_W(SCONFIG.LINE_ADDR_W), .SUB_LINE_W(SCONFIG.SUB_LINE_ADDR_W)) addr_utils();
logic ifence_in_progress;
logic[SCONFIG.LINE_ADDR_W-1:0] ifence_counter;
logic tag_hit;
logic [CONFIG.ICACHE.WAYS-1:0] tag_hit_way;
@ -59,7 +61,7 @@ module icache
logic line_complete;
logic [31:0] data_out [CONFIG.ICACHE.WAYS-1:0];
logic [CONFIG.ICACHE.WAYS-1:0][31:0] data_out;
logic linefill_in_progress;
logic request_in_progress;
@ -94,6 +96,29 @@ module icache
.rst (rst),
.fifo (input_fifo)
);
////////////////////////////////////////////////////
//Instruction fence
generate if (CONFIG.INCLUDE_IFENCE) begin : gen_ifence
always_ff @(posedge clk) begin
if (rst) begin
ifence_counter <= '0;
ifence_in_progress <= 0;
end else begin
if (ifence)
ifence_in_progress <= 1;
else if (&ifence_counter)
ifence_in_progress <= 0;
if (ifence_in_progress)
ifence_counter <= ifence_counter+1;
end
end
end else begin : gen_no_ifence
assign ifence_in_progress = 0;
assign ifence_counter = '0;
end endgenerate
////////////////////////////////////////////////////
//Ready determination
always_ff @ (posedge clk) begin
@ -103,7 +128,7 @@ module icache
request_in_progress <= (request_in_progress & ~fetch_sub.data_valid) | new_request;
end
assign fetch_sub.ready = ~input_fifo.full;
assign fetch_sub.ready = ~input_fifo.full & ~ifence_in_progress;
////////////////////////////////////////////////////
//General Control Logic
@ -124,7 +149,7 @@ module icache
if (rst)
tag_update <= 0;
else
tag_update <= l1_request.ack;
tag_update <= mem.ack;
end
//Replacement policy is pseudo random
@ -144,22 +169,17 @@ module icache
logic initiate_l1_request;
logic request_r;
assign l1_request.addr = second_cycle_addr;
assign l1_request.data = 0;
assign l1_request.rnw = 1;
assign l1_request.be = 0;
assign l1_request.size = 5'(CONFIG.ICACHE.LINE_W-1);
assign l1_request.is_amo = 0;
assign l1_request.amo = 0;
assign mem.addr = second_cycle_addr[31:2];
assign mem.rlen = 5'(CONFIG.ICACHE.LINE_W-1);
assign initiate_l1_request = second_cycle & (~tag_hit | ~icache_on);
always_ff @ (posedge clk) begin
if (rst)
request_r <= 0;
else
request_r <= (initiate_l1_request | request_r) & ~l1_request.ack;
request_r <= (initiate_l1_request | request_r) & ~mem.ack;
end
assign l1_request.request = request_r;
assign mem.request = request_r;
////////////////////////////////////////////////////
//Miss state tracking
@ -167,7 +187,7 @@ module icache
if (rst)
linefill_in_progress <= 0;
else
linefill_in_progress <= (linefill_in_progress & ~line_complete) | l1_request.ack;
linefill_in_progress <= (linefill_in_progress & ~line_complete) | mem.ack;
end
////////////////////////////////////////////////////
@ -176,6 +196,8 @@ module icache
icache_tag_banks (
.clk(clk),
.rst(rst), //clears the read_hit_allowed flag
.ifence(ifence_in_progress),
.ifence_addr(ifence_counter),
.stage1_line_addr(addr_utils.getTagLineAddr(new_request_addr)),
.stage2_line_addr(addr_utils.getTagLineAddr(second_cycle_addr)),
.stage2_tag(addr_utils.getTag(second_cycle_addr)),
@ -188,22 +210,20 @@ module icache
////////////////////////////////////////////////////
//Data Banks
genvar i;
generate for (i=0; i < CONFIG.ICACHE.WAYS; i++) begin : idata_bank_gen
dual_port_bram #(.WIDTH(32), .LINES(CONFIG.ICACHE.LINES*CONFIG.ICACHE.LINE_W)) idata_bank (
.clk(clk),
.en_a(new_request),
.wen_a(0),
.addr_a(addr_utils.getDataLineAddr(new_request_addr)),
.data_in_a('0),
.data_out_a(data_out[i]),
.en_b(1),
.wen_b(tag_update_way[i] & l1_response.data_valid),
.addr_b(addr_utils.getDataLineAddr({second_cycle_addr[31:SCONFIG.SUB_LINE_ADDR_W+2], word_count, 2'b0})),
.data_in_b(l1_response.data),
.data_out_b()
);
end endgenerate
sdp_ram #(
.ADDR_WIDTH(SCONFIG.LINE_ADDR_W+SCONFIG.SUB_LINE_ADDR_W),
.NUM_COL(CONFIG.ICACHE.WAYS),
.COL_WIDTH(32),
.PIPELINE_DEPTH(0)
) idata_bank (
.a_en(mem.rvalid),
.a_wbe(tag_update_way),
.a_wdata({CONFIG.ICACHE.WAYS{mem.rdata}}),
.a_addr(addr_utils.getDataLineAddr({second_cycle_addr[31:SCONFIG.SUB_LINE_ADDR_W+2], word_count, 2'b0})),
.b_en(new_request),
.b_addr(addr_utils.getDataLineAddr(new_request_addr)),
.b_rdata(data_out),
.*);
////////////////////////////////////////////////////
//Miss data path
@ -214,11 +234,11 @@ module icache
if (rst)
word_count <= 0;
else
word_count <= word_count + SCONFIG.SUB_LINE_ADDR_W'(l1_response.data_valid);
word_count <= word_count + SCONFIG.SUB_LINE_ADDR_W'(mem.rvalid);
end
assign miss_data_valid = request_in_progress & l1_response.data_valid & is_target_word;
assign line_complete = l1_response.data_valid & (word_count == END_OF_LINE_COUNT);
assign miss_data_valid = request_in_progress & mem.rvalid & is_target_word;
assign line_complete = mem.rvalid & (word_count == END_OF_LINE_COUNT);
////////////////////////////////////////////////////
//Output muxing
@ -228,7 +248,7 @@ module icache
logic [31:0] output_array [OMUX_W];
always_comb begin
priority_vector[0] = miss_data_valid;
output_array[0] = l1_response.data;
output_array[0] = mem.rdata;
for (int i = 0; i < CONFIG.ICACHE.WAYS; i++) begin
priority_vector[i+1] = tag_hit_way[i];
output_array[i+1] = data_out[i];
@ -250,11 +270,11 @@ module icache
////////////////////////////////////////////////////
//Assertions
icache_l1_arb_ack_assertion:
assert property (@(posedge clk) disable iff (rst) l1_request.ack |-> l1_request.request)
assert property (@(posedge clk) disable iff (rst) mem.ack |-> mem.request)
else $error("Spurious icache ack received from arbiter!");
icache_l1_arb_data_valid_assertion:
assert property (@(posedge clk) disable iff (rst) l1_response.data_valid |-> linefill_in_progress)
assert property (@(posedge clk) disable iff (rst) mem.rvalid |-> linefill_in_progress)
else $error("Spurious icache data received from arbiter!");
endmodule
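
//--------------------------------------------------
//Editorial sketch, not part of this commit.
//The gen_ifence block in the icache hunks above walks a counter across every
//cache line while ifence_in_progress is set. The module below isolates that
//counter so the timing is easier to follow. The module name and the LINE_ADDR_W
//default are hypothetical; the sketch also ASSUMES ifence is a single-cycle
//pulse that is not reasserted while a sweep is running.
module ifence_sweep_sketch #(
    parameter int LINE_ADDR_W = 4 //stand-in for SCONFIG.LINE_ADDR_W
) (
    input logic clk,
    input logic rst,
    input logic ifence,
    output logic ifence_in_progress,
    output logic [LINE_ADDR_W-1:0] ifence_counter
);
    always_ff @(posedge clk) begin
        if (rst) begin
            ifence_counter <= '0;
            ifence_in_progress <= 0;
        end else begin
            //Start on the pulse, stop once the last line address has been presented
            if (ifence)
                ifence_in_progress <= 1;
            else if (&ifence_counter)
                ifence_in_progress <= 0;
            //The counter only advances during a sweep, and wraps to zero as the sweep ends
            if (ifence_in_progress)
                ifence_counter <= ifence_counter + 1;
        end
    end
endmodule
//Under these assumptions a sweep lasts 2**LINE_ADDR_W cycles, presenting each
//line address once, and the counter is back at zero for the next fence. In the
//diff, fetch_sub.ready is held low for the same interval.
//--------------------------------------------------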

core/fetch_stage/icache_tag_banks.sv Executable file → Normal file

@ -33,6 +33,8 @@ module itag_banks
(
input logic clk,
input logic rst,
input logic ifence,
input logic[SCONFIG.LINE_ADDR_W-1:0] ifence_addr,
input logic[SCONFIG.LINE_ADDR_W-1:0] stage1_line_addr,
input logic[SCONFIG.LINE_ADDR_W-1:0] stage2_line_addr,
@ -49,7 +51,7 @@ module itag_banks
//Valid + tag
typedef logic [SCONFIG.TAG_W : 0] itag_entry_t;
itag_entry_t tag_line[CONFIG.ICACHE.WAYS-1:0];
itag_entry_t[CONFIG.ICACHE.WAYS-1:0] tag_line;
logic hit_allowed;
@ -60,25 +62,25 @@ module itag_banks
hit_allowed <= stage1_adv;
end
genvar i;
generate
for (i=0; i < CONFIG.ICACHE.WAYS; i++) begin : tag_bank_gen
dual_port_bram #(.WIDTH(SCONFIG.TAG_W+1), .LINES(CONFIG.ICACHE.LINES)) itag_bank (.*,
.clk(clk),
.en_a(stage1_adv),
.wen_a('0),
.addr_a(stage1_line_addr),
.data_in_a('0),
.data_out_a(tag_line[i]),
.en_b(update),
.wen_b(update_way[i]),
.addr_b(stage2_line_addr),
.data_in_b({1'b1, stage2_tag}),
.data_out_b()
);
assign tag_hit_way[i] = ({hit_allowed, 1'b1, stage2_tag} == {1'b1, tag_line[i]});
end
endgenerate
sdp_ram_padded #(
.ADDR_WIDTH(SCONFIG.LINE_ADDR_W),
.NUM_COL(CONFIG.ICACHE.WAYS),
.COL_WIDTH(SCONFIG.TAG_W+1),
.PIPELINE_DEPTH(0)
) itag_bank (
.a_en(update | ifence),
.a_wbe(update_way | {CONFIG.ICACHE.WAYS{ifence}}),
.a_wdata({CONFIG.ICACHE.WAYS{~ifence, stage2_tag}}),
.a_addr(ifence ? ifence_addr : stage2_line_addr),
.b_en(stage1_adv),
.b_addr(stage1_line_addr),
.b_rdata(tag_line),
.*);
always_comb begin
for (int i = 0; i < CONFIG.ICACHE.WAYS; i++)
tag_hit_way[i] = ({hit_allowed, 1'b1, stage2_tag} == {1'b1, tag_line[i]});
end
assign tag_hit = |tag_hit_way;
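
//--------------------------------------------------
//Editorial sketch, not part of this commit.
//The itag_bank instance above shares its write port between normal tag updates
//and fence invalidation. The module below pulls the four write-side expressions
//out of the port list so each can be read on its own. The module name, port
//names, and parameter defaults are hypothetical.
module itag_write_mux_sketch #(
    parameter int WAYS = 2,        //stand-in for CONFIG.ICACHE.WAYS
    parameter int LINE_ADDR_W = 4, //stand-in for SCONFIG.LINE_ADDR_W
    parameter int TAG_W = 8        //stand-in for SCONFIG.TAG_W
) (
    input logic update,
    input logic ifence,
    input logic [WAYS-1:0] update_way,
    input logic [LINE_ADDR_W-1:0] ifence_addr,
    input logic [LINE_ADDR_W-1:0] stage2_line_addr,
    input logic [TAG_W-1:0] stage2_tag,
    output logic wr_en,
    output logic [WAYS-1:0] wr_way,
    output logic [WAYS-1:0][TAG_W:0] wr_data,
    output logic [LINE_ADDR_W-1:0] wr_addr
);
    //Write on a normal tag update or on every cycle of a fence sweep
    assign wr_en = update | ifence;
    //A fence writes all ways at once; a normal update writes only the chosen way
    assign wr_way = update_way | {WAYS{ifence}};
    //The top bit of each entry is the valid bit: 1 for an update, 0 during a fence
    assign wr_data = {WAYS{~ifence, stage2_tag}};
    //A fence addresses lines with its own counter instead of the request address
    assign wr_addr = ifence ? ifence_addr : stage2_line_addr;
endmodule
//During a fence the tag field still carries stage2_tag, but with the valid bit
//cleared it cannot produce a hit in the comparison shown in the diff.
//--------------------------------------------------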

core/fetch_stage/ras.sv Executable file → Normal file

@ -77,4 +77,4 @@ module ras
read_index <= new_index;
end
endmodule
endmodule


@ -53,7 +53,6 @@ module instruction_metadata_and_id_management
input logic decode_uses_rd,
input logic fp_decode_uses_rd,
input rs_addr_t decode_rd_addr,
input exception_sources_t decode_exception_unit,
input logic decode_is_store,
//renamer
input phys_addr_t decode_phys_rd_addr,
@ -76,15 +75,11 @@ module instruction_metadata_and_id_management
output retire_packet_t fp_wb_retire,
output retire_packet_t store_retire,
output id_t retire_ids [RETIRE_PORTS],
output id_t retire_ids_next [RETIRE_PORTS],
output logic retire_port_valid [RETIRE_PORTS],
output logic [LOG2_RETIRE_PORTS : 0] retire_count,
//CSR
output logic [LOG2_MAX_IDS:0] post_issue_count,
//Exception
output logic [31:0] oldest_pc,
output logic [$clog2(NUM_EXCEPTION_SOURCES)-1:0] current_exception_unit
output logic [LOG2_MAX_IDS:0] post_issue_count
);
//////////////////////////////////////////
localparam NUM_WB_GROUPS = CONFIG.NUM_WB_GROUPS + 32'(CONFIG.INCLUDE_UNIT.FPU) + 32'(CONFIG.INCLUDE_UNIT.FPU);
@ -115,6 +110,7 @@ module instruction_metadata_and_id_management
retire_packet_t fp_wb_retire_next;
retire_packet_t store_retire_next;
id_t retire_ids_next [RETIRE_PORTS];
logic retire_port_valid_next [RETIRE_PORTS];
logic [LOG2_RETIRE_PORTS : 0] retire_count_next;
////////////////////////////////////////////////////
@ -133,18 +129,6 @@ module instruction_metadata_and_id_management
.ram_data_out(decode_pc)
);
generate if (CONFIG.INCLUDE_M_MODE) begin : gen_pc_id_exception_support
lutram_1w_1r #(.DATA_TYPE(logic[31:0]), .DEPTH(MAX_IDS))
pc_table_exception (
.clk(clk),
.waddr(pc_id),
.raddr(retire_ids_next[0]),
.ram_write(pc_id_assigned),
.new_ram_data(if_pc),
.ram_data_out(oldest_pc)
);
end endgenerate
////////////////////////////////////////////////////
//Instruction table
lutram_1w_1r #(.DATA_TYPE(logic[31:0]), .DEPTH(MAX_IDS))
@ -220,20 +204,6 @@ module instruction_metadata_and_id_management
.ram_data_out(wb_phys_addrs)
);
////////////////////////////////////////////////////
//Exception unit table
generate if (CONFIG.INCLUDE_M_MODE) begin : gen_id_exception_support
lutram_1w_1r #(.DATA_TYPE(logic[$bits(exception_sources_t)-1:0]), .DEPTH(MAX_IDS))
exception_unit_table (
.clk(clk),
.waddr(decode_id),
.raddr(retire_ids_next[0]),
.ram_write(decode_advance),
.new_ram_data(decode_exception_unit),
.ram_data_out(current_exception_unit)
);
end endgenerate
////////////////////////////////////////////////////
//ID Management
@ -255,7 +225,7 @@ module instruction_metadata_and_id_management
decode_id <= oldest_pre_issue_id;
end
else begin
pc_id <= (early_branch_flush ? fetch_id : pc_id) + LOG2_MAX_IDS'(pc_id_assigned);
pc_id <= early_branch_flush ? fetch_id + LOG2_MAX_IDS'(fetch_complete) : pc_id + LOG2_MAX_IDS'(pc_id_assigned);
fetch_id <= fetch_id + LOG2_MAX_IDS'(fetch_complete);
decode_id <= decode_id + LOG2_MAX_IDS'(decode_advance);
end
@ -270,10 +240,8 @@ module instruction_metadata_and_id_management
retire_ids_next[i] <= retire_ids_next[i] + LOG2_MAX_IDS'(retire_count_next);
end
always_ff @ (posedge clk) begin
if (~gc.retire_hold)
retire_ids[i] <= retire_ids_next[i];
end
always_ff @ (posedge clk)
retire_ids[i] <= retire_ids_next[i];
end endgenerate
//Represented as a negative value so that the MSB indicates that the decode stage is valid
@ -343,7 +311,6 @@ module instruction_metadata_and_id_management
) id_waiting_for_writeback_toggle_mem_set
(
.clk (clk),
.rst (rst),
.init_clear (gc.init_clear),
.toggle (id_waiting_toggle),
.toggle_addr (id_waiting_toggle_addr),
@ -363,13 +330,11 @@ module instruction_metadata_and_id_management
//Supports retiring up to RETIRE_PORTS instructions. The retired block of instructions must be
//contiguous and must start with the first retire port. Additionally, only one register file writing
//instruction is supported per cycle.
//If an exception is pending, only retire a single instruction per cycle. As such, the pending
//exception will have to become the oldest instruction retire_ids[0] before it can retire.
logic retire_with_rd_found;
logic retire_with_fp_rd_found;
logic retire_with_store_found;
always_comb begin
contiguous_retire = ~gc.retire_hold;
contiguous_retire = 1;
retire_with_rd_found = 0;
retire_with_fp_rd_found = 0;
retire_with_store_found = 0;
@ -386,7 +351,7 @@ module instruction_metadata_and_id_management
retire_with_rd_found |= retire_port_valid_next[i] & retire_type[i] == RD;
retire_with_fp_rd_found |= retire_port_valid_next[i] & retire_type[i] == FP_RD;
retire_with_store_found |= retire_port_valid_next[i] & retire_type[i] == STORE;
contiguous_retire &= retire_port_valid_next[i] & ~gc.exception_pending;
contiguous_retire &= retire_port_valid_next[i];
if (retire_port_valid_next[i] & retire_type[i] == RD)
retire_with_rd_sel = LOG2_RETIRE_PORTS'(i);
@ -423,9 +388,9 @@ module instruction_metadata_and_id_management
fp_wb_retire <= fp_wb_retire_next;
store_retire <= store_retire_next;
retire_count <= gc.writeback_supress ? '0 : retire_count_next;
retire_count <= retire_count_next;
for (int i = 0; i < RETIRE_PORTS; i++)
retire_port_valid[i] <= retire_port_valid_next[i] & ~gc.writeback_supress;
retire_port_valid[i] <= retire_port_valid_next[i];
end
////////////////////////////////////////////////////
@ -439,7 +404,7 @@ module instruction_metadata_and_id_management
valid : fetched_count_neg[LOG2_MAX_IDS],
pc : decode_pc,
instruction : decode_instruction,
fetch_metadata : CONFIG.INCLUDE_M_MODE ? decode_fetch_metadata : ADDR_OK
fetch_metadata : CONFIG.MODES != BARE ? decode_fetch_metadata : ADDR_OK
};
////////////////////////////////////////////////////


@ -1,134 +0,0 @@
/*
* Copyright © 2017-2020 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module l1_arbiter
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
import l2_config_and_types::*;
# (
parameter cpu_config_t CONFIG = EXAMPLE_CONFIG
)
(
input logic clk,
input logic rst,
l2_requester_interface.master l2,
output sc_complete,
output sc_success,
l1_arbiter_request_interface.slave l1_request[L1_CONNECTIONS-1:0],
l1_arbiter_return_interface.slave l1_response[L1_CONNECTIONS-1:0]
);
l2_request_t [L1_CONNECTIONS-1:0] l2_requests;
logic [L1_CONNECTIONS-1:0] requests;
logic [L1_CONNECTIONS-1:0] acks;
logic [((L1_CONNECTIONS == 1) ? 0 : ($clog2(L1_CONNECTIONS)-1)) : 0] arb_sel;
logic fifos_full;
logic request_exists;
////////////////////////////////////////////////////
//Implementation
//Interface to array
generate for (genvar i = 0; i < L1_CONNECTIONS; i++) begin : gen_requests
assign requests[i] = l1_request[i].request;
assign l1_request[i].ack = acks[i];
end endgenerate
//Always accept L2 data
assign l2.rd_data_ack = l2.rd_data_valid;
//Always accept store-conditional result
assign sc_complete = CONFIG.INCLUDE_AMO & l2.con_valid;
assign sc_success = CONFIG.INCLUDE_AMO & l2.con_result;
//Arbiter can pop address FIFO at a different rate than the data FIFO, so check that both have space.
assign fifos_full = l2.request_full | l2.data_full;
assign request_exists = |requests;
assign l2.request_push = request_exists & ~fifos_full;
////////////////////////////////////////////////////
//Dcache Specific
assign l2.wr_data_push = l2.request_push & ~l2.rnw;
assign l2.wr_data = l1_request[L1_DCACHE_ID].data;
assign l2.wr_data_be = l1_request[L1_DCACHE_ID].be;
assign l2.inv_ack = CONFIG.DCACHE.USE_EXTERNAL_INVALIDATIONS ? l1_response[L1_DCACHE_ID].inv_ack : l2.inv_valid;
assign l1_response[L1_DCACHE_ID].inv_addr = l2.inv_addr;
assign l1_response[L1_DCACHE_ID].inv_valid = CONFIG.DCACHE.USE_EXTERNAL_INVALIDATIONS & l2.inv_valid;
////////////////////////////////////////////////////
//Interface mapping
generate for (genvar i = 0; i < L1_CONNECTIONS; i++) begin : gen_l2_requests
assign l2_requests[i] = '{
addr : l1_request[i].addr[31:2],
rnw : l1_request[i].rnw,
is_amo : l1_request[i].is_amo,
amo_type_or_burst_size : l1_request[i].size,
sub_id : L2_SUB_ID_W'(i)
};
end endgenerate
////////////////////////////////////////////////////
//Arbitration
logic [$clog2(L1_CONNECTIONS)-1:0] state;
logic [$clog2(L1_CONNECTIONS)-1:0] muxes [L1_CONNECTIONS-1:0];
always_ff @(posedge clk) begin
if (rst)
state <= 0;
else if (l2.request_push)
state <= arb_sel;
end
always_comb begin
for (int i = 0; i < L1_CONNECTIONS; i++) begin
muxes[i] = $clog2(L1_CONNECTIONS)'(i);
for (int j = 0; j < L1_CONNECTIONS; j++) begin
if (requests[(i + j) % L1_CONNECTIONS])
muxes[i] = $clog2(L1_CONNECTIONS)'((i + j) % L1_CONNECTIONS);
end
end
end
assign arb_sel = muxes[state];
assign acks = L1_CONNECTIONS'(l2.request_push) << arb_sel;
assign l2.addr = l2_requests[arb_sel].addr;
assign l2.rnw = l2_requests[arb_sel].rnw;
assign l2.is_amo = l2_requests[arb_sel].is_amo;
assign l2.amo_type_or_burst_size = l2_requests[arb_sel].amo_type_or_burst_size;
assign l2.sub_id = l2_requests[arb_sel].sub_id;
generate for (genvar i = 0; i < L1_CONNECTIONS; i++) begin : gen_l1_responses
assign l1_response[i].data = l2.rd_data;
assign l1_response[i].data_valid = l2.rd_data_valid & (l2.rd_sub_id == i);
end endgenerate
endmodule


@ -1,5 +1,5 @@
/*
* Copyright © 2019 Eric Matthews, Lesley Shannon
* Copyright © 2019 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -18,75 +18,160 @@
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module avalon_master
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter int unsigned LR_WAIT = 32, //The number of cycles lock is held after an LR
parameter logic INCLUDE_AMO = 1 //Required because the tools cannot fully optimize even if amo signals are tied off
)
(
input logic clk,
input logic rst,
output logic write_outstanding,
avalon_interface.master m_avalon,
input logic amo,
input amo_t amo_type,
amo_interface.subunit amo_unit,
memory_sub_unit_interface.responder ls
);
//implementation
////////////////////////////////////////////////////
//Implementation
typedef enum {
READY,
REQUESTING,
REQUESTING_AMO_R,
REQUESTING_AMO_M,
REQUESTING_AMO_W,
READY_LR,
REQUESTING_SC
} state_t;
state_t current_state;
always_ff @ (posedge clk) begin
if (ls.new_request) begin
m_avalon.addr <= ls.addr;
m_avalon.byteenable <= ls.be;
m_avalon.writedata <= ls.data_in;
end
end
logic[$clog2(LR_WAIT)-1:0] lock_counter;
logic request_is_sc;
assign request_is_sc = amo & amo_type == AMO_SC_FN5;
always_ff @ (posedge clk) begin
assign amo_unit.set_reservation = ls.new_request & amo & amo_type == AMO_LR_FN5;
assign amo_unit.clear_reservation = ls.new_request;
assign amo_unit.reservation = ls.addr;
assign amo_unit.rs1 = ls.data_out;
assign amo_unit.rs2 = m_avalon.writedata;
always_ff @(posedge clk) begin
m_avalon.addr[1:0] <= '0;
unique case (current_state)
READY : begin //Accept any request
ls.ready <= ~ls.new_request | request_is_sc;
ls.data_out <= 32'b1;
ls.data_valid <= ls.new_request & request_is_sc;
m_avalon.addr[31:2] <= ls.addr[31:2];
m_avalon.byteenable <= ls.be;
m_avalon.writedata <= ls.data_in;
m_avalon.read <= ls.new_request & ls.re & ~request_is_sc;
m_avalon.write <= ls.new_request & ls.we;
m_avalon.lock <= ls.new_request & amo;
write_outstanding <= ls.new_request & (ls.we | amo);
amo_unit.rmw_valid <= 0;
amo_unit.op <= amo_type;
lock_counter <= '0;
if (ls.new_request & (~amo | amo_type == AMO_LR_FN5))
current_state <= REQUESTING;
else if (ls.new_request & amo & amo_type != AMO_SC_FN5)
current_state <= REQUESTING_AMO_R;
end
REQUESTING : begin //Wait for response
ls.ready <= ~m_avalon.waitrequest;
ls.data_out <= m_avalon.readdata;
ls.data_valid <= m_avalon.read & ~m_avalon.waitrequest;
m_avalon.read <= m_avalon.read & m_avalon.waitrequest;
m_avalon.write <= m_avalon.write & m_avalon.waitrequest;
write_outstanding <= m_avalon.write & ~m_avalon.waitrequest;
if (~m_avalon.waitrequest)
current_state <= m_avalon.lock ? READY_LR : READY;
end
REQUESTING_AMO_R : begin //Read for an AMO
if (INCLUDE_AMO) begin
ls.data_out <= m_avalon.readdata;
ls.data_valid <= ~m_avalon.waitrequest;
m_avalon.read <= m_avalon.waitrequest;
amo_unit.rmw_valid <= ~m_avalon.waitrequest;
if (~m_avalon.waitrequest)
current_state <= REQUESTING_AMO_M;
end
end
REQUESTING_AMO_M : begin //One cycle for computing the AMO write value
if (INCLUDE_AMO) begin
ls.data_valid <= 0;
m_avalon.writedata <= amo_unit.rd;
m_avalon.write <= 1;
amo_unit.rmw_valid <= 0;
current_state <= REQUESTING_AMO_W;
end
end
REQUESTING_AMO_W : begin //Write for an AMO
if (INCLUDE_AMO) begin
ls.ready <= ~m_avalon.waitrequest;
m_avalon.write <= m_avalon.waitrequest;
m_avalon.lock <= m_avalon.waitrequest;
write_outstanding <= m_avalon.waitrequest;
if (~m_avalon.waitrequest)
current_state <= READY;
end
end
READY_LR : begin //Lock is held; hold for LR_WAIT cycles
if (INCLUDE_AMO) begin
ls.ready <= ~ls.new_request | (request_is_sc & ~amo_unit.reservation_valid);
ls.data_out <= {31'b0, ~amo_unit.reservation_valid};
ls.data_valid <= ls.new_request & request_is_sc;
m_avalon.addr[31:2] <= ls.addr[31:2];
m_avalon.byteenable <= ls.be;
m_avalon.writedata <= ls.data_in;
m_avalon.read <= ls.new_request & ls.re & ~request_is_sc;
m_avalon.write <= ls.new_request & (ls.we | (request_is_sc & amo_unit.reservation_valid));
write_outstanding <= ls.new_request & (ls.we | amo);
amo_unit.rmw_valid <= 0;
amo_unit.op <= amo_type;
if (ls.new_request)
m_avalon.lock <= amo;
else if (32'(lock_counter) == LR_WAIT-1)
m_avalon.lock <= 0;
lock_counter <= lock_counter + 1;
if (ls.new_request & (~amo | amo_type == AMO_LR_FN5))
current_state <= REQUESTING;
else if (ls.new_request & amo & amo_type != AMO_SC_FN5)
current_state <= REQUESTING_AMO_R;
else if (ls.new_request & amo & amo_type == AMO_SC_FN5 & amo_unit.reservation_valid)
current_state <= REQUESTING_SC;
else if (32'(lock_counter) == LR_WAIT-1 | ls.new_request)
current_state <= READY;
end
end
REQUESTING_SC : begin //Exclusive write
if (INCLUDE_AMO) begin
ls.ready <= ~m_avalon.waitrequest;
ls.data_valid <= 0;
m_avalon.write <= m_avalon.waitrequest;
m_avalon.lock <= m_avalon.waitrequest;
write_outstanding <= m_avalon.waitrequest;
if (~m_avalon.waitrequest)
current_state <= REQUESTING;
end
end
endcase
if (rst)
ls.ready <= 1;
else if (ls.new_request)
ls.ready <= 0;
else if (~ls.ready & ~m_avalon.waitrequest)
ls.ready <= 1;
current_state <= READY;
end
always_ff @ (posedge clk) begin
if (rst)
ls.data_valid <= 0;
else if (m_avalon.read & ~m_avalon.waitrequest)
ls.data_valid <= 1;
else
ls.data_valid <= 0;
end
always_ff @ (posedge clk) begin
if (m_avalon.read & ~m_avalon.waitrequest)
ls.data_out <= m_avalon.readdata;
else
ls.data_out <= 0;
end
always_ff @ (posedge clk) begin
if (rst)
m_avalon.read <= 0;
else if (ls.new_request & ls.re)
m_avalon.read <= 1;
else if (~m_avalon.waitrequest)
m_avalon.read <= 0;
end
always_ff @ (posedge clk) begin
if (rst)
m_avalon.write <= 0;
else if (ls.new_request & ls.we)
m_avalon.write <= 1;
else if (~m_avalon.waitrequest)
m_avalon.write <= 0;
end
endmodule
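
Reviewer sketch (not part of this commit): the READY_LR state above keeps the Avalon lock asserted for LR_WAIT cycles after an LR unless another request arrives first. A simplified standalone model of just that timeout is shown below. The module name `lr_lock_timer` and the `start`/`cancel` ports are hypothetical, and the sketch assumes LR_WAIT is at least 2 so that the counter has a non-zero width.

```systemverilog
//Hypothetical illustration only; names are not taken from the CVA5 sources
module lr_lock_timer #(
    parameter int unsigned LR_WAIT = 32 //Cycles the lock is held after an LR (assumed >= 2)
) (
    input logic clk,
    input logic rst,
    input logic start, //An LR response was received
    input logic cancel, //A non-atomic request arrived while the lock was held
    output logic lock
);
    logic [$clog2(LR_WAIT)-1:0] counter;

    always_ff @(posedge clk) begin
        //Lock is taken on an LR and dropped on cancel or after LR_WAIT cycles
        if (rst | cancel)
            lock <= 0;
        else if (start)
            lock <= 1;
        else if (32'(counter) == LR_WAIT-1)
            lock <= 0;

        //Counter restarts with every LR and free-runs otherwise
        if (start)
            counter <= '0;
        else
            counter <= counter + 1;
    end
endmodule
```

The free-running counter is harmless while the lock is low, which is why it needs no reset in this sketch. The real state machine additionally re-evaluates the lock on every new request.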

core/memory_sub_units/axi_master.sv Executable file → Normal file

@ -1,5 +1,5 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
* Copyright © 2024 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -18,89 +18,120 @@
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module axi_master
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
(
input logic clk,
input logic rst,
output logic write_outstanding,
axi_interface.master m_axi,
input logic [2:0] size,
input logic amo,
input amo_t amo_type,
amo_interface.subunit amo_unit,
memory_sub_unit_interface.responder ls
);
logic ready;
////////////////////////////////////////////////////
//Implementation
typedef enum {
READY,
REQUESTING_WRITE,
REQUESTING_READ,
REQUESTING_AMO_M,
WAITING_READ,
WAITING_WRITE
} state_t;
state_t current_state;
//read constants
assign m_axi.arlen = 0; // 1 request
assign m_axi.arburst = 0;// burst type does not matter
assign m_axi.rready = 1; //always ready to receive data
logic request_is_invalid_sc;
assign request_is_invalid_sc = amo & amo_type == AMO_SC_FN5 & ~amo_unit.reservation_valid;
always_ff @ (posedge clk) begin
if (ls.new_request) begin
m_axi.araddr <= ls.addr;
m_axi.arsize <= size;
m_axi.awsize <= size;
m_axi.awaddr <= ls.addr;
m_axi.wdata <= ls.data_in;
m_axi.wstrb <= ls.be;
end
end
assign amo_unit.set_reservation = ls.new_request & amo & amo_type == AMO_LR_FN5;
assign amo_unit.clear_reservation = ls.new_request;
assign amo_unit.reservation = ls.addr;
assign amo_unit.rs1 = ls.data_out;
//write constants
assign m_axi.awlen = 0;
assign m_axi.awburst = 0;
logic[29:0] addr;
assign m_axi.awaddr = {addr, 2'b0};
assign m_axi.araddr = {addr, 2'b0};
assign m_axi.awlen = '0;
assign m_axi.arlen = '0;
assign m_axi.awburst = '0;
assign m_axi.arburst = '0;
assign m_axi.awid = '0;
assign m_axi.arid = '0;
assign m_axi.rready = 1;
assign m_axi.bready = 1;
set_clr_reg_with_rst #(.SET_OVER_CLR(0), .WIDTH(1), .RST_VALUE(1)) ready_m (
.clk, .rst,
.set(m_axi.rvalid | m_axi.bvalid),
.clr(ls.new_request),
.result(ready)
);
assign ls.ready = ready;
always_ff @ (posedge clk) begin
always_ff @(posedge clk) begin
unique case (current_state)
READY : begin //Accept any request
ls.ready <= ~ls.new_request | request_is_invalid_sc;
ls.data_out <= 1;
ls.data_valid <= ls.new_request & request_is_invalid_sc;
addr <= ls.addr[31:2];
m_axi.awlock <= amo & amo_type != AMO_LR_FN5; //Used in WAITING_READ to determine if it was a RMW
m_axi.awvalid <= ls.new_request & (ls.we | (amo & amo_type == AMO_SC_FN5 & amo_unit.reservation_valid));
m_axi.arlock <= amo & amo_type != AMO_SC_FN5; //Used in WAITING_WRITE to determine if it was a RMW
m_axi.arvalid <= ls.new_request & ls.re & ~(amo & amo_type == AMO_SC_FN5);
m_axi.wvalid <= ls.new_request & (ls.we | (amo & amo_type == AMO_SC_FN5 & amo_unit.reservation_valid));
m_axi.wdata <= ls.data_in;
m_axi.wstrb <= ls.be;
write_outstanding <= ls.new_request & (ls.we | amo);
amo_unit.rmw_valid <= 0;
amo_unit.op <= amo_type;
amo_unit.rs2 <= ls.data_in; //Cannot use wdata because wdata will be overwritten if the RMW is not exclusive
if (ls.new_request & (ls.we | (amo & amo_type == AMO_SC_FN5 & amo_unit.reservation_valid)))
current_state <= REQUESTING_WRITE;
else if (ls.new_request & ~request_is_invalid_sc)
current_state <= REQUESTING_READ;
end
REQUESTING_READ : begin //Wait for read to be accepted
m_axi.arvalid <= ~m_axi.arready;
if (m_axi.arready)
current_state <= WAITING_READ;
end
WAITING_READ : begin //Wait for read response
ls.ready <= m_axi.rvalid & ~m_axi.awlock;
ls.data_out <= m_axi.rdata;
ls.data_valid <= m_axi.rvalid;
amo_unit.rmw_valid <= m_axi.rvalid;
if (m_axi.rvalid)
current_state <= m_axi.awlock ? REQUESTING_AMO_M : READY;
end
REQUESTING_AMO_M : begin //One cycle for computing the AMO write value
ls.data_valid <= 0;
m_axi.awvalid <= 1;
m_axi.wvalid <= 1;
m_axi.wdata <= amo_unit.rd;
amo_unit.rmw_valid <= 0;
current_state <= REQUESTING_WRITE;
end
REQUESTING_WRITE : begin //Wait for write (address and data) to be accepted
m_axi.awvalid <= m_axi.awvalid & ~m_axi.awready;
m_axi.wvalid <= m_axi.wvalid & ~m_axi.wready;
if ((~m_axi.awvalid | m_axi.awready) & (~m_axi.wvalid | m_axi.wready))
current_state <= WAITING_WRITE;
end
WAITING_WRITE : begin //Wait for write response; resubmit if RMW was not exclusive
ls.ready <= m_axi.bvalid & (~m_axi.arlock | m_axi.bresp == 2'b01);
ls.data_out <= {31'b0, m_axi.bresp != 2'b01};
ls.data_valid <= m_axi.bvalid & m_axi.awlock & ~m_axi.arlock;
m_axi.arvalid <= m_axi.bvalid & m_axi.arlock & m_axi.bresp != 2'b01;
write_outstanding <= ~(m_axi.bvalid & (~m_axi.arlock | m_axi.bresp == 2'b01));
if (m_axi.bvalid)
current_state <= m_axi.arlock & m_axi.bresp != 2'b01 ? REQUESTING_READ : READY;
end
endcase
if (rst)
ls.data_valid <= 0;
else
ls.data_valid <= m_axi.rvalid;
current_state <= READY;
end
//read channel
set_clr_reg_with_rst #(.SET_OVER_CLR(1), .WIDTH(1), .RST_VALUE(0)) arvalid_m (
.clk, .rst,
.set(ls.new_request & ls.re),
.clr(m_axi.arready),
.result(m_axi.arvalid)
);
always_ff @ (posedge clk) begin
if (m_axi.rvalid)
ls.data_out <= m_axi.rdata;
end
//write channel
set_clr_reg_with_rst #(.SET_OVER_CLR(1), .WIDTH(1), .RST_VALUE(0)) awvalid_m (
.clk, .rst,
.set(ls.new_request & ls.we),
.clr(m_axi.awready),
.result(m_axi.awvalid)
);
set_clr_reg_with_rst #(.SET_OVER_CLR(1), .WIDTH(1), .RST_VALUE(0)) wvalid_m (
.clk, .rst,
.set(ls.new_request & ls.we),
.clr(m_axi.wready),
.result(m_axi.wvalid)
);
assign m_axi.wlast = m_axi.wvalid;
endmodule
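
Reviewer sketch (not part of this commit): WAITING_WRITE compares `m_axi.bresp` against `2'b01`, which is the AXI EXOKAY response. The two decisions made there can be written as small named helpers. The package and function names below are hypothetical; only the response encodings come from the AXI specification.

```systemverilog
//Hypothetical helper package; not part of the CVA5 sources
package axi_resp_sketch;
    //AXI response encodings
    localparam logic [1:0] RESP_OKAY = 2'b00;
    localparam logic [1:0] RESP_EXOKAY = 2'b01;
    localparam logic [1:0] RESP_SLVERR = 2'b10;
    localparam logic [1:0] RESP_DECERR = 2'b11;

    //Value returned for an SC: 0 on success, 1 on failure
    function automatic logic [31:0] sc_result(input logic [1:0] bresp);
        return {31'b0, bresp != RESP_EXOKAY};
    endfunction

    //An exclusive RMW write that did not return EXOKAY restarts from the read
    function automatic logic rmw_retry(input logic arlock, input logic [1:0] bresp);
        return arlock & (bresp != RESP_EXOKAY);
    endfunction
endpackage
```

Naming the encodings makes the retry condition easier to audit than repeated `2'b01` literals, at the cost of one more package import.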

core/memory_sub_units/local_mem_sub_unit.sv Executable file → Normal file

@ -1,5 +1,5 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
* Copyright © 2017 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -18,35 +18,78 @@
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module local_mem_sub_unit
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
(
input logic clk,
input logic rst,
output logic write_outstanding,
input logic amo,
input amo_t amo_type,
amo_interface.subunit amo_unit,
memory_sub_unit_interface.responder unit,
local_memory_interface.master local_mem
);
assign unit.ready = 1;
//If amo is tied to 0 and amo_unit is disconnected the tools can optimize most of the logic away
assign local_mem.addr = unit.addr[31:2];
assign local_mem.en = unit.new_request;
assign local_mem.be = unit.be;
assign local_mem.data_in = unit.data_in;
assign unit.data_out = local_mem.data_out;
logic rmw;
logic[31:2] rmw_addr;
logic[31:0] rmw_rs2;
amo_t rmw_op;
logic sc_valid;
logic sc_valid_r;
assign write_outstanding = 0;
always_ff @ (posedge clk) begin
if (rst)
assign sc_valid = amo & amo_type == AMO_SC_FN5 & amo_unit.reservation_valid;
assign amo_unit.set_reservation = unit.new_request & amo & amo_type == AMO_LR_FN5;
assign amo_unit.clear_reservation = unit.new_request;
assign amo_unit.reservation = unit.addr;
assign amo_unit.rmw_valid = rmw;
assign amo_unit.op = rmw_op;
assign amo_unit.rs1 = local_mem.data_out;
assign amo_unit.rs2 = rmw_rs2;
always_comb begin
if (rmw) begin
unit.ready = 0;
local_mem.addr = rmw_addr;
local_mem.en = 1;
local_mem.be = '1;
local_mem.data_in = amo_unit.rd;
unit.data_out = local_mem.data_out;
end else begin
unit.ready = 1;
local_mem.addr = unit.addr[31:2];
local_mem.en = unit.new_request;
local_mem.be = {4{unit.we | sc_valid}} & unit.be; //SC only writes when it succeeds
local_mem.data_in = unit.data_in;
unit.data_out = sc_valid_r ? 32'b1 : local_mem.data_out;
end
end
always_ff @(posedge clk) begin
if (rst) begin
unit.data_valid <= 0;
else
rmw <= 0;
sc_valid_r <= 0;
end
else begin
unit.data_valid <= unit.new_request & unit.re;
rmw <= unit.new_request & amo & ~(amo_type inside {AMO_LR_FN5, AMO_SC_FN5});
sc_valid_r <= sc_valid;
end
rmw_addr <= unit.addr[31:2];
rmw_rs2 <= unit.data_in;
rmw_op <= amo_type;
end
endmodule
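
Reviewer sketch (not part of this commit): in the RMW cycle above, `amo_unit.rs1` carries the value read from memory, `amo_unit.rs2` the register operand, and `amo_unit.rd` the value written back. The function below is a minimal model of what such a unit would compute for each RISC-V AMO. The package name, the `sk_amo_t` enum, and `amo_result` are hypothetical and local to this example; the enum values are the funct5 encodings from the RISC-V "A" extension.

```systemverilog
//Hypothetical sketch; the enum below is local to this example
package amo_sketch;
    typedef enum logic [4:0] {
        SK_AMO_ADD = 5'b00000,
        SK_AMO_SWAP = 5'b00001,
        SK_AMO_XOR = 5'b00100,
        SK_AMO_OR = 5'b01000,
        SK_AMO_AND = 5'b01100,
        SK_AMO_MIN = 5'b10000,
        SK_AMO_MAX = 5'b10100,
        SK_AMO_MINU = 5'b11000,
        SK_AMO_MAXU = 5'b11100
    } sk_amo_t;

    //Returns the value to write back to memory for a read-modify-write AMO
    function automatic logic [31:0] amo_result(
        input sk_amo_t op,
        input logic [31:0] mem, //Value read from memory
        input logic [31:0] reg_val //Register operand
    );
        unique case (op)
            SK_AMO_ADD : return mem + reg_val;
            SK_AMO_SWAP : return reg_val;
            SK_AMO_XOR : return mem ^ reg_val;
            SK_AMO_OR : return mem | reg_val;
            SK_AMO_AND : return mem & reg_val;
            SK_AMO_MIN : return $signed(mem) < $signed(reg_val) ? mem : reg_val;
            SK_AMO_MAX : return $signed(mem) > $signed(reg_val) ? mem : reg_val;
            SK_AMO_MINU : return mem < reg_val ? mem : reg_val;
            SK_AMO_MAXU : return mem > reg_val ? mem : reg_val;
            default : return mem; //Unknown encodings leave memory unchanged
        endcase
    endfunction
endpackage
```

The value returned to the destination register is always the original memory value, which is why the sub unit above forwards `local_mem.data_out` unchanged.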


@ -1,5 +1,5 @@
/*
* Copyright © 2019 Eric Matthews, Lesley Shannon
* Copyright © 2019 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -18,57 +18,163 @@
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module wishbone_master
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter int unsigned LR_WAIT = 32, //The number of cycles the master holds cyc after an LR
parameter logic INCLUDE_AMO = 1 //Required because the tools cannot fully optimize even if amo signals are tied off
)
(
input logic clk,
input logic rst,
output logic write_outstanding,
wishbone_interface.master wishbone,
input logic amo,
input amo_t amo_type,
amo_interface.subunit amo_unit,
memory_sub_unit_interface.responder ls
);
logic busy;
////////////////////////////////////////////////////
//Implementation
assign wishbone.cti = 0;
assign wishbone.bte = 0;
typedef enum {
READY,
REQUESTING,
REQUESTING_AMO_R,
REQUESTING_AMO_M,
REQUESTING_AMO_W,
READY_LR,
REQUESTING_SC
} state_t;
state_t current_state;
always_ff @ (posedge clk) begin
if (ls.new_request) begin
wishbone.adr <= ls.addr[31:2];
wishbone.sel <= ls.we ? ls.be : '1;
wishbone.we <= ls.we;
wishbone.dat_w <= ls.data_in;
end
end
logic[$clog2(LR_WAIT)-1:0] cyc_counter;
logic request_is_sc;
assign request_is_sc = amo & amo_type == AMO_SC_FN5;
always_ff @ (posedge clk) begin
assign amo_unit.set_reservation = ls.new_request & amo & amo_type == AMO_LR_FN5;
assign amo_unit.clear_reservation = ls.new_request;
assign amo_unit.reservation = ls.addr;
assign amo_unit.rs1 = ls.data_out;
assign amo_unit.rs2 = wishbone.dat_w;
assign wishbone.cti = '0;
assign wishbone.bte = '0;
always_ff @(posedge clk) begin
wishbone.adr[1:0] <= '0;
unique case (current_state)
READY : begin //Accept any request
ls.ready <= ~ls.new_request | request_is_sc;
ls.data_out <= 32'b1;
ls.data_valid <= ls.new_request & request_is_sc;
wishbone.adr[31:2] <= ls.addr[31:2];
wishbone.sel <= ls.we ? ls.be : '1;
wishbone.dat_w <= ls.data_in;
wishbone.we <= ls.we;
wishbone.stb <= ls.new_request & ~request_is_sc;
wishbone.cyc <= ls.new_request & ~request_is_sc;
write_outstanding <= ls.new_request & (ls.we | amo);
amo_unit.rmw_valid <= 0;
amo_unit.op <= amo_type;
cyc_counter <= amo ? 1 : 0;
if (ls.new_request & (~amo | amo_type == AMO_LR_FN5))
current_state <= REQUESTING;
else if (ls.new_request & amo & amo_type != AMO_SC_FN5)
current_state <= REQUESTING_AMO_R;
end
REQUESTING : begin //Wait for response
ls.ready <= wishbone.ack;
ls.data_out <= wishbone.dat_r;
ls.data_valid <= ~wishbone.we & wishbone.ack;
wishbone.stb <= ~wishbone.ack;
wishbone.cyc <= ~wishbone.ack | cyc_counter[0];
write_outstanding <= wishbone.we & ~wishbone.ack;
if (wishbone.ack)
current_state <= cyc_counter[0] ? READY_LR : READY;
end
REQUESTING_AMO_R : begin //Read for an AMO
if (INCLUDE_AMO) begin
ls.data_out <= wishbone.dat_r;
ls.data_valid <= wishbone.ack;
wishbone.stb <= ~wishbone.ack;
amo_unit.rmw_valid <= wishbone.ack;
if (wishbone.ack)
current_state <= REQUESTING_AMO_M;
end
end
REQUESTING_AMO_M : begin //One cycle for computing the AMO write value
if (INCLUDE_AMO) begin
ls.data_valid <= 0;
wishbone.dat_w <= amo_unit.rd;
wishbone.stb <= 1;
wishbone.we <= 1;
amo_unit.rmw_valid <= 0;
current_state <= REQUESTING_AMO_W;
end
end
REQUESTING_AMO_W : begin //Write for an AMO
if (INCLUDE_AMO) begin
ls.ready <= wishbone.ack;
wishbone.cyc <= ~wishbone.ack;
wishbone.stb <= ~wishbone.ack;
write_outstanding <= ~wishbone.ack;
if (wishbone.ack)
current_state <= READY;
end
end
READY_LR : begin //Cyc is held; hold for LR_WAIT cycles
if (INCLUDE_AMO) begin
ls.ready <= ~ls.new_request | (request_is_sc & ~amo_unit.reservation_valid);
ls.data_out <= {31'b0, ~amo_unit.reservation_valid};
ls.data_valid <= ls.new_request & request_is_sc;
wishbone.adr[31:2] <= ls.addr[31:2];
wishbone.sel <= ls.we ? ls.be : '1;
wishbone.dat_w <= ls.data_in;
wishbone.we <= ls.we | request_is_sc;
wishbone.stb <= ls.new_request & ~(request_is_sc & ~amo_unit.reservation_valid);
write_outstanding <= ls.new_request & (ls.we | amo);
amo_unit.rmw_valid <= 0;
amo_unit.op <= amo_type;
if (ls.new_request)
wishbone.cyc <= ~(request_is_sc & ~amo_unit.reservation_valid);
else if (32'(cyc_counter) == LR_WAIT-1)
wishbone.cyc <= 0;
cyc_counter <= cyc_counter + 1;
if (ls.new_request & (~amo | amo_type == AMO_LR_FN5))
current_state <= REQUESTING;
else if (ls.new_request & amo & amo_type != AMO_SC_FN5)
current_state <= REQUESTING_AMO_R;
else if (ls.new_request & amo & amo_type == AMO_SC_FN5 & amo_unit.reservation_valid)
current_state <= REQUESTING_SC;
else if (32'(cyc_counter) == LR_WAIT-1 | ls.new_request)
current_state <= READY;
end
end
REQUESTING_SC : begin //Exclusive write
if (INCLUDE_AMO) begin
ls.ready <= wishbone.ack;
ls.data_valid <= 0;
wishbone.stb <= ~wishbone.ack;
wishbone.cyc <= ~wishbone.ack;
write_outstanding <= ~wishbone.ack;
if (wishbone.ack)
current_state <= REQUESTING;
end
end
endcase
if (rst)
busy <= 0;
else
busy <= (busy & ~wishbone.ack) | ls.new_request;
end
assign ls.ready = (~busy);
assign wishbone.stb = busy;
assign wishbone.cyc = busy;
always_ff @ (posedge clk) begin
if (rst)
ls.data_valid <= 0;
else
ls.data_valid <= ~wishbone.we & wishbone.ack;
end
always_ff @ (posedge clk) begin
if (wishbone.ack)
ls.data_out <= wishbone.dat_r;
current_state <= READY;
end
endmodule

core/mmu/dtlb.sv Normal file

@ -0,0 +1,333 @@
/*
* Copyright © 2017 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module dtlb
import cva5_types::*;
import riscv_types::*;
#(
parameter WAYS = 2,
parameter DEPTH = 32
)
(
input logic clk,
input logic rst,
input logic translation_on,
input tlb_packet_t sfence,
input logic [ASIDLEN-1:0] asid,
mmu_interface.tlb mmu,
tlb_interface.tlb tlb
);
//////////////////////////////////////////
localparam TAG_W = 20 - $clog2(DEPTH);
localparam TAG_W_S = 10 - $clog2(DEPTH);
localparam WAY_W = WAYS == 1 ? 1 : $clog2(WAYS);
typedef struct packed {
logic valid;
logic [ASIDLEN-1:0] asid;
logic [TAG_W-1:0] tag;
//Signals from the PTE
logic [9:0] ppn1;
logic [9:0] ppn0;
logic dirty;
logic globe;
logic user;
logic execute;
logic write;
logic read;
} tlb_entry_t;
typedef struct packed {
logic valid;
logic [ASIDLEN-1:0] asid;
logic [TAG_W_S-1:0] tag;
//Signals from the PTE
logic [9:0] ppn1;
logic dirty;
logic globe;
logic user;
logic execute;
logic write;
logic read;
} tlb_entry_s_t;
////////////////////////////////////////////////////
//Implementation
//Regular and super pages stored separately
//Regular pages are set associative and super pages are direct mapped
//Random replacement
logic[WAYS-1:0] replacement_way;
cycler #(.C_WIDTH(WAYS)) replacement_policy (
.en(1'b1),
.one_hot(replacement_way),
.*);
//LUTRAM storage
logic [$clog2(DEPTH)-1:0] tlb_raddr;
logic [$clog2(DEPTH)-1:0] tlb_raddr_s;
logic [$clog2(DEPTH)-1:0] tlb_waddr;
logic [$clog2(DEPTH)-1:0] tlb_waddr_s;
tlb_entry_t [WAYS-1:0] rdata;
tlb_entry_s_t rdata_s;
logic [WAYS-1:0] write;
logic write_s;
tlb_entry_t wdata;
tlb_entry_s_t wdata_s;
generate for (genvar i = 0; i < WAYS; i++) begin : gen_lut_rams
lutram_1w_1r #(.DATA_TYPE(tlb_entry_t), .DEPTH(DEPTH)) data_table (
.waddr(tlb_waddr),
.raddr(tlb_raddr),
.ram_write(write[i]),
.new_ram_data(wdata),
.ram_data_out(rdata[i]),
.*);
end endgenerate
lutram_1w_1r #(.DATA_TYPE(tlb_entry_s_t), .DEPTH(DEPTH)) data_table_s (
.waddr(tlb_waddr_s),
.raddr(tlb_raddr_s),
.ram_write(write_s),
.new_ram_data(wdata_s),
.ram_data_out(rdata_s),
.*);
//Hit detection
logic [TAG_W-1:0] cmp_tag;
logic [TAG_W_S-1:0] cmp_tag_s;
logic [ASIDLEN-1:0] cmp_asid;
logic [WAYS-1:0] tag_hit;
logic tag_hit_s;
logic [WAYS-1:0] asid_hit;
logic asid_hit_s;
logic [WAYS-1:0] rdata_global;
logic rdata_global_s;
logic [WAYS-1:0][9:0] ppn0;
logic [WAYS-1:0][9:0] ppn1;
logic [9:0] ppn1_s;
logic [WAYS-1:0] perms_valid_comb;
logic perms_valid_comb_s;
logic [WAYS-1:0] perms_valid;
logic perms_valid_s;
logic [WAYS-1:0] hit_ohot;
logic hit_ohot_s;
logic [WAY_W-1:0] hit_way;
logic hit;
assign cmp_tag = sfence.valid ? sfence.addr[31-:TAG_W] : tlb.virtual_address[31-:TAG_W];
assign cmp_tag_s = sfence.valid ? sfence.addr[31-:TAG_W_S] : tlb.virtual_address[31-:TAG_W_S];
assign cmp_asid = sfence.valid ? sfence.asid : asid;
always_ff @(posedge clk) begin
for (int i = 0; i < WAYS; i++) begin
tag_hit[i] <= rdata[i].tag == cmp_tag;
rdata_global[i] <= rdata[i].globe;
ppn0[i] <= rdata[i].ppn0;
ppn1[i] <= rdata[i].ppn1;
asid_hit[i] <= rdata[i].asid == cmp_asid;
perms_valid[i] <= perms_valid_comb[i];
hit_ohot[i] <= rdata[i].valid & (rdata[i].tag == cmp_tag) & (rdata[i].asid == cmp_asid | rdata[i].globe);
end
tag_hit_s <= rdata_s.tag == cmp_tag_s;
rdata_global_s <= rdata_s.globe;
ppn1_s <= rdata_s.ppn1;
asid_hit_s <= rdata_s.asid == cmp_asid;
perms_valid_s <= perms_valid_comb_s;
hit_ohot_s <= rdata_s.valid & (rdata_s.tag == cmp_tag_s) & (rdata_s.asid == cmp_asid | rdata_s.globe);
end
assign hit = |hit_ohot | hit_ohot_s;
priority_encoder #(.WIDTH(WAYS)) hit_cast (
.priority_vector(hit_ohot),
.encoded_result(hit_way)
);
generate for (genvar i = 0; i < WAYS; i++) begin : gen_perms_check
perms_check checks (
.pte_perms('{
d : rdata[i].dirty,
a : 1'b1,
u : rdata[i].user,
x : rdata[i].execute,
w : rdata[i].write,
r : rdata[i].read,
default: 'x
}),
.rnw(tlb.rnw),
.execute(1'b0),
.mxr(mmu.mxr),
.sum(mmu.sum),
.privilege(mmu.privilege),
.valid(perms_valid_comb[i])
);
end endgenerate
perms_check checks (
.pte_perms('{
d : rdata_s.dirty,
a : 1'b1,
u : rdata_s.user,
x : rdata_s.execute,
w : rdata_s.write,
r : rdata_s.read,
default: 'x
}),
.rnw(tlb.rnw),
.execute(1'b0),
.mxr(mmu.mxr),
.sum(mmu.sum),
.privilege(mmu.privilege),
.valid(perms_valid_comb_s)
);
//SFENCE
logic sfence_valid_r;
logic [$clog2(DEPTH)-1:0] flush_addr;
lfsr #(.WIDTH($clog2(DEPTH)), .NEEDS_RESET(0)) lfsr_counter (
.en(1'b1),
.value(flush_addr),
.*);
always_ff @(posedge clk) begin
if (tlb.new_request | sfence.valid) begin
tlb_waddr <= tlb_raddr;
tlb_waddr_s <= tlb_raddr_s;
end
sfence_valid_r <= sfence.valid; //Other SFENCE signals remain registered and do not need to be saved
end
always_comb begin
if (sfence.valid) begin
tlb_raddr = sfence.addr_only ? sfence.addr[12 +: $clog2(DEPTH)] : flush_addr;
tlb_raddr_s = sfence.addr_only ? sfence.addr[22 +: $clog2(DEPTH)] : flush_addr;
end
else begin
tlb_raddr = tlb.virtual_address[12 +: $clog2(DEPTH)];
tlb_raddr_s = tlb.virtual_address[22 +: $clog2(DEPTH)];
end
end
assign wdata = '{
valid : ~sfence_valid_r,
asid : asid,
tag : mmu.virtual_address[31-:TAG_W],
ppn1 : mmu.upper_physical_address[19:10],
ppn0 : mmu.upper_physical_address[9:0],
dirty : mmu.perms.d,
globe : mmu.perms.g,
user : mmu.perms.u,
execute : mmu.perms.x,
write : mmu.perms.w,
read : mmu.perms.r
};
assign wdata_s = '{
valid : ~sfence_valid_r,
asid : asid,
tag : mmu.virtual_address[31-:TAG_W_S],
ppn1 : mmu.upper_physical_address[19:10],
dirty : mmu.perms.d,
globe : mmu.perms.g,
user : mmu.perms.u,
execute : mmu.perms.x,
write : mmu.perms.w,
read : mmu.perms.r
};
always_comb begin
for (int i = 0; i < WAYS; i++) begin
case ({sfence_valid_r, sfence.addr_only, sfence.asid_only})
3'b100: begin //Clear everything
write[i] = 1'b1;
write_s = 1'b1;
end
3'b101: begin //Clear non global for specified address space
write[i] = ~rdata_global[i] & asid_hit[i];
write_s = ~rdata_global_s & asid_hit_s;
end
3'b110: begin //Clear matching addresses
write[i] = tag_hit[i];
write_s = tag_hit_s;
end
3'b111: begin //Clear if both
write[i] = (~rdata_global[i] & asid_hit[i]) & tag_hit[i];
write_s = (~rdata_global_s & asid_hit_s) & tag_hit_s;
end
default: begin
write[i] = mmu.write_entry & ~mmu.superpage & replacement_way[i];
write_s = mmu.write_entry & mmu.superpage;
end
endcase
end
end
//Permission fail
logic perm_fail;
assign perm_fail = |(hit_ohot & ~perms_valid) | (hit_ohot_s & ~perms_valid_s);
//MMU interface
logic new_request_r;
assign mmu.request = translation_on & new_request_r & ~hit & ~perm_fail;
assign mmu.execute = 0;
always_ff @(posedge clk) begin
new_request_r <= tlb.new_request;
if (tlb.new_request) begin
mmu.virtual_address <= tlb.virtual_address;
mmu.rnw <= tlb.rnw;
end
end
//TLB interface
assign tlb.done = (new_request_r & ((hit & ~perm_fail) | ~translation_on)) | mmu.write_entry;
assign tlb.ready = 1; //Not always ready, but requests will not be sent if it isn't done
assign tlb.is_fault = mmu.is_fault | (new_request_r & translation_on & perm_fail);
always_comb begin
tlb.physical_address[11:0] = mmu.virtual_address[11:0];
if (~translation_on)
tlb.physical_address[31:12] = mmu.virtual_address[31:12];
else if (new_request_r) begin
tlb.physical_address[31:22] = hit_ohot_s ? ppn1_s : ppn1[hit_way];
tlb.physical_address[21:12] = hit_ohot_s ? mmu.virtual_address[21:12] : ppn0[hit_way];
end else begin
tlb.physical_address[31:22] = mmu.upper_physical_address[19:10];
tlb.physical_address[21:12] = mmu.superpage ? mmu.virtual_address[21:12] : mmu.upper_physical_address[9:0];
end
end
////////////////////////////////////////////////////
//End of Implementation
////////////////////////////////////////////////////
////////////////////////////////////////////////////
//Assertions
request_on_miss:
assert property (@(posedge clk) disable iff (rst) (mmu.request) |-> ~tlb.new_request)
else $error("Request during miss in TLB!");
endmodule
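
Reviewer sketch (not part of this commit): the TLB above indexes regular pages with `virtual_address[12 +: $clog2(DEPTH)]` and superpages with `virtual_address[22 +: $clog2(DEPTH)]`, using the remaining upper bits as the tag. The self-checking module below works through one address with the default DEPTH of 32. The module name and signal names are hypothetical.

```systemverilog
//Hypothetical self-checking sketch; not part of the CVA5 sources
module tlb_index_sketch;
    localparam DEPTH = 32;
    localparam IDX_W = $clog2(DEPTH); //5
    localparam TAG_W = 20 - IDX_W; //15 bits for regular 4 KiB pages
    localparam TAG_W_S = 10 - IDX_W; //5 bits for 4 MiB superpages

    logic [31:0] va;
    logic [IDX_W-1:0] idx;
    logic [IDX_W-1:0] idx_s;
    logic [TAG_W-1:0] tag;
    logic [TAG_W_S-1:0] tag_s;

    assign idx = va[12 +: IDX_W];
    assign idx_s = va[22 +: IDX_W];
    assign tag = va[31 -: TAG_W];
    assign tag_s = va[31 -: TAG_W_S];

    initial begin
        va = 32'hDEAD_BEEF;
        #1;
        //Regular page: the 20-bit VPN is va[31:12] = 20'hDEADB
        assert (idx == 5'h1B);
        assert (tag == 15'h6F56);
        //Superpage: VPN[1] is va[31:22] = 10'h37A
        assert (idx_s == 5'h1A);
        assert (tag_s == 5'h1B);
        $finish;
    end
endmodule
```

Because the index and tag together cover the whole virtual page number, a tag match at the indexed entry identifies the page uniquely within one address space.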

core/mmu/itlb.sv Normal file

@ -0,0 +1,294 @@
/*
* Copyright © 2017 Eric Matthews, Chris Keilbart, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module itlb
import riscv_types::*;
import cva5_types::*;
#(
parameter WAYS = 2,
parameter DEPTH = 32
)
(
input logic clk,
input logic rst,
input logic translation_on,
input tlb_packet_t sfence,
input logic abort_request,
input logic [ASIDLEN-1:0] asid,
mmu_interface.tlb mmu,
tlb_interface.tlb tlb
);
//////////////////////////////////////////
localparam TAG_W = 20 - $clog2(DEPTH);
localparam TAG_W_S = 10 - $clog2(DEPTH);
localparam WAY_W = WAYS == 1 ? 1 : $clog2(WAYS);
typedef struct packed {
logic valid;
logic [ASIDLEN-1:0] asid;
logic [TAG_W-1:0] tag;
//Signals from the PTE
logic [9:0] ppn1;
logic [9:0] ppn0;
logic globe;
logic user;
} tlb_entry_t;
typedef struct packed {
logic valid;
logic [ASIDLEN-1:0] asid;
logic [TAG_W_S-1:0] tag;
//Signals from the PTE
logic [9:0] ppn1;
logic globe;
logic user;
} tlb_entry_s_t;
////////////////////////////////////////////////////
//Implementation
//Regular and super pages stored separately
//Regular pages are set associative and super pages are direct mapped
//Random replacement
logic[WAYS-1:0] replacement_way;
cycler #(.C_WIDTH(WAYS)) replacement_policy (
.en(1'b1),
.one_hot(replacement_way),
.*);
//LUTRAM storage
logic [$clog2(DEPTH)-1:0] tlb_addr;
logic [$clog2(DEPTH)-1:0] tlb_addr_s;
tlb_entry_t [WAYS-1:0] rdata;
tlb_entry_s_t rdata_s;
logic [WAYS-1:0] write;
logic write_s;
tlb_entry_t wdata;
tlb_entry_s_t wdata_s;
generate for (genvar i = 0; i < WAYS; i++) begin : gen_lut_rams
lutram_1w_1r #(.DATA_TYPE(tlb_entry_t), .DEPTH(DEPTH)) data_table (
.waddr(tlb_addr),
.raddr(tlb_addr),
.ram_write(write[i]),
.new_ram_data(wdata),
.ram_data_out(rdata[i]),
.*);
end endgenerate
lutram_1w_1r #(.DATA_TYPE(tlb_entry_s_t), .DEPTH(DEPTH)) data_table_s (
.waddr(tlb_addr_s),
.raddr(tlb_addr_s),
.ram_write(write_s),
.new_ram_data(wdata_s),
.ram_data_out(rdata_s),
.*);
//Hit detection
logic [TAG_W-1:0] cmp_tag;
logic [TAG_W_S-1:0] cmp_tag_s;
logic [ASIDLEN-1:0] cmp_asid;
logic [WAYS-1:0] tag_hit;
logic tag_hit_s;
logic [WAYS-1:0] asid_hit;
logic asid_hit_s;
logic [WAYS-1:0] perms_valid;
logic perms_valid_s;
logic [WAYS-1:0] hit_ohot;
logic hit_ohot_s;
logic [WAY_W-1:0] hit_way;
logic hit;
assign cmp_tag = sfence.valid ? sfence.addr[31-:TAG_W] : tlb.virtual_address[31-:TAG_W];
assign cmp_tag_s = sfence.valid ? sfence.addr[31-:TAG_W_S] : tlb.virtual_address[31-:TAG_W_S];
assign cmp_asid = sfence.valid ? sfence.asid : asid;
always_comb begin
for (int i = 0; i < WAYS; i++) begin
tag_hit[i] = rdata[i].tag == cmp_tag;
asid_hit[i] = rdata[i].asid == cmp_asid;
hit_ohot[i] = rdata[i].valid & tag_hit[i] & (asid_hit[i] | rdata[i].globe);
end
tag_hit_s = rdata_s.tag == cmp_tag_s;
asid_hit_s = rdata_s.asid == cmp_asid;
hit_ohot_s = rdata_s.valid & tag_hit_s & (asid_hit_s | rdata_s.globe);
end
assign hit = |hit_ohot | hit_ohot_s;
priority_encoder #(.WIDTH(WAYS)) hit_cast (
.priority_vector(hit_ohot),
.encoded_result(hit_way)
);
generate for (genvar i = 0; i < WAYS; i++) begin : gen_perms_check
perms_check checks (
.pte_perms('{
x : 1'b1,
a : 1'b1,
u : rdata[i].user,
default: 'x
}),
.rnw(tlb.rnw),
.execute(1'b1),
.mxr(mmu.mxr),
.sum(mmu.sum),
.privilege(mmu.privilege),
.valid(perms_valid[i])
);
end endgenerate
perms_check checks_s (
.pte_perms('{
x : 1'b1,
a : 1'b1,
u : rdata_s.user,
default: 'x
}),
.rnw(tlb.rnw),
.execute(1'b1),
.mxr(mmu.mxr),
.sum(mmu.sum),
.privilege(mmu.privilege),
.valid(perms_valid_s)
);
//SFENCE
logic [$clog2(DEPTH)-1:0] flush_addr;
lfsr #(.WIDTH($clog2(DEPTH)), .NEEDS_RESET(0)) lfsr_counter (
.en(1'b1),
.value(flush_addr),
.*);
always_comb begin
if (sfence.valid) begin
tlb_addr = sfence.addr_only ? sfence.addr[12 +: $clog2(DEPTH)] : flush_addr;
tlb_addr_s = sfence.addr_only ? sfence.addr[22 +: $clog2(DEPTH)] : flush_addr;
end
else begin
tlb_addr = tlb.virtual_address[12 +: $clog2(DEPTH)];
tlb_addr_s = tlb.virtual_address[22 +: $clog2(DEPTH)];
end
end
assign wdata = '{
valid : ~sfence.valid,
asid : asid,
tag : tlb.virtual_address[31-:TAG_W],
ppn1 : mmu.upper_physical_address[19:10],
ppn0 : mmu.upper_physical_address[9:0],
globe : mmu.perms.g,
user : mmu.perms.u
};
assign wdata_s = '{
valid : ~sfence.valid,
asid : asid,
tag : tlb.virtual_address[31-:TAG_W_S],
ppn1 : mmu.upper_physical_address[19:10],
globe : mmu.perms.g,
user : mmu.perms.u
};
always_comb begin
for (int i = 0; i < WAYS; i++) begin
case ({sfence.valid, sfence.addr_only, sfence.asid_only})
3'b100: begin //Clear everything
write[i] = 1'b1;
write_s = 1'b1;
end
3'b101: begin //Clear non global for specified address space
write[i] = ~rdata[i].globe & asid_hit[i];
write_s = ~rdata_s.globe & asid_hit_s;
end
3'b110: begin //Clear matching addresses
write[i] = tag_hit[i];
write_s = tag_hit_s;
end
3'b111: begin //Clear if both
write[i] = (~rdata[i].globe & asid_hit[i]) & tag_hit[i];
write_s = (~rdata_s.globe & asid_hit_s) & tag_hit_s;
end
default: begin
write[i] = mmu.write_entry & ~mmu.superpage & replacement_way[i];
write_s = mmu.write_entry & mmu.superpage;
end
endcase
end
end
//Permission fail
logic perm_fail;
assign perm_fail = |(hit_ohot & ~perms_valid) | (hit_ohot_s & ~perms_valid_s);
//MMU interface
logic request_in_progress;
always_ff @ (posedge clk) begin
if (rst)
request_in_progress <= 0;
else if (mmu.write_entry | mmu.is_fault | abort_request)
request_in_progress <= 0;
else if (mmu.request)
request_in_progress <= 1;
end
assign mmu.request = translation_on & tlb.new_request & ~hit & ~perm_fail;
assign mmu.execute = 1;
assign mmu.rnw = tlb.rnw;
assign mmu.virtual_address = tlb.virtual_address;
//TLB interface
logic mmu_request_complete;
always_ff @(posedge clk) begin
if (rst)
mmu_request_complete <= 0;
else
mmu_request_complete <= mmu.write_entry & ~abort_request;
end
assign tlb.done = translation_on ? (hit & ~perm_fail & (tlb.new_request | mmu_request_complete)) : tlb.new_request;
assign tlb.ready = ~request_in_progress & ~mmu_request_complete;
assign tlb.is_fault = mmu.is_fault | (tlb.new_request & translation_on & perm_fail);
always_comb begin
tlb.physical_address[11:0] = tlb.virtual_address[11:0];
if (~translation_on)
tlb.physical_address[31:12] = tlb.virtual_address[31:12];
else if (hit_ohot_s) begin
tlb.physical_address[21:12] = tlb.virtual_address[21:12];
tlb.physical_address[31:22] = rdata_s.ppn1;
end
else begin
tlb.physical_address[21:12] = rdata[hit_way].ppn0;
tlb.physical_address[31:22] = rdata[hit_way].ppn1;
end
end
////////////////////////////////////////////////////
//End of Implementation
////////////////////////////////////////////////////
////////////////////////////////////////////////////
//Assertions
endmodule
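
Review note: the address composition and the SFENCE write-enable cases in the TLB hunk above can be cross-checked against a small software model. The following is a minimal Python sketch, not part of the commit; the function names `physical_address` and `sfence_invalidates` are hypothetical, and it assumes the Sv32 field widths visible in the hunk (10-bit `ppn1`, 10-bit `ppn0`, 12-bit page offset).

```python
# Reference model only; the names below are illustrative and do not exist in the RTL.
MASK32 = 0xFFFFFFFF


def physical_address(va, translation_on, superpage_hit, ppn1, ppn0=0):
    """Mirror the always_comb block that drives tlb.physical_address."""
    if not translation_on:
        return va & MASK32  # translation off: identity mapping
    if superpage_hit:
        # 4 MiB superpage: VA[21:0] passes through, only PPN[1] is replaced
        return ((ppn1 & 0x3FF) << 22) | (va & 0x3FFFFF)
    # 4 KiB page: both PPN fields come from the matching way
    return ((ppn1 & 0x3FF) << 22) | ((ppn0 & 0x3FF) << 12) | (va & 0xFFF)


def sfence_invalidates(addr_only, asid_only, globe, asid_hit, tag_hit):
    """Mirror the case on {sfence.valid, addr_only, asid_only} with valid = 1."""
    non_global_match = (not globe) and asid_hit
    if addr_only and asid_only:
        return bool(non_global_match and tag_hit)  # 3'b111
    if addr_only:
        return bool(tag_hit)                       # 3'b110
    if asid_only:
        return bool(non_global_match)              # 3'b101
    return True                                    # 3'b100: clear everything


print(hex(physical_address(0xC0123456, True, True, 0x200)))
print(sfence_invalidates(addr_only=False, asid_only=True,
                         globe=True, asid_hit=True, tag_hit=True))
```

Note that a global entry survives an ASID-only fence in this model, which matches the `~rdata[i].globe & asid_hit[i]` term in the hunk.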

99
core/mmu.sv → core/mmu/mmu.sv Executable file → Normal file

@ -22,9 +22,6 @@
module mmu
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
import csr_types::*;
(
@ -32,22 +29,14 @@ module mmu
input logic rst,
mmu_interface.mmu mmu,
input logic abort_request,
l1_arbiter_request_interface.master l1_request,
l1_arbiter_return_interface.master l1_response
mem_interface.ro_master mem
);
typedef struct packed{
logic [11:0] ppn1;
logic [9:0] ppn0;
logic [1:0] reserved;
logic d;
logic a;
logic g;
logic u;
logic x;
logic w;
logic r;
logic v;
pte_perms_t perms;
} pte_t;
typedef enum {
@ -63,8 +52,7 @@ module mmu
logic [6:0] next_state;
pte_t pte;
logic access_valid;
logic privilege_valid;
logic perms_valid;
localparam MAX_ABORTED_REQUESTS = 4;
logic abort_queue_full;
@ -74,25 +62,21 @@ module mmu
////////////////////////////////////////////////////
//L1 arbiter Interface
assign l1_request.rnw = 1;
assign l1_request.be = '1;
assign l1_request.size = '0;
assign l1_request.is_amo = 0;
assign l1_request.amo = 0;
assign mem.rlen = '0;
assign l1_request.request = (state[SEND_REQUEST_1] | state[SEND_REQUEST_2]) & ~abort_request;
assign mem.request = (state[SEND_REQUEST_1] | state[SEND_REQUEST_2]) & ~abort_request;
//Page Table addresses
always_ff @ (posedge clk) begin
if (state[IDLE] | l1_response.data_valid) begin
if (state[IDLE] | (mem.rvalid & ~discard_data)) begin
if (state[IDLE])
l1_request.addr <= {mmu.satp_ppn[19:0], mmu.virtual_address[31:22], 2'b00};
mem.addr <= {mmu.satp_ppn[19:0], mmu.virtual_address[31:22]};
else
l1_request.addr <= {{pte.ppn1[9:0], pte.ppn0}, mmu.virtual_address[21:12], 2'b00};
mem.addr <= {pte.ppn1[9:0], pte.ppn0, mmu.virtual_address[21:12]};
end
end
assign pte = l1_response.data;
assign pte = mem.rdata;
////////////////////////////////////////////////////
//Supports unlimited tracking of aborted requests
@ -103,7 +87,7 @@ module mmu
logic delayed_abort_complete;
assign delayed_abort = abort_request & (state[WAIT_REQUEST_1] | state[WAIT_REQUEST_2]);
assign delayed_abort_complete = discard_data & l1_response.data_valid;
assign delayed_abort_complete = (discard_data | abort_request) & mem.rvalid;
always_ff @ (posedge clk) begin
if (rst)
abort_tracking <= 0;
@ -113,18 +97,16 @@ module mmu
assign discard_data = abort_tracking[COUNT_W];
assign abort_queue_full = abort_tracking[COUNT_W] & ~|abort_tracking[COUNT_W-1:0];
////////////////////////////////////////////////////
//Access and permission checks
//A and D bits are software managed
assign access_valid =
(mmu.execute & pte.x & pte.a) | //fetch
(mmu.rnw & (pte.r | (pte.x & mmu.mxr)) & pte.a) | //load
((~mmu.rnw & ~mmu.execute) & pte.w & pte.a & pte.d); //store
assign privilege_valid =
(mmu.privilege == MACHINE_PRIVILEGE) |
((mmu.privilege == SUPERVISOR_PRIVILEGE) & (~pte.u | (pte.u & mmu.sum))) |
((mmu.privilege == USER_PRIVILEGE) & pte.u);
perms_check perm (
.pte_perms(pte.perms),
.rnw(mmu.rnw),
.execute(mmu.execute),
.mxr(mmu.mxr),
.sum(mmu.sum),
.privilege(mmu.privilege),
.valid(perms_valid)
);
////////////////////////////////////////////////////
//State Machine
@ -135,29 +117,29 @@ module mmu
if (mmu.request & ~abort_queue_full)
next_state = 2**SEND_REQUEST_1;
state[SEND_REQUEST_1] :
if (l1_request.ack)
if (mem.ack)
next_state = 2**WAIT_REQUEST_1;
state[WAIT_REQUEST_1] :
if (l1_response.data_valid & ~discard_data) begin
if (~pte.v | (~pte.r & pte.w)) //page not valid OR invalid xwr pattern
if (mem.rvalid & ~discard_data) begin
if (~pte.perms.v | (~pte.perms.r & pte.perms.w)) //page not valid OR invalid xwr pattern
next_state = 2**COMPLETE_FAULT;
else if (pte.v & (pte.r | pte.x)) begin//superpage (all remaining xwr patterns other than all zeros)
if (access_valid & privilege_valid)
else if (pte.perms.v & (pte.perms.r | pte.perms.x)) begin//superpage (all remaining xwr patterns other than all zeros)
if (perms_valid & ~|pte.ppn0) //check for misaligned superpage
next_state = 2**COMPLETE_SUCCESS;
else
next_state = 2**COMPLETE_FAULT;
end else //(pte.v & ~pte.x & ~pte.w & ~pte.r) pointer to next level in page table
end else //(pte.perms.v & ~pte.perms.x & ~pte.perms.w & ~pte.perms.r) pointer to next level in page table
next_state = 2**SEND_REQUEST_2;
end
state[SEND_REQUEST_2] :
if (l1_request.ack)
if (mem.ack)
next_state = 2**WAIT_REQUEST_2;
state[WAIT_REQUEST_2] :
if (l1_response.data_valid & ~discard_data) begin
if (access_valid & privilege_valid)
next_state = 2**COMPLETE_SUCCESS;
else
if (mem.rvalid & ~discard_data) begin
if (~perms_valid | ~pte.perms.v | (~pte.perms.r & pte.perms.w)) //perm fail or invalid
next_state = 2**COMPLETE_FAULT;
else
next_state = 2**COMPLETE_SUCCESS;
end
state[COMPLETE_SUCCESS], state[COMPLETE_FAULT] :
next_state = 2**IDLE;
@ -177,7 +159,16 @@ module mmu
////////////////////////////////////////////////////
//TLB return path
always_ff @ (posedge clk) begin
if (l1_response.data_valid) begin
if (mem.rvalid) begin
mmu.superpage <= state[WAIT_REQUEST_1];
mmu.perms.d <= pte.perms.d;
mmu.perms.a <= pte.perms.a;
mmu.perms.g <= pte.perms.g | (state[WAIT_REQUEST_2] & mmu.perms.g);
mmu.perms.u <= pte.perms.u;
mmu.perms.x <= pte.perms.x;
mmu.perms.w <= pte.perms.w;
mmu.perms.r <= pte.perms.r;
mmu.perms.v <= pte.perms.v;
mmu.upper_physical_address[19:10] <= pte.ppn1[9:0];
mmu.upper_physical_address[9:0] <= state[WAIT_REQUEST_2] ? pte.ppn0 : mmu.virtual_address[21:12];
end
@ -191,17 +182,15 @@ module mmu
////////////////////////////////////////////////////
//Assertions
`ifdef ENABLE_SIMULATION_ASSERTIONS
mmu_spurious_l1_response:
assert property (@(posedge clk) disable iff (rst) (l1_response.data_valid) |-> (state[WAIT_REQUEST_1] | state[WAIT_REQUEST_2]))
else $error("mmu received response without a request");
`endif
mmu_spurious_l1_response:
assert property (@(posedge clk) disable iff (rst) (mem.rvalid) |-> (state[WAIT_REQUEST_1] | state[WAIT_REQUEST_2]))
else $error("mmu received response without a request");
//TLB request remains high until it receives a response from the MMU unless
//the transaction is aborted. As such, if TLB request is low and we are not in the
//IDLE state, then our current processor state has been corrupted
mmu_tlb_state_mismatch:
assert property (@(posedge clk) disable iff (rst) (~mmu.request) |-> (state[IDLE]))
assert property (@(posedge clk) disable iff (rst) (mmu.request) |-> (state[IDLE]))
else $error("MMU and TLB state mismatch");
endmodule
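
Review note: the page-table addressing and the first-level decision in the state machine above can be summarised in software. This is a minimal Python sketch under stated assumptions, not the commit's implementation: the helper names are hypothetical, `perms_ok` stands in for the `perms_valid` output of `perms_check`, and the word addresses correspond to `mem.addr[31:2]` as the new hunk lines suggest.

```python
# Reference model only; helper names are illustrative and do not exist in the RTL.

def level1_word_addr(satp_ppn, va):
    """{satp_ppn[19:0], va[31:22]}: word address of the first-level PTE."""
    return ((satp_ppn & 0xFFFFF) << 10) | ((va >> 22) & 0x3FF)


def level2_word_addr(ppn1, ppn0, va):
    """{ppn1[9:0], ppn0, va[21:12]}: word address of the second-level PTE."""
    return ((ppn1 & 0x3FF) << 20) | ((ppn0 & 0x3FF) << 10) | ((va >> 12) & 0x3FF)


def classify_level1(v, r, w, x, ppn0, perms_ok):
    """Outcome of WAIT_REQUEST_1 once a non-discarded response arrives."""
    if not v or (not r and w):
        return "fault"        # invalid page or reserved xwr pattern
    if r or x:
        # leaf at level 1 is a superpage; a non-zero ppn0 means it is misaligned
        return "success" if (perms_ok and ppn0 == 0) else "fault"
    return "next_level"       # v set, xwr all zero: pointer to the next level


va = 0xC0401234
print(hex(level1_word_addr(0x80000, va) << 2))       # byte address of the level-1 PTE
print(hex(level2_word_addr(0x200, 0x001, va) << 2))  # byte address of the level-2 PTE
print(classify_level1(v=True, r=True, w=False, x=False, ppn0=1, perms_ok=True))
```

The last call shows the case added by this commit: a readable superpage whose `ppn0` is non-zero is reported as a fault rather than a success.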

58
core/mmu/perms_check.sv Normal file

@ -0,0 +1,58 @@
/*
* Copyright © 2024 Liam Feng, Chris Keilbart, Eric Matthews
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Liam Feng <lfa32@sfu.ca>
* Chris Keilbart <ckeilbar@sfu.ca>
* Eric Matthews <ematthew@sfu.ca>
*/
module perms_check
import csr_types::*;
(
input pte_perms_t pte_perms,
input logic rnw, //LS type
input logic execute, //Fetch
input logic mxr, //Make eXecutable Readable
input logic sum, //permit Supervisor User Memory access
input privilege_t privilege, //Effective operating privilege
output logic valid
);
logic access_valid;
logic privilege_valid;
//Access and permission checks
//A and D bits are software managed; this implementation corresponds to the Svade extension
assign access_valid =
(execute & pte_perms.x & pte_perms.a) | //fetch
(rnw & (pte_perms.r | (pte_perms.x & mxr)) & pte_perms.a) | //load
((~rnw & ~execute) & pte_perms.w & pte_perms.a & pte_perms.d); //store
assign privilege_valid =
(privilege == MACHINE_PRIVILEGE) |
((privilege == SUPERVISOR_PRIVILEGE) & (~pte_perms.u | (pte_perms.u & sum))) |
((privilege == USER_PRIVILEGE) & pte_perms.u);
assign valid = access_valid & privilege_valid;
endmodule
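
Review note: the two boolean expressions in `perms_check` are easy to mirror in software for quick sanity checks. The following is a minimal Python sketch, not part of the commit; the `PtePerms` class and the `perms_valid` function are hypothetical names, and the privilege encodings are taken from the `privilege_t` enum in `csr_types`.

```python
# Reference model only; class and function names are illustrative.
from dataclasses import dataclass

USER, SUPERVISOR, MACHINE = 0b00, 0b01, 0b11  # privilege_t encodings


@dataclass(frozen=True)
class PtePerms:
    d: bool = False
    a: bool = False
    g: bool = False
    u: bool = False
    x: bool = False
    w: bool = False
    r: bool = False
    v: bool = False


def perms_valid(p, rnw, execute, mxr, sum_, privilege):
    """Return access_valid & privilege_valid as computed in perms_check."""
    access_valid = (
        (execute and p.x and p.a)                                  # fetch
        or (rnw and (p.r or (p.x and mxr)) and p.a)                # load
        or ((not rnw and not execute) and p.w and p.a and p.d)     # store
    )
    privilege_valid = (
        privilege == MACHINE
        or (privilege == SUPERVISOR and (not p.u or (p.u and sum_)))
        or (privilege == USER and p.u)
    )
    return bool(access_valid and privilege_valid)


# A store to a writable page whose D bit is clear is rejected (Svade behaviour).
clean_page = PtePerms(v=True, r=True, w=True, a=True, u=True)
print(perms_valid(clean_page, rnw=False, execute=False,
                  mxr=False, sum_=False, privilege=USER))
```

Because A and D are software managed here, the model never sets them itself; a rejected access is expected to surface as a page fault so that software can update the PTE.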

3
core/register_file.sv Executable file → Normal file

@ -100,7 +100,6 @@ module register_file
) id_inuse_toggle_mem_set
(
.clk (clk),
.rst (rst),
.init_clear (gc.init_clear),
.toggle (toggle),
.toggle_addr (toggle_addr),
@ -118,7 +117,7 @@ module register_file
.clk,
.waddr(wb_phys_addr[i]),
.raddr(decode_phys_rs_addr),
.ram_write(commit[i].valid & ~gc.writeback_supress),
.ram_write(commit[i].valid & ~gc.writeback_suppress),
.new_ram_data(commit[i].data),
.ram_data_out(regfile_rs_data[i])
);

View file

@ -91,4 +91,4 @@ module register_free_list
fifo_underflow_assertion:
assert property (@(posedge clk) disable iff (rst) fifo.pop |-> fifo.valid) else $error("underflow");
endmodule
endmodule

View file

@ -96,7 +96,7 @@ module renamer
assign free_list.potential_push = (gc.init_clear & ~clear_index[5]) | (wb_retire.valid);
assign free_list.push = free_list.potential_push;
assign free_list.data_in = gc.init_clear ? {1'b1, clear_index[4:0]} : (gc.writeback_supress ? inuse_table_output.spec_phys_addr : inuse_table_output.previous_phys_addr);
assign free_list.data_in = gc.init_clear ? {1'b1, clear_index[4:0]} : (gc.rename_revert ? inuse_table_output.spec_phys_addr : inuse_table_output.previous_phys_addr);
assign free_list.pop = rename_valid;
////////////////////////////////////////////////////
@ -137,12 +137,12 @@ module renamer
rs_addr_t spec_table_write_index;
rs_addr_t spec_table_write_index_mux [4];
assign spec_table_update = rename_valid | rollback | gc.init_clear | (wb_retire.valid & gc.writeback_supress);
assign spec_table_update = rename_valid | rollback | gc.init_clear | gc.rename_revert;
logic [1:0] spec_table_sel;
one_hot_to_integer #(.C_WIDTH(4)) spec_table_sel_one_hot_to_int (
.one_hot ({gc.init_clear, rollback, (wb_retire.valid & gc.writeback_supress), 1'b0}),
.one_hot ({gc.init_clear, rollback, gc.rename_revert, 1'b0}),
.int_out (spec_table_sel)
);
@ -150,7 +150,7 @@ module renamer
assign spec_table_write_index_mux[0] = decode.rd_addr;
assign spec_table_next_mux[0].phys_addr = free_list.data_out;
assign spec_table_next_mux[0].wb_group = decode.rd_wb_group;
//gc.writeback_supress
//gc.rename_revert
assign spec_table_write_index_mux[1] = inuse_table_output.rd_addr;
assign spec_table_next_mux[1].phys_addr = inuse_table_output.previous_phys_addr;
assign spec_table_next_mux[1].wb_group = inuse_table_output.previous_wb_group;
@ -167,7 +167,7 @@ module renamer
assign spec_table_next = spec_table_next_mux[spec_table_sel];
assign spec_table_read_addr[0] = spec_table_write_index;
assign spec_table_read_addr[1+:READ_PORTS] = decode.rs_addr;
assign spec_table_read_addr[1:READ_PORTS] = decode.rs_addr;
lutram_1w_mr #(
.DATA_TYPE(spec_table_t),

View file

@ -1,170 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module tlb_lut_ram
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
#(
parameter WAYS = 2,
parameter DEPTH = 32
)
(
input logic clk,
input logic rst,
input gc_outputs_t gc,
input logic abort_request,
input logic [ASIDLEN-1:0] asid,
mmu_interface.tlb mmu,
tlb_interface.tlb tlb
);
//////////////////////////////////////////
localparam TLB_TAG_W = 32-12-$clog2(DEPTH);
typedef struct packed {
logic valid;
logic [TLB_TAG_W-1:0] tag;
logic [19:0] phys_addr;
} tlb_entry_t;
logic [$clog2(DEPTH)-1:0] tlb_addr;
logic [TLB_TAG_W-1:0] virtual_tag;
tlb_entry_t ram [DEPTH-1:0][WAYS-1:0];
logic [DEPTH-1:0] valid [WAYS-1:0];
logic [WAYS-1:0] tag_hit;
logic hit;
logic [WAYS-1:0] replacement_way;
logic [$bits(tlb_entry_t)-1:0] ram_data [WAYS-1:0];
tlb_entry_t ram_entry [WAYS-1:0];
tlb_entry_t new_entry;
logic [$clog2(DEPTH)-1:0] flush_addr;
logic [WAYS-1:0] tlb_write;
logic request_in_progress;
logic mmu_request_complete;
////////////////////////////////////////////////////
//Implementation
//LUTRAM-based
//Reset is performed sequentially, coordinated by the gc unit
lfsr #(.WIDTH($clog2(DEPTH)), .NEEDS_RESET(0))
lfsr_counter (
.clk (clk), .rst (rst),
.en(gc.tlb_flush),
.value(flush_addr)
);
assign tlb_addr = gc.tlb_flush ? flush_addr : tlb.virtual_address[12 +: $clog2(DEPTH)];
assign tlb_write = {WAYS{gc.tlb_flush}} | replacement_way;
assign new_entry.valid = ~gc.tlb_flush;
assign new_entry.tag = virtual_tag;
assign new_entry.phys_addr = mmu.upper_physical_address;
genvar i;
generate
for (i=0; i<WAYS; i=i+1) begin : lut_rams
lutram_1w_1r #(.DATA_TYPE(tlb_entry_t), .DEPTH(DEPTH))
write_port (
.clk(clk),
.waddr(tlb_addr),
.raddr(tlb_addr),
.ram_write(tlb_write[i]),
.new_ram_data(new_entry),
.ram_data_out(ram_data[i])
);
assign ram_entry[i] = ram_data[i];
end
endgenerate
cycler #(.C_WIDTH(WAYS)) replacement_policy (
.clk (clk),
.rst (rst),
.en (1'b1),
.one_hot (replacement_way)
);
assign virtual_tag = tlb.virtual_address[31:32-TLB_TAG_W];
always_comb begin
for (int i=0; i<WAYS; i=i+1) begin
tag_hit[i] = {ram_entry[i].valid, ram_entry[i].tag} == {1'b1, virtual_tag};
end
end
assign tlb.ready = ~request_in_progress;
always_ff @ (posedge clk) begin
if (rst)
request_in_progress <= 0;
else if (mmu.write_entry | mmu.is_fault | abort_request)
request_in_progress <= 0;
else if (tlb.new_request & ~hit)
request_in_progress <= 1;
end
assign mmu.request = request_in_progress;
always_ff @ (posedge clk) begin
if (rst)
mmu_request_complete <= 0;
else
mmu_request_complete <= mmu.write_entry;
end
assign mmu.virtual_address = tlb.virtual_address;
assign mmu.execute = tlb.execute;
assign mmu.rnw = tlb.rnw;
//On a TLB miss, the entry is requested from the MMU
//Once the request completes, it will update the TLB, causing
//the current request to output a hit
assign hit = |tag_hit;
assign tlb.done = hit & (tlb.new_request | mmu_request_complete);
assign tlb.is_fault = mmu.is_fault;
always_comb begin
tlb.physical_address[11:0] = tlb.virtual_address[11:0];
tlb.physical_address[31:12] = 0;
for (int i = 0; i < WAYS; i++) begin
if (tag_hit[i]) tlb.physical_address[31:12] |= ram_entry[i].phys_addr;
end
end
////////////////////////////////////////////////////
//End of Implementation
////////////////////////////////////////////////////
////////////////////////////////////////////////////
//Assertions
multiple_tag_hit_in_tlb:
assert property (@(posedge clk) disable iff (rst) (tlb.done) |-> $onehot(tag_hit))
else $error("Multiple tag hits in TLB!");
endmodule

View file

@ -29,7 +29,7 @@ package csr_types;
typedef enum logic [1:0] {
USER_PRIVILEGE = 2'b00,
SUPERVISOR_PRIVILEGE = 2'b01,
//reserved
RESERVED_PRIVILEGE = 2'b10,
MACHINE_PRIVILEGE = 2'b11
} privilege_t;
@ -72,8 +72,6 @@ package csr_types;
logic A; //Atomic
} misa_t;
typedef struct packed {
logic sd;
logic [7:0] zeros;
@ -86,7 +84,7 @@ package csr_types;
logic [1:0] xs;
logic [1:0] fs;
logic [1:0] mpp;
logic [1:0] zeros1;
logic [1:0] vs;
logic spp;
logic mpie;
logic ube;
@ -121,7 +119,9 @@ package csr_types;
typedef struct packed {
logic [31:16] custom;
logic [15:12] zeros;
logic [15:14] zeros;
logic lcofip;
logic zero0;
logic meip;
logic zero1;
logic seip;
@ -138,7 +138,9 @@ package csr_types;
typedef struct packed {
logic [31:16] custom;
logic [15:12] zeros;
logic [15:14] zeros;
logic lcofie;
logic zero0;
logic meie;
logic zero1;
logic seie;
@ -154,11 +156,65 @@ package csr_types;
} mie_t;
typedef struct packed {
logic is_interrupt;
logic [XLEN-1-1-ECODE_W:0] zeroes;
logic [ECODE_W-1:0] code;
} mcause_t;
logic [31:16] custom;
logic [15:14] zeros;
logic lcofipd;
logic [12:10] zero1;
logic seid;
logic [8:6] zero2;
logic stid;
logic [4:2] zero3;
logic ssid;
logic zero4;
} mideleg_t;
typedef struct packed {
logic is_interrupt;
logic [XLEN-1-1-ECODE_W:0] zeros;
logic [ECODE_W-1:0] code;
} cause_t;
typedef struct packed {
logic [28:0] hpm;
logic ir;
logic tm;
logic cy;
} mcounter_t;
typedef struct packed {
logic [24:0] zeros_high;
logic cbze;
logic cbcfe;
logic [1:0] cbie;
logic [1:0] zeros_low;
logic fiom;
} envcfg_t;
typedef struct packed {
logic stce;
logic pbmte;
logic adue;
logic cde;
logic [27:0] zeros;
} envcfgh_t;
typedef struct packed {
logic [28:0] zeros;
logic jvt;
logic fcsr;
logic c;
} stateen0_t;
typedef struct packed {
logic se0;
logic envcfg;
logic zero;
logic csrind;
logic aia;
logic imsic;
logic contex;
logic [24:0] zeros;
} mstateen0h_t;
typedef struct packed {
logic mode;
@ -166,5 +222,15 @@ package csr_types;
logic [21:0] ppn;
} satp_t;
typedef struct packed {
logic d;
logic a;
logic g;
logic u;
logic x;
logic w;
logic r;
logic v;
} pte_perms_t;
endpackage
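
Review note: `pte_perms_t` is a packed struct, so its first member (`d`) occupies the most significant bit and `v` the least significant, which lines up with bits 7:0 of an Sv32 PTE. The following minimal Python sketch decodes a raw 32-bit PTE under that assumption; the function names are hypothetical, and the `ppn1`/`ppn0` positions follow the `pte_t` struct declared in `mmu.sv` (12-bit `ppn1`, 10-bit `ppn0`, 2 reserved bits, 8 permission bits).

```python
# Reference model only; names are illustrative and do not exist in the RTL.
PTE_PERM_FIELDS = ("v", "r", "w", "x", "u", "g", "a", "d")  # bit 0 .. bit 7


def decode_pte_perms(pte):
    """Map PTE bits 7:0 onto the pte_perms_t field names."""
    return {name: bool((pte >> bit) & 1) for bit, name in enumerate(PTE_PERM_FIELDS)}


def split_pte(pte):
    """Split a 32-bit PTE into (ppn1, ppn0, perms) following pte_t."""
    ppn1 = (pte >> 20) & 0xFFF   # pte_t.ppn1 is 12 bits wide
    ppn0 = (pte >> 10) & 0x3FF   # pte_t.ppn0 is 10 bits wide
    return ppn1, ppn0, decode_pte_perms(pte)


ppn1, ppn0, perms = split_pte((0x200 << 20) | (0x001 << 10) | 0xCF)
print(hex(ppn1), hex(ppn0), perms)
```

Only `ppn1[9:0]` is used for addressing in the MMU hunk, since the core exposes a 32-bit physical address.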

119
core/types_and_interfaces/cva5_config.sv Executable file → Normal file

@ -32,31 +32,38 @@ package cva5_config;
////////////////////////////////////////////////////
//CSR Options
typedef struct packed {
int unsigned COUNTER_W; //CSR counter width (33-64 bits): 48-bits --> 32 days @ 100MHz
bit MCYCLE_WRITEABLE;
bit MINSTR_WRITEABLE;
bit MTVEC_WRITEABLE;
bit INCLUDE_MSCRATCH;
bit INCLUDE_MCAUSE;
bit INCLUDE_MTVAL;
} csr_non_standard_config_t;
typedef enum {
BARE,
M,
MU,
MSU
} modes_t;
typedef struct packed {
bit [31:0] MACHINE_IMPLEMENTATION_ID;
bit [31:0] CPU_ID;
bit [31:0] RESET_VEC; //PC value on reset
bit [31:0] RESET_MTVEC;
csr_non_standard_config_t NON_STANDARD_OPTIONS;
bit [31:0] RESET_TVEC;
bit [31:0] MCONFIGPTR;
bit INCLUDE_ZICNTR;
bit INCLUDE_ZIHPM;
bit INCLUDE_SSTC;
bit INCLUDE_SMSTATEEN;
} csr_config_t;
//Memory range [L, H]
//Address range is inclusive and must be aligned to its size
typedef struct packed {
bit [31:0] L;
bit [31:0] H;
logic [31:0] L;
logic [31:0] H;
} memory_config_t;
//Atomic configuration
typedef struct packed {
int unsigned LR_WAIT; //Must be >= the maximum number of cycles a constrained LR-SC can take
int unsigned RESERVATION_WORDS; //The amount of 32-bit words that are reserved by an LR instruction, must be == cache line size (if cache present)
} amo_config_t;
////////////////////////////////////////////////////
//Cache Options
//Size in bytes: (LINES * WAYS * LINE_W * 4)
@ -109,7 +116,7 @@ package cva5_config;
//Additionally, writeback units must be grouped before non-writeback units
localparam MAX_NUM_UNITS = 9;
typedef struct packed {
bit IEC;
bit GC;
bit BR;
//End of Write-Back Units
bit CUSTOM;
@ -122,7 +129,7 @@ package cva5_config;
} units_t;
typedef enum bit [$clog2(MAX_NUM_UNITS)-1:0] {
IEC_ID = 8,
GC_ID = 8,
BR_ID = 7,
//End of Write-Back Units (insert new writeback units here)
CUSTOM_ID = 6,
@ -161,22 +168,21 @@ package cva5_config;
typedef struct packed {
//ISA options
bit INCLUDE_M_MODE;
bit INCLUDE_S_MODE;
bit INCLUDE_U_MODE;
modes_t MODES;
bit INCLUDE_IFENCE; //local mem operations only
bit INCLUDE_AMO;
bit INCLUDE_CBO; //Data cache invalidation operations
//Units
units_t INCLUDE_UNIT;
units_t INCLUDE_UNIT; //Value of ALU, LS, BR, and GC ignored
//CSR constants
csr_config_t CSRS;
//Memory Options
int unsigned SQ_DEPTH;//CAM-based reasonable max of 4
bit INCLUDE_FORWARDING_TO_STORES;
amo_config_t AMO_UNIT;
//Caches
bit INCLUDE_ICACHE;
cache_config_t ICACHE;
@ -232,46 +238,38 @@ package cva5_config;
localparam cpu_config_t EXAMPLE_CONFIG = '{
//ISA options
INCLUDE_M_MODE : 1,
INCLUDE_S_MODE : 0,
INCLUDE_U_MODE : 0,
MODES : MSU,
INCLUDE_UNIT : '{
ALU : 1,
LS : 1,
MUL : 1,
DIV : 1,
CSR : 1,
FPU : 1,
CUSTOM : 0,
BR : 1,
IEC : 1
default: '0
},
INCLUDE_IFENCE : 1,
INCLUDE_AMO : 0,
INCLUDE_CBO : 0,
//CSR constants
CSRS : '{
MACHINE_IMPLEMENTATION_ID : 0,
CPU_ID : 0,
RESET_VEC : 32'h80000000,
RESET_MTVEC : 32'h80000100,
NON_STANDARD_OPTIONS : '{
COUNTER_W : 33,
MCYCLE_WRITEABLE : 0,
MINSTR_WRITEABLE : 0,
MTVEC_WRITEABLE : 1,
INCLUDE_MSCRATCH : 0,
INCLUDE_MCAUSE : 1,
INCLUDE_MTVAL : 1
}
RESET_TVEC : 32'h00000000,
MCONFIGPTR : '0,
INCLUDE_ZICNTR : 1,
INCLUDE_ZIHPM : 1,
INCLUDE_SSTC : 1,
INCLUDE_SMSTATEEN : 1
},
//Memory Options
SQ_DEPTH : 4,
INCLUDE_FORWARDING_TO_STORES : 1,
INCLUDE_ICACHE : 0,
AMO_UNIT : '{
LR_WAIT : 32,
RESERVATION_WORDS : 8
},
INCLUDE_ICACHE : 1,
ICACHE_ADDR : '{
L: 32'h80000000,
H: 32'h8FFFFFFF
@ -291,7 +289,7 @@ package cva5_config;
WAYS : 2,
DEPTH : 64
},
INCLUDE_DCACHE : 0,
INCLUDE_DCACHE : 1,
DCACHE_ADDR : '{
L: 32'h80000000,
H: 32'h8FFFFFFF
@ -311,12 +309,12 @@ package cva5_config;
WAYS : 2,
DEPTH : 64
},
INCLUDE_ILOCAL_MEM : 1,
INCLUDE_ILOCAL_MEM : 0,
ILOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_DLOCAL_MEM : 1,
INCLUDE_DLOCAL_MEM : 0,
DLOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
@ -344,10 +342,6 @@ package cva5_config;
WB_GROUP : EXAMPLE_WB_GROUP_CONFIG
};
////////////////////////////////////////////////////
//Bus Options
parameter C_M_AXI_ADDR_WIDTH = 32; //Kept as parameter, due to localparam failing with scripted IP packaging
parameter C_M_AXI_DATA_WIDTH = 32; //Kept as parameter, due to localparam failing with scripted IP packaging
////////////////////////////////////////////////////
//ID limit
@ -377,35 +371,14 @@ package cva5_config;
////////////////////////////////////////////////////
//Exceptions
localparam NUM_EXCEPTION_SOURCES = 3; //LS, Branch, Illegal
localparam NUM_EXCEPTION_SOURCES = 5; //LS, Branch, Illegal, CSR, GC
//Stored in a ID table on issue, checked at retire
typedef enum bit [1:0] {
typedef enum bit [2:0] {
LS_EXCEPTION = 0,
BR_EXCEPTION = 1,
PRE_ISSUE_EXCEPTION = 2
PRE_ISSUE_EXCEPTION = 2,
CSR_EXCEPTION = 3,
GC_EXCEPTION = 4
} exception_sources_t;
////////////////////////////////////////////////////
//L1 Arbiter IDs
localparam L1_CONNECTIONS = 4;
typedef enum bit [1:0] {
L1_DCACHE_ID = 0,
L1_ICACHE_ID = 1,
L1_DMMU_ID = 2,
L1_IMMU_ID = 3
} l1_id_t;
////////////////////////////////////////////////////
//Debug Parameters
//To enable assertions specific to formal debug, uncomment or set in tool flow
//`define ENABLE_FORMAL_ASSERTIONS
//To enable assertions specific to simulation (verilator), uncomment or set in tool flow
//`define ENABLE_SIMULATION_ASSERTIONS
//When no exceptions are expected in a simulation, turn on this flag
//to convert any exceptions into assertions
localparam DEBUG_CONVERT_EXCEPTIONS_INTO_ASSERTIONS = 0;
endpackage

54
core/types_and_interfaces/cva5_types.sv Executable file → Normal file

@ -27,9 +27,10 @@ package cva5_types;
localparam LOG2_RETIRE_PORTS = $clog2(RETIRE_PORTS);
localparam LOG2_MAX_IDS = $clog2(MAX_IDS);
localparam MAX_LS_SUBUNITS = 3;
typedef logic[LOG2_MAX_IDS-1:0] id_t;
typedef logic[1:0] branch_predictor_metadata_t;
typedef logic[$clog2(MAX_LS_SUBUNITS)-1:0] ls_subunit_t;
typedef logic [3:0] addr_hash_t;
typedef logic [5:0] phys_addr_t;
@ -43,6 +44,8 @@ package cva5_types;
typedef struct packed{
logic valid;
logic possible;
logic [NUM_EXCEPTION_SOURCES-1:0] source;
exception_code_t code;
logic [31:0] tval;
logic [31:0] pc;
@ -64,7 +67,9 @@ package cva5_types;
typedef struct packed{
logic [31:0] pc;
logic [31:0] pc_r;
logic [31:0] instruction;
logic [31:0] instruction_r;
logic [2:0] fn3;
logic [6:0] opcode;
@ -76,7 +81,6 @@ package cva5_types;
logic fp_uses_rd;
logic is_multicycle;
id_t id;
exception_sources_t exception_unit;
logic stage_valid;
fetch_metadata_t fetch_metadata;
} issue_packet_t;
@ -98,18 +102,13 @@ package cva5_types;
logic [4:0] op;
}amo_alu_inputs_t;
typedef struct packed{
logic is_lr;
logic is_sc;
logic is_amo;
logic [4:0] op;
} amo_details_t;
typedef struct packed {
logic [31:0] addr;
logic [11:0] offset;
logic load;
logic store;
logic cache_op;
logic amo;
amo_t amo_type;
logic [3:0] be;
logic [2:0] fn3;
logic [31:0] data;
@ -121,7 +120,14 @@ package cva5_types;
} lsq_entry_t;
typedef struct packed {
logic [31:0] addr;
logic [19:0] addr;
logic rnw;
logic discard;
ls_subunit_t subunit;
} lsq_addr_entry_t;
typedef struct packed {
logic [11:0] offset;
logic [3:0] be;
logic cache_op;
logic [31:0] data;
@ -131,8 +137,7 @@ package cva5_types;
} sq_entry_t;
typedef struct packed {
logic sq_empty;
logic no_released_stores_pending;
logic outstanding_store;
logic idle;
} load_store_status_t;
@ -165,29 +170,32 @@ package cva5_types;
logic load;
logic store;
logic cache_op;
logic amo;
amo_t amo_type;
logic [3:0] be;
logic [2:0] fn3;
ls_subunit_t subunit;
logic [31:0] data_in;
id_t id;
fp_ls_op_t fp_op;
} data_access_shared_inputs_t;
typedef enum {
LUTRAM_FIFO,
NON_MUXED_INPUT_FIFO,
NON_MUXED_OUTPUT_FIFO
} fifo_type_t;
typedef struct packed {
logic valid;
logic asid_only;
logic[ASIDLEN-1:0] asid;
logic addr_only;
logic[31:0] addr;
} tlb_packet_t;
typedef struct packed{
logic init_clear;
logic fetch_hold;
logic issue_hold;
logic fetch_flush;
logic writeback_supress;
logic retire_hold;
logic sq_flush;
logic tlb_flush;
logic exception_pending;
logic fetch_ifence;
logic writeback_suppress;
logic rename_revert;
exception_packet_t exception;
logic pc_override;
logic [31:0] pc;

View file

@ -25,17 +25,18 @@ interface axi_interface;
logic arready;
logic arvalid;
logic [C_M_AXI_ADDR_WIDTH-1:0] araddr;
logic [31:0] araddr;
logic [7:0] arlen;
logic [2:0] arsize;
logic [1:0] arburst;
logic [3:0] arcache;
logic [5:0] arid;
logic arlock;
//read data
logic rready;
logic rvalid;
logic [C_M_AXI_DATA_WIDTH-1:0] rdata;
logic [31:0] rdata;
logic [1:0] rresp;
logic rlast;
logic [5:0] rid;
@ -44,18 +45,19 @@ interface axi_interface;
//write address
logic awready;
logic awvalid;
logic [C_M_AXI_ADDR_WIDTH-1:0] awaddr;
logic [31:0] awaddr;
logic [7:0] awlen;
logic [2:0] awsize;
logic [1:0] awburst;
logic [3:0] awcache;
logic [5:0] awid;
logic awlock;
//write data
logic wready;
logic wvalid;
logic [C_M_AXI_DATA_WIDTH-1:0] wdata;
logic [(C_M_AXI_DATA_WIDTH/8)-1:0] wstrb;
logic [31:0] wdata;
logic [3:0] wstrb;
logic wlast;
//write response
@ -65,12 +67,12 @@ interface axi_interface;
logic [5:0] bid;
modport master (input arready, rvalid, rdata, rresp, rlast, rid, awready, wready, bvalid, bresp, bid,
output arvalid, araddr, arlen, arsize, arburst, arcache, arid, rready, awvalid, awaddr, awlen, awsize, awburst, awcache, awid,
output arvalid, araddr, arlen, arsize, arburst, arcache, arlock, arid, rready, awvalid, awaddr, awlen, awsize, awburst, awcache, awid, awlock,
wvalid, wdata, wstrb, wlast, bready);
modport slave (input arvalid, araddr, arlen, arsize, arburst, arcache,
modport slave (input arvalid, araddr, arlen, arsize, arburst, arcache, arlock,
rready,
awvalid, awaddr, awlen, awsize, awburst, awcache, arid,
awvalid, awaddr, awlen, awsize, awburst, awcache, awlock, arid,
wvalid, wdata, wstrb, wlast, awid,
bready,
output arready, rvalid, rdata, rresp, rlast, rid,
@ -78,20 +80,13 @@ interface axi_interface;
wready,
bvalid, bresp, bid);
`ifdef __CVA5_FORMAL__
modport formal (input arready, arvalid, araddr, arlen, arsize, arburst, arcache,
rready, rvalid, rdata, rresp, rlast, rid,
awready, awvalid, awaddr, awlen, awsize, awburst, awcache, arid,
wready, wvalid, wdata, wstrb, wlast, awid,
bready, bvalid, bresp, bid);
`endif
endinterface
interface avalon_interface;
logic [31:0] addr;
logic read;
logic write;
logic lock;
logic [3:0] byteenable;
logic [31:0] readdata;
logic [31:0] writedata;
@ -100,14 +95,9 @@ interface avalon_interface;
logic writeresponsevalid;
modport master (input readdata, waitrequest, readdatavalid, writeresponsevalid,
output addr, read, write, byteenable, writedata);
output addr, read, write, lock, byteenable, writedata);
modport slave (output readdata, waitrequest, readdatavalid, writeresponsevalid,
input addr, read, write, byteenable, writedata);
`ifdef __CVA5_FORMAL__
modport formal (input readdata, waitrequest, readdatavalid, writeresponsevalid,
addr, read, write, byteenable, writedata);
`endif
input addr, read, write, lock, byteenable, writedata);
endinterface
@ -129,48 +119,46 @@ interface wishbone_interface;
modport slave (output dat_r, ack, err,
input adr, dat_w, sel, cyc, stb, we, cti, bte);
`ifdef __CVA5_FORMAL__
modport formal (input adr, dat_w, sel, cyc, stb, we, cti, bte, dat_r, ack, err);
`endif
endinterface
interface l1_arbiter_request_interface;
import l2_config_and_types::*;
logic [31:0] addr;
logic [31:0] data ;
logic rnw ;
logic [3:0] be;
logic [4:0] size;
logic is_amo;
logic [4:0] amo;
interface mem_interface;
logic request;
logic[31:2] addr;
logic[4:0] rlen; //Nobody truly needs requests > 32 words
logic ack;
modport master (output addr, data, rnw, be, size, is_amo, amo, request, input ack);
modport slave (input addr, data, rnw, be, size, is_amo, amo, request, output ack);
logic rvalid;
logic[31:0] rdata;
logic[1:0] rid;
logic rnw;
logic rmw;
logic[3:0] wbe;
logic[31:0] wdata;
`ifdef __CVA5_FORMAL__
modport formal (input addr, data, rnw, be, size, is_amo, amo, request, ack);
`endif
logic inv;
logic[31:2] inv_addr;
logic write_outstanding;
logic[1:0] id;
modport ro_master (output request, addr, rlen, input ack, rvalid, rdata);
modport ro_slave (input request, addr, rlen, output ack, rvalid, rdata);
modport rw_master (output request, addr, rlen, rnw, rmw, wbe, wdata, input ack, rvalid, rdata, inv, inv_addr, write_outstanding);
modport rw_slave (input request, addr, rlen, rnw, rmw, wbe, wdata, output ack, rvalid, rdata, inv, inv_addr, write_outstanding);
modport mem_master (output request, addr, rlen, rnw, rmw, wbe, wdata, id, input ack, rvalid, rdata, rid, inv, inv_addr, write_outstanding);
modport mem_slave (input request, addr, rlen, rnw, rmw, wbe, wdata, id, output ack, rvalid, rdata, rid, inv, inv_addr, write_outstanding);
endinterface
interface l1_arbiter_return_interface;
logic [31:2] inv_addr;
logic inv_valid;
logic inv_ack;
logic [31:0] data;
logic data_valid;
interface local_memory_interface;
logic[29:0] addr;
logic en;
logic[3:0] be;
logic[31:0] data_in;
logic[31:0] data_out;
modport master (input inv_addr, inv_valid, data, data_valid, output inv_ack);
modport slave (output inv_addr, inv_valid, data, data_valid, input inv_ack);
`ifdef __CVA5_FORMAL__
modport formal (input inv_addr, inv_valid, data, data_valid, inv_ack);
`endif
modport slave (input addr, en, be, data_in, output data_out);
modport master (output addr, en, be, data_in, input data_out);
endinterface

92
core/types_and_interfaces/internal_interfaces.sv Executable file → Normal file

@ -98,14 +98,15 @@ interface exception_interface;
import cva5_types::*;
logic valid;
logic ack;
logic possible;
exception_code_t code;
id_t id;
logic [31:0] tval;
logic [31:0] pc;
logic discard;
modport unit (output valid, code, id, tval, input ack);
modport econtrol (input valid, code, id, tval, output ack);
modport unit (output valid, possible, code, tval, pc, discard);
modport econtrol (input valid, possible, code, tval, pc, discard);
endinterface
interface fifo_interface #(parameter type DATA_TYPE = logic);
@ -122,6 +123,8 @@ interface fifo_interface #(parameter type DATA_TYPE = logic);
endinterface
interface mmu_interface;
import csr_types::*;
//From TLB
logic request;
logic execute;
@ -130,6 +133,8 @@ interface mmu_interface;
//TLB response
logic write_entry;
logic superpage;
pte_perms_t perms;
logic [19:0] upper_physical_address;
logic is_fault;
@ -137,10 +142,10 @@ interface mmu_interface;
logic [21:0] satp_ppn;
logic mxr; //Make eXecutable Readable
logic sum; //permit Supervisor User Memory access
logic [1:0] privilege;
privilege_t privilege;
modport mmu (input virtual_address, request, execute, rnw, satp_ppn, mxr, sum, privilege, output write_entry, upper_physical_address, is_fault);
modport tlb (input write_entry, upper_physical_address, is_fault, output request, virtual_address, execute, rnw);
modport mmu (input virtual_address, request, execute, rnw, satp_ppn, mxr, sum, privilege, output write_entry, superpage, perms, upper_physical_address, is_fault);
modport tlb (input write_entry, superpage, perms, upper_physical_address, is_fault, mxr, sum, privilege, output request, virtual_address, execute, rnw);
modport csr (output satp_ppn, mxr, sum, privilege);
endinterface
@ -154,18 +159,17 @@ interface tlb_interface;
//TLB Inputs
logic [31:0] virtual_address;
logic rnw;
logic execute;
//TLB Outputs
logic is_fault;
logic [31:0] physical_address;
modport tlb (
input new_request, virtual_address, rnw, execute,
input new_request, virtual_address, rnw,
output ready, done, is_fault, physical_address
);
modport requester (
output new_request, virtual_address, rnw, execute,
output new_request, virtual_address, rnw,
input ready, done, is_fault, physical_address
);
endinterface
@ -181,6 +185,10 @@ interface load_store_queue_interface;
logic load_pop;
logic store_pop;
//Address translation
logic addr_push;
lsq_addr_entry_t addr_data_in;
//LSQ outputs
data_access_shared_inputs_t load_data_out;
data_access_shared_inputs_t store_data_out;
@ -193,15 +201,14 @@ interface load_store_queue_interface;
//LSQ status
logic sq_empty;
logic empty;
logic no_released_stores_pending;
modport queue (
input data_in, potential_push, push, load_pop, store_pop,
output full, load_data_out, store_data_out, load_valid, store_valid, sq_empty, empty, no_released_stores_pending
input data_in, potential_push, push, addr_push, addr_data_in, load_pop, store_pop,
output full, load_data_out, store_data_out, load_valid, store_valid, sq_empty, empty
);
modport ls (
output data_in, potential_push, push, load_pop, store_pop,
input full, load_data_out, store_data_out, load_valid, store_valid, sq_empty, empty, no_released_stores_pending
output data_in, potential_push, push, addr_push, addr_data_in, load_pop, store_pop,
input full, load_data_out, store_data_out, load_valid, store_valid, sq_empty, empty
);
endinterface
@ -221,15 +228,14 @@ interface store_queue_interface;
//SQ status
logic empty;
logic no_released_stores_pending;
modport queue (
input data_in, push, pop,
output full, data_out, valid, empty, no_released_stores_pending
output full, data_out, valid, empty
);
modport ls (
output data_in, push, pop,
input full, data_out, valid, empty, no_released_stores_pending
input full, data_out, valid, empty
);
endinterface
@ -258,23 +264,14 @@ interface cache_functions_interface #(parameter int TAG_W = 8, parameter int LIN
endinterface
interface addr_utils_interface #(parameter bit [31:0] BASE_ADDR = 32'h00000000, parameter bit [31:0] UPPER_BOUND = 32'hFFFFFFFF);
//Based on the lower and upper address ranges,
//find the number of bits needed to uniquely identify this memory range.
//Assumption: address range is aligned to its size
function int unsigned bit_range ();
for(int i=0; i < 32; i++) begin
if (BASE_ADDR[i] == UPPER_BOUND[i])
return (32 - i);
end
return 0;
endfunction
localparam int unsigned BIT_RANGE = bit_range();
/* verilator lint_off SELRANGE */
interface addr_utils_interface #(parameter logic [31:0] BASE_ADDR = 32'h00000000, parameter logic [31:0] UPPER_BOUND = 32'hFFFFFFFF);
//The range should be aligned for performance
function address_range_check (input logic[31:0] addr);
return (BIT_RANGE == 0) ? 1 : (addr[31:32-BIT_RANGE] == BASE_ADDR[31:32-BIT_RANGE]);
/* verilator lint_off UNSIGNED */
/* verilator lint_off CMPCONST */
return addr >= BASE_ADDR & addr <= UPPER_BOUND;
/* verilator lint_on UNSIGNED */
/* verilator lint_on CMPCONST */
endfunction
endinterface
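
Review note: the new `address_range_check` above is an inclusive bounds comparison, where the removed version compared an address prefix and carried the comment "Assumption: address range is aligned to its size". The standalone sketch below illustrates the same comparison on a range that is not aligned to its size. The module name, function name, and address values are hypothetical and chosen only for illustration; they do not come from the repository.

```systemverilog
//Hypothetical standalone check; not part of the CVA5 sources
module addr_range_sketch;
    //Illustrative range that is NOT aligned to its size
    localparam logic [31:0] BASE_ADDR   = 32'h1000_0000;
    localparam logic [31:0] UPPER_BOUND = 32'h2FFF_FFFF;

    //Inclusive bounds check, mirroring the new function body in the diff
    function automatic logic in_range (input logic [31:0] addr);
        return (addr >= BASE_ADDR) & (addr <= UPPER_BOUND);
    endfunction

    initial begin
        assert (in_range(32'h1000_0000));  //Lower bound is inclusive
        assert (in_range(32'h2FFF_FFFF));  //Upper bound is inclusive
        assert (!in_range(32'h0FFF_FFFF)); //Just below the range
        assert (!in_range(32'h3000_0000)); //Just above the range
        $finish;
    end
endmodule
```

Two full-width comparators likely cost more logic than a prefix match, which is presumably why the new comment still recommends aligned ranges "for performance".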
@ -406,3 +403,30 @@ interface fp_intermediate_wb_interface;
input id, done, rd, expo_overflow, fflags, rm, hidden, grs, clz, carry, safe, subnormal, right_shift, right_shift_amt, ignore_max_expo, d2s
);
endinterface
interface amo_interface;
import riscv_types::*;
//Atomic Load Reserved and Store Conditional
logic set_reservation;
logic clear_reservation;
logic[31:0] reservation;
logic reservation_valid;
//Atomic Read-Modify-Write
logic rmw_valid;
amo_t op;
logic[31:0] rs1;
logic[31:0] rs2;
logic[31:0] rd;
modport subunit (
input reservation_valid, rd,
output set_reservation, clear_reservation, reservation, rmw_valid, op, rs1, rs2
);
modport amo_unit (
output reservation_valid, rd,
input set_reservation, clear_reservation, reservation, rmw_valid, op, rs1, rs2
);
endinterface
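
Review note: `amo_interface` carries an operation code plus operands for atomic read-modify-write, but the diff does not show how the result is computed. The sketch below is one way such a unit could be written, assuming the operation is the standard RISC-V `funct5` field (consistent with `AMO_MAXU_FN5 = 5'b11100` in `riscv_types`). The module and port names are hypothetical, and which of `rs1`/`rs2` carries the value read from memory is not visible here, so the ports are named neutrally.

```systemverilog
//Hypothetical combinational AMO result unit; not the CVA5 implementation
module amo_alu_sketch (
    input  logic [4:0]  funct5,     //Standard RISC-V AMO funct5 encoding
    input  logic [31:0] mem_value,  //Value currently stored in memory
    input  logic [31:0] operand,    //Register operand of the instruction
    output logic [31:0] result      //Value to write back to memory
);
    always_comb begin
        case (funct5)
            5'b00001 : result = operand;                //AMOSWAP
            5'b00000 : result = mem_value + operand;    //AMOADD
            5'b00100 : result = mem_value ^ operand;    //AMOXOR
            5'b01000 : result = mem_value | operand;    //AMOOR
            5'b01100 : result = mem_value & operand;    //AMOAND
            5'b10000 : result = ($signed(mem_value) < $signed(operand)) ? mem_value : operand; //AMOMIN
            5'b10100 : result = ($signed(mem_value) > $signed(operand)) ? mem_value : operand; //AMOMAX
            5'b11000 : result = (mem_value < operand) ? mem_value : operand; //AMOMINU
            5'b11100 : result = (mem_value > operand) ? mem_value : operand; //AMOMAXU
            default  : result = mem_value; //LR, SC, and reserved encodings are not handled in this sketch
        endcase
    end
endmodule
```

Load-reserved and store-conditional go through the separate reservation signals of the interface, so they are deliberately left out of the case statement.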

View file

@ -148,8 +148,8 @@ package opcodes;
localparam [31:0] AMO_MINU = 32'b11000????????????010?????0101111;
localparam [31:0] AMO_MAXU = 32'b11100????????????010?????0101111;
localparam [31:0] AMO_SWAP = 32'b00001????????????010?????0101111;
localparam [31:0] LR = 32'b00010??00000?????010?????0101111;
localparam [31:0] SC = 32'b00011????????????010?????0101111;
localparam [31:0] AMO_LR = 32'b00010??00000?????010?????0101111;
localparam [31:0] AMO_SC = 32'b00011????????????010?????0101111;
//Machine/Supervisor
localparam [31:0] SRET = 32'b00010000001000000000000001110011;

View file

@ -113,15 +113,23 @@ package riscv_types;
URET_imm = 12'b000000000010,
SRET_imm = 12'b000100000010,
MRET_imm = 12'b001100000010,
SFENCE_imm = 12'b0001001?????
SFENCE_imm = 12'b0001001?????,
WFI_imm = 12'b000100000101
} imm_sys_t;
//Other registers exist but are not supported
typedef enum logic [11:0] {
//Floating Point
FFLAGS = 12'h001,
FRM = 12'h002,
FCSR = 12'h003,
//Machine info
MVENDORID = 12'hF11,
MARCHID = 12'hF12,
MIMPID = 12'hF13,
MHARTID = 12'hF14,
MCONFIGPTR = 12'hF15,
//Machine trap setup
MSTATUS = 12'h300,
MISA = 12'h301,
@ -130,55 +138,79 @@ package riscv_types;
MIE = 12'h304,
MTVEC = 12'h305,
MCOUNTEREN = 12'h306,
MSTATUSH = 12'h310,
MEDELEGH = 12'h312,
//Machine trap handling
MSCRATCH = 12'h340,
MEPC = 12'h341,
MCAUSE = 12'h342,
MTVAL = 12'h343,
MIP = 12'h344,
//Machine configuration
MENVCFG = 12'h30A,
MENVCFGH = 12'h31A,
//No optional mseccfg/mseccfgh
//No PMP
//Machine Counters
MCYCLE = 12'hB00,
MINSTRET = 12'hB02,
MHPMCOUNTER3 = 12'hB03,
MHPMCOUNTER31 = 12'hB1F,
MCYCLEH = 12'hB80,
MINSTRETH = 12'hB82,
MHPMCOUNTER3H = 12'hB83,
MHPMCOUNTER31H = 12'hB9F,
//Machine counter setup
MCOUNTINHIBIT = 12'h320,
MHPMEVENT3 = 12'h323,
MHPMEVENT31 = 12'h33F,
MHPMEVENT3H = 12'h723,
MHPMEVENT31H = 12'h73F,
//Machine state enable
MSTATEEN0 = 12'h30C,
MSTATEEN1 = 12'h30D,
MSTATEEN2 = 12'h30E,
MSTATEEN3 = 12'h30F,
MSTATEEN0H = 12'h31C,
MSTATEEN1H = 12'h31D,
MSTATEEN2H = 12'h31E,
MSTATEEN3H = 12'h31F,
//Supervisor regs
//Supervisor Trap Setup
SSTATUS = 12'h100,
SEDELEG = 12'h102,
SIDELEG = 12'h103,
SIE = 12'h104,
STVEC = 12'h105,
SCOUNTEREN = 12'h106,
//Supervisor configuration
SENVCFG = 12'h10A,
//Supervisor trap handling
SSCRATCH = 12'h140,
SEPC = 12'h141,
SCAUSE = 12'h142,
STVAL = 12'h143,
SIP = 12'h144,
STIMECMP = 12'h14D,
STIMECMPH = 12'h15D,
//Supervisor address translation and protection
SATP = 12'h180,
//Supervisor state enable
SSTATEEN0 = 12'h10C,
SSTATEEN1 = 12'h10D,
SSTATEEN2 = 12'h10E,
SSTATEEN3 = 12'h10F,
//User regs
//USER Floating Point
FFLAGS = 12'h001,
FRM = 12'h002,
FCSR = 12'h003,
//User Counter Timers
//Timers and counters
CYCLE = 12'hC00,
TIME = 12'hC01,
INSTRET = 12'hC02,
HPMCOUNTER3 = 12'hC03,
HPMCOUNTER31 = 12'hC1F,
CYCLEH = 12'hC80,
TIMEH = 12'hC81,
INSTRETH = 12'hC82,
//Debug regs
DCSR = 12'h7B0,
DPC = 12'h7B1,
DSCRATCH = 12'h7B2
HPMCOUNTER3H = 12'hC83,
HPMCOUNTER31H = 12'hC9F
} csr_reg_addr_t;
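
Review note: the addresses in `csr_reg_addr_t` follow the RISC-V privileged specification's convention, where bits [11:10] equal to `2'b11` mark a read-only CSR and bits [9:8] give the lowest privilege level allowed to access it. The sketch below spot-checks a few of the addresses listed above against that convention. The module and function names are hypothetical, and the diff does not show whether CVA5 decodes accesses this way.

```systemverilog
//Hypothetical standalone check of the CSR address convention
module csr_addr_decode_sketch;
    //Bits [11:10] == 2'b11 mark a read-only CSR
    function automatic logic csr_is_read_only (input logic [11:0] addr);
        return addr[11:10] == 2'b11;
    endfunction

    //Bits [9:8] encode the lowest privilege level that may access the CSR
    function automatic logic [1:0] csr_min_privilege (input logic [11:0] addr);
        return addr[9:8];
    endfunction

    initial begin
        assert (csr_is_read_only(12'hC00));            //CYCLE
        assert (csr_is_read_only(12'hF11));            //MVENDORID
        assert (!csr_is_read_only(12'h300));           //MSTATUS
        assert (csr_min_privilege(12'h300) == 2'b11);  //MSTATUS: machine
        assert (csr_min_privilege(12'h14D) == 2'b01);  //STIMECMP: supervisor
        assert (csr_min_privilege(12'hC00) == 2'b00);  //CYCLE: user
        $finish;
    end
endmodule
```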
typedef enum logic [2:0] {
@ -198,11 +230,6 @@ package riscv_types;
CSR_RC = 2'b11
} csr_op_t;
typedef enum logic [4:0] {
BARE = 5'd0,
SV32 = 5'd8
} vm_t;
localparam ASIDLEN = 9;//pid
typedef enum logic [ECODE_W-1:0] {
@ -221,7 +248,9 @@ package riscv_types;
INST_PAGE_FAULT = 5'd12,
LOAD_PAGE_FAULT = 5'd13,
//reserved
STORE_OR_AMO_PAGE_FAULT = 5'd15
STORE_OR_AMO_PAGE_FAULT = 5'd15,
SOFTWARE_CHECK = 5'd18,
HARDWARE_ERROR = 5'd19
//reserved
} exception_code_t;
@ -238,7 +267,9 @@ package riscv_types;
//RESERVED
S_EXTERNAL_INTERRUPT = 5'd9,
//RESERVED
M_EXTERNAL_INTERRUPT = 5'd11
M_EXTERNAL_INTERRUPT = 5'd11,
//RESERVED
LOCAL_COUNT_OVERFLOW_INTERRUPT = 5'd13
} interrupt_code_t;
typedef enum bit [4:0] {
@ -255,6 +286,12 @@ package riscv_types;
AMO_MAXU_FN5 = 5'b11100
} amo_t;
typedef enum bit [1:0] {
INVAL = 2'b00,
CLEAN = 2'b01,
FLUSH = 2'b10
} cbo_t;
//Assembly register definitions for simulation purposes
typedef struct packed{
logic [XLEN-1:0] zero;

View file

@ -624,4 +624,4 @@ for (index = 0; index < NUM_CPUS; index=index+1) begin
end
end
endgenerate
endmodule
endmodule

View file

@ -1,123 +0,0 @@
/*
* Copyright © 2022 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module l1_to_wishbone
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
import l2_config_and_types::*;
(
input logic clk,
input logic rst,
l2_requester_interface.slave cpu,
wishbone_interface.master wishbone
);
localparam MAX_REQUESTS = 32;
fifo_interface #(.DATA_TYPE(l2_request_t)) request_fifo ();
fifo_interface #(.DATA_TYPE(l2_data_request_t)) data_fifo ();
l2_request_t request;
l2_data_request_t data_request;
logic request_complete;
////////////////////////////////////////////////////
//Implementation
assign cpu.request_full = request_fifo.full;
assign cpu.data_full = data_fifo.full;
//Repack input attributes
assign request_fifo.data_in = '{
addr : cpu.addr,
rnw : cpu.rnw,
is_amo : cpu.is_amo,
amo_type_or_burst_size : cpu.amo_type_or_burst_size,
sub_id : cpu.sub_id
};
assign request_fifo.push = cpu.request_push;
assign request_fifo.potential_push = cpu.request_push;
assign request_fifo.pop = request_complete;
assign request = request_fifo.data_out;
assign data_fifo.push = cpu.wr_data_push;
assign data_fifo.potential_push = cpu.wr_data_push;
assign data_fifo.pop = wishbone.we & wishbone.ack;
assign data_fifo.data_in = '{
data : cpu.wr_data,
be : cpu_wr_data_be
};
assign data_request = data_fifo.data_out;
cva5_fifo #(.DATA_TYPE(l2_request_t), .FIFO_DEPTH(MAX_REQUESTS))
request_fifo_block (
.clk (clk),
.rst (rst),
.fifo (request_fifo)
);
cva5_fifo #(.DATA_TYPE(l2_data_request_t), .FIFO_DEPTH(MAX_REQUESTS))
data_fifo_block (
.clk (clk),
.rst (rst),
.fifo (data_fifo)
);
////////////////////////////////////////////////////
//Wishbone
logic [4:0] burst_size;
logic [4:0] burst_count;
assign wishbone.cti = 0;
assign wishbone.bte = 0;
always_ff @ (posedge clk) begin
if (rst | request_fifo.pop)
burst_count <= 0;
else
burst_count <= burst_count + 5'(wishbone.ack);
end
assign burst_size = request.amo_type_or_burst_size;
assign request_complete = wishbone.ack & (burst_count == burst_size);
assign wishbone.adr[29:5] = request.addr[29:5];
assign wishbone.adr[4:0] = (request.addr[4:0] & ~burst_size) | (burst_count & burst_size);
assign wishbone.sel = request.rnw ? '1 : data_request.be;
assign wishbone.we = ~request.rnw;
assign wishbone.dat_w = data_request.data;
assign wishbone.stb = request_fifo.valid;
assign wishbone.cyc = request_fifo.valid;
////////////////////////////////////////////////////
//Return Path
//L1 always acks data, no need for rd_data_ack
always_ff @ (posedge clk) begin
cpu.rd_data <= wishbone.dat_r;
cpu.rd_data_valid <= request.rnw & wishbone.ack;
cpu.rd_sub_id <= request.sub_id;
end
endmodule

552
examples/litex/litex_wrapper.sv Executable file → Normal file

@ -1,5 +1,5 @@
/*
* Copyright © 2022 Eric Matthews, Lesley Shannon
* Copyright © 2022 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@ -23,44 +23,25 @@
module litex_wrapper
import cva5_config::*;
import cva5_types::*;
import l2_config_and_types::*;
#(
parameter LITEX_VARIANT = 0,
parameter bit [31:0] RESET_VEC = 0,
parameter bit [31:0] NON_CACHABLE_L = 32'h80000000,
parameter bit [31:0] NON_CACHABLE_H =32'hFFFFFFFF
parameter bit [31:0] NON_CACHABLE_H = 32'hFFFFFFFF,
parameter int unsigned NUM_CORES = 1,
parameter logic AXI = 1'b1 //Else the wishbone bus is used
)
(
input logic clk,
input logic rst,
input logic [15:0] litex_interrupt,
output logic [29:0] ibus_adr,
output logic [31:0] ibus_dat_w,
output logic [3:0] ibus_sel,
output logic ibus_cyc,
output logic ibus_stb,
output logic ibus_we,
output logic ibus_cti,
output logic ibus_bte,
input logic [31:0] ibus_dat_r,
input logic ibus_ack,
input logic ibus_err,
output logic [29:0] dbus_adr,
output logic [31:0] dbus_dat_w,
output logic [3:0] dbus_sel,
output logic dbus_cyc,
output logic dbus_stb,
output logic dbus_we,
output logic dbus_cti,
output logic dbus_bte,
input logic [31:0] dbus_dat_r,
input logic dbus_ack,
input logic dbus_err,
input logic [NUM_CORES-1:0] meip,
input logic [NUM_CORES-1:0] seip,
input logic [NUM_CORES-1:0] mtip,
input logic [NUM_CORES-1:0] msip,
input logic [63:0] mtime,
//Wishbone memory port (used only if configured)
output logic [29:0] idbus_adr,
output logic [31:0] idbus_dat_w,
output logic [3:0] idbus_sel,
@ -71,125 +52,47 @@ module litex_wrapper
output logic idbus_bte,
input logic [31:0] idbus_dat_r,
input logic idbus_ack,
input logic idbus_err
input logic idbus_err,
//AXI memory port (used only if configured)
//AR
input logic m_axi_arready,
output logic m_axi_arvalid,
output logic [31:0] m_axi_araddr,
output logic [7:0] m_axi_arlen,
output logic [2:0] m_axi_arsize, //Constant, 32b
output logic [1:0] m_axi_arburst, //Constant, incrementing
output logic [3:0] m_axi_arcache, //Constant, normal non-cacheable bufferable
output logic [5:0] m_axi_arid,
//R
output logic m_axi_rready,
input logic m_axi_rvalid,
input logic [31:0] m_axi_rdata,
input logic [1:0] m_axi_rresp,
input logic m_axi_rlast,
input logic [5:0] m_axi_rid,
//AW
input logic m_axi_awready,
output logic m_axi_awvalid,
output logic [31:0] m_axi_awaddr,
output logic [7:0] m_axi_awlen, //Constant, 0
output logic [2:0] m_axi_awsize, //Constant, 32b
output logic [1:0] m_axi_awburst, //Constant, incrementing
output logic [3:0] m_axi_awcache, //Constant, normal non-cacheable bufferable
output logic [5:0] m_axi_awid,
//W
input logic m_axi_wready,
output logic m_axi_wvalid,
output logic [31:0] m_axi_wdata,
output logic [3:0] m_axi_wstrb,
output logic m_axi_wlast,
//B
output logic m_axi_bready,
input logic m_axi_bvalid,
input logic [1:0] m_axi_bresp,
input logic [5:0] m_axi_bid
);
localparam wb_group_config_t MINIMAL_WB_GROUP_CONFIG = '{
0 : '{0: ALU_ID, default : NON_WRITEBACK_ID},
1 : '{0: LS_ID, 1: CSR_ID, default : NON_WRITEBACK_ID},
default : '{default : NON_WRITEBACK_ID}
};
localparam cpu_config_t MINIMAL_CONFIG = '{
//ISA options
INCLUDE_M_MODE : 1,
INCLUDE_S_MODE : 0,
INCLUDE_U_MODE : 0,
INCLUDE_UNIT : '{
ALU : 1,
LS : 1,
MUL : 0,
DIV : 0,
CSR : 1,
CUSTOM : 0,
BR : 1,
IEC : 1
},
INCLUDE_IFENCE : 0,
INCLUDE_AMO : 0,
//CSR constants
CSRS : '{
MACHINE_IMPLEMENTATION_ID : 0,
CPU_ID : 0,
RESET_VEC : RESET_VEC,
RESET_MTVEC : 32'h00000000,
NON_STANDARD_OPTIONS : '{
COUNTER_W : 33,
MCYCLE_WRITEABLE : 0,
MINSTR_WRITEABLE : 0,
MTVEC_WRITEABLE : 1,
INCLUDE_MSCRATCH : 0,
INCLUDE_MCAUSE : 1,
INCLUDE_MTVAL : 1
}
},
//Memory Options
SQ_DEPTH : 2,
INCLUDE_FORWARDING_TO_STORES : 0,
INCLUDE_ICACHE : 0,
ICACHE_ADDR : '{
L: 32'h40000000,
H: 32'h4FFFFFFF
},
ICACHE : '{
LINES : 512,
LINE_W : 4,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L: 32'h00000000,
H: 32'h00000000
}
},
ITLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_DCACHE : 0,
DCACHE_ADDR : '{
L: 32'h40000000,
H: 32'h4FFFFFFF
},
DCACHE : '{
LINES : 512,
LINE_W : 4,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L: 32'h00000000,
H: 32'h00000000
}
},
DTLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_ILOCAL_MEM : 0,
ILOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_DLOCAL_MEM : 0,
DLOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_IBUS : 1,
IBUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
INCLUDE_PERIPHERAL_BUS : 1,
PERIPHERAL_BUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
PERIPHERAL_BUS_TYPE : WISHBONE_BUS,
//Branch Predictor Options
INCLUDE_BRANCH_PREDICTOR : 0,
BP : '{
WAYS : 2,
ENTRIES : 512,
RAS_ENTRIES : 8
},
//Writeback Options
NUM_WB_GROUPS : 2,
WB_GROUP : MINIMAL_WB_GROUP_CONFIG
};
localparam wb_group_config_t STANDARD_WB_GROUP_CONFIG = '{
0 : '{0: ALU_ID, default : NON_WRITEBACK_ID},
1 : '{0: LS_ID, default : NON_WRITEBACK_ID},
@ -197,151 +100,205 @@ module litex_wrapper
default : '{default : NON_WRITEBACK_ID}
};
localparam cpu_config_t STANDARD_CONFIG = '{
//ISA options
INCLUDE_M_MODE : 1,
INCLUDE_S_MODE : 0,
INCLUDE_U_MODE : 0,
INCLUDE_UNIT : '{
ALU : 1,
LS : 1,
MUL : 1,
DIV : 1,
CSR : 1,
CUSTOM : 0,
BR : 1,
IEC : 1
},
INCLUDE_IFENCE : 0,
INCLUDE_AMO : 0,
//CSR constants
CSRS : '{
MACHINE_IMPLEMENTATION_ID : 0,
CPU_ID : 0,
RESET_VEC : RESET_VEC,
RESET_MTVEC : 32'h00000000,
NON_STANDARD_OPTIONS : '{
COUNTER_W : 33,
MCYCLE_WRITEABLE : 0,
MINSTR_WRITEABLE : 0,
MTVEC_WRITEABLE : 1,
INCLUDE_MSCRATCH : 0,
INCLUDE_MCAUSE : 1,
INCLUDE_MTVAL : 1
}
},
//Memory Options
SQ_DEPTH : 4,
INCLUDE_FORWARDING_TO_STORES : 1,
INCLUDE_ICACHE : 1,
ICACHE_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
ICACHE : '{
LINES : 512,
LINE_W : 4,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L: NON_CACHABLE_L,
H: NON_CACHABLE_H
}
},
ITLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_DCACHE : 1,
DCACHE_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
DCACHE : '{
LINES : 512,
LINE_W : 4,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 1,
NON_CACHEABLE : '{
L: NON_CACHABLE_L,
H: NON_CACHABLE_H
}
},
DTLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_ILOCAL_MEM : 0,
ILOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_DLOCAL_MEM : 0,
DLOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_IBUS : 0,
IBUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
INCLUDE_PERIPHERAL_BUS : 0,
PERIPHERAL_BUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
PERIPHERAL_BUS_TYPE : WISHBONE_BUS,
//Branch Predictor Options
INCLUDE_BRANCH_PREDICTOR : 1,
BP : '{
WAYS : 2,
ENTRIES : 512,
RAS_ENTRIES : 8
},
//Writeback Options
NUM_WB_GROUPS : 3,
WB_GROUP : STANDARD_WB_GROUP_CONFIG
};
function cpu_config_t config_select (input integer variant);
case (variant)
0 : config_select = MINIMAL_CONFIG;
1 : config_select = STANDARD_CONFIG;
default : config_select = STANDARD_CONFIG;
endcase
endfunction
localparam cpu_config_t LITEX_CONFIG = config_select(LITEX_VARIANT);
//Unused interfaces
axi_interface m_axi();
avalon_interface m_avalon();
local_memory_interface instruction_bram();
local_memory_interface data_bram();
interrupt_t s_interrupt;
axi_interface axi[NUM_CORES-1:0]();
avalon_interface avalon[NUM_CORES-1:0]();
wishbone_interface dwishbone[NUM_CORES-1:0]();
wishbone_interface iwishbone[NUM_CORES-1:0]();
local_memory_interface instruction_bram[NUM_CORES-1:0]();
local_memory_interface data_bram[NUM_CORES-1:0]();
//L2 to Wishbone
l2_requester_interface l2();
//Interrupts
interrupt_t[NUM_CORES-1:0] s_interrupt;
interrupt_t[NUM_CORES-1:0] m_interrupt;
//Wishbone interfaces
wishbone_interface dwishbone();
wishbone_interface iwishbone();
wishbone_interface idwishbone();
//Memory interfaces for each core
mem_interface mem[NUM_CORES-1:0]();
generate for (genvar i = 0; i < NUM_CORES; i++) begin : gen_cores
localparam cpu_config_t STANDARD_CONFIG_I = '{
//ISA options
MODES : MSU,
INCLUDE_UNIT : '{
MUL : 1,
DIV : 1,
CSR : 1,
FPU : 0,
CUSTOM : 0,
default: '0
},
INCLUDE_IFENCE : 1,
INCLUDE_AMO : 1,
INCLUDE_CBO : 0,
//CSR constants
CSRS : '{
MACHINE_IMPLEMENTATION_ID : 0,
CPU_ID : i,
RESET_VEC : RESET_VEC,
RESET_TVEC : 32'h00000000,
MCONFIGPTR : '0,
INCLUDE_ZICNTR : 1,
INCLUDE_ZIHPM : 1,
INCLUDE_SSTC : 1,
INCLUDE_SMSTATEEN : 1
},
//Memory Options
SQ_DEPTH : 4,
INCLUDE_FORWARDING_TO_STORES : 1,
AMO_UNIT : '{
LR_WAIT : 8,
RESERVATION_WORDS : 8
},
INCLUDE_ICACHE : 1,
ICACHE_ADDR : '{
L : 32'h00000000,
H : 32'h7FFFFFFF
},
ICACHE : '{
LINES : 512,
LINE_W : 8,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L: NON_CACHABLE_L,
H: NON_CACHABLE_H
}
},
ITLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_DCACHE : 1,
DCACHE_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
DCACHE : '{
LINES : 512,
LINE_W : 8,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 1,
USE_NON_CACHEABLE : 1,
NON_CACHEABLE : '{
L: NON_CACHABLE_L,
H: NON_CACHABLE_H
}
},
DTLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_ILOCAL_MEM : 0,
ILOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_DLOCAL_MEM : 0,
DLOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_IBUS : 0,
IBUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
INCLUDE_PERIPHERAL_BUS : 0,
PERIPHERAL_BUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
PERIPHERAL_BUS_TYPE : WISHBONE_BUS,
//Branch Predictor Options
INCLUDE_BRANCH_PREDICTOR : 1,
BP : '{
WAYS : 2,
ENTRIES : 512,
RAS_ENTRIES : 8
},
//Writeback Options
NUM_WB_GROUPS : 3,
WB_GROUP : STANDARD_WB_GROUP_CONFIG
};
//Timer and External interrupts
interrupt_t m_interrupt;
assign m_interrupt.software = 0;
assign m_interrupt.timer = litex_interrupt[1];
assign m_interrupt.external = litex_interrupt[0];
assign m_interrupt[i].software = msip[i];
assign m_interrupt[i].timer = mtip[i];
assign m_interrupt[i].external = meip[i];
assign s_interrupt[i].software = 0; //Not possible
assign s_interrupt[i].timer = 0; //Internal
assign s_interrupt[i].external = seip[i];
cva5 #(.CONFIG(LITEX_CONFIG)) cpu(.*);
cva5 #(.CONFIG(STANDARD_CONFIG_I)) cpu(
.instruction_bram(instruction_bram[i]),
.data_bram(data_bram[i]),
.m_axi(axi[i]),
.m_avalon(avalon[i]),
.dwishbone(dwishbone[i]),
.iwishbone(iwishbone[i]),
.mem(mem[i]),
.mtime(mtime),
.s_interrupt(s_interrupt[i]),
.m_interrupt(m_interrupt[i]),
.*);
end endgenerate
//Final memory interface
generate if (AXI) begin : gen_axi_if
axi_interface m_axi();
//Mux requests from one or more cores onto the AXI bus
axi_adapter #(.NUM_CORES(NUM_CORES)) axi_adapter (
.mems(mem),
.axi(m_axi),
.*);
assign m_axi.arready = m_axi_arready;
assign m_axi_arvalid = m_axi.arvalid;
assign m_axi_araddr = m_axi.araddr;
assign m_axi_arlen = m_axi.arlen;
assign m_axi_arsize = m_axi.arsize;
assign m_axi_arburst = m_axi.arburst;
assign m_axi_arcache = m_axi.arcache;
assign m_axi_arid = m_axi.arid;
assign m_axi_rready = m_axi.rready;
assign m_axi.rvalid = m_axi_rvalid;
assign m_axi.rdata = m_axi_rdata;
assign m_axi.rresp = m_axi_rresp;
assign m_axi.rlast = m_axi_rlast;
assign m_axi.rid = m_axi_rid;
assign m_axi.awready = m_axi_awready;
assign m_axi_awvalid = m_axi.awvalid;
assign m_axi_awaddr = m_axi.awaddr;
assign m_axi_awlen = m_axi.awlen;
assign m_axi_awsize = m_axi.awsize;
assign m_axi_awburst = m_axi.awburst;
assign m_axi_awcache = m_axi.awcache;
assign m_axi_awid = m_axi.awid;
assign m_axi.wready = m_axi_wready;
assign m_axi_wvalid = m_axi.wvalid;
assign m_axi_wdata = m_axi.wdata;
assign m_axi_wstrb = m_axi.wstrb;
assign m_axi_wlast = m_axi.wlast;
assign m_axi_bready = m_axi.bready;
assign m_axi.bvalid = m_axi_bvalid;
assign m_axi.bresp = m_axi_bresp;
assign m_axi.bid = m_axi_bid;
end else begin : gen_wishbone_if
wishbone_interface idwishbone();
//Mux requests from one or more cores onto the wishbone bus
wishbone_adapter #(.NUM_CORES(NUM_CORES)) wb_adapter (
.mems(mem),
.wishbone(idwishbone),
.*);
generate if (LITEX_VARIANT != 0) begin : l1_arb_gen
l1_to_wishbone arb(.*, .cpu(l2), .wishbone(idwishbone));
assign idbus_adr = idwishbone.adr;
assign idbus_dat_w = idwishbone.dat_w;
assign idbus_sel = idwishbone.sel;
@ -353,31 +310,6 @@ module litex_wrapper
assign idwishbone.dat_r = idbus_dat_r;
assign idwishbone.ack = idbus_ack;
assign idwishbone.err = idbus_err;
end else begin
assign ibus_adr = iwishbone.adr;
assign ibus_dat_w = iwishbone.dat_w;
assign ibus_sel = iwishbone.sel;
assign ibus_cyc = iwishbone.cyc;
assign ibus_stb = iwishbone.stb;
assign ibus_we = iwishbone.we;
assign ibus_cti = iwishbone.cti;
assign ibus_bte = iwishbone.bte;
assign iwishbone.dat_r = ibus_dat_r;
assign iwishbone.ack = ibus_ack;
assign iwishbone.err = ibus_err;
assign dbus_adr = dwishbone.adr;
assign dbus_dat_w = dwishbone.dat_w;
assign dbus_sel = dwishbone.sel;
assign dbus_cyc = dwishbone.cyc;
assign dbus_stb = dwishbone.stb;
assign dbus_we = dwishbone.we;
assign dbus_cti = dwishbone.cti;
assign dbus_bte = dwishbone.bte;
assign dwishbone.dat_r = dbus_dat_r;
assign dwishbone.ack = dbus_ack;
assign dwishbone.err = dbus_err;
end endgenerate
endmodule

View file

@ -1,165 +0,0 @@
/*
* Copyright © 2022 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module l1_to_axi
import cva5_config::*;
import riscv_types::*;
import cva5_types::*;
import l2_config_and_types::*;
(
input logic clk,
input logic rst,
l2_requester_interface.slave cpu,
axi_interface.master axi
);
localparam MAX_REQUESTS = 16;
fifo_interface #(.DATA_TYPE(l2_request_t)) request_fifo ();
fifo_interface #(.DATA_TYPE(l2_data_request_t)) data_fifo ();
l2_request_t request;
logic write_request;
logic read_pop;
logic write_pop;
logic aw_complete;
logic w_complete;
logic aw_complete_r;
logic w_complete_r;
////////////////////////////////////////////////////
//Implementation
assign cpu.request_full = request_fifo.full;
assign cpu.data_full = data_fifo.full;
//Repack input attributes
assign request_fifo.data_in = '{
addr : cpu.addr,
rnw : cpu.rnw,
is_amo : cpu.is_amo,
amo_type_or_burst_size : cpu.amo_type_or_burst_size,
sub_id : cpu.sub_id
};
assign request_fifo.push = cpu.request_push;
assign request_fifo.potential_push = cpu.request_push;
assign request_fifo.pop = read_pop | write_pop;
assign request = request_fifo.data_out;
assign data_fifo.push = cpu.wr_data_push;
assign data_fifo.potential_push = cpu.wr_data_push;
assign data_fifo.pop = write_pop;
assign data_fifo.data_in = '{
data : cpu.wr_data,
be : cpu.wr_data_be
};
cva5_fifo #(.DATA_TYPE(l2_request_t), .FIFO_DEPTH(MAX_REQUESTS))
request_fifo_block (
.clk (clk),
.rst (rst),
.fifo (request_fifo)
);
cva5_fifo #(.DATA_TYPE(l2_data_request_t), .FIFO_DEPTH(MAX_REQUESTS))
data_fifo_block (
.clk (clk),
.rst (rst),
.fifo (data_fifo)
);
////////////////////////////////////////////////////
//AXI
localparam MAX_WRITE_IN_FLIGHT = 512;
logic [$clog2(MAX_WRITE_IN_FLIGHT+1)-1:0] write_in_flight_count;
logic [$clog2(MAX_WRITE_IN_FLIGHT+1)-1:0] write_in_flight_count_next;
logic [4:0] burst_size;
assign burst_size = request.amo_type_or_burst_size;
//Read Channel
assign axi.arlen = 8'(burst_size);
assign axi.arburst = (burst_size !=0) ? 2'b01 : '0;// INCR
assign axi.rready = 1; //always ready to receive data
assign axi.arsize = 3'b010;//4 bytes
assign axi.arcache = 4'b0000; //Normal Non-cacheable Non-bufferable
assign axi.arid = 6'(request.sub_id);
assign axi.araddr = {request.addr, 2'b00} & {25'h1FFFFFF, ~burst_size, 2'b00};
assign axi.arvalid = request.rnw & request_fifo.valid & ((request.sub_id[1:0] != L1_DCACHE_ID) | ((request.sub_id[1:0] == L1_DCACHE_ID) & (write_in_flight_count == 0)));
assign read_pop = axi.arvalid & axi.arready;
//Write Channel
assign axi.awlen = '0;
assign axi.awburst = '0;//2'b01;// INCR
assign axi.awsize = 3'b010;//4 bytes
assign axi.bready = 1;
assign axi.awcache = 4'b0000;//Normal Non-cacheable Non-bufferable
assign axi.awaddr = {request.addr, 2'b00};
assign axi.awid = 6'(request.sub_id);
assign write_request = (~request.rnw) & request_fifo.valid & data_fifo.valid;
assign axi.awvalid = write_request & ~aw_complete_r;
assign axi.wdata = data_fifo.data_out.data;
assign axi.wstrb = data_fifo.data_out.be;
assign axi.wvalid = write_request & ~w_complete_r;
assign axi.wlast = axi.wvalid;
assign aw_complete = axi.awvalid & axi.awready;
assign w_complete = axi.wvalid & axi.wready;
always_ff @ (posedge clk) begin
if (rst)
aw_complete_r <= 0;
else
aw_complete_r <= (aw_complete_r | aw_complete) & ~write_pop;
end
always_ff @ (posedge clk) begin
if (rst)
w_complete_r <= 0;
else
w_complete_r <= (w_complete_r | w_complete) & ~write_pop;
end
always_ff @ (posedge clk) begin
if (rst)
write_in_flight_count <= 0;
else
write_in_flight_count <= write_in_flight_count + $clog2(MAX_WRITE_IN_FLIGHT+1)'(write_pop) - $clog2(MAX_WRITE_IN_FLIGHT+1)'(axi.bvalid);
end
assign write_pop = (aw_complete | aw_complete_r) & (w_complete | w_complete_r);
////////////////////////////////////////////////////
//Return Path
//L1 always acks data, no need for rd_data_ack
always_ff @ (posedge clk) begin
cpu.rd_data <= axi.rdata;
cpu.rd_data_valid <= axi.rvalid;
cpu.rd_sub_id <= axi.rid[L2_SUB_ID_W-1:0];
end
endmodule

View file

@ -1,147 +0,0 @@
/*
* Copyright © 2023 Eric Matthews
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
package nexys_config;
import cva5_config::*;
localparam wb_group_config_t NEXYS_WB_GROUP_CONFIG = '{
0 : '{0: ALU_ID, default : NON_WRITEBACK_ID},
1 : '{0: LS_ID, default : NON_WRITEBACK_ID},
2 : '{0: MUL_ID, 1: DIV_ID, 2: CSR_ID, 3: FPU_ID, 4: CUSTOM_ID, default : NON_WRITEBACK_ID},
default : '{default : NON_WRITEBACK_ID}
};
localparam cpu_config_t NEXYS_CONFIG = '{
//ISA options
INCLUDE_M_MODE : 1,
INCLUDE_S_MODE : 0,
INCLUDE_U_MODE : 0,
INCLUDE_UNIT : '{
ALU : 1,
LS : 1,
MUL : 1,
DIV : 1,
CSR : 1,
FPU : 0,
CUSTOM : 0,
BR : 1,
IEC : 1
},
INCLUDE_IFENCE : 0,
INCLUDE_AMO : 0,
INCLUDE_CBO : 0,
//CSR constants
CSRS : '{
MACHINE_IMPLEMENTATION_ID : 0,
CPU_ID : 0,
RESET_VEC : 32'h80000000,
RESET_MTVEC : 32'h80000000,
NON_STANDARD_OPTIONS : '{
COUNTER_W : 33,
MCYCLE_WRITEABLE : 0,
MINSTR_WRITEABLE : 0,
MTVEC_WRITEABLE : 1,
INCLUDE_MSCRATCH : 0,
INCLUDE_MCAUSE : 1,
INCLUDE_MTVAL : 1
}
},
//Memory Options
SQ_DEPTH : 8,
INCLUDE_FORWARDING_TO_STORES : 1,
INCLUDE_ICACHE : 1,
ICACHE_ADDR : '{
L : 32'h80000000,
H : 32'h87FFFFFF
},
ICACHE : '{
LINES : 256,
LINE_W : 8,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L : 32'h88000000,
H : 32'h8FFFFFFF
}
},
ITLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_DCACHE : 1,
DCACHE_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
DCACHE : '{
LINES : 512,
LINE_W : 8,
WAYS : 1,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 1,
NON_CACHEABLE : '{
L : 32'h88000000,
H : 32'h8FFFFFFF
}
},
DTLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_ILOCAL_MEM : 0,
ILOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_DLOCAL_MEM : 0,
DLOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h8FFFFFFF
},
INCLUDE_IBUS : 0,
IBUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
INCLUDE_PERIPHERAL_BUS : 0,
PERIPHERAL_BUS_ADDR : '{
L : 32'h00000000,
H : 32'hFFFFFFFF
},
PERIPHERAL_BUS_TYPE : AXI_BUS,
//Branch Predictor Options
INCLUDE_BRANCH_PREDICTOR : 1,
BP : '{
WAYS : 2,
ENTRIES : 512,
RAS_ENTRIES : 8
},
//Writeback Options
NUM_WB_GROUPS : 3,
WB_GROUP : NEXYS_WB_GROUP_CONFIG
};
endpackage
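
The address windows in this package are given as `L`/`H` pairs. As a minimal sketch, assuming both bounds are inclusive (the `...FFFFFFF` upper bounds suggest this, but the decode logic is not shown in this section), a range check can be written as below. The type `range_sketch_t`, the function `in_range`, and the module name are hypothetical; the constants are copied from `NEXYS_CONFIG` and from `UART_ADDR` in the simulation wrapper.

```systemverilog
// Hypothetical sketch: range_sketch_t and in_range are illustrative names,
// not types or functions from cva5_config.
module addr_range_sketch;
    typedef struct packed {
        logic [31:0] L;
        logic [31:0] H;
    } range_sketch_t;

    // Inclusive bounds check (assumption: L and H are both inclusive)
    function automatic logic in_range(input logic [31:0] addr, input range_sketch_t r);
        return (addr >= r.L) && (addr <= r.H);
    endfunction

    localparam range_sketch_t DCACHE_RANGE         = '{L: 32'h80000000, H: 32'h8FFFFFFF};
    localparam range_sketch_t DCACHE_NON_CACHEABLE = '{L: 32'h88000000, H: 32'h8FFFFFFF};
    localparam logic [31:0]   UART_ADDR            = 32'h88001000;

    initial begin
        // The UART address lies in the data cache window...
        assert (in_range(UART_ADDR, DCACHE_RANGE));
        // ...and also in its non-cacheable sub-window.
        assert (in_range(UART_ADDR, DCACHE_NON_CACHEABLE));
        // The reset vector is below the non-cacheable window.
        assert (!in_range(32'h80000000, DCACHE_NON_CACHEABLE));
    end
endmodule
```

With `USE_NON_CACHEABLE : 1` on the data cache, the non-cacheable window covers the upper half of `DCACHE_ADDR`, which is where the simulation UART sits.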


@ -1,422 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module cva5_sim
import cva5_config::*;
import l2_config_and_types::*;
import riscv_types::*;
import cva5_types::*;
import nexys_config::*;
# (
parameter MEMORY_FILE = "<path to executable>.hw_init" //change this to appropriate location
)
(
input logic clk,
input logic rst,
//DDR AXI
output logic [31:0]ddr_axi_araddr,
output logic [1:0]ddr_axi_arburst,
output logic [3:0]ddr_axi_arcache,
output logic [5:0]ddr_axi_arid,
output logic [7:0]ddr_axi_arlen,
output logic [0:0]ddr_axi_arlock,
output logic [2:0]ddr_axi_arprot,
output logic [3:0]ddr_axi_arqos,
input logic ddr_axi_arready,
output logic [3:0]ddr_axi_arregion,
output logic [2:0]ddr_axi_arsize,
output logic ddr_axi_arvalid,
output logic [31:0]ddr_axi_awaddr,
output logic [1:0]ddr_axi_awburst,
output logic [3:0]ddr_axi_awcache,
output logic [5:0]ddr_axi_awid,
output logic [7:0]ddr_axi_awlen,
output logic [0:0]ddr_axi_awlock,
output logic [2:0]ddr_axi_awprot,
output logic [3:0]ddr_axi_awqos,
input logic ddr_axi_awready,
output logic [3:0]ddr_axi_awregion,
output logic [2:0]ddr_axi_awsize,
output logic ddr_axi_awvalid,
input logic [5:0]ddr_axi_bid,
output logic ddr_axi_bready,
input logic [1:0]ddr_axi_bresp,
input logic ddr_axi_bvalid,
input logic [31:0]ddr_axi_rdata,
input logic [5:0]ddr_axi_rid,
input logic ddr_axi_rlast,
output logic ddr_axi_rready,
input logic [1:0]ddr_axi_rresp,
input logic ddr_axi_rvalid,
output logic [31:0]ddr_axi_wdata,
output logic ddr_axi_wlast,
input logic ddr_axi_wready,
output logic [3:0]ddr_axi_wstrb,
output logic ddr_axi_wvalid,
output logic [5:0]ddr_axi_wid,
//Local Memory
output logic [29:0] instruction_bram_addr,
output logic instruction_bram_en,
output logic [3:0] instruction_bram_be,
output logic [31:0] instruction_bram_data_in,
input logic [31:0] instruction_bram_data_out,
output logic [29:0] data_bram_addr,
output logic data_bram_en,
output logic [3:0] data_bram_be,
output logic [31:0] data_bram_data_in,
input logic [31:0] data_bram_data_out,
//Used by verilator
output logic write_uart,
output logic [7:0] uart_byte,
//Trace Interface
output integer NUM_RETIRE_PORTS,
output logic [31:0] retire_ports_instruction [RETIRE_PORTS],
output logic [31:0] retire_ports_pc [RETIRE_PORTS],
output logic retire_ports_valid [RETIRE_PORTS],
output logic store_queue_empty
);
parameter SCRATCH_MEM_KB = 128;
parameter MEM_LINES = (SCRATCH_MEM_KB*1024)/4;
parameter UART_ADDR = 32'h88001000;
parameter UART_ADDR_LINE_STATUS = 32'h88001014;
interrupt_t s_interrupt;
interrupt_t m_interrupt;
assign s_interrupt = '{default: 0};
assign m_interrupt = '{default: 0};
local_memory_interface instruction_bram();
local_memory_interface data_bram();
axi_interface m_axi ();
avalon_interface m_avalon();
wishbone_interface dwishbone();
wishbone_interface iwishbone();
//L2 and AXI
axi_interface axi ();
l2_requester_interface l2 ();
assign instruction_bram_addr = instruction_bram.addr;
assign instruction_bram_en = instruction_bram.en;
assign instruction_bram_be = instruction_bram.be;
assign instruction_bram_data_in = instruction_bram.data_in;
assign instruction_bram.data_out = instruction_bram_data_out;
assign data_bram_addr = data_bram.addr;
assign data_bram_en = data_bram.en;
assign data_bram_be = data_bram.be;
assign data_bram_data_in = data_bram.data_in;
assign data_bram.data_out = data_bram_data_out;
l1_to_axi arb(.*, .cpu(l2), .axi(axi));
cva5 #(.CONFIG(NEXYS_CONFIG)) cpu(.*);
initial begin
write_uart = 0;
uart_byte = 0;
end
//Capture writes to UART
always_ff @(posedge clk) begin
write_uart <= (axi.wvalid && axi.wready && axi.awaddr == UART_ADDR);
uart_byte <= axi.wdata[7:0];
end
////////////////////////////////////////////////////
//DDR AXI interface
assign ddr_axi_araddr = axi.araddr;
assign ddr_axi_arburst = axi.arburst;
assign ddr_axi_arcache = axi.arcache;
assign ddr_axi_arid = axi.arid;
assign ddr_axi_arlen = axi.arlen;
assign axi.arready = ddr_axi_arready;
assign ddr_axi_arsize = axi.arsize;
assign ddr_axi_arvalid = axi.arvalid;
assign ddr_axi_awaddr = axi.awaddr;
assign ddr_axi_awburst = axi.awburst;
assign ddr_axi_awcache = axi.awcache;
assign ddr_axi_awid = axi.awid;
assign ddr_axi_awlen = axi.awlen;
assign axi.awready = ddr_axi_awready;
assign ddr_axi_awvalid = axi.awvalid;
assign axi.bid = ddr_axi_bid;
assign ddr_axi_bready = axi.bready;
assign axi.bresp = ddr_axi_bresp;
assign axi.bvalid = ddr_axi_bvalid;
assign axi.rdata = ddr_axi_rdata;
assign axi.rid = ddr_axi_rid;
assign axi.rlast = ddr_axi_rlast;
assign ddr_axi_rready = axi.rready;
assign axi.rresp = ddr_axi_rresp;
assign axi.rvalid = ddr_axi_rvalid;
assign ddr_axi_wdata = axi.wdata;
assign ddr_axi_wlast = axi.wlast;
assign axi.wready = ddr_axi_wready;
assign ddr_axi_wstrb = axi.wstrb;
assign ddr_axi_wvalid = axi.wvalid;
////////////////////////////////////////////////////
//Trace Interface
localparam BENCHMARK_START_COLLECTION_NOP = 32'h00C00013;
localparam BENCHMARK_END_COLLECTION_NOP = 32'h00D00013;
logic start_collection;
logic end_collection;
//NOP detection
always_comb begin
start_collection = 0;
end_collection = 0;
foreach(retire_ports_valid[i]) begin
start_collection |= retire_ports_valid[i] & (retire_ports_instruction[i] == BENCHMARK_START_COLLECTION_NOP);
end_collection |= retire_ports_valid[i] & (retire_ports_instruction[i] == BENCHMARK_END_COLLECTION_NOP);
end
end
//Hierarchy paths for major components
`define FETCH_P cpu.fetch_block
`define ICACHE_P cpu.fetch_block.gen_fetch_icache.i_cache
`define BRANCH_P cpu.branch_unit_block
`define ISSUE_P cpu.decode_and_issue_block
`define RENAME_P cpu.renamer_block
`define METADATA_P cpu.id_block
`define LS_P cpu.load_store_unit_block
`define DIV_P cpu.gen_div.div_unit_block
`define LSQ_P cpu.load_store_unit_block.lsq_block
`define DCACHE_P cpu.load_store_unit_block.gen_ls_dcache.data_cache
stats_t stats_enum;
instruction_mix_stats_t instruction_mix_enum;
localparam NUM_STATS = stats_enum.num();
localparam NUM_INSTRUCTION_MIX_STATS = instruction_mix_enum.num();
logic stats [NUM_STATS];
logic is_mul [RETIRE_PORTS];
logic is_div [RETIRE_PORTS];
logic [NUM_INSTRUCTION_MIX_STATS-1:0] instruction_mix_stats [RETIRE_PORTS];
logic icache_hit;
logic icache_miss;
logic iarb_stall;
logic dcache_hit;
logic dcache_miss;
logic darb_stall;
//Issue stalls
logic base_no_instruction_stall;
logic base_no_id_sub_stall;
logic base_flush_sub_stall;
logic base_unit_busy_stall;
logic base_operands_stall;
logic base_hold_stall;
logic single_source_issue_stall;
logic [3:0] stall_source_count;
///////////////
//Table recording, for each destination register, the unit(s) it was issued to
//Used to determine which unit's output an operand stall is waiting on
logic [MAX_NUM_UNITS-1:0] rd_addr_table [32];
always_ff @(posedge clk) begin
if (cpu.instruction_issued_with_rd)
rd_addr_table[`ISSUE_P.issue.rd_addr] <= `ISSUE_P.unit_needed_issue_stage;
end
generate if (NEXYS_CONFIG.INCLUDE_ICACHE) begin
assign icache_hit = `ICACHE_P.tag_hit;
assign icache_miss = `ICACHE_P.second_cycle & ~`ICACHE_P.tag_hit;
assign iarb_stall = `ICACHE_P.request_r & ~cpu.l1_request[L1_ICACHE_ID].ack;
end endgenerate
generate if (NEXYS_CONFIG.INCLUDE_DCACHE) begin
assign dcache_hit = `DCACHE_P.load_hit;
assign dcache_miss = `DCACHE_P.line_complete;
assign darb_stall = cpu.l1_request[L1_DCACHE_ID].request & ~cpu.l1_request[L1_DCACHE_ID].ack;
end endgenerate
logic [MAX_NUM_UNITS-1:0] unit_ready;
generate for (genvar i = 0; i < MAX_NUM_UNITS; i++)
assign unit_ready[i] = cpu.unit_issue[i].ready;
endgenerate
always_comb begin
stats = '{default: '0};
//Fetch
stats[FETCH_EARLY_BR_CORRECTION_STAT] = `FETCH_P.early_branch_flush;
stats[FETCH_SUB_UNIT_STALL_STAT] = `METADATA_P.pc_id_available & ~`FETCH_P.units_ready;
stats[FETCH_ID_STALL_STAT] = ~`METADATA_P.pc_id_available;
stats[FETCH_IC_HIT_STAT] = icache_hit;
stats[FETCH_IC_MISS_STAT] = icache_miss;
stats[FETCH_IC_ARB_STALL_STAT] = iarb_stall;
//Branch predictor
stats[FETCH_BP_BR_CORRECT_STAT] = `BRANCH_P.instruction_is_completing & ~`BRANCH_P.is_return_ex & ~`BRANCH_P.branch_flush;
stats[FETCH_BP_BR_MISPREDICT_STAT] = `BRANCH_P.instruction_is_completing & ~`BRANCH_P.is_return_ex & `BRANCH_P.branch_flush;
stats[FETCH_BP_RAS_CORRECT_STAT] = `BRANCH_P.instruction_is_completing & `BRANCH_P.is_return_ex & ~`BRANCH_P.branch_flush;
stats[FETCH_BP_RAS_MISPREDICT_STAT] = `BRANCH_P.instruction_is_completing & `BRANCH_P.is_return_ex & `BRANCH_P.branch_flush;
//Issue stalls
base_no_instruction_stall = ~`ISSUE_P.issue.stage_valid | cpu.gc.fetch_flush;
base_no_id_sub_stall = (`METADATA_P.post_issue_count == MAX_IDS);
base_flush_sub_stall = cpu.gc.fetch_flush;
base_unit_busy_stall = `ISSUE_P.issue.stage_valid & ~|(`ISSUE_P.unit_needed_issue_stage & unit_ready);
base_operands_stall = `ISSUE_P.issue.stage_valid & ~(&`ISSUE_P.operand_ready);
base_hold_stall = `ISSUE_P.issue.stage_valid & `ISSUE_P.issue_hold;
stall_source_count = 4'(base_no_instruction_stall) + 4'(base_unit_busy_stall) + 4'(base_operands_stall) + 4'(base_hold_stall);
single_source_issue_stall = (stall_source_count == 1);
//Issue stall determination
stats[ISSUE_NO_INSTRUCTION_STAT] = base_no_instruction_stall & single_source_issue_stall;
stats[ISSUE_NO_ID_STAT] = base_no_instruction_stall & base_no_id_sub_stall & single_source_issue_stall;
stats[ISSUE_FLUSH_STAT] = base_no_instruction_stall & base_flush_sub_stall & single_source_issue_stall;
stats[ISSUE_UNIT_BUSY_STAT] = base_unit_busy_stall & single_source_issue_stall;
stats[ISSUE_OPERANDS_NOT_READY_STAT] = base_operands_stall & single_source_issue_stall;
stats[ISSUE_HOLD_STAT] = base_hold_stall & single_source_issue_stall;
stats[ISSUE_MULTI_SOURCE_STAT] = (base_no_instruction_stall | base_unit_busy_stall | base_operands_stall | base_hold_stall) & ~single_source_issue_stall;
//Misc Issue stats
stats[ISSUE_OPERAND_STALL_FOR_BRANCH_STAT] = stats[ISSUE_OPERANDS_NOT_READY_STAT] & `ISSUE_P.unit_needed_issue_stage[BR_ID];
stats[ISSUE_STORE_WITH_FORWARDED_DATA_STAT] = `ISSUE_P.issue_to[LS_ID] & `LS_P.issue_attr.is_store & `LS_P.rs2_inuse;
stats[ISSUE_DIVIDER_RESULT_REUSE_STAT] = `ISSUE_P.issue_to[DIV_ID] & `DIV_P.div_op_reuse;
//Issue Stall Source
for (int i = 0; i < REGFILE_READ_PORTS; i++) begin
stats[ISSUE_OPERAND_STALL_ON_LOAD_STAT] |= `ISSUE_P.issue.stage_valid & rd_addr_table[`ISSUE_P.issue_rs_addr[i]][LS_ID] & ~`ISSUE_P.operand_ready[i] ;
stats[ISSUE_OPERAND_STALL_ON_MULTIPLY_STAT] |= NEXYS_CONFIG.INCLUDE_UNIT.MUL & `ISSUE_P.issue.stage_valid & rd_addr_table[`ISSUE_P.issue_rs_addr[i]][MUL_ID] & ~`ISSUE_P.operand_ready[i];
stats[ISSUE_OPERAND_STALL_ON_DIVIDE_STAT] |= NEXYS_CONFIG.INCLUDE_UNIT.DIV & `ISSUE_P.issue.stage_valid & rd_addr_table[`ISSUE_P.issue_rs_addr[i]][DIV_ID] & ~`ISSUE_P.operand_ready[i];
end
//LS Stats
stats[LSU_LOAD_BLOCKED_BY_STORE_STAT] = `LSQ_P.lq.valid & `LSQ_P.load_blocked;
stats[LSU_SUB_UNIT_STALL_STAT] = (`LS_P.lsq.load_valid | `LS_P.lsq.store_valid) & ~`LS_P.sub_unit_ready;
stats[LSU_DC_HIT_STAT] = dcache_hit;
stats[LSU_DC_MISS_STAT] = dcache_miss;
stats[LSU_DC_ARB_STALL_STAT] = darb_stall;
//Retire Instruction Mix
for (int i = 0; i < RETIRE_PORTS; i++) begin
is_mul[i] = retire_ports_instruction[i][25] & ~retire_ports_instruction[i][14];
is_div[i] = retire_ports_instruction[i][25] & retire_ports_instruction[i][14];
instruction_mix_stats[i][ALU_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {ARITH_T, ARITH_IMM_T, AUIPC_T, LUI_T}) & ~(is_mul[i] | is_div[i]);
instruction_mix_stats[i][BR_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {BRANCH_T, JAL_T, JALR_T});
instruction_mix_stats[i][MUL_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {ARITH_T}) & is_mul[i];
instruction_mix_stats[i][DIV_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {ARITH_T}) & is_div[i];
instruction_mix_stats[i][LOAD_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {LOAD_T, FPU_LOAD_T, AMO_T});// & retire_ports_instruction[i][14:12] inside {LS_B_fn3, L_BU_fn3};
instruction_mix_stats[i][STORE_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {STORE_T, FPU_STORE_T, AMO_T});
instruction_mix_stats[i][FPU_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {FPU_MADD_T, FPU_MSUB_T, FPU_NMSUB_T, FPU_NMADD_T, FPU_OP_T});
instruction_mix_stats[i][MISC_STAT] = cpu.retire_port_valid[i] & (retire_ports_instruction[i][6:2] inside {SYSTEM_T, FENCE_T});
end
end
sim_stats #(.NUM_OF_STATS(NUM_STATS), .NUM_INSTRUCTION_MIX_STATS(NUM_INSTRUCTION_MIX_STATS)) stats_block (
.clk (clk),
.rst (rst),
.start_collection (start_collection),
.end_collection (end_collection),
.stats (stats),
.instruction_mix_stats (instruction_mix_stats),
.retire_count (cpu.retire_count)
);
////////////////////////////////////////////////////
//Performs the lookups to provide the speculative architectural register file with
//standard register names for simulation purposes
logic [31:0][31:0] sim_registers_unamed_groups[NEXYS_CONFIG.NUM_WB_GROUPS];
logic [31:0][31:0] sim_registers_unamed;
simulation_named_regfile sim_register;
typedef struct packed{
phys_addr_t phys_addr;
logic [$clog2(NEXYS_CONFIG.NUM_WB_GROUPS)-1:0] wb_group;
} spec_table_t;
spec_table_t translation [32];
genvar i, j;
generate for (i = 0; i < 32; i++) begin : gen_reg_file_sim
for (j = 0; j < NEXYS_CONFIG.NUM_WB_GROUPS; j++) begin
if (FPGA_VENDOR == XILINX) begin
assign translation[i] = cpu.renamer_block.spec_table_ram.xilinx_gen.ram[i];
assign sim_registers_unamed_groups[j][i] =
cpu.register_file_block.register_file_gen[j].register_file_bank.xilinx_gen.ram[translation[i].phys_addr];
end else if (FPGA_VENDOR == INTEL) begin
assign translation[i] = cpu.renamer_block.spec_table_ram.intel_gen.lutrams[0].write_port.ram[i];
assign sim_registers_unamed_groups[j][i] =
cpu.register_file_block.register_file_gen[j].register_file_bank.intel_gen.lutrams[0].write_port.ram[translation[i].phys_addr];
end
end
assign sim_registers_unamed[31-i] = sim_registers_unamed_groups[translation[i].wb_group][i];
end
endgenerate
//FPU
logic [31:0][FLEN-1:0] fp_sim_registers_unamed_groups[2];
logic [31:0][FLEN-1:0] fp_sim_registers_unamed;
fp_simulation_named_regfile fp_sim_register;
typedef struct packed{
phys_addr_t fp_phys_addr;
logic fp_wb_group;
} fp_spec_table_t;
fp_spec_table_t fp_translation [32];
generate if (NEXYS_CONFIG.INCLUDE_UNIT.FPU) begin : gen_fp_reg_file_sim
for (i = 0; i < 32; i++) begin
for (j = 0; j < 2; j++) begin
if (FPGA_VENDOR == XILINX) begin
assign fp_translation[i] = cpu.gen_fpu.fp_renamer_block.spec_table_ram.xilinx_gen.ram[i];
assign fp_sim_registers_unamed_groups[j][i] = cpu.gen_fpu.fp_register_file_block.register_file_gen[j].register_file_bank.xilinx_gen.ram[fp_translation[i].fp_phys_addr];
end else if (FPGA_VENDOR == INTEL) begin
assign fp_translation[i] = cpu.gen_fpu.fp_renamer_block.spec_table_ram.intel_gen.lutrams[0].write_port.ram[i];
assign fp_sim_registers_unamed_groups[j][i] = cpu.gen_fpu.fp_register_file_block.register_file_gen[j].register_file_bank.intel_gen.lutrams[0].write_port.ram[fp_translation[i].fp_phys_addr];
end
end
assign fp_sim_registers_unamed[31-i] = fp_sim_registers_unamed_groups[fp_translation[i].fp_wb_group][i];
end
end
endgenerate
assign NUM_RETIRE_PORTS = RETIRE_PORTS;
generate for (genvar i = 0; i < RETIRE_PORTS; i++) begin
assign retire_ports_pc[i] = cpu.id_block.pc_table.ram[cpu.retire_ids[i]];
assign retire_ports_instruction[i] = cpu.id_block.instruction_table.ram[cpu.retire_ids[i]];
assign retire_ports_valid[i] = cpu.retire_port_valid[i];
end endgenerate
assign store_queue_empty = cpu.load_store_status.sq_empty;
////////////////////////////////////////////////////
//Assertion Binding
endmodule
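
The two collection markers in the trace interface, `32'h00C00013` and `32'h00D00013`, are the encodings of `addi x0, x0, 12` and `addi x0, x0, 13`, so they execute as NOPs while remaining distinguishable at retire. The following is a minimal sketch of that I-type encoding; the function `addi_x0_x0` and the module name are hypothetical helpers for illustration, not part of the repository.

```systemverilog
// Hypothetical sketch: builds the I-type encoding of "addi x0, x0, imm".
module marker_nop_sketch;
    function automatic logic [31:0] addi_x0_x0(input logic [11:0] imm);
        //      imm[11:0] rs1=x0 funct3=ADDI rd=x0 opcode=OP-IMM
        return {imm,      5'd0,  3'b000,     5'd0, 7'b0010011};
    endfunction

    initial begin
        assert (addi_x0_x0(12'd12) == 32'h00C00013); // start-collection marker
        assert (addi_x0_x0(12'd13) == 32'h00D00013); // end-collection marker
    end
endmodule
```

Because the destination is `x0`, the immediate carries no architectural effect and can be used purely as a tag for the benchmark harness.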


@ -1,142 +0,0 @@
/*
* Copyright © 2017 Eric Matthews, Lesley Shannon
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Eric Matthews <ematthew@sfu.ca>
*/
module nexys_wrapper
import cva5_config::*;
import cva5_types::*;
import l2_config_and_types::*;
import nexys_config::*;
(
input logic clk,
input logic rst,
// AXI SIGNALS - need these to unwrap the interface for packaging //
input logic m_axi_arready,
output logic m_axi_arvalid,
output logic [31:0] m_axi_araddr,
output logic [7:0] m_axi_arlen,
output logic [2:0] m_axi_arsize,
output logic [1:0] m_axi_arburst,
output logic [3:0] m_axi_arcache,
output logic [5:0] m_axi_arid,
//read data
output logic m_axi_rready,
input logic m_axi_rvalid,
input logic [31:0] m_axi_rdata,
input logic [1:0] m_axi_rresp,
input logic m_axi_rlast,
input logic [5:0] m_axi_rid,
//Write channel
//write address
input logic m_axi_awready,
output logic m_axi_awvalid,
output logic [31:0] m_axi_awaddr,
output logic [7:0] m_axi_awlen,
output logic [2:0] m_axi_awsize,
output logic [1:0] m_axi_awburst,
output logic [3:0] m_axi_awcache,
output logic [5:0] m_axi_awid,
//write data
input logic m_axi_wready,
output logic m_axi_wvalid,
output logic [31:0] m_axi_wdata,
output logic [3:0] m_axi_wstrb,
output logic m_axi_wlast,
//write response
output logic m_axi_bready,
input logic m_axi_bvalid,
input logic [1:0] m_axi_bresp,
input logic [5:0] m_axi_bid
);
//Unused outputs
local_memory_interface instruction_bram ();
local_memory_interface data_bram ();
avalon_interface m_avalon ();
wishbone_interface dwishbone ();
wishbone_interface iwishbone ();
axi_interface m_axi ();
interrupt_t m_interrupt;
interrupt_t s_interrupt;
//L2 and AXI
l2_requester_interface l2 ();
axi_interface axi ();
logic rst_r1, rst_r2;
assign axi.arready = m_axi_arready;
assign m_axi_arvalid = axi.arvalid;
assign m_axi_araddr = axi.araddr;
assign m_axi_arlen = axi.arlen;
assign m_axi_arsize = axi.arsize;
assign m_axi_arburst = axi.arburst;
assign m_axi_arcache = axi.arcache;
assign m_axi_arid = axi.arid;
assign m_axi_rready = axi.rready;
assign axi.rvalid = m_axi_rvalid;
assign axi.rdata = m_axi_rdata;
assign axi.rresp = m_axi_rresp;
assign axi.rlast = m_axi_rlast;
assign axi.rid = m_axi_rid;
assign axi.awready = m_axi_awready;
assign m_axi_awvalid = axi.awvalid;
assign m_axi_awaddr = axi.awaddr;
assign m_axi_awlen = axi.awlen;
assign m_axi_awsize = axi.awsize;
assign m_axi_awburst = axi.awburst;
assign m_axi_awcache = axi.awcache;
assign m_axi_awid = axi.awid;
//write data
assign axi.wready = m_axi_wready;
assign m_axi_wvalid = axi.wvalid;
assign m_axi_wdata = axi.wdata;
assign m_axi_wstrb = axi.wstrb;
assign m_axi_wlast = axi.wlast;
//write response
assign m_axi_bready = axi.bready;
assign axi.bvalid = m_axi_bvalid;
assign axi.bresp = m_axi_bresp;
assign axi.bid = m_axi_bid;
always_ff @ (posedge clk) begin
rst_r1 <= rst;
rst_r2 <= rst_r1;
end
l1_to_axi arb(.*, .cpu(l2), .axi(axi));
cva5 #(.CONFIG(NEXYS_CONFIG)) cpu(.rst(rst_r2), .*);
endmodule


@ -1,88 +0,0 @@
# Set the reference directory for source file relative paths (by default the value is script directory path)
set origin_dir [file dirname [info script]]
# Set the project name
set _xil_proj_name_ "cva5_nexys_wrapper"
set sources_dir $origin_dir/../../../
# Create project
create_project ${_xil_proj_name_} $origin_dir/${_xil_proj_name_}
# Set the directory path for the new project
set proj_dir [get_property directory [current_project]]
# Set project properties
set obj [current_project]
set_property -name "simulator_language" -value "Mixed" -objects $obj
set_property -name "target_language" -value "Verilog" -objects $obj
# Create 'sources_1' fileset (if not found)
if {[string equal [get_filesets -quiet sources_1] ""]} {
create_fileset -srcset sources_1
}
#import sources needed for blackbox packaging
import_files -norecurse $sources_dir/examples/nexys/nexys_config.sv
import_files -norecurse $sources_dir/examples/nexys/nexys_wrapper.sv
import_files -norecurse $sources_dir/l2_arbiter/l2_external_interfaces.sv
import_files -norecurse $sources_dir/local_memory/local_memory_interface.sv
import_files -norecurse $sources_dir/core/types_and_interfaces/external_interfaces.sv
import_files -norecurse $sources_dir/core/types_and_interfaces/cva5_config.sv
import_files -norecurse $sources_dir/core/types_and_interfaces/riscv_types.sv
import_files -norecurse $sources_dir/core/types_and_interfaces/cva5_types.sv
import_files -norecurse $sources_dir/core/types_and_interfaces/csr_types.sv
import_files -norecurse $sources_dir/l2_arbiter/l2_config_and_types.sv
# Set IP repository paths
set obj [get_filesets sources_1]
set_property "ip_repo_paths" "[file normalize "$origin_dir/${_xil_proj_name_}"]" $obj
# Rebuild user ip_repo's index before adding any source files
update_ip_catalog -rebuild
# Set 'sources_1' fileset properties
set obj [get_filesets sources_1]
set_property -name "top" -value "nexys_wrapper" -objects $obj
set_property -name "top_auto_set" -value "0" -objects $obj
set_property -name "top_file" -value "${sources_dir}/examples/nexys/nexys_wrapper.sv" -objects $obj
############## Initial IP Packaging
ipx::package_project -import_files -force -root_dir $proj_dir
update_compile_order -fileset sources_1
ipx::create_xgui_files [ipx::current_core]
ipx::update_checksums [ipx::current_core]
ipx::save_core [ipx::current_core]
# To set the axi interface as aximm and port map all the signals over #
##### Naming
set_property name CVA5 [ipx::current_core]
set_property display_name CVA5_NEXYS7 [ipx::current_core]
set_property description CVA5_NEXYS7 [ipx::current_core]
set_property vendor {} [ipx::current_core]
set_property vendor user [ipx::current_core]
##### Re-Adding of project files
set_property ip_repo_paths $sources_dir/${_xil_proj_name_} [current_project]
current_project $_xil_proj_name_
update_ip_catalog
import_files -force -fileset [get_filesets sources_1] $sources_dir/core
import_files -force -fileset [get_filesets sources_1] $sources_dir/l2_arbiter
import_files -force -fileset [get_filesets sources_1] $sources_dir/local_memory
import_files -fileset [get_filesets sources_1] $sources_dir/examples/nexys/l1_to_axi.sv
############## Re-packaging of core
update_compile_order -fileset sources_1
ipx::merge_project_changes files [ipx::current_core]
set_property core_revision 1 [ipx::current_core]
ipx::create_xgui_files [ipx::current_core]
ipx::update_checksums [ipx::current_core]
ipx::save_core [ipx::current_core]
current_project ${_xil_proj_name_}
set_property "ip_repo_paths" "[file normalize "$origin_dir/${_xil_proj_name_}"]" $obj
update_ip_catalog -rebuild


@ -1,18 +0,0 @@
set_property -dict {PACKAGE_PIN H17 IOSTANDARD LVCMOS33} [get_ports {LED[0]}]
set_property -dict {PACKAGE_PIN K15 IOSTANDARD LVCMOS33} [get_ports {LED[1]}]
set_property -dict {PACKAGE_PIN J13 IOSTANDARD LVCMOS33} [get_ports {LED[2]}]
set_property -dict {PACKAGE_PIN N14 IOSTANDARD LVCMOS33} [get_ports {LED[3]}]
set_property -dict {PACKAGE_PIN R18 IOSTANDARD LVCMOS33} [get_ports {LED[4]}]
set_property -dict {PACKAGE_PIN V17 IOSTANDARD LVCMOS33} [get_ports {LED[5]}]
set_property -dict {PACKAGE_PIN U17 IOSTANDARD LVCMOS33} [get_ports {LED[6]}]
set_property -dict {PACKAGE_PIN U16 IOSTANDARD LVCMOS33} [get_ports {LED[7]}]
set_property -dict {PACKAGE_PIN V16 IOSTANDARD LVCMOS33} [get_ports {LED[8]}]
set_property -dict {PACKAGE_PIN T15 IOSTANDARD LVCMOS33} [get_ports {LED[9]}]
set_property -dict {PACKAGE_PIN U14 IOSTANDARD LVCMOS33} [get_ports {LED[10]}]
set_property -dict {PACKAGE_PIN T16 IOSTANDARD LVCMOS33} [get_ports {LED[11]}]
set_property -dict {PACKAGE_PIN V15 IOSTANDARD LVCMOS33} [get_ports {LED[12]}]
set_property -dict {PACKAGE_PIN V14 IOSTANDARD LVCMOS33} [get_ports {LED[13]}]
set_property -dict {PACKAGE_PIN V12 IOSTANDARD LVCMOS33} [get_ports {LED[14]}]
set_property -dict {PACKAGE_PIN V11 IOSTANDARD LVCMOS33} [get_ports {LED[15]}]


@ -1,510 +0,0 @@
################################################################
# This is a generated script based on design: system
#
# Though there are limitations about the generated script,
# the main purpose of this utility is to make learning
# IP Integrator Tcl commands easier.
################################################################
namespace eval _tcl {
proc get_script_folder {} {
set script_path [file normalize [info script]]
set script_folder [file dirname $script_path]
return $script_folder
}
}
variable script_folder
set script_folder [_tcl::get_script_folder]
################################################################
# Check if script is running in correct Vivado version.
################################################################
set scripts_vivado_version 2022.1
set current_vivado_version [version -short]
if { [string first $scripts_vivado_version $current_vivado_version] == -1 } {
puts ""
catch {common::send_gid_msg -ssname BD::TCL -id 2041 -severity "ERROR" "This script was generated using Vivado <$scripts_vivado_version> and is being run in <$current_vivado_version> of Vivado. Please run the script in Vivado <$scripts_vivado_version> then open the design in Vivado <$current_vivado_version>. Upgrade the design by running \"Tools => Report => Report IP Status...\", then run write_bd_tcl to create an updated script."}
return 1
}
################################################################
# START
################################################################
# To test this script, run the following commands from Vivado Tcl console:
# source system_script.tcl
# If there is no project opened, this script will create a
# project, but make sure you do not have an existing project
# <./myproj/project_1.xpr> in the current working folder.
set list_projs [get_projects -quiet]
if { $list_projs eq "" } {
create_project cva5-competition-baseline cva5-competition-baseline -part xc7a100tcsg324-1
set_property BOARD_PART digilentinc.com:nexys-a7-100t:part0:1.2 [current_project]
} else {
common::send_gid_msg -ssname BD::TCL -id 2100 -severity "ERROR" "Open project must be closed before running."
return -1
}
# CHANGE DESIGN NAME HERE
variable design_name
set design_name system
add_files -fileset constrs_1 -norecurse $script_folder/manual_pin_assignments.xdc
set_property ip_repo_paths $script_folder/cva5_nexys_wrapper [current_project]
update_ip_catalog
# If you do not already have an existing IP Integrator design open,
# you can create a design using the following command:
# create_bd_design $design_name
# Creating design if needed
set errMsg ""
set nRet 0
set cur_design [current_bd_design -quiet]
set list_cells [get_bd_cells -quiet]
if { ${design_name} eq "" } {
# USE CASES:
# 1) Design_name not set
set errMsg "Please set the variable <design_name> to a non-empty value."
set nRet 1
} elseif { ${cur_design} ne "" && ${list_cells} eq "" } {
# USE CASES:
# 2): Current design opened AND is empty AND names same.
# 3): Current design opened AND is empty AND names diff; design_name NOT in project.
# 4): Current design opened AND is empty AND names diff; design_name exists in project.
if { $cur_design ne $design_name } {
common::send_gid_msg -ssname BD::TCL -id 2001 -severity "INFO" "Changing value of <design_name> from <$design_name> to <$cur_design> since current design is empty."
set design_name [get_property NAME $cur_design]
}
common::send_gid_msg -ssname BD::TCL -id 2002 -severity "INFO" "Constructing design in IPI design <$cur_design>..."
} elseif { ${cur_design} ne "" && $list_cells ne "" && $cur_design eq $design_name } {
# USE CASES:
# 5) Current design opened AND has components AND same names.
set errMsg "Design <$design_name> already exists in your project, please set the variable <design_name> to another value."
set nRet 1
} elseif { [get_files -quiet ${design_name}.bd] ne "" } {
# USE CASES:
# 6) Current opened design, has components, but diff names, design_name exists in project.
# 7) No opened design, design_name exists in project.
set errMsg "Design <$design_name> already exists in your project, please set the variable <design_name> to another value."
set nRet 2
} else {
# USE CASES:
# 8) No opened design, design_name not in project.
# 9) Current opened design, has components, but diff names, design_name not in project.
common::send_gid_msg -ssname BD::TCL -id 2003 -severity "INFO" "Currently there is no design <$design_name> in project, so creating one..."
create_bd_design $design_name
common::send_gid_msg -ssname BD::TCL -id 2004 -severity "INFO" "Making design <$design_name> as current_bd_design."
current_bd_design $design_name
}
common::send_gid_msg -ssname BD::TCL -id 2005 -severity "INFO" "Currently the variable <design_name> is equal to \"$design_name\"."
if { $nRet != 0 } {
catch {common::send_gid_msg -ssname BD::TCL -id 2006 -severity "ERROR" $errMsg}
return $nRet
}
set bCheckIPsPassed 1
##################################################################
# CHECK IPs
##################################################################
set bCheckIPs 1
if { $bCheckIPs == 1 } {
set list_check_ips "\
user:user:CVA5:1.0\
xilinx.com:ip:axi_gpio:2.0\
xilinx.com:ip:axi_uart16550:2.0\
xilinx.com:ip:clk_wiz:6.0\
xilinx.com:ip:mdm:3.2\
xilinx.com:ip:mig_7series:4.2\
xilinx.com:ip:proc_sys_reset:5.0\
xilinx.com:ip:xlslice:1.0\
"
set list_ips_missing ""
common::send_gid_msg -ssname BD::TCL -id 2011 -severity "INFO" "Checking if the following IPs exist in the project's IP catalog: $list_check_ips ."
foreach ip_vlnv $list_check_ips {
set ip_obj [get_ipdefs -all $ip_vlnv]
if { $ip_obj eq "" } {
lappend list_ips_missing $ip_vlnv
}
}
if { $list_ips_missing ne "" } {
catch {common::send_gid_msg -ssname BD::TCL -id 2012 -severity "ERROR" "The following IPs are not found in the IP Catalog:\n $list_ips_missing\n\nResolution: Please add the repository containing the IP(s) to the project." }
set bCheckIPsPassed 0
}
}
if { $bCheckIPsPassed != 1 } {
common::send_gid_msg -ssname BD::TCL -id 2023 -severity "WARNING" "Will not continue with creation of design due to the error(s) above."
return 3
}
##################################################################
# MIG PRJ FILE TCL PROCs
##################################################################
proc write_mig_file_system_mig_7series_0_0 { str_mig_prj_filepath } {
file mkdir [ file dirname "$str_mig_prj_filepath" ]
set mig_prj_file [open $str_mig_prj_filepath w+]
puts $mig_prj_file {<?xml version="1.0" encoding="UTF-8" standalone="no" ?>}
puts $mig_prj_file {<Project NoOfControllers="1">}
puts $mig_prj_file { }
puts $mig_prj_file {<!-- IMPORTANT: This is an internal file that has been generated by the MIG software. Any direct editing or changes made to this file may result in unpredictable behavior or data corruption. It is strongly advised that users do not edit the contents of this file. Re-run the MIG GUI with the required settings if any of the options provided below need to be altered. -->}
puts $mig_prj_file { <ModuleName>system_mig_7series_0_0</ModuleName>}
puts $mig_prj_file { <dci_inouts_inputs>1</dci_inouts_inputs>}
puts $mig_prj_file { <dci_inputs>1</dci_inputs>}
puts $mig_prj_file { <Debug_En>OFF</Debug_En>}
puts $mig_prj_file { <DataDepth_En>1024</DataDepth_En>}
puts $mig_prj_file { <LowPower_En>ON</LowPower_En>}
puts $mig_prj_file { <XADC_En>Enabled</XADC_En>}
puts $mig_prj_file { <TargetFPGA>xc7a100t-csg324/-1</TargetFPGA>}
puts $mig_prj_file { <Version>4.2</Version>}
puts $mig_prj_file { <SystemClock>No Buffer</SystemClock>}
puts $mig_prj_file { <ReferenceClock>No Buffer</ReferenceClock>}
puts $mig_prj_file { <SysResetPolarity>ACTIVE LOW</SysResetPolarity>}
puts $mig_prj_file { <BankSelectionFlag>FALSE</BankSelectionFlag>}
puts $mig_prj_file { <InternalVref>1</InternalVref>}
puts $mig_prj_file { <dci_hr_inouts_inputs>50 Ohms</dci_hr_inouts_inputs>}
puts $mig_prj_file { <dci_cascade>0</dci_cascade>}
puts $mig_prj_file { <Controller number="0">}
puts $mig_prj_file { <MemoryDevice>DDR2_SDRAM/Components/MT47H64M16HR-25E</MemoryDevice>}
puts $mig_prj_file { <TimePeriod>5000</TimePeriod>}
puts $mig_prj_file { <VccAuxIO>1.8V</VccAuxIO>}
puts $mig_prj_file { <PHYRatio>2:1</PHYRatio>}
puts $mig_prj_file { <InputClkFreq>100</InputClkFreq>}
puts $mig_prj_file { <UIExtraClocks>1</UIExtraClocks>}
puts $mig_prj_file { <MMCM_VCO>1200</MMCM_VCO>}
puts $mig_prj_file { <MMCMClkOut0> 6.000</MMCMClkOut0>}
puts $mig_prj_file { <MMCMClkOut1>1</MMCMClkOut1>}
puts $mig_prj_file { <MMCMClkOut2>1</MMCMClkOut2>}
puts $mig_prj_file { <MMCMClkOut3>1</MMCMClkOut3>}
puts $mig_prj_file { <MMCMClkOut4>1</MMCMClkOut4>}
puts $mig_prj_file { <DataWidth>16</DataWidth>}
puts $mig_prj_file { <DeepMemory>1</DeepMemory>}
puts $mig_prj_file { <DataMask>1</DataMask>}
puts $mig_prj_file { <ECC>Disabled</ECC>}
puts $mig_prj_file { <Ordering>Strict</Ordering>}
puts $mig_prj_file { <BankMachineCnt>4</BankMachineCnt>}
puts $mig_prj_file { <CustomPart>FALSE</CustomPart>}
puts $mig_prj_file { <NewPartName/>}
puts $mig_prj_file { <RowAddress>13</RowAddress>}
puts $mig_prj_file { <ColAddress>10</ColAddress>}
puts $mig_prj_file { <BankAddress>3</BankAddress>}
puts $mig_prj_file { <C0_MEM_SIZE>134217728</C0_MEM_SIZE>}
puts $mig_prj_file { <UserMemoryAddressMap>BANK_ROW_COLUMN</UserMemoryAddressMap>}
puts $mig_prj_file { <PinSelection>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="M4" SLEW="" VCCAUX_IO="" name="ddr2_addr[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="R2" SLEW="" VCCAUX_IO="" name="ddr2_addr[10]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="K5" SLEW="" VCCAUX_IO="" name="ddr2_addr[11]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="N6" SLEW="" VCCAUX_IO="" name="ddr2_addr[12]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="P4" SLEW="" VCCAUX_IO="" name="ddr2_addr[1]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="M6" SLEW="" VCCAUX_IO="" name="ddr2_addr[2]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="T1" SLEW="" VCCAUX_IO="" name="ddr2_addr[3]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="L3" SLEW="" VCCAUX_IO="" name="ddr2_addr[4]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="P5" SLEW="" VCCAUX_IO="" name="ddr2_addr[5]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="M2" SLEW="" VCCAUX_IO="" name="ddr2_addr[6]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="N1" SLEW="" VCCAUX_IO="" name="ddr2_addr[7]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="L4" SLEW="" VCCAUX_IO="" name="ddr2_addr[8]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="N5" SLEW="" VCCAUX_IO="" name="ddr2_addr[9]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="P2" SLEW="" VCCAUX_IO="" name="ddr2_ba[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="P3" SLEW="" VCCAUX_IO="" name="ddr2_ba[1]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="R1" SLEW="" VCCAUX_IO="" name="ddr2_ba[2]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="L1" SLEW="" VCCAUX_IO="" name="ddr2_cas_n"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="DIFF_SSTL18_II" PADName="L5" SLEW="" VCCAUX_IO="" name="ddr2_ck_n[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="DIFF_SSTL18_II" PADName="L6" SLEW="" VCCAUX_IO="" name="ddr2_ck_p[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="M1" SLEW="" VCCAUX_IO="" name="ddr2_cke[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="K6" SLEW="" VCCAUX_IO="" name="ddr2_cs_n[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="T6" SLEW="" VCCAUX_IO="" name="ddr2_dm[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="U1" SLEW="" VCCAUX_IO="" name="ddr2_dm[1]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="R7" SLEW="" VCCAUX_IO="" name="ddr2_dq[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="V5" SLEW="" VCCAUX_IO="" name="ddr2_dq[10]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="U4" SLEW="" VCCAUX_IO="" name="ddr2_dq[11]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="V4" SLEW="" VCCAUX_IO="" name="ddr2_dq[12]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="T4" SLEW="" VCCAUX_IO="" name="ddr2_dq[13]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="V1" SLEW="" VCCAUX_IO="" name="ddr2_dq[14]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="T3" SLEW="" VCCAUX_IO="" name="ddr2_dq[15]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="V6" SLEW="" VCCAUX_IO="" name="ddr2_dq[1]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="R8" SLEW="" VCCAUX_IO="" name="ddr2_dq[2]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="U7" SLEW="" VCCAUX_IO="" name="ddr2_dq[3]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="V7" SLEW="" VCCAUX_IO="" name="ddr2_dq[4]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="R6" SLEW="" VCCAUX_IO="" name="ddr2_dq[5]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="U6" SLEW="" VCCAUX_IO="" name="ddr2_dq[6]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="R5" SLEW="" VCCAUX_IO="" name="ddr2_dq[7]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="T5" SLEW="" VCCAUX_IO="" name="ddr2_dq[8]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="U3" SLEW="" VCCAUX_IO="" name="ddr2_dq[9]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="DIFF_SSTL18_II" PADName="V9" SLEW="" VCCAUX_IO="" name="ddr2_dqs_n[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="DIFF_SSTL18_II" PADName="V2" SLEW="" VCCAUX_IO="" name="ddr2_dqs_n[1]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="DIFF_SSTL18_II" PADName="U9" SLEW="" VCCAUX_IO="" name="ddr2_dqs_p[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="DIFF_SSTL18_II" PADName="U2" SLEW="" VCCAUX_IO="" name="ddr2_dqs_p[1]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="M3" SLEW="" VCCAUX_IO="" name="ddr2_odt[0]"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="N4" SLEW="" VCCAUX_IO="" name="ddr2_ras_n"/>}
puts $mig_prj_file { <Pin IN_TERM="" IOSTANDARD="SSTL18_II" PADName="N2" SLEW="" VCCAUX_IO="" name="ddr2_we_n"/>}
puts $mig_prj_file { </PinSelection>}
puts $mig_prj_file { <System_Control>}
puts $mig_prj_file { <Pin Bank="Select Bank" PADName="No connect" name="sys_rst"/>}
puts $mig_prj_file { <Pin Bank="Select Bank" PADName="No connect" name="init_calib_complete"/>}
puts $mig_prj_file { <Pin Bank="Select Bank" PADName="No connect" name="tg_compare_error"/>}
puts $mig_prj_file { </System_Control>}
puts $mig_prj_file { <TimingParameters>}
puts $mig_prj_file { <Parameters tfaw="45" tras="40" trcd="15" trefi="7.8" trfc="127.5" trp="12.5" trrd="10" trtp="7.5" twtr="7.5"/>}
puts $mig_prj_file { </TimingParameters>}
puts $mig_prj_file { <mrBurstLength name="Burst Length">8</mrBurstLength>}
puts $mig_prj_file { <mrBurstType name="Burst Type">Sequential</mrBurstType>}
puts $mig_prj_file { <mrCasLatency name="CAS Latency">3</mrCasLatency>}
puts $mig_prj_file { <mrMode name="Mode">Normal</mrMode>}
puts $mig_prj_file { <mrDllReset name="DLL Reset">No</mrDllReset>}
puts $mig_prj_file { <mrPdMode name="PD Mode">Fast exit</mrPdMode>}
puts $mig_prj_file { <mrWriteRecovery name="Write Recovery">3</mrWriteRecovery>}
puts $mig_prj_file { <emrDllEnable name="DLL Enable">Enable-Normal</emrDllEnable>}
puts $mig_prj_file { <emrOutputDriveStrength name="Output Drive Strength">Fullstrength</emrOutputDriveStrength>}
puts $mig_prj_file { <emrCSSelection name="Controller Chip Select Pin">Enable</emrCSSelection>}
puts $mig_prj_file { <emrCKSelection name="Memory Clock Selection">1</emrCKSelection>}
puts $mig_prj_file { <emrRTT name="RTT (nominal) - ODT">50ohms</emrRTT>}
puts $mig_prj_file { <emrPosted name="Additive Latency (AL)">0</emrPosted>}
puts $mig_prj_file { <emrOCD name="OCD Operation">OCD Exit</emrOCD>}
puts $mig_prj_file { <emrDQS name="DQS# Enable">Enable</emrDQS>}
puts $mig_prj_file { <emrRDQS name="RDQS Enable">Disable</emrRDQS>}
puts $mig_prj_file { <emrOutputs name="Outputs">Enable</emrOutputs>}
puts $mig_prj_file { <PortInterface>AXI</PortInterface>}
puts $mig_prj_file { <AXIParameters>}
puts $mig_prj_file { <C0_C_RD_WR_ARB_ALGORITHM>ROUND_ROBIN</C0_C_RD_WR_ARB_ALGORITHM>}
puts $mig_prj_file { <C0_S_AXI_ADDR_WIDTH>27</C0_S_AXI_ADDR_WIDTH>}
puts $mig_prj_file { <C0_S_AXI_DATA_WIDTH>32</C0_S_AXI_DATA_WIDTH>}
puts $mig_prj_file { <C0_S_AXI_ID_WIDTH>7</C0_S_AXI_ID_WIDTH>}
puts $mig_prj_file { <C0_S_AXI_SUPPORTS_NARROW_BURST>1</C0_S_AXI_SUPPORTS_NARROW_BURST>}
puts $mig_prj_file { </AXIParameters>}
puts $mig_prj_file { </Controller>}
puts $mig_prj_file {</Project>}
close $mig_prj_file
}
# End of write_mig_file_system_mig_7series_0_0()
##################################################################
# DESIGN PROCs
##################################################################
# Procedure to create entire design; Provide argument to make
# procedure reusable. If parentCell is "", will use root.
proc create_root_design { parentCell } {
variable script_folder
variable design_name
if { $parentCell eq "" } {
set parentCell [get_bd_cells /]
}
# Get object for parentCell
set parentObj [get_bd_cells $parentCell]
if { $parentObj == "" } {
catch {common::send_gid_msg -ssname BD::TCL -id 2090 -severity "ERROR" "Unable to find parent cell <$parentCell>!"}
return
}
# Make sure parentObj is hier blk
set parentType [get_property TYPE $parentObj]
if { $parentType ne "hier" } {
catch {common::send_gid_msg -ssname BD::TCL -id 2091 -severity "ERROR" "Parent <$parentObj> has TYPE = <$parentType>. Expected to be <hier>."}
return
}
# Save current instance; Restore later
set oldCurInst [current_bd_instance .]
# Set parent object as current
current_bd_instance $parentObj
# Create interface ports
set ddr2_sdram [ create_bd_intf_port -mode Master -vlnv xilinx.com:interface:ddrx_rtl:1.0 ddr2_sdram ]
set dip_switches_16bits [ create_bd_intf_port -mode Master -vlnv xilinx.com:interface:gpio_rtl:1.0 dip_switches_16bits ]
set rgb_led [ create_bd_intf_port -mode Master -vlnv xilinx.com:interface:gpio_rtl:1.0 rgb_led ]
set usb_uart [ create_bd_intf_port -mode Master -vlnv xilinx.com:interface:uart_rtl:1.0 usb_uart ]
# Create ports
set LED [ create_bd_port -dir O -from 15 -to 0 -type data LED ]
set reset [ create_bd_port -dir I -type rst reset ]
set_property -dict [ list \
CONFIG.POLARITY {ACTIVE_LOW} \
] $reset
set sys_clock [ create_bd_port -dir I -type clk -freq_hz 100000000 sys_clock ]
set_property -dict [ list \
CONFIG.PHASE {0.0} \
] $sys_clock
# Create instance: CVA5_0, and set properties
set CVA5_0 [ create_bd_cell -type ip -vlnv user:user:CVA5:1.0 CVA5_0 ]
# Create instance: axi_gpio_0, and set properties
set axi_gpio_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:axi_gpio:2.0 axi_gpio_0 ]
set_property -dict [ list \
CONFIG.C_ALL_OUTPUTS_2 {1} \
CONFIG.C_DOUT_DEFAULT_2 {0x00000001} \
CONFIG.C_GPIO2_WIDTH {16} \
CONFIG.C_IS_DUAL {1} \
CONFIG.GPIO2_BOARD_INTERFACE {Custom} \
CONFIG.GPIO_BOARD_INTERFACE {dip_switches_16bits} \
CONFIG.USE_BOARD_FLOW {true} \
] $axi_gpio_0
# Create instance: axi_gpio_1, and set properties
set axi_gpio_1 [ create_bd_cell -type ip -vlnv xilinx.com:ip:axi_gpio:2.0 axi_gpio_1 ]
set_property -dict [ list \
CONFIG.GPIO_BOARD_INTERFACE {rgb_led} \
CONFIG.USE_BOARD_FLOW {true} \
] $axi_gpio_1
# Create instance: axi_interconnect_0, and set properties
set axi_interconnect_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:axi_interconnect:2.1 axi_interconnect_0 ]
set_property -dict [ list \
CONFIG.NUM_MI {4} \
CONFIG.NUM_SI {2} \
CONFIG.S00_HAS_DATA_FIFO {2} \
CONFIG.S01_HAS_DATA_FIFO {2} \
CONFIG.STRATEGY {2} \
] $axi_interconnect_0
# Create instance: axi_uart16550_0, and set properties
set axi_uart16550_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:axi_uart16550:2.0 axi_uart16550_0 ]
set_property -dict [ list \
CONFIG.UART_BOARD_INTERFACE {usb_uart} \
CONFIG.USE_BOARD_FLOW {true} \
] $axi_uart16550_0
# Create instance: clk_wiz_0, and set properties
set clk_wiz_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:clk_wiz:6.0 clk_wiz_0 ]
set_property -dict [ list \
CONFIG.CLK_IN1_BOARD_INTERFACE {sys_clock} \
CONFIG.RESET_BOARD_INTERFACE {reset} \
CONFIG.RESET_PORT {resetn} \
CONFIG.RESET_TYPE {ACTIVE_LOW} \
CONFIG.USE_LOCKED {false} \
] $clk_wiz_0
# Create instance: mdm_1, and set properties
set mdm_1 [ create_bd_cell -type ip -vlnv xilinx.com:ip:mdm:3.2 mdm_1 ]
set_property -dict [ list \
CONFIG.C_ADDR_SIZE {32} \
CONFIG.C_DBG_MEM_ACCESS {1} \
CONFIG.C_MB_DBG_PORTS {0} \
CONFIG.C_M_AXI_ADDR_WIDTH {32} \
] $mdm_1
# Create instance: mig_7series_0, and set properties
set mig_7series_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:mig_7series:4.2 mig_7series_0 ]
# Generate the PRJ File for MIG
set str_mig_folder [get_property IP_DIR [ get_ips [ get_property CONFIG.Component_Name $mig_7series_0 ] ] ]
set str_mig_file_name mig_b.prj
set str_mig_file_path ${str_mig_folder}/${str_mig_file_name}
write_mig_file_system_mig_7series_0_0 $str_mig_file_path
set_property -dict [ list \
CONFIG.BOARD_MIG_PARAM {ddr2_sdram} \
CONFIG.RESET_BOARD_INTERFACE {reset} \
CONFIG.XML_INPUT_FILE {mig_b.prj} \
] $mig_7series_0
# Create instance: proc_sys_reset_0, and set properties
set proc_sys_reset_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:proc_sys_reset:5.0 proc_sys_reset_0 ]
# Create instance: xlslice_0, and set properties
set xlslice_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:xlslice:1.0 xlslice_0 ]
set_property -dict [ list \
CONFIG.DIN_WIDTH {16} \
] $xlslice_0
# Create interface connections
connect_bd_intf_net -intf_net CVA5_0_m_axi [get_bd_intf_pins CVA5_0/m_axi] [get_bd_intf_pins axi_interconnect_0/S00_AXI]
connect_bd_intf_net -intf_net axi_gpio_0_GPIO [get_bd_intf_ports dip_switches_16bits] [get_bd_intf_pins axi_gpio_0/GPIO]
connect_bd_intf_net -intf_net axi_gpio_1_GPIO [get_bd_intf_ports rgb_led] [get_bd_intf_pins axi_gpio_1/GPIO]
connect_bd_intf_net -intf_net axi_interconnect_0_M00_AXI [get_bd_intf_pins axi_interconnect_0/M00_AXI] [get_bd_intf_pins mig_7series_0/S_AXI]
connect_bd_intf_net -intf_net axi_interconnect_0_M01_AXI [get_bd_intf_pins axi_interconnect_0/M01_AXI] [get_bd_intf_pins axi_uart16550_0/S_AXI]
connect_bd_intf_net -intf_net axi_interconnect_0_M02_AXI [get_bd_intf_pins axi_gpio_0/S_AXI] [get_bd_intf_pins axi_interconnect_0/M02_AXI]
connect_bd_intf_net -intf_net axi_interconnect_0_M03_AXI [get_bd_intf_pins axi_gpio_1/S_AXI] [get_bd_intf_pins axi_interconnect_0/M03_AXI]
connect_bd_intf_net -intf_net axi_uart16550_0_UART [get_bd_intf_ports usb_uart] [get_bd_intf_pins axi_uart16550_0/UART]
connect_bd_intf_net -intf_net mdm_1_M_AXI [get_bd_intf_pins axi_interconnect_0/S01_AXI] [get_bd_intf_pins mdm_1/M_AXI]
connect_bd_intf_net -intf_net mig_7series_0_DDR2 [get_bd_intf_ports ddr2_sdram] [get_bd_intf_pins mig_7series_0/DDR2]
# Create port connections
connect_bd_net -net axi_gpio_0_gpio2_io_o [get_bd_ports LED] [get_bd_pins axi_gpio_0/gpio2_io_o] [get_bd_pins xlslice_0/Din]
connect_bd_net -net clk_wiz_0_clk_out1 [get_bd_pins clk_wiz_0/clk_out1] [get_bd_pins mig_7series_0/sys_clk_i]
connect_bd_net -net mdm_1_Debug_SYS_Rst [get_bd_pins mdm_1/Debug_SYS_Rst] [get_bd_pins proc_sys_reset_0/mb_debug_sys_rst]
connect_bd_net -net mig_7series_0_ui_addn_clk_0 [get_bd_pins mig_7series_0/clk_ref_i] [get_bd_pins mig_7series_0/ui_addn_clk_0]
connect_bd_net -net mig_7series_0_ui_clk [get_bd_pins CVA5_0/clk] [get_bd_pins axi_gpio_0/s_axi_aclk] [get_bd_pins axi_gpio_1/s_axi_aclk] [get_bd_pins axi_interconnect_0/ACLK] [get_bd_pins axi_interconnect_0/M00_ACLK] [get_bd_pins axi_interconnect_0/M01_ACLK] [get_bd_pins axi_interconnect_0/M02_ACLK] [get_bd_pins axi_interconnect_0/M03_ACLK] [get_bd_pins axi_interconnect_0/S00_ACLK] [get_bd_pins axi_interconnect_0/S01_ACLK] [get_bd_pins axi_uart16550_0/s_axi_aclk] [get_bd_pins mdm_1/M_AXI_ACLK] [get_bd_pins mig_7series_0/ui_clk] [get_bd_pins proc_sys_reset_0/slowest_sync_clk]
connect_bd_net -net mig_7series_0_ui_clk_sync_rst [get_bd_pins mig_7series_0/ui_clk_sync_rst] [get_bd_pins proc_sys_reset_0/ext_reset_in]
connect_bd_net -net proc_sys_reset_0_interconnect_aresetn [get_bd_pins axi_interconnect_0/ARESETN] [get_bd_pins proc_sys_reset_0/interconnect_aresetn]
connect_bd_net -net reset_1 [get_bd_ports reset] [get_bd_pins clk_wiz_0/resetn] [get_bd_pins mig_7series_0/sys_rst]
connect_bd_net -net rst_clk_wiz_1_100M_peripheral_aresetn [get_bd_pins axi_gpio_0/s_axi_aresetn] [get_bd_pins axi_gpio_1/s_axi_aresetn] [get_bd_pins axi_interconnect_0/M00_ARESETN] [get_bd_pins axi_interconnect_0/M01_ARESETN] [get_bd_pins axi_interconnect_0/M02_ARESETN] [get_bd_pins axi_interconnect_0/M03_ARESETN] [get_bd_pins axi_interconnect_0/S00_ARESETN] [get_bd_pins axi_interconnect_0/S01_ARESETN] [get_bd_pins axi_uart16550_0/s_axi_aresetn] [get_bd_pins mdm_1/M_AXI_ARESETN] [get_bd_pins mig_7series_0/aresetn] [get_bd_pins proc_sys_reset_0/peripheral_aresetn]
connect_bd_net -net sys_clock_1 [get_bd_ports sys_clock] [get_bd_pins clk_wiz_0/clk_in1]
connect_bd_net -net xlslice_0_Dout [get_bd_pins CVA5_0/rst] [get_bd_pins xlslice_0/Dout]
# Create address segments
assign_bd_address -offset 0x88100000 -range 0x00010000 -target_address_space [get_bd_addr_spaces CVA5_0/m_axi] [get_bd_addr_segs axi_gpio_0/S_AXI/Reg] -force
assign_bd_address -offset 0x88200000 -range 0x00010000 -target_address_space [get_bd_addr_spaces CVA5_0/m_axi] [get_bd_addr_segs axi_gpio_1/S_AXI/Reg] -force
assign_bd_address -offset 0x88000000 -range 0x00010000 -target_address_space [get_bd_addr_spaces CVA5_0/m_axi] [get_bd_addr_segs axi_uart16550_0/S_AXI/Reg] -force
assign_bd_address -offset 0x80000000 -range 0x08000000 -target_address_space [get_bd_addr_spaces CVA5_0/m_axi] [get_bd_addr_segs mig_7series_0/memmap/memaddr] -force
assign_bd_address -offset 0x88100000 -range 0x00010000 -target_address_space [get_bd_addr_spaces mdm_1/Data] [get_bd_addr_segs axi_gpio_0/S_AXI/Reg] -force
assign_bd_address -offset 0x88200000 -range 0x00010000 -target_address_space [get_bd_addr_spaces mdm_1/Data] [get_bd_addr_segs axi_gpio_1/S_AXI/Reg] -force
assign_bd_address -offset 0x88000000 -range 0x00010000 -target_address_space [get_bd_addr_spaces mdm_1/Data] [get_bd_addr_segs axi_uart16550_0/S_AXI/Reg] -force
assign_bd_address -offset 0x80000000 -range 0x08000000 -target_address_space [get_bd_addr_spaces mdm_1/Data] [get_bd_addr_segs mig_7series_0/memmap/memaddr] -force
# Restore current instance
current_bd_instance $oldCurInst
validate_bd_design
save_bd_design
}
# End of create_root_design()
##################################################################
# MAIN FLOW
##################################################################
create_root_design ""
make_wrapper -files [get_files $script_folder/cva5-competition-baseline/cva5-competition-baseline.srcs/sources_1/bd/system/system.bd] -top
add_files -norecurse $script_folder/cva5-competition-baseline/cva5-competition-baseline.gen/sources_1/bd/system/hdl/system_wrapper.v
update_compile_order -fileset sources_1
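
The address segments assigned by the `assign_bd_address` calls above can be summarized from the software side. The following is only a sketch in C, not part of the repository: the names `region_t`, `address_map`, `decode_address`, and `ddr2_bytes` are hypothetical. The bases and ranges are copied from the script, and the DDR2 geometry (13 row bits, 10 column bits, 3 bank bits, 16-bit data width) is taken from the MIG project file written earlier in the same script.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one address segment from the block design. */
typedef struct {
    const char *name; /* cell name used in the Tcl script */
    uint32_t    base; /* -offset argument of assign_bd_address */
    uint32_t    size; /* -range argument of assign_bd_address */
} region_t;

/* Values copied from the assign_bd_address calls for CVA5_0/m_axi. */
static const region_t address_map[] = {
    { "mig_7series_0",   0x80000000u, 0x08000000u },
    { "axi_uart16550_0", 0x88000000u, 0x00010000u },
    { "axi_gpio_0",      0x88100000u, 0x00010000u },
    { "axi_gpio_1",      0x88200000u, 0x00010000u },
};

/* Returns the segment containing addr, or NULL if the address is unmapped. */
static const region_t *decode_address(uint32_t addr) {
    for (size_t i = 0; i < sizeof address_map / sizeof address_map[0]; i++) {
        /* Unsigned subtraction wraps for addr < base, so one compare suffices. */
        if ((uint32_t)(addr - address_map[i].base) < address_map[i].size) {
            return &address_map[i];
        }
    }
    return NULL;
}

/* Capacity in bytes implied by the DDR2 geometry in the MIG project file. */
static uint32_t ddr2_bytes(unsigned row_bits, unsigned col_bits,
                           unsigned bank_bits, unsigned data_width_bits) {
    return (UINT32_C(1) << (row_bits + col_bits + bank_bits))
           * (data_width_bits / 8u);
}
```

With the values from the script, `ddr2_bytes(13, 10, 3, 16)` evaluates to 134217728 bytes, which matches both `C0_MEM_SIZE` and the `0x08000000` range assigned to the MIG segment.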

48
examples/sw/main.c Normal file

@ -0,0 +1,48 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
#include <stdio.h>
#include <stdatomic.h>
//Needed to "sleep" for the correct duration
#define MHZ 100
static void usleep(unsigned usecs) {
unsigned int counter = usecs * MHZ;
counter /= 2; //Two instructions per loop body; decrement and compare
do {
atomic_thread_fence(memory_order_relaxed); //This prevents the loop from being optimized away but doesn't do anything
} while (counter-- > 0); //Assumes that a tight add and branch loop can be sustained at 1 IPC
}
//Example program to run on CVA5, assumes running at 100 MHz
//The accompanying mem.mif file is the corresponding executable in hexadecimal format
//It was compiled targeting RV32IM with a 4KB memory size, and a uart_putc function supporting the AXI UART Lite AMD IP
int main(void) {
while(1) {
puts("Hello World!");
usleep(1000*1000);
}
return 0;
}
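
The delay arithmetic in `usleep` above can be checked in isolation. This is a minimal sketch rather than code from the repository: `delay_iterations` is a hypothetical helper that repeats the same computation (`usecs * MHZ / 2`) and additionally reports when the multiplication would wrap, which the original does not guard against.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of main.c.
 * Computes the loop counter that usleep() would start from for a delay of
 * `usecs` microseconds at a clock of `mhz` MHz, assuming two instructions
 * per loop iteration sustained at 1 IPC (the assumption stated in main.c).
 * Returns false if usecs * mhz does not fit in an unsigned int.
 */
static bool delay_iterations(unsigned usecs, unsigned mhz, unsigned *iterations) {
    if (mhz != 0 && usecs > UINT_MAX / mhz) {
        return false; /* the product would wrap around silently */
    }
    *iterations = (usecs * mhz) / 2; /* cycles divided by instructions per iteration */
    return true;
}
```

For the one-second delay used in `main` (`1000*1000` microseconds at 100 MHz), this gives 100,000,000 cycles and therefore 50,000,000 iterations. Assuming a 32-bit `unsigned`, the product wraps once `usecs` exceeds 42,949,672, so a single call can request at most roughly 42.9 seconds.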

1024
examples/sw/mem.mif Normal file

File diff suppressed because it is too large

62
examples/xilinx/clint.tcl Normal file

@ -0,0 +1,62 @@
# Set project name and origin directory
set origin_dir [file dirname [info script]]
set _xil_proj_name_ "axi_clint_top_IP"
# Create Vivado project
create_project ${_xil_proj_name_} $origin_dir/${_xil_proj_name_} -part xcvu9p-flga2104-2L-e -force
# Set board and project properties
set obj [current_project]
set_property -name "board_part" -value "xilinx.com:vcu118:part0:2.4" -objects $obj
set_property -name "default_lib" -value "xil_defaultlib" -objects $obj
set_property -name "ip_cache_permissions" -value "read write" -objects $obj
set_property -name "ip_output_repo" -value "$origin_dir/${_xil_proj_name_}.cache/ip" -objects $obj
set_property -name "sim.ip.auto_export_scripts" -value "1" -objects $obj
set_property -name "simulator_language" -value "Mixed" -objects $obj
set_property -name "target_language" -value "Verilog" -objects $obj
# Create 'sources_1' fileset
if {[string equal [get_filesets -quiet sources_1] ""]} {
create_fileset -srcset sources_1
}
# Import the CLINT wrapper and CLINT modules
import_files -norecurse $origin_dir/clint_wrapper.sv -force
import_files -norecurse $origin_dir/clint.sv -force
# Set top module
set obj [get_filesets sources_1]
set_property -name "top" -value "clint_wrapper" -objects $obj
# Set IP repository paths
set obj [get_filesets sources_1]
set_property "ip_repo_paths" "[file normalize "$origin_dir"]" $obj
# Update IP catalog
update_ip_catalog -rebuild
# create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_clint
# set_property -dict [list CONFIG.C_PROBE3_WIDTH {32} CONFIG.C_PROBE1_WIDTH {64} CONFIG.C_PROBE0_WIDTH {64} CONFIG.C_NUM_OF_PROBES {4}] [get_ips ila_clint]
# generate_target {instantiation_template} [get_files /media/CVA5_PLIC/cva5/axi_plic_top_IP/axi_plic_top_IP.srcs/sources_1/ip/ila_clint/ila_clint.xci]
############## Initial IP Packaging ########################################
ipx::package_project -import_files -force -root_dir $origin_dir
update_compile_order -fileset sources_1
# ipx::add_subcore xilinx.com:ip:ila:6.2 [ipx::get_file_groups xilinx_anylanguagesynthesis -of_objects [ipx::current_core]]
ipx::add_bus_interface s_axi [ipx::current_core]
set_property name s_axi [ipx::get_bus_interfaces S_AXI -of_objects [ipx::current_core]]
ipx::add_bus_parameter NUM_READ_OUTSTANDING [ipx::get_bus_interfaces s_axi -of_objects [ipx::current_core]]
ipx::add_bus_parameter NUM_WRITE_OUTSTANDING [ipx::get_bus_interfaces s_axi -of_objects [ipx::current_core]]
puts "INFO: IP package ${_xil_proj_name_} created successfully."
exit


@ -0,0 +1,84 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module cva5_top
#(
parameter LOCAL_MEM = "mem.mif",
parameter WORDS = 1024
)
(
input clk,
input rstn, //Synchronous active low
//Peripheral AXI4-Lite bus
//AR
input m_axi_arready,
output m_axi_arvalid,
output [31:0] m_axi_araddr,
//R
output m_axi_rready,
input m_axi_rvalid,
input [31:0] m_axi_rdata,
input [1:0] m_axi_rresp,
//AW
input m_axi_awready,
output m_axi_awvalid,
output [31:0] m_axi_awaddr,
//W
input m_axi_wready,
output m_axi_wvalid,
output [31:0] m_axi_wdata,
output [3:0] m_axi_wstrb,
//B
output m_axi_bready,
input m_axi_bvalid,
input [1:0] m_axi_bresp
);
cva5_wrapper #(.LOCAL_MEM(LOCAL_MEM), .WORDS(WORDS)) cva5_inst(
.clk(clk),
.rstn(rstn),
.m_axi_arready(m_axi_arready),
.m_axi_arvalid(m_axi_arvalid),
.m_axi_araddr(m_axi_araddr),
.m_axi_rready(m_axi_rready),
.m_axi_rvalid(m_axi_rvalid),
.m_axi_rdata(m_axi_rdata),
.m_axi_rresp(m_axi_rresp),
.m_axi_awready(m_axi_awready),
.m_axi_awvalid(m_axi_awvalid),
.m_axi_awaddr(m_axi_awaddr),
.m_axi_wready(m_axi_wready),
.m_axi_wvalid(m_axi_wvalid),
.m_axi_wdata(m_axi_wdata),
.m_axi_wstrb(m_axi_wstrb),
.m_axi_bready(m_axi_bready),
.m_axi_bvalid(m_axi_bvalid),
.m_axi_bresp(m_axi_bresp)
);
endmodule


@ -0,0 +1,259 @@
/*
* Copyright © 2024 Chris Keilbart
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Initial code developed under the supervision of Dr. Lesley Shannon,
* Reconfigurable Computing Lab, Simon Fraser University.
*
* Author(s):
* Chris Keilbart <ckeilbar@sfu.ca>
*/
module cva5_wrapper
import cva5_config::*;
import cva5_types::*;
#(
parameter string LOCAL_MEM = "mem.mif",
parameter int unsigned WORDS = 1024
)
(
input logic clk,
input logic rstn, //Synchronous active low
//Peripheral AXI bus
//AR
input logic m_axi_arready,
output logic m_axi_arvalid,
output logic [31:0] m_axi_araddr,
//R
output logic m_axi_rready,
input logic m_axi_rvalid,
input logic [31:0] m_axi_rdata,
input logic [1:0] m_axi_rresp,
//AW
input logic m_axi_awready,
output logic m_axi_awvalid,
output logic [31:0] m_axi_awaddr,
//W
input logic m_axi_wready,
output logic m_axi_wvalid,
output logic [31:0] m_axi_wdata,
output logic [3:0] m_axi_wstrb,
//B
output logic m_axi_bready,
input logic m_axi_bvalid,
input logic [1:0] m_axi_bresp
);
//CPU connections
local_memory_interface data_bram();
local_memory_interface instruction_bram();
axi_interface m_axi();
avalon_interface m_avalon(); //Unused
wishbone_interface dwishbone(); //Unused
wishbone_interface iwishbone(); //Unused
mem_interface mem();
logic[63:0] mtime;
interrupt_t s_interrupt; //Unused
interrupt_t m_interrupt; //Unused
////////////////////////////////////////////////////
//Implementation
//Instantiates a CVA5 processor using local memory
//Program start address 0x8000_0000
//Local memory space from 0x8000_0000 through 0x80FF_FFFF
//Peripheral bus from 0x6000_0000 through 0x6FFF_FFFF
localparam wb_group_config_t WB_CPU_CONFIG = '{
0 : '{0: ALU_ID, default : NON_WRITEBACK_ID},
1 : '{0: LS_ID, default : NON_WRITEBACK_ID},
2 : '{0: MUL_ID, 1: DIV_ID, 2: CSR_ID, 3: FPU_ID, 4: CUSTOM_ID, default : NON_WRITEBACK_ID},
default : '{default : NON_WRITEBACK_ID}
};
localparam cpu_config_t CPU_CONFIG = '{
//ISA options
MODES : M,
INCLUDE_UNIT : '{
MUL : 1,
DIV : 1,
CSR : 1,
FPU : 0,
CUSTOM : 0,
default: '0
},
INCLUDE_IFENCE : 0,
INCLUDE_AMO : 0,
INCLUDE_CBO : 0,
//CSR constants
CSRS : '{
MACHINE_IMPLEMENTATION_ID : 0,
CPU_ID : 0,
RESET_VEC : 32'h80000000,
RESET_TVEC : 32'h00000000,
MCONFIGPTR : '0,
INCLUDE_ZICNTR : 1,
INCLUDE_ZIHPM : 0,
INCLUDE_SSTC : 0,
INCLUDE_SMSTATEEN : 0
},
//Memory Options
SQ_DEPTH : 4,
INCLUDE_FORWARDING_TO_STORES : 1,
AMO_UNIT : '{
LR_WAIT : 32,
RESERVATION_WORDS : 8
},
INCLUDE_ICACHE : 0,
ICACHE_ADDR : '{
L: 32'h80000000,
H: 32'h8FFFFFFF
},
ICACHE : '{
LINES : 512,
LINE_W : 4,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L: 32'h70000000,
H: 32'h7FFFFFFF
}
},
ITLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_DCACHE : 0,
DCACHE_ADDR : '{
L: 32'h80000000,
H: 32'h8FFFFFFF
},
DCACHE : '{
LINES : 512,
LINE_W : 4,
WAYS : 2,
USE_EXTERNAL_INVALIDATIONS : 0,
USE_NON_CACHEABLE : 0,
NON_CACHEABLE : '{
L: 32'h70000000,
H: 32'h7FFFFFFF
}
},
DTLB : '{
WAYS : 2,
DEPTH : 64
},
INCLUDE_ILOCAL_MEM : 1,
ILOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h80FFFFFF
},
INCLUDE_DLOCAL_MEM : 1,
DLOCAL_MEM_ADDR : '{
L : 32'h80000000,
H : 32'h80FFFFFF
},
INCLUDE_IBUS : 0,
IBUS_ADDR : '{
L : 32'h60000000,
H : 32'h6FFFFFFF
},
INCLUDE_PERIPHERAL_BUS : 1,
PERIPHERAL_BUS_ADDR : '{
L : 32'h60000000,
H : 32'h6FFFFFFF
},
PERIPHERAL_BUS_TYPE : AXI_BUS,
//Branch Predictor Options
INCLUDE_BRANCH_PREDICTOR : 1,
BP : '{
WAYS : 2,
ENTRIES : 512,
RAS_ENTRIES : 8
},
//Writeback Options
NUM_WB_GROUPS : 3,
WB_GROUP : WB_CPU_CONFIG
};
logic rst;
assign rst = ~rstn;
cva5 #(.CONFIG(CPU_CONFIG)) cpu(.*);
always_ff @(posedge clk) begin
if (rst)
mtime <= '0;
else
mtime <= mtime + 1;
end
assign s_interrupt = '{default: '0};
assign m_interrupt = '{default: '0};
//AXI peripheral mapping; ID widths are mismatched but unused
assign m_axi.arready = m_axi_arready;
assign m_axi_arvalid = m_axi.arvalid;
assign m_axi_araddr = m_axi.araddr;
assign m_axi_rready = m_axi.rready;
assign m_axi.rvalid = m_axi_rvalid;
assign m_axi.rdata = m_axi_rdata;
assign m_axi.rresp = m_axi_rresp;
assign m_axi.rid = 6'b0;
assign m_axi.awready = m_axi_awready;
assign m_axi_awvalid = m_axi.awvalid;
assign m_axi_awaddr = m_axi.awaddr;
assign m_axi.wready = m_axi_wready;
assign m_axi_wvalid = m_axi.wvalid;
assign m_axi_wdata = m_axi.wdata;
assign m_axi_wstrb = m_axi.wstrb;
assign m_axi_bready = m_axi.bready;
assign m_axi.bvalid = m_axi_bvalid;
assign m_axi.bresp = m_axi_bresp;
assign m_axi.bid = 6'b0;
//Block memory
localparam BRAM_ADDR_W = $clog2(WORDS);
tdp_ram #(
.ADDR_WIDTH(BRAM_ADDR_W),
.NUM_COL(4),
.COL_WIDTH(8),
.PIPELINE_DEPTH(0),
.CASCADE_DEPTH(8),
.USE_PRELOAD(1),
.PRELOAD_FILE(LOCAL_MEM)
) local_mem (
.a_en(instruction_bram.en),
.a_wbe(instruction_bram.be),
.a_wdata(instruction_bram.data_in),
.a_addr(instruction_bram.addr[BRAM_ADDR_W-1:0]),
.a_rdata(instruction_bram.data_out),
.b_en(data_bram.en),
.b_wbe(data_bram.be),
.b_wdata(data_bram.data_in),
.b_addr(data_bram.addr[BRAM_ADDR_W-1:0]),
.b_rdata(data_bram.data_out),
.*);
endmodule
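
The `BRAM_ADDR_W = $clog2(WORDS)` sizing in the module above can be cross-checked against the local memory window stated in its comments (0x8000_0000 through 0x80FF_FFFF). The following plain-Tcl sketch is illustrative only and is not part of the commit: the `clog2` proc and the example `words` value are hypothetical, `WORDS` itself is defined outside this excerpt, and the calculation assumes the block RAM is addressed in 32-bit words.

```tcl
# Hypothetical helper: ceiling log2, mirroring SystemVerilog's $clog2
proc clog2 {n} {
    set bits 0
    set v [expr {$n - 1}]
    while {$v > 0} {
        incr bits
        set v [expr {$v >> 1}]
    }
    return $bits
}

# Local memory window taken from the comments in the module above
set lo 0x80000000
set hi 0x80FFFFFF
set window_bytes [expr {$hi - $lo + 1}]
set window_words [expr {$window_bytes / 4}]  ;# assumes 32-bit words
set max_addr_w   [clog2 $window_words]

# Example value only; the real WORDS comes from the module's instantiation
set words 16384
set bram_addr_w [clog2 $words]

puts "Window holds $window_words words (up to $max_addr_w address bits)"
puts "WORDS=$words needs $bram_addr_w address bits"
if {$words > $window_words} {
    puts "Warning: WORDS exceeds the local memory window"
}
```

Under these assumptions the window spans 16 MiB, so any `WORDS` value up to 4194304 stays inside the range configured in `ILOCAL_MEM_ADDR` and `DLOCAL_MEM_ADDR`.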

@ -0,0 +1,35 @@
puts "This script will create a system project for CVA5 in the current folder to run a demo application from block memory on the Nexys A7-100T"
puts "You should install the board support files from https://github.com/Digilent/vivado-boards before running this script"
# Create the project
create_project -force -part xc7a100tcsg324-1 CVA5BD ./vivado/CVA5BD
set_property board_part digilentinc.com:nexys-a7-100t:part0:1.3 [current_project]
set_property ip_repo_paths ./vivado/ip_repo [current_project]
update_ip_catalog
# Block diagram
create_bd_design "soc"
# UART
create_bd_cell -type ip -vlnv xilinx.com:ip:axi_uartlite:2.0 axi_uartlite_0
apply_board_connection -board_interface "usb_uart" -ip_intf "axi_uartlite_0/UART" -diagram "soc"
# Reset
create_bd_cell -type ip -vlnv xilinx.com:ip:proc_sys_reset:5.0 proc_sys_reset_0
apply_board_connection -board_interface "reset" -ip_intf "proc_sys_reset_0/ext_reset" -diagram "soc"
# Clock
create_bd_cell -type ip -vlnv xilinx.com:ip:clk_wiz:6.0 clk_wiz_0
# Connect to clock on board
apply_bd_automation -rule xilinx.com:bd_rule:board -config { Board_Interface {sys_clock ( System Clock ) } Manual_Source {Auto}} [get_bd_pins clk_wiz_0/clk_in1]
# Connect to reset on board via inverter
apply_bd_automation -rule xilinx.com:bd_rule:board -config { Board_Interface {reset ( Reset ) } Manual_Source {New External Port (ACTIVE_HIGH)}} [get_bd_pins clk_wiz_0/reset]
# Processor
create_bd_cell -type ip -vlnv xilinx.com:user:cva5_top:1.0 cva5_top_0
# Connect processor to UART via interconnect
apply_bd_automation -rule xilinx.com:bd_rule:axi4 -config { Clk_master {Auto} Clk_slave {Auto} Clk_xbar {Auto} Master {/cva5_top_0/m_axi} Slave {/axi_uartlite_0/S_AXI} ddr_seg {Auto} intc_ip {New AXI Interconnect} master_apm {0}} [get_bd_intf_pins axi_uartlite_0/S_AXI]
# Set address space
set_property offset 0x60000000 [get_bd_addr_segs {cva5_top_0/m_axi/SEG_axi_uartlite_0_Reg}]
regenerate_bd_layout
make_wrapper -files [get_files ./vivado/CVA5BD/CVA5BD.srcs/sources_1/bd/soc/soc.bd] -top
add_files ./vivado/CVA5BD/CVA5BD.gen/sources_1/bd/soc/hdl/soc_wrapper.v
update_compile_order -fileset sources_1
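
The script above places the UART at offset 0x60000000, which needs to fall inside the processor's peripheral bus window (0x6000_0000 through 0x6FFF_FFFF in the example `CPU_CONFIG`). A minimal plain-Tcl sketch of that check follows; it is not part of the commit, the `seg_in_window` proc is a hypothetical helper, and the 64 KiB segment size is an assumption rather than a value read from the block design.

```tcl
# Hypothetical helper: returns 1 when [base, base+size) lies within [lo, hi]
proc seg_in_window {base size lo hi} {
    set last [expr {$base + $size - 1}]
    return [expr {$base >= $lo && $last <= $hi}]
}

# Peripheral bus window from PERIPHERAL_BUS_ADDR in the example CPU_CONFIG
set periph_lo 0x60000000
set periph_hi 0x6FFFFFFF

# Offset used by the script; the segment size is assumed, not queried
set uart_base 0x60000000
set uart_size 0x10000

if {[seg_in_window $uart_base $uart_size $periph_lo $periph_hi]} {
    puts "UART segment fits inside the peripheral bus window"
} else {
    puts "UART segment falls outside the peripheral bus window"
}
```

The same check can be repeated for any further peripheral added to the interconnect, since the core only routes accesses within that window to the AXI master.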

@ -0,0 +1,20 @@
puts "This script will create a project for CVA5 in the current folder and package it as an IP"
# Create the project
create_project -force -part xc7a100tcsg324-1 CVA5IP ./vivado/CVA5IP
add_files -force {core examples/xilinx examples/sw}
set_property top cva5_top [current_fileset]
update_compile_order -fileset sources_1
# Now package as IP using intermediate project
ipx::package_project -root_dir ./vivado/ip_repo -vendor xilinx.com -library user -taxonomy /UserIP -import_files -set_current false -force
ipx::unload_core ./vivado/ip_repo/component.xml
ipx::edit_ip_in_project -upgrade true -name tmp_edit_project -directory ./vivado/ip_repo ./vivado/ip_repo/component.xml
ipx::update_source_project_archive -component [ipx::current_core]
ipx::create_xgui_files [ipx::current_core]
ipx::update_checksums [ipx::current_core]
ipx::check_integrity [ipx::current_core]
ipx::save_core [ipx::current_core]
ipx::move_temp_component_back -component [ipx::current_core]
close_project -delete
close_project

64
examples/xilinx/plic.tcl Normal file

@ -0,0 +1,64 @@
# Set project name and origin directory
set origin_dir [file dirname [info script]]
set _xil_proj_name_ "axi_clint_top_IP"
# Create Vivado project
create_project ${_xil_proj_name_} $origin_dir/${_xil_proj_name_} -part xcvu9p-flga2104-2L-e -force
# Set board and project properties
set obj [current_project]
set_property -name "board_part" -value "xilinx.com:vcu118:part0:2.4" -objects $obj
set_property -name "default_lib" -value "xil_defaultlib" -objects $obj
set_property -name "ip_cache_permissions" -value "read write" -objects $obj
set_property -name "ip_output_repo" -value "$origin_dir/${_xil_proj_name_}.cache/ip" -objects $obj
set_property -name "sim.ip.auto_export_scripts" -value "1" -objects $obj
set_property -name "simulator_language" -value "Mixed" -objects $obj
set_property -name "target_language" -value "Verilog" -objects $obj
# Create 'sources_1' fileset
if {[string equal [get_filesets -quiet sources_1] ""]} {
create_fileset -srcset sources_1
}
# Import the PLIC source files
import_files -norecurse $origin_dir/plic.sv -force
import_files -norecurse $origin_dir/plic_cmptree.sv -force
import_files -norecurse $origin_dir/plic_gateway.sv -force
import_files -norecurse $origin_dir/plic_wrapper.sv -force
# Set top module
set obj [get_filesets sources_1]
set_property -name "top" -value "plic_wrapper" -objects $obj
# Set IP repository paths
set obj [get_filesets sources_1]
set_property "ip_repo_paths" "[file normalize "$origin_dir"]" $obj
# Update IP catalog
update_ip_catalog -rebuild
# create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_clint
# set_property -dict [list CONFIG.C_PROBE3_WIDTH {32} CONFIG.C_PROBE1_WIDTH {64} CONFIG.C_PROBE0_WIDTH {64} CONFIG.C_NUM_OF_PROBES {4}] [get_ips ila_clint]
# generate_target {instantiation_template} [get_files /media/CVA5_PLIC/cva5/axi_plic_top_IP/axi_plic_top_IP.srcs/sources_1/ip/ila_clint/ila_clint.xci]
############## Initial IP Packaging########################################
ipx::package_project -import_files -force -root_dir $origin_dir
update_compile_order -fileset sources_1
# ipx::add_subcore xilinx.com:ip:ila:6.2 [ipx::get_file_groups xilinx_anylanguagesynthesis -of_objects [ipx::current_core]]
ipx::add_bus_interface s_axi [ipx::current_core]
set_property name s_axi [ipx::get_bus_interfaces S_AXI -of_objects [ipx::current_core]]
ipx::add_bus_parameter NUM_READ_OUTSTANDING [ipx::get_bus_interfaces s_axi -of_objects [ipx::current_core]]
ipx::add_bus_parameter NUM_WRITE_OUTSTANDING [ipx::get_bus_interfaces s_axi -of_objects [ipx::current_core]]
puts "INFO: IP package ${_xil_proj_name_} created successfully."
exit

@ -1,11 +0,0 @@
Creating a Hardware Project for the ZedBoard
-----------
We have provided a Tcl script that automates the creation of a CVA5 system on a ZedBoard through Vivado.
We also provide the manual steps that the script automates.
Hardware setup scripts and steps can be found here: [Hardware Setup](https://gitlab.com/sfu-rcl/taiga-project/-/wikis/Hardware-Setup)
Simulation setup scripts and steps can be found here: [Simulation Setup](https://gitlab.com/sfu-rcl/taiga-project/-/wikis/Simulation-Setup)

File diff suppressed because it is too large

Binary file not shown.

Some files were not shown because too many files have changed in this diff