ibex/vendor/lowrisc_ip/prim/doc/prim_keccak.md
Philipp Wagner 8b42024cd5 Use vendored-in primitives from OpenTitan
2020-05-27 10:23:15 +01:00


Primitive Component: Keccak permutation

Overview

prim_keccak is a single-round implementation of the permutation stage of the SHA3 algorithm; it implements the Keccak_p function. The module assumes the number of rounds is less than or equal to 12 + 2L, where L is log2(Width/25). It supports every state width described in the spec. This implementation is not currently hardened against side-channel or fault-injection attacks.

Parameters

| Name  | Type | Description |
|-------|------|-------------|
| Width | int  | State width in bits. Can be 25, 50, 100, 200, 400, 800, or 1600. |

Derived Parameters

The parameters below are derived from the Width parameter.

| Name     | Type | Description |
|----------|------|-------------|
| W        | int  | Number of slices in the state. Width/25 |
| L        | int  | log2 of W |
| MaxRound | int  | Maximum allowed round value. 12 + 2L |
| RndW     | int  | Bit-width to represent MaxRound. log2 of MaxRound |
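The derived parameters above can be reproduced with a few lines of Python. This is only a sketch: the function name `derived_params` is hypothetical, and it assumes that "log2 of MaxRound" means the number of bits needed to hold a round index in [0..MaxRound-1]; the RTL may round differently.

```python
VALID_WIDTHS = (25, 50, 100, 200, 400, 800, 1600)

def derived_params(width):
    """Return (W, L, MaxRound, RndW) for a given state width in bits."""
    if width not in VALID_WIDTHS:
        raise ValueError(f"unsupported Width: {width}")
    w = width // 25            # number of slices in the state
    l = w.bit_length() - 1     # log2(W); W is always a power of two here
    max_round = 12 + 2 * l
    # Assumption: enough bits to hold a round index in [0..MaxRound-1].
    rnd_w = (max_round - 1).bit_length()
    return w, l, max_round, rnd_w

if __name__ == "__main__":
    for width in VALID_WIDTHS:
        print(width, derived_params(width))
```

For Width = 1600 this yields W = 64, L = 6, and MaxRound = 24, which matches the 24 rounds of the full-width permutation.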

Signal Interfaces

| Signal | Type           | Description |
|--------|----------------|-------------|
| rnd_i  | input [RndW]   | Current round number [0..(MaxRound-1)] |
| s_i    | input [Width]  | State input |
| s_o    | output [Width] | Permuted state output |

s_i and s_o are little-endian bit arrays. The SHA3 spec shows how to convert the bitstream into the 5x5xW state cube; for instance, bit 0 of the stream maps to A[0,0,0]. Bit 0 in the spec is the first bit of the bitstream. In prim_keccak, s_i[0] is the first bit and s_i[Width-1] is the last bit.
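The mapping between the flat bit array and the state cube can be sketched in Python as follows. It uses the relation A[x,y,z] = S[W(5y+x)+z] from the SHA3 spec. The names `bit2s` and `s2bit` are borrowed from the block diagram in this document, but the bodies here are an illustrative assumption and not the RTL.

```python
def bit2s(bits, width):
    """Convert a flat bit list (index 0 = first bit) into a cube a[x][y][z]."""
    w = width // 25
    if len(bits) != width:
        raise ValueError("bit list length must equal width")
    return [[[bits[w * (5 * y + x) + z] for z in range(w)]
             for y in range(5)]
            for x in range(5)]

def s2bit(a, width):
    """Inverse of bit2s: flatten the cube back into a bit list."""
    w = width // 25
    # Flat index is w*(5y+x)+z, so y is the outermost loop and z the innermost.
    return [a[x][y][z] for y in range(5) for x in range(5) for z in range(w)]

if __name__ == "__main__":
    stream = [i % 2 for i in range(50)]   # arbitrary 50-bit example
    cube = bit2s(stream, 50)
    print(s2bit(cube, 50) == stream)
```

Using position labels instead of bit values (for example `list(range(50))`) makes it easy to see where each stream position lands in the cube.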

Theory of Operations

         |                                                          |
rnd_i    |                                                          |
---/---->| -----------------------------------------\               |
 [RndW]  |                                          |               |
         |                                          |               |
s_i      |                                          V               | s_o
===/====>| bit2s() -> chi(pi(rho(theta))) -> iota( ,rnd) -> s2bit() |==/==>
 [Width] |            |-----------keccak_p--------------|           |[Width]
         |                                                          |

prim_keccak implements the "Step Mappings" section of the SHA3 spec. It is composed of five unique permutation functions: theta, rho, pi, chi, and iota. It also has functions that convert a bitstream of Width bits into the 5x5xW state and vice versa.
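As a software reference for the five step mappings, the following Python sketch models one round on a state held as 5x5 lanes of W bits each. It follows the step-mapping definitions in the SHA3 spec; all function names are hypothetical, and it is not a translation of the RTL.

```python
def rol(value, shift, w):
    """Rotate a w-bit lane left by shift positions."""
    shift %= w
    return ((value << shift) | (value >> (w - shift))) & ((1 << w) - 1)

def rho_offsets():
    """Unreduced rho offsets, indexed [x][y], generated as in the SHA3 spec."""
    off = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        off[x][y] = (t + 1) * (t + 2) // 2
        x, y = y, (2 * x + 3 * y) % 5
    return off

def rc_bit(t):
    """One output bit of the round-constant LFSR from the SHA3 spec."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171  # feedback taps; also clears the bit shifted out
    return r & 1

def round_constant(rnd, w):
    """Round constant for round index rnd, restricted to a w-bit lane."""
    l = w.bit_length() - 1
    rc = 0
    for j in range(l + 1):
        rc |= rc_bit(j + 7 * rnd) << ((1 << j) - 1)
    return rc

RHO = rho_offsets()

def keccak_round(a, rnd, w=64):
    """Apply iota(chi(pi(rho(theta(a)))), rnd) to lanes a[x][y]."""
    mask = (1 << w) - 1
    # theta: XOR each lane with the parities of two neighbouring columns
    c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
    d = [c[(x - 1) % 5] ^ rol(c[(x + 1) % 5], 1, w) for x in range(5)]
    a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
    # rho: rotate each lane by its offset (rol reduces the offset modulo w)
    a = [[rol(a[x][y], RHO[x][y], w) for y in range(5)] for x in range(5)]
    # pi: rearrange lanes, A'[x][y] = A[(x + 3y) % 5][x]
    a = [[a[(x + 3 * y) % 5][x] for y in range(5)] for x in range(5)]
    # chi: non-linear mixing along each row
    a = [[a[x][y] ^ (~a[(x + 1) % 5][y] & a[(x + 2) % 5][y] & mask)
          for y in range(5)] for x in range(5)]
    # iota: XOR the round constant into lane (0, 0)
    a[0][0] ^= round_constant(rnd, w)
    return a
```

Calling `keccak_round` 24 times with `rnd` running from 0 to 23 on a 1600-bit state gives the full-width permutation used by SHA3, which makes the sketch easy to cross-check against a known-good SHA3 implementation.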

Three constant parameters are defined inside the keccak primitive module. The rotate position described in the pi function is hard-coded as below. The value is described in the SHA3 specification.

localparam int PiRotate [5][5] = '{
  //y  0    1    2    3    4     x
  '{   0,   3,   1,   4,   2},// 0
  '{   1,   4,   2,   0,   3},// 1
  '{   2,   0,   3,   1,   4},// 2
  '{   3,   1,   4,   2,   0},// 3
  '{   4,   2,   0,   3,   1} // 4
};
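The entries of the table above follow the pi mapping of the SHA3 spec, A'[x,y] = A[(x + 3y) mod 5, x]. A small Python sketch can regenerate the table, assuming (as the comments in the table suggest) that the first index is x and the second is y; the function name `pi_rotate` is hypothetical.

```python
def pi_rotate():
    """Regenerate the table as table[x][y] = (x + 3*y) % 5."""
    return [[(x + 3 * y) % 5 for y in range(5)] for x in range(5)]

if __name__ == "__main__":
    for x, row in enumerate(pi_rotate()):
        print(x, row)
```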

The shift amount in the rho function is defined as the RhoOffset parameter. The values are the same as in the specification, but they are used as RhoOffset % W. For instance, RhoOffset[2][2] is 171. If Width is 1600, the value used in the design is 171 % 64, which is 43.
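The reduction can be checked with a short Python sketch. The offsets are generated with the recurrence from the SHA3 spec instead of being copied from the RTL, and the function names are hypothetical.

```python
def rho_offsets():
    """Unreduced rho offsets, indexed [x][y], generated as in the SHA3 spec."""
    off = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        off[x][y] = (t + 1) * (t + 2) // 2
        x, y = y, (2 * x + 3 * y) % 5
    return off

def reduced_rho_offsets(width):
    """Offsets as used for a given state width: RhoOffset % W."""
    w = width // 25
    return [[off % w for off in row] for row in rho_offsets()]

if __name__ == "__main__":
    print(rho_offsets()[2][2])              # unreduced offset for x=2, y=2
    print(reduced_rho_offsets(1600)[2][2])  # same offset reduced modulo 64
```

For Width = 25 every reduced offset is 0, since W is 1 and each lane holds a single bit.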

The round constants are calculated by the tool hw/ip/prim/util/keccak_rc.py. The recommended default of 24 rounds is used in this design, but the script accepts an argument (the -r flag) to change the number of rounds, provided for reference. The keccak_rc.py script creates 64-bit constants, and the prim_keccak module uses only the lower bits of each constant if Width is less than 1600. For instance, if Width is 800, the lower 32 bits of the round constant are used.
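The constant generation and truncation can be sketched in Python as follows. This follows the LFSR-based definition of the round constants in the SHA3 spec and is not the keccak_rc.py script itself, whose internals are not described here; the function names are hypothetical.

```python
def rc_bit(t):
    """One output bit of the round-constant LFSR from the SHA3 spec."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171  # feedback taps; also clears the bit shifted out
    return r & 1

def round_constants(rounds=24):
    """Return a list of 64-bit round constants, one per round."""
    constants = []
    for rnd in range(rounds):
        rc = 0
        for j in range(7):  # bit positions 2**j - 1 for j = 0..6
            rc |= rc_bit(j + 7 * rnd) << ((1 << j) - 1)
        constants.append(rc)
    return constants

def truncate(rc, width):
    """Keep only the lower W = width/25 bits of a 64-bit constant."""
    w = width // 25
    return rc & ((1 << w) - 1)

if __name__ == "__main__":
    for rnd, rc in enumerate(round_constants()):
        print(f"{rnd:2d} 0x{rc:016X} 0x{truncate(rc, 800):08X}")
```

The second column printed is the full 64-bit constant and the third is the 32-bit value that a Width = 800 instance would use under this truncation.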