mirror of
https://gitee.com/bianbu-linux/linux-6.6
synced 2025-04-24 14:07:52 -04:00
Networking changes for 6.2.
Core
----
 - Allow live renaming when an interface is up
 - Add retpoline wrappers for tc, considerably improving the performance
   of complex queue discipline configurations.
 - Add inet drop monitor support.
 - A few GRO performance improvements.
 - Add infrastructure for atomic dev stats, addressing long standing
   data races.
 - De-duplicate common code between OVS and conntrack offloading
   infrastructure.
 - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements.
 - Netfilter: introduce packet parser for tunneled packets
 - Replace IPVS timer-based estimators with kthreads to scale up
   the workload with the number of available CPUs.
 - Add the helper support for connection-tracking OVS offload.

BPF
---
 - Support for user defined BPF objects: the use case is to allocate
   own objects, build own object hierarchies and use the building
   blocks to build own data structures flexibly, for example, linked
   lists in BPF.
 - Make cgroup local storage available to non-cgroup attached BPF
   programs.
 - Avoid unnecessary deadlock detection and failures wrt BPF task
   storage helpers.
 - A relevant bunch of BPF verifier fixes and improvements.
 - Veristat tool improvements to support custom filtering, sorting,
   and replay of results.
 - Add LLVM disassembler as default library for dumping JITed code.
 - Lots of new BPF documentation for various BPF maps.
 - Add bpf_rcu_read_{,un}lock() support for sleepable programs.
 - Add RCU grace period chaining to BPF to wait for the completion
   of access from both sleepable and non-sleepable BPF programs.
 - Add support for storing struct task_struct objects as kptrs in maps.
 - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
   values.
 - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions.

Protocols
---------
 - TCP: implement Protective Load Balancing across switch links.
 - TCP: allow dynamically disabling TCP-MD5 static key, reverting
   back to fast[er]-path.
 - UDP: Introduce optional per-netns hash lookup table.
 - IPv6: simplify and cleanup sockets disposal.
 - Netlink: support different type policies for each generic
   netlink operation.
 - MPTCP: add MSG_FASTOPEN and FastOpen listener side support.
 - MPTCP: add netlink notification support for listener sockets
   events.
 - SCTP: add VRF support, allowing sctp sockets binding to VRF
   devices.
 - Add bridging MAC Authentication Bypass (MAB) support.
 - Extensions for Ethernet VPN bridging implementation to better
   support multicast scenarios.
 - More work for Wi-Fi 7 support, comprising conversion of all
   the existing drivers to internal TX queue usage.
 - IPSec: introduce a new offload type (packet offload) allowing
   complete header processing and crypto offloading.
 - IPSec: extended ack support for more descriptive XFRM error
   reporting.
 - RXRPC: increase SACK table size and move processing into a
   per-local endpoint kernel thread, considerably reducing the
   required locking.
 - IEEE 802154: synchronous send frame and extended filtering
   support, initial support for scanning available 15.4 networks.
 - Tun: bump the link speed from 10Mbps to 10Gbps.
 - Tun/VirtioNet: implement UDP segmentation offload support.

Driver API
----------
 - PHY/SFP: improve power level switching between standard
   level 1 and the higher power levels.
 - New API for netdev <-> devlink_port linkage.
 - PTP: convert existing drivers to new frequency adjustment
   implementation.
 - DSA: add support for rx offloading.
 - Autoload DSA tagging driver when dynamically changing protocol.
 - Add new PCP and APPTRUST attributes to Data Center Bridging.
 - Add configuration support for 800Gbps link speed.
 - Add devlink port function attribute to enable/disable RoCE and
   migratable.
 - Extend devlink-rate to support strict priority and weighted fair
   queuing.
 - Add devlink support to directly reading from region memory.
 - New device tree helper to fetch MAC address from nvmem.
 - New big TCP helper to simplify temporary header stripping.

New hardware / drivers
----------------------
 - Ethernet:
   - Marvell Octeon CNF95N and CN10KB Ethernet Switches.
   - Marvell Prestera AC5X Ethernet Switch.
   - WangXun 10 Gigabit NIC.
   - Motorcomm yt8521 Gigabit Ethernet.
   - Microchip ksz9563 Gigabit Ethernet Switch.
   - Microsoft Azure Network Adapter.
   - Linux Automation 10Base-T1L adapter.
 - PHY:
   - Aquantia AQR112 and AQR412.
   - Motorcomm YT8531S.
 - PTP:
   - Orolia ART-CARD.
 - WiFi:
   - MediaTek Wi-Fi 7 (802.11be) devices.
   - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
     devices.
 - Bluetooth:
   - Broadcom BCM4377/4378/4387 Bluetooth chipsets.
   - Realtek RTL8852BE and RTL8723DS.
   - Cypress CYW4373A0 WiFi + Bluetooth combo device.

Drivers
-------
 - CAN:
   - gs_usb: bus error reporting support.
   - kvaser_usb: listen only and bus error reporting support.
 - Ethernet NICs:
   - Intel (100G):
     - extend action skbedit to RX queue mapping.
     - implement devlink-rate support.
     - support direct read from memory.
   - nVidia/Mellanox (mlx5):
     - SW steering improvements, increasing rules update rate.
     - Support for enhanced events compression.
     - extend H/W offload packet manipulation capabilities.
     - implement IPSec packet offload mode.
   - nVidia/Mellanox (mlx4):
     - better big TCP support.
   - Netronome Ethernet NICs (nfp):
     - IPsec offload support.
     - add support for multicast filter.
   - Broadcom:
     - RSS and PTP support improvements.
   - AMD/SolarFlare:
     - netlink extended ack improvements.
     - add basic flower matches to offload, and related stats.
 - Virtual NICs:
   - ibmvnic: introduce affinity hint support.
 - small / embedded:
   - FreeScale fec: add initial XDP support.
   - Marvell mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood.
   - TI am65-cpsw: add suspend/resume support.
   - Mediatek MT7986: add RX wireless ethernet dispatch support.
   - Realtek 8169: enable GRO software interrupt coalescing per
     default.
 - Ethernet high-speed switches:
   - Microchip (sparx5):
     - add support for Sparx5 TC/flower H/W offload via VCAP.
   - Mellanox mlxsw:
     - add 802.1X and MAC Authentication Bypass offload support.
     - add ip6gre support.
 - Embedded Ethernet switches:
   - Mediatek (mtk_eth_soc):
     - improve PCS implementation, add DSA untag support.
     - enable flow offload support.
   - Renesas:
     - add rswitch R-Car Gen4 gPTP support.
   - Microchip (lan966x):
     - add full XDP support.
     - add TC H/W offload via VCAP.
     - enable PTP on bridge interfaces.
   - Microchip (ksz8):
     - add MTU support for KSZ8 series.
 - Qualcomm 802.11ax WiFi (ath11k):
   - support configuring channel dwell time during scan.
 - MediaTek WiFi (mt76):
   - enable Wireless Ethernet Dispatch (WED) offload support.
   - add ack signal support.
   - enable coredump support.
   - remain_on_channel support.
 - Intel WiFi (iwlwifi):
   - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities.
   - 320 MHz channels support.
 - RealTek WiFi (rtw89):
   - new dynamic header firmware format support.
   - wake-over-WLAN support.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni.

* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
  ipvs: fix type warning in do_div() on 32 bit
  net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
  net: ipa: add IPA v4.7 support
  dt-bindings: net: qcom,ipa: Add SM6350 compatible
  bnxt: Use generic HBH removal helper in tx path
  IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
  selftests: forwarding: Add bridge MDB test
  selftests: forwarding: Rename bridge_mdb test
  bridge: mcast: Support replacement of MDB port group entries
  bridge: mcast: Allow user space to specify MDB entry routing protocol
  bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
  bridge: mcast: Add support for (*, G) with a source list and filter mode
  bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
  bridge: mcast: Add a flag for user installed source entries
  bridge: mcast: Expose __br_multicast_del_group_src()
  bridge: mcast: Expose br_multicast_new_group_src()
  bridge: mcast: Add a centralized error path
  bridge: mcast: Place netlink policy before validation functions
  bridge: mcast: Split (*, G) and (S, G) addition into different functions
  bridge: mcast: Do not derive entry type from its filter mode
  ...
commit
7e68dd7d07
2013 changed files with 166136 additions and 34555 deletions
@ -298,3 +298,48 @@ A: NO.

The BTF_ID macro does not cause a function to become part of the ABI
any more than does the EXPORT_SYMBOL_GPL macro.

Q: What is the compatibility story for special BPF types in map values?
-----------------------------------------------------------------------
Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
values (when using BTF support for BPF maps). This allows helpers for such
objects to be used on these fields inside map values. Users are also allowed to
embed pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will
the kernel preserve backwards compatibility for these features?

A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
NO, but see below.

For struct types that have been added already, like bpf_spin_lock and bpf_timer,
the kernel will preserve backwards compatibility, as they are part of UAPI.

Kptrs are also part of UAPI, but only with respect to the kptr mechanism. The
types that you can use with a __kptr and __kptr_ref tagged pointer in your
struct are NOT part of the UAPI contract. The supported types can and will
change across kernel releases. However, operations like accessing kptr fields
and the bpf_kptr_xchg() helper will continue to be supported across kernel
releases for the supported types.

For any other supported struct type, unless explicitly stated in this document
and added to bpf.h UAPI header, such types can and will arbitrarily change their
size, type, and alignment, or any other user visible API or ABI detail across
kernel releases. The users must adapt their BPF programs to the new changes and
update them to make sure their programs continue to work correctly.

NOTE: BPF subsystem specially reserves the 'bpf\_' prefix for type names, in
order to introduce more special fields in the future. Hence, user programs must
avoid defining types with 'bpf\_' prefix to not be broken in future releases.
In other words, no backwards compatibility is guaranteed if one uses a type
in BTF with 'bpf\_' prefix.

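The naming rule in the NOTE above can be checked mechanically. The following is
a minimal user-space sketch, not part of the kernel or libbpf:
``has_reserved_bpf_prefix()`` and ``count_reserved_names()`` are hypothetical
helpers that flag type names starting with the reserved prefix, for instance as
a lint step over the type names a BPF program defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a user-defined type name starts with
 * the "bpf_" prefix that the BPF subsystem reserves for its own types.
 */
bool has_reserved_bpf_prefix(const char *type_name)
{
	static const char prefix[] = "bpf_";

	/* Compare only the prefix length, excluding the terminating NUL. */
	return strncmp(type_name, prefix, sizeof(prefix) - 1) == 0;
}

/* Hypothetical helper: count the names in a list that use the prefix. */
size_t count_reserved_names(const char *const *names, size_t n)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (has_reserved_bpf_prefix(names[i]))
			hits++;
	return hits;
}
```

A name such as ``my_bpf_node`` passes the check, because only a leading
``bpf_`` is reserved.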

Q: What is the compatibility story for special BPF types in allocated objects?
------------------------------------------------------------------------------
Q: Same as above, but for allocated objects (i.e. objects allocated using
bpf_obj_new for user defined types). Will the kernel preserve backwards
compatibility for these features?

A: NO.

Unlike map value types, there are no stability guarantees for this case. The
whole API to work with allocated objects and any support for special fields
inside them is unstable (since it is exposed through kfuncs).

@ -44,6 +44,33 @@ is a guarantee that the reported issue will be overlooked.**
Submitting patches
==================

Q: How do I run BPF CI on my changes before sending them out for review?
------------------------------------------------------------------------
A: BPF CI is GitHub based and hosted at https://github.com/kernel-patches/bpf.
While GitHub also provides a CLI that can be used to accomplish the same
results, here we focus on the UI based workflow.

The following steps lay out how to start a CI run for your patches:

- Create a fork of the aforementioned repository in your own account (one time
  action)

- Clone the fork locally, check out a new branch tracking either the bpf-next
  or bpf branch, and apply your to-be-tested patches on top of it

- Push the local branch to your fork and create a pull request against
  kernel-patches/bpf's bpf-next_base or bpf_base branch, respectively

Shortly after the pull request has been created, the CI workflow will run. Note
that capacity is shared with patches submitted upstream being checked and so
depending on utilization the run can take a while to finish.

Note furthermore that both base branches (bpf-next_base and bpf_base) will be
updated as patches are pushed to the respective upstream branches they track. As
such, an automatic rebase of your patch set will be attempted as well. This
behavior can result in a CI run being aborted and restarted with the new
baseline.

Q: To which mailing list do I need to submit my BPF patches?
------------------------------------------------------------
A: Please submit your BPF patches to the bpf kernel mailing list:

485
Documentation/bpf/bpf_iterators.rst
Normal file
@ -0,0 +1,485 @@
=============
BPF Iterators
=============


----------
Motivation
----------

There are a few existing ways to dump kernel data into user space. The most
popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
all tcp6 sockets in the system, and ``cat /proc/net/netlink`` dumps all netlink
sockets in the system. However, their output format tends to be fixed, and if
users want more information about these sockets, they have to patch the kernel,
which often takes time to publish upstream and release. The same is true for popular
tools like `ss <https://man7.org/linux/man-pages/man8/ss.8.html>`_ where any
additional information needs a kernel patch.

To solve this problem, the `drgn
<https://www.kernel.org/doc/html/latest/bpf/drgn.html>`_ tool is often used to
dig out the kernel data with no kernel change. However, the main drawback for
drgn is performance, as it cannot do pointer tracing inside the kernel. In
addition, drgn cannot validate a pointer value and may read invalid data if the
pointer becomes invalid inside the kernel.

The BPF iterator solves the above problem by providing flexibility on what data
(e.g., tasks, bpf_maps, etc.) to collect by calling BPF programs for each kernel
data object.

----------------------
How BPF Iterators Work
----------------------

A BPF iterator is a type of BPF program that allows users to iterate over
specific types of kernel objects. Unlike traditional BPF tracing programs that
allow users to define callbacks that are invoked at particular points of
execution in the kernel, BPF iterators allow users to define callbacks that
should be executed for every entry in a variety of kernel data structures.

For example, users can define a BPF iterator that iterates over every task on
the system and dumps the total amount of CPU runtime currently used by each of
them. Another BPF task iterator may instead dump the cgroup information for each
task. Such flexibility is the core value of BPF iterators.

A BPF program is always loaded into the kernel at the behest of a user space
process. A user space process loads a BPF program by opening and initializing
the program skeleton as required and then invoking a syscall to have the BPF
program verified and loaded by the kernel.

In traditional tracing programs, a program is activated by having user space
obtain a ``bpf_link`` to the program with ``bpf_program__attach()``. Once
activated, the program callback will be invoked whenever the tracepoint is
triggered in the main kernel. For BPF iterator programs, a ``bpf_link`` to the
program is obtained using ``bpf_link_create()``, and the program callback is
invoked by issuing system calls from user space.

Next, let us see how you can use the iterators to iterate on kernel objects and
read data.

------------------------
How to Use BPF Iterators
------------------------

BPF selftests are a great resource to illustrate how to use the iterators. In
this section, we’ll walk through a BPF selftest which shows how to load and use
a BPF iterator program. To begin, we’ll look at `bpf_iter.c
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/prog_tests/bpf_iter.c>`_,
which illustrates how to load and trigger BPF iterators on the user space side.
Later, we’ll look at a BPF program that runs in kernel space.

Loading a BPF iterator in the kernel from user space typically involves the
following steps:

* The BPF program is loaded into the kernel through ``libbpf``. Once the kernel
  has verified and loaded the program, it returns a file descriptor (fd) to user
  space.
* Obtain a ``link_fd`` to the BPF program by calling ``bpf_link_create()``
  with the BPF program file descriptor received from the kernel.
* Next, obtain a BPF iterator file descriptor (``bpf_iter_fd``) by calling
  ``bpf_iter_create()`` with the ``link_fd`` obtained in the previous step.
* Trigger the iteration by calling ``read(bpf_iter_fd)`` until no data is
  available.
* Close the iterator fd using ``close(bpf_iter_fd)``.
* To reread the data, get a new ``bpf_iter_fd`` and do the read again.

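The read step in the list above can be sketched in plain C. This is a minimal
illustration under the assumption that ``bpf_iter_fd`` was already obtained from
``bpf_iter_create()``; ``read_until_eof()`` is a hypothetical helper name, not a
libbpf function, and it works on any readable file descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until read() returns 0 (no more data) or
 * until cap bytes have been stored in buf, whichever comes first.
 * Returns the number of bytes stored, or -1 on a read error.
 * For a BPF iterator, fd would be the bpf_iter_fd from bpf_iter_create().
 */
ssize_t read_until_eof(int fd, char *buf, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);

		if (n < 0) {
			if (errno == EINTR)	/* interrupted, try again */
				continue;
			return -1;
		}
		if (n == 0)			/* iteration finished */
			break;
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

If the buffer fills up before ``read()`` returns 0, the caller has to call the
helper again with more space to consume the remaining output. Closing the fd
afterwards with ``close()`` matches the last steps of the list.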

The following are a few examples of selftest BPF iterator programs:

* `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_
* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_
* `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_

Let us look at ``bpf_iter_task_file.c``, which runs in kernel space:

Here is the definition of ``bpf_iter__task_file`` in `vmlinux.h
<https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html#btf>`_.
Any struct name in ``vmlinux.h`` in the format ``bpf_iter__<iter_name>``
represents a BPF iterator. The suffix ``<iter_name>`` represents the type of
iterator.

::

  struct bpf_iter__task_file {
          union {
                  struct bpf_iter_meta *meta;
          };
          union {
                  struct task_struct *task;
          };
          u32 fd;
          union {
                  struct file *file;
          };
  };

In the above code, the field 'meta' contains the metadata, which is the same for
all BPF iterator programs. The rest of the fields are specific to different
iterators. For example, for task_file iterators, the kernel layer provides the
'task', 'fd' and 'file' field values. The 'task' and 'file' are `reference
counted
<https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#file-descriptors-and-reference-counters>`_,
so they won't go away when the BPF program runs.

Here is a snippet from the ``bpf_iter_task_file.c`` file:

::

  /* count, tgid, last_tgid and unique_tgid_count are declared outside this snippet. */
  SEC("iter/task_file")
  int dump_task_file(struct bpf_iter__task_file *ctx)
  {
          struct seq_file *seq = ctx->meta->seq;
          struct task_struct *task = ctx->task;
          struct file *file = ctx->file;
          __u32 fd = ctx->fd;

          if (task == NULL || file == NULL)
                  return 0;

          if (ctx->meta->seq_num == 0) {
                  count = 0;
                  BPF_SEQ_PRINTF(seq, " tgid gid fd file\n");
          }

          if (tgid == task->tgid && task->tgid != task->pid)
                  count++;

          if (last_tgid != task->tgid) {
                  last_tgid = task->tgid;
                  unique_tgid_count++;
          }

          BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
                         (long)file->f_op);
          return 0;
  }

In the above example, the section name ``SEC("iter/task_file")`` indicates that
the program is a BPF iterator program to iterate all files from all tasks. The
context of the program is the ``bpf_iter__task_file`` struct.

The user space program invokes the BPF iterator program running in the kernel
by issuing a ``read()`` syscall. Once invoked, the BPF
program can export data to user space using a variety of BPF helper functions.
You can use either ``bpf_seq_printf()`` (and the BPF_SEQ_PRINTF helper macro) or
the ``bpf_seq_write()`` function based on whether you need formatted output or
just binary data, respectively. For binary-encoded data, the user space
applications can process the data from ``bpf_seq_write()`` as needed. For the
formatted data, you can use ``cat <path>`` to print the results similar to ``cat
/proc/net/netlink`` after pinning the BPF iterator to the bpffs mount. Later,
use ``rm -f <path>`` to remove the pinned iterator.

For example, you can use the following command to create a BPF iterator from the
``bpf_iter_ipv6_route.o`` object file and pin it to the ``/sys/fs/bpf/my_route``
path:

::

  $ bpftool iter pin ./bpf_iter_ipv6_route.o /sys/fs/bpf/my_route

And then print out the results using the following command:

::

  $ cat /sys/fs/bpf/my_route


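For the binary case mentioned above, user space has to decode whatever layout
the BPF program emitted. The following is a minimal sketch under stated
assumptions: ``struct file_rec`` is a hypothetical record layout that a BPF
iterator program could emit with one ``bpf_seq_write()`` call per record, and
``decode_file_recs()`` is a hypothetical user-space helper. Neither name comes
from the kernel or libbpf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical record layout. The BPF program and the user space reader
 * must agree on it, since the kernel only transports the raw bytes.
 */
struct file_rec {
	uint32_t tgid;
	uint32_t fd;
};

/*
 * Decode up to max whole records from the first len bytes of buf into out.
 * memcpy() is used so that buf does not need to be suitably aligned.
 * A trailing partial record is ignored. Returns the number of records
 * decoded.
 */
size_t decode_file_recs(const void *buf, size_t len,
			struct file_rec *out, size_t max)
{
	size_t n = len / sizeof(struct file_rec);
	size_t i;

	if (n > max)
		n = max;
	for (i = 0; i < n; i++)
		memcpy(&out[i], (const char *)buf + i * sizeof(*out),
		       sizeof(*out));
	return n;
}
```

The input buffer would typically be filled by reading the iterator fd or the
pinned path until no data is left.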

-------------------------------------------------------
Implement Kernel Support for BPF Iterator Program Types
-------------------------------------------------------

To implement a BPF iterator in the kernel, the developer must make a one-time
change to the following key data structure defined in the `bpf.h
<https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/bpf.h>`_
file.

::

  struct bpf_iter_reg {
          const char *target;
          bpf_iter_attach_target_t attach_target;
          bpf_iter_detach_target_t detach_target;
          bpf_iter_show_fdinfo_t show_fdinfo;
          bpf_iter_fill_link_info_t fill_link_info;
          bpf_iter_get_func_proto_t get_func_proto;
          u32 ctx_arg_info_size;
          u32 feature;
          struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
          const struct bpf_iter_seq_info *seq_info;
  };

After filling the data structure fields, call ``bpf_iter_reg_target()`` to
register the iterator to the main BPF iterator subsystem.

The following is the breakdown for each field in struct ``bpf_iter_reg``.

||||||
|
.. list-table::
|
||||||
|
:widths: 25 50
|
||||||
|
:header-rows: 1
|
||||||
|
|
||||||
|
* - Fields
|
||||||
|
- Description
|
||||||
|
* - target
|
||||||
|
- Specifies the name of the BPF iterator. For example: ``bpf_map``,
|
||||||
|
``bpf_map_elem``. The name should be different from other ``bpf_iter`` target names in the kernel.
|
||||||
|
* - attach_target and detach_target
|
||||||
|
- Allows for target specific ``link_create`` action since some targets
|
||||||
|
may need special processing. Called during the user space link_create stage.
|
||||||
|
* - show_fdinfo and fill_link_info
|
||||||
|
- Called to fill target specific information when a user tries to get link
info associated with the iterator.
|
||||||
|
* - get_func_proto
|
||||||
|
- Permits a BPF iterator to access BPF helpers specific to the iterator.
|
||||||
|
* - ctx_arg_info_size and ctx_arg_info
|
||||||
|
- Specifies the verifier states for BPF program arguments associated with
|
||||||
|
the bpf iterator.
|
||||||
|
* - feature
|
||||||
|
- Specifies certain action requests in the kernel BPF iterator
|
||||||
|
infrastructure. Currently, only ``BPF_ITER_RESCHED`` is supported. It makes the
kernel call ``cond_resched()`` during iteration so that other kernel
subsystems (e.g., RCU) do not misbehave.
|
||||||
|
* - seq_info
|
||||||
|
- Specifies the set of seq operations for the BPF iterator and helpers to
initialize/free the private data for the corresponding ``seq_file``.
|
||||||
|
|
||||||
|
|
||||||
|
`Click here
|
||||||
|
<https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com/>`_
|
||||||
|
to see an implementation of the ``task_vma`` BPF iterator in the kernel.
|
||||||
|
|
||||||
|
---------------------------------
|
||||||
|
Parameterizing BPF Task Iterators
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
By default, BPF iterators walk through all the objects of the specified types
|
||||||
|
(processes, cgroups, maps, etc.) across the entire system to read relevant
|
||||||
|
kernel data. But often, there are cases where we only care about a much smaller
|
||||||
|
subset of iterable kernel objects, such as only iterating tasks within a
|
||||||
|
specific process. Therefore, BPF iterator programs support filtering out objects
|
||||||
|
from iteration by allowing user space to configure the iterator program when it
|
||||||
|
is attached.
|
||||||
|
|
||||||
|
--------------------------
|
||||||
|
BPF Task Iterator Program
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
The following code is a BPF iterator program to print files and task information
|
||||||
|
through the ``seq_file`` of the iterator. It is a standard BPF iterator program
|
||||||
|
that visits every file of an iterator. We will use this BPF program in our
|
||||||
|
example later.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#include <vmlinux.h>
|
||||||
|
#include <bpf/bpf_helpers.h>
|
||||||
|
|
||||||
|
char _license[] SEC("license") = "GPL";
|
||||||
|
|
||||||
|
SEC("iter/task_file")
|
||||||
|
int dump_task_file(struct bpf_iter__task_file *ctx)
|
||||||
|
{
|
||||||
|
struct seq_file *seq = ctx->meta->seq;
|
||||||
|
struct task_struct *task = ctx->task;
|
||||||
|
struct file *file = ctx->file;
|
||||||
|
__u32 fd = ctx->fd;
|
||||||
|
if (task == NULL || file == NULL)
|
||||||
|
return 0;
|
||||||
|
if (ctx->meta->seq_num == 0) {
|
||||||
|
BPF_SEQ_PRINTF(seq, " tgid pid fd file\n");
|
||||||
|
}
|
||||||
|
BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
|
||||||
|
(long)file->f_op);
|
||||||
|
return 0;
|
||||||
|
}
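The ``seq_num == 0`` check and the ``NULL`` checks in the program above make more sense once you picture how the iterator drives the program: one call per object, then one final call with no object. The following user space model is only a sketch of that calling pattern. The ``toy_*`` names and types are invented for illustration and are not kernel or libbpf definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-ins for the iterator context; not the kernel's types. */
struct toy_meta {
	unsigned long seq_num;	/* objects already shown in this session */
};

struct toy_ctx {
	struct toy_meta *meta;
	const int *fd;		/* NULL on the final call, after the last object */
};

/* Plays the role of the BPF program: header on first call, then one row. */
static int toy_prog(struct toy_ctx *ctx, FILE *out)
{
	if (ctx->fd == NULL)
		return 0;	/* end of iteration, nothing to print */
	if (ctx->meta->seq_num == 0)
		fprintf(out, "      fd\n");
	fprintf(out, "%8d\n", *ctx->fd);
	return 0;
}

/*
 * Plays the role of the iterator: calls the program once per object,
 * then once more with a NULL object. Returns the number of calls made.
 */
static int toy_iterate(const int *fds, int n, FILE *out)
{
	struct toy_meta meta = { .seq_num = 0 };
	struct toy_ctx end = { .meta = &meta, .fd = NULL };
	int calls = 0;

	assert(out);

	for (int i = 0; i < n; i++) {
		struct toy_ctx ctx = { .meta = &meta, .fd = &fds[i] };

		toy_prog(&ctx, out);
		meta.seq_num++;
		calls++;
	}

	toy_prog(&end, out);
	return calls + 1;
}
```

In this model the header is printed exactly once, because only the first call sees ``seq_num == 0``, and the final call is silently ignored by the ``NULL`` check.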
|
||||||
|
|
||||||
|
----------------------------------------
|
||||||
|
Creating a File Iterator with Parameters
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
Now, let us look at how to create an iterator that includes only files of a
|
||||||
|
process.
|
||||||
|
|
||||||
|
First, fill the ``bpf_iter_attach_opts`` struct as shown below:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
|
||||||
|
union bpf_iter_link_info linfo;
|
||||||
|
memset(&linfo, 0, sizeof(linfo));
|
||||||
|
linfo.task.pid = getpid();
|
||||||
|
opts.link_info = &linfo;
|
||||||
|
opts.link_info_len = sizeof(linfo);
|
||||||
|
|
||||||
|
``linfo.task.pid``, if it is non-zero, directs the kernel to create an iterator
|
||||||
|
that only includes opened files for the process with the specified ``pid``. In
|
||||||
|
this example, we will only be iterating files for our process. If
|
||||||
|
``linfo.task.pid`` is zero, the iterator will visit every opened file of every
|
||||||
|
process. Similarly, ``linfo.task.tid`` directs the kernel to create an iterator
|
||||||
|
that visits opened files of a specific thread, not a process. In this example,
|
||||||
|
``linfo.task.tid`` is different from ``linfo.task.pid`` only if the thread has a
|
||||||
|
separate file descriptor table. In most circumstances, all process threads share
|
||||||
|
a single file descriptor table.
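The selection rule described above can be written down as a small predicate. This is a user space sketch with hypothetical ``toy_*`` types, not kernel code. It also treats a filter with both fields set as invalid, which is an assumption of this sketch rather than something stated in this document.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two filter fields discussed above. */
struct toy_task_filter {
	unsigned int pid;	/* process (thread group) ID, 0 = unset */
	unsigned int tid;	/* thread ID, 0 = unset */
};

struct toy_task {
	unsigned int tgid;	/* what user space calls the PID */
	unsigned int pid;	/* what user space calls the TID */
};

/* Assumption of this sketch: at most one of pid/tid may be set. */
static bool toy_filter_valid(const struct toy_task_filter *f)
{
	return !(f->pid && f->tid);
}

/* Decide whether the iterator would visit task `t` under filter `f`. */
static bool toy_task_matches(const struct toy_task_filter *f,
			     const struct toy_task *t)
{
	assert(toy_filter_valid(f));

	if (f->tid)
		return t->pid == f->tid;	/* one specific thread */
	if (f->pid)
		return t->tgid == f->pid;	/* every thread of a process */
	return true;				/* no filter: visit everything */
}
```

Note the naming trap that the model makes visible: the kernel's ``task->tgid`` is what user space calls the process ID, while ``task->pid`` identifies a single thread.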
|
||||||
|
|
||||||
|
Now, in the user space program, pass a pointer to the struct to
``bpf_program__attach_iter()``.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
link = bpf_program__attach_iter(prog, &opts);
iter_fd = bpf_iter_create(bpf_link__fd(link));
|
||||||
|
|
||||||
|
If both *tid* and *pid* are zero, an iterator created from this struct
|
||||||
|
``bpf_iter_attach_opts`` will include every opened file of every task in the
|
||||||
|
system (in the current *pid* namespace, actually). It is the same as passing a NULL as the
|
||||||
|
second argument to ``bpf_program__attach_iter()``.
|
||||||
|
|
||||||
|
The whole program looks like the following code:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <string.h>
#include <unistd.h>
|
||||||
|
#include <bpf/bpf.h>
|
||||||
|
#include <bpf/libbpf.h>
|
||||||
|
#include "bpf_iter_task_ex.skel.h"
|
||||||
|
|
||||||
|
static int do_read_opts(struct bpf_program *prog, struct bpf_iter_attach_opts *opts)
|
||||||
|
{
|
||||||
|
struct bpf_link *link;
|
||||||
|
char buf[16] = {};
|
||||||
|
int iter_fd = -1, len;
|
||||||
|
int ret = 0;
|
||||||
|
|
||||||
|
link = bpf_program__attach_iter(prog, opts);
|
||||||
|
if (!link) {
|
||||||
|
fprintf(stderr, "bpf_program__attach_iter() fails\n");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
iter_fd = bpf_iter_create(bpf_link__fd(link));
|
||||||
|
if (iter_fd < 0) {
|
||||||
|
fprintf(stderr, "bpf_iter_create() fails\n");
|
||||||
|
ret = -1;
|
||||||
|
goto free_link;
|
||||||
|
}
|
||||||
|
/* Print the iterator output; the loop ends once read() returns 0 or an error */
|
||||||
|
while ((len = read(iter_fd, buf, sizeof(buf) - 1)) > 0) {
|
||||||
|
buf[len] = 0;
|
||||||
|
printf("%s", buf);
|
||||||
|
}
|
||||||
|
printf("\n");
|
||||||
|
free_link:
|
||||||
|
if (iter_fd >= 0)
|
||||||
|
close(iter_fd);
|
||||||
|
bpf_link__destroy(link);
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static void test_task_file(void)
|
||||||
|
{
|
||||||
|
LIBBPF_OPTS(bpf_iter_attach_opts, opts);
|
||||||
|
struct bpf_iter_task_ex *skel;
|
||||||
|
union bpf_iter_link_info linfo;
|
||||||
|
skel = bpf_iter_task_ex__open_and_load();
|
||||||
|
if (skel == NULL)
|
||||||
|
return;
|
||||||
|
memset(&linfo, 0, sizeof(linfo));
|
||||||
|
linfo.task.pid = getpid();
|
||||||
|
opts.link_info = &linfo;
|
||||||
|
opts.link_info_len = sizeof(linfo);
|
||||||
|
printf("PID %d\n", getpid());
|
||||||
|
do_read_opts(skel->progs.dump_task_file, &opts);
|
||||||
|
bpf_iter_task_ex__destroy(skel);
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, const char * const * argv)
|
||||||
|
{
|
||||||
|
test_task_file();
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
The following lines are the output of the program.
|
||||||
|
::
|
||||||
|
|
||||||
|
PID 1859
|
||||||
|
|
||||||
|
tgid pid fd file
|
||||||
|
1859 1859 0 ffffffff82270aa0
|
||||||
|
1859 1859 1 ffffffff82270aa0
|
||||||
|
1859 1859 2 ffffffff82270aa0
|
||||||
|
1859 1859 3 ffffffff82272980
|
||||||
|
1859 1859 4 ffffffff8225e120
|
||||||
|
1859 1859 5 ffffffff82255120
|
||||||
|
1859 1859 6 ffffffff82254f00
|
||||||
|
1859 1859 7 ffffffff82254d80
|
||||||
|
1859 1859 8 ffffffff8225abe0
|
||||||
|
|
||||||
|
------------------
|
||||||
|
Without Parameters
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Let us look at how a BPF iterator without parameters skips files of other
|
||||||
|
processes in the system. In this case, the BPF program has to check the pid or
|
||||||
|
the tid of tasks, or it will receive every opened file in the system (in the
|
||||||
|
current *pid* namespace, actually). So, we usually add a global variable in the
|
||||||
|
BPF program to pass a *pid* to the BPF program.
|
||||||
|
|
||||||
|
The BPF program would look like the following block.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
......
|
||||||
|
int target_pid = 0;
|
||||||
|
|
||||||
|
SEC("iter/task_file")
|
||||||
|
int dump_task_file(struct bpf_iter__task_file *ctx)
|
||||||
|
{
|
||||||
|
......
|
||||||
|
if (task->tgid != target_pid) /* Check task->pid instead to check thread IDs */
|
||||||
|
return 0;
|
||||||
|
BPF_SEQ_PRINTF(seq, "%8d %8d %8d %lx\n", task->tgid, task->pid, fd,
|
||||||
|
(long)file->f_op);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
The user space program would look like the following block:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
......
|
||||||
|
static void test_task_file(void)
|
||||||
|
{
|
||||||
|
......
|
||||||
|
skel = bpf_iter_task_ex__open_and_load();
|
||||||
|
if (skel == NULL)
|
||||||
|
return;
|
||||||
|
skel->bss->target_pid = getpid(); /* process ID. For thread id, use gettid() */
|
||||||
|
memset(&linfo, 0, sizeof(linfo));
|
||||||
|
linfo.task.pid = getpid();
|
||||||
|
opts.link_info = &linfo;
|
||||||
|
opts.link_info_len = sizeof(linfo);
|
||||||
|
......
|
||||||
|
}
|
||||||
|
|
||||||
|
``target_pid`` is a global variable in the BPF program. The user space program
|
||||||
|
should initialize the variable with a process ID to skip opened files of other
|
||||||
|
processes in the BPF program. When you parametrize a BPF iterator, the iterator
|
||||||
|
calls the BPF program fewer times, which can save significant resources.
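To make the cost difference concrete, the two approaches can be compared by counting program invocations. The sketch below is a user space model with hypothetical ``toy_*`` names. It ignores the final end-of-iteration call and only counts calls made for actual files.

```c
#include <assert.h>
#include <stddef.h>

/* Toy record for one open file; not a kernel type. */
struct toy_file {
	unsigned int tgid;	/* owning process */
	int fd;
};

/*
 * Without parameters: the program runs for every file and filters
 * inside the program. Returns the number of program invocations.
 */
static size_t toy_calls_without_params(const struct toy_file *files, size_t n,
				       unsigned int target_pid, size_t *printed)
{
	size_t calls = 0;

	assert(printed);
	*printed = 0;
	for (size_t i = 0; i < n; i++) {
		calls++;			/* program invoked */
		if (files[i].tgid != target_pid)
			continue;		/* the early `return 0` */
		(*printed)++;
	}
	return calls;
}

/*
 * With parameters: the iterator skips non-matching files before the
 * program runs. Returns the number of program invocations.
 */
static size_t toy_calls_with_params(const struct toy_file *files, size_t n,
				    unsigned int pid, size_t *printed)
{
	size_t calls = 0;

	assert(printed);
	*printed = 0;
	for (size_t i = 0; i < n; i++) {
		if (files[i].tgid != pid)
			continue;		/* never reaches the program */
		calls++;
		(*printed)++;
	}
	return calls;
}
```

Both variants print the same rows. Only the number of invocations differs, and that number grows with the total count of open files on the system in the unparameterized case.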
|
||||||
|
|
||||||
|
---------------------------
|
||||||
|
Parametrizing VMA Iterators
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
By default, a BPF VMA iterator includes every VMA in every process. However,
|
||||||
|
you can still specify a process or a thread to include only its VMAs. Unlike
|
||||||
|
files, a thread cannot have a separate address space (since Linux 2.6.0-test6).
|
||||||
|
Here, using *tid* makes no difference from using *pid*.
|
||||||
|
|
||||||
|
----------------------------
|
||||||
|
Parametrizing Task Iterators
|
||||||
|
----------------------------
|
||||||
|
|
||||||
|
A BPF task iterator with *pid* includes all tasks (threads) of a process. The
|
||||||
|
BPF program receives these tasks one after another. You can specify a BPF task
|
||||||
|
iterator with *tid* parameter to include only the tasks that match the given
|
||||||
|
*tid*.
|
|
@ -1062,4 +1062,9 @@ format.::
|
||||||
7. Testing
|
7. Testing
|
||||||
==========
|
==========
|
||||||
|
|
||||||
Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
|
The kernel BPF selftest `tools/testing/selftests/bpf/prog_tests/btf.c`_
|
||||||
|
provides an extensive set of BTF-related tests.
|
||||||
|
|
||||||
|
.. Links
|
||||||
|
.. _tools/testing/selftests/bpf/prog_tests/btf.c:
|
||||||
|
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/btf.c
|
||||||
|
|
|
@ -24,11 +24,13 @@ that goes into great technical depth about the BPF Architecture.
|
||||||
maps
|
maps
|
||||||
bpf_prog_run
|
bpf_prog_run
|
||||||
classic_vs_extended.rst
|
classic_vs_extended.rst
|
||||||
|
bpf_iterators
|
||||||
bpf_licensing
|
bpf_licensing
|
||||||
test_debug
|
test_debug
|
||||||
clang-notes
|
clang-notes
|
||||||
linux-notes
|
linux-notes
|
||||||
other
|
other
|
||||||
|
redirect
|
||||||
|
|
||||||
.. only:: subproject and html
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
|
|
@ -122,11 +122,11 @@ BPF_END 0xd0 byte swap operations (see `Byte swap instructions`_ below)
|
||||||
|
|
||||||
``BPF_XOR | BPF_K | BPF_ALU`` means::
|
``BPF_XOR | BPF_K | BPF_ALU`` means::
|
||||||
|
|
||||||
src_reg = (u32) src_reg ^ (u32) imm32
|
dst_reg = (u32) dst_reg ^ (u32) imm32
|
||||||
|
|
||||||
``BPF_XOR | BPF_K | BPF_ALU64`` means::
|
``BPF_XOR | BPF_K | BPF_ALU64`` means::
|
||||||
|
|
||||||
src_reg = src_reg ^ imm32
|
dst_reg = dst_reg ^ imm32
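The difference between the two instruction classes is easiest to see with concrete values. The following C sketch models both forms. The function names are invented for illustration, and the sketch assumes the usual eBPF conventions that a 32-bit ALU result is zero-extended into the 64-bit register and that ``imm32`` is sign-extended for 64-bit operations.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_XOR | BPF_K | BPF_ALU: 32-bit operation, upper 32 bits become zero. */
static uint64_t xor_k_alu32(uint64_t dst_reg, int32_t imm32)
{
	return (uint64_t)((uint32_t)dst_reg ^ (uint32_t)imm32);
}

/* BPF_XOR | BPF_K | BPF_ALU64: imm32 is sign-extended to 64 bits first. */
static uint64_t xor_k_alu64(uint64_t dst_reg, int32_t imm32)
{
	return dst_reg ^ (uint64_t)(int64_t)imm32;
}
```

For example, starting from ``dst_reg = 0x1234567800000000`` and ``imm32 = 1``, the 32-bit form yields ``0x1`` while the 64-bit form yields ``0x1234567800000001``.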
|
||||||
|
|
||||||
|
|
||||||
Byte swap instructions
|
Byte swap instructions
|
||||||
|
|
|
@ -72,6 +72,30 @@ argument as its size. By default, without __sz annotation, the size of the type
|
||||||
of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
|
of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
|
||||||
pointer.
|
pointer.
|
||||||
|
|
||||||
|
2.2.2 __k Annotation
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
This annotation is only understood for scalar arguments, where it indicates that
|
||||||
|
the verifier must check the scalar argument to be a known constant, which does
|
||||||
|
not indicate a size parameter, and the value of the constant is relevant to the
|
||||||
|
safety of the program.
|
||||||
|
|
||||||
|
An example is given below::
|
||||||
|
|
||||||
|
void *bpf_obj_new(u32 local_type_id__k, ...)
|
||||||
|
{
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
Here, bpf_obj_new uses local_type_id argument to find out the size of that type
|
||||||
|
ID in program's BTF and return a sized pointer to it. Each type ID will have a
|
||||||
|
distinct size, hence it is crucial to treat each such call as distinct when
|
||||||
|
values don't match during verifier state pruning checks.
|
||||||
|
|
||||||
|
Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
|
||||||
|
size parameter, and the value of the constant matters for program safety, __k
|
||||||
|
suffix should be used.
|
||||||
|
|
||||||
.. _BPF_kfunc_nodef:
|
.. _BPF_kfunc_nodef:
|
||||||
|
|
||||||
2.3 Using an existing kernel function
|
2.3 Using an existing kernel function
|
||||||
|
@ -137,22 +161,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
|
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
|
||||||
indicates that the all pointer arguments will always have a guaranteed lifetime,
|
indicates that all pointer arguments are valid, and that all pointers to
|
||||||
and pointers to kernel objects are always passed to helpers in their unmodified
|
BTF objects have been passed in their unmodified form (that is, at a zero
|
||||||
form (as obtained from acquire kfuncs).
|
offset, and without having been obtained from walking another pointer).
|
||||||
|
|
||||||
It can be used to enforce that a pointer to a refcounted object acquired from a
|
There are two types of pointers to kernel objects which are considered "valid":
|
||||||
kfunc or BPF helper is passed as an argument to this kfunc without any
|
|
||||||
modifications (e.g. pointer arithmetic) such that it is trusted and points to
|
|
||||||
the original object.
|
|
||||||
|
|
||||||
Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
|
1. Pointers which are passed as tracepoint or struct_ops callback arguments.
|
||||||
but those can have a non-zero offset.
|
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
|
||||||
|
|
||||||
This flag is often used for kfuncs that operate (change some property, perform
|
Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
|
||||||
some operation) on an object that was obtained using an acquire kfunc. Such
|
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
|
||||||
kfuncs need an unchanged pointer to ensure the integrity of the operation being
|
|
||||||
performed on the expected object.
|
The definition of "valid" pointers is subject to change at any time, and has
|
||||||
|
absolutely no ABI stability guarantees.
|
||||||
|
|
||||||
2.4.6 KF_SLEEPABLE flag
|
2.4.6 KF_SLEEPABLE flag
|
||||||
-----------------------
|
-----------------------
|
||||||
|
@ -169,6 +191,15 @@ rebooting or panicking. Due to this additional restrictions apply to these
|
||||||
calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
|
calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
|
||||||
added later.
|
added later.
|
||||||
|
|
||||||
|
2.4.8 KF_RCU flag
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
The KF_RCU flag is used for kfuncs which take an RCU pointer as their argument.
|
||||||
|
When used together with KF_ACQUIRE, it indicates the kfunc should have a
|
||||||
|
single argument which must be a trusted argument or a MEM_RCU pointer.
|
||||||
|
The argument may have reference count of 0 and the kfunc must take this
|
||||||
|
into consideration.
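A common way to "take this into consideration" is to acquire a reference only if the count is still non-zero. The sketch below shows that pattern with C11 atomics in user space. The ``toy_*`` names are hypothetical and this is not the kernel's refcount API, only an illustration of the idea.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reference-counted object; not a kernel type. */
struct toy_obj {
	atomic_int refcnt;
};

/* Take a reference only if the object is still live (count > 0). */
static bool toy_get_unless_zero(struct toy_obj *obj)
{
	int old = atomic_load(&obj->refcnt);

	while (old > 0) {
		/* On failure, `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: object is being freed */
}

/* Drop a reference previously taken. */
static void toy_put(struct toy_obj *obj)
{
	int prev = atomic_fetch_sub(&obj->refcnt, 1);

	assert(prev > 0);
	(void)prev;
}
```

A plain increment would resurrect an object whose count already dropped to zero, so the caller must be prepared for the acquire to fail and handle the ``false`` (or, for a kfunc, ``NULL``) result.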
|
||||||
|
|
||||||
2.5 Registering the kfuncs
|
2.5 Registering the kfuncs
|
||||||
--------------------------
|
--------------------------
|
||||||
|
|
||||||
|
@ -191,3 +222,201 @@ type. An example is shown below::
|
||||||
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
|
return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
|
||||||
}
|
}
|
||||||
late_initcall(init_subsystem);
|
late_initcall(init_subsystem);
|
||||||
|
|
||||||
|
3. Core kfuncs
|
||||||
|
==============
|
||||||
|
|
||||||
|
The BPF subsystem provides a number of "core" kfuncs that are potentially
|
||||||
|
applicable to a wide variety of different possible use cases and programs.
|
||||||
|
Those kfuncs are documented here.
|
||||||
|
|
||||||
|
3.1 struct task_struct * kfuncs
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
There are a number of kfuncs that allow ``struct task_struct *`` objects to be
|
||||||
|
used as kptrs:
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/helpers.c
|
||||||
|
:identifiers: bpf_task_acquire bpf_task_release
|
||||||
|
|
||||||
|
These kfuncs are useful when you want to acquire or release a reference to a
|
||||||
|
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
|
||||||
|
struct_ops callback arg. For example:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A trivial example tracepoint program that shows how to
|
||||||
|
* acquire and release a struct task_struct * pointer.
|
||||||
|
*/
|
||||||
|
SEC("tp_btf/task_newtask")
|
||||||
|
int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
|
||||||
|
{
|
||||||
|
struct task_struct *acquired;
|
||||||
|
|
||||||
|
acquired = bpf_task_acquire(task);
|
||||||
|
|
||||||
|
/*
|
||||||
|
* In a typical program you'd do something like store
|
||||||
|
* the task in a map, and the map will automatically
|
||||||
|
* release it later. Here, we release it manually.
|
||||||
|
*/
|
||||||
|
bpf_task_release(acquired);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
A BPF program can also look up a task from a pid. This can be useful if the
|
||||||
|
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
|
||||||
|
it can acquire a reference on with bpf_task_acquire().
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/helpers.c
|
||||||
|
:identifiers: bpf_task_from_pid
|
||||||
|
|
||||||
|
Here is an example of it being used:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
SEC("tp_btf/task_newtask")
|
||||||
|
int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
|
||||||
|
{
|
||||||
|
struct task_struct *lookup;
|
||||||
|
|
||||||
|
lookup = bpf_task_from_pid(task->pid);
|
||||||
|
if (!lookup)
|
||||||
|
/* A task should always be found, as %task is a tracepoint arg. */
|
||||||
|
return -ENOENT;
|
||||||
|
|
||||||
|
if (lookup->pid != task->pid) {
|
||||||
|
/* bpf_task_from_pid() looks up the task via its
|
||||||
|
* globally-unique pid from the init_pid_ns. Thus,
|
||||||
|
* the pid of the lookup task should always be the
|
||||||
|
* same as the input task.
|
||||||
|
*/
|
||||||
|
bpf_task_release(lookup);
|
||||||
|
return -EINVAL;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* bpf_task_from_pid() returns an acquired reference,
|
||||||
|
* so it must be dropped before returning from the
|
||||||
|
* tracepoint handler.
|
||||||
|
*/
|
||||||
|
bpf_task_release(lookup);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
3.2 struct cgroup * kfuncs
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
``struct cgroup *`` objects also have acquire and release functions:
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/helpers.c
|
||||||
|
:identifiers: bpf_cgroup_acquire bpf_cgroup_release
|
||||||
|
|
||||||
|
These kfuncs are used in exactly the same manner as bpf_task_acquire() and
|
||||||
|
bpf_task_release() respectively, so we won't provide examples for them.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
You may also acquire a reference to a ``struct cgroup`` kptr that's already
|
||||||
|
stored in a map using bpf_cgroup_kptr_get():
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/helpers.c
|
||||||
|
:identifiers: bpf_cgroup_kptr_get
|
||||||
|
|
||||||
|
Here's an example of how it can be used:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
/* struct containing the struct cgroup kptr which is actually stored in the map. */
|
||||||
|
struct __cgroups_kfunc_map_value {
|
||||||
|
struct cgroup __kptr_ref * cgroup;
|
||||||
|
};
|
||||||
|
|
||||||
|
/* The map containing struct __cgroups_kfunc_map_value entries. */
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_HASH);
|
||||||
|
__type(key, int);
|
||||||
|
__type(value, struct __cgroups_kfunc_map_value);
|
||||||
|
__uint(max_entries, 1);
|
||||||
|
} __cgroups_kfunc_map SEC(".maps");
|
||||||
|
|
||||||
|
/* ... */
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A simple example tracepoint program showing how a
|
||||||
|
* struct cgroup kptr that is stored in a map can
|
||||||
|
* be acquired using the bpf_cgroup_kptr_get() kfunc.
|
||||||
|
*/
|
||||||
|
SEC("tp_btf/cgroup_mkdir")
|
||||||
|
int BPF_PROG(cgroup_kptr_get_example, struct cgroup *cgrp, const char *path)
|
||||||
|
{
|
||||||
|
struct cgroup *kptr;
|
||||||
|
struct __cgroups_kfunc_map_value *v;
|
||||||
|
s32 id = cgrp->self.id;
|
||||||
|
|
||||||
|
/* Assume a cgroup kptr was previously stored in the map. */
|
||||||
|
v = bpf_map_lookup_elem(&__cgroups_kfunc_map, &id);
|
||||||
|
if (!v)
|
||||||
|
return -ENOENT;
|
||||||
|
|
||||||
|
/* Acquire a reference to the cgroup kptr that's already stored in the map. */
|
||||||
|
kptr = bpf_cgroup_kptr_get(&v->cgroup);
|
||||||
|
if (!kptr)
|
||||||
|
/* If no cgroup was present in the map, it's because
|
||||||
|
* we're racing with another CPU that removed it with
|
||||||
|
* bpf_kptr_xchg() between the bpf_map_lookup_elem()
|
||||||
|
* above, and our call to bpf_cgroup_kptr_get().
|
||||||
|
* bpf_cgroup_kptr_get() internally safely handles this
|
||||||
|
* race, and will return NULL if the cgroup is no longer
|
||||||
|
* present in the map by the time we invoke the kfunc.
|
||||||
|
*/
|
||||||
|
return -EBUSY;
|
||||||
|
|
||||||
|
/* Free the reference we just took above. Note that the
|
||||||
|
* original struct cgroup kptr is still in the map. It will
|
||||||
|
* be freed either at a later time if another context deletes
|
||||||
|
* it from the map, or automatically by the BPF subsystem if
|
||||||
|
* it's still present when the map is destroyed.
|
||||||
|
*/
|
||||||
|
bpf_cgroup_release(kptr);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
Another kfunc available for interacting with ``struct cgroup *`` objects is
|
||||||
|
bpf_cgroup_ancestor(). This allows callers to access the ancestor of a cgroup,
|
||||||
|
and return it as a cgroup kptr.
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/helpers.c
|
||||||
|
:identifiers: bpf_cgroup_ancestor
|
||||||
|
|
||||||
|
Eventually, BPF should be updated to allow this to happen with a normal memory
|
||||||
|
load in the program itself. This is currently not possible without more work in
|
||||||
|
the verifier. bpf_cgroup_ancestor() can be used as follows:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Simple tracepoint example that illustrates how a cgroup's
|
||||||
|
* ancestor can be accessed using bpf_cgroup_ancestor().
|
||||||
|
*/
|
||||||
|
SEC("tp_btf/cgroup_mkdir")
|
||||||
|
int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
|
||||||
|
{
|
||||||
|
struct cgroup *parent;
|
||||||
|
|
||||||
|
/* The parent cgroup resides at the level before the current cgroup's level. */
|
||||||
|
parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
|
||||||
|
if (!parent)
|
||||||
|
return -ENOENT;
|
||||||
|
|
||||||
|
bpf_printk("Parent id is %d", parent->self.id);
|
||||||
|
|
||||||
|
/* Release the reference to the parent cgroup that was acquired above. */
|
||||||
|
bpf_cgroup_release(parent);
|
||||||
|
return 0;
|
||||||
|
}
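The ``cgrp->level - 1`` argument, and the ``NULL`` result for the root cgroup, follow from how levels are numbered. The user space model below is a sketch under stated assumptions: the ``toy_*`` types are invented, each cgroup is assumed to record its ancestors indexed by level with the root at level 0, and reference counting is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPTH 8

/* Toy cgroup; the layout is an assumption of this sketch. */
struct toy_cgroup {
	int level;					/* 0 for the root */
	struct toy_cgroup *ancestors[TOY_MAX_DEPTH];	/* [level] is the cgroup itself */
};

/* Link `cg` below `parent` (or make it a root when parent is NULL). */
static void toy_cgroup_init(struct toy_cgroup *cg, struct toy_cgroup *parent)
{
	cg->level = parent ? parent->level + 1 : 0;
	assert(cg->level < TOY_MAX_DEPTH);

	for (int i = 0; i < cg->level; i++)
		cg->ancestors[i] = parent->ancestors[i];
	cg->ancestors[cg->level] = cg;
}

/* Return the ancestor at `level`, or NULL if no such level exists. */
static struct toy_cgroup *toy_cgroup_ancestor(struct toy_cgroup *cg, int level)
{
	if (level < 0 || level > cg->level)
		return NULL;
	return cg->ancestors[level];
}
```

In this model, asking the root for level ``-1`` returns ``NULL``, which corresponds to the ``-ENOENT`` branch in the example program above.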
|
||||||
|
|
|
@ -1,5 +1,7 @@
|
||||||
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||||
|
|
||||||
|
.. _libbpf:
|
||||||
|
|
||||||
libbpf
|
libbpf
|
||||||
======
|
======
|
||||||
|
|
||||||
|
@ -7,6 +9,7 @@ libbpf
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
|
API Documentation <https://libbpf.readthedocs.io/en/latest/api.html>
|
||||||
|
program_types
|
||||||
libbpf_naming_convention
|
libbpf_naming_convention
|
||||||
libbpf_build
|
libbpf_build
|
||||||
|
|
||||||
|
|
203
Documentation/bpf/libbpf/program_types.rst
Normal file
|
@ -0,0 +1,203 @@
|
||||||
|
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||||
|
|
||||||
|
.. _program_types_and_elf:
|
||||||
|
|
||||||
|
Program Types and ELF Sections
|
||||||
|
==============================
|
||||||
|
|
||||||
|
The table below lists the program types, their attach types where relevant and the ELF section
|
||||||
|
names supported by libbpf for them. The ELF section names follow these rules:
|
||||||
|
|
||||||
|
- ``type`` is an exact match, e.g. ``SEC("socket")``
|
||||||
|
- ``type+`` means it can be either exact ``SEC("type")`` or well-formed ``SEC("type/extras")``
|
||||||
|
with a '``/``' separator between ``type`` and ``extras``.
|
||||||
|
|
||||||
|
When ``extras`` are specified, they provide details of how to auto-attach the BPF program. The
|
||||||
|
format of ``extras`` depends on the program type, e.g. ``SEC("tracepoint/<category>/<name>")``
|
||||||
|
for tracepoints or ``SEC("usdt/<path>:<provider>:<name>")`` for USDT probes. The extras are
|
||||||
|
described in more detail in the footnotes.
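The two matching rules above can be expressed as a short predicate. The function below is a sketch of the rules as stated in this section, not libbpf's own implementation, and ``toy_sec_matches()`` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Check an ELF section name against a pattern from the table.
 *   "type"  : exact match only
 *   "type+" : exact "type", or "type/extras" with a '/' separator
 */
static bool toy_sec_matches(const char *pattern, const char *sec_name)
{
	size_t len;

	assert(pattern && sec_name);
	len = strlen(pattern);

	if (len > 0 && pattern[len - 1] == '+') {
		len--;	/* compare against "type" without the '+' */
		if (strncmp(sec_name, pattern, len) != 0)
			return false;
		/* Either the name ends here, or extras follow a '/'. */
		return sec_name[len] == '\0' || sec_name[len] == '/';
	}

	return strcmp(sec_name, pattern) == 0;
}
```

Under these rules ``SEC("kprobe")`` and ``SEC("kprobe/do_unlinkat")`` both match ``kprobe+``, while ``SEC("kprobes")`` does not, because the character after the type is neither the end of the string nor a ``/``.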
|
||||||
|
|
||||||
|
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| Program Type | Attach Type | ELF Section Name | Sleepable |
|
||||||
|
+===========================================+========================================+==================================+===========+
|
||||||
|
| ``BPF_PROG_TYPE_CGROUP_DEVICE`` | ``BPF_CGROUP_DEVICE`` | ``cgroup/dev`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_CGROUP_SKB`` | | ``cgroup/skb`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET_EGRESS`` | ``cgroup_skb/egress`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET_INGRESS`` | ``cgroup_skb/ingress`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` | ``BPF_CGROUP_GETSOCKOPT`` | ``cgroup/getsockopt`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_SETSOCKOPT`` | ``cgroup/setsockopt`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_CGROUP_SOCK_ADDR`` | ``BPF_CGROUP_INET4_BIND`` | ``cgroup/bind4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET4_CONNECT`` | ``cgroup/connect4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET4_GETPEERNAME`` | ``cgroup/getpeername4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET4_GETSOCKNAME`` | ``cgroup/getsockname4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET6_BIND`` | ``cgroup/bind6`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET6_CONNECT`` | ``cgroup/connect6`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET6_GETPEERNAME`` | ``cgroup/getpeername6`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET6_GETSOCKNAME`` | ``cgroup/getsockname6`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_UDP4_RECVMSG`` | ``cgroup/recvmsg4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_UDP4_SENDMSG`` | ``cgroup/sendmsg4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_UDP6_RECVMSG`` | ``cgroup/recvmsg6`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_UDP6_SENDMSG`` | ``cgroup/sendmsg6`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_CGROUP_SOCK`` | ``BPF_CGROUP_INET4_POST_BIND`` | ``cgroup/post_bind4`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET6_POST_BIND`` | ``cgroup/post_bind6`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET_SOCK_CREATE`` | ``cgroup/sock_create`` | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``cgroup/sock`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_CGROUP_INET_SOCK_RELEASE`` | ``cgroup/sock_release`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_CGROUP_SYSCTL`` | ``BPF_CGROUP_SYSCTL`` | ``cgroup/sysctl`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_EXT`` | | ``freplace+`` [#fentry]_ | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ``BPF_FLOW_DISSECTOR`` | ``flow_dissector`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_KPROBE`` | | ``kprobe+`` [#kprobe]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``kretprobe+`` [#kprobe]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``ksyscall+`` [#ksyscall]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``kretsyscall+`` [#ksyscall]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``uprobe+`` [#uprobe]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``uprobe.s+`` [#uprobe]_ | Yes |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``uretprobe+`` [#uprobe]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``uretprobe.s+`` [#uprobe]_ | Yes |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``usdt+`` [#usdt]_ | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_TRACE_KPROBE_MULTI`` | ``kprobe.multi+`` [#kpmulti]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``kretprobe.multi+`` [#kpmulti]_ | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_LIRC_MODE2`` | ``BPF_LIRC_MODE2`` | ``lirc_mode2`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_LSM`` | ``BPF_LSM_CGROUP`` | ``lsm_cgroup+`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_LSM_MAC`` | ``lsm+`` [#lsm]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``lsm.s+`` [#lsm]_ | Yes |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_LWT_IN`` | | ``lwt_in`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_LWT_OUT`` | | ``lwt_out`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | | ``lwt_seg6local`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_LWT_XMIT`` | | ``lwt_xmit`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_PERF_EVENT`` | | ``perf_event`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE`` | | ``raw_tp.w+`` [#rawtp]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``raw_tracepoint.w+`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | | ``raw_tp+`` [#rawtp]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``raw_tracepoint+`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SCHED_ACT`` | | ``action`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SCHED_CLS`` | | ``classifier`` | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``tc`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SK_LOOKUP`` | ``BPF_SK_LOOKUP`` | ``sk_lookup`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SK_MSG`` | ``BPF_SK_MSG_VERDICT`` | ``sk_msg`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SK_REUSEPORT`` | ``BPF_SK_REUSEPORT_SELECT_OR_MIGRATE`` | ``sk_reuseport/migrate`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_SK_REUSEPORT_SELECT`` | ``sk_reuseport`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SK_SKB`` | | ``sk_skb`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_SK_SKB_STREAM_PARSER`` | ``sk_skb/stream_parser`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_SK_SKB_STREAM_VERDICT`` | ``sk_skb/stream_verdict`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SOCKET_FILTER`` | | ``socket`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SOCK_OPS`` | ``BPF_CGROUP_SOCK_OPS`` | ``sockops`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_STRUCT_OPS`` | | ``struct_ops+`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_SYSCALL`` | | ``syscall`` | Yes |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_TRACEPOINT`` | | ``tp+`` [#tp]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``tracepoint+`` [#tp]_ | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_TRACING`` | ``BPF_MODIFY_RETURN`` | ``fmod_ret+`` [#fentry]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``fmod_ret.s+`` [#fentry]_ | Yes |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_TRACE_FENTRY`` | ``fentry+`` [#fentry]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``fentry.s+`` [#fentry]_ | Yes |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_TRACE_FEXIT`` | ``fexit+`` [#fentry]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``fexit.s+`` [#fentry]_ | Yes |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_TRACE_ITER`` | ``iter+`` [#iter]_ | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``iter.s+`` [#iter]_ | Yes |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_TRACE_RAW_TP`` | ``tp_btf+`` [#fentry]_ | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
| ``BPF_PROG_TYPE_XDP`` | ``BPF_XDP_CPUMAP`` | ``xdp.frags/cpumap`` | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``xdp/cpumap`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_XDP_DEVMAP`` | ``xdp.frags/devmap`` | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``xdp/devmap`` | |
|
||||||
|
+ +----------------------------------------+----------------------------------+-----------+
|
||||||
|
| | ``BPF_XDP`` | ``xdp.frags`` | |
|
||||||
|
+ + +----------------------------------+-----------+
|
||||||
|
| | | ``xdp`` | |
|
||||||
|
+-------------------------------------------+----------------------------------------+----------------------------------+-----------+
|
||||||
|
|
||||||
|
|
||||||
|
.. rubric:: Footnotes
|
||||||
|
|
||||||
|
.. [#fentry] The ``fentry`` attach format is ``fentry[.s]/<function>``.
|
||||||
|
.. [#kprobe] The ``kprobe`` attach format is ``kprobe/<function>[+<offset>]``. Valid
|
||||||
|
characters for ``function`` are ``a-zA-Z0-9_.`` and ``offset`` must be a valid
|
||||||
|
non-negative integer.
|
||||||
|
.. [#ksyscall] The ``ksyscall`` attach format is ``ksyscall/<syscall>``.
|
||||||
|
.. [#uprobe] The ``uprobe`` attach format is ``uprobe[.s]/<path>:<function>[+<offset>]``.
|
||||||
|
.. [#usdt] The ``usdt`` attach format is ``usdt/<path>:<provider>:<name>``.
|
||||||
|
.. [#kpmulti] The ``kprobe.multi`` attach format is ``kprobe.multi/<pattern>`` where ``pattern``
|
||||||
|
supports ``*`` and ``?`` wildcards. Valid characters for pattern are
|
||||||
|
``a-zA-Z0-9_.*?``.
|
||||||
|
.. [#lsm] The ``lsm`` attach format is ``lsm[.s]/<hook>``.
|
||||||
|
.. [#rawtp] The ``raw_tp`` attach format is ``raw_tracepoint[.w]/<tracepoint>``.
|
||||||
|
.. [#tp] The ``tracepoint`` attach format is ``tracepoint/<category>/<name>``.
|
||||||
|
.. [#iter] The ``iter`` attach format is ``iter[.s]/<struct-name>``.
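
The ``kprobe`` format in the footnotes above can be checked with a few lines of
plain C. This is a minimal sketch and not libbpf's own parser:
``parse_kprobe_sec()`` and its buffer handling are hypothetical names chosen
for illustration, and only the ``kprobe/<function>[+<offset>]`` form is covered.

.. code-block:: c

    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: split "kprobe/<function>[+<offset>]" into parts.
     * Returns 0 on success, -1 if the name does not match the format.
     */
    static int parse_kprobe_sec(const char *sec, char *func, size_t func_sz,
                                unsigned long *offset)
    {
        static const char prefix[] = "kprobe/";
        const char *p;
        char *end;
        size_t n = 0;

        if (strncmp(sec, prefix, sizeof(prefix) - 1) != 0)
            return -1;
        p = sec + sizeof(prefix) - 1;

        /* <function>: one or more characters from a-zA-Z0-9_. */
        while (isalnum((unsigned char)*p) || *p == '_' || *p == '.') {
            if (n + 1 >= func_sz)
                return -1;
            func[n++] = *p++;
        }
        if (n == 0)
            return -1;
        func[n] = '\0';

        *offset = 0;
        if (*p == '\0')
            return 0;

        /* Optional "+<offset>": must be a non-negative integer. */
        if (*p != '+' || !isdigit((unsigned char)p[1]))
            return -1;
        *offset = strtoul(p + 1, &end, 0);
        return *end == '\0' ? 0 : -1;
    }

Passing ``"kprobe/do_sys_open+8"`` fills ``func`` with ``do_sys_open`` and sets
``offset`` to 8, while a name containing a character outside the allowed set
is rejected.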
|
262
Documentation/bpf/map_array.rst
Normal file
|
@ -0,0 +1,262 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
================================================
|
||||||
|
BPF_MAP_TYPE_ARRAY and BPF_MAP_TYPE_PERCPU_ARRAY
|
||||||
|
================================================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_ARRAY`` was introduced in kernel version 3.19
|
||||||
|
- ``BPF_MAP_TYPE_PERCPU_ARRAY`` was introduced in version 4.6
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_ARRAY`` and ``BPF_MAP_TYPE_PERCPU_ARRAY`` provide generic array
|
||||||
|
storage. The key type is an unsigned 32-bit integer (4 bytes) and the map is
|
||||||
|
of constant size. The size of the array is defined in ``max_entries`` at
|
||||||
|
creation time. All array elements are pre-allocated and zero initialized when
|
||||||
|
created. ``BPF_MAP_TYPE_PERCPU_ARRAY`` uses a different memory region for each
|
||||||
|
CPU whereas ``BPF_MAP_TYPE_ARRAY`` uses the same memory region. The value
|
||||||
|
stored can be of any size, however, all array elements are aligned to 8
|
||||||
|
bytes.
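
The sizing rule described above can be modelled in plain C. This is a sketch
under stated assumptions: ``elem_size()`` and ``array_data_bytes()`` are
hypothetical helper names, and the calculation covers only element storage,
not the map header or allocator overhead.

.. code-block:: c

    #include <stdint.h>

    /* Round a value size up to the next multiple of 8 bytes. */
    static uint32_t elem_size(uint32_t value_size)
    {
        return (value_size + 7u) & ~7u;
    }

    /* Bytes of element storage for a plain array map (model only). */
    static uint64_t array_data_bytes(uint32_t value_size, uint32_t max_entries)
    {
        return (uint64_t)elem_size(value_size) * max_entries;
    }

For example, a 12-byte value occupies a 16-byte slot, so 256 such entries need
4096 bytes of element storage.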
|
||||||
|
|
||||||
|
Since kernel 5.5, memory mapping may be enabled for ``BPF_MAP_TYPE_ARRAY`` by
|
||||||
|
setting the flag ``BPF_F_MMAPABLE``. The map definition is page-aligned and
|
||||||
|
starts on the first page. Sufficient page-sized and page-aligned blocks of
|
||||||
|
memory are allocated to store all array values, starting on the second page,
|
||||||
|
which in some cases will result in over-allocation of memory. The benefit of
|
||||||
|
using this is increased performance and ease of use since userspace programs
|
||||||
|
would not be required to use helper functions to access and mutate data.
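
The over-allocation mentioned above comes from rounding the value storage up
to whole pages. The following sketch estimates it, assuming 4 KiB pages and
8-byte-aligned elements; ``MODEL_PAGE_SIZE`` and the function names are
hypothetical and exist only for this illustration.

.. code-block:: c

    #include <stdint.h>

    #define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

    static uint64_t page_align(uint64_t bytes)
    {
        return (bytes + MODEL_PAGE_SIZE - 1) & ~(uint64_t)(MODEL_PAGE_SIZE - 1);
    }

    /* Bytes backing the values of an mmapable array: whole pages only. */
    static uint64_t mmapable_data_bytes(uint32_t value_size, uint32_t max_entries)
    {
        uint64_t elem = (value_size + 7u) & ~7u;    /* 8-byte aligned slots */

        return page_align(elem * max_entries);
    }

    /* Bytes allocated beyond what the values themselves need. */
    static uint64_t mmapable_slack_bytes(uint32_t value_size, uint32_t max_entries)
    {
        uint64_t elem = (value_size + 7u) & ~7u;

        return mmapable_data_bytes(value_size, max_entries) - elem * max_entries;
    }

With 256 entries of 8 bytes each, the values need 2048 bytes but a full
4096-byte page is reserved, so half of that page is slack.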
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Array elements can be retrieved using the ``bpf_map_lookup_elem()`` helper.
|
||||||
|
This helper returns a pointer into the array element, so to avoid data races
|
||||||
|
with userspace reading the value, the user must use primitives like
|
||||||
|
``__sync_fetch_and_add()`` when updating the value in-place.
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||||
|
|
||||||
|
Array elements can be updated using the ``bpf_map_update_elem()`` helper.
|
||||||
|
|
||||||
|
``bpf_map_update_elem()`` returns 0 on success, or negative error in case of
|
||||||
|
failure.
|
||||||
|
|
||||||
|
Since the array is of constant size, ``bpf_map_delete_elem()`` is not supported.
|
||||||
|
To clear an array element, you may use ``bpf_map_update_elem()`` to insert a
|
||||||
|
zero value to that index.
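
The behaviour described above can be pictured with a small userspace model.
This is not kernel code: ``struct toy_array`` and the ``toy_*`` functions are
hypothetical, and the specific error codes are assumptions chosen to resemble
typical kernel return values.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>

    #define TOY_MAX_ENTRIES 4u
    #define TOY_ANY         0   /* stands in for BPF_ANY */
    #define TOY_NOEXIST     1   /* stands in for BPF_NOEXIST */

    /* Model of a fixed-size array map: every slot always exists. */
    struct toy_array {
        long slot[TOY_MAX_ENTRIES];
    };

    static int toy_update(struct toy_array *a, uint32_t idx, long value,
                          uint64_t flags)
    {
        if (idx >= TOY_MAX_ENTRIES)
            return -E2BIG;      /* assumed code: index outside the array */
        if (flags == TOY_NOEXIST)
            return -EEXIST;     /* assumed code: elements always exist */
        a->slot[idx] = value;
        return 0;
    }

    /* "Clearing" an element is an update that stores zero. */
    static int toy_clear(struct toy_array *a, uint32_t idx)
    {
        return toy_update(a, idx, 0, TOY_ANY);
    }

    /* Deletion is rejected because the array cannot shrink. */
    static int toy_delete(struct toy_array *a, uint32_t idx)
    {
        (void)a;
        (void)idx;
        return -EINVAL;         /* assumed code */
    }

The point of the model is that the set of valid indices never changes: an
update overwrites a slot, clearing stores zero, and nothing is ever removed.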
|
||||||
|
|
||||||
|
Per CPU Array
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Values stored in ``BPF_MAP_TYPE_ARRAY`` can be accessed by multiple programs
|
||||||
|
across different CPUs. To restrict storage to a single CPU, you may use a
|
||||||
|
``BPF_MAP_TYPE_PERCPU_ARRAY``.
|
||||||
|
|
||||||
|
When using a ``BPF_MAP_TYPE_PERCPU_ARRAY`` the ``bpf_map_update_elem()`` and
|
||||||
|
``bpf_map_lookup_elem()`` helpers automatically access the slot for the current
|
||||||
|
CPU.
|
||||||
|
|
||||||
|
bpf_map_lookup_percpu_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
|
||||||
|
|
||||||
|
The ``bpf_map_lookup_percpu_elem()`` helper can be used to look up the array
value for a specific CPU. It returns a pointer to the value on success, or
``NULL`` if no entry was found or ``cpu`` is invalid.
|
||||||
|
|
||||||
|
Concurrency
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Since kernel version 5.1, the BPF infrastructure provides ``struct bpf_spin_lock``
|
||||||
|
to synchronize access.
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
Access from userspace uses libbpf APIs with the same names as above, with
|
||||||
|
the map identified by its ``fd``.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Please see the ``tools/testing/selftests/bpf`` directory for functional
|
||||||
|
examples. The code samples below demonstrate API usage.
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
This snippet shows how to declare an array in a BPF program.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_ARRAY);
|
||||||
|
__type(key, u32);
|
||||||
|
__type(value, long);
|
||||||
|
__uint(max_entries, 256);
|
||||||
|
} my_map SEC(".maps");
|
||||||
|
|
||||||
|
|
||||||
|
This example BPF program shows how to access an array element.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_prog(struct __sk_buff *skb)
|
||||||
|
{
|
||||||
|
struct iphdr ip;
|
||||||
|
int index;
|
||||||
|
long *value;
|
||||||
|
|
||||||
|
if (bpf_skb_load_bytes(skb, ETH_HLEN, &ip, sizeof(ip)) < 0)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
index = ip.protocol;
|
||||||
|
value = bpf_map_lookup_elem(&my_map, &index);
|
||||||
|
if (value)
|
||||||
|
__sync_fetch_and_add(value, skb->len);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
BPF_MAP_TYPE_ARRAY
|
||||||
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This snippet shows how to create an array, using ``bpf_map_create_opts`` to
|
||||||
|
set flags.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
#include <bpf/libbpf.h>
|
||||||
|
#include <bpf/bpf.h>
|
||||||
|
|
||||||
|
int create_array()
|
||||||
|
{
|
||||||
|
int fd;
|
||||||
|
LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
|
||||||
|
|
||||||
|
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY,
|
||||||
|
"example_array", /* name */
|
||||||
|
sizeof(__u32), /* key size */
|
||||||
|
sizeof(long), /* value size */
|
||||||
|
256, /* max entries */
|
||||||
|
&opts); /* create opts */
|
||||||
|
return fd;
|
||||||
|
}
|
||||||
|
|
||||||
|
This snippet shows how to initialize the elements of an array.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int initialize_array(int fd)
|
||||||
|
{
|
||||||
|
__u32 i;
|
||||||
|
long value;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
for (i = 0; i < 256; i++) {
|
||||||
|
value = i;
|
||||||
|
ret = bpf_map_update_elem(fd, &i, &value, BPF_ANY);
|
||||||
|
if (ret < 0)
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
This snippet shows how to retrieve an element value from an array.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int lookup(int fd)
|
||||||
|
{
|
||||||
|
__u32 index = 42;
|
||||||
|
long value;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
ret = bpf_map_lookup_elem(fd, &index, &value);
|
||||||
|
if (ret < 0)
|
||||||
|
return ret;
|
||||||
|
|
||||||
|
/* use value here */
|
||||||
|
assert(value == 42);
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
BPF_MAP_TYPE_PERCPU_ARRAY
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
This snippet shows how to initialize the elements of a per CPU array.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int initialize_array(int fd)
|
||||||
|
{
|
||||||
|
int ncpus = libbpf_num_possible_cpus();
|
||||||
|
long values[ncpus];
|
||||||
|
__u32 i, j;
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
for (i = 0; i < 256 ; i++) {
|
||||||
|
for (j = 0; j < ncpus; j++)
|
||||||
|
values[j] = i;
|
||||||
|
ret = bpf_map_update_elem(fd, &i, &values, BPF_ANY);
|
||||||
|
if (ret < 0)
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
This snippet shows how to access the per CPU elements of an array value.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int lookup(int fd)
|
||||||
|
{
|
||||||
|
int ncpus = libbpf_num_possible_cpus();
|
||||||
|
__u32 index = 42, j;
|
||||||
|
long values[ncpus];
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
ret = bpf_map_lookup_elem(fd, &index, &values);
|
||||||
|
if (ret < 0)
|
||||||
|
return ret;
|
||||||
|
|
||||||
|
for (j = 0; j < ncpus; j++) {
|
||||||
|
/* Use per CPU value here */
|
||||||
|
assert(values[j] == 42);
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
Semantics
|
||||||
|
=========
|
||||||
|
|
||||||
|
As shown in the example above, when accessing a ``BPF_MAP_TYPE_PERCPU_ARRAY``
|
||||||
|
in userspace, each value is an array with ``ncpus`` elements.
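
The buffer that userspace passes for one per-CPU element has to hold one slot
per possible CPU, with each slot padded to 8 bytes. The helpers below are a
sketch of that arithmetic; their names are hypothetical. In the examples above
the value type is ``long``, which is already 8 bytes on 64-bit systems, so
``long values[ncpus]`` has the right size there.

.. code-block:: c

    #include <stddef.h>

    /* Size of the userspace buffer for one per-CPU array element. */
    static size_t percpu_value_buf_size(size_t value_size, int ncpus)
    {
        size_t slot = (value_size + 7) & ~(size_t)7;

        return slot * (size_t)ncpus;
    }

    /* Byte offset of the slot that belongs to a given CPU. */
    static size_t percpu_slot_offset(size_t value_size, int cpu)
    {
        size_t slot = (value_size + 7) & ~(size_t)7;

        return slot * (size_t)cpu;
    }

A 4-byte value on a 4-CPU system therefore needs a 32-byte buffer, and the
slot for CPU 2 starts at byte 16.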
|
||||||
|
|
||||||
|
When calling ``bpf_map_update_elem()`` the flag ``BPF_NOEXIST`` cannot be used
|
||||||
|
for these maps.
|
174
Documentation/bpf/map_bloom_filter.rst
Normal file
|
@ -0,0 +1,174 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
=========================
|
||||||
|
BPF_MAP_TYPE_BLOOM_FILTER
|
||||||
|
=========================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_BLOOM_FILTER`` was introduced in kernel version 5.16
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_BLOOM_FILTER`` provides a BPF bloom filter map. Bloom
|
||||||
|
filters are a space-efficient probabilistic data structure used to
|
||||||
|
quickly test whether an element exists in a set. In a bloom filter,
|
||||||
|
false positives are possible whereas false negatives are not.
|
||||||
|
|
||||||
|
The bloom filter map does not have keys, only values. When the bloom
|
||||||
|
filter map is created, it must be created with a ``key_size`` of 0. The
|
||||||
|
bloom filter map supports two operations:
|
||||||
|
|
||||||
|
- push: adding an element to the map
|
||||||
|
- peek: determining whether an element is present in the map
|
||||||
|
|
||||||
|
BPF programs must use ``bpf_map_push_elem`` to add an element to the
|
||||||
|
bloom filter map and ``bpf_map_peek_elem`` to query the map. These
|
||||||
|
operations are exposed to userspace applications using the existing
|
||||||
|
``bpf`` syscall in the following way:
|
||||||
|
|
||||||
|
- ``BPF_MAP_UPDATE_ELEM`` -> push
|
||||||
|
- ``BPF_MAP_LOOKUP_ELEM`` -> peek
|
||||||
|
|
||||||
|
The ``max_entries`` size that is specified at map creation time is used
|
||||||
|
to approximate a reasonable bitmap size for the bloom filter, and is not
|
||||||
|
otherwise strictly enforced. If the user wishes to insert more entries
|
||||||
|
into the bloom filter than ``max_entries``, this may lead to a higher
|
||||||
|
false positive rate.
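
The effect of inserting more than ``max_entries`` elements follows from the
textbook approximation for the false-positive probability of a bloom filter.
The symbols are introduced here for illustration only: ``m`` is the bitmap
size in bits, ``k`` the number of hash functions and ``n`` the number of
inserted elements. This is the standard result, not a statement about the
exact bitmap size the kernel chooses.

.. math::

    p \approx \left(1 - e^{-kn/m}\right)^{k}

Because ``m`` is fixed when the map is created, ``p`` grows as ``n`` grows
past the value the bitmap was sized for.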
|
||||||
|
|
||||||
|
The number of hashes to use for the bloom filter is configurable using
|
||||||
|
the lower 4 bits of ``map_extra`` in ``union bpf_attr`` at map creation
|
||||||
|
time. If no number is specified, the default used will be 5 hash
|
||||||
|
functions. In general, using more hashes decreases both the false
|
||||||
|
positive rate and the speed of a lookup.
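
The selection rule above can be written as a one-line helper. This is a
sketch: ``bloom_nr_hashes()`` is a hypothetical name, and the helper only
models the lower 4 bits and the default of 5. It says nothing about how other
bits of ``map_extra`` are treated.

.. code-block:: c

    #include <stdint.h>

    #define MODEL_DEFAULT_NR_HASHES 5u

    /* Number of hash functions selected by the lower 4 bits of map_extra;
     * 0 means "not specified" and falls back to the default.
     */
    static uint32_t bloom_nr_hashes(uint64_t map_extra)
    {
        uint32_t n = (uint32_t)(map_extra & 0xF);

        return n ? n : MODEL_DEFAULT_NR_HASHES;
    }

With ``map_extra`` set to 3, as in the examples in this document, three hash
functions are used. Leaving it at 0 selects five.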
|
||||||
|
|
||||||
|
It is not possible to delete elements from a bloom filter map. A bloom
|
||||||
|
filter map may be used as an inner map. The user is responsible for
|
||||||
|
synchronising concurrent updates and lookups to ensure no false negative
|
||||||
|
lookups occur.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_map_push_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
|
||||||
|
|
||||||
|
A ``value`` can be added to a bloom filter using the
|
||||||
|
``bpf_map_push_elem()`` helper. The ``flags`` parameter must be set to
|
||||||
|
``BPF_ANY`` when adding an entry to the bloom filter. This helper
|
||||||
|
returns ``0`` on success, or negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_peek_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_peek_elem(struct bpf_map *map, void *value)
|
||||||
|
|
||||||
|
The ``bpf_map_peek_elem()`` helper is used to determine whether
|
||||||
|
``value`` is present in the bloom filter map. This helper returns ``0``
|
||||||
|
if ``value`` is probably present in the map, or ``-ENOENT`` if ``value``
|
||||||
|
is definitely not present in the map.
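
The push and peek semantics can be illustrated with a toy bloom filter in
plain C. This is a userspace model under stated assumptions, not the kernel
implementation: the bitmap size, the number of hashes, the seeded FNV-1a style
hash and all ``toy_*`` names are hypothetical choices made for this sketch.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TOY_BITS   1024u    /* bitmap size in bits, a power of two */
    #define TOY_HASHES 3u       /* number of hash functions */

    struct toy_bloom {
        uint8_t bits[TOY_BITS / 8];
    };

    /* Seeded FNV-1a style hash, reduced to a bit index. */
    static uint32_t toy_hash(const void *value, size_t len, uint32_t seed)
    {
        const uint8_t *p = value;
        uint32_t h = 2166136261u ^ seed;
        size_t i;

        for (i = 0; i < len; i++) {
            h ^= p[i];
            h *= 16777619u;
        }
        return h & (TOY_BITS - 1);
    }

    /* push: set one bit per hash function. */
    static int toy_push(struct toy_bloom *f, const void *value, size_t len)
    {
        uint32_t i, bit;

        for (i = 0; i < TOY_HASHES; i++) {
            bit = toy_hash(value, len, i);
            f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
        return 0;
    }

    /* peek: 0 if every bit is set (probably present), -ENOENT otherwise. */
    static int toy_peek(const struct toy_bloom *f, const void *value, size_t len)
    {
        uint32_t i, bit;

        for (i = 0; i < TOY_HASHES; i++) {
            bit = toy_hash(value, len, i);
            if (!(f->bits[bit / 8] & (1u << (bit % 8))))
                return -ENOENT;
        }
        return 0;
    }

A value that has been pushed always peeks as present, because its bits stay
set. A value that was never pushed usually peeks as ``-ENOENT``, but may
collide with bits set by other values, which is the false-positive case.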
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
|
||||||
|
|
||||||
|
A userspace program can add a ``value`` to a bloom filter using libbpf's
|
||||||
|
``bpf_map_update_elem`` function. The ``key`` parameter must be set to
|
||||||
|
``NULL`` and ``flags`` must be set to ``BPF_ANY``. Returns ``0`` on
|
||||||
|
success, or negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem(int fd, const void *key, void *value)
|
||||||
|
|
||||||
|
A userspace program can determine the presence of ``value`` in a bloom
|
||||||
|
filter using libbpf's ``bpf_map_lookup_elem`` function. The ``key``
|
||||||
|
parameter must be set to ``NULL``. Returns ``0`` if ``value`` is
|
||||||
|
probably present in the map, or ``-ENOENT`` if ``value`` is definitely
|
||||||
|
not present in the map.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
This snippet shows how to declare a bloom filter in a BPF program:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(max_entries, 1000);
|
||||||
|
__uint(map_extra, 3);
|
||||||
|
} bloom_filter SEC(".maps");
|
||||||
|
|
||||||
|
This snippet shows how to determine presence of a value in a bloom
|
||||||
|
filter in a BPF program:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *lookup(__u32 key)
|
||||||
|
{
|
||||||
|
if (bpf_map_peek_elem(&bloom_filter, &key) == 0) {
|
||||||
|
/* Verify not a false positive and fetch an associated
|
||||||
|
* value using a secondary lookup, e.g. in a hash table
|
||||||
|
*/
|
||||||
|
return bpf_map_lookup_elem(&hash_table, &key);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
This snippet shows how to use libbpf to create a bloom filter map from
|
||||||
|
userspace:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int create_bloom()
|
||||||
|
{
|
||||||
|
LIBBPF_OPTS(bpf_map_create_opts, opts,
|
||||||
|
.map_extra = 3); /* number of hashes */
|
||||||
|
|
||||||
|
return bpf_map_create(BPF_MAP_TYPE_BLOOM_FILTER,
|
||||||
|
"ipv6_bloom", /* name */
|
||||||
|
0, /* key size, must be zero */
|
||||||
|
sizeof(ipv6_addr), /* value size */
|
||||||
|
10000, /* max entries */
|
||||||
|
&opts); /* create options */
|
||||||
|
}
|
||||||
|
|
||||||
|
This snippet shows how to add an element to a bloom filter from
|
||||||
|
userspace:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int add_element(struct bpf_map *bloom_map, __u32 value)
|
||||||
|
{
|
||||||
|
int bloom_fd = bpf_map__fd(bloom_map);
|
||||||
|
return bpf_map_update_elem(bloom_fd, NULL, &value, BPF_ANY);
|
||||||
|
}
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
https://lwn.net/ml/bpf/20210831225005.2762202-1-joannekoong@fb.com/
|
109
Documentation/bpf/map_cgrp_storage.rst
Normal file
|
@ -0,0 +1,109 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Meta Platforms, Inc. and affiliates.
|
||||||
|
|
||||||
|
=========================
|
||||||
|
BPF_MAP_TYPE_CGRP_STORAGE
|
||||||
|
=========================
|
||||||
|
|
||||||
|
The ``BPF_MAP_TYPE_CGRP_STORAGE`` map type represents a local fixed-size
|
||||||
|
storage for cgroups. It is only available with ``CONFIG_CGROUPS``.
|
||||||
|
The programs are made available by the same Kconfig. The
|
||||||
|
data for a particular cgroup can be retrieved by looking up the map
|
||||||
|
with that cgroup.
|
||||||
|
|
||||||
|
This document describes the usage and semantics of the
|
||||||
|
``BPF_MAP_TYPE_CGRP_STORAGE`` map type.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
The map key must be of size ``sizeof(int)`` and represents a cgroup fd.
|
||||||
|
To access the storage in a program, use ``bpf_cgrp_storage_get``::
|
||||||
|
|
||||||
|
void *bpf_cgrp_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags)
|
||||||
|
|
||||||
|
``flags`` can be 0 or ``BPF_LOCAL_STORAGE_GET_F_CREATE``, which indicates that
|
||||||
|
a new local storage will be created if one does not exist.
|
||||||
|
|
||||||
|
The local storage can be removed with ``bpf_cgrp_storage_delete``::
|
||||||
|
|
||||||
|
long bpf_cgrp_storage_delete(struct bpf_map *map, struct cgroup *cgroup)
|
||||||
|
|
||||||
|
The map is available to all program types.
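
The get, create and delete semantics can be pictured with a small userspace
model. This is a sketch only: ``struct toy_cgroup``, ``TOY_F_CREATE`` and the
``toy_storage_*`` functions are hypothetical, and both the use of the initial
value and the ``-ENOENT`` return on a missing entry are assumptions made for
illustration.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>

    #define TOY_F_CREATE 1  /* stands in for BPF_LOCAL_STORAGE_GET_F_CREATE */

    /* One cgroup with an optional storage slot (model only). */
    struct toy_cgroup {
        int has_storage;
        long storage;
    };

    /* Return the cgroup's storage. Create it, zeroed or copied from *init,
     * only when TOY_F_CREATE is set.
     */
    static long *toy_storage_get(struct toy_cgroup *cg, const long *init,
                                 int flags)
    {
        if (!cg->has_storage) {
            if (!(flags & TOY_F_CREATE))
                return NULL;
            cg->storage = init ? *init : 0;
            cg->has_storage = 1;
        }
        return &cg->storage;
    }

    /* Remove the storage. Fails if there is none (assumed error code). */
    static int toy_storage_delete(struct toy_cgroup *cg)
    {
        if (!cg->has_storage)
            return -ENOENT;
        cg->has_storage = 0;
        return 0;
    }

The model shows why callers must check the returned pointer: without the
create flag, a lookup on a cgroup that has no storage yet yields ``NULL``.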
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
A BPF program example with BPF_MAP_TYPE_CGRP_STORAGE::
|
||||||
|
|
||||||
|
#include <vmlinux.h>
|
||||||
|
#include <bpf/bpf_helpers.h>
|
||||||
|
#include <bpf/bpf_tracing.h>
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
|
||||||
|
__uint(map_flags, BPF_F_NO_PREALLOC);
|
||||||
|
__type(key, int);
|
||||||
|
__type(value, long);
|
||||||
|
} cgrp_storage SEC(".maps");
|
||||||
|
|
||||||
|
SEC("tp_btf/sys_enter")
|
||||||
|
int BPF_PROG(on_enter, struct pt_regs *regs, long id)
|
||||||
|
{
|
||||||
|
struct task_struct *task = bpf_get_current_task_btf();
|
||||||
|
long *ptr;
|
||||||
|
|
||||||
|
ptr = bpf_cgrp_storage_get(&cgrp_storage, task->cgroups->dfl_cgrp, 0,
|
||||||
|
BPF_LOCAL_STORAGE_GET_F_CREATE);
|
||||||
|
if (ptr)
|
||||||
|
__sync_fetch_and_add(ptr, 1);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
Userspace accessing map declared above::
|
||||||
|
|
||||||
|
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    /* Return the value stored for the cgroup behind cgrp_fd, or 0 if the
     * lookup fails, e.g. because the cgroup has no storage in this map.
     */
    long map_lookup(struct bpf_map *map, int cgrp_fd)
    {
            long value;

            if (bpf_map_lookup_elem(bpf_map__fd(map), &cgrp_fd, &value) < 0)
                    return 0;
            return value;
    }
|
||||||
|
|
||||||
|
Difference Between BPF_MAP_TYPE_CGRP_STORAGE and BPF_MAP_TYPE_CGROUP_STORAGE
|
||||||
|
============================================================================
|
||||||
|
|
||||||
|
The old cgroup storage map ``BPF_MAP_TYPE_CGROUP_STORAGE`` has been marked as
|
||||||
|
deprecated (renamed to ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``). The new
|
||||||
|
``BPF_MAP_TYPE_CGRP_STORAGE`` map should be used instead. The following
|
||||||
|
illustrates the main differences between ``BPF_MAP_TYPE_CGRP_STORAGE`` and
|
||||||
|
``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``.
|
||||||
|
|
||||||
|
(1). ``BPF_MAP_TYPE_CGRP_STORAGE`` can be used by all program types while
|
||||||
|
``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` is available only to cgroup program types
|
||||||
|
such as ``BPF_CGROUP_INET_INGRESS`` or ``BPF_CGROUP_SOCK_OPS``.
|
||||||
|
|
||||||
|
(2). ``BPF_MAP_TYPE_CGRP_STORAGE`` supports local storage for more than one
|
||||||
|
cgroup while ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` only supports one cgroup
|
||||||
|
which is attached by a BPF program.
|
||||||
|
|
||||||
|
(3). ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` allocates local storage at attach time so
|
||||||
|
``bpf_get_local_storage()`` always returns non-NULL local storage.
|
||||||
|
``BPF_MAP_TYPE_CGRP_STORAGE`` allocates local storage at runtime so
|
||||||
|
``bpf_cgrp_storage_get()`` may return ``NULL``.
|
||||||
|
To avoid a ``NULL`` return, user space can call
|
||||||
|
``bpf_map_update_elem()`` to pre-allocate local storage before a BPF program
|
||||||
|
is attached.
|
||||||
|
|
||||||
|
(4). ``BPF_MAP_TYPE_CGRP_STORAGE`` supports deleting local storage by a BPF program
|
||||||
|
while ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` only deletes storage during
|
||||||
|
program detach time.
|
||||||
|
|
||||||
|
So overall, ``BPF_MAP_TYPE_CGRP_STORAGE`` supports all ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``
|
||||||
|
functionality and beyond. It is recommended to use ``BPF_MAP_TYPE_CGRP_STORAGE``
|
||||||
|
instead of ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``.
|
177
Documentation/bpf/map_cpumap.rst
Normal file
|
@ -0,0 +1,177 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
===================
|
||||||
|
BPF_MAP_TYPE_CPUMAP
|
||||||
|
===================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_CPUMAP`` was introduced in kernel version 4.15
|
||||||
|
|
||||||
|
.. kernel-doc:: kernel/bpf/cpumap.c
|
||||||
|
:doc: cpu map
|
||||||
|
|
||||||
|
An example use-case for this map type is software based Receive Side Scaling (RSS).
|
||||||
|
|
||||||
|
The CPUMAP represents the CPUs in the system indexed as the map-key, and the
|
||||||
|
map-value is the config setting (per CPUMAP entry). Each CPUMAP entry has a dedicated
|
||||||
|
kernel thread bound to the given CPU to represent the remote CPU execution unit.
|
||||||
|
|
||||||
|
Starting from Linux kernel version 5.9 the CPUMAP can run a second XDP program
|
||||||
|
on the remote CPU. This allows an XDP program to split its processing across
|
||||||
|
multiple CPUs. For example, consider a scenario where the initial CPU (that sees/receives
|
||||||
|
the packets) needs to do minimal packet processing and the remote CPU (to which
|
||||||
|
the packet is directed) can afford to spend more cycles processing the frame. The
|
||||||
|
initial CPU is where the XDP redirect program is executed. The remote CPU
|
||||||
|
receives raw ``xdp_frame`` objects.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
bpf_redirect_map()
|
||||||
|
^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
|
||||||
|
|
||||||
|
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
|
||||||
|
For ``BPF_MAP_TYPE_CPUMAP`` this map contains references to CPUs.
|
||||||
|
|
||||||
|
The lower two bits of ``flags`` are used as the return code if the map lookup
|
||||||
|
fails. This is so that the return value can be one of the XDP program return
|
||||||
|
codes up to ``XDP_TX``, as chosen by the caller.
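
The fallback behaviour described above can be modelled in plain C. This is only a sketch, not kernel code: ``redirect_map_result()`` and its ``lookup_ok`` parameter are hypothetical names used for illustration, the constants mirror ``enum xdp_action`` from ``<linux/bpf.h>``, and validation of the upper flag bits is deliberately left out.

```c
#include <stdint.h>

/* Values mirror enum xdp_action in <linux/bpf.h>. */
enum xdp_action_model {
	MODEL_XDP_ABORTED = 0,
	MODEL_XDP_DROP = 1,
	MODEL_XDP_PASS = 2,
	MODEL_XDP_TX = 3,
	MODEL_XDP_REDIRECT = 4,
};

/*
 * Hypothetical model of the value returned by bpf_redirect_map().
 * lookup_ok says whether the map holds an entry at the requested key.
 * Assumption: upper flag bits are not validated in this sketch.
 */
static long redirect_map_result(int lookup_ok, uint64_t flags)
{
	if (lookup_ok)
		return MODEL_XDP_REDIRECT;
	/* Only the lower two bits select the fallback action. */
	return (long)(flags & 0x3);
}
```

Under this model, passing ``XDP_PASS`` as ``flags`` lets the packet continue up the normal network stack when the CPU entry is missing, whereas passing ``0`` yields ``XDP_ABORTED``.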
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
.. note::
|
||||||
|
CPUMAP entries can only be updated/looked up/deleted from user space and not
|
||||||
|
from an eBPF program. Trying to call these functions from a kernel eBPF
|
||||||
|
program will result in the program failing to load and a verifier warning.
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
|
||||||
|
|
||||||
|
CPU entries can be added or updated using the ``bpf_map_update_elem()``
|
||||||
|
helper. This helper replaces existing elements atomically. The ``value`` parameter
|
||||||
|
can be ``struct bpf_cpumap_val``.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct bpf_cpumap_val {
|
||||||
|
__u32 qsize; /* queue size to remote target CPU */
|
||||||
|
union {
|
||||||
|
int fd; /* prog fd on map write */
|
||||||
|
__u32 id; /* prog id on map read */
|
||||||
|
} bpf_prog;
|
||||||
|
};
|
||||||
|
|
||||||
|
The ``flags`` argument can be one of the following:
- ``BPF_ANY``: Create a new element or update an existing element.
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
- ``BPF_EXIST``: Update an existing element.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem(int fd, const void *key, void *value);
|
||||||
|
|
||||||
|
CPU entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||||
|
helper.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_delete_elem(int fd, const void *key);
|
||||||
|
|
||||||
|
CPU entries can be deleted using the ``bpf_map_delete_elem()``
|
||||||
|
helper. This helper will return 0 on success, or negative error in case of
|
||||||
|
failure.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
Kernel
|
||||||
|
------
|
||||||
|
|
||||||
|
The following code snippet shows how to declare a ``BPF_MAP_TYPE_CPUMAP`` called
|
||||||
|
``cpu_map`` and how to redirect packets to a remote CPU using a round robin scheme.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_CPUMAP);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, struct bpf_cpumap_val);
|
||||||
|
__uint(max_entries, 12);
|
||||||
|
} cpu_map SEC(".maps");
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_ARRAY);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(max_entries, 12);
|
||||||
|
} cpus_available SEC(".maps");
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(max_entries, 1);
|
||||||
|
} cpus_iterator SEC(".maps");
|
||||||
|
|
||||||
|
SEC("xdp")
|
||||||
|
int xdp_redir_cpu_round_robin(struct xdp_md *ctx)
|
||||||
|
{
|
||||||
|
__u32 key = 0;
|
||||||
|
__u32 cpu_dest = 0;
|
||||||
|
__u32 *cpu_selected, *cpu_iterator;
|
||||||
|
__u32 cpu_idx;
|
||||||
|
|
||||||
|
cpu_iterator = bpf_map_lookup_elem(&cpus_iterator, &key);
|
||||||
|
if (!cpu_iterator)
|
||||||
|
return XDP_ABORTED;
|
||||||
|
cpu_idx = *cpu_iterator;
|
||||||
|
|
||||||
|
*cpu_iterator += 1;
|
||||||
|
if (*cpu_iterator == bpf_num_possible_cpus())
|
||||||
|
*cpu_iterator = 0;
|
||||||
|
|
||||||
|
cpu_selected = bpf_map_lookup_elem(&cpus_available, &cpu_idx);
|
||||||
|
if (!cpu_selected)
|
||||||
|
return XDP_ABORTED;
|
||||||
|
cpu_dest = *cpu_selected;
|
||||||
|
|
||||||
|
if (cpu_dest >= bpf_num_possible_cpus())
|
||||||
|
return XDP_ABORTED;
|
||||||
|
|
||||||
|
return bpf_redirect_map(&cpu_map, cpu_dest, 0);
|
||||||
|
}
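
The wrap-around arithmetic of the round robin scheme can be checked in isolation, outside of BPF. The sketch below is a plain C model under stated assumptions: ``rr_next()`` and ``rr_nth()`` are hypothetical helper names, and ``ncpus`` stands in for the number of CPUs that the program above compares the iterator against.

```c
#include <stdint.h>

/* Return the current index and advance the iterator, wrapping at ncpus. */
static uint32_t rr_next(uint32_t *iter, uint32_t ncpus)
{
	uint32_t idx = *iter;

	*iter += 1;
	if (*iter == ncpus)
		*iter = 0;
	return idx;
}

/* Index handed out by the n-th call (n >= 1), starting from iterator 0. */
static uint32_t rr_nth(uint32_t ncpus, uint32_t n)
{
	uint32_t iter = 0, idx = 0;

	while (n--)
		idx = rr_next(&iter, ncpus);
	return idx;
}
```

With three CPUs the indices cycle 0, 1, 2, 0 and so on. In the BPF program the iterator lives in a per-CPU array, so each receiving CPU keeps its own position in the cycle.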
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
|
||||||
|
The following code snippet shows how to dynamically set the max_entries for a
|
||||||
|
CPUMAP to the maximum number of CPUs available on the system.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int set_max_cpu_entries(struct bpf_map *cpu_map)
|
||||||
|
{
|
||||||
|
if (bpf_map__set_max_entries(cpu_map, libbpf_num_possible_cpus()) < 0) {
|
||||||
|
fprintf(stderr, "Failed to set max entries for cpu_map map: %s\n",
|
||||||
|
strerror(errno));
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
References
|
||||||
|
===========
|
||||||
|
|
||||||
|
- https://developers.redhat.com/blog/2021/05/13/receive-side-scaling-rss-with-ebpf-and-cpumap#redirecting_into_a_cpumap
|
238
Documentation/bpf/map_devmap.rst
Normal file
|
@ -0,0 +1,238 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
=================================================
|
||||||
|
BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
|
||||||
|
=================================================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_DEVMAP`` was introduced in kernel version 4.14
|
||||||
|
- ``BPF_MAP_TYPE_DEVMAP_HASH`` was introduced in kernel version 5.4
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` are BPF maps primarily
|
||||||
|
used as backend maps for the XDP BPF helper call ``bpf_redirect_map()``.
|
||||||
|
``BPF_MAP_TYPE_DEVMAP`` is backed by an array that uses the key as
|
||||||
|
the index to look up a reference to a net device, while ``BPF_MAP_TYPE_DEVMAP_HASH``
|
||||||
|
is backed by a hash table that uses a key to look up a reference to a net device.
|
||||||
|
The user provides either <``key``/ ``ifindex``> or <``key``/ ``struct bpf_devmap_val``>
|
||||||
|
pairs to update the maps with new net devices.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- The key to a hash map doesn't have to be an ``ifindex``.
|
||||||
|
- While ``BPF_MAP_TYPE_DEVMAP_HASH`` allows for densely packing the net devices,
|
||||||
|
it comes at the cost of a hash of the key when performing a look up.
|
||||||
|
|
||||||
|
The setup and packet enqueue/send code is shared between the two types of
|
||||||
|
devmap; only the lookup and insertion are different.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
bpf_redirect_map()
|
||||||
|
^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
|
||||||
|
|
||||||
|
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
|
||||||
|
For ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` this map contains
|
||||||
|
references to net devices (for forwarding packets through other ports).
|
||||||
|
|
||||||
|
The lower two bits of ``flags`` are used as the return code if the map lookup
|
||||||
|
fails. This is so that the return value can be one of the XDP program return
|
||||||
|
codes up to ``XDP_TX``, as chosen by the caller. The higher bits of ``flags``
|
||||||
|
can be set to ``BPF_F_BROADCAST`` or ``BPF_F_EXCLUDE_INGRESS`` as defined
|
||||||
|
below.
|
||||||
|
|
||||||
|
With ``BPF_F_BROADCAST`` the packet will be broadcast to all the interfaces
|
||||||
|
in the map, with ``BPF_F_EXCLUDE_INGRESS`` the ingress interface will be excluded
|
||||||
|
from the broadcast.
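
The effect of the two flags can be illustrated with a small user space model. This is a sketch under stated assumptions rather than the kernel implementation: ``broadcast_targets()`` and ``example_devs`` are hypothetical names, an ``ifindex`` of 0 is used here to mark an empty slot, and the flag values mirror the definitions in ``<linux/bpf.h>``.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirror the bpf_redirect_map() flag definitions in <linux/bpf.h>. */
#define MODEL_BPF_F_BROADCAST		(1ULL << 3)
#define MODEL_BPF_F_EXCLUDE_INGRESS	(1ULL << 4)

/*
 * Hypothetical model: count how many devices in devs[] would receive a copy.
 * Assumption: an ifindex of 0 marks an empty slot.
 */
static size_t broadcast_targets(const uint32_t *devs, size_t n,
				uint32_t ingress_ifindex, uint64_t flags)
{
	size_t i, count = 0;

	if (!(flags & MODEL_BPF_F_BROADCAST))
		return 0; /* not a broadcast; the key selects a single entry */

	for (i = 0; i < n; i++) {
		if (devs[i] == 0)
			continue;
		if ((flags & MODEL_BPF_F_EXCLUDE_INGRESS) &&
		    devs[i] == ingress_ifindex)
			continue;
		count++;
	}
	return count;
}

/* Example map contents: interfaces 2, 3 and 5, plus one empty slot. */
static const uint32_t example_devs[] = { 2, 3, 0, 5 };
```

For a packet that arrived on interface 3, the model reports three targets with ``BPF_F_BROADCAST`` alone and two targets once ``BPF_F_EXCLUDE_INGRESS`` is added.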
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- The key is ignored if ``BPF_F_BROADCAST`` is set.
|
||||||
|
- The broadcast feature can also be used to implement multicast forwarding:
|
||||||
|
simply create multiple DEVMAPs, each one corresponding to a single multicast group.
|
||||||
|
|
||||||
|
This helper will return ``XDP_REDIRECT`` on success, or the value of the two
|
||||||
|
lower bits of the ``flags`` argument if the map lookup fails.
|
||||||
|
|
||||||
|
More information about redirection can be found in :doc:`redirect`.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||||
|
helper.
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
.. note::
|
||||||
|
DEVMAP entries can only be updated/deleted from user space and not
|
||||||
|
from an eBPF program. Trying to call these functions from a kernel eBPF
|
||||||
|
program will result in the program failing to load and a verifier warning.
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags);
|
||||||
|
|
||||||
|
Net device entries can be added or updated using the ``bpf_map_update_elem()``
|
||||||
|
helper. This helper replaces existing elements atomically. The ``value`` parameter
|
||||||
|
can be ``struct bpf_devmap_val`` or a simple ``int ifindex`` for backwards
|
||||||
|
compatibility.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct bpf_devmap_val {
|
||||||
|
__u32 ifindex; /* device index */
|
||||||
|
union {
|
||||||
|
int fd; /* prog fd on map write */
|
||||||
|
__u32 id; /* prog id on map read */
|
||||||
|
} bpf_prog;
|
||||||
|
};
|
||||||
|
|
||||||
|
The ``flags`` argument can be one of the following:
|
||||||
|
- ``BPF_ANY``: Create a new element or update an existing element.
|
||||||
|
- ``BPF_NOEXIST``: Create a new element only if it did not exist.
|
||||||
|
- ``BPF_EXIST``: Update an existing element.
|
||||||
|
|
||||||
|
DEVMAPs can associate a program with a device entry by adding a ``bpf_prog.fd``
|
||||||
|
to ``struct bpf_devmap_val``. Programs are run after ``XDP_REDIRECT`` and have
|
||||||
|
access to both Rx device and Tx device. The program associated with the ``fd``
|
||||||
|
must have type XDP with expected attach type ``xdp_devmap``.
|
||||||
|
When a program is associated with a device index, the program is run on an
|
||||||
|
``XDP_REDIRECT`` and before the buffer is added to the per-cpu queue. Examples
|
||||||
|
of how to attach/use xdp_devmap progs can be found in the kernel selftests:
|
||||||
|
|
||||||
|
- ``tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c``
|
||||||
|
- ``tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c``
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem(int fd, const void *key, void *value);
|
||||||
|
|
||||||
|
Net device entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||||
|
helper.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_delete_elem(int fd, const void *key);
|
||||||
|
|
||||||
|
Net device entries can be deleted using the ``bpf_map_delete_elem()``
|
||||||
|
helper. This helper will return 0 on success, or negative error in case of
|
||||||
|
failure.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP``
|
||||||
|
called tx_port.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_DEVMAP);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(max_entries, 256);
|
||||||
|
} tx_port SEC(".maps");
|
||||||
|
|
||||||
|
The following code snippet shows how to declare a ``BPF_MAP_TYPE_DEVMAP_HASH``
|
||||||
|
called forward_map.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, struct bpf_devmap_val);
|
||||||
|
__uint(max_entries, 32);
|
||||||
|
} forward_map SEC(".maps");
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
The value type in the DEVMAP above is a ``struct bpf_devmap_val``
|
||||||
|
|
||||||
|
The following code snippet shows a simple xdp_redirect_map program. This program
|
||||||
|
would work with a user space program that populates the devmap ``forward_map`` based
|
||||||
|
on ingress ifindexes. The BPF program (below) is redirecting packets using the
|
||||||
|
ingress ``ifindex`` as the ``key``.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
SEC("xdp")
|
||||||
|
int xdp_redirect_map_func(struct xdp_md *ctx)
|
||||||
|
{
|
||||||
|
int index = ctx->ingress_ifindex;
|
||||||
|
|
||||||
|
return bpf_redirect_map(&forward_map, index, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
The following code snippet shows a BPF program that is broadcasting packets to
|
||||||
|
all the interfaces in the ``tx_port`` devmap.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
SEC("xdp")
|
||||||
|
int xdp_redirect_map_func(struct xdp_md *ctx)
|
||||||
|
{
|
||||||
|
return bpf_redirect_map(&tx_port, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
|
||||||
|
}
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
|
||||||
|
The following code snippet shows how to update a devmap called ``tx_port``.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int update_devmap(int ifindex, int redirect_ifindex)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
ret = bpf_map_update_elem(bpf_map__fd(tx_port), &ifindex, &redirect_ifindex, 0);
|
||||||
|
if (ret < 0) {
|
||||||
|
fprintf(stderr, "Failed to update devmap value: %s\n",
|
||||||
|
strerror(errno));
|
||||||
|
}
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
The following code snippet shows how to update a ``BPF_MAP_TYPE_DEVMAP_HASH`` map called ``forward_map``.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int update_devmap(int ifindex, int redirect_ifindex)
|
||||||
|
{
|
||||||
|
struct bpf_devmap_val devmap_val = { .ifindex = redirect_ifindex };
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
ret = bpf_map_update_elem(bpf_map__fd(forward_map), &ifindex, &devmap_val, 0);
|
||||||
|
if (ret < 0) {
|
||||||
|
fprintf(stderr, "Failed to update devmap value: %s\n",
|
||||||
|
strerror(errno));
|
||||||
|
}
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
References
|
||||||
|
===========
|
||||||
|
|
||||||
|
- https://lwn.net/Articles/728146/
|
||||||
|
- https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6f9d451ab1a33728adb72d7ff66a7b374d665176
|
||||||
|
- https://elixir.bootlin.com/linux/latest/source/net/core/filter.c#L4106
|
|
@ -34,7 +34,14 @@ the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``.
|
||||||
Usage
|
Usage
|
||||||
=====
|
=====
|
||||||
|
|
||||||
.. c:function::
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||||
|
|
||||||
Hash entries can be added or updated using the ``bpf_map_update_elem()``
|
Hash entries can be added or updated using the ``bpf_map_update_elem()``
|
||||||
|
@ -49,14 +56,22 @@ parameter can be used to control the update behaviour:
|
||||||
``bpf_map_update_elem()`` returns 0 on success, or negative error in
|
``bpf_map_update_elem()`` returns 0 on success, or negative error in
|
||||||
case of failure.
|
case of failure.
|
||||||
|
|
||||||
.. c:function::
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
|
Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
|
||||||
helper. This helper returns a pointer to the value associated with
|
helper. This helper returns a pointer to the value associated with
|
||||||
``key``, or ``NULL`` if no entry was found.
|
``key``, or ``NULL`` if no entry was found.
|
||||||
|
|
||||||
.. c:function::
|
bpf_map_delete_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
|
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
Hash entries can be deleted using the ``bpf_map_delete_elem()``
|
Hash entries can be deleted using the ``bpf_map_delete_elem()``
|
||||||
|
@ -70,7 +85,11 @@ For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
|
||||||
the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
|
the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
|
||||||
automatically access the hash slot for the current CPU.
|
automatically access the hash slot for the current CPU.
|
||||||
|
|
||||||
.. c:function::
|
bpf_map_lookup_percpu_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
|
void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)
|
||||||
|
|
||||||
The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
|
The ``bpf_map_lookup_percpu_elem()`` helper can be used to lookup the
|
||||||
|
@ -89,7 +108,11 @@ See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
|
||||||
Userspace
|
Userspace
|
||||||
---------
|
---------
|
||||||
|
|
||||||
.. c:function::
|
bpf_map_get_next_key()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
|
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
|
||||||
|
|
||||||
In userspace, it is possible to iterate through the keys of a hash using
|
In userspace, it is possible to iterate through the keys of a hash using
|
||||||
|
|
197
Documentation/bpf/map_lpm_trie.rst
Normal file
|
@ -0,0 +1,197 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
=====================
|
||||||
|
BPF_MAP_TYPE_LPM_TRIE
|
||||||
|
=====================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_LPM_TRIE`` was introduced in kernel version 4.11
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_LPM_TRIE`` provides a longest prefix match algorithm that
|
||||||
|
can be used to match IP addresses to a stored set of prefixes.
|
||||||
|
Internally, data is stored in an unbalanced trie of nodes that uses
|
||||||
|
``prefixlen,data`` pairs as its keys. The ``data`` is interpreted in
|
||||||
|
network byte order, i.e. big endian, so ``data[0]`` stores the most
|
||||||
|
significant byte.
|
||||||
|
|
||||||
|
LPM tries may be created with a maximum prefix length that is a multiple
|
||||||
|
of 8, in the range from 8 to 2048. The key used for lookup and update
|
||||||
|
operations is a ``struct bpf_lpm_trie_key``, extended by
|
||||||
|
``max_prefixlen/8`` bytes.
|
||||||
|
|
||||||
|
- For IPv4 addresses the data length is 4 bytes
|
||||||
|
- For IPv6 addresses the data length is 16 bytes
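
The key size that follows from these rules can be computed as shown below. This is a minimal sketch: ``lpm_key_size()`` is a hypothetical helper, and it assumes the key consists of a 32-bit ``prefixlen`` field followed directly by ``max_prefixlen/8`` data bytes, as described above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Size in bytes of an LPM trie key for a given maximum prefix length:
 * a 32-bit prefixlen field followed by max_prefixlen / 8 data bytes.
 * Returns 0 if max_prefixlen is not a multiple of 8 in the range 8..2048.
 */
static size_t lpm_key_size(uint32_t max_prefixlen)
{
	if (max_prefixlen < 8 || max_prefixlen > 2048 || max_prefixlen % 8)
		return 0;
	return sizeof(uint32_t) + max_prefixlen / 8;
}
```

An IPv4 key therefore occupies 8 bytes and an IPv6 key 20 bytes.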
|
||||||
|
|
||||||
|
The value type stored in the LPM trie can be any user defined type.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
When creating a map of type ``BPF_MAP_TYPE_LPM_TRIE`` you must set the
|
||||||
|
``BPF_F_NO_PREALLOC`` flag.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
The longest prefix entry for a given data value can be found using the
|
||||||
|
``bpf_map_lookup_elem()`` helper. This helper returns a pointer to the
|
||||||
|
value associated with the longest matching ``key``, or ``NULL`` if no
|
||||||
|
entry was found.
|
||||||
|
|
||||||
|
The ``key`` should have ``prefixlen`` set to ``max_prefixlen`` when
|
||||||
|
performing longest prefix lookups. For example, when searching for the
|
||||||
|
longest prefix match for an IPv4 address, ``prefixlen`` should be set to
|
||||||
|
``32``.
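
The matching rule itself can be sketched without a trie. The code below is an illustrative model only: it uses a linear scan instead of the kernel's trie, all names (``prefix_entry``, ``lpm_lookup()``, ``IP4()``, ``example_tbl``) are hypothetical, and addresses are held as host integers with the most significant byte first, whereas a real key stores ``data`` in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

struct prefix_entry {
	uint32_t prefixlen;	/* number of significant leading bits, 0..32 */
	uint32_t addr;		/* IPv4 address, most significant byte first */
	int value;
};

/* Does addr fall inside entry e? Compare only the leading prefixlen bits. */
static int prefix_matches(const struct prefix_entry *e, uint32_t addr)
{
	uint32_t mask = e->prefixlen ? ~0u << (32 - e->prefixlen) : 0;

	return (addr & mask) == (e->addr & mask);
}

/* Linear scan that keeps the matching entry with the longest prefix. */
static int lpm_lookup(const struct prefix_entry *tbl, size_t n, uint32_t addr)
{
	const struct prefix_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (prefix_matches(&tbl[i], addr) &&
		    (!best || tbl[i].prefixlen > best->prefixlen))
			best = &tbl[i];
	return best ? best->value : -1; /* -1 models a NULL lookup result */
}

#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Example table: 10.0.0.0/8 maps to 1, 10.1.0.0/16 maps to 2. */
static const struct prefix_entry example_tbl[] = {
	{ 8,  IP4(10, 0, 0, 0), 1 },
	{ 16, IP4(10, 1, 0, 0), 2 },
};
```

Looking up 10.1.2.3 selects the ``/16`` entry because it is more specific than the ``/8`` entry that also matches, while 192.168.0.1 matches nothing.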
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
|
||||||
|
|
||||||
|
Prefix entries can be added or updated using the ``bpf_map_update_elem()``
|
||||||
|
helper. This helper replaces existing elements atomically.
|
||||||
|
|
||||||
|
``bpf_map_update_elem()`` returns ``0`` on success, or negative error in
|
||||||
|
case of failure.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
The ``flags`` parameter must be one of ``BPF_ANY``, ``BPF_NOEXIST`` or ``BPF_EXIST``,
but the value is ignored, giving ``BPF_ANY`` semantics.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_delete_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Prefix entries can be deleted using the ``bpf_map_delete_elem()``
|
||||||
|
helper. This helper will return 0 on success, or negative error in case
|
||||||
|
of failure.
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
Access from userspace uses libbpf APIs with the same names as above, with
|
||||||
|
the map identified by ``fd``.
|
||||||
|
|
||||||
|
bpf_map_get_next_key()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)
|
||||||
|
|
||||||
|
A userspace program can iterate through the entries in an LPM trie using
|
||||||
|
libbpf's ``bpf_map_get_next_key()`` function. The first key can be
|
||||||
|
fetched by calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
|
||||||
|
``NULL``. Subsequent calls will fetch the next key that follows the
|
||||||
|
current key. ``bpf_map_get_next_key()`` returns ``0`` on success,
|
||||||
|
``-ENOENT`` if ``cur_key`` is the last key in the trie, or negative
|
||||||
|
error in case of failure.
|
||||||
|
|
||||||
|
``bpf_map_get_next_key()`` will iterate through the LPM trie elements
|
||||||
|
from leftmost leaf first. This means that iteration will return more
|
||||||
|
specific keys before less specific ones.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Please see ``tools/testing/selftests/bpf/test_lpm_map.c`` for examples
|
||||||
|
of LPM trie usage from userspace. The code snippets below demonstrate
|
||||||
|
API usage.
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
The following BPF code snippet shows how to declare a new LPM trie for IPv4
|
||||||
|
address prefixes:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
#include <linux/bpf.h>
|
||||||
|
#include <bpf/bpf_helpers.h>
|
||||||
|
|
||||||
|
struct ipv4_lpm_key {
|
||||||
|
__u32 prefixlen;
|
||||||
|
__u32 data;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_LPM_TRIE);
|
||||||
|
__type(key, struct ipv4_lpm_key);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(map_flags, BPF_F_NO_PREALLOC);
|
||||||
|
__uint(max_entries, 255);
|
||||||
|
} ipv4_lpm_map SEC(".maps");
|
||||||
|
|
||||||
|
The following BPF code snippet shows how to look up an entry by IPv4 address:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *lookup(__u32 ipaddr)
|
||||||
|
{
|
||||||
|
struct ipv4_lpm_key key = {
|
||||||
|
.prefixlen = 32,
|
||||||
|
.data = ipaddr
|
||||||
|
};
|
||||||
|
|
||||||
|
return bpf_map_lookup_elem(&ipv4_lpm_map, &key);
|
||||||
|
}
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
The following snippet shows how to insert an IPv4 prefix entry into an
|
||||||
|
LPM trie:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int add_prefix_entry(int lpm_fd, __u32 addr, __u32 prefixlen, struct value *value)
|
||||||
|
{
|
||||||
|
struct ipv4_lpm_key ipv4_key = {
|
||||||
|
.prefixlen = prefixlen,
|
||||||
|
.data = addr
|
||||||
|
};
|
||||||
|
return bpf_map_update_elem(lpm_fd, &ipv4_key, value, BPF_ANY);
|
||||||
|
}
|
||||||
|
|
||||||
|
The following snippet shows a userspace program walking through the entries
|
||||||
|
of an LPM trie:
|
||||||
|
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
#include <bpf/libbpf.h>
|
||||||
|
#include <bpf/bpf.h>
|
||||||
|
|
||||||
|
void iterate_lpm_trie(int map_fd)
|
||||||
|
{
|
||||||
|
struct ipv4_lpm_key *cur_key = NULL;
|
||||||
|
struct ipv4_lpm_key next_key;
|
||||||
|
struct value value;
|
||||||
|
int err;
|
||||||
|
|
||||||
|
for (;;) {
|
||||||
|
err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
|
||||||
|
if (err)
|
||||||
|
break;
|
||||||
|
|
||||||
|
bpf_map_lookup_elem(map_fd, &next_key, &value);
|
||||||
|
|
||||||
|
/* Use key and value here */
|
||||||
|
|
||||||
|
cur_key = &next_key;
|
||||||
|
}
|
||||||
|
}
|
130
Documentation/bpf/map_of_maps.rst
Normal file
|
@ -0,0 +1,130 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
========================================================
|
||||||
|
BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS
|
||||||
|
========================================================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_ARRAY_OF_MAPS`` and ``BPF_MAP_TYPE_HASH_OF_MAPS`` were
|
||||||
|
introduced in kernel version 4.12
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_ARRAY_OF_MAPS`` and ``BPF_MAP_TYPE_HASH_OF_MAPS`` provide general
|
||||||
|
purpose support for map in map storage. One level of nesting is supported, where
|
||||||
|
an outer map contains instances of a single type of inner map, for example
|
||||||
|
``array_of_maps->sock_map``.
|
||||||
|
|
||||||
|
When creating an outer map, an inner map instance is used to initialize the
|
||||||
|
metadata that the outer map holds about its inner maps. This inner map has a
|
||||||
|
separate lifetime from the outer map and can be deleted after the outer map has
|
||||||
|
been created.
|
||||||
|
|
||||||
|
The outer map supports element lookup, update and delete from user space using
|
||||||
|
the syscall API. A BPF program is only allowed to do element lookup in the outer
|
||||||
|
map.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- Multi-level nesting is not supported.
|
||||||
|
- Any BPF map type can be used as an inner map, except for
|
||||||
|
``BPF_MAP_TYPE_PROG_ARRAY``.
|
||||||
|
- A BPF program cannot update or delete outer map entries.
|
||||||
|
|
||||||
|
For ``BPF_MAP_TYPE_ARRAY_OF_MAPS`` the key is an unsigned 32-bit integer index
|
||||||
|
into the array. The array is a fixed size with ``max_entries`` elements that are
|
||||||
|
zero initialized when created.
|
||||||
|
|
||||||
|
For ``BPF_MAP_TYPE_HASH_OF_MAPS`` the key type can be chosen when defining the
|
||||||
|
map. The kernel is responsible for allocating and freeing key/value pairs, up to
|
||||||
|
the max_entries limit that you specify. Hash maps use pre-allocation of hash
|
||||||
|
table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be used to disable
|
||||||
|
pre-allocation when it is too memory expensive.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF Helper
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
Inner maps can be retrieved using the ``bpf_map_lookup_elem()`` helper. This
|
||||||
|
helper returns a pointer to the inner map, or ``NULL`` if no entry was found.
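
The lookup contract can be pictured with a small plain C model of an array-based outer map. Everything here is a hypothetical stand-in for illustration (``inner_model``, ``outer_model``, ``outer_lookup()`` and the example objects); it is not how the kernel stores inner maps.

```c
#include <stddef.h>
#include <stdint.h>

#define OUTER_MAX_ENTRIES 2

struct inner_model {
	uint32_t id;	/* stand-in for whatever the inner map stores */
};

struct outer_model {
	/* Fixed size and zero initialised: empty slots are NULL. */
	struct inner_model *slots[OUTER_MAX_ENTRIES];
};

/* Model of a lookup in the outer map: NULL if out of range or empty. */
static struct inner_model *outer_lookup(struct outer_model *outer, uint32_t key)
{
	if (key >= OUTER_MAX_ENTRIES)
		return NULL;
	return outer->slots[key];
}

/* Example: slot 0 holds an inner map, slot 1 is still empty. */
static struct inner_model example_inner = { 42 };
static struct outer_model example_outer = { { &example_inner, NULL } };
```

As in the model, a BPF program should check the result for ``NULL`` before using the inner map, since a slot may be empty until user space populates it.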
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Kernel BPF Example
|
||||||
|
------------------
|
||||||
|
|
||||||
|
This snippet shows how to create and initialise an array of devmaps in a BPF
|
||||||
|
program. Note that the outer array can only be modified from user space using
|
||||||
|
the syscall API.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct inner_map {
|
||||||
|
__uint(type, BPF_MAP_TYPE_DEVMAP);
|
||||||
|
__uint(max_entries, 10);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, __u32);
|
||||||
|
} inner_map1 SEC(".maps"), inner_map2 SEC(".maps");
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
|
||||||
|
__uint(max_entries, 2);
|
||||||
|
__type(key, __u32);
|
||||||
|
__array(values, struct inner_map);
|
||||||
|
} outer_map SEC(".maps") = {
|
||||||
|
.values = { &inner_map1,
|
||||||
|
&inner_map2 }
|
||||||
|
};
|
||||||
|
|
||||||
|
See ``progs/test_btf_map_in_map.c`` in ``tools/testing/selftests/bpf`` for more
|
||||||
|
examples of declarative initialisation of outer maps.
|
||||||
|
|
||||||
|
User Space
|
||||||
|
----------
|
||||||
|
|
||||||
|
This snippet shows how to create an array based outer map:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int create_outer_array(int inner_fd) {
|
||||||
|
LIBBPF_OPTS(bpf_map_create_opts, opts, .inner_map_fd = inner_fd);
|
||||||
|
int fd;
|
||||||
|
|
||||||
|
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY_OF_MAPS,
|
||||||
|
"example_array", /* name */
|
||||||
|
sizeof(__u32), /* key size */
|
||||||
|
sizeof(__u32), /* value size */
|
||||||
|
256, /* max entries */
|
||||||
|
&opts); /* create opts */
|
||||||
|
return fd;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
This snippet shows how to add an inner map to an outer map:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int add_devmap(int outer_fd, int index, const char *name) {
|
||||||
|
int fd;
|
||||||
|
|
||||||
|
fd = bpf_map_create(BPF_MAP_TYPE_DEVMAP, name,
|
||||||
|
sizeof(__u32), sizeof(__u32), 256, NULL);
|
||||||
|
if (fd < 0)
|
||||||
|
return fd;
|
||||||
|
|
||||||
|
return bpf_map_update_elem(outer_fd, &index, &fd, BPF_ANY);
|
||||||
|
}
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
- https://lore.kernel.org/netdev/20170322170035.923581-3-kafai@fb.com/
|
||||||
|
- https://lore.kernel.org/netdev/20170322170035.923581-4-kafai@fb.com/
|
146
Documentation/bpf/map_queue_stack.rst
Normal file
|
@ -0,0 +1,146 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
=========================================
|
||||||
|
BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK
|
||||||
|
=========================================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_QUEUE`` and ``BPF_MAP_TYPE_STACK`` were introduced
|
||||||
|
in kernel version 4.20
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_QUEUE`` provides FIFO storage and ``BPF_MAP_TYPE_STACK``
|
||||||
|
provides LIFO storage for BPF programs. These maps support peek, pop and
|
||||||
|
push operations that are exposed to BPF programs through the respective
|
||||||
|
helpers. These operations are exposed to userspace applications using
|
||||||
|
the existing ``bpf`` syscall in the following way:
|
||||||
|
|
||||||
|
- ``BPF_MAP_LOOKUP_ELEM`` -> peek
|
||||||
|
- ``BPF_MAP_LOOKUP_AND_DELETE_ELEM`` -> pop
|
||||||
|
- ``BPF_MAP_UPDATE_ELEM`` -> push
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_QUEUE`` and ``BPF_MAP_TYPE_STACK`` do not support
|
||||||
|
``BPF_F_NO_PREALLOC``.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_map_push_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
|
||||||
|
|
||||||
|
An element ``value`` can be added to a queue or stack using the
|
||||||
|
``bpf_map_push_elem`` helper. The ``flags`` parameter must be set to
|
||||||
|
``BPF_ANY`` or ``BPF_EXIST``. If ``flags`` is set to ``BPF_EXIST`` then,
|
||||||
|
when the queue or stack is full, the oldest element will be removed to
|
||||||
|
make room for ``value`` to be added. Returns ``0`` on success, or
|
||||||
|
negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_peek_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_peek_elem(struct bpf_map *map, void *value)
|
||||||
|
|
||||||
|
This helper fetches an element ``value`` from a queue or stack without
|
||||||
|
removing it. Returns ``0`` on success, or negative error in case of
|
||||||
|
failure.
|
||||||
|
|
||||||
|
bpf_map_pop_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_map_pop_elem(struct bpf_map *map, void *value)
|
||||||
|
|
||||||
|
This helper removes an element from a queue or stack and copies it into
``value``. Returns ``0`` on success, or negative error in case of failure.
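The push, peek and pop semantics described above can be illustrated with a small user-space model of a bounded FIFO. This is only a sketch, not the kernel implementation: ``struct model_queue``, the ``model_*`` functions and the ``MODEL_*`` constants are hypothetical names, the flag values merely mirror ``BPF_ANY`` and ``BPF_EXIST``, and the specific error codes (``-E2BIG`` when full, ``-ENOENT`` when empty) are assumptions of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values mirror BPF_ANY and BPF_EXIST. */
#define MODEL_ANY   0
#define MODEL_EXIST 2
#define MODEL_CAP   4	/* stands in for max_entries */

struct model_queue {
	uint32_t slots[MODEL_CAP];
	size_t head;	/* index of the oldest element */
	size_t count;	/* number of stored elements */
};

/*
 * Push a value. When the queue is full, MODEL_ANY fails, while
 * MODEL_EXIST drops the oldest element to make room.
 */
static int model_push(struct model_queue *q, uint32_t value, uint64_t flags)
{
	assert(q != NULL);

	if (flags != MODEL_ANY && flags != MODEL_EXIST)
		return -EINVAL;

	if (q->count == MODEL_CAP) {
		if (flags != MODEL_EXIST)
			return -E2BIG;	/* assumed error code for "full" */
		q->head = (q->head + 1) % MODEL_CAP;	/* drop the oldest */
		q->count--;
	}

	q->slots[(q->head + q->count) % MODEL_CAP] = value;
	q->count++;
	return 0;
}

/* Copy the oldest element into *value without removing it. */
static int model_peek(const struct model_queue *q, uint32_t *value)
{
	assert(q != NULL && value != NULL);

	if (q->count == 0)
		return -ENOENT;	/* assumed error code for "empty" */

	*value = q->slots[q->head];
	return 0;
}

/* Copy the oldest element into *value and remove it. */
static int model_pop(struct model_queue *q, uint32_t *value)
{
	int err = model_peek(q, value);

	if (err)
		return err;

	q->head = (q->head + 1) % MODEL_CAP;
	q->count--;
	return 0;
}
```

A stack differs from this model only in that peek and pop would return the newest element rather than the oldest.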
|
||||||
|
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem (int fd, const void *key, const void *value, __u64 flags)
|
||||||
|
|
||||||
|
A userspace program can push ``value`` onto a queue or stack using libbpf's
|
||||||
|
``bpf_map_update_elem`` function. The ``key`` parameter must be set to
|
||||||
|
``NULL`` and ``flags`` must be set to ``BPF_ANY`` or ``BPF_EXIST``, with the
|
||||||
|
same semantics as the ``bpf_map_push_elem`` kernel helper. Returns ``0`` on
|
||||||
|
success, or negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem (int fd, const void *key, void *value)
|
||||||
|
|
||||||
|
A userspace program can peek at the ``value`` at the head of a queue or stack
|
||||||
|
using the libbpf ``bpf_map_lookup_elem`` function. The ``key`` parameter must be
|
||||||
|
set to ``NULL``. Returns ``0`` on success, or negative error in case of
|
||||||
|
failure.
|
||||||
|
|
||||||
|
bpf_map_lookup_and_delete_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_and_delete_elem (int fd, const void *key, void *value)
|
||||||
|
|
||||||
|
A userspace program can pop a ``value`` from the head of a queue or stack using
|
||||||
|
the libbpf ``bpf_map_lookup_and_delete_elem`` function. The ``key`` parameter
|
||||||
|
must be set to ``NULL``. Returns ``0`` on success, or negative error in case of
|
||||||
|
failure.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
This snippet shows how to declare a queue in a BPF program:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_QUEUE);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(max_entries, 10);
|
||||||
|
} queue SEC(".maps");
|
||||||
|
|
||||||
|
|
||||||
|
Userspace
|
||||||
|
---------
|
||||||
|
|
||||||
|
This snippet shows how to use libbpf's low-level API to create a queue from
|
||||||
|
userspace:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int create_queue()
|
||||||
|
{
|
||||||
|
return bpf_map_create(BPF_MAP_TYPE_QUEUE,
|
||||||
|
"sample_queue", /* name */
|
||||||
|
0, /* key size, must be zero */
|
||||||
|
sizeof(__u32), /* value size */
|
||||||
|
10, /* max entries */
|
||||||
|
NULL); /* create options */
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
https://lwn.net/ml/netdev/153986858555.9127.14517764371945179514.stgit@kernel/
|
155
Documentation/bpf/map_sk_storage.rst
Normal file
|
@ -0,0 +1,155 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
=======================
|
||||||
|
BPF_MAP_TYPE_SK_STORAGE
|
||||||
|
=======================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_SK_STORAGE`` was introduced in kernel version 5.2
|
||||||
|
|
||||||
|
``BPF_MAP_TYPE_SK_STORAGE`` is used to provide socket-local storage for BPF
|
||||||
|
programs. A map of type ``BPF_MAP_TYPE_SK_STORAGE`` declares the type of storage
|
||||||
|
to be provided and acts as the handle for accessing the socket-local
|
||||||
|
storage. The values for maps of type ``BPF_MAP_TYPE_SK_STORAGE`` are stored
|
||||||
|
locally with each socket instead of with the map. The kernel is responsible for
|
||||||
|
allocating storage for a socket when requested and for freeing the storage when
|
||||||
|
either the map or the socket is deleted.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- The key type must be ``int`` and ``max_entries`` must be set to ``0``.
|
||||||
|
- The ``BPF_F_NO_PREALLOC`` flag must be used when creating a map for
|
||||||
|
socket-local storage.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_sk_storage_get()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
|
||||||
|
|
||||||
|
Socket-local storage can be retrieved using the ``bpf_sk_storage_get()``
|
||||||
|
helper. The helper gets the storage from ``sk`` that is associated with ``map``.
|
||||||
|
If the ``BPF_LOCAL_STORAGE_GET_F_CREATE`` flag is used then
|
||||||
|
``bpf_sk_storage_get()`` will create the storage for ``sk`` if it does not
|
||||||
|
already exist. ``value`` can be used together with
|
||||||
|
``BPF_LOCAL_STORAGE_GET_F_CREATE`` to initialize the storage value, otherwise it
|
||||||
|
will be zero initialized. Returns a pointer to the storage on success, or
|
||||||
|
``NULL`` in case of failure.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``sk`` is a kernel ``struct sock`` pointer for LSM or tracing programs.
|
||||||
|
- ``sk`` is a ``struct bpf_sock`` pointer for other program types.
|
||||||
|
|
||||||
|
bpf_sk_storage_delete()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
|
||||||
|
|
||||||
|
Socket-local storage can be deleted using the ``bpf_sk_storage_delete()``
|
||||||
|
helper. The helper deletes the storage from ``sk`` that is identified by
|
||||||
|
``map``. Returns ``0`` on success, or negative error in case of failure.
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem(int map_fd, const void *key, const void *value, __u64 flags)
|
||||||
|
|
||||||
|
Socket-local storage for the socket identified by ``key`` belonging to
|
||||||
|
``map_fd`` can be added or updated using the ``bpf_map_update_elem()`` libbpf
|
||||||
|
function. ``key`` must be a pointer to a valid ``fd`` in the user space
|
||||||
|
program. The ``flags`` parameter can be used to control the update behaviour:
|
||||||
|
|
||||||
|
- ``BPF_ANY`` will create storage for ``fd`` or update existing storage.
|
||||||
|
- ``BPF_NOEXIST`` will create storage for ``fd`` only if it did not already
|
||||||
|
exist, otherwise the call will fail with ``-EEXIST``.
|
||||||
|
- ``BPF_EXIST`` will update existing storage for ``fd`` if it already exists,
|
||||||
|
otherwise the call will fail with ``-ENOENT``.
|
||||||
|
|
||||||
|
Returns ``0`` on success, or negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem(int map_fd, const void *key, void *value)
|
||||||
|
|
||||||
|
Socket-local storage for the socket identified by ``key`` belonging to
|
||||||
|
``map_fd`` can be retrieved using the ``bpf_map_lookup_elem()`` libbpf
|
||||||
|
function. ``key`` must be a pointer to a valid ``fd`` in the user space
|
||||||
|
program. Returns ``0`` on success, or negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_delete_elem(int map_fd, const void *key)
|
||||||
|
|
||||||
|
Socket-local storage for the socket identified by ``key`` belonging to
|
||||||
|
``map_fd`` can be deleted using the ``bpf_map_delete_elem()`` libbpf
|
||||||
|
function. Returns ``0`` on success, or negative error in case of failure.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
|
||||||
|
This snippet shows how to declare socket-local storage in a BPF program:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_SK_STORAGE);
|
||||||
|
__uint(map_flags, BPF_F_NO_PREALLOC);
|
||||||
|
__type(key, int);
|
||||||
|
__type(value, struct my_storage);
|
||||||
|
} socket_storage SEC(".maps");
|
||||||
|
|
||||||
|
This snippet shows how to retrieve socket-local storage in a BPF program:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
SEC("sockops")
|
||||||
|
int _sockops(struct bpf_sock_ops *ctx)
|
||||||
|
{
|
||||||
|
struct my_storage *storage;
|
||||||
|
struct bpf_sock *sk;
|
||||||
|
|
||||||
|
sk = ctx->sk;
|
||||||
|
if (!sk)
|
||||||
|
return 1;
|
||||||
|
|
||||||
|
storage = bpf_sk_storage_get(&socket_storage, sk, 0,
|
||||||
|
BPF_LOCAL_STORAGE_GET_F_CREATE);
|
||||||
|
if (!storage)
|
||||||
|
return 1;
|
||||||
|
|
||||||
|
/* Use 'storage' here */
|
||||||
|
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
Please see the ``tools/testing/selftests/bpf`` directory for functional
|
||||||
|
examples.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
https://lwn.net/ml/netdev/20190426171103.61892-1-kafai@fb.com/
|
192
Documentation/bpf/map_xskmap.rst
Normal file
|
@ -0,0 +1,192 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
===================
|
||||||
|
BPF_MAP_TYPE_XSKMAP
|
||||||
|
===================
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
- ``BPF_MAP_TYPE_XSKMAP`` was introduced in kernel version 4.18
|
||||||
|
|
||||||
|
The ``BPF_MAP_TYPE_XSKMAP`` is used as a backend map for the XDP BPF helper
``bpf_redirect_map()`` and the ``XDP_REDIRECT`` action, like 'devmap' and 'cpumap'.
|
||||||
|
This map type redirects raw XDP frames to `AF_XDP`_ sockets (XSKs), a new type of
|
||||||
|
address family in the kernel that allows redirection of frames from a driver to
|
||||||
|
user space without having to traverse the full network stack. An AF_XDP socket
|
||||||
|
binds to a single netdev queue. A mapping of XSKs to queues is shown below:
|
||||||
|
|
||||||
|
.. code-block:: none
|
||||||
|
|
||||||
|
+---------------------------------------------------+
|
||||||
|
| xsk A | xsk B | xsk C |<---+ User space
|
||||||
|
=========================================================|==========
|
||||||
|
| Queue 0 | Queue 1 | Queue 2 | | Kernel
|
||||||
|
+---------------------------------------------------+ |
|
||||||
|
| Netdev eth0 | |
|
||||||
|
+---------------------------------------------------+ |
|
||||||
|
| +=============+ | |
|
||||||
|
| | key | xsk | | |
|
||||||
|
| +---------+ +=============+ | |
|
||||||
|
| | | | 0 | xsk A | | |
|
||||||
|
| | | +-------------+ | |
|
||||||
|
| | | | 1 | xsk B | | |
|
||||||
|
| | BPF |-- redirect -->+-------------+-------------+
|
||||||
|
| | prog | | 2 | xsk C | |
|
||||||
|
| | | +-------------+ |
|
||||||
|
| | | |
|
||||||
|
| | | |
|
||||||
|
| +---------+ |
|
||||||
|
| |
|
||||||
|
+---------------------------------------------------+
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
An AF_XDP socket that is bound to a certain <netdev/queue_id> will *only*
|
||||||
|
accept XDP frames from that <netdev/queue_id>. If an XDP program tries to redirect
|
||||||
|
from a <netdev/queue_id> other than what the socket is bound to, the frame will
|
||||||
|
not be received on the socket.
|
||||||
|
|
||||||
|
Typically an XSKMAP is created per netdev. This map contains an array of XSK File
|
||||||
|
Descriptors (FDs). The number of array elements is typically set or adjusted using
|
||||||
|
the ``max_entries`` map parameter. For AF_XDP ``max_entries`` is equal to the number
|
||||||
|
of queues supported by the netdev.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
Both the map key and map value size must be 4 bytes.
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
Kernel BPF
|
||||||
|
----------
|
||||||
|
bpf_redirect_map()
|
||||||
|
^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
|
||||||
|
|
||||||
|
Redirect the packet to the endpoint referenced by ``map`` at index ``key``.
|
||||||
|
For ``BPF_MAP_TYPE_XSKMAP`` this map contains references to XSK FDs
|
||||||
|
for sockets attached to a netdev's queues.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
If the map is empty at an index, the packet is dropped. This means that it is
|
||||||
|
necessary to have an XDP program loaded with at least one XSK in the
|
||||||
|
XSKMAP to be able to get any traffic to user space through the socket.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)
|
||||||
|
|
||||||
|
XSK entry references of type ``struct xdp_sock *`` can be retrieved using the
|
||||||
|
``bpf_map_lookup_elem()`` helper.
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
.. note::
|
||||||
|
XSK entries can only be updated/deleted from user space and not from
|
||||||
|
a BPF program. Trying to call these functions from a kernel BPF program will
|
||||||
|
result in the program failing to load and a verifier warning.
|
||||||
|
|
||||||
|
bpf_map_update_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags)
|
||||||
|
|
||||||
|
XSK entries can be added or updated using the ``bpf_map_update_elem()``
function. The ``key`` parameter is the queue_id of the queue the XSK is
attaching to, and the ``value`` parameter is the FD of that socket.
|
||||||
|
|
||||||
|
Under the hood, the XSKMAP update function uses the XSK FD value to retrieve the
|
||||||
|
associated ``struct xdp_sock`` instance.
|
||||||
|
|
||||||
|
The flags argument can be one of the following:
|
||||||
|
|
||||||
|
- BPF_ANY: Create a new element or update an existing element.
|
||||||
|
- BPF_NOEXIST: Create a new element only if it did not exist.
|
||||||
|
- BPF_EXIST: Update an existing element.
|
||||||
|
|
||||||
|
bpf_map_lookup_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_lookup_elem(int fd, const void *key, void *value)
|
||||||
|
|
||||||
|
Returns ``struct xdp_sock *`` or negative error in case of failure.
|
||||||
|
|
||||||
|
bpf_map_delete_elem()
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int bpf_map_delete_elem(int fd, const void *key)
|
||||||
|
|
||||||
|
XSK entries can be deleted using the ``bpf_map_delete_elem()``
function. It returns ``0`` on success, or negative error in case of
failure.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
When `libxdp`_ deletes an XSK it also removes the associated socket
|
||||||
|
entry from the XSKMAP.
|
||||||
|
|
||||||
|
Examples
|
||||||
|
========
|
||||||
|
Kernel
|
||||||
|
------
|
||||||
|
|
||||||
|
The following code snippet shows how to declare a ``BPF_MAP_TYPE_XSKMAP`` called
|
||||||
|
``xsks_map`` and how to redirect packets to an XSK.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
struct {
|
||||||
|
__uint(type, BPF_MAP_TYPE_XSKMAP);
|
||||||
|
__type(key, __u32);
|
||||||
|
__type(value, __u32);
|
||||||
|
__uint(max_entries, 64);
|
||||||
|
} xsks_map SEC(".maps");
|
||||||
|
|
||||||
|
|
||||||
|
SEC("xdp")
|
||||||
|
int xsk_redir_prog(struct xdp_md *ctx)
|
||||||
|
{
|
||||||
|
__u32 index = ctx->rx_queue_index;
|
||||||
|
|
||||||
|
if (bpf_map_lookup_elem(&xsks_map, &index))
|
||||||
|
return bpf_redirect_map(&xsks_map, index, 0);
|
||||||
|
return XDP_PASS;
|
||||||
|
}
|
||||||
|
|
||||||
|
User space
|
||||||
|
----------
|
||||||
|
|
||||||
|
The following code snippet shows how to update an XSKMAP with an XSK entry.
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
int update_xsks_map(struct bpf_map *xsks_map, int queue_id, int xsk_fd)
|
||||||
|
{
|
||||||
|
int ret;
|
||||||
|
|
||||||
|
ret = bpf_map_update_elem(bpf_map__fd(xsks_map), &queue_id, &xsk_fd, 0);
|
||||||
|
if (ret < 0)
|
||||||
|
fprintf(stderr, "Failed to update xsks_map: %s\n", strerror(errno));
|
||||||
|
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
For an example of how to create AF_XDP sockets, please see the AF_XDP-example and
AF_XDP-forwarding programs in the `bpf-examples`_ directory in the `libxdp`_ repository.
For a detailed explanation of the AF_XDP interface, please see:
|
||||||
|
|
||||||
|
- `libxdp-readme`_.
|
||||||
|
- `AF_XDP`_ kernel documentation.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
The most comprehensive resource for using XSKMAPs and AF_XDP is `libxdp`_.
|
||||||
|
|
||||||
|
.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
|
||||||
|
.. _AF_XDP: https://www.kernel.org/doc/html/latest/networking/af_xdp.html
|
||||||
|
.. _bpf-examples: https://github.com/xdp-project/bpf-examples
|
||||||
|
.. _libxdp-readme: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp#using-af_xdp-sockets
|
|
@ -1,46 +1,19 @@
|
||||||
|
|
||||||
=========
|
========
|
||||||
eBPF maps
|
BPF maps
|
||||||
=========
|
========
|
||||||
|
|
||||||
'maps' is a generic storage of different types for sharing data between kernel
|
BPF 'maps' provide generic storage of different types for sharing data between
|
||||||
and userspace.
|
kernel and user space. There are several storage types available, including
|
||||||
|
hash, array, bloom filter and radix-tree. Several of the map types exist to
|
||||||
|
support specific BPF helpers that perform actions based on the map contents. The
|
||||||
|
maps are accessed from BPF programs via BPF helpers which are documented in the
|
||||||
|
`man-pages`_ for `bpf-helpers(7)`_.
|
||||||
|
|
||||||
The maps are accessed from user space via BPF syscall, which has commands:
|
BPF maps are accessed from user space via the ``bpf`` syscall, which provides
|
||||||
|
commands to create maps, lookup elements, update elements and delete
|
||||||
- create a map with given type and attributes
|
elements. More details of the BPF syscall are available in
|
||||||
``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
|
:doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_.
|
||||||
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
|
|
||||||
returns process-local file descriptor or negative error
|
|
||||||
|
|
||||||
- lookup key in a given map
|
|
||||||
``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
|
|
||||||
using attr->map_fd, attr->key, attr->value
|
|
||||||
returns zero and stores found elem into value or negative error
|
|
||||||
|
|
||||||
- create or update key/value pair in a given map
|
|
||||||
``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
|
|
||||||
using attr->map_fd, attr->key, attr->value
|
|
||||||
returns zero or negative error
|
|
||||||
|
|
||||||
- find and delete element by key in a given map
|
|
||||||
``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
|
|
||||||
using attr->map_fd, attr->key
|
|
||||||
|
|
||||||
- to delete map: close(fd)
|
|
||||||
Exiting process will delete maps automatically
|
|
||||||
|
|
||||||
userspace programs use this syscall to create/access maps that eBPF programs
|
|
||||||
are concurrently updating.
|
|
||||||
|
|
||||||
maps can have different types: hash, array, bloom filter, radix-tree, etc.
|
|
||||||
|
|
||||||
The map is defined by:
|
|
||||||
|
|
||||||
- type
|
|
||||||
- max number of elements
|
|
||||||
- key size in bytes
|
|
||||||
- value size in bytes
|
|
||||||
|
|
||||||
Map Types
|
Map Types
|
||||||
=========
|
=========
|
||||||
|
@ -49,4 +22,60 @@ Map Types
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
:glob:
|
:glob:
|
||||||
|
|
||||||
map_*
|
map_*
|
||||||
|
|
||||||
|
Usage Notes
|
||||||
|
===========
|
||||||
|
|
||||||
|
.. c:function::
|
||||||
|
int bpf(int command, union bpf_attr *attr, u32 size)
|
||||||
|
|
||||||
|
Use the ``bpf()`` system call to perform the operation specified by
|
||||||
|
``command``. The operation takes parameters provided in ``attr``. The ``size``
|
||||||
|
argument is the size of the ``union bpf_attr`` in ``attr``.
|
||||||
|
|
||||||
|
**BPF_MAP_CREATE**
|
||||||
|
|
||||||
|
Create a map with the desired type and attributes in ``attr``:
|
||||||
|
|
||||||
|
.. code-block:: c
|
||||||
|
|
||||||
|
    int fd;
    union bpf_attr attr = {
            .map_type = BPF_MAP_TYPE_ARRAY,  /* mandatory */
            .key_size = sizeof(__u32),       /* mandatory */
            .value_size = sizeof(__u32),     /* mandatory */
            .max_entries = 256,              /* mandatory */
            .map_flags = BPF_F_MMAPABLE,
            .map_name = "example_array",
    };

    fd = bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
|
||||||
|
|
||||||
|
Returns a process-local file descriptor on success, or negative error in case of
|
||||||
|
failure. The map can be deleted by calling ``close(fd)``. Maps held by open
|
||||||
|
file descriptors will be deleted automatically when a process exits.
|
||||||
|
|
||||||
|
.. note:: Valid characters for ``map_name`` are ``A-Z``, ``a-z``, ``0-9``,
|
||||||
|
``'_'`` and ``'.'``.
|
||||||
|
|
||||||
|
**BPF_MAP_LOOKUP_ELEM**
|
||||||
|
|
||||||
|
Look up a key in a given map using ``attr->map_fd``, ``attr->key``,
``attr->value``. Returns zero and stores the found element in ``attr->value`` on
success, or negative error on failure.
|
||||||
|
|
||||||
|
**BPF_MAP_UPDATE_ELEM**
|
||||||
|
|
||||||
|
Create or update key/value pair in a given map using ``attr->map_fd``, ``attr->key``,
|
||||||
|
``attr->value``. Returns zero on success or negative error on failure.
|
||||||
|
|
||||||
|
**BPF_MAP_DELETE_ELEM**
|
||||||
|
|
||||||
|
Find and delete element by key in a given map using ``attr->map_fd``,
|
||||||
|
``attr->key``. Returns zero on success or negative error on failure.
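The create, update and lookup commands can be exercised directly through the raw system call. The following is a minimal sketch, assuming a Linux system with ``<linux/bpf.h>`` available. ``sys_bpf()`` and ``array_map_roundtrip()`` are hypothetical names introduced here, and the sketch converts the ``-1``/``errno`` convention of ``syscall(2)`` into a negative error return. Depending on system configuration, map creation may fail for unprivileged callers, in which case the function just reports the error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper over syscall(2); the C library may not provide bpf(). */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return (int)syscall(__NR_bpf, cmd, attr, size);
}

/*
 * Create a 256-entry array map, store 'value' at index 'key' and read
 * it back into '*out'. Returns 0 on success or a negative errno value.
 */
static int array_map_roundtrip(__u32 key, __u32 value, __u32 *out)
{
	union bpf_attr attr;
	int fd, err;

	assert(out != NULL);

	/* BPF_MAP_CREATE: unused fields of the attr must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_ARRAY;
	attr.key_size = sizeof(__u32);
	attr.value_size = sizeof(__u32);
	attr.max_entries = 256;
	fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
	if (fd < 0)
		return -errno;

	/* BPF_MAP_UPDATE_ELEM: key and value are passed as 64-bit pointers. */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = (__u64)(unsigned long)&key;
	attr.value = (__u64)(unsigned long)&value;
	attr.flags = BPF_ANY;
	err = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
	if (err < 0) {
		err = -errno;
		goto out;
	}

	/* BPF_MAP_LOOKUP_ELEM: the kernel copies the element into *out. */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = (__u64)(unsigned long)&key;
	attr.value = (__u64)(unsigned long)out;
	err = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
	if (err < 0)
		err = -errno;
out:
	close(fd);	/* drops this process's reference to the map */
	return err;
}
```

Zeroing the whole ``union bpf_attr`` before each command keeps fields that the command does not use from carrying stale values.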
|
||||||
|
|
||||||
|
.. Links:
|
||||||
|
.. _man-pages: https://www.kernel.org/doc/man-pages/
|
||||||
|
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
|
||||||
|
.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
|
||||||
|
|
|
@ -7,3 +7,6 @@ Program Types
|
||||||
:glob:
|
:glob:
|
||||||
|
|
||||||
prog_*
|
prog_*
|
||||||
|
|
||||||
|
For a list of all program types, see :ref:`program_types_and_elf` in
|
||||||
|
the :ref:`libbpf` documentation.
|
||||||
|
|
81
Documentation/bpf/redirect.rst
Normal file
|
@ -0,0 +1,81 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0-only
|
||||||
|
.. Copyright (C) 2022 Red Hat, Inc.
|
||||||
|
|
||||||
|
========
|
||||||
|
Redirect
|
||||||
|
========
|
||||||
|
XDP_REDIRECT
|
||||||
|
############
|
||||||
|
Supported maps
|
||||||
|
--------------
|
||||||
|
|
||||||
|
XDP_REDIRECT works with the following map types:
|
||||||
|
|
||||||
|
- ``BPF_MAP_TYPE_DEVMAP``
|
||||||
|
- ``BPF_MAP_TYPE_DEVMAP_HASH``
|
||||||
|
- ``BPF_MAP_TYPE_CPUMAP``
|
||||||
|
- ``BPF_MAP_TYPE_XSKMAP``
|
||||||
|
|
||||||
|
For more information on these maps, please see the specific map documentation.
|
||||||
|
|
||||||
|
Process
|
||||||
|
-------
|
||||||
|
|
||||||
|
.. kernel-doc:: net/core/filter.c
|
||||||
|
:doc: xdp redirect
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
Not all drivers support transmitting frames after a redirect, and for
|
||||||
|
those that do, not all of them support non-linear frames. Non-linear xdp
|
||||||
|
bufs/frames are bufs/frames that contain more than one fragment.
|
||||||
|
|
||||||
|
Debugging packet drops
|
||||||
|
----------------------
|
||||||
|
Silent packet drops for XDP_REDIRECT can be debugged using:
|
||||||
|
|
||||||
|
- bpftrace
- perf record
|
||||||
|
|
||||||
|
bpftrace
^^^^^^^^
|
||||||
|
The following bpftrace command can be used to capture and count all XDP tracepoints:
|
||||||
|
|
||||||
|
.. code-block:: none
|
||||||
|
|
||||||
|
sudo bpftrace -e 'tracepoint:xdp:* { @cnt[probe] = count(); }'
|
||||||
|
Attaching 12 probes...
|
||||||
|
^C
|
||||||
|
|
||||||
|
@cnt[tracepoint:xdp:mem_connect]: 18
|
||||||
|
@cnt[tracepoint:xdp:mem_disconnect]: 18
|
||||||
|
@cnt[tracepoint:xdp:xdp_exception]: 19605
|
||||||
|
@cnt[tracepoint:xdp:xdp_devmap_xmit]: 1393604
|
||||||
|
@cnt[tracepoint:xdp:xdp_redirect]: 22292200
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
The various xdp tracepoints can be found in ``source/include/trace/events/xdp.h``
|
||||||
|
|
||||||
|
The following bpftrace command can be used to extract the ``ERRNO`` being returned as
|
||||||
|
part of the err parameter:
|
||||||
|
|
||||||
|
.. code-block:: none
|
||||||
|
|
||||||
|
sudo bpftrace -e \
|
||||||
|
'tracepoint:xdp:xdp_redirect*_err {@redir_errno[-args->err] = count();}
|
||||||
|
tracepoint:xdp:xdp_devmap_xmit {@devmap_errno[-args->err] = count();}'
|
||||||
|
|
||||||
|
perf record
|
||||||
|
^^^^^^^^^^^
|
||||||
|
The perf tool also supports recording tracepoints:
|
||||||
|
|
||||||
|
.. code-block:: none
|
||||||
|
|
||||||
|
perf record -a -e xdp:xdp_redirect_err \
|
||||||
|
-e xdp:xdp_redirect_map_err \
|
||||||
|
-e xdp:xdp_exception \
|
||||||
|
-e xdp:xdp_devmap_xmit
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
- https://github.com/xdp-project/xdp-tutorial/tree/master/tracing02-xdp-monitor
|
|
@ -29,6 +29,38 @@ properties:
|
||||||
interrupts:
|
interrupts:
|
||||||
maxItems: 1
|
maxItems: 1
|
||||||
|
|
||||||
|
memory-region:
|
||||||
|
items:
|
||||||
|
- description: firmware EMI region
|
||||||
|
- description: firmware ILM region
|
||||||
|
- description: firmware DLM region
|
||||||
|
- description: firmware CPU DATA region
|
||||||
|
- description: firmware BOOT region
|
||||||
|
|
||||||
|
memory-region-names:
|
||||||
|
items:
|
||||||
|
- const: wo-emi
|
||||||
|
- const: wo-ilm
|
||||||
|
- const: wo-dlm
|
||||||
|
- const: wo-data
|
||||||
|
- const: wo-boot
|
||||||
|
|
||||||
|
mediatek,wo-ccif:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
|
description: mediatek wed-wo controller interface.
|
||||||
|
|
||||||
|
allOf:
|
||||||
|
- if:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
contains:
|
||||||
|
const: mediatek,mt7622-wed
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
memory-region-names: false
|
||||||
|
memory-region: false
|
||||||
|
mediatek,wo-ccif: false
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
- reg
|
- reg
|
||||||
|
@ -49,3 +81,23 @@ examples:
|
||||||
interrupts = <GIC_SPI 214 IRQ_TYPE_LEVEL_LOW>;
|
interrupts = <GIC_SPI 214 IRQ_TYPE_LEVEL_LOW>;
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
- |
|
||||||
|
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||||
|
#include <dt-bindings/interrupt-controller/irq.h>
|
||||||
|
soc {
|
||||||
|
#address-cells = <2>;
|
||||||
|
#size-cells = <2>;
|
||||||
|
|
||||||
|
wed@15010000 {
|
||||||
|
compatible = "mediatek,mt7986-wed", "syscon";
|
||||||
|
reg = <0 0x15010000 0 0x1000>;
|
||||||
|
interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
|
||||||
|
memory-region = <&wo_emi>, <&wo_ilm>, <&wo_dlm>,
|
||||||
|
<&wo_data>, <&wo_boot>;
|
||||||
|
memory-region-names = "wo-emi", "wo-ilm", "wo-dlm",
|
||||||
|
"wo-data", "wo-boot";
|
||||||
|
mediatek,wo-ccif = <&wo_ccif0>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
|
@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
title: Freescale INTMUX interrupt multiplexer
|
title: Freescale INTMUX interrupt multiplexer
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Joakim Zhang <qiangqing.zhang@nxp.com>
|
- Shawn Guo <shawnguo@kernel.org>
|
||||||
|
- NXP Linux Team <linux-imx@nxp.com>
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
|
|
|
@ -46,6 +46,10 @@ properties:
|
||||||
interrupts:
|
interrupts:
|
||||||
maxItems: 1
|
maxItems: 1
|
||||||
|
|
||||||
|
reset-gpios:
|
||||||
|
maxItems: 1
|
||||||
|
description: GPIO connected to active low reset
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
- reg
|
- reg
|
||||||
|
|
|
@ -27,7 +27,9 @@ properties:
|
||||||
- usbb95,772b # ASIX AX88772B
|
- usbb95,772b # ASIX AX88772B
|
||||||
- usbb95,7e2b # ASIX AX88772B
|
- usbb95,7e2b # ASIX AX88772B
|
||||||
|
|
||||||
reg: true
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
local-mac-address: true
|
local-mac-address: true
|
||||||
mac-address: true
|
mac-address: true
|
||||||
|
|
||||||
|
|
|
@ -1,5 +0,0 @@
|
||||||
The following properties are common to the Bluetooth controllers:
|
|
||||||
|
|
||||||
- local-bd-address: array of 6 bytes, specifies the BD address that was
|
|
||||||
uniquely assigned to the Bluetooth device, formatted with least significant
|
|
||||||
byte first (little-endian).
|
|
|
@ -0,0 +1,29 @@
|
||||||
|
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/bluetooth/bluetooth-controller.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Bluetooth Controller Generic Binding
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Marcel Holtmann <marcel@holtmann.org>
|
||||||
|
- Johan Hedberg <johan.hedberg@gmail.com>
|
||||||
|
- Luiz Augusto von Dentz <luiz.dentz@gmail.com>
|
||||||
|
|
||||||
|
properties:
|
||||||
|
$nodename:
|
||||||
|
pattern: "^bluetooth(@.*)?$"
|
||||||
|
|
||||||
|
local-bd-address:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||||
|
maxItems: 6
|
||||||
|
description:
|
||||||
|
Specifies the BD address that was uniquely assigned to the Bluetooth
|
||||||
|
device. Formatted with least significant byte first (little-endian), e.g.
|
||||||
|
in order to assign the address 00:11:22:33:44:55 this property must have
|
||||||
|
the value [55 44 33 22 11 00].
|
||||||
|
|
||||||
|
additionalProperties: true
|
||||||
|
|
||||||
|
...
|
|
@ -0,0 +1,81 @@
|
||||||
|
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/bluetooth/brcm,bcm4377-bluetooth.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Broadcom BCM4377 family PCIe Bluetooth Chips
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Sven Peter <sven@svenpeter.dev>
|
||||||
|
|
||||||
|
description:
|
||||||
|
This binding describes Broadcom BCM4377 family PCIe-attached bluetooth chips
|
||||||
|
usually found in Apple machines. The Wi-Fi part of the chip is described in
|
||||||
|
bindings/net/wireless/brcm,bcm4329-fmac.yaml.
|
||||||
|
|
||||||
|
allOf:
|
||||||
|
- $ref: bluetooth-controller.yaml#
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
enum:
|
||||||
|
- pci14e4,5fa0 # BCM4377
|
||||||
|
- pci14e4,5f69 # BCM4378
|
||||||
|
- pci14e4,5f71 # BCM4387
|
||||||
|
|
||||||
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
brcm,board-type:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/string
|
||||||
|
description: Board type of the Bluetooth chip. This is used to decouple
|
||||||
|
the overall system board from the Bluetooth module and used to construct
|
||||||
|
firmware and calibration data filenames.
|
||||||
|
On Apple platforms, this should be the Apple module-instance codename
|
||||||
|
prefixed by "apple,", e.g. "apple,atlantisb".
|
||||||
|
pattern: '^apple,.*'
|
||||||
|
|
||||||
|
brcm,taurus-cal-blob:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||||
|
description: A per-device calibration blob for the Bluetooth radio. This
|
||||||
|
should be filled in by the bootloader from platform configuration
|
||||||
|
data, if necessary, and will be uploaded to the device.
|
||||||
|
This blob is used if the chip stepping of the Bluetooth module does not
|
||||||
|
support beamforming.
|
||||||
|
|
||||||
|
brcm,taurus-bf-cal-blob:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint8-array
|
||||||
|
description: A per-device calibration blob for the Bluetooth radio. This
|
||||||
|
should be filled in by the bootloader from platform configuration
|
||||||
|
data, if necessary, and will be uploaded to the device.
|
||||||
|
This blob is used if the chip stepping of the Bluetooth module supports
|
||||||
|
beamforming.
|
||||||
|
|
||||||
|
local-bd-address: true
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- reg
|
||||||
|
- local-bd-address
|
||||||
|
- brcm,board-type
|
||||||
|
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
pcie@a0000000 {
|
||||||
|
#address-cells = <3>;
|
||||||
|
#size-cells = <2>;
|
||||||
|
reg = <0xa0000000 0x1000000>;
|
||||||
|
device_type = "pci";
|
||||||
|
ranges = <0x43000000 0x6 0xa0000000 0xa0000000 0x0 0x20000000>;
|
||||||
|
|
||||||
|
bluetooth@0,1 {
|
||||||
|
compatible = "pci14e4,5f69";
|
||||||
|
reg = <0x100 0x0 0x0 0x0 0x0>;
|
||||||
|
brcm,board-type = "apple,honshu";
|
||||||
|
/* To be filled by the bootloader */
|
||||||
|
local-bd-address = [00 00 00 00 00 00];
|
||||||
|
};
|
||||||
|
};
|
|
@ -1,7 +1,7 @@
|
||||||
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
|
||||||
%YAML 1.2
|
%YAML 1.2
|
||||||
---
|
---
|
||||||
$id: http://devicetree.org/schemas/net/qualcomm-bluetooth.yaml#
|
$id: http://devicetree.org/schemas/net/bluetooth/qualcomm-bluetooth.yaml#
|
||||||
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
title: Qualcomm Bluetooth Chips
|
title: Qualcomm Bluetooth Chips
|
||||||
|
@ -79,8 +79,7 @@ properties:
|
||||||
firmware-name:
|
firmware-name:
|
||||||
description: specify the name of nvm firmware to load
|
description: specify the name of nvm firmware to load
|
||||||
|
|
||||||
local-bd-address:
|
local-bd-address: true
|
||||||
description: see Documentation/devicetree/bindings/net/bluetooth.txt
|
|
||||||
|
|
||||||
|
|
||||||
required:
|
required:
|
||||||
|
@ -89,6 +88,7 @@ required:
|
||||||
additionalProperties: false
|
additionalProperties: false
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
|
- $ref: bluetooth-controller.yaml#
|
||||||
- if:
|
- if:
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
|
@ -19,11 +19,14 @@ properties:
|
||||||
- brcm,bcm4329-bt
|
- brcm,bcm4329-bt
|
||||||
- brcm,bcm4330-bt
|
- brcm,bcm4330-bt
|
||||||
- brcm,bcm4334-bt
|
- brcm,bcm4334-bt
|
||||||
|
- brcm,bcm43430a0-bt
|
||||||
|
- brcm,bcm43430a1-bt
|
||||||
- brcm,bcm43438-bt
|
- brcm,bcm43438-bt
|
||||||
- brcm,bcm4345c5
|
- brcm,bcm4345c5
|
||||||
- brcm,bcm43540-bt
|
- brcm,bcm43540-bt
|
||||||
- brcm,bcm4335a0
|
- brcm,bcm4335a0
|
||||||
- brcm,bcm4349-bt
|
- brcm,bcm4349-bt
|
||||||
|
- cypress,cyw4373a0-bt
|
||||||
- infineon,cyw55572-bt
|
- infineon,cyw55572-bt
|
||||||
|
|
||||||
shutdown-gpios:
|
shutdown-gpios:
|
||||||
|
|
|
@ -17,6 +17,7 @@ properties:
|
||||||
compatible:
|
compatible:
|
||||||
oneOf:
|
oneOf:
|
||||||
- enum:
|
- enum:
|
||||||
|
- fsl,imx93-flexcan
|
||||||
- fsl,imx8qm-flexcan
|
- fsl,imx8qm-flexcan
|
||||||
- fsl,imx8mp-flexcan
|
- fsl,imx8mp-flexcan
|
||||||
- fsl,imx6q-flexcan
|
- fsl,imx6q-flexcan
|
||||||
|
|
|
@ -9,9 +9,6 @@ title: Renesas R-Car CAN FD Controller
|
||||||
maintainers:
|
maintainers:
|
||||||
- Fabrizio Castro <fabrizio.castro.jz@renesas.com>
|
- Fabrizio Castro <fabrizio.castro.jz@renesas.com>
|
||||||
|
|
||||||
allOf:
|
|
||||||
- $ref: can-controller.yaml#
|
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
oneOf:
|
oneOf:
|
||||||
|
@ -33,7 +30,7 @@ properties:
|
||||||
|
|
||||||
- items:
|
- items:
|
||||||
- enum:
|
- enum:
|
||||||
- renesas,r9a07g043-canfd # RZ/G2UL
|
- renesas,r9a07g043-canfd # RZ/G2UL and RZ/Five
|
||||||
- renesas,r9a07g044-canfd # RZ/G2{L,LC}
|
- renesas,r9a07g044-canfd # RZ/G2{L,LC}
|
||||||
- renesas,r9a07g054-canfd # RZ/V2L
|
- renesas,r9a07g054-canfd # RZ/V2L
|
||||||
- const: renesas,rzg2l-canfd # RZ/G2L family
|
- const: renesas,rzg2l-canfd # RZ/G2L family
|
||||||
|
@ -77,12 +74,13 @@ properties:
|
||||||
description: Maximum frequency of the CANFD clock.
|
description: Maximum frequency of the CANFD clock.
|
||||||
|
|
||||||
patternProperties:
|
patternProperties:
|
||||||
"^channel[01]$":
|
"^channel[0-7]$":
|
||||||
type: object
|
type: object
|
||||||
description:
|
description:
|
||||||
The controller supports two channels and each is represented as a child
|
The controller supports multiple channels and each is represented as a
|
||||||
node. Each child node supports the "status" property only, which
|
child node. Each channel can be enabled/disabled individually.
|
||||||
is used to enable/disable the respective channel.
|
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
|
@ -98,60 +96,73 @@ required:
|
||||||
- channel0
|
- channel0
|
||||||
- channel1
|
- channel1
|
||||||
|
|
||||||
if:
|
allOf:
|
||||||
properties:
|
- $ref: can-controller.yaml#
|
||||||
compatible:
|
|
||||||
contains:
|
|
||||||
enum:
|
|
||||||
- renesas,rzg2l-canfd
|
|
||||||
then:
|
|
||||||
properties:
|
|
||||||
interrupts:
|
|
||||||
items:
|
|
||||||
- description: CAN global error interrupt
|
|
||||||
- description: CAN receive FIFO interrupt
|
|
||||||
- description: CAN0 error interrupt
|
|
||||||
- description: CAN0 transmit interrupt
|
|
||||||
- description: CAN0 transmit/receive FIFO receive completion interrupt
|
|
||||||
- description: CAN1 error interrupt
|
|
||||||
- description: CAN1 transmit interrupt
|
|
||||||
- description: CAN1 transmit/receive FIFO receive completion interrupt
|
|
||||||
|
|
||||||
interrupt-names:
|
- if:
|
||||||
items:
|
properties:
|
||||||
- const: g_err
|
compatible:
|
||||||
- const: g_recc
|
contains:
|
||||||
- const: ch0_err
|
enum:
|
||||||
- const: ch0_rec
|
- renesas,rzg2l-canfd
|
||||||
- const: ch0_trx
|
then:
|
||||||
- const: ch1_err
|
properties:
|
||||||
- const: ch1_rec
|
interrupts:
|
||||||
- const: ch1_trx
|
items:
|
||||||
|
- description: CAN global error interrupt
|
||||||
|
- description: CAN receive FIFO interrupt
|
||||||
|
- description: CAN0 error interrupt
|
||||||
|
- description: CAN0 transmit interrupt
|
||||||
|
- description: CAN0 transmit/receive FIFO receive completion interrupt
|
||||||
|
- description: CAN1 error interrupt
|
||||||
|
- description: CAN1 transmit interrupt
|
||||||
|
- description: CAN1 transmit/receive FIFO receive completion interrupt
|
||||||
|
|
||||||
resets:
|
interrupt-names:
|
||||||
maxItems: 2
|
items:
|
||||||
|
- const: g_err
|
||||||
|
- const: g_recc
|
||||||
|
- const: ch0_err
|
||||||
|
- const: ch0_rec
|
||||||
|
- const: ch0_trx
|
||||||
|
- const: ch1_err
|
||||||
|
- const: ch1_rec
|
||||||
|
- const: ch1_trx
|
||||||
|
|
||||||
reset-names:
|
resets:
|
||||||
items:
|
maxItems: 2
|
||||||
- const: rstp_n
|
|
||||||
- const: rstc_n
|
|
||||||
|
|
||||||
required:
|
reset-names:
|
||||||
- reset-names
|
items:
|
||||||
else:
|
- const: rstp_n
|
||||||
properties:
|
- const: rstc_n
|
||||||
interrupts:
|
|
||||||
items:
|
|
||||||
- description: Channel interrupt
|
|
||||||
- description: Global interrupt
|
|
||||||
|
|
||||||
interrupt-names:
|
required:
|
||||||
items:
|
- reset-names
|
||||||
- const: ch_int
|
else:
|
||||||
- const: g_int
|
properties:
|
||||||
|
interrupts:
|
||||||
|
items:
|
||||||
|
- description: Channel interrupt
|
||||||
|
- description: Global interrupt
|
||||||
|
|
||||||
resets:
|
interrupt-names:
|
||||||
maxItems: 1
|
items:
|
||||||
|
- const: ch_int
|
||||||
|
- const: g_int
|
||||||
|
|
||||||
|
resets:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
- if:
|
||||||
|
not:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
contains:
|
||||||
|
const: renesas,r8a779a0-canfd
|
||||||
|
then:
|
||||||
|
patternProperties:
|
||||||
|
"^channel[2-7]$": false
|
||||||
|
|
||||||
unevaluatedProperties: false
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
|
|
@ -19,7 +19,8 @@ allOf:
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
reg:
|
reg:
|
||||||
description: Port number
|
items:
|
||||||
|
- description: Port number
|
||||||
|
|
||||||
label:
|
label:
|
||||||
description:
|
description:
|
||||||
|
|
|
@ -12,7 +12,7 @@ allOf:
|
||||||
maintainers:
|
maintainers:
|
||||||
- Andrew Lunn <andrew@lunn.ch>
|
- Andrew Lunn <andrew@lunn.ch>
|
||||||
- Florian Fainelli <f.fainelli@gmail.com>
|
- Florian Fainelli <f.fainelli@gmail.com>
|
||||||
- Vivien Didelot <vivien.didelot@gmail.com>
|
- Vladimir Oltean <olteanv@gmail.com>
|
||||||
- Kurt Kanzenbach <kurt@linutronix.de>
|
- Kurt Kanzenbach <kurt@linutronix.de>
|
||||||
|
|
||||||
description:
|
description:
|
||||||
|
|
|
@ -74,10 +74,10 @@ properties:
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
pcs-handle:
|
pcs-handle:
|
||||||
|
maxItems: 1
|
||||||
description:
|
description:
|
||||||
phandle pointing to a PCS sub-node compatible with
|
phandle pointing to a PCS sub-node compatible with
|
||||||
renesas,rzn1-miic.yaml#
|
renesas,rzn1-miic.yaml#
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
|
||||||
|
|
||||||
unevaluatedProperties: false
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
|
|
@ -108,11 +108,17 @@ properties:
|
||||||
$ref: "#/properties/phy-connection-type"
|
$ref: "#/properties/phy-connection-type"
|
||||||
|
|
||||||
pcs-handle:
|
pcs-handle:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle-array
|
||||||
|
items:
|
||||||
|
maxItems: 1
|
||||||
description:
|
description:
|
||||||
Specifies a reference to a node representing a PCS PHY device on a MDIO
|
Specifies a reference to a node representing a PCS PHY device on a MDIO
|
||||||
bus to link with an external PHY (phy-handle) if exists.
|
bus to link with an external PHY (phy-handle) if exists.
|
||||||
|
|
||||||
|
pcs-handle-names:
|
||||||
|
description:
|
||||||
|
The name of each PCS in pcs-handle.
|
||||||
|
|
||||||
phy-handle:
|
phy-handle:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
description:
|
description:
|
||||||
|
@ -216,6 +222,9 @@ properties:
|
||||||
required:
|
required:
|
||||||
- speed
|
- speed
|
||||||
|
|
||||||
|
dependencies:
|
||||||
|
pcs-handle-names: [pcs-handle]
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- if:
|
- if:
|
||||||
properties:
|
properties:
|
||||||
|
|
|
@ -7,7 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
title: Freescale Fast Ethernet Controller (FEC)
|
title: Freescale Fast Ethernet Controller (FEC)
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Joakim Zhang <qiangqing.zhang@nxp.com>
|
- Shawn Guo <shawnguo@kernel.org>
|
||||||
|
- Wei Fang <wei.fang@nxp.com>
|
||||||
|
- NXP Linux Team <linux-imx@nxp.com>
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: ethernet-controller.yaml#
|
- $ref: ethernet-controller.yaml#
|
||||||
|
|
|
@ -85,9 +85,39 @@ properties:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
description: A reference to the IEEE1588 timer
|
description: A reference to the IEEE1588 timer
|
||||||
|
|
||||||
|
phys:
|
||||||
|
description: A reference to the SerDes lane(s)
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
phy-names:
|
||||||
|
items:
|
||||||
|
- const: serdes
|
||||||
|
|
||||||
pcsphy-handle:
|
pcsphy-handle:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle-array
|
||||||
description: A reference to the PCS (typically found on the SerDes)
|
minItems: 1
|
||||||
|
maxItems: 3
|
||||||
|
deprecated: true
|
||||||
|
description: See pcs-handle.
|
||||||
|
|
||||||
|
pcs-handle:
|
||||||
|
minItems: 1
|
||||||
|
maxItems: 3
|
||||||
|
description: |
|
||||||
|
A reference to the various PCSs (typically found on the SerDes). If
|
||||||
|
pcs-handle-names is absent, and phy-connection-type is "xgmii", then the first
|
||||||
|
reference will be assumed to be for "xfi". Otherwise, if pcs-handle-names is
|
||||||
|
absent, then the first reference will be assumed to be for "sgmii".
|
||||||
|
|
||||||
|
pcs-handle-names:
|
||||||
|
minItems: 1
|
||||||
|
maxItems: 3
|
||||||
|
items:
|
||||||
|
enum:
|
||||||
|
- sgmii
|
||||||
|
- qsgmii
|
||||||
|
- xfi
|
||||||
|
description: The type of each PCS in pcsphy-handle.
|
||||||
|
|
||||||
tbi-handle:
|
tbi-handle:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
|
@ -100,6 +130,10 @@ required:
|
||||||
- fsl,fman-ports
|
- fsl,fman-ports
|
||||||
- ptp-timer
|
- ptp-timer
|
||||||
|
|
||||||
|
dependencies:
|
||||||
|
pcs-handle-names:
|
||||||
|
- pcs-handle
|
||||||
|
|
||||||
allOf:
|
allOf:
|
||||||
- $ref: ethernet-controller.yaml#
|
- $ref: ethernet-controller.yaml#
|
||||||
- if:
|
- if:
|
||||||
|
@ -110,14 +144,6 @@ allOf:
|
||||||
then:
|
then:
|
||||||
required:
|
required:
|
||||||
- tbi-handle
|
- tbi-handle
|
||||||
- if:
|
|
||||||
properties:
|
|
||||||
compatible:
|
|
||||||
contains:
|
|
||||||
const: fsl,fman-memac
|
|
||||||
then:
|
|
||||||
required:
|
|
||||||
- pcsphy-handle
|
|
||||||
|
|
||||||
unevaluatedProperties: false
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
@ -138,8 +164,9 @@ examples:
|
||||||
reg = <0xe8000 0x1000>;
|
reg = <0xe8000 0x1000>;
|
||||||
fsl,fman-ports = <&fman0_rx_0x0c &fman0_tx_0x2c>;
|
fsl,fman-ports = <&fman0_rx_0x0c &fman0_tx_0x2c>;
|
||||||
ptp-timer = <&ptp_timer0>;
|
ptp-timer = <&ptp_timer0>;
|
||||||
pcsphy-handle = <&pcsphy4>;
|
pcs-handle = <&pcsphy4>, <&qsgmiib_pcs1>;
|
||||||
phy-handle = <&sgmii_phy1>;
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
phy-connection-type = "sgmii";
|
phys = <&serdes1 1>;
|
||||||
|
phy-names = "serdes";
|
||||||
};
|
};
|
||||||
...
|
...
|
||||||
|
|
|
@ -31,7 +31,7 @@ properties:
|
||||||
phy-mode: true
|
phy-mode: true
|
||||||
|
|
||||||
pcs-handle:
|
pcs-handle:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
maxItems: 1
|
||||||
description:
|
description:
|
||||||
A reference to a node representing a PCS PHY device found on
|
A reference to a node representing a PCS PHY device found on
|
||||||
the internal MDIO bus.
|
the internal MDIO bus.
|
||||||
|
|
|
@ -320,8 +320,9 @@ For internal PHY device on internal mdio bus, a PHY node should be created.
|
||||||
See the definition of the PHY node in booting-without-of.txt for an
|
See the definition of the PHY node in booting-without-of.txt for an
|
||||||
example of how to define a PHY (Internal PHY has no interrupt line).
|
example of how to define a PHY (Internal PHY has no interrupt line).
|
||||||
- For "fsl,fman-mdio" compatible internal mdio bus, the PHY is TBI PHY.
|
- For "fsl,fman-mdio" compatible internal mdio bus, the PHY is TBI PHY.
|
||||||
- For "fsl,fman-memac-mdio" compatible internal mdio bus, the PHY is PCS PHY,
|
- For "fsl,fman-memac-mdio" compatible internal mdio bus, the PHY is PCS PHY.
|
||||||
PCS PHY addr must be '0'.
|
The PCS PHY address should correspond to the value of the appropriate
|
||||||
|
MDEV_PORT.
|
||||||
|
|
||||||
EXAMPLE
|
EXAMPLE
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,62 @@
|
||||||
|
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/marvell,dfx-server.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Marvell Prestera DFX server
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Miquel Raynal <miquel.raynal@bootlin.com>
|
||||||
|
|
||||||
|
select:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
contains:
|
||||||
|
const: marvell,dfx-server
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
items:
|
||||||
|
- const: marvell,dfx-server
|
||||||
|
- const: simple-bus
|
||||||
|
|
||||||
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
ranges: true
|
||||||
|
|
||||||
|
'#address-cells':
|
||||||
|
const: 1
|
||||||
|
|
||||||
|
'#size-cells':
|
||||||
|
const: 1
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- reg
|
||||||
|
- ranges
|
||||||
|
|
||||||
|
# The DFX server may expose clocks described as subnodes
|
||||||
|
additionalProperties:
|
||||||
|
type: object
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
|
||||||
|
#define MBUS_ID(target,attributes) (((target) << 24) | ((attributes) << 16))
|
||||||
|
bus@0 {
|
||||||
|
reg = <0 0>;
|
||||||
|
#address-cells = <2>;
|
||||||
|
#size-cells = <1>;
|
||||||
|
|
||||||
|
dfx-bus@ac000000 {
|
||||||
|
compatible = "marvell,dfx-server", "simple-bus";
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <1>;
|
||||||
|
ranges = <0 MBUS_ID(0x08, 0x00) 0 0x100000>;
|
||||||
|
reg = <MBUS_ID(0x08, 0x00) 0 0x100000>;
|
||||||
|
};
|
||||||
|
};
|
305
Documentation/devicetree/bindings/net/marvell,pp2.yaml
Normal file
|
@ -0,0 +1,305 @@
|
||||||
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/marvell,pp2.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Marvell CN913X / Marvell Armada 375, 7K, 8K Ethernet Controller
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Marcin Wojtas <mw@semihalf.com>
|
||||||
|
- Russell King <linux@armlinux.org>
|
||||||
|
|
||||||
|
description: |
|
||||||
|
Marvell Armada 375 Ethernet Controller (PPv2.1)
|
||||||
|
Marvell Armada 7K/8K Ethernet Controller (PPv2.2)
|
||||||
|
Marvell CN913X Ethernet Controller (PPv2.3)
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
enum:
|
||||||
|
- marvell,armada-375-pp2
|
||||||
|
- marvell,armada-7k-pp22
|
||||||
|
|
||||||
|
reg:
|
||||||
|
minItems: 3
|
||||||
|
maxItems: 4
|
||||||
|
|
||||||
|
"#address-cells":
|
||||||
|
const: 1
|
||||||
|
|
||||||
|
"#size-cells":
|
||||||
|
const: 0
|
||||||
|
|
||||||
|
clocks:
|
||||||
|
minItems: 2
|
||||||
|
items:
|
||||||
|
- description: main controller clock
|
||||||
|
- description: GOP clock
|
||||||
|
- description: MG clock
|
||||||
|
- description: MG Core clock
|
||||||
|
- description: AXI clock
|
||||||
|
|
||||||
|
clock-names:
|
||||||
|
minItems: 2
|
||||||
|
items:
|
||||||
|
- const: pp_clk
|
||||||
|
- const: gop_clk
|
||||||
|
- const: mg_clk
|
||||||
|
- const: mg_core_clk
|
||||||
|
- const: axi_clk
|
||||||
|
|
||||||
|
dma-coherent: true
|
||||||
|
|
||||||
|
marvell,system-controller:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
|
description: a phandle to the system controller.
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
'^(ethernet-)?port@[0-2]$':
|
||||||
|
type: object
|
||||||
|
description: subnode for each ethernet port.
|
||||||
|
$ref: ethernet-controller.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
description: ID of the port from the MAC point of view.
|
||||||
|
maximum: 2
|
||||||
|
|
||||||
|
interrupts:
|
||||||
|
minItems: 1
|
||||||
|
maxItems: 10
|
||||||
|
description: interrupt(s) for the port
|
||||||
|
|
||||||
|
interrupt-names:
|
||||||
|
minItems: 1
|
||||||
|
items:
|
||||||
|
- const: hif0
|
||||||
|
- const: hif1
|
||||||
|
- const: hif2
|
||||||
|
- const: hif3
|
||||||
|
- const: hif4
|
||||||
|
- const: hif5
|
||||||
|
- const: hif6
|
||||||
|
- const: hif7
|
||||||
|
- const: hif8
|
||||||
|
- const: link
|
||||||
|
|
||||||
|
description: >
|
||||||
|
if more than a single interrupt for is given, must be the
|
||||||
|
name associated to the interrupts listed. Valid names are:
|
||||||
|
"hifX", with X in [0..8], and "link". The names "tx-cpu0",
|
||||||
|
"tx-cpu1", "tx-cpu2", "tx-cpu3" and "rx-shared" are supported
|
||||||
|
for backward compatibility but shouldn't be used for new
|
||||||
|
additions.
|
||||||
|
|
||||||
|
phys:
|
||||||
|
minItems: 1
|
||||||
|
maxItems: 2
|
||||||
|
description: >
|
||||||
|
Generic PHY, providing SerDes connectivity. For most modes,
|
||||||
|
one lane is sufficient, but some (e.g. RXAUI) may require two.
|
||||||
|
|
||||||
|
phy-mode:
|
||||||
|
enum:
|
||||||
|
- gmii
|
||||||
|
- sgmii
|
||||||
|
- rgmii-id
|
||||||
|
- 1000base-x
|
||||||
|
- 2500base-x
|
||||||
|
- 5gbase-r
|
||||||
|
- rxaui
|
||||||
|
- 10gbase-r
|
||||||
|
|
||||||
|
port-id:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
deprecated: true
|
||||||
|
description: >
|
||||||
|
ID of the port from the MAC point of view.
|
||||||
|
Legacy binding for backward compatibility.
|
||||||
|
|
||||||
|
marvell,loopback:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/flag
|
||||||
|
description: port is loopback mode.
|
||||||
|
|
||||||
|
gop-port-id:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: >
|
||||||
|
only for marvell,armada-7k-pp22, ID of the port from the
|
||||||
|
GOP (Group Of Ports) point of view. This ID is used to index the
|
||||||
|
per-port registers in the second register area.
|
||||||
|
|
||||||
|
required:
|
||||||
|
- reg
|
||||||
|
- interrupts
|
||||||
|
- phy-mode
|
||||||
|
- port-id
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- reg
|
||||||
|
- clocks
|
||||||
|
- clock-names
|
||||||
|
|
||||||
|
allOf:
|
||||||
|
- if:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
const: marvell,armada-7k-pp22
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
items:
|
||||||
|
- description: Packet Processor registers
|
||||||
|
- description: Networking interfaces registers
|
||||||
|
- description: CM3 address space used for TX Flow Control
|
||||||
|
|
||||||
|
clocks:
|
||||||
|
minItems: 5
|
||||||
|
|
||||||
|
clock-names:
|
||||||
|
minItems: 5
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
'^(ethernet-)?port@[0-2]$':
|
||||||
|
required:
|
||||||
|
- gop-port-id
|
||||||
|
|
||||||
|
required:
|
||||||
|
- marvell,system-controller
|
||||||
|
else:
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
items:
|
||||||
|
- description: Packet Processor registers
|
||||||
|
- description: LMS registers
|
||||||
|
- description: Register area per eth0
|
||||||
|
- description: Register area per eth1
|
||||||
|
|
||||||
|
clocks:
|
||||||
|
maxItems: 2
|
||||||
|
|
||||||
|
clock-names:
|
||||||
|
maxItems: 2
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
'^(ethernet-)?port@[0-1]$':
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
maximum: 1
|
||||||
|
|
||||||
|
gop-port-id: false
|
||||||
|
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
// For Armada 375 variant
|
||||||
|
#include <dt-bindings/interrupt-controller/mvebu-icu.h>
|
||||||
|
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||||
|
|
||||||
|
ethernet@f0000 {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
compatible = "marvell,armada-375-pp2";
|
||||||
|
reg = <0xf0000 0xa000>,
|
||||||
|
<0xc0000 0x3060>,
|
||||||
|
<0xc4000 0x100>,
|
||||||
|
<0xc5000 0x100>;
|
||||||
|
clocks = <&gateclk 3>, <&gateclk 19>;
|
||||||
|
clock-names = "pp_clk", "gop_clk";
|
||||||
|
|
||||||
|
ethernet-port@0 {
|
||||||
|
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
reg = <0>;
|
||||||
|
port-id = <0>; /* For backward compatibility. */
|
||||||
|
phy = <&phy0>;
|
||||||
|
phy-mode = "rgmii-id";
|
||||||
|
};
|
||||||
|
|
||||||
|
ethernet-port@1 {
|
||||||
|
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
reg = <1>;
|
||||||
|
port-id = <1>; /* For backward compatibility. */
|
||||||
|
phy = <&phy3>;
|
||||||
|
phy-mode = "gmii";
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
- |
|
||||||
|
// For Armada 7k/8k and Cn913x variants
|
||||||
|
#include <dt-bindings/interrupt-controller/mvebu-icu.h>
|
||||||
|
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||||
|
|
||||||
|
ethernet@0 {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
compatible = "marvell,armada-7k-pp22";
|
||||||
|
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
||||||
|
clocks = <&cp0_clk 1 3>, <&cp0_clk 1 9>,
|
||||||
|
<&cp0_clk 1 5>, <&cp0_clk 1 6>, <&cp0_clk 1 18>;
|
||||||
|
clock-names = "pp_clk", "gop_clk", "mg_clk", "mg_core_clk", "axi_clk";
|
||||||
|
marvell,system-controller = <&cp0_syscon0>;
|
||||||
|
|
||||||
|
ethernet-port@0 {
|
||||||
|
interrupts = <ICU_GRP_NSR 39 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 43 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 47 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 51 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 59 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 63 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 67 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 71 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 129 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||||
|
"hif5", "hif6", "hif7", "hif8", "link";
|
||||||
|
phy-mode = "10gbase-r";
|
||||||
|
phys = <&cp0_comphy4 0>;
|
||||||
|
reg = <0>;
|
||||||
|
port-id = <0>; /* For backward compatibility. */
|
||||||
|
gop-port-id = <0>;
|
||||||
|
};
|
||||||
|
|
||||||
|
ethernet-port@1 {
|
||||||
|
interrupts = <ICU_GRP_NSR 40 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 44 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 48 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 52 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 60 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 64 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 68 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 72 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 128 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||||
|
"hif5", "hif6", "hif7", "hif8", "link";
|
||||||
|
phy-mode = "rgmii-id";
|
||||||
|
reg = <1>;
|
||||||
|
port-id = <1>; /* For backward compatibility. */
|
||||||
|
gop-port-id = <2>;
|
||||||
|
};
|
||||||
|
|
||||||
|
ethernet-port@2 {
|
||||||
|
interrupts = <ICU_GRP_NSR 41 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 45 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 49 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 53 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 61 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 65 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 69 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 73 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<ICU_GRP_NSR 127 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
||||||
|
"hif5", "hif6", "hif7", "hif8", "link";
|
||||||
|
phy-mode = "2500base-x";
|
||||||
|
managed = "in-band-status";
|
||||||
|
phys = <&cp0_comphy5 2>;
|
||||||
|
sfp = <&sfp_eth3>;
|
||||||
|
reg = <2>;
|
||||||
|
port-id = <2>; /* For backward compatibility. */
|
||||||
|
gop-port-id = <3>;
|
||||||
|
};
|
||||||
|
};
|
|
@ -1,81 +0,0 @@
|
||||||
Marvell Prestera Switch Chip bindings
|
|
||||||
-------------------------------------
|
|
||||||
|
|
||||||
Required properties:
|
|
||||||
- compatible: must be "marvell,prestera" and one of the following
|
|
||||||
"marvell,prestera-98dx3236",
|
|
||||||
"marvell,prestera-98dx3336",
|
|
||||||
"marvell,prestera-98dx4251",
|
|
||||||
- reg: address and length of the register set for the device.
|
|
||||||
- interrupts: interrupt for the device
|
|
||||||
|
|
||||||
Optional properties:
|
|
||||||
- dfx: phandle reference to the "DFX Server" node
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
switch {
|
|
||||||
compatible = "simple-bus";
|
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <1>;
|
|
||||||
ranges = <0 MBUS_ID(0x03, 0x00) 0 0x100000>;
|
|
||||||
|
|
||||||
packet-processor@0 {
|
|
||||||
compatible = "marvell,prestera-98dx3236", "marvell,prestera";
|
|
||||||
reg = <0 0x4000000>;
|
|
||||||
interrupts = <33>, <34>, <35>;
|
|
||||||
dfx = <&dfx>;
|
|
||||||
};
|
|
||||||
};
|
|
||||||
|
|
||||||
DFX Server bindings
|
|
||||||
-------------------
|
|
||||||
|
|
||||||
Required properties:
|
|
||||||
- compatible: must be "marvell,dfx-server", "simple-bus"
|
|
||||||
- ranges: describes the address mapping of a memory-mapped bus.
|
|
||||||
- reg: address and length of the register set for the device.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
dfx-server {
|
|
||||||
compatible = "marvell,dfx-server", "simple-bus";
|
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <1>;
|
|
||||||
ranges = <0 MBUS_ID(0x08, 0x00) 0 0x100000>;
|
|
||||||
reg = <MBUS_ID(0x08, 0x00) 0 0x100000>;
|
|
||||||
};
|
|
||||||
|
|
||||||
Marvell Prestera SwitchDev bindings
|
|
||||||
-----------------------------------
|
|
||||||
Optional properties:
|
|
||||||
- compatible: must be "marvell,prestera"
|
|
||||||
- base-mac-provider: describes handle to node which provides base mac address,
|
|
||||||
might be a static base mac address or nvme cell provider.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
eeprom_mac_addr: eeprom-mac-addr {
|
|
||||||
compatible = "eeprom,mac-addr-cell";
|
|
||||||
status = "okay";
|
|
||||||
|
|
||||||
nvmem = <&eeprom_at24>;
|
|
||||||
};
|
|
||||||
|
|
||||||
prestera {
|
|
||||||
compatible = "marvell,prestera";
|
|
||||||
status = "okay";
|
|
||||||
|
|
||||||
base-mac-provider = <&eeprom_mac_addr>;
|
|
||||||
};
|
|
||||||
|
|
||||||
The current implementation of Prestera Switchdev PCI interface driver requires
|
|
||||||
that BAR2 is assigned to 0xf6000000 as base address from the PCI IO range:
|
|
||||||
|
|
||||||
&cp0_pcie0 {
|
|
||||||
ranges = <0x81000000 0x0 0xfb000000 0x0 0xfb000000 0x0 0xf0000
|
|
||||||
0x82000000 0x0 0xf6000000 0x0 0xf6000000 0x0 0x2000000
|
|
||||||
0x82000000 0x0 0xf9000000 0x0 0xf9000000 0x0 0x100000>;
|
|
||||||
phys = <&cp0_comphy0 0>;
|
|
||||||
status = "okay";
|
|
||||||
};
|
|
91
Documentation/devicetree/bindings/net/marvell,prestera.yaml
Normal file
|
@ -0,0 +1,91 @@
|
||||||
|
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/marvell,prestera.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Marvell Prestera switch family
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Miquel Raynal <miquel.raynal@bootlin.com>
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
oneOf:
|
||||||
|
- items:
|
||||||
|
- enum:
|
||||||
|
- marvell,prestera-98dx3236
|
||||||
|
- marvell,prestera-98dx3336
|
||||||
|
- marvell,prestera-98dx4251
|
||||||
|
- const: marvell,prestera
|
||||||
|
- enum:
|
||||||
|
- pci11ab,c804
|
||||||
|
- pci11ab,c80c
|
||||||
|
- pci11ab,cc1e
|
||||||
|
|
||||||
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
interrupts:
|
||||||
|
maxItems: 3
|
||||||
|
|
||||||
|
dfx:
|
||||||
|
description: Reference to the DFX Server bus node.
|
||||||
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
|
|
||||||
|
nvmem-cells: true
|
||||||
|
|
||||||
|
nvmem-cell-names: true
|
||||||
|
|
||||||
|
if:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
contains:
|
||||||
|
const: marvell,prestera
|
||||||
|
|
||||||
|
# Memory mapped AlleyCat3 family
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
nvmem-cells: false
|
||||||
|
nvmem-cell-names: false
|
||||||
|
required:
|
||||||
|
- interrupts
|
||||||
|
|
||||||
|
# PCI Aldrin family
|
||||||
|
else:
|
||||||
|
properties:
|
||||||
|
interrupts: false
|
||||||
|
dfx: false
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- reg
|
||||||
|
|
||||||
|
# Ports can also be described
|
||||||
|
additionalProperties:
|
||||||
|
type: object
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
packet-processor@0 {
|
||||||
|
compatible = "marvell,prestera-98dx3236", "marvell,prestera";
|
||||||
|
reg = <0 0x4000000>;
|
||||||
|
interrupts = <33>, <34>, <35>;
|
||||||
|
dfx = <&dfx>;
|
||||||
|
};
|
||||||
|
|
||||||
|
- |
|
||||||
|
pcie@0 {
|
||||||
|
#address-cells = <3>;
|
||||||
|
#size-cells = <2>;
|
||||||
|
ranges = <0x0 0x0 0x0 0x0 0x0 0x0>;
|
||||||
|
reg = <0x0 0x0 0x0 0x0 0x0 0x0>;
|
||||||
|
device_type = "pci";
|
||||||
|
|
||||||
|
switch@0,0 {
|
||||||
|
reg = <0x0 0x0 0x0 0x0 0x0>;
|
||||||
|
compatible = "pci11ab,c80c";
|
||||||
|
nvmem-cells = <&mac_address 0>;
|
||||||
|
nvmem-cell-names = "mac-address";
|
||||||
|
};
|
||||||
|
};
|
|
@ -1,141 +0,0 @@
|
||||||
* Marvell Armada 375 Ethernet Controller (PPv2.1)
|
|
||||||
Marvell Armada 7K/8K Ethernet Controller (PPv2.2)
|
|
||||||
Marvell CN913X Ethernet Controller (PPv2.3)
|
|
||||||
|
|
||||||
Required properties:
|
|
||||||
|
|
||||||
- compatible: should be one of:
|
|
||||||
"marvell,armada-375-pp2"
|
|
||||||
"marvell,armada-7k-pp2"
|
|
||||||
- reg: addresses and length of the register sets for the device.
|
|
||||||
For "marvell,armada-375-pp2", must contain the following register
|
|
||||||
sets:
|
|
||||||
- common controller registers
|
|
||||||
- LMS registers
|
|
||||||
- one register area per Ethernet port
|
|
||||||
For "marvell,armada-7k-pp2" used by 7K/8K and CN913X, must contain the following register
|
|
||||||
sets:
|
|
||||||
- packet processor registers
|
|
||||||
- networking interfaces registers
|
|
||||||
- CM3 address space used for TX Flow Control
|
|
||||||
|
|
||||||
- clocks: pointers to the reference clocks for this device, consequently:
|
|
||||||
- main controller clock (for both armada-375-pp2 and armada-7k-pp2)
|
|
||||||
- GOP clock (for both armada-375-pp2 and armada-7k-pp2)
|
|
||||||
- MG clock (only for armada-7k-pp2)
|
|
||||||
- MG Core clock (only for armada-7k-pp2)
|
|
||||||
- AXI clock (only for armada-7k-pp2)
|
|
||||||
- clock-names: names of used clocks, must be "pp_clk", "gop_clk", "mg_clk",
|
|
||||||
"mg_core_clk" and "axi_clk" (the 3 latter only for armada-7k-pp2).
|
|
||||||
|
|
||||||
The ethernet ports are represented by subnodes. At least one port is
|
|
||||||
required.
|
|
||||||
|
|
||||||
Required properties (port):
|
|
||||||
|
|
||||||
- interrupts: interrupt(s) for the port
|
|
||||||
- port-id: ID of the port from the MAC point of view
|
|
||||||
- gop-port-id: only for marvell,armada-7k-pp2, ID of the port from the
|
|
||||||
GOP (Group Of Ports) point of view. This ID is used to index the
|
|
||||||
per-port registers in the second register area.
|
|
||||||
- phy-mode: See ethernet.txt file in the same directory
|
|
||||||
|
|
||||||
Optional properties (port):
|
|
||||||
|
|
||||||
- marvell,loopback: port is loopback mode
|
|
||||||
- phy: a phandle to a phy node defining the PHY address (as the reg
|
|
||||||
property, a single integer).
|
|
||||||
- interrupt-names: if more than a single interrupt for is given, must be the
|
|
||||||
name associated to the interrupts listed. Valid names are:
|
|
||||||
"hifX", with X in [0..8], and "link". The names "tx-cpu0",
|
|
||||||
"tx-cpu1", "tx-cpu2", "tx-cpu3" and "rx-shared" are supported
|
|
||||||
for backward compatibility but shouldn't be used for new
|
|
||||||
additions.
|
|
||||||
- marvell,system-controller: a phandle to the system controller.
|
|
||||||
|
|
||||||
Example for marvell,armada-375-pp2:
|
|
||||||
|
|
||||||
ethernet@f0000 {
|
|
||||||
compatible = "marvell,armada-375-pp2";
|
|
||||||
reg = <0xf0000 0xa000>,
|
|
||||||
<0xc0000 0x3060>,
|
|
||||||
<0xc4000 0x100>,
|
|
||||||
<0xc5000 0x100>;
|
|
||||||
clocks = <&gateclk 3>, <&gateclk 19>;
|
|
||||||
clock-names = "pp_clk", "gop_clk";
|
|
||||||
|
|
||||||
eth0: eth0@c4000 {
|
|
||||||
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
port-id = <0>;
|
|
||||||
phy = <&phy0>;
|
|
||||||
phy-mode = "gmii";
|
|
||||||
};
|
|
||||||
|
|
||||||
eth1: eth1@c5000 {
|
|
||||||
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
port-id = <1>;
|
|
||||||
phy = <&phy3>;
|
|
||||||
phy-mode = "gmii";
|
|
||||||
};
|
|
||||||
};
|
|
||||||
|
|
||||||
Example for marvell,armada-7k-pp2:
|
|
||||||
|
|
||||||
cpm_ethernet: ethernet@0 {
|
|
||||||
compatible = "marvell,armada-7k-pp22";
|
|
||||||
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
|
||||||
clocks = <&cpm_syscon0 1 3>, <&cpm_syscon0 1 9>,
|
|
||||||
<&cpm_syscon0 1 5>, <&cpm_syscon0 1 6>, <&cpm_syscon0 1 18>;
|
|
||||||
clock-names = "pp_clk", "gop_clk", "mg_clk", "mg_core_clk", "axi_clk";
|
|
||||||
|
|
||||||
eth0: eth0 {
|
|
||||||
interrupts = <ICU_GRP_NSR 39 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 43 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 47 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 51 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 55 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 59 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 63 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 67 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 71 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 129 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
|
||||||
"hif5", "hif6", "hif7", "hif8", "link";
|
|
||||||
port-id = <0>;
|
|
||||||
gop-port-id = <0>;
|
|
||||||
};
|
|
||||||
|
|
||||||
eth1: eth1 {
|
|
||||||
interrupts = <ICU_GRP_NSR 40 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 44 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 48 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 52 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 56 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 60 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 64 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 68 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 72 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 128 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
|
||||||
"hif5", "hif6", "hif7", "hif8", "link";
|
|
||||||
port-id = <1>;
|
|
||||||
gop-port-id = <2>;
|
|
||||||
};
|
|
||||||
|
|
||||||
eth2: eth2 {
|
|
||||||
interrupts = <ICU_GRP_NSR 41 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 45 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 49 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 53 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 57 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 61 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 65 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 69 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 73 IRQ_TYPE_LEVEL_HIGH>,
|
|
||||||
<ICU_GRP_NSR 127 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
interrupt-names = "hif0", "hif1", "hif2", "hif3", "hif4",
|
|
||||||
"hif5", "hif6", "hif7", "hif8", "link";
|
|
||||||
port-id = <2>;
|
|
||||||
gop-port-id = <3>;
|
|
||||||
};
|
|
||||||
};
|
|
|
@ -39,7 +39,9 @@ properties:
|
||||||
- usb424,9e08 # SMSC LAN89530 USB Ethernet Device
|
- usb424,9e08 # SMSC LAN89530 USB Ethernet Device
|
||||||
- usb424,ec00 # SMSC9512/9514 USB Hub & Ethernet Device
|
- usb424,ec00 # SMSC9512/9514 USB Hub & Ethernet Device
|
||||||
|
|
||||||
reg: true
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
local-mac-address: true
|
local-mac-address: true
|
||||||
mac-address: true
|
mac-address: true
|
||||||
|
|
||||||
|
|
|
@ -14,7 +14,9 @@ properties:
|
||||||
oneOf:
|
oneOf:
|
||||||
- const: nxp,nxp-nci-i2c
|
- const: nxp,nxp-nci-i2c
|
||||||
- items:
|
- items:
|
||||||
- const: nxp,pn547
|
- enum:
|
||||||
|
- nxp,nq310
|
||||||
|
- nxp,pn547
|
||||||
- const: nxp,nxp-nci-i2c
|
- const: nxp,nxp-nci-i2c
|
||||||
|
|
||||||
enable-gpios:
|
enable-gpios:
|
||||||
|
|
|
@ -7,7 +7,9 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
title: NXP i.MX8 DWMAC glue layer
|
title: NXP i.MX8 DWMAC glue layer
|
||||||
|
|
||||||
maintainers:
|
maintainers:
|
||||||
- Joakim Zhang <qiangqing.zhang@nxp.com>
|
- Clark Wang <xiaoning.wang@nxp.com>
|
||||||
|
- Shawn Guo <shawnguo@kernel.org>
|
||||||
|
- NXP Linux Team <linux-imx@nxp.com>
|
||||||
|
|
||||||
# We need a select here so we don't match all nodes with 'snps,dwmac'
|
# We need a select here so we don't match all nodes with 'snps,dwmac'
|
||||||
select:
|
select:
|
||||||
|
|
40
Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml
Normal file
|
@ -0,0 +1,40 @@
|
||||||
|
# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/pcs/fsl,lynx-pcs.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: NXP Lynx PCS
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Ioana Ciornei <ioana.ciornei@nxp.com>
|
||||||
|
|
||||||
|
description: |
|
||||||
|
NXP Lynx 10G and 28G SerDes have Ethernet PCS devices which can be used as
|
||||||
|
protocol controllers. They are accessible over the Ethernet interface's MDIO
|
||||||
|
bus.
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
const: fsl,lynx-pcs
|
||||||
|
|
||||||
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- reg
|
||||||
|
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
mdio-bus {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
|
||||||
|
qsgmii_pcs1: ethernet-pcs@1 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <1>;
|
||||||
|
};
|
||||||
|
};
|
|
@ -123,7 +123,6 @@ examples:
|
||||||
|
|
||||||
switch_port0: port@0 {
|
switch_port0: port@0 {
|
||||||
reg = <0x0>;
|
reg = <0x0>;
|
||||||
label = "cpu";
|
|
||||||
ethernet = <&eth1>;
|
ethernet = <&eth1>;
|
||||||
|
|
||||||
phy-mode = "gmii";
|
phy-mode = "gmii";
|
||||||
|
|
|
@ -49,6 +49,7 @@ properties:
|
||||||
- qcom,sc7280-ipa
|
- qcom,sc7280-ipa
|
||||||
- qcom,sdm845-ipa
|
- qcom,sdm845-ipa
|
||||||
- qcom,sdx55-ipa
|
- qcom,sdx55-ipa
|
||||||
|
- qcom,sm6350-ipa
|
||||||
- qcom,sm8350-ipa
|
- qcom,sm8350-ipa
|
||||||
|
|
||||||
reg:
|
reg:
|
||||||
|
@ -124,19 +125,31 @@ properties:
|
||||||
- const: ipa-clock-enabled-valid
|
- const: ipa-clock-enabled-valid
|
||||||
- const: ipa-clock-enabled
|
- const: ipa-clock-enabled
|
||||||
|
|
||||||
|
qcom,gsi-loader:
|
||||||
|
enum:
|
||||||
|
- self
|
||||||
|
- modem
|
||||||
|
- skip
|
||||||
|
description:
|
||||||
|
Indicates how GSI firmware should be loaded. If the AP loads
|
||||||
|
and validates GSI firmware, this property has value "self".
|
||||||
|
If the modem does this, this property has value "modem".
|
||||||
|
Otherwise, "skip" means GSI firmware loading is not required.
|
||||||
|
|
||||||
modem-init:
|
modem-init:
|
||||||
|
deprecated: true
|
||||||
type: boolean
|
type: boolean
|
||||||
description:
|
description:
|
||||||
If present, it indicates that the modem is responsible for
|
This is the older (deprecated) way of indicating how GSI firmware
|
||||||
performing early IPA initialization, including loading and
|
should be loaded. If present, the modem loads GSI firmware; if
|
||||||
validating firwmare used by the GSI.
|
absent, the AP loads GSI firmware.
|
||||||
|
|
||||||
memory-region:
|
memory-region:
|
||||||
maxItems: 1
|
maxItems: 1
|
||||||
description:
|
description:
|
||||||
If present, a phandle for a reserved memory area that holds
|
If present, a phandle for a reserved memory area that holds
|
||||||
the firmware passed to Trust Zone for authentication. Required
|
the firmware passed to Trust Zone for authentication. Required
|
||||||
when Trust Zone (not the modem) performs early initialization.
|
when the AP (not the modem) performs early initialization.
|
||||||
|
|
||||||
firmware-name:
|
firmware-name:
|
||||||
$ref: /schemas/types.yaml#/definitions/string
|
$ref: /schemas/types.yaml#/definitions/string
|
||||||
|
@ -155,22 +168,36 @@ required:
|
||||||
- interconnects
|
- interconnects
|
||||||
- qcom,smem-states
|
- qcom,smem-states
|
||||||
|
|
||||||
# Either modem-init is present, or memory-region must be present.
|
allOf:
|
||||||
oneOf:
|
# If qcom,gsi-loader is present, modem-init must not be present
|
||||||
- required:
|
- if:
|
||||||
- modem-init
|
required:
|
||||||
- required:
|
- qcom,gsi-loader
|
||||||
- memory-region
|
then:
|
||||||
|
properties:
|
||||||
|
modem-init: false
|
||||||
|
|
||||||
# If memory-region is present, firmware-name may optionally be present.
|
# If qcom,gsi-loader is "self", the AP loads GSI firmware, and
|
||||||
# But if modem-init is present, firmware-name must not be present.
|
# memory-region must be specified
|
||||||
if:
|
if:
|
||||||
required:
|
properties:
|
||||||
- modem-init
|
qcom,gsi-loader:
|
||||||
then:
|
contains:
|
||||||
not:
|
const: self
|
||||||
required:
|
then:
|
||||||
- firmware-name
|
required:
|
||||||
|
- memory-region
|
||||||
|
else:
|
||||||
|
# If qcom,gsi-loader is not present, we use deprecated behavior.
|
||||||
|
# If modem-init is not present, the AP loads GSI firmware, and
|
||||||
|
# memory-region must be specified.
|
||||||
|
if:
|
||||||
|
not:
|
||||||
|
required:
|
||||||
|
- modem-init
|
||||||
|
then:
|
||||||
|
required:
|
||||||
|
- memory-region
|
||||||
|
|
||||||
additionalProperties: false
|
additionalProperties: false
|
||||||
|
|
||||||
|
@ -201,14 +228,17 @@ examples:
|
||||||
};
|
};
|
||||||
|
|
||||||
ipa@1e40000 {
|
ipa@1e40000 {
|
||||||
compatible = "qcom,sdm845-ipa";
|
compatible = "qcom,sc7180-ipa";
|
||||||
|
|
||||||
modem-init;
|
qcom,gsi-loader = "self";
|
||||||
|
memory-region = <&ipa_fw_mem>;
|
||||||
|
firmware-name = "qcom/sc7180-trogdor/modem/modem.mdt";
|
||||||
|
|
||||||
iommus = <&apps_smmu 0x720 0x3>;
|
iommus = <&apps_smmu 0x440 0x0>,
|
||||||
|
<&apps_smmu 0x442 0x0>;
|
||||||
reg = <0x1e40000 0x7000>,
|
reg = <0x1e40000 0x7000>,
|
||||||
<0x1e47000 0x2000>,
|
<0x1e47000 0x2000>,
|
||||||
<0x1e04000 0x2c000>;
|
<0x1e04000 0x2c000>;
|
||||||
reg-names = "ipa-reg",
|
reg-names = "ipa-reg",
|
||||||
"ipa-shared",
|
"ipa-shared",
|
||||||
"gsi";
|
"gsi";
|
||||||
|
@ -226,9 +256,9 @@ examples:
|
||||||
clock-names = "core";
|
clock-names = "core";
|
||||||
|
|
||||||
interconnects =
|
interconnects =
|
||||||
<&rsc_hlos MASTER_IPA &rsc_hlos SLAVE_EBI1>,
|
<&aggre2_noc MASTER_IPA 0 &mc_virt SLAVE_EBI1 0>,
|
||||||
<&rsc_hlos MASTER_IPA &rsc_hlos SLAVE_IMEM>,
|
<&aggre2_noc MASTER_IPA 0 &system_noc SLAVE_IMEM 0>,
|
||||||
<&rsc_hlos MASTER_APPSS_PROC &rsc_hlos SLAVE_IPA_CFG>;
|
<&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_IPA_CFG 0>;
|
||||||
interconnect-names = "memory",
|
interconnect-names = "memory",
|
||||||
"imem",
|
"imem",
|
||||||
"config";
|
"config";
|
||||||
|
|
|
@ -9,14 +9,18 @@ title: Qualcomm IPQ40xx MDIO Controller
|
||||||
maintainers:
|
maintainers:
|
||||||
- Robert Marko <robert.marko@sartura.hr>
|
- Robert Marko <robert.marko@sartura.hr>
|
||||||
|
|
||||||
allOf:
|
|
||||||
- $ref: "mdio.yaml#"
|
|
||||||
|
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
enum:
|
oneOf:
|
||||||
- qcom,ipq4019-mdio
|
- enum:
|
||||||
- qcom,ipq5018-mdio
|
- qcom,ipq4019-mdio
|
||||||
|
- qcom,ipq5018-mdio
|
||||||
|
|
||||||
|
- items:
|
||||||
|
- enum:
|
||||||
|
- qcom,ipq6018-mdio
|
||||||
|
- qcom,ipq8074-mdio
|
||||||
|
- const: qcom,ipq4019-mdio
|
||||||
|
|
||||||
"#address-cells":
|
"#address-cells":
|
||||||
const: 1
|
const: 1
|
||||||
|
@ -33,10 +37,12 @@ properties:
|
||||||
address range is only required by the platform IPQ50xx.
|
address range is only required by the platform IPQ50xx.
|
||||||
|
|
||||||
clocks:
|
clocks:
|
||||||
maxItems: 1
|
items:
|
||||||
description: |
|
- description: MDIO clock source frequency fixed to 100MHZ
|
||||||
MDIO clock source frequency fixed to 100MHZ, this clock should be specified
|
|
||||||
by the platform IPQ807x, IPQ60xx and IPQ50xx.
|
clock-names:
|
||||||
|
items:
|
||||||
|
- const: gcc_mdio_ahb_clk
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
|
@ -44,6 +50,26 @@ required:
|
||||||
- "#address-cells"
|
- "#address-cells"
|
||||||
- "#size-cells"
|
- "#size-cells"
|
||||||
|
|
||||||
|
allOf:
|
||||||
|
- $ref: "mdio.yaml#"
|
||||||
|
|
||||||
|
- if:
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
contains:
|
||||||
|
enum:
|
||||||
|
- qcom,ipq5018-mdio
|
||||||
|
- qcom,ipq6018-mdio
|
||||||
|
- qcom,ipq8074-mdio
|
||||||
|
then:
|
||||||
|
required:
|
||||||
|
- clocks
|
||||||
|
- clock-names
|
||||||
|
else:
|
||||||
|
properties:
|
||||||
|
clocks: false
|
||||||
|
clock-names: false
|
||||||
|
|
||||||
unevaluatedProperties: false
|
unevaluatedProperties: false
|
||||||
|
|
||||||
examples:
|
examples:
|
||||||
|
|
|
@ -20,6 +20,7 @@ properties:
|
||||||
enum:
|
enum:
|
||||||
- realtek,rtl8723bs-bt
|
- realtek,rtl8723bs-bt
|
||||||
- realtek,rtl8723cs-bt
|
- realtek,rtl8723cs-bt
|
||||||
|
- realtek,rtl8723ds-bt
|
||||||
- realtek,rtl8822cs-bt
|
- realtek,rtl8822cs-bt
|
||||||
|
|
||||||
device-wake-gpios:
|
device-wake-gpios:
|
||||||
|
|
|
@ -0,0 +1,262 @@
|
||||||
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
|
%YAML 1.2
|
||||||
|
---
|
||||||
|
$id: http://devicetree.org/schemas/net/renesas,r8a779f0-ether-switch.yaml#
|
||||||
|
$schema: http://devicetree.org/meta-schemas/core.yaml#
|
||||||
|
|
||||||
|
title: Renesas Ethernet Switch
|
||||||
|
|
||||||
|
maintainers:
|
||||||
|
- Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
|
||||||
|
|
||||||
|
properties:
|
||||||
|
compatible:
|
||||||
|
const: renesas,r8a779f0-ether-switch
|
||||||
|
|
||||||
|
reg:
|
||||||
|
maxItems: 2
|
||||||
|
|
||||||
|
reg-names:
|
||||||
|
items:
|
||||||
|
- const: base
|
||||||
|
- const: secure_base
|
||||||
|
|
||||||
|
interrupts:
|
||||||
|
maxItems: 47
|
||||||
|
|
||||||
|
interrupt-names:
|
||||||
|
items:
|
||||||
|
- const: mfwd_error
|
||||||
|
- const: race_error
|
||||||
|
- const: coma_error
|
||||||
|
- const: gwca0_error
|
||||||
|
- const: gwca1_error
|
||||||
|
- const: etha0_error
|
||||||
|
- const: etha1_error
|
||||||
|
- const: etha2_error
|
||||||
|
- const: gptp0_status
|
||||||
|
- const: gptp1_status
|
||||||
|
- const: mfwd_status
|
||||||
|
- const: race_status
|
||||||
|
- const: coma_status
|
||||||
|
- const: gwca0_status
|
||||||
|
- const: gwca1_status
|
||||||
|
- const: etha0_status
|
||||||
|
- const: etha1_status
|
||||||
|
- const: etha2_status
|
||||||
|
- const: rmac0_status
|
||||||
|
- const: rmac1_status
|
||||||
|
- const: rmac2_status
|
||||||
|
- const: gwca0_rxtx0
|
||||||
|
- const: gwca0_rxtx1
|
||||||
|
- const: gwca0_rxtx2
|
||||||
|
- const: gwca0_rxtx3
|
||||||
|
- const: gwca0_rxtx4
|
||||||
|
- const: gwca0_rxtx5
|
||||||
|
- const: gwca0_rxtx6
|
||||||
|
- const: gwca0_rxtx7
|
||||||
|
- const: gwca1_rxtx0
|
||||||
|
- const: gwca1_rxtx1
|
||||||
|
- const: gwca1_rxtx2
|
||||||
|
- const: gwca1_rxtx3
|
||||||
|
- const: gwca1_rxtx4
|
||||||
|
- const: gwca1_rxtx5
|
||||||
|
- const: gwca1_rxtx6
|
||||||
|
- const: gwca1_rxtx7
|
||||||
|
- const: gwca0_rxts0
|
||||||
|
- const: gwca0_rxts1
|
||||||
|
- const: gwca1_rxts0
|
||||||
|
- const: gwca1_rxts1
|
||||||
|
- const: rmac0_mdio
|
||||||
|
- const: rmac1_mdio
|
||||||
|
- const: rmac2_mdio
|
||||||
|
- const: rmac0_phy
|
||||||
|
- const: rmac1_phy
|
||||||
|
- const: rmac2_phy
|
||||||
|
|
||||||
|
clocks:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
resets:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
iommus:
|
||||||
|
maxItems: 16
|
||||||
|
|
||||||
|
power-domains:
|
||||||
|
maxItems: 1
|
||||||
|
|
||||||
|
ethernet-ports:
|
||||||
|
type: object
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
properties:
|
||||||
|
'#address-cells':
|
||||||
|
description: Port number of ETHA (TSNA).
|
||||||
|
const: 1
|
||||||
|
|
||||||
|
'#size-cells':
|
||||||
|
const: 0
|
||||||
|
|
||||||
|
patternProperties:
|
||||||
|
"^port@[0-9a-f]+$":
|
||||||
|
type: object
|
||||||
|
$ref: /schemas/net/ethernet-controller.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
properties:
|
||||||
|
reg:
|
||||||
|
maxItems: 1
|
||||||
|
description:
|
||||||
|
Port number of ETHA (TSNA).
|
||||||
|
|
||||||
|
phys:
|
||||||
|
maxItems: 1
|
||||||
|
description:
|
||||||
|
Phandle of an Ethernet SERDES.
|
||||||
|
|
||||||
|
mdio:
|
||||||
|
$ref: /schemas/net/mdio.yaml#
|
||||||
|
unevaluatedProperties: false
|
||||||
|
|
||||||
|
required:
|
||||||
|
- reg
|
||||||
|
- phy-handle
|
||||||
|
- phy-mode
|
||||||
|
- phys
|
||||||
|
- mdio
|
||||||
|
|
||||||
|
required:
|
||||||
|
- compatible
|
||||||
|
- reg
|
||||||
|
- reg-names
|
||||||
|
- interrupts
|
||||||
|
- interrupt-names
|
||||||
|
- clocks
|
||||||
|
- resets
|
||||||
|
- power-domains
|
||||||
|
- ethernet-ports
|
||||||
|
|
||||||
|
additionalProperties: false
|
||||||
|
|
||||||
|
examples:
|
||||||
|
- |
|
||||||
|
#include <dt-bindings/clock/r8a779f0-cpg-mssr.h>
|
||||||
|
#include <dt-bindings/interrupt-controller/arm-gic.h>
|
||||||
|
#include <dt-bindings/power/r8a779f0-sysc.h>
|
||||||
|
|
||||||
|
ethernet@e6880000 {
|
||||||
|
compatible = "renesas,r8a779f0-ether-switch";
|
||||||
|
reg = <0xe6880000 0x20000>, <0xe68c0000 0x20000>;
|
||||||
|
reg-names = "base", "secure_base";
|
||||||
|
interrupts = <GIC_SPI 256 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 257 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 258 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 259 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 260 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 261 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 262 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 263 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 265 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 267 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 268 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 269 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 270 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 271 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 272 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 273 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 274 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 276 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 277 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 278 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 280 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 281 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 282 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 283 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 284 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 285 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 286 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 287 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 288 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 289 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 290 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 291 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 292 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 293 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 294 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 295 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 296 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 297 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 298 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 299 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 300 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 301 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 302 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 304 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 305 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
<GIC_SPI 306 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
interrupt-names = "mfwd_error", "race_error",
|
||||||
|
"coma_error", "gwca0_error",
|
||||||
|
"gwca1_error", "etha0_error",
|
||||||
|
"etha1_error", "etha2_error",
|
||||||
|
"gptp0_status", "gptp1_status",
|
||||||
|
"mfwd_status", "race_status",
|
||||||
|
"coma_status", "gwca0_status",
|
||||||
|
"gwca1_status", "etha0_status",
|
||||||
|
"etha1_status", "etha2_status",
|
||||||
|
"rmac0_status", "rmac1_status",
|
||||||
|
"rmac2_status",
|
||||||
|
"gwca0_rxtx0", "gwca0_rxtx1",
|
||||||
|
"gwca0_rxtx2", "gwca0_rxtx3",
|
||||||
|
"gwca0_rxtx4", "gwca0_rxtx5",
|
||||||
|
"gwca0_rxtx6", "gwca0_rxtx7",
|
||||||
|
"gwca1_rxtx0", "gwca1_rxtx1",
|
||||||
|
"gwca1_rxtx2", "gwca1_rxtx3",
|
||||||
|
"gwca1_rxtx4", "gwca1_rxtx5",
|
||||||
|
"gwca1_rxtx6", "gwca1_rxtx7",
|
||||||
|
"gwca0_rxts0", "gwca0_rxts1",
|
||||||
|
"gwca1_rxts0", "gwca1_rxts1",
|
||||||
|
"rmac0_mdio", "rmac1_mdio",
|
||||||
|
"rmac2_mdio",
|
||||||
|
"rmac0_phy", "rmac1_phy",
|
||||||
|
"rmac2_phy";
|
||||||
|
clocks = <&cpg CPG_MOD 1505>;
|
||||||
|
power-domains = <&sysc R8A779F0_PD_ALWAYS_ON>;
|
||||||
|
resets = <&cpg 1505>;
|
||||||
|
|
||||||
|
ethernet-ports {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
port@0 {
|
||||||
|
reg = <0>;
|
||||||
|
phy-handle = <&eth_phy0>;
|
||||||
|
phy-mode = "sgmii";
|
||||||
|
phys = <&eth_serdes 0>;
|
||||||
|
mdio {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
port@1 {
|
||||||
|
reg = <1>;
|
||||||
|
phy-handle = <&eth_phy1>;
|
||||||
|
phy-mode = "sgmii";
|
||||||
|
phys = <&eth_serdes 1>;
|
||||||
|
mdio {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
port@2 {
|
||||||
|
reg = <2>;
|
||||||
|
phy-handle = <&eth_phy2>;
|
||||||
|
phy-mode = "sgmii";
|
||||||
|
phys = <&eth_serdes 2>;
|
||||||
|
mdio {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
};
|
||||||
|
};
|
|
@ -22,7 +22,8 @@ properties:
|
||||||
phandle of an I2C bus controller for the SFP two wire serial
|
phandle of an I2C bus controller for the SFP two wire serial
|
||||||
|
|
||||||
maximum-power-milliwatt:
|
maximum-power-milliwatt:
|
||||||
maxItems: 1
|
minimum: 1000
|
||||||
|
default: 1000
|
||||||
description:
|
description:
|
||||||
Maximum module power consumption Specifies the maximum power consumption
|
Maximum module power consumption Specifies the maximum power consumption
|
||||||
allowable by a module in the slot, in milli-Watts. Presently, modules can
|
allowable by a module in the slot, in milli-Watts. Presently, modules can
|
||||||
|
|
|
@ -167,56 +167,238 @@ properties:
|
||||||
snps,mtl-rx-config:
|
snps,mtl-rx-config:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
description:
|
description:
|
||||||
Multiple RX Queues parameters. Phandle to a node that can
|
Multiple RX Queues parameters. Phandle to a node that
|
||||||
contain the following properties
|
implements the 'rx-queues-config' object described in
|
||||||
* snps,rx-queues-to-use, number of RX queues to be used in the
|
this binding.
|
||||||
driver
|
|
||||||
* Choose one of these RX scheduling algorithms
|
rx-queues-config:
|
||||||
* snps,rx-sched-sp, Strict priority
|
type: object
|
||||||
* snps,rx-sched-wsp, Weighted Strict priority
|
properties:
|
||||||
* For each RX queue
|
snps,rx-queues-to-use:
|
||||||
* Choose one of these modes
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
* snps,dcb-algorithm, Queue to be enabled as DCB
|
description: number of RX queues to be used in the driver
|
||||||
* snps,avb-algorithm, Queue to be enabled as AVB
|
snps,rx-sched-sp:
|
||||||
* snps,map-to-dma-channel, Channel to map
|
type: boolean
|
||||||
* Specifiy specific packet routing
|
description: Strict priority
|
||||||
* snps,route-avcp, AV Untagged Control packets
|
snps,rx-sched-wsp:
|
||||||
* snps,route-ptp, PTP Packets
|
type: boolean
|
||||||
* snps,route-dcbcp, DCB Control Packets
|
description: Weighted Strict priority
|
||||||
* snps,route-up, Untagged Packets
|
allOf:
|
||||||
* snps,route-multi-broad, Multicast & Broadcast Packets
|
- if:
|
||||||
* snps,priority, bitmask of the tagged frames priorities assigned to
|
required:
|
||||||
the queue
|
- snps,rx-sched-sp
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,rx-sched-wsp: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,rx-sched-wsp
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,rx-sched-sp: false
|
||||||
|
patternProperties:
|
||||||
|
"^queue[0-9]$":
|
||||||
|
description: Each subnode represents a queue.
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
snps,dcb-algorithm:
|
||||||
|
type: boolean
|
||||||
|
description: Queue to be enabled as DCB
|
||||||
|
snps,avb-algorithm:
|
||||||
|
type: boolean
|
||||||
|
description: Queue to be enabled as AVB
|
||||||
|
snps,map-to-dma-channel:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: DMA channel id to map
|
||||||
|
snps,route-avcp:
|
||||||
|
type: boolean
|
||||||
|
description: AV Untagged Control packets
|
||||||
|
snps,route-ptp:
|
||||||
|
type: boolean
|
||||||
|
description: PTP Packets
|
||||||
|
snps,route-dcbcp:
|
||||||
|
type: boolean
|
||||||
|
description: DCB Control Packets
|
||||||
|
snps,route-up:
|
||||||
|
type: boolean
|
||||||
|
description: Untagged Packets
|
||||||
|
snps,route-multi-broad:
|
||||||
|
type: boolean
|
||||||
|
description: Multicast & Broadcast Packets
|
||||||
|
snps,priority:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: Bitmask of the tagged frames priorities assigned to the queue
|
||||||
|
allOf:
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,dcb-algorithm
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,avb-algorithm: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,avb-algorithm
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,dcb-algorithm: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,route-avcp
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,route-ptp: false
|
||||||
|
snps,route-dcbcp: false
|
||||||
|
snps,route-up: false
|
||||||
|
snps,route-multi-broad: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,route-ptp
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,route-avcp: false
|
||||||
|
snps,route-dcbcp: false
|
||||||
|
snps,route-up: false
|
||||||
|
snps,route-multi-broad: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,route-dcbcp
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,route-avcp: false
|
||||||
|
snps,route-ptp: false
|
||||||
|
snps,route-up: false
|
||||||
|
snps,route-multi-broad: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,route-up
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,route-avcp: false
|
||||||
|
snps,route-ptp: false
|
||||||
|
snps,route-dcbcp: false
|
||||||
|
snps,route-multi-broad: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,route-multi-broad
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,route-avcp: false
|
||||||
|
snps,route-ptp: false
|
||||||
|
snps,route-dcbcp: false
|
||||||
|
snps,route-up: false
|
||||||
|
additionalProperties: false
|
||||||
|
additionalProperties: false
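Reviewer note: the allOf/if/then chains above make the RX scheduling flags, the per-queue mode flags, and the per-queue routing flags mutually exclusive within their own group. The sketch below restates that rule in Python purely as an illustration. It is not how the dt-schema tooling validates a node, and the function names and the set/dict representation of a node are hypothetical.

```python
# Illustrative only: mirrors the mutual-exclusion rules expressed by the
# allOf/if/then blocks of rx-queues-config. Names and data layout are
# assumptions made for this sketch, not part of any real tool.

RX_SCHED = {"snps,rx-sched-sp", "snps,rx-sched-wsp"}
QUEUE_MODE = {"snps,dcb-algorithm", "snps,avb-algorithm"}
QUEUE_ROUTE = {
    "snps,route-avcp",
    "snps,route-ptp",
    "snps,route-dcbcp",
    "snps,route-up",
    "snps,route-multi-broad",
}


def clashing(props, group):
    """Return the sorted members of `group` found in `props` if more than one is set."""
    present = sorted(group & set(props))
    return present if len(present) > 1 else []


def check_rx_config(props, queues):
    """Check a node given its own property names and a {queue name: property names} dict."""
    errors = []
    clash = clashing(props, RX_SCHED)
    if clash:
        errors.append(("rx-queues-config", clash))
    for name in sorted(queues):
        for group in (QUEUE_MODE, QUEUE_ROUTE):
            clash = clashing(queues[name], group)
            if clash:
                errors.append((name, clash))
    return errors
```

Feeding it the property names from the mtl_rx_setup example in this binding yields an empty error list, while setting both scheduling flags at once is reported.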
|
||||||
|
|
||||||
snps,mtl-tx-config:
|
snps,mtl-tx-config:
|
||||||
$ref: /schemas/types.yaml#/definitions/phandle
|
$ref: /schemas/types.yaml#/definitions/phandle
|
||||||
description:
|
description:
|
||||||
Multiple TX Queues parameters. Phandle to a node that can
|
Multiple TX Queues parameters. Phandle to a node that
|
||||||
contain the following properties
|
implements the 'tx-queues-config' object described in
|
||||||
* snps,tx-queues-to-use, number of TX queues to be used in the
|
this binding.
|
||||||
driver
|
|
||||||
* Choose one of these TX scheduling algorithms
|
tx-queues-config:
|
||||||
* snps,tx-sched-wrr, Weighted Round Robin
|
type: object
|
||||||
* snps,tx-sched-wfq, Weighted Fair Queuing
|
properties:
|
||||||
* snps,tx-sched-dwrr, Deficit Weighted Round Robin
|
snps,tx-queues-to-use:
|
||||||
* snps,tx-sched-sp, Strict priority
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
* For each TX queue
|
description: number of TX queues to be used in the driver
|
||||||
* snps,weight, TX queue weight (if using a DCB weight
|
snps,tx-sched-wrr:
|
||||||
algorithm)
|
type: boolean
|
||||||
* Choose one of these modes
|
description: Weighted Round Robin
|
||||||
* snps,dcb-algorithm, TX queue will be working in DCB
|
snps,tx-sched-wfq:
|
||||||
* snps,avb-algorithm, TX queue will be working in AVB
|
type: boolean
|
||||||
[Attention] Queue 0 is reserved for legacy traffic
|
description: Weighted Fair Queuing
|
||||||
and so no AVB is available in this queue.
|
snps,tx-sched-dwrr:
|
||||||
* Configure Credit Base Shaper (if AVB Mode selected)
|
type: boolean
|
||||||
* snps,send_slope, enable Low Power Interface
|
description: Deficit Weighted Round Robin
|
||||||
* snps,idle_slope, unlock on WoL
|
snps,tx-sched-sp:
|
||||||
* snps,high_credit, max write outstanding req. limit
|
type: boolean
|
||||||
* snps,low_credit, max read outstanding req. limit
|
description: Strict priority
|
||||||
* snps,priority, bitmask of the priorities assigned to the queue.
|
allOf:
|
||||||
When a PFC frame is received with priorities matching the bitmask,
|
- if:
|
||||||
the queue is blocked from transmitting for the pause time specified
|
required:
|
||||||
in the PFC frame.
|
- snps,tx-sched-wrr
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,tx-sched-wfq: false
|
||||||
|
snps,tx-sched-dwrr: false
|
||||||
|
snps,tx-sched-sp: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,tx-sched-wfq
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,tx-sched-wrr: false
|
||||||
|
snps,tx-sched-dwrr: false
|
||||||
|
snps,tx-sched-sp: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,tx-sched-dwrr
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,tx-sched-wrr: false
|
||||||
|
snps,tx-sched-wfq: false
|
||||||
|
snps,tx-sched-sp: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,tx-sched-sp
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,tx-sched-wrr: false
|
||||||
|
snps,tx-sched-wfq: false
|
||||||
|
snps,tx-sched-dwrr: false
|
||||||
|
patternProperties:
|
||||||
|
"^queue[0-9]$":
|
||||||
|
description: Each subnode represents a queue.
|
||||||
|
type: object
|
||||||
|
properties:
|
||||||
|
snps,weight:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: TX queue weight (if using a DCB weight algorithm)
|
||||||
|
snps,dcb-algorithm:
|
||||||
|
type: boolean
|
||||||
|
description: TX queue will be working in DCB
|
||||||
|
snps,avb-algorithm:
|
||||||
|
type: boolean
|
||||||
|
description:
|
||||||
|
TX queue will be working in AVB.
|
||||||
|
Queue 0 is reserved for legacy traffic and so no AVB is
|
||||||
|
available in this queue.
|
||||||
|
snps,send_slope:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: enable Low Power Interface
|
||||||
|
snps,idle_slope:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: unlock on WoL
|
||||||
|
snps,high_credit:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: max write outstanding req. limit
|
||||||
|
snps,low_credit:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description: max read outstanding req. limit
|
||||||
|
snps,priority:
|
||||||
|
$ref: /schemas/types.yaml#/definitions/uint32
|
||||||
|
description:
|
||||||
|
Bitmask of the tagged frames priorities assigned to the queue.
|
||||||
|
When a PFC frame is received with priorities matching the bitmask,
|
||||||
|
the queue is blocked from transmitting for the pause time specified
|
||||||
|
in the PFC frame.
|
||||||
|
allOf:
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,dcb-algorithm
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,avb-algorithm: false
|
||||||
|
- if:
|
||||||
|
required:
|
||||||
|
- snps,avb-algorithm
|
||||||
|
then:
|
||||||
|
properties:
|
||||||
|
snps,dcb-algorithm: false
|
||||||
|
snps,weight: false
|
||||||
|
additionalProperties: false
|
||||||
|
additionalProperties: false
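Reviewer note: for TX queues the schema allows only one of the four scheduling flags, and a queue marked snps,avb-algorithm may carry neither snps,dcb-algorithm nor snps,weight. The snps,avb-algorithm description additionally states that queue 0 has no AVB, which the schema text documents but does not enforce. A minimal Python sketch of these checks follows. The function names and the use of plain sets of property names are assumptions for illustration, not part of dt-schema.

```python
# Illustrative only: restates the tx-queues-config constraints. The queue-0
# rule comes from the property description, not from an if/then block.

TX_SCHED = {
    "snps,tx-sched-wrr",
    "snps,tx-sched-wfq",
    "snps,tx-sched-dwrr",
    "snps,tx-sched-sp",
}


def check_tx_sched(props):
    """Return the scheduling flags that clash, or an empty list if at most one is set."""
    chosen = sorted(TX_SCHED & set(props))
    return chosen if len(chosen) > 1 else []


def check_tx_queue(index, props):
    """Return a list of human-readable problems for one queueN subnode."""
    props = set(props)
    problems = []
    if "snps,avb-algorithm" in props:
        for banned in ("snps,dcb-algorithm", "snps,weight"):
            if banned in props:
                problems.append(banned + " is not allowed with snps,avb-algorithm")
        if index == 0:
            problems.append("queue 0 is reserved for legacy traffic, AVB is unavailable")
    return problems
```

The queue0 and queue1 subnodes of the mtl_tx_setup example in this binding both pass these checks.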
|
||||||
|
|
||||||
snps,reset-gpio:
|
snps,reset-gpio:
|
||||||
deprecated: true
|
deprecated: true
|
||||||
|
@ -463,41 +645,6 @@ additionalProperties: true
|
||||||
|
|
||||||
examples:
|
examples:
|
||||||
- |
|
- |
|
||||||
stmmac_axi_setup: stmmac-axi-config {
|
|
||||||
snps,wr_osr_lmt = <0xf>;
|
|
||||||
snps,rd_osr_lmt = <0xf>;
|
|
||||||
snps,blen = <256 128 64 32 0 0 0>;
|
|
||||||
};
|
|
||||||
|
|
||||||
mtl_rx_setup: rx-queues-config {
|
|
||||||
snps,rx-queues-to-use = <1>;
|
|
||||||
snps,rx-sched-sp;
|
|
||||||
queue0 {
|
|
||||||
snps,dcb-algorithm;
|
|
||||||
snps,map-to-dma-channel = <0x0>;
|
|
||||||
snps,priority = <0x0>;
|
|
||||||
};
|
|
||||||
};
|
|
||||||
|
|
||||||
mtl_tx_setup: tx-queues-config {
|
|
||||||
snps,tx-queues-to-use = <2>;
|
|
||||||
snps,tx-sched-wrr;
|
|
||||||
queue0 {
|
|
||||||
snps,weight = <0x10>;
|
|
||||||
snps,dcb-algorithm;
|
|
||||||
snps,priority = <0x0>;
|
|
||||||
};
|
|
||||||
|
|
||||||
queue1 {
|
|
||||||
snps,avb-algorithm;
|
|
||||||
snps,send_slope = <0x1000>;
|
|
||||||
snps,idle_slope = <0x1000>;
|
|
||||||
snps,high_credit = <0x3E800>;
|
|
||||||
snps,low_credit = <0xFFC18000>;
|
|
||||||
snps,priority = <0x1>;
|
|
||||||
};
|
|
||||||
};
|
|
||||||
|
|
||||||
gmac0: ethernet@e0800000 {
|
gmac0: ethernet@e0800000 {
|
||||||
compatible = "snps,dwxgmac-2.10", "snps,dwxgmac";
|
compatible = "snps,dwxgmac-2.10", "snps,dwxgmac";
|
||||||
reg = <0xe0800000 0x8000>;
|
reg = <0xe0800000 0x8000>;
|
||||||
|
@ -516,6 +663,42 @@ examples:
|
||||||
snps,axi-config = <&stmmac_axi_setup>;
|
snps,axi-config = <&stmmac_axi_setup>;
|
||||||
snps,mtl-rx-config = <&mtl_rx_setup>;
|
snps,mtl-rx-config = <&mtl_rx_setup>;
|
||||||
snps,mtl-tx-config = <&mtl_tx_setup>;
|
snps,mtl-tx-config = <&mtl_tx_setup>;
|
||||||
|
|
||||||
|
stmmac_axi_setup: stmmac-axi-config {
|
||||||
|
snps,wr_osr_lmt = <0xf>;
|
||||||
|
snps,rd_osr_lmt = <0xf>;
|
||||||
|
snps,blen = <256 128 64 32 0 0 0>;
|
||||||
|
};
|
||||||
|
|
||||||
|
mtl_rx_setup: rx-queues-config {
|
||||||
|
snps,rx-queues-to-use = <1>;
|
||||||
|
snps,rx-sched-sp;
|
||||||
|
queue0 {
|
||||||
|
snps,dcb-algorithm;
|
||||||
|
snps,map-to-dma-channel = <0x0>;
|
||||||
|
snps,priority = <0x0>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
mtl_tx_setup: tx-queues-config {
|
||||||
|
snps,tx-queues-to-use = <2>;
|
||||||
|
snps,tx-sched-wrr;
|
||||||
|
queue0 {
|
||||||
|
snps,weight = <0x10>;
|
||||||
|
snps,dcb-algorithm;
|
||||||
|
snps,priority = <0x0>;
|
||||||
|
};
|
||||||
|
|
||||||
|
queue1 {
|
||||||
|
snps,avb-algorithm;
|
||||||
|
snps,send_slope = <0x1000>;
|
||||||
|
snps,idle_slope = <0x1000>;
|
||||||
|
snps,high_credit = <0x3E800>;
|
||||||
|
snps,low_credit = <0xFFC18000>;
|
||||||
|
snps,priority = <0x1>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
mdio0 {
|
mdio0 {
|
||||||
#address-cells = <1>;
|
#address-cells = <1>;
|
||||||
#size-cells = <0>;
|
#size-cells = <0>;
|
||||||
|
|
|
@ -0,0 +1,73 @@
|
||||||
|
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/net/socionext,synquacer-netsec.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Socionext NetSec Ethernet Controller IP

maintainers:
  - Jassi Brar <jaswinder.singh@linaro.org>
  - Ilias Apalodimas <ilias.apalodimas@linaro.org>

allOf:
  - $ref: ethernet-controller.yaml#

properties:
  compatible:
    const: socionext,synquacer-netsec

  reg:
    items:
      - description: control register area
      - description: EEPROM holding the MAC address and microengine firmware

  clocks:
    maxItems: 1

  clock-names:
    const: phy_ref_clk

  dma-coherent: true

  interrupts:
    maxItems: 1

  mdio:
    $ref: mdio.yaml#

required:
  - compatible
  - reg
  - clocks
  - clock-names
  - interrupts
  - mdio

unevaluatedProperties: false

examples:
  - |
    #include <dt-bindings/interrupt-controller/arm-gic.h>

    ethernet@522d0000 {
        compatible = "socionext,synquacer-netsec";
        reg = <0x522d0000 0x10000>, <0x10000000 0x10000>;
        interrupts = <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>;
        clocks = <&clk_netsec>;
        clock-names = "phy_ref_clk";
        phy-mode = "rgmii";
        max-speed = <1000>;
        max-frame-size = <9000>;
        phy-handle = <&phy1>;

        mdio {
            #address-cells = <1>;
            #size-cells = <0>;
            phy1: ethernet-phy@1 {
                compatible = "ethernet-phy-ieee802.3-c22";
                reg = <1>;
            };
        };
    };
...
|
|
@ -1,56 +0,0 @@
|
||||||
* Socionext NetSec Ethernet Controller IP
|
|
||||||
|
|
||||||
Required properties:
|
|
||||||
- compatible: Should be "socionext,synquacer-netsec"
|
|
||||||
- reg: Address and length of the control register area, followed by the
|
|
||||||
address and length of the EEPROM holding the MAC address and
|
|
||||||
microengine firmware
|
|
||||||
- interrupts: Should contain ethernet controller interrupt
|
|
||||||
- clocks: phandle to the PHY reference clock
|
|
||||||
- clock-names: Should be "phy_ref_clk"
|
|
||||||
- phy-mode: See ethernet.txt file in the same directory
|
|
||||||
- phy-handle: See ethernet.txt in the same directory.
|
|
||||||
|
|
||||||
- mdio device tree subnode: When the Netsec has a phy connected to its local
|
|
||||||
mdio, there must be device tree subnode with the following
|
|
||||||
required properties:
|
|
||||||
|
|
||||||
- #address-cells: Must be <1>.
|
|
||||||
- #size-cells: Must be <0>.
|
|
||||||
|
|
||||||
For each phy on the mdio bus, there must be a node with the following
|
|
||||||
fields:
|
|
||||||
- compatible: Refer to phy.txt
|
|
||||||
- reg: phy id used to communicate to phy.
|
|
||||||
|
|
||||||
Optional properties: (See ethernet.txt file in the same directory)
|
|
||||||
- dma-coherent: Boolean property, must only be present if memory
|
|
||||||
accesses performed by the device are cache coherent.
|
|
||||||
- max-speed: See ethernet.txt in the same directory.
|
|
||||||
- max-frame-size: See ethernet.txt in the same directory.
|
|
||||||
|
|
||||||
The MAC address will be determined using the optional properties
|
|
||||||
defined in ethernet.txt. The 'phy-mode' property is required, but may
|
|
||||||
be set to the empty string if the PHY configuration is programmed by
|
|
||||||
the firmware or set by hardware straps, and needs to be preserved.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
eth0: ethernet@522d0000 {
|
|
||||||
compatible = "socionext,synquacer-netsec";
|
|
||||||
reg = <0 0x522d0000 0x0 0x10000>, <0 0x10000000 0x0 0x10000>;
|
|
||||||
interrupts = <GIC_SPI 176 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
clocks = <&clk_netsec>;
|
|
||||||
clock-names = "phy_ref_clk";
|
|
||||||
phy-mode = "rgmii";
|
|
||||||
max-speed = <1000>;
|
|
||||||
max-frame-size = <9000>;
|
|
||||||
phy-handle = <&phy1>;
|
|
||||||
|
|
||||||
mdio {
|
|
||||||
#address-cells = <1>;
|
|
||||||
#size-cells = <0>;
|
|
||||||
phy1: ethernet-phy@1 {
|
|
||||||
compatible = "ethernet-phy-ieee802.3-c22";
|
|
||||||
reg = <1>;
|
|
||||||
};
|
|
||||||
};
|
|
|
@ -68,6 +68,8 @@ Optional properties:
|
||||||
- mdio : Child node for MDIO bus. Must be defined if PHY access is
|
- mdio : Child node for MDIO bus. Must be defined if PHY access is
|
||||||
required through the core's MDIO interface (i.e. always,
|
required through the core's MDIO interface (i.e. always,
|
||||||
unless the PHY is accessed through a different bus).
|
unless the PHY is accessed through a different bus).
|
||||||
|
Non-standard MDIO bus frequency is supported via
|
||||||
|
"clock-frequency", see mdio.yaml.
|
||||||
|
|
||||||
- pcs-handle: Phandle to the internal PCS/PMA PHY in SGMII or 1000Base-X
|
- pcs-handle: Phandle to the internal PCS/PMA PHY in SGMII or 1000Base-X
|
||||||
modes, where "pcs-handle" should be used to point
|
modes, where "pcs-handle" should be used to point
|
||||||
|
|
|
@ -0,0 +1,51 @@
|
||||||
|
# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/soc/mediatek/mediatek,mt7986-wo-ccif.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: MediaTek Wireless Ethernet Dispatch (WED) WO controller interface for MT7986

maintainers:
  - Lorenzo Bianconi <lorenzo@kernel.org>
  - Felix Fietkau <nbd@nbd.name>

description:
  The MediaTek wo-ccif provides a configuration interface for the WED WO
  controller used to perform offload rx packet processing (e.g. 802.11
  aggregation packet reordering or rx header translation) on the MT7986 SoC.

properties:
  compatible:
    items:
      - enum:
          - mediatek,mt7986-wo-ccif
      - const: syscon

  reg:
    maxItems: 1

  interrupts:
    maxItems: 1

required:
  - compatible
  - reg
  - interrupts

additionalProperties: false

examples:
  - |
    #include <dt-bindings/interrupt-controller/arm-gic.h>
    #include <dt-bindings/interrupt-controller/irq.h>
    soc {
      #address-cells = <2>;
      #size-cells = <2>;

      syscon@151a5000 {
        compatible = "mediatek,mt7986-wo-ccif", "syscon";
        reg = <0 0x151a5000 0 0x1000>;
        interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
      };
    };
|
|
@ -42,15 +42,13 @@ properties:
|
||||||
bluetooth:
|
bluetooth:
|
||||||
type: object
|
type: object
|
||||||
additionalProperties: false
|
additionalProperties: false
|
||||||
|
allOf:
|
||||||
|
- $ref: /schemas/net/bluetooth/bluetooth-controller.yaml#
|
||||||
properties:
|
properties:
|
||||||
compatible:
|
compatible:
|
||||||
const: qcom,wcnss-bt
|
const: qcom,wcnss-bt
|
||||||
|
|
||||||
local-bd-address:
|
local-bd-address: true
|
||||||
$ref: /schemas/types.yaml#/definitions/uint8-array
|
|
||||||
maxItems: 6
|
|
||||||
description:
|
|
||||||
See Documentation/devicetree/bindings/net/bluetooth.txt
|
|
||||||
|
|
||||||
required:
|
required:
|
||||||
- compatible
|
- compatible
|
||||||
|
|
|
@ -566,7 +566,8 @@ miimon
|
||||||
link monitoring. A value of 100 is a good starting point.
|
link monitoring. A value of 100 is a good starting point.
|
||||||
The use_carrier option, below, affects how the link state is
|
The use_carrier option, below, affects how the link state is
|
||||||
determined. See the High Availability section for additional
|
determined. See the High Availability section for additional
|
||||||
information. The default value is 0.
|
information. The default value is 100 if arp_interval is not
|
||||||
|
set.
|
||||||
|
|
||||||
min_links
|
min_links
|
||||||
|
|
||||||
|
@ -956,6 +957,7 @@ xmit_hash_policy
|
||||||
hash = hash XOR source IP XOR destination IP
|
hash = hash XOR source IP XOR destination IP
|
||||||
hash = hash XOR (hash RSHIFT 16)
|
hash = hash XOR (hash RSHIFT 16)
|
||||||
hash = hash XOR (hash RSHIFT 8)
|
hash = hash XOR (hash RSHIFT 8)
|
||||||
|
hash = hash RSHIFT 1
|
||||||
And then hash is reduced modulo slave count.
|
And then hash is reduced modulo slave count.
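Reviewer note: the folding steps listed in this hunk, including the newly added "hash RSHIFT 1" step, can be written out as a short Python sketch. It is a plain transcription of the documented formula for illustration and is not the bonding driver's code. The function name is hypothetical, the starting hash is passed in as a parameter because its derivation is outside this hunk, and IP addresses are assumed to be given as 32-bit integers.

```python
# Illustrative transcription of the documented formula; not kernel code.

def bond_hash_sketch(initial_hash, src_ip, dst_ip, slave_count):
    """Fold a starting hash with two IPv4 addresses and pick a slave index."""
    if slave_count <= 0:
        raise ValueError("slave_count must be positive")
    h = initial_hash & 0xFFFFFFFF
    h ^= (src_ip ^ dst_ip) & 0xFFFFFFFF  # hash XOR source IP XOR destination IP
    h ^= h >> 16                         # hash XOR (hash RSHIFT 16)
    h ^= h >> 8                          # hash XOR (hash RSHIFT 8)
    h >>= 1                              # hash RSHIFT 1 (the step added here)
    return h % slave_count               # reduced modulo slave count
```

For example, with a starting hash of 0, addresses 192.168.0.1 and 192.168.0.2 (0xC0A80001 and 0xC0A80002), and two slaves, the XOR of the addresses is 3, the two folds leave it at 3, the final shift gives 1, and 1 modulo 2 selects slave index 1.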
|
||||||
|
|
||||||
If the protocol is IPv6 then the source and destination
|
If the protocol is IPv6 then the source and destination
|
||||||
|
|
|
@ -1148,6 +1148,39 @@ tuning on deep embedded systems'. The author is running a MPC603e
|
||||||
load without any problems ...
|
load without any problems ...
|
||||||
|
|
||||||
|
|
||||||
|
Switchable Termination Resistors
--------------------------------

CAN bus requires a specific impedance across the differential pair,
typically provided by two 120 Ohm resistors on the farthest nodes of
the bus. Some CAN controllers support activating / deactivating
termination resistor(s) to provide the correct impedance.

Query the available resistances::

    $ ip -details link show can0
    ...
    termination 120 [ 0, 120 ]

Activate the terminating resistor::

    $ ip link set dev can0 type can termination 120

Deactivate the terminating resistor::

    $ ip link set dev can0 type can termination 0

To enable termination resistor support for a CAN controller, either
implement the following in the controller's struct can_priv::

    termination_const
    termination_const_cnt
    do_set_termination

or add gpio control with the device tree entries from
Documentation/devicetree/bindings/net/can/can-controller.yaml
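
Reviewer note: a script that needs the current and available termination values could extract them from the "termination" line shown in the query example above. The Python sketch below assumes the output has exactly that shape ("termination <current> [ <v1>, <v2>, ... ]"). The function name is hypothetical and the parsing is not taken from iproute2.

```python
# Illustrative only: parses the "termination" line in the format shown in
# the documentation example; other output layouts are not handled.
import re

_TERMINATION_RE = re.compile(r"termination\s+(\d+)\s*\[([^\]]*)\]")


def parse_termination(ip_output):
    """Return (current, [available values]) or None if no termination line is found."""
    match = _TERMINATION_RE.search(ip_output)
    if match is None:
        return None
    current = int(match.group(1))
    available = [int(v) for v in match.group(2).split(",") if v.strip()]
    return current, available
```

Returning None when the line is absent lets a caller distinguish a controller without switchable termination from one whose resistor is switched off (current value 0).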
|
||||||
|
|
||||||
|
|
||||||
The Virtual CAN Driver (vcan)
|
The Virtual CAN Driver (vcan)
|
||||||
-----------------------------
|
-----------------------------
|
||||||
|
|
||||||
|
|
|
@ -181,10 +181,13 @@ when necessary using the below listed API::
|
||||||
- int dpaa2_mac_connect(struct dpaa2_mac *mac);
|
- int dpaa2_mac_connect(struct dpaa2_mac *mac);
|
||||||
- void dpaa2_mac_disconnect(struct dpaa2_mac *mac);
|
- void dpaa2_mac_disconnect(struct dpaa2_mac *mac);
|
||||||
|
|
||||||
A phylink integration is necessary only when the partner DPMAC is not of TYPE_FIXED.
|
A phylink integration is necessary only when the partner DPMAC is not of
|
||||||
One can check for this condition using the below API::
|
``TYPE_FIXED``. This means it is either of ``TYPE_PHY``, or of
|
||||||
|
``TYPE_BACKPLANE`` (the difference between the two being that in the ``TYPE_BACKPLANE``
|
||||||
|
mode, the MC firmware does not access the PCS registers). One can check for
|
||||||
|
this condition using the following helper::
|
||||||
|
|
||||||
- bool dpaa2_mac_is_type_fixed(struct fsl_mc_device *dpmac_dev,struct fsl_mc_io *mc_io);
|
- static inline bool dpaa2_mac_is_type_phy(struct dpaa2_mac *mac);
|
||||||
|
|
||||||
Before connection to a MAC, the caller must allocate and populate the
|
Before connection to a MAC, the caller must allocate and populate the
|
||||||
dpaa2_mac structure with the associated net_device, a pointer to the MC portal
|
dpaa2_mac structure with the associated net_device, a pointer to the MC portal
|
||||||
|
|
|
@ -23,6 +23,7 @@ Supported Devices
|
||||||
=================
|
=================
|
||||||
Currently, this driver support following devices:
|
Currently, this driver support following devices:
|
||||||
* Network controller: Cavium, Inc. Device b200
|
* Network controller: Cavium, Inc. Device b200
|
||||||
|
* Network controller: Cavium, Inc. Device b400
|
||||||
|
|
||||||
Interface Control
|
Interface Control
|
||||||
=================
|
=================
|
||||||
|
|
|
@ -25,7 +25,7 @@ Enabling the driver and kconfig options
|
||||||
| at build time via kernel Kconfig flags.
|
| at build time via kernel Kconfig flags.
|
||||||
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
|
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
|
||||||
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
|
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
|
||||||
| For the list of advanced features please see below.
|
| For the list of advanced features, please see below.
|
||||||
|
|
||||||
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
|
**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)
|
||||||
|
|
||||||
|
@ -89,11 +89,11 @@ Enabling the driver and kconfig options
|
||||||
|
|
||||||
**CONFIG_MLX5_EN_IPSEC=(y/n)**
|
**CONFIG_MLX5_EN_IPSEC=(y/n)**
|
||||||
|
|
||||||
| Enables `IPSec XFRM cryptography-offload accelaration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
|
| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.
|
||||||
|
|
||||||
**CONFIG_MLX5_EN_TLS=(y/n)**
|
**CONFIG_MLX5_EN_TLS=(y/n)**
|
||||||
|
|
||||||
| TLS cryptography-offload accelaration.
|
| TLS cryptography-offload acceleration.
|
||||||
|
|
||||||
|
|
||||||
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
|
**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)
|
||||||
|
@ -139,14 +139,14 @@ flow_steering_mode: Device flow steering mode
|
||||||
The flow steering mode parameter controls the flow steering mode of the driver.
|
The flow steering mode parameter controls the flow steering mode of the driver.
|
||||||
Two modes are supported:
|
Two modes are supported:
|
||||||
1. 'dmfs' - Device managed flow steering.
|
1. 'dmfs' - Device managed flow steering.
|
||||||
2. 'smfs - Software/Driver managed flow steering.
|
2. 'smfs' - Software/Driver managed flow steering.
|
||||||
|
|
||||||
In DMFS mode, the HW steering entities are created and managed through the
|
In DMFS mode, the HW steering entities are created and managed through the
|
||||||
Firmware.
|
Firmware.
|
||||||
In SMFS mode, the HW steering entities are created and managed though by
|
In SMFS mode, the HW steering entities are created and managed though by
|
||||||
the driver directly into Hardware without firmware intervention.
|
the driver directly into hardware without firmware intervention.
|
||||||
|
|
||||||
SMFS mode is faster and provides better rule inserstion rate compared to default DMFS mode.
|
SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode.
|
||||||
|
|
||||||
User command examples:
|
User command examples:
|
||||||
|
|
||||||
|
@ -165,9 +165,9 @@ User command examples:
|
||||||
enable_roce: RoCE enablement state
|
enable_roce: RoCE enablement state
|
||||||
----------------------------------
|
----------------------------------
|
||||||
RoCE enablement state controls driver support for RoCE traffic.
|
RoCE enablement state controls driver support for RoCE traffic.
|
||||||
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well known UDP RoCE port is handled as raw ethernet traffic.
|
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well-known UDP RoCE port is handled as raw ethernet traffic.
|
||||||
|
|
||||||
To change RoCE enablement state a user must change the driverinit cmode value and run devlink reload.
|
To change RoCE enablement state, a user must change the driverinit cmode value and run devlink reload.
|
||||||
|
|
||||||
User command examples:
|
User command examples:
|
||||||
|
|
||||||
|
@ -186,7 +186,7 @@ User command examples:
|
||||||
|
|
||||||
esw_port_metadata: Eswitch port metadata state
|
esw_port_metadata: Eswitch port metadata state
|
||||||
----------------------------------------------
|
----------------------------------------------
|
||||||
When applicable, disabling Eswitch metadata can increase packet rate
|
When applicable, disabling eswitch metadata can increase packet rate
|
||||||
up to 20% depending on the use case and packet sizes.
|
up to 20% depending on the use case and packet sizes.
|
||||||
|
|
||||||
Eswitch port metadata state controls whether to internally tag packets with
|
Eswitch port metadata state controls whether to internally tag packets with
|
||||||
|
@ -253,26 +253,26 @@ mlx5 subfunction
|
||||||
================
|
================
|
||||||
mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
|
mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.
|
||||||
|
|
||||||
A Subfunction has its own function capabilities and its own resources. This
|
A subfunction has its own function capabilities and its own resources. This
|
||||||
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
|
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
|
||||||
queues are neither shared nor stolen from the parent PCI function.
|
queues are neither shared nor stolen from the parent PCI function.
|
||||||
|
|
||||||
When a subfunction is RDMA capable, it has its own QP1, GID table and rdma
|
When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
|
||||||
resources neither shared nor stolen from the parent PCI function.
|
resources neither shared nor stolen from the parent PCI function.
|
||||||
|
|
||||||
A subfunction has a dedicated window in PCI BAR space that is not shared
|
A subfunction has a dedicated window in PCI BAR space that is not shared
|
||||||
with ther other subfunctions or the parent PCI function. This ensures that all
|
with the other subfunctions or the parent PCI function. This ensures that all
|
||||||
devices (netdev, rdma, vdpa etc.) of the subfunction accesses only assigned
|
devices (netdev, rdma, vdpa, etc.) of the subfunction accesses only assigned
|
||||||
PCI BAR space.
|
PCI BAR space.
|
||||||
|
|
||||||
A Subfunction supports eswitch representation through which it supports tc
|
A subfunction supports eswitch representation through which it supports tc
|
||||||
offloads. The user configures eswitch to send/receive packets from/to
|
offloads. The user configures eswitch to send/receive packets from/to
|
||||||
the subfunction port.
|
the subfunction port.
|
||||||
|
|
||||||
Subfunctions share PCI level resources such as PCI MSI-X IRQs with
|
Subfunctions share PCI level resources such as PCI MSI-X IRQs with
|
||||||
other subfunctions and/or with its parent PCI function.
|
other subfunctions and/or with its parent PCI function.
|
||||||
|
|
||||||
Example mlx5 software, system and device view::
|
Example mlx5 software, system, and device view::
|
||||||
|
|
||||||
_______
|
_______
|
||||||
| admin |
|
| admin |
|
||||||
|
@ -310,7 +310,7 @@ Example mlx5 software, system and device view::
|
||||||
| (device add/del)
|
| (device add/del)
|
||||||
_____|____ ____|________
|
_____|____ ____|________
|
||||||
| | | subfunction |
|
| | | subfunction |
|
||||||
| PCI NIC |---- activate/deactive events---->| host driver |
|
| PCI NIC |--- activate/deactivate events--->| host driver |
|
||||||
|__________| | (mlx5_core) |
|
|__________| | (mlx5_core) |
|
||||||
|_____________|
|
|_____________|
|
||||||
|
|
||||||
|
@ -320,7 +320,7 @@ Subfunction is created using devlink port interface.
|
||||||
|
|
||||||
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
|
||||||
|
|
||||||
- Add a devlink port of subfunction flaovur::
|
- Add a devlink port of subfunction flavour::
|
||||||
|
|
||||||
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
|
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
|
||||||
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
|
||||||
|
@ -351,46 +351,30 @@ driver.
|
||||||
|
|
||||||
MAC address setup
|
MAC address setup
|
||||||
-----------------
|
-----------------
|
||||||
mlx5 driver provides mechanism to setup the MAC address of the PCI VF/SF.
|
mlx5 driver supports devlink port function attr mechanism to set up MAC
|
||||||
|
address. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||||
|
|
||||||
The configured MAC address of the PCI VF/SF will be used by netdevice and rdma
|
RoCE capability setup
|
||||||
device created for the PCI VF/SF.
|
---------------------
|
||||||
|
Not all mlx5 PCI devices/SFs require RoCE capability.
|
||||||
|
|
||||||
- Get the MAC address of the VF identified by its unique devlink port index::
|
When RoCE capability is disabled, it saves 1 Mbyte of system memory per
|
||||||
|
PCI device/SF.
|
||||||
|
|
||||||
$ devlink port show pci/0000:06:00.0/2
|
mlx5 driver supports devlink port function attr mechanism to set up RoCE
|
||||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:00:00
|
|
||||||
|
|
||||||
- Set the MAC address of the VF identified by its unique devlink port index::
|
migratable capability setup
|
||||||
|
---------------------------
|
||||||
|
Users who want mlx5 PCI VFs to be able to perform live migration need to
|
||||||
|
explicitly enable the VF migratable capability.
|
||||||
|
|
||||||
$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
|
mlx5 driver supports devlink port function attr mechanism to set up migratable
|
||||||
|
capability. (refer to Documentation/networking/devlink/devlink-port.rst)
|
||||||
$ devlink port show pci/0000:06:00.0/2
|
|
||||||
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
|
||||||
function:
|
|
||||||
hw_addr 00:11:22:33:44:55
|
|
||||||
|
|
||||||
- Get the MAC address of the SF identified by its unique devlink port index::
|
|
||||||
|
|
||||||
$ devlink port show pci/0000:06:00.0/32768
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:00:00
|
|
||||||
|
|
||||||
- Set the MAC address of the VF identified by its unique devlink port index::
|
|
||||||
|
|
||||||
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
|
|
||||||
|
|
||||||
$ devlink port show pci/0000:06:00.0/32768
|
|
||||||
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcivf pfnum 0 sfnum 88
|
|
||||||
function:
|
|
||||||
hw_addr 00:00:00:00:88:88
|
|
||||||
|
|
||||||
SF state setup
|
SF state setup
|
||||||
--------------
|
--------------
|
||||||
To use the SF, the user must active the SF using the SF function state
|
To use the SF, the user must activate the SF using the SF function state
|
||||||
attribute.
|
attribute.
|
||||||
|
|
||||||
- Get the state of the SF identified by its unique devlink port index::
|
- Get the state of the SF identified by its unique devlink port index::
|
||||||
|
@ -447,7 +431,7 @@ for it.
|
||||||
|
|
||||||
Additionally, the SF port also gets the event when the driver attaches to the
|
Additionally, the SF port also gets the event when the driver attaches to the
|
||||||
auxiliary device of the subfunction. This results in changing the operational
|
auxiliary device of the subfunction. This results in changing the operational
|
||||||
state of the function. This provides visiblity to the user to decide when is it
|
state of the function. This provides visibility to the user to decide when it is
|
||||||
safe to delete the SF port for graceful termination of the subfunction.
|
safe to delete the SF port for graceful termination of the subfunction.
|
||||||
|
|
||||||
- Show the SF port operational state::
|
- Show the SF port operational state::
|
||||||
|
@ -464,14 +448,14 @@ tx reporter
|
||||||
-----------
|
-----------
|
||||||
The tx reporter is responsible for reporting and recovering of the following two error scenarios:
|
The tx reporter is responsible for reporting and recovering of the following two error scenarios:
|
||||||
|
|
||||||
- TX timeout
|
- tx timeout
|
||||||
Report on kernel tx timeout detection.
|
Report on kernel tx timeout detection.
|
||||||
Recover by searching lost interrupts.
|
Recover by searching lost interrupts.
|
||||||
- TX error completion
|
- tx error completion
|
||||||
Report on error tx completion.
|
Report on error tx completion.
|
||||||
Recover by flushing the TX queue and reset it.
|
Recover by flushing the tx queue and resetting it.
|
||||||
|
|
||||||
TX reporter also support on demand diagnose callback, on which it provides
|
tx reporter also supports on-demand diagnose callback, on which it provides
|
||||||
real time information of its send queues status.
|
real time information of its send queues status.
|
||||||
|
|
||||||
User commands examples:
|
User commands examples:
|
||||||
|
@ -491,32 +475,32 @@ rx reporter
|
||||||
-----------
|
-----------
|
||||||
The rx reporter is responsible for reporting and recovering of the following two error scenarios:
|
The rx reporter is responsible for reporting and recovering of the following two error scenarios:
|
||||||
|
|
||||||
- RX queues initialization (population) timeout
|
- rx queues' initialization (population) timeout
|
||||||
RX queues descriptors population on ring initialization is done in
|
Population of rx queues' descriptors on ring initialization is done
|
||||||
napi context via triggering an irq, in case of a failure to get
|
in napi context via triggering an irq. In case of a failure to get
|
||||||
the minimum amount of descriptors, a timeout would occur and it
|
the minimum amount of descriptors, a timeout would occur, and
|
||||||
could be recoverable by polling the EQ (Event Queue).
|
descriptors could be recovered by polling the EQ (Event Queue).
|
||||||
- RX completions with errors (reported by HW on interrupt context)
|
- rx completions with errors (reported by HW on interrupt context)
|
||||||
Report on rx completion error.
|
Report on rx completion error.
|
||||||
Recover (if needed) by flushing the related queue and reset it.
|
Recover (if needed) by flushing the related queue and reset it.
|
||||||
|
|
||||||
RX reporter also supports on demand diagnose callback, on which it
|
rx reporter also supports on demand diagnose callback, on which it
|
||||||
provides real time information of its receive queues status.
|
provides real time information of its receive queues' status.
|
||||||
|
|
||||||
- Diagnose rx queues status, and corresponding completion queue::
|
- Diagnose rx queues' status and corresponding completion queue::
|
||||||
|
|
||||||
$ devlink health diagnose pci/0000:82:00.0 reporter rx
|
$ devlink health diagnose pci/0000:82:00.0 reporter rx
|
||||||
|
|
||||||
NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
|
NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output.
|
||||||
|
|
||||||
- Show number of rx errors indicated, number of recover flows ended successfully,
|
- Show number of rx errors indicated, number of recover flows ended successfully,
|
||||||
is autorecover enabled and graceful period from last recover::
|
is autorecover enabled, and graceful period from last recover::
|
||||||
|
|
||||||
$ devlink health show pci/0000:82:00.0 reporter rx
|
$ devlink health show pci/0000:82:00.0 reporter rx
|
||||||
|
|
||||||
fw reporter
|
fw reporter
|
||||||
-----------
|
-----------
|
||||||
The fw reporter implements diagnose and dump callbacks.
|
The fw reporter implements `diagnose` and `dump` callbacks.
|
||||||
It follows symptoms of fw error such as fw syndrome by triggering
|
It follows symptoms of fw error such as fw syndrome by triggering
|
||||||
fw core dump and storing it into the dump buffer.
|
fw core dump and storing it into the dump buffer.
|
||||||
The fw reporter diagnose command can be triggered any time by the user to check
|
The fw reporter diagnose command can be triggered any time by the user to check
|
||||||
|
@ -537,7 +521,7 @@ running it on other PF or any VF will return "Operation not permitted".
|
||||||
|
|
||||||
fw fatal reporter
|
fw fatal reporter
|
||||||
-----------------
|
-----------------
|
||||||
The fw fatal reporter implements dump and recover callbacks.
|
The fw fatal reporter implements `dump` and `recover` callbacks.
|
||||||
It follows fatal errors indications by CR-space dump and recover flow.
|
It follows fatal errors indications by CR-space dump and recover flow.
|
||||||
The CR-space dump uses vsc interface which is valid even if the FW command
|
The CR-space dump uses vsc interface which is valid even if the FW command
|
||||||
interface is not functional, which is the case in most FW fatal errors.
|
interface is not functional, which is the case in most FW fatal errors.
|
||||||
|
@ -552,7 +536,7 @@ User commands examples:
|
||||||
|
|
||||||
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
|
$ devlink health recover pci/0000:82:00.0 reporter fw_fatal
|
||||||
|
|
||||||
- Read FW CR-space dump if already strored or trigger new one::
|
- Read FW CR-space dump if already stored or trigger new one::
|
||||||
|
|
||||||
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
|
$ devlink health dump show pci/0000:82:00.1 reporter fw_fatal
|
||||||
|
|
||||||
|
@ -561,10 +545,10 @@ NOTE: This command can run only on PF.
|
||||||
mlx5 tracepoints
|
mlx5 tracepoints
|
||||||
================
|
================
|
||||||
|
|
||||||
mlx5 driver provides internal trace points for tracking and debugging using
|
mlx5 driver provides internal tracepoints for tracking and debugging using
|
||||||
kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).
|
kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).
|
||||||
|
|
||||||
For the list of support mlx5 events check /sys/kernel/debug/tracing/events/mlx5/
|
For the list of supported mlx5 events, check `/sys/kernel/debug/tracing/events/mlx5/`.
|
||||||
|
|
||||||
tc and eswitch offloads tracepoints:
|
tc and eswitch offloads tracepoints:
|
||||||
|
|
||||||
|
|
|
@ -1,50 +1,57 @@
|
||||||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
=============================================
|
===========================================
|
||||||
Netronome Flow Processor (NFP) Kernel Drivers
|
Network Flow Processor (NFP) Kernel Drivers
|
||||||
=============================================
|
===========================================
|
||||||
|
|
||||||
Copyright (c) 2019, Netronome Systems, Inc.
|
:Copyright: |copy| 2019, Netronome Systems, Inc.
|
||||||
|
:Copyright: |copy| 2022, Corigine, Inc.
|
||||||
|
|
||||||
Contents
|
Contents
|
||||||
========
|
========
|
||||||
|
|
||||||
- `Overview`_
|
- `Overview`_
|
||||||
- `Acquiring Firmware`_
|
- `Acquiring Firmware`_
|
||||||
|
- `Devlink Info`_
|
||||||
|
- `Configure Device`_
|
||||||
|
- `Statistics`_
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
========
|
========
|
||||||
|
|
||||||
This driver supports Netronome's line of Flow Processor devices,
|
This driver supports Netronome and Corigine's line of Network Flow Processor
|
||||||
including the NFP4000, NFP5000, and NFP6000 models, which are also
|
devices, including the NFP3800, NFP4000, NFP5000, and NFP6000 models, which
|
||||||
incorporated in the company's family of Agilio SmartNICs. The SR-IOV
|
are also incorporated in the companies' family of Agilio SmartNICs. The SR-IOV
|
||||||
physical and virtual functions for these devices are supported by
|
physical and virtual functions for these devices are supported by the driver.
|
||||||
the driver.
|
|
||||||
|
|
||||||
Acquiring Firmware
|
Acquiring Firmware
|
||||||
==================
|
==================
|
||||||
|
|
||||||
The NFP4000 and NFP6000 devices require application specific firmware
|
The NFP3800, NFP4000 and NFP6000 devices require application specific firmware
|
||||||
to function. Application firmware can be located either on the host file system
|
to function. Application firmware can be located either on the host file system
|
||||||
or in the device flash (if supported by management firmware).
|
or in the device flash (if supported by management firmware).
|
||||||
|
|
||||||
Firmware files on the host filesystem contain card type (`AMDA-*` string), media
|
Firmware files on the host filesystem contain card type (`AMDA-*` string), media
|
||||||
config etc. They should be placed in `/lib/firmware/netronome` directory to
|
config etc. They should be placed in `/lib/firmware/netronome` directory to
|
||||||
load firmware from the host file system.
|
load firmware from the host file system.
|
||||||
|
|
||||||
Firmware for basic NIC operation is available in the upstream
|
Firmware for basic NIC operation is available in the upstream
|
||||||
`linux-firmware.git` repository.
|
`linux-firmware.git` repository.
|
||||||
|
|
||||||
|
A more comprehensive list of firmware can be downloaded from the
|
||||||
|
`Corigine Support site <https://www.corigine.com/DPUDownload.html>`_.
|
||||||
|
|
||||||
Firmware in NVRAM
|
Firmware in NVRAM
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
Recent versions of management firmware support loading application
|
Recent versions of management firmware support loading application
|
||||||
firmware from flash when the host driver gets probed. The firmware loading
|
firmware from flash when the host driver gets probed. The firmware loading
|
||||||
policy configuration may be used to configure this feature appropriately.
|
policy configuration may be used to configure this feature appropriately.
|
||||||
|
|
||||||
Devlink or ethtool can be used to update the application firmware on the device
|
Devlink or ethtool can be used to update the application firmware on the device
|
||||||
flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
|
flash by providing the appropriate `nic_AMDA*.nffw` file to the respective
|
||||||
command. Users need to take care to write the correct firmware image for the
|
command. Users need to take care to write the correct firmware image for the
|
||||||
card and media configuration to flash.
|
card and media configuration to flash.
|
||||||
|
|
||||||
Available storage space in flash depends on the card being used.
|
Available storage space in flash depends on the card being used.
|
||||||
|
@ -79,9 +86,9 @@ You may need to use hard instead of symbolic links on distributions
|
||||||
which use old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
|
which use old `mkinitrd` command instead of `dracut` (e.g. Ubuntu).
|
||||||
|
|
||||||
After changing firmware files you may need to regenerate the initramfs
|
After changing firmware files you may need to regenerate the initramfs
|
||||||
image. Initramfs contains drivers and firmware files your system may
|
image. Initramfs contains drivers and firmware files your system may
|
||||||
need to boot. Refer to the documentation of your distribution to find
|
need to boot. Refer to the documentation of your distribution to find
|
||||||
out how to update initramfs. Good indication of stale initramfs
|
out how to update initramfs. Good indication of stale initramfs
|
||||||
is system loading wrong driver or firmware on boot, but when driver is
|
is system loading wrong driver or firmware on boot, but when driver is
|
||||||
later reloaded manually everything works correctly.
|
later reloaded manually everything works correctly.
|
||||||
|
|
||||||
|
@ -89,9 +96,9 @@ Selecting firmware per device
|
||||||
-----------------------------
|
-----------------------------
|
||||||
|
|
||||||
Most commonly all cards on the system use the same type of firmware.
|
Most commonly all cards on the system use the same type of firmware.
|
||||||
If you want to load specific firmware image for a specific card, you
|
If you want to load a specific firmware image for a specific card, you
|
||||||
can use either the PCI bus address or serial number. Driver will print
|
can use either the PCI bus address or serial number. The driver will
|
||||||
which files it's looking for when it recognizes a NFP device::
|
print which files it's looking for when it recognizes a NFP device::
|
||||||
|
|
||||||
nfp: Looking for firmware file in order of priority:
|
nfp: Looking for firmware file in order of priority:
|
||||||
nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
|
nfp: netronome/serial-00-12-34-aa-bb-cc-10-ff.nffw: not found
|
||||||
|
@ -106,6 +113,15 @@ Note that `serial-*` and `pci-*` files are **not** automatically included
|
||||||
in initramfs, you will have to refer to documentation of appropriate tools
|
in initramfs, you will have to refer to documentation of appropriate tools
|
||||||
to find out how to include them.
|
to find out how to include them.
|
||||||
|
|
||||||
|
Running firmware version
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
The version of the loaded firmware for a particular <netdev> interface,
|
||||||
|
(e.g. enp4s0), or an interface's port <netdev port> (e.g. enp4s0np0) can
|
||||||
|
be displayed with the ethtool command::
|
||||||
|
|
||||||
|
$ ethtool -i <netdev>
|
||||||
|
|
||||||
Firmware loading policy
|
Firmware loading policy
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
|
@ -132,6 +148,115 @@ abi_drv_load_ifc
|
||||||
Defines a list of PF devices allowed to load FW on the device.
|
Defines a list of PF devices allowed to load FW on the device.
|
||||||
This variable is not currently user configurable.
|
This variable is not currently user configurable.
|
||||||
|
|
||||||
|
Devlink Info
|
||||||
|
============
|
||||||
|
|
||||||
|
The devlink info command displays the running and stored firmware versions
|
||||||
|
on the device, serial number and board information.
|
||||||
|
|
||||||
|
Devlink info command example (replace PCI address)::
|
||||||
|
|
||||||
|
$ devlink dev info pci/0000:03:00.0
|
||||||
|
pci/0000:03:00.0:
|
||||||
|
driver nfp
|
||||||
|
serial_number CSAAMDA2001-1003000111
|
||||||
|
versions:
|
||||||
|
fixed:
|
||||||
|
board.id AMDA2001-1003
|
||||||
|
board.rev 01
|
||||||
|
board.manufacture CSA
|
||||||
|
board.model mozart
|
||||||
|
running:
|
||||||
|
fw.mgmt 22.10.0-rc3
|
||||||
|
fw.cpld 0x1000003
|
||||||
|
fw.app nic-22.09.0
|
||||||
|
chip.init AMDA-2001-1003 1003000111
|
||||||
|
stored:
|
||||||
|
fw.bundle_id bspbundle_1003000111
|
||||||
|
fw.mgmt 22.10.0-rc3
|
||||||
|
fw.cpld 0x0
|
||||||
|
chip.init AMDA-2001-1003 1003000111
|
||||||
|
|
||||||
|
Configure Device
|
||||||
|
================
|
||||||
|
|
||||||
|
This section explains how to use Agilio SmartNICs running basic NIC firmware.
|
||||||
|
|
||||||
|
Configure interface link-speed
|
||||||
|
------------------------------
|
||||||
|
The following steps explain how to change between 10G mode and 25G mode on
|
||||||
|
Agilio CX 2x25GbE cards. The changing of port speed must be done in order,
|
||||||
|
port 0 (p0) must be set to 10G before port 1 (p1) may be set to 10G.
|
||||||
|
|
||||||
|
Down the respective interface(s)::
|
||||||
|
|
||||||
|
$ ip link set dev <netdev port 0> down
|
||||||
|
$ ip link set dev <netdev port 1> down
|
||||||
|
|
||||||
|
Set interface link-speed to 10G::
|
||||||
|
|
||||||
|
$ ethtool -s <netdev port 0> speed 10000
|
||||||
|
$ ethtool -s <netdev port 1> speed 10000
|
||||||
|
|
||||||
|
Set interface link-speed to 25G::
|
||||||
|
|
||||||
|
$ ethtool -s <netdev port 0> speed 25000
|
||||||
|
$ ethtool -s <netdev port 1> speed 25000
|
||||||
|
|
||||||
|
Reload driver for changes to take effect::
|
||||||
|
|
||||||
|
$ rmmod nfp; modprobe nfp
|
||||||
|
|
||||||
|
Configure interface Maximum Transmission Unit (MTU)
|
||||||
|
---------------------------------------------------
|
||||||
|
|
||||||
|
The MTU of interfaces can temporarily be set using the iproute2, ip link or
|
||||||
|
ifconfig tools. Note that this change will not persist. Setting this via
|
||||||
|
Network Manager, or another appropriate OS configuration tool, is
|
||||||
|
recommended as changes to the MTU using Network Manager can be made to
|
||||||
|
persist.
|
||||||
|
|
||||||
|
Set interface MTU to 9000 bytes::
|
||||||
|
|
||||||
|
$ ip link set dev <netdev port> mtu 9000
|
||||||
|
|
||||||
|
It is the responsibility of the user or the orchestration layer to set
|
||||||
|
appropriate MTU values when handling jumbo frames or utilizing tunnels. For
|
||||||
|
example, if packets sent from a VM are to be encapsulated on the card and
|
||||||
|
egress a physical port, then the MTU of the VF should be set to lower than
|
||||||
|
that of the physical port to account for the extra bytes added by the
|
||||||
|
additional header. If a setup is expected to see fallback traffic between
|
||||||
|
the SmartNIC and the kernel then the user should also ensure that the PF MTU
|
||||||
|
is appropriately set to avoid unexpected drops on this path.
|
||||||
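The MTU guidance above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the helper name `max_vf_mtu` and the overhead constants are introduced here for illustration and are not part of the nfp driver or its documentation. The overhead values are the usual VXLAN figures (outer IP header, UDP header, VXLAN header, and the inner Ethernet header that now travels as payload); other tunnel types add different amounts.

```python
# Bytes of a physical port's MTU consumed by VXLAN encapsulation.
# Assumption: plain VXLAN without VLAN tags or IP options.
VXLAN_IPV4_OVERHEAD = 20 + 8 + 8 + 14  # outer IPv4 + UDP + VXLAN + inner Ethernet = 50
VXLAN_IPV6_OVERHEAD = 40 + 8 + 8 + 14  # outer IPv6 + UDP + VXLAN + inner Ethernet = 70


def max_vf_mtu(port_mtu, overhead):
    """Return the largest VF MTU whose encapsulated packets still fit port_mtu.

    Hypothetical helper for illustration only.
    """
    if overhead < 0:
        raise ValueError("overhead must not be negative")
    vf_mtu = port_mtu - overhead
    if vf_mtu <= 0:
        raise ValueError("port MTU is too small for this encapsulation")
    return vf_mtu


if __name__ == "__main__":
    # Physical port configured for jumbo frames, VF traffic tunnelled over IPv4.
    print(max_vf_mtu(9000, VXLAN_IPV4_OVERHEAD))
```

The resulting value would then be applied to the VF with `ip link set dev <netdev port> mtu <value>`, as in the command shown earlier.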
|
|
||||||
|
Configure Forward Error Correction (FEC) modes
|
||||||
|
----------------------------------------------
|
||||||
|
|
||||||
|
Agilio SmartNICs support FEC mode configuration, e.g. Auto, Firecode Base-R,
|
||||||
|
ReedSolomon and Off modes. Each physical port's FEC mode can be set
|
||||||
|
independently using ethtool. The supported FEC modes for an interface can
|
||||||
|
be viewed using::
|
||||||
|
|
||||||
|
$ ethtool <netdev>
|
||||||
|
|
||||||
|
The currently configured FEC mode can be viewed using::
|
||||||
|
|
||||||
|
$ ethtool --show-fec <netdev>
|
||||||
|
|
||||||
|
To force the FEC mode for a particular port, auto-negotiation must be disabled
|
||||||
|
(see the `Auto-negotiation`_ section). An example of how to set the FEC mode
|
||||||
|
to Reed-Solomon is::
|
||||||
|
|
||||||
|
$ ethtool --set-fec <netdev> encoding rs
|
||||||
|
|
||||||
|
Auto-negotiation
|
||||||
|
----------------
|
||||||
|
|
||||||
|
To change auto-negotiation settings, the link must first be put down. After the
|
||||||
|
link is down, auto-negotiation can be enabled or disabled using::
|
||||||
|
|
||||||
|
ethtool -s <netdev> autoneg <on|off>
|
||||||
|
|
||||||
Statistics
|
Statistics
|
||||||
==========
|
==========
|
||||||
|
|
||||||
|
|
|
@ -198,6 +198,11 @@ fw.bundle_id
|
||||||
|
|
||||||
Unique identifier of the entire firmware bundle.
|
Unique identifier of the entire firmware bundle.
|
||||||
|
|
||||||
|
fw.bootloader
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Version of the bootloader.
|
||||||
|
|
||||||
Future work
|
Future work
|
||||||
===========
|
===========
|
||||||
|
|
||||||
|
|
|
@ -110,7 +110,7 @@ devlink ports for both the controllers.
|
||||||
Function configuration
|
Function configuration
|
||||||
======================
|
======================
|
||||||
|
|
||||||
A user can configure the function attribute before enumerating the PCI
|
Users can configure one or more function attributes before enumerating the PCI
|
||||||
function. Usually it means, user should configure function attribute
|
function. Usually it means, user should configure function attribute
|
||||||
before a bus specific device for the function is created. However, when
|
before a bus specific device for the function is created. However, when
|
||||||
SRIOV is enabled, virtual function devices are created on the PCI bus.
|
SRIOV is enabled, virtual function devices are created on the PCI bus.
|
||||||
|
@ -119,9 +119,127 @@ function device to the driver. For subfunctions, this means user should
|
||||||
configure port function attribute before activating the port function.
|
configure port function attribute before activating the port function.
|
||||||
|
|
||||||
A user may set the hardware address of the function using
|
A user may set the hardware address of the function using
|
||||||
'devlink port function set hw_addr' command. For Ethernet port function
|
`devlink port function set hw_addr` command. For Ethernet port function
|
||||||
this means a MAC address.
|
this means a MAC address.
|
||||||
|
|
||||||
|
Users may also set the RoCE capability of the function using
|
||||||
|
`devlink port function set roce` command.
|
||||||
|
|
||||||
|
Users may also set the function as migratable using
|
||||||
|
`devlink port function set migratable` command.
|
||||||
|
|
||||||
|
Function attributes
|
||||||
|
===================
|
||||||
|
|
||||||
|
MAC address setup
|
||||||
|
-----------------
|
||||||
|
The configured MAC address of the PCI VF/SF will be used by netdevice and rdma
|
||||||
|
device created for the PCI VF/SF.
|
||||||
|
|
||||||
|
- Get the MAC address of the VF identified by its unique devlink port index::
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/2
|
||||||
|
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00
|
||||||
|
|
||||||
|
- Set the MAC address of the VF identified by its unique devlink port index::
|
||||||
|
|
||||||
|
$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/2
|
||||||
|
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||||
|
function:
|
||||||
|
hw_addr 00:11:22:33:44:55
|
||||||
|
|
||||||
|
- Get the MAC address of the SF identified by its unique devlink port index::
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/32768
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00
|
||||||
|
|
||||||
|
- Set the MAC address of the SF identified by its unique devlink port index::
|
||||||
|
|
||||||
|
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/32768
|
||||||
|
pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:88:88
|
||||||
|
|
||||||
|
RoCE capability setup
|
||||||
|
---------------------
|
||||||
|
Not all PCI VFs/SFs require RoCE capability.
|
||||||
|
|
||||||
|
When RoCE capability is disabled, it saves system memory per PCI VF/SF.
|
||||||
|
|
||||||
|
When user disables RoCE capability for a VF/SF, user application cannot send or
|
||||||
|
receive any RoCE packets through this VF/SF and RoCE GID table for this PCI VF/SF
|
||||||
|
will be empty.
|
||||||
|
|
||||||
|
When RoCE capability is disabled in the device using port function attribute,
|
||||||
|
VF/SF driver cannot override it.
|
||||||
|
|
||||||
|
- Get RoCE capability of the VF device::
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/2
|
||||||
|
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00 roce enable
|
||||||
|
|
||||||
|
- Set RoCE capability of the VF device::
|
||||||
|
|
||||||
|
$ devlink port function set pci/0000:06:00.0/2 roce disable
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/2
|
||||||
|
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00 roce disable
|
||||||
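When scripting around these commands, the function attributes can be pulled out of the text output shown above. The following is a minimal sketch, assuming the output has exactly the layout of the examples in this document (a `function:` line followed by one line of space-separated name/value pairs). The helper name `parse_function_attrs` is hypothetical. For real automation, the JSON output mode of devlink (`-j`) is probably a more robust choice than parsing text.

```python
def parse_function_attrs(output):
    """Parse the attribute line that follows 'function:' into a dict.

    Hypothetical helper; assumes attributes appear as 'name value' pairs
    on the single line after 'function:', as in the examples above.
    """
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "function:" and index + 1 < len(lines):
            tokens = lines[index + 1].split()
            # Pair up tokens: even positions are names, odd positions are values.
            return dict(zip(tokens[0::2], tokens[1::2]))
    return {}


if __name__ == "__main__":
    sample = (
        "pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1\n"
        "  function:\n"
        "    hw_addr 00:00:00:00:00:00 roce disable\n"
    )
    print(parse_function_attrs(sample))
```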
|
|
||||||
|
migratable capability setup
|
||||||
|
---------------------------
|
||||||
|
Live migration is the process of transferring a live virtual machine
|
||||||
|
from one physical host to another without disrupting its normal
|
||||||
|
operation.
|
||||||
|
|
||||||
|
Users who want PCI VFs to be able to perform live migration need to
|
||||||
|
explicitly enable the VF migratable capability.
|
||||||
|
|
||||||
|
When user enables migratable capability for a VF, and the HV binds the VF to VFIO driver
|
||||||
|
with migration support, the user can migrate the VM with this VF from one HV to a
|
||||||
|
different one.
|
||||||
|
|
||||||
|
However, when migratable capability is enabled, device will disable features which cannot
|
||||||
|
be migrated. Thus migratable cap can impose limitations on a VF so let the user decide.
|
||||||
|
|
||||||
|
Example of LM with migratable function configuration:
|
||||||
|
- Get migratable capability of the VF device::
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/2
|
||||||
|
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00 migratable disable
|
||||||
|
|
||||||
|
- Set migratable capability of the VF device::
|
||||||
|
|
||||||
|
$ devlink port function set pci/0000:06:00.0/2 migratable enable
|
||||||
|
|
||||||
|
$ devlink port show pci/0000:06:00.0/2
|
||||||
|
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
|
||||||
|
function:
|
||||||
|
hw_addr 00:00:00:00:00:00 migratable enable
|
||||||
|
|
||||||
|
- Bind VF to VFIO driver with migration support::
|
||||||
|
|
||||||
|
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
|
||||||
|
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
|
||||||
|
$ echo <pci_id> > /sys/bus/pci/drivers_probe
|
||||||
|
|
||||||
|
Attach VF to the VM.
|
||||||
|
Start the VM.
|
||||||
|
Perform live migration.
|
||||||
|
|
||||||
Subfunction
|
Subfunction
|
||||||
============
|
============
|
||||||
|
|
||||||
|
@ -130,10 +248,11 @@ it is deployed. Subfunction is created and deployed in unit of 1. Unlike
|
||||||
SRIOV VFs, a subfunction doesn't require its own PCI virtual function.
|
SRIOV VFs, a subfunction doesn't require its own PCI virtual function.
|
||||||
A subfunction communicates with the hardware through the parent PCI function.
|
A subfunction communicates with the hardware through the parent PCI function.
|
||||||
|
|
||||||
To use a subfunction, 3 steps setup sequence is followed.
|
To use a subfunction, a 3-step setup sequence is followed:
|
||||||
(1) create - create a subfunction;
|
|
||||||
(2) configure - configure subfunction attributes;
|
1) create - create a subfunction;
|
||||||
(3) deploy - deploy the subfunction;
|
2) configure - configure subfunction attributes;
|
||||||
|
3) deploy - deploy the subfunction;
|
||||||
|
|
||||||
Subfunction management is done using devlink port user interface.
|
Subfunction management is done using devlink port user interface.
|
||||||
User performs setup on the subfunction management device.
|
User performs setup on the subfunction management device.
|
||||||
|
@ -191,13 +310,48 @@ API allows to configure following rate object's parameters:
|
||||||
``tx_max``
|
``tx_max``
|
||||||
Maximum TX rate value.
|
Maximum TX rate value.
|
||||||
|
|
||||||
|
``tx_priority``
|
||||||
|
Allows for usage of strict priority arbiter among siblings. This
|
||||||
|
arbitration scheme attempts to schedule nodes based on their priority
|
||||||
|
as long as the nodes remain within their bandwidth limit. The higher the
|
||||||
|
priority the higher the probability that the node will get selected for
|
||||||
|
scheduling.
|
||||||
|
|
||||||
|
``tx_weight``
|
||||||
|
Allows for usage of Weighted Fair Queuing arbitration scheme among
|
||||||
|
siblings. This arbitration scheme can be used simultaneously with the
|
||||||
|
strict priority. As a node is configured with a higher rate it gets more
|
||||||
|
BW relative to its siblings. Values are relative, like percentage
|
||||||
|
points; they tell how much BW a node should take relative to
|
||||||
|
its siblings.
|
||||||
|
|
||||||
``parent``
|
``parent``
|
||||||
Parent node name. Parent node rate limits are considered as additional limits
|
Parent node name. Parent node rate limits are considered as additional limits
|
||||||
to all node children limits. ``tx_max`` is an upper limit for children.
|
to all node children limits. ``tx_max`` is an upper limit for children.
|
||||||
``tx_share`` is a total bandwidth distributed among children.
|
``tx_share`` is a total bandwidth distributed among children.
|
||||||
|
|
||||||
|
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
|
||||||
|
nodes with the same priority form a WFQ subgroup in the sibling group
|
||||||
|
and arbitration among them is based on assigned weights.
|
||||||
|
|
||||||
|
Arbitration flow from the high level:
|
||||||
|
|
||||||
|
#. Choose the node, or group of nodes, with the highest priority that stays
|
||||||
|
within the BW limit and is not blocked. Use ``tx_priority`` as a
|
||||||
|
parameter for this arbitration.
|
||||||
|
|
||||||
|
#. If a group of nodes has the same priority, perform WFQ arbitration on
|
||||||
|
that subgroup. Use ``tx_weight`` as a parameter for this arbitration.
|
||||||
|
|
||||||
|
#. Select the winner node, and continue arbitration flow among its children,
|
||||||
|
until a leaf node is reached, and the winner is established.
|
||||||
|
|
||||||
|
#. If all the nodes from the highest priority sub-group are satisfied, or
|
||||||
|
overused their assigned BW, move to the lower priority nodes.
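The arbitration flow above can be illustrated with a small model. This is only a sketch of one level of the tree, not driver or hardware code: the ``Node`` class, the ``eligible`` flag (standing in for "within the BW limit and not blocked") and the ``arbitrate`` function are hypothetical names introduced for illustration, and a larger ``tx_priority`` value is assumed to mean higher priority.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    tx_priority: int       # assumed: larger value wins strict-priority arbitration
    tx_weight: int         # relative weight inside a same-priority subgroup
    eligible: bool = True  # hypothetical flag: within BW limit and not blocked


def arbitrate(siblings):
    """Return {name: relative share} for the winning priority subgroup."""
    candidates = [n for n in siblings if n.eligible]
    if not candidates:
        return {}
    # Step 1: strict priority picks the highest-priority eligible subgroup.
    top = max(n.tx_priority for n in candidates)
    group = [n for n in candidates if n.tx_priority == top]
    # Step 2: WFQ inside that subgroup, proportional to tx_weight.
    total = sum(n.tx_weight for n in group)
    if total <= 0:
        raise ValueError("weights must be positive")
    return {n.name: n.tx_weight / total for n in group}


siblings = [
    Node("vf0", tx_priority=7, tx_weight=30),
    Node("vf1", tx_priority=7, tx_weight=10),
    Node("vf2", tx_priority=3, tx_weight=50),
]
shares = arbitrate(siblings)
```

Here ``vf0`` and ``vf1`` form the winning subgroup and split bandwidth 3:1, while ``vf2`` is only served once the higher-priority nodes become ineligible. Descending into the winner's children (the third step) would repeat the same function on the next level.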
|
||||||
|
|
||||||
Driver implementations are allowed to support both or either rate object types
|
Driver implementations are allowed to support both or either rate object types
|
||||||
and setting methods of their parameters.
|
and setting methods of their parameters. Additionally driver implementation
|
||||||
|
may export nodes/leafs and their child-parent relationships.
|
||||||
|
|
||||||
Terms and Definitions
|
Terms and Definitions
|
||||||
=====================
|
=====================
|
||||||
|
|
|
@ -31,6 +31,15 @@ in its ``devlink_region_ops`` structure. If snapshot id is not set in
|
||||||
the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send
|
the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send
|
||||||
the snapshot information to user space.
|
the snapshot information to user space.
|
||||||
|
|
||||||
|
Regions may optionally allow directly reading from their contents without a
|
||||||
|
snapshot. Direct read requests are not atomic. In particular a read request
|
||||||
|
of size 256 bytes or larger will be split into multiple chunks. If atomic
|
||||||
|
access is required, use a snapshot. A driver wishing to enable this for a
|
||||||
|
region should implement the ``.read`` callback in the ``devlink_region_ops``
|
||||||
|
structure. User space can request a direct read by using the
|
||||||
|
``DEVLINK_ATTR_REGION_DIRECT`` attribute instead of specifying a snapshot
|
||||||
|
id.
|
||||||
|
|
||||||
example usage
|
example usage
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
|
@ -65,6 +74,10 @@ example usage
|
||||||
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
|
$ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
|
||||||
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
|
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
|
||||||
|
|
||||||
|
# Read from the region without a snapshot
|
||||||
|
$ devlink region read pci/0000:00:05.0/fw-health address 16 length 16
|
||||||
|
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
|
||||||
|
|
||||||
As regions are likely very device or driver specific, no generic regions are
|
As regions are likely very device or driver specific, no generic regions are
|
||||||
defined. See the driver-specific documentation files for information on the
|
defined. See the driver-specific documentation files for information on the
|
||||||
specific regions a driver supports.
|
specific regions a driver supports.
|
||||||
|
|
|
@ -485,6 +485,16 @@ be added to the following table:
|
||||||
- Traps incoming packets that the device decided to drop because
|
- Traps incoming packets that the device decided to drop because
|
||||||
the destination MAC is not configured in the MAC table and
|
the destination MAC is not configured in the MAC table and
|
||||||
the interface is not in promiscuous mode
|
the interface is not in promiscuous mode
|
||||||
|
* - ``eapol``
|
||||||
|
- ``control``
|
||||||
|
- Traps "Extensible Authentication Protocol over LAN" (EAPOL) packets
|
||||||
|
specified in IEEE 802.1X
|
||||||
|
* - ``locked_port``
|
||||||
|
- ``drop``
|
||||||
|
- Traps packets that the device decided to drop because they failed the
|
||||||
|
locked bridge port check. That is, packets that were received via a
|
||||||
|
locked port and whose {SMAC, VID} does not correspond to an FDB entry
|
||||||
|
pointing to the port
|
||||||
|
|
||||||
Driver-specific Packet Traps
|
Driver-specific Packet Traps
|
||||||
============================
|
============================
|
||||||
|
@ -589,6 +599,9 @@ narrow. The description of these groups must be added to the following table:
|
||||||
* - ``parser_error_drops``
|
* - ``parser_error_drops``
|
||||||
- Contains packet traps for packets that were marked by the device during
|
- Contains packet traps for packets that were marked by the device during
|
||||||
parsing as erroneous
|
parsing as erroneous
|
||||||
|
* - ``eapol``
|
||||||
|
- Contains packet traps for "Extensible Authentication Protocol over LAN"
|
||||||
|
(EAPOL) packets specified in IEEE 802.1X
|
||||||
|
|
||||||
Packet Trap Policers
|
Packet Trap Policers
|
||||||
====================
|
====================
|
||||||
|
|
36
Documentation/networking/devlink/etas_es58x.rst
Normal file
|
@ -0,0 +1,36 @@
.. SPDX-License-Identifier: GPL-2.0

==========================
etas_es58x devlink support
==========================

This document describes the devlink features implemented by the
``etas_es58x`` device driver.

Info versions
=============

The ``etas_es58x`` driver reports the following versions

.. list-table:: devlink info versions implemented
   :widths: 5 5 90

   * - Name
     - Type
     - Description
   * - ``fw``
     - running
     - Version of the firmware running on the device. Also available
       through ``ethtool -i`` as the first member of the
       ``firmware-version``.
   * - ``fw.bootloader``
     - running
     - Version of the bootloader running on the device. Also available
       through ``ethtool -i`` as the second member of the
       ``firmware-version``.
   * - ``board.rev``
     - fixed
     - The hardware revision of the device.
   * - ``serial_number``
     - fixed
     - The USB serial number. Also available through ``lsusb -v``.
|
@ -189,12 +189,21 @@ device data.
|
||||||
* - ``nvm-flash``
|
* - ``nvm-flash``
|
||||||
- The contents of the entire flash chip, sometimes referred to as
|
- The contents of the entire flash chip, sometimes referred to as
|
||||||
the device's Non Volatile Memory.
|
the device's Non Volatile Memory.
|
||||||
|
* - ``shadow-ram``
|
||||||
|
- The contents of the Shadow RAM, which is loaded from the beginning
|
||||||
|
of the flash. Although the contents are primarily from the flash,
|
||||||
|
this area also contains data generated during device boot which is
|
||||||
|
not stored in flash.
|
||||||
* - ``device-caps``
|
* - ``device-caps``
|
||||||
- The contents of the device firmware's capabilities buffer. Useful to
|
- The contents of the device firmware's capabilities buffer. Useful to
|
||||||
determine the current state and configuration of the device.
|
determine the current state and configuration of the device.
|
||||||
|
|
||||||
Users can request an immediate capture of a snapshot via the
|
Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
|
||||||
``DEVLINK_CMD_REGION_NEW``
|
snapshot. The ``device-caps`` region requires a snapshot as the contents are
|
||||||
|
sent by firmware and can't be split into separate reads.
|
||||||
|
|
||||||
|
Users can request an immediate capture of a snapshot for all three regions
|
||||||
|
via the ``DEVLINK_CMD_REGION_NEW`` command.
|
||||||
|
|
||||||
.. code:: shell
|
.. code:: shell
|
||||||
|
|
||||||
|
@ -254,3 +263,118 @@ Users can request an immediate capture of a snapshot via the
|
||||||
0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
||||||
|
|
||||||
$ devlink region delete pci/0000:01:00.0/device-caps snapshot 1
|
$ devlink region delete pci/0000:01:00.0/device-caps snapshot 1
|
||||||
|
|
||||||
|
Devlink Rate
|
||||||
|
============
|
||||||
|
|
||||||
|
The ``ice`` driver implements devlink-rate API. It allows for offload of
|
||||||
|
the Hierarchical QoS to the hardware. It enables user to group Virtual
|
||||||
|
Functions in a tree structure and assign supported parameters: tx_share,
|
||||||
|
tx_max, tx_priority and tx_weight to each node in a tree. So effectively
|
||||||
|
user gains an ability to control how much bandwidth is allocated for each
|
||||||
|
VF group. This is later enforced by the HW.
|
||||||
|
|
||||||
|
It is assumed that this feature is mutually exclusive with DCB performed
|
||||||
|
in FW and ADQ, or any driver feature that would trigger changes in QoS,
|
||||||
|
for example creation of the new traffic class. The driver will prevent DCB
|
||||||
|
or ADQ configuration if user started making any changes to the nodes using
|
||||||
|
devlink-rate API. To configure those features a driver reload is necessary.
|
||||||
|
Correspondingly, if ADQ or DCB gets configured, the driver won't export
|
||||||
|
hierarchy at all, or will remove the untouched hierarchy if those
|
||||||
|
features are enabled after the hierarchy is exported, but before any
|
||||||
|
changes are made.
|
||||||
|
|
||||||
|
This feature is also dependent on switchdev being enabled in the system.
|
||||||
|
It's required because devlink-rate requires devlink-port objects to be
|
||||||
|
present, and those objects are only created in switchdev mode.
|
||||||
|
|
||||||
|
If the driver is set to the switchdev mode, it will export internal
|
||||||
|
hierarchy the moment VFs are created. Root of the tree is always
|
||||||
|
represented by the node_0. This node can't be deleted by the user. Leaf
|
||||||
|
nodes and nodes with children also can't be deleted.
|
||||||
|
|
||||||
|
.. list-table:: Attributes supported
|
||||||
|
:widths: 15 85
|
||||||
|
|
||||||
|
* - Name
|
||||||
|
- Description
|
||||||
|
* - ``tx_max``
|
||||||
|
- maximum bandwidth to be consumed by the tree Node. Rate Limit is
|
||||||
|
an absolute number specifying a maximum amount of bytes a Node may
|
||||||
|
consume during the course of one second. Rate limit guarantees
|
||||||
|
that a link will not oversaturate the receiver on the remote end
|
||||||
|
and also enforces an SLA between the subscriber and network
|
||||||
|
provider.
|
||||||
|
* - ``tx_share``
|
||||||
|
- minimum bandwidth allocated to a tree node when it is not blocked.
|
||||||
|
It specifies an absolute BW. While tx_max defines the maximum
|
||||||
|
bandwidth the node may consume, the tx_share marks committed BW
|
||||||
|
for the Node.
|
||||||
|
* - ``tx_priority``
|
||||||
|
- allows for usage of strict priority arbiter among siblings. This
|
||||||
|
arbitration scheme attempts to schedule nodes based on their
|
||||||
|
priority as long as the nodes remain within their bandwidth limit.
|
||||||
|
Range 0-7. Nodes with priority 7 have the highest priority and are
|
||||||
|
selected first, while nodes with priority 0 have the lowest
|
||||||
|
priority. Nodes that have the same priority are treated equally.
|
||||||
|
* - ``tx_weight``
|
||||||
|
- allows for usage of Weighted Fair Queuing arbitration scheme among
|
||||||
|
siblings. This arbitration scheme can be used simultaneously with
|
||||||
|
the strict priority. Range 1-200. Only relative values matter for
|
||||||
|
arbitration.
|
||||||
|
|
||||||
|
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
|
||||||
|
nodes with the same priority form a WFQ subgroup in the sibling group
|
||||||
|
and arbitration among them is based on assigned weights.
|
||||||
|
|
||||||
|
.. code:: shell
|
||||||
|
|
||||||
|
# enable switchdev
|
||||||
|
$ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev
|
||||||
|
|
||||||
|
# at this point driver should export internal hierarchy
|
||||||
|
$ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs
|
||||||
|
|
||||||
|
$ devlink port function rate show
|
||||||
|
pci/0000:4b:00.0/node_25: type node parent node_24
|
||||||
|
pci/0000:4b:00.0/node_24: type node parent node_0
|
||||||
|
pci/0000:4b:00.0/node_32: type node parent node_31
|
||||||
|
pci/0000:4b:00.0/node_31: type node parent node_30
|
||||||
|
pci/0000:4b:00.0/node_30: type node parent node_16
|
||||||
|
pci/0000:4b:00.0/node_19: type node parent node_18
|
||||||
|
pci/0000:4b:00.0/node_18: type node parent node_17
|
||||||
|
pci/0000:4b:00.0/node_17: type node parent node_16
|
||||||
|
pci/0000:4b:00.0/node_14: type node parent node_5
|
||||||
|
pci/0000:4b:00.0/node_5: type node parent node_3
|
||||||
|
pci/0000:4b:00.0/node_13: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_12: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_11: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_10: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_9: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_8: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_7: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_6: type node parent node_4
|
||||||
|
pci/0000:4b:00.0/node_4: type node parent node_3
|
||||||
|
pci/0000:4b:00.0/node_3: type node parent node_16
|
||||||
|
pci/0000:4b:00.0/node_16: type node parent node_15
|
||||||
|
pci/0000:4b:00.0/node_15: type node parent node_0
|
||||||
|
pci/0000:4b:00.0/node_2: type node parent node_1
|
||||||
|
pci/0000:4b:00.0/node_1: type node parent node_0
|
||||||
|
pci/0000:4b:00.0/node_0: type node
|
||||||
|
pci/0000:4b:00.0/1: type leaf parent node_25
|
||||||
|
pci/0000:4b:00.0/2: type leaf parent node_25
|
||||||
|
|
||||||
|
# let's create some custom node
|
||||||
|
$ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0
|
||||||
|
|
||||||
|
# second custom node
|
||||||
|
$ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom
|
||||||
|
|
||||||
|
# reassign second VF to newly created branch
|
||||||
|
$ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1
|
||||||
|
|
||||||
|
# assign tx_weight to the VF
|
||||||
|
$ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5
|
||||||
|
|
||||||
|
# assign tx_share to the VF
|
||||||
|
$ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps
|
||||||
|
|
|
@ -222,6 +222,7 @@ Userspace to kernel:
|
||||||
``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
|
``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
|
||||||
``ETHTOOL_MSG_PSE_SET`` set PSE parameters
|
``ETHTOOL_MSG_PSE_SET`` set PSE parameters
|
||||||
``ETHTOOL_MSG_PSE_GET`` get PSE parameters
|
``ETHTOOL_MSG_PSE_GET`` get PSE parameters
|
||||||
|
``ETHTOOL_MSG_RSS_GET`` get RSS settings
|
||||||
===================================== =================================
|
===================================== =================================
|
||||||
|
|
||||||
Kernel to userspace:
|
Kernel to userspace:
|
||||||
|
@ -263,6 +264,7 @@ Kernel to userspace:
|
||||||
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
|
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
|
||||||
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
|
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
|
||||||
``ETHTOOL_MSG_PSE_GET_REPLY`` PSE parameters
|
``ETHTOOL_MSG_PSE_GET_REPLY`` PSE parameters
|
||||||
|
``ETHTOOL_MSG_RSS_GET_REPLY`` RSS settings
|
||||||
======================================== =================================
|
======================================== =================================
|
||||||
|
|
||||||
``GET`` requests are sent by userspace applications to retrieve device
|
``GET`` requests are sent by userspace applications to retrieve device
|
||||||
|
@ -491,6 +493,7 @@ Kernel response contents:
|
||||||
``ETHTOOL_A_LINKSTATE_SQI_MAX`` u32 Max support SQI value
|
``ETHTOOL_A_LINKSTATE_SQI_MAX`` u32 Max support SQI value
|
||||||
``ETHTOOL_A_LINKSTATE_EXT_STATE`` u8 link extended state
|
``ETHTOOL_A_LINKSTATE_EXT_STATE`` u8 link extended state
|
||||||
``ETHTOOL_A_LINKSTATE_EXT_SUBSTATE`` u8 link extended substate
|
``ETHTOOL_A_LINKSTATE_EXT_SUBSTATE`` u8 link extended substate
|
||||||
|
``ETHTOOL_A_LINKSTATE_EXT_DOWN_CNT`` u32 count of link down events
|
||||||
==================================== ====== ============================
|
==================================== ====== ============================
|
||||||
|
|
||||||
For most NIC drivers, the value of ``ETHTOOL_A_LINKSTATE_LINK`` returns
|
For most NIC drivers, the value of ``ETHTOOL_A_LINKSTATE_LINK`` returns
|
||||||
|
@ -1686,6 +1689,33 @@ to control PoDL PSE Admin functions. This option is implementing
|
||||||
``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See
|
``IEEE 802.3-2018`` 30.15.1.2.1 acPoDLPSEAdminControl. See
|
||||||
``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values.
|
``ETHTOOL_A_PODL_PSE_ADMIN_STATE`` for supported values.
|
||||||
|
|
||||||
|
RSS_GET
|
||||||
|
=======
|
||||||
|
|
||||||
|
Get indirection table, hash key and hash function info associated with an
|
||||||
|
RSS context of an interface similar to ``ETHTOOL_GRSSH`` ioctl request.
|
||||||
|
|
||||||
|
Request contents:
|
||||||
|
|
||||||
|
===================================== ====== ==========================
|
||||||
|
``ETHTOOL_A_RSS_HEADER`` nested request header
|
||||||
|
``ETHTOOL_A_RSS_CONTEXT`` u32 context number
|
||||||
|
===================================== ====== ==========================
|
||||||
|
|
||||||
|
Kernel response contents:
|
||||||
|
|
||||||
|
===================================== ====== ==========================
|
||||||
|
``ETHTOOL_A_RSS_HEADER`` nested reply header
|
||||||
|
``ETHTOOL_A_RSS_HFUNC`` u32 RSS hash func
|
||||||
|
``ETHTOOL_A_RSS_INDIR`` binary Indir table bytes
|
||||||
|
``ETHTOOL_A_RSS_HKEY`` binary Hash key bytes
|
||||||
|
===================================== ====== ==========================
|
||||||
|
|
||||||
|
ETHTOOL_A_RSS_HFUNC attribute is a bitmap indicating the hash function
|
||||||
|
being used. Current supported options are toeplitz, xor or crc32.
|
||||||
|
ETHTOOL_A_RSS_INDIR attribute returns RSS indirection table where each byte
|
||||||
|
indicates queue number.
|
||||||
|
|
||||||
Request translation
|
Request translation
|
||||||
===================
|
===================
|
||||||
|
|
||||||
|
@ -1767,7 +1797,7 @@ are netlink only.
|
||||||
``ETHTOOL_GMODULEEEPROM`` ``ETHTOOL_MSG_MODULE_EEPROM_GET``
|
``ETHTOOL_GMODULEEEPROM`` ``ETHTOOL_MSG_MODULE_EEPROM_GET``
|
||||||
``ETHTOOL_GEEE`` ``ETHTOOL_MSG_EEE_GET``
|
``ETHTOOL_GEEE`` ``ETHTOOL_MSG_EEE_GET``
|
||||||
``ETHTOOL_SEEE`` ``ETHTOOL_MSG_EEE_SET``
|
``ETHTOOL_SEEE`` ``ETHTOOL_MSG_EEE_SET``
|
||||||
``ETHTOOL_GRSSH`` n/a
|
``ETHTOOL_GRSSH`` ``ETHTOOL_MSG_RSS_GET``
|
||||||
``ETHTOOL_SRSSH`` n/a
|
``ETHTOOL_SRSSH`` n/a
|
||||||
``ETHTOOL_GTUNABLE`` n/a
|
``ETHTOOL_GTUNABLE`` n/a
|
||||||
``ETHTOOL_STUNABLE`` n/a
|
``ETHTOOL_STUNABLE`` n/a
|
||||||
|
|
|
@ -104,6 +104,7 @@ Contents:
|
||||||
switchdev
|
switchdev
|
||||||
sysfs-tagging
|
sysfs-tagging
|
||||||
tc-actions-env-rules
|
tc-actions-env-rules
|
||||||
|
tc-queue-filters
|
||||||
tcp-thin
|
tcp-thin
|
||||||
team
|
team
|
||||||
timestamping
|
timestamping
|
||||||
|
|
|
@ -1069,6 +1069,81 @@ tcp_child_ehash_entries - INTEGER
|
||||||
|
|
||||||
Default: 0
|
Default: 0
|
||||||
|
|
||||||
|
tcp_plb_enabled - BOOLEAN
|
||||||
|
If set and the underlying congestion control (e.g. DCTCP) supports
|
||||||
|
and enables PLB feature, TCP PLB (Protective Load Balancing) is
|
||||||
|
enabled. PLB is described in the following paper:
|
||||||
|
https://doi.org/10.1145/3544216.3544226. Based on PLB parameters,
|
||||||
|
upon sensing sustained congestion, TCP triggers a change in
|
||||||
|
flow label field for outgoing IPv6 packets. A change in flow label
|
||||||
|
field potentially changes the path of outgoing packets for switches
|
||||||
|
that use ECMP/WCMP for routing.
|
||||||
|
|
||||||
|
PLB changes socket txhash which results in a change in IPv6 Flow Label
|
||||||
|
field, and is currently a no-op for IPv4 headers. It is possible
|
||||||
|
to apply PLB for IPv4 with other network header fields (e.g. TCP
|
||||||
|
or IPv4 options) or using encapsulation where outer header is used
|
||||||
|
by switches to determine next hop. In either case, further host
|
||||||
|
and switch side changes will be needed.
|
||||||
|
|
||||||
|
When set, PLB assumes that congestion signal (e.g. ECN) is made
|
||||||
|
available and used by congestion control module to estimate a
|
||||||
|
congestion measure (e.g. ce_ratio). PLB needs a congestion measure to
|
||||||
|
make repathing decisions.
|
||||||
|
|
||||||
|
Default: FALSE
|
||||||
|
|
||||||
|
tcp_plb_idle_rehash_rounds - INTEGER
|
||||||
|
Number of consecutive congested rounds (RTT) seen after which
|
||||||
|
a rehash can be performed, given there are no packets in flight.
|
||||||
|
This is referred to as M in PLB paper:
|
||||||
|
https://doi.org/10.1145/3544216.3544226.
|
||||||
|
|
||||||
|
Possible Values: 0 - 31
|
||||||
|
|
||||||
|
Default: 3
|
||||||
|
|
||||||
|
tcp_plb_rehash_rounds - INTEGER
|
||||||
|
Number of consecutive congested rounds (RTT) seen after which
|
||||||
|
a forced rehash can be performed. Be careful when setting this
|
||||||
|
parameter, as a small value increases the risk of retransmissions.
|
||||||
|
This is referred to as N in PLB paper:
|
||||||
|
https://doi.org/10.1145/3544216.3544226.
|
||||||
|
|
||||||
|
Possible Values: 0 - 31
|
||||||
|
|
||||||
|
Default: 12
|
||||||
|
|
||||||
|
tcp_plb_suspend_rto_sec - INTEGER
|
||||||
|
Time, in seconds, to suspend PLB in event of an RTO. In order to avoid
|
||||||
|
having PLB repath onto a connectivity "black hole", after an RTO a TCP
|
||||||
|
connection suspends PLB repathing for a random duration between 1x and
|
||||||
|
2x of this parameter. Randomness is added to avoid concurrent rehashing
|
||||||
|
of multiple TCP connections. This should be set corresponding to the
|
||||||
|
amount of time it takes to repair a failed link.
|
||||||
|
|
||||||
|
Possible Values: 0 - 255
|
||||||
|
|
||||||
|
Default: 60
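The randomized suspension described for ``tcp_plb_suspend_rto_sec`` can be modelled as below. This is a sketch of the documented behaviour only: the function name is hypothetical, and the kernel works in its own time units rather than floating-point seconds.

```python
import random


def plb_suspend_seconds(suspend_rto_sec, rng=random):
    """Pick a suspension time between 1x and 2x of the sysctl value."""
    if not 0 <= suspend_rto_sec <= 255:
        raise ValueError("documented range is 0 - 255")
    # Base pause plus a random extra of up to one more pause, so that
    # connections hit by the same RTO do not all repath at the same moment.
    return suspend_rto_sec + rng.uniform(0, suspend_rto_sec)


pause = plb_suspend_seconds(60, random.Random(1))
```

With the default of 60 the result always falls between 60 and 120 seconds.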
|
||||||
|
|
||||||
|
tcp_plb_cong_thresh - INTEGER
|
||||||
|
Fraction of packets marked with congestion over a round (RTT) to
|
||||||
|
tag that round as congested. This is referred to as K in the PLB paper:
|
||||||
|
https://doi.org/10.1145/3544216.3544226.
|
||||||
|
|
||||||
|
The 0-1 fraction range is mapped to 0-256 range to avoid floating
|
||||||
|
point operations. For example, 128 means that if at least 50% of
|
||||||
|
the packets in a round were marked as congested then the round
|
||||||
|
will be tagged as congested.
|
||||||
|
|
||||||
|
Setting threshold to 0 means that PLB repaths every RTT regardless
|
||||||
|
of congestion. This is not intended behavior for PLB and should be
|
||||||
|
used only for experimentation purpose.
|
||||||
|
|
||||||
|
Possible Values: 0 - 256
|
||||||
|
|
||||||
|
Default: 128
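The mapping of the 0-1 fraction onto the 0-256 range can be made concrete with a short sketch. The helper names below are hypothetical and this is not the kernel implementation; it only shows how a threshold can be compared against the marked-packet ratio with integer arithmetic.

```python
PLB_SCALE = 256  # the documented 0-256 range stands for the fraction 0.0-1.0


def fraction_to_thresh(fraction):
    """Convert a 0-1 fraction to a tcp_plb_cong_thresh style value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within 0-1")
    return round(fraction * PLB_SCALE)


def round_is_congested(marked, total, thresh=128):
    """True if marked/total >= thresh/256, without floating point."""
    if total <= 0:
        return False  # assumption: a round without packets is not congested
    return marked * PLB_SCALE >= thresh * total


half = fraction_to_thresh(0.5)
```

With the default of 128, a round in which 5 of 10 packets were marked counts as congested and one with 4 of 10 does not. A threshold of 0 makes every round with traffic count as congested, matching the warning above.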
|
||||||
|
|
||||||
UDP variables
|
UDP variables
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
@ -1102,6 +1177,33 @@ udp_rmem_min - INTEGER
|
||||||
udp_wmem_min - INTEGER
|
udp_wmem_min - INTEGER
|
||||||
UDP does not have tx memory accounting and this tunable has no effect.
|
UDP does not have tx memory accounting and this tunable has no effect.
|
||||||
|
|
||||||
|
udp_hash_entries - INTEGER
|
||||||
|
Show the number of hash buckets for UDP sockets in the current
|
||||||
|
networking namespace.
|
||||||
|
|
||||||
|
A negative value means the networking namespace does not own its
|
||||||
|
hash buckets and shares the initial networking namespace's one.
|
||||||
|
|
||||||
|
udp_child_ehash_entries - INTEGER
|
||||||
|
Control the number of hash buckets for UDP sockets in the child
|
||||||
|
networking namespace, which must be set before clone() or unshare().
|
||||||
|
|
||||||
|
If the value is not 0, the kernel uses a value rounded up to 2^n
|
||||||
|
as the actual hash bucket size. 0 is a special value, meaning
|
||||||
|
the child networking namespace will share the initial networking
|
||||||
|
namespace's hash buckets.
|
||||||
|
|
||||||
|
Note that the child will use the global one in case the kernel
|
||||||
|
fails to allocate enough memory. In addition, the global hash
|
||||||
|
buckets are spread over available NUMA nodes, but the allocation
|
||||||
|
of the child hash table depends on the current process's NUMA
|
||||||
|
policy, which could result in performance differences.
|
||||||
|
|
||||||
|
Possible values: 0, 2^n (n: 7 (128) - 16 (64K))
|
||||||
|
|
||||||
|
Default: 0
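The rounding rule for the child hash table size can be sketched as follows. The function name is hypothetical, and the sketch models only the documented range: how the kernel treats non-zero values outside 128 to 64K is not covered here, so such input is simply rejected.

```python
UDP_CHILD_MIN = 1 << 7    # 128, smallest documented size
UDP_CHILD_MAX = 1 << 16   # 64K, largest documented size


def child_udp_hash_buckets(requested):
    """Return the bucket count for a child netns, or None when shared."""
    if requested == 0:
        return None  # 0: share the initial netns's hash buckets
    if not UDP_CHILD_MIN <= requested <= UDP_CHILD_MAX:
        raise ValueError("outside the documented range")
    # Round up to the next power of two (exact powers are kept as-is).
    return 1 << (requested - 1).bit_length()


buckets = child_udp_hash_buckets(1000)
```

A request for 1000 buckets therefore yields 1024, while 128 stays 128.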
|
||||||
|
|
||||||
|
|
||||||
RAW variables
|
RAW variables
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
@ -3025,6 +3127,15 @@ ecn_enable - BOOLEAN
|
||||||
|
|
||||||
Default: 1
|
Default: 1
|
||||||
|
|
||||||
|
l3mdev_accept - BOOLEAN
|
||||||
|
Enabling this option allows a "global" bound socket to work
|
||||||
|
across L3 master domains (e.g., VRFs) with packets capable of
|
||||||
|
being received regardless of the L3 domain in which they
|
||||||
|
originated. Only valid when the kernel was compiled with
|
||||||
|
CONFIG_NET_L3_MASTER_DEV.
|
||||||
|
|
||||||
|
Default: 1 (enabled)
|
||||||
|
|
||||||
|
|
||||||
``/proc/sys/net/core/*``
|
``/proc/sys/net/core/*``
|
||||||
========================
|
========================
|
||||||
|
|
|
@ -129,6 +129,26 @@ drop_packet - INTEGER
|
||||||
threshold. When the mode 3 is set, the always mode drop rate
|
threshold. When the mode 3 is set, the always mode drop rate
|
||||||
is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
|
is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
|
||||||
|
|
||||||
|
est_cpulist - CPULIST
|
||||||
|
Allowed CPUs for estimation kthreads
|
||||||
|
|
||||||
|
Syntax: standard cpulist format
|
||||||
|
empty list - stop kthread tasks and estimation
|
||||||
|
default - the system's housekeeping CPUs for kthreads
|
||||||
|
|
||||||
|
Example:
|
||||||
|
"all": all possible CPUs
|
||||||
|
"0-N": all possible CPUs, N denotes last CPU number
|
||||||
|
"0,1-N:1/2": first and all CPUs with odd number
|
||||||
|
"": empty list
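The examples above can be checked with a small parser. This is a simplified sketch of the cpulist syntax written for illustration, not the kernel parser: ``parse_cpulist`` and ``nr_cpus`` are hypothetical names, and only the forms shown in the examples (``all``, ``N``, ranges, and ``:used/group`` patterns) are handled.

```python
def parse_cpulist(spec, nr_cpus):
    """Expand a cpulist string into a sorted list of CPU numbers."""
    spec = spec.strip()
    if spec == "":
        return []
    if spec == "all":
        return list(range(nr_cpus))

    def num(token):
        # "N" stands for the last CPU number.
        return nr_cpus - 1 if token == "N" else int(token)

    cpus = set()
    for item in spec.split(","):
        rng, _, pattern = item.partition(":")
        first, _, last = rng.partition("-")
        lo = num(first)
        hi = num(last) if last else lo
        used, group = 1, 1
        if pattern:
            u, _, g = pattern.partition("/")
            used, group = int(u), int(g)
        if lo > hi or hi >= nr_cpus or not 0 < used <= group:
            raise ValueError("bad cpulist item: " + item)
        # In every block of `group` CPUs, keep the first `used` ones.
        for cpu in range(lo, hi + 1):
            if (cpu - lo) % group < used:
                cpus.add(cpu)
    return sorted(cpus)


odd_plus_first = parse_cpulist("0,1-N:1/2", 8)
```

On an 8-CPU system, ``"0,1-N:1/2"`` expands to CPU 0 plus the odd-numbered CPUs 1, 3, 5 and 7.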
|
||||||
|
|
||||||
|
est_nice - INTEGER
|
||||||
|
default 0
|
||||||
|
Valid range: -20 (more favorable) .. 19 (less favorable)
|
||||||
|
|
||||||
|
Niceness value to use for the estimation kthreads (scheduling
|
||||||
|
priority)
|
||||||
|
|
||||||
expire_nodest_conn - BOOLEAN
|
expire_nodest_conn - BOOLEAN
|
||||||
- 0 - disabled (default)
|
- 0 - disabled (default)
|
||||||
- not 0 - enabled
|
- not 0 - enabled
|
||||||
|
@ -304,8 +324,8 @@ run_estimation - BOOLEAN
|
||||||
0 - disabled
|
0 - disabled
|
||||||
not 0 - enabled (default)
|
not 0 - enabled (default)
|
||||||
|
|
||||||
If disabled, the estimation will be stop, and you can't see
|
If disabled, the estimation will be suspended and kthread tasks
|
||||||
any update on speed estimation data.
|
stopped.
|
||||||
|
|
||||||
You can always re-enable estimation by setting this value to 1.
|
You can always re-enable estimation by setting this value to 1.
|
||||||
But be careful, the first estimation after re-enable is not
|
But be careful, the first estimation after re-enable is not
|
||||||
|
|
37
Documentation/networking/tc-queue-filters.rst
Normal file
|
@ -0,0 +1,37 @@
.. SPDX-License-Identifier: GPL-2.0

=========================
TC queue based filtering
=========================

TC can be used for directing traffic to either a set of queues or
to a single queue on both the transmit and receive side.

On the transmit side:

1) TC filter directing traffic to a set of queues is achieved
   using the action skbedit priority for Tx priority selection,
   the priority maps to a traffic class (set of queues) when
   the queue-sets are configured using mqprio.

2) TC filter directs traffic to a transmit queue with the action
   skbedit queue_mapping $tx_qid. The action skbedit queue_mapping
   for transmit queue is executed in software only and cannot be
   offloaded.

Likewise, on the receive side, the two filters for selecting a set of
queues and/or a single queue are supported as below:

1) TC flower filter directs incoming traffic to a set of queues using
   the 'hw_tc' option.
   hw_tc $TCID - Specify a hardware traffic class to pass matching
   packets on to. TCID is in the range 0 through 15.

2) TC filter with action skbedit queue_mapping $rx_qid selects a
   receive queue. The action skbedit queue_mapping for receive queue
   is supported only in hardware. Multiple filters may compete in
   the hardware for queue selection. In such a case, the hardware
   pipeline resolves conflicts based on priority. On Intel E810
   devices, a TC filter directing traffic to a queue has higher
   priority than a flow director filter assigning a queue. The hash
   filter has the lowest priority.
|
|
@ -179,7 +179,8 @@ SOF_TIMESTAMPING_OPT_ID:
|
||||||
identifier and returns that along with the timestamp. The identifier
|
identifier and returns that along with the timestamp. The identifier
|
||||||
is derived from a per-socket u32 counter (that wraps). For datagram
|
is derived from a per-socket u32 counter (that wraps). For datagram
|
||||||
sockets, the counter increments with each sent packet. For stream
|
sockets, the counter increments with each sent packet. For stream
|
||||||
sockets, it increments with every byte.
|
sockets, it increments with every byte. For stream sockets, also set
|
||||||
|
SOF_TIMESTAMPING_OPT_ID_TCP, see the section below.
|
||||||
|
|
||||||
The counter starts at zero. It is initialized the first time that
|
The counter starts at zero. It is initialized the first time that
|
||||||
the socket option is enabled. It is reset each time the option is
|
the socket option is enabled. It is reset each time the option is
|
||||||
|
@ -192,6 +193,35 @@ SOF_TIMESTAMPING_OPT_ID:
|
||||||
among all possibly concurrently outstanding timestamp requests for
|
among all possibly concurrently outstanding timestamp requests for
|
||||||
that socket.
|
that socket.
|
||||||
|
|
||||||
|
SOF_TIMESTAMPING_OPT_ID_TCP:
|
||||||
|
Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
|
||||||
|
timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
|
||||||
|
counter increments for stream sockets, but its starting point is
|
||||||
|
not entirely trivial. This option fixes that.
|
||||||
|
|
||||||
|
For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should
|
||||||
|
always be set too. On datagram sockets the option has no effect.
|
||||||
|
|
||||||
|
A reasonable expectation is that the counter is reset to zero with
|
||||||
|
the system call, so that a subsequent write() of N bytes generates
|
||||||
|
a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
|
||||||
|
implements this behavior under all conditions.
|
||||||
|
|
||||||
|
SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
|
||||||
|
especially when the socket option is set when no data is in
|
||||||
|
transmission. If data is being transmitted, it may be off by the
|
||||||
|
length of the output queue (SIOCOUTQ).
|
||||||
|
|
||||||
|
The difference is due to being based on snd_una versus write_seq.
|
||||||
|
snd_una is the offset in the stream acknowledged by the peer. This
|
||||||
|
depends on factors outside of process control, such as network RTT.
|
||||||
|
write_seq is the last byte written by the process. This offset is
|
||||||
|
not affected by external inputs.
|
||||||
|
|
||||||
|
The difference is subtle and unlikely to be noticed when configured
|
||||||
|
at initial socket creation, when no data is queued or sent. But
|
||||||
|
SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of
|
||||||
|
when the socket option is set.
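
As a rough userspace illustration of the section above, the sketch below builds the SO_TIMESTAMPING flag mask and models the counter values the text describes. The helper names are hypothetical, the choice of software Tx timestamps is just one possible configuration, and the fallback value for SOF_TIMESTAMPING_OPT_ID_TCP is an assumption for building against UAPI headers that predate the flag. The `legacy` helper reflects one reading of "off by the length of the output queue" (the offset is added because snd_una trails write_seq).

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Older UAPI headers lack this flag. Newer headers define it as an enum
 * constant, which this macro (placed after the include) simply shadows
 * with what is assumed to be the same value. */
#ifndef SOF_TIMESTAMPING_OPT_ID_TCP
#define SOF_TIMESTAMPING_OPT_ID_TCP (1 << 16)
#endif

/* Flags for software Tx timestamps that carry an ID. For stream sockets
 * the OPT_ID_TCP modifier is added, as the text recommends. */
unsigned int tx_tstamp_flags(int is_stream)
{
	unsigned int flags = SOF_TIMESTAMPING_TX_SOFTWARE |
			     SOF_TIMESTAMPING_SOFTWARE |
			     SOF_TIMESTAMPING_OPT_ID;

	if (is_stream)
		flags |= SOF_TIMESTAMPING_OPT_ID_TCP;
	return flags;
}

/* Apply the flags to a socket; returns the setsockopt() result. Kernels
 * without the modifier are expected to reject the unknown flag. */
int enable_tx_timestamps(int fd, int is_stream)
{
	unsigned int flags = tx_tstamp_flags(is_stream);

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}

/* With OPT_ID_TCP: the first write() of n bytes after the option is set
 * is reported with counter n - 1. */
unsigned int key_with_opt_id_tcp(unsigned int n)
{
	assert(n > 0);
	return n - 1;
}

/* Without the modifier: the counter may be shifted by the bytes still in
 * the output queue (SIOCOUTQ) when the option was set. */
unsigned int key_legacy(unsigned int n, unsigned int outq_at_enable)
{
	assert(n > 0);
	return n - 1 + outq_at_enable;
}
```

With an empty output queue both helpers agree, which matches the remark that the difference is unlikely to be noticed when the option is configured right after socket creation.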
|
||||||
|
|
||||||
SOF_TIMESTAMPING_OPT_CMSG:
|
SOF_TIMESTAMPING_OPT_CMSG:
|
||||||
Support recv() cmsg for all timestamped packets. Control messages
|
Support recv() cmsg for all timestamped packets. Control messages
|
||||||
|
|
|
@ -5,6 +5,7 @@ XFRM device - offloading the IPsec computations
|
||||||
===============================================
|
===============================================
|
||||||
|
|
||||||
Shannon Nelson <shannon.nelson@oracle.com>
|
Shannon Nelson <shannon.nelson@oracle.com>
|
||||||
|
Leon Romanovsky <leonro@nvidia.com>
|
||||||
|
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
|
@ -18,10 +19,21 @@ can radically increase throughput and decrease CPU utilization. The XFRM
|
||||||
Device interface allows NIC drivers to offer to the stack access to the
|
Device interface allows NIC drivers to offer to the stack access to the
|
||||||
hardware offload.
|
hardware offload.
|
||||||
|
|
||||||
|
Right now, there are two types of hardware offload that the kernel supports.
|
||||||
|
* IPsec crypto offload:
|
||||||
|
* NIC performs encrypt/decrypt
|
||||||
|
* Kernel does everything else
|
||||||
|
* IPsec packet offload:
|
||||||
|
* NIC performs encrypt/decrypt
|
||||||
|
* NIC does encapsulation
|
||||||
|
* Kernel and NIC have SA and policy in-sync
|
||||||
|
* NIC handles the SA and policy states
|
||||||
|
* The Kernel talks to the keymanager
|
||||||
|
|
||||||
Userland access to the offload is typically through a system such as
|
Userland access to the offload is typically through a system such as
|
||||||
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
|
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
|
||||||
be handy when experimenting. An example command might look something
|
be handy when experimenting. An example command might look something
|
||||||
like this::
|
like this for crypto offload:
|
||||||
|
|
||||||
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
|
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
|
||||||
reqid 0x07 replay-window 32 \
|
reqid 0x07 replay-window 32 \
|
||||||
|
@ -29,6 +41,17 @@ like this::
|
||||||
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
|
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
|
||||||
offload dev eth4 dir in
|
offload dev eth4 dir in
|
||||||
|
|
||||||
|
and for packet offload
|
||||||
|
|
||||||
|
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
|
||||||
|
reqid 0x07 replay-window 32 \
|
||||||
|
aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
|
||||||
|
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
|
||||||
|
offload packet dev eth4 dir in
|
||||||
|
|
||||||
|
ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in
|
||||||
|
tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport
|
||||||
|
|
||||||
Yes, that's ugly, but that's what shell scripts and/or libreswan are for.
|
Yes, that's ugly, but that's what shell scripts and/or libreswan are for.
|
||||||
|
|
||||||
|
|
||||||
|
@ -40,17 +63,24 @@ Callbacks to implement
|
||||||
|
|
||||||
/* from include/linux/netdevice.h */
|
/* from include/linux/netdevice.h */
|
||||||
struct xfrmdev_ops {
|
struct xfrmdev_ops {
|
||||||
|
/* Crypto and Packet offload callbacks */
|
||||||
int (*xdo_dev_state_add) (struct xfrm_state *x);
|
int (*xdo_dev_state_add) (struct xfrm_state *x);
|
||||||
void (*xdo_dev_state_delete) (struct xfrm_state *x);
|
void (*xdo_dev_state_delete) (struct xfrm_state *x);
|
||||||
void (*xdo_dev_state_free) (struct xfrm_state *x);
|
void (*xdo_dev_state_free) (struct xfrm_state *x);
|
||||||
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
|
bool (*xdo_dev_offload_ok) (struct sk_buff *skb,
|
||||||
struct xfrm_state *x);
|
struct xfrm_state *x);
|
||||||
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
|
void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
|
||||||
|
|
||||||
|
/* Solely packet offload callbacks */
|
||||||
|
void (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
|
||||||
|
int (*xdo_dev_policy_add) (struct xfrm_policy *x);
|
||||||
|
void (*xdo_dev_policy_delete) (struct xfrm_policy *x);
|
||||||
|
void (*xdo_dev_policy_free) (struct xfrm_policy *x);
|
||||||
};
|
};
|
||||||
|
|
||||||
The NIC driver offering ipsec offload will need to implement these
|
The NIC driver offering ipsec offload will need to implement callbacks
|
||||||
callbacks to make the offload available to the network stack's
|
relevant to supported offload to make the offload available to the network
|
||||||
XFRM subsystem. Additionally, the feature bits NETIF_F_HW_ESP and
|
stack's XFRM subsystem. Additionally, the feature bits NETIF_F_HW_ESP and
|
||||||
NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.
|
NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.
|
||||||
|
|
||||||
|
|
||||||
|
@ -79,7 +109,8 @@ and an indication of whether it is for Rx or Tx. The driver should
|
||||||
|
|
||||||
=========== ===================================
|
=========== ===================================
|
||||||
0 success
|
0 success
|
||||||
-EOPNOTSUPP offload not supported, try SW IPsec
|
-EOPNOTSUPP offload not supported, try SW IPsec,
|
||||||
|
not applicable for packet offload mode
|
||||||
other fail the request
|
other fail the request
|
||||||
=========== ===================================
|
=========== ===================================
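
The return-code table above can be read as a small decision function. The following is a userspace sketch with hypothetical names (`offload_type`, `add_outcome`, `classify_state_add()`), not the XFRM core's actual logic; it assumes that in packet offload mode -EOPNOTSUPP is treated like any other error because no software fallback applies there.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical types for illustration only. */
enum offload_type { OFFLOAD_CRYPTO, OFFLOAD_PACKET };
enum add_outcome { ADD_OFFLOADED, ADD_FALLBACK_SW, ADD_FAILED };

/* Map the driver's xdo_dev_state_add() return value to what the caller
 * does next, following the table above. */
enum add_outcome classify_state_add(int err, enum offload_type type)
{
	assert(err <= 0);	/* 0 or a negative errno is expected */

	if (err == 0)
		return ADD_OFFLOADED;
	if (err == -EOPNOTSUPP && type == OFFLOAD_CRYPTO)
		return ADD_FALLBACK_SW;	/* try SW IPsec */
	return ADD_FAILED;		/* fail the request */
}
```

For example, a driver returning -EOPNOTSUPP for a crypto offload SA leads to the software path, while the same value for a packet offload SA fails the request in this model.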
|
||||||
|
|
||||||
|
@ -96,6 +127,7 @@ will serviceable. This can check the packet information to be sure the
|
||||||
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
|
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
|
||||||
return true or false to signify its support.
|
return true or false to signify its support.
|
||||||
|
|
||||||
|
Crypto offload mode:
|
||||||
When ready to send, the driver needs to inspect the Tx packet for the
|
When ready to send, the driver needs to inspect the Tx packet for the
|
||||||
offload information, including the opaque context, and set up the packet
|
offload information, including the opaque context, and set up the packet
|
||||||
send accordingly::
|
send accordingly::
|
||||||
|
@ -139,13 +171,25 @@ the stack in xfrm_input().
|
||||||
In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
|
In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
|
||||||
Driver will check packet seq number and update HW ESN state machine if needed.
|
Driver will check packet seq number and update HW ESN state machine if needed.
|
||||||
|
|
||||||
|
Packet offload mode:
|
||||||
|
HW adds and deletes XFRM headers. So in RX path, XFRM stack is bypassed if HW
|
||||||
|
reported success. In TX path, the packet leaves the kernel without the extra header
|
||||||
|
and unencrypted; the HW is responsible for performing both.
|
||||||
|
|
||||||
When the SA is removed by the user, the driver's xdo_dev_state_delete()
|
When the SA is removed by the user, the driver's xdo_dev_state_delete()
|
||||||
is asked to disable the offload. Later, xdo_dev_state_free() is called
|
and xdo_dev_policy_delete() are asked to disable the offload. Later,
|
||||||
from a garbage collection routine after all reference counts to the state
|
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
|
||||||
|
collection routine after all reference counts to the state and policy
|
||||||
have been removed and any remaining resources can be cleared for the
|
have been removed and any remaining resources can be cleared for the
|
||||||
offload state. How these are used by the driver will depend on specific
|
offload state. How these are used by the driver will depend on specific
|
||||||
hardware needs.
|
hardware needs.
|
||||||
|
|
||||||
As a netdev is set to DOWN the XFRM stack's netdev listener will call
|
As a netdev is set to DOWN the XFRM stack's netdev listener will call
|
||||||
xdo_dev_state_delete() and xdo_dev_state_free() on any remaining offloaded
|
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
|
||||||
states.
|
xdo_dev_policy_free() on any remaining offloaded states.
|
||||||
|
|
||||||
|
Because the HW handles the packets, the XFRM core can't track hard and soft limits.
|
||||||
|
The HW/driver is responsible for tracking them and providing accurate data when
|
||||||
|
xdo_dev_state_update_curlft() is called. If one of these limits
|
||||||
|
is reached, the driver needs to call xfrm_state_check_expire() to make sure
|
||||||
|
that XFRM performs rekeying sequence.
|
||||||
|
|
23
MAINTAINERS
|
@ -1932,6 +1932,7 @@ F: Documentation/devicetree/bindings/interrupt-controller/apple,*
|
||||||
F: Documentation/devicetree/bindings/iommu/apple,dart.yaml
|
F: Documentation/devicetree/bindings/iommu/apple,dart.yaml
|
||||||
F: Documentation/devicetree/bindings/iommu/apple,sart.yaml
|
F: Documentation/devicetree/bindings/iommu/apple,sart.yaml
|
||||||
F: Documentation/devicetree/bindings/mailbox/apple,mailbox.yaml
|
F: Documentation/devicetree/bindings/mailbox/apple,mailbox.yaml
|
||||||
|
F: Documentation/devicetree/bindings/net/bluetooth/brcm,bcm4377-bluetooth.yaml
|
||||||
F: Documentation/devicetree/bindings/nvme/apple,nvme-ans.yaml
|
F: Documentation/devicetree/bindings/nvme/apple,nvme-ans.yaml
|
||||||
F: Documentation/devicetree/bindings/nvmem/apple,efuses.yaml
|
F: Documentation/devicetree/bindings/nvmem/apple,efuses.yaml
|
||||||
F: Documentation/devicetree/bindings/pci/apple,pcie.yaml
|
F: Documentation/devicetree/bindings/pci/apple,pcie.yaml
|
||||||
|
@ -1939,6 +1940,7 @@ F: Documentation/devicetree/bindings/pinctrl/apple,pinctrl.yaml
|
||||||
F: Documentation/devicetree/bindings/power/apple*
|
F: Documentation/devicetree/bindings/power/apple*
|
||||||
F: Documentation/devicetree/bindings/watchdog/apple,wdt.yaml
|
F: Documentation/devicetree/bindings/watchdog/apple,wdt.yaml
|
||||||
F: arch/arm64/boot/dts/apple/
|
F: arch/arm64/boot/dts/apple/
|
||||||
|
F: drivers/bluetooth/hci_bcm4377.c
|
||||||
F: drivers/clk/clk-apple-nco.c
|
F: drivers/clk/clk-apple-nco.c
|
||||||
F: drivers/cpufreq/apple-soc-cpufreq.c
|
F: drivers/cpufreq/apple-soc-cpufreq.c
|
||||||
F: drivers/dma/apple-admac.c
|
F: drivers/dma/apple-admac.c
|
||||||
|
@ -2470,6 +2472,7 @@ L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
|
||||||
S: Supported
|
S: Supported
|
||||||
T: git git://github.com/microchip-ung/linux-upstream.git
|
T: git git://github.com/microchip-ung/linux-upstream.git
|
||||||
F: arch/arm64/boot/dts/microchip/
|
F: arch/arm64/boot/dts/microchip/
|
||||||
|
F: drivers/net/ethernet/microchip/vcap/
|
||||||
F: drivers/pinctrl/pinctrl-microchip-sgpio.c
|
F: drivers/pinctrl/pinctrl-microchip-sgpio.c
|
||||||
N: sparx5
|
N: sparx5
|
||||||
|
|
||||||
|
@ -6362,6 +6365,7 @@ F: drivers/net/ethernet/freescale/dpaa2/Kconfig
|
||||||
F: drivers/net/ethernet/freescale/dpaa2/Makefile
|
F: drivers/net/ethernet/freescale/dpaa2/Makefile
|
||||||
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-eth*
|
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-eth*
|
||||||
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-mac*
|
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-mac*
|
||||||
|
F: drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk*
|
||||||
F: drivers/net/ethernet/freescale/dpaa2/dpkg.h
|
F: drivers/net/ethernet/freescale/dpaa2/dpkg.h
|
||||||
F: drivers/net/ethernet/freescale/dpaa2/dpmac*
|
F: drivers/net/ethernet/freescale/dpaa2/dpmac*
|
||||||
F: drivers/net/ethernet/freescale/dpaa2/dpni*
|
F: drivers/net/ethernet/freescale/dpaa2/dpni*
|
||||||
|
@ -7734,6 +7738,7 @@ ETAS ES58X CAN/USB DRIVER
|
||||||
M: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
|
M: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
|
||||||
L: linux-can@vger.kernel.org
|
L: linux-can@vger.kernel.org
|
||||||
S: Maintained
|
S: Maintained
|
||||||
|
F: Documentation/networking/devlink/etas_es58x.rst
|
||||||
F: drivers/net/can/usb/etas_es58x/
|
F: drivers/net/can/usb/etas_es58x/
|
||||||
|
|
||||||
ETHERNET BRIDGE
|
ETHERNET BRIDGE
|
||||||
|
@ -8236,7 +8241,10 @@ S: Maintained
|
||||||
F: drivers/i2c/busses/i2c-cpm.c
|
F: drivers/i2c/busses/i2c-cpm.c
|
||||||
|
|
||||||
FREESCALE IMX / MXC FEC DRIVER
|
FREESCALE IMX / MXC FEC DRIVER
|
||||||
M: Joakim Zhang <qiangqing.zhang@nxp.com>
|
M: Wei Fang <wei.fang@nxp.com>
|
||||||
|
R: Shenwei Wang <shenwei.wang@nxp.com>
|
||||||
|
R: Clark Wang <xiaoning.wang@nxp.com>
|
||||||
|
R: NXP Linux Team <linux-imx@nxp.com>
|
||||||
L: netdev@vger.kernel.org
|
L: netdev@vger.kernel.org
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: Documentation/devicetree/bindings/net/fsl,fec.yaml
|
F: Documentation/devicetree/bindings/net/fsl,fec.yaml
|
||||||
|
@ -9493,8 +9501,9 @@ F: Documentation/devicetree/bindings/iio/humidity/st,hts221.yaml
|
||||||
F: drivers/iio/humidity/hts221*
|
F: drivers/iio/humidity/hts221*
|
||||||
|
|
||||||
HUAWEI ETHERNET DRIVER
|
HUAWEI ETHERNET DRIVER
|
||||||
|
M: Cai Huoqing <cai.huoqing@linux.dev>
|
||||||
L: netdev@vger.kernel.org
|
L: netdev@vger.kernel.org
|
||||||
S: Orphan
|
S: Maintained
|
||||||
F: Documentation/networking/device_drivers/ethernet/huawei/hinic.rst
|
F: Documentation/networking/device_drivers/ethernet/huawei/hinic.rst
|
||||||
F: drivers/net/ethernet/huawei/hinic/
|
F: drivers/net/ethernet/huawei/hinic/
|
||||||
|
|
||||||
|
@ -9597,6 +9606,7 @@ F: include/asm-generic/hyperv-tlfs.h
|
||||||
F: include/asm-generic/mshyperv.h
|
F: include/asm-generic/mshyperv.h
|
||||||
F: include/clocksource/hyperv_timer.h
|
F: include/clocksource/hyperv_timer.h
|
||||||
F: include/linux/hyperv.h
|
F: include/linux/hyperv.h
|
||||||
|
F: include/net/mana
|
||||||
F: include/uapi/linux/hyperv.h
|
F: include/uapi/linux/hyperv.h
|
||||||
F: net/vmw_vsock/hyperv_transport.c
|
F: net/vmw_vsock/hyperv_transport.c
|
||||||
F: tools/hv/
|
F: tools/hv/
|
||||||
|
@ -12410,7 +12420,7 @@ M: Marcin Wojtas <mw@semihalf.com>
|
||||||
M: Russell King <linux@armlinux.org.uk>
|
M: Russell King <linux@armlinux.org.uk>
|
||||||
L: netdev@vger.kernel.org
|
L: netdev@vger.kernel.org
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: Documentation/devicetree/bindings/net/marvell-pp2.txt
|
F: Documentation/devicetree/bindings/net/marvell,pp2.yaml
|
||||||
F: drivers/net/ethernet/marvell/mvpp2/
|
F: drivers/net/ethernet/marvell/mvpp2/
|
||||||
|
|
||||||
MARVELL MWIFIEX WIRELESS DRIVER
|
MARVELL MWIFIEX WIRELESS DRIVER
|
||||||
|
@ -12458,7 +12468,7 @@ F: Documentation/networking/device_drivers/ethernet/marvell/octeontx2.rst
|
||||||
F: drivers/net/ethernet/marvell/octeontx2/af/
|
F: drivers/net/ethernet/marvell/octeontx2/af/
|
||||||
|
|
||||||
MARVELL PRESTERA ETHERNET SWITCH DRIVER
|
MARVELL PRESTERA ETHERNET SWITCH DRIVER
|
||||||
M: Taras Chornyi <tchornyi@marvell.com>
|
M: Taras Chornyi <taras.chornyi@plvision.eu>
|
||||||
S: Supported
|
S: Supported
|
||||||
W: https://github.com/Marvell-switching/switchdev-prestera
|
W: https://github.com/Marvell-switching/switchdev-prestera
|
||||||
F: drivers/net/ethernet/marvell/prestera/
|
F: drivers/net/ethernet/marvell/prestera/
|
||||||
|
@ -13017,6 +13027,7 @@ M: Felix Fietkau <nbd@nbd.name>
|
||||||
M: John Crispin <john@phrozen.org>
|
M: John Crispin <john@phrozen.org>
|
||||||
M: Sean Wang <sean.wang@mediatek.com>
|
M: Sean Wang <sean.wang@mediatek.com>
|
||||||
M: Mark Lee <Mark-MC.Lee@mediatek.com>
|
M: Mark Lee <Mark-MC.Lee@mediatek.com>
|
||||||
|
M: Lorenzo Bianconi <lorenzo@kernel.org>
|
||||||
L: netdev@vger.kernel.org
|
L: netdev@vger.kernel.org
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: drivers/net/ethernet/mediatek/
|
F: drivers/net/ethernet/mediatek/
|
||||||
|
@ -14048,6 +14059,7 @@ F: include/uapi/linux/meye.h
|
||||||
|
|
||||||
MOTORCOMM PHY DRIVER
|
MOTORCOMM PHY DRIVER
|
||||||
M: Peter Geis <pgwipeout@gmail.com>
|
M: Peter Geis <pgwipeout@gmail.com>
|
||||||
|
M: Frank <Frank.Sae@motor-comm.com>
|
||||||
L: netdev@vger.kernel.org
|
L: netdev@vger.kernel.org
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: drivers/net/phy/motorcomm.c
|
F: drivers/net/phy/motorcomm.c
|
||||||
|
@ -19175,7 +19187,7 @@ M: Jassi Brar <jaswinder.singh@linaro.org>
|
||||||
M: Ilias Apalodimas <ilias.apalodimas@linaro.org>
|
M: Ilias Apalodimas <ilias.apalodimas@linaro.org>
|
||||||
L: netdev@vger.kernel.org
|
L: netdev@vger.kernel.org
|
||||||
S: Maintained
|
S: Maintained
|
||||||
F: Documentation/devicetree/bindings/net/socionext-netsec.txt
|
F: Documentation/devicetree/bindings/net/socionext,synquacer-netsec.yaml
|
||||||
F: drivers/net/ethernet/socionext/netsec.c
|
F: drivers/net/ethernet/socionext/netsec.c
|
||||||
|
|
||||||
SOCIONEXT (SNI) Synquacer SPI DRIVER
|
SOCIONEXT (SNI) Synquacer SPI DRIVER
|
||||||
|
@ -20828,7 +20840,6 @@ W: https://wireless.wiki.kernel.org/en/users/Drivers/wl12xx
|
||||||
W: https://wireless.wiki.kernel.org/en/users/Drivers/wl1251
|
W: https://wireless.wiki.kernel.org/en/users/Drivers/wl1251
|
||||||
T: git git://git.kernel.org/pub/scm/linux/kernel/git/luca/wl12xx.git
|
T: git git://git.kernel.org/pub/scm/linux/kernel/git/luca/wl12xx.git
|
||||||
F: drivers/net/wireless/ti/
|
F: drivers/net/wireless/ti/
|
||||||
F: include/linux/wl12xx.h
|
|
||||||
|
|
||||||
TIMEKEEPING, CLOCKSOURCE CORE, NTP, ALARMTIMER
|
TIMEKEEPING, CLOCKSOURCE CORE, NTP, ALARMTIMER
|
||||||
M: John Stultz <jstultz@google.com>
|
M: John Stultz <jstultz@google.com>
|
||||||
|
|
|
@ -178,6 +178,8 @@
|
||||||
|
|
||||||
/* Network controller */
|
/* Network controller */
|
||||||
ethernet: ethernet@f0000 {
|
ethernet: ethernet@f0000 {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
compatible = "marvell,armada-375-pp2";
|
compatible = "marvell,armada-375-pp2";
|
||||||
reg = <0xf0000 0xa000>, /* Packet Processor regs */
|
reg = <0xf0000 0xa000>, /* Packet Processor regs */
|
||||||
<0xc0000 0x3060>, /* LMS regs */
|
<0xc0000 0x3060>, /* LMS regs */
|
||||||
|
@ -187,15 +189,17 @@
|
||||||
clock-names = "pp_clk", "gop_clk";
|
clock-names = "pp_clk", "gop_clk";
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
|
|
||||||
eth0: eth0 {
|
eth0: ethernet-port@0 {
|
||||||
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
interrupts = <GIC_SPI 37 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
port-id = <0>;
|
reg = <0>;
|
||||||
|
port-id = <0>; /* For backward compatibility. */
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
};
|
};
|
||||||
|
|
||||||
eth1: eth1 {
|
eth1: ethernet-port@1 {
|
||||||
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
interrupts = <GIC_SPI 41 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
port-id = <1>;
|
reg = <1>;
|
||||||
|
port-id = <1>; /* For backward compatibility. */
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
|
@ -10,7 +10,6 @@
|
||||||
#include <linux/init.h>
|
#include <linux/init.h>
|
||||||
#include <linux/kernel.h>
|
#include <linux/kernel.h>
|
||||||
#include <linux/of_platform.h>
|
#include <linux/of_platform.h>
|
||||||
#include <linux/wl12xx.h>
|
|
||||||
#include <linux/mmc/card.h>
|
#include <linux/mmc/card.h>
|
||||||
#include <linux/mmc/host.h>
|
#include <linux/mmc/host.h>
|
||||||
#include <linux/power/smartreflex.h>
|
#include <linux/power/smartreflex.h>
|
||||||
|
|
|
@ -21,6 +21,10 @@
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
&bluetooth0 {
|
||||||
|
brcm,board-type = "apple,atlantisb";
|
||||||
|
};
|
||||||
|
|
||||||
&wifi0 {
|
&wifi0 {
|
||||||
brcm,board-type = "apple,atlantisb";
|
brcm,board-type = "apple,atlantisb";
|
||||||
};
|
};
|
||||||
|
|
|
@ -17,6 +17,10 @@
|
||||||
model = "Apple MacBook Pro (13-inch, M1, 2020)";
|
model = "Apple MacBook Pro (13-inch, M1, 2020)";
|
||||||
};
|
};
|
||||||
|
|
||||||
|
&bluetooth0 {
|
||||||
|
brcm,board-type = "apple,honshu";
|
||||||
|
};
|
||||||
|
|
||||||
&wifi0 {
|
&wifi0 {
|
||||||
brcm,board-type = "apple,honshu";
|
brcm,board-type = "apple,honshu";
|
||||||
};
|
};
|
||||||
|
|
|
@ -17,6 +17,10 @@
|
||||||
model = "Apple MacBook Air (M1, 2020)";
|
model = "Apple MacBook Air (M1, 2020)";
|
||||||
};
|
};
|
||||||
|
|
||||||
|
&bluetooth0 {
|
||||||
|
brcm,board-type = "apple,shikoku";
|
||||||
|
};
|
||||||
|
|
||||||
&wifi0 {
|
&wifi0 {
|
||||||
brcm,board-type = "apple,shikoku";
|
brcm,board-type = "apple,shikoku";
|
||||||
};
|
};
|
||||||
|
|
|
@ -21,6 +21,10 @@
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
&bluetooth0 {
|
||||||
|
brcm,board-type = "apple,capri";
|
||||||
|
};
|
||||||
|
|
||||||
&wifi0 {
|
&wifi0 {
|
||||||
brcm,board-type = "apple,capri";
|
brcm,board-type = "apple,capri";
|
||||||
};
|
};
|
||||||
|
|
|
@ -21,6 +21,10 @@
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
|
&bluetooth0 {
|
||||||
|
brcm,board-type = "apple,santorini";
|
||||||
|
};
|
||||||
|
|
||||||
&wifi0 {
|
&wifi0 {
|
||||||
brcm,board-type = "apple,santorini";
|
brcm,board-type = "apple,santorini";
|
||||||
};
|
};
|
||||||
|
|
|
@ -11,6 +11,7 @@
|
||||||
|
|
||||||
/ {
|
/ {
|
||||||
aliases {
|
aliases {
|
||||||
|
bluetooth0 = &bluetooth0;
|
||||||
serial0 = &serial0;
|
serial0 = &serial0;
|
||||||
serial2 = &serial2;
|
serial2 = &serial2;
|
||||||
wifi0 = &wifi0;
|
wifi0 = &wifi0;
|
||||||
|
@ -77,6 +78,13 @@
|
||||||
local-mac-address = [00 00 00 00 00 00];
|
local-mac-address = [00 00 00 00 00 00];
|
||||||
apple,antenna-sku = "XX";
|
apple,antenna-sku = "XX";
|
||||||
};
|
};
|
||||||
|
|
||||||
|
bluetooth0: bluetooth@0,1 {
|
||||||
|
compatible = "pci14e4,5f69";
|
||||||
|
reg = <0x10100 0x0 0x0 0x0 0x0>;
|
||||||
|
/* To be filled by the loader */
|
||||||
|
local-bd-address = [00 00 00 00 00 00];
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
&nco_clkref {
|
&nco_clkref {
|
||||||
|
|
|
@ -24,9 +24,12 @@
|
||||||
|
|
||||||
/* these aliases provide the FMan ports mapping */
|
/* these aliases provide the FMan ports mapping */
|
||||||
enet0: ethernet@e0000 {
|
enet0: ethernet@e0000 {
|
||||||
|
pcs-handle-names = "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet1: ethernet@e2000 {
|
enet1: ethernet@e2000 {
|
||||||
|
pcsphy-handle = <&pcsphy1>, <&qsgmiib_pcs1>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet2: ethernet@e4000 {
|
enet2: ethernet@e4000 {
|
||||||
|
@ -36,11 +39,32 @@
|
||||||
};
|
};
|
||||||
|
|
||||||
enet4: ethernet@e8000 {
|
enet4: ethernet@e8000 {
|
||||||
|
pcsphy-handle = <&pcsphy4>, <&qsgmiib_pcs2>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet5: ethernet@ea000 {
|
enet5: ethernet@ea000 {
|
||||||
|
pcsphy-handle = <&pcsphy5>, <&qsgmiib_pcs3>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet6: ethernet@f0000 {
|
enet6: ethernet@f0000 {
|
||||||
};
|
};
|
||||||
|
|
||||||
|
mdio@e1000 {
|
||||||
|
qsgmiib_pcs1: ethernet-pcs@1 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <0x1>;
|
||||||
|
};
|
||||||
|
|
||||||
|
qsgmiib_pcs2: ethernet-pcs@2 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <0x2>;
|
||||||
|
};
|
||||||
|
|
||||||
|
qsgmiib_pcs3: ethernet-pcs@3 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <0x3>;
|
||||||
|
};
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
|
|
@ -23,6 +23,8 @@
|
||||||
&fman0 {
|
&fman0 {
|
||||||
/* these aliases provide the FMan ports mapping */
|
/* these aliases provide the FMan ports mapping */
|
||||||
enet0: ethernet@e0000 {
|
enet0: ethernet@e0000 {
|
||||||
|
pcsphy-handle = <&qsgmiib_pcs3>;
|
||||||
|
pcs-handle-names = "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet1: ethernet@e2000 {
|
enet1: ethernet@e2000 {
|
||||||
|
@ -35,14 +37,37 @@
|
||||||
};
|
};
|
||||||
|
|
||||||
enet4: ethernet@e8000 {
|
enet4: ethernet@e8000 {
|
||||||
|
pcsphy-handle = <&pcsphy4>, <&qsgmiib_pcs1>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet5: ethernet@ea000 {
|
enet5: ethernet@ea000 {
|
||||||
|
pcsphy-handle = <&pcsphy5>, <&pcsphy5>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
enet6: ethernet@f0000 {
|
enet6: ethernet@f0000 {
|
||||||
};
|
};
|
||||||
|
|
||||||
enet7: ethernet@f2000 {
|
enet7: ethernet@f2000 {
|
||||||
|
pcsphy-handle = <&pcsphy7>, <&qsgmiib_pcs2>, <&pcsphy7>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii", "xfi";
|
||||||
|
};
|
||||||
|
|
||||||
|
mdio@eb000 {
|
||||||
|
qsgmiib_pcs1: ethernet-pcs@1 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <0x1>;
|
||||||
|
};
|
||||||
|
|
||||||
|
qsgmiib_pcs2: ethernet-pcs@2 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <0x2>;
|
||||||
|
};
|
||||||
|
|
||||||
|
qsgmiib_pcs3: ethernet-pcs@3 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <0x3>;
|
||||||
|
};
|
||||||
};
|
};
|
||||||
};
|
};
|
||||||
|
|
|
@ -58,6 +58,8 @@
|
||||||
ranges = <0x0 0x0 ADDRESSIFY(CP11X_BASE) 0x2000000>;
|
ranges = <0x0 0x0 ADDRESSIFY(CP11X_BASE) 0x2000000>;
|
||||||
|
|
||||||
CP11X_LABEL(ethernet): ethernet@0 {
|
CP11X_LABEL(ethernet): ethernet@0 {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <0>;
|
||||||
compatible = "marvell,armada-7k-pp22";
|
compatible = "marvell,armada-7k-pp22";
|
||||||
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
reg = <0x0 0x100000>, <0x129000 0xb000>, <0x220000 0x800>;
|
||||||
clocks = <&CP11X_LABEL(clk) 1 3>, <&CP11X_LABEL(clk) 1 9>,
|
clocks = <&CP11X_LABEL(clk) 1 3>, <&CP11X_LABEL(clk) 1 9>,
|
||||||
|
@ -69,7 +71,7 @@
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
dma-coherent;
|
dma-coherent;
|
||||||
|
|
||||||
CP11X_LABEL(eth0): eth0 {
|
CP11X_LABEL(eth0): ethernet-port@0 {
|
||||||
interrupts = <39 IRQ_TYPE_LEVEL_HIGH>,
|
interrupts = <39 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
<43 IRQ_TYPE_LEVEL_HIGH>,
|
<43 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
<47 IRQ_TYPE_LEVEL_HIGH>,
|
<47 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
@ -83,12 +85,13 @@
|
||||||
interrupt-names = "hif0", "hif1", "hif2",
|
interrupt-names = "hif0", "hif1", "hif2",
|
||||||
"hif3", "hif4", "hif5", "hif6", "hif7",
|
"hif3", "hif4", "hif5", "hif6", "hif7",
|
||||||
"hif8", "link";
|
"hif8", "link";
|
||||||
port-id = <0>;
|
reg = <0>;
|
||||||
|
port-id = <0>; /* For backward compatibility. */
|
||||||
gop-port-id = <0>;
|
gop-port-id = <0>;
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
};
|
};
|
||||||
|
|
||||||
CP11X_LABEL(eth1): eth1 {
|
CP11X_LABEL(eth1): ethernet-port@1 {
|
||||||
interrupts = <40 IRQ_TYPE_LEVEL_HIGH>,
|
interrupts = <40 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
<44 IRQ_TYPE_LEVEL_HIGH>,
|
<44 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
<48 IRQ_TYPE_LEVEL_HIGH>,
|
<48 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
@ -102,12 +105,13 @@
|
||||||
interrupt-names = "hif0", "hif1", "hif2",
|
interrupt-names = "hif0", "hif1", "hif2",
|
||||||
"hif3", "hif4", "hif5", "hif6", "hif7",
|
"hif3", "hif4", "hif5", "hif6", "hif7",
|
||||||
"hif8", "link";
|
"hif8", "link";
|
||||||
port-id = <1>;
|
reg = <1>;
|
||||||
|
port-id = <1>; /* For backward compatibility. */
|
||||||
gop-port-id = <2>;
|
gop-port-id = <2>;
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
};
|
};
|
||||||
|
|
||||||
CP11X_LABEL(eth2): eth2 {
|
CP11X_LABEL(eth2): ethernet-port@2 {
|
||||||
interrupts = <41 IRQ_TYPE_LEVEL_HIGH>,
|
interrupts = <41 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
<45 IRQ_TYPE_LEVEL_HIGH>,
|
<45 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
<49 IRQ_TYPE_LEVEL_HIGH>,
|
<49 IRQ_TYPE_LEVEL_HIGH>,
|
||||||
|
@ -121,7 +125,8 @@
|
||||||
interrupt-names = "hif0", "hif1", "hif2",
|
interrupt-names = "hif0", "hif1", "hif2",
|
||||||
"hif3", "hif4", "hif5", "hif6", "hif7",
|
"hif3", "hif4", "hif5", "hif6", "hif7",
|
||||||
"hif8", "link";
|
"hif8", "link";
|
||||||
port-id = <2>;
|
reg = <2>;
|
||||||
|
port-id = <2>; /* For backward compatibility. */
|
||||||
gop-port-id = <3>;
|
gop-port-id = <3>;
|
||||||
status = "disabled";
|
status = "disabled";
|
||||||
};
|
};
|
||||||
|
|
|
@ -77,6 +77,47 @@
|
||||||
no-map;
|
no-map;
|
||||||
reg = <0 0x4fc00000 0 0x00100000>;
|
reg = <0 0x4fc00000 0 0x00100000>;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
wo_emi0: wo-emi@4fd00000 {
|
||||||
|
reg = <0 0x4fd00000 0 0x40000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_emi1: wo-emi@4fd40000 {
|
||||||
|
reg = <0 0x4fd40000 0 0x40000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_ilm0: wo-ilm@151e0000 {
|
||||||
|
reg = <0 0x151e0000 0 0x8000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_ilm1: wo-ilm@151f0000 {
|
||||||
|
reg = <0 0x151f0000 0 0x8000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_data: wo-data@4fd80000 {
|
||||||
|
reg = <0 0x4fd80000 0 0x240000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_dlm0: wo-dlm@151e8000 {
|
||||||
|
reg = <0 0x151e8000 0 0x2000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_dlm1: wo-dlm@151f8000 {
|
||||||
|
reg = <0 0x151f8000 0 0x2000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_boot: wo-boot@15194000 {
|
||||||
|
reg = <0 0x15194000 0 0x1000>;
|
||||||
|
no-map;
|
||||||
|
};
|
||||||
|
|
||||||
};
|
};
|
||||||
|
|
||||||
timer {
|
timer {
|
||||||
|
@ -298,6 +339,11 @@
|
||||||
reg = <0 0x15010000 0 0x1000>;
|
reg = <0 0x15010000 0 0x1000>;
|
||||||
interrupt-parent = <&gic>;
|
interrupt-parent = <&gic>;
|
||||||
interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
|
interrupts = <GIC_SPI 205 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
memory-region = <&wo_emi0>, <&wo_ilm0>, <&wo_dlm0>,
|
||||||
|
<&wo_data>, <&wo_boot>;
|
||||||
|
memory-region-names = "wo-emi", "wo-ilm", "wo-dlm",
|
||||||
|
"wo-data", "wo-boot";
|
||||||
|
mediatek,wo-ccif = <&wo_ccif0>;
|
||||||
};
|
};
|
||||||
|
|
||||||
wed1: wed@15011000 {
|
wed1: wed@15011000 {
|
||||||
|
@ -306,6 +352,25 @@
|
||||||
reg = <0 0x15011000 0 0x1000>;
|
reg = <0 0x15011000 0 0x1000>;
|
||||||
interrupt-parent = <&gic>;
|
interrupt-parent = <&gic>;
|
||||||
interrupts = <GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>;
|
interrupts = <GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
memory-region = <&wo_emi1>, <&wo_ilm1>, <&wo_dlm1>,
|
||||||
|
<&wo_data>, <&wo_boot>;
|
||||||
|
memory-region-names = "wo-emi", "wo-ilm", "wo-dlm",
|
||||||
|
"wo-data", "wo-boot";
|
||||||
|
mediatek,wo-ccif = <&wo_ccif1>;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_ccif0: syscon@151a5000 {
|
||||||
|
compatible = "mediatek,mt7986-wo-ccif", "syscon";
|
||||||
|
reg = <0 0x151a5000 0 0x1000>;
|
||||||
|
interrupt-parent = <&gic>;
|
||||||
|
interrupts = <GIC_SPI 211 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
};
|
||||||
|
|
||||||
|
wo_ccif1: syscon@151ad000 {
|
||||||
|
compatible = "mediatek,mt7986-wo-ccif", "syscon";
|
||||||
|
reg = <0 0x151ad000 0 0x1000>;
|
||||||
|
interrupt-parent = <&gic>;
|
||||||
|
interrupts = <GIC_SPI 212 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
};
|
};
|
||||||
|
|
||||||
eth: ethernet@15100000 {
|
eth: ethernet@15100000 {
|
||||||
|
|
|
@ -1649,13 +1649,8 @@ static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
|
||||||
struct bpf_prog *p = l->link.prog;
|
struct bpf_prog *p = l->link.prog;
|
||||||
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
|
int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
|
||||||
|
|
||||||
if (p->aux->sleepable) {
|
enter_prog = (u64)bpf_trampoline_enter(p);
|
||||||
enter_prog = (u64)__bpf_prog_enter_sleepable;
|
exit_prog = (u64)bpf_trampoline_exit(p);
|
||||||
exit_prog = (u64)__bpf_prog_exit_sleepable;
|
|
||||||
} else {
|
|
||||||
enter_prog = (u64)__bpf_prog_enter;
|
|
||||||
exit_prog = (u64)__bpf_prog_exit;
|
|
||||||
}
|
|
||||||
|
|
||||||
if (l->cookie == 0) {
|
if (l->cookie == 0) {
|
||||||
/* if cookie is zero, one instruction is enough to store it */
|
/* if cookie is zero, one instruction is enough to store it */
|
||||||
|
|
|
@ -284,7 +284,6 @@ CONFIG_IXGB=m
|
||||||
CONFIG_SKGE=m
|
CONFIG_SKGE=m
|
||||||
CONFIG_SKY2=m
|
CONFIG_SKY2=m
|
||||||
CONFIG_MYRI10GE=m
|
CONFIG_MYRI10GE=m
|
||||||
CONFIG_FEALNX=m
|
|
||||||
CONFIG_NATSEMI=m
|
CONFIG_NATSEMI=m
|
||||||
CONFIG_NS83820=m
|
CONFIG_NS83820=m
|
||||||
CONFIG_S2IO=m
|
CONFIG_S2IO=m
|
||||||
|
|
|
@ -55,7 +55,8 @@ fman@400000 {
|
||||||
reg = <0xe0000 0x1000>;
|
reg = <0xe0000 0x1000>;
|
||||||
fsl,fman-ports = <&fman0_rx_0x08 &fman0_tx_0x28>;
|
fsl,fman-ports = <&fman0_rx_0x08 &fman0_tx_0x28>;
|
||||||
ptp-timer = <&ptp_timer0>;
|
ptp-timer = <&ptp_timer0>;
|
||||||
pcsphy-handle = <&pcsphy0>;
|
pcsphy-handle = <&pcsphy0>, <&pcsphy0>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
};
|
};
|
||||||
|
|
||||||
mdio@e1000 {
|
mdio@e1000 {
|
||||||
|
|
|
@ -52,7 +52,15 @@ fman@400000 {
|
||||||
compatible = "fsl,fman-memac";
|
compatible = "fsl,fman-memac";
|
||||||
reg = <0xf0000 0x1000>;
|
reg = <0xf0000 0x1000>;
|
||||||
fsl,fman-ports = <&fman0_rx_0x10 &fman0_tx_0x30>;
|
fsl,fman-ports = <&fman0_rx_0x10 &fman0_tx_0x30>;
|
||||||
pcsphy-handle = <&pcsphy6>;
|
pcsphy-handle = <&pcsphy6>, <&qsgmiib_pcs2>, <&pcsphy6>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii", "xfi";
|
||||||
|
};
|
||||||
|
|
||||||
|
mdio@e9000 {
|
||||||
|
qsgmiib_pcs2: ethernet-pcs@2 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <2>;
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
mdio@f1000 {
|
mdio@f1000 {
|
||||||
|
|
|
@ -55,7 +55,15 @@ fman@400000 {
|
||||||
reg = <0xe2000 0x1000>;
|
reg = <0xe2000 0x1000>;
|
||||||
fsl,fman-ports = <&fman0_rx_0x09 &fman0_tx_0x29>;
|
fsl,fman-ports = <&fman0_rx_0x09 &fman0_tx_0x29>;
|
||||||
ptp-timer = <&ptp_timer0>;
|
ptp-timer = <&ptp_timer0>;
|
||||||
pcsphy-handle = <&pcsphy1>;
|
pcsphy-handle = <&pcsphy1>, <&qsgmiia_pcs1>;
|
||||||
|
pcs-handle-names = "sgmii", "qsgmii";
|
||||||
|
};
|
||||||
|
|
||||||
|
mdio@e1000 {
|
||||||
|
qsgmiia_pcs1: ethernet-pcs@1 {
|
||||||
|
compatible = "fsl,lynx-pcs";
|
||||||
|
reg = <1>;
|
||||||
|
};
|
||||||
};
|
};
|
||||||
|
|
||||||
mdio@e3000 {
|
mdio@e3000 {
|
||||||
|
|
Some files were not shown because too many files have changed in this diff.