Commit graph

29997 commits

Author SHA1 Message Date
Christoph Hellwig
518a2f1925 dma-mapping: zero memory returned from dma_alloc_*
If we want to map memory from the DMA allocator to userspace it must be
zeroed at allocation time to prevent stale data leaks.   We already do
this on most common architectures, but some architectures don't do this
yet, fix them up, either by passing GFP_ZERO when we use the normal page
allocator or doing a manual memset otherwise.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Sam Ravnborg <sam@ravnborg.org> [sparc]
2018-12-20 08:13:52 +01:00
Martin KaFai Lau
fdbaa0beb7 bpf: Ensure line_info.insn_off cannot point to insn with zero code
This patch rejects a line_info if the bpf insn code referred by
line_info.insn_off is 0. F.e. a broken userspace tool might generate
a line_info.insn_off that points to the second 8 bytes of a BPF_LD_IMM64.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-19 15:42:55 -08:00
NeilBrown
a6d8e7637f cred: export get_task_cred().
There is no reason that modules should not be able
to use this, and NFS will need it when converted to
use 'struct cred'.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
NeilBrown
97d0fb239c cred: add get_cred_rcu()
Sometimes we want to opportunistically get a
ref to a cred in an rcu_read_lock protected section.
get_task_cred() does this, and NFS does as similar thing
with its own credential structures.
To prepare for NFS converting to use 'struct cred' more
uniformly, define get_cred_rcu(), and use it in
get_task_cred().

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
NeilBrown
d89b22d46a cred: add cred_fscmp() for comparing creds.
NFS needs to compare to credentials, to see if they can
be treated the same w.r.t. filesystem access.  Sometimes
an ordering is needed when credentials are used as a key
to an rbtree.
NFS currently has its own private credential management from
before 'struct cred' existed.  To move it over to more consistent
use of 'struct cred' we need a comparison function.
This patch adds that function.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-19 13:52:44 -05:00
Dou Liyang
c410abbbac genirq/affinity: Add is_managed to struct irq_affinity_desc
Devices which use managed interrupts usually have two classes of
interrupts:

  - Interrupts for multiple device queues
  - Interrupts for general device management

Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.

Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:

 default_irq_affinity = 4..7

So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.

It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled. That limitation was reported by Kashyap and Sumit.

Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
for truly managed interrupts (queue interrupts) and cleared for the general
device interrupts.

[ tglx: Simplify code and massage changelog ]

Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Dou Liyang <douliyangs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: bhelgaas@google.com
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliyangs@gmail.com
2018-12-19 11:32:08 +01:00
Dou Liyang
bec04037e4 genirq/core: Introduce struct irq_affinity_desc
The interrupt affinity management uses straight cpumask pointers to convey
the automatically assigned affinity masks for managed interrupts. The core
interrupt descriptor allocation also decides based on the pointer being non
NULL whether an interrupt is managed or not.

Devices which use managed interrupts usually have two classes of
interrupts:

  - Interrupts for multiple device queues
  - Interrupts for general device management

Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.

Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:

 default_irq_affinity = 4..7

So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.

It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled.

To remedy that situation it's required to convey more information than the
cpumasks through various interfaces related to interrupt descriptor
allocation.

Instead of adding yet another argument, create a new data structure
'irq_affinity_desc' which for now just contains the cpumask. This struct
can be expanded to convey auxilliary information in the next step.

No functional change, just preparatory work.

[ tglx: Simplified logic and clarified changelog ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dou Liyang <douliyangs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: kashyap.desai@broadcom.com
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliyangs@gmail.com
2018-12-19 11:32:08 +01:00
Thomas Gleixner
c2899c3470 genirq/affinity: Remove excess indentation
Plus other coding style issues which stood out while staring at that code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-12-19 11:32:07 +01:00
Yonghong Song
76c43ae84e bpf: log struct/union attribute for forward type
Current btf internal verbose logger logs fwd type as
  [2] FWD A type_id=0
where A is the type name.

Commit 9d5f9f701b ("bpf: btf: fix struct/union/fwd types
with kind_flag") introduced kind_flag which can be used
to distinguish whether a forward type is a struct or
union.

Also, "type_id=0" does not carry any meaningful
information for fwd type as btf_type.type = 0 is simply
enforced during btf verification and is not used
anywhere else.

This commit changed the log to
  [2] FWD A struct
if kind_flag = 0, or
  [2] FWD A union
if kind_flag = 1.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-19 00:47:56 +01:00
Jiong Wang
0bae2d4d62 bpf: correct slot_type marking logic to allow more stack slot sharing
Verifier is supposed to support sharing stack slot allocated to ptr with
SCALAR_VALUE for privileged program. However this doesn't happen for some
cases.

The reason is verifier is not clearing slot_type STACK_SPILL for all bytes,
it only clears part of them, while verifier is using:

  slot_type[0] == STACK_SPILL

as a convention to check one slot is ptr type.

So, the consequence of partial clearing slot_type is verifier could treat a
partially overridden ptr slot, which should now be a SCALAR_VALUE slot,
still as ptr slot, and rejects some valid programs.

Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and
bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32 are
rejected due to this issue. After this patch, they all accepted.

There is no processed insn number change before and after this patch on
Cilium bpf programs.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-18 14:45:01 -08:00
Thomas Gleixner
da791a6675 futex: Cure exit race
Stefan reported, that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():

 CPU0				CPU1

 sys_exit()			sys_futex()
  do_exit()			 futex_lock_pi()
   exit_signals(tsk)		  No waiters:
    tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
  mm_release(tsk)		  Set waiter bit
   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
      Set owner died		  attach_to_pi_owner() {
    *uaddr = 0xC0000000;	   tsk = get_task(PID);
   }				   if (!tsk->flags & PF_EXITING) {
  ...				     attach();
  tsk->flags |= PF_EXITPIDONE;	   } else {
				     if (!(tsk->flags & PF_EXITPIDONE))
				       return -EAGAIN;
				     return -ESRCH; <--- FAIL
				   }

ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.

Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
is set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.

If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.

Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
2018-12-18 23:13:15 +01:00
Matt Mullins
a38d1107f9 bpf: support raw tracepoints in modules
Distributions build drivers as modules, including network and filesystem
drivers which export numerous tracepoints.  This enables
bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-18 14:08:12 -08:00
Thomas Gleixner
ff3730a497 irqchip updates for 4.21
- A bunch of new irqchip drivers (RDA8810PL, Madera, imx-irqsteer)
 - Updates for new (and old) platforms (i.MX8MQ, F1C100s)
 - A number of SPDX cleanups
 - A workaround for a very broken GICv3 implementation
 - A platform-msi fix
 - Various cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlwZI8cVHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyokP+gKoKbZMc1E7dX6WxUrKh2N+fMJF
 uVbuGF2s57CLG955YNuyo8BK4meWJIHGO3JahwE8I/9eu0G7PaudYvpZgP7s/sxD
 XHLWFVHB1mq4lExMcluT0jG4ZpX7EKvYB1KGqgYM1ScOS9Uubb4ZG9T5GPhUT/YM
 w1BAtHaZmCAg8d0wNPUMaAFc9Bd2B9Z1C8nwS+wpdJRxYxE9x8BES42r95rbXCG6
 5Cq2ol/NbF4RbFodel4YdiAIKfrQtXyQ3N3twC5GRXln4XLjUfzs4mA5rxLLoeGZ
 2UGXeIk0GcokSWF/e+0p3tQDWKwdbqoBhbRbqk7u5ZWuEWTRf4Zot3IlCVpJAMM3
 iRw5XChWxovC+/oqgin4sp1gNpSRgf5mMvR1EauR5DTVtwlOjUBKaPEyKLrPITOo
 B42EJugJ94J0YVdT9RUJsOSXIdOiYFE6I9F4i/XioLYq5FItBB56/81ARZgEncpg
 FEdtseCCtRC3WWGzghxZsSzCW3iGi8wdddRdZmOXCNdPtH03TZg0dGPS+KIn8Soh
 eVSGImV/4efN6hh6fSryeR02fYT3DKGgDQUiV4e/1SOSzxy6VjjrOh48tB8qn/M7
 NbFZMqDKnltsXT2C+bh6zjhorbVCkj8AEtx1oF0d7iIyBxor3eHUelTz6VglNlLq
 RFetH+Yjh9nt9ReO
 =1Mk9
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

 - A bunch of new irqchip drivers (RDA8810PL, Madera, imx-irqsteer)
 - Updates for new (and old) platforms (i.MX8MQ, F1C100s)
 - A number of SPDX cleanups
 - A workaround for a very broken GICv3 implementation
 - A platform-msi fix
 - Various cleanups
2018-12-18 18:37:27 +01:00
Arnd Bergmann
437e78d3fd timekeeping: remove timespec_add/timespec_del
The last users were removed a while ago since everyone moved to ktime_t,
so we can remove the two unused interfaces for old timespec structures.

With those two gone, set_normalized_timespec() is also unused, so
remove that as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Stultz <john.stultz@linaro.org>
2018-12-18 16:13:05 +01:00
Arnd Bergmann
926617889d timekeeping: remove unused {read,update}_persistent_clock
After arch/sh has removed the last reference to these functions,
we can remove them completely and just rely on the 64-bit time_t
based versions. This cleans up a rather ugly use of __weak
functions.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Stultz <john.stultz@linaro.org>
2018-12-18 16:13:05 +01:00
Arnd Bergmann
2367c4b5fa y2038: signal: Add compat_sys_rt_sigtimedwait_time64
Now that 32-bit architectures have two variants of sys_rt_sigtimedwaid()
for 32-bit and 64-bit time_t, we also need to have a second compat system
call entry point on the corresponding 64-bit architectures.

The traditional system call keeps getting handled
by compat_sys_rt_sigtimedwait(), and this adds a new
compat_sys_rt_sigtimedwait_time64() that differs only in the timeout
argument type.

The naming remains a bit asymmetric for the moment. Ideally we would
want to have compat_sys_rt_sigtimedwait_time32() for the old version
and compat_sys_rt_sigtimedwait() for the new one to mirror the names
of the native entry points, but renaming the existing system call
tables causes unnecessary churn. I would suggest renaming all such
system calls together at a later point.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-18 16:13:04 +01:00
Arnd Bergmann
df8522a340 y2038: signal: Add sys_rt_sigtimedwait_time32
Once sys_rt_sigtimedwait() gets changed to a 64-bit time_t, we have
to provide compatibility support for existing binaries.

An earlier version of this patch reused the compat_sys_rt_sigtimedwait
entry point to avoid code duplication, but this newer approach
duplicates the existing native entry point instead, which seems
a bit cleaner.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-18 16:13:04 +01:00
Arnd Bergmann
e11d4284e2 y2038: socket: Add compat_sys_recvmmsg_time64
recvmmsg() takes two arguments to pointers of structures that differ
between 32-bit and 64-bit architectures: mmsghdr and timespec.

For y2038 compatbility, we are changing the native system call from
timespec to __kernel_timespec with a 64-bit time_t (in another patch),
and use the existing compat system call on both 32-bit and 64-bit
architectures for compatibility with traditional 32-bit user space.

As we now have two variants of recvmmsg() for 32-bit tasks that are both
different from the variant that we use on 64-bit tasks, this means we
also require two compat system calls!

The solution I picked is to flip things around: The existing
compat_sys_recvmmsg() call gets moved from net/compat.c into net/socket.c
and now handles the case for old user space on all architectures that
have set CONFIG_COMPAT_32BIT_TIME.  A new compat_sys_recvmmsg_time64()
call gets added in the old place for 64-bit architectures only, this
one handles the case of a compat mmsghdr structure combined with
__kernel_timespec.

In the indirect sys_socketcall(), we now need to call either
do_sys_recvmmsg() or __compat_sys_recvmmsg(), depending on what kind of
architecture we are on. For compat_sys_socketcall(), no such change is
needed, we always call __compat_sys_recvmmsg().

I decided to not add a new SYS_RECVMMSG_TIME64 socketcall: Any libc
implementation for 64-bit time_t will need significant changes including
an updated asm/unistd.h, and it seems better to consistently use the
separate syscalls that configuration, leaving the socketcall only for
backward compatibility with 32-bit time_t based libc.

The naming is asymmetric for the moment, so both existing syscalls
entry points keep their names, while the new ones are recvmmsg_time32
and compat_recvmmsg_time64 respectively. I expect that we will rename
the compat syscalls later as we start using generated syscall tables
everywhere and add these entry points.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-18 16:13:04 +01:00
Ingo Molnar
c5f48c0a7a genirq: Fix various typos in comments
Go over the IRQ subsystem source code (including irqchip drivers) and
fix common typos in comments.

No change in functionality intended.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
2018-12-18 14:22:28 +01:00
YueHaibing
07daef8b41 ntp: Remove duplicated include
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <john.stultz@linaro.org>
Cc: <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20181209062225.4344-1-yuehaibing@huawei.com
2018-12-18 12:59:33 +01:00
Tetsuo Handa
15ff2069cb printk: Add caller information to printk() output.
Sometimes we want to print a series of printk() messages to consoles
without being disturbed by concurrent printk() from interrupts and/or
other threads. But we can't enforce printk() callers to use their local
buffers because we need to ask them to make too much changes. Also, even
buffering up to one line inside printk() might cause failing to emit
an important clue under critical situation.

Therefore, instead of trying to help buffering, let's try to help
reconstructing messages by saving caller information as of calling
log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
upon printing to consoles.

Some examples for console output:

  [    1.222773][    T1] x86: Booting SMP configuration:
  [    2.779635][    T1] pci 0000:00:01.0: PCI bridge to [bus 01]
  [    5.069193][  T268] Fusion MPT base driver 3.04.20
  [    9.316504][    C2] random: fast init done
  [   13.413336][ T3355] Initialized host personality

Some examples for /dev/kmsg output:

  6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
  6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
   SUBSYSTEM=pci
   DEVICE=+pci:0000:00:01.0
  6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
  5,1526,9316504,-,caller=C2;random: fast init done
  6,1575,13413336,-,caller=T3355;Initialized host personality

Note that this patch changes max length of messages which can be printed
by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
based on an assumption that userspace won't try to write messages hitting
that border line to /dev/kmsg interface.

Link: http://lkml.kernel.org/r/93f19e57-5051-c67d-9af4-b17624062d44@i-love.sakura.ne.jp
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2018-12-18 10:53:14 +01:00
Yonghong Song
ffa0c1cf59 bpf: enable cgroup local storage map pretty print with kind_flag
Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
local storage maps") added bpffs pretty print for cgroup
local storage maps. The commit worked for struct without kind_flag
set.

This patch refactored and made pretty print also work
with kind_flag set for the struct.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-18 01:11:59 +01:00
Yonghong Song
9d5f9f701b bpf: btf: fix struct/union/fwd types with kind_flag
This patch fixed two issues with BTF. One is related to
struct/union bitfield encoding and the other is related to
forward type.

Issue #1 and solution:

======================

Current btf encoding of bitfield follows what pahole generates.
For each bitfield, pahole will duplicate the type chain and
put the bitfield size at the final int or enum type.
Since the BTF enum type cannot encode bit size,
pahole workarounds the issue by generating
an int type whenever the enum bit size is not 32.

For example,
  -bash-4.4$ cat t.c
  typedef int ___int;
  enum A { A1, A2, A3 };
  struct t {
    int a[5];
    ___int b:4;
    volatile enum A c:4;
  } g;
  -bash-4.4$ gcc -c -O2 -g t.c
The current kernel supports the following BTF encoding:
  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t size=24 vlen=3
        a type_id=5 bits_offset=0
        b type_id=9 bits_offset=160
        c type_id=11 bits_offset=164
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3
  [8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
  [9] TYPEDEF ___int type_id=8
  [10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
  [11] VOLATILE (anon) type_id=10

Two issues are in the above:
  . by changing enum type to int, we lost the original
    type information and this will not be ideal later
    when we try to convert BTF to a header file.
  . the type duplication for bitfields will cause
    BTF bloat. Duplicated types cannot be deduplicated
    later if the bitfield size is different.

To fix this issue, this patch implemented a compatible
change for BTF struct type encoding:
  . the bit 31 of struct_type->info, previously reserved,
    now is used to indicate whether bitfield_size is
    encoded in btf_member or not.
  . if bit 31 of struct_type->info is set,
    btf_member->offset will encode like:
      bit 0 - 23: bit offset
      bit 24 - 31: bitfield size
    if bit 31 is not set, the old behavior is preserved:
      bit 0 - 31: bit offset

So if the struct contains a bit field, the maximum bit offset
will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum
bitfield size will be 256 which is enough for today as maximum
bitfield in compiler can be 128 where int128 type is supported.

This kernel patch intends to support the new BTF encoding:
  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t kind_flag=1 size=24 vlen=3
        a type_id=5 bitfield_size=0 bits_offset=0
        b type_id=1 bitfield_size=4 bits_offset=160
        c type_id=7 bitfield_size=4 bits_offset=164
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3

Issue #2 and solution:
======================

Current forward type in BTF does not specify whether the original
type is struct or union. This will not work for type pretty print
and BTF-to-header-file conversion as struct/union must be specified.
  $ cat tt.c
  struct t;
  union u;
  int foo(struct t *t, union u *u) { return 0; }
  $ gcc -c -g -O2 tt.c
  $ pahole -JV tt.o
  [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [2] FWD t type_id=0
  [3] PTR (anon) type_id=2
  [4] FWD u type_id=0
  [5] PTR (anon) type_id=4

To fix this issue, similar to issue #1, type->info bit 31
is used. If the bit is set, it is union type. Otherwise, it is
a struct type.

  $ pahole -JV tt.o
  [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [2] FWD t kind_flag=0 type_id=0
  [3] PTR (anon) kind_flag=0 type_id=2
  [4] FWD u kind_flag=1 type_id=0
  [5] PTR (anon) kind_flag=0 type_id=4

Pahole/LLVM change:
===================

The new kind_flag functionality has been implemented in pahole
and llvm:
  https://github.com/yonghong-song/pahole/tree/bitfield
  https://github.com/yonghong-song/llvm/tree/bitfield

Note that pahole hasn't implemented func/func_proto kind
and .BTF.ext. So to print function signature with bpftool,
the llvm compiler should be used.

Fixes: 69b693f0ae ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-18 01:11:59 +01:00
Yonghong Song
f97be3ab04 bpf: btf: refactor btf_int_bits_seq_show()
Refactor function btf_int_bits_seq_show() by creating
function btf_bitfield_seq_show() which has no dependence
on btf and btf_type. The function btf_bitfield_seq_show()
will be in later patch to directly dump bitfield member values.

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-18 01:11:59 +01:00
Daniel Borkmann
6c4fc209fc bpf: remove useless version check for prog load
Existing libraries and tracing frameworks work around this kernel
version check by automatically deriving the kernel version from
uname(3) or similar such that the user does not need to do it
manually; these workarounds also make the version check useless
at the same time.

Moreover, most other BPF tracing types enabling bpf_probe_read()-like
functionality have /not/ adapted this check, and in general these
days it is well understood anyway that all the tracing programs are
not stable with regards to future kernels as kernel internal data
structures are subject to change from release to release.

Back at last netconf we discussed [0] and agreed to remove this
check from bpf_prog_load() and instead document it here in the uapi
header that there is no such guarantee for stable API for these
programs.

  [0] http://vger.kernel.org/netconf2018_files/DanielBorkmann_netconf2018.pdf

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-17 13:41:35 -08:00
Lendacky, Thomas
c92a54cfa0 dma-direct: do not include SME mask in the DMA supported check
The dma_direct_supported() function intends to check the DMA mask against
specific values. However, the phys_to_dma() function includes the SME
encryption mask, which defeats the intended purpose of the check. This
results in drivers that support less than 48-bit DMA (SME encryption mask
is bit 47) from being able to set the DMA mask successfully when SME is
active, which results in the driver failing to initialize.

Change the function used to check the mask from phys_to_dma() to
__phys_to_dma() so that the SME encryption mask is not part of the check.

Fixes: c1d0af1a1d ("kernel/dma/direct: take DMA offset into account in dma_direct_supported")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-17 18:02:11 +01:00
Masami Hiramatsu
fb1a59fae8 kprobes: Blacklist symbols in arch-defined prohibited area
Blacklist symbols in arch-defined probe-prohibited areas.
With this change, user can see all symbols which are prohibited
to probe in debugfs.

All archtectures which have custom prohibit areas should define
its own arch_populate_kprobe_blacklist() function, but unless that,
all symbols marked __kprobes are blacklisted.

Reported-by: Andrea Righi <righi.andrea@gmail.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/154503485491.26176.15823229545155174796.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 17:48:38 +01:00
Ingo Molnar
76aea1eeb9 Linux 4.20-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwW4/oeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG2QMH/Rl6iMpTUX23tMHe
 eXQzAOSvQXaWlFoX25j1Jvt8nhS7Uy8vkdpYTCOI/7DF0Jg4O/6uxcZkErlwWxb8
 MW1rMgpfO+OpDLSLXAO2GKxaKI3ArqF2BcOQA2mji1/jR2VUTqmIvBoudn5d+GYz
 19aCyfdzmVTC38G9sBhhcqJ10EkxLiHe2K74bf4JxVuSf2EnTI4LYt5xJPDoT0/C
 6fOeUNwVhvv5a4svvzJmortq7x7BwyxBQArc7PbO0MPhabLU4wyFUOTRszgsGd76
 o5JuOFwgdIIHlSSacGla6rKq10nmkwR07fHfRFFwbvrfBOEHsXOP2hvzMZX+FLBK
 IXOzdtc=
 =XlMc
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 17:46:26 +01:00
Thomas Gleixner
0e334db6bb posix-timers: Fix division by zero bug
The signal delivery path of posix-timers can try to rearm the timer even if
the interval is zero. That's handled for the common case (hrtimer) but not
for alarm timers. In that case the forwarding function raises a division by
zero exception.

The handling for hrtimer based posix timers is wrong because it marks the
timer as active despite the fact that it is stopped.

Move the check from common_hrtimer_rearm() to posixtimer_rearm() to cure
both issues.

Reported-by: syzbot+9d38bedac9cc77b8ad5e@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: sboyd@kernel.org
Cc: stable@vger.kernel.org
Cc: syzkaller-bugs@googlegroups.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1812171328050.1880@nanos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17 17:35:45 +01:00
David S. Miller
10589a568f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2018-12-15

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix liveness propagation of callee saved registers, from Jakub.

2) fix overflow in bpf_jit_limit knob, from Daniel.

3) bpf_flow_dissector api fix, from Stanislav.

4) bpf_perf_event api fix on powerpc, from Sandipan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:58:32 -08:00
Masahiro Yamada
723679339d kconfig: warn no new line at end of file
It would be nice to warn if a new line is missing at end of file.

We could do this by checkpatch.pl for arbitrary files, but new line
is rather essential as a statement terminator in Kconfig.

The warning message looks like this:

  kernel/Kconfig.preempt:60:warning: no new line at end of file

Currently, kernel/Kconfig.preempt is the only file with no new line
at end of file. Fix it.

I know there are some false negative cases. For example, no warning
is displayed when the last line contains some whitespaces/comments,
but no new line. Yet, this commit works well for most cases.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-12-15 17:44:35 +09:00
Alexei Starovoitov
9242b5f561 bpf: add self-check logic to liveness analysis
Introduce REG_LIVE_DONE to check the liveness propagation
and prepare the states for merging.
See algorithm description in clean_live_states().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15 01:28:32 +01:00
Alexei Starovoitov
19e2dbb7dd bpf: improve stacksafe state comparison
"if (old->allocated_stack > cur->allocated_stack)" check is too conservative.
In some cases explored stack could have allocated more space,
but that stack space was not live.
The test case improves from 19 to 15 processed insns
and improvement on real programs is significant as well:

                       before    after
bpf_lb-DLB_L3.o        1940      1831
bpf_lb-DLB_L4.o        3089      3029
bpf_lb-DUNKNOWN.o      1065      1064
bpf_lxc-DDROP_ALL.o    28052     26309
bpf_lxc-DUNKNOWN.o     35487     33517
bpf_netdev.o           10864     9713
bpf_overlay.o          6643      6184
bpf_lcx_jit.o          38437     37335

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15 01:28:32 +01:00
Alexei Starovoitov
b233920c97 bpf: speed up stacksafe check
Don't check the same stack liveness condition 8 times.
once is enough.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15 01:28:32 +01:00
Martin KaFai Lau
d9762e84ed bpf: verbose log bpf_line_info in verifier
This patch adds bpf_line_info during the verifier's verbose.
It can give error context for debug purpose.

~~~~~~~~~~
Here is the verbose log for backedge:
	while (a) {
		a += bpf_get_smp_processor_id();
		bpf_trace_printk(fmt, sizeof(fmt), a);
	}

~> bpftool prog load ./test_loop.o /sys/fs/bpf/test_loop type tracepoint
13: while (a) {
3: a += bpf_get_smp_processor_id();
back-edge from insn 13 to 3

~~~~~~~~~~
Here is the verbose log for invalid pkt access:
Modification to test_xdp_noinline.c:

	data = (void *)(long)xdp->data;
	data_end = (void *)(long)xdp->data_end;
/*
	if (data + 4 > data_end)
		return XDP_DROP;
*/
	*(u32 *)data = dst->dst;

~> bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
; data = (void *)(long)xdp->data;
224: (79) r2 = *(u64 *)(r10 -112)
225: (61) r2 = *(u32 *)(r2 +0)
; *(u32 *)data = dst->dst;
226: (63) *(u32 *)(r2 +0) = r1
invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
R2 offset is outside of the packet

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-14 14:17:34 -08:00
Martin KaFai Lau
23127b33ec bpf: Create a new btf_name_by_offset() for non type name use case
The current btf_name_by_offset() is returning "(anon)" type name for
the offset == 0 case and "(invalid-name-offset)" for the out-of-bound
offset case.

It fits well for the internal BTF verbose log purpose which
is focusing on type.  For example,
offset == 0 => "(anon)" => anonymous type/name.
Returning non-NULL for the bad offset case is needed
during the BTF verification process because the BTF verifier may
complain about another field first before discovering the name_off
is invalid.

However, it may not be ideal for the newer use case which does not
necessary mean type name.  For example, when logging line_info
in the BPF verifier in the next patch, it is better to log an
empty src line instead of logging "(anon)".

The existing bpf_name_by_offset() is renamed to __bpf_name_by_offset()
and static to btf.c.

A new bpf_name_by_offset() is added for generic context usage.  It
returns "\0" for name_off == 0 (note that btf->strings[0] is "\0")
and NULL for invalid offset.  It allows the caller to decide
what is the best output in its context.

The new btf_name_by_offset() is overlapped with btf_name_offset_valid().
Hence, btf_name_offset_valid() is removed from btf.h to keep the btf.h API
minimal.  The existing btf_name_offset_valid() usage in btf.c could also be
replaced later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-14 14:17:34 -08:00
Vincent Whitchurch
93d77e7f14 ARM: module: Fix function kallsyms on Thumb-2
Thumb-2 functions have the lowest bit set in the symbol value in the
symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
generated from the output of nm, and nm clears the lowest bit.

 $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
  95947: 8015dc89   686 FUNC    GLOBAL DEFAULT    2 show_interrupts
 $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
 8015dc88 T show_interrupts
 $ cat /proc/kallsyms | grep show_interrupts
 8015dc88 T show_interrupts

However, for modules, the kallsyms uses the values in the symbol table
without modification, so for functions in modules, the lowest bit is set
in kallsyms.

 $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
    333: 00002d4d    36 FUNC    GLOBAL DEFAULT    1 tun_get_socket
 $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
 00002d4c T tun_get_socket
 $ cat /proc/kallsyms | grep tun_get_socket
 7f802d4d t tun_get_socket      [tun]

Because of this, the symbol+offset of the crashing instruction shown in
oopses is incorrect when the crash is in a module.  For example, given a
tun_get_socket which starts like this,

 00002d4c <tun_get_socket>:
     2d4c:       6943            ldr     r3, [r0, #20]
     2d4e:       4a07            ldr     r2, [pc, #28]
     2d50:       4293            cmp     r3, r2

a crash when tun_get_socket is called with NULL results in:

 PC is at tun_xdp+0xa3/0xa4 [tun]
 pc : [<7f802d4c>]

As can be seen, the "PC is at" line reports the wrong symbol name, and
the symbol+offset will point to the wrong source line if it is passed to
gdb.

To solve this, add a way for archs to fixup the reading of these module
kallsyms values, and use that to clear the lowest bit for function
symbols on Thumb-2.

After the fix:

 # cat /proc/kallsyms | grep tun_get_socket
 7f802d4c t tun_get_socket       [tun]

 PC is at tun_get_socket+0x0/0x24 [tun]
 pc : [<7f802d4c>]

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-12-14 20:27:29 +01:00
Vincent Whitchurch
5439c985c5 module: Overwrite st_size instead of st_info
st_info is currently overwritten after relocation and used to store the
elf_type().  However, we're going to need it fix kallsyms on ARM's
Thumb-2 kernels, so preserve st_info and overwrite the st_size field
instead.  st_size is neither used by the module core nor by any
architecture.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-12-14 20:27:21 +01:00
YueHaibing
d406db524c audit: remove duplicated include from audit.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-12-14 12:09:30 -05:00
Tycho Andersen
319deec7db seccomp: fix poor type promotion
sparse complains,

kernel/seccomp.c:1172:13: warning: incorrect type in assignment (different base types)
kernel/seccomp.c:1172:13:    expected restricted __poll_t [usertype] ret
kernel/seccomp.c:1172:13:    got int
kernel/seccomp.c:1173:13: warning: restricted __poll_t degrades to integer

Instead of assigning this to ret, since we don't use this anywhere, let's
just test it against 0 directly.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reported-by: 0day robot <lkp@intel.com>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-13 16:49:01 -08:00
Daniel Borkmann
9f8c1c5712 bpf: remove obsolete prog->aux sanitation in bpf_insn_prepare_dump
This logic is not needed anymore since we got rid of the verifier
rewrite that was using prog->aux address in f6069b9aa9 ("bpf:
fix redirect to map under tail calls").

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-13 12:42:44 -08:00
Christoph Hellwig
356da6d0cd dma-mapping: bypass indirect calls for dma-direct
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:18 +01:00
Christoph Hellwig
55897af630 dma-direct: merge swiotlb_dma_ops into the dma_direct code
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits.  Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation.  This also
simplified the dma-mapping setup on a few architectures where we
don't have to differenciate which implementation to use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:17 +01:00
Christoph Hellwig
17ac524719 dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
No need to duplicate the mapping logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:16 +01:00
Christoph Hellwig
58dfd4ac02 dma-direct: improve addressability error reporting
Only report report a DMA addressability report once to avoid spewing the
kernel log with repeated message.  Also provide a stack trace to make it
easy to find the actual caller that caused the problem.

Last but not least move the actual check into the fast path and only
leave the error reporting in a helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:15 +01:00
Christoph Hellwig
68c608345c swiotlb: remove dma_mark_clean
Instead of providing a special dma_mark_clean hook just for ia64, switch
ia64 to use the normal arch_sync_dma_for_cpu hooks instead.

This means that we now also set the PG_arch_1 bit for pages in the
swiotlb buffer, which isn't stricly needed as we will never execute code
out of the swiotlb buffer, but otherwise harmless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:14 +01:00
Christoph Hellwig
b907e20508 swiotlb: remove SWIOTLB_MAP_ERROR
We can use DMA_MAPPING_ERROR instead, which already maps to the same
value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:13 +01:00
Robin Murphy
90ac706e98 dma-mapping: factor out dummy DMA ops
The dummy DMA ops are currently used by arm64 for any device which has
an invalid ACPI description and is thus barred from using DMA due to not
knowing whether is is cache-coherent or not. Factor these out into
general dma-mapping code so that they can be referenced from other
common code paths. In the process, we can prune all the optional
callbacks which just do the same thing as the default behaviour, and
fill in .map_resource for completeness.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[hch: moved to a separate source file]
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-13 21:06:12 +01:00
Christoph Hellwig
3731c3d477 dma-mapping: always build the direct mapping code
All architectures except for sparc64 use the dma-direct code in some
form, and even for sparc64 we had the discussion of a direct mapping
mode a while ago.  In preparation for directly calling the direct
mapping code don't bother having it optionally but always build the
code in.  This is a minor hardship for some powerpc and arm configs
that don't pull it in yet (although they should in a relase ot two),
and sparc64 which currently doesn't need it at all, but it will
reduce the ifdef mess we'd otherwise need significantly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:11 +01:00
Christoph Hellwig
8ddbe5943c dma-mapping: move dma_cache_sync out of line
This isn't exactly a slow path routine, but it is not super critical
either, and moving it out of line will help to keep the include chain
clean for the following DMA indirection bypass work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:10 +01:00