Commit graph

29997 commits

Author SHA1 Message Date
Ravi Bangoria
1aed58e67a Uprobes: Fix kernel oops with delayed_uprobe_remove()
There could be a race between task exit and probe unregister:

  exit_mm()
  mmput()
  __mmput()                     uprobe_unregister()
  uprobe_clear_state()          put_uprobe()
  delayed_uprobe_remove()       delayed_uprobe_remove()

put_uprobe() is calling delayed_uprobe_remove() without taking
delayed_uprobe_lock and thus the race sometimes results in a
kernel crash. Fix this by taking delayed_uprobe_lock before
calling delayed_uprobe_remove() from put_uprobe().

Detailed crash log can be found at:
  Link: http://lkml.kernel.org/r/000000000000140c370577db5ece@google.com

Link: http://lkml.kernel.org/r/20181205033423.26242-1-ravi.bangoria@linux.ibm.com

Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: syzbot+cb1fb754b771caca0a88@syzkaller.appspotmail.com
Fixes: 1cc33161a8 ("uprobes: Support SDT markers having reference count (semaphore)")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-05 23:05:13 -05:00
Anders Roxell
e9c7d65661 stackleak: Mark stackleak_track_stack() as notrace
Function graph tracing recurses into itself when stackleak is enabled,
causing the ftrace graph selftest to run for up to 90 seconds and
trigger the softlockup watchdog.

Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
200             mcount_get_lr_addr        x0    //     pointer to function's saved lr
(gdb) bt
\#0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
\#1  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#2  0xffffff8008555484 in stackleak_track_stack () at ../kernel/stackleak.c:106
\#3  0xffffff8008421ff8 in ftrace_ops_test (ops=0xffffff8009eaa840 <graph_ops>, ip=18446743524091297036, regs=<optimized out>) at ../kernel/trace/ftrace.c:1507
\#4  0xffffff8008428770 in __ftrace_ops_list_func (regs=<optimized out>, ignored=<optimized out>, parent_ip=<optimized out>, ip=<optimized out>) at ../kernel/trace/ftrace.c:6286
\#5  ftrace_ops_no_ops (ip=18446743524091297036, parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
\#6  0xffffff80081d5280 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:153
\#7  0xffffff800832fd10 in irq_find_mapping (domain=0xffffffc03fc4bc80, hwirq=27) at ../kernel/irq/irqdomain.c:876
\#8  0xffffff800832294c in __handle_domain_irq (domain=0xffffffc03fc4bc80, hwirq=27, lookup=true, regs=0xffffff800814b840) at ../kernel/irq/irqdesc.c:650
\#9  0xffffff80081d52b4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:205

Rework so we mark stackleak_track_stack as notrace

Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-05 19:31:44 -08:00
Martin KaFai Lau
d30d42e08c bpf: Change insn_offset to insn_off in bpf_func_info
The later patch will introduce "struct bpf_line_info" which
has member "line_off" and "file_off" referring back to the
string section in btf.  The line_"off" and file_"off"
are more consistent to the naming convention in btf.h that
means "offset" (e.g. name_off in "struct btf_type").

The to-be-added "struct bpf_line_info" also has another
member, "insn_off" which is the same as the "insn_offset"
in "struct bpf_func_info".  Hence, this patch renames "insn_offset"
to "insn_off" for "struct bpf_func_info".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-05 18:48:40 -08:00
Martin KaFai Lau
7337224fc1 bpf: Improve the info.func_info and info.func_info_rec_size behavior
1) When bpf_dump_raw_ok() == false and the kernel can provide >=1
   func_info to the userspace, the current behavior is setting
   the info.func_info_cnt to 0 instead of setting info.func_info
   to 0.

   It is different from the behavior in jited_func_lens/nr_jited_func_lens,
   jited_ksyms/nr_jited_ksyms...etc.

   This patch fixes it. (i.e. set func_info to 0 instead of
   func_info_cnt to 0 when bpf_dump_raw_ok() == false).

2) When the userspace passed in info.func_info_cnt == 0, the kernel
   will set the expected func_info size back to the
   info.func_info_rec_size.  It is a way for the userspace to learn
   the kernel expected func_info_rec_size introduced in
   commit 838e96904f ("bpf: Introduce bpf_func_info").

   An exception is the kernel expected size is not set when
   func_info is not available for a bpf_prog.  This makes the
   returned info.func_info_rec_size has different values
   depending on the returned value of info.func_info_cnt.

   This patch sets the kernel expected size to info.func_info_rec_size
   independent of the info.func_info_cnt.

3) The current logic only rejects invalid func_info_rec_size if
   func_info_cnt is non zero.  This patch also rejects invalid
   nonzero info.func_info_rec_size and not equal to the kernel
   expected size.

4) Set info.btf_id as long as prog->aux->btf != NULL.  That will
   setup the later copy_to_user() codes look the same as others
   which then easier to understand and maintain.

   prog->aux->btf is not NULL only if prog->aux->func_info_cnt > 0.

   Breaking up info.btf_id from prog->aux->func_info_cnt is needed
   for the later line info patch anyway.

   A similar change is made to bpf_get_prog_name().

Fixes: 838e96904f ("bpf: Introduce bpf_func_info")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-05 18:48:40 -08:00
David S. Miller
e37d05a538 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2018-12-05

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix bpf uapi pointers for 32-bit architectures, from Daniel.

2) improve verifer ability to handle progs with a lot of branches, from Alexei.

3) strict btf checks, from Yonghong.

4) bpf_sk_lookup api cleanup, from Joe.

5) other misc fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05 16:30:30 -08:00
Ard Biesheuvel
dc002bb62f bpf: add __weak hook for allocating executable memory
By default, BPF uses module_alloc() to allocate executable memory,
but this is not necessary on all arches and potentially undesirable
on some of them.

So break out the module_alloc() and module_memfree() calls into __weak
functions to allow them to be overridden in arch code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-05 16:36:28 +01:00
Marek Szyprowski
a1da439cc0 dma-mapping: fix lack of DMA address assignment in generic remap allocator
Commit bfd56cd605 ("dma-mapping: support highmem in the generic remap
allocator") replaced dma_direct_alloc_pages() with __dma_direct_alloc_pages(),
which doesn't set dma_handle and zero allocated memory. Fix it by doing this
directly in the caller function.

Fixes: bfd56cd605 ("dma-mapping: support highmem in the generic remap allocator")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-12-05 05:49:10 -08:00
Bart Van Assche
ce10a5b395 timekeeping: Use proper seqcount initializer
tk_core.seq is initialized open coded, but that misses to initialize the
lockdep map when lockdep is enabled. Lockdep splats involving tk_core seq
consequently lack a name and are hard to read.

Use the proper initializer which takes care of the lockdep map
initialization.

[ tglx: Massaged changelog ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: tj@kernel.org
Cc: johannes.berg@intel.com
Link: https://lkml.kernel.org/r/20181128234325.110011-12-bvanassche@acm.org
2018-12-05 11:00:09 +01:00
Jens Axboe
89d04ec349 Linux 4.20-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwEZdIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAlQH/19oax2Za3IPqF4X
 DM3lal5M6zlUVkoYstqzpbR3MqUwgEnMfvoeMDC6mI9N4/+r2LkV7cRR8HzqQCCS
 jDfD69IzRGb52VSeJmbOrkxBWsR1Nn0t4Z3rEeLPxwaOoNpRc8H973MbAQ2FKMpY
 S4Y3jIK1dNiRRxdh52NupVkQF+djAUwkBuVk/rrvRJmTDij4la03cuCDAO+Di9lt
 GHlVvygKw2SJhDR+z3ArwZNmE0ceCcE6+W7zPHzj2KeWuKrZg22kfUD454f2YEIw
 FG0hu9qecgtpYCkLSm2vr4jQzmpsDoyq3ZfwhjGrP4qtvPC3Db3vL3dbQnkzUcJu
 JtwhVCE=
 =O1q1
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc5' into for-4.21/block

Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and
also getting the merge fix that went into mainline that users are
hitting testing for-4.21/block and/or for-next.

* tag 'v4.20-rc5': (664 commits)
  Linux 4.20-rc5
  PCI: Fix incorrect value returned from pcie_get_speed_cap()
  MAINTAINERS: Update linux-mips mailing list address
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  ...
2018-12-04 09:38:05 -07:00
Alexei Starovoitov
ceefbc96fa bpf: add per-insn complexity limit
malicious bpf program may try to force the verifier to remember
a lot of distinct verifier states.
Put a limit to number of per-insn 'struct bpf_verifier_state'.
Note that hitting the limit doesn't reject the program.
It potentially makes the verifier do more steps to analyze the program.
It means that malicious programs will hit BPF_COMPLEXITY_LIMIT_INSNS sooner
instead of spending cpu time walking long link list.

The limit of BPF_COMPLEXITY_LIMIT_STATES==64 affects cilium progs
with slight increase in number of "steps" it takes to successfully verify
the programs:
                       before    after
bpf_lb-DLB_L3.o         1940      1940
bpf_lb-DLB_L4.o         3089      3089
bpf_lb-DUNKNOWN.o       1065      1065
bpf_lxc-DDROP_ALL.o     28052  |  28162
bpf_lxc-DUNKNOWN.o      35487  |  35541
bpf_netdev.o            10864     10864
bpf_overlay.o           6643      6643
bpf_lcx_jit.o           38437     38437

But it also makes malicious program to be rejected in 0.4 seconds vs 6.5
Hence apply this limit to unprivileged programs only.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04 17:22:02 +01:00
Alexei Starovoitov
4f7b3e8258 bpf: improve verifier branch analysis
pathological bpf programs may try to force verifier to explode in
the number of branch states:
  20: (d5) if r1 s<= 0x24000028 goto pc+0
  21: (b5) if r0 <= 0xe1fa20 goto pc+2
  22: (d5) if r1 s<= 0x7e goto pc+0
  23: (b5) if r0 <= 0xe880e000 goto pc+0
  24: (c5) if r0 s< 0x2100ecf4 goto pc+0
  25: (d5) if r1 s<= 0xe880e000 goto pc+1
  26: (c5) if r0 s< 0xf4041810 goto pc+0
  27: (d5) if r1 s<= 0x1e007e goto pc+0
  28: (b5) if r0 <= 0xe86be000 goto pc+0
  29: (07) r0 += 16614
  30: (c5) if r0 s< 0x6d0020da goto pc+0
  31: (35) if r0 >= 0x2100ecf4 goto pc+0

Teach verifier to recognize always taken and always not taken branches.
This analysis is already done for == and != comparison.
Expand it to all other branches.

It also helps real bpf programs to be verified faster:
                       before  after
bpf_lb-DLB_L3.o         2003    1940
bpf_lb-DLB_L4.o         3173    3089
bpf_lb-DUNKNOWN.o       1080    1065
bpf_lxc-DDROP_ALL.o     29584   28052
bpf_lxc-DUNKNOWN.o      36916   35487
bpf_netdev.o            11188   10864
bpf_overlay.o           6679    6643
bpf_lcx_jit.o           39555   38437

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04 17:22:02 +01:00
Alexei Starovoitov
c3494801cd bpf: check pending signals while verifying programs
Malicious user space may try to force the verifier to use as much cpu
time and memory as possible. Hence check for pending signals
while verifying the program.
Note that suspend of sys_bpf(PROG_LOAD) syscall will lead to EAGAIN,
since the kernel has to release the resources used for program verification.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-04 17:22:02 +01:00
Ingo Molnar
4bbfd7467c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

- Replace calls of RCU-bh and RCU-sched update-side functions
  to their vanilla RCU counterparts.  This series is a step
  towards complete removal of the RCU-bh and RCU-sched update-side
  functions.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- Documentation updates, including a number of flavor-consolidation
  updates from Joel Fernandes.

- Miscellaneous fixes.

- Automate generation of the initrd filesystem used for
  rcutorture testing.

- Convert spin_is_locked() assertions to instead use lockdep.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- SRCU updates, especially including a fix from Dennis Krein
  for a bag-on-head-class bug.

- RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04 07:52:30 +01:00
Richard Guy Briggs
9a547c7e57 audit: shorten PATH cap values when zero
Since the vast majority of files (99.993% on a typical system) have no
fcaps, display "0" instead of the full zero-padded 16 hex digits in the
two PATH record cap_f* fields to save netlink bandwidth and disk space.

Simply changing the format to %x won't work since the value is two (or
possibly more in the future) 32-bit hexadecimal values concatenated and
bits in higher order values will be misrepresented.

Passes audit-testsuite and userspace tools already work fine.
Please see the github issue tracker for more details
https://github.com/linux-audit/audit-kernel/issues/101

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-12-03 19:26:10 -05:00
YueHaibing
1e7eacaf1d cpuset: Remove set but not used variable 'cs'
Fixes gcc '-Wunused-but-set-variable' warning:

kernel/cgroup/cpuset.c: In function 'cpuset_cancel_attach':
kernel/cgroup/cpuset.c:2167:17: warning:
 variable 'cs' set but not used [-Wunused-but-set-variable]

It never used since introduction in commit 1f7dd3e5a6 ("cgroup: fix handling
of multi-destination migration from subtree_control enabling")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2018-12-03 08:23:22 -08:00
Ingo Molnar
dfcb245e28 sched: Fix various typos in comments
Go over the scheduler source code and fix common typos
in comments - and a typo in an actual variable name.

No change in functionality intended.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 11:55:42 +01:00
Ingo Molnar
989a4222c1 Linux 4.20-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwEZdIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAlQH/19oax2Za3IPqF4X
 DM3lal5M6zlUVkoYstqzpbR3MqUwgEnMfvoeMDC6mI9N4/+r2LkV7cRR8HzqQCCS
 jDfD69IzRGb52VSeJmbOrkxBWsR1Nn0t4Z3rEeLPxwaOoNpRc8H973MbAQ2FKMpY
 S4Y3jIK1dNiRRxdh52NupVkQF+djAUwkBuVk/rrvRJmTDij4la03cuCDAO+Di9lt
 GHlVvygKw2SJhDR+z3ArwZNmE0ceCcE6+W7zPHzj2KeWuKrZg22kfUD454f2YEIw
 FG0hu9qecgtpYCkLSm2vr4jQzmpsDoyq3ZfwhjGrP4qtvPC3Db3vL3dbQnkzUcJu
 JtwhVCE=
 =O1q1
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc5' into irq/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 11:44:00 +01:00
Ingo Molnar
5f675231e4 Linux 4.20-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwEZdIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAlQH/19oax2Za3IPqF4X
 DM3lal5M6zlUVkoYstqzpbR3MqUwgEnMfvoeMDC6mI9N4/+r2LkV7cRR8HzqQCCS
 jDfD69IzRGb52VSeJmbOrkxBWsR1Nn0t4Z3rEeLPxwaOoNpRc8H973MbAQ2FKMpY
 S4Y3jIK1dNiRRxdh52NupVkQF+djAUwkBuVk/rrvRJmTDij4la03cuCDAO+Di9lt
 GHlVvygKw2SJhDR+z3ArwZNmE0ceCcE6+W7zPHzj2KeWuKrZg22kfUD454f2YEIw
 FG0hu9qecgtpYCkLSm2vr4jQzmpsDoyq3ZfwhjGrP4qtvPC3Db3vL3dbQnkzUcJu
 JtwhVCE=
 =O1q1
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc5' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 11:42:17 +01:00
Ingo Molnar
fca0c11650 perf: Fix typos in comments
Fix two typos in kernel/events/*.

No change in functionality intended.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 11:22:32 +01:00
Martin KaFai Lau
5482e9a93c bpf: Fix memleak in aux->func_info and aux->btf
The aux->func_info and aux->btf are leaked in the error out cases
during bpf_prog_load().  This patch fixes it.

Fixes: ba64e7d852 ("bpf: btf: support proper non-jit func info")
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-02 09:12:58 -08:00
Paul E. McKenney
5ac7cdc298 rcutorture: Don't do busted forward-progress testing
The "busted" rcutorture type is an intentionally broken implementation
of RCU.  Doing forward-progress testing on this implementation is not
particularly meaningful on the one hand and can result in fatal abuse
of the memory allocator on the other.  This commit therefore disables
forward-progress testing of the "busted" rcutorture type.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:42 -08:00
Paul E. McKenney
2e57bf97a6 rcutorture: Use 100ms buckets for forward-progress callback histograms
This commit narrows the scope of each bucket of the forward-progress
callback-invocation histograms from one second to 100 milliseconds, which
aids debugging of forward-progress problems by making shorter-duration
callback-invocation stalls visible.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:41 -08:00
Paul E. McKenney
2667ccce93 rcutorture: Recover from OOM during forward-progress tests
This commit causes the OOM handler to do rcu_barrier() calls and to
free up forward-progress callbacks in order to recover from OOM events.
The current test is terminated, but subsequent forward-progress tests can
proceed.  This allows a long test to result in multiple forward-progress
failures, greatly reducing the required testing time.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:41 -08:00
Paul E. McKenney
73d665b141 rcutorture: Print forward-progress test age upon failure
This commit prints the age of the forward-progress test in jiffies,
in order to allow better interpretation of the callback-invocation
histograms.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:40 -08:00
Paul E. McKenney
c51d7b5e6c rcutorture: Print time since GP end upon forward-progress failure
If rcutorture's forward-progress tests fail while a grace period is not
in progress, it is useful to print the time since the last grace period
ended as a way to detect failure to launch a new grace period.  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:40 -08:00
Paul E. McKenney
1a682754c7 rcutorture: Print histogram of CB invocation at OOM time
One reason why a forward-progress test might fail would be if something
prevented or delayed callback invocation.  This commit therefore adds a
callback-invocation histogram printout when OOM is reported to rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:39 -08:00
Paul E. McKenney
8dd3b54689 rcutorture: Print GP age upon forward-progress failure
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:38 -08:00
Paul E. McKenney
bfcfcffc5f rcu: Print per-CPU callback counts for forward-progress failures
This commit prints out the non-zero per-CPU callback counts when a
forware-progress error (OOM event) occurs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix a pair of uninitialized locals spotted by kbuild test robot. ]
2018-12-01 12:45:38 -08:00
Paul E. McKenney
903ee83d91 rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
The RCU CPU stall warnings print an estimate of the total number of
RCU callbacks queued in the system, but this estimate leaves out
the callbacks queued for nocbs CPUs.  This commit therefore introduces
rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for
both nocbs and normal CPUs, and uses this new function as needed.

This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function
that returns the number of callbacks for nocbs CPUs or zero otherwise,
and also uses this function in place of direct access to ->nocb_q_count
while in the area (fewer characters, you see).

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:37 -08:00
Paul E. McKenney
e0aff97355 rcutorture: Dump grace-period diagnostics upon forward-progress OOM
This commit adds an OOM notifier during rcutorture forward-progress
testing.  If this notifier is invoked, it dumps out some grace-period
state to help debug the forward-progress problem.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:36 -08:00
Paul E. McKenney
61670adcb4 rcutorture: Prepare for asynchronous access to rcu_fwd_startat
Because rcutorture's forward-progress checking will trigger from an
OOM notifier, this notifier will introduce asynchronous concurrent
access to the rcu_fwd_startat variable.  This commit therefore prepares
for this by converting updates to WRITE_ONCE().

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:35 -08:00
Pierce Griffiths
2a7d968816 torture: Remove unnecessary "ret" variables
Remove return variables (declared as "ret") in cases where,
depending on whether a condition evaluates as true, the result of a
function call can be immediately returned instead of storing the result in
the return variable. When the condition evaluates as false, the constant
initially stored in the return variable at declaration is returned instead.

Signed-off-by: Pierce Griffiths <pierceagriffiths@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:45:35 -08:00
Paul E. McKenney
5ab7ab8362 rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
This commit affinities the forward-progress tests to avoid hogging a
housekeeping CPU on the theory that the offloaded callbacks will be
running on those housekeeping CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix NULL-pointer issue located by kbuild test robot. ]
Tested-by: Rong Chen <rong.a.chen@intel.com>
2018-12-01 12:45:34 -08:00
Paul E. McKenney
6b3de7a172 rcutorture: Break up too-long rcu_torture_fwd_prog() function
This commit splits rcu_torture_fwd_prog_nr() and rcu_torture_fwd_prog_cr()
functions out of rcu_torture_fwd_prog() in order to reduce indentation
pain and because rcu_torture_fwd_prog() was getting a bit too long.
In addition, this will enable easier conditional execution of the
rcu_torture_fwd_prog_cr() function, which can give false-positive
failures in some NO_HZ_FULL configurations due to overloading the
housekeeping CPUs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:34 -08:00
Paul E. McKenney
fc6f9c5778 rcutorture: Remove cbflood facility
Now that the forward-progress code does a full-bore continuous callback
flood lasting multiple seconds, there is little point in also posting a
mere 60,000 callbacks every second or so.  This commit therefore removes
the old cbflood testing.  Over time, it may be desirable to concurrently
do full-bore continuous callback floods on all CPUs simultaneously, but
one dragon at a time.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:33 -08:00
Paul E. McKenney
28cf5952f5 torture: Bring any extra CPUs online during kernel startup
Currently, the torture scripts rely on the initrd/init script to bring
any extra CPUs online, for example, in the case where the kernel and
qemu have different ideas about how many CPUs are present.  This works,
but is an unnecessary dependency on initrd, which needs to vary depending
on the distro.  This commit therefore causes torture_onoff() to check
for additional CPUs, attempting to bring any found online. Errors are
ignored, just as they are by the initrd/init script.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:32 -08:00
Paul E. McKenney
4871848531 rcutorture: Add call_rcu() flooding forward-progress tests
This commit adds a call_rcu() flooding loop to the forward-progress test.
This emulates tight userspace loops that force call_rcu() invocations,
for example, the infamous loop containing close(open()) that instigated
the addition of blimit.  If RCU does not make sufficient forward progress
in invoking the resulting flood of callbacks, rcutorture emits a warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-12-01 12:45:32 -08:00
Paul E. McKenney
eaaf055f27 Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD
bug.2018.11.12a:  Get rid of BUG_ON() and friends
consolidate.2018.12.01a:  Continued RCU flavor-consolidation cleanup
doc.2018.11.12a:  Documentation updates
fixes.2018.11.12a:  Miscellaneous fixes
initrd.2018.11.08b:  Automate creation of rcutorture initrd
sil.2018.11.12a:  Remove more spin_unlock_wait() calls
2018-12-01 12:43:16 -08:00
Paul E. McKenney
6932689e41 livepatch: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change,
even though it is but a comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-12-01 12:38:50 -08:00
Paul E. McKenney
2af3024cd7 cgroups: Replace synchronize_sched() with synchronize_rcu()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can be
replaced by synchronize_rcu().  This commit therefore makes this change,
even though it is but a comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-12-01 12:38:49 -08:00
Linus Torvalds
4b78317679 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull STIBP fallout fixes from Thomas Gleixner:
 "The performance destruction department finally got it's act together
  and came up with a cure for the STIPB regression:

   - Provide a command line option to control the spectre v2 user space
     mitigations. Default is either seccomp or prctl (if seccomp is
     disabled in Kconfig). prctl allows mitigation opt-in, seccomp
     enables the migitation for sandboxed processes.

   - Rework the code to handle the conditional STIBP/IBPB control and
     remove the now unused ptrace_may_access_sched() optimization
     attempt

   - Disable STIBP automatically when SMT is disabled

   - Optimize the switch_to() logic to avoid MSR writes and invocations
     of __switch_to_xtra().

   - Make the asynchronous speculation TIF updates synchronous to
     prevent stale mitigation state.

  As a general cleanup this also makes retpoline directly depend on
  compiler support and removes the 'minimal retpoline' option which just
  pretended to provide some form of security while providing none"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/speculation: Provide IBPB always command line options
  x86/speculation: Add seccomp Spectre v2 user space protection mode
  x86/speculation: Enable prctl mode for spectre_v2_user
  x86/speculation: Add prctl() control for indirect branch speculation
  x86/speculation: Prepare arch_smt_update() for PRCTL mode
  x86/speculation: Prevent stale SPEC_CTRL msr content
  x86/speculation: Split out TIF update
  ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
  x86/speculation: Prepare for conditional IBPB in switch_mm()
  x86/speculation: Avoid __switch_to_xtra() calls
  x86/process: Consolidate and simplify switch_to_xtra() code
  x86/speculation: Prepare for per task indirect branch speculation control
  x86/speculation: Add command line control for indirect branch speculation
  x86/speculation: Unify conditional spectre v2 print functions
  x86/speculataion: Mark command line parser data __initdata
  x86/speculation: Mark string arrays const correctly
  x86/speculation: Reorder the spec_v2 code
  x86/l1tf: Show actual SMT state
  x86/speculation: Rework SMT state change
  sched/smt: Expose sched_smt_present static key
  ...
2018-12-01 12:35:48 -08:00
Christoph Hellwig
e440e26a02 dma-remap: support DMA_ATTR_NO_KERNEL_MAPPING
Do not waste vmalloc space on allocations that do not require a mapping
into the kernel address space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01 18:07:14 +01:00
Christoph Hellwig
bfd56cd605 dma-mapping: support highmem in the generic remap allocator
By using __dma_direct_alloc_pages we can deal entirely with struct page
instead of having to derive a kernel virtual address.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01 18:07:14 +01:00
Christoph Hellwig
0c3b3171ce dma-mapping: move the arm64 noncoherent alloc/free support to common code
The arm64 codebase to implement coherent dma allocation for architectures
with non-coherent DMA is a good start for a generic implementation, given
that is uses the generic remap helpers, provides the atomic pool for
allocations that can't sleep and still is realtively simple and well
tested.  Move it to kernel/dma and allow architectures to opt into it
using a config symbol.  Architectures just need to provide a new
arch_dma_prep_coherent helper to writeback an invalidate the caches
for any memory that gets remapped for uncached access.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01 18:07:11 +01:00
Christoph Hellwig
f0edfea8ef dma-mapping: move the remap helpers to a separate file
The dma remap code only makes sense for not cache coherent architectures
(or possibly the corner case of highmem CMA allocations) and currently
is only used by arm, arm64, csky and xtensa.  Split it out into a
separate file with a separate Kconfig symbol, which gets the right
copyright notice given that this code was written by Laura Abbott
working for Code Aurora at that point.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01 17:58:34 +01:00
Christoph Hellwig
704f2c20ea dma-direct: reject highmem pages from dma_alloc_from_contiguous
dma_alloc_from_contiguous can return highmem pages depending on the
setup, which a plain non-remapping DMA allocator can't handle.  Detect
this case and fail the allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01 17:56:08 +01:00
Christoph Hellwig
b18814e767 dma-direct: provide page based alloc/free helpers
Some architectures support remapping highmem into DMA coherent
allocations.  To use the common code for them we need variants of
dma_direct_{alloc,free}_pages that do not use kernel virtual addresses.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-12-01 17:55:02 +01:00
David Miller
e9ee9efc0d bpf: Add BPF_F_ANY_ALIGNMENT.
Often we want to write tests cases that check things like bad context
offset accesses.  And one way to do this is to use an odd offset on,
for example, a 32-bit load.

This unfortunately triggers the alignment checks first on platforms
that do not set CONFIG_EFFICIENT_UNALIGNED_ACCESS.  So the test
case see the alignment failure rather than what it was testing for.

It is often not completely possible to respect the original intention
of the test, or even test the same exact thing, while solving the
alignment issue.

Another option could have been to check the alignment after the
context and other validations are performed by the verifier, but
that is a non-trivial change to the verifier.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-30 21:38:48 -08:00
Linus Torvalds
d8f190ee83 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "31 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  userfaultfd: shmem: add i_size checks
  userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
  userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
  ...
2018-11-30 18:45:49 -08:00
Linus Torvalds
1f817429b2 stackleak plugin fix
- Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlwBcL4WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjEiD/9WErdSde1/jka9VX9SPdMFush9
 MCb1y0TpS+xlhF7Jyhxz9AJIRVwY0hXSO40RinYPJ4zgxXg+PjVJU/q2HhtGC4u1
 ZdUAblkw1WLEmYjKLuwyLcZmCAlodBpm+hMlYubAD2mfArtX4vLElcMd4livTz7x
 E4lJ/5WKVQzfadJ/onpltPxgQDVrov7u+uatrL7uFfb2geiCNV4lgeBqoJSx2QJd
 i8B7a28uifLS+ph5TJDx39nw7GJ4Iy4nnJofdX23kgKzfei5wef1mwdoaC6x93QL
 rrmhp3T4Mghnxi+txx+u8Bw7VxMiUnmlAoHQQcBQ0oLDMlTY+CQW3o32kvk+oHip
 5onLfQpHgn4kjd+Ns/qFFxr2oHYU0ODbEQ4taVXBu/f0jj0blQdEgLPWVwNpWS7l
 DoFezQ6qNjy1UJlbtg8t7dq1vUF2DTkaRj9JMPCqCBDX+zJrUZpOlCBoB+l4lC6e
 wJSh7m6QCNVh0fmIzctMenkio37rf2rEIp+l0/BMxFz0KsvlEBthb1x5dAlY6RK9
 Wq52oOLzduWTw3683lBnRYBFbsrDo9Y6LmY5wwkFFqx3TGZtcnomUtIqep8AE7YJ
 SH5QtrUUZKPLbPfHTr/k2G75daaJu8TIZ8cgmRmNe+o/LzH2u3IEnfYVteCxeeyj
 Bekcq9lWU5AVm4jNuQ==
 =OU+C
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull stackleak plugin fix from Kees Cook:
 "Fix crash by not allowing kprobing of stackleak_erase() (Alexander
  Popov)"

* tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stackleak: Disable function tracing and kprobes for stackleak_erase()
2018-11-30 18:36:30 -08:00