bianbu-linux-6.6/kernel
Petr Pavlu af3274905b ring-buffer: Fix a race between readers and resize checks
commit c2274b908db05529980ec056359fae916939fdaa upstream.

The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.

The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:

[  190.271762] ------------[ cut here ]------------
[  190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[  190.271789] Modules linked in: [...]
[  190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[  190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G            E      6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[  190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[  190.272023] Code: [...]
[  190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[  190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[  190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[  190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[  190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[  190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[  190.272053] FS:  00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[  190.272057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[  190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.272077] Call Trace:
[  190.272098]  <TASK>
[  190.272189]  ring_buffer_resize+0x2ab/0x460
[  190.272199]  __tracing_resize_ring_buffer.part.0+0x23/0xa0
[  190.272206]  tracing_resize_ring_buffer+0x65/0x90
[  190.272216]  tracing_entries_write+0x74/0xc0
[  190.272225]  vfs_write+0xf5/0x420
[  190.272248]  ksys_write+0x67/0xe0
[  190.272256]  do_syscall_64+0x82/0x170
[  190.272363]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  190.272373] RIP: 0033:0x7f1bd657d263
[  190.272381] Code: [...]
[  190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[  190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[  190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[  190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[  190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[  190.272412]  </TASK>
[  190.272414] ---[ end trace 0000000000000000 ]---

Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.

The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():

 ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
 if (!ret)
 	goto spin;
 for (unsigned i = 0; i < 1U << 26; i++)  /* inserted delay loop */
 	__asm__ __volatile__ ("" : : : "memory");
 rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;

.. and then run the following commands on the target system:

 echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
 while true; do
 	echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 	echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 done &
 while true; do
 	for i in /sys/kernel/tracing/per_cpu/*; do
 		timeout 0.1 cat $i/trace_pipe; sleep 0.2
 	done
 done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.

Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff2 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12 11:11:19 +02:00
..
bpf bpf: Check bloom filter map value size 2024-05-17 12:02:11 +02:00
cgroup cgroup/cpuset: Fix retval in update_cpumask() 2024-04-03 15:28:40 +02:00
configs Kbuild updates for v6.6 2023-09-05 11:01:47 -07:00
debug kdb: Fix a potential buffer overflow in kdb_local() 2024-01-25 15:36:00 -08:00
dma swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y 2024-05-17 12:02:08 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-03 15:28:50 +02:00
events perf: Fix the nr_addr_filters fix 2024-02-05 20:14:33 +00:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:58:53 +01:00
gcov gcov: shut up missing prototype warnings for internal stubs 2023-08-18 10:18:58 -07:00
irq genirq: Initialize resend_node hlist for all interrupt descriptors 2024-01-31 16:19:13 -08:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-06-09 23:29:50 +10:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:59:25 +01:00
locking lockdep: Fix block chain corruption 2023-12-03 07:33:06 +01:00
module modules: wait do_free_init correctly 2024-03-26 18:19:55 -04:00
power PM: s2idle: Make sure CPUs will wakeup directly on resume 2024-04-17 11:19:26 +02:00
printk printk: For @suppress_panic_printk check for other CPU in panic 2024-04-13 13:07:29 +02:00
rcu rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock() 2024-04-13 13:07:34 +02:00
sched sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf() 2024-05-02 16:32:49 +02:00
time timekeeping: Fix cross-timestamp interpolation for non-x86 2024-03-26 18:19:19 -04:00
trace ring-buffer: Fix a race between readers and resize checks 2024-06-12 11:11:19 +02:00
.gitignore
acct.c audit/stable-6.6 PR 20230829 2023-08-30 08:17:35 -07:00
async.c async: Introduce async_schedule_dev_nocall() 2024-01-31 16:18:49 -08:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-05 20:14:14 +00:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 17:19:56 +00:00
auditfilter.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:32:50 +02:00
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
cpu.c cpu: Re-enable CPU mitigations by default for !X86 architectures 2024-05-02 16:32:44 +02:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
crash_core.c mm: turn folio_test_hugetlb into a PageType 2024-05-02 16:32:47 +02:00
crash_dump.c
cred.c cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-20 17:01:51 +01:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-20 17:01:51 +01:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c fork: defer linking file vma until vma is fully initialized 2024-05-02 16:32:42 +02:00
freezer.c
gen_kheaders.sh Revert "kheaders: substituting --sort in archive creation" 2023-05-28 16:20:21 +09:00
groups.c
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kallsyms_internal.h
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec kexec: select CRYPTO from KEXEC_FILE instead of depending on it 2024-01-05 15:19:41 +01:00
Kconfig.locks
Kconfig.preempt
kcov.c kcov: add prototypes for helper functions 2023-06-09 17:44:17 -07:00
kexec.c kernel: kexec: copy user-array safely 2023-11-28 17:19:40 +00:00
kexec_core.c kexec: do syscore_shutdown() in kernel_kexec 2024-01-31 16:18:56 -08:00
kexec_elf.c
kexec_file.c integrity-v6.6 2023-08-30 09:16:56 -07:00
kexec_internal.h
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c kprobes: Fix possible use-after-free issue on kprobe registration 2024-04-17 11:19:34 +02:00
ksyms_common.c kallsyms: make kallsyms_show_value() as generic function 2023-06-08 12:27:20 -07:00
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c kthread: unexport __kthread_should_park() 2023-08-18 10:18:59 -07:00
latencytop.c
Makefile v6.5-rc1-modules-next 2023-06-28 15:51:08 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
padata.c crypto: pcrypt - Fix hungtask for PADATA_RESET 2023-11-28 17:19:42 +00:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:07:29 +02:00
params.c kernel: params: Remove unnecessary ‘0’ values from err 2023-07-10 12:47:01 -07:00
pid.c pidfd: prevent a kernel-doc warning 2023-09-19 13:21:33 -07:00
pid_namespace.c memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
profile.c
ptrace.c ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 17:20:04 +00:00
regset.c
relay.c kernel: relay: remove unnecessary NULL values from relay_open_buf 2023-08-18 10:18:55 -07:00
resource.c kernel/resource: Increment by align value in get_free_mem_region() 2024-01-10 17:16:58 +01:00
resource_kunit.c
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c scftorture: Pause testing after memory-allocation failure 2023-07-14 15:02:57 -07:00
scs.c
seccomp.c seccomp: Add missing kerndoc notations 2023-08-17 12:32:15 -07:00
signal.c signal: print comm and exe name on fatal signals 2023-08-18 10:18:50 -07:00
smp.c smp,csd: Throw an error if a CSD lock is stuck for too long 2023-11-28 17:19:36 +00:00
smpboot.c cpu/hotplug: Remove unused state functions 2023-05-15 13:45:00 +02:00
smpboot.h
softirq.c sched/core: introduce sched_core_idle_cpu() 2023-07-13 15:21:50 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c
sys.c prctl: generalize PR_SET_MDWE support check to be per-arch 2024-04-03 15:28:54 +02:00
sys_ni.c posix-timers: Get rid of [COMPAT_]SYS_NI() uses 2024-01-20 11:51:46 +01:00
sysctl-test.c
sysctl.c v6.5-rc1-sysctl-next 2023-06-28 16:05:21 -07:00
task_work.c task_work: add kerneldoc annotation for 'data' argument 2023-09-19 13:21:32 -07:00
taskstats.c
torture.c rcutorture: Fix stuttering races and other issues 2023-11-28 17:20:08 +00:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c sysctl: Add size to register_sysctl 2023-08-15 15:26:17 -07:00
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c
user-return-notifier.c
user.c kernel/user: Allow user_struct::locked_vm to be usable for iommufd 2022-11-30 20:16:49 -04:00
user_namespace.c userns: fix a struct's kernel-doc notation 2023-02-02 22:50:04 -08:00
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c vhost: Fix worker hangs due to missed wake up calls 2023-06-08 15:43:09 -04:00
watch_queue.c kernel: watch_queue: copy user-array safely 2023-11-28 17:19:40 +00:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 17:19:57 +00:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: add a weak function for an arch to detect if perf can use NMIs 2023-06-09 17:44:21 -07:00
workqueue.c workqueue: Fix selection of wake_cpu in kick_pool() 2024-05-17 12:02:31 +02:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00