bianbu-linux-6.6/kernel
Johannes Weiner 1f997b1d13 sched: psi: fix bogus pressure spikes from aggregation race
[ Upstream commit 3840cbe24cf060ea05a585ca497814609f5d47d1 ]

Brandon reports sporadic, non-sensical spikes in cumulative pressure
time (total=) when reading cpu.pressure at a high rate. This is due to
a race condition between reader aggregation and tasks changing states.

While it affects all states and all resources captured by PSI, in
practice it most likely triggers with CPU pressure, since scheduling
events are so frequent compared to other resource events.

The race context is the live snooping of ongoing stalls during a
pressure read. The read aggregates per-cpu records for stalls that
have concluded, but will also incorporate ad-hoc the duration of any
active state that hasn't been recorded yet. This is important to get
timely measurements of ongoing stalls. Those ad-hoc samples are
calculated on-the-fly up to the current time on that CPU; since the
stall hasn't concluded, it's expected that this is the minimum amount
of stall time that will enter the per-cpu records once it does.

The problem is that the path that concludes the state uses a CPU clock
read that is not synchronized against aggregators; the clock is read
outside of the seqlock protection. This allows aggregators to race and
snoop a stall with a longer duration than will actually be recorded.

With the recorded stall time being less than the last snapshot
remembered by the aggregator, a subsequent sample will underflow and
observe a bogus delta value, resulting in an erratic jump in pressure.

Fix this by moving the clock read of the state change into the seqlock
protection. This ensures no aggregation can snoop live stalls past the
time that's recorded when the state concludes.

Reported-by: Brandon Duffany <brandon@buildbuddy.io>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219194
Link: https://lore.kernel.org/lkml/20240827121851.GB438928@cmpxchg.org/
Fixes: df77430639 ("psi: Reduce calls to sched_clock() in psi")
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10 11:58:03 +02:00
..
bpf bpf: Make the pointer returned by iter next method valid 2024-10-10 11:57:39 +02:00
cgroup cgroup: Protect css->cgroup write under css_set_lock 2024-09-12 11:11:35 +02:00
configs Kbuild updates for v6.6 2023-09-05 11:01:47 -07:00
debug kdb: Use the passed prompt in kdb_position_cursor() 2024-08-03 08:54:34 +02:00
dma dma-mapping: benchmark: Don't starve others when doing the test 2024-09-12 11:11:37 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-03 15:28:50 +02:00
events uprobes: fix kernel info leak via "[uprobes]" vma 2024-10-10 11:58:02 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:58:53 +01:00
gcov gcov: add support for GCC 14 2024-06-27 13:49:13 +02:00
irq genirq/cpuhotplug: Retry with cpu_online_mask when migration fails 2024-08-19 06:04:24 +02:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-06-09 23:29:50 +10:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:59:25 +01:00
locking lockdep: fix deadlock issue between lockdep and rcu 2024-10-04 16:30:02 +02:00
module module: Fix KCOV-ignored file name 2024-10-04 16:30:03 +02:00
power PM: s2idle: Make sure CPUs will wakeup directly on resume 2024-04-17 11:19:26 +02:00
printk printk: For @suppress_panic_printk check for other CPU in panic 2024-04-13 13:07:29 +02:00
rcu rcuscale: Provide clear error when async specified without primitives 2024-10-10 11:57:31 +02:00
sched sched: psi: fix bogus pressure spikes from aggregation race 2024-10-10 11:58:03 +02:00
time hrtimer: Prevent queuing of hrtimer without a function callback 2024-08-29 17:33:41 +02:00
trace tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline 2024-10-10 11:57:59 +02:00
.gitignore
acct.c audit/stable-6.6 PR 20230829 2023-08-30 08:17:35 -07:00
async.c async: Introduce async_schedule_dev_nocall() 2024-01-31 16:18:49 -08:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-05 20:14:14 +00:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 17:19:56 +00:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-11 12:49:18 +02:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:32:50 +02:00
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
cpu.c cpu/SMT: Enable SMT only if a core is online 2024-08-29 17:33:30 +02:00
cpu_pm.c
crash_core.c mm: turn folio_test_hugetlb into a PageType 2024-05-02 16:32:47 +02:00
crash_dump.c
cred.c cred: get rid of CONFIG_DEBUG_CREDENTIALS 2023-12-20 17:01:51 +01:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c mm: optimize the redundant loop of mm_update_owner_next() 2024-07-11 12:49:15 +02:00
extable.c
fail_function.c
fork.c close_range(): fix the logics in descriptor table trimming 2024-10-10 11:58:00 +02:00
freezer.c
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-06-21 14:38:40 +02:00
groups.c
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-10-10 11:57:13 +02:00
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kallsyms_internal.h
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec kexec: select CRYPTO from KEXEC_FILE instead of depending on it 2024-01-05 15:19:41 +01:00
Kconfig.locks
Kconfig.preempt
kcov.c kcov: properly check for softirq context 2024-08-14 13:58:57 +02:00
kexec.c kernel: kexec: copy user-array safely 2023-11-28 17:19:40 +00:00
kexec_core.c kexec: do syscore_shutdown() in kernel_kexec 2024-01-31 16:18:56 -08:00
kexec_elf.c
kexec_file.c kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y 2024-09-12 11:11:27 +02:00
kexec_internal.h
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c kprobes: Fix to check symbol prefixes correctly 2024-08-14 13:58:51 +02:00
ksyms_common.c kallsyms: make kallsyms_show_value() as generic function 2023-06-08 12:27:20 -07:00
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c kthread: fix task state in kthread worker if being frozen 2024-10-04 16:29:20 +02:00
latencytop.c
Makefile kernel/numa.c: Move logging out of numa.h 2024-06-12 11:11:50 +02:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
numa.c kernel/numa.c: Move logging out of numa.h 2024-06-12 11:11:50 +02:00
padata.c padata: use integer wrap around to prevent deadlock on seq_nr overflow 2024-10-04 16:29:56 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:07:29 +02:00
params.c kernel: params: Remove unnecessary ‘0’ values from err 2023-07-10 12:47:01 -07:00
pid.c pidfd: prevent a kernel-doc warning 2023-09-19 13:21:33 -07:00
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-06-21 14:38:50 +02:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
profile.c profiling: remove profile=sleep support 2024-08-14 13:58:47 +02:00
ptrace.c ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 17:20:04 +00:00
regset.c
relay.c kernel: relay: remove unnecessary NULL values from relay_open_buf 2023-08-18 10:18:55 -07:00
resource.c resource: fix region_intersects() vs add_memory_driver_managed() 2024-10-10 11:57:50 +02:00
resource_kunit.c
rseq.c
scftorture.c scftorture: Pause testing after memory-allocation failure 2023-07-14 15:02:57 -07:00
scs.c
seccomp.c seccomp: Add missing kerndoc notations 2023-08-17 12:32:15 -07:00
signal.c kernel: rerun task_work while freezing in get_signal() 2024-08-03 08:54:13 +02:00
smp.c smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() 2024-09-12 11:11:37 +02:00
smpboot.c kthread: add kthread_stop_put 2024-06-12 11:12:52 +02:00
smpboot.h
softirq.c softirq: Fix suspicious RCU usage in __do_softirq() 2024-06-12 11:11:27 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c
static_call.c
static_call_inline.c static_call: Replace pointless WARN_ON() in static_call_module_notify() 2024-10-10 11:57:13 +02:00
stop_machine.c
sys.c prctl: generalize PR_SET_MDWE support check to be per-arch 2024-04-03 15:28:54 +02:00
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:34:04 +02:00
sysctl-test.c
sysctl.c v6.5-rc1-sysctl-next 2023-06-28 16:05:21 -07:00
task_work.c task_work: Introduce task_work_cancel() again 2024-08-03 08:54:16 +02:00
taskstats.c
torture.c rcutorture: Fix stuttering races and other issues 2023-11-28 17:20:08 +00:00
tracepoint.c
tsacct.c
ucount.c sysctl: Add size to register_sysctl 2023-08-15 15:26:17 -07:00
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c vhost_task: Handle SIGKILL by flushing work and exiting 2024-07-11 12:49:10 +02:00
watch_queue.c kernel: watch_queue: copy user-array safely 2023-11-28 17:19:40 +00:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 17:19:57 +00:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-08-03 08:54:29 +02:00
workqueue.c workqueue: Improve scalability of workqueue watchdog touch 2024-09-12 11:11:42 +02:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00