bianbu-linux-6.6/kernel
Eric DeVolder 6f991cc363 crash: move a few code bits to setup support of crash hotplug
Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.

Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr must also be
updated.

The elfcorehdr describes to kdump the CPUs and memory in the system, and
any inaccuracies can result in a vmcore with missing CPU context or memory
regions.

The current solution utilizes udev to initiate an unload-then-reload of
the kdump image (eg.  kernel, initrd, boot_params, purgatory and
elfcorehdr) by the userspace kexec utility.  In the original post I
outlined the significant performance problems related to offloading this
activity to userspace.

This patchset introduces a generic crash handler that registers with the
CPU and memory notifiers.  Upon CPU or memory changes, from either hot
un/plug or off/onlining, this generic handler is invoked and performs
important housekeeping, for example obtaining the appropriate lock, and
then invokes an architecture specific handler to do the appropriate
elfcorehdr update.

Note the description in patch 'crash: change crash_prepare_elf64_headers()
to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
enables further optimizations related to CPU plug/unplug/online/offline
performance of elfcorehdr updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory; thus no involvement with
userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

 - Prevent udev from updating kdump crash kernel on hot un/plug changes.
   Add the following as the first lines to the RHEL udev rule file
   /usr/lib/udev/rules.d/98-kexec.rules:

   # The kernel updates the crash elfcorehdr for CPU and memory changes
   SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
   SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

   With this changeset applied, the two rules evaluate to false for
   CPU and memory change events and thus skip the userspace
   unload-then-reload of kdump.

 - Change to the kexec_file_load for loading the kdump kernel:
   Eg. on RHEL: in /usr/bin/kdumpctl, change to:
    standard_kexec_args="-p -d -s"
   which adds the -s to select kexec_file_load() syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility.  A working changeset to the kexec userspace utility is
posted to the kexec-tools mailing list here:

 http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

To use the kexec-tools patch, apply, build and install kexec-tools, then
change the kdumpctl's standard_kexec_args to replace the -s with
--hotplug.  The removal of -s reverts to the kexec_load syscall and the
addition of --hotplug invokes the changes put forth in the kexec-tools
patch.


This patch (of 8):

The crash hotplug support leans on the work for the kexec_file_load()
syscall.  To also support the kexec_load() syscall, a few bits of code
need to be move outside of CONFIG_KEXEC_FILE.  As such, these bits are
moved out of kexec_file.c and into a common location crash_core.c.

In addition, struct crash_mem and crash_notes were moved to new locales so
that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.

No functionality change intended.

Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Akhil Raj <lf32.dev@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:25:13 -07:00
..
bpf bpf: Repeat check_max_stack_depth for async callbacks 2023-07-18 15:21:09 -07:00
cgroup sched/psi: use kernfs polling functions for PSI trigger polling 2023-07-10 09:52:30 +02:00
configs treewide: drop CONFIG_EMBEDDED 2023-08-21 13:46:25 -07:00
debug kdb: move kdb_send_sig() declaration to a better header file 2023-07-03 09:27:12 +01:00
dma dma-mapping fixes for Linux 6.5 2023-07-09 10:24:22 -07:00
entry ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
events cxl for v6.5 2023-07-01 08:58:41 -07:00
futex - Prevent the leaking of a debug timer in futex_waitv() 2023-01-01 11:15:05 -08:00
gcov gcov: shut up missing prototype warnings for internal stubs 2023-08-18 10:18:58 -07:00
irq irqdomain: Use return value of strreplace() 2023-06-30 11:13:44 +02:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-06-09 23:29:50 +10:00
livepatch livepatch: Make 'klp_stack_entries' static 2023-06-05 13:56:52 +02:00
locking lockdep: fix static memory detection even more 2023-08-21 13:46:24 -07:00
module module: fix init_module_from_file() error handling 2023-07-04 10:17:11 -07:00
power Merge branches 'pm-sleep' and 'pm-qos' 2023-07-14 19:13:21 +02:00
printk seqlock/latch: Provide raw_read_seqcount_latch_retry() 2023-06-05 21:11:03 +02:00
rcu Merge branches 'doc.2023.05.10a', 'fixes.2023.05.11a', 'kvfree.2023.05.10a', 'nocb.2023.05.11a', 'rcu-tasks.2023.05.10a', 'torture.2023.05.15a' and 'rcu-urgent.2023.06.06a' into HEAD 2023-06-07 13:44:06 -07:00
sched sched/psi: use kernfs polling functions for PSI trigger polling 2023-07-10 09:52:30 +02:00
time hardening updates for v6.5-rc1 2023-06-27 21:24:18 -07:00
trace Probe fixes for 6.5-rc3: 2023-07-30 11:27:22 -07:00
.gitignore
acct.c acct: replace all non-returning strlcpy with strscpy 2023-08-18 10:18:51 -07:00
async.c
audit.c audit: use time_after to compare time 2022-08-29 19:47:03 -04:00
audit.h audit: avoid missing-prototype warnings 2023-05-17 11:34:55 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c
audit_watch.c audit_init_parent(): constify path 2022-09-01 17:39:30 -04:00
auditfilter.c
auditsc.c capability: just use a 'u64' instead of a 'u32[2]' array 2023-03-01 10:01:22 -08:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c capability: fix kernel-doc warnings in capability.c 2023-05-22 14:30:52 -04:00
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
cpu.c cpu/hotplug: Fix off by one in cpuhp_bringup_mask() 2023-05-23 18:06:40 +02:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
crash_core.c crash: move a few code bits to setup support of crash hotplug 2023-08-24 16:25:13 -07:00
crash_dump.c
cred.c cred: convert printks to pr_<level> 2023-08-18 10:18:49 -07:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c fork, vhost: Use CLONE_THREAD to fix freezer/ps regression 2023-06-01 17:15:33 -04:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c kernel/fork: stop playing lockless games for exe_file replacement 2023-08-21 13:46:24 -07:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh Revert "kheaders: substituting --sort in archive creation" 2023-05-28 16:20:21 +09:00
groups.c
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms.c kallsyms: strip LTO-only suffixes from promoted global functions 2023-07-12 15:39:34 -07:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2022-11-12 18:47:36 -08:00
kallsyms_selftest.c kallsyms: Delete an unused parameter related to {module_}kallsyms_on_each_symbol() 2023-03-19 13:27:19 -07:00
kallsyms_selftest.h kallsyms: Add self-test facility 2022-11-15 00:42:02 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.kexec remove ARCH_DEFAULT_KEXEC from Kconfig.kexec 2023-08-18 10:18:55 -07:00
Kconfig.locks
Kconfig.preempt
kcov.c kcov: add prototypes for helper functions 2023-06-09 17:44:17 -07:00
kexec.c kexec: introduce sysctl parameters kexec_load_limit_* 2023-02-02 22:50:05 -08:00
kexec_core.c crash: move a few code bits to setup support of crash hotplug 2023-08-24 16:25:13 -07:00
kexec_elf.c
kexec_file.c crash: move a few code bits to setup support of crash hotplug 2023-08-24 16:25:13 -07:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c kprobes: Prohibit probing on CFI preamble symbol 2023-07-29 23:32:26 +09:00
ksyms_common.c kallsyms: make kallsyms_show_value() as generic function 2023-06-08 12:27:20 -07:00
ksysfs.c kernel/ksysfs.c: use sysfs_emit for sysfs show handlers 2023-03-24 17:09:14 +01:00
kthread.c kthread: unexport __kthread_should_park() 2023-08-18 10:18:59 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile v6.5-rc1-modules-next 2023-06-28 15:51:08 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c convert setns(2) to fdget()/fdput() 2023-04-20 22:55:35 -04:00
padata.c padata: use alignment when calculating the number of worker threads 2023-03-14 17:06:44 +08:00
panic.c panic: hide unused global functions 2023-06-09 17:44:15 -07:00
params.c kallsyms: Replace all non-returning strlcpy with strscpy 2023-06-14 12:27:38 -07:00
pid.c pid: use struct_size_t() helper 2023-07-01 08:26:23 -07:00
pid_namespace.c pid: use struct_size_t() helper 2023-07-01 08:26:23 -07:00
pid_sysctl.h kernel: pid_namespace: remove unused set_memfd_noexec_scope() 2023-06-19 16:19:28 -07:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c ptrace: Provide set/get interface for syscall user dispatch 2023-04-16 14:23:07 +02:00
range.c
reboot.c kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode 2022-10-04 15:59:36 +02:00
regset.c
relay.c kernel: relay: remove unnecessary NULL values from relay_open_buf 2023-08-18 10:18:55 -07:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-02-17 14:58:01 -08:00
resource_kunit.c
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c
scs.c scs: add support for dynamic shadow call stacks 2022-11-09 18:06:35 +00:00
seccomp.c seccomp: simplify sysctls with register_sysctl_init() 2023-04-13 11:49:20 -07:00
signal.c signal: print comm and exe name on fatal signals 2023-08-18 10:18:50 -07:00
smp.c trace,smp: Add tracepoints for scheduling remotelly called functions 2023-06-16 22:08:09 +02:00
smpboot.c cpu/hotplug: Remove unused state functions 2023-05-15 13:45:00 +02:00
smpboot.h
softirq.c Revert "softirq: Let ksoftirqd do its job" 2023-05-09 21:50:27 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c
static_call.c
static_call_inline.c static_call: Add call depth tracking support 2022-10-17 16:41:16 +02:00
stop_machine.c
sys.c prctl: move PR_GET_AUXV out of PR_MCE_KILL 2023-07-17 12:53:21 -07:00
sys_ni.c asm-generic updates for 6.5 2023-07-06 10:06:04 -07:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c v6.5-rc1-sysctl-next 2023-06-28 16:05:21 -07:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c torture: Fix hang during kthread shutdown phase 2023-01-05 12:10:35 -08:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c
user-return-notifier.c
user.c kernel/user: Allow user_struct::locked_vm to be usable for iommufd 2022-11-30 20:16:49 -04:00
user_namespace.c userns: fix a struct's kernel-doc notation 2023-02-02 22:50:04 -08:00
usermode_driver.c
utsname.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
vhost_task.c vhost: Fix worker hangs due to missed wake up calls 2023-06-08 15:43:09 -04:00
watch_queue.c watch_queue: prevent dangling pipe pointer 2023-06-06 10:47:04 +02:00
watchdog.c watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check() 2023-08-18 10:19:00 -07:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: add a weak function for an arch to detect if perf can use NMIs 2023-06-09 17:44:21 -07:00
workqueue.c workqueue: Changes for v6.5 2023-06-27 16:32:52 -07:00
workqueue_internal.h workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE 2023-05-17 17:02:08 -10:00