- Improve the handling of shared ACPI power resources in the PCI
bus type layer (Mika Westerberg).
- Make the PCI layer take link delays required by the PCIe spec
into account as appropriate and avoid polling devices in D3cold
for PME (Mika Westerberg).
- Fix some corner case issues in ACPI device power management and
in the PCI bus type layer, optimiza and clean up the handling of
runtime-suspended PCI devices during system-wide transitions to
sleep states (Rafael Wysocki).
- Rework hibernation handling in the ACPI core and the PCI bus type
to resume runtime-suspended devices before hibernation (which
allows some functional problems to be avoided) and fix some ACPI
power management issues related to hiberation (Rafael Wysocki).
- Extend the operating performance points (OPP) framework to support
a wider range of devices (Rajendra Nayak, Stehpen Boyd).
- Fix issues related to genpd_virt_devs and issues with platforms
using the set_opp() callback in the OPP framework (Viresh Kumar,
Dmitry Osipenko).
- Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).
- Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).
- Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210,
and armada-37xx cpufreq drivers (David Arcari, Florian Fainelli,
Paweł Chmiel, YueHaibing).
- Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).
- Fix minor issue in the ACPI system sleep support code and export
one function from it (Lenny Szubowicz, Dexuan Cui).
- Clean up assorted pieces of PM code and documentation (Kefeng Wang,
Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).
- Update the pm-graph utility to v5.4 (Todd Brandt).
- Fix and clean up the cpupower utility (Abhishek Goel, Nick Black).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl0jK18SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxgEAP/RbPe71Y9ufKu64L3EtgV6mS9iuLEhux
/Ad9laLNeM1b0oceT3QxGk7xQCacnZcBlcaqXVWI4NRsn4RBZp1cYZngpgJ9DP6E
ONr8hzyzDOMVReba3XJIF8H+WoTKjywMYtFutjdx6dRe2ZJLutqnuZ0JbH1YSSK7
IxOt0mJVALf2M4Zz7F17d+n3yGE/4xAPBVbj/rBRcTEsGYlR/Hoxs7iF6EBau7fy
R5drUH6XSrWk8adc+z7l3BTGqMMYj9deRSfAWB3wpM4YK7Fv7msX/amBoGINkdn6
xP/ZcrHvhKKzE89MS8OUGP4rGVwq+7tu6mktnYL/tpKgutJqqx5LVvrLsGDSWr+W
/aJExN8Eb4Jh98C6vog3XUJoqBxkVGbU8qoCBU3jlFsaznFEWjW9IKhBHs5CIaqz
MXte6AsJ8lvFzxILjvx0m2206wNpRJRXYLX3a/BSBxa4OgOESjIpBTmbPfOwbxwj
8z9hIVlDTTDtnF6BEyDQr1fjPi3Mxl7ibGnoqRrJm36VKBy9VZNNwG/0Y2oSvm6k
Es8CiTWA3A/46dCZxGr18/9Vbfxn1Yvg9QZ1lCE5Fqij+0F2yRApbHZgUro+1rji
6J8OWC5r5JccdKuGHh4RH/asMFhD0cAR/gUsRzS4dz/wz2jVIN1FstdD1aKN5GBy
d0lchx/AKR5H
=aBN3
-----END PGP SIGNATURE-----
Merge tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update PCI and ACPI power management (improved handling of ACPI
power resources and PCIe link delays, fixes related to corner cases,
hibernation handling rework), fix and extend the operating performance
points (OPP) framework, add new cpufreq drivers for Raspberry Pi and
imx8m chips, update some other cpufreq drivers, clean up assorted
pieces of PM code and documentation and update tools.
Specifics:
- Improve the handling of shared ACPI power resources in the PCI bus
type layer (Mika Westerberg).
- Make the PCI layer take link delays required by the PCIe spec into
account as appropriate and avoid polling devices in D3cold for PME
(Mika Westerberg).
- Fix some corner case issues in ACPI device power management and in
the PCI bus type layer, optimiza and clean up the handling of
runtime-suspended PCI devices during system-wide transitions to
sleep states (Rafael Wysocki).
- Rework hibernation handling in the ACPI core and the PCI bus type
to resume runtime-suspended devices before hibernation (which
allows some functional problems to be avoided) and fix some ACPI
power management issues related to hiberation (Rafael Wysocki).
- Extend the operating performance points (OPP) framework to support
a wider range of devices (Rajendra Nayak, Stehpen Boyd).
- Fix issues related to genpd_virt_devs and issues with platforms
using the set_opp() callback in the OPP framework (Viresh Kumar,
Dmitry Osipenko).
- Add new cpufreq driver for Raspberry Pi (Nicolas Saenz Julienne).
- Add new cpufreq driver for imx8m and imx7d chips (Leonard Crestez).
- Fix and clean up the pcc-cpufreq, brcmstb-avs-cpufreq, s5pv210, and
armada-37xx cpufreq drivers (David Arcari, Florian Fainelli, Paweł
Chmiel, YueHaibing).
- Clean up and fix the cpufreq core (Viresh Kumar, Daniel Lezcano).
- Fix minor issue in the ACPI system sleep support code and export
one function from it (Lenny Szubowicz, Dexuan Cui).
- Clean up assorted pieces of PM code and documentation (Kefeng Wang,
Andy Shevchenko, Bart Van Assche, Greg Kroah-Hartman, Fuqian Huang,
Geert Uytterhoeven, Mathieu Malaterre, Rafael Wysocki).
- Update the pm-graph utility to v5.4 (Todd Brandt).
- Fix and clean up the cpupower utility (Abhishek Goel, Nick Black)"
* tag 'pm-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (57 commits)
ACPI: PM: Make acpi_sleep_state_supported() non-static
PM: sleep: Drop dev_pm_skip_next_resume_phases()
ACPI: PM: Unexport acpi_device_get_power()
Documentation: ABI: power: Add missing newline at end of file
ACPI: PM: Drop unused function and function header
ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
ACPI: PM: Simplify and fix PM domain hibernation callbacks
PCI: PM: Simplify bus-level hibernation callbacks
PM: ACPI/PCI: Resume all devices during hibernation
cpufreq: Avoid calling cpufreq_verify_current_freq() from handle_update()
cpufreq: Consolidate cpufreq_update_current_freq() and __cpufreq_get()
kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
cpufreq: Don't skip frequency validation for has_target() drivers
PCI: PM/ACPI: Refresh all stale power state data in pci_pm_complete()
PCI / ACPI: Add _PR0 dependent devices
ACPI / PM: Introduce concept of a _PR0 dependent device
PCI / ACPI: Use cached ACPI device state to get PCI device power state
ACPI: PM: Allow transitions to D0 to occur in special cases
ACPI: PM: Avoid evaluating _PS3 on transitions from D3hot to D3cold
cpufreq: Use has_target() instead of !setpolicy
...
Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.
The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.
Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.
This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...
Pull SMP/hotplug updates from Thomas Gleixner:
"A small set of updates for SMP and CPU hotplug:
- Abort disabling secondary CPUs in the freezer when a wakeup is
pending instead of evaluating it only after all CPUs have been
offlined.
- Remove the shared annotation for the strict per CPU cfd_data in the
smp function call core code.
- Remove the return values of smp_call_function() and on_each_cpu()
as they are unconditionally 0. Fixup the few callers which actually
bothered to check the return value"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove smp_call_function() and on_each_cpu() return values
smp: Do not mark call_function_data as shared
cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending
cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG
and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags'
introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl0eHqcACgkQa9axLQDI
XvFyNA/+L+bnkz8m3ncydlqqfXomQn4eJJVQ8Uksb0knJz+1+3CUxxbO4ry4jXZN
fMkbggYrDPRKpDbsUl0lsRipj7jW9bqan+N37c3SWqCkgb6HqDaHViwxdx6Ec/Uk
gHudozDSPh/8c7hxGcSyt/CFyuW6b+8eYIQU5rtIgz8aVY2BypBvS/7YtYCbIkx0
w4CFleRTK1zXD5mJQhrc6jyDx659sVkrAvdhf6YIymOY8nBTv40vwdNo3beJMYp8
Po/+0Ixu+VkHUNtmYYZQgP/AGH96xiTcRnUqd172JdtRPpCLqnLqwFokXeVIlUKT
KZFMDPzK+756Ayn4z4huEePPAOGlHbJje8JVNnFyreKhVVcCotW7YPY/oJR10bnc
eo7yD+DxABTn+93G2yP436bNVa8qO1UqjOBfInWBtnNFJfANIkZweij/MQ6MjaTA
o7KtviHnZFClefMPoiI7HDzwL8XSmsBDbeQ04s2Wxku1Y2xUHLx4iLmadwLQ1ZPb
lZMTZP3N/T1554MoURVA1afCjAwiqU3bt1xDUGjbBVjLfSPBAn/25IacsG9Li9AF
7Rp1M9VhrfLftjFFkB2HwpbhRASOxaOSx+EI3kzEfCtM2O9I1WHgP3rvCdc3l0HU
tbK0/IggQicNgz7GSZ8xDlWPwwSadXYGLys+xlMZEYd3pDIOiFc=
=0TDT
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
* pm-sleep:
PM: sleep: Drop dev_pm_skip_next_resume_phases()
ACPI: PM: Drop unused function and function header
ACPI: PM: Introduce "poweroff" callbacks for ACPI PM domain and LPSS
ACPI: PM: Simplify and fix PM domain hibernation callbacks
PCI: PM: Simplify bus-level hibernation callbacks
PM: ACPI/PCI: Resume all devices during hibernation
kernel: power: swap: use kzalloc() instead of kmalloc() followed by memset()
PM: sleep: Update struct wakeup_source documentation
drivers: base: power: remove wakeup_sources_stats_dentry variable
PM: suspend: Rename pm_suspend_via_s2idle()
PM: sleep: Show how long dpm_suspend_start() and dpm_suspend_end() take
PM: hibernate: powerpc: Expose pfn_is_nosave() prototype
To increase readability/maintainability, replace hard coded
instructions values by symbolic names.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fix R_PPC64_ENTRY case, the addi reads from r2 not r12]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To increase readability/maintainability, replace hard coded
instructions values by symbolic names.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PPC_HA() PPC_HI() and PPC_LO() macros are nice macros. Move them
from module64.c to ppc-opcode.h in order to use them in other places.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Clean up formatting in new code, drop duplicates in ftrace.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The comment here is wrong, the addi reads from r2 not r12. The code is
correct, 0x38420000 = addi r2,r2,0.
Fixes: a61674bdfc ("powerpc/module: Handle R_PPC64_ENTRY relocations")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On most arches having function flush_dcache_range(), including PPC32,
this function does a writeback and invalidation of the cache bloc.
On PPC64, flush_dcache_range() only does a writeback while
flush_inval_dcache_range() does the invalidation in addition.
In addition it looks like within arch/powerpc/, there are no PPC64
platforms using flush_dcache_range()
This patch drops the existing 64 bits version of flush_dcache_range()
and renames flush_inval_dcache_range() into flush_dcache_range().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The pseries platform uses the PCI_PROBE_DEVTREE method of PCI probing
which reads "assigned-addresses" of every PCI device and initializes
the device resources. However if the property is missing or zero sized,
then there is no fallback of any kind and the PCI resources remain
undiscovered, i.e. pdev->resource[] array remains empty.
This adds a fallback which parses the "reg" property in pretty much same
way except it marks resources as "unset" which later make Linux assign
those resources proper addresses.
This has an effect when:
1. a hypervisor failed to assign any resource for a device;
2. /chosen/linux,pci-probe-only=0 is in the DT so the system may try
assigning a resource.
Neither is likely to happen under PowerVM.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The commit 8617a5c5bc ("powerpc/dma: handle iommu bypass in
dma_iommu_ops") merged direct DMA ops into the IOMMU DMA ops allowing
SWIOTLB as well but only for mapping; the unmapping and bouncing parts
were left unmodified.
This adds missing direct unmapping calls to .unmap_page() and
.unmap_sg().
This adds missing sync callbacks and directs them to the direct DMA
hooks.
Fixes: 8617a5c5bc ("powerpc/dma: handle iommu bypass in dma_iommu_ops")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the dma_get_mask() helper from dma-mapping.h instead, as they are
functionally identical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable'
This was caused by commit c1fe190c06 ("powerpc: Add force enable of
DAWR on P9 option").
This moves a bunch of code around to fix this. It moves a lot of the
DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable
compiling it.
Fixes: c1fe190c06 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Minor formatting in set_dawr()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit c1fe190c06 ("powerpc: Add force enable of DAWR on P9
option") the following piece of code was added:
smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
Since GCC 8 this triggers the following warning about incompatible
function types:
arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type]
Since the warning is there for a reason, and should not be hidden behind
a cast, provide an intermediate callback function to avoid the warning.
Fixes: c1fe190c06 ("powerpc: Add force enable of DAWR on P9 option")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This makes it clear to the caller that it can only be used on POWER9
and later CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use "ISA_3_0" rather than "ARCH_300"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Branch to the relocated 0xc000 address early (still in real mode), to
simplify subsequent branches. Have the virt mode handler avoid just
'windup' and redo the exception from scratch, rather than branching
back to the trampoline.
Rearrange the stack setup instruction location to match the system
reset handler (e.g., right before EXCEPTION_PROLOG_COMMON).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Follow convention and move tramp ahead of common.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The idle wake up code in the system reset interrupt is not very
optimal. There are two requirements: perform idle wake up quickly;
and save everything including CFAR for non-idle interrupts, with
no performance requirement.
The problem with placing the idle test in the middle of the handler
and using the normal handler code to save CFAR, is that it's quite
costly (e.g., mfcfar is serialising, speculative workarounds get
applied, SRR1 has to be reloaded, etc). It also prevents the standard
interrupt handler boilerplate being used.
This pain can be avoided by using a dedicated idle interrupt handler
at the start of the interrupt handler, which restores all registers
back to the way they were in case it was not an idle wake up. CFAR
is preserved without saving it before the non-idle case by making that
the fall-through, and idle is a taken branch.
Performance seems to be in the noise, but possibly around 0.5% faster,
the executed instructions certainly look better. The bigger benefit is
being able to drop in standard interrupt handlers after the idle code,
which helps with subsequent cleanup and consolidation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup BE by using DOTSYM for idle_return_gpr_loss call]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The bad stack test in interrupt handlers has a few problems. For
performance it is taken in the common case, which is a fetch bubble
and a waste of i-cache.
For code development and maintainence, it requires yet another stack
frame setup routine, and that constrains all exception handlers to
follow the same register save pattern which inhibits future
optimisation.
Remove the test/branch and replace it with a trap. Teach the program
check handler to use the emergency stack for this case.
This does not result in quite so nice a message, however the SRR0 and
SRR1 of the crashed interrupt can be seen in r11 and r12, as is the
original r1 (adjusted by INT_FRAME_SIZE). These are the most important
parts to debugging the issue.
The original r9-12 and cr0 is lost, which is the main downside.
kernel BUG at linux/arch/powerpc/kernel/exceptions-64s.S:847!
Oops: Exception in kernel mode, sig: 5 [#1]
BE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
NIP: c000000000009108 LR: c000000000cadbcc CTR: c0000000000090f0
REGS: c0000000fffcbd70 TRAP: 0700 Not tainted
MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI> CR: 28222448 XER: 20040000
CFAR: c000000000009100 IRQMASK: 0
GPR00: 000000000000003d fffffffffffffd00 c0000000018cfb00 c0000000f02b3166
GPR04: fffffffffffffffd 0000000000000007 fffffffffffffffb 0000000000000030
GPR08: 0000000000000037 0000000028222448 0000000000000000 c000000000ca8de0
GPR12: 9000000002009032 c000000001ae0000 c000000000010a00 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: c0000000f00322c0 c000000000f85200 0000000000000004 ffffffffffffffff
GPR24: fffffffffffffffe 0000000000000000 0000000000000000 000000000000000a
GPR28: 0000000000000000 0000000000000000 c0000000f02b391c c0000000f02b3167
NIP [c000000000009108] decrementer_common+0x18/0x160
LR [c000000000cadbcc] .vsnprintf+0x3ec/0x4f0
Call Trace:
Instruction dump:
996d098a 994d098b 38610070 480246ed 48005518 60000000 38200000 718a4000
7c2a0b78 3821fd00 41c20008 e82d0970 <0981fd00> f92101a0 f9610170 f9810178
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since the system reset interrupt began to use its own stack, and
machine check interrupts have done so for some time, r1 can be
changed without clearing MSR[RI], provided no other interrupts
(including SLB misses) are taken.
MSR[RI] does have to be cleared when using SCRATCH0, however.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Although the 0x1500 interrupt only applies to bare metal, it is better
to just use the standard macro for scratch save.
Runtime code path remains unchanged (due to instruction patching).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Restore all SPRs and CR up-front, these are longer latency
instructions. Move register restore around to maximise pairs of
adjacent loads (e.g., restore r0 next to r1).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Duplicate the hmi windup code for both cases, rather than to put a
special case branch in the middle of it. Remove unused label. This
helps with later code consolidation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move in_mce decrement earlier before registers are restored (but
still after RI=0). This helps with later consolidation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All supported 64s CPUs support mtmsrd L=1 instruction, so a cleanup
can be made in sreset and mce handlers.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move SPR reads ahead of writes. Real mode entry that is not a KVM
guest is rare these days, but bad practice propagates.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
syscall / hcall entry unnecessarily differs between KVM and non-KVM
builds. Move the SMT priority instruction to the same location
(after INTERRUPT_TO_KERNEL).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
No generated code change. Final vmlinux is changed only due to change
in bug table line numbers.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Generally, macros that result in instructions being expanded are
indented by a tab, and those that don't have no indent. Fix the
obvious cases that go contrary to style.
No generated code change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After the previous cleanup, it becomes possible to consolidate some
common code outside the runtime alternate patching. Also remove
unused labels.
This results in some code change, but unchanged runtime instruction
sequence.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Many of these macros just specify 1-4 lines which are only called a
few times each at most, and often just once. Remove this indirection.
No generated code change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
More cases of code insertion via macros that does not add a great
deal. All the additions have to be specified in the macro arguments,
so they can just as well go after the macro.
No generated code change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The aim is to reduce the amount of indirection it takes to get through
the exception handler macros, particularly where it provides little
code sharing.
No generated code change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Conditionally expand the skip case if it is specified.
No generated code change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>