Commit graph

180227 commits

Author SHA1 Message Date
Palmer Dabbelt
6585bd8274
arm64: Use the generic devmem_is_allowed()
I recently copied this into lib/ for use by the RISC-V port.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-11 12:28:35 -08:00
Palmer Dabbelt
914ee96654
arm: Use the generic devmem_is_allowed()
This is exactly the same as the arm64 version, which I recently copied
into lib/ for use by the RISC-V port.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-11 12:28:24 -08:00
Palmer Dabbelt
78ed473c76
RISC-V: Use the new generic devmem_is_allowed()
This allows us to enable STRICT_DEVMEM.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-11 12:28:19 -08:00
Lukas Bulwahn
9a02fd8b19 x86/ia32_signal: Propagate __user annotation properly
Commit

  57d563c829 ("x86: ia32_setup_rt_frame(): consolidate uaccess areas")

dropped a __user annotation in a cast when refactoring __put_user() to
unsafe_put_user().

Hence, since then, sparse warns in arch/x86/ia32/ia32_signal.c:350:9:

  warning: cast removes address space '__user' of expression
  warning: incorrect type in argument 1 (different address spaces)
    expected void const volatile [noderef] __user *ptr
    got unsigned long long [usertype] *

Add the __user annotation to restore the propagation of address spaces.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201207124141.21859-1-lukas.bulwahn@gmail.com
2020-12-11 19:44:31 +01:00
Greg Kroah-Hartman
007e337080 USB-serial updates for 5.11-rc1
Here are the USB-serial updates for 5.11-rc1, including:
 
  - keyspan_pda write-implementation fixes
  - digi_acceleport write-wakeup fix
  - mos7720 parport-restore fix
  - mos7720 parport-tasklet removal
  - cp210x termios-handling cleanups
  - option device-flag fix
  - ftdi_sio GPIO CBUS-configuration improvements
  - removal of in_interrupt() uses
 
 Included are also various clean ups.
 
 All have been in linux-next with no reported issues.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQHbPq+cpGvN/peuzMLxc3C7H1lCAUCX9OIiQAKCRALxc3C7H1l
 CA1OAQCZVi5abi2R/+Rr3V9/iYOA/VJXJ6Mxg8xDbt3GWmp0lQD9G4z0Ws4f1RY1
 ACEcOBNQedpoxXa/o3eb8tlAjObwAwg=
 =udri
 -----END PGP SIGNATURE-----

Merge tag 'usb-serial-5.11-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next

Johan writes:

USB-serial updates for 5.11-rc1

Here are the USB-serial updates for 5.11-rc1, including:

 - keyspan_pda write-implementation fixes
 - digi_acceleport write-wakeup fix
 - mos7720 parport-restore fix
 - mos7720 parport-tasklet removal
 - cp210x termios-handling cleanups
 - option device-flag fix
 - ftdi_sio GPIO CBUS-configuration improvements
 - removal of in_interrupt() uses

Included are also various clean ups.

All have been in linux-next with no reported issues.

* tag 'usb-serial-5.11-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: (30 commits)
  USB: serial: ftdi_sio: log the CBUS GPIO validity
  USB: serial: ftdi_sio: drop GPIO line checking dead code
  USB: serial: ftdi_sio: report the valid GPIO lines to gpiolib
  USB: serial: option: add interface-number sanity check to flag handling
  USB: serial: cp210x: clean up dtr_rts()
  USB: serial: cp210x: refactor flow-control handling
  USB: serial: cp210x: drop flow-control debugging
  USB: serial: cp210x: set terminal settings on open
  USB: serial: cp210x: clean up line-control handling
  USB: serial: cp210x: return early on unchanged termios
  USB: serial: mos7720: defer state restore to a workqueue
  USB: serial: mos7720: fix parallel-port state restore
  USB: serial: remove write wait queue
  USB: serial: digi_acceleport: fix write-wakeup deadlocks
  USB: serial: keyspan_pda: drop redundant usb-serial pointer
  USB: serial: keyspan_pda: use BIT() macro
  USB: serial: keyspan_pda: clean up comments and whitespace
  USB: serial: keyspan_pda: clean up xircom/entrega support
  USB: serial: keyspan_pda: add write-fifo support
  USB: serial: keyspan_pda: increase transmitter threshold
  ...
2020-12-11 16:16:52 +01:00
Giovanni Gherdovich
3149cd5530 x86: Print ratio freq_max/freq_base used in frequency invariance calculations
The value freq_max/freq_base is a fundamental component of frequency
invariance calculations. It may come from a variety of sources such as MSRs
or ACPI data, tracking it down when troubleshooting a system could be
non-trivial. It is worth saving it in the kernel logs.

 # dmesg | grep 'Estimated ratio of average max'
 [   14.024036] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1289

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201112182614.10700-4-ggherdovich@suse.cz
2020-12-11 10:30:23 +01:00
Giovanni Gherdovich
976df7e573 x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
Frequency invariant accounting calculations need the ratio
freq_curr/freq_max, but freq_max is unknown as it depends on dynamic power
allocation between cores: AMD EPYC CPUs implement "Core Performance Boost".
Three candidates are considered to estimate this value:

- maximum non-boost frequency
- maximum boost frequency
- the mid point between the above two

Experimental data on an AMD EPYC Zen2 machine slightly favors the third
option, which is applied with this patch.

The analysis uses the ondemand cpufreq governor as baseline, and compares
it with schedutil in a number of configurations. Using the freq_max value
described above offers a moderate advantage in performance and efficiency:

sugov-max (freq_max=max_boost) performs the worst on tbench: less
throughput and reduced efficiency than the other invariant-schedutil
options (see "Data Overview" below). Consider that tbench is generally a
problematic case as no schedutil version currently is better than ondemand.

sugov-P0 (freq_max=max_P) is the worst on dbench, while the other sugov's
can surpass ondemand with less filesystem latency and slightly increased
efficiency.

1. DATA OVERVIEW
2. DETAILED PERFORMANCE TABLES
3. POWER CONSUMPTION TABLE

1. DATA OVERVIEW
================

sugov-noinv : non-invariant schedutil governor
sugov-max   : invariant schedutil, freq_max=max_boost
sugov-mid   : invariant schedutil, freq_max=midpoint
sugov-P0    : invariant schedutil, freq_max=max_P
perfgov     : performance governor

driver      : acpi_cpufreq
machine     : AMD EPYC 7742 (Zen2, aka "Rome"), dual socket,
              128 cores / 256 threads, SATA SSD storage, 250G of memory,
	      XFS filesystem

Benchmarks are described in the next section.
Tilde (~) means the value is the same as baseline.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
            ondemand  perfgov  sugov-noinv  sugov-max  sugov-mid  sugov-P0  better if
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                        PERFORMANCE RATIOS
tbench        1.00       1.44       0.90       0.87       0.93       0.93      higher
dbench        1.00       0.91       0.95       0.94       0.94       1.06      lower
kernbench     1.00       0.93       ~          ~          ~          0.97      lower
gitsource     1.00       0.66       0.97       0.96       ~          0.95      lower
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                    PERFORMANCE-PER-WATT RATIOS
tbench        1.00       1.16       0.84       0.84       0.88       0.85      higher
dbench        1.00       1.03       1.02       1.02       1.02       0.93      higher
kernbench     1.00       1.05       ~          ~          ~          ~         higher
gitsource     1.00       1.46       1.04       1.04       ~          1.05      higher

2. DETAILED PERFORMANCE TABLES
==============================

Benchmark          : tbench4 (i.e. dbench4 over the network, actually loopback)
Varying parameter  : number of clients
Unit               : MB/sec (higher is better)

                  5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean  1        427.19  +- 0.16% (        )     778.35  +- 0.10% (  82.20%)     346.92  +- 0.14% ( -18.79%)
Hmean  2        853.82  +- 0.09% (        )    1536.23  +- 0.03% (  79.93%)     694.36  +- 0.05% ( -18.68%)
Hmean  4       1657.54  +- 0.12% (        )    2938.18  +- 0.12% (  77.26%)    1362.81  +- 0.11% ( -17.78%)
Hmean  8       3301.87  +- 0.06% (        )    5679.10  +- 0.04% (  72.00%)    2693.35  +- 0.04% ( -18.43%)
Hmean  16      6139.65  +- 0.05% (        )    9498.81  +- 0.04% (  54.71%)    4889.97  +- 0.17% ( -20.35%)
Hmean  32     11170.28  +- 0.09% (        )   17393.25  +- 0.08% (  55.71%)    9104.55  +- 0.09% ( -18.49%)
Hmean  64     19322.97  +- 0.17% (        )   31573.91  +- 0.08% (  63.40%)   18552.52  +- 0.40% (  -3.99%)
Hmean  128    30383.71  +- 0.11% (        )   37416.91  +- 0.15% (  23.15%)   25938.70  +- 0.41% ( -14.63%)
Hmean  256    31143.96  +- 0.41% (        )   30908.76  +- 0.88% (  -0.76%)   29754.32  +- 0.24% (  -4.46%)
Hmean  512    30858.49  +- 0.26% (        )   38524.60  +- 1.19% (  24.84%)   42080.39  +- 0.56% (  36.37%)
Hmean  1024   39187.37  +- 0.19% (        )   36213.86  +- 0.26% (  -7.59%)   39555.98  +- 0.12% (   0.94%)

                            5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hmean  1        352.59  +- 1.03% ( -17.46%)     352.08  +- 0.75% ( -17.58%)     352.31  +- 1.48% ( -17.53%)
Hmean  2        697.32  +- 0.08% ( -18.33%)     700.16  +- 0.20% ( -18.00%)     696.79  +- 0.06% ( -18.39%)
Hmean  4       1369.88  +- 0.04% ( -17.35%)    1369.72  +- 0.07% ( -17.36%)    1365.91  +- 0.05% ( -17.59%)
Hmean  8       2696.79  +- 0.04% ( -18.33%)    2711.06  +- 0.04% ( -17.89%)    2715.10  +- 0.61% ( -17.77%)
Hmean  16      4725.03  +- 0.03% ( -23.04%)    4875.65  +- 0.02% ( -20.59%)    4953.05  +- 0.28% ( -19.33%)
Hmean  32      9231.65  +- 0.10% ( -17.36%)    8704.89  +- 0.27% ( -22.07%)   10562.02  +- 0.36% (  -5.45%)
Hmean  64     15364.27  +- 0.19% ( -20.49%)   17786.64  +- 0.15% (  -7.95%)   19665.40  +- 0.22% (   1.77%)
Hmean  128    42100.58  +- 0.13% (  38.56%)   34946.28  +- 0.13% (  15.02%)   38635.79  +- 0.06% (  27.16%)
Hmean  256    30660.23  +- 1.08% (  -1.55%)   32307.67  +- 0.54% (   3.74%)   31153.27  +- 0.12% (   0.03%)
Hmean  512    24604.32  +- 0.14% ( -20.27%)   40408.50  +- 1.10% (  30.95%)   38800.29  +- 1.23% (  25.74%)
Hmean  1024   35535.47  +- 0.28% (  -9.32%)   41070.38  +- 2.56% (   4.81%)   31308.29  +- 2.52% ( -20.11%)

Benchmark          : dbench (filesystem stressor)
Varying parameter  : number of clients
Unit               : seconds (lower is better)

NOTE-1: This dbench version measures the average latency of a set of filesystem
        operations, as we found the traditional dbench metric (throughput) to be
	misleading.
NOTE-2: Due to high variability, we partition the original dataset and apply
        statistical bootrapping (a resampling method). Accuracy is reported in the
	form of 95% confidence intervals.

                  5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SubAmean  1         98.79  +- 0.92 (        )      83.36  +- 0.82 (  15.62%)      84.82  +- 0.92 (  14.14%)
SubAmean  2        116.00  +- 0.89 (        )     102.12  +- 0.77 (  11.96%)     109.63  +- 0.89 (   5.49%)
SubAmean  4        149.90  +- 1.03 (        )     132.12  +- 0.91 (  11.86%)     143.90  +- 1.15 (   4.00%)
SubAmean  8        182.41  +- 1.13 (        )     159.86  +- 0.93 (  12.36%)     165.82  +- 1.03 (   9.10%)
SubAmean  16       237.83  +- 1.23 (        )     219.46  +- 1.14 (   7.72%)     229.28  +- 1.19 (   3.59%)
SubAmean  32       334.34  +- 1.49 (        )     309.94  +- 1.42 (   7.30%)     321.19  +- 1.36 (   3.93%)
SubAmean  64       576.61  +- 2.16 (        )     540.75  +- 2.00 (   6.22%)     551.27  +- 1.99 (   4.39%)
SubAmean  128     1350.07  +- 4.14 (        )    1205.47  +- 3.20 (  10.71%)    1280.26  +- 3.75 (   5.17%)
SubAmean  256     3444.42  +- 7.97 (        )    3698.00 +- 27.43 (  -7.36%)    3494.14  +- 7.81 (  -1.44%)
SubAmean  2048   39457.89 +- 29.01 (        )   34105.33 +- 41.85 (  13.57%)   39688.52 +- 36.26 (  -0.58%)

                            5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SubAmean  1         85.68  +- 1.04 (  13.27%)      84.16  +- 0.84 (  14.81%)      83.99  +- 0.90 (  14.99%)
SubAmean  2        108.42  +- 0.95 (   6.54%)     109.91  +- 1.39 (   5.24%)     112.06  +- 0.91 (   3.39%)
SubAmean  4        136.90  +- 1.04 (   8.67%)     137.59  +- 0.93 (   8.21%)     136.55  +- 0.95 (   8.91%)
SubAmean  8        163.15  +- 0.96 (  10.56%)     166.07  +- 1.02 (   8.96%)     165.81  +- 0.99 (   9.10%)
SubAmean  16       224.86  +- 1.12 (   5.45%)     223.83  +- 1.06 (   5.89%)     230.66  +- 1.19 (   3.01%)
SubAmean  32       320.51  +- 1.38 (   4.13%)     322.85  +- 1.49 (   3.44%)     321.96  +- 1.46 (   3.70%)
SubAmean  64       553.25  +- 1.93 (   4.05%)     554.19  +- 2.08 (   3.89%)     562.26  +- 2.22 (   2.49%)
SubAmean  128     1264.35  +- 3.72 (   6.35%)    1256.99  +- 3.46 (   6.89%)    2018.97 +- 18.79 ( -49.55%)
SubAmean  256     3466.25  +- 8.25 (  -0.63%)    3450.58  +- 8.44 (  -0.18%)    5032.12 +- 38.74 ( -46.09%)
SubAmean  2048   39133.10 +- 45.71 (   0.82%)   39905.95 +- 34.33 (  -1.14%)   53811.86 +-193.04 ( -36.38%)

Benchmark          : kernbench (kernel compilation)
Varying parameter  : number of jobs
Unit               : seconds (lower is better)

                  5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean  2        471.71 +- 26.61% (        )     409.88 +- 16.99% (  13.11%)     430.63  +- 0.18% (   8.71%)
Amean  4        211.87  +- 0.58% (        )     194.03  +- 0.74% (   8.42%)     215.33  +- 0.64% (  -1.63%)
Amean  8        109.79  +- 1.27% (        )     101.43  +- 1.53% (   7.61%)     111.05  +- 1.95% (  -1.15%)
Amean  16        59.50  +- 1.28% (        )      55.61  +- 1.35% (   6.55%)      59.65  +- 1.78% (  -0.24%)
Amean  32        34.94  +- 1.22% (        )      32.36  +- 1.95% (   7.41%)      35.44  +- 0.63% (  -1.43%)
Amean  64        22.58  +- 0.38% (        )      20.97  +- 1.28% (   7.11%)      22.41  +- 1.73% (   0.74%)
Amean  128       17.72  +- 0.44% (        )      16.68  +- 0.32% (   5.88%)      17.65  +- 0.96% (   0.37%)
Amean  256       16.44  +- 0.53% (        )      15.76  +- 0.32% (   4.18%)      16.76  +- 0.60% (  -1.93%)
Amean  512       16.54  +- 0.21% (        )      15.62  +- 0.41% (   5.53%)      16.84  +- 0.85% (  -1.83%)

                            5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean  2        421.30  +- 0.24% (  10.69%)     419.26  +- 0.15% (  11.12%)     414.38  +- 0.33% (  12.15%)
Amean  4  	217.81  +- 5.53% (  -2.80%)     211.63  +- 0.99% (   0.12%)     208.43  +- 0.47% (   1.63%)
Amean  8  	108.80  +- 0.43% (   0.90%)     108.48  +- 1.44% (   1.19%)     108.59  +- 3.08% (   1.09%)
Amean  16 	 58.84  +- 0.74% (   1.12%)      58.37  +- 0.94% (   1.91%)      57.78  +- 0.78% (   2.90%)
Amean  32 	 34.04  +- 2.00% (   2.59%)      34.28  +- 1.18% (   1.91%)      33.98  +- 2.21% (   2.75%)
Amean  64 	 22.22  +- 1.69% (   1.60%)      22.27  +- 1.60% (   1.38%)      22.25  +- 1.41% (   1.47%)
Amean  128	 17.55  +- 0.24% (   0.97%)      17.53  +- 0.94% (   1.04%)      17.49  +- 0.43% (   1.30%)
Amean  256	 16.51  +- 0.46% (  -0.40%)      16.48  +- 0.48% (  -0.19%)      16.44  +- 1.21% (   0.00%)
Amean  512	 16.50  +- 0.35% (   0.19%)      16.35  +- 0.42% (   1.14%)      16.37  +- 0.33% (   0.99%)

Benchmark          : gitsource (time to run the git unit test suite)
Varying parameter  : none
Unit               : seconds (lower is better)

                  5.9.0-ondemand (BASELINE)                   5.9.0-perfgov               5.9.0-sugov-noinv
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean          1035.76  +- 0.30% (        )     688.21  +- 0.04% (  33.56%)    1003.85  +- 0.14% (   3.08%)

                            5.9.0-sugov-max                 5.9.0-sugov-mid                  5.9.0-sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Amean           995.82  +- 0.08% (   3.86%)    1011.98  +- 0.03% (   2.30%)     986.87  +- 0.19% (   4.72%)

3. POWER CONSUMPTION TABLE
==========================

Average power consumption (watts).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
            ondemand  perfgov  sugov-noinv  sugov-max  sugov-mid  sugov-P0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
tbench4     227.25     281.83     244.17     236.76     241.50     247.99
dbench4     151.97     161.87     157.08     158.10     158.06     153.73
kernbench   162.78     167.22     162.90     164.19     164.65     164.72
gitsource   133.65     139.00     133.04     134.43     134.18     134.32

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201112182614.10700-3-ggherdovich@suse.cz
2020-12-11 10:29:55 +01:00
Nathan Fontenot
41ea667227 x86, sched: Calculate frequency invariance for AMD systems
This is the first pass in creating the ability to calculate the
frequency invariance on AMD systems. This approach uses the CPPC
highest performance and nominal performance values that range from
0 - 255 instead of a high and base frquency. This is because we do
not have the ability on AMD to get a highest frequency value.

On AMD systems the highest performance and nominal performance
vaues do correspond to the highest and base frequencies for the system
so using them should produce an appropriate ratio but some tweaking
is likely necessary.

Due to CPPC being initialized later in boot than when the frequency
invariant calculation is currently made, I had to create a callback
from the CPPC init code to do the calculation after we have CPPC
data.

Special thanks to "kernel test robot <lkp@intel.com>" for reporting that
compilation of drivers/acpi/cppc_acpi.c is conditional to
CONFIG_ACPI_CPPC_LIB, not just CONFIG_ACPI.

[ ggherdovich@suse.cz: made safe under CPU hotplug, edited changelog. ]

Signed-off-by: Nathan Fontenot <nathan.fontenot@amd.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201112182614.10700-2-ggherdovich@suse.cz
2020-12-11 10:26:00 +01:00
Souptick Joarder
3ae9c3cde5
riscv: Fixed kernel test robot warning
Kernel test robot throws below warning -

   arch/riscv/kernel/asm-offsets.c:14:6: warning: no previous prototype
for 'asm_offsets' [-Wmissing-prototypes]
      14 | void asm_offsets(void)
         |      ^~~~~~~~~~~

This patch should fixed it.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-10 17:47:23 -08:00
Kefeng Wang
772e1b7c42
riscv: kernel: Drop unused clean rule
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-10 17:42:34 -08:00
Palmer Dabbelt
ccbbfd1cbf
RISC-V: Define get_cycles64() regardless of M-mode
The timer driver uses get_cycles64() unconditionally to obtain the current
time.  A recent refactoring lost the common definition for some configs, which
is now the only one we need.

Fixes: d5be89a8d1 ("RISC-V: Resurrect the MMIO timer implementation for M-mode systems")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-10 17:39:43 -08:00
Nylon Chen
04091d6c05
riscv: provide memmove implementation
The memmove used by the kernel feature like KASAN.

Signed-off-by: Nick Hu <nickhu@andestech.com>
Signed-off-by: Nick Hu <nick650823@gmail.com>
Signed-off-by: Nylon Chen <nylon7@andestech.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-12-10 17:27:54 -08:00
Linus Torvalds
47003b9971 powerpc fixes for 5.10 #6
One commit to implement copy_from_kernel_nofault_allowed(), otherwise
 copy_from_kernel_nofault() can trigger warnings when accessing bad addresses in
 some configurations.
 
 Thanks to:
   Christophe Leroy, Qian Cai.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/StwITHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgHtgD/9XePu2lenUUZDbzHVKJ/4oozNqJaYc
 mM7k/53GmPyi7AAttLdQGlSB0Gv2xBSDhng7T/UOnnBKwBk7gP8J4espuGraoYkA
 8q1LsAO9wwpN+oDQjFQ+s4uErildwIy73uSXByfhIESHo5VtY9ol7g+zZaTfyNhO
 W/wpSzcHLmTCMoWcJfk5vLCHMDmaY1Qq7U9uNt78bwUaNXz9LVZ/UFWSe4Bt6jEM
 573bgsSkbLoTV5QptDUOPpIBw1T+zahwB6dMjPzbxYa6Rws1I4QeJRNYxdvunDHP
 +F2ZYK/zyFBQlojPnjXJmbqQHEtXA/l9DNyLwR9VqjAOmgZaQezTVMIV56b8ndpM
 X7+AG37Nt6hqUfPz3f7L67y64VFAmAt8dFqVqUzEXBcN1KpVkS5BvBxjTUKwItwo
 Fdf80iSHaHYPdYAJJzjbeGuaaKID3w9H6npTR5xCKmN9o1r+N+VoZtQumlG+t6jl
 EtnPu0r6y/tPcyKixk/myAAx/8mVTQicDyIj2klheDClmNMK7NA0+QpLEBus10tl
 +bhk7KdWx7mQwYRltI+v7T3+mJ2SddVpQ84KmV6q21d/QbH1fQY/SvVNRYKWYb31
 s31KT9lYiW7xZ5qiA6R9YNGynvhrD61Bzr5o2dKJxvbpmFkvosVWN7waPqwozlRC
 l1Xvuc/1kBesiQ==
 =ERDG
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "One commit to implement copy_from_kernel_nofault_allowed(), otherwise
  copy_from_kernel_nofault() can trigger warnings when accessing bad
  addresses in some configurations.

  Thanks to Christophe Leroy and Qian Cai"

* tag 'powerpc-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()
2020-12-10 16:36:30 -08:00
Cédric Le Goater
dddc4ef92d KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
When the XIVE resources are allocated at the HW level, the VP
structures describing the vCPUs of a guest are distributed among
the chips to optimize the PowerBUS usage. For best performance, the
guest vCPUs can be pinned to match the VP structure distribution.

Currently, the VP identifiers are deduced from the vCPU id using
the kvmppc_pack_vcpu_id() routine which is not incorrect but not
optimal either. It VSMT is used, the result is not continuous and
the constraints on HW resources described above can not be met.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-14-clg@kaod.org
2020-12-11 09:53:11 +11:00
Cédric Le Goater
07efbca11c powerpc/xive: Improve error reporting of OPAL calls
Introduce a vp_err() macro to standardize error reporting.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-13-clg@kaod.org
2020-12-11 09:53:11 +11:00
Cédric Le Goater
614546d562 powerpc/xive: Simplify xive_do_source_eoi()
Previous patches removed the need of the first argument which was a
hack for Firwmware EOI. Remove it and flatten the routine which has
became simpler.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-12-clg@kaod.org
2020-12-11 09:53:11 +11:00
Cédric Le Goater
cf58b74666 powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
This flag was used to support the P9 DD1 and we have stopped
supporting this CPU when DD2 came out. See skiboot commit:

  0b0d15e3c1

Also, remove eoi handler which is now unused.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-11-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
b5277d18c6 powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
This flag was used to support the PHB4 LSIs on P9 DD1 and we have
stopped supporting this CPU when DD2 came out. See skiboot commit:

  0b0d15e3c1

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-10-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
4cc0e36df2 powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
This flag was used to support the PHB4 LSIs on P9 DD1 and we have
stopped supporting this CPU when DD2 came out. See skiboot commit:

  0b0d15e3c1

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-9-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
7b3b3de3b0 powerpc: Increase NR_IRQS range to support more KVM guests
PowerNV systems can handle up to 4K guests and 1M interrupt numbers
per chip. Increase the range of allowed interrupts to support a larger
number of guests.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-8-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
a5021abc48 powerpc/xive: Add a debug_show handler to the XIVE irq_domain
Full state of the Linux interrupt descriptors can be dumped under
debugfs when compiled with CONFIG_GENERIC_IRQ_DEBUGFS. Add support for
the XIVE interrupt controller.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-7-clg@kaod.org
2020-12-11 09:53:10 +11:00
Cédric Le Goater
9dfe4b14df powerpc/xive: Add a name to the IRQ domain
We hope one day to handle multiple irq_domain in the XIVE driver.
Start simple by setting the name using the DT node.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-6-clg@kaod.org
2020-12-11 09:53:09 +11:00
Cédric Le Goater
e2cf43d595 powerpc/xive: Introduce XIVE_IPI_HW_IRQ
The XIVE driver deals with CPU IPIs in a peculiar way. Each CPU has
its own XIVE IPI interrupt allocated at the HW level, for PowerNV, or
at the hypervisor level for pSeries. In practice, these interrupts are
not always used. pSeries/PowerVM prefers local doorbells for local
threads since they are faster. On PowerNV, global doorbells are also
preferred for the same reason.

The mapping in the Linux is reduced to a single interrupt using HW
interrupt number 0 and a custom irq_chip to handle EOI. This can cause
performance issues in some benchmark (ipistorm) on multichip systems.

Clarify the use of the 0 value, it will help in improving multichip
support.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-4-clg@kaod.org
2020-12-11 09:34:07 +11:00
Cédric Le Goater
4f1c3f7b08 powerpc/xive: Rename XIVE_IRQ_NO_EOI to show its a flag
This is a simple cleanup to identify easily all flags of the XIVE
interrupt structure. The interrupts flagged with XIVE_IRQ_FLAG_NO_EOI
are the escalations used to wake up vCPUs in KVM. They are handled
very differently from the rest.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-3-clg@kaod.org
2020-12-11 09:34:07 +11:00
Cédric Le Goater
9898367500 KVM: PPC: Book3S HV: XIVE: Show detailed configuration in debug output
This is useful to track allocation of the HW resources on per guest
basis. Making sure IPIs are local to the chip of the vCPUs reduces
rerouting between interrupt controllers and gives better performance
in case of pinning. Checking the distribution of VP structures on the
chips also helps in reducing PowerBUS traffic.

[ clg: resurrected show_sources and reworked ouput ]

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201210171450.1933725-2-clg@kaod.org
2020-12-11 09:34:07 +11:00
Thomas Gleixner
058df195c2 x86/ioapic: Cleanup the timer_works() irqflags mess
Mark tripped over the creative irqflags handling in the IO-APIC timer
delivery check which ends up doing:

        local_irq_save(flags);
	local_irq_enable();
        local_irq_restore(flags);

which triggered a new consistency check he's working on required for
replacing the POPF based restore with a conditional STI.

That code is a historical mess and none of this is needed. Make it
straightforward use local_irq_disable()/enable() as that's all what is
required. It is invoked from interrupt enabled code nowadays.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/87k0tpju47.fsf@nanos.tec.linutronix.de
2020-12-10 23:02:31 +01:00
Thomas Gleixner
190113b4c6 x86/apic/vector: Fix ordering in vector assignment
Prarit reported that depending on the affinity setting the

 ' irq $N: Affinity broken due to vector space exhaustion.'

message is showing up in dmesg, but the vector space on the CPUs in the
affinity mask is definitely not exhausted.

Shung-Hsi provided traces and analysis which pinpoints the problem:

The ordering of trying to assign an interrupt vector in
assign_irq_vector_any_locked() is simply wrong if the interrupt data has a
valid node assigned. It does:

 1) Try the intersection of affinity mask and node mask
 2) Try the node mask
 3) Try the full affinity mask
 4) Try the full online mask

Obviously #2 and #3 are in the wrong order as the requested affinity
mask has to take precedence.

In the observed cases #1 failed because the affinity mask did not contain
CPUs from node 0. That made it allocate a vector from node 0, thereby
breaking affinity and emitting the misleading message.

Revert the order of #2 and #3 so the full affinity mask without the node
intersection is tried before actually affinity is broken.

If no node is assigned then only the full affinity mask and if that fails
the full online mask is tried.

Fixes: d6ffc6ac83 ("x86/vector: Respect affinity mask in irq descriptor")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
2020-12-10 23:00:54 +01:00
Miquel Raynal
e5acf9c862 mtd: nand: ecc-hamming: Move Hamming code to the generic NAND layer
Hamming ECC code might be later re-used by the SPI NAND layer.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20200929230124.31491-12-miquel.raynal@bootlin.com
2020-12-10 22:37:29 +01:00
Gerald Schaefer
343dbdb7cb s390/mm: add support to allocate gigantic hugepages using CMA
Commit cf11e85fc0 ("mm: hugetlb: optionally allocate gigantic hugepages
using cma") added support for allocating gigantic hugepages using CMA,
by specifying the hugetlb_cma= kernel parameter, which will disable any
boot-time allocation of gigantic hugepages.

This patch enables that option also for s390.

Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-10 21:11:01 +01:00
Harald Freudenberger
ff98cc986a s390/crypto: add arch_get_random_long() support
The random longs to be pulled by arch_get_random_long() are
prepared in an 4K buffer which is filled from the NIST 800-90
compliant s390 drbg. By default the random long buffer is refilled
256 times before the drbg itself needs a reseed. The reseed of the
drbg is done with 32 bytes fetched from the high quality (but slow)
trng which is assumed to deliver 100% entropy. So the 32 * 8 = 256
bits of entropy are spread over 256 * 4KB = 1MB serving 131072
arch_get_random_long() invocations before reseeded.

How often the 4K random long buffer is refilled with the drbg
before the drbg is reseeded can be adjusted. There is a module
parameter 's390_arch_rnd_long_drbg_reseed' accessible via
  /sys/module/arch_random/parameters/rndlong_drbg_reseed
or as kernel command line parameter
  arch_random.rndlong_drbg_reseed=<value>
This parameter tells how often the drbg fills the 4K buffer before
it is re-seeded by fresh entropy from the trng.
A value of 16 results in reseeding the drbg at every 16 * 4 KB = 64
KB with 32 bytes of fresh entropy pulled from the trng. So a value
of 16 would result in 256 bits entropy per 64 KB.
A value of 256 results in 1MB of drbg output before a reseed of the
drbg is done. So this would spread the 256 bits of entropy among 1MB.
Setting this parameter to 0 forces the reseed to take place every
time the 4K buffer is depleted, so the entropy rises to 256 bits
entropy per 4K or 0.5 bit entropy per arch_get_random_long().  With
setting this parameter to negative values all this effort is
disabled, arch_get_random long() returns false and thus indicating
that the arch_get_random_long() feature is disabled at all.

arch_get_random_long() is used by random.c among others to provide
an initial hash value to be mixed with the entropy pool on every
random data pull. For about 64 bytes read from /dev/urandom there
is one call to arch_get_random_long(). So these additional random
long values count for performance of /dev/urandom with measurable
but low penalty.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2020-12-10 21:02:08 +01:00
Eric W. Biederman
460b4f812a file: Rename fcheck lookup_fd_rcu
Also remove the confusing comment about checking if a fd exists.  I
could not find one instance in the entire kernel that still matches
the description or the reason for the name fcheck.

The need for better names became apparent in the last round of
discussion of this set of changes[1].

[1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20201120231441.29911-10-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10 12:40:07 -06:00
Xiaochen Shen
06c5fe9b12 x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It tags along the MBM counter overflow
handler to do the updates with 1s interval in mbm_update() and
update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for local
bandwidth updating when mba_sc is not enabled, but calls mbm_bw_count()
instead when mba_sc is enabled. __mon_event_count() will not be called
for local bandwidth updating in MBM counter overflow handler, but it is
still called when reading MBM local bandwidth counter file
'mbm_local_bytes', the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m->chunks is updated by delta chunks which is
calculated from previous MSR value (m->prev_msr) and current MSR value.
When mba_sc is enabled, m->chunks is also updated in mbm_update() by
mistake by the delta chunks which is calculated from m->prev_bw_msr
instead of m->prev_msr. But m->chunks is not used in update_mba_bw() in
the mba_sc feedback loop.

When reading MBM local bandwidth counter file, m->chunks was changed
unexpectedly by mbm_bw_count(). As a result, the incorrect local
bandwidth counter which calculated from incorrect m->chunks is shown to
the user.

Fix this by removing incorrect m->chunks updating in mbm_bw_count() in
MBM counter overflow handler, and always calling __mon_event_count() in
mbm_update() to make sure that the hardware local bandwidth counter
doesn't wrap around.

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat && cd intel-cmt-cat
  && make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" > /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` > /sys/fs/resctrl/c1/tasks

  # Read local bytes counters twice with 1s interval, the calculated
  # local bandwidth is not as expected (approaching to 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d852 (x86/intel_rdt/mba_sc: Prepare for feedback loop)
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
2020-12-10 17:52:37 +01:00
Paolo Bonzini
83bbb8ffb4 kvm/arm64 fixes for 5.10, take #5
- Don't leak page tables on PTE update
 - Correctly invalidate TLBs on table to block transition
 - Only update permissions if the fault level matches the
   expected mapping size
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl/KerUPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpD814P/1tEItvIx2rfWXT7zD/pfOImqGeDLKY7JUEx
 tcOVfKDvFqeT/aiArsp7VlRR5FV8Y97VAzSHpyVyT7zJYyD6NkRUXjD/tzMK0pmb
 jQoo2EKAWokl3lAETKsTh44ZweOM5kYvnM0IQf+lpZq58SMMWwB/62R6vZjJLsTT
 BmPiINYOf+zhAXbjqGmh9buwvzn00hHq6kzz96tWhyBR+ZluVaYByHdbIdcCzB93
 kX13ndVart8H6zrghhJy+ZwXd5WYMJPXAg/eGZSBSrIZDC/EzxAPBAQczYelqj3v
 g3cZ+HQhPnePgmEQ5VYB2gQIWO/kjwgumS0TfzYUBDKTXWwxMdxAwwY6vR8CVUY7
 HVr06Moyx+SVc6oY2iePYx6BDCg4I6sOWGux01usy9izsbUUqOggWGBnEWViSojX
 bn8jriYemkC4hZ6CKgn0K4Y9J9M6LjxBgwxdHmoPGNLBI6B1sdKmA8gCn6W+oLew
 nr/0yquuijWrASXrQjK56nKBP44jX+sSlsNzjKN/cd8UyCu44649239GhEgSXGgj
 EIf4aqhv1HgiF0ceXwQ+jFu8bp+KCR8YNt27cGf/HNz+xAaBFk0M2DUX/N9DrHko
 e1gK8MxOX26vvuCjCQo05rRx9nO4FXUZEMewSGLLxqdj2aPG6miI8/Gq929WeuSX
 zNnWPS2M
 =+7gU
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

kvm/arm64 fixes for 5.10, take #5

- Don't leak page tables on PTE update
- Correctly invalidate TLBs on table to block transition
- Only update permissions if the fault level matches the
  expected mapping size
2020-12-10 11:34:24 -05:00
Arnd Bergmann
13719d8d0d Merge branch 'sparx5-next' of https://github.com/microchip-ung/linux-upstream into arm/dt
* 'sparx5-next' of https://github.com/microchip-ung/linux-upstream:
  arm64: dts: sparx5: Add SGPIO devices
  arm64: dts: sparx5: Add reset support

Link: https://lore.kernel.org/r/87ft4dq2q8.fsf@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-12-10 15:40:09 +01:00
Christian Borntraeger
50a05be484 KVM: s390: track synchronous pfault events in kvm_stat
Right now we do count pfault (pseudo page faults aka async page faults
start and completion events). What we do not count is, if an async page
fault would have been possible by the host, but it was disabled by the
guest (e.g. interrupts off, pfault disabled, secure execution....).  Let
us count those as well in the pfault_sync counter.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20201125090658.38463-1-borntraeger@de.ibm.com
2020-12-10 14:20:26 +01:00
Gautham R. Shenoy
0be47634db powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
On POWER platforms where only some groups of threads within a core
share the L2-cache (indicated by the ibm,thread-groups device-tree
property), we currently print the incorrect shared_cpu_map/list for
L2-cache in the sysfs.

This patch reports the correct shared_cpu_map/list on such platforms.

Example:
On a platform with "ibm,thread-groups" set to
                 00000001 00000002 00000004 00000000
                 00000002 00000004 00000006 00000001
                 00000003 00000005 00000007 00000002
                 00000002 00000004 00000000 00000002
                 00000004 00000006 00000001 00000003
                 00000005 00000007

This indicates that threads {0,2,4,6} in the core share the L2-cache
and threads {1,3,5,7} in the core share the L2 cache.

However, without the patch, the shared_cpu_map/list for L2 for CPUs 0,
1 is reported in the sysfs as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,000000ff

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000ff

With the patch, the shared_cpu_map/list for L2 cache for CPUs 0, 1 is
correctly reported as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

This patch also defines cpu_l2_cache_mask() for !CONFIG_SMP case.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-6-git-send-email-ego@linux.vnet.ibm.com
2020-12-11 00:10:25 +11:00
Gautham R. Shenoy
9538abee18 powerpc/smp: Add support detecting thread-groups sharing L2 cache
On POWER systems, groups of threads within a core sharing the L2-cache
can be indicated by the "ibm,thread-groups" property array with the
identifier "2".

This patch adds support for detecting this, and when present, populate
the populating the cpu_l2_cache_mask of every CPU to the core-siblings
which share L2 with the CPU as specified in the by the
"ibm,thread-groups" property array.

On a platform with the following "ibm,thread-group" configuration
		 00000001 00000002 00000004 00000000
		 00000002 00000004 00000006 00000001
		 00000003 00000005 00000007 00000002
		 00000002 00000004 00000000 00000002
		 00000004 00000006 00000001 00000003
		 00000005 00000007

Without this patch, the sched-domain hierarchy for CPUs 0,1 would be
	CPU0 attaching sched-domain(s):
	domain-0: span=0,2,4,6 level=SMT
	domain-1: span=0-7 level=CACHE
	domain-2: span=0-15,24-39,48-55 level=MC
	domain-3: span=0-55 level=DIE

	CPU1 attaching sched-domain(s):
	domain-0: span=1,3,5,7 level=SMT
	domain-1: span=0-7 level=CACHE
	domain-2: span=0-15,24-39,48-55 level=MC
	domain-3: span=0-55 level=DIE

The CACHE domain at 0-7 is incorrect since the ibm,thread-groups
sub-array
[00000002 00000002 00000004
 00000000 00000002 00000004 00000006
 00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups of threads where each group contains "4"
threads each. The groups being {0,2,4,6} and {1,3,5,7}.

With this patch, the sched-domain hierarchy for CPUs 0,1 would be
     	CPU0 attaching sched-domain(s):
	domain-0: span=0,2,4,6 level=SMT
	domain-1: span=0-15,24-39,48-55 level=MC
	domain-2: span=0-55 level=DIE

	CPU1 attaching sched-domain(s):
	domain-0: span=1,3,5,7 level=SMT
	domain-1: span=0-15,24-39,48-55 level=MC
	domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1
resp.) gets degenerated into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-5-git-send-email-ego@linux.vnet.ibm.com
2020-12-11 00:10:25 +11:00
Gautham R. Shenoy
fbd2b672e9 powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic
init_thread_group_l1_cache_map() initializes the per-cpu cpumask
thread_group_l1_cache_map with the core-siblings which share L1 cache
with the CPU. Make this function generic to the cache-property (L1 or
L2) and update a suitable mask. This is a preparatory patch for the
next patch where we will introduce discovery of thread-groups that
share L2-cache.

No functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-4-git-send-email-ego@linux.vnet.ibm.com
2020-12-11 00:10:25 +11:00
Gautham R. Shenoy
1fdc1d6632 powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map
On platforms which have the "ibm,thread-groups" property, the per-cpu
variable cpu_l1_cache_map keeps a track of which group of threads
within the same core share the L1 cache, Instruction and Data flow.

This patch renames the variable to "thread_group_l1_cache_map" to make
it consistent with a subsequent patch which will introduce
thread_group_l2_cache_map.

This patch introduces no functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-3-git-send-email-ego@linux.vnet.ibm.com
2020-12-11 00:10:24 +11:00
Gautham R. Shenoy
790a1662d3 powerpc/smp: Parse ibm,thread-groups with multiple properties
The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed up into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

where in,

a) provides information of Property "1" being shared by "2" groups,
   each with "4" threads each. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
   the second group is {9,11,13,15}. Property "1" is indicative of
   the thread in the group sharing L1 cache, translation cache and
   Instruction Data flow.

b) provides information of Property "2" being shared by "2" groups,
   each group with "4" threads. The "ibm,ppc-interrupt-server#s" of
   the first group is {8,10,12,14} and the
   "ibm,ppc-interrupt-server#s" of the second group is
   {9,11,13,15}. Property "2" indicates that the threads in each group
   share the L2-cache.

The existing code assumes that the "ibm,thread-groups" encodes
information about only one property. Hence even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

This patch extends the parsing support on platforms which encode
information about multiple properties being shared by the
corresponding groups of threads.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1607596739-32439-2-git-send-email-ego@linux.vnet.ibm.com
2020-12-11 00:10:16 +11:00
Ravi Bangoria
3d2ffcdd2a powerpc/watchpoint: Workaround P10 DD1 issue with VSX-32 byte instructions
POWER10 DD1 has an issue where it generates watchpoint exceptions when
it shouldn't. The conditions where this occur are:

 - octword op
 - ending address of DAWR range is less than starting address of op
 - those addresses need to be in the same or in two consecutive 512B
   blocks
 - 'op address + 64B' generates an address that has a carry into bit
   52 (crosses 2K boundary)

Handle such spurious exception by considering them as extraneous and
emulating/single-steeping instruction without generating an event.

[ravi: Fixed build warning reported by lkp@intel.com]
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201106045650.278987-1-ravi.bangoria@linux.ibm.com
2020-12-11 00:09:10 +11:00
Balamuruhan S
35785b293d powerpc/sstep: Add testcases for VSX vector paired load/store instructions
Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp           : PASS
  emulate_step_test: stxvp          : PASS
  emulate_step_test: lxvpx          : PASS
  emulate_step_test: stxvpx         : PASS
  emulate_step_test: plxvp          : PASS
  emulate_step_test: pstxvp         : PASS

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-6-ravi.bangoria@linux.ibm.com
2020-12-11 00:09:10 +11:00
Balamuruhan S
6ce73ba769 powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-5-ravi.bangoria@linux.ibm.com
2020-12-11 00:09:09 +11:00
Balamuruhan S
af99da7433 powerpc/sstep: Support VSX vector paired storage access instructions
VSX Vector Paired instructions loads/stores an octword (32 bytes)
from/to storage into two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

[kernel test robot reported a build failure]

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-4-ravi.bangoria@linux.ibm.com
2020-12-11 00:09:09 +11:00
Ravi Bangoria
1817de2f14 powerpc/sstep: Cover new VSX instructions under CONFIG_VSX
Recently added Power10 prefixed VSX instruction are included
unconditionally in the kernel. If they are executed on a
machine without VSX support, it might create issues. Fix that.
Also fix one mnemonics spelling mistake in comment.

Fixes: 50b80a12e4 ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-3-ravi.bangoria@linux.ibm.com
2020-12-11 00:09:09 +11:00
Balamuruhan S
ef6879f8c8 powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors which might cause
issues. Restrict that.

Fixes: 3920742b92 ("powerpc sstep: Add support for prefixed fixed-point arithmetic")
Fixes: 50b80a12e4 ("powerpc sstep: Add support for prefixed load/stores")
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201011050908.72173-2-ravi.bangoria@linux.ibm.com
2020-12-11 00:09:09 +11:00
Christian Borntraeger
0cd2a787cf s390/gmap: make gmap memcg aware
gmap allocations can be attributed to a process.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
2020-12-10 13:36:05 +01:00
Christian Borntraeger
c419621873 KVM: s390: Add memcg accounting to KVM allocations
Almost all kvm allocations in the s390x KVM code can be attributed to
the process that triggers the allocation (in other words, no global
allocation for other guests). This will help the memcg controller to
make the right decisions.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
2020-12-10 13:36:05 +01:00
Nicholas Piggin
02b02ee1b0 powerpc/64s: Remove idle workaround code from restore_cpu_cpufeatures
Idle code no longer uses the .cpu_restore CPU operation to restore
SPRs, so this workaround is no longer required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711022404.18132-2-npiggin@gmail.com
2020-12-10 23:13:26 +11:00
Athira Rajeev
aa8e21c053 powerpc/perf: Exclude kernel samples while counting events in user space.
Perf event attritube supports exclude_kernel flag to avoid
sampling/profiling in supervisor state (kernel). Based on this event
attr flag, Monitor Mode Control Register bit is set to freeze on
supervisor state. But sometimes (due to hardware limitation), Sampled
Instruction Address Register (SIAR) locks on to kernel address even
when freeze on supervisor is set. Patch here adds a check to drop
those samples.

Cc: stable@vger.kernel.org
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-12-10 23:04:35 +11:00