bianbu-linux-6.6/arch/s390/include/asm/processor.h
Martin Schwidefsky 0aaba41b58 s390: remove all code using the access register mode
The vdso code for the getcpu() and the clock_gettime() call use the access
register mode to access the per-CPU vdso data page with the current code.

An alternative to the complicated AR mode is to use the secondary space
mode. This makes the vdso faster and quite a bit simpler. The downside is
that the uaccess code has to be changed quite a bit.

Which instructions are used depends on the machine and what kind of uaccess
operation is requested. The instruction dictates which ASCE value needs
to be loaded into %cr1 and %cr7.

The different cases:

* User copy with MVCOS for z10 and newer machines
  The MVCOS instruction can copy between the primary space (aka user) and
  the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
  ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
  loaded in %cr1.

* User copy with MVCP/MVCS for older machines
  To be able to execute the MVCP/MVCS instructions the kernel needs to
  switch to primary mode. The control register %cr1 has to be set to the
  kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
  on set_fs(KERNEL_DS) vs set_fs(USER_DS).

* Data access in the user address space for strnlen / futex
  To use "normal" instruction with data from the user address space the
  secondary space mode is used. The kernel needs to switch to primary mode,
  %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
  kernel ASCE, dependent on set_fs.

To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
tries to be lazy about it. E.g. for multiple user copies in a row with
MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
done only once. On return to user space a CPU bit is checked that loads the
vdso ASCE again.

To enable and disable the data access via the secondary space two new
functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
that a context is in secondary space uaccess mode is stored in the
mm_segment_t value for the task. The code of an interrupt may use set_fs
as long as it returns to the previous state it got with get_fs with another
call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
set_fs with the current mm_segment_t value for the task.

For CPUs with MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode, lazy    |  user     |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

For CPUs without MVCOS:

CPU running in                        | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space                            |  user     |  vdso     |
kernel, USER_DS, normal-mode          |  user     |  vdso     |
kernel, USER_DS, normal-mode lazy     |  kernel   |  user     |
kernel, USER_DS, sacf-mode            |  kernel   |  user     |
kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |

The lines with "lazy" refer to the state after a copy via the secondary
space with a delayed reload of %cr1 and %cr7.

There are three hardware address spaces that can cause a DAT exception,
primary, secondary and home space. The exception can be related to
four different fault types: user space fault, vdso fault, kernel fault,
and the gmap faults.

Dependent on the set_fs state and normal vs. sacf mode there are a number
of fault combinations:

1) user address space fault via the primary ASCE
2) gmap address space fault via the primary ASCE
3) kernel address space fault via the primary ASCE for machines with
   MVCOS and set_fs(KERNEL_DS)
4) vdso address space faults via the secondary ASCE with an invalid
   address while running in secondary space in problem state
5) user address space fault via the secondary ASCE for user-copy
   based on the secondary space mode, e.g. futex_ops or strnlen_user
6) kernel address space fault via the secondary ASCE for user-copy
   with secondary space mode with set_fs(KERNEL_DS)
7) kernel address space fault via the primary ASCE for user-copy
   with secondary space mode with set_fs(USER_DS) on machines without
   MVCOS.
8) kernel address space fault via the home space ASCE

Replace user_space_fault() with a new function get_fault_type() that
can distinguish all four different fault types.

With these changes the futex atomic ops from the kernel and the
strnlen_user will get a little bit slower, as well as the old style
uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
fast as before. On the positive side, the user space vdso code is a
lot faster and Linux ceases to use the complicated AR mode.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 11:01:47 +01:00

382 lines
10 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
/*
* S390 version
* Copyright IBM Corp. 1999
* Author(s): Hartmut Penner (hp@de.ibm.com),
* Martin Schwidefsky (schwidefsky@de.ibm.com)
*
* Derived from "include/asm-i386/processor.h"
* Copyright (C) 1994, Linus Torvalds
*/
#ifndef __ASM_S390_PROCESSOR_H
#define __ASM_S390_PROCESSOR_H
#include <linux/const.h>
#define CIF_MCCK_PENDING 0 /* machine check handling is pending */
#define CIF_ASCE_PRIMARY 1 /* primary asce needs fixup / uaccess */
#define CIF_ASCE_SECONDARY 2 /* secondary asce needs fixup / uaccess */
#define CIF_NOHZ_DELAY 3 /* delay HZ disable for a tick */
#define CIF_FPU 4 /* restore FPU registers */
#define CIF_IGNORE_IRQ 5 /* ignore interrupt (for udelay) */
#define CIF_ENABLED_WAIT 6 /* in enabled wait state */
#define CIF_MCCK_GUEST 7 /* machine check happening in guest */
#define CIF_DEDICATED_CPU 8 /* this CPU is dedicated */
#define _CIF_MCCK_PENDING _BITUL(CIF_MCCK_PENDING)
#define _CIF_ASCE_PRIMARY _BITUL(CIF_ASCE_PRIMARY)
#define _CIF_ASCE_SECONDARY _BITUL(CIF_ASCE_SECONDARY)
#define _CIF_NOHZ_DELAY _BITUL(CIF_NOHZ_DELAY)
#define _CIF_FPU _BITUL(CIF_FPU)
#define _CIF_IGNORE_IRQ _BITUL(CIF_IGNORE_IRQ)
#define _CIF_ENABLED_WAIT _BITUL(CIF_ENABLED_WAIT)
#define _CIF_MCCK_GUEST _BITUL(CIF_MCCK_GUEST)
#define _CIF_DEDICATED_CPU _BITUL(CIF_DEDICATED_CPU)
#ifndef __ASSEMBLY__
#include <linux/linkage.h>
#include <linux/irqflags.h>
#include <asm/cpu.h>
#include <asm/page.h>
#include <asm/ptrace.h>
#include <asm/setup.h>
#include <asm/runtime_instr.h>
#include <asm/fpu/types.h>
#include <asm/fpu/internal.h>
static inline void set_cpu_flag(int flag)
{
S390_lowcore.cpu_flags |= (1UL << flag);
}
static inline void clear_cpu_flag(int flag)
{
S390_lowcore.cpu_flags &= ~(1UL << flag);
}
static inline int test_cpu_flag(int flag)
{
return !!(S390_lowcore.cpu_flags & (1UL << flag));
}
/*
* Test CIF flag of another CPU. The caller needs to ensure that
* CPU hotplug can not happen, e.g. by disabling preemption.
*/
static inline int test_cpu_flag_of(int flag, int cpu)
{
struct lowcore *lc = lowcore_ptr[cpu];
return !!(lc->cpu_flags & (1UL << flag));
}
#define arch_needs_cpu() test_cpu_flag(CIF_NOHZ_DELAY)
/*
* Default implementation of macro that returns current
* instruction pointer ("program counter").
*/
#define current_text_addr() ({ void *pc; asm("basr %0,0" : "=a" (pc)); pc; })
static inline void get_cpu_id(struct cpuid *ptr)
{
asm volatile("stidp %0" : "=Q" (*ptr));
}
void s390_adjust_jiffies(void);
void s390_update_cpu_mhz(void);
void cpu_detect_mhz_feature(void);
extern const struct seq_operations cpuinfo_op;
extern int sysctl_ieee_emulation_warnings;
extern void execve_tail(void);
/*
* User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
*/
#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_31BIT) ? \
(1UL << 31) : -PAGE_SIZE)
#define TASK_UNMAPPED_BASE (test_thread_flag(TIF_31BIT) ? \
(1UL << 30) : (1UL << 41))
#define TASK_SIZE TASK_SIZE_OF(current)
#define TASK_SIZE_MAX (-PAGE_SIZE)
#define STACK_TOP (test_thread_flag(TIF_31BIT) ? \
(1UL << 31) : (1UL << 42))
#define STACK_TOP_MAX (1UL << 42)
#define HAVE_ARCH_PICK_MMAP_LAYOUT
typedef unsigned int mm_segment_t;
/*
* Thread structure
*/
struct thread_struct {
unsigned int acrs[NUM_ACRS];
unsigned long ksp; /* kernel stack pointer */
unsigned long user_timer; /* task cputime in user space */
unsigned long guest_timer; /* task cputime in kvm guest */
unsigned long system_timer; /* task cputime in kernel space */
unsigned long hardirq_timer; /* task cputime in hardirq context */
unsigned long softirq_timer; /* task cputime in softirq context */
unsigned long sys_call_table; /* system call table address */
mm_segment_t mm_segment;
unsigned long gmap_addr; /* address of last gmap fault. */
unsigned int gmap_write_flag; /* gmap fault write indication */
unsigned int gmap_int_code; /* int code of last gmap fault */
unsigned int gmap_pfault; /* signal of a pending guest pfault */
/* Per-thread information related to debugging */
struct per_regs per_user; /* User specified PER registers */
struct per_event per_event; /* Cause of the last PER trap */
unsigned long per_flags; /* Flags to control debug behavior */
unsigned int system_call; /* system call number in signal */
unsigned long last_break; /* last breaking-event-address. */
/* pfault_wait is used to block the process on a pfault event */
unsigned long pfault_wait;
struct list_head list;
/* cpu runtime instrumentation */
struct runtime_instr_cb *ri_cb;
struct gs_cb *gs_cb; /* Current guarded storage cb */
struct gs_cb *gs_bc_cb; /* Broadcast guarded storage cb */
unsigned char trap_tdb[256]; /* Transaction abort diagnose block */
/*
* Warning: 'fpu' is dynamically-sized. It *MUST* be at
* the end.
*/
struct fpu fpu; /* FP and VX register save area */
};
/* Flag to disable transactions. */
#define PER_FLAG_NO_TE 1UL
/* Flag to enable random transaction aborts. */
#define PER_FLAG_TE_ABORT_RAND 2UL
/* Flag to specify random transaction abort mode:
* - abort each transaction at a random instruction before TEND if set.
* - abort random transactions at a random instruction if cleared.
*/
#define PER_FLAG_TE_ABORT_RAND_TEND 4UL
typedef struct thread_struct thread_struct;
/*
* Stack layout of a C stack frame.
*/
#ifndef __PACK_STACK
struct stack_frame {
unsigned long back_chain;
unsigned long empty1[5];
unsigned long gprs[10];
unsigned int empty2[8];
};
#else
struct stack_frame {
unsigned long empty1[5];
unsigned int empty2[8];
unsigned long gprs[10];
unsigned long back_chain;
};
#endif
#define ARCH_MIN_TASKALIGN 8
#define INIT_THREAD { \
.ksp = sizeof(init_stack) + (unsigned long) &init_stack, \
.fpu.regs = (void *) init_task.thread.fpu.fprs, \
}
/*
* Do necessary setup to start up a new thread.
*/
#define start_thread(regs, new_psw, new_stackp) do { \
regs->psw.mask = PSW_USER_BITS | PSW_MASK_EA | PSW_MASK_BA; \
regs->psw.addr = new_psw; \
regs->gprs[15] = new_stackp; \
execve_tail(); \
} while (0)
#define start_thread31(regs, new_psw, new_stackp) do { \
regs->psw.mask = PSW_USER_BITS | PSW_MASK_BA; \
regs->psw.addr = new_psw; \
regs->gprs[15] = new_stackp; \
crst_table_downgrade(current->mm); \
execve_tail(); \
} while (0)
/* Forward declaration, a strange C thing */
struct task_struct;
struct mm_struct;
struct seq_file;
struct pt_regs;
typedef int (*dump_trace_func_t)(void *data, unsigned long address, int reliable);
void dump_trace(dump_trace_func_t func, void *data,
struct task_struct *task, unsigned long sp);
void show_registers(struct pt_regs *regs);
void show_cacheinfo(struct seq_file *m);
/* Free all resources held by a thread. */
static inline void release_thread(struct task_struct *tsk) { }
/* Free guarded storage control block */
void guarded_storage_release(struct task_struct *tsk);
unsigned long get_wchan(struct task_struct *p);
#define task_pt_regs(tsk) ((struct pt_regs *) \
(task_stack_page(tsk) + THREAD_SIZE) - 1)
#define KSTK_EIP(tsk) (task_pt_regs(tsk)->psw.addr)
#define KSTK_ESP(tsk) (task_pt_regs(tsk)->gprs[15])
/* Has task runtime instrumentation enabled ? */
#define is_ri_task(tsk) (!!(tsk)->thread.ri_cb)
static inline unsigned long current_stack_pointer(void)
{
unsigned long sp;
asm volatile("la %0,0(15)" : "=a" (sp));
return sp;
}
static inline unsigned short stap(void)
{
unsigned short cpu_address;
asm volatile("stap %0" : "=m" (cpu_address));
return cpu_address;
}
/*
* Give up the time slice of the virtual PU.
*/
#define cpu_relax_yield cpu_relax_yield
void cpu_relax_yield(void);
#define cpu_relax() barrier()
#define ECAG_CACHE_ATTRIBUTE 0
#define ECAG_CPU_ATTRIBUTE 1
static inline unsigned long __ecag(unsigned int asi, unsigned char parm)
{
unsigned long val;
asm volatile(".insn rsy,0xeb000000004c,%0,0,0(%1)" /* ecag */
: "=d" (val) : "a" (asi << 8 | parm));
return val;
}
static inline void psw_set_key(unsigned int key)
{
asm volatile("spka 0(%0)" : : "d" (key));
}
/*
* Set PSW to specified value.
*/
static inline void __load_psw(psw_t psw)
{
asm volatile("lpswe %0" : : "Q" (psw) : "cc");
}
/*
* Set PSW mask to specified value, while leaving the
* PSW addr pointing to the next instruction.
*/
static inline void __load_psw_mask(unsigned long mask)
{
unsigned long addr;
psw_t psw;
psw.mask = mask;
asm volatile(
" larl %0,1f\n"
" stg %0,%O1+8(%R1)\n"
" lpswe %1\n"
"1:"
: "=&d" (addr), "=Q" (psw) : "Q" (psw) : "memory", "cc");
}
/*
* Extract current PSW mask
*/
static inline unsigned long __extract_psw(void)
{
unsigned int reg1, reg2;
asm volatile("epsw %0,%1" : "=d" (reg1), "=a" (reg2));
return (((unsigned long) reg1) << 32) | ((unsigned long) reg2);
}
static inline void local_mcck_enable(void)
{
__load_psw_mask(__extract_psw() | PSW_MASK_MCHECK);
}
static inline void local_mcck_disable(void)
{
__load_psw_mask(__extract_psw() & ~PSW_MASK_MCHECK);
}
/*
* Rewind PSW instruction address by specified number of bytes.
*/
static inline unsigned long __rewind_psw(psw_t psw, unsigned long ilc)
{
unsigned long mask;
mask = (psw.mask & PSW_MASK_EA) ? -1UL :
(psw.mask & PSW_MASK_BA) ? (1UL << 31) - 1 :
(1UL << 24) - 1;
return (psw.addr - ilc) & mask;
}
/*
* Function to stop a processor until the next interrupt occurs
*/
void enabled_wait(void);
/*
* Function to drop a processor into disabled wait state
*/
static inline void __noreturn disabled_wait(unsigned long code)
{
psw_t psw;
psw.mask = PSW_MASK_BASE | PSW_MASK_WAIT | PSW_MASK_BA | PSW_MASK_EA;
psw.addr = code;
__load_psw(psw);
while (1);
}
/*
* Basic Machine Check/Program Check Handler.
*/
extern void s390_base_mcck_handler(void);
extern void s390_base_pgm_handler(void);
extern void s390_base_ext_handler(void);
extern void (*s390_base_mcck_handler_fn)(void);
extern void (*s390_base_pgm_handler_fn)(void);
extern void (*s390_base_ext_handler_fn)(void);
#define ARCH_LOW_ADDRESS_LIMIT 0x7fffffffUL
extern int memcpy_real(void *, void *, size_t);
extern void memcpy_absolute(void *, void *, size_t);
#define mem_assign_absolute(dest, val) do { \
__typeof__(dest) __tmp = (val); \
\
BUILD_BUG_ON(sizeof(__tmp) != sizeof(val)); \
memcpy_absolute(&(dest), &__tmp, sizeof(__tmp)); \
} while (0)
#endif /* __ASSEMBLY__ */
#endif /* __ASM_S390_PROCESSOR_H */