Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.
Simple init() routine just checks which features are present. If they are
pr_info() one line summary for each feature for now.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.
For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".
Also the documentation of the CPUID instruction in the "Intel 64 and
IA-32 Architectures Software Developer's Manual"
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 784d5699ed ("x86: move exports to actual definitions") removed the
EXPORT_SYMBOL(__fentry__) and EXPORT_SYMBOL(mcount) from x8664_ksyms_64.c,
and added EXPORT_SYMBOL(function_hook) in mcount_64.S instead. The problem
is that function_hook isn't a function at all, but a macro that is defined
as either mcount or __fentry__ depending on the support from gcc.
Originally, I thought this was a macro issue, like what __stringify()
is used for. But the problem is a bit deeper. The Makefile.build has
some magic that does post processing of files to create the CRC
bindings. It does some searches for EXPORT_SYMBOL() and because it
finds a macro name and not the actual functions, this causes
function_hook not to be converted into mcount or __fentry__ and they
are missed.
Instead of adding more magic to Makefile.build, just add
EXPORT_SYMBOL() for mcount and __fentry__ where the ifdef is used.
Since this is assembly and not C, it doesn't require being set after
the function is defined.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Gabriel C <nix.or.die@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/r/20161024150148.4f9d90e4@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently after bringing up secondary CPUs all arches print "Brought up
%d CPUs". On x86 they also print the number of nodes that were brought
online.
It would be nice to also print the number of nodes on other arches.
Although we could override smp_announce() on the other ~10 NUMA aware
arches, it seems simpler to just always print the number of nodes. On
non-NUMA arches there is just always 1 node.
Having done that, smp_announce() is no longer weak, and seems small
enough to just pull directly into smp_init().
Also update the printing of "%d CPUs" to be smart when an SMP kernel is
booted on a single CPU system, or when only one CPU is available, eg:
smp: Brought up 2 nodes, 1 CPU
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: akpm@osdl.org
Cc: jgross@suse.com
Cc: ak@linux.intel.com
Cc: tim.c.chen@linux.intel.com
Cc: len.brown@intel.com
Cc: peterz@infradead.org
Cc: richard@nod.at
Cc: jolsa@redhat.com
Cc: boris.ostrovsky@oracle.com
Cc: mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1477460275-8266-2-git-send-email-mpe@ellerman.id.au
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For mostly historical reasons, the x86 oops dump shows the raw stack
values:
...
[registers]
Stack:
ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
0000000000000002 0000000000000000 0000000000000000 0000000000000000
Call Trace:
...
This seems to be an artifact from long ago, and probably isn't needed
anymore. It generally just adds noise to the dump, and it can be
actively harmful because it leaks kernel addresses.
Linus says:
"The stack dump actually goes back to forever, and it used to be
useful back in 1992 or so. But it used to be useful mainly because
stacks were simpler and we didn't have very good call traces anyway. I
definitely remember having used them - I just do not remember having
used them in the last ten+ years.
Of course, it's still true that if you can trigger an oops, you've
likely already lost the security game, but since the stack dump is so
useless, let's aim to just remove it and make games like the above
harder."
This also removes the related 'kstack=' cmdline option and the
'kstack_depth_to_print' sysctl.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.
It can be a security issue because it leaks kernel addresses. It also
affects the usefulness of the stack dump. Linus says:
"I actually spend time cleaning up commit messages in logs, because
useless data that isn't actually information (random hex numbers) is
actively detrimental.
It makes commit logs less legible.
It also makes it harder to parse dumps.
It's not useful. That makes it actively bad.
I probably look at more oops reports than most people. I have not
found the hex numbers useful for the last five years, because they are
just randomized crap.
The stack content thing just makes code scroll off the screen etc, for
example."
The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names. However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.
There's now a 'faddr2line' script which can be used to convert a
function address to a file name and line:
$ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
write_sysrq_trigger+0x51/0x60:
write_sysrq_trigger at drivers/tty/sysrq.c:1098
Or gdb can be used:
$ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
(gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).
(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds. faddr2line is recommended over gdb because
it handles duplicates and it also does function size checking.)
Here's an example of what a stack dump looks like after this change:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: sysrq_handle_crash+0x45/0x80
PGD 36bfa067 [ 29.650644] PUD 7aca3067
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
task: ffff880078582a40 task.stack: ffffc90000ba8000
RIP: 0010:sysrq_handle_crash+0x45/0x80
RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296
RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292
RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000
FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0
Stack:
ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002
0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20
ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40
Call Trace:
__handle_sysrq+0x138/0x220
? __handle_sysrq+0x5/0x220
write_sysrq_trigger+0x51/0x60
proc_reg_write+0x42/0x70
__vfs_write+0x37/0x140
? preempt_count_sub+0xa1/0x100
? __sb_start_write+0xf5/0x210
? vfs_write+0x183/0x1a0
vfs_write+0xb8/0x1a0
SyS_write+0x58/0xc0
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x7ffb42f55940
RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940
RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001
RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700
R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10
R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010
Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8
CR2: 0000000000000000
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Yeah, I know, I know, this is a huuge patch and reviewing it is hard.
Sorry but this is the only way I could think of in which I can rewrite
the microcode patches loading procedure without breaking (knowingly) the
driver.
So maybe this patch is easier to review if one looks at the files after
the patch has been applied instead at the diff. Because then it becomes
pretty obvious:
* The BSP-loading path - load_ucode_bsp() is working independently from
the AP path now and it doesn't save any pointers or patches anymore -
it solely parses the builtin or initrd microcode and applies the patch.
That's it.
This fixes the CONFIG_RANDOMIZE_MEMORY offset fun more solidly.
* The AP-loading path - load_ucode_ap() then goes and scans
builtin/initrd *again* for the microcode patches but it caches them this
time so that we don't have to do that scan on each AP but only once.
This simplifies the code considerably.
Then, when we save the microcode from the initrd/builtin, we go and
add the relevant patches to our own cache. The AMD side did do that
and now the Intel side does it too. So no more pointer copying and
blabla, we save the microcode patches ourselves and are independent from
initrd/builtin.
This whole conversion gives us other benefits like unifying the
initrd parsing into a single function: find_microcode_in_initrd() is
used by both.
The diffstat speaks for itself: 456 insertions(+), 695 deletions(-)
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-12-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Its functions are used in intel.c only now, so get rid of it. Make
functions static.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-11-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make them all static as they're used in a single file now.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We need to reread the CPU's microcode revision after resume because
applied microcode gets "forgotten" depending on the sleep state.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-9-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move it after the patch application function which also checks whether
we were successful.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Will be needed in a following patch.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It will be used by both drivers so move it to core.c.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-6-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make it return the ucode_state directly instead of assigning to a state
variable and jumping to an out: label.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cpu_init() is run also on the BSP (in addition to the APs):
x86_64_start_kernel
|-> x86_64_start_reservations
|-> start_kernel
|-> trap_init
|-> cpu_init
|-> load_ucode_ap
...
but we run the AP (Application Processors) microcode loading routine
there too even though we have a BSP-specific routine for that:
load_ucode_bsp().
Which is unnecessary. So let's limit the AP microcode loading routine to
the APs only.
Remove a useless comment while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161025095522.11964-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is useless as it dumps the MSRs too early BUT(!) we do set MSRs later too.
Also, it dumps only BSP MSRs as it gets called only for CPU 0.
And the MSR range array would need constant updating anyway, and so on
and so on...
Oh, and we have msr.ko and msr-tools which are the much better solution
anyway. So off it goes...
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-4-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Should be easier when following boot paths. It probably is a left over
from the x86 unification eons ago.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We're using a literal, move it into the string.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161024173844.23038-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
gcc -Wmaybe-uninitialized detects that quirk_intel_brickland_xeon_ras_cap
uses uninitialized data when CONFIG_PCI is not set:
arch/x86/kernel/quirks.c: In function ‘quirk_intel_brickland_xeon_ras_cap’:
arch/x86/kernel/quirks.c:641:13: error: ‘capid0’ is used uninitialized in this function [-Werror=uninitialized]
However, the function is also not called in this configuration, so we
can avoid the warning by moving the existing #ifdef to cover it as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/20161024153325.2752428-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vince Waver reported the following bug:
WARNING: CPU: 0 PID: 21338 at arch/x86/mm/fault.c:435 vmalloc_fault+0x58/0x1f0
CPU: 0 PID: 21338 Comm: perf_fuzzer Not tainted 4.8.0+ #37
Hardware name: Hewlett-Packard HP Compaq Pro 6305 SFF/1850, BIOS K06 v02.57 08/16/2013
Call Trace:
<NMI> ? dump_stack+0x46/0x59
? __warn+0xd5/0xee
? vmalloc_fault+0x58/0x1f0
? __do_page_fault+0x6d/0x48e
? perf_log_throttle+0xa4/0xf4
? trace_page_fault+0x22/0x30
? __unwind_start+0x28/0x42
? perf_callchain_kernel+0x75/0xac
? get_perf_callchain+0x13a/0x1f0
? perf_callchain+0x6a/0x6c
? perf_prepare_sample+0x71/0x2eb
? perf_event_output_forward+0x1a/0x54
? __default_send_IPI_shortcut+0x10/0x2d
? __perf_event_overflow+0xfb/0x167
? x86_pmu_handle_irq+0x113/0x150
? native_read_msr+0x6/0x34
? perf_event_nmi_handler+0x22/0x39
? perf_ibs_nmi_handler+0x4a/0x51
? perf_event_nmi_handler+0x22/0x39
? nmi_handle+0x4d/0xf0
? perf_ibs_handle_irq+0x3d1/0x3d1
? default_do_nmi+0x3c/0xd5
? do_nmi+0x92/0x102
? end_repeat_nmi+0x1a/0x1e
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
? entry_SYSCALL_64_after_swapgs+0x12/0x4a
<EOE> ^A4---[ end trace 632723104d47d31a ]---
BUG: stack guard page was hit at ffffc90008500000 (stack is ffffc900084fc000..ffffc900084fffff)
kernel stack overflow (page fault): 0000 [#1] SMP
...
The NMI hit in the entry code right after setting up the stack pointer
from 'cpu_current_top_of_stack', so the kernel stack was empty. The
'guess' version of __unwind_start() attempted to dereference the "top of
stack" pointer, which is not actually *on* the stack.
Add a check in the guess unwinder to deal with an empty stack. (The
frame pointer unwinder already has such a check.)
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7c7900f897 ("x86/unwind: Add new unwind interface and implementations")
Link: http://lkml.kernel.org/r/20161024133127.e5evgeebdbohnmpb@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ondrej reported that IRQs stopped working in v4.7 on several
platforms. A typical scenario, from Ondrej's VT82C694X/694X, is:
ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI
We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.
We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.
Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().
Fixes: 103544d869 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org
Reported-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull x86 fixes from Ingo Molnar:
"Three fixes, a hw-enablement and a cross-arch fix/enablement change:
- SGI/UV fix for older platforms
- x32 signal handling fix
- older x86 platform bootup APIC fix
- AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS
(Multiply Accumulation Single precision instructions) enablement.
- move thread_info back into x86 specific code, to make life easier
for other architectures trying to make use of
CONFIG_THREAD_INFO_IN_TASK_STRUCT=y"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/smp: Don't try to poke disabled/non-existent APIC
sched/core, x86: Make struct thread_info arch specific again
x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
x86/vmware: Skip timer_irq_works() check on VMware
Apparently trying to poke a disabled or non-existent APIC
leads to a box that doesn't even boot. Let's not do that.
No real clue if this is the right fix, but at least my
P3 machine boots again.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dyoung@redhat.com
Cc: kexec@lists.infradead.org
Cc: stable@vger.kernel.org
Fixes: 2a51fe083e ("arch/x86: Handle non enumerated CPU after physical hotplug")
Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kbuild test robot reported this against the -RT tree:
|>> arch/x86/kernel/acpi/boot.c:90:21: warning: 'acpi_ioapic_lock' defined but not used [-Wunused-variable]
| static DEFINE_MUTEX(acpi_ioapic_lock);
| ^
| include/linux/mutex_rt.h:27:15: note: in definition of macro 'DEFINE_MUTEX'
| struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
^~~~~~~~~
which is also true (as in non-used) for !RT but the compiler does not
emit a warning.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161021084449.32523-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Re-factor the vmware platform setup code to query the hypervisor for tsc
frequency only once during boot. Since the VMware hypervisor guarantees
constant TSC, calibrate_tsc now uses the saved value.
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161020050211.GA25304@amakhalov-virtual-machine
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The value of regs->orig_ax contains potentially useful debugging data:
For syscalls it contains the syscall number. For interrupts it contains
the (negated) vector number. To reduce noise, print it only if it has a
useful value (i.e., something other than -1).
Here's what it looks like for a write syscall:
RIP: 0033:[<00007f53ad7b1940>] 0x7f53ad7b1940
RSP: 002b:00007fff8de66558 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f53ad7b1940
RDX: 0000000000000002 RSI: 00007f53ae0ca000 RDI: 0000000000000001
...
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/93f0fe0307a4af884d3fca00edabcc8cff236002.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show_trace_log_lvl() prints the stack id (e.g. "<IRQ>") without a
newline so that any stack address printed after it will appear on the
same line. That causes the first stack address to be vertically
misaligned with the rest, making it visually cluttered and slightly
confusing:
Call Trace:
<IRQ> [<ffffffff814431c3>] dump_stack+0x86/0xc3
[<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
[<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
...
<EOI> [<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
[<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
[<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
It will look worse once we start printing pt_regs registers found in the
middle of the stack:
<IRQ> RIP: 0010:[<ffffffff8189c6c3>] [<ffffffff8189c6c3>] _raw_spin_unlock_irq+0x33/0x60
RSP: 0018:ffff88007876f720 EFLAGS: 00000206
RAX: ffff8800786caa40 RBX: ffff88007d5da140 RCX: 0000000000000007
...
Improve readability by adding a newline to the stack name:
Call Trace:
<IRQ>
[<ffffffff814431c3>] dump_stack+0x86/0xc3
[<ffffffff8100828b>] perf_callchain_kernel+0x14b/0x160
[<ffffffff811e915f>] get_perf_callchain+0x15f/0x2b0
...
<EOI>
[<ffffffff8189c6c3>] ? _raw_spin_unlock_irq+0x33/0x60
[<ffffffff810e1c84>] finish_task_switch+0xb4/0x250
[<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
Now that "continued" lines are no longer needed, we can also remove the
hack of using the empty string (aka KERN_CONT) and replace it with
KERN_DEFAULT.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9bdd6dee2c74555d45500939fcc155997dc7889e.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The entry code doesn't encode the pt_regs pointer for syscalls. But the
pt_regs are always at the same location, so we can add a manual check
for them.
A later patch prints them as part of the oops stack dump. They could be
useful, for example, to determine the arguments to a system call.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e176aa9272930cd3f51fda0b94e2eae356677da4.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With frame pointers, when a task is interrupted, its stack is no longer
completely reliable because the function could have been interrupted
before it had a chance to save the previous frame pointer on the stack.
So the caller of the interrupted function could get skipped by a stack
trace.
This is problematic for live patching, which needs to know whether a
stack trace of a sleeping task can be relied upon. There's currently no
way to detect if a sleeping task was interrupted by a page fault
exception or preemption before it went to sleep.
Another issue is that when dumping the stack of an interrupted task, the
unwinder has no way of knowing where the saved pt_regs registers are, so
it can't print them.
This solves those issues by encoding the pt_regs pointer in the frame
pointer on entry from an interrupt or an exception.
This patch also updates the unwinder to be able to decode it, because
otherwise the unwinder would be broken by this change.
Note that this causes a change in the behavior of the unwinder: each
instance of a pt_regs on the stack is now considered a "frame". So
callers of unwind_get_return_address() will now get an occasional
'regs->ip' address that would have previously been skipped over.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent introduction of SA_X32/IA32 sa_flags added a check for
user_64bit_mode() into sigaction_compat_abi(). user_64bit_mode() is true
for native 64-bit processes and x32 processes.
Due to that the function returns w/o setting the SA_X32_ABI flag for X32
processes. In consequence the kernel attempts to deliver the signal to the
X32 process in native 64-bit mode causing the process to segfault.
Remove the check, so the actual check for X32 mode which sets the ABI flag
can be reached. There is no side effect for native 64-bit mode.
[ tglx: Rewrote changelog ]
Fixes: 6846351052 ("x86/signal: Add SA_{X32,IA32}_ABI sa_flags")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: linux-mm@kvack.org
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Link: http://lkml.kernel.org/r/CAJwJo6Z8ZWPqNfT6t-i8GW1MKxQrKDUagQqnZ%2B0%2B697%3DMyVeGg@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When core_kernel_text() is used to determine whether an address on a
task's stack trace is a kernel text address, it incorrectly returns
false for early text addresses for the head code between the _text and
_stext markers. Among other things, this can cause the unwinder to
behave incorrectly when unwinding to x86 head code.
Head code is text code too, so mark it as such. This seems to match the
intent of other users of the _stext symbol, and it also seems consistent
with what other architectures are already doing.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/789cf978866420e72fa89df44aa2849426ac378d.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Thanks to all the recent x86 entry code refactoring, most tasks' kernel
stacks start at the same offset right below their saved pt_regs,
regardless of which syscall was used to enter the kernel. That creates
a nice convention which makes it straightforward to identify the end of
the stack, which can be useful for the unwinder to verify the stack is
sane.
However, the boot CPU's idle "swapper" task doesn't follow that
convention. Fix that by starting its stack at a sizeof(pt_regs) offset
from the end of the stack page.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/81aee3beb6ed88e44f1bea6986bb7b65c368f77a.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The frame at the end of each idle task stack has a zeroed return
address. This is inconsistent with real task stacks, which have a real
return address at that spot. This inconsistency can be confusing for
stack unwinders. It also hides useful information about what asm code
was involved in calling into C.
Make it a real address by using the side effect of a call instruction to
push the instruction pointer on the stack.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f59593ae7b15d5126f872b0a23143173d28aa32d.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are two different pieces of code for starting a CPU: start_cpu0()
and the end of secondary_startup_64(). They're identical except for the
stack setup. Combine the common parts into a shared start_cpu()
function.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1d692ffa62fcb3cc835a5b254e953f2d9bab3549.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On 32-bit kernels, the initial idle stack calculation doesn't take into
account the TOP_OF_KERNEL_STACK_PADDING, making the stack end address
inconsistent with other tasks on 32-bit.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6cf569410bfa84cf923902fc4d628444cace94be.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The frame at the end of each idle task stack is inconsistent with real
task stacks, which have a stack frame header and a real return address
before the pt_regs area. This inconsistency can be confusing for stack
unwinders. It also hides useful information about what asm code was
involved in calling into C.
Fix that by changing the initial code jumps to calls. Also add infinite
loops after the calls to make it clear that the calls don't return, and
to hang if they do.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2588f34b6fbac4ae6f6f9ead2a78d7f8d58a6341.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add the local label prefix to all non-function named labels in head_32.S
and entry_32.S. In addition to decluttering the symbol table, it also
will help stack traces to be more sensible. For example, the last
reported function in the idle task stack trace will be startup_32_smp()
instead of is486().
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14f9f7afd478b23a762f40734da1a57c0c273f6e.1474480779.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge the gup_flags cleanups from Lorenzo Stoakes:
"This patch series adjusts functions in the get_user_pages* family such
that desired FOLL_* flags are passed as an argument rather than
implied by flags.
The purpose of this change is to make the use of FOLL_FORCE explicit
so it is easier to grep for and clearer to callers that this flag is
being used. The use of FOLL_FORCE is an issue as it overrides missing
VM_READ/VM_WRITE flags for the VMA whose pages we are reading
from/writing to, which can result in surprising behaviour.
The patch series came out of the discussion around commit 38e0885465
("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
which addressed a BUG_ON() being triggered when a page was faulted in
with PROT_NONE set but having been overridden by FOLL_FORCE.
do_numa_page() was run on the assumption the page _must_ be one marked
for NUMA node migration as an actual PROT_NONE page would have been
dealt with prior to this code path, however FOLL_FORCE introduced a
situation where this assumption did not hold.
See
https://marc.info/?l=linux-mm&m=147585445805166
for the patch proposal"
Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.
[ This branch was rebased recently to add a few more acked-by's and
reviewed-by's ]
* gup_flag-cleanups:
mm: replace access_process_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace get_user_pages_remote() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: remove write/force parameters from __get_user_pages_unlocked()
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
AVX512_4VNNIW - Vector instructions for deep learning enhanced word
variable precision.
AVX512_4FMAPS - Vector instructions for deep learning floating-point
single precision.
These new instructions are to be used in future Intel Xeon & Xeon Phi
processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new
instructions are supported by a processor.
The spec can be found in the Intel Software Developer Manual (SDM) or in
the Instruction Set Extensions Programming Reference (ISE).
Define new feature flags to enumerate the new instructions in /proc/cpuinfo
accordingly to CPUID bits and add the required xsave extensions which are
required for proper operation.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20161018150111.29926-1-piotr.luc@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The timer_irq_works() boot check may sometimes fail in a VM, when
the Host is overcommitted or when the Guest is running nested.
Since the intended check is unnecessary on VMware's virtual
hardware, by-pass it.
Signed-off-by: Renat Valiullin <rvaliullin@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20161013184539.GA11497@rvaliullin-vm
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By moving all of the new_fpu state handling into switch_fpu_finish(),
the code can be simplified some more.
This gets rid of the prefetch, but given the size of the FPU register
state on modern CPUs, and the amount of work done by __switch_to()
inbetween both functions, the value of a single cache line prefetch
seems somewhat dubious anyway.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1476447331-21566-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>