Commit Graph

26349 Commits

Author SHA1 Message Date
Juergen Gross
ecb23dc6f2 xen: add steal_clock support on x86
The pv_time_ops structure contains a function pointer for the
"steal_clock" functionality used only by KVM and Xen on ARM. Xen on x86
uses its own mechanism to account for the "stolen" time a thread wasn't
able to run due to hypervisor scheduling.

Add support in Xen arch independent time handling for this feature by
moving it out of the arm arch into drivers/xen and remove the x86 Xen
hack.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-07-06 10:34:48 +01:00
Shannon Zhao
a62ed50030 XEN: EFI: Move x86 specific codes to architecture directory
Move x86 specific codes to architecture directory and export those EFI
runtime service functions. This will be useful for initializing runtime
service on ARM later.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2016-07-06 10:34:46 +01:00
Shannon Zhao
243848fc01 xen/grant-table: Move xlated_setup_gnttab_pages to common place
Move xlated_setup_gnttab_pages to common place, so it can be reused by
ARM to setup grant table.

Rename it to xen_xlate_map_ballooned_pages.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
2016-07-06 10:34:42 +01:00
Alexis Dambricourt
30b072ce03 KVM: MTRR: fix kvm_mtrr_check_gfn_range_consistency page fault
The following #PF may occurs:
[ 1403.317041] BUG: unable to handle kernel paging request at 0000000200000068
[ 1403.317045] IP: [<ffffffffc04c20b0>] __mtrr_lookup_var_next+0x10/0xa0 [kvm]

[ 1403.317123] Call Trace:
[ 1403.317134]  [<ffffffffc04c2a65>] ? kvm_mtrr_check_gfn_range_consistency+0xc5/0x120 [kvm]
[ 1403.317143]  [<ffffffffc04ac11f>] ? tdp_page_fault+0x9f/0x2c0 [kvm]
[ 1403.317152]  [<ffffffffc0498128>] ? kvm_set_msr_common+0x858/0xc00 [kvm]
[ 1403.317161]  [<ffffffffc04b8883>] ? x86_emulate_insn+0x273/0xd30 [kvm]
[ 1403.317171]  [<ffffffffc04c04e4>] ? kvm_cpuid+0x34/0x190 [kvm]
[ 1403.317180]  [<ffffffffc04a5bb9>] ? kvm_mmu_page_fault+0x59/0xe0 [kvm]
[ 1403.317183]  [<ffffffffc0d729e1>] ? vmx_handle_exit+0x1d1/0x14a0 [kvm_intel]
[ 1403.317185]  [<ffffffffc0d75f3f>] ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
[ 1403.317187]  [<ffffffffc0d7621d>] ? vmx_vcpu_run+0x2ad/0x420 [kvm_intel]
[ 1403.317196]  [<ffffffffc04a0962>] ? kvm_arch_vcpu_ioctl_run+0x622/0x1550 [kvm]
[ 1403.317204]  [<ffffffffc049abb9>] ? kvm_arch_vcpu_load+0x59/0x210 [kvm]
[ 1403.317206]  [<ffffffff81036245>] ? __kernel_fpu_end+0x35/0x100
[ 1403.317213]  [<ffffffffc0487eb6>] ? kvm_vcpu_ioctl+0x316/0x5d0 [kvm]
[ 1403.317215]  [<ffffffff81088225>] ? do_sigtimedwait+0xd5/0x220
[ 1403.317217]  [<ffffffff811f84dd>] ? do_vfs_ioctl+0x9d/0x5c0
[ 1403.317224]  [<ffffffffc04928ae>] ? kvm_on_user_return+0x3e/0x70 [kvm]
[ 1403.317225]  [<ffffffff811f8a74>] ? SyS_ioctl+0x74/0x80
[ 1403.317227]  [<ffffffff815bf0b6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[ 1403.317242] RIP  [<ffffffffc04c20b0>] __mtrr_lookup_var_next+0x10/0xa0 [kvm]

At mtrr_lookup_fixed_next(), when the condition
'if (iter->index >= ARRAY_SIZE(iter->mtrr_state->fixed_ranges))' becomes true,
mtrr_lookup_var_start() is called with iter->range with gargabe values from the
fixed MTRR union field. Then, list_prepare_entry() do not call list_entry()
initialization, keeping a garbage pointer in iter->range which is accessed in
the following __mtrr_lookup_var_next() call.

Fixes: f571c0973e
Signed-off-by: Alexis Dambricourt <alexis@blade-group.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 16:14:43 +02:00
Wei Yongjun
03f6a22a39 KVM: x86: Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element
Use ARRAY_SIZE instead of dividing sizeof array with sizeof an element

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-05 14:41:45 +02:00
Rafael J. Wysocki
5d1191ab6c Merge back earlier cpufreq material for v4.8. 2016-07-04 13:21:43 +02:00
Thomas Gleixner
8658be133b Merge branch 'irq/for-block' into irq/core
Pull the irq affinity managing code which is in a seperate branch for block
developers to pull.
2016-07-04 12:26:05 +02:00
Thomas Gleixner
06ee6d571f genirq: Add affinity hint to irq allocation
Add an extra argument to the irq(domain) allocation functions, so we can hand
down affinity hints to the allocator. Thats necessary to implement proper
support for multiqueue devices.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: axboe@fb.com
Cc: agordeev@redhat.com
Link: http://lkml.kernel.org/r/1467621574-8277-4-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-04 12:25:13 +02:00
Josh Poimboeuf
fc18822510 perf/x86: Fix 32-bit perf user callgraph collection
A basic perf callgraph record operation causes an immediate panic on a
32-bit kernel compiled with CONFIG_CC_STACKPROTECTOR=y:

  $ perf record -g ls
  Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: c0404fbd

  CPU: 0 PID: 998 Comm: ls Not tainted 4.7.0-rc5+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
   c0dd5967 ff7afe1c 00000086 f41dbc2c c07445a0 464c457f f41dbca8 f41dbc44
   c05646f4 f41dbca8 464c457f f41dbca8 464c457f f41dbc54 c04625be c0ce56fc
   c0404fbd f41dbc88 c0404fbd b74668f0 f41dc000 00000000 c0000000 00000000
  Call Trace:
   [<c07445a0>] dump_stack+0x58/0x78
   [<c05646f4>] panic+0x8e/0x1c6
   [<c04625be>] __stack_chk_fail+0x1e/0x30
   [<c0404fbd>] ? perf_callchain_user+0x22d/0x230
   [<c0404fbd>] perf_callchain_user+0x22d/0x230
   [<c055f89f>] get_perf_callchain+0x1ff/0x270
   [<c055f988>] perf_callchain+0x78/0x90
   [<c055c7eb>] perf_prepare_sample+0x24b/0x370
   [<c055c934>] perf_event_output_forward+0x24/0x70
   [<c05531c0>] __perf_event_overflow+0xa0/0x210
   [<c0550a93>] ? cpu_clock_event_read+0x43/0x50
   [<c0553431>] perf_swevent_hrtimer+0x101/0x180
   [<c0456235>] ? kmap_atomic_prot+0x35/0x140
   [<c056dc69>] ? get_page_from_freelist+0x279/0x950
   [<c058fdd8>] ? vma_interval_tree_remove+0x158/0x230
   [<c05939f4>] ? wp_page_copy.isra.82+0x2f4/0x630
   [<c05a050d>] ? page_add_file_rmap+0x1d/0x50
   [<c0565611>] ? unlock_page+0x61/0x80
   [<c0566755>] ? filemap_map_pages+0x305/0x320
   [<c059769f>] ? handle_mm_fault+0xb7f/0x1560
   [<c074cbeb>] ? timerqueue_del+0x1b/0x70
   [<c04cfefe>] ? __remove_hrtimer+0x2e/0x60
   [<c04d017b>] __hrtimer_run_queues+0xcb/0x2a0
   [<c0553330>] ? __perf_event_overflow+0x210/0x210
   [<c04d0a2a>] hrtimer_interrupt+0x8a/0x180
   [<c043ecc2>] local_apic_timer_interrupt+0x32/0x60
   [<c043f643>] smp_apic_timer_interrupt+0x33/0x50
   [<c0b0cd38>] apic_timer_interrupt+0x34/0x3c
  Kernel Offset: disabled
  ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: c0404fbd

The panic is caused by the fact that perf_callchain_user() mistakenly
assumes it's 64-bit only and ends up corrupting the stack.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org # v4.5+
Fixes: 75925e1ad7 ("perf/x86: Optimize stack walk user accesses")
Link: http://lkml.kernel.org/r/1a547f5077ec30f75f9b57074837c3c80df86e5e.1467432113.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-03 10:43:00 +02:00
Stephane Eranian
9010ae4a8d perf/x86/intel: Update event constraints when HT is off
This patch updates the event constraints for non-PEBS mode for
Intel Broadwell and Skylake processors. When HT is off, each
CPU gets 8 generic counters. However, not all events can be
programmed on any of the 8 counters.  This patch adds the
constraints for the MEM_* events which can only be measured on the
bottom 4 counters. The constraints are also valid when HT is off
because, then, there are only 4 generic counters and they are the
bottom counters.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1467411742-13245-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-03 10:39:53 +02:00
Dave Airlie
542d972221 Back-merge tag 'v4.7-rc5' into drm-next
Linux 4.7-rc5

The fsl-dcu pull needs -rc3 so go to -rc5 for now.
2016-07-02 15:56:01 +10:00
Sinan Kaya
487cf917ed Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
Trying to make the ISA and PCI init functionality common turned out
to be a bad idea, because the ISA path depends on external
functionality.

Restore the previous behavior and limit the refactoring to PCI
interrupts only.

Fixes: 1fcb6a813c "ACPI,PCI,IRQ: remove redundant code in acpi_irq_penalty_init()"
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Wim Osterholt <wim@djo.tudelft.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-02 01:38:34 +02:00
Herbert Xu
02fa472afe crypto: aesni - Use crypto_cipher to derive rfc4106 subkey
Currently aesni uses an async ctr(aes) to derive the rfc4106
subkey, which was presumably copied over from the generic rfc4106
code.  Over there it's done that way because we already have a
ctr(aes) spawn.  But it is simply overkill for aesni since we
have to go get a ctr(aes) from scratch anyway.

This patch simplifies the subkey derivation by using a straight
aes cipher instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-01 23:45:06 +08:00
Wanpeng Li
196f20ca52 KVM: vmx: fix missed cancellation of TSC deadline timer
INFO: rcu_sched detected stalls on CPUs/tasks:
 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R  running task        0  3529   3525 0x00080808
 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
? kvm_write_guest_cached+0xb9/0x160 [kvm]
? __delay+0xf/0x20
? wait_lapic_expire+0x14a/0x200 [kvm]
? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
? __fget+0x5/0x210
? do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
? SyS_ioctl+0x79/0x90
? do_syscall_64+0x7c/0x1e0
? entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced readily by running a full dynticks guest(since hrtimer
in guest is heavily used) w/ lapic_timer_advance disabled.

If fail to program hardware preemption timer, we will fallback to hrtimer based
method, however, a previous programmed preemption timer miss to cancel in this
scenario which results in one hardware preemption timer and one hrtimer emulated
tsc deadline timer run simultaneously. So sometimes the target guest deadline
tsc is earlier than guest tsc, which leads to the computation in vmx_set_hv_timer
can underflow and cause delta_tsc to be set a huge value, then host soft lockup
as above.

This patch fix it by cancelling the previous programmed preemption timer if there
is once we failed to program the new preemption timer and fallback to hrtimer
based method.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:42 +02:00
Wanpeng Li
bd97ad0e7e KVM: x86: introduce cancel_hv_tscdeadline
Introduce cancel_hv_tscdeadline() to encapsulate preemption
timer cancel stuff.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:41 +02:00
Paolo Bonzini
9175d2e97b KVM: vmx: fix underflow in TSC deadline calculation
If the TSC deadline timer is programmed really close to the deadline or
even in the past, the computation in vmx_set_hv_timer can underflow and
cause delta_tsc to be set to a huge value.  This generally results
in vmx_set_hv_timer returning -ERANGE, but we can fix it by limiting
delta_tsc to be positive or zero.

Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:39 +02:00
Paolo Bonzini
f2485b3e0c KVM: x86: use guest_exit_irqoff
This gains a few clock cycles per vmexit.  On Intel there is no need
anymore to enable the interrupts in vmx_handle_external_intr, since
we are using the "acknowledge interrupt on exit" feature.  AMD
needs to do that, and must be careful to avoid the interrupt shadow.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:38 +02:00
Paolo Bonzini
91fa0f8e9e KVM: x86: always use "acknowledge interrupt on exit"
This is necessary to simplify handle_external_intr in the next patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:36 +02:00
Paolo Bonzini
6edaa5307f KVM: remove kvm_guest_enter/exit wrappers
Use the functions from context_tracking.h directly.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:21 +02:00
Andy Shevchenko
0519e8b4cb x86/platform/intel-mid: Add pinctrl for Intel Merrifield
Intel Merrifield uses a special address space reserved for Family-Level
Interface Shim (FLIS) that allows consumers to mux and configure pins.

Create a platform device for it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1467226894-107109-1-git-send-email-andriy.shevchenko@linux.intel.com
[ Fixed typo. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 10:12:39 +02:00
Dave Hansen
4b3b234f43 x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G"
Len Brown noticed something was amiss in our INTEL_FAM6_*
definitions.  It seems like model 0x1F was a Nehalem part,
marketed as "Intel Core i7 and i5 Processors" (according to the
SDM).  But, although it was a Nehalem 0x1F had some uncore events
which were shared with Westmere.

Len also mentioned he thought it was called "Havendale", which
Wikipedia says was graphics-oriented and canceled:

	https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

So either way, it's probably not imporant what we call it, but
call it Nehalem to be accurate, and add a "G" since it seems
graphics-related.  If it were canceled that would be a good reason
why it's so sparsely and inconsistently referred to in the code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629192737.949C41A8@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 10:03:24 +02:00
Borislav Petkov
09c6c30e72 x86/amd_nb: Clean up init path
The initcall had unnecessary pr_notice() messages which are useless
noise on distro kernels.

Also, push the GART init error message where it belongs, *after* the
check whether the current hw we're loaded on, supports GART at all.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Battersby <tonyb@cybernetics.com>
Link: http://lkml.kernel.org/r/1466097230-5333-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 09:36:12 +02:00
Ingo Molnar
35c88cabb1 Merge branch 'x86/urgent' into x86/cpu, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 09:35:49 +02:00
Borislav Petkov
1ead852dd8 x86/amd_nb: Fix boot crash on non-AMD systems
Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.

AMD northbridges users call amd_cache_northbridges() and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.

At least, it should do so as all its callers expect it to do so. But it
does return a negative value only when kmalloc() fails.

Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
users like amd64_edac, for example, which relies on it to know whether
it should load or not, gets loaded on systems like Intel Xeons where it
shouldn't.

Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-01 09:35:35 +02:00
Rafael J. Wysocki
65c0554b73 x86/power/64: Fix kernel text mapping corruption during image restoration
Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-30 23:57:15 +02:00
Dave Hansen
8eda072e9d x86/cpufeature: Add helper macro for mask check macros
Every time we add a word to our cpu features, we need to add
something like this in two places:

	(((bit)>>5)==16 && (1UL<<((bit)&31) & REQUIRED_MASK16))

The trick is getting the "16" in this case in both places.  I've
now screwed this up twice, so as pennance, I've come up with
this patch to keep me and other poor souls from doing the same.

I also commented the logic behind the bit manipulation showcased
above.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629200110.1BA8949E@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-30 09:11:32 +02:00
Dave Hansen
1e61f78baf x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
x86 has two macros which allow us to evaluate some CPUID-based
features at compile time:

	REQUIRED_MASK_BIT_SET()
	DISABLED_MASK_BIT_SET()

They're both defined by having the compiler check the bit
argument against some constant masks of features.

But, when adding new CPUID leaves, we need to check new words
for these macros.  So make sure that those macros and the
REQUIRED_MASK* and DISABLED_MASK* get updated when necessary.

This looks kinda silly to have an open-coded value ("18" in
this case) open-coded in 5 places in the code.  But, we really do
need 5 places updated when NCAPINTS gets bumped, so now we just
force the issue.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629200108.92466F6F@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-30 09:11:32 +02:00
Dave Hansen
6e17cb9c2d x86/cpufeature: Update cpufeaure macros
We had a new CPUID "NCAPINT" word added, but the REQUIRED_MASK and
DISABLED_MASK macros did not get updated.  Update them.

None of the features was needed in these masks, so there was no
harm, but we should keep them updated anyway.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160629200107.8D3C9A31@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-30 09:11:32 +02:00
Megha Dey
bee5cfd9f6 crypto: sha512-mb - Crypto computation (x4 AVX2)
This patch introduces the assembly routines to do SHA512 computation on
buffers belonging to several jobs at once. The assembly routines are
optimized with AVX2 instructions that have 4 data lanes and using AVX2
registers.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:40 +08:00
Megha Dey
2cdacb68d7 crypto: sha512-mb - Algorithm data structures
This patch introduces the data structures and prototypes of functions
needed for computing SHA512 hash using multi-buffer. Included are the
structures of the multi-buffer SHA512 job, job scheduler in C and x86
assembly.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:39 +08:00
Megha Dey
45691e2d9b crypto: sha512-mb - submit/flush routines for AVX2
This patch introduces the routines used to submit and flush buffers
belonging to SHA512 crypto jobs to the SHA512 multibuffer algorithm.
It is implemented mostly in assembly optimized with AVX2 instructions.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:38 +08:00
Megha Dey
8c603ff286 crypto: sha512-mb - SHA512 multibuffer job manager and glue code
This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several SHA512 jobs to the
multi-buffer algorithm. It also contains the flush routine that's called
by the crypto daemon to complete the job when no new jobs arrive before
the deadline of maximum latency of a SHA512 crypto job.

The SHA512 multi-buffer crypto algorithm is defined and initialized in this
patch.

Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-06-28 16:06:35 +08:00
Quentin Casasnovas
ff30ef40de KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.
I couldn't get Xen to boot a L2 HVM when it was nested under KVM - it was
getting a GP(0) on a rather unspecial vmread from Xen:

     (XEN) ----[ Xen-4.7.0-rc  x86_64  debug=n  Not tainted ]----
     (XEN) CPU:    1
     (XEN) RIP:    e008:[<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
     (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
     (XEN) rax: ffff82d0801e6288   rbx: ffff83003ffbfb7c   rcx: fffffffffffab928
     (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff83000bdd0000
     (XEN) rbp: ffff83000bdd0000   rsp: ffff83003ffbfab0   r8:  ffff830038813910
     (XEN) r9:  ffff83003faf3958   r10: 0000000a3b9f7640   r11: ffff83003f82d418
     (XEN) r12: 0000000000000000   r13: ffff83003ffbffff   r14: 0000000000004802
     (XEN) r15: 0000000000000008   cr0: 0000000080050033   cr4: 00000000001526e0
     (XEN) cr3: 000000003fc79000   cr2: 0000000000000000
     (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
     (XEN) Xen code around <ffff82d0801e629e> (vmx_get_segment_register+0x14e/0x450):
     (XEN)  00 00 41 be 02 48 00 00 <44> 0f 78 74 24 08 0f 86 38 56 00 00 b8 08 68 00
     (XEN) Xen stack trace from rsp=ffff83003ffbfab0:

     ...

     (XEN) Xen call trace:
     (XEN)    [<ffff82d0801e629e>] vmx_get_segment_register+0x14e/0x450
     (XEN)    [<ffff82d0801f3695>] get_page_from_gfn_p2m+0x165/0x300
     (XEN)    [<ffff82d0801bfe32>] hvmemul_get_seg_reg+0x52/0x60
     (XEN)    [<ffff82d0801bfe93>] hvm_emulate_prepare+0x53/0x70
     (XEN)    [<ffff82d0801ccacb>] handle_mmio+0x2b/0xd0
     (XEN)    [<ffff82d0801be591>] emulate.c#_hvm_emulate_one+0x111/0x2c0
     (XEN)    [<ffff82d0801cd6a4>] handle_hvm_io_completion+0x274/0x2a0
     (XEN)    [<ffff82d0801f334a>] __get_gfn_type_access+0xfa/0x270
     (XEN)    [<ffff82d08012f3bb>] timer.c#add_entry+0x4b/0xb0
     (XEN)    [<ffff82d08012f80c>] timer.c#remove_entry+0x7c/0x90
     (XEN)    [<ffff82d0801c8433>] hvm_do_resume+0x23/0x140
     (XEN)    [<ffff82d0801e4fe7>] vmx_do_resume+0xa7/0x140
     (XEN)    [<ffff82d080164aeb>] context_switch+0x13b/0xe40
     (XEN)    [<ffff82d080128e6e>] schedule.c#schedule+0x22e/0x570
     (XEN)    [<ffff82d08012c0cc>] softirq.c#__do_softirq+0x5c/0x90
     (XEN)    [<ffff82d0801602c5>] domain.c#idle_loop+0x25/0x50
     (XEN)
     (XEN)
     (XEN) ****************************************
     (XEN) Panic on CPU 1:
     (XEN) GENERAL PROTECTION FAULT
     (XEN) [error_code=0000]
     (XEN) ****************************************

Tracing my host KVM showed it was the one injecting the GP(0) when
emulating the VMREAD and checking the destination segment permissions in
get_vmx_mem_address():

     3)               |    vmx_handle_exit() {
     3)               |      handle_vmread() {
     3)               |        nested_vmx_check_permission() {
     3)               |          vmx_get_segment() {
     3)   0.074 us    |            vmx_read_guest_seg_base();
     3)   0.065 us    |            vmx_read_guest_seg_selector();
     3)   0.066 us    |            vmx_read_guest_seg_ar();
     3)   1.636 us    |          }
     3)   0.058 us    |          vmx_get_rflags();
     3)   0.062 us    |          vmx_read_guest_seg_ar();
     3)   3.469 us    |        }
     3)               |        vmx_get_cs_db_l_bits() {
     3)   0.058 us    |          vmx_read_guest_seg_ar();
     3)   0.662 us    |        }
     3)               |        get_vmx_mem_address() {
     3)   0.068 us    |          vmx_cache_reg();
     3)               |          vmx_get_segment() {
     3)   0.074 us    |            vmx_read_guest_seg_base();
     3)   0.068 us    |            vmx_read_guest_seg_selector();
     3)   0.071 us    |            vmx_read_guest_seg_ar();
     3)   1.756 us    |          }
     3)               |          kvm_queue_exception_e() {
     3)   0.066 us    |            kvm_multiple_exception();
     3)   0.684 us    |          }
     3)   4.085 us    |        }
     3)   9.833 us    |      }
     3) + 10.366 us   |    }

Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
Developper Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
Control Structure", I found that we're enforcing that the destination
operand is NOT located in a read-only data segment or any code segment when
the L1 is in long mode - BUT that check should only happen when it is in
protected mode.

Shuffling the code a bit to make our emulation follow the specification
allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
without problems.

Fixes: f9eb4af67c ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions")
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Eugene Korenevsky <ekorenevsky@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:30:44 +02:00
Marcelo Tosatti
b606f189c7 KVM: LAPIC: cap __delay at lapic_timer_advance_ns
The host timer which emulates the guest LAPIC TSC deadline
timer has its expiration diminished by lapic_timer_advance_ns
nanoseconds. Therefore if, at wait_lapic_expire, a difference
larger than lapic_timer_advance_ns is encountered, delay at most
lapic_timer_advance_ns.

This fixes a problem where the guest can cause the host
to delay for large amounts of time.

Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:30:41 +02:00
Marcelo Tosatti
8d93c874ac KVM: x86: move nsec_to_cycles from x86.c to x86.h
Move the inline function nsec_to_cycles from x86.c to x86.h, as
the next patch uses it from lapic.c.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:30:38 +02:00
Minfei Huang
ed911b43ad pvclock: Get rid of __pvclock_read_cycles in function pvclock_read_flags
There is a generic function __pvclock_read_cycles to be used to get both
flags and cycles. For function pvclock_read_flags, it's useless to get
cycles value. To make this function be more effective, get this variable
flags directly in function.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:12:15 +02:00
Minfei Huang
f7550d076d pvclock: Cleanup to remove function pvclock_get_nsec_offset
Function __pvclock_read_cycles is short enough, so there is no need to
have another function pvclock_get_nsec_offset to calculate tsc delta.
It's better to combine it into function __pvclock_read_cycles.

Remove useless variables in function __pvclock_read_cycles.

Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:12:14 +02:00
Minfei Huang
749d088b8e pvclock: Add CPU barriers to get correct version value
Protocol for the "version" fields is: hypervisor raises it (making it
uneven) before it starts updating the fields and raises it again (making
it even) when it is done.  Thus the guest can make sure the time values
it got are consistent by checking the version before and after reading
them.

Add CPU barries after getting version value just like what function
vread_pvclock does, because all of callees in this function is inline.

Fixes: 502dfeff23
Cc: stable@vger.kernel.org
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-27 15:12:14 +02:00
Arnd Bergmann
b684e9bc75 x86/efi: Remove the unused efi_get_time() function
Nothing calls the efi_get_time() function on x86, but it does suffer
from the 32-bit time_t overflow in 2038.

This removes the function, we can always put it back in case we need
it later.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:58 +02:00
Alex Thorlton
21f866257c x86/efi: Update efi_thunk() to use the the arch_efi_call_virt*() macros
Currently, the efi_thunk macro has some semi-duplicated code in it that
can be replaced with the arch_efi_call_virt_setup/teardown macros. This
commit simply replaces the duplicated code with those macros.

Suggested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-7-git-send-email-matt@codeblueprint.co.uk
[ Renamed variables to the standard __ prefix. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:57 +02:00
Alex Thorlton
d1be84a232 x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()
Now that the efi_call_virt() macro has been generalized to be able to
use EFI system tables besides efi.systab, we are able to convert our
uv_bios_call() wrapper to use this standard EFI callback mechanism.

This simple change is part of a much larger effort to recover from some
issues with the way we were mapping in some of our MMRs, and the way
that we were doing our BIOS callbacks, which were uncovered by commit
67a9108ed4 ("x86/efi: Build our own page table structures").

The first issue that this uncovered was that we were relying on the EFI
memory mapping mechanism to map in our MMR space for us, which, while
reliable, was technically a bug, as it relied on "undefined" behavior in
the mapping code.

The reason we were able to piggyback on the EFI memory mapping code to
map in our MMRs was because, previously, EFI code used the
trampoline_pgd, which shares a few entries with the main kernel pgd.  It
just so happened, that the memory range containing our MMRs was inside
one of those shared regions, which kept our code working without issue
for quite a while.

Anyways, once we discovered this problem, we brought back our original
code to map in the MMRs with commit:

  08914f436b ("x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init")

This got our systems a little further along, but we were still running
into trouble with our EFI callbacks, which prevented us from booting
all the way up.

Our first step towards fixing the BIOS callbacks was to get our
uv_bios_call() wrapper updated to use efi_call_virt() instead of the plain
efi_call().  The previous patch took care of the effort needed to make
that possible.  Along the way, we hit a major issue with some confusion
about how to properly pull arguments higher than number 6 off the stack
in the efi_call() code, which resulted in the following commit from Linus:

  683ad8092c ("x86/efi: Fix 7-parameter efi_call()s")

Now that all of those issues are out of the way, we're able to make this
simple change to use the new efi_call_virt_pointer() in uv_bios_call()
which gets our machines booting, running properly, and able to execute our
callbacks with 6+ arguments.

Note that, since we are now using the EFI page table when we make our
function call, we are no longer able to make the call using the __va()
of our function pointer, since the memory range containing that address
isn't mapped into the EFI page table.  For now, we will use the physical
address of the function directly, since that is mapped into the EFI page
table.  In the near future, we're going to get some code added in to
properly update our function pointer to its virtual address during
SetVirtualAddressMap.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:56 +02:00
Alex Thorlton
80e7559607 efi: Convert efi_call_virt() to efi_call_virt_pointer()
This commit makes a few slight modifications to the efi_call_virt() macro
to get it to work with function pointers that are stored in locations
other than efi.systab->runtime, and renames the macro to
efi_call_virt_pointer().  The majority of the changes here are to pull
these macros up into header files so that they can be accessed from
outside of drivers/firmware/efi/runtime-wrappers.c.

The most significant change not directly related to the code move is to
add an extra "p" argument into the appropriate efi_call macros, and use
that new argument in place of the, formerly hard-coded,
efi.systab->runtime pointer.

The last piece of the puzzle was to add an efi_call_virt() macro back into
drivers/firmware/efi/runtime-wrappers.c to wrap around the new
efi_call_virt_pointer() macro - this was mainly to keep the code from
looking too cluttered by adding a bunch of extra references to
efi.systab->runtime everywhere.

Note that I also broke up the code in the efi_call_virt_pointer() macro a
bit in the process of moving it.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roy Franz <roy.franz@linaro.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:56 +02:00
Colin Ian King
f6d1747f89 x86/efi: Remove unused variable 'efi'
Remove unused variable 'efi', it is never used. This fixes the following
clang build warning:

  arch/x86/boot/compressed/eboot.c:803:2: warning: Value stored to 'efi' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1466839230-12781-4-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 13:06:55 +02:00
Borislav Petkov
dbf984d825 x86/boot/64: Add forgotten end of function marker
Add secondary_startup_64()'s ENDPROC() marker.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160625112457.16930-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 12:20:31 +02:00
Ingo Molnar
630741fb60 Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:35:02 +02:00
Peter Zijlstra
d4cf1949f9 perf/x86/intel: Add {rd,wr}lbr_{to,from} wrappers
The whole rdmsr()/wrmsr() for lbr_from got a little unweildy with the
sign extension quirk, provide a few simple wrappers to clean things up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:21 +02:00
David Carrillo-Cisneros
71adae99ed perf/x86/intel: Add MSR_LAST_BRANCH_FROM_x quirk for ctx switch
Add quirk for context switch to save/restore the value of
MSR_LAST_BRANCH_FROM_x when LBR is enabled and there is potential for
kernel addresses to be in the lbr_from register.

To test this patch, use a perf tool and kernel with the patch
next in this series. That patch removes the work around that masked
the hw bug:

  $ ./lbr_perf record --call-graph lbr -e cycles:k sleep 1

where lbr_perf is the patched perf tool, that allows to specify :k
on lbr mode. The above command will trigger a #GPF :

 WARNING: CPU: 28 PID: 14096 at arch/x86/mm/extable.c:65 ex_handler_wrmsr_unsafe+0x70/0x80
 unchecked MSR access error: WRMSR to 0x681 (tried to write 0x1fffffff81010794)
 ...
 Call Trace:
  [<ffffffff8167af49>] dump_stack+0x4d/0x63
  [<ffffffff810b9b15>] __warn+0xe5/0x100
  [<ffffffff810b9be9>] warn_slowpath_fmt+0x49/0x50
  [<ffffffff810abb40>] ex_handler_wrmsr_unsafe+0x70/0x80
  [<ffffffff810abc42>] fixup_exception+0x42/0x50
  [<ffffffff81079d1a>] do_general_protection+0x8a/0x160
  [<ffffffff81684ec2>] general_protection+0x22/0x30
  [<ffffffff810101b9>] ? intel_pmu_lbr_sched_task+0xc9/0x380
  [<ffffffff81009d7c>] intel_pmu_sched_task+0x3c/0x60
  [<ffffffff81003a2b>] x86_pmu_sched_task+0x1b/0x20
  [<ffffffff81192a5b>] perf_pmu_sched_task+0x6b/0xb0
  [<ffffffff8119746d>] __perf_event_task_sched_in+0x7d/0x150
  [<ffffffff810dd9dc>] finish_task_switch+0x15c/0x200
  [<ffffffff8167f894>] __schedule+0x274/0x6cc
  [<ffffffff8167fdd9>] schedule+0x39/0x90
  [<ffffffff81675398>] exit_to_usermode_loop+0x39/0x89
  [<ffffffff810028ce>] prepare_exit_to_usermode+0x2e/0x30
  [<ffffffff81683c1b>] retint_user+0x8/0x10

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-5-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:20 +02:00
David Carrillo-Cisneros
3812bba84f perf/x86/intel: Fix trivial formatting and style bug
Replace spaces by tabs in LBR_FROM_* constants to align with newly
defined constant. Use BIT_ULL.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-4-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:19 +02:00
David Carrillo-Cisneros
19fc9ddd61 perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug when no TSX
Intel's SDM states that bits 61:62 in MSR_LAST_BRANCH_FROM_x are the
TSX flags for formats with LBR_TSX flags (i.e. LBR_FORMAT_EIP_EFLAGS2).

However, when the CPU has TSX support deactivated, bits 61:62 actually
behave as follows:

  - For wrmsr(), bits 61:62 are considered part of the sign extension.
  - When capturing branches, the LBR hw will always clear bits 61:62.
    regardless of the sign extension.

Therefore, if:

  1) LBR has TSX format.
  2) CPU has no TSX support enabled.

... then any value passed to wrmsr() must be sign extended to 63 bits
and any value from rdmsr() must be converted to have a sign extension
of 61 bits, ignoring the values at TSX flags.

This bug was masked by the work-around to the Intel's CPU bug:
BJ94. "LBR May Contain Incorrect Information When Using FREEZE_LBRS_ON_PMI"
in Document Number: 324643-037US.

The aforementioned work-around uses hw flags to filter out all kernel
branches, limiting LBR callstack to user level execution only.

Since user addresses are not sign extended, they do not trigger the wrmsr()
bug in MSR_LAST_BRANCH_FROM_x when saved/restored at context switch.

To verify the hw bug:

  $ perf record -b -e cycles sleep 1
  $ rdmsr -p 0 0x680
  0x1fffffffb0b9b0cc
  $ wrmsr -p 0 0x680 0x1fffffffb0b9b0cc
  write(): Input/output error

The quirk for LBR_FROM_ MSRs is required before calls to wrmsrl() and
after rdmsrl().

This patch introduces it for wrmsrl()'s done for testing LBR support.

Future patch in series adds the quirk for context switch, that would
be required if LBR callstack is to be enabled for ring 0.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-3-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:19 +02:00
David Carrillo-Cisneros
f09509b939 perf/x86/intel: Print LBR support statement after validation
The following commit:

  338b522ca4 ("perf/x86/intel: Protect LBR and extra_regs against KVM lying")

added an additional test to LBR support detection that is performed after
printing the LBR support statement to dmesg.

Move the LBR support output after the very last test, to make sure we
print the true status of LBR support.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1466533874-52003-2-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-27 11:34:18 +02:00