Commit Graph

22740 Commits

Author SHA1 Message Date
Linus Torvalds
f0d8690ad4 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "This includes a fix for two oopses, one on PPC and on x86.

  The rest is fixes for bugs with newer Intel processors"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm/fpu: Enable eager restore kvm FPU for MPX
  Revert "KVM: x86: drop fpu_activate hook"
  kvm: fix crash in kvm_vcpu_reload_apic_access_page
  KVM: MMU: fix SMAP virtualization
  KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages
  KVM: MMU: fix smap permission check
  KVM: PPC: Book3S HV: Fix list traversal in error case
2015-05-21 20:15:16 -07:00
Alexei Starovoitov
b52f00e6a7 x86: bpf_jit: implement bpf_tail_call() helper
bpf_tail_call() arguments:
ctx - context pointer
jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
index - index in the jump table

In this implementation x64 JIT bypasses stack unwind and jumps into the
callee program after prologue, so the callee program reuses the same stack.

The logic can be roughly expressed in C like:

u32 tail_call_cnt;

void *jumptable[2] = { &&label1, &&label2 };

int bpf_prog1(void *ctx)
{
label1:
    ...
}

int bpf_prog2(void *ctx)
{
label2:
    ...
}

int bpf_prog1(void *ctx)
{
    ...
    if (tail_call_cnt++ < MAX_TAIL_CALL_CNT)
        goto *jumptable[index]; ... and pass my 'ctx' to callee ...

    ... fall through if no entry in jumptable ...
}

Note that 'skip current program epilogue and next program prologue' is
an optimization. Other JITs don't have to do it the same way.
>From safety point of view it's valid as well, since programs always
initialize the stack before use, so any residue in the stack left by
the current program is not going be read. The same verifier checks are
done for the calls from the kernel into all bpf programs.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:07:59 -04:00
Paolo Bonzini
a9b4fb7e79 Merge branch 'kvm-master' into kvm-next
Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU
deactivation patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:31:37 +02:00
Liang Li
c447e76b4c kvm/fpu: Enable eager restore kvm FPU for MPX
The MPX feature requires eager KVM FPU restore support. We have verified
that MPX cannot work correctly with the current lazy KVM FPU restore
mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
exposed to VM.

Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
[Also activate the FPU on AMD processors. - Paolo]
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:26 +02:00
Paolo Bonzini
0fdd74f778 Revert "KVM: x86: drop fpu_activate hook"
This reverts commit 4473b570a7.  We'll
use the hook again.

Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:15 +02:00
Andrea Arcangeli
e8fd5e9e99 kvm: fix crash in kvm_vcpu_reload_apic_access_page
memslot->userfault_addr is set by the kernel with a mmap executed
from the kernel but the userland can still munmap it and lead to the
below oops after memslot->userfault_addr points to a host virtual
address that has no vma or mapping.

[  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
[  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
[  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
[  327.538529] Oops: 0000 [#1] SMP
[  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
[  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
[  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
[  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
[  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
[  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
[  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
[  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
[  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
[  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
[  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
[  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
[  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
[  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
[  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  327.540974] Stack:
[  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
[  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
[  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
[  327.541261] Call Trace:
[  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
[  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
[  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
[  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
[  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
[  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
[  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
[  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
[  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
[  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
[  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
[  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 12:30:06 +02:00
Ingo Molnar
6f56a8d024 Merge branch 'x86/urgent' into x86/fpu, to resolve a conflict
Conflicts:
	arch/x86/kernel/i387.c

This commit is conflicting:

  e88221c50c ("x86/fpu: Disable XSAVES* support for now")

These functions changed a lot, move the quirk to arch/x86/kernel/fpu/init.c's
fpu__init_system_xstate_size_legacy().

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 12:01:01 +02:00
Ingo Molnar
e88221c50c x86/fpu: Disable XSAVES* support for now
The kernel's handling of 'compacted' xsave state layout is buggy:

    http://marc.info/?l=linux-kernel&m=142967852317199

I don't have such a system, and the description there is vague, but
from extrapolation I guess that there were two kinds of bugs
observed:

  - boot crashes, due to size calculations being wrong and the dynamic
    allocation allocating a too small xstate area. (This is now fixed
    in the new FPU code - but still present in stable kernels.)

  - FPU state corruption and ABI breakage: if signal handlers try to
    change the FPU state in standard format, which then the kernel
    tries to restore in the compacted format.

These breakages are scary, but they only occur on a small number of
systems that have XSAVES* CPU support. Yet we have had XSAVES support
in the upstream kernel for a large number of stable kernel releases,
and the fixes are involved and unproven.

So do the safe resolution first: disable XSAVES* support and only
use the standard xstate format. This makes the code work and is
easy to backport.

On top of this we can work on enabling (and testing!) proper
compacted format support, without backporting pressure, on top of the
new, cleaned up FPU code.

Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:58:26 +02:00
Nicholas Krause
ed3cf15271 kvm: x86: Make functions that have no external callers static
This makes the functions kvm_guest_cpu_init and  kvm_init_debugfs
static now due to having no external callers outside their
declarations in the file, kvm.c.

Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-20 11:48:21 +02:00
Ingo Molnar
5856afed0c x86/fpu/init: Clean up and comment the __setup() functions
Explain the functions and also standardize their style
and naming.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:39:35 +02:00
Ingo Molnar
7cf82d33b6 x86/fpu/init: Move __setup() functions to fpu/init.c
We had a number of FPU init related boot option handlers
in arch/x86/kernel/cpu/common.c - move them over into
arch/x86/kernel/fpu/init.c to have them all in a
single place.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 11:35:42 +02:00
Ingo Molnar
b11ca7fbc7 x86/fpu/xstate: Use explicit parameter in xstate_fault()
While looking at xstate.h it took me some time to realize that
'xstate_fault' uses 'err' as a silent parameter. This is not
obvious at the call site, at all.

Make it an explicit macro argument, so that the syntactic
connection is easier to see. Also explain xstate_fault()
a bit.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-20 10:02:22 +02:00
Dave Airlie
bdcddf95e8 Backmerge v4.1-rc4 into into drm-next
We picked up a silent conflict in amdkfd with drm-fixes and drm-next,
backmerge v4.1-rc5 and fix the conflicts

Signed-off-by: Dave Airlie <airlied@redhat.com>

Conflicts:
	drivers/gpu/drm/drm_irq.c
2015-05-20 16:23:53 +10:00
Paolo Bonzini
3520469d65 KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async
gfn_to_pfn_async is used in just one place, and because of x86-specific
treatment that place will need to look at the memory slot.  Hence inline
it into try_async_pf and export __gfn_to_pfn_memslot.

The patch also switches the subsequent call to gfn_to_pfn_prot to use
__gfn_to_pfn_memslot.  This is a small optimization.  Finally, remove
the now-unused async argument of __gfn_to_pfn.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:45 +02:00
Xiao Guangrong
d81135a57a KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed
CR0.CD and CR0.NW are not used by shadow page table so that need
not adjust mmu if these two bit are changed

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:43 +02:00
Xiao Guangrong
efdfe536d8 KVM: MMU: fix MTRR update
Currently, whenever guest MTRR registers are changed
kvm_mmu_reset_context is called to switch to the new root shadow page
table, however, it's useless since:
1) the cache type is not cached into shadow page's attribute so that
   the original root shadow page will be reused

2) the cache type is set on the last spte, that means we should sync
   the last sptes when MTRR is changed

This patch fixs this issue by drop all the spte in the gfn range which
is being updated by MTRR

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:42 +02:00
Xiao Guangrong
d69afbc6b1 KVM: MMU: fix decoding cache type from MTRR
There are some bugs in current get_mtrr_type();
1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
   IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
   that means other bits are ignored if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of
   mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)

3: if MTRR is disabled, UC is applied to all of physical memory rather
   than mtrr_state->def_type

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:42 +02:00
Xiao Guangrong
6a49f85c7a KVM: MMU: introduce kvm_zap_rmapp
Split kvm_unmap_rmapp and introduce kvm_zap_rmapp which will be used in the
later patch

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:41 +02:00
Xiao Guangrong
d77aa73c70 KVM: MMU: use slot_handle_level and its helper to clean up the code
slot_handle_level and its helper functions are ready now, use them to
clean up the code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:41 +02:00
Xiao Guangrong
1bad2b2a3b KVM: MMU: introduce slot_handle_level_range() and its helpers
There are several places walking all rmaps for the memslot so that
introduce common functions to cleanup the code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:40 +02:00
Xiao Guangrong
6ce1f4e295 KVM: MMU: introduce for_each_slot_rmap_range
It's used to abstract the code from kvm_handle_hva_range and it will be
used by later patch

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:39 +02:00
Xiao Guangrong
8a3d08f16f KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:39 +02:00
Xiao Guangrong
0d5367900a KVM: MMU: introduce for_each_rmap_spte()
It's used to walk all the sptes on the rmap to clean up the
code

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:38 +02:00
Paolo Bonzini
c35ebbeade Revert "kvmclock: set scheduler clock stable"
This reverts commit ff7bbb9c6a.
Sasha Levin is seeing odd jump in time values during boot of a KVM guest:

[...]
[    0.000000] tsc: Detected 2260.998 MHz processor
[3376355.247558] Calibrating delay loop (skipped) preset value..
[...]

and bisected them to this commit.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:37 +02:00
Xiao Guangrong
edc90b7dc4 KVM: MMU: fix SMAP virtualization
KVM may turn a user page to a kernel page when kernel writes a readonly
user page if CR0.WP = 1. This shadow page entry will be reused after
SMAP is enabled so that kernel is allowed to access this user page

Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
once CR4.SMAP is updated

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:36 +02:00
Nadav Amit
428e3d0857 KVM: x86: Fix zero iterations REP-string
When a REP-string is executed in 64-bit mode with an address-size prefix,
ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel
CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits
of the pointers in MOVS/STOS. This behavior is specific to Intel according to
few experiments.

As one may guess, this is an undocumented behavior. Yet, it is observable in
the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that
VMware appears to get it right.  The behavior can be observed using the
following code:

 #include <stdio.h>

 #define LOW_MASK	(0xffffffff00000000ull)
 #define ALL_MASK	(0xffffffffffffffffull)
 #define TEST(opcode)							\
	do {								\
	asm volatile(".byte 0xf2 \n\t .byte 0x67 \n\t .byte " opcode "\n\t" \
			: "=S"(s), "=c"(c), "=D"(d) 			\
			: "S"(ALL_MASK), "c"(LOW_MASK), "D"(ALL_MASK));	\
	printf("opcode %s rcx=%llx rsi=%llx rdi=%llx\n",		\
		opcode, c, s, d);					\
	} while(0)

void main()
{
	unsigned long long s, d, c;
	iopl(3);
	TEST("0x6c");
	TEST("0x6d");
	TEST("0x6e");
	TEST("0x6f");
	TEST("0xa4");
	TEST("0xa5");
	TEST("0xa6");
	TEST("0xa7");
	TEST("0xaa");
	TEST("0xab");
	TEST("0xae");
	TEST("0xaf");
}

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:36 +02:00
Nadav Amit
ee122a7109 KVM: x86: Fix update RCX/RDI/RSI on REP-string
When REP-string instruction is preceded with an address-size prefix,
ECX/EDI/ESI are used as the operation counter and pointers.  When they are
updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they
are updated on every 32-bit register operation.  Fix it.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:35 +02:00
Nadav Amit
3db176d5b4 KVM: x86: Fix DR7 mask on task-switch while debugging
If the host sets hardware breakpoints to debug the guest, and a task-switch
occurs in the guest, the architectural DR7 will not be updated. The effective
DR7 would be updated instead.

This fix puts the DR7 update during task-switch emulation, so it now uses the
standard DR setting mechanism instead of the one that was previously used. As a
bonus, the update of DR7 will now be effective for AMD as well.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-19 20:52:35 +02:00
Christoph Hellwig
c546d5db75 remove scatterlist.h generation from arch Kbuild files
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19 09:14:34 -06:00
Thomas Gleixner
c3b5d3cea5 Merge branch 'linus' into timers/core
Make sure the upstream fixes are applied before adding further
modifications.
2015-05-19 16:12:32 +02:00
Feng Wu
501b32653e x86/irq: Show statistics information for posted-interrupts
Show the statistics information for notification event
and wakeup event for posted-interrupt in /proc/interrupts.

[ tglx: Named the short identifiers PIN and PIW to match the long
  	identifiers ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1432026437-16560-5-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Feng Wu
f6b3c72c23 x86/irq: Define a global vector for VT-d Posted-Interrupts
Currently, we use a global vector as the Posted-Interrupts
Notification Event for all the vCPUs in the system. We need
to introduce another global vector for VT-d Posted-Interrtups,
which will be used to wakeup the sleep vCPU when an external
interrupt from a direct-assigned device happens for that vCPU.

[ tglx: Removed a gazillion of extra newlines ]

Signed-off-by: Feng Wu <feng.wu@intel.com>
Cc: jiang.liu@linux.intel.com
Link: http://lkml.kernel.org/r/1432026437-16560-4-git-send-email-feng.wu@intel.com
Suggested-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Feng Wu
a2f1c8bdc0 x86/irq/msi: Implement irq_set_vcpu_affinity for remapped MSI irqs
Implement irq_set_vcpu_affinity for pci_msi_ir_controller.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1432026437-16560-3-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 15:51:17 +02:00
Ingo Molnar
b1b64dc355 x86/fpu: Reorganize fpu/internal.h
fpu/internal.h has grown organically, with not much high level structure,
which hurts its readability.

Organize the various definitions into 5 sections:

 - high level FPU state functions
 - FPU/CPU feature flag helpers
 - fpstate handling functions
 - FPU context switching helpers
 - misc helper functions

Other related changes:

 - Move MXCSR_DEFAULT to fpu/types.h.
 - drop the unused X87_FSW_ES define

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
e97131a839 x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
There are various internal FPU state debugging checks that never
trigger in practice, but which are useful for FPU code development.

Separate these out into CONFIG_X86_DEBUG_FPU=y, and also add a
couple of new ones.

The size difference is about 0.5K of code on defconfig:

   text        data     bss          filename
   15028906    2578816  1638400      vmlinux
   15029430    2578816  1638400      vmlinux

( Keep this enabled by default until the new FPU code is debugged. )

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
d364a7656c x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
I tried to simulate an ancient CPU via this option, and
found that it still has fxsr_opt enabled, confusing the
FPU code.

Make the 'nofxsr' option also clear FXSR_OPT flag.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:12 +02:00
Ingo Molnar
e1884d69f6 x86/fpu: Pass 'struct fpu' to fpu__restore()
This cleans up the call sites and the function a bit,
and also makes it more symmetric with the other high
level FPU state handling functions.

It's still only valid for the current task, as we copy
to the FPU registers of the current CPU.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
32231879f6 x86/fpu/init: Propagate __init annotations
Now that all the FPU init function call dependencies are
cleaned up we can propagate __init annotations deeper.

This shrinks the runtime size of the kernel a bit, and
also addresses a few section warnings.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
5fd402dfa7 x86/fpu/xstate: Clean up setup_xstate_comp() call
So call setup_xstate_comp() from the xstate init code, not
from the generic fpu__init_system() code.

This allows us to remove the protytype from xstate.h as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:11 +02:00
Ingo Molnar
489e9c0188 x86/fpu: Clean up xstate feature reservation
Put MPX support into its separate high level structure, and
also replace the fixed YMM, LWP and MPX structures in
xregs_state with just reservations - their exact offsets
in the structure will depend on the CPU and no code actually
relies on those fields.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
39f1acd243 x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
The current xstate code in setup_xstate_features() assumes that
the first zero bit means the end of xfeatures - but that is not
so, the SDM clearly states that an arbitrary set of xfeatures
might be enabled - and it is also clear from the description
of the compaction feature that holes are possible:

  "13-6 Vol. 1MANAGING STATE USING THE XSAVE FEATURE SET
  [...]

  Compacted format. Each state component i (i ≥ 2) is located at a byte
  offset from the base address of the XSAVE area based on the XCOMP_BV
  field in the XSAVE header:

  — If XCOMP_BV[i] = 0, state component i is not in the XSAVE area.

  — If XCOMP_BV[i] = 1, the following items apply:

  • If XCOMP_BV[j] = 0 for every j, 2 ≤ j < i, state component i is
    located at a byte offset 576 from the base address of the XSAVE
    area. (This item applies if i is the first bit set in bits 62:2 of
    the XCOMP_BV; it implies that state component i is located at the
    beginning of the extended region.)

  • Otherwise, let j, 2 ≤ j < i, be the greatest value such that
    XCOMP_BV[j] = 1. Then state component i is located at a byte offset
    X from the location of state component j, where X is the number of
    bytes required for state component j as enumerated in
    CPUID.(EAX=0DH,ECX=j):EAX. (This item implies that state component i
    immediately follows the preceding state component whose bit is set
    in XCOMP_BV.)"

So don't assume that the first zero xfeatures bit means the end of
all xfeatures - iterate through all of them.

I'm not aware of hardware that triggers this currently.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
63c6680cd0 x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
kernel_fpu_begin() is __kernel_fpu_begin() with a preempt_disable().

Move the kernel_fpu_begin() debugging check into __kernel_fpu_begin(),
so that users of __kernel_fpu_begin() may benefit from it as well.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:10 +02:00
Ingo Molnar
bdf80d1040 x86/fpu: Document the various fpregs state formats
Document all the structures that make up 'struct fpu'.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
aeb997b9f2 x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
Improve the memory layout of 'struct fpu':

 - change ->fpregs_active from 'int' to 'char' - it's just a single flag
   and modern x86 CPUs can do efficient byte accesses.

 - pack related fields closer to each other: often 'fpu->state' will not be
   touched, while the other fields will - so pack them into a group.

Also add comments to each field, describing their purpose, and add
some background information about lazy restores.

Also fix an obsolete, lazy switching related comment in fpu_copy()'s description.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
c47ada305d x86/fpu: Harmonize FPU register state types
Use these consistent names:

    struct fregs_state           # was: i387_fsave_struct
    struct fxregs_state          # was: i387_fxsave_struct
    struct swregs_state          # was: i387_soft_struct
    struct xregs_state           # was: xsave_struct
    union  fpregs_state          # was: thread_xstate

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
0c306bcfba x86/fpu: Factor out the FPU regset code into fpu/regset.c
So much of fpu/core.c is the regset code, but it just obscures the generic
FPU state machine logic. Factor out the regset code into fpu/regset.c, where
it can be read in isolation.

This affects one API: fpu__activate_stopped() has to be made available
from the core to fpu/regset.c.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:09 +02:00
Ingo Molnar
b992c660d3 x86/fpu: Factor out fpu/signal.c
fpu/xstate.c has a lot of generic FPU signal frame handling routines,
move them into a separate file: fpu/signal.c.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
c681314421 x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
Standardize the naming of the various functions that copy register
content in specific FPU context formats:

  copy_fxregs_to_kernel()         # was: fpu_fxsave()
  copy_xregs_to_kernel()          # was: xsave_state()

  copy_kernel_to_fregs()          # was: frstor_checking()
  copy_kernel_to_fxregs()         # was: fxrstor_checking()
  copy_kernel_to_xregs()          # was: fpu_xrstor_checking()
  copy_kernel_to_xregs_booting()  # was: xrstor_state_booting()

  copy_fregs_to_user()            # was: fsave_user()
  copy_fxregs_to_user()           # was: fxsave_user()
  copy_xregs_to_user()            # was: xsave_user()

  copy_user_to_fregs()            # was: frstor_user()
  copy_user_to_fxregs()           # was: fxrstor_user()
  copy_user_to_xregs()            # was: xrestore_user()
  copy_user_to_fpregs_zeroing()   # was: restore_user_xstate()

Eliminate fpu_xrstor_checking(), because it was just a wrapper.

No change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
815418890e x86/fpu: Move restore_init_xstate() out of fpu/internal.h
Move restore_init_xstate() next to its sole caller.

Also rename it to copy_init_fpstate_to_fpregs() and add
some comments about what it does.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:08 +02:00
Ingo Molnar
6f57502310 x86/fpu: Generalize 'init_xstate_ctx'
So the handling of init_xstate_ctx has a layering violation: both
'struct xsave_struct' and 'union thread_xstate' have a
'struct i387_fxsave_struct' member:

   xsave_struct::i387
   thread_xstate::fxsave

The handling of init_xstate_ctx is generic, it is used on all
CPUs, with or without XSAVE instruction. So it's confusing how
the generic code passes around and handles an XSAVE specific
format.

What we really want is for init_xstate_ctx to be a proper
fpstate and we use its ::fxsave and ::xsave members, as
appropriate.

Since the xsave_struct::i387 and thread_xstate::fxsave aliases
each other this is not a functional problem.

So implement this, and move init_xstate_ctx to the generic FPU
code in the process.

Also, since init_xstate_ctx is not XSAVE specific anymore,
rename it to init_fpstate, and mark it __read_mostly,
because it's only modified once during bootup, and used
as a reference fpstate later on.

There's no change in functionality.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 15:48:07 +02:00