Commit Graph

22740 Commits

Author SHA1 Message Date
Dave Hansen
6ba48ff46f x86: Remove arbitrary instruction size limit in instruction decoder
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).

The MPX code is now going to be doing some decoding of userspace
instructions.  We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace.  In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace.  This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.

The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.

This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.

The kprobes code probably needs to be looked at here a bit more
carefully.  This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:52 +01:00
Nadav Amit
f210f7572b KVM: x86: Fix lost interrupt on irr_pending race
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is
set.  If this assumption is broken and apicv is disabled, the injection of
interrupts may be deferred until another interrupt is delivered to the guest.
Ultimately, if no other interrupt should be injected to that vCPU, the pending
interrupt may be lost.

commit 56cc2406d6 ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv
is in use") changed the behavior of apic_clear_irr so irr_pending is cleared
after setting APIC_IRR vector. After this commit, if apic_set_irr and
apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR
vector set, and irr_pending cleared. In the following example, assume a single
vector is set in IRR prior to calling apic_clear_irr:

apic_set_irr				apic_clear_irr
------------				--------------
apic->irr_pending = true;
					apic_clear_vector(...);
					vec = apic_search_irr(apic);
					// => vec == -1
apic_set_vector(...);
					apic->irr_pending = (vec != -1);
					// => apic->irr_pending == false

Nonetheless, it appears the race might even occur prior to this commit:

apic_set_irr				apic_clear_irr
------------				--------------
apic->irr_pending = true;
					apic->irr_pending = false;
					apic_clear_vector(...);
					if (apic_search_irr(apic) != -1)
						apic->irr_pending = true;
					// => apic->irr_pending == false
apic_set_vector(...);

Fixing this issue by:
1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call
   apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending.
2. On apic_set_irr: first call apic_set_vector, then set irr_pending.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17 12:16:20 +01:00
Paolo Bonzini
a3e339e1ce KVM: compute correct map even if all APICs are software disabled
Logical destination mode can be used to send NMI IPIs even when all
APICs are software disabled, so if all APICs are software disabled we
should still look at the DFRs.

So the DFRs should all be the same, even if some or all APICs are
software disabled.  However, the SDM does not say this, so tweak
the logic as follows:

- if one APIC is enabled and has LDR != 0, use that one to build the map.
This picks the right DFR in case an OS is only setting it for the
software-enabled APICs, or in case an OS is using logical addressing
on some APICs while leaving the rest in reset state (using LDR was
suggested by Radim).

- if all APICs are disabled, pick a random one to build the map.
We use the last one with LDR != 0 for simplicity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17 12:16:19 +01:00
Nadav Amit
173beedc16 KVM: x86: Software disabled APIC should still deliver NMIs
Currently, the APIC logical map does not consider VCPUs whose local-apic is
software-disabled.  However, NMIs, INIT, etc. should still be delivered to such
VCPUs. Therefore, the APIC mode should first be determined, and then the map,
considering all VCPUs should be constructed.

To address this issue, first find the APIC mode, and only then construct the
logical map.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17 12:16:19 +01:00
Linus Torvalds
de55bbbff2 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Microcode fixes, a Xen fix and a KASLR boot loading fix with certain
  memory layouts"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix ucode patch stashing on 32-bit
  x86/core, x86/xen/smp: Use 'die_complete' completion when taking CPU down
  x86, microcode: Fix accessing dis_ucode_ldr on 32-bit
  x86, kaslr: Prevent .bss from overlaping initrd
  x86, microcode, AMD: Fix early ucode loading on 32-bit
2014-11-16 11:19:25 -08:00
Linus Torvalds
3b91270a0a x86-64: make csum_partial_copy_from_user() error handling consistent
Al Viro pointed out that the x86-64 csum_partial_copy_from_user() is
somewhat confused about what it should do on errors, notably it mostly
clears the uncopied end result buffer, but misses that for the initial
alignment case.

All users should check for errors, so it's dubious whether the clearing
is even necessary, and Al also points out that we should probably clean
up the calling conventions, but regardless of any future changes to this
function, the fact that it is inconsistent is just annoying.

So make the __get_user() failure path use the same error exit as all the
other errors do.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-16 11:00:42 -08:00
Thomas Gleixner
0dbcae8847 x86: mm: Move PAT only functions to mm/pat.c
Commit e00c8cc93c "x86: Use new cache mode type in memtype related
functions" broke the ARCH=um build.

 arch/x86/include/asm/cacheflush.h:67:36: error: return type is an incomplete type
 static inline enum page_cache_mode get_page_memtype(struct page *pg)

The reason is simple. get_page_memtype() and set_page_memtype()
require enum page_cache_mode now, which is defined in
asm/pgtable_types.h. UM does not include that file for obvious reasons.

The simple solution is to move that functions to arch/x86/mm/pat.c
where the only callsites of this are located. They should have been
there in the first place.

Fixes: e00c8cc93c "x86: Use new cache mode type in memtype related functions"
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Richard Weinberger <richard@nod.at>
2014-11-16 18:59:19 +01:00
Dave Hansen
2cd3949f70 x86: Require exact match for 'noxsave' command line option
We have some very similarly named command-line options:

arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);

__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:

	__setup("foo", x86_foo_func...);

The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar".  If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.

This makes the "noxsave" handler only return success when it finds
an *exact* match.

[ tglx: We really need to make __setup() more robust. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 12:13:16 +01:00
Stephane Eranian
aea48559ac perf/x86: Add support for sampling PEBS machine state registers
PEBS can capture machine state regs at retiremnt of the sampled
instructions. When precise sampling is enabled on an event, PEBS
is used, so substitute the interrupted state with the PEBS state.
Note that not all registers are captured by PEBS. Those missing
are replaced by the interrupt state counter-parts.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411559322-16548-3-git-send-email-eranian@google.com
Cc: cebbert.lkml@gmail.com
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:58 +01:00
Andi Kleen
af4bdcf675 perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events
Disallow setting inv/cmask/etc. flags for all PEBS events
on these CPUs, except for the UOPS_RETIRED.* events on Nehalem/Westmere,
which are needed for cycles:p. This avoids an undefined situation
strongly discouraged by the Intle SDM. The PLD_* events were already
covered. This follows the earlier changes for Sandy Bridge and alter.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:56 +01:00
Andi Kleen
0dbc94796d perf/x86/intel: Use INTEL_FLAGS_UEVENT_CONSTRAINT for PRECDIST
My earlier commit:

  86a04461a9 ("perf/x86: Revamp PEBS event selection")

made nearly all PEBS on Sandy/IvyBridge/Haswell to reject non zero flags.

However this wasn't done for the INST_RETIRED.PREC_DIST event
because no suitable macro existed. Now that we have
INTEL_FLAGS_UEVENT_CONSTRAINT enforce zero flags for
INST_RETIRED.PREC_DIST too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:55 +01:00
Andi Kleen
7550ddffe4 perf/x86: Add INTEL_FLAGS_UEVENT_CONSTRAINT
Add a FLAGS_UEVENT_CONSTRAINT macro that allows us to
match on event+umask, and in additional all flags.

This is needed to ensure the INV and CMASK fields
are zero for specific events, as this can cause undefined
behavior.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1411569288-5627-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:54 +01:00
Andi Kleen
c0737ce453 perf/x86/intel/uncore: Add scaling units to the EP iMC events
Add scaling to MB/s to the memory controller read/write
events for Sandy/IvyBridge/Haswell-EP similar to how the client
does. This makes the events easier to use from the
standard perf tool.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:52 +01:00
Ingo Molnar
f108c898dd Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:31:17 +01:00
Juergen Gross
47591df505 xen: Support Xen pv-domains using PAT
With the dynamical mapping between cache modes and pgprot values it is
now possible to use all cache modes via the Xen hypervisor PAT settings
in a pv domain.

All to be done is to read the PAT configuration MSR and set up the
translation tables accordingly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
bd809af16e x86: Enable PAT to use cache mode translation tables
Update the translation tables from cache mode to pgprot values
according to the PAT settings. This enables changing the cache
attributes of a PAT index in just one place without having to change
at the users side.

With this change it is possible to use the same kernel with different
PAT configurations, e.g. supporting Xen.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-18-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
f5b2831d65 x86: Respect PAT bit when copying pte values between large and normal pages
The PAT bit in the ptes is not moved to the correct position when
copying page protection attributes between entries of different sized
pages. Translate the ptes according to their page size.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-17-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
f439c429c3 x86: Support PAT bit in pagetable dump for lower levels
Dumping page table protection bits is not correct for entries on levels
2 and 3 regarding the PAT bit, which is at a different position as on
level 4.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-16-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
87ad0b713b x86: Clean up pgtable_types.h
Remove no longer used defines from pgtable_types.h as they are not
used any longer.

Switch __PAGE_KERNEL_NOCACHE to use cache mode type instead of pte
bits.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-15-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
e00c8cc93c x86: Use new cache mode type in memtype related functions
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-14-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
b14097bd91 x86: Use new cache mode type in mm/ioremap.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-13-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:26 +01:00
Juergen Gross
c06814d841 x86: Use new cache mode type in setting page attributes
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type in the functions for modifying page
attributes.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-12-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
102e19e195 x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
When modifying page attributes via change_page_attr_set_clr() don't
test for setting _PAGE_PAT_LARGE, as this is
- never done
- PAT support for large pages is not included in the kernel up to now

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-11-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
2a3746984c x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. As those are the main callers of
lookup_memtype(), change this as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-10-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
49a3b3cbdf x86: Use new cache mode type in mm/iomap_32.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. This requires to change
io_reserve_memtype() as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-9-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
d85f33342a x86: Use new cache mode type in asm/pgtable.h
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type. This requires changing some callers of
is_new_memtype_allowed() to be changed as well.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-8-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
2df58b6d35 x86: Use new cache mode type in arch/x86/mm/init_64.c
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-7-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
1c64216be1 x86: Use new cache mode type in arch/x86/pci
Instead of directly using the cache mode bits in the pte switch to
using the cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-6-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:25 +01:00
Juergen Gross
c27ce0af89 x86: Use new cache mode type in include/asm/fb.h
Instead of directly using cache mode bits in the pte switch to usage of
the new cache mode type.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-3-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:24 +01:00
Juergen Gross
281d4078be x86: Make page cache mode a real type
At the moment there are a lot of places that handle setting or getting
the page cache mode by treating the pgprot bits equal to the cache mode.
This is only true because there are a lot of assumptions about the setup
of the PAT MSR. Otherwise the cache type needs to get translated into
pgprot bits and vice versa.

This patch tries to prepare for that by introducing a separate type
for the cache mode and adding functions to translate between those and
pgprot values.

To avoid too much performance penalty the translation between cache mode
and pgprot values is done via tables which contain the relevant
information.  Write-back cache mode is hard-wired to be 0, all other
modes are configurable via those tables. For large pages there are
translation functions as the PAT bit is located at different positions
in the ptes of 4k and large pages.

Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stefan.bader@canonical.com
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1415019724-4317-2-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 11:04:24 +01:00
Ingo Molnar
e9ac5f0fa8 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:50:25 +01:00
Ingo Molnar
595247f61f Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi
Pull EFI updates for v3.19 from Matt Fleming:

 - Support module unload for efivarfs - Mathias Krause

 - Another attempt at moving x86 to libstub taking advantage of the
   __pure attribute - Ard Biesheuvel

 - Add EFI runtime services section to ptdump - Mathias Krause

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:48:53 +01:00
Andi Kleen
68055915c1 perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.

Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.

This replaces an earlier patch that disabled the SBOX.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 09:53:36 +01:00
Andi Kleen
41a134a583 perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
The counter register offsets for the IRP box PMU for Haswell-EP
were incorrect. The offsets actually changed over IvyBridge EP.

Fix them to the correct values. For this we need to fork the read
function from the IVB and use an own counter array.

Tested-by: patrick.lu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 09:45:47 +01:00
Igor Mammedov
1d4e7e3c0b kvm: x86: increase user memory slots to 509
With the 3 private slots, this gives us 512 slots total.
Motivation for this is in addition to assigned devices
support more memory hotplug slots, where 1 slot is
used by a hotplugged memory stick.
It will allow to support upto 256 hotplug memory
slots and leave 253 slots for assigned devices and
other devices that use them.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14 10:02:40 +01:00
Chris J Arges
d913b90435 kvm: svm: move WARN_ON in svm_adjust_tsc_offset
When running the tsc_adjust kvm-unit-test on an AMD processor with the
IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be
triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc
is called; however it may trigger unnecessary warnings.

This patch moves the WARN_ON to trigger only if __scale_tsc will actually be
called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common
s64 since this can have signed values.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-13 11:56:11 +01:00
Daniel Lezcano
b82b6cca48 cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.

Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 21:17:27 +01:00
Andy Lutomirski
54b98bff8e x86, kvm, vmx: Don't set LOAD_IA32_EFER when host and guest match
There's nothing to switch if the host and guest values are the same.
I am unable to find evidence that this makes any difference
whatsoever.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[I could see a difference on Nehalem.  From 5 runs:

 userspace exit, guest!=host   12200 11772 12130 12164 12327
 userspace exit, guest=host    11983 11780 11920 11919 12040
 lightweight exit, guest!=host  3214  3220  3238  3218  3337
 lightweight exit, guest=host   3178  3193  3193  3187  3220

 This passes the t-test with 99% confidence for userspace exit,
 98.5% confidence for lightweight exit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-12 16:27:21 +01:00
Aravind Gopalakrishnan
904cb3677f perf/x86/amd/ibs: Update IBS MSRs and feature definitions
New Fam15h models carry extra feature bits and extend
the MSR register space for IBS ops. Adding them here.

While at it, add functionality to read IbsBrTarget and
OpData4 depending on their availability if user wants a
PERF_SAMPLE_RAW.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <paulus@samba.org>
Cc: <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:12:32 +01:00
Herbert Xu
4c7912e919 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Merging 3.18-rc4 in order to pick up the memzero_explicit helper.
2014-11-12 22:11:15 +08:00
Ingo Molnar
29cc373037 Merge tag 'x86_queue_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/cleanups
Pull two minor cleanups from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:10:12 +01:00
Ingo Molnar
890ca861f8 Merge tag 'v3.18-rc4' into x86/cleanups, to refresh the tree before pulling new changes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:09:01 +01:00
Andy Lutomirski
f6577a5fa1 x86, kvm, vmx: Always use LOAD_IA32_EFER if available
At least on Sandy Bridge, letting the CPU switch IA32_EFER is much
faster than switching it manually.

I benchmarked this using the vmexit kvm-unit-test (single run, but
GOAL multiplied by 5 to do more iterations):

Test                                  Before      After    Change
cpuid                                   2000       1932    -3.40%
vmcall                                  1914       1817    -5.07%
mov_from_cr8                              13         13     0.00%
mov_to_cr8                                19         19     0.00%
inl_from_pmtimer                       19164      10619   -44.59%
inl_from_qemu                          15662      10302   -34.22%
inl_from_kernel                         3916       3802    -2.91%
outl_to_kernel                          2230       2194    -1.61%
mov_dr                                   172        176     2.33%
ipi                                (skipped)  (skipped)
ipi+halt                           (skipped)  (skipped)
ple-round-robin                           13         13     0.00%
wr_tsc_adjust_msr                       1920       1845    -3.91%
rd_tsc_adjust_msr                       1892       1814    -4.12%
mmio-no-eventfd:pci-mem                16394      11165   -31.90%
mmio-wildcard-eventfd:pci-mem           4607       4645     0.82%
mmio-datamatch-eventfd:pci-mem          4601       4610     0.20%
portio-no-eventfd:pci-io               11507       7942   -30.98%
portio-wildcard-eventfd:pci-io          2239       2225    -0.63%
portio-datamatch-eventfd:pci-io         2250       2234    -0.71%

I haven't explicitly computed the significance of these numbers,
but this isn't subtle.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[The results were reproducible on all of Nehalem, Sandy Bridge and
 Ivy Bridge.  The slowness of manual switching is because writing
 to EFER with WRMSR triggers a TLB flush, even if the only bit you're
 touching is SCE (so the page table format is not affected).  Doing
 the write as part of vmentry/vmexit, instead, does not flush the TLB,
 probably because all processors that have EPT also have VPID. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-12 12:35:02 +01:00
Dave Airlie
51b44eb17b Merge tag 'v3.18-rc4' into drm-next
backmerge to get vmwgfx locking changes into next as the
conflict with per-plane locking.
2014-11-12 17:53:30 +10:00
Dirk Brandewie
2f86dc4cdd intel_pstate: Add support for HWP
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

With HWP enbaled intel_pstate will no longer be responsible for selecting P
states for the processor. intel_pstate will continue to register to
the cpufreq core as the scaling driver for CPUs implementing
HWP. In HWP mode intel_pstate provides three functions reporting
frequency to the cpufreq core, support for the set_policy() interface
from the core and maintaining the intel_pstate sysfs interface in
/sys/devices/system/cpu/intel_pstate.  User preferences expressed via
the set_policy() interface or the sysfs interface are forwared to the
CPU via the HWP MSR interface.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:38 +01:00
Dirk Brandewie
7787388772 x86: Add support for Intel HWP feature detection.
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:37 +01:00
Mathias Krause
8266e31ed0 x86, ptdump: Add section for EFI runtime services
In commit 3891a04aaf ("x86-64, espfix: Don't leak bits 31:16 of %esp
returning..") the "ESPFix Area" was added to the page table dump special
sections. That area, though, has a limited amount of entries printed.

The EFI runtime services are, unfortunately, located in-between the
espfix area and the high kernel memory mapping. Due to the enforced
limitation for the espfix area, the EFI mappings won't be printed in the
page table dump.

To make the ESP runtime service mappings visible again, provide them a
dedicated entry.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:28:57 +00:00
Ard Biesheuvel
243b6754cd efi/x86: Move x86 back to libstub
This reverts commit 84be880560, which itself reverted my original
attempt to move x86 from #include'ing .c files from across the tree
to using the EFI stub built as a static library.

The issue that affected the original approach was that splitting
the implementation into several .o files resulted in the variable
'efi_early' becoming a global with external linkage, which under
-fPIC implies that references to it must go through the GOT. However,
dealing with this additional GOT entry turned out to be troublesome
on some EFI implementations. (GCC's visibility=hidden attribute is
supposed to lift this requirement, but it turned out not to work on
the 32-bit build.)

Instead, use a pure getter function to get a reference to efi_early.
This approach results in no additional GOT entries being generated,
so there is no need for any changes in the early GOT handling.

Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-11-11 22:23:11 +00:00
Yijing Wang
03f56e42d0 Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()"
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").

The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.

default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.

[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
2014-11-11 15:14:30 -07:00
Arnd Bergmann
1c8d29696f Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic
* 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  m68k: io: implement dummy relaxed accessor macros for writes
  m32r: io: implement dummy relaxed accessor macros for writes
  ia64: io: implement dummy relaxed accessor macros for writes
  cris: io: implement dummy relaxed accessor macros for writes
  frv: io: implement dummy relaxed accessor macros for writes
  xtensa: io: remove dummy relaxed accessor macros for reads
  s390: io: remove dummy relaxed accessor macros for reads
  microblaze: io: remove dummy relaxed accessor macros
  asm-generic: io: implement relaxed accessor macros as conditional wrappers

Conflicts:
	include/asm-generic/io.h

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-11-11 19:55:45 +01:00