The Unified Memory Controllers (UMCs) on Fam17h log a normalized address
in their MCA_ADDR registers. We need to convert that normalized address
to a system physical address in order to support a few facilities:
1) To offline poisoned pages in DRAM proactively in the deferred error
handler.
2) To print sysaddr and page info for DRAM ECC errors in EDAC.
[ Boris: fixes/cleanups ontop:
* hi_addr_offset = 0 - no need for that branch. Stick it all under the
HiAddrOffsetEn case. It confines hi_addr_offset's declaration too.
* Move variables to the innermost scope they're used at so that we save
on stack and not blow it up immediately on function entry.
* Do not modify *sys_addr prematurely - we want to not exit early and
have modified *sys_addr some, which callers get to see. We either
convert to a sys_addr or we don't do anything. And we signal that with
the retval of the function.
* Rename label out -> out_err - because it is the error path.
* No need to pr_err of the conversion failed case: imagine a
sparsely-populated machine with UMCs which don't have DIMMs. Callers
should look at the retval instead and issue a printk only when really
necessary. No need for useless info in dmesg.
* s/temp_reg/tmp/ and other variable names shortening => shorter code.
* Use BIT() everywhere.
* Make error messages more informative.
* Small build fix for the !CONFIG_X86_MCE_AMD case.
* ... and more minor cleanups.
]
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic
[ Typo fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The X86_FEATURE_TSC_RELIABLE flag in Linux kernel implies both reliable
(at runtime) and trustable (at calibration). But reliable running and
trustable calibration independent of each other.
Add a new flag X86_FEATURE_TSC_KNOWN_FREQ, which denotes that the frequency
is known (via MSR/CPUID). This flag is only meant to skip the long term
calibration on systems which have a known frequency.
Add X86_FEATURE_TSC_KNOWN_FREQ to the skip the delayed calibration and
leave X86_FEATURE_TSC_RELIABLE in place.
After converting the existing users of X86_FEATURE_TSC_RELIABLE to use
either both flags or just X86_FEATURE_TSC_KNOWN_FREQ we can seperate the
functionality.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bin Gao <bin.gao@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1479241644-234277-2-git-send-email-bin.gao@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tvrtko needs
commit b3c11ac267
Author: Eric Engestrom <eric@engestrom.ch>
Date: Sat Nov 12 01:12:56 2016 +0000
drm: move allocation out of drm_get_format_name()
to be able to apply his patches without conflicts.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Some devices on Fam17h can only be accessed through the System Management
Network (SMN). The SMN is accessed by a pair of index/data registers in PCI
config space. Add a pair of functions to read from and write to the SMN.
The Data Fabric on Fam17h allows multiple devices to use the same register
space. The registers of a specific device are accessed indirectly using the
device's DF InstanceId. Currently, we only need to read from these devices,
so only define a read function for now.
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1478812257-5424-5-git-send-email-Yazen.Ghannam@amd.com
[ Boris: make __amd_smn_rw() even more compact. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sparse populated CPUID leafs are collected in a software provided leaf to
avoid bloat of the x86_capability array, but there is no way to rebuild the
real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID
leaf from the CPU. While this is possible it is problematic as it does not
take software disabled features into account. If a feature is disabled on
the host it should not be exposed to a guest either.
Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered
cpuid table information and the active CPU features.
[ tglx: Rewrote changelog ]
Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Luwei Kang <luwei.kang@intel.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Piotr Luc <Piotr.Luc@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The per-cpu preempt count of x86 contains two values, the actual preempt
count and the inverted PREEMPT_NEED_RESCHED bit. If a corrupted preempt
count is detected the preempt_count_set() function is used to reset the
preempt count.
In case the inverted PREEMPT_NEED_RESCHED bit is zero at the time of the
reset, the preemption indication is lost. Use raw_cpu_cmpxchg_4() to reset
only the count part and leave the PREEMPT_NEED_RESCHED bit as it is.
This improves the kernel's behavior when it runs into preempt count leaks
and tries to fix them up.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1478523660-733-1-git-send-email-schwidefsky@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make the MSR argument an unsigned int, both low and high u32, put
"notrace" last in the function signature. Reflow function signatures for
better readability and cleanup white space.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The threshold_cpu_callback callbacks looks like one of the notifier and
its arguments are almost the same. Split this out and have one ONLINE
and one DEAD callback. This will come handy later once the main code
gets changed to use the callback mechanism.
Also, handle threshold_cpu_callback_online() return value so we don't
continue if the function fails.
Boris Petkov removed the callback pointer and replaced it with proper
functions.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: rt@linutronix.de
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/20161110174447.11848-5-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We already have a macro to invoke boot services which on x86 adapts
automatically to the bitness of the EFI firmware: efi_call_early().
The macro allows sharing of functions across arches and bitness variants
as long as those functions only call boot services. However in practice
functions in the EFI stub contain a mix of boot services calls and
protocol calls.
Add an efi_call_proto() macro for bitness-agnostic protocol calls to
allow sharing more code across arches as well as deduplicating 32 bit
and 64 bit code paths.
On x86, implement it using a new efi_table_attr() macro for bitness-
agnostic table lookups. Refactor efi_call_early() to make use of the
same macro. (The resulting object code remains identical.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112213237.8804-8-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The calculation of the hwid_mcatype value in get_smca_bank_info()
became incorrect after applying the following commit:
1ce9cd7f9f ("x86/RAS: Simplify SMCA HWID descriptor struct")
This causes the function to not match a bank to its type.
Disassembly of hwid_mcatype calculation after change:
db: 8b 45 e0 mov -0x20(%rbp),%eax
de: 41 89 c4 mov %eax,%r12d
e1: 25 00 00 ff 0f and $0xfff0000,%eax
e6: 41 c1 ec 10 shr $0x10,%r12d
ea: 41 09 c4 or %eax,%r12d
Disassembly of hwid_mcatype calculation in original code:
286: 8b 45 d0 mov -0x30(%rbp),%eax
289: 41 89 c5 mov %eax,%r13d
28c: c1 e8 10 shr $0x10,%eax
28f: 41 81 e5 ff 0f 00 00 and $0xfff,%r13d
296: 41 c1 e5 10 shl $0x10,%r13d
29a: 41 09 c5 or %eax,%r13d
Grouping the arguments to the HWID_MCATYPE() macro fixes the issue.
( Boris suggested adding parentheses in the macro. )
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following RCU lockdep warning led to adding irq_enter()/irq_exit() into
smp_reschedule_interrupt():
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/1/0.
do_trace_write_msr
native_write_msr
native_apic_msr_eoi_write
smp_reschedule_interrupt
reschedule_interrupt
As Peterz pointed out:
| So now we're making a very frequent interrupt slower because of debug
| code.
|
| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only sent to tickle the
| return to user path (which does the actual preemption).
|
| Having to do the whole irq_enter/irq_exit dance just for this unlikely
| debug case totally blows.
Use the wrmsr_notrace() variant in native_apic_msr_write_eoi, annotate the
kvm variant with notrace and add a native_apic_eoi callback to the apic
structure so KVM guests are covered as well.
This allows to revert the irq_enter/irq_exit dance in
smp_reschedule_interrupt().
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Mike Galbraith <efault@gmx.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1478488420-5982-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit cc7cc02bad ("PCI: Query platform firmware for device power
state") augmented struct pci_platform_pm_ops with a ->get_state hook and
implemented it for acpi_pci_platform_pm, the only pci_platform_pm_ops
existing till v4.7.
However v4.8 introduced another pci_platform_pm_ops for Intel Mobile
Internet Devices with commit 5823d0893e ("x86/platform/intel-mid: Add
Power Management Unit driver"). It is missing the ->get_state hook,
which is fatal since pci_set_platform_pm() enforces its presence. Andy
Shevchenko reports that without the present commit, such a device
"crashes without even a character printed out on serial console and
reboots (since watchdog)".
Retrofit mid_pci_platform_pm with the missing callback to fix the
breakage.
Acked-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes: cc7cc02bad ("PCI: Query platform firmware for device power state")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/7c1567d4c49303a4aada94ba16275cbf56b8976b.1477221514.git.lukas@wunner.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull KVM updates from Paolo Bonzini:
"One NULL pointer dereference, and two fixes for regressions introduced
during the merge window.
The rest are fixes for MIPS, s390 and nested VMX"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: Check memopp before dereference (CVE-2016-8630)
kvm: nVMX: VMCLEAR an active shadow VMCS after last use
KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK
KVM: x86: fix wbinvd_dirty_mask use-after-free
kvm/x86: Show WRMSR data is in hex
kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types
KVM: document lock orders
KVM: fix OOPS on flush_work
KVM: s390: Fix STHYI buffer alignment for diag224
KVM: MIPS: Precalculate MMIO load resume PC
KVM: MIPS: Make ERET handle ERL before EXL
KVM: MIPS: Fix lazy user ASID regenerate for SMP
When a memory slot is being moved or removed users of page track
can be notified. So users can drop write-protection for the pages
in that memory slot.
This notifier type is needed by KVMGT to sync up its shadow page
table when memory slot is being moved or removed.
Register the notifier type track_flush_slot to receive memslot move
and remove event.
Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
[Squashed commits to avoid bisection breakage and reworded the subject.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
These are never used by the host, but they can still be reflected to
the guest.
Tested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a memory slot is being moved or removed users of page track
can be notified. So users can drop write-protection for the pages
in that memory slot.
This notifier type is needed by KVMGT to sync up its shadow page
table when memory slot is being moved or removed.
Register the notifier type track_flush_slot to receive memslot move
and remove event.
Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com>
Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com>
[Squashed commits to avoid bisection breakage and reworded the subject.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Since commit a545ab6a00 ("kvm: x86: add tsc_offset field to struct
kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is
cached and need not be fished out of the VMCS or VMCB. This means
that we can implement adjust_tsc_offset_guest and read_l1_tsc
entirely in generic code. The simplification is particularly
significant for VMX code, where vmx->nested.vmcs01_tsc_offset
was duplicating what is now in vcpu->arch.tsc_offset. Therefore
the vmcs01_tsc_offset can be dropped completely.
More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK
which, after commit 108b249c45 ("KVM: x86: introduce get_kvmclock_ns",
2016-09-01) called read_l1_tsc while the VMCS was not loaded.
It thus returned bogus values on Intel CPUs.
Fixes: 108b249c45
Reported-by: Roman Kagan <rkagan@virtuozzo.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>