The header include/xen/interface/xen.h doesn't contain all definitions
from Xen's version of that header. Update it accordingly.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Today there are several places in the kernel which build tables
containing one entry for each possible Xen hypercall. Create an
infrastructure to be able to generate these tables at build time.
Based-on-patch-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
context For 3.11, 3.12, 3.13, this patch applies cleanly
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the ownership of power_supply structure from each driver
implementing the class to the power supply core.
The patch changes power_supply_register() function thus all drivers
implementing power supply class are adjusted.
Each driver provides the implementation of power supply. However it
should not be the owner of power supply class instance because it is
exposed by core to other subsystems with power_supply_get_by_name().
These other subsystems have no knowledge when the driver will unregister
the power supply. This leads to several issues when driver is unbound -
mostly because user of power supply accesses freed memory.
Instead let the core own the instance of struct 'power_supply'. Other
users of this power supply will still access valid memory because it
will be freed when device reference count reaches 0. Currently this
means "it will leak" but power_supply_put() call in next patches will
solve it.
This solves invalid memory references in following race condition
scenario:
Thread 1: charger manager
Thread 2: power supply driver, used by charger manager
THREAD 1 (charger manager) THREAD 2 (power supply driver)
========================== ==============================
psy = power_supply_get_by_name()
Driver unbind, .remove
power_supply_unregister()
Device fully removed
psy->get_property()
The 'get_property' call is executed in invalid context because the driver was
unbound and struct 'power_supply' memory was freed.
This could be observed easily with charger manager driver (here compiled
with max17040 fuel gauge):
$ cat /sys/devices/virtual/power_supply/cm-battery/capacity &
$ echo "1-0036" > /sys/bus/i2c/drivers/max17040/unbind
[ 55.725123] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 55.732584] pgd = d98d4000
[ 55.734060] [00000000] *pgd=5afa2831, *pte=00000000, *ppte=00000000
[ 55.740318] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[ 55.746210] Modules linked in:
[ 55.749259] CPU: 1 PID: 2936 Comm: cat Tainted: G W 3.19.0-rc1-next-20141226-00048-gf79f475f3c44-dirty #1496
[ 55.760190] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 55.766270] task: d9b76f00 ti: daf54000 task.ti: daf54000
[ 55.771647] PC is at 0x0
[ 55.774182] LR is at charger_get_property+0x2f4/0x36c
[ 55.779201] pc : [<00000000>] lr : [<c034b0b4>] psr: 60000013
[ 55.779201] sp : daf55e90 ip : 00000003 fp : 00000000
[ 55.790657] r10: 00000000 r9 : c06e2878 r8 : d9b26c68
[ 55.795865] r7 : dad81610 r6 : daec7410 r5 : daf55ebc r4 : 00000000
[ 55.802367] r3 : 00000000 r2 : daf55ebc r1 : 0000002a r0 : d9b26c68
[ 55.808879] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 55.815994] Control: 10c5387d Table: 598d406a DAC: 00000015
[ 55.821723] Process cat (pid: 2936, stack limit = 0xdaf54210)
[ 55.827451] Stack: (0xdaf55e90 to 0xdaf56000)
[ 55.831795] 5e80: 60000013 c01459c4 0000002a c06f8ef8
[ 55.839956] 5ea0: db651000 c06f8ef8 daebac00 c04cb668 daebac08 c0346864 00000000 c01459c4
[ 55.848115] 5ec0: d99eaa80 c06f8ef8 00000fff 00001000 db651000 c027f25c c027f240 d99eaa80
[ 55.856274] 5ee0: d9a06c00 c0146218 daf55f18 00001000 d99eaa80 db4c18c0 00000001 00000001
[ 55.864468] 5f00: daf55f80 c0144c78 c0144c54 c0107f90 00015000 d99eaab0 00000000 00000000
[ 55.872603] 5f20: 000051c7 00000000 db4c18c0 c04a9370 00015000 00001000 daf55f80 00001000
[ 55.880763] 5f40: daf54000 00015000 00000000 c00e53dc db4c18c0 c00e548c 0000000d 00008124
[ 55.888937] 5f60: 00000001 00000000 00000000 db4c18c0 db4c18c0 00001000 00015000 c00e5550
[ 55.897099] 5f80: 00000000 00000000 00001000 00001000 00015000 00000003 00000003 c000f364
[ 55.905239] 5fa0: 00000000 c000f1a0 00001000 00015000 00000003 00015000 00001000 0001333c
[ 55.913399] 5fc0: 00001000 00015000 00000003 00000003 00000002 00000000 00000000 00000000
[ 55.921560] 5fe0: 7fffe000 be999850 0000a225 b6f3c19c 60000010 00000003 00000000 00000000
[ 55.929744] [<c034b0b4>] (charger_get_property) from [<c0346864>] (power_supply_show_property+0x48/0x20c)
[ 55.939286] [<c0346864>] (power_supply_show_property) from [<c027f25c>] (dev_attr_show+0x1c/0x48)
[ 55.948130] [<c027f25c>] (dev_attr_show) from [<c0146218>] (sysfs_kf_seq_show+0x84/0x104)
[ 55.956298] [<c0146218>] (sysfs_kf_seq_show) from [<c0144c78>] (kernfs_seq_show+0x24/0x28)
[ 55.964536] [<c0144c78>] (kernfs_seq_show) from [<c0107f90>] (seq_read+0x1b0/0x484)
[ 55.972172] [<c0107f90>] (seq_read) from [<c00e53dc>] (__vfs_read+0x18/0x4c)
[ 55.979188] [<c00e53dc>] (__vfs_read) from [<c00e548c>] (vfs_read+0x7c/0x100)
[ 55.986304] [<c00e548c>] (vfs_read) from [<c00e5550>] (SyS_read+0x40/0x8c)
[ 55.993164] [<c00e5550>] (SyS_read) from [<c000f1a0>] (ret_fast_syscall+0x0/0x48)
[ 56.000626] Code: bad PC value
[ 56.011652] ---[ end trace 7b64343fbdae8ef1 ]---
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
[for the nvec part]
Reviewed-by: Marc Dietrich <marvin24@gmx.de>
[for compal-laptop.c]
Acked-by: Darren Hart <dvhart@linux.intel.com>
[for the mfd part]
Acked-by: Lee Jones <lee.jones@linaro.org>
[for the hid part]
Acked-by: Jiri Kosina <jkosina@suse.cz>
[for the acpi part]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Pull xen bug fixes from David Vrabel:
- fix a PV regression in 3.19.
- fix a dom0 crash on hosts with large numbers of PIRQs.
- prevent pcifront from disabling memory or I/O port access, which may
trigger host crashes.
* tag 'stable/for-linus-4.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: limit guest control of command register
xen/events: avoid NULL pointer dereference in dom0 on large machines
xen: Remove trailing semicolon from xenbus_register_frontend() definition
x86/xen: correct bug in p2m list initialization
While in L2, leave all #UD to L2 and do not try to emulate it. If L1 is
interested in doing this, it reports its interest via the exception
bitmap, and we never get into handle_exception of L0 anyway.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For a very long time (since 2b3d2a20), the path handling a vmmcall
instruction of the guest on an Intel host only applied the patch but no
longer handled the hypercall. The reverse case, vmcall on AMD hosts, is
fine. As both em_vmcall and em_vmmcall actually have to do the same, we
can fix the issue by consolidating both into the same handler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Another patch in my war on emulate_on_interception() use as a svm exit handler.
These were pulled out of a larger patch at the suggestion of Radim Krcmar, see
https://lkml.org/lkml/2015/2/25/559
Changes since v1:
* fixed typo introduced after test, retested
Signed-off-by: David Kaplan <david.kaplan@amd.com>
[separated out just cr_interception part from larger removal of
INTERCEPT_CR0_WRITE, forward ported, tested]
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In commit 3af18d9c5f ("KVM: nVMX: Prepare for using hardware MSR bitmap"),
we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This
is not enough since the field will be modified by following vmx_set_efer.
Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is
in guest mode.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().
This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.
This triggers:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
dump_stack
___might_sleep
? _raw_spin_unlock_irqrestore
__might_sleep
down_read
? _raw_spin_unlock_irqrestore
print_vma_addr
signal_fault
sys32_rt_sigreturn
Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.
[ Note: this is only the first step. math_state_restore() should
not check used_math(), it should set this flag. While
init_fpu() should simply die. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
type T;
identifier f;
@@
static T f (...) { ... }
@@
identifier r.f;
declarer name EXPORT_SYMBOL_GPL;
@@
-EXPORT_SYMBOL_GPL(f);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.
The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.
In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.
Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.
Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.
[1] http://www.chronox.de/libkcapi.html
CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If data is read from PIC with invalid access size, the return data stays
uninitialized even though success is returned.
Fix this by always initializing the data.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This commit removes the open-coded CPU-offline notification with new
common code. Among other things, this change avoids calling scheduler
code using RCU from an offline CPU that RCU is ignoring. It also allows
Xen to notice at online time that the CPU did not go offline correctly.
Note that Xen has the surviving CPU carry out some cleanup operations,
so if the surviving CPU times out, these cleanup operations might have
been carried out while the outgoing CPU was still running. It might
therefore be unwise to bring this CPU back online, and this commit
avoids doing so.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <x86@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
This patch fixes the following sparse warnings:
drivers/tty/serial/8250/8250_core.c:3231:32: warning: incorrect type in assignment (different base types)
drivers/tty/serial/8250/8250_core.c:3231:32: expected restricted upf_t [usertype] flags
drivers/tty/serial/8250/8250_core.c:3231:32: got unsigned int const [unsigned] flags
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
POWER supports irqfds but forgot to advertise them. Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.
To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e21053a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
No need to re-decode WBINVD since we know what it is from the intercept.
Signed-off-by: David Kaplan <David.Kaplan@amd.com>
[extracted from larger unlrelated patch, forward ported, tested,style cleanup]
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently kvm_emulate() skips the instruction but kvm_emulate_* sometimes
don't. The end reult is the caller ends up doing the skip themselves.
Let's make them consistant.
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm_kvfree() provides exactly the same functionality as the
new common kvfree() function - so let's simply replace the
kvm function with the common function.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch fixes the bug discussed in
https://www.mail-archive.com/kvm@vger.kernel.org/msg109813.html
This patch uses a new field named irr_delivered to record the
delivery status of edge-triggered interrupts, and clears the
delivered interrupts in kvm_get_ioapic. So it has the same effect
of commit 0bc830b05c
("KVM: ioapic: clear IRR for edge-triggered interrupts at delivery")
while avoids the bug of Windows guests.
Signed-off-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM has nice wrappers to access the register values, clean up a few places
that should use them but currently do not.
Signed-off-by: David Kaplan <david.kaplan@amd.com>
[forward port and testing]
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Current context tracking symbols are designed to express living state.
As such they are prefixed with "IN_": IN_USER, IN_KERNEL.
Now we are going to use these symbols to also express state transitions
such as context_tracking_enter(IN_USER) or context_tracking_exit(IN_USER).
But while the "IN_" prefix works well to express entering a context, it's
confusing to depict a context exit: context_tracking_exit(IN_USER)
could mean two things:
1) We are exiting the current context to enter user context.
2) We are exiting the user context
We want 2) but the reviewer may be confused and understand 1)
So lets disambiguate these symbols and rename them to CONTEXT_USER and
CONTEXT_KERNEL.
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will deacon <will.deacon@arm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Pull x86 CPU cacheinfo changes from Borislav Petkov:
"Convert x86 cacheinfo code to the generic sysfs cacheinfo infrastructure. (Sudeep Holla)"
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch removes the redundant sysfs cacheinfo code by reusing
the newly introduced generic cacheinfo infrastructure through the
commit
246246cbde ("drivers: base: support cpu cache information
interface to userspace via sysfs")
The private pointer provided by the cacheinfo is used to implement
the AMD L3 cache-specific attributes.
Note that with v4.0-rc1, commit
513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and parsing
functions")
in particular changes from long format to shorter one for all cpumasks
sysfs entries. As the consequence of the same, even the shared_cpu_map
in the cacheinfo sysfs was also changed.
This patch neither alters any existing sysfs entries nor their
formating, however since the generic cacheinfo has switched to use the
device attributes instead of the traditional raw kobjects, a directory
named "power" along with its standard attributes are added similar to
any other device.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: http://lkml.kernel.org/r/1425470416-20691-1-git-send-email-sudeep.holla@arm.com
[ Add a check for uninitialized this_cpu_ci for the cpu_has_topoext case too
in __cache_amd_cpumap_setup() ]
Signed-off-by: Borislav Petkov <bp@suse.de>
By the nature of the TEST operation, it is often possible to test
a narrower part of the operand:
"testl $3, mem" -> "testb $3, mem",
"testq $3, %rcx" -> "testb $3, %cl"
This results in shorter instructions, because the TEST instruction
has no sign-entending byte-immediate forms unlike other ALU ops.
Note that this change does not create any LCP (Length-Changing Prefix)
stalls, which happen when adding a 0x66 prefix, which happens when
16-bit immediates are used, which changes such TEST instructions:
[test_opcode] [modrm] [imm32]
to:
[0x66] [test_opcode] [modrm] [imm16]
where [imm16] has a *different length* now: 2 bytes instead of 4.
This confuses the decoder and slows down execution.
REX prefixes were carefully designed to almost never hit this case:
adding REX prefix does not change instruction length except MOVABS
and MOV [addr],RAX instruction.
This patch does not add instructions which would use a 0x66 prefix,
code changes in assembly are:
-48 f7 07 01 00 00 00 testq $0x1,(%rdi)
+f6 07 01 testb $0x1,(%rdi)
-48 f7 c1 01 00 00 00 test $0x1,%rcx
+f6 c1 01 test $0x1,%cl
-48 f7 c1 02 00 00 00 test $0x2,%rcx
+f6 c1 02 test $0x2,%cl
-41 f7 c2 01 00 00 00 test $0x1,%r10d
+41 f6 c2 01 test $0x1,%r10b
-48 f7 c1 04 00 00 00 test $0x4,%rcx
+f6 c1 04 test $0x4,%cl
-48 f7 c1 08 00 00 00 test $0x8,%rcx
+f6 c1 08 test $0x8,%cl
Linus further notes:
"There are no stalls from using 8-bit instruction forms.
Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones
*could* cause problems if it ends up having forwarding issues, so that
instead of just forwarding the result, you end up having to wait for
it to be stable in the L1 cache (or possibly the register file). The
forwarding from the store buffer is simplest and most reliable if the
read is done at the exact same address and the exact same size as the
write that gets forwarded.
But that's true only if:
(a) the write was very recent and is still in the write queue. I'm
not sure that's the case here anyway.
(b) on at least most Intel microarchitectures, you have to test a
different byte than the lowest one (so forwarding a 64-bit write
to a 8-bit read ends up working fine, as long as the 8-bit read
is of the low 8 bits of the written data).
A very similar issue *might* show up for registers too, not just
memory writes, if you use 'testb' with a high-byte register (where
instead of forwarding the value from the original producer it needs to
go through the register file and then shifted). But it's mainly a
problem for store buffers.
But afaik, the way Denys changed the test instructions, neither of the
above issues should be true.
The real problem for store buffer forwarding tends to be "write 8
bits, read 32 bits". That can be really surprisingly expensive,
because the read ends up having to wait until the write has hit the
cacheline, and we might talk tens of cycles of latency here. But
"write 32 bits, read the low 8 bits" *should* be fast on pretty much
all x86 chips, afaik."
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I broke 32-bit kernels. The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized. It has the following issues:
- Init's sp0 is inconsistent with everything else's: non-init tasks
are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.)
- vm86 does crazy things to sp0.
Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since we have a native 8250 driver carrying the Intel MID serial devices the
specific support is not needed anymore. This patch removes it for Intel MID.
Note that the console device name is changed from ttyMFDx to ttySx.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes for recent regressions (ACPI resources management,
suspend-to-idle), stable-candidate fixes (ACPI backlight), fixes
related to the wakeup IRQ management changes made in v3.18, other
fixes (suspend-to-idle, cpufreq ppc driver) and a couple of cleanups
(suspend-to-idle, generic power domains, ACPI backlight).
Specifics:
- Fix ACPI resources management problems introduced by the recent
rework of the code in question (Jiang Liu) and a build issue
introduced by those changes (Joachim Nilsson).
- Fix a recent suspend-to-idle regression on systems where entering
idle states causes local timers to stop, prevent suspend-to-idle
from crashing in restricted configurations (no cpuidle driver,
cpuidle disabled etc.) and clean up the idle loop somewhat while at
it (Rafael J Wysocki).
- Fix build problem in the cpufreq ppc driver (Geert Uytterhoeven).
- Allow the ACPI backlight driver module to be loaded if ACPI is
disabled which helps the i915 driver in those configurations
(stable-candidate) and change the code to help debug unusual use
cases (Chris Wilson).
- Wakeup IRQ management changes in v3.18 caused some drivers on the
at91 platform to trigger a warning from the IRQ core related to an
unexpected combination of interrupt action handler flags. However,
on at91 a timer IRQ is shared with some other devices (including
system wakeup ones) and that leads to the unusual combination of
flags in question.
To make it possible to avoid the warning introduce a new interrupt
action handler flag (which can be used by drivers to indicate the
special case to the core) and rework the problematic at91 drivers
to use it and work as expected during system suspend/resume. From
Boris Brezillon, Rafael J Wysocki and Mark Rutland.
- Clean up the generic power domains subsystem's debugfs interface
(Kevin Hilman)"
* tag 'pm+acpi-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
genirq / PM: describe IRQF_COND_SUSPEND
tty: serial: atmel: rework interrupt and wakeup handling
watchdog: at91sam9: request the irq with IRQF_NO_SUSPEND
cpuidle / sleep: Use broadcast timer for states that stop local timer
clk: at91: implement suspend/resume for the PMC irqchip
rtc: at91rm9200: rework wakeup and interrupt handling
rtc: at91sam9: rework wakeup and interrupt handling
PM / wakeup: export pm_system_wakeup symbol
genirq / PM: Add flag for shared NO_SUSPEND interrupt lines
ACPI / video: Propagate the error code for acpi_video_register
ACPI / video: Load the module even if ACPI is disabled
PM / Domains: cleanup: rename gpd -> genpd in debugfs interface
cpufreq: ppc: Add missing #include <asm/smp.h>
x86/PCI/ACPI: Relax ACPI resource descriptor checks to work around BIOS bugs
x86/PCI/ACPI: Ignore resources consumed by host bridge itself
cpuidle: Clean up fallback handling in cpuidle_idle_call()
cpuidle / sleep: Do sanity checks in cpuidle_enter_freeze() too
idle / sleep: Avoid excessive disabling and enabling interrupts
PCI: versatile: Update for list_for_each_entry() API change
genirq / PM: better describe IRQF_NO_SUSPEND semantics
On gcc5 the kernel does not link:
ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.
Because prior GCC versions always emitted NOPs on ALIGN directives, but
gcc5 started omitting them.
.LSTARTFDEDLSI1 says:
/* HACK: The dwarf2 unwind routines will subtract 1 from the
return address to get an address in the middle of the
presumed call instruction. Since we didn't get here via
a call, we need to include the nop before the real start
to make up for it. */
.long .LSTART_sigreturn-1-. /* PC-relative start address */
But commit 69d0627a7f ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.
Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses
vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN".
So fix this by adding to that point at least a single NOP and make the
function ALIGN possibly with more NOPs then.
Kudos for reporting and diagnosing should go to Richard.
Reported-by: Richard Biener <rguenther@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>