Mathieu reported that since 317f394160 ("sched: Move the second half
of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of
context of the waker.
This is a problem when you want to analyse wakeup paths because it is
now very hard to correlate the wakeup event to whoever issued the
wakeup.
OTOH trace_sched_wakeup() is issued at the point where we set
p->state = TASK_RUNNING, which is right were we hand the task off to
the scheduler, so this is an important point when looking at
scheduling behaviour, up to here its been the wakeup path everything
hereafter is due to scheduler policy.
To bridge this gap, introduce a second tracepoint: trace_sched_waking.
It is guaranteed to be called in the waker context.
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are various problems and short-comings with the current
static_key interface:
- static_key_{true,false}() read like a branch depending on the key
value, instead of the actual likely/unlikely branch depending on
init value.
- static_key_{true,false}() are, as stated above, tied to the
static_key init values STATIC_KEY_INIT_{TRUE,FALSE}.
- we're limited to the 2 (out of 4) possible options that compile to
a default NOP because that's what our arch_static_branch() assembly
emits.
So provide a new static_key interface:
DEFINE_STATIC_KEY_TRUE(name);
DEFINE_STATIC_KEY_FALSE(name);
Which define a key of different types with an initial true/false
value.
Then allow:
static_branch_likely()
static_branch_unlikely()
to take a key of either type and emit the right instruction for the
case.
This means adding a second arch_static_branch_jump() assembly helper
which emits a JMP per default.
In order to determine the right instruction for the right state,
encode the branch type in the LSB of jump_entry::key.
This is the final step in removing the naming confusion that has led to
a stream of avoidable bugs such as:
a833581e37 ("x86, perf: Fix static_key bug in load_mm_cr4()")
... but it also allows new static key combinations that will give us
performance enhancements in the subsequent patches.
Tested-by: Rabin Vincent <rabin@rab.in> # arm
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> # ppc
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since we've already stepped away from ENABLE is a JMP and DISABLE is a
NOP with the branch_default bits, and are going to make it even worse,
rename it to make it all clearer.
This way we don't mix multiple levels of logic attributes, but have a
plain 'physical' name for what the current instruction patching status
of a jump label is.
This is a first step in removing the naming confusion that has led to
a stream of avoidable bugs such as:
a833581e37 ("x86, perf: Fix static_key bug in load_mm_cr4()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Beefed up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Upcoming changes to static keys is interacting/conflicting with the following
pending TSC commits in tip:x86/asm:
4ea1636b04 x86/asm/tsc: Rename native_read_tsc() to rdtsc()
...
So merge it into the locking tree to have a smoother resolution.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For an over-committed guest with more vCPUs than physical CPUs
available, it is possible that a vCPU may be kicked twice before
getting the lock - once before it becomes queue head and once again
before it gets the lock. All these CPU kicking and halting (VMEXIT)
can be expensive and slow down system performance.
This patch adds a new vCPU state (vcpu_hashed) which enables the code
to delay CPU kicking until at unlock time. Once this state is set,
the new lock holder will set _Q_SLOW_VAL and fill in the hash table
on behalf of the halted queue head vCPU. The original vcpu_halted
state will be used by pv_wait_node() only to differentiate other
queue nodes from the qeue head.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1436647018-49734-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, a reader will check first to make sure that the writer mode
byte is cleared before incrementing the reader count. That waiting is
not really necessary. It increases the latency in the reader/writer
to reader transition and reduces readers performance.
This patch eliminates that waiting. It also has the side effect
of reducing the chance of writer lock stealing and improving the
fairness of the lock. Using a locking microbenchmark, a 10-threads 5M
locking loop of mostly readers (RW ratio = 10,000:1) has the following
performance numbers in a Haswell-EX box:
Kernel Locking Rate (Kops/s)
------ ---------------------
4.1.1 15,063,081
4.1.1+patch 17,241,552 (+14.4%)
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1436459543-29126-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When we unlock in __pv_queued_spin_unlock(), a failed cmpxchg() on the lock
value indicates that we need to take the slow-path and unhash the
corresponding node blocked on the lock.
Since a failed cmpxchg() does not provide any memory-ordering guarantees,
it is possible that the node data could be read before the cmpxchg() on
weakly-ordered architectures and therefore return a stale value, leading
to hash corruption and/or a BUG().
This patch adds an smb_rmb() following the failed cmpxchg operation, so
that the unhashing is ordered after the lock has been checked.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[ Added more comments]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <Waiman.Long@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150713155830.GL2632@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On ETRAX FS, all pins on the first port (and only the first port) have
interrupt support.
On ARTPEC-3, all pins on all ports have interrupt support. However,
there are only eight interrupts. Each of the interrupts is associated
with a group of pins and for each interrupt the one pin from the group
which will trigger it can be selected.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
If the driver has specified its own irq_{request/release}_resources()
functions, don't override them. The gpio-etraxfs driver will use this.
Signed-off-by: Rabin Vincent <rabin@rab.in>
[Added a small comment blurb]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The driver code allows for the disabling of MSI interrupts; however the
module_parm line was missed and the option fails to show with modinfo.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [3.15+]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
To cleanup the pin_quirk table:
- rewrite the pin_config_match(), comparing all pins on the machine
with the corresponding pins in the quirk table.
- remove all 0x4xxxxxxx pin configurations from pin_quirk table
- after removing the 0x4xxxxxxx pin configurations, some pin tables
are exactly same, so removing the redudant pin tables.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This fixes the following warning, that is seen with gcc 5.1:
warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses].
Signed-off-by: Tomer Barletz <barletz@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
With NUMA support for s390, arch_add_memory() needs to respect the nid
parameter.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Right now we use the address of the sie control block as tag for
the sampling data. This is hard to get for users. Let's just use
the PID of the cpu thread to mark the hardware samples.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All calls to save_fpu_regs() specify the fpu structure of the current task
pointer as parameter. The task pointer of the current task can also be
retrieved from the CPU lowcore directly. Remove the parameter definition,
load the __LC_CURRENT task pointer from the CPU lowcore, and rebase the FPU
structure onto the task structure. Apply the same approach for the
load_fpu_regs() function.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The show_stack() function deals exclusively with kernel contexts, but if
it gets called in user context with EVA enabled, show_stacktrace() will
attempt to access the stack using EVA accesses, which will either read
other user mapped data, or more likely cause an exception which will be
handled by __get_user().
This is easily reproduced using SysRq t to show all task states, which
results in the following stack dump output:
Stack : (Bad stack address)
Fix by setting the current user access mode to kernel around the call to
show_stacktrace(). This causes __get_user() to use normal loads to read
the kernel stack.
Now we get the correct output, like this:
Stack : 00000000 80168960 00000000 004a0000 00000000 00000000 8060016c 1f3abd0c
1f172cd8 8056f09c 7ff1e450 8014fc3c 00000001 806dd0b0 0000001d 00000002
1f17c6a0 1f17c804 1f17c6a0 8066f6e0 00000000 0000000a 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 0110e800 1f3abd6c 1f17c6a0
...
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Patchwork: https://patchwork.linux-mips.org/patch/10778/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If a machine check exception is raised in kernel mode, user context,
with EVA enabled, then the do_mcheck handler will attempt to read the
code around the EPC using EVA load instructions, i.e. as if the reads
were from user mode. This will either read random user data if the
process has anything mapped at the same address, or it will cause an
exception which is handled by __get_user, resulting in this output:
Code: (Bad address in epc)
Fix by setting the current user access mode to kernel if the saved
register context indicates the exception was taken in kernel mode. This
causes __get_user to use normal loads to read the kernel code.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Patchwork: https://patchwork.linux-mips.org/patch/10777/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The majority of SMP platforms handle their IPIs through do_IRQ()
which calls irq_{enter/exit}(). When a call function IPI is received,
smp_call_function_interrupt() is called which also calls
irq_{enter,exit}(), meaning irq_count is raised twice.
When tick broadcasting is used (which is implemented via a call
function IPI), this incorrectly causes all CPU idle time on the core
receiving broadcast ticks to be accounted as time spent servicing
IRQs, as account_process_tick() will account as such if irq_count is
greater than 1. This results in 100% CPU usage being reported on a
core which receives its ticks via broadcast.
This patch removes the SMP smp_call_function_interrupt() wrapper which
calls irq_{enter,exit}(). Platforms which handle their IPIs through
do_IRQ() now call generic_smp_call_function_interrupt() directly to
avoid incrementing irq_count a second time. Platforms which don't
(loongson, sgi-ip27, sibyte) call generic_smp_call_function_interrupt()
wrapped in irq_{enter,exit}().
Signed-off-by: Alex Smith <alex.smith@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10770/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Execution of break instruction, trap instructions, emulation of unaligned
loads or floating point instructions - anything that tries to read the
instruction's opcode from userspace - needs read access to a page.
RIXI (Read Inhibit / Execute Inhibit) support however allows the creation of
pags that are executable but not readable. On such a mapping the attempted
load of the opcode by the kernel is going to cause an endless loop of
page faults.
The quick workaround for this is to disable the combinations that the kernel
currently isn't able to handle which are executable mappings.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
On Malta, since commit a87ea88d8f ("MIPS: Malta: initialise the RTC at
boot"), the RTC is reinitialised and forced into binary coded decimal
(BCD) mode during init, even if the bootloader has already initialised
it, and may even have already put it into binary mode (as YAMON does).
This corrupts the current time, can result in the RTC seconds being an
invalid BCD (e.g. 0x1a..0x1f) for up to 6 seconds, as well as confusing
YAMON for a while after reset, enough for it to report timeouts when
attempting to load from TFTP (it actually uses the RTC in that code).
Therefore only initialise the RTC to the extent that is necessary so
that Linux avoids interfering with the bootloader setup, while also
allowing it to estimate the CPU frequency without hanging, without a
bootloader necessarily having done anything with the RTC (for example
when the kernel is loaded via EJTAG).
The divider control is configured for a 32KHZ reference clock if
necessary, and the SET bit of the RTC_CONTROL register is cleared if
necessary without changing any other bits (this bit will be set when
coming out of reset if the battery has been disconnected).
Fixes: a87ea88d8f ("MIPS: Malta: initialise the RTC at boot")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.14+
Patchwork: https://patchwork.linux-mips.org/patch/10739/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit eeb5389503 ("MIPS: unaligned: Prevent EVA instructions on kernel
unaligned accesses") renamed the Load* and Store* defines in unaligned.c
to _Load* and _Store* as part of its fix. One define was missed out which
causes big endian R6 kernels to fail to build.
arch/mips/kernel/unaligned.c:880:35:
error: implicit declaration of function '_StoreDW'
#define StoreDW(addr, value, res) _StoreDW(addr, value, res)
^
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Fixes: eeb5389503 ("MIPS: unaligned: Prevent EVA instructions on kernel unaligned accesses")
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: <stable@vger.kernel.org> # 4.0+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10575/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit 01306aeadd ("MIPS: prepare for user enabling of CONFIG_OF")
changed the guards in asm/prom.h from CONFIG_OF to CONFIG_USE_OF, but
missed the actual function declarations in kernel/prom.c, which have
additional dependencies.
Fixes the following build error:
CC arch/mips/kernel/prom.o
arch/mips/kernel/prom.c: In function '__dt_setup_arch':
arch/mips/kernel/prom.c:54:2: error: implicit declaration of function 'early_init_dt_scan' [-Werror=implicit-function-declaration]
if (!early_init_dt_scan(bph))
^
Fixes: 01306aeadd ("MIPS: prepare for user enabling of CONFIG_OF")
Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Acked-by: Rob Herring <robh@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Cc: Grant Likely <grant.likely@linaro.org>
Patchwork: https://patchwork.linux-mips.org/patch/10741/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>