Pull sched_ext/for-6.16-fixes to receive:
c50784e99f ("sched_ext: Make scx_group_set_weight() always update tg->scx.weight")
33796b9187 ("sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()")
which are needed to implement CPU bandwidth control interface support.
Signed-off-by: Tejun Heo <tj@kernel.org>
- Move input parameter validation from tg_set_cfs_bandwidth() to the new
outer function tg_set_bandwidth(). The outer function handles parameters
in usecs, validates them and calls tg_set_cfs_bandwidth() which converts
them into nsecs. This matches tg_bandwidth() on the read side.
- max/min_cfs_* consts are now used by tg_set_bandwidth(). Relocate, convert
into usecs and drop "cfs" from the names.
- Reimplement cpu_cfs_{period|quote|burst}_write_*() using tg_bandwidth()
and tg_set_bandwidth() and replace "cfs" in the names with "bw".
- Update cpu_max_write() to use tg_set_bandiwdth(). cpu_period_quota_parse()
is updated to drop nsec conversion accordingly. This aligns the behavior
with cfs_period_quota_print().
- Drop now unused tg_set_cfs_{period|quota|burst}().
- While at it, for consistency, rename default_cfs_period() to
default_bw_period_us() and make it return usecs.
This is to prepare for adding bandwidth control support to sched_ext.
tg_set_bandwidth() will be used as the muxing point. No functional changes
intended.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250614012346.2358261-5-tj@kernel.org
- Update tg_get_cfs_*() to return u64 values. These are now used as the low
level accessors to the fair's bandwidth configuration parameters.
Translation to usecs takes place in these functions.
- Add tg_bandwidth() which reads all three bandwidth parameters using
tg_get_cfs_*().
- Reimplement cgroup interface read functions using tg_bandwidth(). Drop cfs
from the function names.
This is to prepare for adding bandwidth control support to sched_ext.
tg_bandwidth() will be used as the muxing point similar to tg_weight(). No
functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250614012346.2358261-4-tj@kernel.org
During task_group creation, sched_create_group() calls
scx_group_set_weight() with CGROUP_WEIGHT_DFL to initialize the sched_ext
portion. This is premature and ends up calling ops.cgroup_set_weight() with
an incorrect @cgrp before ops.cgroup_init() is called.
sched_create_group() should just initialize SCX related fields in the new
task_group. Fix it by factoring out scx_tg_init() from sched_init() and
making sched_create_group() call that function instead of
scx_group_set_weight().
v2: Retain CONFIG_EXT_GROUP_SCHED ifdef in sched_init() as removing it leads
to build failures on !CONFIG_GROUP_SCHED configs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 8195136669 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Pull more MM updates from Andrew Morton:
- "zram: support algorithm-specific parameters" from Sergey Senozhatsky
adds infrastructure for passing algorithm-specific parameters into
zram. A single parameter `winbits' is implemented at this time.
- "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
charging nmi-safe, which is required by BFP, which can operate in NMI
context.
- "Some random fixes and cleanup to shmem" from Kemeng Shi implements
small fixes and cleanups in the shmem code.
- "Skip mm selftests instead when kernel features are not present" from
Zi Yan fixes some issues in the MM selftest code.
- "mm/damon: build-enable essential DAMON components by default" from
SeongJae Park reworks DAMON Kconfig to make it easier to enable
CONFIG_DAMON.
- "sched/numa: add statistics of numa balance task migration" from Libo
Chen adds more info into sysfs and procfs files to improve visibility
into the NUMA balancer's task migration activity.
- "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
provides various updates to some of the MM selftests to make them
play better with the overall containing framework.
* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
mm/khugepaged: clean up refcount check using folio_expected_ref_count()
selftests/mm: fix test result reporting in gup_longterm
selftests/mm: report unique test names for each cow test
selftests/mm: add helper for logging test start and results
selftests/mm: use standard ksft_finished() in cow and gup_longterm
selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
sched/numa: add statistics of numa balance task
sched/numa: fix task swap by skipping kernel threads
tools/testing: check correct variable in open_procmap()
tools/testing/vma: add missing function stub
mm/gup: update comment explaining why gup_fast() disables IRQs
selftests/mm: two fixes for the pfnmap test
mm/khugepaged: fix race with folio split/free using temporary reference
mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
mmu_notifiers: remove leftover stub macros
selftests/mm: deduplicate test names in madv_populate
kcov: rust: add flags for KCOV with Rust
mm: rust: make CONFIG_MMU ifdefs more narrow
mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
mm/damon/Kconfig: enable CONFIG_DAMON by default
...
On systems with NUMA balancing enabled, it has been found that tracking
task activities resulting from NUMA balancing is beneficial. NUMA
balancing employs two mechanisms for task migration: one is to migrate
a task to an idle CPU within its preferred node, and the other is to
swap tasks located on different nodes when they are on each other's
preferred nodes.
The kernel already provides NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However, it
lacks statistics regarding task migration and swapping. Therefore,
relevant counts for task migration and swapping should be added.
The following two new fields:
numa_task_migrated
numa_task_swapped
will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
and /proc/vmstat.
Introducing both per-task and per-memory cgroup (memcg) NUMA balancing
statistics facilitates a rapid evaluation of the performance and
resource utilization of the target workload. For instance, users can
first identify the container with high NUMA balancing activity and then
further pinpoint a specific task within that group, and subsequently
adjust the memory policy for that task. In short, although it is
possible to iterate through /proc/$pid/sched to locate the problematic
task, the introduction of aggregated NUMA balancing activity for tasks
within each memcg can assist users in identifying the task more
efficiently through a divide-and-conquer approach.
As Libo Chen pointed out, the memcg event relies on the text names in
vmstat_text, and /proc/vmstat generates corresponding items based on
vmstat_text. Thus, the relevant task migration and swapping events
introduced in vmstat_text also need to be populated by
count_vm_numa_event(), otherwise these values are zero in /proc/vmstat.
In theory, task migration and swap events are part of the scheduler's
activities. The reason for exposing them through the
memory.stat/vmstat interface is that we already have NUMA balancing
statistics in memory.stat/vmstat, and these events are closely related
to each other. Following Shakeel's suggestion, we describe the
end-to-end flow/story of all these events occurring on a timeline for
future reference:
The goal of NUMA balancing is to co-locate a task and its memory pages
on the same NUMA node. There are two strategies: migrate the pages to
the task's node, or migrate the task to the node where its pages
reside.
Suppose a task p1 is running on Node 0, but its pages are located on
Node 1. NUMA page fault statistics for p1 reveal its "page footprint"
across nodes. If NUMA balancing detects that most of p1's pages are on
Node 1:
1.Page Migration Attempt:
The Numa balance first tries to migrate p1's pages to Node 0.
The numa_page_migrate counter increments.
2.Task Migration Strategies:
After the page migration finishes, Numa balance checks every
1 second to see if p1 can be migrated to Node 1.
Case 2.1: Idle CPU Available
If Node 1 has an idle CPU, p1 is directly scheduled there. This
event is logged as numa_task_migrated.
Case 2.2: No Idle CPU (Task Swap)
If all CPUs on Node1 are busy, direct migration could cause CPU
contention or load imbalance. Instead: The Numa balance selects a
candidate task p2 on Node 1 that prefers Node 0 (e.g., due to its own
page footprint). p1 and p2 are swapped. This cross-node swap is
recorded as numa_task_swapped.
Link: https://lkml.kernel.org/r/d00edb12ba0f0de3c5222f61487e65f2ac58f5b1.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Ayush Jain <Ayush.jain3@amd.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Libo Chen <libo.chen@oracle.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull scheduler updates from Ingo Molnar:
"Core & fair scheduler changes:
- Tweak wait_task_inactive() to force dequeue sched_delayed tasks
(John Stultz)
- Adhere to place_entity() constraints (Peter Zijlstra)
- Allow decaying util_est when util_avg > CPU capacity (Pierre
Gondois)
- Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan)
Energy management:
- Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak)
- cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings
change (K Prateek Nayak)
- Align uclamp and util_est and call before freq update (Xuewen Yan)
CPU isolation:
- Make use of more than one housekeeping CPU (Phil Auld)
RT scheduler:
- Fix race in push_rt_task() (Harshit Agarwal)
- Add kernel cmdline option for rt_group_sched (Michal Koutný)
Scheduler topology support:
- Improve topology_span_sane speed (Steve Wahl)
Scheduler debugging:
- Move and extend the sched_process_exit() tracepoint (Andrii
Nakryiko)
- Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný)
- Fix trace_sched_switch(.prev_state) (Peter Zijlstra)
- Untangle cond_resched() and live-patching (Peter Zijlstra)
Fixes and cleanups:
- Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter
Zijlstra, Xuewen Yan)"
* tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
sched/uclamp: Align uclamp and util_est and call before freq update
sched/util_est: Simplify condition for util_est_{en,de}queue()
sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE
sched,livepatch: Untangle cond_resched() and live-patching
sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks
sched/fair: Adhere to place_entity() constraints
sched/debug: Print the local group's asym_prefer_cpu
cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
sched/topology: Introduce sched_update_asym_prefer_cpu()
sched/fair: Use READ_ONCE() to read sg->asym_prefer_cpu
sched/isolation: Make use of more than one housekeeping cpu
sched/rt: Fix race in push_rt_task
sched: Add annotations to RT_GROUP_SCHED fields
sched: Add RT_GROUP WARN checks for non-root task_groups
sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled
sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED
sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
sched: Add commadline option for RT_GROUP_SCHED toggling
sched: Always initialize rt_rq's task_group
sched: Remove unneeed macro wrap
...
The commit dfa0a574cb ("sched/uclamg: Handle delayed dequeue")
has add the sched_delayed check to prevent double uclamp_dec/inc.
However, it put the uclamp_rq_inc() after enqueue_task().
This may lead to the following issues:
When a task with uclamp goes through enqueue_task() and could trigger
cpufreq update, its uclamp won't even be considered in the cpufreq
update. It is only after enqueue will the uclamp be added to rq
buckets, and cpufreq will only pick it up at the next update.
This could cause a delay in frequency updating. It may affect
the performance(uclamp_min > 0) or power(uclamp_max < 1024).
So, just like util_est, put the uclamp_rq_inc() before enqueue_task().
And as for the sched_delayed_task, same as util_est, using the
sched_delayed flag to prevent inc the sched_delayed_task's uclamp,
using the ENQUEUE_DELAYED flag to allow inc the sched_delayed_task's uclamp
which is being woken up.
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20250417043457.10632-3-xuewen.yan@unisoc.com
It was reported that in 6.12, smpboot_create_threads() was
taking much longer then in 6.6.
I narrowed down the call path to:
smpboot_create_threads()
-> kthread_create_on_cpu()
-> kthread_bind()
-> __kthread_bind_mask()
->wait_task_inactive()
Where in wait_task_inactive() we were regularly hitting the
queued case, which sets a 1 tick timeout, which when called
multiple times in a row, accumulates quickly into a long
delay.
I noticed disabling the DELAY_DEQUEUE sched feature recovered
the performance, and it seems the newly create tasks are usually
sched_delayed and left on the runqueue.
So in wait_task_inactive() when we see the task
p->se.sched_delayed, manually dequeue the sched_delayed task
with DEQUEUE_DELAYED, so we don't have to constantly wait a
tick.
Fixes: 152e11f6df ("sched/fair: Implement delayed dequeue")
Reported-by: peter-yc.chang@mediatek.com
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250429150736.3778580-1-jstultz@google.com
Thanks to kernel cmdline being available early, before any
cgroup hierarchy exists, we can achieve the RT_GROUP_SCHED boottime
disabling goal by simply skipping any creation (and destruction) of
RT_GROUP data and its exposure via RT attributes.
We can do this thanks to previously placed runtime guards that would
redirect all operations to root_task_group's data when RT_GROUP_SCHED
disabled.
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310170442.504716-8-mkoutny@suse.com
When RT_GROUPs are compiled but not exposed, their bandwidth cannot
be configured (and it is not initialized for non-root task_groups neither).
Therefore bypass any checks of task vs task_group bandwidth.
This will achieve behavior very similar to setups that have
!CONFIG_RT_GROUP_SCHED and attach cpu controller to cgroup v2 hierarchy.
(On a related note, this may allow having RT tasks with
CONFIG_RT_GROUP_SCHED and cgroup v2 hierarchy.)
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310170442.504716-7-mkoutny@suse.com
First, we want to prevent placement of RT tasks on non-root rt_rqs which
we achieve in the task migration code that'd fall back to
root_task_group's rt_rq.
Second, we want to work with only root_task_group's rt_rq when iterating
all "real" rt_rqs when RT_GROUP is disabled. To achieve this we keep
root_task_group as the first one on the task_groups and break out
quickly.
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310170442.504716-6-mkoutny@suse.com
Gabriele noted that in case of signal_pending_state(), the tracepoint
sees a stale task-state.
Fixes: fa2c3254d7 ("sched/tracing: Don't re-read p->state when emitting sched_switch event")
Reported-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Eliminate a useless task_work on execve by moving the call to
rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error
path of bprm_execve().
The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is
pointless in the success case, because rseq_execve() will clear the rseq
pointer before returning to userspace.
sched_mm_cid_after_execve() is called from both the success and error
paths of bprm_execve(). The call to rseq_set_notify_resume() is needed
on error because the mm_cid may have changed.
Also move the rseq_execve() to right after sched_mm_cid_after_execve()
in bprm_execve().
[ mingo: Merged to a recent upstream kernel, extended the changelog. ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
Pull latency tracing updates from Steven Rostedt:
- Add some trace events to osnoise and timerlat sample generation
This adds more information to the osnoise and timerlat tracers as
well as allows BPF programs to be attached to these locations to
extract even more data.
- Fix to DECLARE_TRACE_CONDITION() macro
It wasn't used but now will be and it happened to be broken causing
the build to fail.
- Add scheduler specification monitors to runtime verifier (RV)
This is a continuation of Daniel Bristot's work.
RV allows monitors to run and react concurrently. Running the
cumulative model is equivalent to running single components using the
same reactors, with the advantage that it's easier to point out which
specification failed in case of error.
This update introduces nested monitors to RV, in short, the sysfs
monitor folder will contain a monitor named sched, which is nothing
but an empty container for other monitors. Controlling the sched
monitor (enable, disable, set reactors) controls all nested monitors.
The following scheduling monitors are added:
- sco: scheduling context operations
Monitor to ensure sched_set_state happens only in thread context
- tss: task switch while scheduling
Monitor to ensure sched_switch happens only in scheduling context
- snroc: set non runnable on its own context
Monitor to ensure set_state happens only in the respective task's context
- scpd: schedule called with preemption disabled
Monitor to ensure schedule is called with preemption disabled
- snep: schedule does not enable preempt
Monitor to ensure schedule does not enable preempt
- sncid: schedule not called with interrupt disabled
Monitor to ensure schedule is not called with interrupt disabled
* tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tools/rv: Allow rv list to filter for container
Documentation/rv: Add docs for the sched monitors
verification/dot2k: Add support for nested monitors
tools/rv: Add support for nested monitors
rv: Add scpd, snep and sncid per-cpu monitors
rv: Add snroc per-task monitor
rv: Add sco and tss per-cpu monitors
rv: Add option for nested monitors and include sched
sched: Add sched tracepoints for RV task model
rv: Add license identifiers to monitor files
tracing: Fix DECLARE_TRACE_CONDITION
trace/osnoise: Add trace events for samples
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
Pull scheduler updates from Ingo Molnar:
"Core & fair scheduler changes:
- Cancel the slice protection of the idle entity (Zihan Zhou)
- Reduce the default slice to avoid tasks getting an extra tick
(Zihan Zhou)
- Force propagating min_slice of cfs_rq when {en,de}queue tasks
(Tianchen Ding)
- Refactor can_migrate_task() to elimate looping (I Hsin Cheng)
- Add unlikey branch hints to several system calls (Colin Ian King)
- Optimize current_clr_polling() on certain architectures (Yujun
Dong)
Deadline scheduler: (Juri Lelli)
- Remove redundant dl_clear_root_domain call
- Move dl_rebuild_rd_accounting to cpuset.h
Uclamp:
- Use the uclamp_is_used() helper instead of open-coding it (Xuewen
Yan)
- Optimize sched_uclamp_used static key enabling (Xuewen Yan)
Scheduler topology support: (Juri Lelli)
- Ignore special tasks when rebuilding domains
- Add wrappers for sched_domains_mutex
- Generalize unique visiting of root domains
- Rebuild root domain accounting after every update
- Remove partition_and_rebuild_sched_domains
- Stop exposing partition_sched_domains_locked
RSEQ: (Michael Jeanson)
- Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
- Fix segfault on registration when rseq_cs is non-zero
- selftests: Add rseq syscall errors test
- selftests: Ensure the rseq ABI TLS is actually 1024 bytes
Membarriers:
- Fix redundant load of membarrier_state (Nysal Jan K.A.)
Scheduler debugging:
- Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
- Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)
Fixes and cleanups:
- Always save/restore x86 TSC sched_clock() on suspend/resume
(Guilherme G. Piccoli)
- Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian
Andrzej Siewior)"
* tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()
sched/debug: Remove CONFIG_SCHED_DEBUG
sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
sched/debug: Make 'const_debug' tunables unconditional __read_mostly
sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
rseq/selftests: Fix namespace collision with rseq UAPI header
include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
sched/topology: Stop exposing partition_sched_domains_locked
cgroup/cpuset: Remove partition_and_rebuild_sched_domains
sched/topology: Remove redundant dl_clear_root_domain call
sched/deadline: Rebuild root domain accounting after every update
sched/deadline: Generalize unique visiting of root domains
sched/topology: Wrappers for sched_domains_mutex
sched/deadline: Ignore special tasks when rebuilding domains
tracing: Use preempt_model_str()
xtensa: Rely on generic printing of preemption model
x86: Rely on generic printing of preemption model
s390: Rely on generic printing of preemption model
...
Pull RCU updates from Boqun Feng:
"Documentation:
- Add broken-timing possibility to stallwarn.rst
- Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
- Document self-propagating callbacks
- Point call_srcu() to call_rcu() for detailed memory ordering
- Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
- Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
- Remove references to old grace-period-wait primitives
srcu:
- Introduce srcu_read_{un,}lock_fast(), which is similar to
srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
at the cost of calling synchronize_rcu() in synchronize_srcu()
Moreover, by returning the percpu offset of the counter at
srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
extra pointer dereferencing, which makes it faster than
srcu_read_{un,}lock_lite()
srcu_read_{un,}lock_fast() are intended to replace
rcu_read_{un,}lock_trace() if possible
RCU torture:
- Add get_torture_init_jiffies() to return the start time of the test
- Add a test_boost_holdoff module parameter to allow delaying
boosting tests when building rcutorture as built-in
- Add grace period sequence number logging at the beginning and end
of failure/close-call results
- Switch to hexadecimal for the expedited grace period sequence
number in the rcu_exp_grace_period trace point
- Make cur_ops->format_gp_seqs take buffer length
- Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
- Complain when invalid SRCU reader_flavor is specified
- Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
uses atomics even when percpu ops are NMI safe, and use the Kconfig
for SRCU lockdep testing
Misc:
- Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
- Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
- Fix get_state_synchronize_rcu_full() GP-start detection
- Move RCU Tasks self-tests to core_initcall()
- Print segment lengths in show_rcu_nocb_gp_state()
- Make RCU watch ct_kernel_exit_state() warning
- Flush console log from kernel_power_off()
- rcutorture: Allow a negative value for nfakewriters
- rcu: Update TREE05.boot to test normal synchronize_rcu()
- rcu: Use _full() API to debug synchronize_rcu()
Make RCU handle PREEMPT_LAZY better:
- Fix header guard for rcu_all_qs()
- rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
- Update __cond_resched comment about RCU quiescent states
- Handle unstable rdp in rcu_read_unlock_strict()
- Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
- osnoise: Provide quiescent states
- Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
combination
- Limit PREEMPT_RCU configurations
- Make rcutorture senario TREE07 and senario TREE10 use
PREEMPT_LAZY=y"
* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
rcu: limit PREEMPT_RCU configurations
rcutorture: Update ->extendables check for lazy preemption
rcutorture: Update rcutorture_one_extend_check() for lazy preemption
osnoise: provide quiescent states
rcu: Use _full() API to debug synchronize_rcu()
rcu: Update TREE05.boot to test normal synchronize_rcu()
rcutorture: Allow a negative value for nfakewriters
Flush console log from kernel_power_off()
context_tracking: Make RCU watch ct_kernel_exit_state() warning
rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
rcu-tasks: Move RCU Tasks self-tests to core_initcall()
rcu: Fix get_state_synchronize_rcu_full() GP-start detection
torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
rcutorture: Complain when invalid SRCU reader_flavor is specified
rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
rcutorture: Make cur_ops->format_gp_seqs take buffer length
rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
...
Pull sched_ext updates from Tejun Heo:
- Add mechanism to count and report internal events. This significantly
improves visibility on subtle corner conditions.
- The default idle CPU selection logic is revamped and improved in
multiple ways including being made topology aware.
- sched_ext was disabling ttwu_queue for simplicity, which can be
costly when hardware topology is more complex. Implement
SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
enable ttwu_queue.
- tools/sched_ext updates to improve compatibility among others.
- Other misc updates and fixes.
- sched_ext/for-6.14-fixes were pulled a few times to receive
prerequisite fixes and resolve conflicts.
* tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits)
sched_ext: idle: Refactor scx_select_cpu_dfl()
sched_ext: idle: Honor idle flags in the built-in idle selection policy
sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()
sched_ext: Add trace point to track sched_ext core events
sched_ext: Change the event type from u64 to s64
sched_ext: Documentation: add task lifecycle summary
tools/sched_ext: Provide a compatible helper for scx_bpf_events()
selftests/sched_ext: Add NUMA-aware scheduler test
tools/sched_ext: Provide consistent access to scx flags
sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior
sched_ext: idle: Introduce scx_bpf_nr_node_ids()
sched_ext: idle: Introduce node-aware idle cpu kfunc helpers
sched_ext: idle: Per-node idle cpumasks
sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE
sched_ext: idle: Make idle static keys private
sched/topology: Introduce for_each_node_numadist() iterator
mm/numa: Introduce nearest_node_nodemask()
nodemask: numa: reorganize inclusion path
nodemask: add nodes_copy()
tools/sched_ext: Sync with scx repo
...
Rebuilding of root domains accounting information (total_bw) is
currently broken on some cases, e.g. suspend/resume on aarch64. Problem
is that the way we keep track of domain changes and try to add bandwidth
back is convoluted and fragile.
Fix it by simplify things by making sure bandwidth accounting is cleared
and completely restored after root domains changes (after root domains
are again stable).
To be sure we always call dl_rebuild_rd_accounting while holding
cpuset_mutex we also add cpuset_reset_sched_domains() wrapper.
Fixes: 53916d5fd3 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Co-developed-by: Waiman Long <llong@redhat.com>
Signed-off-by: Waiman Long <llong@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/Z9MRfeJKJUOyUSto@jlelli-thinkpadt14gen4.remote.csb
The individual architectures often add the preemption model to the begin
of the backtrace. This is the case on X86 or ARM64 for the "die" case
but not for regular warning. With the addition of DYNAMIC_PREEMPT for
PREEMPT_RT we end up with CONFIG_PREEMPT and CONFIG_PREEMPT_RT set
simultaneously. That means that everyone who tried to add that piece of
information gets it wrong for PREEMPT_RT because PREEMPT is checked
first.
Provide a generic function which returns the current scheduling model
considering LAZY preempt and the current state of PREEMPT_DYNAMIC.
The resulting strings are:
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃ -RT -DYN ┃ +RT -DYN ┃ -RT +DYN ┃ +RT +DYN ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│NONE │ NONE │ n/a │ PREEMPT(none) │ n/a │
├───────────┼──────────────┼───────────────────┼────────────────────┼───────────────────┤
│VOLUNTARY │ VOLUNTARY │ n/a │ PREEMPT(voluntary) │ n/a │
├───────────┼──────────────┼───────────────────┼────────────────────┼───────────────────┤
│FULL │ PREEMPT │ PREEMPT_RT │ PREEMPT(full) │ PREEMPT_{RT,full} │
├───────────┼──────────────┼───────────────────┼────────────────────┼───────────────────┤
│LAZY │ PREEMPT_LAZY │ PREEMPT_{RT,LAZY} │ PREEMPT(lazy) │ PREEMPT_{RT,lazy} │
└───────────┴──────────────┴───────────────────┴────────────────────┴───────────────────┘
[ The dynamic building of the string can lead to an empty string if the
function is invoked simultaneously on two CPUs. ]
Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Co-developed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Signed-off-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://lore.kernel.org/r/20250314160810.2373416-2-bigeasy@linutronix.de
This reverts commit eff6c8ce8d.
Hazem reported a 30% drop in UnixBench spawn test with commit
eff6c8ce8d ("sched/core: Reduce cost of sched_move_task when config
autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM
(aarch64) (single level MC sched domain):
https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com
There is an early bail from sched_move_task() if p->sched_task_group is
equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
(Ubuntu '22.04.5 LTS').
So in:
do_exit()
sched_autogroup_exit_task()
sched_move_task()
if sched_get_task_group(p) == p->sched_task_group
return
/* p is enqueued */
dequeue_task() \
sched_change_group() |
task_change_group_fair() |
detach_task_cfs_rq() | (1)
set_task_rq() |
attach_task_cfs_rq() |
enqueue_task() /
(1) isn't called for p anymore.
Turns out that the regression is related to sgs->group_util in
group_is_overloaded() and group_has_capacity(). If (1) isn't called for
all the 'spawn' tasks then sgs->group_util is ~900 and
sgs->group_capacity = 1024 (single CPU sched domain) and this leads to
group_is_overloaded() returning true (2) and group_has_capacity() false
(3) much more often compared to the case when (1) is called.
I.e. there are much more cases of 'group_is_overloaded' and
'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu() which
then returns much more often a CPU != smp_processor_id() (5).
This isn't good for these extremely short running tasks (FORK + EXIT)
and also involves calling sched_balance_find_dst_group_cpu() unnecessary
(single CPU sched domain).
Instead if (1) is called for 'p->flags & PF_EXITING' then the path
(4),(6) is taken much more often.
select_task_rq_fair(..., wake_flags = WF_FORK)
cpu = smp_processor_id()
new_cpu = sched_balance_find_dst_cpu(..., cpu, ...)
group = sched_balance_find_dst_group(..., cpu)
do {
update_sg_wakeup_stats()
sgs->group_type = group_classify()
if group_is_overloaded() (2)
return group_overloaded
if !group_has_capacity() (3)
return group_fully_busy
return group_has_spare (4)
} while group
if local_sgs.group_type > idlest_sgs.group_type
return idlest (5)
case group_has_spare:
if local_sgs.idle_cpus >= idlest_sgs.idle_cpus
return NULL (6)
Unixbench Tests './Run -c 4 spawn' on:
(a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4')
and Ubuntu 22.04.5 LTS (aarch64).
Shell & test run in '/user.slice/user-1000.slice/session-1.scope'.
w/o patch w/ patch
21005 27120
(b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and
Ubuntu 22.04.5 LTS (x86_64).
Shell & test run in '/A'.
w/o patch w/ patch
67675 88806
CONFIG_SCHED_AUTOGROUP=y & /sys/proc/kernel/sched_autogroup_enabled equal
0 or 1.
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Hagar Hemdan <hagarhem@amazon.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com
Repeat calls of static_branch_enable() to an already enabled
static key introduce overhead, because it calls cpus_read_lock().
Users may frequently set the uclamp value of tasks, triggering
the repeat enabling of the sched_uclamp_used static key.
Optimize this and avoid repeat calls to static_branch_enable()
by checking whether it's enabled already.
[ mingo: Rewrote the changelog for legibility ]
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20250219093747.2612-2-xuewen.yan@unisoc.com
David reported a warning observed while loop testing kexec jump:
Interrupts enabled after irqrouter_resume+0x0/0x50
WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
kernel_kexec+0xf6/0x180
__do_sys_reboot+0x206/0x250
do_syscall_64+0x95/0x180
The corresponding interrupt flag trace:
hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90
That means __up_console_sem() was invoked with interrupts enabled. Further
instrumentation revealed that in the interrupt disabled section of kexec
jump one of the syscore_suspend() callbacks woke up a task, which set the
NEED_RESCHED flag. A later callback in the resume path invoked
cond_resched() which in turn led to the invocation of the scheduler:
__cond_resched+0x21/0x60
down_timeout+0x18/0x60
acpi_os_wait_semaphore+0x4c/0x80
acpi_ut_acquire_mutex+0x3d/0x100
acpi_ns_get_node+0x27/0x60
acpi_ns_evaluate+0x1cb/0x2d0
acpi_rs_set_srs_method_data+0x156/0x190
acpi_pci_link_set+0x11c/0x290
irqrouter_resume+0x54/0x60
syscore_resume+0x6a/0x200
kernel_kexec+0x145/0x1c0
__do_sys_reboot+0xeb/0x240
do_syscall_64+0x95/0x180
This is a long standing problem, which probably got more visible with
the recent printk changes. Something does a task wakeup and the
scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and
invokes schedule() from a completely bogus context. The scheduler
enables interrupts after context switching, which causes the above
warning at the end.
Quite some of the code paths in syscore_suspend()/resume() can result in
triggering a wakeup with the exactly same consequences. They might not
have done so yet, but as they share a lot of code with normal operations
it's just a question of time.
The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling
models. Full preemption is not affected as cond_resched() is disabled and
the preemption check preemptible() takes the interrupt disabled flag into
account.
Cure the problem by adding a corresponding check into cond_resched().
Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@infradead.org
Pull scheduler fix from Borislav Petkov:
- Clarify what happens when a task is woken up from the wake queue and
make clear its removal from that queue is atomic
* tag 'sched_urgent_for_v6.14_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Clarify wake_up_q()'s write to task->wake_q.next
Pull sched_ext fixes from Tejun Heo:
- Fix lock imbalance in a corner case of dispatch_to_local_dsq()
- Migration disabled tasks were confusing some BPF schedulers and its
handling had a bug. Fix it and simplify the default behavior by
dispatching them automatically
- ops.tick(), ops.disable() and ops.exit_task() were incorrectly
disallowing kfuncs that require the task argument to be the rq
operation is currently operating on and thus is rq-locked.
Allow them.
- Fix autogroup migration handling bug which was occasionally
triggering a warning in the cgroup migration path
- tools/sched_ext, selftest and other misc updates
* tag 'sched_ext-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Use SCX_CALL_OP_TASK in task_tick_scx
sched_ext: Fix the incorrect bpf_list kfunc API in common.bpf.h.
sched_ext: selftests: Fix grammar in tests description
sched_ext: Fix incorrect assumption about migration disabled tasks in task_can_run_on_remote_rq()
sched_ext: Fix migration disabled handling in targeted dispatches
sched_ext: Implement auto local dispatching of migration disabled tasks
sched_ext: Fix incorrect time delta calculation in time_delta()
sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
sched_ext: selftests/dsp_local_on: Fix selftest on UP systems
tools/sched_ext: Add helper to check task migration state
sched_ext: Fix incorrect autogroup migration detection
sched_ext: selftests/dsp_local_on: Fix sporadic failures
selftests/sched_ext: Fix enum resolution
sched_ext: Include task weight in the error state dump
sched_ext: Fixes typos in comments
A task wakeup can be either processed on the waker's CPU or bounced to the
wakee's previous CPU using an IPI (ttwu_queue). Bouncing to the wakee's CPU
avoids the waker's CPU locking and accessing the wakee's rq which can be
expensive across cache and node boundaries.
When ttwu_queue path is taken, select_task_rq() and thus ops.select_cpu()
may be skipped in some cases (racing against the wakee switching out). As
this confused some BPF schedulers, there wasn't a good way for a BPF
scheduler to tell whether idle CPU selection has been skipped, ops.enqueue()
couldn't insert tasks into foreign local DSQs, and the performance
difference on machines with simple toplogies were minimal, sched_ext
disabled ttwu_queue.
However, this optimization makes noticeable difference on more complex
topologies and a BPF scheduler now has an easy way tell whether
ops.select_cpu() was skipped since 9b671793c7 ("sched_ext, scx_qmap: Add
and use SCX_ENQ_CPU_SELECTED") and can insert tasks into foreign local DSQs
since 5b26f7b920 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct
dispatches").
Implement SCX_OPS_ALLOW_QUEUED_WAKEUP which allows BPF schedulers to choose
to enable ttwu_queue optimization.
v2: Update the patch description and comment re. ops.select_cpu() being
skipped in some cases as opposed to always as per Neel.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Neel Natu <neelnatu@google.com>
Reported-by: Barret Rhoden <brho@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>