Commit Graph

77768 Commits

Author SHA1 Message Date
Eric Dumazet
cd8ae85299 tcp: provide SYN headers for passive connections
This patch allows a server application to get the TCP SYN headers for
its passive connections.  This is useful if the server is doing
fingerprinting of clients based on SYN packet contents.

Two socket options are added: TCP_SAVE_SYN and TCP_SAVED_SYN.

The first is used on a socket to enable saving the SYN headers
for child connections. This can be set before or after the listen()
call.

The latter is used to retrieve the SYN headers for passive connections,
if the parent listener has enabled TCP_SAVE_SYN.

TCP_SAVED_SYN is read once, it frees the saved SYN headers.

The data returned in TCP_SAVED_SYN are network (IPv4/IPv6) and TCP
headers.

Original patch was written by Tom Herbert, I changed it to not hold
a full skb (and associated dst and conntracking reference).

We have used such patch for about 3 years at Google.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05 16:02:34 -04:00
Christoph Hellwig
9dc6c806b3 nbd: stop using req->cmd
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:40:44 -06:00
Christoph Hellwig
a7928c1578 block: move PM request support to IDE
This removes the request types and hacks from the block code and into the
old IDE driver.  There is a small amunt of code duplication due to this,
but it's not too bad.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:40:42 -06:00
Christoph Hellwig
ac7cdff00a block: remove REQ_TYPE_PM_SHUTDOWN
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:40:10 -06:00
Christoph Hellwig
b0b93b48a3 block: move REQ_TYPE_SENSE to the ide driver
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:40:07 -06:00
Christoph Hellwig
b42171ef7d block: move REQ_TYPE_ATA_TASKFILE and REQ_TYPE_ATA_PC to ide.h
These values are only used by the IDE driver, so move them into it
by allowing drivers to take cmd_type values after the first private
one.  Note that we have to turn cmd_type into a plain unsigned integer
so that gcc doesn't complain about mismatching enum types.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:40:05 -06:00
Christoph Hellwig
4f8c9510ba block: rename REQ_TYPE_SPECIAL to REQ_TYPE_DRV_PRIV
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:40:03 -06:00
Christoph Hellwig
84be456f88 remove <asm/scatterlist.h>
We don't have any arch specific scatterlist now that parisc switched over
to the generic one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:35:39 -06:00
Jens Axboe
dac56212e8 bio: skip atomic inc/dec of ->bi_cnt for most use cases
Struct bio has a reference count that controls when it can be freed.
Most uses cases is allocating the bio, which then returns with a
single reference to it, doing IO, and then dropping that single
reference. We can remove this atomic_dec_and_test() in the completion
path, if nobody else is holding a reference to the bio.

If someone does call bio_get() on the bio, then we flag the bio as
now having valid count and that we must properly honor the reference
count when it's being put.

Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:32:49 -06:00
Jens Axboe
c4cf5261f8 bio: skip atomic inc/dec of ->bi_remaining for non-chains
Struct bio has an atomic ref count for chained bio's, and we use this
to know when to end IO on the bio. However, most bio's are not chained,
so we don't need to always introduce this atomic operation as part of
ending IO.

Add a helper to elevate the bi_remaining count, and flag the bio as
now actually needing the decrement at end_io time. Rename the field
to __bi_remaining to catch any current users of this doing the
incrementing manually.

For high IOPS workloads, this reduces the overhead of bio_endio()
substantially.

Tested-by: Robert Elliott <elliott@hp.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05 13:32:47 -06:00
David Ahern
0d0f738f6a IB/core: Fix unaligned accesses
Addresses the following kernel logs seen during boot of sparc systems:

Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]

Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 13:21:27 -04:00
Honggang LI
471e705832 IB/core: change rdma_gid2ip into void function as it always return zero
Signed-off-by: Honggang Li <honli@redhat.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 13:21:27 -04:00
Linus Torvalds
d9cee5d4f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a build problem with bcm63xx and yet another fix to the
  memzero_explicit function to ensure that the memset is not elided"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: bcm63xx - Fix driver compilation
  lib: make memzero_explicit more robust against dead store elimination
2015-05-05 09:03:52 -07:00
Rafael J. Wysocki
5c53b262c8 ACPI / property: Refine consistency check for PRP0001
Refine the check for the presence of the "compatible" property
if the PRP0001 device ID is present in the device's list of
ACPI/PNP IDs to also print the message if _DSD is missing
entirely or the format of it is incorrect.

One special case to take into accout is that the "compatible"
property need not be provided for devices having the PRP0001
device ID in their lists of ACPI/PNP IDs if they are ancestors
of PRP0001 devices with the "compatible" property present.
This is to cover heriarchies of device objects where the kernel
is only supposed to use a struct device representation for the
topmost one and the others represent, for example, functional
blocks of a composite device.

While at it, reduce the log level of the message to "info"
and reduce the log level of the "broken _DSD" message to
"debug" (noise reduction).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2015-05-05 15:43:07 +02:00
Martin K. Petersen
5d3abf8ff6 libata: Fall back to unqueued READ LOG EXT if the DMA variant fails
Some devices advertise support for the READ/WRITE LOG DMA EXT commands
but fail when we try to issue them. This can lead to queued TRIM being
unintentionally disabled since the relevant feature flag is located in a
general purpose log page.

Fall back to unqueued READ LOG EXT if the DMA variant fails while
reading a log page.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-05-05 09:30:19 -04:00
Martin K. Petersen
406c057c4e libata: READ LOG DMA EXT support can be in either page 119 or 120
Support for the READ/WRITE LOG DMA EXT commands can be signaled either
in page 119 or page 120. We should check both pages.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-05-05 09:30:19 -04:00
Tatyana Nikolova
6eec177461 RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients
Add functionality to enable the port mapper on the passive side to provide to its
clients the actual (non-mapped) ip/tcp address information of the connecting peer

1) Adding remote_info_cb() to process the address info of the connecting peer
   The address info is provided by the user space port mapper service when
   the connection is initiated by the peer
2) Adding a hash list to store the remote address info
3) Adding functionality to add/remove the remote address info
   After the info has been provided to the port mapper client,
   it is removed from the hash list

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-05 09:18:01 -04:00
Lu, Han
632f3ab95f drm/i915/audio: add codec wakeup override enabled/disable callback
Add support for enabling codec wakeup override signal to allow
re-enumeration of the controller on SKL after resume from low power state.

In SKL, HDMI/DP codec and PCH HD Audio Controller are in different power
wells, so it's necessary to reset display audio codecs when power well on,
otherwise display audio codecs will disappear when resume from low power
state.
Reset steps when power on:
    enable codec wakeup -> azx_init_chip() -> disable codec wakeup

v3 by Jani: Simplify to only support toggling the appropriate chicken bit.

v4 by Han: add explanation and specify the hw swquence.

Signed-off-by: Lu, Han <han.lu@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-05-05 14:44:19 +02:00
Johannes Berg
f5c4ae0799 mac80211: make LED trigger names const
This is just a code cleanup, make the LED trigger names const
as they're not expected to be modified by drivers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-05 14:21:55 +02:00
Mikko Perttunen
73a7f0a906 memory: tegra: Add EMC (external memory controller) driver
Implements functionality needed to change the rate of the memory bus
clock.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2015-05-05 11:12:17 +02:00
Mikko Perttunen
3d9dd6fdd2 memory: tegra: Add API needed by the EMC driver
The EMC driver needs to know the number of external memory devices and
also needs to update the EMEM configuration based on the new rate of the
memory bus.

To know how to update the EMEM config, looks up the values of the burst
regs in the DT, for a given timing.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2015-05-05 11:10:19 +02:00
Geert Uytterhoeven
f31105347c irqchip: irqc: Remove platform data support
As of commit 914d7d1484 ("ARM: shmobile: r8a73a4: Remove legacy
code"), the Renesas R-Mobile/R-Car interrupt controller is used with DT
only, and interrupt numbers are thus always assigned automatically.

Drop the platform data declaration and all related support code.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1430216270-31929-1-git-send-email-geert%2Brenesas@glider.be
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-05 10:45:59 +02:00
David Herrmann
32e7b94a3f drm: simplify authentication management
The magic auth tokens we have are a simple map from cyclic IDs to drm_file
objects. Remove all the old bulk of code and replace it with a simple,
direct IDR.

The previous behavior is kept. Especially calling authmagic multiple times
on the same magic results in EINVAL except on the first call. The only
difference in behavior is that we never allocate IDs multiple times as
long as a client has its FD open.

v2:
 - Fix return code of GetMagic()
 - Use non-cyclic IDR allocator
 - fix off-by-one in "magic > INT_MAX" sanity check

v3:
 - drop redundant "magic > INT_MAX" check

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-05 09:45:57 +02:00
David Herrmann
acab18b5c3 drm: drop unused 'magicfree' list
This list is write-only. It's never used for read-access, so no reason to
keep it around. Drop it!

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-05-05 09:40:50 +02:00
Javi Merino
6828a4711f thermal: add trace events to the power allocator governor
Add trace events for the power allocator governor and the power actor
interface of the cpu cooling device.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Javi Merino
6b775e870c thermal: introduce the Power Allocator governor
The power allocator governor is a thermal governor that controls system
and device power allocation to control temperature.  Conceptually, the
implementation divides the sustainable power of a thermal zone among
all the heat sources in that zone.

This governor relies on "power actors", entities that represent heat
sources.  They can report current and maximum power consumption and
can set a given maximum power consumption, usually via a cooling
device.

The governor uses a Proportional Integral Derivative (PID) controller
driven by the temperature of the thermal zone.  The output of the
controller is a power budget that is then allocated to each power
actor that can have bearing on the temperature we are trying to
control.  It decides how much power to give each cooling device based
on the performance they are requesting.  The PID controller ensures
that the total power budget does not exceed the control temperature.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Javi Merino
c36cf07176 thermal: cpu_cooling: implement the power cooling device API
Add a basic power model to the cpu cooling device to implement the
power cooling device API.  The power model uses the current frequency,
current load and OPPs for the power calculations.  The cpus must have
registered their OPPs using the OPP library.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Javi Merino
35b11d2e3a thermal: extend the cooling device API to include power information
Add three optional callbacks to the cooling device interface to allow
them to express power.  In addition to the callbacks, add helpers to
identify cooling devices that implement the power cooling device API.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Javi Merino
e33df1d2f3 thermal: let governors have private data for each thermal zone
A governor may need to store its current state between calls to
throttle().  That state depends on the thermal zone, so store it as
private data in struct thermal_zone_device.

The governors may have two new ops: bind_to_tz() and unbind_from_tz().
When provided, these functions let governors do some initialization
and teardown when they are bound/unbound to a tz and possibly store that
information in the governor_data field of the struct
thermal_zone_device.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Javi Merino
bcdcbbc711 thermal: fair_share: generalize the weight concept
The fair share governor has the concept of weights, which is the
influence of each cooling device in a thermal zone.  The current
implementation forces the weights of all cooling devices in a thermal
zone to add up to a 100.  This complicates setups, as you need to know
in advance how many cooling devices you are going to have.  If you bind a
new cooling device, you have to modify all the other cooling devices
weights, which is error prone.  Furthermore, you can't specify a
"default" weight for platforms since that default value depends on the
number of cooling devices in the platform.

This patch generalizes the concept of weight by allowing any number to
be a "weight".  Weights are now relative to each other.  Platforms that
don't specify weights get the same default value for all their cooling
devices, so all their cdevs are considered to be equally influential.

It's important to note that previous users of the weights don't need to
alter the code: percentages continue to work as they used to.  This
patch just removes the constraint of all the weights in a thermal zone
having to add up to a 100.  If they do, you get the same behavior as
before.  If they don't, fair share now works for that platform.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Durgadoss R <durgadoss.r@intel.com>
Acked-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:51 -07:00
Kapileshwar Singh
6cd9e9f629 thermal: of: fix cooling device weights in device tree
Currently you can specify the weight of the cooling device in the device
tree but that information is not populated to the
thermal_bind_params where the fair share governor expects it to
be.  The of thermal zone device doesn't have a thermal_bind_params
structure and arguably it's better to pass the weight inside the
thermal_instance as it is specific to the bind of a cooling device to a
thermal zone parameter.

Core thermal code is fixed to populate the weight in the instance from
the thermal_bind_params, so platform code that was passing the weight
inside the thermal_bind_params continue to work seamlessly.

While we are at it, create a default value for the weight parameter for
those thermal zones that currently don't define it and remove the
hardcoded default in of-thermal.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Feuerer <peter@piie.net>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:50 -07:00
Doug Smythies
4055fad340 intel_pstate: Add tsc collection and keep previous target pstate
The intel_pstate driver is difficult to debug and investigate without tsc.

Also, it is likely use of tsc, and some version of C0 percentage,
will be re-introdcued in futute.

There have also been occasions where it is desirebale to know, and
confirm, the previous target pstate.

This patch brings back tsc, adds previous target pstate,
and adds both to the trace data collection.

Signed-off-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-05 01:08:54 +02:00
David S. Miller
b7ba7b469a Merge tag 'mac80211-for-davem-2015-05-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
We have only a few fixes right now:
 * a fix for an issue with hash collision handling in the
   rhashtable conversion
 * a merge issue - rhashtable removed default shrinking
   just before mac80211 was converted, so enable it now
 * remove an invalid WARN that can trigger with legitimate
   userspace behaviour
 * add a struct member missing from kernel-doc that caused
   a lot of warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 16:00:55 -04:00
David S. Miller
73e84313ee Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-04

Here's the first bluetooth-next pull request for 4.2:

 - Various fixes for at86rf230 driver
 - ieee802154: trace events support for rdev->ops
 - HCI UART driver refactoring
 - New Realtek IDs added to btusb driver
 - Off-by-one fix for rtl8723b in btusb driver
 - Refactoring of btbcm driver for both UART & USB use

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 15:36:07 -04:00
Shaohua Li
b2387ddcce blk-mq: fix FUA request hang
When a FUA request enters its DATA stage of flush pipeline, the
request is added to mq requeue list, the request will then be added to
ctx->rq_list. blk_mq_attempt_merge() might merge the request with a bio.
Later when the request is finished the flush pipeline, the
request->__data_len is 0. Then I only saw the bio gets endio called, the
original request never finish.

Adding REQ_FLUSH_SEQ into REQ_NOMERGE_FLAGS looks an easy fix.

stable: 3.15+

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-04 13:09:55 -06:00
Linus Lüssing
9afd85c9e4 net: Export IGMP/MLD message validation code
With this patch, the IGMP and MLD message validation functions are moved
from the bridge code to IPv4/IPv6 multicast files. Some small
refactoring was done to enhance readibility and to iron out some
differences in behaviour between the IGMP and MLD parsing code (e.g. the
skb-cloning of MLD messages is now only done if necessary, just like the
IGMP part always did).

Finally, these IGMP and MLD message validation functions are exported so
that not only the bridge can use it but batman-adv later, too.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 14:49:23 -04:00
Jacek Anaszewski
20f56758b0 leds: unify the location of led-trigger API
Part of led-trigger API was in the private drivers/leds/leds.h header.
Move it to the include/linux/leds.h header to unify the API location
and announce it as public. It has been already exported from
led-triggers.c with EXPORT_SYMBOL_GPL macro. The no-op definitions are
changed from macros to inline to match the style of the surrounding code.

Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
2015-05-04 11:05:55 -07:00
Jamal Hadi Salim
c19ae86a51 tc: remove unused redirect ttl
improves ingress+u32 performance from 22.4 Mpps to 22.9 Mpps

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 12:16:12 -04:00
David Riley
7892158a96 soc/tegra: pmc: move to using a restart handler
The pmc driver was previously exporting tegra_pmc_restart, which was
assigned to machine_desc.init_machine, taking precedence over the
restart handlers registered through register_restart_handler().

Signed-off-by: David Riley <davidriley@chromium.org>
[tomeu.vizoso@collabora.com: Rebased]
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
[treding@nvidia.com: minor cleanups]
Signed-off-by: Thierry Reding <treding@nvidia.com>
2015-05-04 14:21:45 +02:00
Mikko Perttunen
6ea2609ab3 soc/tegra: fuse: Add RAM code reader helper
Needed for the EMC and MC drivers to know what timings from the DT to
use.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2015-05-04 14:21:21 +02:00
Randy Dunlap
ff419b3f95 mac80211: fix 90 kernel-doc warnings
Eliminate 90 of these warnings:

Warning(..//include/net/mac80211.h:1682): No description found for parameter 'drv_priv[0]'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-05-04 12:56:13 +02:00
Thierry Reding
d1313e7896 iommu/tegra-smmu: Add debugfs support
Provide clients and swgroups files in debugfs. These files show for
which clients IOMMU translation is enabled and which ASID is associated
with each SWGROUP.

Cc: Hiroshi Doyu <hdoyu@nvidia.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
2015-05-04 12:54:23 +02:00
Thierry Reding
e660df07ab memory: tegra: Add SWGROUP names
Subsequent patches will add debugfs files that print the status of the
SWGROUPs. Add a new names field and complement the SoC tables with the
names of the individual SWGROUPs.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2015-05-04 12:54:23 +02:00
Daniel Borkmann
7829fb09a2 lib: make memzero_explicit more robust against dead store elimination
In commit 0b053c9518 ("lib: memzero_explicit: use barrier instead
of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
case LTO would decide to inline memzero_explicit() and eventually
find out it could be elimiated as dead store.

While using barrier() works well for the case of gcc, recent efforts
from LLVMLinux people suggest to use llvm as an alternative to gcc,
and there, Stephan found in a simple stand-alone user space example
that llvm could nevertheless optimize and thus elimitate the memset().
A similar issue has been observed in the referenced llvm bug report,
which is regarded as not-a-bug.

Based on some experiments, icc is a bit special on its own, while it
doesn't seem to eliminate the memset(), it could do so with an own
implementation, and then result in similar findings as with llvm.

The fix in this patch now works for all three compilers (also tested
with more aggressive optimization levels). Arguably, in the current
kernel tree it's more of a theoretical issue, but imho, it's better
to be pedantic about it.

It's clearly visible with gcc/llvm though, with the below code: if we
would have used barrier() only here, llvm would have omitted clearing,
not so with barrier_data() variant:

  static inline void memzero_explicit(void *s, size_t count)
  {
    memset(s, 0, count);
    barrier_data(s);
  }

  int main(void)
  {
    char buff[20];
    memzero_explicit(buff, sizeof(buff));
    return 0;
  }

  $ gcc -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x0000000000400400  <+0>: lea   -0x28(%rsp),%rax
   0x0000000000400405  <+5>: movq  $0x0,-0x28(%rsp)
   0x000000000040040e <+14>: movq  $0x0,-0x20(%rsp)
   0x0000000000400417 <+23>: movl  $0x0,-0x18(%rsp)
   0x000000000040041f <+31>: xor   %eax,%eax
   0x0000000000400421 <+33>: retq
  End of assembler dump.

  $ clang -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
   0x00000000004004f0  <+0>: xorps  %xmm0,%xmm0
   0x00000000004004f3  <+3>: movaps %xmm0,-0x18(%rsp)
   0x00000000004004f8  <+8>: movl   $0x0,-0x8(%rsp)
   0x0000000000400500 <+16>: lea    -0x18(%rsp),%rax
   0x0000000000400505 <+21>: xor    %eax,%eax
   0x0000000000400507 <+23>: retq
  End of assembler dump.

As gcc, clang, but also icc defines __GNUC__, it's sufficient to define
this in compiler-gcc.h only to be picked up. For a fallback or otherwise
unsupported compiler, we define it as a barrier. Similarly, for ecc which
does not support gcc inline asm.

Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: mancha security <mancha1@zoho.com>
Cc: Mark Charlebois <charlebm@gmail.com>
Cc: Behan Webster <behanw@converseincode.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-04 17:49:51 +08:00
Daniel Vetter
99264a61df drm/vblank: Fixup and document timestamp update/read barriers
This was a bit too much cargo-culted, so lets make it solid:
- vblank->count doesn't need to be an atomic, writes are always done
  under the protection of dev->vblank_time_lock. Switch to an unsigned
  long instead and update comments. Note that atomic_read is just a
  normal read of a volatile variable, so no need to audit all the
  read-side access specifically.

- The barriers for the vblank counter seqlock weren't complete: The
  read-side was missing the first barrier between the counter read and
  the timestamp read, it only had a barrier between the ts and the
  counter read. We need both.

- Barriers weren't properly documented. Since barriers only work if
  you have them on boths sides of the transaction it's prudent to
  reference where the other side is. To avoid duplicating the
  write-side comment 3 times extract a little store_vblank() helper.
  In that helper also assert that we do indeed hold
  dev->vblank_time_lock, since in some cases the lock is acquired a
  few functions up in the callchain.

Spotted while reviewing a patch from Chris Wilson to add a fastpath to
the vblank_wait ioctl.

v2: Add comment to better explain how store_vblank works, suggested by
Chris.

v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
implicit barrier in the spin_unlock. But that can only be proven by
auditing all callers and my point in extracting this little helper was
to localize all the locking into just one place. Hence I think that
additional optimization is too risky.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2015-05-04 10:57:00 +02:00
Tom Herbert
2f59e1ebaa net: Add flow_keys digest
Some users of flow keys (well just sch_choke now) need to pass
flow_keys in skbuff cb, and use them for exact comparisons of flows
so that skb->hash is not sufficient. In order to increase size of
the flow_keys structure, we introduce another structure for
the purpose of passing flow keys in skbuff cb. We limit this structure
to sixteen bytes, and we will technically treat this as a digest of
flow_keys struct hence its name flow_keys_digest. In the first
incaranation we just copy the flow_keys structure up to 16 bytes--
this is the same information previously passed in the cb. In the
future, we'll adapt this for larger flow_keys and could use something
like SHA-1 over the whole flow_keys to improve the quality of the
digest.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 00:09:09 -04:00
Tom Herbert
50fb799289 net: Add skb_get_hash_perturb
This calls flow_disect and __skb_get_hash to procure a hash for a
packet. Input includes a key to initialize jhash. This function
does not set skb->hash.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 00:09:08 -04:00
Alexander Duyck
d54385ce68 etherdev: Process is_multicast_ether_addr at same size as other operations
This change makes it so that we process the address in
is_multicast_ether_addr at the same size as the other calls.  This allows
us to avoid duplicate reads when used with other calls such as
is_zero_ether_addr or eth_addr_copy.  In addition I have added a 64 bit
version of the function so in eth_type_trans we can process the destination
address as a 64 bit value throughout.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 22:30:36 -04:00
Eric Dumazet
a5d2809040 codel: fix maxpacket/mtu confusion
Under presence of TSO/GSO/GRO packets, codel at low rates can be quite
useless. In following example, not a single packet was ever dropped,
while average delay in codel queue is ~100 ms !

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 13626b 3p requeues 0
  count 0 lastcount 0 ldelay 96.9ms drop_next 0us
  maxpacket 9084 ecn_mark 0 drop_overlimit 0

This comes from a confusion of what should be the minimal backlog. It is
pretty clear it is not 64KB or whatever max GSO packet ever reached the
qdisc.

codel intent was to use MTU of the device.

After the fix, we finally drop some packets, and rtt/cwnd of my single
TCP flow are meeting our expectations.

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
 backlog 6056b 3p requeues 0
  count 1 lastcount 1 ldelay 36.3ms drop_next 0us
  maxpacket 10598 ecn_mark 0 drop_overlimit 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 22:17:40 -04:00
Tom Herbert
82a584b7cd ipv6: Flow label state ranges
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one.  Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 21:58:01 -04:00