Currently Linux always offers a reply chunk, even when the reply
can be sent inline (ie. is smaller than 1KB).
On the client, registering a memory region can be expensive. A
server may choose not to use the reply chunk, wasting the cost of
the registration.
This is a change only for RPC replies smaller than 1KB which the
server constructs in the RPC reply send buffer. Because the elements
of the reply must be XDR encoded, a copy-free data transfer has no
benefit in this case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The client has been setting up a reply chunk for NFS READs that are
smaller than the inline threshold. This is not efficient: both the
server and client CPUs have to copy the reply's data payload into
and out of the memory region that is then transferred via RDMA.
Using the write list, the data payload is moved by the device and no
extra data copying is necessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When the size of the RPC message is near the inline threshold (1KB),
the client would allow messages to be sent that were a few bytes too
large.
When marshaling RPC/RDMA requests, ensure the combined size of
RPC/RDMA header and RPC header do not exceed the inline threshold.
Endpoints typically reject RPC/RDMA messages that exceed the size
of their receive buffers.
The two server implementations I test with (Linux and Solaris) use
receive buffers that are larger than the client’s inline threshold.
Thus so far this has been benign, observed only by code inspection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
RDMA_MSGP type calls insert a zero pad in the middle of the RPC
message to align the RPC request's data payload to the server's
alignment preferences. A server can then "page flip" the payload
into place to avoid a data copy in certain circumstances. However:
1. The client has to have a priori knowledge of the server's
preferred alignment
2. Requests eligible for RDMA_MSGP are requests that are small
enough to have been sent inline, and convey a data payload
at the _end_ of the RPC message
Today 1. is done with a sysctl, and is a global setting that is
copied during mount. Linux does not support CCP to query the
server's preferences (RFC 5666, Section 6).
A small-ish NFSv3 WRITE might use RDMA_MSGP, but no NFSv4
compound fits bullet 2.
Thus the Linux client currently leaves RDMA_MSGP disabled. The
Linux server handles RDMA_MSGP, but does not use any special
page flipping, so it confers no benefit.
Clean up the marshaling code by removing the logic that constructs
RDMA_MSGP type calls. This also reduces the maximum send iovec size
from four to just two elements.
/proc/sys/sunrpc/rdma_inline_write_padding is a kernel API, and
thus is left in place.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Untangle the end of rpcrdma_ia_open() by moving DMA MR set-up, which
is different for each registration method, to the .ro_open functions.
This is refactoring only. No behavior change is expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
All HCA providers have an ib_get_dma_mr() verb. Thus
rpcrdma_ia_open() will either grab the device's local_dma_key if one
is available, or it will call ib_get_dma_mr(). If ib_get_dma_mr()
fails, rpcrdma_ia_open() fails and no transport is created.
Therefore execution never reaches the ib_reg_phys_mr() call site in
rpcrdma_register_internal(), so it can be removed.
The remaining logic in rpcrdma_{de}register_internal() is folded
into rpcrdma_{alloc,free}_regbuf().
This is clean up only. No behavior change is expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
PHYSICAL memory registration uses a single rkey for all of the
client's memory, thus is insecure. It is still useful in some cases
for testing.
Retain the ability to select PHYSICAL memory registration capability
via /proc/sys/sunrpc/rdma_memreg_strategy, but don't fall back to it
if the HCA does not support FRWR or FMR.
This means amso1100 no longer works out of the box with NFS/RDMA.
When using amso1100 HCAs, set the memreg_strategy sysctl to 6 before
performing NFS/RDMA mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In preparation for similar increases on NFS/RDMA servers, bump the
advertised credit limit for RPC/RDMA to 128. This allocates some
extra resources, but the client will continue to allow only the
number of RPCs in flight that the server requests via its advertised
credit limit.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This split the function in two: the allocation part is inlined into the
inflate function and the lock part is kept into his own function.
This change is needed in order to be able to allocate more than one page
before doing the hypervisor call.
Signed-off-by: Xavier Deguillard <xdeguillard@vmware.com>
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Philip P. Moltmann <moltmann@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While cgroup subsystems can't be modules, blkcg supports dynamically
loadable policies which interact with cgroup core. Export
cgrp_dfl_root so that cgroup_on_dfl() can be used in those modules.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Without this patch, it is cumbersome to read the trace output but
ignoring the normal, potentially verbose, output of the debuggee. One
common example is doing something like the following:
perf trace -s find /tmp > /dev/null
Without this patch, the trace summary will be lost. Now, it will still
be printed at the end. This behavior is also applied by strace.
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/n/tip-tqnks6y2cnvm5f9g2dsfr7zl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix five occurrences of the checkpatch.pl error:
ERROR: do not use assignment in if condition
The semantic patch that makes this change is:
// <smpl>
@@
identifier i;
expression E;
statement S1, S2;
@@
+ i = E;
if (
- (i = E)
+ i
) S1 else S2
@@
identifier i;
expression E;
statement S;
constant c;
binary operator b;
@@
+ i = E;
if (
- (i = E)
+ i
b
c ) S
// </smpl>
Signed-off-by: Kris Borer <kborer@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using OF defined controllers the platform data struct is shared
between all devices, so it can't be used for device specific settings.
However it is currently used for the OF properties
needs-reset-on-resume and has-transaction-translator.
To fix this issue move setting hcd->has_tt to the probe and
move pdata->reset_on_resume to the private data.
Signed-off-by: Alban Bedel <albeu@free.fr>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IRQ_DOMAIN is a hidden config option, so depending on it doesn't
make any sense. Select the config option because it's required to
compile this driver.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Andy Gross <agross@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add tracepoints to retrieve information about read, write
and non-data commands. For performance measurement support
tracepoints are added at the beginning and at the end of
transfers. Following is a list showing the new tracepoint
events. The "cmd" parameter here represents the opcode, SID,
and full 16-bit address.
spmi_write_begin: cmd and data buffer.
spmi_write_end : cmd and return value.
spmi_read_begin : cmd.
spmi_read_end : cmd, return value and data buffer.
spmi_cmd : cmd.
The reason that cmd appears at both the beginning and at
the end event is that SPMI drivers can request commands
concurrently. cmd helps in matching the corresponding
events.
SPMI tracepoints can be enabled like:
echo 1 >/sys/kernel/debug/tracing/events/spmi/enable
and will dump messages that can be viewed in
/sys/kernel/debug/tracing/trace that look like:
... spmi_read_begin: opc=56 sid=00 addr=0x0000
... spmi_read_end: opc=56 sid=00 addr=0x0000 ret=0 len=02 buf=0x[01-40]
... spmi_write_begin: opc=48 sid=00 addr=0x0000 len=3 buf=0x[ff-ff-ff]
Suggested-by: Sagar Dharia <sdharia@codeaurora.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Gilad Avidov <gavidov@codeaurora.org>
Signed-off-by: Ankit Gupta <ankgupta@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Until now, only 32-bit DMA addressing was allowed, following a report on
some old Intel machine that dropped 64-bit PCIe packets, even though
pci_set_dma_mask() was successful with DMA_BIT_MASK(64).
But then came TI's Keystone II chip (ARM Cortex A15 + DSPs), which refuses
32-bit DMA addressing (for good reasons). So 64-bit DMA is allowed as a
fallback option.
Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix RS232/485 mode incorrect setting after S3/S4 resume for F81504/508/512
We had add RS232/485 RTS control with fecf27a373. But when it
resume from S3/S4, the mode register 0x40 + 0x08 * idx + 7 will
rewrite to 0x01 (RS232 mode).
This patch will modify 2 sections.
One is pci_fintek_init(), if it called when first init, it will
write mode register with 0x01. If it called from S3/S4 resume,
it's will get the relative port data and pass it to
pci_fintek_rs485_config() with NULL rs485 parameter.
The another modification is in pci_fintek_rs485_config(). It'll
re-apply old configuration when the parameter rs485 is NULL.
Signed-off-by: Peter Hung <hpeter+linux_kernel@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch uses kasprintf which combines kzalloc and sprintf.
kasprintf also takes care of the size calculation.
Semantic patch used is as follows:
@@
expression a,flag;
expression list args;
statement S;
@@
a =
- \(kmalloc\|kzalloc\)(...,flag)
+ kasprintf (flag,args)
<... when != a
if (a == NULL || ...) S
...>
- sprintf(a,args);
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace comma between expression statements by a semicolon.
The semantic patch used is as follows:
@@
expression e1,e2;
@@
e1
- ,
+ ;
e2;
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace comma between expression statements by a semicolon.
The semantic patch used is as follows:
@@
expression e1,e2;
@@
e1
- ,
+ ;
e2;
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace comma between expression statements by a semicolon.
The semantic patch used is as follows:
@@
expression e1,e2;
@@
e1
- ,
+ ;
e2;
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch does away with the cast on void * as it is unnecessary.
Semantic patch used is as follows:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T *)x)->f
|
- (T *)
e
)
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch does away with the cast on void * as it is unnecessary.
Semantic patch used is as follows:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T *)x)->f
|
- (T *)
e
)
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch does away with the cast on void * as it is unnecessary.
Semantic patch used is as follows:
@r@
expression x;
void* e;
type T;
identifier f;
@@
(
*((T *)e)
|
((T *)x)[...]
|
((T *)x)->f
|
- (T *)
e
)
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Switch the visornic over to use napi. Currently there is a kernel
thread that sits and waits on a wait queue to get notified of incoming
virtual interrupts. It would be nice if we could handle frame reception
using the standard napi processing instead. This patch creates our napi
instance and has the rx thread schedule it
Given that the unisys hypervisor currently requires that queue servicing
be done by a polling loop that wakes up every 2ms, lets instead also
convert that to a timer, which is simpler, and allows us to remove all
the thread starting and stopping code.
Signed-off-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Benjamin Romer <benjamin.romer@unisys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The character ' ' is removed after the character '('. This fixes the
checkpatch.pl error - "space prohibited after that open
parenthesis '('".
Signed-off-by: Chandra S Gorentla <csgorentla@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Added 'void' keyword in the paranthesis of function definitions, when
there are no arguments to the functions. This fixes the checkpatch.pl
error - "Bad function definition 'function()' should probably be
function(void)".
Signed-off-by: Chandra S Gorentla <csgorentla@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Linux kernel coding style guidelines suggest not using typedefs for
structure and enum types. This patch gets rid of the typedefs for
Ack_session_info_t.
The following Coccinelle semantic patch detects the cases for struct type:
@tn@
identifier i;
type td;
@@
-typedef
struct i { ... }
-td
;
@@
type tn.td;
identifier tn.i;
@@
-td
+ struct i
Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>