linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-04 14:32:27 -04:00

Author	SHA1	Message	Date
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Marco Crivellari	f673fb3449	RDMA/core: RDMA/mlx5: replace use of system_unbound_wq with system_dfl_wq Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-2-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Mark Zhang	a3c9d0fcd3	RDMA/ucma: Support write an event into a CM Enable user-space to inject an event into a CM through its event channel. Two new events are added and supported: RDMA_CM_EVENT_USER and RDMA_CM_EVENT_INTERNAL. With these 2 events a new event parameter "arg" is supported, which is passed from sender to receiver transparently. With this feature an application is able to write an event into a CM channel with a new user-space rdmacm API. For example thread T1 could write an event with the API: rdma_write_cm_event(cm_id, RDMA_CM_EVENT_USER, status, arg); and thread T2 could receive the event with rdma_get_cm_event(). Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/fdf49d0b17a45933c5d8c1d90605c9447d9a3c73.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-08-13 06:16:11 -04:00
Mark Zhang	810f874eda	RDMA/ucma: Support query resolved service records Enable user-space to query resolved service records through a ucma command when a RDMA_CM_EVENT_ADDRINFO_RESOLVED event is received. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/1090ee7c00c3f8058c4f9e7557de983504a16715.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-08-13 06:16:07 -04:00
Mark Zhang	a6404823fe	RDMA/cma: Support IB service record resolution Add new UCMA command and the corresponding CMA implementation. Userspace can send this command to request service resolution based on service name or ID. On a successful resolution, one or multiple service records are returned, the first one will be used as destination address by default. Two new CM events are added and returned to caller accordingly: - RDMA_CM_EVENT_ADDRINFO_RESOLVED: Resolve succeeded; - RDMA_CM_EVENT_ADDRINFO_ERROR: Resolve failed. Internally two new CM states are added: - RDMA_CM_ADDRINFO_QUERY: CM is in the process of IB service resolution; - RDMA_CM_ADDRINFO_RESOLVED: CM has finished the resolve process. With these new states, beside existing state transfer processes, 2 new processes are supported: 1. The default address is used: RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDRINFO_QUERY -> RDMA_CM_ADDRINFO_RESOLVED -> RDMA_CM_ROUTE_QUERY 2. To use a different address: RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDRINFO_QUERY-> RDMA_CM_ADDRINFO_RESOLVED -> RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_RESOLVED -> RDMA_CM_ROUTE_QUERY In the 2nd case, resolve_addrinfo returns multiple records, a user could call rdma_resolve_addr() with the one that is not the first. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/b6e82ad75522a13b5efe4ff86da0e465aab04cc2.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-08-13 06:16:00 -04:00
Nicolas Bouchinet	f33cd9b3fd	RDMA/core: Fixes infiniband sysctl bounds Bound infiniband iwcm and ucma sysctl writings between SYSCTL_ZERO and SYSCTL_INT_MAX. The proc_handler has thus been updated to proc_dointvec_minmax. Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr> Link: https://patch.msgid.link/20250224095826.16458-6-nicolas.bouchinet@clip-os.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Joel Granados <joel.granados@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-03-03 13:49:02 -05:00
Al Viro	8152f82010	fdget(), more trivial conversions all failure exits prior to fdget() leave the scope, all matching fdput() are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-11-03 01:28:06 -05:00
Al Viro	cb787f4ac0	[tree-wide] finally take no_llseek out no_llseek had been defined to NULL two years ago, in commit `868941b144` ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek \| grep -v porting.rst \| while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-09-27 08:18:43 -07:00
Al Viro	1da91ea87a	introduce fd_file(), convert all accessors to it. For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-08-12 22:00:43 -04:00
Joel Granados	6d07cc269b	infiniband: Remove the now superfluous sentinel element from ctl_table array This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel from iwcm_ctl_table and ucma_ctl_table Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2023-10-11 12:16:13 -07:00
Mark Zhang	bf9a992851	RDMA/core: Rename rdma_route.num_paths field to num_pri_alt_paths This field means the total number of primary and alternative paths, i.e.,: 0 - No primary nor alternate path is available; 1 - Only primary path is available; 2 - Both primary and alternate path are available. Rename it to avoid confusion, as with the follow-up patches the primary path will support multiple path records. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/cbe424de63a56207870d70c5edce7c68e45f429e.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2022-09-22 12:35:13 +03:00
Leon Romanovsky	36e8169ec9	RDMA/ucma: Protect mc during concurrent multicast leaves Partially revert the commit mentioned in the Fixes line to make sure that allocation and erasing multicast struct are locked. BUG: KASAN: use-after-free in ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline] BUG: KASAN: use-after-free in ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579 Read of size 8 at addr ffff88801bb74b00 by task syz-executor.1/25529 CPU: 0 PID: 25529 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 ucma_cleanup_multicast drivers/infiniband/core/ucma.c:491 [inline] ucma_destroy_private_ctx+0x914/0xb70 drivers/infiniband/core/ucma.c:579 ucma_destroy_id+0x1e6/0x280 drivers/infiniband/core/ucma.c:614 ucma_write+0x25c/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xae0 fs/read_write.c:588 ksys_write+0x1ee/0x250 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Currently the xarray search can touch a concurrently freeing mc as the xa_for_each() is not surrounded by any lock. Rather than hold the lock for a full scan hold it only for the affected items, which is usually an empty list.
Fixes: `95fe51096b` ("RDMA/ucma: Remove mc_list and rely on xarray") Link: https://lore.kernel.org/r/1cda5fabb1081e8d16e39a48d3a4f8160cea88b8.1642491047.git.leonro@nvidia.com Reported-by: syzbot+e3f96c43d19782dd14a7@syzkaller.appspotmail.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2022-01-28 11:36:55 -04:00
YueHaibing	c5b8eaf8af	RDMA/core: Use the DEVICE_ATTR_RO macro Use the DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(), which makes the code a bit shorter and easier to read. Link: https://lore.kernel.org/r/20210526132949.20184-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-05-28 20:39:51 -03:00
Xiaofei Tan	e3d65124ce	RDMA/ucma: Cleanup to reduce duplicate code The label "err1" does the same thing as the branch taken when copy_to_user() fails in the function ucma_create_id(). Just jump to the label directly to reduce duplicate code. Link: https://lore.kernel.org/r/1620291106-3675-1-git-send-email-tanxiaofei@huawei.com Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-05-11 13:17:26 -03:00
Wenpeng Liang	9516b8f9ec	RDMA/core: Add necessary spaces Space is required before '(' of switch statements and around '='. Link: https://lore.kernel.org/r/1617783353-48249-4-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:52:22 -03:00
Wenpeng Liang	ab27f45fdf	RDMA/core: Print the function name by __func__ instead of a fixed string It's better to use __func__ than a fixed string to print a function's name. Link: https://lore.kernel.org/r/1617783353-48249-2-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-04-12 14:52:22 -03:00
Jason Gunthorpe	8ae291cc95	RDMA/ucma: Do not miss ctx destruction steps in some cases The destruction flow is very complicated here because the cm_id can be destroyed from the event handler at any time if the device is hot-removed. This leaves behind a partial ctx with no cm_id in the xarray, and will let user space leak memory. Make everything consistent in this flow in all places: - Return the xarray back to XA_ZERO_ENTRY before beginning any destruction. The thread that reaches this first is responsible to kfree, everyone else does nothing. - Test the xarray during the special hot-removal case to block the queue_work, this has much simpler locking and doesn't require a 'destroying' - Fix the ref initialization so that it is only positive if cm_id != NULL, then rely on that to guide the destruction process in all cases. Now the new ucma_destroy_private_ctx() can be called in all places that want to free the ctx, including all the error unwinds, and none of the details are missed. Fixes: `a1d33b70db` ("RDMA/ucma: Rework how new connections are passed through event delivery") Link: https://lore.kernel.org/r/20210105111327.230270-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2021-01-06 17:15:15 -04:00
Joe Perches	1c7fd72687	RDMA: Convert sysfs device * show functions to use sysfs_emit() Done with cocci script: @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ...
- strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-26 19:53:21 -03:00
Maor Gottlieb	c7a198c700	RDMA/ucma: Fix use after free in destroy id flow ucma_free_ctx() should call to __destroy_id() on all the connection requests that have not been delivered to user space. Currently it calls on the context itself and cause to use after free. Fixes the trace: BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108 Faulting instruction address: 0xc0080000002428f4 Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: [c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable) [c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm] [c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm] [c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330 [c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110 [c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50 [c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0 [c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30 [c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470 [c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240 [c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0 Rename listen_ctx to conn_req_ctx as the poor name was the cause of this bug. Fixes: `a1d33b70db` ("RDMA/ucma: Rework how new connections are passed through event delivery") Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-10-16 14:07:08 -03:00
Leon Romanovsky	b09c4d7012	RDMA/restrack: Improve readability in task name management Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd(). This uniformly makes all restracks add'd using rdma_restrack_add(). Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-22 19:47:35 -03:00
Jason Gunthorpe	f5449e7480	RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroy ucma_destroy_id() assumes that all things accessing the ctx will do so via the xarray. This assumption is violated only in the case where the FD is being closed, then the ctx is reached via the ctx_list. Normally this is OK since ucma_destroy_id() cannot run concurrently with release(), however when ucma_migrate_id() is involved this can be violated, as the close of the 2nd FD can run concurrently with destroy on the first: CPU0 CPU1 ucma_destroy_id(fda) ucma_migrate_id(fda -> fdb) ucma_get_ctx() xa_lock() _ucma_find_context() xa_erase() xa_unlock() xa_lock() ctx->file = new_file list_move() xa_unlock() ucma_put_ctx() ucma_close(fdb) _destroy_id() kfree(ctx) _destroy_id() wait_for_completion() // boom, ctx was freed The ctx->file must be modified under the handler and xa_lock, and prior to modification the ID must be rechecked that it is still reachable from cur_file, ie there is no parallel destroy or migrate. To make this work remove the double locking and streamline the control flow. The double locking was obsoleted by the handler lock now directly preventing new uevents from being created, and the ctx_list cannot be read while holding fgets on both files. Removing the double locking also removes the need to check for the same file. Fixes: `88314e4dda` ("RDMA/cma: add support for rdma_migrate_id()") Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-18 20:54:01 -03:00
Alex Dewar	4f680cb9f1	RDMA/ucma: Fix resource leak on error path In ucma_process_join(), if the call to xa_alloc() fails, the function will return without freeing mc. Fix this by jumping to the correct line. In the process I renamed the jump labels to something more memorable for extra clarity. Link: https://lore.kernel.org/r/20200902162454.332828-1-alex.dewar90@gmail.com Addresses-Coverity-ID: 1496814 ("Resource leak") Fixes: `95fe51096b` ("RDMA/ucma: Remove mc_list and rely on xarray") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-02 17:06:48 -03:00
Jason Gunthorpe	6989aa62d3	Merge tag 'v5.9-rc3' into rdma.git for-next Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 12:28:12 -03:00
Jason Gunthorpe	657360d6c7	RDMA/ucma: Remove closing and the close_wq Use cancel_work_sync() to ensure that the wq is not running and simply assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the close_wq since flush_workqueue() is no longer needed. Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	a1d33b70db	RDMA/ucma: Rework how new connections are passed through event delivery When a new connection is established the RDMA CM creates a new cm_id and passes it through to the event handler. However inside the UCMA the new ID is not assigned a ucma_context until the user retrieves the event from a syscall. This creates a weird edge condition where a cm_id's context can continue to point at the listening_id that created it, and a number of additional edge conditions on event list clean up related to destroying half created IDs. There is also a race condition in ucma_get_events() where the cm_id->context is being assigned without holding the handler_mutex. Simplify all of this by creating the ucma_context inside the event handler itself and eliminating the edge case of a half created cm_id. All cm_id's can be uniformly destroyed via __destroy_id() or via the close_work. Link: https://lore.kernel.org/r/20200818120526.702120-14-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	310ca1a7dc	RDMA/ucma: Narrow file->mut in ucma_event_handler() Since the backlog is now an atomic the file->mut is now only protecting the event_list and ctx_list. Narrow its scope to make it clear Link: https://lore.kernel.org/r/20200818120526.702120-13-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	26c15dec49	RDMA/ucma: Change backlog into an atomic There is no reason to grab the file->mut just to do this inc/dec work. Use an atomic. Link: https://lore.kernel.org/r/20200818120526.702120-12-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	38e03d0926	RDMA/ucma: Add missing locking around rdma_leave_multicast() All entry points to the rdma_cm from a ULP must be single threaded, even this error unwinds. Add the missing locking. Fixes: `7c11910783` ("RDMA/ucma: Put a lock around every call to the rdma_cm layer") Link: https://lore.kernel.org/r/20200818120526.702120-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	98837c6c3d	RDMA/ucma: Fix locking for ctx->events_reported This value is locked under the file->mut, ensure it is held whenever touching it. The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is already not possible for the write side to run, the movement is just for clarity. Fixes: `88314e4dda` ("RDMA/cma: add support for rdma_migrate_id()") Link: https://lore.kernel.org/r/20200818120526.702120-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	09e328e47a	RDMA/ucma: Fix the locking of ctx->file ctx->file is changed under the file->mut lock by ucma_migrate_id(), which is impossible to lock correctly. Instead change ctx->file under the handler_lock and ctx_table lock and revise all places touching ctx->file to use this locking when reading ctx->file. Link: https://lore.kernel.org/r/20200818120526.702120-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	308571debc	RDMA/ucma: Do not use file->mut to lock destroying The only reader of destroying is inside a handler under the handler_mutex, so directly use the handler_mutex when setting it instead of the larger file->mut. As the refcount could be zero here, and the cm_id already freed, an additional refcount grab around the locking is required to touch the cm_id. Link: https://lore.kernel.org/r/20200818120526.702120-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	d114c6feed	RDMA/cma: Add missing locking to rdma_accept() In almost all cases rdma_accept() is called under the handler_mutex by ULPs from their handler callbacks. The one exception was ucma which did not get the handler_mutex. To improve the understandability of the locking scheme obtain the mutex for ucma as well. This improves how ucma works by allowing it to directly use handler_mutex for some of its internal locking against the handler callbacks instead of the global file->mut lock. There does not seem to be a serious bug here, other than a DISCONNECT event can be delivered concurrently with accept succeeding. Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	95fe51096b	RDMA/ucma: Remove mc_list and rely on xarray It is not really necessary to keep a linked list of mcs associated with each context when we can just scan the xarray to find the right things. This removes another overloading of file->mut by relying on the xarray locking for mc instead. Link: https://lore.kernel.org/r/20200818120526.702120-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:14 -03:00
Jason Gunthorpe	620db1a118	RDMA/ucma: Fix error cases around ucma_alloc_ctx() The store to ctx->cm_id was based on the idea that _ucma_find_context() would not return the ctx until it was fully setup. Without locking this doesn't work properly. Split things so that the xarray is allocated with NULL to reserve the ID and once everything is final set the cm_id and store. Along the way this shows that the error unwind in ucma_get_event() if a new ctx is created is wrong, fix it up. Link: https://lore.kernel.org/r/20200818120526.702120-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:14 -03:00
Jason Gunthorpe	c07e12d8e9	RDMA/ucma: Consolidate the two destroy flows ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate this duplicated code into a function. Link: https://lore.kernel.org/r/20200818120526.702120-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:14 -03:00
Jason Gunthorpe	07e266a775	RDMA/ucma: Remove unnecessary locking of file->ctx_list in close During the file_operations release function it is already not possible that write() can be running concurrently, remove the extra locking around the ctx_list. Link: https://lore.kernel.org/r/20200818120526.702120-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:13 -03:00
Jason Gunthorpe	ca2968c1ef	RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx() Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming that the refcount can only go to zero from ucma_destroy_id() which also removes it from the xarray. Use refcount_inc_not_zero() instead. Link: https://lore.kernel.org/r/20200818120526.702120-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:13 -03:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Jason Gunthorpe	31142a4ba6	RDMA/cm: Add min length checks to user structure copies These are missing throughout ucma, it harmlessly copies garbage from userspace, but in this new code which uses min to compute the copy length it can result in uninitialized stack memory. Check for minimum length at the very start. BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1df/0x240 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764 do_loop_readv_writev fs/read_write.c:737 [inline] do_iter_write+0x710/0xdc0 fs/read_write.c:1020 vfs_writev fs/read_write.c:1091 [inline] do_writev+0x42d/0x8f0 fs/read_write.c:1134 __do_sys_writev fs/read_write.c:1207 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204 do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `34e2ab57a9` ("RDMA/ucma: Extend ucma_connect to receive ECE parameters") Fixes: `0cb15372a6` ("RDMA/cma: Connect ECE to rdma_accept") Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-27 11:50:00 -03:00
Leon Romanovsky	8094ba0ace	RDMA/cma: Provide ECE reject reason IBTA declares "vendor option not supported" reject reason in REJ messages if passive side doesn't want to accept proposed ECE options. Due to the fact that ECE is managed by userspace, there is a need to let users to provide such rejected reason. Link: https://lore.kernel.org/r/20200526103304.196371-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:05:05 -03:00
Leon Romanovsky	0cb15372a6	RDMA/cma: Connect ECE to rdma_accept The rdma_accept() is called by both passive and active sides of CMID connection to mark readiness to start data transfer. For passive side, this is called explicitly, for active side, it is called implicitly while receiving REP message. Provide ECE data to rdma_accept function needed for passive side to send that REP message. Link: https://lore.kernel.org/r/20200526103304.196371-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:05:05 -03:00
Leon Romanovsky	93531ee7b9	RDMA/ucma: Deliver ECE parameters through UCMA events Passive side of CMID connection receives ECE request through REQ message and needs to respond with relevant REP message which will be forwarded to active side. The UCMA events interface is responsible for such communication with the user space (librdmacm). Extend it to provide ECE wire data. Link: https://lore.kernel.org/r/20200526103304.196371-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:05:05 -03:00
Leon Romanovsky	34e2ab57a9	RDMA/ucma: Extend ucma_connect to receive ECE parameters Active side of CMID initiates connection through librdmacm's rdma_connect() and kernel's ucma_connect(). Extend UCMA interface to handle those new parameters. Link: https://lore.kernel.org/r/20200526103304.196371-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:05:05 -03:00
Leon Romanovsky	17793833f8	RDMA/ucma: Return stable IB device index as identifier The librdmacm uses node_guid as identifier to correlate between IB devices and CMA devices. However FW resets cause to such "connection" to be lost and require from the user to restart its application. Extend UCMA to return IB device index, which is stable identifier. Link: https://lore.kernel.org/r/20200504132541.355710-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-12 19:52:27 -03:00
Jason Gunthorpe	7c11910783	RDMA/ucma: Put a lock around every call to the rdma_cm layer The rdma_cm must be used single threaded. This appears to be a bug in the design, as it does have lots of locking that seems like it should allow concurrency. However, when it is all said and done every single place that uses the cma_exch() scheme is broken, and all the unlocked reads from the ucma of the cm_id data are wrong too. syzkaller has been finding endless bugs related to this. Fixing this in any elegant way is some enormous amount of work. Take a very big hammer and put a mutex around everything to do with the ucma_context at the top of every syscall. Fixes: `7521663857` ("RDMA/cma: Export rdma cm interface to userspace") Link: https://lore.kernel.org/r/20200218210432.GA31966@ziepe.ca Reported-by: syzbot+adb15cf8c2798e4e0db4@syzkaller.appspotmail.com Reported-by: syzbot+e5579222b6a3edd96522@syzkaller.appspotmail.com Reported-by: syzbot+4b628fcc748474003457@syzkaller.appspotmail.com Reported-by: syzbot+29ee8f76017ce6cf03da@syzkaller.appspotmail.com Reported-by: syzbot+6956235342b7317ec564@syzkaller.appspotmail.com Reported-by: syzbot+b358909d8d01556b790b@syzkaller.appspotmail.com Reported-by: syzbot+6b46b135602a3f3ac99e@syzkaller.appspotmail.com Reported-by: syzbot+8458d13b13562abf6b77@syzkaller.appspotmail.com Reported-by: syzbot+bd034f3fdc0402e942ed@syzkaller.appspotmail.com Reported-by: syzbot+c92378b32760a4eef756@syzkaller.appspotmail.com Reported-by: syzbot+68b44a1597636e0b342c@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-02-27 16:40:40 -04:00
Jason Gunthorpe	167b95ec88	RDMA/ucma: Use refcount_t for the ctx->ref Don't use an atomic as a refcount. Link: https://lore.kernel.org/r/20200218191657.GA29724@ziepe.ca Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-02-19 16:41:00 -04:00
Leon Romanovsky	ca750d4a9c	RDMA/ucma: Mask QPN to be 24 bits according to IBTA IBTA declares QPN as 24bits, mask input to ensure that kernel doesn't get higher bits and ensure by adding WANR_ONCE() that other CM users do the same. Link: https://lore.kernel.org/r/20200212072635.682689-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-02-13 11:38:19 -04:00
Jason Gunthorpe	8f71bb0030	RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEV Update the struct ib_client for all modules exporting cdevs related to the ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now autoloadable and discoverable by userspace over netlink instead of relying on sysfs. uverbs also exposes the DRIVER_ID for drivers that are able to support driver id binding in rdma-core. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-06-18 22:44:08 -04:00
Matthew Wilcox	afcafe07af	ucma: Convert ctx_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-07 16:43:02 -03:00

160 Commits