PatchworkOS  dbbdc99
A non-POSIX operating system.
ring.h
#pragma once

#include <kernel/config.h>
#include <kernel/io/irp.h>
#include <kernel/log/panic.h>
#include <kernel/mem/mdl.h>
#include <kernel/mem/vmm.h>
#include <kernel/sched/wait.h>
#include <kernel/sync/lock.h>

#include <string.h>
#include <sys/ioring.h>

/**
 * @brief Programmable submission/completion interface.
 * @defgroup kernel_io_ring Kernel-side I/O Ring Interface
 * @ingroup kernel_io
 *
 * @todo The I/O ring system is primarily a design document for now; it is very much a work in progress, subject to
 * change, and mostly unimplemented.
 *
 * @todo Rewrite the Kernel-side I/O Ring Interface documentation to match the new system.
 *
23 *
24 * The I/O ring provides the core of all interfaces in PatchworkOS, where user-space submits Submission Queue Entries
25 * (SQEs) and receives Completion Queue Entries (CQEs) from it, all within shared memory. Allowing for highly efficient
26 * and asynchronous I/O operations, especially since PatchworkOS is designed to be natively asynchronous.
27 *
28 * Each SQE specifies a verb (the operation to perform) and a set of up to `SQE_MAX_ARG` arguments, while each CQE
29 * returns the result of a previously submitted SQE.
30 *
31 * Synchronous operations are implemented on top of this API in userspace.
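 *
 * The single-threaded sketch below illustrates how a synchronous call can be layered on the asynchronous ring. It
 * submits one SQE, "enters" the kernel, and reaps the matching CQE. All names (`toy_ring_t`, `toy_enter()`,
 * `toy_sync()`) and the simplified entry layouts are hypothetical, not the real `<sys/ioring.h>` definitions. The
 * real `enter()` system call is replaced by a stub that completes every SQE immediately by echoing `arg2`.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8 // hypothetical queue depth

// Hypothetical, simplified entries; the real layouts live in <sys/ioring.h>.
typedef struct
{
    uint32_t verb;
    uint64_t tag;  // caller-chosen value copied into the matching CQE
    uint64_t arg2; // e.g. a byte count
} toy_sqe_t;

typedef struct
{
    uint64_t tag;
    int32_t error; // 0 on success
    uint64_t result;
} toy_cqe_t;

typedef struct
{
    toy_sqe_t sq[TOY_RING_SIZE];
    toy_cqe_t cq[TOY_RING_SIZE];
    size_t sqHead, sqTail; // user space advances sqTail, the "kernel" advances sqHead
    size_t cqHead, cqTail; // the "kernel" advances cqTail, user space advances cqHead
} toy_ring_t;

// Stub for enter(): consumes every pending SQE and posts a CQE whose result
// echoes arg2. Returns the number of SQEs processed.
static size_t toy_enter(toy_ring_t* ring)
{
    size_t processed = 0;
    while (ring->sqHead != ring->sqTail)
    {
        toy_sqe_t* sqe = &ring->sq[ring->sqHead++ % TOY_RING_SIZE];
        ring->cq[ring->cqTail++ % TOY_RING_SIZE] = (toy_cqe_t){sqe->tag, 0, sqe->arg2};
        processed++;
    }
    return processed;
}

// Synchronous call built entirely in user space on top of the async queues.
static int32_t toy_sync(toy_ring_t* ring, uint32_t verb, uint64_t arg2, uint64_t* result)
{
    uint64_t tag = ring->sqTail; // unique per submission in this toy
    ring->sq[ring->sqTail++ % TOY_RING_SIZE] = (toy_sqe_t){verb, tag, arg2};
    toy_enter(ring); // a real enter() asked to wait for one CQE would block here
    assert(ring->cqHead != ring->cqTail);
    toy_cqe_t cqe = ring->cq[ring->cqHead++ % TOY_RING_SIZE];
    assert(cqe.tag == tag);
    *result = cqe.result;
    return cqe.error;
}
```
 *
 * The sketch tags each request and checks the tag because, in a real asynchronous ring, completions may not arrive
 * in submission order.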
 *
 * @see libstd_sys_ioring for the userspace interface to the asynchronous ring.
 * @see [Wikipedia](https://en.wikipedia.org/wiki/Io_uring) for information about `io_uring`, the inspiration for this
 * system.
 * @see [Manpages](https://man7.org/linux/man-pages/man7/io_uring.7.html) for more information about `io_uring`.
 *
 * ## Synchronization
 *
 * The I/O ring structure is designed to be safe under the assumption that there is a single producer (one user-space
 * thread) and a single consumer (the kernel).
 *
 * If an I/O ring needs multiple producers (that is, it is accessed by multiple threads), it is the responsibility of
 * the caller to ensure proper synchronization.
 *
 * @note The reason for this limitation is to optimize for the common case, as the synchronization logic for multiple
 * producers would add significant overhead. Additionally, it is rather straightforward for user space to protect the
 * ring with a mutex should it need to.
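 *
 * The single-producer/single-consumer assumption is what lets shared indices work without locks. The sketch below
 * shows the general technique with C11 atomics on a hypothetical queue (`spsc_queue_t`, `sq_push()`, `sq_pop()`).
 * It does not reflect the actual `ioring_ctrl_t` layout. The optional mutex wrapper uses POSIX threads purely for
 * illustration. PatchworkOS is not POSIX, so its user-space mutex API may differ.
 *
```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SQ_SIZE 4 // hypothetical; a power of two so that masking works

// Hypothetical queue in shared memory. Only the producer writes `tail` and
// only the consumer writes `head`.
typedef struct
{
    uint64_t entries[SQ_SIZE];
    _Atomic size_t head; // advanced by the consumer (the kernel)
    _Atomic size_t tail; // advanced by the single producer (one user thread)
} spsc_queue_t;

// Producer side: lock-free as long as only one thread calls it.
static bool sq_push(spsc_queue_t* q, uint64_t value)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == SQ_SIZE)
    {
        return false; // full
    }
    q->entries[tail & (SQ_SIZE - 1)] = value;
    // Release: the entry must become visible before the new tail does.
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

// Consumer side, as the kernel would run it.
static bool sq_pop(spsc_queue_t* q, uint64_t* value)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
    {
        return false; // empty
    }
    *value = q->entries[head & (SQ_SIZE - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

// With several producer threads, user space serializes them itself.
static pthread_mutex_t sqLock = PTHREAD_MUTEX_INITIALIZER;

static bool sq_push_locked(spsc_queue_t* q, uint64_t value)
{
    pthread_mutex_lock(&sqLock);
    bool ok = sq_push(q, value);
    pthread_mutex_unlock(&sqLock);
    return ok;
}
```
 *
 * Each index is written by exactly one side. A release store after filling an entry, paired with an acquire load on
 * the other side, is therefore enough. With two producers, the load-then-store of `tail` would race, hence the lock.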
 *
 * The I/O ring structure itself may only be torn down once nothing is using it and no operations are pending.
 *
 * ## Registers
 *
 * Operations performed on an I/O ring can load their arguments from, and save their results to, seven 64-bit
 * general-purpose registers. All registers are stored in the shared control area of the I/O ring structure
 * (`ioring_ctrl_t`), so they can be inspected and modified by user space.
 *
 * When an SQE is processed, the kernel checks six register specifiers in the SQE flags: one for each of the five
 * arguments and one for the result. Each specifier is three bits wide. The value `SQE_REG_NONE` means that no
 * register is used, and any other value selects the n-th register. The bit offset of a specifier determines its
 * meaning: bits `0-2` select the register to load into the first argument, bits `3-5` select the register to load
 * into the second argument, and so on, up to bits `15-17`, which select the register to save the result into.
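 *
 * The bit layout described above can be sketched as a pair of helpers. The sketch packs 3-bit specifiers from bit 0:
 * arguments `0` to `4`, then the result in slot `5`. It also assumes, purely for illustration, that `SQE_REG_NONE`
 * is the all-ones value `7`. The `SPEC_*` names and helpers are hypothetical; consult `sqe_flags_t` for the real
 * definitions.
 *
```c
#include <assert.h>
#include <stdint.h>

// Assumptions for this sketch (see sqe_flags_t for the real definitions).
#define SPEC_BITS 3
#define SPEC_MASK 0x7ULL
#define SPEC_NONE 7        // stands in for SQE_REG_NONE; value assumed
#define SPEC_RESULT_SLOT 5 // slots 0..4 are arguments, slot 5 (bits 15-17) is the result

// Place register `reg` into specifier slot `slot`.
static uint64_t spec_set(uint64_t flags, unsigned slot, unsigned reg)
{
    assert(slot <= SPEC_RESULT_SLOT && reg <= SPEC_NONE);
    unsigned shift = slot * SPEC_BITS;
    flags &= ~(SPEC_MASK << shift); // clear the old specifier
    return flags | ((uint64_t)reg << shift);
}

// Read the specifier stored in `slot`.
static unsigned spec_get(uint64_t flags, unsigned slot)
{
    return (unsigned)((flags >> (slot * SPEC_BITS)) & SPEC_MASK);
}

// Flags with every slot set to "no register".
static uint64_t spec_none_all(void)
{
    uint64_t flags = 0;
    for (unsigned slot = 0; slot <= SPEC_RESULT_SLOT; slot++)
    {
        flags = spec_set(flags, slot, SPEC_NONE);
    }
    return flags;
}
```
 *
 * For example, loading register `2` into the first argument and saving the result into register `4` starts from
 * `spec_none_all()` and then calls `spec_set()` for slots `0` and `5`.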
 *
 * Combined with `SQE_LINK`, this system allows multiple operations to be performed at once. For example, it would be
 * possible to open a file, read from it, seek to a new position, write to it, and finally close it, all with a single
 * `enter()` call.
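 *
 * The toy evaluator below shows how registers carry values between linked SQEs. It runs an open, read, close chain:
 * the descriptor returned by the first SQE is saved to register `0` and loaded into `arg0` of the later ones. The
 * verbs, their fake results, the `chain_*` names, and the assumed `SQE_REG_NONE` value of `7` are all hypothetical.
 * The sketch also omits the error handling and `SQE_LINK` semantics the real kernel needs.
 *
```c
#include <assert.h>
#include <stdint.h>

#define REG_COUNT 7
#define REG_NONE 7 // assumed value of SQE_REG_NONE

// All 18 specifier bits set: every slot is REG_NONE.
#define SPECS_NONE 0x3FFFFULL
// Put register `reg` into specifier slot `slot` (0..4 = arguments, 5 = result).
#define SPEC(flags, slot, reg) (((flags) & ~(0x7ULL << ((slot) * 3))) | ((uint64_t)(reg) << ((slot) * 3)))

typedef enum { CHAIN_OPEN, CHAIN_READ, CHAIN_CLOSE } chain_verb_t; // toy verbs

typedef struct
{
    chain_verb_t verb;
    uint64_t args[5];
    uint64_t flags; // register specifiers
} chain_sqe_t;

static unsigned chain_spec(uint64_t flags, unsigned slot)
{
    return (unsigned)((flags >> (slot * 3)) & 0x7);
}

// Fake verb implementations standing in for real file operations.
static uint64_t chain_exec(chain_verb_t verb, const uint64_t args[5])
{
    switch (verb)
    {
    case CHAIN_OPEN:
        return 3; // pretend the new descriptor is 3
    case CHAIN_READ:
        return args[2]; // pretend all `count` bytes were read
    case CHAIN_CLOSE:
        return 0;
    }
    return 0;
}

// For each SQE: load marked arguments from registers, execute, then store the
// result if a result register is specified.
static void chain_run(chain_sqe_t* chain, unsigned count, uint64_t regs[REG_COUNT])
{
    for (unsigned i = 0; i < count; i++)
    {
        chain_sqe_t* sqe = &chain[i];
        for (unsigned slot = 0; slot < 5; slot++)
        {
            unsigned reg = chain_spec(sqe->flags, slot);
            if (reg != REG_NONE)
            {
                sqe->args[slot] = regs[reg];
            }
        }
        uint64_t result = chain_exec(sqe->verb, sqe->args);
        unsigned out = chain_spec(sqe->flags, 5);
        if (out != REG_NONE)
        {
            regs[out] = result;
        }
    }
}

// open -> save fd to r0; read(fd = r0, count) -> save bytes to r1; close(fd = r0).
static void chain_build_example(chain_sqe_t chain[3], uint64_t count)
{
    chain[0] = (chain_sqe_t){CHAIN_OPEN, {0}, SPEC(SPECS_NONE, 5, 0)};
    chain[1] = (chain_sqe_t){CHAIN_READ, {0, 0, count, 0, 0}, SPEC(SPEC(SPECS_NONE, 0, 0), 5, 1)};
    chain[2] = (chain_sqe_t){CHAIN_CLOSE, {0}, SPEC(SPECS_NONE, 0, 0)};
}
```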
 *
 * @see `sqe_flags_t` for more information about register specifiers and their format.
 *
 * ## Arguments
 *
 * Arguments within an SQE are stored in five 64-bit values, `arg0` through `arg4`. For convenience, each argument
 * value is stored as a union of various types.
 *
 * To avoid naming conflicts, and to avoid having to define new arguments for each verb, the arguments follow a common
 * convention:
 *
 * - `arg0`: The noun or subject of the verb, for example, a `fd_t` for file operations.
 * - `arg1`: The source or payload of the verb, for example, a buffer or path.
 * - `arg2`: The magnitude of the operation, for example, a size or encoding.
 * - `arg3`: The location or a modifier to the operation, for example, an offset or flags.
 * - `arg4`: An auxiliary argument, for example, additional flags or options.
 *
 * Not every verb can follow these conventions, but they should be followed whenever reasonable.
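 *
 * The following is a minimal sketch of what an argument union and the naming convention might look like. The
 * `sqe_arg_t` members, the `fd_t` stand-in, and `example_sqe_t` are assumptions made for illustration; the real types
 * are defined in `<sys/ioring.h>`. The point is that every argument occupies one 64-bit slot, whichever member is
 * used.
 *
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t fd_t; // assumption: a descriptor fits in 64 bits

// Hypothetical argument union: one 64-bit slot, several views of it.
typedef union
{
    uint64_t u64;
    int64_t i64;
    void* ptr;
    const char* path;
    fd_t fd;
} sqe_arg_t;

_Static_assert(sizeof(sqe_arg_t) == sizeof(uint64_t), "each argument must stay 64 bits");

typedef struct
{
    uint32_t verb;
    sqe_arg_t arg0; // noun or subject, e.g. a descriptor
    sqe_arg_t arg1; // source or payload, e.g. a buffer or path
    sqe_arg_t arg2; // magnitude, e.g. a size
    sqe_arg_t arg3; // location or modifier, e.g. an offset or flags
    sqe_arg_t arg4; // auxiliary, e.g. extra options
} example_sqe_t;

// Fill a buffer-style request according to the convention; arg4 stays zero.
static example_sqe_t example_prep(uint32_t verb, fd_t fd, void* buffer, uint64_t size, uint64_t offset)
{
    example_sqe_t sqe;
    memset(&sqe, 0, sizeof(sqe));
    sqe.verb = verb;
    sqe.arg0.fd = fd;
    sqe.arg1.ptr = buffer;
    sqe.arg2.u64 = size;
    sqe.arg3.u64 = offset;
    return sqe;
}
```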
 *
 * @note The kernel's internal I/O Request Packet (IRP) structure uses a similar system, but with the kernel
 * equivalents of the arguments, for example, a `file_t*` instead of a `fd_t`.
 *
 * ## Results
 *
 * The result of an SQE is stored in its corresponding CQE as a single 64-bit value. For convenience, the result is
 * stored as a union of various types. Note that the union does not change the stored value, only how it is
 * interpreted.
 *
 * If an SQE fails, the error code is stored separately from the result, and the result itself may be undefined.
 * Some verbs may allow partial failures, in which case the result may still be valid even if an error code is present.
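 *
 * The separation between the result and the error can be sketched as follows. `cqe_result_t`, `example_cqe_t`, and
 * `cqe_bytes()` are hypothetical, and the sketch assumes a success code of `0`. It only illustrates two points: the
 * union reinterprets one stored value, and a caller should check the error before trusting the result.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical result union: the same 64 bits viewed through different types.
typedef union
{
    uint64_t u64;
    int64_t i64;
    void* ptr;
} cqe_result_t;

_Static_assert(sizeof(cqe_result_t) == sizeof(uint64_t), "the result must stay 64 bits");

// Hypothetical CQE: the error code is kept apart from the result.
typedef struct
{
    int32_t error;       // 0 on success (assumed)
    cqe_result_t result; // may be undefined when error != 0
} example_cqe_t;

// Byte count of a completed transfer, or -1 if the CQE carries an error.
// A verb that permits partial failures would need a different policy.
static int64_t cqe_bytes(const example_cqe_t* cqe)
{
    assert(cqe != NULL);
    if (cqe->error != 0)
    {
        return -1; // do not read `result` here
    }
    return cqe->result.i64;
}
```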
 *
 * @todo Decide if partial failures are a good idea or not.
 *
 * ## Errors
 *
 * Most errors are returned in the CQEs, but certain errors (such as `ENOMEM`) may be reported directly by the
 * `enter()` call.
 *
 * Error values that may be returned in a CQE include:
 * - `EOK`: Success.
 * - `ECANCELED`: The verb was cancelled.
 * - `ETIMEDOUT`: The verb timed out.
 * - Other values, depending on the verb.
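 *
 * The sketch below shows how a caller might branch on a CQE's error code. `EOK` is not part of standard C or POSIX,
 * so the sketch defines it as `0` when it is missing, which is an assumption. `outcome_t` and `classify_cqe_error()`
 * are hypothetical. Treating only timeouts as retryable is an example policy, not something the ring prescribes.
 * Errors reported directly by `enter()`, such as `ENOMEM`, would be handled at the call site instead.
 *
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EOK
#define EOK 0 // assumption: success is reported as 0
#endif

// Hypothetical high-level outcomes for a completed SQE.
typedef enum
{
    OUTCOME_DONE,
    OUTCOME_CANCELLED,
    OUTCOME_RETRY,
    OUTCOME_FAILED,
} outcome_t;

// Map the error field of a CQE to an outcome (example policy).
static outcome_t classify_cqe_error(int32_t error)
{
    switch (error)
    {
    case EOK:
        return OUTCOME_DONE;
    case ECANCELED:
        return OUTCOME_CANCELLED; // the verb was cancelled
    case ETIMEDOUT:
        return OUTCOME_RETRY; // the verb timed out; resubmitting may succeed
    default:
        return OUTCOME_FAILED; // verb-specific failure
    }
}
```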
 *
 * ## Verbs
 *
 * Included below is a list of all currently implemented verbs.
 *
 * The arguments of each verb are specified in order as `arg0`, `arg1`, `arg2`, `arg3`, `arg4`.
 *
 * ### `VERB_NOP`
 *
 * A no-operation verb that does nothing, but is useful for implementing sleeping.
 *
 * @param arg0 Unused
 * @param arg1 Unused
 * @param arg2 Unused
 * @param arg3 Unused
 * @param arg4 Unused
 * @result None
 *
 * ### `VERB_READ`
 *
 * Reads data from a file descriptor.
 *
 * @param fd The file descriptor to read from.
 * @param buffer The buffer to read the data into.
 * @param count The number of bytes to read.
 * @param offset The offset to read from, or `IO_CUR` to use the current position.
 * @param arg4 Unused
 * @result The number of bytes read.
 *
 * ### `VERB_WRITE`
 *
 * Writes data to a file descriptor.
 *
 * @param fd The file descriptor to write to.
 * @param buffer The buffer to write the data from.
 * @param count The number of bytes to write.
 * @param offset The offset to write to, or `IO_CUR` to use the current position.
 * @param arg4 Unused
 * @result The number of bytes written.
 *
 * ### `VERB_POLL`
 *
 * Polls a file descriptor for events.
 *
 * @param fd The file descriptor to poll.
 * @param events The events to wait for.
 * @param arg2 Unused
 * @param arg3 Unused
 * @param arg4 Unused
 * @result The events that occurred.
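 *
 * The argument lists above translate directly into small preparation helpers. In this sketch, the verb numbers and
 * the `EX_IO_CUR` value are placeholders; the real constants come from `<sys/ioring.h>`. Arguments are shown as plain
 * 64-bit integers rather than unions, and every `ex_*` name is hypothetical.
 *
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Placeholder constants; the real values are defined in <sys/ioring.h>.
enum
{
    EX_VERB_NOP,
    EX_VERB_READ,
    EX_VERB_WRITE,
    EX_VERB_POLL,
};
#define EX_IO_CUR UINT64_MAX // placeholder for IO_CUR

typedef struct
{
    uint32_t verb;
    uint64_t arg[5]; // arg0..arg4 in the documented order
} ex_sqe_t;

// Common helper; arg4 is unused by every verb listed here, so it stays zero.
static ex_sqe_t ex_prep(uint32_t verb, uint64_t a0, uint64_t a1, uint64_t a2, uint64_t a3)
{
    ex_sqe_t sqe;
    memset(&sqe, 0, sizeof(sqe));
    sqe.verb = verb;
    sqe.arg[0] = a0;
    sqe.arg[1] = a1;
    sqe.arg[2] = a2;
    sqe.arg[3] = a3;
    return sqe;
}

static ex_sqe_t ex_prep_nop(void)
{
    return ex_prep(EX_VERB_NOP, 0, 0, 0, 0);
}

// READ: fd, buffer, count, offset (or IO_CUR).
static ex_sqe_t ex_prep_read(uint64_t fd, void* buffer, uint64_t count, uint64_t offset)
{
    return ex_prep(EX_VERB_READ, fd, (uint64_t)(uintptr_t)buffer, count, offset);
}

// WRITE: fd, buffer, count, offset (or IO_CUR).
static ex_sqe_t ex_prep_write(uint64_t fd, const void* buffer, uint64_t count, uint64_t offset)
{
    return ex_prep(EX_VERB_WRITE, fd, (uint64_t)(uintptr_t)buffer, count, offset);
}

// POLL: fd and the event mask; arg2 and arg3 are unused.
static ex_sqe_t ex_prep_poll(uint64_t fd, uint64_t events)
{
    return ex_prep(EX_VERB_POLL, fd, events, 0, 0);
}
```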
 *
 * @{
 */

/**
 * @brief Ring context flags.
 * @enum ioring_ctx_flags_t
 */
typedef enum
{
    IORING_CTX_NONE = 0,        ///< No flags set.
    IORING_CTX_BUSY = 1 << 0,   ///< Context is currently in use; used for fast locking.
    IORING_CTX_MAPPED = 1 << 1, ///< Context is currently mapped into userspace.
} ioring_ctx_flags_t;

/**
 * @brief The kernel-side ring context structure.
 * @struct ioring_ctx_t
 */
typedef struct ioring_ctx
{
    ioring_t ring;                     ///< The kernel-side ring structure.
    irp_pool_t* irps;                  ///< Pool of preallocated IRPs.
    void* userAddr;                    ///< Userspace address of the ring.
    void* kernelAddr;                  ///< Kernel address of the ring.
    size_t pageAmount;                 ///< Number of pages mapped for the ring.
    wait_queue_t waitQueue;            ///< Wait queue for completions.
    _Atomic(ioring_ctx_flags_t) flags; ///< Context flags, see `ioring_ctx_flags_t`.
} ioring_ctx_t;

/**
 * @brief Initialize an I/O context.
 *
 * @param ctx Pointer to the context to initialize.
 */
void ioring_ctx_init(ioring_ctx_t* ctx);

/**
 * @brief Deinitialize an I/O context.
 *
 * @param ctx Pointer to the context to deinitialize.
 */
void ioring_ctx_deinit(ioring_ctx_t* ctx);

/**
 * @brief Notify the context of new SQEs.
 *
 * @param ctx Pointer to the context.
 * @param amount The number of SQEs to process.
 * @param wait The minimum number of CQEs to wait for.
 * @return On success, the number of SQEs processed. On failure, `ERR` is returned and `errno` is set.
 */
uint64_t ioring_ctx_notify(ioring_ctx_t* ctx, size_t amount, size_t wait);

/** @} */
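
`IORING_CTX_BUSY` is described as being used for fast locking. A common way to build such a lock is an atomic fetch-or that succeeds only if the bit was previously clear, and the sketch below shows that pattern. The `ex_ctx_t` type and its helpers are hypothetical. The flags are held in an `_Atomic uint32_t` rather than `_Atomic(ioring_ctx_flags_t)` to keep the atomic operations portable, and the actual `ring.c` implementation may work differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Mirrors IORING_CTX_BUSY and IORING_CTX_MAPPED from the header above.
#define CTX_BUSY (1u << 0)
#define CTX_MAPPED (1u << 1)

// Hypothetical stand-in for the `flags` member of ioring_ctx_t.
typedef struct
{
    _Atomic uint32_t flags;
} ex_ctx_t;

// Try to claim the context without blocking: set BUSY and report whether it
// was previously clear. Acquire ordering pairs with the release in unlock.
static bool ex_ctx_try_lock(ex_ctx_t* ctx)
{
    uint32_t old = atomic_fetch_or_explicit(&ctx->flags, CTX_BUSY, memory_order_acquire);
    return (old & CTX_BUSY) == 0;
}

// Clear only the BUSY bit, leaving other flags untouched.
static void ex_ctx_unlock(ex_ctx_t* ctx)
{
    atomic_fetch_and_explicit(&ctx->flags, ~CTX_BUSY, memory_order_release);
}

static bool ex_ctx_is_mapped(ex_ctx_t* ctx)
{
    return (atomic_load_explicit(&ctx->flags, memory_order_acquire) & CTX_MAPPED) != 0;
}
```

A caller that fails to take the lock can back off or report the context as busy instead of blocking. The `MAPPED` bit survives because both `fetch_or` and `fetch_and` here only change the `BUSY` bit.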