PatchworkOS
19e446b
A non-POSIX operating system.
Programmable submission/completion interface.
The I/O ring provides the core of all interfaces in PatchworkOS: user space submits Submission Queue Entries (SQEs) to it and receives Completion Queue Entries (CQEs) from it, all within shared memory. This allows for highly efficient and asynchronous I/O operations, which matters especially because PatchworkOS is designed to be natively asynchronous.
Each SQE specifies a verb (the operation to perform) and a set of up to SQE_MAX_ARG arguments, while each CQE returns the result of a previously submitted SQE.
Synchronous operations are implemented on top of this API in userspace.
See also: io_uring, the inspiration for this system.
The I/O ring structure is designed to be safe under the assumption that there is a single producer (one user-space thread) and a single consumer (the kernel).
If an I/O ring needs multiple producers (needs to be accessed by multiple threads) it is the responsibility of the caller to ensure proper synchronization.
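The single-producer, single-consumer assumption can be illustrated with a minimal C11 sketch of a lock-free ring. All names here (spsc_ring_t, spsc_init, spsc_push, spsc_pop, RING_ENTRIES) are hypothetical and do not reflect the actual PatchworkOS layout. The sketch only shows that each index has exactly one writer, which is why a second producer thread would need external synchronization (for example, a mutex around spsc_push).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* hypothetical capacity, must be a power of two */
static_assert((RING_ENTRIES & (RING_ENTRIES - 1u)) == 0, "capacity must be a power of two");

typedef struct
{
    uint64_t entries[RING_ENTRIES];
    _Atomic uint32_t head; /* written only by the consumer */
    _Atomic uint32_t tail; /* written only by the producer */
} spsc_ring_t;

static void spsc_init(spsc_ring_t* ring)
{
    atomic_init(&ring->head, 0);
    atomic_init(&ring->tail, 0);
}

/* Producer side: returns false if the ring is full. */
static bool spsc_push(spsc_ring_t* ring, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&ring->head, memory_order_acquire);
    if (tail - head == RING_ENTRIES)
    {
        return false;
    }
    ring->entries[tail & (RING_ENTRIES - 1u)] = value;
    /* Release: the entry must be visible before the new tail is. */
    atomic_store_explicit(&ring->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool spsc_pop(spsc_ring_t* ring, uint64_t* out)
{
    uint32_t head = atomic_load_explicit(&ring->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&ring->tail, memory_order_acquire);
    if (head == tail)
    {
        return false;
    }
    *out = ring->entries[head & (RING_ENTRIES - 1u)];
    atomic_store_explicit(&ring->head, head + 1u, memory_order_release);
    return true;
}
```

If two threads called spsc_push concurrently, both could read the same tail value and overwrite the same slot, which is the failure mode the caller-side synchronization is meant to prevent.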
Regarding the I/O ring structure itself, it can only be torn down when nothing is using it and there are no pending operations.
Operations performed on an I/O ring can load arguments from, and save their results to, seven 64-bit general-purpose registers. All registers are stored in the shared control area of the I/O ring structure (ioring_ctrl_t), so they can be inspected and modified by user space.
When a SQE is processed, the kernel checks six register specifiers in the SQE flags, one for each of the five arguments and one for the result. Each specifier is stored as three bits, with the value SQE_REG_NONE indicating a no-op and any other value representing the n-th register. The bit offset of a specifier determines its meaning: bits 0-2 specify the register to load into the first argument, bits 3-5 specify the register to load into the second argument, and so on up to bits 15-17, which specify the register to save the result into.
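The bit layout described above can be sketched with two small helpers. This is only an illustration: the helper names, the REG_SPEC_* macros, and the 32-bit flag width are assumptions, and the helpers work on raw 3-bit values because the numeric value of SQE_REG_NONE is not stated here.

```c
#include <assert.h>
#include <stdint.h>

#define REG_SPEC_BITS 3u        /* each specifier is three bits wide */
#define REG_SPEC_MASK 0x7u
#define REG_SPEC_SLOTS 6u       /* five argument slots plus one result slot */
#define REG_SPEC_RESULT_SLOT 5u /* slots 0-4 are arguments, slot 5 is the result */

static_assert(REG_SPEC_SLOTS * REG_SPEC_BITS == 18u, "six 3-bit specifiers occupy bits 0-17");

/* Store a raw 3-bit specifier in the given slot, leaving all other bits intact. */
static uint32_t reg_spec_set(uint32_t flags, unsigned slot, uint32_t spec)
{
    unsigned shift = slot * REG_SPEC_BITS;
    flags &= ~(REG_SPEC_MASK << shift);
    return flags | ((spec & REG_SPEC_MASK) << shift);
}

/* Extract the raw 3-bit specifier stored in the given slot. */
static uint32_t reg_spec_get(uint32_t flags, unsigned slot)
{
    return (flags >> (slot * REG_SPEC_BITS)) & REG_SPEC_MASK;
}
```

With this layout, the specifier for the first argument lands in bits 0-2 (slot 0) and the result specifier in bits 15-17 (slot 5), matching the description above.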
This system, when combined with SQE_LINK, allows for multiple operations to be performed at once. For example, it would be possible to open a file, read from it, seek to a new position, write to it, and finally close the file, all with a single enter() call.
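To make the data flow between chained operations concrete, here is a toy model of register passing. It is not the kernel's implementation: every toy_* name is hypothetical, and the encoding (0 meaning "no register", n meaning register n-1) is an assumption made only for this sketch. The two fake verbs stand in for an open that yields a file descriptor and a read that consumes it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ARG_COUNT 5
#define TOY_REG_COUNT 7
#define TOY_REG_NONE 0 /* assumed encoding, the real SQE_REG_NONE may differ */

static_assert(TOY_REG_COUNT == (1 << 3) - 1, "3-bit specifier: one 'none' value plus seven registers");

typedef uint64_t (*toy_verb_t)(const uint64_t args[TOY_ARG_COUNT]);

typedef struct
{
    toy_verb_t verb;
    uint64_t args[TOY_ARG_COUNT];
    uint8_t loadSpec[TOY_ARG_COUNT]; /* register to load into each argument */
    uint8_t saveSpec;                /* register to save the result into */
} toy_sqe_t;

/* Load arguments from registers, run the verb, then save the result. */
static uint64_t toy_execute(toy_sqe_t* sqe, uint64_t regs[TOY_REG_COUNT])
{
    for (int i = 0; i < TOY_ARG_COUNT; i++)
    {
        if (sqe->loadSpec[i] != TOY_REG_NONE)
        {
            sqe->args[i] = regs[sqe->loadSpec[i] - 1];
        }
    }
    uint64_t result = sqe->verb(sqe->args);
    if (sqe->saveSpec != TOY_REG_NONE)
    {
        regs[sqe->saveSpec - 1] = result;
    }
    return result;
}

/* Fake "open": pretends the new file descriptor is 3. */
static uint64_t toy_open(const uint64_t args[TOY_ARG_COUNT])
{
    (void)args;
    return 3;
}

/* Fake "read": pretends all requested bytes (args[2]) were read. */
static uint64_t toy_read(const uint64_t args[TOY_ARG_COUNT])
{
    return args[2];
}
```

In this model, an "open" entry that saves its result into the first register followed by a "read" entry that loads its first argument from that register never needs user space to see the file descriptor in between, which is the property that lets a whole chain run from one submission.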
See also: sqe_flags_t for more information about register specifiers and their formatting.
Arguments within a SQE are stored in five 64-bit values, arg0 through arg4. For convenience, each argument value is stored as a union with various types.
To avoid naming conflicts and to avoid having to define new arguments for each verb, we define a convention to be used for the arguments.
- arg0: The noun or subject of the verb, for example, a fd_t for file operations.
- arg1: The source or payload of the verb, for example, a buffer or path.
- arg2: The magnitude of the operation, for example, a size or encoding.
- arg3: The location or a modifier to the operation, for example, an offset or flags.
- arg4: An auxiliary argument, for example, additional flags or options.
It may not always be possible for a verb to follow these conventions, but they should be followed whenever reasonable.
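As a rough illustration of the convention, the sketch below models one argument slot as a union and fills the five slots for a read-style verb. The toy_arg_t and toy_args_t types, their member names, and the toy_read_args helper are hypothetical; the real SQE layout and its union members may differ.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit argument slot, viewable as several types. */
typedef union
{
    uint64_t raw;
    int64_t fd;       /* stand-in for fd_t */
    void* buffer;
    const char* path;
    uint64_t size;
    uint64_t offset;
} toy_arg_t;

static_assert(sizeof(toy_arg_t) == sizeof(uint64_t), "each argument occupies one 64-bit slot");

typedef struct
{
    toy_arg_t arg[5]; /* arg0 through arg4 */
} toy_args_t;

/* Fill the slots following the noun / payload / magnitude / location order. */
static toy_args_t toy_read_args(int64_t fd, void* buffer, uint64_t count, uint64_t offset)
{
    toy_args_t args = {0};
    args.arg[0].fd = fd;         /* arg0: the subject of the verb */
    args.arg[1].buffer = buffer; /* arg1: the payload */
    args.arg[2].size = count;    /* arg2: the magnitude */
    args.arg[3].offset = offset; /* arg3: the location */
    /* arg4 is unused and stays zero. */
    return args;
}
```

Keeping the same slot order across verbs means a register that holds a file descriptor can be loaded into arg0 of any file verb without verb-specific knowledge.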
file_t* instead of a fd_t.
The result of a SQE is stored in its corresponding CQE using a single 64-bit value. For convenience, the result is stored as a union of various types. Note that this does not actually change the stored value, just how it is interpreted.
If a SQE fails, the error code will be stored separately from the result, and the result itself may be undefined. Some verbs may allow partial failures, in which case the result may still be valid even if an error code is present.
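A consumer of CQEs would therefore typically check the error before touching the result. The following is a minimal sketch under stated assumptions: toy_cqe_t, toy_result_t, and toy_cqe_count are hypothetical names, and EOK is assumed to be 0. Verbs that allow partial failures would need verb-specific handling that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_EOK 0u /* assumed numeric value of EOK */

/* One 64-bit result, interpreted according to the verb that produced it. */
typedef union
{
    uint64_t raw;
    int64_t fd;
    uint64_t count;
    void* ptr;
} toy_result_t;

static_assert(sizeof(toy_result_t) == sizeof(uint64_t), "the result is a single 64-bit value");

typedef struct
{
    uint64_t error;      /* stored separately from the result */
    toy_result_t result; /* may be undefined when error is set */
} toy_cqe_t;

/* Read a byte count from a CQE, refusing to look at the result on failure. */
static bool toy_cqe_count(const toy_cqe_t* cqe, uint64_t* outCount)
{
    if (cqe->error != TOY_EOK)
    {
        return false;
    }
    *outCount = cqe->result.count;
    return true;
}
```

Reading result.count or result.raw yields the same stored bits. The union only changes how they are interpreted.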
While the majority of errors are returned in the CQEs, certain errors (such as ENOMEM) may be reported directly from the enter() call.
Error values that may be returned in a CQE include:
- EOK: Success.
- ECANCELED: The verb was cancelled.
- ETIMEDOUT: The verb timed out.
Included below is a list of all currently implemented verbs.
The arguments of each verb are specified in order as arg0, arg1, arg2, arg3, arg4.
A no-operation verb that does nothing but is useful for implementing sleeping.
| arg0 | Unused |
| arg1 | Unused |
| arg2 | Unused |
| arg3 | Unused |
| arg4 | Unused |
Reads data from a file descriptor.
| fd | The file descriptor to read from. |
| buffer | The buffer to read the data into. |
| count | The number of bytes to read. |
| offset | The offset to read from, or IO_CUR to use the current position. |
| arg4 | Unused |
Writes data to a file descriptor.
| fd | The file descriptor to write to. |
| buffer | The buffer to write the data from. |
| count | The number of bytes to write. |
| offset | The offset to write to, or IO_CUR to use the current position. |
| arg4 | Unused |
Polls a file descriptor for events.
| fd | The file descriptor to poll. |
| events | The events to wait for. |
| arg2 | Unused |
| arg3 | Unused |
| arg4 | Unused |
Data Structures | |
| struct | ioring_ctx_t |
| The kernel-side ring context structure. More... | |
Enumerations | |
| enum | ioring_ctx_flags_t { IORING_CTX_NONE = 0 , IORING_CTX_BUSY = 1 << 0 , IORING_CTX_MAPPED = 1 << 1 } |
| Ring context flags. More... | |
Functions | |
| void | ioring_ctx_init (ioring_ctx_t *ctx) |
| Initialize an I/O context. | |
| void | ioring_ctx_deinit (ioring_ctx_t *ctx) |
| Deinitialize an I/O context. | |
| uint64_t | ioring_ctx_notify (ioring_ctx_t *ctx, size_t amount, size_t wait) |
| Notify the context of new SQEs. | |
enum ioring_ctx_flags_t
void ioring_ctx_init(ioring_ctx_t *ctx)
void ioring_ctx_deinit(ioring_ctx_t *ctx)
uint64_t ioring_ctx_notify(ioring_ctx_t *ctx, size_t amount, size_t wait)
Notify the context of new SQEs.
| ctx | Pointer to the context. |
| amount | The number of SQEs to process. |
| wait | The minimum number of CQEs to wait for. |
ERR and errno is set. Definition at line 346 of file ring.c.