PatchworkOS
Patchwork is a monolithic non-POSIX operating system for the x86_64 architecture that rigorously follows an "everything is a file" philosophy. Built from scratch in C, it takes many ideas from Unix, Plan9, and others while simplifying them and adding some new ideas of its own.
In the end, this is a project made for fun. However, the goal is still to make a feature-complete, experimental operating system that prefers unique algorithms and designs over tried and tested ones. Sometimes this leads to bad results, and sometimes, hopefully, good ones.
Additionally, the OS aims to remain approachable and educational in spite of its experimental nature, something that can work as a middle ground between fully educational operating systems like xv6 and production operating systems like Linux.
- O(1) per page and O(n) where n is the number of pages per allocation/mapping operation, see benchmarks for more info
- And much more...

- fork(), exec() with spawn()
- openat() and fchdir() system calls
- spawn() for namespace inheritance

As one of the main goals of PatchworkOS is to be educational, I have tried to document the codebase as much as possible, along with providing citations to any sources used. This is still a work in progress, but as old code is refactored and new code is added, I try to add documentation.
If you are interested in knowing more, then you can check out the Doxygen generated documentation.
All benchmarks were run on real hardware using a Lenovo ThinkPad E495. For comparison, I've decided to use the Linux kernel, specifically Fedora, since it's what I normally use.
Note that Fedora will obviously have a lot more background processes running and security features that might impact performance, so these benchmarks are not exactly apples to apples, but they should still give a good baseline for how PatchworkOS performs.
All code for benchmarks can be found in the benchmark program. All tests were run using the optimization flag -O3.
The test maps and unmaps memory in varying page amounts for a set number of iterations using generic mmap and munmap functions. Below are the results from PatchworkOS as of commit 4b00a88 and Fedora 40, kernel version 6.14.5-100.fc40.x86_64.
We see that PatchworkOS performs better across the board, and the performance difference increases as we increase the page count.
There are a few potential reasons for this. One is that PatchworkOS does not use a separate structure to manage virtual memory; instead, it embeds metadata directly into the page tables. Since accessing a page table is just walking some pointers, it's highly efficient, and it also provides better caching since the page tables are likely already in the CPU cache.
In the end, we end up with $O(1)$ complexity per page operation. Technically, since the algorithm for finding unmapped memory sections is $O(r)$ in the worst case, where $r$ is the size of the address region to check in pages, having more memory allocated could actually improve performance, but only by a very small amount. We do of course get $O(n)$ complexity per allocation/mapping operation, where $n$ is the number of pages.
Note that as the number of pages increases, performance becomes less and less linear; this is most likely due to CPU cache saturation.
For fun, we can throw the results into Desmos to see that around $800$ to $900$ pages there is a "knee" in the curve. Letting $x$ be the number of pages per iteration and $y$ the time in milliseconds, let us split the data into two sets. We can now perform linear regression, which gives us
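To make the two complexity claims concrete, here is a minimal sketch under simplifying assumptions. It treats a region as a flat array of page table entries rather than the multi-level tables a real x86_64 kernel walks, and the names `find_unmapped_run` and `map_run` are hypothetical, not taken from the PatchworkOS source. Only the present bit (bit 0) is borrowed from the real x86_64 entry format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1ULL /* bit 0 of an x86_64 page table entry */

/* Scan a region for the first run of `wanted` unmapped pages.
 * Worst case visits every entry once, so it is O(r) in the region size. */
bool find_unmapped_run(const uint64_t *entries, size_t region_pages,
                       size_t wanted, size_t *out_start)
{
    assert(entries != NULL && out_start != NULL);
    if (wanted == 0)
        return false;

    size_t run = 0;
    for (size_t i = 0; i < region_pages; i++) {
        if (entries[i] & PTE_PRESENT) {
            run = 0; /* a mapped page breaks the current run */
            continue;
        }
        if (++run == wanted) {
            *out_start = i + 1 - wanted;
            return true;
        }
    }
    return false;
}

/* Mark `count` pages as mapped: O(1) work per page, O(n) per operation. */
void map_run(uint64_t *entries, size_t start, size_t count)
{
    assert(entries != NULL);
    for (size_t i = start; i < start + count; i++)
        entries[i] |= PTE_PRESENT;
}
```

The scan can stop as soon as it finds a long enough run, so the worst-case $O(r)$ bound is only reached when no suitable gap appears early in the region.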
Performing quadratic regression on the same data gives us
From this we see that for $x \le 850$ the linear regression has a slightly better fit, while for $x > 850$ the quadratic regression has a slightly better fit. This is most likely due to the CPU or TLB caches starting to get saturated. All in all, this did not tell us much more than we already knew, but it was fun to do regardless.
Of course, there are limitations to this approach. For example, it is in no way portable (which isn't a concern in our case), each address space can only contain $2^8 - 1$ unique shared memory regions, and copy-on-write would not be easy to implement (however, the need for this is reduced because PatchworkOS uses spawn() instead of fork()).
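The $2^8 - 1$ limit is what you would expect if an 8-bit identifier is stored in each page table entry with one value reserved to mean "no region". The sketch below shows that idea only. The bit position (`PTE_ID_SHIFT`), the reserved value and the function names are all assumptions made for illustration; the text above does not say which bits PatchworkOS actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: eight software-available bits starting at bit 52. */
#define PTE_ID_SHIFT 52
#define PTE_ID_MASK  (0xFFULL << PTE_ID_SHIFT)
#define PTE_ID_NONE  0 /* assumed reserved value, leaving 2^8 - 1 usable IDs */

/* Store a region ID in the entry without touching any other bits. */
uint64_t pte_set_region_id(uint64_t pte, uint8_t id)
{
    return (pte & ~PTE_ID_MASK) | ((uint64_t)id << PTE_ID_SHIFT);
}

/* Read the region ID back out of the entry. */
uint8_t pte_get_region_id(uint64_t pte)
{
    return (uint8_t)((pte & PTE_ID_MASK) >> PTE_ID_SHIFT);
}
```

Because the ID travels with the entry itself, looking it up costs nothing beyond the page table walk that is already needed, which is the trade-off described above: very cheap lookups in exchange for a hard cap on the number of regions.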
All in all, this algorithm would not be a viable replacement for existing algorithms, but for PatchworkOS, it serves its purpose very efficiently.
Patchwork includes its own shell utilities designed around its file flags system. Included is a brief overview with some usage examples. For convenience the shell utilities are named after their POSIX counterparts, however they are not drop-in replacements.
Opens a file path and then immediately closes it.
Reads from stdin or provided files and outputs to stdout.
Writes to stdout.
Reads the contents of a directory to stdout.
Removes a file or directory.
There are other utils available that work as expected, for example stat and link.
Patchwork strictly follows the "everything is a file" philosophy in a way similar to Plan9. This can often result in unorthodox APIs or could just straight up seem overly complicated, but it has its advantages. We will use sockets to demonstrate the kinds of APIs this produces.
In order to create a local seqpacket socket, you open the /net/local/seqpacket file. The opened file will act as the handle for your socket. Reading from the handle will return the ID of your created socket so, for example, you can do
Note that even when the handle is closed, the socket will persist until the process that created it and all its children have exited. The ID that the handle returns is the name of a directory that has been created in the /net/local directory, which contains three files:

- data - used to send and retrieve data
- ctl - used to send commands
- accept - used to accept incoming connections

So, for example, the socket's data file is located at /net/local/[id]/data.
Say we want to make our socket into a server. We would then use the bind and listen commands with the ctl file, so we can write
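As a small illustration of the layout above, the helper below builds the path to one of a socket's three files from its ID. It is a portable sketch and not PatchworkOS code: `socket_file_path` is a hypothetical name, and it only formats the string that you would then hand to the OS's own open call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "/net/local/<id>/<file>" into buf.
 * Returns 0 on success, -1 if `file` is not one of the three known
 * socket files or the result does not fit in `size` bytes. */
int socket_file_path(char *buf, size_t size, const char *id, const char *file)
{
    assert(buf != NULL && id != NULL && file != NULL);

    if (strcmp(file, "data") != 0 && strcmp(file, "ctl") != 0 &&
        strcmp(file, "accept") != 0)
        return -1;

    int n = snprintf(buf, size, "/net/local/%s/%s", id, file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a socket whose handle returned the ID `5`, calling the helper with `"data"` yields `/net/local/5/data`.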
Note the use of openf() which allows us to open files via a formatted path and that we name our server myserver. If we wanted to accept a connection using our newly created server, we just open its accept file by writing
The returned file descriptor can be used to send and receive data, just like when calling accept() on, for example, Linux or other POSIX operating systems. Note that the entire socket API does attempt to mimic the POSIX socket API, so apart from using these weird files, everything (should) work as expected.
For the sake of completeness, if we wanted to connect to this server, we can do
which would create a new socket and connect it to the server named myserver.
A namespace is a set of mountpoints that is unique per process, with each process also able to access the mountpoints in its parent's namespace. This gives each process a unique view of the file system and is used for access control.
Think of it like this: in the common case, for instance on Linux, you can mount a drive to /mnt/mydrive and all processes can then open the /mnt/mydrive path and see the contents of that drive. In PatchworkOS, this is also possible, but for security reasons we might not want every process to be able to see that drive. Instead, processes should see the original contents of /mnt/mydrive, which might just be an empty directory. The exception is the process that created the mountpoint and its children, as they would have that mountpoint in their namespace.
For example, the "id" directories mentioned in the socket example are a separate "sysfs" instance mounted in the namespace of the creating process, meaning that only that process and its children can see their contents.
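The visibility rule can be modelled in a few lines. The sketch below is a toy model, assuming a namespace is just a list of mounted paths plus a pointer to the parent's namespace; the type `ns_t` and the functions `ns_mount` and `ns_sees_mount` are hypothetical and say nothing about how the kernel actually stores mountpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MOUNTS 8

/* Toy namespace: its own mountpoints plus a link to the parent's. */
typedef struct ns {
    const struct ns *parent;
    const char *mounts[MAX_MOUNTS];
    size_t count;
} ns_t;

/* Record a mountpoint in this namespace only. */
bool ns_mount(ns_t *ns, const char *path)
{
    assert(ns != NULL && path != NULL);
    if (ns->count == MAX_MOUNTS)
        return false;
    ns->mounts[ns->count++] = path;
    return true;
}

/* A mountpoint is visible if it is in this namespace or any ancestor's. */
bool ns_sees_mount(const ns_t *ns, const char *path)
{
    assert(path != NULL);
    for (; ns != NULL; ns = ns->parent)
        for (size_t i = 0; i < ns->count; i++)
            if (strcmp(ns->mounts[i], path) == 0)
                return true;
    return false;
}
```

In this model, a mount made by process A is seen by A and A's children, while an unrelated process B, whose parent chain does not include A, falls through to whatever lies underneath the path.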
It's possible for two processes to voluntarily share a mountpoint in their namespaces using bind() in combination with two new system calls share() and claim().
For example, if process A wants to share its /net/local/5 directory from the socket example with process B, they can do
This system guarantees consent between processes, and can be used to implement more complex access control systems.
An interesting detail is that when process A opens the /net/local/5 directory, the dentry underlying the file descriptor is the root of the mounted file system. If process B were to try to open this directory, it would still succeed, as the directory itself is visible. However, process B would instead retrieve the dentry of the directory in the parent superblock, and would see the content of that directory in the parent superblock. If this means nothing to you, don't worry about it.
You may have noticed that in the above sections, the open() function does not take a flags argument. This is because flags are part of the file path directly, so if you wanted to create a non-blocking socket, you can write
Multiple flags are allowed; just separate them with the : character. This means flags can be easily appended to a path using the openf() function. It is also possible to specify only the first letter of a flag, so instead of :nonblock you can use :n.
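To show how such a path might be interpreted, here is a minimal parsing sketch in portable C. It assumes that everything after the first `:` is a `:`-separated flag list and that a single-letter flag matches any flag name starting with that letter. The function names `path_has_flag` and `path_name_length` are hypothetical, and the real parser in PatchworkOS may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `path` carries `flag`, written either in full
 * (":nonblock") or as its first letter only (":n"). */
bool path_has_flag(const char *path, const char *flag)
{
    assert(path != NULL && flag != NULL && flag[0] != '\0');

    const char *p = strchr(path, ':');
    while (p != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        bool full = len == strlen(flag) && strncmp(start, flag, len) == 0;
        bool abbrev = len == 1 && start[0] == flag[0];
        if (full || abbrev)
            return true;

        p = end; /* move on to the next flag, or stop at the end */
    }
    return false;
}

/* Length of the path proper, excluding any trailing flags. */
size_t path_name_length(const char *path)
{
    assert(path != NULL);
    const char *colon = strchr(path, ':');
    return colon ? (size_t)(colon - path) : strlen(path);
}
```

Keeping flags inside the string is what makes them easy to compose: appending another flag to a path is plain string formatting, with no separate bitmask to keep in sync.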
| Directory | Description |
|---|---|
| include | Public API |
| src | Source code |
| root | Files copied to the root directory of the generated .img |
| tools | Build scripts (hacky alternative to cross-compiler) |
| make | Make files |
| lib | Third party dependencies |
| meta | Screenshots and repo metadata |
| Requirement | Details |
|---|---|
| OS | Linux (WSL might work, but I make no guarantees) |
| Tools | GCC, make, NASM, mtools, QEMU (optional) |
For frequent testing, repeatedly flashing to a USB drive can be inconvenient. You can instead set up the .img file as a loopback device in GRUB.
Add this entry to the /etc/grub.d/40_custom file:
Regenerate the GRUB configuration using sudo grub2-mkconfig -o /boot/grub2/grub.cfg.
Finally, copy the generated .img file to your /boot directory. This can also be done with make grub_loopback.
You should now see a new entry in your GRUB boot menu allowing you to boot into the OS, like dual booting, but without the need to create a partition.
Testing uses a GitHub action that compiles the project and runs it for some amount of time using QEMU with both the DEBUG=1 and TESTING=1 flags enabled. This will run some additional tests in the kernel (for example, it will clone ACPICA and run all its runtime tests), and if it has not crashed by the end of the allotted time, it is considered a success.
Currently untested on Intel hardware. Let me know if you have different hardware, and it runs (or doesn't) for you!
Contributions are welcome! Anything from bug reports/fixes, performance improvements, new features, or even just fixing typos or adding documentation!
If you are unsure where to start, try searching for any "TODO" comments in the codebase.
Check out the contribution guidelines to get started.
The first Reddit post and image of PatchworkOS from back when getting to user space was a massive milestone and the kernel was supposed to be a microkernel.