serenity

mirror of https://github.com/SerenityOS/serenity synced 2026-05-14 19:06:55 +02:00

Author	SHA1	Message	Date
Liav A.	b93ca74d81	Kernel: Add a prctl option to enter jail mode until an execve syscall In addition to the already existing option to enter jail mode (which is set indefinitely), there should be a less restrictive option that should allow exiting jail mode when doing the execve syscall. This option will be useful for programs that need this kind of security layer only in their runtime, but they're meant to actually initiate another program in the end.	2024-10-03 12:39:45 +02:00
Liav A.	fdf3e0aca1	Kernel: Don't assume sizes of needed buffers early in the execve syscall Instead, start by trying to read a buffer with size of Elf_Ehdr, and check it for the shebang sign. If it's indeed an executable with shebang then read again from the file, now with PAGE_SIZE size, which should suffice for finding the interpreter path. However, if the executable is an ELF, we quickly validate it and then pass the preliminary buffer to the find_elf_interpreter_for_executable method. That method calculates the last byte offset which is needed to read all of the program headers, so we don't just assume 4096 bytes is sufficient anymore. The same pattern is applied when loading the interpreter ELF main header and its program headers.	2024-09-01 20:52:55 +02:00
Liav A.	4aec3f4ef9	Kernel+Userland: Simplify loading of an ELF interpreter path The LibELF validate_program_headers method tried to do too many things at once, and as a result, we had an awkward return type from it. To be able to simplify it, we no longer allow passing a StringBuilder* but instead require passing an Optional<Elf_Phdr> by reference so it can be filled with the actual ELF program header that corresponds to an INTERP header, if one is found. As a result, we ensure that only certain implementations that actually care about the ELF interpreter path will actually try to load it on their own and if they fail, they can have better diagnostics for an invalid INTERP header. This change also fixes a bug in which we failed to execute an ELF program if the INTERP header is located outside the first 4KiB page of the ELF file, as the kernel previously didn't have support for looking beyond that for that header.	2024-07-21 15:38:52 +02:00
Liav A.	c0f55d4b11	Kernel: Add a check on ELF interpreter to verify we open a regular file While extremely unlikely, it's possible to change the dynamic loader to a non-regular file, which will result in a kernel panic upon VERIFY of the `interpreter_description->inode()` statement.	2024-07-21 15:38:52 +02:00
Liav A.	03ae9fdb0a	Kernel: Check condition earlier for ELF file type It makes no sense to do all of the loading work just to figure out that the ELF file is an object file that is a result of compiling and not an actual executable. In addition to that, we should disallow running coredumps as well, so the condition is changed now to only allow ET_DYN or ET_EXEC ELF files.	2024-07-21 15:38:52 +02:00
Liav A.	3692af528e	Kernel: Move most of VirtualFileSystem code to be in a namespace There's no point in constructing an object just for the sake of keeping a state that can be touched by anything in the kernel code. Let's reduce everything to a C++ namespace that keeps the previous name, "VirtualFileSystem", and keep a smaller textual-footprint struct called "VirtualFileSystemDetails". This change also cleans up old "friend class" statements that were no longer needed, and moves methods from the VirtualFileSystem code to more appropriate places as well. Please note that the method of locking all filesystems during shutdown is removed, as locking all filesystems is meaningless at that point because everything runs entirely in kernel mode.	2024-07-21 11:44:23 +02:00
Liav A.	dd59fe35c7	Kernel+Userland: Reduce jails to be a simple boolean flag The whole concept of Jails was far more complicated than I actually want it to be, so let's reduce the complexity of how it works from now on. Please note that we always leaked the attach count of a Jail object in the fork syscall if it failed midway. Instead, we should have attached to the jail just before registering the new Process, so we don't need to worry about unsuccessful Process creation. The reduction of complexity in regard to jails means that instead of relying on jails to provide PID isolation, we could simplify the whole idea of them to be a simple SetOnce, and let the ProcessList (now called ScopedProcessList) be responsible for this type of isolation. Therefore, we apply the following changes to do so: - We make the Jail concept no longer a class of its own. Instead, we simplify the idea of being jailed to a simple ProtectedValues boolean flag. This means that we no longer check for matching jail pointers anywhere in the Kernel code. To set a process as jailed, a new prctl option was added to set a Kernel SetOnce boolean flag (so it cannot change ever again). - We provide Process & Thread methods to iterate over process lists. A process can either iterate on the global process list, or if it's attached to a scoped process list, then only over that list. This essentially replaces the need for checking the Jail pointer of a process when iterating over process lists.	2024-07-21 11:44:23 +02:00
Liav A.	01e1af732b	Kernel/FileSystem: Introduce the VFSRootContext class The VFSRootContext class, as its name suggests, holds a context for a root directory with its mount table and the root custody/inode in the same class. The idea is derived from the Linux mount namespace mechanism. It mimics the concept of the ProcessList object, but it is adjusted for a root directory tree context. In contrast to the ProcessList concept, processes that share the default VFSRootContext can't see other VFSRootContext related properties such as the mount table and root custody/inode. To accommodate this change progressively, we internally create 2 main VFS root contexts for now - one for kernel processes (as they don't need to care about VFS root contexts for the most part), and another for all userspace programs. This separation allows us to continue pretending for userspace that everything is "normal" as it used to be, until we introduce proper interfaces in the mount-related syscalls as well as in the SysFS. We also keep VFSRootContext objects in a list, as further preparation before we can expose interfaces to userspace. As a result, the PowerStateSwitchTask now iterates on all contexts and tears them down one by one.	2024-07-21 11:44:23 +02:00
Dan Klishch	cc5bacf886	Kernel: Allow annotating initially loaded executable segments This allows marking regions as VirtualMemoryRangeFlags::SyscallCode in static executables.	2024-05-07 16:36:38 -06:00
Sönke Holz	243d7003a2	Kernel+LibC+LibELF: Move TLS handling to userspace This removes the allocate_tls syscall and adds an archctl option to set the fs_base for the current thread on x86-64, since you can't set that register from userspace. enter_thread_context loads the fs_base for the next thread on each context switch. This also moves tpidr_el0 (the thread pointer register on AArch64) to the register state, so it gets properly saved/restored on context switches. The userspace TLS allocation code is kept pretty similar to the original kernel TLS code, aside from a couple of style changes. We also have to add a new argument "tls_pointer" to SC_create_thread_params, as we otherwise can't prevent race conditions between setting the thread pointer register and signal handling code that might be triggered before the thread pointer was set, which could use TLS.	2024-04-19 16:46:47 -06:00
Sönke Holz	faede8c93a	Kernel/riscv64: Implement execve	2024-03-25 14:10:05 -06:00
Idan Horowitz	6a4b93b3e0	Kernel: Protect processes' master TLS with a fine-grained spinlock This moves it out of the scope of the big process lock, and allows us to wean some syscalls off it, starting with sys$allocate_tls.	2023-12-26 19:20:21 +01:00
Idan Horowitz	1bea780a7f	Kernel: Reject loading ELF files with no loadable segments If there are no loadable segments, then there can't be any code to execute either. This resolves a crash these kinds of ELF files would cause from the directly following VERIFY statement.	2023-12-15 21:36:25 +01:00
Daniel Bertalan	45d81dceed	Everywhere: Replace `ElfW(type)` macro usage with `Elf_type` This works around a `clang-format-17` bug which caused certain usages to be misformatted and fail to compile. Fixes #8315	2023-12-01 10:02:39 +02:00
Liav A	5dba1dedb7	Kernel: Don't warn when running dynamically-linked ELF without PT_INTERP We could technically copy the dynamic loader to another path and run it from there, so let's not assume paths. If the user is so determined to do such a thing, then a warning is quite meaningless.	2023-11-27 09:27:34 -07:00
Sönke Holz	da88d766b2	Kernel/riscv64: Make the kernel compile This commit inserts TODOs into all necessary places to make the kernel compile on riscv64!	2023-11-10 15:51:31 -07:00
kleines Filmröllchen	398d271a46	Kernel: Share Processor class (and others) across architectures About half of the Processor code is common across architectures, so let's share it with a templated base class. Also, other code that can be shared in some ways, like FPUState and TrapFrame functions, is adjusted here. Functions which cannot be shared trivially (without internal refactoring) are left alone for now.	2023-10-03 16:08:29 -06:00
Liav A	3fd4997fc2	Kernel: Don't allocate memory for names of processes and threads Instead, use the FixedCharBuffer class to ensure we always use a static buffer storage for these names. This ensures that if a Process or a Thread were created, there's a guarantee that setting a new name will never fail, as only copying of strings should be done to that static storage. The limits which are set are 32 characters for processes' names and 64 characters for thread names - this is because threads' names could be more verbose than processes' names.	2023-08-09 21:06:54 -06:00
Tim Schumacher	9d6372ff07	Kernel: Consolidate finding the ELF stack size with validation Previously, we started parsing the ELF file again in a completely different place, and without the partial mapping that we do while validating. Instead of doing manual parsing in two places, just capture the requested stack size right after we validated it.	2023-07-10 21:08:31 -06:00
Timothy Flynn	c911781c21	Everywhere: Remove needless trailing semi-colons after functions This is a new option in clang-format-16.	2023-07-08 10:32:56 +01:00
Jelle Raaijmakers	81a6976e90	Kernel: De-atomicize fields for promises in `Process` These 4 fields were made `Atomic` in `c3f668a758`, at which time these were still accessed unserialized and TOCTOU bugs could happen. Later, in `8ed06ad814`, we serialized access to these fields in a number of helper methods, removing the need for `Atomic`.	2023-06-09 17:15:54 +02:00
Liav A	927926b924	Kernel: Move Performance-measurement code to the Tasks subdirectory	2023-06-04 21:32:34 +02:00
Liav A	7c0540a229	Everywhere: Move global Kernel pattern code to Kernel/Library directory This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow, UserOrKernelBuffer and ScopedCritical classes being moved to the Kernel/Library subdirectory. Also, move the panic and assertions handling code to that directory.	2023-06-04 21:32:34 +02:00
Liav A	490856453d	Kernel: Move Random.{h,cpp} code to Security subdirectory	2023-06-04 21:32:34 +02:00
Liav A	1b04726c85	Kernel: Move all tasks-related code to the Tasks subdirectory	2023-06-04 21:32:34 +02:00
Tim Schumacher	9be5dcfd89	Kernel: Also search the main program for stack size requests	2023-04-14 16:12:04 +01:00
Tim Schumacher	ed74f792e2	Kernel: Pick the maximum out of the requested stack sizes	2023-04-14 16:12:04 +01:00
Andreas Kling	9264303f5d	Kernel: Don't reuse old master TLS region data in sys$execve() When switching to the new address space, we also have to switch the Process::m_master_tls_* variables as they may refer to a region in the old address space. This was causing `su` to not run correctly. Regression from `65641187ff`.	2023-04-08 07:28:27 +02:00
Idan Horowitz	003989e1b0	Kernel: Store a pointer to the owner process in PageDirectory This replaces the previous owning address space pointer. This commit should not change any of the existing functionality, but it lays down the groundwork needed to let us properly access the region table under the address space spinlock during page fault handling.	2023-04-06 20:30:03 +03:00
Idan Horowitz	65641187ff	Kernel: Restructure execve to ensure Process::m_space is always in use Instead of setting up the new address space on its own, and only swapping to the new address space at the end, we now immediately swap to the new address space (while still keeping the old one alive) and only revert back to the old one if we fail at any point. This is done to ensure that the process' active address space (aka the contents of m_space) always matches the actual address space in use by it. That should allow us to eventually make the page fault handler process-aware, which will let us properly lock the process address space lock.	2023-04-06 20:30:03 +03:00
Idan Horowitz	a349570a04	Kernel: Abstract Processor::assume_context flags using InterruptsState The details of the specific interrupt bits that must be turned on are irrelevant to the sys$execve implementation. Abstract it away to the Processor implementations using the InterruptsState enum.	2023-04-03 02:59:37 -06:00
Liav A	633006926f	Kernel: Make the Jails' internal design a lot more sane This is done with 2 major steps: 1. Remove JailManagement singleton and use a structure that resembles what we have with the Process object. This is required later for the second step in this commit, but on its own, is a major change that removes this clunky singleton that had no real usage by itself. 2. Use IntrusiveLists to keep references to Process objects in the same Jail so it will be much more straightforward to iterate over these objects when needed. Previously we locked the entire Process list and we did a simple pointer comparison to check if the checked Process we iterate on is in the same Jail or not, which required taking multiple Spinlocks in a very clumsy and heavyweight way.	2023-03-12 10:21:59 -06:00
Andreas Kling	d1371d66f7	Kernel: Use non-locking {Nonnull,}RefPtr for OpenFileDescription This patch switches away from {Nonnull,}LockRefPtr to the non-locking smart pointers throughout the kernel. I've looked at the handful of places where these were being persisted and I don't see any race situations. Note that the process file descriptor table (Process::m_fds) was already guarded via MutexProtected.	2023-03-07 00:30:12 +01:00
Andreas Kling	359d6e7b0b	Everywhere: Stop using NonnullOwnPtrVector Same as NonnullRefPtrVector: weird semantics, questionable benefits.	2023-03-06 23:46:35 +01:00
Sam Atkins	fe7b08dad7	Kernel: Protect Process::m_name with a spinlock This also lets us remove the `get_process_name` and `set_process_name` syscalls from the big lock. :^)	2023-02-06 20:36:53 +01:00
Timon Kruiper	b941bd55d9	Kernel: Add Syscalls/execve.cpp to aarch64 build	2023-01-27 20:47:08 +00:00
Timon Kruiper	1fbf562e7e	Kernel: Add ThreadRegisters::set_exec_state and use it in execve.cpp Using this abstraction it is possible to compile this file for aarch64.	2023-01-27 20:47:08 +00:00
Timon Kruiper	12322670cb	Kernel: Use InterruptsState abstraction in execve.cpp This was using the x86_64 specific cpu_flags abstraction, which is not compatible with aarch64.	2023-01-27 20:47:08 +00:00
Andrew Kaster	ddea37b521	Kernel+LibC: Move name length constants to Kernel/API from limits.h Reduce inclusion of limits.h as much as possible at the same time. This does mean that kmalloc.h is now including Kernel/API/POSIX/limits.h instead of LibC/limits.h, but the scope could be limited a lot more. Basically every file in the kernel includes kmalloc.h, and needs the limits.h include for PAGE_SIZE.	2023-01-21 10:43:59 -07:00
Liav A	04221a7533	Kernel: Mark Process::jail() method as const We really don't want callers of this function to accidentally change the jail, or even worse - remove the Process from an attached jail. To ensure this never happens, we can just declare this method as const so nobody can mutate it this way.	2023-01-07 03:44:59 +03:30
yyny	9ca979846c	Kernel: Add `sid` and `pgid` to `Credentials` There are places in the kernel that would like to have access to `pgid` credentials in certain circumstances. I haven't found any use cases for `sid` yet, but `sid` and `pgid` are both changed with `sys$setpgid`, so it seemed sensical to add it. In Linux, `man 7 credentials` also mentions both the session id and process group id, so this isn't unprecedented.	2023-01-03 18:13:11 +01:00
Liav A	e598f22768	Kernel: Disallow executing SUID binaries if process is jailed Check if the process we are currently running is in a jail, and if that is the case, fail early with the EPERM error code. Also, as Brian noted, we should also disallow attaching to a jail in case of already running within a setid executable, as this leaves the user with a false sense of security (because you can't exec new setid binaries), but the current program is still marked setid, which means that at the very least we gained permissions while we didn't expect it, so let's block it.	2022-12-30 15:49:37 -05:00
Liav A	5ff318cf3a	Kernel: Remove i686 support	2022-12-28 11:53:41 +01:00
Agustin Gianni	ac40090583	Kernel: Add the auxiliary vector to the stack size validation This patch validates that the size of the auxiliary vector does not exceed `Process::max_auxiliary_size`. The auxiliary vector is a range of memory in the userspace stack where the kernel can pass information to the process that will be created via `Process::do_exec`. The reason the kernel needs to validate its size is that the about-to-be-created process needs to have remaining space on the stack. Previously only `argv` and `envp` were taken into account for the size validation; with this patch, the size of `auxv` is also checked. All three elements contain values that a user (or an attacker) can specify. This patch adds the constant `Process::max_auxiliary_size` which is defined to be one eighth of the user-space stack size. This is the approach taken by `Process::max_arguments_size` and `Process::max_environment_size` which are used to check the sizes of `argv` and `envp`.	2022-12-14 15:09:28 +00:00
sin-ack	ef6921d7c7	Kernel+LibC+LibELF: Set stack size based on PT_GNU_STACK during execve Some programs explicitly ask for a different initial stack size than what the OS provides. This is implemented in ELF by having a PT_GNU_STACK header which has its p_memsz set to the amount that the program requires. This commit implements this policy by reading the p_memsz of the header and setting the main thread stack size to that. ELF::Image::validate_program_headers ensures that the size attribute is a reasonable value.	2022-12-11 19:55:37 -07:00
Liav A	718ae68621	Kernel+LibCore+LibC: Implement support for forcing unveil on exec To accomplish this, we add another VeilState which is called LockedInherited. The idea is to apply exec unveil data, similar to execpromises of the pledge syscall, on the current exec'ed program during the execve sequence. When applying the forced unveil data, the veil state is set to be locked but the special state of LockedInherited ensures that if the new program tries to unveil paths, the request will silently be ignored, so the program will continue running without receiving an error, but it can still only use the paths that were unveiled before the exec syscall. This, in turn, allows us to use the unveil syscall with a special utility to sandbox other userland programs in terms of what is visible to them on the filesystem, and works both for programs that use the unveil syscall in their code and for those that don't.	2022-11-26 12:42:15 -07:00
Liav A	5e062414c1	Kernel: Add support for jails Our implementation of Jails closely resembles how FreeBSD jails work - it's essentially only a matter of using a RefPtr in the Process class to a Jail object. Then, when we iterate over all processes in various cases, we can check whether the current process is in a jail and should therefore be restricted in what it can see in terms of PID isolation, and also expose metadata about Jails in the /sys/kernel/jails node (which does not reveal anything to a process which is in jail). The lifetime model for the Jail object is currently very simple - there's simply no way to manually delete a Jail object once it was created. Such a feature should be carefully designed to allow safe destruction of a Jail without the possibility of releasing a process which is in Jail from the actual jail. Each process which is attached to a Jail cannot leave it until the end of a Process (i.e. when finalizing a Process). All jails are kept referenced in the JailManagement. When the last attached process is finalized, the Jail is automatically destroyed.	2022-11-05 18:00:58 -06:00
Liav A	965afba320	Kernel/FileSystem: Add a few missing includes In preparation to future commits, we need to ensure that OpenFileDescription.h doesn't include the VirtualFileSystem.h file to avoid include loops.	2022-10-22 16:57:52 -04:00
Andreas Kling	cf16b2c8e6	Kernel: Wrap process address spaces in SpinlockProtected This forces anyone who wants to look into and/or manipulate an address space to lock it. And this replaces the previous, more flimsy, manual spinlock use. Note that pointers into the address space are not safe to use after you unlock the space. We've got many issues like this, and we'll have to track those down as well.	2022-08-24 14:57:51 +02:00
Anthony Iacono	f86b671de2	Kernel: Use Process::credentials() and remove user ID/group ID helpers Move away from using the group ID/user ID helpers in the process to allow for us to take advantage of the immutable credentials instead.	2022-08-22 12:46:32 +02:00
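
Several of the commits above (fdf3e0aca1, 03ae9fdb0a, 1bea780a7f) describe checks that execve performs on the start of a file. The C++ sketch below illustrates them: it tells a shebang script from an ELF image by its first bytes, computes how far into the file the program header table reaches, and accepts only ET_EXEC/ET_DYN images that contain at least one PT_LOAD segment. It is a hypothetical illustration, not the SerenityOS code. Every function and type name is invented, and only the ELF constant values come from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical sketch: none of these names come from the SerenityOS tree.
// Constant values follow the ELF specification.
constexpr uint16_t et_rel = 1;
constexpr uint16_t et_exec = 2;
constexpr uint16_t et_dyn = 3;
constexpr uint16_t et_core = 4;
constexpr uint32_t pt_load = 1;
constexpr uint32_t pt_interp = 3;

enum class ExecutableKind { Shebang, Elf, Unknown };

// Classify the first bytes read from the file (at most one ELF header's worth).
ExecutableKind classify_prefix(std::string_view prefix)
{
    if (prefix.size() >= 2 && prefix[0] == '#' && prefix[1] == '!')
        return ExecutableKind::Shebang;
    if (prefix.size() >= 4 && prefix.substr(0, 4) == std::string_view("\x7f" "ELF", 4))
        return ExecutableKind::Elf;
    return ExecutableKind::Unknown;
}

// Byte offset just past the program header table, i.e. how much of the file
// must be read to see every program header. Returns nullopt on overflow.
std::optional<uint64_t> program_headers_end(uint64_t e_phoff, uint16_t e_phnum, uint16_t e_phentsize)
{
    // Widen before multiplying; uint16_t * uint16_t would be computed in signed int.
    uint64_t table_size = uint64_t(e_phnum) * e_phentsize;
    if (e_phoff > std::numeric_limits<uint64_t>::max() - table_size)
        return std::nullopt;
    return e_phoff + table_size;
}

struct ProgramHeader {
    uint32_t type;
    uint64_t memsz;
};

// Only executables and shared objects are runnable; object files (ET_REL)
// and coredumps (ET_CORE) are rejected. At least one PT_LOAD is required.
bool is_runnable_elf(uint16_t e_type, const std::vector<ProgramHeader>& headers)
{
    if (e_type != et_exec && e_type != et_dyn)
        return false;
    for (const auto& header : headers) {
        if (header.type == pt_load)
            return true;
    }
    return false;
}
```

The widening cast in program_headers_end matters: multiplying two uint16_t values promotes them to int, which can overflow for hostile e_phnum and e_phentsize values.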

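The stack-size commits (ef6921d7c7, ed74f792e2, 9be5dcfd89) and the auxiliary vector check (ac40090583) can be sketched as follows. The sketch assumes a 1 MiB default stack, a 32 MiB cap on PT_GNU_STACK requests, a page-alignment rule for a "reasonable" request, and the same one-eighth limit for argv and envp as for auxv. All of these values and names are hypothetical and are not taken from the kernel source. How the real code measures the size of argv and envp is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical values chosen for illustration; not SerenityOS's constants.
constexpr size_t page_size = 4096;
constexpr size_t default_stack_size = 1024 * 1024;
constexpr size_t max_stack_request = 32 * 1024 * 1024;

// Limits modelled on the "one eighth of the stack" rule described for max_auxiliary_size.
constexpr size_t max_arguments_size = default_stack_size / 8;
constexpr size_t max_environment_size = default_stack_size / 8;
constexpr size_t max_auxiliary_size = default_stack_size / 8;

// Assumed sanity rule for a PT_GNU_STACK p_memsz: non-zero, page-aligned, capped.
bool is_reasonable_stack_request(size_t p_memsz)
{
    return p_memsz != 0 && p_memsz % page_size == 0 && p_memsz <= max_stack_request;
}

// Pick the largest (already validated) request from the main program and its
// interpreter, falling back to the default when neither asks for anything.
size_t choose_stack_size(std::optional<size_t> program_request, std::optional<size_t> interpreter_request)
{
    if (!program_request && !interpreter_request)
        return default_stack_size;
    return std::max(program_request.value_or(0), interpreter_request.value_or(0));
}

// Approximate stack footprint of a string vector: bytes, NUL terminator and one pointer slot each.
size_t strings_footprint(const std::vector<std::string>& strings)
{
    size_t total = 0;
    for (const auto& s : strings)
        total += s.size() + 1 + sizeof(char*);
    return total;
}

// Reject an execve whose argv, envp or auxv would not leave room on the new stack.
bool exec_payload_fits(const std::vector<std::string>& argv, const std::vector<std::string>& envp, size_t auxv_bytes)
{
    return strings_footprint(argv) <= max_arguments_size
        && strings_footprint(envp) <= max_environment_size
        && auxv_bytes <= max_auxiliary_size;
}
```

The log does not say whether a request smaller than the default shrinks the stack; this sketch simply uses the largest request whenever one exists.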

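The jail-related commits (dd59fe35c7, b93ca74d81, e598f22768) describe two jail modes and two set-id rules. The following is a minimal C++ sketch under assumed names. SetOnceFlag, JailState and the function names are hypothetical stand-ins, not the kernel's ProtectedValues or prctl interface. Applying the set-id entry rule to the until-execve mode is also an assumption.

```cpp
#include <cassert>

// Hypothetical sketch of the two jail modes; not SerenityOS's actual types.

// A flag that can be set but never cleared (a stand-in for the kernel's SetOnce).
class SetOnceFlag {
public:
    void set() { m_value = true; }
    bool is_set() const { return m_value; }

private:
    bool m_value { false };
};

struct JailState {
    SetOnceFlag jailed_permanently;   // survives execve and cannot be undone
    bool jailed_until_exec { false }; // dropped by the next successful execve

    bool is_jailed() const { return jailed_permanently.is_set() || jailed_until_exec; }
};

enum class ExecResult { Ok, PermissionDenied };

// Entering a jail is refused while running a set-id executable (e598f22768).
bool enter_jail(JailState& state, bool running_setid_executable)
{
    if (running_setid_executable)
        return false;
    state.jailed_permanently.set();
    return true;
}

// Assumed to follow the same set-id rule as enter_jail.
bool enter_jail_until_exec(JailState& state, bool running_setid_executable)
{
    if (running_setid_executable)
        return false;
    state.jailed_until_exec = true;
    return true;
}

// A jailed process may not execute a set-id binary. A successful execve
// ends the temporary jail mode but leaves the permanent one in place.
ExecResult execve_jail_check(JailState& state, bool target_is_setid)
{
    if (state.is_jailed() && target_is_setid)
        return ExecResult::PermissionDenied;
    state.jailed_until_exec = false;
    return ExecResult::Ok;
}
```

Modelling the permanent mode as a flag with no clear operation mirrors the "cannot change ever again" property described in dd59fe35c7.
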
279 Commits