In addition to the already existing option to enter jail mode (which is
set indefinitely), there should be a less restrictive option that should
allow exiting jail mode when doing the execve syscall.
This option will be useful for programs that need this kind of security
layer only in their runtime, but they're meant to actually initiate
another program in the end.
In all instances, it should be clear that the jailing of a process is
ending when the process exits.
This is a preparation before introducing another option to set a process
as jailed until it calls the execve syscall.
Since we take a socket address with an address
type, this allows us to support setting an IPv6
address using an IPv4 socket without a
particularly hacky API. This deviates from Linux's
behavior (see
https://www.man7.org/linux/man-pages/man7/netdevice.7.html
) where AF_INET6 uses a completely different
control structure, but this doesn't seem
necessary.
This requires changing the sockaddr size to fit
sockaddr_in6, as the network ioctl's are the only
place where sockaddr is used with a fixed size
(and not with variable size data like in POSIX
APIs, which would support sockaddr_in6 without
changes).
ifconfig takes a new parameter for setting the
IPv6 address of an interface. The IPv4 address
short option '-i' is removed, as it only yields
confusion (which IP version is the default? and
usually such options are called -4 or -6, if they
exist) and isn't necessary thanks to the brief
long option name.
This commit's main purpose for now is to allow
participating in IPv6 NDP and pings without
requiring SLAC.
Co-authored-by: Dominique Liberda <ja@sdomi.pl>
Instead of using a raw `KBuffer` and letting each implementation to
populating the specific flags on its own, we change things so we only
let each FileSystem implementation to validate the flag and its value
but then store it in a HashMap which its key is the flag name and
the value is a special new class called `FileSystemSpecificOption`
which wraps around `AK::Variant<...>`.
This approach has multiple advantages over the previous:
- It allows runtime inspection of what the user has set on a `MountFile`
description for a specific filesystem.
- It ensures accidental overriding of filesystem specific option that
was already set is not possible
- It removes ugly casting of a `KBuffer` contents to a strongly-typed
values. Instead, a strongly-typed `AK::Variant` is used which ensures
we always get a value without doing any casting.
Please note that we have removed support for ASCII string-oriented flags
as there were no actual use cases, and supporting such type would make
`FileSystemSpecificOption` more complicated unnecessarily for now.
These 2 syscalls are responsible for unsharing resources in the system,
such as hostname, VFS root contexts and process lists.
Together with an appropriate userspace implementation, these syscalls
could be used for creating a sandbox environment (containers) for user
programs.
This new syscall will be used by the upcoming runc (run-container)
utility.
In addition to that, this syscall allows userspace to neatly copy RAMFS
instances to other places, which was not possible in the past.
The whole concept of Jails was far more complicated than I actually want
it to be, so let's reduce the complexity of how it works from now on.
Please note that we always leaked the attach count of a Jail object in
the fork syscall if it failed midway.
Instead, we should have attach to the jail just before registering the
new Process, so we don't need to worry about unsuccessful Process
creation.
The reduction of complexity in regard to jails means that instead of
relying on jails to provide PID isolation, we could simplify the whole
idea of them to be a simple SetOnce, and let the ProcessList (now called
ScopedProcessList) to be responsible for this type of isolation.
Therefore, we apply the following changes to do so:
- We make the Jail concept no longer a class of its own. Instead, we
simplify the idea of being jailed to a simple ProtectedValues boolean
flag. This means that we no longer check of matching jail pointers
anywhere in the Kernel code.
To set a process as jailed, a new prctl option was added to set a
Kernel SetOnce boolean flag (so it cannot change ever again).
- We provide Process & Thread methods to iterate over process lists.
A process can either iterate on the global process list, or if it's
attached to a scoped process list, then only over that list.
This essentially replaces the need of checking the Jail pointer of a
process when iterating over process lists.
Expose some initial interfaces in the mount-related syscalls to select
the desired VFSRootContext, by specifying the VFSRootContext index
number.
For now there's still no way to create a different VFSRootContext, so
the only valid IDs are -1 (for currently attached VFSRootContext) or 1
for the first userspace VFSRootContext.
This device was a short-lived solution to allow userspace (WindowServer)
to easily support hotplugging mouse devices with presumably very small
modifications on userspace side.
Now that we have a proper mechanism to propagate hotplug events from the
DeviceMapper program to any program that needs to get such events, we no
longer need this device, so let's remove it.
We used to allocate major numbers quite randomly, with no common place
to look them up if needed.
This commit is changing that by placing all major number allocations
under a new C++ namespace, in the API/MajorNumberAllocation.h file.
We also add the foundations of what is needed before we can publish this
information (allocated numbers for block and char devices) to userspace.
Instead of putting everything in one hash map, let's distinguish between
the devices based on their type.
This change makes the devices semantically separated, and is considered
a preparation before we could expose a comprehensive list of allocations
per major numbers and their purpose.
We add a prctl option which would be called once after the dynamic
loader has finished to do text relocations before calling the actual
program entry point.
This change makes it much more obvious when we are allowed to change
a region protection access from being writable to executable.
The dynamic loader should be able to do this, but after a certain point
it is obvious that such mechanism should be disabled.
The following command was used to clang-format these files:
clang-format-18 -i $(find . \
-not \( -path "./\.*" -prune \) \
-not \( -path "./Base/*" -prune \) \
-not \( -path "./Build/*" -prune \) \
-not \( -path "./Toolchain/*" -prune \) \
-not \( -path "./Ports/*" -prune \) \
-type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")
There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
This removes the allocate_tls syscall and adds an archctl option to set
the fs_base for the current thread on x86-64, since you can't set that
register from userspace. enter_thread_context loads the fs_base for the
next thread on each context switch.
This also moves tpidr_el0 (the thread pointer register on AArch64) to
the register state, so it gets properly saved/restored on context
switches.
The userspace TLS allocation code is kept pretty similar to the original
kernel TLS code, aside from a couple of style changes.
We also have to add a new argument "tls_pointer" to
SC_create_thread_params, as we otherwise can't prevent race conditions
between setting the thread pointer register and signal handling code
that might be triggered before the thread pointer was set, which could
use TLS.
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
Either we mount from a loop device or other source, the user might want
to obfuscate the given source for security reasons, so this option will
ensure this will happen.
If passed during a mount, the source will be hidden when reading from
the /sys/kernel/df node.
This device is a block device that allows a user to effectively treat an
Inode as a block device.
The static construction method is given an OpenFileDescription reference
but validates that:
- The description has a valid custody (so it's not some arbitrary file).
Failing this requirement will yield EINVAL.
- The description custody points to an Inode which is a regular file, as
we only support (seekable) regular files. Failing this requirement
will yield ENOTSUP.
LoopDevice can be used to mount a regular file on the filesystem like
other supported types of (physical) block devices.
This makes it possible to use MakeIndexSequqnce in functions like:
template<typename T, size_t N>
constexpr auto foo(T (&a)[N])
This means AK/StdLibExtraDetails.h must now include AK/Types.h
for size_t, which means AK/Types.h can no longer include
AK/StdLibExtras.h (which arguably it shouldn't do anyways),
which requires rejiggering some things.
(IMHO Types.h shouldn't use AK::Details metaprogramming at all.
FlatPtr doesn't necessarily have to use Conditional<> and ssize_t could
maybe be in its own header or something. But since it's tangential to
this PR, going with the tried and true "lift things that cause the
cycle up to the top" approach.)
This scan code set is more advanced than the basic scan code set 1, and
is required to be supported for some bare metal hardware that might not
properly enable the PS2 first port translation in the i8042 controller.
LibWeb can now also generate bindings for keyboard events like the Pause
key, as well as other function keys (such as Right Alt, etc).
The logic for handling scan code sets is implemented by the PS2 keyboard
driver and is abstracted from the main HID KeyboardDevice code which
only handles "standard" KeyEvent(s).
This scan code set is more advanced than the basic scan code set 1, and
is required to be supported for some bare metal hardware that might not
properly enable the PS2 first port translation in the i8042 controller.
LibWeb can now also generate bindings for keyboard events like the Pause
key, as well as other function keys (such as Right Alt, etc).
The logic for handling scan code sets is implemented by the PS2 keyboard
driver and is abstracted from the main HID KeyboardDevice code which
only handles "standard" KeyEvent(s).
This will be used later on by WindowServer so it will not use the
scancode, which will represent the actual character index in the
keyboard mapping when using scan code set 2.
There's no need to have separate syscall for this kind of functionality,
as we can just have a device node in /dev, called "beep", that allows
writing tone generation packets to emulate the same behavior.
In addition to that, we remove LibC sysbeep function, as this function
was never being used by any C program nor it was standardized in any
way.
Instead, we move the userspace implementation to LibCore.
The Kernel/API directory in general shouldn't include userspace code,
but structure definitions that both are shared between the Kernel and
userspace.
All users of the ioctl API obviously use LibC so LibC is the most common
and shared library for the affected programs.
The Kernel/API directory in general shouldn't include userspace code,
but structure definitions that both are shared between the Kernel and
userspace.
LibC is the most appropriate place for these methods as they're already
included in the sys/sysmacros.h file to create a set of convenient
macros for these methods.
These syscalls are not necessary on their own, and they give the false
impression that a caller could set or get the thread name of any process
in the system, which is not true.
Therefore, move the functionality of these syscalls to be options in the
prctl syscall, which makes it abundantly clear that these operations
could only occur from a running thread in a process that sees other
threads in that process only.