Commit Graph

446 Commits

Author SHA1 Message Date
Liav A.
b93ca74d81 Kernel: Add a prctl option to enter jail mode until an execve syscall
In addition to the already existing option to enter jail mode (which is
set indefinitely), there should be a less restrictive option that should
allow exiting jail mode when doing the execve syscall.

This option will be useful for programs that need this kind of security
layer only in their runtime, but they're meant to actually initiate
another program in the end.
2024-10-03 12:39:45 +02:00
Liav A.
b90a36d2a9 Kernel+Userland: Rename jailed => jailed_until_exit
In all instances, it should be clear that the jailing of a process is
ending when the process exits.

This is a preparation before introducing another option to set a process
as jailed until it calls the execve syscall.
2024-10-03 12:39:45 +02:00
kleines Filmröllchen
ac44ec5ebc Kernel+ifconfig: Allow setting an IPv6 address on an interface
Since we take a socket address with an address
type, this allows us to support setting an IPv6
address using an IPv4 socket without a
particularly hacky API. This deviates from Linux's
behavior (see
https://www.man7.org/linux/man-pages/man7/netdevice.7.html
) where AF_INET6 uses a completely different
control structure, but this doesn't seem
necessary.

This requires changing the sockaddr size to fit
sockaddr_in6, as the network ioctl's are the only
place where sockaddr is used with a fixed size
(and not with variable size data like in POSIX
APIs, which would support sockaddr_in6 without
changes).

ifconfig takes a new parameter for setting the
IPv6 address of an interface. The IPv4 address
short option '-i' is removed, as it only yields
confusion (which IP version is the default? and
usually such options are called -4 or -6, if they
exist) and isn't necessary thanks to the brief
long option name.

This commit's main purpose for now is to allow
participating in IPv6 NDP and pings without
requiring SLAC.

Co-authored-by: Dominique Liberda <ja@sdomi.pl>
2024-09-08 18:27:55 -04:00
Sönke Holz
7b0553eb13 Kernel/x86: Add missing rcx + r11 clobber to Syscall::invoke with 4 args
The 4-arg version was missing those.
2024-08-21 08:17:17 -04:00
Liav A.
cb10f70394 Kernel: Change internal handling of filesystem-specific options
Instead of using a raw `KBuffer` and letting each implementation to
populating the specific flags on its own, we change things so we only
let each FileSystem implementation to validate the flag and its value
but then store it in a HashMap which its key is the flag name and
the value is a special new class called `FileSystemSpecificOption`
which wraps around `AK::Variant<...>`.

This approach has multiple advantages over the previous:
- It allows runtime inspection of what the user has set on a `MountFile`
  description for a specific filesystem.
- It ensures accidental overriding of filesystem specific option that
  was already set is not possible
- It removes ugly casting of a `KBuffer` contents to a strongly-typed
  values. Instead, a strongly-typed `AK::Variant` is used which ensures
  we always get a value without doing any casting.

Please note that we have removed support for ASCII string-oriented flags
as there were no actual use cases, and supporting such type would make
`FileSystemSpecificOption` more complicated unnecessarily for now.
2024-08-03 20:35:06 +02:00
Liav A.
0e6624dc86 Kernel: Introduce the unshare syscall family
These 2 syscalls are responsible for unsharing resources in the system,
such as hostname, VFS root contexts and process lists.

Together with an appropriate userspace implementation, these syscalls
could be used for creating a sandbox environment (containers) for user
programs.
2024-07-21 11:44:23 +02:00
Liav A.
4370bbb3ad Kernel+Userland: Introduce the copy_mount syscall
This new syscall will be used by the upcoming runc (run-container)
utility.

In addition to that, this syscall allows userspace to neatly copy RAMFS
instances to other places, which was not possible in the past.
2024-07-21 11:44:23 +02:00
Liav A.
dd59fe35c7 Kernel+Userland: Reduce jails to be a simple boolean flag
The whole concept of Jails was far more complicated than I actually want
it to be, so let's reduce the complexity of how it works from now on.
Please note that we always leaked the attach count of a Jail object in
the fork syscall if it failed midway.
Instead, we should have attach to the jail just before registering the
new Process, so we don't need to worry about unsuccessful Process
creation.

The reduction of complexity in regard to jails means that instead of
relying on jails to provide PID isolation, we could simplify the whole
idea of them to be a simple SetOnce, and let the ProcessList (now called
ScopedProcessList) to be responsible for this type of isolation.

Therefore, we apply the following changes to do so:
- We make the Jail concept no longer a class of its own. Instead, we
  simplify the idea of being jailed to a simple ProtectedValues boolean
  flag. This means that we no longer check of matching jail pointers
  anywhere in the Kernel code.
  To set a process as jailed, a new prctl option was added to set a
  Kernel SetOnce boolean flag (so it cannot change ever again).
- We provide Process & Thread methods to iterate over process lists.
  A process can either iterate on the global process list, or if it's
  attached to a scoped process list, then only over that list.
  This essentially replaces the need of checking the Jail pointer of a
  process when iterating over process lists.
2024-07-21 11:44:23 +02:00
Liav A.
91c87c5b77 Kernel+Userland: Prepare for considering VFSRootContext when mounting
Expose some initial interfaces in the mount-related syscalls to select
the desired VFSRootContext, by specifying the VFSRootContext index
number.

For now there's still no way to create a different VFSRootContext, so
the only valid IDs are -1 (for currently attached VFSRootContext) or 1
for the first userspace VFSRootContext.
2024-07-21 11:44:23 +02:00
Liav A.
e89726562b Kernel: Remove the AllMiceDevice class
This device was a short-lived solution to allow userspace (WindowServer)
to easily support hotplugging mouse devices with presumably very small
modifications on userspace side.

Now that we have a proper mechanism to propagate hotplug events from the
DeviceMapper program to any program that needs to get such events, we no
longer need this device, so let's remove it.
2024-07-06 21:42:32 +02:00
Liav A.
a1a2470c1e Documentation: Add a small guide on how to allocate new major numbers
Contributors that want to add new types of character or block devices
must follow these guidelines in order to keep everything intact.
2024-07-06 21:42:32 +02:00
Liav A.
16244c490a Kernel: Allocate all device major numbers within one known header file
We used to allocate major numbers quite randomly, with no common place
to look them up if needed.
This commit is changing that by placing all major number allocations
under a new C++ namespace, in the API/MajorNumberAllocation.h file.

We also add the foundations of what is needed before we can publish this
information (allocated numbers for block and char devices) to userspace.
2024-07-06 21:42:32 +02:00
Liav A.
7f5a2c1466 Kernel: Register block and character devices in separate HashMaps
Instead of putting everything in one hash map, let's distinguish between
the devices based on their type.

This change makes the devices semantically separated, and is considered
a preparation before we could expose a comprehensive list of allocations
per major numbers and their purpose.
2024-07-06 21:42:32 +02:00
implicitfield
4574a8c334 Kernel+LibC+LibCore: Implement mknodat(2) 2024-05-14 22:30:39 +02:00
Liav A.
15ddc1f17a Kernel+Userland: Reject W->X prot region transition after a prctl call
We add a prctl option which would be called once after the dynamic
loader has finished to do text relocations before calling the actual
program entry point.

This change makes it much more obvious when we are allowed to change
a region protection access from being writable to executable.
The dynamic loader should be able to do this, but after a certain point
it is obvious that such mechanism should be disabled.
2024-05-14 12:41:51 -06:00
Timothy Flynn
ec492a1a08 Everywhere: Run clang-format
The following command was used to clang-format these files:

    clang-format-18 -i $(find . \
        -not \( -path "./\.*" -prune \) \
        -not \( -path "./Base/*" -prune \) \
        -not \( -path "./Build/*" -prune \) \
        -not \( -path "./Toolchain/*" -prune \) \
        -not \( -path "./Ports/*" -prune \) \
        -type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")

There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
2024-04-24 16:50:01 -04:00
Sönke Holz
243d7003a2 Kernel+LibC+LibELF: Move TLS handling to userspace
This removes the allocate_tls syscall and adds an archctl option to set
the fs_base for the current thread on x86-64, since you can't set that
register from userspace. enter_thread_context loads the fs_base for the
next thread on each context switch.
This also moves tpidr_el0 (the thread pointer register on AArch64) to
the register state, so it gets properly saved/restored on context
switches.

The userspace TLS allocation code is kept pretty similar to the original
kernel TLS code, aside from a couple of style changes.

We also have to add a new argument "tls_pointer" to
SC_create_thread_params, as we otherwise can't prevent race conditions
between setting the thread pointer register and signal handling code
that might be triggered before the thread pointer was set, which could
use TLS.
2024-04-19 16:46:47 -06:00
Sönke Holz
57f4f8caf8 Kernel+LibC: Introduce new archctl syscall
This syscall will be used for architecture-specific operations.
2024-04-19 16:46:47 -06:00
Dan Klishch
5ed7cd6e32 Everywhere: Use east const in more places
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
2024-04-19 06:31:19 -04:00
Liav A
0734de9f9a Kernel+Userland: Add mount MS_SRCHIDDEN option
Either we mount from a loop device or other source, the user might want
to obfuscate the given source for security reasons, so this option will
ensure this will happen.
If passed during a mount, the source will be hidden when reading from
the /sys/kernel/df node.
2024-03-13 15:33:47 -06:00
Liav A
5dcf03ad9a Kernel/Devices: Introduce the LoopDevice device
This device is a block device that allows a user to effectively treat an
Inode as a block device.

The static construction method is given an OpenFileDescription reference
but validates that:
- The description has a valid custody (so it's not some arbitrary file).
  Failing this requirement will yield EINVAL.
- The description custody points to an Inode which is a regular file, as
  we only support (seekable) regular files. Failing this requirement
  will yield ENOTSUP.

LoopDevice can be used to mount a regular file on the filesystem like
other supported types of (physical) block devices.
2024-03-13 15:33:47 -06:00
Timothy Flynn
4b777397b5 Kernel: Define bitwise operations for KeyModifier
This type is designed to be use as a flag. Define bitwise operations for
convenience.
2024-03-06 07:46:18 +01:00
implicitfield
a69e113d49 Kernel: Add a definition for the FAT32 FSInfo structure 2024-02-24 15:54:52 -07:00
implicitfield
8b77737f8e Kernel: Move FAT structure definitions to Kernel/API
We need to be able to include these definitions from userspace as the
upcoming mkfs.fat utility will depend on these.
2024-02-24 15:54:52 -07:00
Nico Weber
4409b33145 AK: Make IndexSequence use size_t
This makes it possible to use MakeIndexSequqnce in functions like:

    template<typename T, size_t N>
    constexpr auto foo(T (&a)[N])

This means AK/StdLibExtraDetails.h must now include AK/Types.h
for size_t, which means AK/Types.h can no longer include
AK/StdLibExtras.h (which arguably it shouldn't do anyways),
which requires rejiggering some things.

(IMHO Types.h shouldn't use AK::Details metaprogramming at all.
FlatPtr doesn't necessarily have to use Conditional<> and ssize_t could
maybe be in its own header or something. But since it's tangential to
this PR, going with the tried and true "lift things that cause the
cycle up to the top" approach.)
2024-02-11 18:53:00 +01:00
Jelle Raaijmakers
015622bc22 Kernel: Set correct KeyCode count
The underlying value of the `KeyCode::Key_*` enum values starts at 0,
so we should add 1 to `Key_Menu` to get the correct count.
2024-01-14 15:06:37 -07:00
hanaa12G
7abda6a36f Kernel: Add new sysconf option _SC_GETGR_R_SIZE_MAX 2024-01-06 04:59:50 -07:00
Liav A
c8f27d7cb8 Kernel+Userland: Implement support for PS2 scan code set 2
This scan code set is more advanced than the basic scan code set 1, and
is required to be supported for some bare metal hardware that might not
properly enable the PS2 first port translation in the i8042 controller.

LibWeb can now also generate bindings for keyboard events like the Pause
key, as well as other function keys (such as Right Alt, etc).

The logic for handling scan code sets is implemented by the PS2 keyboard
driver and is abstracted from the main HID KeyboardDevice code which
only handles "standard" KeyEvent(s).
2024-01-04 10:38:03 -07:00
Andrew Kaster
d3025668a4 Revert "Kernel+Userland: Implement support for PS2 scan code set 2"
This reverts commit 61a385fc01.

The commit broke the shift and caps lock key from working.
2023-12-29 22:02:19 +01:00
Liav A
61a385fc01 Kernel+Userland: Implement support for PS2 scan code set 2
This scan code set is more advanced than the basic scan code set 1, and
is required to be supported for some bare metal hardware that might not
properly enable the PS2 first port translation in the i8042 controller.

LibWeb can now also generate bindings for keyboard events like the Pause
key, as well as other function keys (such as Right Alt, etc).

The logic for handling scan code sets is implemented by the PS2 keyboard
driver and is abstracted from the main HID KeyboardDevice code which
only handles "standard" KeyEvent(s).
2023-12-29 16:40:59 +01:00
Liav A
b89cc81674 Kernel/HID: Expose character map index in the KeyEvent structure
This will be used later on by WindowServer so it will not use the
scancode, which will represent the actual character index in the
keyboard mapping when using scan code set 2.
2023-12-29 16:40:59 +01:00
Idan Horowitz
519214697b Kernel: Mark sys$getsockname as not needing the big process lock
This syscall does not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
ed5406e47d Kernel: Mark sys$getpeername as not needing the big process lock
This syscall does not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
24a60c5a10 Kernel: Mark sys$ioctl as not needing the big process lock
This syscall does not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
d63667dbf1 Kernel: Mark sys$kill_thread as not needing the big process lock
This syscall does not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
b44628c1fb Kernel: Mark sys$join_thread as not needing the big process lock
This syscall does not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
82e6090f47 Kernel: Mark sys$detach_thread as not needing the big process lock
This syscall does not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
b49a0e2c61 Kernel: Mark sys$create_thread as not needing the big process lock
Now that the master TLS region is spinlock protected, this syscall does
not access any big process lock protected resources.
2023-12-26 19:20:21 +01:00
Idan Horowitz
6a4b93b3e0 Kernel: Protect processes' master TLS with a fine-grained spinlock
This moves it out of the scope of the big process lock, and allows us
to wean some syscalls off it, starting with sys$allocate_tls.
2023-12-26 19:20:21 +01:00
Sönke Holz
fd8858ace2 Kernel/riscv64: Add RISC-V Syscall API 2023-11-10 15:51:31 -07:00
Romain Chardiny
61ac554a34 Kernel/Net: Implement TCP_NODELAY 2023-11-08 09:31:54 +01:00
Liav A
26f96d2a42 Kernel+Userland: Add option for duration of /dev/beep producing sound 2023-11-03 15:19:33 +01:00
Liav A
1b00618fd9 Kernel+Userland: Replace the beep syscall with the new /dev/beep device
There's no need to have separate syscall for this kind of functionality,
as we can just have a device node in /dev, called "beep", that allows
writing tone generation packets to emulate the same behavior.

In addition to that, we remove LibC sysbeep function, as this function
was never being used by any C program nor it was standardized in any
way.
Instead, we move the userspace implementation to LibCore.
2023-11-03 15:19:33 +01:00
Liav A
50429d3b22 LibC+Kernel: Move GPU-related API methods to a LibC header file
The Kernel/API directory in general shouldn't include userspace code,
but structure definitions that both are shared between the Kernel and
userspace.

All users of the ioctl API obviously use LibC so LibC is the most common
and shared library for the affected programs.
2023-09-15 11:05:25 -06:00
Liav A
8fe74c7d57 LibC+Kernel: Move device-files related methods to a LibC header file
The Kernel/API directory in general shouldn't include userspace code,
but structure definitions that both are shared between the Kernel and
userspace.

LibC is the most appropriate place for these methods as they're already
included in the sys/sysmacros.h file to create a set of convenient
macros for these methods.
2023-09-15 11:05:25 -06:00
Jakub Berkop
54e79aa1d9 Kernel+ProfileViewer: Display additional filesystem events 2023-09-09 11:26:51 -06:00
Liav A
39c93f63c8 Kernel: Move FileSystem/DeviceFileTypes.h => API/DeviceFileTypes.h
This file will be used by userspace code later on, so let's move to the
API directory.
2023-09-07 11:50:50 -06:00
Liav A
1c0aa51684 Kernel+Userland: Remove the {get,set}_thread_name syscalls
These syscalls are not necessary on their own, and they give the false
impression that a caller could set or get the thread name of any process
in the system, which is not true.

Therefore, move the functionality of these syscalls to be options in the
prctl syscall, which makes it abundantly clear that these operations
could only occur from a running thread in a process that sees other
threads in that process only.
2023-08-25 11:51:52 +02:00
Seal Sealy
1262a7d142 Kernel: Alias MAXNAMLEN to NAME_MAX
MAXNAMLEN is the BSD name for NAME_MAX, as used by some programs.
2023-08-18 11:43:19 +02:00
Lucas CHOLLET
cd0fe4bb48 Kernel: Mark sys$poll as not needing the big lock 2023-08-01 05:35:26 +02:00