Commit Graph

299 Commits

Author SHA1 Message Date
Liav A.
1dfc9e2df3 Kernel+Userland: Add immutable mounts
Immutable mounts are mounts that can't be changed in any aspect, if the
VFSRootContext that hold them is used by a process. This includes two
operations on a mount:
1. Trying to remove the mount from the mount table.
2. Trying to change the flags of the mount.

The condition of a VFSRootContext being held by a process or not is
crucial, as the intention is to allow removal of mounts that marked as
immutable if the VFSRootContext is not being used anymore (for example,
if the container that was created with such context stopped).

Marking mounts as immutable on the first VFS root context essentially
ensures they will never be modified because there will be a process
using that context (which is the "main" VFS root context in the system
runtime).

It should be noted that setting a mount as immutable can be done in
creation time of the mount by passing the MS_IMMUTABLE flag, or by doing
a remount with MS_IMMUTABLE flag.
2024-12-23 20:38:38 +01:00
implicitfield
8ed7225810 Kernel/VFS: Don't explicitly remove "." and ".." entries
No behavior change intended.
2024-12-17 19:02:15 -05:00
implicitfield
0e94887b2c Kernel/VFS: Make filesystem implementations responsible for renames
Previously, the VFS layer would try to handle renames more-or-less by
itself, which really only worked for ext2, and even that was only due to
the replace_child kludge existing specifically for this purpose. This
never worked properly for FATFS, since the VFS layer effectively
depended on filesystems having some kind of reference-counting for
inodes, which is something that simply doesn't exist on any FAT variant
we support.

To resolve various issues with the existing scheme, this commit makes
filesystem implementations themselves responsible for the actual rename
operation, while keeping all the existing validation inside the VFS
layer. The only intended behavior change here is that rename operations
should actually properly work on FATFS.
2024-12-17 19:02:15 -05:00
Liav A.
96e1391c23 Kernel/Devices: Remove the DeviceManagement singleton
This change has many improvements:
- We don't use `LockRefPtr` to hold instances of many base devices as
  with the DeviceManagement class. Instead, we have a saner pattern of
  holding them in a `NonnullRefPtr<T> const`, in a small-text footprint
  class definition in the `Device.cpp` file.
- The awkwardness of using `::the()` each time we need to get references
  to mostly-static objects (like the Event queue) in runtime is now gone
  in the migration to using the `Device` class.
- Acquiring a device feel more obvious because we use now the Device
  class for this method. The method name is improved as well.
2024-10-05 12:26:48 +02:00
Liav A.
cb10f70394 Kernel: Change internal handling of filesystem-specific options
Instead of using a raw `KBuffer` and letting each implementation to
populating the specific flags on its own, we change things so we only
let each FileSystem implementation to validate the flag and its value
but then store it in a HashMap which its key is the flag name and
the value is a special new class called `FileSystemSpecificOption`
which wraps around `AK::Variant<...>`.

This approach has multiple advantages over the previous:
- It allows runtime inspection of what the user has set on a `MountFile`
  description for a specific filesystem.
- It ensures accidental overriding of filesystem specific option that
  was already set is not possible
- It removes ugly casting of a `KBuffer` contents to a strongly-typed
  values. Instead, a strongly-typed `AK::Variant` is used which ensures
  we always get a value without doing any casting.

Please note that we have removed support for ASCII string-oriented flags
as there were no actual use cases, and supporting such type would make
`FileSystemSpecificOption` more complicated unnecessarily for now.
2024-08-03 20:35:06 +02:00
Liav A.
3692af528e Kernel: Move most of VirtualFileSystem code to be in a namespace
There's no point in constructing an object just for the sake of keeping
a state that can be touched by anything in the kernel code.

Let's reduce everything to be in a C++ namespace called with the
previous name "VirtualFileSystem" and keep a smaller textual-footprint
struct called "VirtualFileSystemDetails".

This change also cleans up old "friend class" statements that were no
longer needed, and move methods from the VirtualFileSystem code to more
appropriate places as well.
Please note that the method of locking all filesystems during shutdown
is removed, as in that place there's no meaning to actually locking all
filesystems because of running in kernel mode entirely.
2024-07-21 11:44:23 +02:00
Liav A.
4370bbb3ad Kernel+Userland: Introduce the copy_mount syscall
This new syscall will be used by the upcoming runc (run-container)
utility.

In addition to that, this syscall allows userspace to neatly copy RAMFS
instances to other places, which was not possible in the past.
2024-07-21 11:44:23 +02:00
Liav A.
01e1af732b Kernel/FileSystem: Introduce the VFSRootContext class
The VFSRootContext class, as its name suggests, holds a context for a
root directory with its mount table and the root custody/inode in the
same class.

The idea is derived from the Linux mount namespace mechanism.
It mimicks the concept of the ProcessList object, but it is adjusted for
a root directory tree context.
In contrast to the ProcessList concept, processes that share the default
VFSRootContext can't see other VFSRootContext related properties such as
as the mount table and root custody/inode.

To accommodate to this change progressively, we internally create 2 main
VFS root contexts for now - one for kernel processes (as they don't need
to care about VFS root contexts for the most part), and another for all
userspace programs.
This separation allows us to continue pretending for userspace that
everything is "normal" as it is used to be, until we introduce proper
interfaces in the mount-related syscalls as well as in the SysFS.

We make VFSRootContext objects being listed, as another preparation
before we could expose interfaces to userspace.
As a result, the PowerStateSwitchTask now iterates on all contexts
and tear them down one by one.
2024-07-21 11:44:23 +02:00
Liav A.
7f5a2c1466 Kernel: Register block and character devices in separate HashMaps
Instead of putting everything in one hash map, let's distinguish between
the devices based on their type.

This change makes the devices semantically separated, and is considered
a preparation before we could expose a comprehensive list of allocations
per major numbers and their purpose.
2024-07-06 21:42:32 +02:00
Liav A.
ecc9c5409d Kernel: Ignore dirfd if absolute path is given in VFS-related syscalls
To be able to do this, we add a new class called CustodyBase, which can
be resolved on-demand internally in the VirtualFileSystem resolving path
code.

When being resolved, CustodyBase will return a known custody if it was
constructed with such, if that's not the case it will provide the root
custody if the original path is absolute.
Lastly, if that's not the case as well, it will resolve the given dirfd
to provide a Custody object.
2024-06-01 19:25:15 +02:00
implicitfield
a08d1637e2 Kernel: Add FUSE support
This adds both the fuse device (used for communication between the
kernel and the filesystem) and filesystem implementation itself.
2024-05-07 16:54:27 -06:00
Liav A
0d2e4a7e67 Kernel/FileSystem: Add the DevLoopFS filesystem
Similarly to DevPtsFS, this filesystem is about exposing loop device
nodes easily in /dev/loop, so userspace doesn't need to do anything in
order to use new devices immediately.
2024-03-13 15:33:47 -06:00
Liav A
5dcf03ad9a Kernel/Devices: Introduce the LoopDevice device
This device is a block device that allows a user to effectively treat an
Inode as a block device.

The static construction method is given an OpenFileDescription reference
but validates that:
- The description has a valid custody (so it's not some arbitrary file).
  Failing this requirement will yield EINVAL.
- The description custody points to an Inode which is a regular file, as
  we only support (seekable) regular files. Failing this requirement
  will yield ENOTSUP.

LoopDevice can be used to mount a regular file on the filesystem like
other supported types of (physical) block devices.
2024-03-13 15:33:47 -06:00
Timothy Flynn
4fc88aa17b Kernel: Run clang-format on a couple of FileSystem sources
Fixes bad formatting in commit abcf05801a.
2023-08-25 08:34:21 -04:00
Zak-K-Abdi
abcf05801a Kernel: Allow Ext2FS::flush_writes() to return ErrorOr<void> 2023-08-25 11:36:57 +01:00
Liav A
5efb91ec06 Kernel/VFS: Ensure working with mount entry per a custody is safe
Previously we could get a raw pointer to a Mount object which might be
invalid when actually dereferencing it.
To ensure this could not happen, we should just use a callback that will
be used immediately after finding the appropriate Mount entry, while
holding the mount table lock.
2023-08-05 18:41:01 +02:00
Liav A
d216f780a4 Kernel/VFS: Remove the find_mount_for_guest method
We don't really need this method anymore, because we could just try to
find the mount entry based on the given mount point host custody.

This also allows us to remove the is_vfs_root and root_inode_id methods
from the VirtualFileSystem class.
2023-08-05 18:41:01 +02:00
Liav A
e5c7662638 Kernel/VFS: Check matching absolute path when jump to mount guest inode
We could easily encounter a case where we do the following:

```
mkdir -p /tmp2
mount /dev/hda /tmp2
```

would produce a bug that doing `ls /tmp2/tmp2` will give the contents
on `/dev/hda` ext2 root directory and also on `/tmp2/tmp2/tmp2` and so
on.
To prevent this, we must compare the current custody against each mount
entry's custody to ensure their paths match.
2023-08-05 18:41:01 +02:00
Liav A
80f400a150 Kernel/VFS: Don't resolve root inode mounts when traversing a directory
This is not useful, as we have literally zero knowledge about where this
inode is actually located at with respect to the entire global path tree
so we could easily encounter a case where we do the following:

```
mkdir -p /tmp2
mount /dev/hda /tmp2
```

and when traversing the /tmp2 directory entries, we will see the root
inode of /dev/hda on "/tmp2/tmp2", even if it was not mounted.

Therefore, we should just plainly give the raw directory entries as they
are written "on the disk". Anything else that needs to exactly know if
there's an underlying mounted filesystem, can just use the stat syscall
instead.
2023-08-05 18:41:01 +02:00
Liav A
debbfe07fb Kernel/VFS: Ensure Custodies' absolute path don't match before mounting
This ensures that the host mount point custody path is not the same like
the new to-be-mounted custody.

A scenario that could happen before adding this check is:
```
mkdir -p /tmp2
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/ # this will fail here
```

and after adding this check, the following scenario is now this:
```
mkdir -p /tmp2
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/ # this will fail here
mount /dev/hda /tmp2/ # this will fail here too
```
2023-08-05 18:41:01 +02:00
Liav A
8da7d84512 Kernel/VFS: Remove misleading part of debug message when mounting 2023-08-05 18:41:01 +02:00
kleines Filmröllchen
8940552d1d Kernel/VirtualFileSystem: Allow unmounting via inode and mount path
This pair of information uniquely identifies any mount point, and it can
be used in situations where mount point custodies are not available.
2023-07-15 00:12:01 +02:00
kleines Filmröllchen
abc1eaff36 Kernel/VirtualFileSystem: Count bind mounts towards normal FS mountcount
This is correct since unmount doesn't treat bind mounts specially. If we
don't do this, unmounting bind mounts will call
prepare_for_last_unmount() on the guest FS much too early, which will
most likely fail due to a busy file system.
2023-07-15 00:12:01 +02:00
kleines Filmröllchen
8fb126bec6 Kernel/FileSystem: Pass last mount point guest inode to unmount prepare
This will be important later on when we check file system busyness.
2023-07-15 00:12:01 +02:00
Liav A
23a7ccf607 Kernel+LibCore+LibC: Split the mount syscall into multiple syscalls
This is a preparation before we can create a usable mechanism to use
filesystem-specific mount flags.
To keep some compatibility with userland code, LibC and LibCore mount
functions are kept being usable, but now instead of doing an "atomic"
syscall, they do multiple syscalls to perform the complete procedure of
mounting a filesystem.

The FileBackedFileSystem IntrusiveList in the VFS code is now changed to
be protected by a Mutex, because when we mount a new filesystem, we need
to check if a filesystem is already created for a given source_fd so we
do a scan for that OpenFileDescription in that list. If we fail to find
an already-created filesystem we create a new one and register it in the
list if we successfully mounted it. We use a Mutex because we might need
to initiate disk access during the filesystem creation, which will take
other mutexes in other parts of the kernel, therefore making it not
possible to take a spinlock while doing this.
2023-07-02 01:04:51 +02:00
Liav A
7c0540a229 Everywhere: Move global Kernel pattern code to Kernel/Library directory
This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes being moved to the
Kernel/Library subdirectory.

Also, move the panic and assertions handling code to that directory.
2023-06-04 21:32:34 +02:00
Liav A
1b04726c85 Kernel: Move all tasks-related code to the Tasks subdirectory 2023-06-04 21:32:34 +02:00
Liav A
b40b1c8d93 Kernel+Userland: Ensure proper unveil permissions before using rm/rmdir
When deleting a directory, the rmdir syscall should fail if the path was
unveiled without the 'c' permission. This matches the same behavior that
OpenBSD enforces when doing this kind of operation.

When deleting a file, the unlink syscall should fail if the path was
unveiled without the 'w' permission, to ensure that userspace is aware
of the possibility of removing a file only when the path was unveiled as
writable.

When using the userdel utility, we now unveil that directory path with
the unveil 'c' permission so removal of an account home directory is
done properly.
2023-06-02 17:53:55 +02:00
kleines Filmröllchen
939600d2d4 Kernel: Use UnixDateTime wherever applicable
"Wherever applicable" = most places, actually :^), especially for
networking and filesystem timestamps.

This includes changes to unzip, which uses DOSPackedTime, since that is
changed for the FAT file systems.
2023-05-24 23:18:07 +02:00
kleines Filmröllchen
213025f210 AK: Rename Time to Duration
That's what this class really is; in fact that's what the first line of
the comment says it is.

This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
2023-05-24 23:18:07 +02:00
Timothy Flynn
3d4d0a1243 Kernel: Colorize log message for paths which haven't been unveiled
The log message can be hard to spot in a sea of debug messages. Colorize
it to make the message more immediately pop out.
2023-04-25 18:04:15 +02:00
Liav A
cbf78975f1 Kernel: Add the futimens syscall
We have a problem with the original utimensat syscall because when we
do call LibC futimens function, internally we provide an empty path,
and the Kernel get_syscall_path_argument method will detect this as an
invalid path.

This happens to spit an error for example in the touch utility, so if a
user is running "touch non_existing_file", it will create that file, but
the user will still see an error coming from LibC futimens function.

This new syscall gets an open file description and it provides the same
functionality as utimensat, on the specified open file description.
The new syscall will be used later by LibC to properly implement LibC
futimens function so the situation described with relation to the
"touch" utility could be fixed.
2023-04-10 10:21:28 +02:00
Andreas Kling
673592dea8 Kernel: Stop using *LockRefPtr for FileSystem pointers
There was only one permanent storage location for these: as a member
in the Mount class.

That member is never modified after Mount initialization, so we don't
need to worry about races there.
2023-04-04 10:33:42 +02:00
Andreas Kling
d1371d66f7 Kernel: Use non-locking {Nonnull,}RefPtr for OpenFileDescription
This patch switches away from {Nonnull,}LockRefPtr to the non-locking
smart pointers throughout the kernel.

I've looked at the handful of places where these were being persisted
and I don't see any race situations.

Note that the process file descriptor table (Process::m_fds) was already
guarded via MutexProtected.
2023-03-07 00:30:12 +01:00
Andreas Kling
7369d0ab5f Kernel: Stop using NonnullLockRefPtrVector 2023-03-06 23:46:36 +01:00
Liav A
39de5b7f82 Kernel: Actually check Process unveil data when creating perfcore dump
Before of this patch, we looked at the unveil data of the FinalizerTask,
which naturally doesn't have any unveil restrictions, therefore allowing
an unveil bypass for a process that enabled performance coredumps.

To ensure we always check the dumped process unveil data, an option to
pass a Process& has been added to a couple of methods in the class of
VirtualFileSystem.
2023-03-05 15:15:55 +00:00
Liav A
8266e40b35 Kernel/FileSystem: Don't assume flags for root filesystem mount flags
This is considered somewhat an abstraction layer violation, because we
should always let userspace to decide on the root filesystem mount flags
because it allows the user to configure the mount table to preferences
that they desire.
Now that SystemServer is modified to re-mount the root mount with the
desired flags, we can just mount the root filesystem without assuming
special flags.
2023-02-19 01:20:10 +01:00
Karol Kosek
8cfd445c23 Kernel: Allow to remove files from sticky directory if user owns it
It's what the Linux chmod(1) manpage says (in the 'Restricted Deletion
Flag or Sticky Bit' section), and it just makes sense to me. :^)
2023-01-24 20:13:30 +00:00
Taj Morton
20991a6a3c Kernel/FileSystem: Fix kernel panic during FS init or mount failure
Resolves issue where a panic would occur if the file system failed to
initialize or mount, due to how the FileSystem was already added to
VFS's list. The newly-created FileSystem destructor would fail as a
result of the object still remaining in the IntrusiveList.
2023-01-09 19:26:01 -07:00
Taj Morton
a91fc697bb Kernel/FileSystem: Remove FIXME about old/new path being the same
Added comment after confirming that Linux and OpenBSD implenment the
same behavior.
2023-01-04 09:02:13 +00:00
kleines Filmröllchen
a6a439243f Kernel: Turn lock ranks into template parameters
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
  obtain any lock rank just via the template parameter. It was not
  previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
  of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
  readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
  wanted in the first place (or made use of). Locks randomly changing
  their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
  taken in the right order (with the right lock rank checking
  implementation) as rank information is fully statically known.

This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
2023-01-02 18:15:27 -05:00
Andreas Kling
16f934474f Kernel+Tests: Allow deleting someone else's file in my sticky directory
This should be allowed according to Dr. POSIX. :^)
2023-01-01 10:09:02 +01:00
Andreas Kling
47b9e8e651 Kernel: Annotate VirtualFileSystem::rmdir() errors with spec comments 2023-01-01 10:09:02 +01:00
Andreas Kling
8619f2c6f3 Kernel+Tests: Remove inaccurate FIXME in sys$rmdir()
We were already handling the rmdir("..") case by refusing to remove
directories that were not empty.

This patch removes a FIXME from January 2019 and adds a test. :^)
2023-01-01 10:09:02 +01:00
Andreas Kling
8d781d0216 Kernel+Tests: Make sys$rmdir() fail with EINVAL if basename is "."
Dr. POSIX says that we should reject attempts to rmdir() the file named
"." so this patch does exactly that. We also add a test.

This solves a FIXME from January 2019. :^)
2023-01-01 10:09:02 +01:00
Liav A
2e710de2f4 Kernel/FileSystem: Prevent symlink creation in veiled directory paths
Also, try to resolve the target path and check if it is allowed to be
accessed under the unveil rules.
2022-12-21 09:17:09 +00:00
sin-ack
2a502fe232 Kernel+LibC+LibCore+UserspaceEmulator: Implement faccessat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
sin-ack
fa692e13f9 Kernel: Use real UID/GID when checking for file access
This aligns the rest of the system with POSIX, who says that access(2)
must check against the real UID and GID, not effective ones.
2022-12-11 19:55:37 -07:00
sin-ack
d5fbdf1866 Kernel+LibC+LibCore: Implement renameat(2)
Now with the ability to specify different bases for the old and new
paths.
2022-12-11 19:55:37 -07:00
Liav A
aa9fab9c3a Kernel/FileSystem: Convert the mount table from Vector to IntrusiveList
The fact that we used a Vector meant that even if creating a Mount
object succeeded, we were still at a risk that appending to the actual
mounts Vector could fail due to OOM condition. To guard against this,
the mount table is now an IntrusiveList, which always means that when
allocation of a Mount object succeeded, then inserting that object to
the list will succeed, which allows us to fail early in case of OOM
condition.
2022-12-09 23:29:33 -07:00