Commit Graph

106 Commits

Author SHA1 Message Date
Sönke Holz
577bc7ef95 Kernel/MM: Handle concurrent page faults properly in handle_zero_fault()
This essentially reverts 5ada38f9c3.

Previously, two threads could end up trying to allocate a committed
page at once, possibly resulting in a panic because we tried to
allocate more pages than committed.

Another problem was that a thread could incorrectly think that the page
fault was already handled. This can happen if the thread handling the
page fault already set the physical page slot to the newly allocated
page, but didn't remap the page yet. We check if a page fault was
already processed based on the physical page slot contents.
This issue is not causing problems currently, since thinking a page
fault was already handled and incorrectly returning will still work
eventually when the other thread is done remapping the page.
However, a future commit will add extra assertions checking that page
faults were already handled appropriately if we couldn't find a reason
for the fault. These assertions would trip on this.

Prevent these issues by taking the lock for a longer amount of time.
There might be a better solution to this, but that would likely require
more complex code changes.

Also modify the code in handle_fault() a bit to avoid using should_cow()
for zero faults. The checks in should_cow() can refer to a different
physical page if the page fault was handled immediately after the check.
2026-01-22 12:47:45 +01:00
Sönke Holz
23de6e93bd Kernel/MM: Unify the non-RISC-V and RISC-V page fault handler
Instead of having to maintain two different page fault handler
implementations, let's unify the two by using the more generic RISC-V
implementation.
The RISC-V implementation doesn't depend on the processor providing the
reason why a page fault occured.

We don't need to know whether it's a NotPresent or ProtectionViolation
fault to handle it correctly, as we already have enough metadata.
2026-01-22 12:47:45 +01:00
Sönke Holz
76b1277f81 Kernel/MM: Always print the faulting address in the page fault handler
Only some dbgln()s were previously not printing fault.vaddr().
2026-01-22 12:47:45 +01:00
Sönke Holz
0759eefaed Kernel/MM: Make RISC-V page fault implementation checks exhaustive
This causes us to print a more useful error message than "Unexpected
page fault".
Additionally, this change will be necessary in a future commit, which
expects us to handle all reasons for page faults exhaustively.
2026-01-22 12:47:45 +01:00
Sönke Holz
e5f671a71c Kernel/riscv64: Set the access mode to Read for instruction page faults
This matches the x86-64 and AArch64 behavior.

This required moving the is_instruction_fetch() check before the
is_read() check, since is_read() is now also true for instruction fetch
page faults.
2026-01-22 12:47:45 +01:00
Sönke Holz
dcca347b0b Kernel/MM: Rename handle_{dirty_on => inode}_write_fault
... and always use it for inode write faults.

In the RISC-V page fault implementation, we previously only used this
function if `should_dirty_on_write()` returned true. If it didn't, we'd
use `handle_inode_fault()`. But this shouldn't be necessary, since
`handle_dirty_on_write_fault()` already handles cases where the page
isn't mapped yet.

This should now also cause a page to be immediately marked as dirty
if the first access to it was a write. Previously, this should have
caused two page faults: one for loading the inode page and one for
making it dirty.
2026-01-22 12:47:45 +01:00
Sönke Holz
2a2df29687 Kernel/MM: Remove unreachable code from Region::handle_fault()
This code in the x86-64 and AArch64 page fault handler should be
unreachable, since Region::map_individual_page_impl() always maps all
non-null physical pages, therefore we should never get a PageNotPresent
fault if the page slot is set to a non-null value.

Lazy committed pages are implemented by mapping them read-only, so a
write access to them will result in a ProtectionViolation fault, which
will call very similar code in Region::handle_zero_fault().

Similarly in the RISC-V version, we already handle lazy committed page
faults by calling Region::handle_zero_fault() a couple lines earlier
if the page fault was generated by a write access and the region is
writable.
2026-01-22 12:47:45 +01:00
implicitfield
2039c17902 Kernel/Memory: Prefer non-recursive Spinlocks 2025-06-05 22:02:40 +02:00
Sönke Holz
a57cad622b Kernel/MM: Enforce W^X more strictly again
The stricter W^X protection introduced by af3d3c5c4a was accidentally
broken by 5194ab59b5, since it didn't set the shadow permission bits
to the initial Region permissions.
2025-04-27 17:15:07 +02:00
NoobZang
2e5560e96f Kernel: Remove unnecessary shift operations in Region constructor
The shift operations were originally introduced in af3d3c5 to
record the permission bits set for Region, but now has been
replaced by the method introduced in 5194ab59b, so the shift
operations can be removed.
2025-04-27 12:18:07 +02:00
Sönke Holz
1af69c3782 Kernel/riscv64: Don't use __builtin___clear_cache on RISC-V
Clang compiles that builtin to an abort for freestanding environments
because RISC-V does not have an instruction to the flush the instruction
cache for all harts.

We don't support SMP on RISC-V currently, so simply use a `fence.i` for
now.
2025-03-16 11:27:27 +01:00
Sönke Holz
d73a8fe750 Kernel/MM: Synchronize executable memory after handling an inode fault
ARM requires an explicit cache flush when modifying executable memory:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/caches-and-self-modifying-code

__builtin___clear_cache compiles to a no-op if no explicit flushing is
needed (like on x86):
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005f_005f_005fclear_005fcache
2025-01-13 19:30:54 +01:00
Sönke Holz
b3bae90e71 Kernel/x86: Set the WC PAT memory type for MemoryType::NonCacheable
This removes the old Region::set_write_combine function and replaces
all usages of it with setting the MemoryType to NonCacheable instead.
2024-11-23 19:29:50 +01:00
Sönke Holz
d3a0ae5c57 Kernel/MM: Replace Region::Cacheable with a more generic MemoryType enum
This replaces all usages of Cacheable::Yes with MemoryType::Normal and
Cacheable::No with either MemoryType::NonCacheable or MemoryType::IO,
depending on the context.

The Page{Directory,Table}::set_cache_disabled function therefore also
has been replaced with a more appropriate set_memory_type_function.
Adding a memory_type "getter" would not be as easy, as some
architectures may not support all memory types, so getting the memory
type again may be a lossy conversion. The is_cache_disabled function
was never used, so just simply remove it altogether.

There is no difference between MemoryType::NonCacheable and
MemoryType::IO on x86 for now.

Other architectures currently don't respect the MemoryType at all.
2024-11-23 19:29:50 +01:00
brody-qq
a0b021cbcf Kernel/Memory: Fix crash on writes to shared file mmaps
Writes to SharedInodeVMObjects could cause a Protection Violation if a
page was marked as dirty by a different process.

This happened due to a combination of 2 things:
* handle_dirty_on_write_fault() was skipped if a page was already marked
  as dirty
* when a page was marked as dirty, only the Region that caused the page
  fault was remapped

This commit:
* fixes the crash by making handle_fault() stop checking if a page was
  marked dirty before running handle_dirty_on_write_fault()
* modifies handle_dirty_on_write_fault() so that it always marks the
  page as dirty and remaps the page (this avoids a 2nd bug that was
  never hit due to the 1st bug)
2024-08-10 16:19:12 +02:00
brody-qq
faa6395a11 Kernel/Memory: Add more efficient method for remapping single page
This commit introduces VMObject::remap_regions_single_page(). This
method remaps a single page in all regions associated with a VMObject.
This is intended to be a more efficient replacement for remap_regions()
in cases where only a single page needs to be remapped.

This commit also updates the cow page fault handling code to use this
new method.
2024-07-12 08:52:06 -04:00
brody-qq
e14f954988 Kernel/Memory: Fix shared anonymous mmap changes not being shared
Writes to a MAP_SHARED | MAP_ANONYMOUS mmap region were not visible to
other processes sharing the mmap region. This was happening because the
page fault handler was not remapping the VMObject's m_regions after
allocating a new page.

This commit fixes the problem by calling remap_regions() after assigning
a new page to the VMObject in the page fault handler. This remapping
only occurs for shared Regions.
2024-07-12 08:52:06 -04:00
brody-qq
781ded408b Kernel/Memory: Small refactor of handle_zero_fault()
This commit makes the following minor changes to handle_zero_fault():
* cleans up a call to static_cast(), replacing it with a reference (a
  future commit will also use this reference).
* replaces a call to vmobject() with the new reference mentioned above.
* moves the definition of already_handled to inside the block where
  already_handled is used.
2024-07-12 08:52:06 -04:00
brody-qq
2278b17c42 Kernel/Memory: Remove cow map updates from try_allocate_split_region()
AddressSpace::try_allocate_split_region() was updating the cow map of
new_region based on the cow map of source_region.

The problem is that both new_region and source_region reference the
same vmobject and the same cow map, so these cow map updates didn't
actually change anything.

This commit:
* removes the cow map updates from try_allocate_split_region()
* removes Region::set_should_cow() since it is no longer used
2024-07-12 08:52:06 -04:00
brody-qq
3e9b269bcd Kernel/Memory: Make mmap objects track dirty pages
InodeVMObjects now track dirty and clean pages. This tracking of
dirty and clean pages is used by the msync and purge syscalls.

dirty page tracking works using the following rules:
* when a new InodeVMObject is made, all pages are marked clean.
* writes to clean InodeVMObject pages will cause a page fault,
  the fault handler will mark the page as dirty.
* writes to dirty InodeVMObject pages do not cause page faults.
* if msync is called, only dirty pages are flushed to storage (and
  marked clean).
* if purge syscall is called, only clean pages are discarded.
2024-07-07 18:25:32 +02:00
Idan Horowitz
3aa1bd520b Kernel: Support re-mapping MMIOVMObject-backed regions
This is required for example when write combine is enabled on a region
after the initial mapping.
2024-06-25 17:46:37 +02:00
brody-qq
6f6966fb55 Kernel: Remove redundant VERIFY()
Removes a VERIFY() that is already checked earlier in the function
2024-06-05 20:18:44 +01:00
Idan Horowitz
26cff62a0a Kernel: Rename Memory::PhysicalPage to Memory::PhysicalRAMPage
Since these are now only used to represent RAM pages, (and not MMIO
pages) rename them to make their purpose more obvious.
2024-05-17 15:38:28 -06:00
Idan Horowitz
827322c139 Kernel: Stop allocating physical pages for mapped MMIO regions
As MMIO is placed at fixed physical addressed, and does not need to be
backed by real RAM physical pages, there's no need to use PhysicalPage
instances to track their pages.
This results in slightly reduced allocations, but more importantly
makes MMIO addresses which end up after the normal RAM ranges work,
like 64-bit PCI BARs usually are.
2024-05-17 15:38:28 -06:00
Sönke Holz
6654021655 Kernel/riscv64: Don't hard-code the page fault reason on RISC-V
Instead, rewrite the region page fault handling code to not use
PageFault::type() on RISC-V.

I split Region::handle_fault into having a RISC-V-specific
implementation, as I am not sure if I cover all page fault handling edge
cases by solely relying on MM's own region metadata.
We should probably also take the processor-provided page fault reason
into account, if we decide to merge these two implementations in the
future.
2024-03-25 14:18:38 -06:00
Liav A
336fb4f313 Kernel: Move InterruptDisabler to the Interrupts subdirectory 2023-06-04 21:32:34 +02:00
Liav A
7c0540a229 Everywhere: Move global Kernel pattern code to Kernel/Library directory
This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes being moved to the
Kernel/Library subdirectory.

Also, move the panic and assertions handling code to that directory.
2023-06-04 21:32:34 +02:00
Liav A
1b04726c85 Kernel: Move all tasks-related code to the Tasks subdirectory 2023-06-04 21:32:34 +02:00
Idan Horowitz
1c2dbed38a Kernel: Extend the lifetime of Regions during page fault handling
Previously we had a race condition in the page fault handling: We were
relying on the affected Region staying alive while handling the page
fault, but this was not actually guaranteed, as an munmap from another
thread could result in the region being removed concurrently.

This commit closes that hole by extending the lifetime of the region
affected by the page fault until the handling of the page fault is
complete. This is achieved by maintaing a psuedo-reference count on the
region which counts the number of in-progress page faults being handled
on this region, and extending the lifetime of the region while this
counter is non zero.
Since both the increment of the counter by the page fault handler and
the spin loop waiting for it to reach 0 during Region destruction are
serialized using the appropriate AddressSpace spinlock, eventual
progress is guaranteed: As soon as the region is removed from the tree
no more page faults on the region can start.
And similarly correctness is ensured: The counter is incremented under
the same lock, so any page faults that are being handled will have
already incremented the counter before the region is deallocated.
2023-04-06 20:30:03 +03:00
Timon Kruiper
697c5ca5e5 Kernel: Move Memory/PageDirectory.{cpp,h} to arch-specific directory
The handling of page tables is very architecture specific, so belongs
in the Arch directory. Some parts were already architecture-specific,
however this commit moves the rest of the PageDirectory class into the
Arch directory.

While we're here the aarch64/PageDirectory.{h,cpp} files are updated to
be aarch64 specific, by renaming some members and removing x86_64
specific code.
2023-01-27 11:41:43 +01:00
Ben Wiederhake
65b420f996 Everywhere: Remove unused includes of AK/Memory.h
These instances were detected by searching for files that include
AK/Memory.h, but don't match the regex:

\\b(fast_u32_copy|fast_u32_fill|secure_zero|timing_safe_compare)\\b

This regex is pessimistic, so there might be more files that don't
actually use any memory function.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
kleines Filmröllchen
a6a439243f Kernel: Turn lock ranks into template parameters
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
  obtain any lock rank just via the template parameter. It was not
  previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
  of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
  readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
  wanted in the first place (or made use of). Locks randomly changing
  their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
  taken in the right order (with the right lock rank checking
  implementation) as rank information is fully statically known.

This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
2023-01-02 18:15:27 -05:00
Timon Kruiper
9827c11d8b Kernel: Move InterruptDisabler out of Arch directory
The code in this file is not architecture specific, so it can be moved
to the base Kernel directory.
2022-10-17 20:11:31 +02:00
Liav A
60b088b89a Kernel: Send SIGBUS to threads that use after valid Inode mmaped range
According to Dr. POSIX, we should allow to call mmap on inodes even on
ranges that currently don't map to any actual data. Trying to read or
write to those ranges should result in SIGBUS being sent to the thread
that did violating memory access.

To implement this restriction, we simply check if the result of
read_bytes on an Inode returns 0, which means we have nothing valid to
map to the program, hence it should receive a SIGBUS in that case.
2022-09-26 20:00:34 +03:00
Liav A
6e26e9fb29 Revert "Kernel: Send SIGBUS to threads that use after valid Inode mmaped range"
This reverts commit 0c675192c9.
2022-09-24 13:49:40 +02:00
Liav A
0c675192c9 Kernel: Send SIGBUS to threads that use after valid Inode mmaped range
According to Dr. POSIX, we should allow to call mmap on inodes even on
ranges that currently don't map to any actual data. Trying to read or
write to those ranges should result in SIGBUS being sent to the thread
that did violating memory access.
2022-09-16 14:55:45 +03:00
Andreas Kling
a3b2b20782 Kernel: Remove global MM lock in favor of SpinlockProtected
Globally shared MemoryManager state is now kept in a GlobalData struct
and wrapped in SpinlockProtected.

A small set of members are left outside the GlobalData struct as they
are only set during boot initialization, and then remain constant.
This allows us to access those members without taking any locks.
2022-08-26 01:04:51 +02:00
Andreas Kling
2c72d495a3 Kernel: Use RefPtr instead of LockRefPtr for PhysicalPage
I believe this to be safe, as the main thing that LockRefPtr provides
over RefPtr is safe copying from a shared LockRefPtr instance. I've
inspected the uses of RefPtr<PhysicalPage> and it seems they're all
guarded by external locking. Some of it is less obvious, but this is
an area where we're making continuous headway.
2022-08-24 18:35:41 +02:00
Andreas Kling
d3e8eb5918 Kernel: Make file-backed memory regions remember description permissions
This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().

IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html

Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
2022-08-24 14:57:51 +02:00
Andreas Kling
d6ef18f587 Kernel: Don't hog the MM lock while unmapping regions
We were holding the MM lock across all of the region unmapping code.
This was previously necessary since the quickmaps used during unmapping
required holding the MM lock.

Now that it's no longer necessary, we can leave the MM lock alone here.
2022-08-24 14:57:51 +02:00
Andreas Kling
6cd3695761 Kernel: Stop taking MM lock while using regular quickmaps
You're still required to disable interrupts though, as the mappings are
per-CPU. This exposed the fact that our CR3 lookup map is insufficiently
protected (but we'll address that in a separate commit.)
2022-08-22 17:56:03 +02:00
Andreas Kling
c8375c51ff Kernel: Stop taking MM lock while using PD/PT quickmaps
This is no longer required as these quickmaps are now per-CPU. :^)
2022-08-22 17:56:03 +02:00
Andreas Kling
11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
Andreas Kling
5ada38f9c3 Kernel: Reduce time under VMObject lock while handling zero faults
We only need to hold the VMObject lock while inspecting and/or updating
the physical page array in the VMObject.
2022-08-19 12:52:48 +02:00
Andreas Kling
a84d893af8 Kernel/x86: Re-enable interrupts ASAP when handling page faults
As soon as we've saved CR2 (the faulting address), we can re-enable
interrupt processing. This should make the kernel more responsive under
heavy fault loads.
2022-08-19 12:14:57 +02:00
Andreas Kling
4bc3745ce6 Kernel: Make Region's physical page accessors safer to use
Region::physical_page() now takes the VMObject lock while accessing the
physical pages array, and returns a RefPtr<PhysicalPage>. This ensures
that the array access is safe.

Region::physical_page_slot() now VERIFY()'s that the VMObject lock is
held by the caller. Since we're returning a reference to the physical
page slot in the VMObject's physical page array, this is the best we
can do here.
2022-08-18 19:20:33 +02:00
Andreas Kling
b560442fe1 Kernel: Don't hog VMObject lock when remapping a region page
We really only need the VMObject lock when accessing the physical pages
array, so once we have a strong pointer to the physical page we want to
remap, we can give up the VMObject lock.

This fixes a deadlock I encountered while building DOOM on SMP.
2022-08-18 18:56:35 +02:00
Andreas Kling
10399a258f Kernel: Move Region physical page accessors out of line 2022-08-18 18:52:34 +02:00
Andreas Kling
75348bdfd3 Kernel: Don't require MM lock for Region::set_page_directory()
The MM lock is not required for this, it's just a simple ref-counted
pointer assignment.
2022-08-18 18:52:34 +02:00
Andreas Kling
27c1135d30 Kernel: Don't remap all regions from Region::remap_vmobject_page()
When handling a page fault, we only need to remap the faulting region in
the current process. There's no need to traverse *all* regions that map
the same VMObject and remap them cross-process as well.

Those other regions will get remapped lazily by their own page fault
handlers eventually. Or maybe they won't and we avoided some work. :^)
2022-08-18 18:52:34 +02:00