Similar to the last commit, two page faults to the same page can happen
concurrently, so we need to be able to handle this properly.
Previously, we only held the VMObject's lock in
AnonymousVMObject::handle_cow_fault(). But as explained in the previous
commit, this isn't enough. We need to keep holding the lock until the
page is remapped.
Additionally, this commit makes handle_cow_fault() take a global lock
to make concurrent page faults to the same page from different processes
work correctly.
This global lock is unfortunately necessary since fork() clones all
VMObjects, so taking one VMObject's lock won't prevent another process
from handling a page fault to the same physical pages. See the FIXME
comment for details.
There might be a better way to fix this, but this fix should still be
better than potentially panicking when concurrent COW faults occur.
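As a rough illustration of the locking order described above, here is a
user-space sketch. ToyPage, ToyVMObject, g_cow_fault_lock and
clone_for_fork() are made-up names, std::mutex stands in for the
kernel's locks, and the "mapped" vector stands in for page table
entries. It is a simplified model under those assumptions, not the
kernel's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a physical page.
struct ToyPage {
    int data;
    explicit ToyPage(int d) : data(d) {}
};

// Assumption: one lock shared by every object, serializing COW faults
// across "processes" that share physical pages after a fork.
static std::mutex g_cow_fault_lock;

struct ToyVMObject {
    std::mutex lock;
    std::vector<std::shared_ptr<ToyPage>> pages;
    std::vector<bool> cow;
    std::vector<ToyPage*> mapped; // stands in for the page table entries

    explicit ToyVMObject(std::size_t page_count)
    {
        for (std::size_t i = 0; i < page_count; ++i) {
            pages.push_back(std::make_shared<ToyPage>(0));
            cow.push_back(false);
            mapped.push_back(pages.back().get());
        }
    }

    // Models fork(): the clone shares every page and both sides become COW.
    std::shared_ptr<ToyVMObject> clone_for_fork()
    {
        std::lock_guard<std::mutex> guard(lock);
        auto child = std::make_shared<ToyVMObject>(0);
        child->pages = pages;
        child->mapped = mapped;
        cow.assign(pages.size(), true);
        child->cow = cow;
        return child;
    }

    // Returns true if the fault had to copy the page.
    bool handle_cow_fault(std::size_t index)
    {
        std::lock_guard<std::mutex> global_guard(g_cow_fault_lock);
        std::lock_guard<std::mutex> object_guard(lock);
        if (!cow[index])
            return false; // an earlier fault already resolved this page
        bool copied = false;
        if (pages[index].use_count() > 1) {
            pages[index] = std::make_shared<ToyPage>(pages[index]->data);
            copied = true;
        }
        cow[index] = false;
        // Remap before either lock is released, so no other fault can see
        // the new page in the object while the old one is still mapped.
        mapped[index] = pages[index].get();
        return copied;
    }
};
```

In this model the second fault on the same page finds either the COW
bit already cleared or the page no longer shared, so it never copies
twice.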
AnonymousVMObject::try_clone() computed how many shared cow pages to
commit by counting all VMObject pages that were not shared_zero_pages.
This means that lazy_committed_pages were also being included in the
count. This is a problem because the page fault handling code for
lazy_committed_pages does not allocate from
m_shared_committed_cow_pages. So more pages than necessary were being
committed.
This commit fixes the overcommitting problem by skipping
lazy_committed_pages when counting how many pages to commit.
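The counting rule can be sketched as follows. ToyPageKind and
count_cow_pages_to_commit() are hypothetical names used only for this
illustration; the real code compares page pointers rather than an enum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical page kinds for the sketch.
enum class ToyPageKind { SharedZero, LazyCommitted, Allocated };

// Counts only the pages that may need a shared COW commitment.
std::size_t count_cow_pages_to_commit(const std::vector<ToyPageKind>& pages)
{
    std::size_t count = 0;
    for (ToyPageKind kind : pages) {
        if (kind == ToyPageKind::SharedZero)
            continue; // never counted, before or after this fix
        if (kind == ToyPageKind::LazyCommitted)
            continue; // the new check: these are paid for elsewhere
        ++count;
    }
    return count;
}
```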
After a fork(), page faults on anonymous mmaps can cause a redundant
page fault to occur.
This happens because VMObjects for anonymous mmaps are initially filled
with references to the lazy_committed_page or shared_zero_page. If there
is a fork, VMObject::try_clone() is called and all pages of the VMObject
are marked as cow (via the m_cow_map).
Page faults on a zero/lazy page are handled by handle_zero_fault().
handle_zero_fault() does not update m_cow_map, so if the page was marked
cow before the fault, it will still be marked cow after the fault. This
causes a second (redundant) page fault when the CPU retries the write.
This commit removes the redundant page fault by not marking zero/lazy
pages as cow in m_cow_map.
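To make the redundant fault concrete, here is a small model of the
sequence described above. SlotKind, ToySlot, mark_cow_for_fork() and
faults_for_one_write() are invented for this sketch and only count
faults; they do not reflect how the kernel actually represents pages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one page slot in a VMObject.
enum class SlotKind { SharedZero, LazyCommitted, Allocated };

struct ToySlot {
    SlotKind kind;
    bool cow;
};

// Models the fork-time marking. With skip_zero_and_lazy set, zero/lazy
// slots are left out of the COW map, as this commit describes.
void mark_cow_for_fork(std::vector<ToySlot>& slots, bool skip_zero_and_lazy)
{
    for (ToySlot& slot : slots) {
        bool is_zero_or_lazy = slot.kind != SlotKind::Allocated;
        slot.cow = !(skip_zero_and_lazy && is_zero_or_lazy);
    }
}

// Counts the faults a single write to the slot takes in this model.
int faults_for_one_write(ToySlot& slot)
{
    int faults = 0;
    if (slot.kind != SlotKind::Allocated) {
        // Zero fault path: a real page is installed, the COW bit is untouched.
        slot.kind = SlotKind::Allocated;
        ++faults;
    }
    if (slot.cow) {
        // The retried write traps again and takes the COW path.
        slot.cow = false;
        ++faults;
    }
    return faults;
}
```

With the old marking, a zero page costs two faults per first write in
this model; with the new marking it costs one.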
The methods try_create_with_size() and try_create_purgeable_with_size()
on AnonymousVMObject are almost identical, other than one member
that gets set (m_purgeable). This patch makes
try_create_purgeable_with_size() call try_create_with_size() so that
both methods re-use the same code.
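The shape of the change is roughly the following. ToyAnonymousObject is
a hypothetical stand-in, std::shared_ptr replaces the kernel's
ref-counted pointers, and treating a size of zero as the failure case
is an assumption made only so the sketch has an error path.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Assumed page size for the sketch.
constexpr std::size_t toy_page_size = 4096;

class ToyAnonymousObject {
public:
    static std::shared_ptr<ToyAnonymousObject> try_create_with_size(std::size_t size)
    {
        if (size == 0)
            return nullptr; // hypothetical failure condition
        std::size_t page_count = (size + toy_page_size - 1) / toy_page_size;
        return std::shared_ptr<ToyAnonymousObject>(new ToyAnonymousObject(page_count));
    }

    // Reuses the plain factory and only flips the one differing member.
    static std::shared_ptr<ToyAnonymousObject> try_create_purgeable_with_size(std::size_t size)
    {
        auto object = try_create_with_size(size);
        if (!object)
            return nullptr;
        object->m_purgeable = true;
        return object;
    }

    bool is_purgeable() const { return m_purgeable; }
    std::size_t page_count() const { return m_page_count; }

private:
    explicit ToyAnonymousObject(std::size_t page_count)
        : m_page_count(page_count)
        , m_purgeable(false)
    {
    }

    std::size_t m_page_count;
    bool m_purgeable;
};
```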
In the VMObject code there are multiple examples of loops over
the VMObject's regions (using for_each_region()) that call remap()
on each region.
To clean up usage of this pattern, this patch adds a method in
VMObject that does this remapping loop. VMObject code that needs
to remap its regions now calls the new method.
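A minimal sketch of the pattern, with made-up types: ToyRegion and
ToyRegionOwner are hypothetical, and remap_regions() is only an assumed
name for the new helper, since the message does not spell it out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical region that just counts how often it was remapped.
struct ToyRegion {
    int remap_count;
    ToyRegion() : remap_count(0) {}
    void remap() { ++remap_count; }
};

class ToyRegionOwner {
public:
    void add_region(ToyRegion& region) { m_regions.push_back(&region); }

    template<typename Callback>
    void for_each_region(Callback callback)
    {
        for (ToyRegion* region : m_regions)
            callback(*region);
    }

    // The helper: one place for the "remap every region" loop.
    void remap_regions()
    {
        for_each_region([](ToyRegion& region) { region.remap(); });
    }

private:
    std::vector<ToyRegion*> m_regions;
};
```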
Our existing AnonymousVMObject cloning flow contains an optimization
wherein purgeable VMObjects which are marked volatile during the clone
are created as new zero-filled VMObjects (as if they had been purged),
which lets us skip the expensive COW process.
Unfortunately, one crucial part was missing: marking the cloned region
as purged (which is the value returned from madvise when unmarking the
region as volatile). As a result, the userland logic was left unaware
of the effective zeroing of its memory region, resulting in odd
behaviour and crashes in places like our malloc's large allocation
support.
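The missing step can be modelled like this. ToyPurgeableObject and its
members are hypothetical, set_nonvolatile() stands in for the madvise
call that unmarks a region as volatile, and whether the clone itself
stays volatile is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyPurgeableObject {
    std::vector<int> pages;
    bool is_volatile;
    bool was_purged;

    explicit ToyPurgeableObject(std::size_t page_count)
        : pages(page_count, 0)
        , is_volatile(false)
        , was_purged(false)
    {
    }

    ToyPurgeableObject clone() const
    {
        if (!is_volatile)
            return *this; // normal path (the real kernel would set up COW)
        // Volatile source: hand out fresh zero-filled pages instead.
        ToyPurgeableObject zeroed(pages.size());
        zeroed.is_volatile = true;
        zeroed.was_purged = true; // the step that was missing
        return zeroed;
    }

    // Returns true if the contents were lost while the object was volatile.
    bool set_nonvolatile()
    {
        bool purged = was_purged;
        is_volatile = false;
        was_purged = false;
        return purged;
    }
};
```

A caller that sees true from set_nonvolatile() knows it has to rebuild
its data, which is the signal userland was not getting before.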
I believe this to be safe, as the main thing that LockRefPtr provides
over RefPtr is safe copying from a shared LockRefPtr instance. I've
inspected the uses of RefPtr<PhysicalPage> and it seems they're all
guarded by external locking. Some of it is less obvious, but this is
an area where we're making continuous headway.
You're still required to disable interrupts though, as the mappings are
per-CPU. This exposed the fact that our CR3 lookup map is insufficiently
protected (but we'll address that in a separate commit.)
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
As soon as we've saved CR2 (the faulting address), we can re-enable
interrupt processing. This should make the kernel more responsive under
heavy fault loads.
Uncommitted pages (shared zero pages) cannot contain any existing data
and cannot be modified, so there's no point in committing a bunch of
extra pages to cover for them in the forked child.
Since both the parent process and child process hold a reference to the
COW committed set, once the child process exits, the committed COW
pages are effectively leaked, only being slowly re-claimed each time
the parent process writes to one of them, realizing it's no longer
shared, and uncommitting it.
In order to mitigate this, we now hold a weak reference to the parent
VMObject from which the pages are cloned, and we use it on destruction
when available to drop the reference to the committed set from it as
well.
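Here is a user-space sketch of that ownership scheme. ToyCommittedSet,
ToyForkObject, fork_clone() and g_committed_pages are invented names,
and std::shared_ptr / std::weak_ptr stand in for the kernel's own
ref-counting and weak pointer classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical global accounting of committed pages.
static std::size_t g_committed_pages = 0;

// Commits pages on construction and gives them back on destruction.
struct ToyCommittedSet {
    std::size_t pages;
    explicit ToyCommittedSet(std::size_t n) : pages(n) { g_committed_pages += n; }
    ~ToyCommittedSet() { g_committed_pages -= pages; }
};

struct ToyForkObject {
    std::shared_ptr<ToyCommittedSet> shared_cow_pages;
    std::weak_ptr<ToyForkObject> cloned_from;

    ~ToyForkObject()
    {
        // If the parent is still alive, drop its reference too, so the
        // set is released as soon as this clone goes away.
        if (auto parent = cloned_from.lock())
            parent->shared_cow_pages.reset();
    }
};

// Models the fork-time clone: both sides share one committed set.
std::shared_ptr<ToyForkObject> fork_clone(const std::shared_ptr<ToyForkObject>& parent,
                                          std::size_t cow_pages)
{
    auto child = std::make_shared<ToyForkObject>();
    auto set = std::make_shared<ToyCommittedSet>(cow_pages);
    parent->shared_cow_pages = set;
    child->shared_cow_pages = set;
    child->cloned_from = parent;
    return child;
}
```

Without the weak reference, the parent's copy of the shared pointer
would keep the set (and its commitment) alive after the child exits.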
If someone specifically wants contiguous memory in the low-physical-
address-for-DMA range ("super pages"), they can use the
allocate_dma_buffer_pages() helper.
This commit moves the allocation of the resources required for
AnonymousVMObject from its constructors to its factory functions.
We're making this change to expose the fallibility of the allocation.
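The idea, sketched in user-space C++: the factory performs the
allocation and reports failure, and the constructor only takes
ownership of what was already allocated. ToyBackedObject is a
hypothetical type, and catching exceptions is an artifact of using
std::vector here; the kernel does not use exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

class ToyBackedObject {
public:
    // Fallible factory: returns nullptr if the backing storage cannot
    // be allocated, so the caller has to handle that case.
    static std::unique_ptr<ToyBackedObject> try_create(std::size_t page_count)
    {
        try {
            std::vector<int> slots(page_count, 0);
            return std::unique_ptr<ToyBackedObject>(new ToyBackedObject(std::move(slots)));
        } catch (const std::exception&) {
            return nullptr;
        }
    }

    std::size_t page_count() const { return m_slots.size(); }

private:
    // The constructor cannot fail: it only adopts existing storage.
    explicit ToyBackedObject(std::vector<int>&& slots)
        : m_slots(std::move(slots))
    {
    }

    std::vector<int> m_slots;
};
```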
This commit moves the allocation of the resources required for VMObject
from its constructors to the constructors of its child classes.
We're making this change to give the child classes the chance to expose
the fallibility of the allocation.
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
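For readers unfamiliar with the pattern, here is a deliberately tiny
imitation of it. ToyError and ToyErrorOr are made up for this sketch
and are much simpler than AK::Error and AK::ErrorOr<T>; the two helper
functions are hypothetical as well.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

struct ToyError {
    int code;
};

// Holds either a value or an error. This toy version requires T to be
// default-constructible.
template<typename T>
class ToyErrorOr {
public:
    ToyErrorOr(T value) : m_value(value), m_error(), m_is_error(false) {}
    ToyErrorOr(ToyError error) : m_value(), m_error(error), m_is_error(true) {}

    bool is_error() const { return m_is_error; }
    T value() const { return m_value; }
    ToyError error() const { return m_error; }

private:
    T m_value;
    ToyError m_error;
    bool m_is_error;
};

ToyErrorOr<std::size_t> pages_for_size(std::size_t size)
{
    if (size == 0)
        return ToyError { EINVAL };
    return (size + 4095) / 4096;
}

// Propagates the callee's error unchanged, otherwise uses its value.
ToyErrorOr<std::size_t> bytes_to_commit(std::size_t size)
{
    auto pages = pages_for_size(size);
    if (pages.is_error())
        return pages.error();
    return pages.value() * 4096;
}
```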
And also try_create<T> => try_make_ref_counted<T>.
A global "create" was a bit much. The new name matches make<T> better,
which we've used for making single-owner objects since forever.
The quickmap_page() and unquickmap_page() functions are used to map a
single physical page at a kernel virtual address for temporary access.
These use the per-CPU quickmap buffer in the page tables, and access to
this is guarded by the MM lock. To prevent bugs, quickmap_page() should
not *take* the MM lock, but rather verify that it is already held!
This exposed two situations where we were using quickmap without holding
the MM lock during page fault handling. This patch is forced to fix
these issues (which is great!) :^)
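The "verify, don't take" rule can be sketched as below.
ToyMemoryManager is a hypothetical class, a plain bool models the MM
lock, and throwing std::logic_error stands in for the kernel's
verification failure, which would panic instead.

```cpp
#include <cassert>
#include <stdexcept>

class ToyMemoryManager {
public:
    void lock() { m_lock_held = true; }
    void unlock() { m_lock_held = false; }

    // Does not acquire the lock itself; it insists the caller holds it.
    void quickmap_page(int physical_page)
    {
        if (!m_lock_held)
            throw std::logic_error("quickmap_page() without the MM lock");
        if (m_quickmapped_page != -1)
            throw std::logic_error("quickmap slot is already in use");
        m_quickmapped_page = physical_page;
    }

    void unquickmap_page()
    {
        if (!m_lock_held)
            throw std::logic_error("unquickmap_page() without the MM lock");
        m_quickmapped_page = -1;
    }

    int quickmapped_page() const { return m_quickmapped_page; }

private:
    bool m_lock_held = false;
    int m_quickmapped_page = -1; // -1 means the slot is free
};
```

Had quickmap_page() silently taken the lock, the two unlocked callers
mentioned above would have gone unnoticed.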
This makes for nicer handling of errors compared to checking whether a
RefPtr is null. Additionally, this will make it possible to return
different types of errors in the future.