Fix incorrect casts of (e1000_rx_desc*) to (e1000_tx_desc*) in functions
related to frame receive path.
Previous code appears to work because only fields common to both the TX
and RX descriptor types are used in the affected code and happen to be
at the same offset inside the packed structs.
Unifying with other structures, this introduces m_payload for all
relevant structures, except the TCP header (which has a different
concept of payload than the rest).
This allows us to return a pointer directly instead of doing pointer
arithmetic to point to the end of the structure.
According to ARP.h, address lengths default to size of their
underlying types (IPv4Address and MACAddress respectively). We never
modify those values, and in reality, ARP never carries anything else.
Hence, those checks resolved to comparing sizeof to itself, which
probably gets optimized out of the compiler anyways.
This change removes the checks that will never fail, plus changes the
function definition to pass adapter directly, matching other handlers.
This commit adds the minimum amount of code to make Serenity reply to
ICMPv6 ping packets. This includes basic Network Discovery Protocol
support, as well as a basic ICMPv6 Echo Request/Response handler.
This change also introduces a new debug flag, ICMPV6_DEBUG.
Co-Authored-By: kleines Filmröllchen <filmroellchen@serenityos.org>
This commit introduces the first bit of IPv6 support into
SerenityOS. An IPv6 packet header structure allows NetworkTask to
recognize and log correctly shaped IPv6 packets, but nothing is done
with those yet. A new logging flag, IPV6_DEBUG, allows debugging of
IPv6 internals.
Co-Authored-By: sdomi <ja@sdomi.pl>
Up until now, protocol handlers for IPv4 and ARP had to rediscover the
adapter based on trying to match the address to all interfaces. This
approach is relatively slow, introduces more code duplication, and
doesn't work with multiple IPs per interface (as is the case with
IPv6).
This change introduces directly passing the adapter to all handlers
that need it, cutting out the retrieval process. Furthermore, we
discard the packet much faster if the IP doesn't match.
Up until now, the internet_checksum function lived in IPv4.h. Due to
its use in IPv6, a better place for it nowadays would be in IP.h.
Due to how it gets used in IPv6 scenarios, it makes sense to extend it
to run over multiple smaller structs instead of one big continous chunk
of memory. This change converts internet_checksum into a class with
methods "add" and "finish", which process data and return the final
result, respectively.
This commit also fixes proper hash computation for payloads that have
an odd length. Furthermore, the function was refactored to use
a ReadonlyBytes object instead of separate data and size.
Co-Authored-By: Wanda <wanda@phinode.net>
Commit ad73adef5d needlessly introduced an IPv4 directory.
If we were to keep it, sharing common headers between IPv4 and IPv6
would be much messier, and may potentially increase code duplication.
This change renames the IPv4 directory to IP to aid with later IPv6
porting efforts.
This enum is neither IPv4-specific, nor does it
have to do anything with the network layer, as the
name may imply. The enum is moved to a new header
containing common IPv4/IPv6 definitions.
This change modernizes various aspects of handle_icmp.
A few dbgln's got turned into dbgln_if's, in accordance with the
behavior throughout the rest of the file.
Furthermore, we revert a part of commit ad73adef5d which introduced
inconsistency between struct names. There's no ICMPv4, only ICMP and
ICMPv6. Thus, it makes more sense to revert the ICMPv4Type back to
ICMPType.
We also remove the FIXME about the TTL - as per RFC1700, TTL of 64 is
a recommended default, so the current behavior can stay.
C++ will jovially select the implicit conversion operator, even if it's
complete bogus, such as for unknown-size types or non-destructible
types. Therefore, all such conversions (which incur a copy) must
(unfortunately) be explicit so that non-copyable types continue to work.
This commit introduces very naive IPv6 autoconfiguration by just
setting fe80::{mac address} as the adapters link local address every
time the link state comes up. Note: this is currently not compliant
with RFC4862 which mandates Duplicate Address Detection, which is
missing from this implementation.
Co-authored-by: famfo <famfo@famfo.xyz>
Since we take a socket address with an address
type, this allows us to support setting an IPv6
address using an IPv4 socket without a
particularly hacky API. This deviates from Linux's
behavior (see
https://www.man7.org/linux/man-pages/man7/netdevice.7.html
) where AF_INET6 uses a completely different
control structure, but this doesn't seem
necessary.
This requires changing the sockaddr size to fit
sockaddr_in6, as the network ioctl's are the only
place where sockaddr is used with a fixed size
(and not with variable size data like in POSIX
APIs, which would support sockaddr_in6 without
changes).
ifconfig takes a new parameter for setting the
IPv6 address of an interface. The IPv4 address
short option '-i' is removed, as it only yields
confusion (which IP version is the default? and
usually such options are called -4 or -6, if they
exist) and isn't necessary thanks to the brief
long option name.
This commit's main purpose for now is to allow
participating in IPv6 NDP and pings without
requiring SLAC.
Co-authored-by: Dominique Liberda <ja@sdomi.pl>
Linux did the same thing 18 years ago and their reasons for the change
are similar to ours - https://github.com/torvalds/linux/commit/7d12e78
Most interrupt handlers (i.e. IRQ handlers) never used the register
state reference anywhere so there's simply no need of passing it around.
I didn't measure the performance boost but surely this change can't make
things worse anyway.
There's no point in constructing an object just for the sake of keeping
a state that can be touched by anything in the kernel code.
Let's reduce everything to be in a C++ namespace called with the
previous name "VirtualFileSystem" and keep a smaller textual-footprint
struct called "VirtualFileSystemDetails".
This change also cleans up old "friend class" statements that were no
longer needed, and move methods from the VirtualFileSystem code to more
appropriate places as well.
Please note that the method of locking all filesystems during shutdown
is removed, as in that place there's no meaning to actually locking all
filesystems because of running in kernel mode entirely.
The VFSRootContext class, as its name suggests, holds a context for a
root directory with its mount table and the root custody/inode in the
same class.
The idea is derived from the Linux mount namespace mechanism.
It mimicks the concept of the ProcessList object, but it is adjusted for
a root directory tree context.
In contrast to the ProcessList concept, processes that share the default
VFSRootContext can't see other VFSRootContext related properties such as
as the mount table and root custody/inode.
To accommodate to this change progressively, we internally create 2 main
VFS root contexts for now - one for kernel processes (as they don't need
to care about VFS root contexts for the most part), and another for all
userspace programs.
This separation allows us to continue pretending for userspace that
everything is "normal" as it is used to be, until we introduce proper
interfaces in the mount-related syscalls as well as in the SysFS.
We make VFSRootContext objects being listed, as another preparation
before we could expose interfaces to userspace.
As a result, the PowerStateSwitchTask now iterates on all contexts
and tear them down one by one.
We have many places in the kernel code that we have boolean flags that
are only set once, and never reset again but are checked multiple times
before and after the time they're being set, which matches the purpose
of the SetOnce class.
The following command was used to clang-format these files:
clang-format-18 -i $(find . \
-not \( -path "./\.*" -prune \) \
-not \( -path "./Base/*" -prune \) \
-not \( -path "./Build/*" -prune \) \
-not \( -path "./Toolchain/*" -prune \) \
-not \( -path "./Ports/*" -prune \) \
-type f -name "*.cpp" -o -name "*.mm" -o -name "*.h")
There are a couple of weird cases where clang-format now thinks that a
pointer access in an initializer list, e.g. `m_member(ptr->foo)`, is a
lambda return statement, and it puts spaces around the `->`.
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
RFC9293 states that from the TimeWait state the TCPSocket
should wait the MSL (2mins) for delayed segments to expire
so that their sequence numbers do not clash with a new
connection's sequence numbers using the same ip address
and port number. The wait also ensures the remote TCP peer
has received the ACK to their FIN segment.
Please be aware that I only have NIC with chip version 6 so
this is the only one that I have tested. Rest was implemented
via looking at Linux rtl8169 driver. Also thanks to IdanHo
for some initial work.
Instead of using C-arrays, and manually counting their lengths, use
AK::Array. And pass these arrays around as spans, instead of as pointer-
and-length pairs.
RFC9293 states that a closed socket should reply to all non-RST
packets with an RST. This change implements this behaviour as
specified in section 3.5.2 in bullet point 1.
According to RFC 9293 Section 3.6.1. Half-Closed Connections, we should
still accept incoming packets in the FinWait2 state. Additionally, we
didn't handle the FIN+ACK case. We should handle this the same we
handle the FIN flag. The ACK is only added to signify successful
reception of the last packet.
The specification uses awkward numbering, marking the first byte as 7,
and the last one as 0, which caused me to misunderstand their ordering,
and use the last byte's address as the first one, and so on.
This should allow us to eventually properly saturate high-bandwidth
network links when using TCP, once other nonoptimal parts of our
network stack are improved.
Instead of lying and claiming we always have space left in our receive
buffer, actually report the available space.
While this doesn't really affect network-bound workloads, it makes a
world of difference in cpu/disk-bound ones, like git clones. Resulting
in a considerable speed-up, and in some cases making them work at all.
(instead of the sender side hanging up the connection due to timeouts)
Previously we would incorrectly handle the (somewhat uncommon) case of
binding and then separately connecting a tcp socket to a server, as we
would register the socket during the manual bind(2) in the sockets by
tuple table, but our effective tuple would then change as the result of
the connect updating our target peer address. This would result in the
the entry not being removed from the table on destruction, which could
lead to a UAF.
We now make sure to update the table entry if needed during connects.
POSIX (rightfully so) specifies that the sendto address argument is
ignored in connection-oriented protocols.
The TCPSocket also assumed the peer address may not change post-connect
and would trigger a UAF in sockets_by_tuple() when it did.
POSIX requires that broadcast sends will only be allowed if the
SO_BROADCAST socket option was set on the socket.
Also, broadcast sends to protocols that do not support broadcast (like
TCP), should always fail.
The networking subsystem currently assumes all adapters are Ethernet
adapters, including the LoopbackAdapter, so all packets are pre-pended
with an Ethernet Frame header. Since the MTU must not include any
overhead added by the data-link (Ethernet in this case) or physical
layers, we need to subtract it from the MTU.
This fixes a kernel panic which occurs when sending a packet that is at
least 65523 bytes long through the loopback adapter, which results in
the kernel "receiving" a packet which is larger than the support MTU
out the other end. (As the actual final size was increased by the
addition of the ethernet frame header)
It seems like the current implementation returns 0 in case we do not
have enough data for a whole packet yet. The 0 value gets propagated
to the return value of the syscall which according to the spec
should return non-zero values for non-errors cases. This causes panic,
as there is a VERIFY guard checking that more than > 0 bytes are
written if no error has occurred.
Simplify core methods in the VirtIO bus handling code by ensuring proper
error propagation. This makes initialization of queues, handling changes
in device configuration, and other core patterns more readable as well.
It also allows us to remove the obnoxious pattern of checking for
boolean "success" and if we get false answer then returning an actual
errno code.
The VirtIO specification defines many types of devices with different
purposes, and it also defines 3 possible transport mediums where devices
could be connected to the host machine.
We only care about the PCIe transport, but this commit puts the actual
foundations for supporting the lean MMIO transport too in the future.
To ensure things are kept abstracted but still functional, the VirtIO
transport code is responsible for what is deemed as related to an actual
transport type - allocation of interrupt handlers and tinkering with low
level transport-related registers, etc.
This is an initial implementation, about as basic as intended by the
RFC, and not configurable from userspace at the moment. It should reduce
the amount of low-sized packets sent, reducing overhead and thereby
network traffic.