5 Commits

Author SHA1 Message Date
Andreas Kling
47c82b38d0 LibJS: Range-check enum-typed bytecode fields in the validator
The validator now bounds-checks the five enum-shaped field types
that appear in Bytecode.def: Completion::Type, IteratorHint,
EnvironmentMode, PutKind, and ArgumentsKind. The codegen
recognizes each by its .def type name and emits a u32 read plus a
range check against the corresponding variant count.

The variant counts ride across the FFI as new fields on
FFIValidatorBounds rather than being hardcoded on the Rust side,
so the Rust validator never has to know which variants the C++
enum currently defines. The C++ side computes each count as
`to_underlying(LastVariant) + 1` with a static_assert pinning the
expected value, so adding or removing a variant in any of these
enums fails the build until the validator is updated.
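
A minimal sketch of the range check described above, not the generated code itself. The struct and field names, the error type, and the little-endian read are assumptions made for illustration; the commit only states that the counts travel on FFIValidatorBounds and that each field is read as a u32 and compared against its variant count.

```rust
/// Variant counts supplied by the C++ side. The field names are
/// hypothetical; only two of the five enums are shown.
pub struct EnumBounds {
    pub completion_type_count: u32,
    pub put_kind_count: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EnumFieldError {
    /// The four bytes of the field do not fit inside the instruction.
    Truncated,
    /// The raw value does not name a variant.
    OutOfRange { value: u32, variant_count: u32 },
}

/// Reads a u32 at `offset` (little-endian is assumed here) and accepts it
/// only if it is strictly below `variant_count`.
pub fn validate_enum_field(
    insn: &[u8],
    offset: usize,
    variant_count: u32,
) -> Result<u32, EnumFieldError> {
    let end = offset.checked_add(4).ok_or(EnumFieldError::Truncated)?;
    let b = insn.get(offset..end).ok_or(EnumFieldError::Truncated)?;
    let value = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    if value < variant_count {
        Ok(value)
    } else {
        Err(EnumFieldError::OutOfRange { value, variant_count })
    }
}
```

Because the count is a parameter rather than a constant, the Rust side in this sketch never names a variant, which mirrors the design choice the message describes.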
2026-05-03 08:43:19 +02:00
Andreas Kling
2782fa1559 LibJS: Tighten the bytecode validator's argument operand bound
Until now the validator passed `u32::MAX` as the argument-region
upper bound because nothing on Executable tracked how many
argument slots a given bytecode buffer might reference. That left
the largest validation hole open: any flat operand index above
`registers + locals + constants` slid through the check.

The Rust assembler already walks every operand during phase 1 so
it can offset each one into the runtime's flat layout. This commit
piggybacks on that walk to record the highest `Operand::argument`
index touched and surfaces `(max + 1)` (or zero if no argument is
ever referenced) on `AssembledBytecode`. The value rides through
`FFIExecutableData` onto a new `Executable::number_of_arguments`
field, which `Validator.cpp` then feeds into `FFIValidatorBounds`.

The bound is now tight: every operand index in the encoded stream
is range-checked against the actual runtime array size, including
the argument region.
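
The `(max + 1)` or zero rule can be sketched as follows. The `Operand` enum and the function name are simplified stand-ins invented for this example, not the assembler's real types; only the rule itself comes from the message.

```rust
/// Simplified stand-in for the assembler's operand type (hypothetical).
#[derive(Clone, Copy, Debug)]
pub enum Operand {
    Register(u32),
    Local(u32),
    Constant(u32),
    Argument(u32),
}

/// Returns the number of argument slots the operands can reference:
/// highest argument index plus one, or zero when no operand is an argument.
pub fn number_of_arguments(operands: &[Operand]) -> u32 {
    operands
        .iter()
        .filter_map(|op| match op {
            Operand::Argument(index) => Some(*index),
            _ => None,
        })
        .max()
        // Saturate instead of wrapping if the index is already u32::MAX;
        // how the real assembler treats that case is not stated.
        .map_or(0, |max| max.saturating_add(1))
}
```

For example, operands touching arguments 1 and 4 yield a bound of 5, so an encoded index pointing at argument slot 5 or above would be rejected.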
2026-05-03 08:43:19 +02:00
Andreas Kling
d3ca680a62 LibJS: Validate basic blocks, exception handlers, and source map
Pass 3 cross-checks the structural metadata stored alongside the
bytecode buffer on Executable against the offset set built during
Pass 1. Every basic block start offset must point at an instruction
boundary; exception handler start, end, and handler offsets must
either be at an instruction boundary or, for the inclusive-start /
exclusive-end pair, equal to the bytecode length; source map
entries must do the same.

Of these, the exception handler's handler_offset is the safety-
critical one for the disk-cache use case: a corrupted offset there
sends control flow into the middle of an instruction. The other
checks tighten the cache-load surface area and catch obvious file
corruption.
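
A rough sketch of the exception handler check, under one reading of the rule above: start and end may sit on an instruction boundary or equal the bytecode length, while the handler offset must sit on a boundary. The type and function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Flat-offset view of one exception handler (hypothetical name).
pub struct HandlerOffsets {
    pub start: u32,
    pub end: u32,
    pub handler: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HandlerError {
    BadStart,
    BadEnd,
    BadHandler,
}

/// `boundaries` is the set of instruction start offsets collected in Pass 1.
pub fn validate_handler(
    h: &HandlerOffsets,
    boundaries: &BTreeSet<u32>,
    bytecode_len: u32,
) -> Result<(), HandlerError> {
    // Range endpoints may also point one past the last instruction.
    let boundary_or_end = |offset: u32| offset == bytecode_len || boundaries.contains(&offset);
    if !boundary_or_end(h.start) {
        return Err(HandlerError::BadStart);
    }
    if !boundary_or_end(h.end) {
        return Err(HandlerError::BadEnd);
    }
    // Control flow actually lands here, so it must be a real instruction start.
    if !boundaries.contains(&h.handler) {
        return Err(HandlerError::BadHandler);
    }
    Ok(())
}
```

A handler offset that falls between two boundaries is rejected, which is the corrupted-cache case the message singles out.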

The metadata is plumbed across the FFI as a separate
FFIValidatorExtras struct so the validator entry point keeps the
single-call shape, with a flat-offset mirror struct for exception
handlers since the original carries no source data we need.
2026-05-03 08:43:19 +02:00
Andreas Kling
0b8fbc03ef LibJS: Add per-field bytecode validation generated from Bytecode.def
Pass 2 of the validator now runs a per-instruction check that walks
each opcode's fields and verifies every reference points somewhere
sensible. Operand indices, label addresses, identifier/string/
property-key/regex table indices, cache indices, and trailing
operand arrays are all bound-checked against the values the C++
side carries on the Executable. Fields whose bound depends on an
enum variant count or other type information not present in
Bytecode.def are left for a follow-up.

The codegen lives in build.rs and reuses the existing layout
machinery from the bytecode_def crate, so each opcode gets a match
arm whose body reads each field at its known byte offset and calls
the right hand-written validate_* helper. Variable-length
instructions cross-check the count field against m_length before
iterating the trailing array, which guards against an attacker
sneaking in a count that walks off the end of the instruction.
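
The count cross-check for variable-length instructions might look roughly like this. The byte offsets are passed in as parameters because the real layout comes from Bytecode.def; the 4-byte element size, the little-endian reads, and all names here are assumptions for the sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ArrayError {
    Truncated,
    CountExceedsLength,
    IndexOutOfRange { element: usize, value: u32 },
}

/// Reads a little-endian u32 at `at`, or None if it does not fit.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Checks that `count` trailing u32 elements fit inside the recorded
/// instruction length before touching any of them, then bound-checks each.
pub fn validate_trailing_array(
    insn: &[u8],
    length_offset: usize,
    count_offset: usize,
    array_offset: usize,
    bound: u32,
) -> Result<(), ArrayError> {
    let m_length = read_u32(insn, length_offset).ok_or(ArrayError::Truncated)? as usize;
    let count = read_u32(insn, count_offset).ok_or(ArrayError::Truncated)? as usize;
    // The recorded length must itself fit in the bytes we were handed.
    if m_length > insn.len() {
        return Err(ArrayError::Truncated);
    }
    let array_end = count
        .checked_mul(4)
        .and_then(|bytes| array_offset.checked_add(bytes))
        .ok_or(ArrayError::CountExceedsLength)?;
    if array_end > m_length {
        return Err(ArrayError::CountExceedsLength);
    }
    for element in 0..count {
        let value = read_u32(insn, array_offset + element * 4).ok_or(ArrayError::Truncated)?;
        if value >= bound {
            return Err(ArrayError::IndexOutOfRange { element, value });
        }
    }
    Ok(())
}
```

Doing the length comparison before the loop means a hostile count is rejected without reading a single element past the instruction.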

Note that the encoded operand format is a flat u32 index into the
runtime [registers | locals | constants | arguments] array, since
Operand::offset_index_by zeroes the 3-bit type tag during assembly.
The validator therefore range-checks the flat index rather than
reading the type tag and dispatching per kind.
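
As an illustration of the flat-index check, assuming the region sizes are available as four counts (the struct and function names are made up for this sketch):

```rust
/// Sizes of the four regions of the runtime array, in layout order:
/// [registers | locals | constants | arguments].
pub struct OperandBounds {
    pub registers: u32,
    pub locals: u32,
    pub constants: u32,
    pub arguments: u32,
}

/// An encoded operand is just a flat index, so a single comparison against
/// the total size is enough; no per-kind dispatch is needed.
pub fn operand_in_range(index: u32, bounds: &OperandBounds) -> bool {
    // Sum in u64 so that a u32::MAX placeholder cannot overflow.
    let total = u64::from(bounds.registers)
        + u64::from(bounds.locals)
        + u64::from(bounds.constants)
        + u64::from(bounds.arguments);
    u64::from(index) < total
}
```

With `arguments` set to `u32::MAX`, every possible index passes this check, which is why an untracked argument count leaves the region effectively unbounded.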

The argument-count upper bound isn't tracked on Executable yet, so
arguments remain effectively unbounded; tightening that bound is
left for a later commit.

Cache pointer fields are validated only when before_cache_fixup is
true, since after the fixup pass they hold real pointers and must
be left alone. NewFunction and NewClass have plain u32 fields for
shared-function-data and class-blueprint indices; those are
recognized by name in the codegen so the indices still get
range-checked.

The error category enum is renumbered to drop the per-operand-kind
codes, since at the bytecode level we no longer differentiate.
2026-05-03 08:43:19 +02:00
Andreas Kling
d4ed658429 LibJS: Add bytecode validator scaffolding driven from Bytecode.def
The plan is to start caching compiled JS bytecode on disk. Before
loading anything from a cache we need confidence that the bytes are
structurally well-formed, since a corrupted or tampered-with cache
file could otherwise hand the interpreter an out-of-bounds jump or a
constant-pool index that points past the end of the table.

This commit lays down the scaffolding for that validator. The walker
lives in Rust (Libraries/LibJS/Rust/src/bytecode/validator.rs) so
that it can share the existing Bytecode.def-driven layout machinery
with the encoder. C++ calls into it through cbindgen, the same way
the rest of the Rust pipeline is wired up.

For now, the validator only does Pass 1: walk the byte stream and
verify that each instruction is 8-byte aligned, that the opcode
byte is in range, and that the reported length keeps us inside the
buffer. The
length lookup is generated from Bytecode.def so fixed-length and
variable-length instructions stay in sync with the rest of the
codegen automatically. Per-field bounds checks (operands, labels,
table indices, cache indices) and structural extras (basic block
offsets, exception handlers, source map) come in follow-up commits.
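
A toy version of the Pass 1 walk, to make the three checks concrete. The instruction format here is invented for the example (opcode in byte 0, three opcodes, a little-endian length at bytes 4..8 for the variable-length one); the real length lookup is generated from Bytecode.def.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
pub enum Pass1Error {
    Misaligned { offset: usize },
    OpcodeOutOfRange { offset: usize, opcode: u8 },
    BadLength { offset: usize },
}

const ALIGNMENT: usize = 8;
const OPCODE_COUNT: u8 = 3; // toy opcode table, not the real one

/// Toy stand-in for the generated length lookup.
fn instruction_length(bytes: &[u8], offset: usize) -> Option<usize> {
    match bytes[offset] {
        0 => Some(8),
        1 => Some(16),
        // Variable-length: the length is stored in the instruction itself.
        _ => {
            let b = bytes.get(offset + 4..offset + 8)?;
            Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
        }
    }
}

/// Walks the stream and returns the set of instruction start offsets.
pub fn pass1(bytes: &[u8]) -> Result<BTreeSet<usize>, Pass1Error> {
    let mut boundaries = BTreeSet::new();
    let mut offset = 0;
    while offset < bytes.len() {
        if offset % ALIGNMENT != 0 {
            return Err(Pass1Error::Misaligned { offset });
        }
        let opcode = bytes[offset];
        if opcode >= OPCODE_COUNT {
            return Err(Pass1Error::OpcodeOutOfRange { offset, opcode });
        }
        // A zero length would loop forever; a long one would leave the buffer.
        let end = instruction_length(bytes, offset)
            .filter(|&len| len != 0)
            .and_then(|len| offset.checked_add(len))
            .filter(|&end| end <= bytes.len())
            .ok_or(Pass1Error::BadLength { offset })?;
        boundaries.insert(offset);
        offset = end;
    }
    Ok(boundaries)
}
```

The returned offset set is the kind of structure later passes can consult when they need to know whether an offset is an instruction boundary.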

The validator runs after every successful compilation in debug and
sanitizer builds, gated on !NDEBUG || HAS_ADDRESS_SANITIZER, so we
get an extra sanity check on every executable the encoder produces
without paying for it in release builds. Failure trips a
VERIFY_NOT_REACHED with the offset, opcode, and error category
logged via dbgln().
2026-05-03 08:43:19 +02:00