Files
ladybird/Libraries/LibJS/Bytecode/Validator.cpp
Andreas Kling 0b8fbc03ef LibJS: Add per-field bytecode validation generated from Bytecode.def
Pass 2 of the validator now runs a per-instruction check that walks
each opcode's fields and verifies every reference points somewhere
sensible. Operand indices, label addresses, identifier/string/
property-key/regex table indices, cache indices, and trailing
operand arrays are all bound-checked against the values the C++
side carries on the Executable. Fields whose bound depends on an
enum variant count or other type information not present in
Bytecode.def are left for a follow-up.

The codegen lives in build.rs and reuses the existing layout
machinery from the bytecode_def crate, so each opcode gets a match
arm whose body reads each field at its known byte offset and calls
the right hand-written validate_* helper. Variable-length
instructions cross-check the count field against m_length before
iterating the trailing array, which guards against an attacker
sneaking a count that walks off the end of the instruction.

Note that the encoded operand format is a flat u32 index into the
runtime [registers | locals | constants | arguments] array, since
Operand::offset_index_by zeroes the 3-bit type tag during assembly.
The validator therefore range-checks the flat index rather than
reading the type tag and dispatching per kind.

The argument-count upper bound isn't tracked on Executable yet, so
arguments remain effectively unbounded; tightening that bound is
left for a later commit.

Cache pointer fields are validated only when before_cache_fixup is
true, since after the fixup pass they hold real pointers and must
be left alone. NewFunction and NewClass have plain u32 fields for
shared-function-data and class-blueprint indices; those are
recognized by name in the codegen so the indices still get
range-checked.

The error category enum is renumbered to drop the per-operand-kind
codes, since at the bytecode level we no longer differentiate.
2026-05-03 08:43:19 +02:00

4.8 KiB