eliott/ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-05-12 01:46:46 +02:00

Commit Graph

Author	SHA1	Message	Date

Andreas Kling

d3ca680a62

LibJS: Validate basic blocks, exception handlers, and source map

Pass 3 cross-checks the structural metadata stored alongside the
bytecode buffer on Executable against the offset set built during
Pass 1. Every basic block start offset must point at an instruction
boundary; exception handler start, end, and handler offsets must
either be at an instruction boundary or, for the inclusive-start /
exclusive-end pair, equal to the bytecode length; source map
entries must do the same.
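
As a rough illustration of the cross-check described above, the sketch below tests basic block and exception handler offsets against a set of instruction boundaries. It is not the code from this commit: `HandlerOffsets`, `StructuralError`, and `check_structure` are hypothetical names, source map entries are left out, and the rule that only the start/end pair may equal the bytecode length is one reading of the message.

```rust
use std::collections::BTreeSet;

/// Hypothetical flat mirror of an exception handler: offsets only.
pub struct HandlerOffsets {
    pub start: usize,   // inclusive
    pub end: usize,     // exclusive
    pub handler: usize, // where control flow lands on an exception
}

/// Hypothetical error type; each variant carries the offending offset.
#[derive(Debug, PartialEq, Eq)]
pub enum StructuralError {
    BasicBlockStart(usize),
    HandlerRange(usize),
    HandlerTarget(usize),
}

/// Checks metadata offsets against the boundary set built by an earlier pass.
pub fn check_structure(
    boundaries: &BTreeSet<usize>,
    bytecode_len: usize,
    basic_blocks: &[usize],
    handlers: &[HandlerOffsets],
) -> Result<(), StructuralError> {
    let at_boundary = |offset: usize| boundaries.contains(&offset);
    // Assumption: range endpoints may also sit exactly at the end of the buffer.
    let at_boundary_or_end =
        |offset: usize| offset == bytecode_len || boundaries.contains(&offset);

    for &start in basic_blocks {
        if !at_boundary(start) {
            return Err(StructuralError::BasicBlockStart(start));
        }
    }
    for h in handlers {
        if !at_boundary_or_end(h.start) {
            return Err(StructuralError::HandlerRange(h.start));
        }
        if !at_boundary_or_end(h.end) {
            return Err(StructuralError::HandlerRange(h.end));
        }
        // The jump target must be a real instruction start.
        if !at_boundary(h.handler) {
            return Err(StructuralError::HandlerTarget(h.handler));
        }
    }
    Ok(())
}
```

In this sketch a handler whose `handler` offset falls between two boundaries is rejected with `HandlerTarget`, which corresponds to the mid-instruction jump the message calls out as the safety-critical case.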

Of these, the exception handler's handler_offset is the safety-
critical one for the disk-cache use case: a corrupted offset there
sends control flow into the middle of an instruction. The other
checks tighten the cache-load surface area and catch obvious file
corruption.

The metadata is plumbed across the FFI as a separate
FFIValidatorExtras struct so the validator entry point keeps the
single-call shape, with a flat-offset mirror struct for exception
handlers since the original carries no source data we need.

2026-05-03 08:43:19 +02:00

Andreas Kling

0b8fbc03ef

LibJS: Add per-field bytecode validation generated from Bytecode.def

Pass 2 of the validator now runs a per-instruction check that walks
each opcode's fields and verifies every reference points somewhere
sensible. Operand indices, label addresses, identifier/string/
property-key/regex table indices, cache indices, and trailing
operand arrays are all bound-checked against the values the C++
side carries on the Executable. Fields whose bound depends on an
enum variant count or other type information not present in
Bytecode.def are left for a follow-up.

The codegen lives in build.rs and reuses the existing layout
machinery from the bytecode_def crate, so each opcode gets a match
arm whose body reads each field at its known byte offset and calls
the right hand-written validate_* helper. Variable-length
instructions cross-check the count field against m_length before
iterating the trailing array, which guards against an attacker
sneaking a count that walks off the end of the instruction.
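
A minimal sketch of the count cross-check for variable-length instructions follows. It assumes a little-endian u32 count field and 4-byte trailing elements; the function name, the error type, and the field offsets passed in are all hypothetical and not taken from the generated code.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum TrailingArrayError {
    FieldOutOfBounds,
    CountExceedsLength { count: usize },
}

/// Reads a little-endian u32 at `offset`, or None if it does not fit.
fn read_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let raw = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}

/// `instruction` is assumed to be exactly the bytes of one instruction,
/// i.e. its length plays the role of m_length from the commit message.
pub fn read_trailing_array(
    instruction: &[u8],
    count_offset: usize,
    array_offset: usize,
) -> Result<Vec<u32>, TrailingArrayError> {
    let count = read_u32(instruction, count_offset)
        .ok_or(TrailingArrayError::FieldOutOfBounds)? as usize;

    // Checked arithmetic so a huge count cannot wrap around the bound.
    let end = count
        .checked_mul(4)
        .and_then(|byte_len| array_offset.checked_add(byte_len))
        .filter(|&end| end <= instruction.len())
        .ok_or(TrailingArrayError::CountExceedsLength { count })?;

    // Only now is it safe to iterate over the trailing elements.
    Ok(instruction[array_offset..end]
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

The point of the ordering is that the count is compared against the instruction length before any element is read, so a tampered count fails fast instead of reading into the next instruction.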

Note that the encoded operand format is a flat u32 index into the
runtime [registers | locals | constants | arguments] array, since
Operand::offset_index_by zeroes the 3-bit type tag during assembly.
The validator therefore range-checks the flat index rather than
reading the type tag and dispatching per kind.

The argument-count upper bound isn't tracked on Executable yet, so
arguments remain effectively unbounded; tightening that bound is
left for a later commit.
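
The flat-index check from the two paragraphs above might look roughly like the sketch below. `OperandBounds` and `operand_in_range` are hypothetical names, and modelling the untracked argument count as an `Option` is an assumption made for illustration.

```rust
/// Hypothetical view of the counts carried on the executable.
pub struct OperandBounds {
    pub registers: u32,
    pub locals: u32,
    pub constants: u32,
    /// None models the case where the argument count is not tracked.
    pub arguments: Option<u32>,
}

/// Range-checks a flat index into [registers | locals | constants | arguments].
pub fn operand_in_range(index: u32, bounds: &OperandBounds) -> bool {
    match bounds.arguments {
        // Without an argument bound, any index past the constants is
        // treated as an argument, so nothing can be rejected.
        None => true,
        Some(arguments) => {
            // Sum in u64 so four u32 counts cannot overflow.
            let limit = u64::from(bounds.registers)
                + u64::from(bounds.locals)
                + u64::from(bounds.constants)
                + u64::from(arguments);
            u64::from(index) < limit
        }
    }
}
```

With `arguments` set to `None` every index passes, which is what "effectively unbounded" amounts to in this model; supplying a count turns the check into a real upper bound.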

Cache pointer fields are validated only when before_cache_fixup is
true, since after the fixup pass they hold real pointers and must
be left alone. NewFunction and NewClass have plain u32 fields for
shared-function-data and class-blueprint indices; those are
recognized by name in the codegen so the indices still get
range-checked.

The error category enum is renumbered to drop the per-operand-kind
codes, since at the bytecode level we no longer differentiate.

2026-05-03 08:43:19 +02:00

Andreas Kling

d4ed658429

LibJS: Add bytecode validator scaffolding driven from Bytecode.def

The plan is to start caching compiled JS bytecode on disk. Before
loading anything from a cache we need confidence that the bytes are
structurally well-formed, since a corrupted or tampered-with cache
file could otherwise hand the interpreter an out-of-bounds jump or a
constant-pool index that points past the end of the table.

This commit lays down the scaffolding for that validator. The walker
lives in Rust (Libraries/LibJS/Rust/src/bytecode/validator.rs) so
that it can share the existing Bytecode.def-driven layout machinery
with the encoder. C++ calls into it through cbindgen, the same way
the rest of the Rust pipeline is wired up.

For now, the validator only does Pass 1: walk the byte stream,
verify each instruction is 8-byte aligned, the opcode byte is in
range, and the reported length keeps us inside the buffer. The
length lookup is generated from Bytecode.def so fixed-length and
variable-length instructions stay in sync with the rest of the
codegen automatically. Per-field bounds checks (operands, labels,
table indices, cache indices) and structural extras (basic block
offsets, exception handlers, source map) come in follow-up commits.
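
To make the Pass 1 walk concrete, here is a small sketch under stated assumptions. The three-opcode instruction set, the position of the length field, and all names (`Pass1Error`, `instruction_length`, `walk_instructions`) are invented for illustration; in the commit the length lookup is generated from Bytecode.def instead.

```rust
use std::collections::BTreeSet;

/// Hypothetical error categories for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Pass1Error {
    Misaligned { offset: usize },
    OpcodeOutOfRange { offset: usize, opcode: u8 },
    LengthOutOfBounds { offset: usize, length: usize },
}

const ALIGNMENT: usize = 8;
const OPCODE_COUNT: u8 = 3; // made-up instruction set with opcodes 0, 1, 2

/// Stand-in for the generated length lookup. Opcodes 0 and 1 are fixed-length;
/// opcode 2 stores its total length as a little-endian u32 at byte 4.
fn instruction_length(bytes: &[u8], offset: usize, opcode: u8) -> Option<usize> {
    match opcode {
        0 => Some(8),
        1 => Some(16),
        2 => {
            let raw = bytes.get(offset + 4..offset + 8)?;
            Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize)
        }
        _ => None,
    }
}

/// Walks the byte stream and returns the set of instruction start offsets.
pub fn walk_instructions(bytes: &[u8]) -> Result<BTreeSet<usize>, Pass1Error> {
    let mut boundaries = BTreeSet::new();
    let mut offset = 0;
    while offset < bytes.len() {
        if offset % ALIGNMENT != 0 {
            return Err(Pass1Error::Misaligned { offset });
        }
        let opcode = bytes[offset];
        if opcode >= OPCODE_COUNT {
            return Err(Pass1Error::OpcodeOutOfRange { offset, opcode });
        }
        // A missing length field is reported as a zero length.
        let length = instruction_length(bytes, offset, opcode).unwrap_or(0);
        // Reject lengths too short to make progress or running past the buffer.
        let end = match offset.checked_add(length) {
            Some(end) if length >= ALIGNMENT && end <= bytes.len() => end,
            _ => return Err(Pass1Error::LengthOutOfBounds { offset, length }),
        };
        boundaries.insert(offset);
        offset = end;
    }
    Ok(boundaries)
}
```

Returning the set of start offsets is a design choice of this sketch: it gives later checks a way to ask whether an arbitrary offset lands on an instruction boundary.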

The validator runs after every successful compilation in debug and
sanitizer builds, gated on !NDEBUG || HAS_ADDRESS_SANITIZER, so we
get an extra sanity check on every executable the encoder produces
without paying for it in release builds. Failure trips a
VERIFY_NOT_REACHED with the offset, opcode, and error category
logged via dbgln().

2026-05-03 08:43:19 +02:00

3 Commits