ladybird

mirror of https://github.com/LadybirdBrowser/ladybird synced 2026-05-12 09:56:45 +02:00

Author	SHA1	Message	Date
Andreas Kling	47c82b38d0	LibJS: Range-check enum-typed bytecode fields in the validator The validator now bounds-checks the five enum-shaped field types that appear in Bytecode.def: Completion::Type, IteratorHint, EnvironmentMode, PutKind, and ArgumentsKind. The codegen recognizes each by its .def type name and emits a u32 read plus a range check against the corresponding variant count. The variant counts ride across the FFI as new fields on FFIValidatorBounds rather than being hardcoded on the Rust side, so the Rust validator never has to know which variants the C++ enum currently defines. The C++ side computes each count as `to_underlying(LastVariant) + 1` with a static_assert pinning the expected value, so adding or removing a variant in any of these enums fails the build until the validator is updated.	2026-05-03 08:43:19 +02:00
Andreas Kling	2782fa1559	LibJS: Tighten the bytecode validator's argument operand bound Until now the validator passed `u32::MAX` as the argument-region upper bound because nothing on Executable tracked how many argument slots a given bytecode buffer might reference. That left the largest validation hole open: any flat operand index above `registers + locals + constants` slid through the check. The Rust assembler already walks every operand during phase 1 so it can offset each one into the runtime's flat layout. This commit piggybacks on that walk to record the highest `Operand::argument` index touched and surfaces `(max + 1)` (or zero if no argument is ever referenced) on `AssembledBytecode`. The value rides through `FFIExecutableData` onto a new `Executable::number_of_arguments` field, which `Validator.cpp` then feeds into `FFIValidatorBounds`. The bound is now tight: every operand index in the encoded stream is range-checked against the actual runtime array size, including the argument region.	2026-05-03 08:43:19 +02:00
Andreas Kling	d3ca680a62	LibJS: Validate basic blocks, exception handlers, and source map Pass 3 cross-checks the structural metadata stored alongside the bytecode buffer on Executable against the offset set built during Pass 1. Every basic block start offset must point at an instruction boundary; exception handler start, end, and handler offsets must either be at an instruction boundary or, for the inclusive-start / exclusive-end pair, equal to the bytecode length; source map entries must do the same. Of these, the exception handler's handler_offset is the safety-critical one for the disk-cache use case: a corrupted offset there sends control flow into the middle of an instruction. The other checks tighten the cache-load surface area and catch obvious file corruption. The metadata is plumbed across the FFI as a separate FFIValidatorExtras struct so the validator entry point keeps the single-call shape, with a flat-offset mirror struct for exception handlers since the original carries no source data we need.	2026-05-03 08:43:19 +02:00
Andreas Kling	0b8fbc03ef	LibJS: Add per-field bytecode validation generated from Bytecode.def Pass 2 of the validator now runs a per-instruction check that walks each opcode's fields and verifies every reference points somewhere sensible. Operand indices, label addresses, identifier/string/property-key/regex table indices, cache indices, and trailing operand arrays are all bound-checked against the values the C++ side carries on the Executable. Fields whose bound depends on an enum variant count or other type information not present in Bytecode.def are left for a follow-up. The codegen lives in build.rs and reuses the existing layout machinery from the bytecode_def crate, so each opcode gets a match arm whose body reads each field at its known byte offset and calls the right hand-written validate_* helper. Variable-length instructions cross-check the count field against m_length before iterating the trailing array, which guards against an attacker sneaking a count that walks off the end of the instruction. Note that the encoded operand format is a flat u32 index into the runtime [registers | locals | constants | arguments] array, since Operand::offset_index_by zeroes the 3-bit type tag during assembly. The validator therefore range-checks the flat index rather than reading the type tag and dispatching per kind. The argument-count upper bound isn't tracked on Executable yet, so arguments remain effectively unbounded; tightening that bound is left for a later commit. Cache pointer fields are validated only when before_cache_fixup is true, since after the fixup pass they hold real pointers and must be left alone. NewFunction and NewClass have plain u32 fields for shared-function-data and class-blueprint indices; those are recognized by name in the codegen so the indices still get range-checked. The error category enum is renumbered to drop the per-operand-kind codes, since at the bytecode level we no longer differentiate.	2026-05-03 08:43:19 +02:00
Andreas Kling	d4ed658429	LibJS: Add bytecode validator scaffolding driven from Bytecode.def The plan is to start caching compiled JS bytecode on disk. Before loading anything from a cache we need confidence that the bytes are structurally well-formed, since a corrupted or tampered-with cache file could otherwise hand the interpreter an out-of-bounds jump or a constant-pool index that points past the end of the table. This commit lays down the scaffolding for that validator. The walker lives in Rust (Libraries/LibJS/Rust/src/bytecode/validator.rs) so that it can share the existing Bytecode.def-driven layout machinery with the encoder. C++ calls into it through cbindgen, the same way the rest of the Rust pipeline is wired up. For now, the validator only does Pass 1: walk the byte stream, verify each instruction is 8-byte aligned, the opcode byte is in range, and the reported length keeps us inside the buffer. The length lookup is generated from Bytecode.def so fixed-length and variable-length instructions stay in sync with the rest of the codegen automatically. Per-field bounds checks (operands, labels, table indices, cache indices) and structural extras (basic block offsets, exception handlers, source map) come in follow-up commits. The validator runs after every successful compilation in debug and sanitizer builds, gated on !NDEBUG || HAS_ADDRESS_SANITIZER, so we get an extra sanity check on every executable the encoder produces without paying for it in release builds. Failure trips a VERIFY_NOT_REACHED with the offset, opcode, and error category logged via dbgln().	2026-05-03 08:43:19 +02:00
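
The Pass 1 walk described in d4ed658429, and the boundary test that Pass 3 (d3ca680a62) runs against its offset set, might look roughly like the sketch below. This is not the code from validator.rs. It assumes a toy encoding in which the opcode is the first byte of each instruction and the length is looked up in a caller-supplied table; the names `Pass1Error`, `walk`, and `is_boundary_or_end` are hypothetical, and the real length lookup is generated from Bytecode.def and also handles variable-length instructions.

```rust
use std::collections::BTreeSet;

/// Hypothetical error categories for the walk; the real enum differs.
#[derive(Debug, PartialEq)]
enum Pass1Error {
    Misaligned { offset: usize },
    UnknownOpcode { offset: usize, opcode: u8 },
    BadLength { offset: usize },
}

/// Walk the byte stream and collect every instruction start offset.
/// `lengths[opcode]` is an assumed per-opcode length table in bytes.
fn walk(bytecode: &[u8], lengths: &[usize]) -> Result<BTreeSet<usize>, Pass1Error> {
    let mut starts = BTreeSet::new();
    let mut offset = 0usize;
    while offset < bytecode.len() {
        // Every instruction must start on an 8-byte boundary.
        if offset % 8 != 0 {
            return Err(Pass1Error::Misaligned { offset });
        }
        // The opcode byte must name a known instruction.
        let opcode = bytecode[offset];
        let len = match lengths.get(opcode as usize) {
            Some(&len) => len,
            None => return Err(Pass1Error::UnknownOpcode { offset, opcode }),
        };
        // The reported length must make progress and stay inside the buffer.
        if len == 0 || len > bytecode.len() - offset {
            return Err(Pass1Error::BadLength { offset });
        }
        starts.insert(offset);
        offset += len;
    }
    Ok(starts)
}

/// Pass 3 style check: an offset is acceptable if it is an instruction
/// boundary or, for exclusive-end offsets, equal to the bytecode length.
fn is_boundary_or_end(offset: usize, starts: &BTreeSet<usize>, bytecode_len: usize) -> bool {
    starts.contains(&offset) || offset == bytecode_len
}
```

Collecting the start offsets in an ordered set is one possible choice; it lets later passes answer "is this a jump target at an instruction boundary?" without re-walking the stream.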

5 Commits
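
For the per-field checks in 0b8fbc03ef, 2782fa1559, and 47c82b38d0, the following is a minimal sketch of range-checking a flat operand index and an enum-typed field. The `Bounds` struct and the helper names are hypothetical stand-ins rather than the real FFIValidatorBounds layout, and the little-endian read is an assumption made only so the example is deterministic.

```rust
/// Hypothetical bounds carried over from the C++ side.
#[derive(Debug, Clone, Copy)]
struct Bounds {
    registers: u32,
    locals: u32,
    constants: u32,
    arguments: u32,
    /// Number of variants in some enum-typed field, e.g. PutKind.
    put_kind_count: u32,
}

/// Read a u32 field at a known byte offset, or None if it would run
/// past the end of the buffer. Little-endian is an assumption here.
fn read_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let raw: [u8; 4] = bytes.get(offset..end)?.try_into().ok()?;
    Some(u32::from_le_bytes(raw))
}

/// The encoded operand is a flat index into
/// [registers | locals | constants | arguments], so a single upper
/// bound covers all four regions. The sum is done in u64 so it cannot
/// overflow.
fn operand_in_range(index: u32, b: &Bounds) -> bool {
    let total = u64::from(b.registers)
        + u64::from(b.locals)
        + u64::from(b.constants)
        + u64::from(b.arguments);
    u64::from(index) < total
}

/// An enum-typed field is valid if its raw value is below the variant
/// count supplied by the side that owns the enum definition.
fn enum_in_range(raw: u32, variant_count: u32) -> bool {
    raw < variant_count
}
```

With `arguments` set to the recorded `max + 1` (or zero when no argument is referenced), an index past the constants region is rejected unless the executable really has that many argument slots.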