Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
There's now an ENABLE_RUST CMake option (on by default).
Install Rust via rustup in devcontainer scripts, document the
requirement in build instructions, and add Cargo's target/ directory
to .gitignore.
Remove CodeGenerationError and make all bytecode generation functions
return their results directly instead of wrapping them in
CodeGenerationErrorOr.
For the few remaining sites where codegen encounters an unimplemented
or unexpected AST node, we now use a new emit_todo() helper that emits
a NewTypeError + Throw sequence at compile time (preserving the runtime
behavior) and then switches to a dead basic block so subsequent codegen
for the same function can continue without issue.
This allows us to remove error handling from all callers of the
bytecode compiler, simplifying the code significantly.
Add a standalone construct_class() function that builds a class from
a ClassBlueprint and an Executable, replacing the virtual dispatch
through ClassElement::class_element_evaluation() with a direct switch
on ClassElementDescriptor::Kind.
This function reads pre-compiled SharedFunctionInstanceData indices
from the blueprint, creates ECMAScriptFunctionObjects at runtime, and
handles all class element types: methods, getters, setters, fields
(with initializers), and static initializers.
The function exists but is not yet called. No behavioral change.
Replace the ScopePusher RAII class (which performed scope analysis
in its destructor chain during parsing) with a two-phase approach:
1. ScopeCollector builds a tree of ScopeRecord nodes during parsing
via RAII ScopeHandle objects. It records declarations, identifier
references, and flags, but does not resolve anything.
2. After parsing completes, ScopeCollector::analyze() walks the tree
bottom-up and performs all resolution: propagate eval/with
poisoning, resolve identifiers to locals/globals/arguments, hoist
functions (Annex B.3.3), and build FunctionScopeData.
Key design decisions:
- ScopeRecord::ast_node is a RefPtr<ScopeNode> to prevent
use-after-free when synthesize_binding_pattern re-parses an
expression as a binding pattern (the original parse's scope records
survive with stale AST node pointers).
- Parser::scope_collector() returns the override collector if set
(for synthesize_binding_pattern's nested parser), ensuring all
scope operations route to the outer parser's scope tree.
- FunctionNode::local_variables_names() delegates to its body's
ScopeNode rather than copying at parse time, since analysis runs
after parsing.
Replace the old indentation-based AST dump with a new tree-drawing
approach using unicode box characters. Each node now also shows its
source position as @line:column, and additional internal state:
- Identifier: [argument:N] vs [variable:N], declaration kind
(var/let/const), [global], [in-eval-scope]
- FunctionNode: [strict], [arrow], [direct-eval], [uses-this],
[uses-this-from-environment], [might-need-arguments]
- Program: (script)/(module), [strict], [top-level-await]
- YieldExpression: [yield*] for delegation
Dump code is moved from AST.cpp into a new ASTDump.cpp file.
ENABLE_WINDOWS_CI and the *_CI presets were initially added back when
the AK library and all the AK Test* executables were the only targets
that supported building and running in CI. Since then, almost all the
targets in the codebase are built on Windows besides the following:
- LibLine
- test-262-runner
Since these targets above are not required to actually run or test the
browser on Windows in its current experimental state, fully disabling
them should be fine for now.
ENABLE_WINDOWS_CI was also used to exclude test-web from ctest. This
can be fully disabled on Windows for now until proper runtime support
is added.
The remaining locations were all using ENABLE_WINDOWS_CI as a proxy for
ENABLE_ADDRESS_SANITIZER, so we can just be explicit instead.
The new presets map much more directly to the unix Release, Debug, and
Sanitizer presets which should make setting up ladybird on Windows less
confusing.
We also make the new Windows_Experimental_Release preset the default in
ladybird.py to match Unix.
Replace the custom AK JSON parser with simdjson for parsing JSON in
LibJS. This eliminates the intermediate AK::JsonValue object graph,
going directly from JSON text to JS::Value.
simdjson's on-demand API parses at ~4GB/s and only materializes values
as they are accessed, making this both faster and more memory efficient
than the previous approach.
The AK JSON parser is still used elsewhere (WebDriver protocol, config
files, etc.) but LibJS now uses simdjson exclusively for JSON.parse()
and JSON.rawJSON().
This adds visit_edges(Cell::Visitor&) methods to various helper structs
that contain GC pointers, and makes sure they are called from owning
GC-heap-allocated objects as needed.
These were found by our Clang plugin after expanding its capabilities.
The added rules will be enforced by CI going forward.
Instead of creating PropertyKeys on the fly during interpreter
execution, we now store fully-formed ones in the Executable.
This avoids a whole bunch of busywork in property access instructions
and substantially reduces code size bloat.
This hosts the ability to compile and run JavaScript to implement
native functions. This is particularly useful for any native function
that is not a normal function, for example async functions such as
Array.fromAsync, which require yielding.
These functions are not allowed to observe anything from outside their
environment. Any global identifiers will instead be assumed to be a
reference to an abstract operation or a constant. The generator will
inject the appropriate bytecode if the name of the global identifier
matches a known name. Anything else will cause a code generation error.
This commit adds a new Bytecode.def file that describes all the LibJS
bytecode instructions.
From this, we are able to generate the full declarations for all C++
bytecode instruction classes, as well as their serialization code.
Note that some of the bytecode compiler was updated since instructions
no longer have default constructor arguments.
The big immediate benefit here is that we lose a couple thousand lines
of hand-written C++ code. Going forward, this also allows us to do more
tooling for the bytecode VM, now that we have an authoritative
description of its instructions.
Key things to know about:
- Instructions can inherit from one another. At the moment, everything
simply inherits from the base "Instruction".
- @terminator means the instruction terminates a basic block.
- @nothrow means the instruction cannot throw. This affects how the
interpreter interacts with it.
- Variable-length instructions are automatically supported. Just put an
array of something as the last field of the instruction.
- The m_length field is magical. If present, it will be populated with
the full length of the instruction. This is used for variable-length
instructions.
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
This commit adds the minimal export macros needed to run js.exe on
windows. A followup commit is planned to move to explicit export
entirely.
A static_assert for the size of a struct is also ifdef'ed out as the
semantics around object layout and inheritance are different on MSVC abi
and the struct IteratorRecord ends up being 40 bytes not 32.
Instead of returning internal generator results as ordinary JS::Objects
with properties, we now use GeneratorResult and CompletionCell which
both inherit from Cell directly and allow efficient access to state.
1.59x speedup on JetStream3/lazy-collections.js :^)
This also includes a stubbed Temporal.Duration.prototype.
Until we have re-implemented Temporal.PlainDate/ZonedDateTime, some of
Temporal.Duration.compare (and its invoked AOs) are left unimplemented.
Our Temporal implementation is woefully out of date. The spec has been
so vastly rewritten that it is unfortunately not practical to update our
implementation in-place. Even just removing Temporal objects that were
removed from the spec, or updating any of the simpler remaining objects,
has proven to be a mess in previous attempts.
So, this removes our Temporal implementation. AOs used by other specs
are left intact.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root
Instead, smuggle it in as a `void*` private data and let Javascript
aware code cast out that pointer to a VM&.
In order to make this split, rename JS::Cell to JS::CellImpl. Once we
have a LibGC, this will become GC::Cell. CellImpl then has no specific
knowledge of the VM& and Realm&. That knowledge is instead put into
JS::Cell, which inherits from CellImpl. JS::Cell is responsible for
JavaScript's realm initialization, as well as converting of the void*
private data to what it knows should be the VM&.
While this is used in the implementation of Runtime objects itself, Heap
seems like a more appropriate home. This will also help in factoring out
the GC implementation into it's own library as the heap explicitly has
knowledge of WeakContainer.
This changes the remaining uses of the following functions across LibJS:
- String::format() => String::formatted()
- dbg() => dbgln()
- printf() => out(), outln()
- fprintf() => warnln()
I also removed the relevant 'LogStream& operator<<' overloads as they're
not needed anymore.
The current implementation is not entirely correct yet. Two classes have
been added:
- TypedArrayConstructor, which the various typed array constructors now
inherit from. Calling or constructing this class (from JS, that is)
directly is not possible, we might want to move this abstract class
functionality to NativeFunction at a later point.
- TypedArrayPrototype, which the various typed array prototypes now have
as their own prototype. This will be the place where most of the
functionality is being shared.
Relevant parts from the spec:
22.2.1 The %TypedArray% Intrinsic Object
The %TypedArray% intrinsic object:
- is a constructor function object that all of the TypedArray
constructor objects inherit from.
- along with its corresponding prototype object, provides common
properties that are inherited by all TypedArray constructors and their
instances.
22.2.2 Properties of the %TypedArray% Intrinsic Object
The %TypedArray% intrinsic object:
- has a [[Prototype]] internal slot whose value is %Function.prototype%.
22.2.2.3 %TypedArray%.prototype
The initial value of %TypedArray%.prototype is the %TypedArray%
prototype object.
22.2.6 Properties of the TypedArray Constructors
Each TypedArray constructor:
- has a [[Prototype]] internal slot whose value is %TypedArray%.
22.2.6.2 TypedArray.prototype
The initial value of TypedArray.prototype is the corresponding
TypedArray prototype intrinsic object (22.2.7).
22.2.7 Properties of the TypedArray Prototype Objects
Each TypedArray prototype object:
- has a [[Prototype]] internal slot whose value is %TypedArray.prototype%.
22.2.7.2 TypedArray.prototype.constructor
The initial value of a TypedArray.prototype.constructor is the
corresponding %TypedArray% intrinsic object.
This patch adds six of the standard type arrays and tries to share as
much code as possible:
- Uint8Array
- Uint16Array
- Uint32Array
- Int8Array
- Int16Array
- Int32Array
Proxy is an "exotic object" and doesn't have its own prototype. Use the
regular object prototype instead, but most stuff is happening on the
target object anyway. :^)
with statements evaluate an expression and put the result of it at the
"front" of the scope chain. This is implemented by creating a WithScope
object and placing it in front of the VM's current call frame's scope.