The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
The metadata parser only handled inline flags (e.g. `flags: [noStrict]`)
but not the YAML block list format:
    flags:
      - noStrict
This caused ~130 test262 failures under test/staging/sm, because
test262-runner ran those tests in both strict and sloppy modes.
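
A minimal sketch of accepting both spellings. This is not the
test262-runner code: parse_flags() and trim() are hypothetical names, and
the line-based approach assumes flags are simple unquoted words.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strips ASCII whitespace from both ends.
static std::string trim(std::string const& s)
{
    auto begin = s.find_first_not_of(" \t\r");
    if (begin == std::string::npos)
        return {};
    auto end = s.find_last_not_of(" \t\r");
    return s.substr(begin, end - begin + 1);
}

// Collects flags from `flags: [a, b]` or from `- a` items that follow a
// bare `flags:` line.
std::vector<std::string> parse_flags(std::string const& metadata)
{
    std::vector<std::string> flags;
    std::istringstream stream(metadata);
    std::string line;
    bool in_block_list = false;
    while (std::getline(stream, line)) {
        auto trimmed = trim(line);
        if (trimmed.rfind("flags:", 0) == 0) {
            auto rest = trim(trimmed.substr(6));
            in_block_list = rest.empty();
            if (rest.size() >= 2 && rest.front() == '[' && rest.back() == ']') {
                std::istringstream items(rest.substr(1, rest.size() - 2));
                std::string item;
                while (std::getline(items, item, ',')) {
                    auto flag = trim(item);
                    if (!flag.empty())
                        flags.push_back(flag);
                }
            }
            continue;
        }
        if (in_block_list && trimmed.rfind("- ", 0) == 0) {
            flags.push_back(trim(trimmed.substr(2)));
            continue;
        }
        // Any other non-empty line ends the block list.
        if (!trimmed.empty())
            in_block_list = false;
    }
    return flags;
}
```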
The Rust parse_program() does not validate regex patterns or catch
all early errors during the parse phase alone -- those are caught
during compilation. The old C++ Parser did catch them during parsing.
Remove the fast parse-only path for negative tests so they go through
the full VM + compilation path, which correctly surfaces all errors.
The parse-only mode (--parse-only flag) is kept for explicit use.
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.
Rewrite SyntaxHighlighter.cpp and js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.
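
To illustrate the callback shape only, here is a sketch with a C++
stand-in for the Rust side. FFIToken, TokenCallback, tokenize_stub() and
collect_tokens() are hypothetical; the real rust_tokenize() signature and
token layout may differ, and the stub merely splits on whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical C-compatible token record.
struct FFIToken {
    size_t offset;
    size_t length;
};

// Hypothetical callback type: invoked once per token with an opaque context.
using TokenCallback = void (*)(void* context, FFIToken const* token);

// Stand-in for the tokenizer: splits on ASCII whitespace to show the call flow.
void tokenize_stub(char const* source, size_t length, void* context, TokenCallback callback)
{
    assert(callback);
    size_t i = 0;
    while (i < length) {
        while (i < length && std::isspace(static_cast<unsigned char>(source[i])))
            ++i;
        size_t start = i;
        while (i < length && !std::isspace(static_cast<unsigned char>(source[i])))
            ++i;
        if (i > start) {
            FFIToken token { start, i - start };
            callback(context, &token);
        }
    }
}

// Caller side: state travels through the context pointer, so the callback
// can be a captureless lambda that converts to a plain function pointer.
std::vector<std::string> collect_tokens(std::string const& source)
{
    struct Context {
        std::string const& source;
        std::vector<std::string> tokens;
    } context { source, {} };

    tokenize_stub(source.data(), source.size(), &context, [](void* raw, FFIToken const* token) {
        auto& ctx = *static_cast<Context*>(raw);
        ctx.tokens.push_back(ctx.source.substr(token->offset, token->length));
    });
    return context.tokens;
}
```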
Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
Delete Parser.cpp/h and ScopeCollector.cpp/h, now that all parsing
goes through the Rust pipeline.
Port test262-runner to use RustIntegration::parse_program() for its
fast parse-only check instead of the C++ Parser.
Add parsed_program_has_errors() and free_parsed_program() to the
RustIntegration public API for parse-only use cases.
Remove Bytecode::compile() and the old create() overloads on
ECMAScriptFunctionObject that accepted C++ AST nodes. These
have no remaining callers now that all compilation goes through
the Rust pipeline.
Also remove the if-constexpr Parse Node branch from
async_block_start, since the Statement template instantiation
was already removed.
Fix transitive include dependencies on Generator.h by adding
explicit includes for headers that were previously pulled in
transitively.
Now that the Rust pipeline is the sole compilation path, remove all
C++ parser/codegen fallback paths from the callers:
- Script::parse() no longer falls back to C++ Parser
- SourceTextModule::parse() no longer falls back to C++ Parser
- perform_eval() no longer falls back to C++ Parser + Generator
- create_dynamic_function() no longer falls back to C++ Parser
- ShadowRealm eval no longer falls back to C++ Parser + Generator
- Interpreter::run(Script&) no longer falls back to Generator
Also remove the now-dead old constructors that took C++ AST nodes,
the module_requests() helper, and AST dump code from js.cpp.
Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Now that initialize_environment() uses pre-computed data and
execute_module() caches its executable / uses TLA shared data,
we can drop the AST reference after it's no longer needed.
For TLA modules, the AST is dropped immediately after constructing
the SharedFunctionInstanceData (which takes its own ref). For non-TLA
modules, the AST is dropped after the first bytecode compilation.
Also remove the m_default_export field (replaced by the pre-computed
m_default_export_binding_name) and extract default export info in
parse() instead of the constructor.
After compiling the bytecode executable on first run, null out the
AST (m_parse_node) and clear AnnexB candidates since they are no
longer needed. This frees the memory held by the entire AST for the
script's lifetime.
The parse_node() accessor now returns a nullable pointer. Callers
(js.cpp for AST dumping, Interpreter for first compilation) access
the AST before it is dropped.
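
A simplified sketch of the lifetime change, using stand-in types rather
than the real LibJS classes. ASTNode, Executable and the byte-count
"compilation" are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

struct ASTNode { std::string source; };          // stand-in for the parsed program
struct Executable { size_t instruction_count; }; // stand-in for compiled bytecode

class Script {
public:
    explicit Script(std::string source)
        : m_parse_node(std::make_unique<ASTNode>(ASTNode { std::move(source) }))
    {
    }

    // Nullable: returns nullptr once the first compilation has dropped the AST.
    ASTNode const* parse_node() const { return m_parse_node.get(); }

    Executable& executable()
    {
        if (!m_executable) {
            assert(m_parse_node);
            // Pretend compilation: one "instruction" per source byte.
            m_executable = std::make_unique<Executable>(Executable { m_parse_node->source.size() });
            m_parse_node.reset(); // the AST is no longer needed
        }
        return *m_executable;
    }

private:
    std::unique_ptr<ASTNode> m_parse_node;
    std::unique_ptr<Executable> m_executable;
};
```

Anything that still wants the AST has to call parse_node() before the
first executable() call and handle a null result afterwards.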
This adds parsing of `(ref typeidx)` and validates that `typeidx` is a
valid index. Currently, nullability of the reference is lost.
A bug causing the code below to fail parsing has been fixed.
```wat
(module
(type $T (struct (field i32) (field f32)))
(type $T1 (struct (field i32) (field f32)))
(; many more types... ;)
(type $T64 (struct (field i32) (field f32)))
(type $f (func (result (ref null $T64))))
)
```
The spec tests type-equivalence.{0,1,3,13} have been disabled, as they
previously passed only as false positives.
Replace the old indentation-based AST dump with a new tree-drawing
approach using unicode box characters. Each node now also shows its
source position as @line:column, and additional internal state:
- Identifier: [argument:N] vs [variable:N], declaration kind
(var/let/const), [global], [in-eval-scope]
- FunctionNode: [strict], [arrow], [direct-eval], [uses-this],
[uses-this-from-environment], [might-need-arguments]
- Program: (script)/(module), [strict], [top-level-await]
- YieldExpression: [yield*] for delegation
Dump code is moved from AST.cpp into a new ASTDump.cpp file.
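
The drawing scheme can be sketched as follows. Node, dump() and
dump_tree() are hypothetical, and the labels are free-form strings here
rather than the exact format the dump emits.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Node {
    std::string label; // e.g. a node name plus @line:column and flags
    std::vector<Node> children;
};

// Prints one node, then recurses with a prefix that keeps a vertical bar
// running under every ancestor that still has siblings to print.
static void dump(Node const& node, std::string const& prefix, bool is_last, bool is_root, std::ostringstream& out)
{
    if (is_root)
        out << node.label << '\n';
    else
        out << prefix << (is_last ? "└─ " : "├─ ") << node.label << '\n';

    auto child_prefix = is_root ? prefix : prefix + (is_last ? "   " : "│  ");
    for (size_t i = 0; i < node.children.size(); ++i)
        dump(node.children[i], child_prefix, i + 1 == node.children.size(), false, out);
}

std::string dump_tree(Node const& root)
{
    std::ostringstream out;
    dump(root, "", true, true, out);
    return out.str();
}
```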
LibJS+DevTools: Implement console.trace() with source locations
- Add Console::TraceFrame struct with source location data
- Implement Console::trace() to gather stack information
- Add WebView::StackFrame and ConsoleTrace for IPC
- Implement DevToolsConsoleClient::printer() for traces
- Update FrameActor to format traces for DevTools
- Update WorkerDebugConsoleClient trace handling
- Update ReplConsoleClient to format trace output
This patch adds support for parsing structs in the type section.
It also removes the assumption that all types in the type section are
function types, adding appropriate validation.
Spec tests struct.3 and struct.4 have been disabled, as passing them
would require expanding `ValueType` to include more heap-types.
Instead of trying to indirectly load 2x64 bits from *cc, load addresses
directly from their own contiguous allocation.
This allows a future optimisation where we defer loading addresses to
reduce memory port pressure.
This ensures that we are explicitly declaring the allocator to use when
allocating a cell(-inheriting) type, instead of silently falling back
to size-based allocation.
Since the check lives in allocate_cell, it only fires for types that
are actually allocated. Types that are never allocated directly never
reach the check, so it is safe for them not to declare an allocator.
One example is the base TypedArray<T>: only its defined specializations
are ever allocated.
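
One way to get this "checked only when allocated" behaviour is a
static_assert inside the allocation template. The sketch below assumes a
hypothetical AllocatorOwner marker in place of the real allocator
declaration, and uses std::make_unique in place of the heap.

```cpp
#include <memory>
#include <type_traits>

struct Cell {
    virtual ~Cell() = default;
};

// True only if T names itself as AllocatorOwner. A marker inherited from a
// base class names the base, so it does not count.
template<typename T, typename = void>
struct DeclaresOwnAllocator : std::false_type { };
template<typename T>
struct DeclaresOwnAllocator<T, std::void_t<typename T::AllocatorOwner>>
    : std::is_same<typename T::AllocatorOwner, T> { };

template<typename T>
std::unique_ptr<T> allocate_cell()
{
    // Evaluated only for types that are instantiated here, i.e. allocated.
    static_assert(DeclaresOwnAllocator<T>::value, "cell type must declare its allocator");
    return std::make_unique<T>();
}

// Never allocated directly, so it needs no declaration.
template<typename T>
struct TypedArrayBase : Cell { };

struct Uint8Array : TypedArrayBase<unsigned char> {
    using AllocatorOwner = Uint8Array; // hypothetical stand-in for the declaration
};
```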
ENABLE_WINDOWS_CI and the *_CI presets were initially added back when
the AK library and all the AK Test* executables were the only targets
that supported building and running in CI. Since then, almost all
targets in the codebase have come to build on Windows, except for:
- LibLine
- test-262-runner
Since these targets above are not required to actually run or test the
browser on Windows in its current experimental state, fully disabling
them should be fine for now.
ENABLE_WINDOWS_CI was also used to exclude test-web from ctest. test-web
can be fully disabled on Windows for now, until proper runtime support
is added.
The remaining locations were all using ENABLE_WINDOWS_CI as a proxy for
ENABLE_ADDRESS_SANITIZER, so we can just be explicit instead.
The new presets map much more directly to the Unix Release, Debug, and
Sanitizer presets, which should make setting up Ladybird on Windows less
confusing.
We also make the new Windows_Experimental_Release preset the default in
ladybird.py to match Unix.
Replace the custom AK JSON parser with simdjson for parsing JSON in
LibJS. This eliminates the intermediate AK::JsonValue object graph,
going directly from JSON text to JS::Value.
simdjson's on-demand API parses at ~4GB/s and only materializes values
as they are accessed, making this both faster and more memory efficient
than the previous approach.
The AK JSON parser is still used elsewhere (WebDriver protocol, config
files, etc.) but LibJS now uses simdjson exclusively for JSON.parse()
and JSON.rawJSON().
For XHTML documents, resolve named character entities (e.g., &nbsp;)
using the HTML entity table via a getEntity SAX callback. This avoids
parsing a large embedded DTD on every document and matches the approach
used by Blink and WebKit.
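
The table lookup behind such a callback might look like the sketch below.
It shows only the name-to-code-point step, not the SAX wiring;
resolve_named_entity() and the three-entry table are illustrative
assumptions.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Tiny excerpt for illustration; the full HTML table is far larger.
static constexpr std::array<std::pair<std::string_view, char32_t>, 3> s_entities { {
    { "nbsp", 0x00A0 },
    { "copy", 0x00A9 },
    { "eacute", 0x00E9 },
} };

// Returns the code point for a named entity, or nullopt if the name is unknown.
std::optional<char32_t> resolve_named_entity(std::string_view name)
{
    for (auto const& [entity, code_point] : s_entities) {
        if (entity == name)
            return code_point;
    }
    return std::nullopt;
}
```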
This also removes the now-unused DTD infrastructure:
- Remove resolve_external_resource callback from Parser::Options
- Remove resolve_xml_resource() function and its ~60KB embedded DTD
- Remove all call sites passing the unused callback
In our process architecture, there's only ever one JS::VM per process.
This allows us to have a VM::the() singleton getter that optimizes
down to a single global access everywhere.
Seeing 1-2% speed-up on all JS benchmarks from this.
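
A minimal sketch of the singleton getter, assuming the VM registers
itself on construction. The stack_depth member is hypothetical and the
real class is far larger.

```cpp
#include <cassert>

class VM {
public:
    // Compiles down to a load of one global pointer.
    static VM& the()
    {
        assert(s_the);
        return *s_the;
    }

    VM()
    {
        assert(!s_the); // only one VM per process
        s_the = this;
    }
    ~VM() { s_the = nullptr; }

    int stack_depth { 0 }; // hypothetical state, for illustration only

private:
    static inline VM* s_the { nullptr };
};
```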
This moves the responsibility of setting up a SourceCode object to the
users of JS::Lexer.
This means Lexer and Parser are free to use string views into the
SourceCode internally while working.
It also means Lexer no longer has to think about anything other than
UTF-16 (or ASCII) inputs. So the unit test for parsing various invalid
UTF-8 sequences is deleted here.
This commit puts the strict mode flag in the header of every bytecode
instruction. This allows us to check for strict mode without looking at
the currently running execution context.
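
As a rough illustration, assuming a hypothetical header layout, a handler
can then decide strict-mode behaviour from the instruction alone:

```cpp
#include <cstdint>

enum class Opcode : uint8_t { PutById };

// Hypothetical header layout; real instructions carry more fields.
struct InstructionHeader {
    Opcode opcode;
    bool strict;
};

enum class PutOutcome { SilentlyIgnored, ThrowsTypeError };

// Writing to a non-writable property throws in strict mode and is ignored
// in sloppy mode. Only the instruction is consulted, not an execution context.
PutOutcome put_to_read_only_property(InstructionHeader const& header)
{
    return header.strict ? PutOutcome::ThrowsTypeError : PutOutcome::SilentlyIgnored;
}
```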
This new file in the root of the archives contains the git commit hash,
to be used by e.g. the js-benchmarks webhook to determine which commit
was used to build the utilities.
This can be done by passing
`--export-js <module>.<fn>[(<arg>:type, ...)][:type]=<source>`,
which uses a js function `(arg...) => source` to resolve the requested
import `module::fn`.
All literal wasm value types (i<n> and v128) are supported as both
parameter and return types.
This still passes the values on the stack, but registers are now allowed
to cross a call boundary.
This is a very significant (>50%) improvement on the small call
microbenchmarks on my machine.