The bytecode interpreter only needed the running execution context,
but still threaded a separate Interpreter object through both the C++
and asm entry points. Move that state and the bytecode execution
helpers onto VM instead, and teach the asm generator and slow paths to
use VM directly.
Delete Lexer.cpp/h and Token.cpp, replacing all tokenization with a
new rust_tokenize() FFI function that calls back for each token.
Rewrite SyntaxHighlighter.cpp and js.cpp REPL to use the Rust
tokenizer. The token type and category enums in Token.h now mirror
the Rust definitions in token.rs.
Move is_syntax_character/is_whitespace/is_line_terminator helpers
into RegExpConstructor.cpp as static functions, since they were only
used there.
Delete Parser.cpp/h and ScopeCollector.cpp/h, now that all parsing
goes through the Rust pipeline.
Port test262-runner to use RustIntegration::parse_program() for its
fast parse-only check instead of the C++ Parser.
Add parsed_program_has_errors() and free_parsed_program() to the
RustIntegration public API for parse-only use cases.
Remove Bytecode::compile() and the old create() overloads on
ECMAScriptFunctionObject that accepted C++ AST nodes. These
have no remaining callers now that all compilation goes through
the Rust pipeline.
Also remove the if-constexpr Parse Node branch from
async_block_start, since the Statement template instantiation
was already removed.
Fix transitive include dependencies on Generator.h by adding
explicit includes for headers that were previously pulled in
transitively.
Now that the Rust pipeline is the sole compilation path, remove all
C++ parser/codegen fallback paths from the callers:
- Script::parse() no longer falls back to C++ Parser
- SourceTextModule::parse() no longer falls back to C++ Parser
- perform_eval() no longer falls back to C++ Parser + Generator
- create_dynamic_function() no longer falls back to C++ Parser
- ShadowRealm eval no longer falls back to C++ Parser + Generator
- Interpreter::run(Script&) no longer falls back to Generator
Also remove the now-dead old constructors that took C++ AST nodes,
the module_requests() helper, and AST dump code from js.cpp.
Implement a complete Rust reimplementation of the LibJS frontend:
lexer, parser, AST, scope collector, and bytecode code generator.
The Rust pipeline is built via Corrosion (CMake-Cargo bridge) and
linked into LibJS as a static library. It is gated behind a build
flag (ENABLE_RUST, on by default except on Windows) and two runtime
environment variables:
- LIBJS_CPP: Use the C++ pipeline instead of Rust
- LIBJS_COMPARE_PIPELINES=1: Run both pipelines in lockstep,
aborting on any difference in AST or bytecode generated.
The C++ side communicates with Rust through a C FFI layer
(RustIntegration.cpp/h) that passes source text to Rust and receives
a populated Executable back via a BytecodeFactory interface.
Now that initialize_environment() uses pre-computed data and
execute_module() caches its executable / uses TLA shared data,
we can drop the AST reference after it's no longer needed.
For TLA modules, the AST is dropped immediately after constructing
the SharedFunctionInstanceData (which takes its own ref). For non-TLA
modules, the AST is dropped after the first bytecode compilation.
Also remove the m_default_export field (replaced by the pre-computed
m_default_export_binding_name) and extract default export info in
parse() instead of the constructor.
After compiling the bytecode executable on first run, null out the
AST (m_parse_node) and clear AnnexB candidates since they are no
longer needed. This frees the memory held by the entire AST for the
script's lifetime.
The parse_node() accessor now returns a nullable pointer. Callers
(js.cpp for AST dumping, Interpreter for first compilation) access
the AST before it is dropped.
Replace the old indentation-based AST dump with a new tree-drawing
approach using unicode box characters. Each node now also shows its
source position as @line:column, and additional internal state:
- Identifier: [argument:N] vs [variable:N], declaration kind
(var/let/const), [global], [in-eval-scope]
- FunctionNode: [strict], [arrow], [direct-eval], [uses-this],
[uses-this-from-environment], [might-need-arguments]
- Program: (script)/(module), [strict], [top-level-await]
- YieldExpression: [yield*] for delegation
Dump code is moved from AST.cpp into a new ASTDump.cpp file.
LibJS+DevTools: Implement console.trace() with source locations
- Add Console::TraceFrame struct with source location data
- Implement Console::trace() to gather stack information
- Add WebView::StackFrame and ConsoleTrace for IPC
- Implement DevToolsConsoleClient::printer() for traces
- Update FrameActor to format traces for DevTools
- Update WorkerDebugConsoleClient trace handling
- Update ReplConsoleClient to format trace output
This ensures that we are explicitly declaring the allocator to use when
allocating a cell(-inheriting) type, instead of silently falling back
to size-based allocation.
Since this is done in allocate_cell, this will only be detected for
types that are actively being allocated. However, since that means
they're _not_ being allocated, that means it's safe to not declare
an allocator to use for those. For example, the base TypedArray<T>,
which is never directly allocated and only the defined specializations
are ever allocated.
Replace the custom AK JSON parser with simdjson for parsing JSON in
LibJS. This eliminates the intermediate AK::JsonValue object graph,
going directly from JSON text to JS::Value.
simdjson's on-demand API parses at ~4GB/s and only materializes values
as they are accessed, making this both faster and more memory efficient
than the previous approach.
The AK JSON parser is still used elsewhere (WebDriver protocol, config
files, etc.) but LibJS now uses simdjson exclusively for JSON.parse()
and JSON.rawJSON().
This moves the responsibility of setting up a SourceCode object to the
users of JS::Lexer.
This means Lexer and Parser are free to use string views into the
SourceCode internally while working.
It also means Lexer no longer has to think about anything other than
UTF-16 (or ASCII) inputs. So the unit test for parsing various invalid
UTF-8 sequences is deleted here.
This commits puts the strict mode flag in the header of every bytecode
instruction. This allows us to check for strict mode without looking at
the currently running execution context.
This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.
The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.
One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
This has quite a lot of fall out. But the majority of it is just type or
UDL substitution, where the changes just fall through to other function
calls.
By changing property key storage to UTF-16, the main affected areas are:
* NativeFunction names must now be UTF-16
* Bytecode identifiers must now be UTF-16
* Module/binding names must now be UTF-16
Since it does more than removing the quotes by escaping the string too
It makes sense to change the name of the flag to something more close
to what it's really doing.
This commit allows building js.cpp on windows. The repl functionality is
ifdef'ed out. To decrease the number of ifdefs the code that runs the
repl on Linux was moved into one place. Some globals that are unused as
a result of that are markes maybe_unused. The following commit enables
building and testing js in cmake for windows.
The special empty value (that we use for array holes, Optional<Value>
when empty and a few other other placeholder/sentinel tasks) still
exists, but you now create one via JS::js_special_empty_value() and
check for it with Value::is_special_empty_value().
The main idea here is to make it very unlikely to accidentally create an
unexpected special empty value.
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:
* JS::NonnullGCPtr -> GC::Ref
* JS::GCPtr -> GC::Ptr
* JS::HeapFunction -> GC::Function
* JS::CellImpl -> GC::Cell
* JS::Handle -> GC::Root