Commit Graph

89 Commits

Author SHA1 Message Date
Andreas Kling
b2b72a1884 LibJS: Defer regex literal compilation to post-parse step
Move regex compilation out of the parsing hot path. Both the C++ and
Rust parsers now collect raw regex pattern+flags strings during parsing
and batch-compile them after parsing completes.

This is a prerequisite for moving the Rust parser to a background
thread, since LibRegex is thread-unsafe and FFI calls during parsing
prevent parallelization.

Flag validation remains in the parser since it's trivial string
checking with no LibRegex dependency.
2026-03-06 13:06:05 +01:00
Andreas Kling
8bf1d749a1 LibJS: Suppress global identifier optimization for dynamic functions
Functions created via new Function() cannot assume that unresolved
identifiers refer to global variables, since they may be called in
an arbitrary scope. Pass a flag through the scope collector analysis
to suppress the global identifier optimization in this case.
2026-02-24 09:39:42 +01:00
Andreas Kling
51639fd555 LibJS: Move labels_in_scope out of ParserState
Label scoping is already managed manually at function and arrow
function boundaries via move + ScopeGuard, and speculative parse
paths never modify labels before their potential failure points.

Move the HashMap to a direct Parser member so it is no longer
copied on every save_state()/load_state() cycle.
2026-02-10 02:05:20 +01:00
Andreas Kling
fa8f9a6b2c LibJS: Remove invalid_property_range HashMap from ParserState
Replace the HashMap<size_t, Position> used to communicate invalid
object literal property ranges from parse_object_expression() to
parse_expression() with a field on PrimaryExpressionParseResult.

The range now flows through the parser's return values instead of
being stashed in persistent ParserState, eliminating a HashMap that
was never cleared and got bulk-copied on every save_state() call
during speculative parsing.
2026-02-10 02:05:20 +01:00
Andreas Kling
a8a1aba3ba LibJS: Replace ScopePusher with ScopeCollector
Replace the ScopePusher RAII class (which performed scope analysis
in its destructor chain during parsing) with a two-phase approach:

1. ScopeCollector builds a tree of ScopeRecord nodes during parsing
   via RAII ScopeHandle objects. It records declarations, identifier
   references, and flags, but does not resolve anything.

2. After parsing completes, ScopeCollector::analyze() walks the tree
   bottom-up and performs all resolution: propagate eval/with
   poisoning, resolve identifiers to locals/globals/arguments, hoist
   functions (Annex B.3.3), and build FunctionScopeData.

Key design decisions:
- ScopeRecord::ast_node is a RefPtr<ScopeNode> to prevent
  use-after-free when synthesize_binding_pattern re-parses an
  expression as a binding pattern (the original parse's scope records
  survive with stale AST node pointers).
- Parser::scope_collector() returns the override collector if set
  (for synthesize_binding_pattern's nested parser), ensuring all
  scope operations route to the outer parser's scope tree.
- FunctionNode::local_variables_names() delegates to its body's
  ScopeNode rather than copying at parse time, since analysis runs
  after parsing.
2026-02-10 02:05:20 +01:00
Andreas Kling
fa44fd58d8 LibJS: Remove ParserState::lookahead_lexer
The lookahead lexer used by next_token() no longer needs to be kept
alive, since tokens created by Parser::next_token() now have any string
views guaranteed safe by the fact that they point into the one true
SourceCode provided by whoever set up the lexer.
2025-11-09 12:14:03 +01:00
Andreas Kling
841fe0b51c LibJS: Don't store current token in both Lexer and Parser
Just give Parser a way to access the one stored in Lexer.
2025-11-09 12:14:03 +01:00
Timothy Flynn
b955c9b2a9 LibJS: Port the Identifier AST (and related) nodes to UTF-16
This eliminates quite a lot of UTF-8 / UTF-16 churn.
2025-08-13 09:56:13 -04:00
Timothy Flynn
00182a2405 LibJS: Port the JS lexer and parser to UTF-16
This ports the lexer to UTF-16 and deals with the immediate fallout up
to the AST. The AST will be dealt with in upcoming commits.

The lexer will still accept UTF-8 strings as input, and will transcode
them to UTF-16 for lexing. This doesn't actually incur a new allocation,
as we were already converting the input StringView to a ByteString for
each lexer.

One immediate logical benefit here is that we do not need to know off-
hand how many UTF-8 bytes some special code points occupy. They all
happen to be a single UTF-16 code unit. So instead of advancing the
lexer by 3 positions in some cases, we can just always advance by 1.
2025-08-13 09:56:13 -04:00
Timothy Flynn
eb74781a2d LibJS: Keep the lookahead lexer alive after parsing its next token
Currently, the lexer holds a ByteString, which is always heap-allocated.
When we create a copy of the lexer for the lookahead token, that token
will outlive the lexer copy. The token holds a couple of string views
into the lexer's source string. This is fine for now, because the source
string will be kept alive by the original lexer.

But if the lexer were to hold a String or Utf16String, short strings
will be stored on the stack due to SSO. Thus the token will hold views
into released stack data. We need to keep the lookahead lexer alive to
prevent UAF on views into its source string.
2025-08-13 09:56:13 -04:00
ayeteadoe
2e2484257d LibJS: Enable EXPLICIT_SYMBOL_EXPORT and annotate minimum symbol set 2025-07-22 11:51:29 -04:00
ayeteadoe
539a675802 LibJS: Revert Enable EXPLICIT_SYMBOL_EXPORT
This reverts commit c14173f651. We
should only annotate the minimum number of symbols that external
consumers actually use, so I am starting from scratch to do that
2025-07-22 11:51:29 -04:00
Luke Wilde
3d43462ccd LibJS: Implement the Dynamic Code Brand Checks stage 3 proposal
This is an active proposal at stage 3 of the TC39 proposal process.
See: https://tc39.es/proposal-dynamic-code-brand-checks/
See: https://github.com/tc39/proposal-dynamic-code-brand-checks

This proposal essentially adds support for the TrustedScript type from
the Trusted Types specification to eval and Function. This in turn
pipes support for the type into the CSP hook to check if the CSP allows
dynamic code compilation.

However, it currently doesn't support ShadowRealms, so the
implementation here is a close approximation, using PerformEval as the
basis.
See: https://github.com/tc39/proposal-dynamic-code-brand-checks/issues/19

This is required to support the new function signature for the CSP
hook, and will allow us to slot in Trusted Types support in the future.
2025-07-09 15:52:54 -06:00
ayeteadoe
c14173f651 LibJS: Enable EXPLICIT_SYMBOL_EXPORT 2025-06-30 10:50:36 -06:00
Timothy Flynn
7280ed6312 Meta: Enforce newlines around namespaces
This has come up several times during code review, so let's just enforce
it using a new clang-format 20 option.
2025-05-14 02:01:59 -06:00
Aliaksandr Kalenik
7932091e02 LibJS: Allow using local variable for catch parameters
Local variables are faster to access and if all catch parameters are
locals we can skip lexical environment allocation.
2025-04-22 21:57:25 +02:00
Andreas Kling
ef4e7b7945 LibJS: Make JS parser emit accurate this insights for constructors
This way we don't have to handle it when instantiating the constructor.
2025-04-08 18:52:35 +02:00
Andreas Kling
6c70dc5f09 LibJS: Create FunctionParameters earlier in the parser
This avoids making multiple copies of the Vector<FunctionParameter> in
the parser.
2025-03-27 19:50:13 +00:00
Andreas Kling
f1914893e9 LibJS+LibWeb: Remove more uses of DeprecatedFlyString 2025-03-24 22:27:17 +00:00
Andreas Kling
46a5710238 LibJS: Use FlyString in PropertyKey instead of DeprecatedFlyString
This required dealing with *substantial* fallout.
2025-03-24 22:27:17 +00:00
Timothy Flynn
b64a355a30 LibJS: Remove support for the "assert" keyword for import attributes
This was removed from the spec some time ago. See:
https://github.com/tc39/proposal-import-attributes/commit/14286bb
2025-01-21 14:58:32 +01:00
Shannon Booth
f87041bf3a LibGC+Everywhere: Factor out a LibGC from LibJS
Resulting in a massive rename across almost everywhere! Alongside the
namespace change, we now have the following names:

 * JS::NonnullGCPtr -> GC::Ref
 * JS::GCPtr -> GC::Ptr
 * JS::HeapFunction -> GC::Function
 * JS::CellImpl -> GC::Cell
 * JS::Handle -> GC::Root
2024-11-15 14:49:20 +01:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
AnotherTest
8ca0e8325a LibJS: Don't save rule start positions along with the parser state
This fixes #4617.
Also fixes the small problem where some save states would be leaked.
2020-12-29 17:39:42 +01:00
AnotherTest
d0363bca01 LibJS: `save_state()' before creating a RulePosition
Fixes #4617.
2020-12-29 10:51:33 +01:00
AnotherTest
b34b681811 LibJS: Track source positions all the way down to exceptions
This makes exceptions have a trace of source positions too, which could
probably be helpful in making fancier error tracebacks.
2020-12-29 00:58:43 +01:00
Linus Groh
abd49c174a LibJS: Include source location hint in Parser::print_errors() 2020-12-06 18:52:52 +01:00
Andreas Kling
d617120499 LibJS: Parse "with" statements :^) 2020-11-28 17:16:48 +01:00
Linus Groh
39a1c9d827 LibJS: Implement 'new.target'
This adds a new MetaProperty AST node which will be used for
'new.target' and 'import.meta' meta properties. The parser now
distinguishes between "in function context" and "in arrow function
context" (which is required for this).
When encountering TokenType::New we will attempt to parse it as meta
property and resort to regular new expression parsing if that fails,
much like the parsing of labelled statements.
2020-11-02 22:40:59 +01:00
Linus Groh
e07a39c816 LibJS: Replace 'size_t line, size_t column' with 'Optional<Position>'
This is a bit nicer for two reasons:

- The absence of line number/column information isn't based on 'values
  are zero' anymore but on Optional's value
- When reporting syntax errors with position information other than the
  current token's position we had to store line and column ourselves,
  like this:

      auto foo_start_line = m_parser_state.m_current_token.line_number();
      auto foo_start_column = m_parser_state.m_current_token.line_column();
      ...
      syntax_error("...", foo_start_line, foo_start_column);

  Which now becomes:

      auto foo_start= position();
      ...
      syntax_error("...", foo_start);

  This makes it easier to report correct positions for syntax errors
  that only emerge a few tokens later :^)
2020-11-02 22:40:59 +01:00
Linus Groh
9e80c67608 LibJS: Fix "use strict" directive false positives
By having the "is this a use strict directive?" logic in
parse_string_literal() we would apply it to *any* string literal, which
is incorrect and would lead to false positives - e.g.:

    "use strict" + 1
    `"use strict"`
    "\123"; ({"use strict": ...})

Relevant part from the spec which is now implemented properly:

[...] and where each ExpressionStatement in the sequence consists
entirely of a StringLiteral token [...]

I also got rid of UseStrictDirectiveState which is not needed anymore.

Fixes #3903.
2020-11-02 13:13:54 +01:00
Linus Groh
563d3c8055 LibJS: Require initializer for 'const' variable declaration 2020-10-30 23:43:38 +01:00
Linus Groh
dca9e4ec10 LibJS: Implement rules for duplicate function parameters
- A regular function can have duplicate parameters except in strict mode
  or if its parameter list is not "simple" (has a default or rest
  parameter)
- An arrow function can never have duplicate parameters

Compared to other engines I opted for more useful syntax error messages
than a generic "duplicate parameter name not allowed in this context":

    "use strict"; function test(foo, foo) {}
                                     ^
    Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in strict mode (line: 1, column: 34)

    function test(foo, foo = 1) {}
                       ^
    Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in function with default parameter (line: 1, column: 20)

    function test(foo, ...foo) {}
                          ^
    Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in function with rest parameter (line: 1, column: 23)

    (foo, foo) => {}
          ^
    Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in arrow function (line: 1, column: 7)
2020-10-25 12:56:02 +01:00
Linus Groh
4fb96afafc LibJS: Support LegacyOctalEscapeSequence in string literals
https://tc39.es/ecma262/#sec-additional-syntax-string-literals

The syntax and semantics of 11.8.4 is extended as follows except that
this extension is not allowed for strict mode code:

Syntax

    EscapeSequence::
        CharacterEscapeSequence
        LegacyOctalEscapeSequence
        NonOctalDecimalEscapeSequence
        HexEscapeSequence
        UnicodeEscapeSequence

    LegacyOctalEscapeSequence::
        OctalDigit [lookahead ∉ OctalDigit]
        ZeroToThree OctalDigit [lookahead ∉ OctalDigit]
        FourToSeven OctalDigit
        ZeroToThree OctalDigit OctalDigit

    ZeroToThree :: one of
        0 1 2 3

    FourToSeven :: one of
        4 5 6 7

    NonOctalDecimalEscapeSequence :: one of
        8 9

This definition of EscapeSequence is not used in strict mode or when
parsing TemplateCharacter.

Note

It is possible for string literals to precede a Use Strict Directive
that places the enclosing code in strict mode, and implementations must
take care to not use this extended definition of EscapeSequence with
such literals. For example, attempting to parse the following source
text must fail:

function invalid() { "\7"; "use strict"; }
2020-10-24 16:34:01 +02:00
Linus Groh
80bb62b9cc LibJS: Distinguish between statement and declaration
This separates matching/parsing of statements and declarations and
fixes a few edge cases where the parser would incorrectly accept a
declaration where only a statement is allowed - for example:

    if (foo) const a = 1;
    for (var bar;;) function b() {}
    while (baz) class c {}
2020-10-23 19:13:06 +02:00
Linus Groh
15642874f3 LibJS: Support all line terminators (LF, CR, LS, PS)
https://tc39.es/ecma262/#sec-line-terminators
2020-10-22 10:06:30 +02:00
Linus Groh
6331d45a6f LibJS: Move checks for invalid getter/setter params to parse_function_node
This allows us to provide better error messages as we can point the
syntax error location to the exact first invalid parameter instead of
always the end of the function within a object literal or class
definition.

Before this change:

    const Foo = { set bar() {} }
                               ^
    Uncaught exception: [SyntaxError]: Object setter property must have one argument (line: 1, column: 28)

    class Foo { set bar() {} }
                             ^
    Uncaught exception: [SyntaxError]: Class setter method must have one argument (line: 1, column: 26)

After this change:

    const Foo = { set bar() {} }
                          ^
    Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 23)

    class Foo { set bar() {} }
                        ^
    Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 21)

The only possible downside of this change is that class getters/setters
and functions in objects are not distinguished in the message anymore -
I don't think that's important though, and classes are (mostly) just
syntactic sugar anyway.
2020-10-20 20:27:58 +02:00
Linus Groh
db75be1119 LibJS: Refactor parse_function_node() bool parameters into bit flags
I'm about to add even more options and a bunch of unnamed true/false
arguments is really not helpful. Let's make this a single parse options
parameter using bit flags.
2020-10-20 20:27:58 +02:00
Linus Groh
46cc1f718e LibJS: Unprefixed octal numbers are a syntax error in strict mode 2020-10-19 20:08:22 +02:00
Linus Groh
965d952ff3 LibJS: Share parameter parsing between regular and arrow functions
This simplifies try_parse_arrow_function_expression() and fixes a few
cases that should not produce an arrow function AST but did:

    (a,,) => {}
    (a b) => {}
    (a ...b) => {}
    (...b a) => {}

The new parsing logic checks whether parens are expected and uses
parse_function_parameters() if so, rolling back if a new syntax error
occurs during that. Otherwise it's just an identifier in which case we
parse the single parameter ourselves.
2020-10-19 11:31:55 +02:00
Matthew Olsson
e8da5f99b1 LibJS: break or continue with nonexistent label is a syntax error 2020-10-08 23:27:16 +02:00
Matthew Olsson
e49ea1b520 LibJS: Disallow 'continue' & 'break' outside of their respective scopes
'continue' is no longer allowed outside of a loop, and an unlabeled
'break' is not longer allowed outside of a loop or switch statement.
Labeled 'break' statements are still allowed everywhere, even if the
label does not exist.
2020-10-08 10:20:49 +02:00
Matthew Olsson
9a82c22a85 LibJS: Disallow 'return' outside of a function 2020-10-08 10:03:21 +02:00
Linus Groh
283ee678f7 LibJS: Validate all assignment expressions, not just "="
The check for invalid lhs and assignment to eval/arguments in strict
mode should happen for all kinds of assignment expressions, not just
AssignmentOp::Assignment.
2020-10-05 09:25:04 +02:00
Linus Groh
bc701658f8 LibJS: Use String::formatted() for parser error messages 2020-10-04 19:22:02 +02:00
Matthew Olsson
6eb6752c4c LibJS: Strict mode is now handled by Functions and Programs, not Blocks
Since blocks can't be strict by themselves, it makes no sense for them
to store whether or not they are strict. Strict-ness is now stored in
the Program and FunctionNode ASTNodes. Fixes issue #3641
2020-10-04 10:46:12 +02:00
Muhammad Zahalqa
5a2ec86048 LibJS: Parser refactored to use constexpr precedence table
Replaced implementation dependent on HashMap with a constexpr PrecedenceTable
based on array lookup.
2020-08-21 16:14:14 +02:00
Jack Karamanian
7533fd8b02 LibJS: Initial class implementation; allow super expressions in object
literal methods; add EnvrionmentRecord fields and methods to
LexicalEnvironment

Adding EnvrionmentRecord's fields and methods lets us throw an exception
when |this| is not initialized, which occurs when the super constructor
in a derived class has not yet been called, or when |this| has already
been initialized (the super constructor was already called).
2020-06-29 17:54:54 +02:00
Matthew Olsson
61ac1d3ffa LibJS: Lex and parse regex literals, add RegExp objects
This adds regex parsing/lexing, as well as a relatively empty
RegExpObject. The purpose of this patch is to allow the engine to not
get hung up on parsing regexes. This will aid in finding new syntax
errors (say, from google or twitter) without having to replace all of
their regexes first!
2020-06-07 19:06:55 +02:00