1.25x speed-up on this microbenchmark:
let o = { get x() { return 1; } };
for (let i = 0; i < 10_000_000; ++i)
o.x;
I looked into this because I noticed getter invocation when profiling
long-running WPT tests. We already had the mechanism for non-getter
properties, and the change to support getters turned out to be trivial.
(cherry picked from commit 3c5819a6d27883907237bf8137fd4dc24ed04e72)
This to avoid clashing with the GCC typeof extension, which apparently
confuses clang-format.
(cherry picked from commit 14beda00c9e823dd34da74e7d8fdf46aa57e845c)
The typeof operator has a very small set of possible resulting strings,
so let's make it much faster by caching those strings on the VM.
~8x speed-up on this microbenchmark:
for (let i = 0; i < 10_000_000; ++i) {
typeof i;
}
(cherry picked from commit d0b11af3876a64e6b254b5fc3f474d9bbe552024)
We now defer looking up the various identifiers by IdentifierTableIndex
until the last moment. This allows us to avoid the retrieval in common
cases like when a property access is cached.
Knocks a ~12% item off the profile on https://ventrella.com/Clusters/
(cherry picked from commit 509c10d14db0f2e2ce2b89f307d9dd360b855eb5,
amended to mark the two `throw_null_or_undefined_property_get` functions
static, as -Wmissing-declarations pointed out on CI)
Now that the Interpreter is the only user of these functions, we might
as well keep them in Interpreter.cpp which makes CLion less confused.
(cherry picked from commit ae0cfe4f2d260bfd804364c17f28494e3e8964c9)
These can use an EnvironmentCoordinate for caching, just like normal
binding lookups. Saves a bunch of time for repeated typeof checks.
(cherry picked from commit 60a05ef414f804fe0c44d20fe13c795800826368)
Instead of displaying locals as "locN", we now show them as "name~N".
This makes it a lot easier to follow bytecode dumps, especially in
longer functions.
Note that we keep displaying the local index, to avoid confusion in case
there are multiple separate locals with the same name in one executable.
(cherry picked from commit 0aa8cb7dac60c88eac3bb7674e3fe575cf1da60b)
When resuming execution of a suspended function, we must not overwrite
any cached `this` value with something from the ExecutionContext.
This was causing an empty JS::Value to leak into the VM when resuming
an async arrow function, since the "this mode" for such functions is
lexical and thus ExecutionContext will have an empty `this`.
It became a problem due to the bytecode optimization where we allow
ourselves to assume that `this` remains cached after we've executed a
ResolveThisBinding in this (or the first) basic block of the executable.
Fixes https://github.com/LadybirdBrowser/ladybird/issues/138
(cherry picked from commit a91bb72dabe54849657720e36e19a3d3b737055f)
Functions that don't have a FunctionEnvironment will get their `this`
value from the ExecutionContext. This patch stops generating
ResolveThisBinding instructions at all for functions like that, and
instead pre-populates the `this` register when entering a new bytecode
executable.
We already have a dedicated register slot for `this`, so instead of
having ResolveThisBinding take a `dst` operand, just write the value
directly into the `this` register every time.
Allows to skip function environment allocation for non-arrow functions
if the only reason it is needed is to hold `this` binding.
The parser is changed to do following:
- If a function is an arrow function and uses `this` then all functions
in a scope chain are marked to allocate function environment for
`this` binding.
- If a function uses `new.target` then all functions in a scope chain
are marked to allocate function environment.
`ordinary_call_bind_this()` is changed to put `this` value in execution
context when function environment allocation is skipped.
35% improvement in Octane/typescript.js
50% improvement in Octane/deltablue.js
19% improvement in Octane/raytrace.js
This allows us to skip allocating a function environment in cases where
it was previously impossible because the arguments object needed a
binding.
This change does not bring visible improvement in Kraken or Octane
benchmarks but seems useful to have anyway.
By doing this, we can remove all the special-case checks for `length`
from the generic GetById and GetByIdWithThis code paths, making every
other property lookup a bit faster.
Small progressions on most benchmarks, and some larger progressions:
- 12% on Octane/crypto.js
- 6% on Kraken/ai-astar.js
It wasn't safe to use addition_would_overflow(a, -b) to check if
subtraction (a - b) would overflow, since it doesn't cover this case.
I don't know why we didn't have subtraction_would_overflow(), so this
patch adds it. :^)
With this only `ContinuePendingUnwind` needs to dynamically check if a
scheduled return needs to go through a `finally` block, making the
interpreter loop a bit nicer
Instead of SetVariable having 2x2 modes for variable/lexical and
initialize/set, those 4 modes are now separate instructions, which
makes each instruction much less branchy.
This means that SetVariable instructions will now remember which
(relative) environment contains the targeted binding, letting it bypass
the full binding resolution machinery on subsequent accesses.
Merging registers, constants and locals into single vector means:
- Better data locality
- No need to check type in Interpreter::get() and Interpreter::set()
which are very hot functions
Performance improvement is visible in almost all Octane and Kraken
tests.
When comparing two numbers, we can avoid a lot of implicit type
conversion nonsense and go straight to comparison, saving time in the
most common case.
These were out-of-line because we had some ideas about marking
instruction streams PROT_READ only, but that seems pretty arbitrary and
there's a lot of performance to be gained by putting these inline.
Instead, generate bytecode to execute their AST nodes and save the
resulting operands inside the NewClass instruction.
Moving property expression evaluation to happen before NewClass
execution also moves along creation of new private environment and
its population with private members (private members should be visible
during property evaluation).
Before:
- NewClass
After:
- CreatePrivateEnvironment
- AddPrivateName
- ...
- AddPrivateName
- NewClass
- LeavePrivateEnvironment
If the minimal amount of required bindings is known in advance, it could
be used to ensure capacity to avoid resizing the internal vector that
holds bindings.
By doing that all instructions required for instantiation are emitted
once in compilation and then reused for subsequent calls, instead of
running generic instantiation process for each call.
We now fuse sequences like [LessThan, JumpIf] to JumpLessThan.
This is only allowed for temporaries (i.e VM registers) with no other
references to them.
This removes a layer of indirection in the bytecode where we had to make
sure all the initializer elements were laid out in sequential registers.
Array expressions no longer clobber registers permanently, and they can
be reused immediately afterwards.
If we don't have enough stack space, throw an exception while we still
can, and give the caller a chance to recover.
This particular problem will go away once we make calls non-recursive.
This means inlining all the things. This yields a 40% speedup on the for
loop microbenchmark, and everything else gets faster as well. :^)
This makes compilation take foreeeever with GCC, so I'm only enabling it
for Clang in this commit. We should figure out how to make GCC compile
this without timing out CI, since the speedup is amazing.
This commit converts the main loop in Bytecode::Interpreter to use a
label table and computed goto for fast instruction dispatch.
This yields roughly 35% speedup on the for loop microbenchmark,
and makes everything else faster as well. :^)
This commit adds a HANDLE_INSTRUCTION macro that expands to everything
needed to handle a single instruction (invoking the handler function,
checking for exceptions, and advancing the program counter).
This gives a ~15% speed-up on a for loop microbenchmark, and makes
basically everything faster.