We’ve analysed the runtime performance of Servo and Chromium when loading four websites, using the versions below:
1. servoshell d3d6a22d27df5095c3342249d0eea0bce153cbe1 (23 September)
2. servoshell f8933a57353aeca14a6cbc60b3cb0cf98cab6c5d (6 October)
3. Google Chrome 128.0.6613.113 (Official Build)
We found the newer version of Servo (engine 2) to be significantly faster than the older version (engine 1), thanks to key performance improvements in our font stack (#33530, #33600, #33638). Taking the average change across the pages under test, First Contentful Paint times are 20% lower, and overall rendering times are 29% lower.
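As a sanity check (not part of our tooling), these averages can be re-derived from the best-case (minimum) times quoted in the Results section:

```python
# Best (minimum) times in ms from the Results section, in page order:
# servo.org, www.amazon.com, zh.wikipedia.org, www.baidu.com.
fcp_engine_1 = [481.0, 803.5, 877.3, 774.0]
fcp_engine_2 = [403.5, 885.2, 305.6, 703.5]
renderer_engine_1 = [294.7, 2680.0, 850.4, 367.6]
renderer_engine_2 = [197.7, 2648.0, 316.0, 288.7]

def average_change(old, new):
    """Mean of the per-page relative changes, as a percentage."""
    return 100 * sum((n - o) / o for o, n in zip(old, new)) / len(old)

fcp_change = average_change(fcp_engine_1, fcp_engine_2)                  # about -20%
renderer_change = average_change(renderer_engine_1, renderer_engine_2)   # about -29.6%
```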
We found competitive results for the newer version of Servo (engine 2) in First Contentful Paint, outperforming Chromium (engine 3) in three of the four pages under test (35% faster, 68% slower, 49% faster, and 15% faster, respectively).
The same version of Servo (engine 2) performs less favourably in overall rendering time, outperforming Chromium (engine 3) in only one of the four pages under test (102% slower, 87% slower, 11% faster, and 88% slower, respectively).
All of the code and data used to write this report is in the data file, including:
- analyse.sh and perf-analysis-tools — our tooling
- */summary.txt — summary results (generated by our tooling)
- Data for each engine under test:
  - *.servo.1 — data for the older version of Servo (engine 1)
  - *.servo.2 — data for the newer version of Servo (engine 2)
  - *.chromium — data for Chromium (engine 3)
  - */*.html — Servo HTML traces
  - */*.pftrace — Chrome tracing files
- Data for each page under test:
  - servo.org.* — data for https://servo.org/
  - www.amazon.com.* — data for https://www.amazon.com/dp/B07S9XZYN2
  - zh.wikipedia.org.* — data for https://zh.wikipedia.org/wiki/Servo
  - www.baidu.com.* — data for https://www.baidu.com/
Methodology
We use the built-in profilers in Servo (HTML traces) and Chromium (Chrome tracing) to learn how much wall time is spent in two key areas:
- Rendering phases: parsing, script, style, layout, paint, rasterise
- User-facing paint metrics: First Paint (FP), First Contentful Paint (FCP)
Servo also has support for Time To Interactive (TTI), but not Largest Contentful Paint (LCP), the newer metric now preferred by Chromium.
We quote best results (minimum times in this case), in the absence of more suitable statistics, because using best results mitigates the negative performance effects of any variables we haven’t yet completely controlled for. For more details, see “microbenchmarking calls for idealised conditions”.
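To illustrate (with synthetic numbers, not our measurements) why the minimum is robust here: noise from scheduling, caches, and background work only ever adds time, so the minimum of repeated runs stays close to the underlying cost, while the mean drifts upward with the noise.

```python
import random

random.seed(0)

# Hypothetical workload: a fixed true cost plus non-negative noise
# (scheduling delays, cache misses) that varies from run to run.
TRUE_TIME_MS = 100.0
samples = [TRUE_TIME_MS + random.expovariate(1 / 20.0) for _ in range(30)]

best = min(samples)                  # stays close to the true cost
mean = sum(samples) / len(samples)   # inflated by however much noise occurred
```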
To maximise the quality of our data, we attempt to control for two key sources of noise.
CPU performance is managed by dedicating CPUs to the workload, and disabling anything that might vary the CPU frequency, using this script. In this report, we dedicate only two CPU cores to the workload, to simulate a more realistic and constrained CPU environment. Not many devices have sixteen fast desktop CPU cores.
Network performance is managed by using mitmproxy’s server-side replay feature to locally serve a cached copy of the pages under test and any of the resources they depend on, including cross-origin requests.
harproxyserver
At first we used harproxyserver to cache the pages under test. To record a HAR file of the page under test in Chromium:
- Run Chromium in incognito mode (to avoid leaking any cookies)
- Open devtools and go to the Network tab
- Check “Disable cache” and clear the network log
- Go to the URL of the page under test
- Save the network log as a HAR file
Note that any pages that start with redirects should be navigated to directly, without the redirects. Otherwise the page can’t be reloaded in harproxyserver. This is not important for the actual measurement process, but it’s useful for manual testing.
To serve the HAR file locally, we run harproxyserver (after applying harproxyserver#15) as follows:
$ node dist/harProxyServer.js -f ~/path/to/example.com.har
The page under test can then be found at its original URL, replacing the origin with http://localhost:3000. This works because harproxyserver is only a proxy server when recording HAR files; when replaying HAR files, it’s an ordinary web server over the cached responses from all origins merged into a single directory tree.
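A sketch of why this works (illustrative Python, not harproxyserver’s actual implementation): replayed responses are effectively keyed by path alone, discarding the origin, so every recorded origin collapses into the single local one.

```python
from urllib.parse import urlparse

def replay_key(url: str) -> str:
    # The origin is discarded; only the path survives into the cache key.
    return urlparse(url).path

# Responses recorded from two different origins...
cache = {
    replay_key("https://bucket.daz.cat/work/igalia/servo/16.diffiesmall.jpg"): b"jpeg-1",
    replay_key("https://www.azabani.com/talks/2023-06-05-servo-2023/_/tsc.png"): b"png-1",
}

# ...are both served from the single local origin (and would collide if two
# origins ever used the same path).
response = cache[replay_key("http://localhost:3000/work/igalia/servo/16.diffiesmall.jpg")]
```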
For example, say we’ve recorded this test page (see bucket.daz.cat.har in data file):
<!doctype html><meta charset="utf-8">
<img src="16.diffiesmall.jpg">
<img src="https://bucket.daz.cat/work/igalia/servo/17.diffiesmall.jpg">
<img src="https://www.azabani.com/talks/2023-06-05-servo-2023/_/tsc.png">
In this case, we would be able to access the cached responses at:
- http://localhost:3000/work/igalia/servo/18.html
- http://localhost:3000/work/igalia/servo/16.diffiesmall.jpg
- http://localhost:3000/work/igalia/servo/17.diffiesmall.jpg
- http://localhost:3000/talks/2023-06-05-servo-2023/_/tsc.png
Since it also doesn’t rewrite any URL references in responses, cross-origin requests (and requests to same-origin absolute URLs) still hit the network. For example, only two of the four requests above (the page itself and the relative image URL) are replayed from the HAR file.
Cross-origin requests are ubiquitous in real-world websites, so we needed to fix this. We considered reworking harproxyserver into an actual proxy server (over HTTP or SOCKS), but Servo doesn’t support proxy servers yet, so we decided to pursue a different approach for now.
mitmproxy
mitmproxy is a versatile set of tools for intercepting HTTP requests from a client or to a server. It has two features that make it very useful for our needs:
- We can intercept clients that don’t support proxy servers (transparent proxy)
- We can serve cached responses from an earlier recording (server-side replay)
Secure requests can also be intercepted by installing a custom root certificate, or by disabling certificate checking in the client. The latter is easier, and since both Servo and Chromium support it, that’s what we do in this report.
We use mitmproxy for both recording and replaying HTTP requests:
- mitmproxy --save-stream-file records requests and responses in a cache
- mitmproxy --server-replay makes the cached responses available to clients
While we could intercept requests from only specific processes using a virtual machine, on Linux it’s easier to tweak the suggested iptables rules to only match traffic from a specific group, such as mitmproxy:
$ groupadd mitmproxy
$ iptables -t nat -A OUTPUT -p tcp -m owner --gid-owner mitmproxy --dport 80 -j REDIRECT --to-port 8080
$ iptables -t nat -A OUTPUT -p tcp -m owner --gid-owner mitmproxy --dport 443 -j REDIRECT --to-port 8080
Users can be in any number of groups, but processes run with only one effective group at a time, and we can select that group with newgrp(1):
$ usermod -aG mitmproxy $(whoami)
$ id
uid=1000(delan) gid=100(users) groups=100(users),1(wheel),984(mitmproxy)
$ curl http://example.com # not proxied, because gid=100(users)
$ newgrp mitmproxy
$ id
uid=1000(delan) gid=984(mitmproxy) groups=984(mitmproxy),1(wheel),100(users)
$ curl http://example.com # proxied, because gid=984(mitmproxy)
As a result, processes started inside the newgrp(1) shell are proxied, while all other processes behave normally.
To record the requests for a page under test, we run start-mitmproxy.sh in record mode, then load the page in Chromium with a fresh profile. This avoids disk caching, and avoids leaking cookies or other identifying information.
$ ./start-mitmproxy.sh record path/to/example.com.mitmproxy
- then in another terminal -
$ newgrp mitmproxy
$ google-chrome-stable --ignore-certificate-errors --user-data-dir=$(mktemp -d) --no-first-run http://example.com
Some websites, such as Amazon, divert us to a captcha when visiting from a fresh profile. For these websites, we repeat the process with the same profile, but different disk cache paths:
$ ./start-mitmproxy.sh record path/to/example.com.mitmproxy
- then in a second terminal -
$ newgrp mitmproxy
$ chromium_profile=$(mktemp -d)
$ google-chrome-stable --ignore-certificate-errors --user-data-dir=$chromium_profile --disk-cache-dir=$(mktemp -d) --no-first-run http://example.com
- then in the first terminal, restart the recording -
$ ./start-mitmproxy.sh record path/to/example.com.mitmproxy
- then in the second terminal -
$ google-chrome-stable --ignore-certificate-errors --user-data-dir=$chromium_profile --disk-cache-dir=$(mktemp -d) --no-first-run http://example.com
For the mitmproxy caches used in this report, see *.mitmproxy in the data file.
Test environment
Our test environment is as follows:
- AMD 7950X (amd64)
- NixOS 24.11.20240905.8ce7f9f running X11
- Linux 6.11.0-rc6, linuxPackages_testing from NixOS (as above)
- Servo is built with ./mach build --profile production-stripped
- Chromium is google-chrome from NixOS (as above): NIXPKGS_ALLOW_UNFREE=1 nix run --impure github:NixOS/nixpkgs/8ce7f9f78bdbe659a8d7c1fe376b89b3a43e4cdc#google-chrome
- servo/perf-analysis-tools as of d896f5dcf72ec
The workloads are run in a shell created as follows:
$ newgrp mitmproxy
$ nix-shell ~/path/to/servo/shell.nix --run zsh
$ sudo ./AMD-7950X-8,9.sh $$
Measurement procedure
The way we run Servo and Chromium is fully automated with the scripts in servo/perf-analysis-tools. For each test case, we set $key to a unique name for the results, and $url to the URL of the page under test (note the trailing slashes in $url values at the root).
$ key=servo.org; url=https://servo.org/
$ key=www.amazon.com; url=https://www.amazon.com/dp/B07S9XZYN2
$ key=zh.wikipedia.org; url=https://zh.wikipedia.org/wiki/Servo
$ key=www.baidu.com; url=https://www.baidu.com/
We run Servo as follows (see */trace*.html in data file):
$ ./benchmark-servo.sh ~/path/to/servo1/servo "$url" 30 ./$key.servo.1
$ ./benchmark-servo.sh ~/path/to/servo2/servo "$url" 30 ./$key.servo.2
We run Chromium as follows (see */chrome*.pftrace in data file):
$ ./benchmark-chromium.sh google-chrome-stable "$url" 30 ./$key.chromium
Analysis procedure
We wrote a Rust program in servo/perf-analysis-tools to analyse the data. The program requires that we first convert the Chromium traces from Perfetto format to JSON format:
$ for i in ./$key.chromium/*.pftrace; do python ~/path/to/traceconv json $i ${i%.pftrace}.json; done
Then we can generate a summary of each dataset as follows (see */summary.txt in data file):
$ cargo run -r -- servo "$url" ./$key.servo.1/*.html
$ cargo run -r -- servo "$url" ./$key.servo.2/*.html
$ cargo run -r -- chromium "$url" ./$key.chromium/*.json
We can generate combined traces of all of the Servo and Chromium samples for each test case as follows:
$ cargo run -r -- combined servo "$url" ./$key.servo.1/*.html -- servo "$url" ./$key.servo.2/*.html -- chromium "$url" ./$key.chromium/*.json > $key.combined.json
These traces, like all of the Chromium traces in both formats, can be viewed in the Perfetto UI, but for reasons that are not yet clear, the charts often show bars directly overlapping each other, making them hard to read.
Results
In this section, we’ve numbered the browsers under test the same way as we did at the start of the report:
1. servoshell d3d6a22d27df5095c3342249d0eea0bce153cbe1 (23 September)
2. servoshell f8933a57353aeca14a6cbc60b3cb0cf98cab6c5d (6 October)
3. Google Chrome 128.0.6613.113 (Official Build)
User-facing paint metrics
The paint metrics (FP and FCP) are standard web platform concepts that should be comparable between Chromium and Servo. In Chromium, their duration is measured from the markAsMainFrame event. In Servo, they are measured from the start of the first event associated with the page URL, which is always a ScriptParseHTML event.
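In outline, the Chromium side of that measurement can be sketched as follows (our real analysis is a Rust program; the event names here are as emitted in Chromium traces, and a full implementation would also match the trace’s frame and navigationId fields rather than names alone):

```python
import json

def fcp_duration_us(trace_text):
    """Microseconds from the earliest markAsMainFrame event to the earliest
    firstContentfulPaint event in a Chromium JSON trace."""
    events = json.loads(trace_text)["traceEvents"]
    start = min(e["ts"] for e in events if e["name"] == "markAsMainFrame")
    fcp = min(e["ts"] for e in events if e["name"] == "firstContentfulPaint")
    return fcp - start
```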
The results are as follows.
servo.org:
1. FP: 481.0ms (n=30, μ=548.3ms, s=56.57ms, min=481.0ms, max=792.9ms)
   FCP: 481.0ms (n=30, μ=548.3ms, s=56.57ms, min=481.0ms, max=792.9ms)
2. FP: 403.5ms (n=30, μ=468.5ms, s=49.16ms, min=403.5ms, max=621.9ms)
   FCP: 403.5ms (n=30, μ=468.5ms, s=49.16ms, min=403.5ms, max=621.9ms)
3. FP: 624.5ms (n=24, μ=730.2ms, s=76.16ms, min=624.5ms, max=1.002s)
   FCP: 624.5ms (n=24, μ=730.2ms, s=76.16ms, min=624.5ms, max=1.002s)

www.amazon.com:
1. FP: 803.5ms (n=28, μ=1.230s, s=154.6ms, min=803.5ms, max=1.479s)
   FCP: 803.5ms (n=28, μ=1.230s, s=154.6ms, min=803.5ms, max=1.479s)
2. FP: 885.2ms (n=29, μ=1.349s, s=183.8ms, min=885.2ms, max=1.545s)
   FCP: 885.2ms (n=29, μ=1.349s, s=183.8ms, min=885.2ms, max=1.545s)
3. FP: 524.9ms (n=25, μ=634.5ms, s=104.4ms, min=524.9ms, max=1.036s)
   FCP: 524.9ms (n=25, μ=634.5ms, s=104.4ms, min=524.9ms, max=1.036s)

zh.wikipedia.org:
1. FP: 877.3ms (n=27, μ=945.4ms, s=41.23ms, min=877.3ms, max=1.074s)
   FCP: 877.3ms (n=27, μ=945.4ms, s=41.23ms, min=877.3ms, max=1.074s)
2. FP: 305.6ms (n=25, μ=333.4ms, s=21.16ms, min=305.6ms, max=398.3ms)
   FCP: 305.6ms (n=25, μ=333.4ms, s=21.16ms, min=305.6ms, max=398.3ms)
3. FP: 603.7ms (n=21, μ=750.7ms, s=452.9ms, min=603.7ms, max=2.725s)
   FCP: 603.7ms (n=21, μ=750.7ms, s=452.9ms, min=603.7ms, max=2.725s)

www.baidu.com:
1. FP: 774.0ms (n=30, μ=1.017s, s=444.5ms, min=774.0ms, max=2.868s)
   FCP: 774.0ms (n=30, μ=1.017s, s=444.5ms, min=774.0ms, max=2.868s)
2. FP: 703.5ms (n=30, μ=891.1ms, s=436.7ms, min=703.5ms, max=3.144s)
   FCP: 703.5ms (n=30, μ=891.1ms, s=436.7ms, min=703.5ms, max=3.144s)
3. FP: 828.7ms (n=21, μ=1.304s, s=619.8ms, min=828.7ms, max=2.983s)
   FCP: 828.7ms (n=21, μ=1.304s, s=619.8ms, min=828.7ms, max=2.983s)
Raw events
The other events need to be compared with careful knowledge of what the events are measuring. Notably, Servo and Chromium have very different ideas of what “layout” means, which in turn influences what Servo’s “LayoutPerform” event maps to in Chromium.
In Servo, “layout” means both building layout trees and converting them to display lists. Display lists are then handed to WebRender, which rasterises the display items and composites the resultant layers. Note that since we now use WebRender, references to “painting” and “compositing” are vestigial. Servo’s complete list of HTML trace events includes:
- ScriptParseHTML — script calls on html5ever to parse HTML and build the DOM
- ScriptEvaluate — script calls on SpiderMonkey to compile and execute JavaScript
- LayoutPerform — layout calls on Stylo to recalculate styles, then builds the box tree and fragment tree, converts the fragment tree to a display list, and sends it to WebRender
- Compositing — WebRender splits the display list into layers, rasterises them, then composites and draws the result
- Painting* events are completely unused today, and all other Layout* events are only emitted by legacy layout
In Chromium, “layout” only means building layout trees, while converting them to display lists is called “paint”. Some writing describes Servo this way too, but this is not the terminology used by Servo internally. The closest thing Chromium has to a centralised list of tracing events is the list used by their devtools Performance tab, which includes:
- ParseHTML — parsing HTML and building the DOM
- EvaluateScript, FunctionCall, TimerFire — compiling and executing JavaScript
- UpdateLayoutTree — recalculating styles (per the source)
- Layout — building the fragment tree (per RenderingNG architecture)
- PrePaint and Paint — invalidating and building the display list (as above)
- Layerize — splitting the display list into layers (as above)
The results are as follows, but the Servo and Chromium results are not comparable.
servo.org:
1. Compositing: 54.75ms (n=30, μ=109.4ms, s=30.31ms, min=54.75ms, max=162.8ms)
   LayoutPerform: 236.8ms (n=30, μ=343.4ms, s=37.26ms, min=236.8ms, max=445.1ms)
   ScriptEvaluate: 345.5μs (n=30, μ=698.4μs, s=1.176ms, min=345.5μs, max=6.807ms)
   ScriptParseHTML: 181.9ms (n=30, μ=215.5ms, s=18.89ms, min=181.9ms, max=253.8ms)
2. Compositing: 58.60ms (n=30, μ=105.5ms, s=31.57ms, min=58.60ms, max=159.8ms)
   LayoutPerform: 130.0ms (n=30, μ=169.0ms, s=22.04ms, min=130.0ms, max=215.9ms)
   ScriptEvaluate: 351.1μs (n=30, μ=811.1μs, s=1.052ms, min=351.1μs, max=4.727ms)
   ScriptParseHTML: 97.63ms (n=30, μ=117.7ms, s=16.92ms, min=97.63ms, max=167.3ms)
3. EvaluateScript: 5.257ms (n=24, μ=11.89ms, s=3.528ms, min=5.257ms, max=17.94ms)
   FunctionCall: 1.144ms (n=24, μ=2.170ms, s=1.720ms, min=1.144ms, max=6.494ms)
   Layerize: 228.0μs (n=24, μ=450.6μs, s=416.9μs, min=228.0μs, max=2.157ms)
   Layout: 50.70ms (n=24, μ=67.52ms, s=10.98ms, min=50.70ms, max=91.35ms)
   Paint: 983.0μs (n=24, μ=1.567ms, s=520.0μs, min=983.0μs, max=2.887ms)
   ParseHTML: 6.569ms (n=24, μ=18.80ms, s=6.296ms, min=6.569ms, max=28.85ms)
   PrePaint: 595.0μs (n=24, μ=1.634ms, s=2.126ms, min=595.0μs, max=9.470ms)
   TimerFire: 457.0μs (n=24, μ=1.299ms, s=1.767ms, min=457.0μs, max=5.807ms)
   UpdateLayoutTree: 19.13ms (n=24, μ=35.82ms, s=9.861ms, min=19.13ms, max=55.18ms)
www.amazon.com:
1. Compositing: 12.36ms (n=28, μ=80.32ms, s=27.87ms, min=12.36ms, max=112.2ms)
   LayoutPerform: 2.265s (n=28, μ=2.650s, s=201.6ms, min=2.265s, max=3.051s)
   ScriptEvaluate: 283.6ms (n=28, μ=434.9ms, s=141.2ms, min=283.6ms, max=710.4ms)
   ScriptParseHTML: 725.1ms (n=28, μ=1.079s, s=299.3ms, min=725.1ms, max=1.645s)
2. Compositing: 20.71ms (n=29, μ=80.37ms, s=19.57ms, min=20.71ms, max=101.4ms)
   LayoutPerform: 2.155s (n=29, μ=2.508s, s=185.7ms, min=2.155s, max=2.855s)
   ScriptEvaluate: 220.1ms (n=29, μ=251.5ms, s=19.18ms, min=220.1ms, max=297.3ms)
   ScriptParseHTML: 861.1ms (n=29, μ=977.6ms, s=63.55ms, min=861.1ms, max=1.101s)
3. EvaluateScript: 130.0ms (n=25, μ=190.0ms, s=22.75ms, min=130.0ms, max=224.2ms)
   FunctionCall: 916.9ms (n=25, μ=1.062s, s=78.09ms, min=916.9ms, max=1.189s)
   Layerize: 8.102ms (n=25, μ=12.38ms, s=4.063ms, min=8.102ms, max=26.64ms)
   Layout: 170.3ms (n=25, μ=200.7ms, s=18.56ms, min=170.3ms, max=240.4ms)
   Paint: 20.53ms (n=25, μ=31.53ms, s=6.047ms, min=20.53ms, max=42.22ms)
   ParseHTML: 161.1ms (n=25, μ=226.9ms, s=23.21ms, min=161.1ms, max=265.8ms)
   PrePaint: 22.77ms (n=25, μ=31.75ms, s=5.463ms, min=22.77ms, max=42.87ms)
   TimerFire: 512.5ms (n=25, μ=609.6ms, s=53.89ms, min=512.5ms, max=726.1ms)
   UpdateLayoutTree: 125.1ms (n=25, μ=153.7ms, s=18.82ms, min=125.1ms, max=211.8ms)
zh.wikipedia.org:
1. Compositing: 40.22ms (n=27, μ=47.77ms, s=13.82ms, min=40.22ms, max=97.69ms)
   LayoutPerform: 799.2ms (n=27, μ=918.6ms, s=54.57ms, min=799.2ms, max=1.077s)
   ScriptEvaluate: 636.1μs (n=27, μ=755.8μs, s=273.8μs, min=636.1μs, max=2.092ms)
   ScriptParseHTML: 786.0ms (n=27, μ=843.0ms, s=33.01ms, min=786.0ms, max=921.1ms)
2. Compositing: 41.32ms (n=25, μ=45.27ms, s=3.009ms, min=41.32ms, max=52.85ms)
   LayoutPerform: 247.1ms (n=25, μ=277.2ms, s=21.07ms, min=247.1ms, max=315.5ms)
   ScriptEvaluate: 656.6μs (n=25, μ=755.0μs, s=266.1μs, min=656.6μs, max=1.990ms)
   ScriptParseHTML: 232.9ms (n=25, μ=250.0ms, s=8.700ms, min=232.9ms, max=267.7ms)
3. EvaluateScript: 8.715ms (n=21, μ=13.55ms, s=3.772ms, min=8.715ms, max=21.51ms)
   FunctionCall: 101.8ms (n=21, μ=119.0ms, s=19.94ms, min=101.8ms, max=193.9ms)
   Layerize: 1.973ms (n=21, μ=2.363ms, s=193.2μs, min=1.973ms, max=2.703ms)
   Layout: 176.3ms (n=21, μ=225.0ms, s=27.69ms, min=176.3ms, max=298.5ms)
   Paint: 10.31ms (n=21, μ=13.68ms, s=2.274ms, min=10.31ms, max=21.20ms)
   ParseHTML: 9.926ms (n=21, μ=17.78ms, s=5.789ms, min=9.926ms, max=29.86ms)
   PrePaint: 4.482ms (n=21, μ=4.865ms, s=361.5μs, min=4.482ms, max=6.192ms)
   TimerFire: 25.84ms (n=21, μ=32.80ms, s=6.821ms, min=25.84ms, max=54.42ms)
   UpdateLayoutTree: 21.77ms (n=21, μ=32.19ms, s=4.835ms, min=21.77ms, max=40.48ms)
www.baidu.com:
1. Compositing: 27.98ms (n=30, μ=50.73ms, s=12.14ms, min=27.98ms, max=68.03ms)
   LayoutPerform: 269.7ms (n=30, μ=414.7ms, s=36.85ms, min=269.7ms, max=464.6ms)
   ScriptEvaluate: 9.773ms (n=30, μ=10.61ms, s=809.0μs, min=9.773ms, max=13.87ms)
   ScriptParseHTML: 83.38ms (n=30, μ=85.18ms, s=1.396ms, min=83.38ms, max=90.06ms)
2. Compositing: 34.76ms (n=30, μ=53.41ms, s=9.422ms, min=34.76ms, max=65.17ms)
   LayoutPerform: 189.3ms (n=30, μ=202.2ms, s=8.943ms, min=189.3ms, max=223.6ms)
   ScriptEvaluate: 9.950ms (n=30, μ=10.57ms, s=607.1μs, min=9.950ms, max=13.19ms)
   ScriptParseHTML: 84.65ms (n=30, μ=86.53ms, s=1.466ms, min=84.65ms, max=92.67ms)
3. EvaluateScript: 85.26ms (n=21, μ=186.3ms, s=49.14ms, min=85.26ms, max=310.9ms)
   FunctionCall: 26.96ms (n=21, μ=102.3ms, s=22.93ms, min=26.96ms, max=150.5ms)
   Layerize: 475.0μs (n=21, μ=739.5μs, s=381.0μs, min=475.0μs, max=1.899ms)
   Layout: 26.32ms (n=21, μ=35.20ms, s=25.13ms, min=26.32ms, max=143.8ms)
   Paint: 867.0μs (n=21, μ=1.377ms, s=780.8μs, min=867.0μs, max=3.778ms)
   ParseHTML: 31.09ms (n=21, μ=58.24ms, s=22.69ms, min=31.09ms, max=119.8ms)
   PrePaint: 864.0μs (n=21, μ=1.431ms, s=1.230ms, min=864.0μs, max=6.535ms)
   TimerFire: 7.711ms (n=21, μ=13.90ms, s=4.517ms, min=7.711ms, max=20.11ms)
   UpdateLayoutTree: 6.570ms (n=21, μ=8.633ms, s=3.173ms, min=6.570ms, max=21.84ms)
Rendering phases model
Using the events in Servo as a lowest common denominator, we tried to define a unified event model to bridge the gap in terminology:
- Parse = ScriptParseHTML in Servo; ParseHTML in Chromium
- Script = ScriptEvaluate in Servo; EvaluateScript, FunctionCall, TimerFire in Chromium
- Layout = LayoutPerform in Servo; UpdateLayoutTree, Layout, PrePaint, Paint in Chromium
- Rasterise = Compositing in Servo; Layerize in Chromium
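In code, the mapping might look like the table below (an illustrative Python sketch, not the actual Rust tool); each phase’s total is the summed wall time of its constituent events:

```python
# Unified phase model: which raw trace events feed each phase, per engine.
PHASES = {
    "servo": {
        "Parse": {"ScriptParseHTML"},
        "Script": {"ScriptEvaluate"},
        "Layout": {"LayoutPerform"},
        "Rasterise": {"Compositing"},
    },
    "chromium": {
        "Parse": {"ParseHTML"},
        "Script": {"EvaluateScript", "FunctionCall", "TimerFire"},
        "Layout": {"UpdateLayoutTree", "Layout", "PrePaint", "Paint"},
        "Rasterise": {"Layerize"},
    },
}

def phase_totals(engine, events):
    """Sum (event name, duration) pairs into unified phases; events outside
    the model are ignored."""
    totals = {phase: 0.0 for phase in PHASES[engine]}
    for name, duration in events:
        for phase, names in PHASES[engine].items():
            if name in names:
                totals[phase] += duration
    return totals
```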
Unfortunately, applying this model to the actual data is problematic.
Since we’re limited to what Servo’s HTML traces provide, there are rendering phases that are not reflected in this model, such as style or what Chromium calls “paint”.
Rasterise is currently incomplete for Chromium, because aside from Layerize, rasterisation and compositing events are not associated with any particular frame, navigationId, or documentLoaderURL. Our analysis tool currently relies on these for Chromium tracing data.
Worse still, Parse and Script are of questionable value as currently implemented:
- Their relative proportions are heavily distorted between Servo and Chromium, which suggests that they may not be comparable
- There is always a point in the loading process before which any time we record in LayoutPerform is also recorded in ParseHTML, yielding unreasonable Parse times
The results are as follows, but we do not consider them very meaningful.
servo.org:
1. Parse: 181.9ms (n=30, μ=215.5ms, s=18.89ms, min=181.9ms, max=253.8ms)
   Script: 345.5μs (n=30, μ=698.4μs, s=1.176ms, min=345.5μs, max=6.807ms)
   Layout: 236.8ms (n=30, μ=343.4ms, s=37.26ms, min=236.8ms, max=445.1ms)
   Rasterise: 54.75ms (n=30, μ=109.4ms, s=30.31ms, min=54.75ms, max=162.8ms)
2. Parse: 97.63ms (n=30, μ=117.7ms, s=16.92ms, min=97.63ms, max=167.3ms)
   Script: 351.1μs (n=30, μ=811.1μs, s=1.052ms, min=351.1μs, max=4.727ms)
   Layout: 130.0ms (n=30, μ=169.0ms, s=22.04ms, min=130.0ms, max=215.9ms)
   Rasterise: 58.60ms (n=30, μ=105.5ms, s=31.57ms, min=58.60ms, max=159.8ms)
3. Parse: 6.465ms (n=24, μ=18.52ms, s=6.129ms, min=6.465ms, max=27.77ms)
   Script: 6.482ms (n=24, μ=14.07ms, s=3.680ms, min=6.482ms, max=19.19ms)
   Layout: 81.25ms (n=24, μ=106.5ms, s=13.20ms, min=81.25ms, max=148.6ms)
   Rasterise: 228.0μs (n=24, μ=450.6μs, s=416.9μs, min=228.0μs, max=2.157ms)
www.amazon.com:
1. Parse: 725.1ms (n=28, μ=1.079s, s=299.3ms, min=725.1ms, max=1.645s)
   Script: 283.6ms (n=28, μ=434.9ms, s=141.2ms, min=283.6ms, max=710.4ms)
   Layout: 2.265s (n=28, μ=2.650s, s=201.6ms, min=2.265s, max=3.051s)
   Rasterise: 12.36ms (n=28, μ=80.32ms, s=27.87ms, min=12.36ms, max=112.2ms)
2. Parse: 861.1ms (n=29, μ=977.6ms, s=63.55ms, min=861.1ms, max=1.101s)
   Script: 220.1ms (n=29, μ=251.5ms, s=19.18ms, min=220.1ms, max=297.3ms)
   Layout: 2.155s (n=29, μ=2.508s, s=185.7ms, min=2.155s, max=2.855s)
   Rasterise: 20.71ms (n=29, μ=80.37ms, s=19.57ms, min=20.71ms, max=101.4ms)
3. Parse: 161.1ms (n=25, μ=226.9ms, s=23.21ms, min=161.1ms, max=265.8ms)
   Script: 1.154s (n=25, μ=1.308s, s=86.63ms, min=1.154s, max=1.469s)
   Layout: 363.1ms (n=25, μ=417.7ms, s=28.54ms, min=363.1ms, max=464.8ms)
   Rasterise: 8.102ms (n=25, μ=12.38ms, s=4.063ms, min=8.102ms, max=26.64ms)
zh.wikipedia.org:
1. Parse: 786.0ms (n=27, μ=843.0ms, s=33.01ms, min=786.0ms, max=921.1ms)
   Script: 636.1μs (n=27, μ=755.8μs, s=273.8μs, min=636.1μs, max=2.092ms)
   Layout: 799.2ms (n=27, μ=918.6ms, s=54.57ms, min=799.2ms, max=1.077s)
   Rasterise: 40.22ms (n=27, μ=47.77ms, s=13.82ms, min=40.22ms, max=97.69ms)
2. Parse: 232.9ms (n=25, μ=250.0ms, s=8.700ms, min=232.9ms, max=267.7ms)
   Script: 656.6μs (n=25, μ=755.0μs, s=266.1μs, min=656.6μs, max=1.990ms)
   Layout: 247.1ms (n=25, μ=277.2ms, s=21.07ms, min=247.1ms, max=315.5ms)
   Rasterise: 41.32ms (n=25, μ=45.27ms, s=3.009ms, min=41.32ms, max=52.85ms)
3. Parse: 9.926ms (n=21, μ=17.78ms, s=5.789ms, min=9.926ms, max=29.86ms)
   Script: 116.1ms (n=21, μ=134.1ms, s=18.73ms, min=116.1ms, max=205.4ms)
   Layout: 231.2ms (n=21, μ=275.7ms, s=29.73ms, min=231.2ms, max=348.0ms)
   Rasterise: 1.973ms (n=21, μ=2.363ms, s=193.2μs, min=1.973ms, max=2.703ms)
www.baidu.com:
1. Parse: 83.38ms (n=30, μ=85.18ms, s=1.396ms, min=83.38ms, max=90.06ms)
   Script: 9.773ms (n=30, μ=10.61ms, s=809.0μs, min=9.773ms, max=13.87ms)
   Layout: 269.7ms (n=30, μ=414.7ms, s=36.85ms, min=269.7ms, max=464.6ms)
   Rasterise: 27.98ms (n=30, μ=50.73ms, s=12.14ms, min=27.98ms, max=68.03ms)
2. Parse: 84.65ms (n=30, μ=86.53ms, s=1.466ms, min=84.65ms, max=92.67ms)
   Script: 9.950ms (n=30, μ=10.57ms, s=607.1μs, min=9.950ms, max=13.19ms)
   Layout: 189.3ms (n=30, μ=202.2ms, s=8.943ms, min=189.3ms, max=223.6ms)
   Rasterise: 34.76ms (n=30, μ=53.41ms, s=9.422ms, min=34.76ms, max=65.17ms)
3. Parse: 31.09ms (n=21, μ=58.24ms, s=22.69ms, min=31.09ms, max=119.8ms)
   Script: 113.3ms (n=21, μ=289.3ms, s=68.56ms, min=113.3ms, max=462.2ms)
   Layout: 35.22ms (n=21, μ=46.64ms, s=28.50ms, min=35.22ms, max=169.4ms)
   Rasterise: 475.0μs (n=21, μ=739.5μs, s=381.0μs, min=475.0μs, max=1.899ms)
Overall rendering time model
To make the most of the data we have, we can take the union of all of the events above and call that the Renderer. Whenever Servo or Chromium records time under Renderer, it is busy with some phase of the rendering process for the page under test (or a generic task, like Compositing in Servo).
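Computing that union is the classic merge-overlapping-intervals problem; a sketch, assuming each Renderer event has already been reduced to a (start, end) pair in milliseconds, so overlapping phases are only counted once:

```python
def renderer_time(intervals):
    """Total length of the union of (start, end) intervals, in ms."""
    total, current_start, current_end = 0.0, None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```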
The results are as follows.
servo.org:
1. Renderer: 294.7ms (n=30, μ=437.5ms, s=42.09ms, min=294.7ms, max=539.4ms)
2. Renderer: 197.7ms (n=30, μ=265.5ms, s=31.79ms, min=197.7ms, max=322.4ms)
3. Renderer: 97.45ms (n=24, μ=129.5ms, s=13.34ms, min=97.45ms, max=163.5ms)

www.amazon.com:
1. Renderer: 2.680s (n=28, μ=3.105s, s=177.9ms, min=2.680s, max=3.424s)
2. Renderer: 2.648s (n=29, μ=2.993s, s=173.2ms, min=2.648s, max=3.354s)
3. Renderer: 1.409s (n=25, μ=1.617s, s=96.49ms, min=1.409s, max=1.791s)

zh.wikipedia.org:
1. Renderer: 850.4ms (n=27, μ=944.5ms, s=50.86ms, min=850.4ms, max=1.089s)
2. Renderer: 316.0ms (n=25, μ=339.9ms, s=14.87ms, min=316.0ms, max=375.4ms)
3. Renderer: 359.0ms (n=21, μ=421.2ms, s=41.44ms, min=359.0ms, max=527.5ms)

www.baidu.com:
1. Renderer: 367.6ms (n=30, μ=524.7ms, s=35.69ms, min=367.6ms, max=576.4ms)
2. Renderer: 288.7ms (n=30, μ=315.9ms, s=8.524ms, min=288.7ms, max=331.4ms)
3. Renderer: 152.8ms (n=21, μ=343.5ms, s=80.49ms, min=152.8ms, max=529.8ms)
Future work
We can investigate and rectify the causes of performance gaps between Servo and Chromium. Potentially interesting tasks in this area include further eliminating the overhead of font loading and transfer across IPC. In many cases, memory mapping can be used to share system fonts. Additionally, we can work to avoid unnecessary copies of font data in the interface between Servo, WebRender, and platform APIs.
There are also many layout improvements we can make. Caching can be used, particularly in modes such as flexbox, to reduce redundant work. We also plan to implement incremental layout, effectively a cache of the entire layout tree, which should make most layouts of a page after the first much faster. Currently, incremental layout is only partially implemented in the legacy layout system.
We can explore other test cases, including other pages and other test scenarios besides initial page load. We can also set up perf bots to monitor performance and catch regressions. Both of these would benefit from further automation.
We can add more profiling events to Servo, such as Largest Contentful Paint (LCP). We can also make our profiling events more detailed, to give us more insight into script and layout.