Commit Graph

18 Commits

Author SHA1 Message Date
Nico Weber
c10853ca89 LibGfx/PNGWriter: Change filter heuristic to match comment
We were computing abs(sum(signed_pixels)), while the comment says
that sum(abs(signed_pixels)) works well. Change the code to match
the comment. (While I did tweak this code recently, as far as I can
tell the code hasn't matched the comment from when it was originally
added.)
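
To make the difference between the two scores concrete, here is a minimal sketch. The helper names `sum_of_abs` and `abs_of_sum` are hypothetical and are not PNGWriter's actual functions. It assumes the filtered bytes of one scanline are reinterpreted as signed 8-bit values:

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helpers for illustration only; not LibGfx's real API.

// What the comment describes: sum(abs(signed_pixels)). Lower is better.
inline std::uint64_t sum_of_abs(std::vector<std::uint8_t> const& filtered)
{
    std::uint64_t sum = 0;
    for (auto byte : filtered) {
        int value = static_cast<std::int8_t>(byte); // reinterpret as signed
        sum += static_cast<std::uint64_t>(std::abs(value));
    }
    return sum;
}

// What the code computed before this change: abs(sum(signed_pixels)).
inline std::uint64_t abs_of_sum(std::vector<std::uint8_t> const& filtered)
{
    std::int64_t sum = 0;
    for (auto byte : filtered)
        sum += static_cast<std::int8_t>(byte);
    return static_cast<std::uint64_t>(sum < 0 ? -sum : sum);
}
```

With abs_of_sum, positive and negative residuals cancel, so a noisy row such as +16, -16 gets the same score as a row of zeros. With sum_of_abs, the noisy row is penalized.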

Using the benchmarking script and same inputs as in #24819, just changed
to write .png files with different --png-compression-level values:

    level 0: 390M -> 390M (no change)
    level 1:  83M ->  78M (6% smaller)
    level 2:  73M ->  69M (5.8% smaller at default compression level)
    level 3:  71M ->  67M (5.6% smaller)

Sizes as computed by `du -hAs`. (`-A` prints apparent size on macOS.
It's important to use that flag to get consistent results. On Linux,
this flag is spelled `-b` instead.)

The size of sunset_retro.png goes up a tiny bit, but less than
0.4% at all sizes. At level 2, the size goes from 908K to 911K, for
example.

The size of Tests/LibGfx/test-inputs/jpg/big_image.jpg encoded as PNG
goes down by about 2.7%, but it's the 2.7% that gets us over an MB
boundary at levels 1 and 2. At level 1, from 14M to 13M; at level 2
from 13M to 12M. (Exact numbers: 14417809 bytes to 13429605 at level 1,
14076443 bytes to 13088791 at level 2.) For comparison, sips writes a
15M (15610049 bytes) file. So we were already writing a smaller file,
and now we're even better. (We need 778 ms at level 1 while
sips needs 723 ms. So sips is a bit faster, but not by a ton.)

The size of wow.apng goes from 606K to 584K (3.6% smaller).

Perf-wise, this is close to a wash. Worst case, it's maybe 2-3% slower,
but the numbers are pretty noisy, even with many repetitions in
`hyperfine`. I'm guessing `ministat` would claim that no significant
difference can be shown, but I haven't tried it. For example, for
sunset_retro.png at level 2, time goes from 179.3 ms ± 2.5 ms to
182.8 ms ± 1.9 ms, which would be a 2% slowdown. At level 0, where
the effect is relatively largest, it goes from 21.8 ms ± 0.7 ms to
22.6 ms ± 0.7 ms, a 3.6% slowdown (but with huge error bars).
For big_image.jpg level 1, time goes from 768.5 ms ± 8.4 ms to
777.9 ms ± 6.0 ms, 1.2% slower.
2024-08-17 11:03:29 -04:00
Nico Weber
b9a1eb1533 LibGfx/PNGWriter: Swap red and blue channel at write time
Previously, we were swapping red and blue before doing filtering.
The filters don't care about channel order, so instead only do
this when writing the PNG data.

In theory, this saves the work of channel swizzling when figuring
out which filter is best. In practice, it's perf-neutral:
swizzling is basically free. But it's still conceptually simpler.
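
A small sketch of why the filters don't care about channel order, using the Sub filter on 4-byte pixels as an example. The function names are made up for illustration and the 4-bytes-per-pixel layout is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helpers for illustration only; not LibGfx's real API.

// PNG "Sub" filter for 4-byte pixels: every byte minus the byte of the
// same channel in the pixel to the left (0 for the first pixel).
std::vector<std::uint8_t> sub_filter(std::vector<std::uint8_t> const& row)
{
    constexpr std::size_t bytes_per_pixel = 4;
    std::vector<std::uint8_t> out(row.size());
    for (std::size_t i = 0; i < row.size(); ++i) {
        std::uint8_t left = i >= bytes_per_pixel ? row[i - bytes_per_pixel] : 0;
        out[i] = static_cast<std::uint8_t>(row[i] - left);
    }
    return out;
}

// Swaps byte 0 and byte 2 of every 4-byte pixel (BGRA <-> RGBA).
std::vector<std::uint8_t> swap_red_and_blue(std::vector<std::uint8_t> row)
{
    assert(row.size() % 4 == 0);
    for (std::size_t i = 0; i + 3 < row.size(); i += 4)
        std::swap(row[i], row[i + 2]);
    return row;
}
```

Because the filter only ever combines bytes of the same channel, filtering and then swapping gives the same bytes as swapping and then filtering, so the swap can be deferred to write time.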

No behavior change.
2024-08-17 11:03:29 -04:00
Nico Weber
0a4f8736e3 LibGfx/PNGWriter: Add support for inter-frame compression of apngs
Brings wow.apng from 1.2M to 606K, while reducing encoding time from
233 ms to 167 ms.

(For comparison, writing wow.webp currently takes 88ms and produces
a 255K file. The input wow.gif is 184K.)
2024-08-15 06:35:48 -04:00
Nico Weber
a2b59b9c98 LibGfx/PNGWriter: Implement support for writing animated PNGs
Based on #24021.

Co-Authored-By: Pixel Brush <letsplaytvirtmann@gmail.com>
2024-08-15 06:35:48 -04:00
Nico Weber
37e75223b7 LibGfx/PNGWriter: Add a stream-based encode() overload 2024-08-15 06:35:48 -04:00
Nico Weber
82053da343 LibGfx/PNGWriter: Remove one output data copy
Now that we do two passes, we can easily write into
uncompressed_block_data directly.

Somewhat surprisingly, this is perf-neutral.
2024-08-11 13:50:11 -04:00
Nico Weber
32855d2c49 LibGfx/PNGWriter: Compute which predictor to use first, store data then
Before, we would compute and store the output of each predictor,
then pick the best one, and then copy its data.

Now, we compute the output of each predictor but only compute its
score and do not store the predicted data. We then pick the best
one, and do a second pass that re-computes the output of the best
predictor, and stores it.

Instead of computing the output of the 5 different predictors once, we
now compute the output of all 5 predictors, and then the output of
one of them a second time. In exchange, we only write each output row once
instead of 5 times. (We also have to read the input row twice instead of
once, but the second time round it'll come from L1 or L2.)
Making the simplifying assumption that each predictor takes the same
time to compute, this increases compute to 6/5th, and reduces memory
bandwidth to 3/6th. (Before: 1 input row read, 5 output row writes;
after: 2 input row reads, 1 output row write.)
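
The two-pass idea can be sketched roughly as follows. This is not the actual PNGWriter code: `FilterType`, `predict()` and `filter_row()` are names invented for this sketch, and the sum-of-absolute-values score is an assumption about the scoring function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical names for illustration only; not LibGfx's real API.
enum class FilterType : std::uint8_t { None, Sub, Up, Average, Paeth };
using Row = std::vector<std::uint8_t>;

static std::uint8_t paeth(std::uint8_t a, std::uint8_t b, std::uint8_t c)
{
    int p = int(a) + int(b) - int(c);
    int pa = std::abs(p - a), pb = std::abs(p - b), pc = std::abs(p - c);
    if (pa <= pb && pa <= pc)
        return a;
    return pb <= pc ? b : c;
}

// Filtered value of byte i; `prev` is the previous scanline (zeros for row 0).
static std::uint8_t predict(FilterType type, Row const& row, Row const& prev, std::size_t i, std::size_t bpp)
{
    std::uint8_t x = row[i];
    std::uint8_t a = i >= bpp ? row[i - bpp] : 0;  // left
    std::uint8_t b = prev[i];                      // above
    std::uint8_t c = i >= bpp ? prev[i - bpp] : 0; // above left
    switch (type) {
    case FilterType::None: return x;
    case FilterType::Sub: return static_cast<std::uint8_t>(x - a);
    case FilterType::Up: return static_cast<std::uint8_t>(x - b);
    case FilterType::Average: return static_cast<std::uint8_t>(x - (a + b) / 2);
    case FilterType::Paeth: return static_cast<std::uint8_t>(x - paeth(a, b, c));
    }
    return x;
}

Row filter_row(Row const& row, Row const& prev, std::size_t bpp)
{
    assert(row.size() == prev.size());
    constexpr FilterType types[] = { FilterType::None, FilterType::Sub, FilterType::Up, FilterType::Average, FilterType::Paeth };

    // Pass 1: score every predictor, but store none of the predicted data.
    FilterType best = FilterType::None;
    auto best_score = std::numeric_limits<std::uint64_t>::max();
    for (auto type : types) {
        std::uint64_t score = 0;
        for (std::size_t i = 0; i < row.size(); ++i)
            score += std::abs(static_cast<std::int8_t>(predict(type, row, prev, i, bpp)));
        if (score < best_score) {
            best = type;
            best_score = score;
        }
    }

    // Pass 2: recompute only the winner and write its output once.
    Row out;
    out.reserve(row.size() + 1);
    out.push_back(static_cast<std::uint8_t>(best));
    for (std::size_t i = 0; i < row.size(); ++i)
        out.push_back(predict(best, row, prev, i, bpp));
    return out;
}
```

Pass 1 runs all five predictors over the row and keeps only a running score per predictor. Pass 2 runs a sixth predictor evaluation and is the only place that writes output bytes.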

Produces exactly the same output, but is faster:

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 0
       34.8 ms ± 0.9 ms ->  22.7 ms ± 0.8 ms (34.7% faster)

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 1
       64.2 ms ± 4.9 ms ->  50.5 ms ± 0.5 ms (31.3% faster)

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 2
      190.3 ms ± 1.6 ms -> 179.0 ms ± 2.8 ms (5.8% faster)

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 3
      646.5 ms ± 4.7 ms -> 635.3 ms ± 4.4 ms (3.3% faster)

Compression level 2 is the default, so about a 6% speedup in practice.

`sips` still needs 49.9 ms ± 3.0 ms to convert sunset_retro.bmp to
sunset_retro.png at its default compression level 1.
We used to take 1.27x as long as sips, now we take 1.01x as long,
while producing a smaller output :^)

(For other, larger, input files sips is still faster and produces
smaller output.)
2024-08-11 13:50:11 -04:00
Nico Weber
ae57f6cad6 LibGfx/PNGWriter: Extract filter computation into new Filter::predict()
No behavior change.
2024-08-11 13:50:11 -04:00
Hendiadyoin1
832b5ff603 AK: Add simd_cast<T> and replace to_TxN with it 2024-08-08 22:43:53 -04:00
Nico Weber
cf8210175f LibGfx/PNGWriter: Inline the now not very useful append(u8)
No behavior change.
2024-08-06 23:00:32 -04:00
Nico Weber
781a39e613 LibGfx/PNGWriter: Use SIMD for PNG score calculation
Produces exactly the same output, but a bit faster.

The speedup is relatively bigger for worse compression:

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 0
       56.8 ms ±  1.5 ms ->  34.8 ms ± 0.9 ms (38.7% faster)

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 1
       84.6 ms ±  1.7 ms ->  64.2 ms ± 4.9 ms (24.1% faster)

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 2
      212.1 ms ±  2.5 ms -> 190.3 ms ± 1.6 ms (10.3% faster)

    image -o sunset_retro.png sunset_retro.bmp --png-compression-level 3
      671.4 ms ± 12.3 ms -> 646.5 ms ± 4.7 ms (3.7% faster)

Compression level 2 is the default, so about a 10% speedup in practice.

For comparison, `sips` needs 49.9 ms ± 3.0 ms to convert
sunset_retro.bmp to sunset_retro.png, and judging from the output file
size, it uses something similar to our compression level 1.
We used to take 1.7x as long as sips, now we take 1.29x as long.
2024-08-06 23:00:32 -04:00
Nico Weber
fd6142eba4 LibGfx/PNGWriter: Only store alpha channel if it's used
Using the same two benchmarks as in the previous commit:

1.

    n |               time |   size
    --+--------------------+--------
    0 |  56.5 ms ±  0.9 ms |   2.3M
    1 |  88.2 ms ± 14.0 ms |   962K
    2 | 214.8 ms ±  5.6 ms |   908K
    3 | 670.8 ms ±  3.6 ms |   903K

Compared to the numbers in the previous commit:

    n = 0: 17.3% faster, 23.3% smaller
    n = 1: 12.9% faster, 12.5% smaller
    n = 2: 24.9% faster,  9.2% smaller
    n = 3: 49.6% faster,  9.6% smaller

For comparison,
`sips -s format png -o sunset_retro_sips.png sunset_retro.bmp` writes
a 1.1M file (i.e. it always writes RGBA, not RGB when not necessary),
and it needs 49.9 ms ± 3.0 ms for that (also using a .bmp input). So
our output file size is competitive! We have to get a bit faster though.

For another comparison, `image -o sunset_retro.webp sunset_retro.bmp`
writes a 730K file and needs 32.1 ms ± 0.7 ms for that.

2.

    n |         time | size
    --+--------------+------
    0 | 11.334 total | 390M
    1 | 13.640 total |  83M
    2 | 15.642 total |  73M
    3 | 48.643 total |  71M

Compared to the numbers in the previous commit:

    n = 0: 15.8% faster, 25.0% smaller
    n = 1: 15.5% faster,  7.7% smaller
    n = 2: 24.0% faster,  5.2% smaller
    n = 3: 29.2% faster,  5.3% smaller

So a relatively bigger speed win for higher levels, and
a bigger size win for lower levels.

Also, the size at n = 2 with this change is now lower than it
was at n = 3 previously.
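
As an illustration of the idea in this commit's title, a minimal sketch of detecting an unused alpha channel and dropping it. It assumes tightly packed 8-bit RGBA input; `uses_alpha` and `strip_alpha` are hypothetical names, not PNGWriter's actual functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not LibGfx's real API.

// Returns true if any pixel of 8-bit RGBA data is not fully opaque.
bool uses_alpha(std::vector<std::uint8_t> const& rgba)
{
    for (std::size_t i = 3; i < rgba.size(); i += 4) {
        if (rgba[i] != 0xff)
            return true;
    }
    return false;
}

// Drops the alpha byte of every pixel, turning RGBA into RGB.
std::vector<std::uint8_t> strip_alpha(std::vector<std::uint8_t> const& rgba)
{
    assert(rgba.size() % 4 == 0);
    std::vector<std::uint8_t> rgb;
    rgb.reserve(rgba.size() / 4 * 3);
    for (std::size_t i = 0; i < rgba.size(); i += 4) {
        rgb.push_back(rgba[i]);
        rgb.push_back(rgba[i + 1]);
        rgb.push_back(rgba[i + 2]);
    }
    return rgb;
}
```

A writer following this approach would emit PNG color type 2 (truecolor) when uses_alpha() returns false and color type 6 (truecolor with alpha) otherwise, which means 3 bytes per pixel instead of 4 go through filtering and compression for opaque images.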
2024-07-31 18:39:08 -07:00
Nico Weber
7308ef7ce9 LibGfx/PNGWriter: Make compression level configurable
Not yet used anywhere, no behavior change.
2024-07-31 18:39:08 -07:00
Dan Klishch
56b7f9e404 Meta: Globally disable -Wpsabi
This warning is triggered when one accepts or returns vectors from a
function (that is not marked with [[gnu::target(...)]]) which would have
been otherwise passed in register if the current translation unit had
been compiled with more permissive flags wrt instruction selection
(i.e. if one adds -mavx2 to cmdline). This will never be a problem for us
since we (a) never use different instruction selection options across
ABI boundaries; (b) most of the affected functions are actually
TU-local.

Moreover, even if we somehow properly annotated all of the SIMD helpers,
calling them across ABI (or target) boundaries would still be very
dangerous because of inconsistent and bogus handling of
[[gnu::target(...)]] across compilers. See
https://github.com/llvm/llvm-project/issues/64706 and
https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
2024-07-12 18:30:07 -04:00
Lucas CHOLLET
3f35ffb648 Userland: Prefer _string over _short_string
As `_string` can't fail anymore (since 3434412), there are no real
benefits to use the short variant in most cases.
2023-08-08 07:37:21 +02:00
Nico Weber
8c13b83a84 LibGfx/PNGWriter: Use a better limit for scanline capacity
One scanline is width pixels long, not height pixels.

No measurable performance difference, but it annoys me every time I look
at this file.
2023-07-12 07:51:54 +01:00
Timothy Flynn
c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Lucas CHOLLET
496b7ffb2b LibGfx: Move all image loaders and writers to a subdirectory 2023-03-21 22:39:25 +01:00