2661 Commits

Author SHA1 Message Date
Sönke Holz
f616e57762 LibVT+Userland+Kernel: Use the ConEmu OSC sequence for progress bars
Instead of having our own incompatible OSC 9 sequence for showing a
progress bar in the task bar, switch to using the more commonly used
OSC 9;4 sequence, introduced by ConEmu [1].
Other terminal emulators, like iTerm2 [2], Ghostty [3], and the
Windows Terminal [4] also use this OSC sequence.

This makes progress bar updates sent by Neovim work properly in our
terminal. Previously, it was always stuck at 100%.

[1] https://conemu.github.io/en/AnsiEscapeCodes.html#ConEmu_specific_OSC
[2] https://iterm2.com/documentation-escape-codes.html
[3] https://ghostty.org/docs/vt/osc/conemu#change-progress-state-(osc-94)
[4] https://learn.microsoft.com/en-us/windows/terminal/tutorials/progress-bar-sequences
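
A minimal sketch of what an OSC 9;4 progress update looks like on the wire, assuming the state numbering documented by ConEmu in [1] (0 removes the progress bar, 1 sets a value, 2 is error, 3 is indeterminate, 4 is paused). The helper name `progress_sequence` is made up for illustration and is not part of LibVT:

```python
ESC = "\x1b"
ST = ESC + "\\"  # String Terminator that ends an OSC sequence

# State values as documented by ConEmu for OSC 9;4 (see [1]).
REMOVE, SET_VALUE, ERROR, INDETERMINATE, PAUSED = range(5)


def progress_sequence(state: int, percent: int = 0) -> str:
    """Build an OSC 9;4 sequence (hypothetical helper, not LibVT code)."""
    if state not in range(5):
        raise ValueError("unknown progress state")
    percent = max(0, min(100, percent))  # clamp to 0..100
    return f"{ESC}]9;4;{state};{percent}{ST}"
```

Writing `progress_sequence(SET_VALUE, 42)` to a terminal that understands the sequence should show 42% progress, and `progress_sequence(REMOVE)` should clear it again.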
2026-04-13 16:06:07 +02:00
Sönke Holz
a6acdbf76f nproc: Use Core::System::hardware_concurrency()
This is significantly simpler than parsing JSON and using the cpuinfo
array length.

This also has the side effect of making that command work on AArch64
and RISC-V, since ProcessorInfo is not implemented on those
architectures yet.
2026-04-07 14:44:34 +02:00
Lucas Chollet
3b4c3ed6fa test: Add support for the -s option
2026-03-14 15:31:22 +01:00
Liav A.
f6db24dba4 Kernel+runc: Remove the pivot_root functionality in copy_mount syscall
That functionality seems to be overly complicated.
We shouldn't overengineer how the copy_mount syscall works, so instead
of allowing replacement of the root filesystem, let's make the unshare
file descriptor configurable via a special ioctl call before we
initialize a new VFSRootContext object.

The special ioctl can either set a new root filesystem for the upcoming
VFSRootContext object, or remove it (by passing fd of -1).
If there's no specified root filesystem, a new RAMFS instance will be
created automatically when invoking the unshare_create syscall.

This also simplifies the code in the boot process, hence making it much
more readable.

It should be noted that we assumed during pivot_root that the first
mountpoint in a context is the root mountpoint, which is probably a fair
assumption, but we don't assume this anywhere else in the VFSRootContext
code.
If this functionality ever comes back, we should ensure that we make
some effort to not assume this again.
2026-03-14 11:45:37 +01:00
Liav A.
2a4a096e0f Kernel+runc: Make unshare syscalls more fd-oriented
Instead of creating a new resource that has its own ID number and
working with it directly, we can create a file that describes the
unshared resource, execute ioctl calls on it, and only enter it at the
end. The resource is thus created only during the last call, rather
than when "attaching" to it as in the previous method.

We can enter a resource for current program execution, after the exec
syscall, or both.
That change allows userspace to create a resource and attach to it only
in the new program, which makes it more comfortable to do cleanups or
track the new process, outside of the created container.

It should be noted that until this commit, we entered a resource without
detaching the old one, essentially leaking the attach counter of a
resource. While this bug didn't have severe effects, proper userspace
cleanup code added later on wouldn't work in that situation anyway. This
commit therefore changes the semantics: entering a resource now means
**replacing** the previous one.

These changes essentially open an opportunity to extend runc to be a
container manager rather than just a launcher of a containerized
environment, which makes it possible to do all sorts of nice cleanups
and tracking of containers' states.
2026-03-14 11:45:37 +01:00
Liav A.
a6868a6a33 Utilities/runc: Remove commented code in VFSRootContextLayout class
Not sure why it was here, so let's just remove this now.
2026-03-14 11:45:37 +01:00
Liav A.
30ffb7f835 init: Use the correct TTY major number when in emergency mode
We used major number 4 which is the major number for serial devices, not
a virtual console (e.g. /dev/ttyX).

This probably regressed a long time ago when the major numbers were
reorganized, and went unnoticed because this scenario was never
tested.

Therefore, I'm just putting this patch up as a quick fix, without
trying to find the exact commit which introduced this bug.
2026-03-14 11:32:27 +01:00
Nico Weber
883953b441 zip: Use st_mtime to access mtimes
`st_mtime` is a define on modern POSIXes, for `st_mtim.tv_sec` on
e.g. Linux and for `st_mtimespec.tv_sec` on e.g. macOS.

No behavior change, but lets this file compile on macOS.
2026-02-16 19:04:49 -05:00
Sönke Holz
e6f97c316d AK+Userland: Add and use the UnsignedIntegral concept
Unlike the previous commit, IsUnsigned is never true for floating-point
types, since there are no unsigned floats.
However, it's still nice to keep the intent clear that this concept
is only for integral types.
2026-02-08 15:51:33 +01:00
NoobZang
1e69af3b17 top: Fix refresh hangs by correctly configuring non-blocking TTY
Setting O_NONBLOCK via fcntl is not sufficient for TTYs in
non-canonical mode. A subsequent tcsetattr call would override this,
as the default VMIN and VTIME settings cause the driver to block.

Explicitly set VMIN and VTIME to 0 to ensure a true non-blocking read.

Also, apply these TTY settings to STDIN_FILENO instead of STDOUT_FILENO.
This is semantically more correct as we are configuring input behavior.
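
The VMIN/VTIME part can be sketched with Python's `termios` module, which wraps the same tcgetattr/tcsetattr calls. This is only an illustration of the technique, not the code in `top`; the function names are made up, and clearing ICANON here is an assumption about the surrounding setup:

```python
import termios


def nonblocking_read_attrs(attrs):
    """Return a copy of a tcgetattr() result set up for polling reads.

    Hypothetical helper: with ICANON cleared and VMIN = VTIME = 0,
    read() returns immediately with whatever bytes are available.
    """
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = attrs
    cc = list(cc)  # do not modify the caller's list
    lflag &= ~termios.ICANON  # non-canonical mode
    cc[termios.VMIN] = 0      # do not wait for a minimum byte count
    cc[termios.VTIME] = 0     # do not wait for a timeout either
    return [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]


def configure_input(fd=0):
    """Apply the settings to an input fd (stdin by default)."""
    attrs = nonblocking_read_attrs(termios.tcgetattr(fd))
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
```

Keeping the attribute rewrite separate from the tcsetattr call makes it possible to check the resulting settings without a real terminal.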

Fixes #26166.
2026-01-23 21:05:33 +01:00
Sönke Holz
839fd64912 Utilities: Add a simple utility for debugging input events
This helps debugging whether input device drivers work correctly.
2026-01-04 23:19:25 +01:00
po-nuvai
1d3aa969e6 tar: Return error instead of panic for XZ compression
Return a clear error message instead of hitting a TODO() panic when
attempting to create XZ-compressed archives, since XZ compression is
not yet implemented.
2025-12-29 11:26:47 -05:00
implicitfield
47146b00c9 AK+Userland+Kernel+Tests: Deduplicate IPv4 header checksumming
Previously, there were two slightly different implementations of this in
the Kernel and in LibCrypto.

This patch removes those, and replaces them with an implementation that
seeks to combine the best aspects of both while not being a drop-in
replacement for either.
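
For reference, the checksum in question is the standard Internet checksum (RFC 1071): a one's-complement sum of 16-bit big-endian words. The following is a plain textbook sketch of that algorithm, not the new AK implementation, and the function name is made up:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum over `data` (textbook version, not the AK code)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

Computing it over a header whose checksum field is zeroed yields the value to store; computing it over a header that already contains a correct checksum yields 0.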

Two new tests are also included.
2025-12-29 14:01:24 +01:00
Nico Weber
8fd767b129 LibGfx/JBIG2Loader+jbig2-from-json: Make JBIG2Loader strictness settable
Clients of JBIG2Loader can now pick how strictly invalid images will
be rejected.

By default, it's set to permissive, which allows spec violations that
happen in practice and are easy to work around. At the moment, this
is just for images in the Power JBIG2 suite, but this will be used to
relax some other checks soon.

jbig2-from-json now requests strict spec-compliant checking. This for
now has the effect that manually adding a comment advertising a file
as being a Power JBIG2 file will no longer allow a few quirks when
using jbig2-from-json.
2025-12-27 07:53:16 -05:00
Nico Weber
db9d02f41e jbig2-from-image: Allow array for pattern dictionary image_data
All images in the array must have the same dimensions. They're all
concatenated into a single wide image.
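
A rough sketch of the concatenation rule, assuming bitmaps are represented as lists of rows of 0/1 values. That representation and the function name are illustrative only and do not mirror the jbig2-from-image code:

```python
def concatenate_horizontally(images):
    """Join equally sized bitmaps side by side into one wide bitmap."""
    if not images:
        raise ValueError("need at least one image")
    height = len(images[0])
    width = len(images[0][0]) if height else 0
    for image in images:
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError("all images must have the same dimensions")
    # Row y of the result is row y of every image, left to right.
    return [[pixel for image in images for pixel in image[y]]
            for y in range(height)]
```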
2025-12-23 10:07:00 -05:00
Nico Weber
7107017412 jbig2-from-image: Don't crash on halftone region without image_data key
2025-12-23 10:07:00 -05:00
Nico Weber
99fb2b4a96 LibGfx/JBIG2Writer+jbig2-from-json: Allow non-identity halftone grids
A halftone region's graymap stores an index that picks a bitmap
from the pattern dictionary. For each graymap pixel, an affine transform
maps that pixel to a position in the output image, and the pattern is
drawn at that position.

So far, we only supported an identity grid that neatly tiled the pattern
bitmaps across the image.

Now we support the full affine transform. This allows for a rotated
grid (for the top left positions; the pattern bitmaps are always drawn
axis-aligned from the top left position).
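
As a sketch of that transform: to my understanding of the JBIG2 spec's halftone region procedure, the grid origin and grid vector are stored as fixed-point numbers with 8 fractional bits, and the top left position for graymap row m and column n is computed as below. The parameter names follow the spec's HGX, HGY, HRX and HRY fields; the function itself is illustrative and not the LibGfx code:

```python
def grid_position(m, n, hgx, hgy, hrx, hry):
    """Top left output position for graymap cell (row m, column n).

    Assumed from the JBIG2 spec: all four grid parameters are in
    1/256 pixel units, and the shift is an arithmetic one.
    """
    x = (hgx + m * hry + n * hrx) >> 8
    y = (hgy + m * hrx - n * hry) >> 8
    return x, y
```

With `hrx = 8 * 256` and `hry = 0`, this degenerates to the identity grid for 8x8 patterns. With `hrx == hry`, the grid is rotated by 45 degrees, and some positions end up at negative coordinates.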

For tile generation for tests, we just collect tiles off that grid,
and then match against those in graymap generation. This means the
test json file needs to contain numbers that make sense: We currently
don't automatically compute a good rotated grid that covers the
output image.

Tile generation for pattern images clipped on the left or right is
probably wrong. Since the test image has a generous white border,
it doesn't negatively affect our tests, but it's something we might
want to improve in the future.

For `pattern_dictionary`, if the grid parameters are omitted, they
default to an identity grid. Else, it's up to the author of the
json file to make sure the parameters match what's used in the
corresponding halftone region.

For rotated grids, the pattern bitmaps need to overlap to not
have gaps. The pattern construction method currently assumes
that `combination_operator` of `or` is used for the halftone
region.
2025-12-21 19:34:50 -05:00
Nico Weber
a68193ee15 jbig2-from-json: Remove unused variable
2025-12-21 19:34:50 -05:00
Nico Weber
390a7349f1 LibGfx/ICC+icc: Add built-in profiles using mAB / mBA tags
These are really only useful for testing, so don't mention them
in `icc --help` output for `-n`.

They're supposed to behave identically to the built-in XYZ and
LAB profiles (but not the LAB_mft2 profile, which uses a slightly
different internal encoding, see #26462). The new XYZ profiles added
here behave like the XYZ profile described there, but the new LAB
profiles do not behave like the LAB_mft2 profile described there,
but like the LAB (no _mft2) profile:

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB_mABmBA_u8_clut --stdin-u8-from-pcs | \
          icc -n LAB --stdin-u8-to-pcs
    pcslab(100, 127, -128)

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB_mABmBA_u16_clut --stdin-u8-from-pcs | \
          icc -n LAB --stdin-u8-to-pcs
    pcslab(100, 127, -128)

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB_mABmBA_no_clut --stdin-u8-from-pcs | \
          icc -n LAB --stdin-u8-to-pcs
    pcslab(100, 127, -128)

The profiles without a CLUT already round-trip fine, so add
tests for those.
2025-12-11 07:10:39 -05:00
Nico Weber
d81e3dc6a1 icc: Mention XYZ in the -n help
2025-12-11 07:10:39 -05:00
Nico Weber
76de952a7e LibGfx/ICC: Add a built-in LAB mft2 profile
This is even weirder than the XYZ mft2 profile added in an
earlier commit in this PR:

mft2 uses a "legacy 16-bit PCSLAB encoding" instead of the "normal"
16-bit ICC PCSLAB encoding, and this being an identity mapping leaks
this into the outputs. We again linearly map this internal 16-bit
encoding to 8 bits.

This legacy encoding maps 0 to 0.0 and FF00 to 1.0 for L
(and FFFF to a bit more than 1.0), and 0 to -128.0 and FF00 to
127.0 for a* and b* (and FFFF to a bit more than 127.0).

This means this produces different values than our built-in LAB mft1
profile:

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB_mft2 --stdin-u8-from-pcs | \
          icc -n LAB --stdin-u8-to-pcs
    pcslab(99.60784, 126, -128)

Staying in LAB for both steps is exact, of course:

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB --stdin-u8-from-pcs | \
          icc -n LAB --stdin-u8-to-pcs
    pcslab(100, 127, -128)

Staying in LAB_mft2 for both steps is also exact, up to u8
resolution (100 and 127 don't exactly map to a u8 value with
our "16-bit legacy PCSLAB encoding mapped to u8" encoding):

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB_mft2 --stdin-u8-from-pcs | \
          icc -n LAB_mft2 --stdin-u8-to-pcs
    pcslab(99.99693, 126.99219, -128)

This is just because the same PCSLAB value is encoded slightly
differently in the mft1 and mft2 LAB encodings:

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB --stdin-u8-from-pcs
    255, 255, 0

    % echo 'pcslab(100, 127, -128)' | \
          icc -n LAB_mft2 --stdin-u8-from-pcs
    254, 254, 0

(There's no XYZ mft1 encoding, so at least we don't have mft1 and mft2
profiles that have different output.)
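
The numbers above can be reproduced with a small sketch. It assumes the 8-bit (mft1) encoding maps 0..255 to L 0..100 and stores a*/b* offset by 128, and that the u8 view of the legacy 16-bit encoding widens a byte by multiplying with 257 (so 0xFF becomes 0xFFFF). Both function names are made up for illustration:

```python
def lab_from_mft1_u8(l, a, b):
    """Decode the 8-bit PCSLAB encoding (assumed: L*100/255, a/b - 128)."""
    return (l / 255 * 100, a - 128, b - 128)


def lab_from_legacy16_as_u8(l, a, b):
    """Decode bytes that linearly stand in for the legacy 16-bit encoding."""
    def widen(u8):
        return u8 * 257  # 0x00 -> 0x0000, 0xFF -> 0xFFFF

    # In the legacy encoding, 0xFF00 (not 0xFFFF) is the top of the range.
    return (widen(l) / 0xFF00 * 100,
            widen(a) / 0xFF00 * 255 - 128,
            widen(b) / 0xFF00 * 255 - 128)
```

Under these assumptions, the bytes `254, 254, 0` decode to roughly `(99.99693, 126.99219, -128)` as legacy data but to `(99.60784, 126, -128)` when read as mft1 data, which matches the outputs shown above.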

Use this profile to add a roundtrip test for Lut16TagData when PCS
is PCSLAB.
2025-12-07 09:02:17 -05:00
Nico Weber
97ec973692 LibGfx/ICC+icc: Add a built-in XYZ profile
This profile identity-maps PCSXYZ to nCIEXYZ.

This allows converting byte-encoded XYZ values to PCS:

    % echo '255, 0, 128' | \
          icc -n XYZ --stdin-u8-to-pcs
    pcsxyz(1.9999695, 0, 1.0039062)

It can also be written to a file, for other tools to use it:

    icc -n XYZ --reencode-to serenity-XYZ.icc

(...or you can then run `icc serenity-XYZ.icc` instead of
`icc -n XYZ` if you want to dump it from a file instead of from
memory, for some reason.)

This profile uses mft2 (16-bit lut) because no mft1 (8-bit lut) PCSXYZ
encoding exists. That's probably because XYZ is a linear space, and for
linear spaces 8 bits isn't enough resolution.

This profile is a bit weird!

- *We* map the result to 8-bit values for display in `icc`.
  We store PCS values as floats, so when converting to a space with
  gamma, getting 8 bits out is fine, since we convert to 8 bits after
  all the other conversion math. But here the final data color
  space is also XYZ, and now we invented our own 8-bit XYZ encoding
  by just linearly mapping the ICC 16-bit XYZ encoding to it.

- Since this is an identity mapping, it leaks the spec's internal PCSXYZ
  encoding into the output data:
  0x0000 maps to 0.0f per spec, which for us becomes 0x00
  0x8000 maps to 1.0f per spec, which for us is not representable
                                (0x80 corresponds to 0x8080)
  0xFFFF maps to 1.0 + (32767.0 / 32768.0) == 1.999969f (for us, 0xFF)

To repeat, 0xFF maps to 1.999969f.
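
The mapping described above fits in a few lines. This sketch assumes only what the list states: 0x8000 represents 1.0 in the 16-bit PCSXYZ encoding, and a byte is widened linearly so that 0x80 becomes 0x8080 and 0xFF becomes 0xFFFF. The function names are made up:

```python
def pcsxyz_from_u16(value):
    """16-bit PCSXYZ encoding: 0x0000 is 0.0 and 0x8000 is 1.0."""
    return value / 0x8000


def pcsxyz_from_u8(value):
    """8-bit view of the same encoding, widened linearly (times 257)."""
    return pcsxyz_from_u16(value * 257)
```

`pcsxyz_from_u8(128)` gives 1.00390625 and `pcsxyz_from_u8(255)` gives about 1.9999695, matching the `icc -n XYZ --stdin-u8-to-pcs` output shown earlier.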

We could use a linear ramp to map 0x00..0xff to 0.0f..1.0f, but
then the table wouldn't be its own inverse, and directly exposing
the actual PCSXYZ encoding is nice for testing.

In other words, ICC profiles are data-encoding-dependent. One could also
make a profile that maps 0x00..0xff to red 0.25..0.5 instead of to
0.0..1.0 (if one only needed red values in that range and wanted more
resolution in that range).

Anyways, this profile is probably mostly useful for testing.
2025-12-07 09:02:17 -05:00
Nico Weber
9858ca7c8a imgcmp: Print more details on difference, add --quiet, remove --count
We now print more detail on difference by default:

    % imgcmp serenity-rgb.webp magick-rgb.png
    number of differing pixels: 2261416 (13.48%)
    max error R:    5, G:    4, B:    5
    avg error R: 0.03, G: 0.05, B: 0.07
    max error at (2942, 35): rgb(29, 156, 115) vs rgb(24, 156, 115)
    first difference at (0, 0): rgb(171, 113, 41) vs rgb(170, 113, 42)

Previously with --count, this would print:

    different pixel at (0, 0), rgb(171, 113, 41) vs rgb(170, 113, 42)
    number of differing pixels: 2261416

Locally, I also tried reporting maximum DeltaE, but that takes much
longer to compute. The metrics computed by default can be computed
in the same time as the count that --count used to produce.
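
A sketch of how such metrics can all be collected in one pass over the pixels. It assumes images are flat lists of (r, g, b) tuples and that the average error is taken over all pixels rather than only the differing ones; the actual imgcmp code may differ on both points:

```python
def compare(pixels_a, pixels_b):
    """Single-pass difference metrics for two equally sized RGB images."""
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("images must be non-empty and the same size")
    differing = 0
    max_error = [0, 0, 0]
    total_error = [0, 0, 0]
    for pa, pb in zip(pixels_a, pixels_b):
        if pa != pb:
            differing += 1
        for c in range(3):  # R, G, B
            error = abs(pa[c] - pb[c])
            max_error[c] = max(max_error[c], error)
            total_error[c] += error
    count = len(pixels_a)
    return {
        "differing": differing,
        "percent": 100 * differing / count,
        "max_error": tuple(max_error),
        "avg_error": tuple(t / count for t in total_error),
    }
```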

If --quiet is passed, imgcmp exits with 1 if there's a difference,
but doesn't produce any output.

As before, if there are no differences, there's no output.
2025-12-06 11:43:10 -05:00
Nico Weber
3841430718 LibGfx/ICC+image: Implement conversion to CMYK color spaces :^)
This used to fail:

    % Build/lagom/bin/image \
        --assign-color-profile serenity-sRGB.icc \
        --convert-to-color-profile \
            ./Build/lagom/Root/res/icc/Adobe/CMYK/USWebCoatedSWOP.icc \
        -o buggie-cmyk.jpg \
        Base/res/graphics/buggie.png
    Runtime error: Can only convert to RGB at the moment,
        but destination color space is not RGB

Now it works.

It only works for CMYK profiles that use an mft1 tag for the BToA tags,
because #26452 added support for converting from PCS to mft1.

Most CMYK profiles use mft1, but not all of them. Support for `mft2` and
`mBA ` tag types will be in a (straightforward) follow-up.

Implementation-wise, Profile grows two more methods for converting RGB
and CMYK bitmaps to CMYK. We now have 2x2 implementations for
{RGB,CMYK} -> {RGB,CMYK}. For N different channel types, we need O(N^2)
of these methods.

Instead, we could have conversion methods between RGB bitmap and PCS
bitmap and between CMYK and PCS bitmap. With this approach, we'd only
need O(N) (2*N) of these methods. Concretely, that'd be 2x2 methods too
though, and since there are two PCS types, depending on how it's
implemented it's actually 4*N (i.e. 8 for RGB and CMYK).

So the current 2x2 approach seems not terrible. It *is* a bit
repetitive.

We then call the right one of these 4 methods in `image`.

See the PR that added this commit for rough quantitative quality
evaluation of the implementation.
2025-12-06 08:40:47 -05:00
Nico Weber
5c797ed01f LibGfx/ICC+icc: Add a built-in identity LAB profile
This profile identity-maps PCSLAB to CIELAB.

This allows converting byte-encoded LAB values to PCS:

    % echo '255, 0, 255' | \
          icc -n LAB --stdin-u8-to-pcs
    pcslab(100, -128, 127)

It can also be written to a file, for other tools to use it:

    icc -n LAB --reencode-to serenity-LAB.icc

(...or you can then run `icc serenity-LAB.icc` instead of
`icc -n LAB` if you want to dump it from a file instead of from
memory, for some reason.)
2025-12-04 21:00:19 -05:00
Lucas CHOLLET
894a1db800 config: Enable printing a whole group or domain
This is done by calling `config DOMAIN_TO_PRINT` or `config DOMAIN
GROUP_TO_PRINT`.
2025-12-04 18:46:15 +01:00
Lucas CHOLLET
956730b46d config: Remove the ability to add a group
The implementation of this feature is inconsistent as
`config DOMAIN GROUP` creates a group while `config DOMAIN GROUP KEY`
prints the key.

This feature should probably be implemented behind a flag like `--add`.
2025-12-04 18:46:15 +01:00
Nico Weber
f6b9fbc76e icc: Apply comment tweak I missed to push to #26448
Bump copyright year while here.
2025-12-04 05:39:07 -05:00
Nico Weber
1c607e855d LibGfx/ICC+icc: Add --stdin-u8-to-pcs / --stdin-u8-from-pcs flags
This allows inspecting profile conversion intermediate results,
and running things like:

    echo '255, 0, 0' |
        icc profile1.icc --stdin-u8-to-pcs |
        icc profile2.icc --stdin-u8-from-pcs |
        icc profile2.icc --stdin-u8-to-pcs |
        icc profile1.icc --stdin-u8-from-pcs

The output of --stdin-u8-to-pcs prints data in the profile
connection space, which is either XYZ or LAB:

    % echo '255, 255, 255' | \
         Build/lagom/bin/icc -n sRGB --stdin-u8-to-pcs
    pcsxyz(0.9642, 0.99999994, 0.8249001)

    % echo '255, 255, 255, 128' | \
         Build/lagom/bin/icc \
           Build/lagom/Root/res/icc/Adobe/CMYK/USWebCoatedSWOP.icc \
           --stdin-u8-to-pcs
    pcslab(6.606001, 2.4342346, -1.8503876)

Which one of the two is printed is a property of the used .icc file.

--stdin-u8-from-pcs parses these two formats and prints bytes in
the data color space of the passed-in color profile.

This is useful for converting single colors from one profile to
another, and for inspecting internal LibGfx state (the PCS values).

As a comment in the code points out, the PCS->data conversion is
a tiny bit fishy as it also depends on the illuminant of the
_source_ profile. To be 100% correct, I think (but I'm not sure)
that we'd also have to write the illuminant to the output
(`pcsxyz(a, b, c, illuminant=d, e, f)` or something). But the V4
ICC spec mandates that the illuminant is the D50 illuminant, and
for most V2 profiles that's true too. So this takes a shortcut,
at least for now. Since ICC::Profile rejects anything that's not
using a D50 illuminant with a diag, it should be fine in practice.
2025-12-03 20:56:29 -05:00
Lucas CHOLLET
3698a1239f run-tests: Show the PID of the last-run test
2025-12-02 12:25:37 -05:00
Nico Weber
19095cddf6 animation: Add --frame-duration-ms flag
This allows overriding the frame duration, which is especially useful
when making an animation out of several single-frame images.
2025-11-29 16:18:26 -05:00
Nico Weber
c509332126 animation: Allow multiple input files
2025-11-29 16:18:26 -05:00
Gwen W
5abb6384a0 sed: Support strict POSIX regexes
Adds command line flags to `sed` to require strict POSIX behavior.
Extended regexes are still used by default but an ignored -E flag is
added for compatibility with other sed versions.
2025-11-29 11:36:10 +01:00
Nico Weber
2b7d13e1f2 jbig2-from-json: Allow repeating a bitmap in image_data objects
2025-11-28 18:09:05 -05:00
Nico Weber
248766e14c jbig2-from-json: Fix copy-pasto in diagnostic
2025-11-23 15:52:50 -05:00
Nico Weber
3e49fed20e jbig2-from-json: Allow cropping halftone match_image
2025-11-20 18:21:01 -05:00
Nico Weber
0a7db9faed jbig2-from-json: Rename jbig2_bitmap_from_json() to jbig2_load_bitmap()
No behavior change.
2025-11-20 18:21:01 -05:00
Nico Weber
c7a5bdb33e jbig2-from-json: Update outdated diagnostic text
2025-11-18 19:00:27 -05:00
Nico Weber
217364b6c4 LibGfx/JBIG2+jbig2-from-json: Write intermediate halftone, text regions
Similar to #26410, the challenge with refining halftone and text regions
is that the writer needs to know the decoded halftone and text region
bitmap. That data is easily available in the loader, but not in the
writer.

Similar to #26410, the approach is to call into the *loader* with a
list of segments needed to decode the intermediate region's data.

...and then some minor plumbing to hook up the intermediate region
types in jbig2-from-json.

With this, we can write all region segment types :^)
2025-11-18 07:51:05 -05:00
Nico Weber
259eb98383 LibGfx/JBIG2+jbig2-from-json: Implement halftone "match_image" feature
This makes it possible to give a halftone region a reference image
instead of a grid of indices. If that's done, the grid of indices is
computed by finding the best-matching pattern for each covered region
in the reference image.
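
A sketch of the matching idea for the identity-grid case. It assumes bitmaps are lists of rows of 0/1 values and uses the number of differing pixels as the distance; the metric actually used by the writer is not stated here, so treat both the metric and the function names as assumptions:

```python
def best_pattern_index(tile, patterns):
    """Index of the pattern with the fewest pixels differing from `tile`."""
    def distance(a, b):
        return sum(pa != pb
                   for row_a, row_b in zip(a, b)
                   for pa, pb in zip(row_a, row_b))

    return min(range(len(patterns)), key=lambda i: distance(tile, patterns[i]))


def match_image(image, patterns, pattern_width, pattern_height):
    """Compute a grid of pattern indices from a reference image."""
    grid = []
    for gy in range(len(image) // pattern_height):
        row = []
        for gx in range(len(image[0]) // pattern_width):
            # Cut out the region of the reference image covered by this cell.
            tile = [image[gy * pattern_height + y]
                    [gx * pattern_width:(gx + 1) * pattern_width]
                    for y in range(pattern_height)]
            row.append(best_pattern_index(tile, patterns))
        grid.append(row)
    return grid
```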

This can be used with a pattern dictionary that uses unique_image_tiles
from #26299 to make exact test images that have fewer tiles than
distinct_image_tiles and identity_tile_indices.

It should also be possible to use this with a regular halftone dot
pattern dictionary to get actual halftone images, but I haven't tried
that yet.

The matching is done in the writer instead of jbig2-from-image because
jbig2-from-image does not have access to referred-to segments, and
because this will eventually have to learn to deal with interesting
grid vectors, and that logic is also all in the writer. (For now,
interesting grid vectors are not supported, though.)
2025-11-15 08:09:21 -05:00
Nico Weber
f2171fa14b jbig2-from-json: Delay construction of HalftoneRegionSegmentData
This will allow putting a non-default-constructible field in
HalftoneRegionSegmentData.

No behavior change.
2025-11-15 08:09:21 -05:00
Nico Weber
3a7e2cb9fc jbig2-from-json: Extract jbig2_bitmap_from_json()
No behavior change.
2025-11-15 08:09:21 -05:00
Nico Weber
735e951dc8 jbig2-from-json: Update outdated diagnostic messages
When I renamed the halftone region halftone keys, I forgot to update
their spelling in some diagnostic messages. Update the diagnostics
to use the right spellings.
2025-11-15 08:09:21 -05:00
Nico Weber
b6eefa333c jbig2-from-json: Hook up 7fff handling toggle for refine-one symbols
2025-11-10 19:11:54 -05:00
Nico Weber
881ed976e0 LibGfx/JBIG2+jbig2-from-json: Allow setting symbol refinement strip t
This will be useful for huffman-encoded data, which will need a
non-0 initial strip t to work with the default huffman tables.

(It can be used with arithmetically-coded file segments too, though.)
2025-11-10 13:38:02 -05:00
Nico Weber
4071350ecf LibGfx/JBIG2Writer+jbig2_from_json: Wrap refines_using_strips in object
This will allow us to add more fields to it.

No behavior change.
2025-11-10 13:38:02 -05:00
Nico Weber
2d28c158f1 jbig2-from-json: Hook up 7fff handling toggle for text region instances
2025-11-09 20:09:55 -05:00
Nico Weber
732f3ce5a2 LibGfx/JBIG2+jbig2-from-json: Add support for symbol and text segments
This adds support for writing symbol dictionary and text region
segments.

Conceptually, this is simple:

* A symbol dictionary defines an (id => bitmap) mapping
* A text region has a list of (id, x, y) tuples that draw the bitmap
  of the given ID at that position
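
The conceptual model can be sketched as follows, ignoring every compression feature. It assumes bitmaps are lists of rows of 0/1 values, that (x, y) is the top left corner of the drawn symbol, and that pixels are combined with OR; the real format lets the reference corner and the combination operator vary:

```python
def draw_text_region(width, height, symbols, instances):
    """Paint symbol instances onto an empty canvas.

    symbols:   dict mapping id -> bitmap (the symbol dictionary)
    instances: list of (id, x, y) tuples (the text region)
    """
    canvas = [[0] * width for _ in range(height)]
    for symbol_id, x, y in instances:
        for dy, row in enumerate(symbols[symbol_id]):
            for dx, pixel in enumerate(row):
                px, py = x + dx, y + dy
                # Clip to the region and OR the symbol onto the canvas.
                if pixel and 0 <= px < width and 0 <= py < height:
                    canvas[py][px] |= pixel
    return canvas
```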

It's made a bit complicated due to the JBIG2 format supporting all kinds
of features to improve compression.

For symbol dictionaries:

* All images in a symbol dictionary are organized into "height classes",
  which are bitmaps of the same height. Within each height class, the
  deltas of the widths of consecutive images are stored.
* Entries in a symbol dictionary can "refine" other entries, using
  either refinement coding if it refines a single existing symbol
  (say, a bitmap looking like an "o" might be a refinement of a bitmap
  looking like a "c"), or it might be a mosaic of several existing
  symbols.
* Symbol dictionaries can refer to earlier symbol dictionaries, and
  re-export a subset of the referred-to symbols.
* The symbol dictionary can optionally be Huffman-coded instead of
  arithmetically-coded. It can then refer to custom table segments.
  In the common case, huffman-coded symbol dictionaries store a
  single "height class collective bitmap" (which is just all bitmaps
  of a given height concatenated horizontally).
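
The height class idea from the first bullet can be sketched like this. It assumes symbols are given as (width, height) pairs, that the first width in each class is stored relative to 0, and that classes are emitted in order of first appearance; the function name is made up and the real writer handles much more:

```python
def height_classes(symbol_sizes):
    """Group symbols by height and delta-encode the widths per class.

    symbol_sizes: list of (width, height) pairs
    returns:      list of (height, [width deltas]) in first-seen order
    """
    classes = {}
    for width, height in symbol_sizes:
        classes.setdefault(height, []).append(width)
    result = []
    for height, widths in classes.items():
        previous = 0
        deltas = []
        for width in widths:
            deltas.append(width - previous)  # difference to previous width
            previous = width
        result.append((height, deltas))
    return result
```

For symbols of sizes (5, 8), (7, 8), (3, 10) and (6, 8), this produces the classes `(8, [5, 2, -1])` and `(10, [3])`.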

For text regions:

* Symbol instances are grouped into stripes, with a predicted position
  advancing in stripe direction, and the actual position of the next
  symbol instance is stored relative to that predicted position.
* To influence how that predicted position changes, a text region
  has a configurable "reference corner" (top left, bottom right, etc)
  and a "transposed" flag.
* A symbol instance too can refine its referred-to symbol
  (but it can only refine a single symbol).
* Text regions also optionally support huffman coding, and if it's used
  a deflate-style "code length lengths" table is written for the
  symbol id table. (This implementation for now just assigns
  equally-sized symbols for everything and doesn't compute an
  optimal table. That's enough for creating tests, and if you care about
  file size, you're likely using arithmetic coding anyway.)

This implementation supports most of that!

And its outputs work in PDFium, pdf.js, and Preview.app (in addition to
in Serenity). The old `jbig2`-created symbol/text test files we have
did not work in Preview.app.

While the file format stores deltas of everything, I made it so that
the JSON files specify the absolute coordinates. Generally I try to keep
the JSON files very close to the final output (that's why height classes
are e.g. visible at the JSON level), but having to compute deltas
manually seemed inconvenient.

Some things don't work yet:

* Refinement of symbols that themselves are refined using the text
  region method

* Intermediate text region segments

* Huffman-coded segments that use refinement

The former two will need drawing of text region segments in the encoder.
Maybe it should call the decoder for that, or maybe it should
re-implement text region segment painting. I need to think about that
some, so I leave it to future us.

The latter needs some additional code since refinement coding always
writes arithmetically-coded data. The loader doesn't handle that
case yet either, since I haven't seen it in practice and up until
now it was nigh-impossible to create test data for it.

The existing encoding procedures were fairly similar to the decoding
procedures. The symbol and text procedures are actually fairly
different. They have the same flow and the same spec comments, but
the actual code is very different between the two. (No point here,
I just thought that's mildly interesting.)
2025-11-06 20:12:21 -05:00
Nico Weber
725864ca41 jbig2-from-image: Give refinement pixels a default value in text regions
This matches what we do in symbol dictionaries and refinement regions.
2025-11-05 06:34:23 -05:00
Nico Weber
65bb84f995 jbig2-from-image: Correct refinement_adaptive_template_pixels default
...when writing symbol dictionary regions.

Only set the default when refinement or aggregate coding is used,
and assign the default to the right variable.
2025-11-05 06:34:23 -05:00