Commit Graph

700 Commits

Author SHA1 Message Date
Nico Weber
e56b7bb17b LibPDF: Hook up dash pattern drawing 2024-11-07 09:12:13 -05:00
Nico Weber
ebf67def77 LibPDF: Let LineDashPattern store data as floats
We used to store them as ints, which threw away information.
In practice, files do use fractional values for these.
2024-11-07 09:12:13 -05:00
Nico Weber
1ebb25c772 LibPDF: Pass miter limit through to LibGfx
With this, the `M` operator has an effect in LibPDF.
2024-10-25 20:35:23 -04:00
Nico Weber
d6e325ef6b LibGfx+Clients: Make Path::stroke_path() take a parameter struct
This makes several calling sites shorter, and makes it easier
to add a miter limit with less plumbing in the future.

This also exposed a bug in AffineDisplayListPlayerCPU.cpp: We weren't
passing path linecap style on. I added a FIXME for that for now.

No intended behavior change.
2024-10-25 20:35:23 -04:00
Nico Weber
a45fb04821 LibPDF: Hook up miter line joins 2024-10-25 18:12:58 -04:00
Nico Weber
55217344b5 LibPDF: Add support for drawing bevel line joins 2024-10-25 18:12:58 -04:00
Nico Weber
e352650777 LibGfx+Clients: Remove Painter::WindingRule forwarding declaration
No behavior change. Follow-up to #25065.
2024-10-20 19:43:24 -04:00
Nico Weber
536d27fbe6 LibPDF: Don't fail on files where object 0 is not in the xref table
History:

* 72f693e9ed from #6974 added the initial XRefTable. Here, -1
  was used for byte_offset of invalid entries. has_object() compared
  byte_offset to -1.

* e23bfd7252 from #7675 added invalid_byte_offset (equal to
  LONG_MAX) and initialized byte_offset with it, but forgot to update
  has_object(). has_object() still compared to -1, so has_object would
  now never return false.

* d1bc89e30b from #16150 added validate_xref_table_and_fix_if_necessary,
  which used `byte_offset_for_object(index) == invalid_byte_offset` to
  detect if an object was in the xref table. `byte_offset_for_object`
  internally did `VERIFY(has_object(index))`, which due to the previous
  bullet was always true. It ran this for all object numbers from 0
  up to the first object with byte_offset != invalid_byte_offset.

* d458471e09 from #24099 updated has_object() to check against
  invalid_byte_offset instead of -1, making it work again -- but causing
  a VERIFY in validate_xref_table_and_fix_if_necessary(). This caused
  validate_xref_table_and_fix_if_necessary() to VERIFY if object 0
  was not in the xref table.

* The fix is to make validate_xref_table_and_fix_if_necessary() call
  has_object() to find out if an object exists in the xref table.
  (When validate_xref_table_and_fix_if_necessary(), that didn't work,
  because has_object() was broken then.)

This is hit 4 times in my 1000 file test set. For these three files,
the xref is valid:

* 0000200.pdf
* 0000567.pdf
* 0000651.pdf

For 0000900.pdf, validate_xref_table_and_fix_if_necessary() actually
fixes up the xref table.

Fixes #25079.
2024-10-18 21:54:38 -04:00
Nico Weber
f1a192da48 LibPDF: Flip two branches of a conditional
Rather than reading an int and then checking if we shouldn't have
read it and using a default value, start with the default value
and read the int if we should read it.

No behavior change.
2024-10-18 21:54:38 -04:00
Nico Weber
de66beee13 LibPDF: Use CommonNames for W in parse_xref_stream
No behavior change, just makes LibPDF more self-consistent.
2024-10-18 21:54:38 -04:00
Nico Weber
33b97f45a1 LibPDF: Add a spec comment to DocumentParser::parse_xref_stream() 2024-10-18 21:54:38 -04:00
Nico Weber
247e7bcb59 LibPDF: Make Document::text_string_to_utf8() fallible
TextCodec does not return Error for invalid UTF-8 or UTF-16, so
this only propagates allocation errors. No expected behavior change
in practice even for invalid PDFs. Removes three calls to
release_value_but_fixme_should_propagate_errors(), mostly for
aesthetic reasons.
2024-10-16 19:31:50 -04:00
Nico Weber
0d7de7681c LibPDF+Clients: Make Document::text_string_to_utf8() return String
It returns UTF-8 text, so make that clear in its return type,
and propagate this to clients.

No behavior change.
2024-10-16 19:31:50 -04:00
Nico Weber
e0b2af654a LibPDF: Hook up square linecap support 2024-10-16 18:18:43 -04:00
Nico Weber
e7d35aa9e4 LibPDF: Plumb through line cap style 2024-09-27 20:18:55 -04:00
kleines Filmröllchen
a3077203fe AK: Don't implicitly convert Optional<T&> to Optional<T>
C++ will jovially select the implicit conversion operator, even if it's
complete bogus, such as for unknown-size types or non-destructible
types. Therefore, all such conversions (which incur a copy) must
(unfortunately) be explicit so that non-copyable types continue to work.
2024-09-14 13:30:27 +02:00
MacDue
e8fe2f4ffe LibPDF: Use draw_rect() to show debug clipping rects
Using the path rasterizer here is much slower than simply drawing four
lines. This also more accurately shows the (real) clip as the bounding
box is truncated to an int before adding it as a clip rect.

Fixes #23056
2024-07-12 09:47:51 -04:00
Lucas CHOLLET
70a3f1f02b LibCompress: Rename LZWDecoder => LzwDecompressor
This is more idiomatic for LibCompress' decoders.
2024-05-14 12:33:53 -06:00
Lucas CHOLLET
11f53ba9cc LibCompress: Rename LZWDecoder.h => Lzw.h 2024-05-14 12:33:53 -06:00
Ava Rider
3a7bea7402 LibPDF: Added empty read check to parse_hex_string 2024-05-05 06:45:42 +01:00
Sergey Bugaev
d458471e09 LibPDF: Convert byte offsets to u64
This fixes a build failure on 32-bit.

Suggested-by: Nico Weber <thakis@chromium.org>
2024-05-02 07:46:53 -06:00
Dan Klishch
5ed7cd6e32 Everywhere: Use east const in more places
These changes are compatible with clang-format 16 and will be mandatory
when we eventually bump clang-format version. So, since there are no
real downsides, let's commit them now.
2024-04-19 06:31:19 -04:00
Nico Weber
76ba374aef LibPDF: Move CFF to use AK::Error instead of PDF::Error
Similar to the previous commit. No real behavior change.
2024-04-14 17:22:00 +02:00
Nico Weber
102ac331c6 LibPDF: Move Type1FontProgram to use AK::Error instead of PDF::Error
Makes some of the errors a bit less descriptive. But this is pretty
stable by now and the errors fire basically never, so that seems ok.

Needing the explicit `AK::` prefix is a bit awkward, but that'll go
away once this class moves out of LibPDF.

Move error() into the two subclasses. I'll remove it from CFF in
a follow-up.

No real behavior change.
2024-04-14 17:22:00 +02:00
Nico Weber
2b905cc482 LibPDF: Move rest of CFF from Reader to Stream
parse_index_data() wants to take ReadonlyByte views of the stream
data, so we need FixedMemoryStream::read_in_place(size_t). All
other remaining code indirectly calls parse_index_data(), so that
all operates on FixedMemoryStreams too.

No behavior change.
2024-04-14 10:45:11 +02:00
Nico Weber
cc3d4b6adb LibPDF: Convert CFF::parse_dict & co to Stream
No behavior change.
2024-04-14 10:45:11 +02:00
Nico Weber
1f7924e14c LibPDF: Convert CFF::parse_charset to Stream
No behavior change.
2024-04-14 10:45:11 +02:00
Nico Weber
4995dfe8f1 LibPDF: Convert CFF::parse_fdselect to Stream
No behavior change.
2024-04-14 10:45:11 +02:00
Nico Weber
16c22885eb LibPDF: Convert CFF::parse_encoding to Stream
No behavior change.
2024-04-14 10:45:11 +02:00
Nico Weber
f570678bf0 LibPDF: Invert image masks used as alpha too
Fixes #23824, a regression from the first commit in #23781.
2024-04-04 06:55:08 -04:00
Nico Weber
40780304b8 LibPDF: Add a fastpath for 1bpp grayscale to load_image()
We used to expand every bit in an 1bpp image to a 0 or 255 byte,
then map that to a float that's either 0.0f or 1.0f (or whatever's
in /DecodeArray), then multiply that by 255.0f to convert it to a
u8 and put that in the rgb channels of a Color.

Now we precompute the two possible outcomes (almost always Black
and White) and do a per-bit lookup.

Reduces time for

    Build/lagom/bin/pdf --render-bench --render-repeats 20 --page 36 \
        ~/Downloads/Flatland.pdf

(the "decoded data cached" case) from 3.3s to 1.1s on my system.

Reduces time for

    Build/lagom/bin/pdf --debugging-stats ~/Downloads/0000/0000231.pdf

(the "need to decode each page" case) from 52s to 43s on my machine.

Also makes paging through PDFs that contain a 1700x2200 pixel CCITT
or JBIG2 bitmap on each page noticeably snappier.
2024-04-02 08:07:46 +02:00
Nico Weber
c01acdd733 LibPDF: Move decode_array up a bit
No behavior change.
2024-04-02 08:07:46 +02:00
Nico Weber
81ff9f45d9 LibPDF: Move image mask inversion from load_image() to show_image()
No behavior change.
2024-04-02 08:07:46 +02:00
Nico Weber
0374c1eb3b LibPDF: Handle indirect reference resolving during parsing more robustly
If `Document::resolve()` was called during parsing, it'd change the
reader's current position, so the parsing code that called it would
then end up at an unexpected position in the file.

Parser.cpp already had special-case recovery when a stream's length
was stored in an indirect reference.

Commit ead02da98a ("/JBIG2Globals") in #23503 added another case
where we could resolve indirect reference during parsing, but wasn't
aware of having to save and restore the reader position for that.

Put the save/restore code in `DocumentParser::parse_object_with_index`
instead, right before the place that ultimately changes the reader's
position during `Document::resolve`. This fixes `/JBIG2Globals` and
lets us remove the special-case code for `/Length` handling.

Since this is kind of subtle, include a test.
2024-03-19 19:20:01 -04:00
Nico Weber
495aaa295c LibPDF: Add some logging behind PDF_DEBUG
I've added these two lines a bunch of times by now. Let's check
them in. If they turn out to be annoying, we can remove them again.
2024-03-19 19:20:01 -04:00
Lucas CHOLLET
1e023a589d LibPDF: Plug in the CCITT3 1D decoder and pass corresponding options 2024-03-19 12:22:28 +01:00
MacDue
8057542dea LibGfx: Simplify path storage and tidy up APIs
Rather than make path segments virtual and refcounted let's store
`Gfx::Path`s as a list of `FloatPoints` and a separate list of commands.

This reduces the size of paths, for example, a `MoveTo` goes from 24
bytes to 9 bytes (one point + a single byte command), and removes a
layer of indirection when accessing segments. A nice little bonus is
transforming a path can now be done by applying the transform to all
points in the path (without looking at the commands).

Alongside this there's been a few minor API changes:

- `path.segments()` has been removed
  * All current uses could be replaced by a new `path.is_empty()` API
  * There's also now an iterator for looping over `Gfx::Path` segments
- `path.add_path(other_path)` has been removed
  * This was a duplicate of `path.append_path(other_path)`
- `path.ensure_subpath(point)` has been removed
  * Had one use and is equivalent to an `is_empty()` check + `move_to()`
- `path.close()` and `path.close_all_subpaths()` assume an implicit
  `moveto 0,0` if there's no `moveto` at the start of a path (for
  consistency with `path.segmentize_path()`).

Only the last point could change behaviour (though in LibWeb/SVGs all
paths start with a `moveto` as per the spec, it's only possible to
construct a path without a starting `moveto` via LibGfx APIs).
2024-03-18 07:09:37 +01:00
Nico Weber
21917e7b1e LibPDF+PDFViewer+MacPDF: Don't draw hidden text by default
Text can be rendered in various ways in PDFs: Filled, stroked,
both filled and stroked, set as clipping path, hidden, or
some combinations thereof.

We don't implement any of this at the moment except "filled".

Hidden text is used in scanned documents: The image of the scan is
drawn in the background, and then OCRd text is "drawn" as hidden
on top of the scanned bitmap. That way, the (hidden) text can be
selected and copied, and it looks like you're selecting text from
the scanned bitmap. Find-in-page also works similarly. (We currently
have neither text selection nor find-in-page, but one day we will.)

Now that we have pretty good support for CCITT and are growing some
support for JBIG2, we now draw both the scanned background image
as well as the foreground text. They're not always perfectly aligned.

This change makes it so that we don't render text that's marked as
hidden. (We still do most of the coordinate math, which will probably
come in handy at some point when we implement text selection.)

This makes these scanned documents appear as they're supposed to
appear (at least in documents where we manage to decode the background
bitmap).

This also adds a debug option to force rendering of hidden text.
2024-03-16 13:10:48 -04:00
Nico Weber
be9a6caa0a LibPDF: In Filter::decode_jbig2(), invert bits
See included comment.
2024-03-10 10:10:55 -04:00
Nico Weber
1eaaa8c3e9 LibPDF+LibGfx: Support JBIG2s with /JBIG2Globals set
Several ramifications:

* /JBIG2Globals is an indirect reference, which means we now need
  a Document for unfiltering. (Technically, other decode parameters
  can also be indirect objects and we should use the Document to
  resolve() those too, but in practice it only seems to be needed
  for /JBIG2Globals.)

* Since /JBIG2Globals are so rare, we just parse once for each
  image that use them, and decode_embedded() now receives a
  Vector<ReadonlyBytes> with all sections of sequences of
  segments.

* Internally, decode_segment_headers() is now called several times
  for embedded JBIG2s with multiple such sections (e.g. PDFs with
  /JBIG2Globals).

* That means `data` is now no longer part of JBIG2LoadingContext
  and things get slightly reshuffled due to this.

This completes the LibPDF part of JBIG2 support. Once LibGfx
implements actual decoding of JBIG2s, things should start to
Just Work in PDFs.
2024-03-09 16:01:22 +01:00
Nico Weber
953f6c5d9b LibPDF+LibGfx: Pass jbig2-filtered data to JBIG2ImageDecoderPlugin
Except for /JBIG2Globals, which we bail out on for now. In my 1000
files, 13 use JBIG2, and of those, 2 use JBIG2Globals (0000372.pdf e.g.
page 11 and 0000857.pdf e.g. page 1), and only one (the latter) of the
two uses the same JBIG2Globals stream for more than a single image.

JBIG2ImageDecoderPlugin cannot decode the data yet, so no behavior
change, but with `#define JBIG2_DEBUG 1` at the top of that file,
it now prints segment header info for PDFs containing JBIG2 data :^)
2024-03-09 16:01:22 +01:00
Nico Weber
24951a039e LibPDF: Clip stroke for B / B* operators
Fixes pages 17-19 on
https://www.iro.umontreal.ca/~feeley/papers/ChevalierBoisvertFeeleyECOOP15.pdf

Calling the fill handler after painting the stroke as previously doesn't
work, since we need to set up the clip before both stroke and fill, and
unset it after both. The duplication is a bit unfortunate, but also
minor.
2024-03-08 10:45:28 -05:00
Nico Weber
3a39939995 LibPDF: Make truetype fonts marked as symbol fonts actually work
Turns out the spec didn't mean that the whole range is populated,
but that one of these ranges is populated. So take the argmax.

As fallout, explicitly mark the Liberation fonts as nonsymbolic
when we use them for the 14 standard fonts. Else, we'd regress
"PostScrõpt", since the Liberation fonts would otherwise go down
the "is symbolic or doesn't have explicit encoding" codepath,
since the standard fonts usually don't have an explicit encoding.

As a fallout from _that_, since the 14 standard fonts now go down
the regular truetype rendering path, and since we don't implement
lookup by postscript name yet, glyphs not present in Liberation
now cause text to stop rendering with a diag, instead of rendering
a "glyph not found" symbol. That isn't super common, only an
additional 4 files appear for the "'post' table not yet implemented"
diag. Since we'll implement that soon, this seems fine until then.
2024-03-07 11:29:47 -05:00
Lucas CHOLLET
be5e7a360f LibGfx/CCITT: Add support for images with an unknown number of lines 2024-03-07 11:07:20 -05:00
Kyle Lanmon
a099d0e140 PDFViewer: Hide the rendering diagnostics window by default
You can enable it in the Debug menu and we will remember your choice.
2024-03-04 10:43:41 +01:00
Nico Weber
c6b484a728 LibPDF: Make image creation in Renderer::load_image() fallible 2024-03-03 11:18:37 -05:00
Nico Weber
9bff8abcc7 LibPDF: Add support for array image masks
An array image mask contains a min/max range for each channel,
and if each channel of a given pixel is in that channel's range,
that pixel is masked out (i.e. transparent). (It's similar to
having a single color or palette index be transparent, but it
supports a range of transparent colors if desired.)

What makes this a bit awkward is that the range is relative to the
origin bits per pixel and the inputs to the image's color space.

So an indexed (palettized) image with 4bpp has a 2-element mask
array where both entries are between 0 and 15.

We currently apply masks after converting images to a Gfx::Bitmap,
that is after converting to 8bpp sRGB. And we do this by mapping
everything to 8bpp very early on in load_image().

This leaves us with a bunch of options that are all a bit awkward:

1. Make load_image() store the up- (or for 16bpp inputs, down-)
   sampled-to-8bpp pixel data. And also return if we expanded the
   pixel range while resampling (for color values) or not (for
   palettized images). Then, when applying the image filter,
   resample the array bounds in exactly the same way. This requires
   passing around more stuff.

2. Like 1, but pass in the mask array to load_image() and apply
   the mask right there and then. This means we'd apply mask arrays
   at a different time than other masks.

3. Make the function that computes the mask from the mask array
   work from the original, unprocessed image data. This is the most
   local change, but probably also requires the largest amount of
   code (in return, the color mask for 16bpp images is precise, in
   addition that it separates concerns the most nicely).

This goes with 3 for now.
2024-03-03 11:18:37 -05:00
Nico Weber
98e272ce15 LibPDF: Silently ignore BX / EX operators
See the added comment for reasoning.
2024-03-02 17:43:53 -05:00
Nico Weber
9e502dcfe4 LibPDF: Honor writing mode in TJ operator as well 2024-03-02 12:25:09 +01:00
Nico Weber
c69797fda9 LibPDF: Implement support for vertical text for Type0
For Identity-V only for now.
2024-03-02 12:25:09 +01:00