10 Commits

Author SHA1 Message Date
Aliaksandr Kalenik
9375499e52 LibTextCodec: Add streaming decoder
Introduce a StreamingDecoder wrapper that lets callers feed bytes to a
Decoder one chunk at a time. It buffers any incomplete trailing byte
sequence at the end of a chunk and prepends it to the next chunk, so a
multi-byte code point split across a chunk boundary is decoded correctly
once the next chunk arrives.

To support that, add an incomplete_tail_length() virtual on Decoder
returning the number of trailing bytes that form an incomplete sequence
per the Encoding Standard's decoder handler byte ranges, with overrides
for UTF-8, UTF-16BE, UTF-16LE, GB18030, Big5, EUC-JP, ISO-2022-JP,
Shift_JIS, and EUC-KR. The default implementation returns 0, which keeps
single-byte legacy decoders correct.

This is the foundation for the upcoming incremental HTML parser, which
needs to decode network response bodies as they arrive.
2026-04-29 04:12:44 +02:00
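
The buffering idea described in the commit message above can be illustrated with a minimal standalone sketch for the UTF-8 case. This is not the LibTextCodec code: the names utf8_incomplete_tail_length and StreamingUtf8Splitter are hypothetical, std::string stands in for the library's own types, and the tail check is simplified (it looks only at lead-byte ranges and does not validate the restricted second-byte ranges that the Encoding Standard's UTF-8 decoder enforces).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: how many trailing bytes of `bytes` start a UTF-8
// sequence that is still missing continuation bytes. Returns 0 when the
// input ends on a sequence boundary (or on bytes this sketch treats as invalid).
std::size_t utf8_incomplete_tail_length(std::string_view bytes)
{
    std::size_t const size = bytes.size();
    std::size_t const max_back = size < 3 ? size : 3;

    // Walk backwards over at most three bytes looking for a lead byte.
    for (std::size_t back = 1; back <= max_back; ++back) {
        auto const byte = static_cast<unsigned char>(bytes[size - back]);
        if ((byte & 0xC0) == 0x80)
            continue; // Continuation byte: keep looking for its lead byte.

        std::size_t needed = 0;
        if (byte >= 0xC2 && byte <= 0xDF)
            needed = 2;
        else if (byte >= 0xE0 && byte <= 0xEF)
            needed = 3;
        else if (byte >= 0xF0 && byte <= 0xF4)
            needed = 4;
        else
            return 0; // ASCII or an invalid lead byte: nothing to hold back.

        // Hold the bytes back only if the sequence is not yet complete.
        return needed > back ? back : 0;
    }
    return 0;
}

// Hypothetical wrapper: accepts chunks and returns only the bytes that end on
// a sequence boundary, keeping an incomplete tail for the next chunk.
class StreamingUtf8Splitter {
public:
    std::string feed(std::string_view chunk)
    {
        // Prepend whatever was held back from the previous chunk.
        m_pending.append(chunk.data(), chunk.size());

        std::size_t const tail = utf8_incomplete_tail_length(m_pending);
        std::size_t const ready_length = m_pending.size() - tail;

        std::string ready = m_pending.substr(0, ready_length);
        m_pending.erase(0, ready_length);
        return ready; // In a real decoder, these bytes would now be decoded.
    }

    std::size_t pending_size() const { return m_pending.size(); }

private:
    std::string m_pending;
};
```

For example, feeding the three bytes of U+20AC (E2 82 AC) split as "a\xE2\x82" followed by "\xAC" "b" would yield "a" from the first call and the full three-byte sequence plus "b" from the second. What happens to bytes still pending at end of stream is left out of the sketch.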
R-Goc
ae5f28fb40 LibTextEncoder/LibURL: Cleanup includes
Cleans up LibURL/Parser.h to use the forwarding header from
LibTextEncoder.
2026-02-26 18:31:57 +01:00
Timothy Flynn
0fd80a8f99 LibTextCodec+LibWeb: Move isomorphic coders to LibTextCodec
This will be used outside of LibWeb.
2025-11-27 14:57:29 +01:00
ayeteadoe
e497303e94 LibTextCodec: Enable EXPLICIT_SYMBOL_EXPORT 2025-08-23 16:04:36 -06:00
Andreas Kling
0e9480b944 AK+LibTextCodec: Stop using Utf16View endianness override
This is preparation for removing the endianness override, since it was
only used by a single client: LibTextCodec.

While here, add helpers and make use of simdutf for fast conversion.
2025-04-16 10:04:50 +02:00
Timothy Flynn
93712b24bf Everywhere: Hoist the Libraries folder to the top-level 2024-11-10 12:50:45 +01:00
Andreas Kling
13d7c09125 Libraries: Move to Userland/Libraries/ 2021-01-12 12:17:46 +01:00
Łukasz Maciejewski
518ba73dcb LibTextCodec: Add Latin2 text decoder (#4579) 2020-12-27 22:44:38 +01:00
Luke
f3d2053bff LibTextCodec: Add a function to convert encodings to standardized names
https://encoding.spec.whatwg.org/#names-and-labels
2020-11-14 10:14:03 +01:00
Andreas Kling
e09b83c60c LibTextCodec: Start fleshing out a simple text codec library
We're starting with a very basic decoding API and only ISO-8859-1 and
UTF-8 decoding (and UTF-8 decoding is really a no-op since String is
expected to be UTF-8.)
2020-05-03 23:01:58 +02:00