| License | BSD-style (see LICENSE) |
|---|---|
| Stability | experimental |
| Safe Haskell | Trustworthy |
| Language | Haskell2010 |
Warning: this is an internal module, and does not have a stable API or name. Functions in this module may not check or enforce preconditions expected by public modules. Use at your own risk!
Internals of Data.Text.Encoding.
Since: text-2.0.2
validateUtf8Chunk :: ByteString -> (Int, Maybe Utf8State)
Validate a ByteString as UTF-8-encoded text. To be continued using validateUtf8More.
See also validateUtf8More for details on the result of this function.
validateUtf8Chunk = validateUtf8More startUtf8State
Given:
validateUtf8Chunk chunk = (n, ms)
The first n bytes of chunk are valid UTF-8. In particular, that prefix is itself accepted by this validation:
validateUtf8Chunk (take n chunk) = (n, Just startUtf8State)
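As a rough illustration of how the pair returned by validateUtf8Chunk can be read, the sketch below sorts a single chunk into three cases. The Verdict type and the classify function are hypothetical names introduced only for this example; they are not part of the module.

```haskell
import qualified Data.ByteString as B
import Data.Text.Internal.Encoding (validateUtf8Chunk)

-- | Hypothetical three-way verdict, introduced only for this example.
data Verdict
  = Complete    -- ^ every byte of the chunk is part of valid UTF-8
  | Incomplete  -- ^ valid so far, but the chunk ends inside a code point
  | Invalid     -- ^ an invalid byte follows the valid prefix
  deriving (Eq, Show)

-- | Interpret the result of 'validateUtf8Chunk' for a single chunk.
classify :: B.ByteString -> Verdict
classify chunk =
  case validateUtf8Chunk chunk of
    (_, Nothing) -> Invalid
    (n, Just _)
      | n == B.length chunk -> Complete
      | otherwise -> Incomplete
```

Under these assumptions, classify (B.pack [0x68, 0x69]) should be Complete, classify (B.pack [0x68, 0xC3]) should be Incomplete (0xC3 opens a two-byte code point), and classify (B.pack [0x61, 0xFF]) should be Invalid.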
validateUtf8More :: Utf8State -> ByteString -> (Int, Maybe Utf8State)
Validate another ByteString chunk in an ongoing stream of UTF-8-encoded text.
Returns a pair:
1. The first component n is the end position, relative to the current chunk, of the longest prefix of the accumulated bytestring which is valid UTF-8. n may be negative: that happens when an incomplete code point started in a previous chunk and is not completed by the current chunk (either that code point is still incomplete, or it is broken by an invalid byte).
2. The second component ms indicates the following:
   - If ms = Nothing, the remainder of the chunk contains an invalid byte, within four bytes from position n.
   - If ms = Just s', you can carry on validating another chunk by calling validateUtf8More with the new state s'.

Given:
validateUtf8More s chunk = (n, ms)
If the chunk is invalid, it cannot be extended to be valid.
ms = Nothing ==> validateUtf8More s (chunk <> more) = (n, Nothing)
Validating two chunks sequentially is the same as validating them together at once:
ms = Just s' ==> validateUtf8More s (chunk <> more) = first (length chunk +) (validateUtf8More s' more)
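The sequential-validation property suggests a simple way to validate a stream that arrives in pieces. The following is a minimal sketch, not the library's own method: isValidUtf8Stream is a hypothetical helper, and the end-of-stream check relies on the documented behaviour that n is negative while an incomplete code point from an earlier chunk is still pending (here probed with an empty chunk).

```haskell
import qualified Data.ByteString as B
import Data.Text.Internal.Encoding (startUtf8State, validateUtf8More)

-- | Hypothetical helper: is the concatenation of the chunks valid UTF-8?
isValidUtf8Stream :: [B.ByteString] -> Bool
isValidUtf8Stream = go startUtf8State
  where
    -- End of input: feeding an empty chunk reports a negative position
    -- if a code point started earlier is still incomplete.
    go s [] = fst (validateUtf8More s B.empty) == 0
    go s (chunk : rest) =
      case validateUtf8More s chunk of
        (_, Nothing) -> False      -- invalid byte: no extension can fix it
        (_, Just s') -> go s' rest -- carry the state into the next chunk
```

For example, the two chunks B.pack [0x68, 0xC3] and B.pack [0xA9, 0x6C] split the two-byte encoding of "é" across a chunk boundary, and the sketch should still accept them; the first chunk on its own should be rejected as truncated.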
decodeUtf8Chunk :: ByteString -> (StrictTextBuilder, ByteString, Maybe Utf8State)
Decode a chunk of UTF-8 text. To be continued with decodeUtf8More.
See decodeUtf8More for details on the result.
decodeUtf8Chunk = decodeUtf8More startUtf8State
Given:
decodeUtf8Chunk chunk = (builder, rest, ms)
The UTF-8 encoding of builder is a prefix of chunk, and rest is the remaining suffix:
encodeUtf8 (strictBuilderToText builder) <> rest = chunk
Since: text-2.0.2
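To make the shape of the result concrete, here is a small sketch that decodes one chunk and exposes the three components as ordinary values. decodePrefix is a hypothetical name used only for illustration; the Bool simply records whether decoding could be continued with another chunk.

```haskell
import qualified Data.ByteString as B
import Data.Maybe (isJust)
import qualified Data.Text as T
import Data.Text.Internal.Encoding (decodeUtf8Chunk, strictBuilderToText)

-- | Hypothetical helper: decoded prefix, undecoded remainder, and whether
-- decoding may continue (True) or hit an invalid byte (False).
decodePrefix :: B.ByteString -> (T.Text, B.ByteString, Bool)
decodePrefix chunk =
  let (builder, rest, ms) = decodeUtf8Chunk chunk
  in (strictBuilderToText builder, rest, isJust ms)
```

Applied to B.pack [0x61, 0x62, 0xFF, 0x63, 0x64], this should yield the text "ab", the remainder starting at the 0xFF byte, and False.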
decodeUtf8More :: Utf8State -> ByteString -> (StrictTextBuilder, ByteString, Maybe Utf8State)
Decode another chunk in an ongoing UTF-8 stream.
Returns a triple:
1. A StrictBuilder for the decoded chunk of text. You can accumulate chunks with (<>) or output them with toText.
2. The undecoded remainder of the given chunk.
3. Just the new state, or Nothing if an invalid byte was encountered (it will be within the first 4 bytes of the undecoded remainder).

Given:
(pre, suf, ms) = decodeUtf8More s chunk
If the output pre is nonempty (alternatively, if length chunk > length suf)
s2b pre `append` suf = p2b s `append` chunk
where
s2b = encodeUtf8 . toText
p2b = partUtf8ToByteString
If the output pre is empty (alternatively, if length chunk = length suf)
suf = chunk
Decoding chunks separately is equivalent to decoding their concatenation.
Given:
(pre1, suf1, Just s1) = decodeUtf8More s chunk1
(pre2, suf2, ms2) = decodeUtf8More s1 chunk2
(pre3, suf3, ms3) = decodeUtf8More s (chunk1 `B.append` chunk2)
we have:
s2b (pre1 <> pre2) = s2b pre3
ms2 = ms3
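Because decoding chunks separately is equivalent to decoding their concatenation, a strict streaming decoder can be sketched as a fold that threads the Utf8State and accumulates the builders. This is only one possible arrangement: DecodeFailure and decodeChunks are hypothetical names, and the truncation check at end of input assumes that validateUtf8More reports a negative position while a code point is still pending.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Internal.Encoding
  ( decodeUtf8More, startUtf8State, strictBuilderToText, validateUtf8More )

-- | Hypothetical error type for this sketch.
data DecodeFailure
  = InvalidByteNear B.ByteString  -- ^ undecoded remainder of the failing chunk
  | TruncatedInput                -- ^ input ended inside a code point
  deriving (Eq, Show)

-- | Decode a list of chunks as one UTF-8 stream, failing on the first error.
decodeChunks :: [B.ByteString] -> Either DecodeFailure T.Text
decodeChunks = go startUtf8State mempty
  where
    go s acc []
      | fst (validateUtf8More s B.empty) < 0 = Left TruncatedInput
      | otherwise = Right (strictBuilderToText acc)
    go s acc (chunk : rest) =
      case decodeUtf8More s chunk of
        -- The pending bytes live in the new state, so the remainder is
        -- not fed again; only the next chunk is.
        (pre, _, Just s') -> go s' (acc <> pre) rest
        (_, suf, Nothing) -> Left (InvalidByteNear suf)
```

With the chunks B.pack [0x68, 0xC3] and B.pack [0xA9, 0x6C, 0x6C, 0x6F], the sketch should return Right "héllo" even though the "é" is split across the boundary.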
decodeUtf8With1 :: OnDecodeError -> String -> ByteString -> Text
Helper for decodeUtf8With.
Since: text-2.0.2
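A hedged usage sketch of this helper follows. It assumes that the String argument is the description passed on to the OnDecodeError handler, which this page does not state, and it uses lenientDecode from Data.Text.Encoding.Error as the handler. decodeLenient is a hypothetical name.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding.Error (lenientDecode)
import Data.Text.Internal.Encoding (decodeUtf8With1)

-- | Hypothetical wrapper: decode a strict ByteString, letting the handler
-- substitute a replacement character for each bad byte.
decodeLenient :: B.ByteString -> T.Text
decodeLenient =
  -- Assumption: the String is only a message handed to the handler.
  decodeUtf8With1 lenientDecode "decodeLenient: invalid UTF-8 stream"
```

Since lenientDecode replaces invalid input with U+FFFD, decodeLenient (B.pack [0x61, 0xFF, 0x62]) should give "a", the replacement character, then "b".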
decodeUtf8With2 :: OnDecodeError -> String -> Utf8State -> ByteString -> (StrictTextBuilder, ByteString, Utf8State)
Helper for decodeUtf8With, streamDecodeUtf8With, and lazy decodeUtf8With, which use an OnDecodeError to process bad bytes.
See decodeUtf8Chunk for a more flexible alternative.
Since: text-2.0.2
data Utf8State
State of decoding a ByteString in UTF-8. Enables incremental decoding (validateUtf8Chunk, validateUtf8More, decodeUtf8Chunk, decodeUtf8More).
Since: text-2.0.2
startUtf8State :: Utf8State
Initial Utf8State.
Since: text-2.0.2
data StrictTextBuilder
A delayed representation of strict Text.
Since: text-2.1.2
Instances

| Instance | Description | Defined in | Methods |
|---|---|---|---|
| Monoid StrictTextBuilder | | Data.Text.Internal.StrictBuilder | mappend :: StrictTextBuilder -> StrictTextBuilder -> StrictTextBuilder |
| Semigroup StrictTextBuilder | Concatenation of | Data.Text.Internal.StrictBuilder | (<>) :: StrictTextBuilder -> StrictTextBuilder -> StrictTextBuilder; sconcat :: NonEmpty StrictTextBuilder -> StrictTextBuilder; stimes :: Integral b => b -> StrictTextBuilder -> StrictTextBuilder |
type StrictBuilder = StrictTextBuilder
Deprecated: Use StrictTextBuilder instead
A delayed representation of strict Text.
Since: text-2.0.2
strictBuilderToText :: StrictTextBuilder -> Text
Use StrictBuilder to build Text.
Since: text-2.0.2
textToStrictBuilder :: Text -> StrictTextBuilder
Copy a Text into a StrictBuilder.
Since: text-2.0.2
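The two conversions combine with the Monoid instance of the builder. As a small sketch, the hypothetical helper concatViaBuilder below wraps each Text in a builder, accumulates them, and runs the result once at the end.

```haskell
import qualified Data.Text as T
import Data.Text.Internal.Encoding (strictBuilderToText, textToStrictBuilder)

-- | Hypothetical helper: concatenate strict texts through the builder.
concatViaBuilder :: [T.Text] -> T.Text
concatViaBuilder = strictBuilderToText . foldMap textToStrictBuilder
```

For instance, concatViaBuilder [T.pack "ab", T.pack "cd"] should equal T.pack "abcd"; for plain concatenation of texts, T.concat is the more direct choice, so this form is mainly useful when builders are already being accumulated.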
skipIncomplete :: OnDecodeError -> String -> Utf8State -> StrictTextBuilder
Call the error handler on each byte of the partial code point stored in Utf8State and append the results.
Exported for use in lazy decodeUtf8With.
Since: text-2.0.2
getCompleteLen :: Utf8State -> Int
Exported for testing.
getPartialUtf8 :: Utf8State -> ByteString
Exported for testing.
© The University of Glasgow and others
Licensed under a BSD-style license (see top of the page).
https://downloads.haskell.org/~ghc/9.12.1/docs/libraries/text-2.1.2-84f9/Data-Text-Internal-Encoding.html