News Score: Score the News, Sort the News, Rewrite the Headlines

Decoding UTF-8. Part III: Determining Sequence Length - A Lookup Table

In the first part of this series on decoding UTF-8, we discussed what it means to decode a UTF-8 sequence. In part two, we saw how to determine sequence length and mentioned that there are ways to reduce branching. The obvious way to avoid branching in the code is to use a lookup table. Since there are only 256 possible values for the first byte of a UTF-8 sequence, we can hard-code a simple table that maps each lead byte value to a sequence length. And because the possible values of a byte are...
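As a rough illustration of the lookup-table idea in the excerpt, here is a minimal C++17 sketch. It is not taken from the original article: the names `length_from_lead_bits`, `make_length_table`, `kLengthTable`, and `utf8_sequence_length` are hypothetical, and the choice to store 0 for bytes that cannot start a sequence is an assumption. The table is filled once at compile time, so the lookup itself is a single indexed load with no branches.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Classify a lead byte by its high bits (used only to fill the table).
// Returns 0 for bytes that cannot start a sequence (an assumed convention).
constexpr std::uint8_t length_from_lead_bits(unsigned b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: single-byte (ASCII)
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                          // 10xxxxxx continuation, or 0xF8..0xFF
}

// Build the 256-entry table at compile time (requires C++17).
constexpr std::array<std::uint8_t, 256> make_length_table() {
    std::array<std::uint8_t, 256> table{};
    for (unsigned b = 0; b < 256; ++b) {
        table[b] = length_from_lead_bits(b);
    }
    return table;
}

inline constexpr std::array<std::uint8_t, 256> kLengthTable = make_length_table();

// Branch-free lookup: one array access per lead byte.
// Note: this classifies by bit pattern only. Bytes such as 0xC0, 0xC1 and
// 0xF5..0xF7 never occur in valid UTF-8 but still map to 2 or 4 here;
// a full decoder would have to reject them separately.
constexpr int utf8_sequence_length(std::uint8_t lead) {
    return kLengthTable[lead];
}

static_assert(utf8_sequence_length(0x41) == 1, "ASCII 'A'");
static_assert(utf8_sequence_length(0xE2) == 3, "lead byte of a 3-byte sequence");
```

For example, `utf8_sequence_length(0xC3)` looks up the lead byte of "é" (encoded as `0xC3 0xA9`) and yields 2, while a continuation byte such as `0xA9` yields 0 under the convention assumed above.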

Read more at nemanjatrifunovic.substack.com
