Unicode
The characters “U+” are an ASCIIfied version of the MULTISET UNION “⊎” U+228E character (the U-like union symbol with a plus sign inside it), which was meant to symbolize Unicode as the union of character sets.
Why is 'U+' used to designate a Unicode code point? – Stack Overflow
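As a quick illustration of the notation, the Python sketch below writes "U+" followed by the code point in uppercase hexadecimal, padded to at least four digits. The helper name `u_plus` is hypothetical. `ord` and `unicodedata.name` are standard library.

```python
import unicodedata

def u_plus(s: str) -> str:
    """Render each code point of s in U+XXXX notation (at least 4 hex digits)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

print(u_plus("\u228e"))               # U+228E
print(unicodedata.name("\u228e"))     # MULTISET UNION
print(u_plus("e\u0301"))              # U+0065 U+0301 (two code points)
print(u_plus("\U0001F600"))           # U+1F600 (more than 4 digits when needed)
```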
Composite and Precomposed Characters
Unicode contains a vast number of characters, many of which have different Unicode numbers, but are in fact the same character. A simple example is the letter e-acute: this can be represented by é, which in UTF-8 encoding is the two hex bytes c3 a9, or by é, which is the three hex bytes 65 cc 81. In some fonts there may be small differences, but in most cases we see identical characters and expect our computers to treat them the same.
é and é
Apfelstrudel, Downloads – The Eclectic Light Company
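You can check the byte sequences in the quote with Python's standard `unicodedata` module. This is a minimal sketch. Both strings are built from escapes so the precomposed and decomposed forms can't be confused. `same_text` is a hypothetical helper, and comparing in NFC is an assumption: NFD would work equally well for this pair.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(precomposed.encode("utf-8").hex(" "))  # c3 a9
print(decomposed.encode("utf-8").hex(" "))   # 65 cc 81
print(precomposed == decomposed)             # False: different code point sequences

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(precomposed, decomposed))    # True
```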
Normalization forms:
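NFC: Precomposed string with canonical mapping (canonical decomposition, then canonical composition)
NFD: Decomposed string with canonical mapping (canonical decomposition only)
NFKC: Precomposed string with compatibility mapping (compatibility decomposition, then canonical composition)
NFKD: Decomposed string with compatibility mapping (compatibility decomposition only)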
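The sketch below runs each of the four forms over a hypothetical sample string to show how they differ. The string mixes a compatibility character (U+FB01 LATIN SMALL LIGATURE FI) with a precomposed é. The canonical forms (NFC, NFD) only compose or decompose the é and leave the ligature alone. The compatibility forms (NFKC, NFKD) also replace the ligature with plain "fi".

```python
import unicodedata

def show(s: str) -> str:
    """List the code points of s in U+XXXX notation."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

sample = "\ufb01anc\u00e9"  # hypothetical sample: "ﬁ" ligature + "anc" + precomposed é

forms = {form: unicodedata.normalize(form, sample)
         for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, out in forms.items():
    print(form, show(out))
# NFC U+FB01 U+0061 U+006E U+0063 U+00E9
# NFD U+FB01 U+0061 U+006E U+0063 U+0065 U+0301
# NFKC U+0066 U+0069 U+0061 U+006E U+0063 U+00E9
# NFKD U+0066 U+0069 U+0061 U+006E U+0063 U+0065 U+0301
```

The compatibility forms are lossy. Once NFKC or NFKD has turned the ligature into "f" + "i", the original character cannot be recovered from the normalized text.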