Total possible glyphs using UTF-8

UTF-8 is a variable-width encoding for representing a large number of glyphs. It uses one, two, three, or four bytes to encode a given glyph, depending on that glyph's code point. Wikipedia has a good table that shows how UTF-8 breaks down:

Number of bytes  Code point bits  First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
1                7                U+0000            U+007F           0xxxxxxx
2                11               U+0080            U+07FF           110xxxxx  10xxxxxx
3                16               U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
4                21               U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
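
To make the table concrete, here is a rough Python sketch that packs a code point into bytes by hand, following the bit patterns in each row. The function names `utf8_length` and `utf8_encode` are made up for illustration; in real code you would simply call `str.encode("utf-8")`. Note that this sketch does not reject surrogate code points (U+D800 to U+DFFF), which a strict encoder would refuse.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for this code point (per the table)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


def utf8_encode(code_point: int) -> bytes:
    """Hand-rolled UTF-8 encoder (illustrative only; surrogates are not rejected)."""
    length = utf8_length(code_point)
    if length == 1:
        # 0xxxxxxx: the code point fits in 7 bits and is stored as-is.
        return bytes([code_point])

    # Lead-byte prefixes: 110xxxxx, 1110xxxx, 11110xxx.
    lead_prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]

    # Peel off 6 bits at a time for the 10xxxxxx continuation bytes,
    # starting from the least significant end.
    continuation = []
    for _ in range(length - 1):
        continuation.append(0x80 | (code_point & 0x3F))
        code_point >>= 6

    # Whatever bits remain go into the lead byte.
    lead = lead_prefix | code_point
    return bytes([lead] + continuation[::-1])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

Comparing the result with Python's built-in `chr(cp).encode("utf-8")` is an easy way to check the bit packing against a known-good encoder.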

There are 1,114,112 (17 x 2^16) total code points available. BabelStone reports that 276,337 (approximately 24.8%) code points are in use, which leaves 837,775 still available. That's a lot of room left for emojis.
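
The arithmetic is easy to double-check with a few lines of Python. The in-use figure is just the number quoted above, not something the script looks up, and the variable names are arbitrary. One extra detail worth knowing: 2,048 of the code points (U+D800 to U+DFFF) are surrogates reserved for UTF-16, so well-formed UTF-8 can actually encode slightly fewer values than the full total.

```python
PLANES = 17                      # Unicode is divided into 17 planes
CODE_POINTS_PER_PLANE = 2 ** 16  # each plane holds 65,536 code points
SURROGATES = 0xDFFF - 0xD800 + 1 # 2,048 code points reserved for UTF-16

total = PLANES * CODE_POINTS_PER_PLANE
in_use = 276_337                 # figure quoted in the text above (assumed, not computed)
remaining = total - in_use
percent_in_use = 100 * in_use / total
encodable_in_utf8 = total - SURROGATES

print(f"{total:,} total code points")
print(f"{remaining:,} still available ({percent_in_use:.1f}% in use)")
print(f"{encodable_in_utf8:,} encodable in well-formed UTF-8")
```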

All content licensed under the Creative Commons License