UTF-8 is a variable-width encoding that can represent every Unicode code point. It uses one, two, three, or four bytes per character, depending on the value of the code point being encoded. Wikipedia has a good table that explains how UTF-8 breaks out:

| Number of bytes | Code point bits | First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|---|---|---|---|---|---|---|---|
| 1 | 7 | U+0000 | U+007F | 0xxxxxxx | | | |
| 2 | 11 | U+0080 | U+07FF | 110xxxxx | 10xxxxxx | | |
| 3 | 16 | U+0800 | U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | |
| 4 | 21 | U+10000 | U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |

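
To make the byte layouts concrete, here is a minimal Python sketch that packs a code point's bits into the patterns from the table. The function name `utf8_encode` is made up for illustration; in real code you would simply call `str.encode("utf-8")`. The surrogate check is not in the table; it reflects the rule that U+D800 through U+DFFF are not valid in UTF-8.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point using the byte layouts in the table."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    # Assumption beyond the table: surrogate code points are not valid in UTF-8.
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    if code_point <= 0x7F:        # 1 byte: 0xxxxxxx (7 bits)
        return bytes([code_point])
    if code_point <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx (11 bits)
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b111111),
        ])
    if code_point <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (16 bits)
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b111111),
            0b10000000 | (code_point & 0b111111),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 bits)
    return bytes([
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b111111),
        0b10000000 | ((code_point >> 6) & 0b111111),
        0b10000000 | (code_point & 0b111111),
    ])


# One sample character from each row of the table, checked against Python's encoder.
for ch in ["A", "é", "€", "😀"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

Each branch corresponds to one row of the table: the leading byte carries the length marker plus the highest bits, and every continuation byte carries six more bits behind a `10` prefix.
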
There are 1,114,112 (17 × 2^16) total code points available. BabelStone reports that 276,337 (approximately 24.8%) code points are in use, which leaves 837,775 still available. That's a lot of room left for emojis.
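
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sanity check on the numbers quoted above; the in-use count is copied from the text rather than computed, and the variable names are illustrative.

```python
TOTAL = 17 * 2**16      # 17 planes of 65,536 code points each
IN_USE = 276_337        # in-use count quoted in the text, not derived here

# The last code point is U+10FFFF, so the total is that value plus one (for U+0000).
assert TOTAL == 0x10FFFF + 1

remaining = TOTAL - IN_USE
percent_used = round(100 * IN_USE / TOTAL, 1)

print(TOTAL, remaining, percent_used)
```
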