Showing entries with tag "Unicode".


Perl: Several ways to generate Unicode

Once you have found a Unicode code point, you can put it into a Perl string in several ways:

my $thumbs_up = "";

$thumbs_up = "\x{1F44D}";
$thumbs_up = "\N{U+1F44D}";
$thumbs_up = chr(0x1F44D);
$thumbs_up = pack("U", 0x1F44D);

# Make sure STDOUT is set to accept UTF-8
binmode(STDOUT, ":encoding(UTF-8)");

print $thumbs_up x 2 . "\n";
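
As a quick sanity check, the four spellings above should all yield the same one-character string. The sketch below compares them and then lists the UTF-8 bytes of that character. The variable names (@forms, $all_same, $hex) are made up for illustration; utf8::encode is Perl's built-in in-place encoder.

```perl
use strict;
use warnings;

# Four spellings of the same code point, U+1F44D (THUMBS UP SIGN).
my @forms = (
    "\x{1F44D}",
    "\N{U+1F44D}",
    chr(0x1F44D),
    pack("U", 0x1F44D),
);

# Every form should be a one-character string equal to the first one.
my $all_same = 1;
for my $form (@forms) {
    $all_same = 0 unless length($form) == 1 && $form eq $forms[0];
}

# Encode a copy to UTF-8 in place and list the resulting bytes in hex.
my $bytes = $forms[0];
utf8::encode($bytes);
my $hex = join " ", map { sprintf("%02X", ord($_)) } split(//, $bytes);

print "all forms identical: ", ($all_same ? "yes" : "no"), "\n";
print "UTF-8 bytes: $hex\n";
```

Which spelling to use is mostly a matter of taste: the two escape forms read well inside string literals, while chr() and pack() are handy when the code point is held in a variable.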

Total possible glyphs using UTF-8

UTF-8 is an encoding method for representing a large number of glyphs. It uses one, two, three, or four bytes to encode a given glyph, depending on its code point. Wikipedia has a good table that explains how UTF-8 breaks down:

Number of bytes  Code point bits  First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
1                7                U+0000            U+007F           0xxxxxxx
2                11               U+0080            U+07FF           110xxxxx  10xxxxxx
3                16               U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
4                21               U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
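
To make the bit layout concrete, here is a minimal sketch that encodes a code point by hand, one branch per row of the table. The helper names utf8_bytes and hex_bytes are hypothetical, and the sketch assumes a valid input: it rejects values above U+10FFFF but does not reject the surrogate range U+D800 to U+DFFF. In real code, Perl's own encoding layers do this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes for one code point,
# following the four rows of the table (1, 2, 3, or 4 bytes).
sub utf8_bytes {
    my ($cp) = @_;
    die "code point out of range" if $cp < 0 || $cp > 0x10FFFF;

    # 1 byte: 0xxxxxxx
    return ($cp) if $cp <= 0x7F;

    # 2 bytes: 110xxxxx 10xxxxxx
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)) if $cp <= 0x7FF;

    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return (
        0xE0 | ($cp >> 12),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F),
    ) if $cp <= 0xFFFF;

    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return (
        0xF0 | ($cp >> 18),
        0x80 | (($cp >> 12) & 0x3F),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F),
    );
}

# Hypothetical helper: format a list of byte values as hex pairs.
sub hex_bytes { return join " ", map { sprintf("%02X", $_) } @_ }

# One sample code point for each row of the table.
for my $cp (0x41, 0xE9, 0x20AC, 0x1F44D) {
    printf "U+%04X -> %s\n", $cp, hex_bytes(utf8_bytes($cp));
}
```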

There are 1,114,112 (17 x 2^16) total code points available. BabelStone reports that 276,337 (approximately 24.8%) code points are in use, which leaves 837,775 still available. That's a lot of room left for emojis.
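
The arithmetic is easy to reproduce. This small sketch takes the in-use count quoted above as given (it is not recomputed here) and derives the total, the remainder, and the percentage; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $total     = 17 * 2**16;          # 17 planes of 65,536 code points each
my $in_use    = 276_337;             # figure quoted in the text, taken as given
my $available = $total - $in_use;    # code points not yet in use
my $percent   = 100 * $in_use / $total;

printf "total:     %d\n", $total;
printf "available: %d\n", $available;
printf "in use:    %.1f%%\n", $percent;
```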
