Emoji characters like 🥳, 🎉 and 💖 can be represented in a string in different
ways. The first way is to include a single Unicode code point. For
example, 💖 is code point U+1F496, which can be included in a PHP string
either directly ('💖') or using the Unicode code point escape syntax,
available from PHP 7 onwards: "\u{1F496}".
You can try these in the PHP REPL (php -a):
php > echo '💖';
💖
php > echo "\u{1F496}";
💖
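To confirm that both spellings produce the same string, you can compare them and inspect the result. This is a minimal sketch that assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: mbstring is available (mb_strlen, mb_ord) and this file is UTF-8.
$direct  = '💖';          // literal character in the source file
$escaped = "\u{1F496}";   // code point escape, PHP 7+ (double quotes required)

var_dump($direct === $escaped);               // bool(true)
echo strlen($direct), "\n";                   // 4 (bytes in UTF-8)
echo mb_strlen($direct, 'UTF-8'), "\n";       // 1 (code point)
printf("U+%X\n", mb_ord($direct, 'UTF-8'));   // U+1F496
```

Note that the escape only works inside double-quoted strings and heredocs; in single quotes '\u{1F496}' stays as literal text.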
Emoji characters might also be represented using a zero width joiner (ZWJ,
U+200D), a code point that lets you combine two or more separate code points
so that they are displayed as a single character. For example, the rainbow
flag emoji 🏳️‍🌈 can be produced by the sequence [Waving white flag] [ZWJ]
[Rainbow] (flags are a common use case for the ZWJ character).
See also: Fun Emoji Hacks: Zero Width Joiners.
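A ZWJ sequence like this can be assembled by hand in PHP. The sketch below assumes mbstring is loaded; it also places a variation selector (U+FE0F) after the white flag, which is how the rainbow flag is listed in the Unicode emoji data, so the sequence is four code points rather than three. Whether it renders as a single glyph depends on your font and terminal.

```php
<?php
// Assumption: mbstring is available for mb_strlen.
$whiteFlag = "\u{1F3F3}"; // WAVING WHITE FLAG
$vs16      = "\u{FE0F}";  // VARIATION SELECTOR-16 (emoji presentation)
$zwj       = "\u{200D}";  // ZERO WIDTH JOINER
$rainbow   = "\u{1F308}"; // RAINBOW

$flag = $whiteFlag . $vs16 . $zwj . $rainbow;

echo $flag, "\n";                       // one flag glyph where fonts support it
echo mb_strlen($flag, 'UTF-8'), "\n";   // 4 (code points)
echo strlen($flag), "\n";               // 14 (bytes: 4 + 3 + 3 + 4)
```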
As well as zero width joiners, there are also Unicode variation selectors
(U+FE00 to U+FE0F). These can be used in conjunction with ZWJ sequences “where
one or more characters in the sequence have text and emoji presentation”.
For example, the 💖 emoji above might end up with an invisible variation
selector such as U+FE0F in front of it (&#xFE0F;💖 in HTML) by the time it
makes it into your system.
If you’re trying to print text containing these sequences, or render them into
an image, this can lead to unwanted question mark characters (?) appearing in
the rendered text, e.g. ?💖.
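Because these code points are invisible, it can help to list what a string actually contains before deciding what to strip. The following is a rough sketch: codePoints() is a hypothetical helper (not part of PHP or of the original post), and it assumes mbstring and PHP 7.4 or later for mb_str_split and arrow functions.

```php
<?php
// Hypothetical helper: returns each code point of a UTF-8 string as a "U+XXXX" label.
// Assumption: mbstring is loaded and PHP >= 7.4.
function codePoints(string $text): array
{
    return array_map(
        fn (string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
        mb_str_split($text, 1, 'UTF-8')
    );
}

// A heart preceded by an invisible variation selector.
print_r(codePoints("\u{FE0F}\u{1F496}"));
// Lists U+FE0F followed by U+1F496
```

Anything in the U+FE00 to U+FE0F range in that output is a variation selector, and U+200D is a zero width joiner.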
You can eliminate all or most of these in PHP like this:
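<?php
use Normalizer; // only needed inside a namespace; requires the intl extension
// Normalize to NFC, then strip variation selectors (U+FE00 to U+FE0F).
$clean = preg_replace('/[\x{FE00}-\x{FE0F}]/u', '', Normalizer::normalize(trim($text)));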
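That should strip the variation selectors and leave a printable string. Note that the pattern does not match the ZWJ itself (U+200D); add \x{200D} to the character class if you also need to remove joiners.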
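If you need this in more than one place, the same idea can be wrapped in a function. This is only a sketch under stated assumptions: stripEmojiModifiers() and its $stripZwj flag are hypothetical names, the intl extension is assumed for Normalizer, and the fallback behaviour for invalid UTF-8 is one possible choice rather than the approach used in the original snippet.

```php
<?php
// Hypothetical helper, not from the original post. Assumes the intl extension.
function stripEmojiModifiers(string $text, bool $stripZwj = false): string
{
    $trimmed = trim($text);

    // Normalizer::normalize() returns false on failure (e.g. invalid UTF-8).
    $normalized = Normalizer::normalize($trimmed, Normalizer::FORM_C);
    if ($normalized === false) {
        $normalized = $trimmed;
    }

    // Variation selectors, plus the zero width joiner when requested.
    $pattern = $stripZwj
        ? '/[\x{FE00}-\x{FE0F}\x{200D}]/u'
        : '/[\x{FE00}-\x{FE0F}]/u';

    // preg_replace() returns null on error; fall back to the unstripped string.
    return preg_replace($pattern, '', $normalized) ?? $normalized;
}

echo stripEmojiModifiers(" \u{FE0F}\u{1F496} "), "\n"; // 💖
```

Passing true as the second argument also removes joiners, so a ZWJ sequence such as the rainbow flag falls apart into its separate component emoji. Whether that is acceptable depends on what your printer or renderer can display.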
We use this to be able to print emoji characters that customers add to personal
gift messages in Pop Robin Cards.