Avoid naive string indexing in PHP
PHP allows treating a string as an array, so that you can use indexing syntax to get or set a single position in the string:
<?php
echo 'hello'[0]; // h

$foo = 'bar';
$foo[2] = 'z';
echo $foo; // baz
This seems handy, but you should never do it.
The reason to avoid string indexing like this in PHP is that PHP strings are not strings of multibyte characters; they are just sequences of bytes, and indexing addresses a single byte.
You won't notice this with plain ASCII strings like the ones above, because each ASCII character is exactly one byte, so byte offsets and character offsets coincide.
As soon as you handle a multibyte string (and you will, since nearly everything is UTF-8 and internationalised now), that kind of naive string indexing breaks.
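A quick way to see the byte-versus-character difference is to compare PHP's byte-oriented strlen with the mbstring extension's mb_strlen, and to dump the raw bytes with bin2hex. This is a minimal sketch, assuming the mbstring extension is enabled and the file is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8 and ext-mbstring is enabled.
$ascii = 'hello';
$cjk   = '葛修远';

// For ASCII, bytes and characters line up one-to-one.
echo strlen($ascii), ' ', mb_strlen($ascii, 'UTF-8'), "\n"; // 5 5

// Each of these three characters takes three bytes in UTF-8.
echo strlen($cjk), ' ', mb_strlen($cjk, 'UTF-8'), "\n"; // 9 3

// Indexing returns a single byte, so $cjk[0] is only the
// first byte of the UTF-8 encoding of '葛'.
echo bin2hex($cjk[0]), "\n"; // e8
```

Any code that treats strlen() as "number of characters" or $str[$i] as "the i-th character" is making the same byte-for-character assumption.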
<?php
echo '葛修远'[0]; // '�'
// You might expect to get '葛' here, but you won't, as it's multibyte.
// Instead you get the mangled '�', which is the first byte of the UTF-8
// encoding of '葛'.
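For comparison, JavaScript strings are indexed by UTF-16 code unit rather than by byte, so the same indexing does return the whole character there:
"葛修远"[0]; // "葛"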
The correct way to handle this in PHP is to never use naive string indexing and instead use the mb_* (multibyte string) functions, in this case mb_substr:
<?php
// mb_substr counts characters, not bytes; the encoding is passed explicitly.
echo mb_substr('葛修远', 0, 1, 'UTF-8'); // '葛'
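Since mbstring has no direct replacement for the set form ($str[$i] = $c), one option is a pair of small wrappers. The names char_at and set_char_at below are hypothetical, not part of PHP; this is a sketch assuming UTF-8 input and in-range indexes:

```php
<?php
// Hypothetical helpers (not built into PHP): character-based counterparts
// to $str[$i] and $str[$i] = $c, built on mbstring. Assumes UTF-8 input.

function char_at(string $s, int $i): string
{
    // Returns the character (not byte) at position $i.
    return mb_substr($s, $i, 1, 'UTF-8');
}

function set_char_at(string $s, int $i, string $c): string
{
    // Rebuilds the string around character position $i and returns the
    // result; the input string itself is not modified.
    return mb_substr($s, 0, $i, 'UTF-8')
        . $c
        . mb_substr($s, $i + 1, null, 'UTF-8');
}

echo char_at('葛修远', 1), "\n";          // 修
echo set_char_at('bar', 2, 'z'), "\n";    // baz
echo set_char_at('葛修远', 0, '王'), "\n"; // 王修远
```

Unlike $str[$i] = $c, set_char_at returns a new string, and it does not pad with spaces when $i is past the end. On PHP 7.4 or later, mb_str_split($s) returns an array of characters if you need to walk the whole string.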
Unfortunately, neither of the two most popular PHP linters, PHPMD and PHPCS, seems to ship a standard rule that bans naive string indexing, which is a shame, as it would be handy to reject it automatically in a codebase.