Presenting several UTF-8 / Multibyte-aware escape functions.
These functions represent alternatives to mysqli::real_escape_string, as long as your DB connection and Multibyte extension are using the same character set (UTF-8), they will produce the same results by escaping the same characters as mysqli::real_escape_string.
This is based on research I did for my SQL Query Builder class:
https://github.com/twister-php/sql
* Returns a string with backslashes before characters that need to be escaped.
* As required by MySQL and suitable for multi-byte character sets
* Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and ctrl-Z.
*
* @param string $string String to add slashes to
* @return $string with `\` prepended to reserved characters
*
* @author Trevor Herselman
*/if (function_exists('mb_ereg_replace'))
{
functionmb_escape(string $string)
{
returnmb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x27\x5C]','\\\0',$string);
}
} else {
functionmb_escape(string $string)
{
returnpreg_replace('~[\x00\x0A\x0D\x1A\x22\x27\x5C]~u','\\\$0',$string);
}
}?>
Characters escaped are (the same as mysqli::real_escape_string):
00 = \0 (NUL)
0A = \n
0D = \r
1A = ctl-Z
22 = "
27 = '
5C = \
Note: preg_replace() is in PCRE_UTF8 (UTF-8) mode (`u`).
Enhanced version:
When escaping strings for `LIKE` syntax, remember that you also need to escape the special characters _ and %
So this is a more fail-safe version (even when compared to mysqli::real_escape_string, because % characters in user input can cause unexpected results and even security violations via SQL injection in LIKE statements):
* Returns a string with backslashes before characters that need to be escaped.
* As required by MySQL and suitable for multi-byte character sets
* Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and ctrl-Z.
* In addition, the special control characters % and _ are also escaped,
* suitable for all statements, but especially suitable for `LIKE`.
*
* @param string $string String to add slashes to
* @return $string with `\` prepended to reserved characters
*
* @author Trevor Herselman
*/if (function_exists('mb_ereg_replace'))
{
functionmb_escape(string $string)
{
returnmb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]','\\\0',$string);
}
} else {
functionmb_escape(string $string)
{
returnpreg_replace('~[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]~u','\\\$0',$string);
}
}?>
Additional characters escaped:
25 = %
5F = _
Bonus function:
The original MySQL `utf8` character-set (for tables and fields) only supports 3-byte sequences.
4-byte characters are not common, but I've had queries fail to execute on 4-byte UTF-8 characters, so you should be using `utf8mb4` wherever possible.
However, if you still want to use `utf8`, you can use the following function to replace all 4-byte sequences.
{
returnpreg_replace('/[\x{10000}-\x{10FFFF}]/u',"\xEF\xBF\xBD",$str);
}?>
Pick your poison and use at your own risk!