code:
// Inside a character class "|" is a literal character, not an alternation, so it is not used as a separator here.
$regex = '/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u';
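// Sample emoji written as escapes (PHP 7+); any character in the ranges above behaves the same.
var_dump(preg_match($regex, "\u{1F600}"));        // U+1F600 grinning face
var_dump(preg_match($regex, "\u{2764}\u{FE0F}")); // U+2764 heart plus variation selector U+FE0F
var_dump(preg_match($regex, "\u{1F680}"));        // U+1F680 rocket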
var_dump(preg_match($regex, '测试'));
var_dump(preg_match($regex, 'hello, world'));
var_dump(preg_match($regex, 'testing'));
var_dump(preg_match($regex, '中文 English'));
result:
int(1)
int(1)
int(1)
int(0)
int(0)
int(0)
int(0)
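I found this expression online rather than writing it myself; to check or extend it, look up the emoji ranges in a UTF-8 code chart (or take the ranges from the Unicode code charts and convert them to UTF-8).
That said, I would still recommend switching to utf8mb4. This kind of migration is basically painless; just test it thoroughly.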
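
To illustrate why utf8mb4 tends to be the simpler fix, here is a minimal sketch, not part of the original answer. It assumes the underlying problem is MySQL's legacy utf8 charset, which stores at most three bytes per character, so what it cannot hold is any code point above U+FFFF, whether or not it is an emoji. The helper names `hasFourByteChars` and `stripFourByteChars` are hypothetical, and the code needs PHP 7.0 or later.

```php
<?php
// Hypothetical helpers for illustration only.

// True if $text contains a code point above U+FFFF, i.e. a character that
// takes four bytes in UTF-8 and does not fit in MySQL's legacy utf8 charset.
function hasFourByteChars(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

// Removes those characters. preg_replace() returns null on failure
// (for example on invalid UTF-8), in which case the input is returned as-is.
function stripFourByteChars(string $text): string
{
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text) ?? $text;
}

var_dump(hasFourByteChars("launch \u{1F680}")); // bool(true): U+1F680 needs four bytes
var_dump(hasFourByteChars('中文 English'));      // bool(false): CJK fits in three bytes
var_dump(stripFourByteChars("launch \u{1F680}"));
```

A check like this only works around the storage limit by rejecting or dropping characters. With utf8mb4 on the tables and on the connection the characters can be stored as they are, so no filtering is needed.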