php normalize, PHP - Manual: Normalizer::normalize (official documentation)

When matching texts against each other or against keywords, it helps to normalize the texts first.

The following function removes all diacritics (marks such as accents) from a given UTF-8-encoded text and returns ASCII text.

Make sure the PHP intl extension, which provides the Normalizer class and is built on ICU, is installed.
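A minimal sketch of how that prerequisite could be checked at runtime. The helper name `normalizerAvailable` is hypothetical and not part of the original note; `extension_loaded()` and `class_exists()` are standard PHP functions.

```php
<?php
// Hypothetical helper: reports whether the Normalizer class can be used.
function normalizerAvailable(): bool
{
    // Normalizer is declared by the intl extension itself, so autoloading
    // is disabled here (second argument false) to avoid a pointless lookup.
    return extension_loaded('intl') && class_exists('Normalizer', false);
}

echo normalizerAvailable()
    ? "Normalizer is available\n"
    : "intl extension missing: diacritics cannot be removed\n";
```

If the check fails, a caller can fall back to leaving the text unchanged rather than aborting.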

Tip: You may also want to map the text to lower case before running the matching procedure.
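The lower-casing tip might look like the sketch below. It assumes the mbstring extension is available; the helper name `foldCase` and the sample strings are illustrative only.

```php
<?php
// Hypothetical helper: lower-cases a UTF-8 string so that matching ignores case.
function foldCase(string $text): string
{
    // mb_strtolower() handles multi-byte characters such as Ä, unlike strtolower().
    return mb_strtolower($text, 'UTF-8');
}

$keyword = 'strasse';
$text    = 'Berliner STRASSE';

// Compare the lower-cased forms instead of the raw strings.
var_dump(strpos(foldCase($text), foldCase($keyword)) !== false); // bool(true)
```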

<?php
function normalizeUtf8String($s)
{
    // keep the input so it can be returned unchanged if normalization fails
    $original_string = $s;

    // Normalizer class missing!
    if (!class_exists("Normalizer", false))
        return $original_string;

    // maps German umlauts and other European characters onto two characters
    // before the diacritics are simply removed
    $s = preg_replace('@\x{00c4}@u', "AE", $s); // umlaut Ä => AE
    $s = preg_replace('@\x{00d6}@u', "OE", $s); // umlaut Ö => OE
    $s = preg_replace('@\x{00dc}@u', "UE", $s); // umlaut Ü => UE
    $s = preg_replace('@\x{00e4}@u', "ae", $s); // umlaut ä => ae
    $s = preg_replace('@\x{00f6}@u', "oe", $s); // umlaut ö => oe
    $s = preg_replace('@\x{00fc}@u', "ue", $s); // umlaut ü => ue
    $s = preg_replace('@\x{00f1}@u', "ny", $s); // ñ => ny
    $s = preg_replace('@\x{00ff}@u', "yu", $s); // ÿ => yu

    // maps special characters (characters with diacritics) onto their
    // base character followed by the diacritical mark
    // example: Ú => U´, á => a´
    $s = Normalizer::normalize($s, Normalizer::FORM_D);

    $s = preg_replace('@\pM@u', "", $s); // removes diacritics

    $s = preg_replace('@\x{00df}@u', "ss", $s); // maps German ß onto ss
    $s = preg_replace('@\x{00c6}@u', "AE", $s); // Æ => AE
    $s = preg_replace('@\x{00e6}@u', "ae", $s); // æ => ae
    $s = preg_replace('@\x{0132}@u', "IJ", $s); // Ĳ => IJ
    $s = preg_replace('@\x{0133}@u', "ij", $s); // ĳ => ij
    $s = preg_replace('@\x{0152}@u', "OE", $s); // Œ => OE
    $s = preg_replace('@\x{0153}@u', "oe", $s); // œ => oe

    $s = preg_replace('@\x{00d0}@u', "D", $s); // Ð => D
    $s = preg_replace('@\x{0110}@u', "D", $s); // Đ => D
    $s = preg_replace('@\x{00f0}@u', "d", $s); // ð => d
    $s = preg_replace('@\x{0111}@u', "d", $s); // đ => d
    $s = preg_replace('@\x{0126}@u', "H", $s); // Ħ => H
    $s = preg_replace('@\x{0127}@u', "h", $s); // ħ => h
    $s = preg_replace('@\x{0131}@u', "i", $s); // ı => i
    $s = preg_replace('@\x{0138}@u', "k", $s); // ĸ => k
    $s = preg_replace('@\x{013f}@u', "L", $s); // Ŀ => L
    $s = preg_replace('@\x{0141}@u', "L", $s); // Ł => L
    $s = preg_replace('@\x{0140}@u', "l", $s); // ŀ => l
    $s = preg_replace('@\x{0142}@u', "l", $s); // ł => l
    $s = preg_replace('@\x{014a}@u', "N", $s); // Ŋ => N
    $s = preg_replace('@\x{0149}@u', "n", $s); // ŉ => n
    $s = preg_replace('@\x{014b}@u', "n", $s); // ŋ => n
    $s = preg_replace('@\x{00d8}@u', "O", $s); // Ø => O
    $s = preg_replace('@\x{00f8}@u', "o", $s); // ø => o
    $s = preg_replace('@\x{017f}@u', "s", $s); // ſ => s
    $s = preg_replace('@\x{00de}@u', "T", $s); // Þ => T
    $s = preg_replace('@\x{0166}@u', "T", $s); // Ŧ => T
    $s = preg_replace('@\x{00fe}@u', "t", $s); // þ => t
    $s = preg_replace('@\x{0167}@u', "t", $s); // ŧ => t

    // remove all non-ASCII characters (ASCII ends at \x7f, not \x80)
    $s = preg_replace('@[^\0-\x7f]@u', "", $s);

    // possible errors in UTF-8 regular expressions
    if (empty($s))
        return $original_string;
    else
        return $s;
}
?>
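To see why the two-letter umlaut mappings have to run before the decomposition, the core step can be tried in isolation. This is only a sketch: `stripMarks` is a hypothetical helper, not part of the function above, and it returns its input unchanged when the intl extension is missing.

```php
<?php
// Hypothetical helper isolating the "decompose, then drop marks" step.
function stripMarks(string $text): string
{
    if (!class_exists('Normalizer', false)) {
        return $text; // intl missing: leave the input untouched
    }

    // FORM_D splits each accented character into base character + combining mark.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8
    }

    // \pM matches any Unicode combining mark; removing them leaves the base letters.
    $stripped = preg_replace('/\pM/u', '', $decomposed);
    return $stripped === null ? $text : $stripped;
}

echo stripMarks("Ú á"), "\n"; // with intl loaded: U a
echo stripMarks("ä"), "\n";   // with intl loaded: a (not "ae")
```

Because decomposition alone turns ä into a plain a, any character that should expand to two letters has to be replaced explicitly beforehand.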

The above function is mainly based on the following article:

http://ahinea.com/en/tech/accented-translate.html
