php正则表达式 u,php正则表达式匹配字母。 Aka u =ü或ê=é=è= e

最新推荐文章于 2022-08-29 16:54:18 发布

不在船上的水手

最新推荐文章于 2022-08-29 16:54:18 发布

阅读量261

点赞数

PHP 正则表达式国际化字符搜索替换 ASCII标准化

关键词由CSDN通过智能技术生成

遗憾的是，php regex(我所知道的)中没有神奇的角色类或技巧可以解决这个问题。我反而选择了另一条路线：

$search = '+ fête foret ca rentrée w0w !!!';

$text = 'La paix fêtée avec plus de 40 cultures dans une forêt. Ça commence bien devant la rentrée...
Il répond: w0w tros cool!!! En + il fait chaud!';

$left_token = '';

$right_token = '';

$encoding = 'UTF-8';

// Let's normalize both search and needle

$search_normalized = normalize($search);

$text_normalized = normalize($text);

// Fixed preg_quote() and match UTF whitespaces

$search_needles = preg_split('/\s+/u', $search_normalized);

// We'll save the output in a separate variable

$text_output = $text;

// Since we made the tokens a variable, we'll need to calculate the offsets

$offset_size = strlen($left_token . $right_token);

// Start searching

foreach($search_needles as $needle) {

// Reset for each word

$search_offset = 0;

// We may have several occurences

while(true) {

if($search_offset > mb_strlen($text_normalized)) { // No more needles

break;

} else {

$pos = mb_stripos($text_normalized, $needle, $search_offset, $encoding);

}

if($pos === false) { // No more needles here

break;

}

$len = mb_strlen($needle);

// Insert tokens

$text_output = mb_substr($text_output, 0, $pos, $encoding) . // Left side

$left_token .

mb_substr($text_output, $pos, $len, $encoding) . // The enclosed word

$right_token .

mb_substr($text_output, $pos + $len, NULL, $encoding); // Right side

// We need to update this too otherwise the positions won't be the same

$text_normalized = mb_substr($text_normalized, 0, $pos, $encoding) . // Left side

$left_token .

mb_substr($text_normalized, $pos, $len, $encoding) . // The enclosed word

$right_token .

mb_substr($text_normalized, $pos + $len, NULL, $encoding); // Right side

// Advance in the search

$search_offset = $pos + $len + $offset_size;

}

}

echo($text_output);

var_dump($text_output);

// Credits: http://stackoverflow.com/a/10064701

function normalize($input) {

$normalizeChars = array(

'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',

'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',

'Ï'=>'I', 'Ñ'=>'N', 'Ń'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',

'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',

'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',

'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ń'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',

'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f',

'ă'=>'a', 'î'=>'i', 'â'=>'a', 'ș'=>'s', 'ț'=>'t', 'Ă'=>'A', 'Î'=>'I', 'Â'=>'A', 'Ș'=>'S', 'Ț'=>'T',

);

return strtr($input, $normalizeChars);

}

基本上：

标准化：将needle和haystack转换为普通的ASCII字符。

查找位置：在规范化的大海捞针中搜索标准化针的位置。

插入：相应地将开始和结束标记插入原始字符串。

重复：有时您可能会多次出现。重复此过程，直到不再发生。

醇>

示例输出：

La paix fêtée avec plus de 40 cultures dans une forêt. Ça commence bien devant la rentrée...
Il répond: w0w tros cool!!! En + il fait chaud!

不在船上的水手

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。