PHP iconv translit for removing accents: not working as expected?

Question:

Consider this simple code:

    echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

It prints `e instead of just e.

Do you know what I am doing wrong?

Nothing changed after adding a setlocale() call:

    setlocale(LC_COLLATE, 'en_US.utf8');
    echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

Answer 1:

I have this standard function to return valid URL strings without the invalid URL characters. The magic seems to be in the line after the "remove unwanted characters" comment.

This is taken from the Symfony framework documentation: http://www.symfony-project.org/jobeet/1_4/Doctrine/en/08 which in turn is taken from http://php.vrana.cz/vytvoreni-pratelskeho-url.php but I don't speak Czech ;-)

    function slugify($text)
    {
        // replace non letter or digits by -
        $text = preg_replace('#[^\\pL\d]+#u', '-', $text);

        // trim
        $text = trim($text, '-');

        // transliterate
        if (function_exists('iconv')) {
            $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
        }

        // lowercase
        $text = strtolower($text);

        // remove unwanted characters
        $text = preg_replace('#[^-\w]+#', '', $text);

        if (empty($text)) {
            return 'n-a';
        }

        return $text;
    }

    echo slugify('é'); // --> "e"

Answer 2:

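Cf. @tchrist: with the PHP intl extension,

    preg_replace('/\pM*/u', '', normalizer_normalize($mystring, Normalizer::FORM_D));

decomposes each character (form NFD) and then strips the combining marks, so an accented letter becomes its unaccented base letter.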

As tchrist emphasises, not all Unicode characters are considered decomposable.

Extract from the Unicode charts:

U0080.pdf: Ï (U+00CF) ≡ 0049 I 0308 ¨

The same chart lists no decomposition for Ð (U+00D0), IMHO strangely (we could consider the ASCII letter D as an acceptable equivalent).

U0100.pdf: Đ (U+0110)

Even stranger: this one is identified as LATIN CAPITAL LETTER D WITH STROKE, but is not decomposable as such! Perhaps a cooler solution would be to get the Unicode description of each character, compare it with the description of each ASCII character, and replace accordingly. Anyone? ;-]
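The decompose-and-strip approach, plus a manual fallback for the characters that have no decomposition, could be sketched as follows. This is only an illustration: `strip_accents()` and its `$fallback` table are hypothetical names that do not appear in the answer, the table is far from exhaustive, and the code assumes the intl extension is loaded.

```php
<?php
// Sketch: NFD-decompose, strip combining marks, then map the leftovers by hand.
// Assumes the intl extension (Normalizer class) is available.

// Hypothetical helper, not part of the original answer.
function strip_accents(string $text): string
{
    // Characters without a canonical decomposition.
    // Illustrative only; extend this table for real data.
    $fallback = [
        'Ð' => 'D', 'ð' => 'd',
        'Đ' => 'D', 'đ' => 'd',
        'Ø' => 'O', 'ø' => 'o',
        'Ł' => 'L', 'ł' => 'l',
    ];

    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Input was not valid UTF-8; return it untouched.
        return $text;
    }

    // \pM+ matches the combining marks that NFD split off from base letters.
    $stripped = preg_replace('/\pM+/u', '', $decomposed);

    // strtr() with an array replaces whole UTF-8 sequences, so it is safe here.
    return strtr($stripped, $fallback);
}

echo strip_accents('Ïéè Đà'), "\n"; // Iee Da
```

The fallback table is the manual version of the "compare Unicode descriptions" idea: it has to be maintained by hand, but it does not depend on the locale or on the iconv build.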

Answer 3:

When doing transliteration, you have to make sure that your LC_COLLATE is properly set; otherwise the default POSIX locale will be used.
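A minimal sketch of setting the locale before transliterating. Note one deviation from the answer: glibc keeps its transliteration rules in the LC_CTYPE part of a locale, so the sketch sets LC_CTYPE rather than LC_COLLATE. The locale names are assumptions and must be installed on the system (check with `locale -a`); the result still depends on the iconv library in use, so no output is promised here.

```php
<?php
// Sketch: pick the first available UTF-8 locale for LC_CTYPE, then transliterate.
// The candidate locale names are assumptions; adjust them to your system.
$locale = setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

if ($locale === false) {
    // setlocale() returns false when none of the candidates is installed.
    echo "None of the requested locales is available\n";
}

// iconv() returns false (and raises a notice) if the conversion fails.
$result = iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

var_dump($locale, $result);
```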

Answer 4:

I'm tempted to say "nothing", although this is a little outside my expertise. PHP's iconv() is notorious, and the inspiration for many workarounds, including:

- dropping to the system's iconv utility (Unix & Linux)
- crafting a lookup table
- replacing all accented characters with an ASCII equivalent as a kind of preprocessing stage
- setting LC_COLLATE (which doesn't seem to work for everyone)
- using htmlentities() instead of iconv()

Read the comments for iconv() documentation for more inspiration. (Or commiseration. Too close to call.)
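The lookup-table workaround from the list above could look roughly like this. The names `ACCENT_MAP` and `replace_accents()` are made up for this sketch, and the table is deliberately tiny; a real one would need many more entries.

```php
<?php
// Sketch of the lookup-table workaround: no iconv and no locale dependence.
// ACCENT_MAP is a hypothetical, intentionally incomplete table.
const ACCENT_MAP = [
    'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
    'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
    'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    'ç' => 'c', 'ñ' => 'n',
];

// Hypothetical helper: replaces every key of ACCENT_MAP found in $text.
function replace_accents(string $text): string
{
    // strtr() with an array works on byte sequences; because each key is a
    // complete UTF-8 character, it cannot match in the middle of another one.
    return strtr($text, ACCENT_MAP);
}

echo replace_accents('crème brûlée'), "\n"; // creme brulee
```

Characters missing from the table pass through unchanged, which makes gaps easy to spot but means the output is not guaranteed to be pure ASCII.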

Answer 5:

It seems the standard way to handle this is with a "remove accents" function, which you can find in libraries like Flourish or CMSs like WordPress. iconv seems to be unable to translate accents (and rightly so), since this isn't a good idea for anything other than URL slugs.

Answer 6:

It happened to me with plain iconv, without PHP. The trick was to set the LANG environment variable to en_US.UTF-8 (it was hu_HU.UTF-8 before, in my case). After that it worked as expected.

Answer 7:

It seems that it depends on the PHP version...

Test case #1:

    php -version
    PHP 7.0.0RC8 (cli) (built: Nov 25 2015 12:36:50) ( NTS )
    Copyright (c) 1997-2015 The PHP Group
    Zend Engine v3.0.0, Copyright (c) 1998-2015 Zend Technologies
        with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2015, by Zend Technologies

    php -r "var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));"
    string(2) "`e"

Test case #2:

    php -version
    PHP 7.0.8-1~dotdeb+8.1 (cli) ( NTS )
    Copyright (c) 1997-2016 The PHP Group
    Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
        with Zend OPcache v7.0.8-1~dotdeb+8.1, Copyright (c) 1999-2016, by Zend Technologies

    php -r "var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));"
    string(1) "e"
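Both test cases are PHP 7.0.x builds, so the difference may come from the iconv library each build is linked against (for example glibc versus GNU libiconv) rather than from PHP itself. That is an assumption; the test cases above do not establish it. A quick way to check which implementation a given build uses is the pair of constants exposed by the iconv extension:

```php
<?php
// Sketch: report the iconv implementation behind this PHP build, then run the
// same transliteration as in the test cases. Output varies per system.
printf(
    "PHP %s, iconv implementation: %s, version: %s\n",
    PHP_VERSION,
    ICONV_IMPL,     // e.g. "glibc" or "libiconv"
    ICONV_VERSION
);

var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));
```

Comparing this output across the two machines would show whether the builds differ in more than the PHP version number.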
