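Most programming environments and file formats today use UTF-8 encoding.
When you use PHP's substr function to truncate a string that contains Chinese characters, the result can be broken: substr counts bytes, and a Chinese character usually occupies 3 bytes in UTF-8, so the cut can land in the middle of a character.
The utf8RTrim function repairs such a truncated string by dropping the incomplete character left at its end.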
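To make the problem concrete, here is a small sketch of what goes wrong. It assumes the script file itself is saved as UTF-8, and it uses preg_match with the /u modifier only as a convenient validity check; that check is an illustration and not part of the original method.

```php
<?php
// Assumes this file is saved as UTF-8.
$str = "hello你好";

// strlen() and substr() work on bytes: 5 ASCII bytes + 2 characters * 3 bytes.
echo strlen($str), "\n"; // 11

// Cutting at byte 6 keeps "hello" plus only the first byte of "你".
$cut = substr($str, 0, 6);
echo strlen($cut), "\n"; // 6

// preg_match() with the /u modifier does not return 1 when the subject
// is not valid UTF-8, so it can serve as a quick validity check.
$isValid = preg_match('//u', $cut) === 1;
var_dump($isValid); // bool(false)
```

The truncated string ends with a single orphaned lead byte, which is why it usually renders as a garbled character.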
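function utf8RTrim($str) {
    // Nothing to trim for null or empty input.
    if ($str === null || $str === '') {
        return "";
    }
    $last = -1;           // index of the final byte of the last complete character
    $pos = 0;
    $len = strlen($str);  // length in bytes, not characters
    while ($pos < $len) {
        $ch = ord($str[$pos]);
        // The lead byte tells how many bytes the character occupies.
        if (($ch & 0xf8) == 0xf0) {        // 11110xxx: 4-byte sequence
            $pos += 4;
        } elseif (($ch & 0xf0) == 0xe0) {  // 1110xxxx: 3-byte sequence
            $pos += 3;
        } elseif (($ch & 0xe0) == 0xc0) {  // 110xxxxx: 2-byte sequence
            $pos += 2;
        } else {                           // ASCII or any other byte: step over 1 byte
            $pos++;
        }
        // Accept the character only if all of its bytes fit inside the string.
        if ($pos <= $len) {
            $last = $pos - 1;
        }
    }
    // Keep everything up to the last complete character.
    return substr($str, 0, $last + 1);
}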
Example:
$str = "hello你好";
echo utf8RTrim(substr($str, 0, 6)); // hello (the stray first byte of 你 is dropped)
echo utf8RTrim(substr($str, 0, 8)); // hello你 (the cut falls on a character boundary)
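As a point of comparison, PHP's mbstring extension provides mb_strcut, which cuts by byte length without splitting a multibyte character. The sketch below assumes mbstring is loaded and the file is saved as UTF-8; neither assumption comes from the original text, and utf8RTrim remains useful when the extension is not available or the string has already been truncated elsewhere.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$str = "hello你好";

// The length argument is in bytes, as with substr(), but a character
// that does not fit completely is left out of the result.
$six = mb_strcut($str, 0, 6, 'UTF-8');
$eight = mb_strcut($str, 0, 8, 'UTF-8');

echo $six, "\n";   // hello
echo $eight, "\n"; // hello你
```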