A Brief Look at UTF-8 String Truncation in ThinkPHP

Latest recommended article published on 2024-07-22 14:16:55

猿一只

Category: 网站功能开发 (website feature development)

Article link: https://blog.csdn.net/m1101322632/article/details/50389874

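Definition: UTF-8 is a variable-length Unicode transformation format. The original design used 1 to 6 bytes per character; since RFC 3629 (2003), valid UTF-8 is limited to 1 to 4 bytes, which is enough to cover every Unicode code point up to U+10FFFF.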

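Benefit: this encoding can represent the characters of almost every language, and any character that ASCII can represent is encoded exactly as it is in ASCII, so UTF-8 is ASCII-compatible.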
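Both points can be checked quickly. The following is a minimal sketch that assumes PHP 7 or later (for the `"\u{...}"` escape) and uses only the built-in `strlen` and `bin2hex`; the variable names are illustrative.

```php
<?php
// An ASCII character keeps its one-byte ASCII encoding in UTF-8.
$ascii = 'A';
echo strlen($ascii), ' ', bin2hex($ascii), "\n"; // 1 41

// A CJK character such as U+4E2D takes three bytes in UTF-8.
$cjk = "\u{4E2D}";
echo strlen($cjk), ' ', bin2hex($cjk), "\n";     // 3 e4b8ad
```

Note that `strlen` counts bytes, not characters, which is why truncating a UTF-8 string by byte offset alone can split a character.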

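Conversion: to convert a Unicode code point to UTF-8, write its value in binary, split it into groups of 6 bits starting from the low-order end, and fill the groups into the x positions of the template below that matches the required length.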

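1 byte: 0xxxxxxx

2 bytes: 110xxxxx 10xxxxxx

3 bytes: 1110xxxx 10xxxxxx 10xxxxxx

4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

5 bytes (original design only, not valid under RFC 3629): 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

6 bytes (original design only, not valid under RFC 3629): 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx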

For example, Unicode U+0030 is encoded in UTF-8 as: 00110000

U+FAFA (binary 1111 101011 111010) is encoded in UTF-8 as: 11101111 10101011 10111010
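The template-filling rule can be sketched in code. The function below is only an illustration: `utf8_encode_code_point()` is a hypothetical helper name, not a PHP built-in, it assumes PHP 7 or later, and it covers the 1 to 4 byte forms without rejecting surrogates or values above U+10FFFF.

```php
<?php
/**
 * Encode one Unicode code point as UTF-8 by filling the byte templates.
 * Hypothetical helper for illustration; it does not validate its input.
 */
function utf8_encode_code_point(int $cp): string
{
    if ($cp < 0x80) {            // 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {           // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {         // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8_encode_code_point(0x0030)), "\n"; // 30
echo bin2hex(utf8_encode_code_point(0xFAFA)), "\n"; // efabba
```

In binary, `efabba` is 11101111 10101011 10111010: the 16 bits of U+FAFA are split 4 + 6 + 6 from the high end and placed into the three-byte template.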

Now to the main topic. Here is the code:


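// Inputs, assumed to be set by the caller:
//   $want_length  number of characters to keep
//   $orgin_str    original UTF-8 string
$orgin_str_length = strlen($orgin_str);  // length of the original string in bytes
$need_cut_bytes = 0;     // number of bytes to cut
$current_length = 0;     // number of characters covered by the bytes counted so far
$first_byte_ascII = 0;   // value of the first byte of the current UTF-8 character
$single_char_bytes = 1;  // number of bytes occupied by the current character
for ($i = 0; $i < $orgin_str_length; ) {
	$first_byte_ascII = ord($orgin_str[$i]); // get the byte value
	if ($first_byte_ascII < 128) {    // single-byte UTF-8 encoding (ASCII)
		$single_char_bytes = 1;
		$ctr_but_display_chars = array(9, 10);  // horizontal tab and line feed count as displayed characters
		// other bytes below 32 are control characters: include them in the cut without counting them
		if (!in_array($first_byte_ascII, $ctr_but_display_chars) && $first_byte_ascII < 32) {
			$i++;
			$need_cut_bytes += $single_char_bytes;
			continue;
		}
	}
	// 128-191 are continuation bytes and never start a character in valid UTF-8;
	// advance one byte so that malformed input cannot reuse a stale length
	else if ($first_byte_ascII < 192) {
		$single_char_bytes = 1;
	}
	else if ($first_byte_ascII < 224) { // 2-byte UTF-8 encoding
		$single_char_bytes = 2;
	}
	else if ($first_byte_ascII < 240) { // 3-byte UTF-8 encoding
		$single_char_bytes = 3;
	}
	else if ($first_byte_ascII < 248) { // 4-byte UTF-8 encoding
		$single_char_bytes = 4;
	}
	else if ($first_byte_ascII < 252) { // 5-byte encoding (original design only)
		$single_char_bytes = 5;
	}
	else if ($first_byte_ascII < 254) { // 6-byte encoding (original design only)
		$single_char_bytes = 6;
	}
	else { // 254 and 255 never appear in UTF-8; step over them
		$single_char_bytes = 1;
	}
	$current_length++; // one more character is covered by the counted bytes
	$need_cut_bytes += $single_char_bytes;
	$i += $single_char_bytes;
	if ($current_length >= $want_length) { // stop once the requested number of characters is reached
		break;
	}
}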
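
The loop above only computes `$need_cut_bytes`; the truncated string still has to be extracted. The following is a minimal sketch of one way to do that, assuming the logic is wrapped in a hypothetical function `utf8_cut()` and the result is taken with the built-in `substr()`. It is a simplification, not the author's exact code: it does not skip control characters, and it steps over stray continuation bytes or invalid lead bytes one byte at a time.

```php
<?php
/**
 * Return the first $want_length characters of a UTF-8 string.
 * utf8_cut() is a hypothetical name used for illustration only.
 */
function utf8_cut(string $str, int $want_length): string
{
    $total_bytes = strlen($str);   // length in bytes
    $need_cut_bytes = 0;           // bytes covered so far
    $chars = 0;                    // characters covered so far
    while ($need_cut_bytes < $total_bytes && $chars < $want_length) {
        $first = ord($str[$need_cut_bytes]);
        if ($first < 0xC0) {
            $len = 1;              // ASCII, or a stray continuation byte
        } elseif ($first < 0xE0) {
            $len = 2;
        } elseif ($first < 0xF0) {
            $len = 3;
        } elseif ($first < 0xF8) {
            $len = 4;
        } else {
            $len = 1;              // not a valid UTF-8 lead byte; step over it
        }
        $need_cut_bytes += $len;
        $chars++;
    }
    return substr($str, 0, $need_cut_bytes);
}

echo utf8_cut("a\u{4E2D}b\u{6587}c", 3), "\n"; // a中b
```

Where the mbstring extension is available, `mb_substr($str, 0, $want_length, 'UTF-8')` is the usual built-in alternative for character-based truncation.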