iOS Chinese Character Encoding Conversion   Category: iOS

Reposted: 2013-12-05 11:11:07
  • Converting Unicode escapes to Chinese characters

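+ (NSString *)replaceUnicode:(NSString *)unicodeStr {
    // Old-style property lists understand \Uxxxx escapes, so rewrite \u as \U.
    NSString *tempStr1 = [unicodeStr stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    // Escape embedded double quotes, then wrap the text in quotes so that it parses as a plist string.
    NSString *tempStr2 = [tempStr1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    NSString *tempStr3 = [[@"\"" stringByAppendingString:tempStr2] stringByAppendingString:@"\""];
    NSData *tempData = [tempStr3 dataUsingEncoding:NSUTF8StringEncoding];
    // Parsing the data as a property list decodes the escapes.
    // (This selector was later deprecated in favor of propertyListWithData:options:format:error:.)
    NSString *returnStr = [NSPropertyListSerialization propertyListFromData:tempData
                                                           mutabilityOption:NSPropertyListImmutable
                                                                     format:NULL
                                                           errorDescription:NULL];
    // Turn literal "\r\n" sequences into real line breaks.
    return [returnStr stringByReplacingOccurrencesOfString:@"\\r\\n" withString:@"\n"];
}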
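
For comparison, the escapes can also be decoded by hand, without the property-list detour. The following is only a minimal sketch, assuming ARC and escapes of exactly four hex digits; `HexDigitValue` and `DecodeUnicodeEscapes` are hypothetical helper names, not part of the original post or of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int HexDigitValue(unichar c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each \uXXXX escape with the UTF-16 code unit
// it names and copies every other character through unchanged.
static NSString *DecodeUnicodeEscapes(NSString *input) {
    NSUInteger length = input.length;
    if (length == 0) return @"";
    // The output is never longer than the input.
    unichar *buffer = malloc(length * sizeof(unichar));
    if (buffer == NULL) return nil;
    NSUInteger outLength = 0;
    NSUInteger i = 0;
    while (i < length) {
        unichar ch = [input characterAtIndex:i];
        if (ch == '\\' && i + 5 < length && [input characterAtIndex:i + 1] == 'u') {
            int value = 0;
            BOOL valid = YES;
            for (NSUInteger k = 2; k <= 5; k++) {
                int digit = HexDigitValue([input characterAtIndex:i + k]);
                if (digit < 0) { valid = NO; break; }
                value = value * 16 + digit;
            }
            if (valid) {
                buffer[outLength++] = (unichar)value;
                i += 6;
                continue;
            }
        }
        buffer[outLength++] = ch;  // not a complete escape: copy as-is
        i++;
    }
    NSString *result = [NSString stringWithCharacters:buffer length:outLength];
    free(buffer);
    return result;
}

int main(void) {
    @autoreleasepool {
        // The literal holds backslash-u sequences, not the characters themselves.
        NSLog(@"%@", DecodeUnicodeEscapes(@"\\u4e2d\\u56fd"));  // logs 中国
    }
    return 0;
}
```

A character outside the Basic Multilingual Plane written as two consecutive escapes (a surrogate pair) ends up as two adjacent UTF-16 units, which is how NSString stores it anyway.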


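  • Converting between Chinese characters and UTF-8 percent escapes

 // Decode percent escapes back to Chinese characters; strA is "中国".
 NSString *strA = [@"%E4%B8%AD%E5%9B%BD" stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

 // Percent-escape the UTF-8 bytes of the Chinese characters; strB is "%E4%B8%AD%E5%9B%BD".
 NSString *strB = [@"中国" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];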
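
Both methods above were deprecated in later SDKs. The following is a minimal sketch of the same round trip with the newer `stringByAddingPercentEncodingWithAllowedCharacters:` and `stringByRemovingPercentEncoding` methods, assuming ARC and iOS 7 or later. Using `URLQueryAllowedCharacterSet` is an assumption made for illustration; the right character set depends on which part of a URL the text goes into.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Chinese characters are outside the allowed set, so their UTF-8 bytes get escaped.
        NSString *encoded = [@"中国" stringByAddingPercentEncodingWithAllowedCharacters:
                                [NSCharacterSet URLQueryAllowedCharacterSet]];
        NSLog(@"%@", encoded);  // logs %E4%B8%AD%E5%9B%BD

        // The reverse direction always interprets the escaped bytes as UTF-8.
        NSString *decoded = [@"%E4%B8%AD%E5%9B%BD" stringByRemovingPercentEncoding];
        NSLog(@"%@", decoded);  // logs 中国
    }
    return 0;
}
```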


  • Converting an NSString to percent-escaped UTF-8

    NSString *strings = [NSString stringWithFormat:@"abc"];

    NSLog(@"strings : %@", strings);

    // For reference, the Core Foundation declaration of the function used below:
    // CF_EXPORT
    // CFStringRef CFURLCreateStringByAddingPercentEscapes(CFAllocatorRef allocator, CFStringRef originalString, CFStringRef charactersToLeaveUnescaped, CFStringRef legalURLCharactersToBeEscaped, CFStringEncoding encoding);

    // The function follows the Create rule, so ownership is transferred to ARC to avoid a leak.
    NSString *encodedValue = (__bridge_transfer NSString *)CFURLCreateStringByAddingPercentEscapes(NULL, (__bridge CFStringRef)strings, NULL, (__bridge CFStringRef)@"!*'();:@&=+$,/?%#[]", kCFStringEncodingUTF8);
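
`CFURLCreateStringByAddingPercentEscapes` was also deprecated in later SDKs. A rough equivalent can be sketched with `stringByAddingPercentEncodingWithAllowedCharacters:` by removing the same reserved characters from an allowed set. This assumes ARC; `PercentEncodeValue` is a hypothetical helper name, and the result is not guaranteed to match the Core Foundation function byte for byte on every input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encodes a value for use inside a URL query.
static NSString *PercentEncodeValue(NSString *value) {
    // Start from the characters allowed in a query, then force the reserved
    // characters from the original call to be escaped as well.
    NSMutableCharacterSet *allowed =
        [[NSCharacterSet URLQueryAllowedCharacterSet] mutableCopy];
    [allowed removeCharactersInString:@"!*'();:@&=+$,/?%#[]"];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:allowed];
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", PercentEncodeValue(@"a b&c=中"));  // logs a%20b%26c%3D%E4%B8%AD
    }
    return 0;
}
```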





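  • Converting ISO 8859-1 text (numeric character references) to Unicode

+ (NSString *)changeISO88591StringToUnicodeString:(NSString *)iso88591String
{
    // Written for manual reference counting, hence the autorelease calls.
    // The input is expected to consist only of &#xXXXX; references.
    NSMutableString *srcString = [[[NSMutableString alloc] initWithString:iso88591String] autorelease];

    // Undo HTML escaping of the ampersand, then drop the "&#x" prefixes so that
    // only hex digits separated by ";" remain.
    [srcString replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    [srcString replaceOccurrencesOfString:@"&#x" withString:@"" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];

    NSMutableString *desString = [[[NSMutableString alloc] init] autorelease];

    NSArray *arr = [srcString componentsSeparatedByString:@";"];

    // The last component is whatever follows the final ";", so it is skipped.
    for (NSUInteger i = 0; i + 1 < [arr count]; i++) {
        NSString *v = [arr objectAtIndex:i];
        // StringUtil is the author's own helper class; it is not shown in this post.
        int value = [StringUtil changeHexStringToDecimal:v];
        // Append the value as one UTF-16 code unit. Building a C string with
        // NSUnicodeStringEncoding would be cut short at any zero byte.
        unichar ch = (unichar)value;
        [desString appendString:[NSString stringWithCharacters:&ch length:1]];
    }

    return desString;
}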
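
The method above relies on a helper that is not shown and expects input made up entirely of references. As a self-contained alternative, here is a minimal sketch that decodes `&#xXXXX;` references embedded in ordinary text using `NSRegularExpression`. It assumes ARC; `StringFromCodePoint` and `DecodeHexCharacterReferences` are hypothetical names, and decimal `&#DDDDD;` references are deliberately left alone.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one code point as an NSString, or nil if Foundation rejects it.
static NSString *StringFromCodePoint(uint32_t codePoint) {
    uint32_t littleEndian = CFSwapInt32HostToLittle(codePoint);
    return [[NSString alloc] initWithBytes:&littleEndian
                                    length:sizeof(littleEndian)
                                  encoding:NSUTF32LittleEndianStringEncoding];
}

// Hypothetical helper: replaces every &#xXXXX; reference with its character.
static NSString *DecodeHexCharacterReferences(NSString *input) {
    if (input.length == 0) return @"";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"&#x([0-9A-Fa-f]{1,6});"
                                                  options:0
                                                    error:NULL];
    NSMutableString *output = [NSMutableString string];
    __block NSUInteger copiedUpTo = 0;
    [regex enumerateMatchesInString:input
                            options:0
                              range:NSMakeRange(0, input.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        NSRange whole = match.range;
        // Copy the plain text between the previous reference and this one.
        [output appendString:[input substringWithRange:
                                 NSMakeRange(copiedUpTo, whole.location - copiedUpTo)]];
        NSString *hex = [input substringWithRange:[match rangeAtIndex:1]];
        uint32_t codePoint = (uint32_t)strtoul(hex.UTF8String, NULL, 16);
        NSString *decoded = StringFromCodePoint(codePoint);
        // Keep the reference untouched if it does not name a usable code point.
        [output appendString:(decoded != nil ? decoded : [input substringWithRange:whole])];
        copiedUpTo = NSMaxRange(whole);
    }];
    // Copy whatever follows the last reference.
    [output appendString:[input substringFromIndex:copiedUpTo]];
    return output;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", DecodeHexCharacterReferences(@"wyj&#x015B;cie &#x4E2D;&#x56FD;"));
        // logs wyjście 中国
    }
    return 0;
}
```

Going through UTF-32 keeps the sketch short and also covers code points above U+FFFF, which a single `unichar` cannot hold.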




Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?

A: There are three or four options for making Unicode fit into an 8-bit format.

a) Use UTF-8. This preserves ASCII, but not Latin-1, because the characters >127 are different from Latin-1. UTF-8 uses the bytes in the ASCII range only for ASCII characters. Therefore, it works well in any environment where ASCII characters have a significance as syntax characters, e.g. file name syntaxes, markup languages, etc., but where all other characters may use arbitrary bytes.
Example: “Latin Small Letter s with Acute” (015B) would be encoded as two bytes: C5 9B.

b) Use Java or C style escapes, of the form \uXXXXX or \xXXXXX. This format is not standard for text files, but well defined in the framework of the languages in question, primarily for source files.
Example: The Polish word “wyjście” with character “Latin Small Letter s with Acute” (015B) in the middle (ś is one character) would look like: “wyj\u015Bcie”.

c) Use the &#xXXXX; or &#DDDDD; numeric character escapes as in HTML or XML. Again, these are not standard for plain text files, but well defined within the framework of these markup languages.
Example: “wyjście” would look like “wyj&#x015B;cie”.

d) Use SCSU. This format compresses Unicode into 8-bit format, preserving most of ASCII, but using some of the control codes as commands for the decoder. However, while ASCII text will look like ASCII text after being encoded in SCSU, other characters may occasionally be encoded with the same byte values, making SCSU unsuitable for 8-bit channels that blindly interpret any of the bytes as ASCII characters.
Example: “<SC2> wyjÛcie” where <SC2> indicates the byte 0x12 and “Û” corresponds to byte 0xDB. [AF] & [KW]
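
To make the example in option (a) concrete, here is a minimal sketch that derives the two UTF-8 bytes of U+015B by hand and prints Foundation's encoding next to them for comparison. It assumes ARC; the variable names are illustrative only.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        unsigned int codePoint = 0x015B;  // Latin Small Letter s with Acute

        // Code points from U+0080 to U+07FF take two bytes: 110xxxxx 10xxxxxx.
        unsigned char manual[2];
        manual[0] = (unsigned char)(0xC0 | (codePoint >> 6));    // top 5 bits
        manual[1] = (unsigned char)(0x80 | (codePoint & 0x3F));  // low 6 bits
        NSLog(@"manual:     %02X %02X", manual[0], manual[1]);   // logs manual:     C5 9B

        // Let Foundation encode the same character.
        NSData *utf8 = [@"\u015B" dataUsingEncoding:NSUTF8StringEncoding];
        const unsigned char *bytes = utf8.bytes;
        NSLog(@"Foundation: %02X %02X", bytes[0], bytes[1]);
    }
    return 0;
}
```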


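As option (c) describes, this is a "non-standard" but widely used practice; you could even call it a knockoff encoding :-)

So the encoding process is:

string -> Unicode code points -> &#xXXXX; or &#DDDDD;

Decoding simply reverses these steps.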
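
The encoding direction can be sketched as follows: the string is converted to UTF-32 so that each element is one code point, and every non-ASCII code point is written as `&#xXXXX;`. This is only an illustration, assuming ARC; `EncodeAsHexCharacterReferences` is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps ASCII as-is and writes every other code point as &#xXXXX;.
static NSString *EncodeAsHexCharacterReferences(NSString *input) {
    // UTF-32 gives exactly one 4-byte element per code point.
    NSData *utf32 = [input dataUsingEncoding:NSUTF32LittleEndianStringEncoding];
    const uint8_t *bytes = utf32.bytes;
    NSUInteger count = utf32.length / sizeof(uint32_t);
    NSMutableString *output = [NSMutableString string];
    for (NSUInteger i = 0; i < count; i++) {
        uint32_t raw;
        memcpy(&raw, bytes + i * sizeof(uint32_t), sizeof(raw));
        uint32_t codePoint = CFSwapInt32LittleToHost(raw);
        if (codePoint < 0x80) {
            [output appendFormat:@"%c", (char)codePoint];
        } else {
            [output appendFormat:@"&#x%04X;", (unsigned int)codePoint];
        }
    }
    return output;
}

int main(void) {
    @autoreleasepool {
        NSLog(@"%@", EncodeAsHexCharacterReferences(@"wyj\u015Bcie"));  // logs wyj&#x015B;cie
    }
    return 0;
}
```

Note that a literal `&` in the input is passed through unchanged, so text that already contains something that looks like a reference would need extra escaping before a round trip is safe.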
