Converting Chinese Character Encodings on iOS  Category: iOS

Reposted 2013-12-05 11:11:07
  • Converting Unicode escapes to Chinese characters

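+ (NSString *)replaceUnicode:(NSString *)unicodeStr {
    // Old-style property-list strings understand \UXXXX escapes, so rewrite
    // \u as \U, escape embedded quotes, and wrap the text in double quotes.
    NSString *tempStr1 = [unicodeStr stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    NSString *tempStr2 = [tempStr1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    NSString *tempStr3 = [[@"\"" stringByAppendingString:tempStr2] stringByAppendingString:@"\""];
    NSData *tempData = [tempStr3 dataUsingEncoding:NSUTF8StringEncoding];

    // Parsing the quoted text as a property list expands the escapes.
    // (This selector is deprecated in newer SDKs in favor of
    // propertyListWithData:options:format:error:.)
    NSString *returnStr = [NSPropertyListSerialization propertyListFromData:tempData
                                                           mutabilityOption:NSPropertyListImmutable
                                                                     format:NULL
                                                           errorDescription:NULL];

    // Turn the literal text \r\n (backslash-r, backslash-n) into a real newline.
    return [returnStr stringByReplacingOccurrencesOfString:@"\\r\\n" withString:@"\n"];
}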
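
For comparison, the same decoding can be sketched without Foundation in plain C, which also compiles inside an Objective-C file. This is a minimal sketch under stated assumptions, not the method used above: the names decode_u_escapes, utf8_encode_bmp, and hex_value are hypothetical, only four-digit \uXXXX escapes are handled, and surrogate pairs are not combined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encode a BMP code point as UTF-8. Returns the byte count (1-3),
 * or 0 for U+0000 and for surrogates, which this sketch rejects. */
static size_t utf8_encode_bmp(unsigned long cp, unsigned char *out)
{
    if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: expand \uXXXX escapes in `in` to UTF-8 in `out`.
 * `cap` is the size of `out` including the terminating NUL.
 * Malformed or rejected escapes are copied through unchanged.
 * Returns 0 on success, -1 if `out` is too small. */
int decode_u_escapes(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    if (cap == 0) return -1;
    while (*in) {
        unsigned char buf[3];
        unsigned long cp = 0;
        size_t len = 0;
        int ok = (in[0] == '\\' && (in[1] == 'u' || in[1] == 'U'));
        /* Read exactly four hex digits; stop at the first non-hex char. */
        for (int i = 2; ok && i < 6; i++) {
            int h = hex_value(in[i]);
            if (h < 0) ok = 0;
            else cp = cp * 16 + (unsigned long)h;
        }
        if (ok) len = utf8_encode_bmp(cp, buf);
        if (len > 0) {
            in += 6;                      /* consumed the whole escape */
        } else {
            buf[0] = (unsigned char)*in++; /* copy one byte literally */
            len = 1;
        }
        if (n + len >= cap) return -1;
        memcpy(out + n, buf, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

For example, decoding the text \u4e2d\u56fd yields the six UTF-8 bytes E4 B8 AD E5 9B BD, that is, 中国.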


  • Converting between Chinese characters and percent-encoded UTF-8

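 // Percent-decode: strA is @"中国". (Newer SDKs offer stringByRemovingPercentEncoding.)
 NSString *strA = [@"%E4%B8%AD%E5%9B%BD" stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

 // Percent-encode: strB is @"%E4%B8%AD%E5%9B%BD". (Newer SDKs offer stringByAddingPercentEncodingWithAllowedCharacters:.)
 NSString *strB = [@"中国" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];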
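
To show what these two calls do at the byte level, here is a rough C sketch of percent-encoding and percent-decoding UTF-8 text. The function names are hypothetical, and the sketch assumes that only the RFC 3986 unreserved characters are left unescaped, which is stricter than the set NSString leaves alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 unreserved characters pass through unescaped (an assumption
 * of this sketch; NSString leaves more characters untouched). */
static int is_unreserved(unsigned char c)
{
    return (c < 0x80 && isalnum(c)) || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Write every byte outside the unreserved set as %XX.
 * Returns 0 on success, -1 if `out` (size `cap`) is too small. */
int percent_encode(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        size_t len = is_unreserved(c) ? 1 : 3;
        if (n + len >= cap) return -1;
        if (len == 1) {
            out[n++] = (char)c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return 0;
}

/* Turn each %XX back into one byte; anything else is copied as-is.
 * Returns 0 on success, -1 if `out` (size `cap`) is too small. */
int percent_decode(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*in) {
        int hi = -1, lo = -1;
        if (n + 1 >= cap) return -1;
        if (in[0] == '%' && (hi = hex_value(in[1])) >= 0
                         && (lo = hex_value(in[2])) >= 0) {
            out[n++] = (char)(hi * 16 + lo);
            in += 3;
        } else {
            out[n++] = *in++;
        }
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return 0;
}
```

Encoding the UTF-8 bytes of 中国 (E4 B8 AD E5 9B BD) gives %E4%B8%AD%E5%9B%BD, and decoding that string restores the original bytes.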


  • Percent-encoding an NSString as UTF-8

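 NSString *strings = [NSString stringWithFormat:@"abc"];
 NSLog(@"strings : %@", strings);

 // For reference, the Core Foundation declaration being called:
 // CF_EXPORT
 // CFStringRef CFURLCreateStringByAddingPercentEscapes(CFAllocatorRef allocator, CFStringRef originalString, CFStringRef charactersToLeaveUnescaped, CFStringRef legalURLCharactersToBeEscaped, CFStringEncoding encoding);

 // The function follows the Create rule, so transfer ownership to ARC
 // instead of using a plain __bridge cast, which would leak the result.
 NSString *encodedValue = (__bridge_transfer NSString *)CFURLCreateStringByAddingPercentEscapes(NULL, (__bridge CFStringRef)strings, NULL, (__bridge CFStringRef)@"!*'();:@&=+$,/?%#[]", kCFStringEncodingUTF8);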





  • Converting ISO 8859-1 text (numeric character references) to Unicode

  

+ (NSString *)changeISO88591StringToUnicodeString:(NSString *)iso88591String
{
    // Assumes the input consists only of hexadecimal numeric character
    // references such as &#x4E2D;&#x56FD; (possibly with & escaped as &amp;).
    NSMutableString *srcString = [NSMutableString stringWithString:iso88591String];

    // Undo the escaped ampersand, then strip each "&#x" prefix so that only
    // ";"-terminated hexadecimal values remain.
    [srcString replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    [srcString replaceOccurrencesOfString:@"&#x" withString:@"" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];

    NSMutableString *desString = [NSMutableString string];
    NSArray *arr = [srcString componentsSeparatedByString:@";"];

    // The piece after the final ";" is empty, so skip it.
    for (NSUInteger i = 0; i + 1 < [arr count]; i++) {
        NSString *v = [arr objectAtIndex:i];
        // changeHexStringToDecimal: is the author's StringUtil helper (not shown here).
        int value = [StringUtil changeHexStringToDecimal:v];
        // Append the value as one UTF-16 code unit. Building a C string from
        // the two bytes breaks whenever either byte is zero.
        unichar ch = (unichar)value;
        [desString appendString:[NSString stringWithCharacters:&ch length:1]];
    }

    return desString;
}




Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?

A: There are three or four options for making Unicode fit into an 8-bit format.

a) Use UTF-8. This preserves ASCII, but not Latin-1, because the characters >127 are different from Latin-1. UTF-8 uses the bytes in the ASCII range only for ASCII characters. Therefore, it works well in any environment where ASCII characters have a significance as syntax characters, e.g. file name syntaxes, markup languages, etc., but where all other characters may use arbitrary bytes.
Example: “Latin Small Letter s with Acute” (015B) would be encoded as two bytes: C5 9B.

b) Use Java or C style escapes, of the form \uXXXXX or \xXXXXX. This format is not standard for text files, but well defined in the framework of the languages in question, primarily for source files.
Example: The Polish word “wyjście” with character “Latin Small Letter s with Acute” (015B) in the middle (ś is one character) would look like: “wyj\u015Bcie”.

c) Use the &#xXXXX; or &#DDDDD; numeric character escapes as in HTML or XML. Again, these are not standard for plain text files, but well defined within the framework of these markup languages.
Example: “wyjście” would look like “wyj&#x015B;cie”

d) Use SCSU. This format compresses Unicode into 8-bit format, preserving most of ASCII, but using some of the control codes as commands for the decoder. However, while ASCII text will look like ASCII text after being encoded in SCSU, other characters may occasionally be encoded with the same byte values, making SCSU unsuitable for 8-bit channels that blindly interpret any of the bytes as ASCII characters.
Example: “<SC2> wyjÛcie” where <SC2> indicates the byte 0x12 and “Û” corresponds to byte 0xDB. [AF] & [KW]
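
The byte values in option (a) follow directly from the UTF-8 bit layout. The C sketch below encodes a single code point; the function name utf8_encode is hypothetical and the code is illustrative rather than a library API.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into `out` (at least 4 bytes).
 * Returns the number of bytes written (1-4), or 0 if `cp` is a surrogate
 * or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x015B, buf) writes the two bytes C5 9B, matching the example in option (a). Every byte of a multi-byte sequence has its high bit set, which is why ASCII syntax characters can never appear inside one.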


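As option (c) describes, this is a "non-standard" but widely used convention; you could even call it a knock-off encoding :-)

So the encoding process is:

string -> Unicode code points -> &#xXXXX; or &#DDDDD;

Decoding simply reverses these steps.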
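
The encoding direction can be sketched in C as follows. This is a minimal illustration under stated assumptions: the input is taken to be UTF-8, ASCII characters (including a literal &) are copied unchanged, overlong UTF-8 forms are not rejected, and the names utf8_decode and encode_numeric_refs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at `s`. Stores the code point in
 * *cp and returns the sequence length in bytes, or 0 if malformed. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx; a NUL fails this too. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* string -> code points -> &#xXXXX; for every non-ASCII character.
 * Returns 0 on success, -1 on malformed input or if `out` is too small. */
int encode_numeric_refs(const char *in, char *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*s) {
        unsigned long cp;
        char ref[16];
        int w;
        size_t len = utf8_decode(s, &cp);
        if (len == 0) return -1;
        if (cp < 0x80)
            w = snprintf(ref, sizeof ref, "%c", (int)cp);
        else
            w = snprintf(ref, sizeof ref, "&#x%04lX;", cp);
        if (n + (size_t)w >= cap) return -1;
        memcpy(out + n, ref, (size_t)w);
        n += (size_t)w;
        s += len;
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return 0;
}
```

Given the UTF-8 text wyjście, the function produces wyj&#x015B;cie, and 中国 becomes &#x4E2D;&#x56FD;.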
