Handling GB2312-to-UTF-8 encoding conversion in Golang

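When porting PHP or Java code to Golang, you may run into character-encoding conversion problems such as GB2312 to UTF-8. This article describes the 'invalid or incomplete multibyte or wide character' error encountered with the iconv-go library, walks through the iconv-go source code, and finds that appending '//IGNORE' to the target encoding makes the conversion skip characters that cannot be converted. The modified code resolves the conversion problem.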

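Problem description:

If you have ever rewritten old PHP or Java code in Go, you have very likely run into the GB2312-to-UTF-8 conversion problem.

A colleague recently hit the error below while using the iconv-go library to convert character encodings at work and asked me about it, so I looked into it myself.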

The error message was:

invalid or incomplete multibyte or wide character

The Golang conversion library in use was:

github.com/djimenez/iconv-go

The call that failed was:

body, err = iconv.ConvertString(body, "GBK", "utf-8")
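Before changing the conversion call, it can help to find out where the input stops being valid. The following is a minimal sketch, not part of iconv-go: the function name firstInvalidGBK is hypothetical, and it only checks the two-byte structure commonly documented for GBK (lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F). It does not check whether a well-formed pair maps to an assigned character, so iconv may still reject input that passes this check.

```go
package main

import "fmt"

// firstInvalidGBK (hypothetical helper) returns the byte offset of the first
// sequence that is not structurally valid GBK, or -1 if none is found.
// Assumption: lead bytes are 0x81-0xFE and trail bytes are 0x40-0xFE except
// 0x7F. Only the byte ranges are checked, not the character assignments.
func firstInvalidGBK(b []byte) int {
	for i := 0; i < len(b); {
		c := b[i]
		switch {
		case c < 0x80:
			// ASCII: a single byte.
			i++
		case c >= 0x81 && c <= 0xFE:
			// Lead byte of a two-byte sequence.
			if i+1 >= len(b) {
				return i // incomplete: input ends after the lead byte
			}
			t := b[i+1]
			if t < 0x40 || t == 0x7F || t > 0xFE {
				return i // invalid trail byte
			}
			i += 2
		default:
			// 0x80 and 0xFF are treated as invalid lead bytes here.
			return i
		}
	}
	return -1
}

func main() {
	good := []byte{0xD6, 0xD0, 0xCE, 0xC4}      // "中文" encoded as GBK
	truncated := []byte{'a', 0xD6, 0xD0, 0xCE} // last character cut in half

	fmt.Println(firstInvalidGBK(good))      // -1
	fmt.Println(firstInvalidGBK(truncated)) // 3
}
```

An offset other than -1 points at the kind of "invalid or incomplete" sequence the error message refers to, for example a body that was truncated in the middle of a two-byte character.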

Approach:

Open github.com/djimenez/iconv-go and read the source.

First, iconv.ConvertString is implemented in iconv.go:

func ConvertString(input string, fromEncoding string, toEncoding string) (output string, err error) {
    // create a temporary converter
    converter, err := NewConverter(fromEncoding, toEncoding)

    if err == nil {
        // convert the string
        output, err = converter.ConvertString(input)

        // close the converter
        converter.Close()
    }

    return
}

From this we can see that it calls

NewConverter(fromEncoding, toEncoding)

to create a new Converter struct, and then calls that struct's method

output, err = converter.ConvertString(input)

Tracing further, the Converter type and its constructor NewConverter are implemented in converter.go:

type Converter struct {
    context C.iconv_t
    open    bool
}

// Initialize a new Converter. If fromEncoding or toEncoding are not supported by
// iconv then an EINVAL error will be returned. An ENOMEM error maybe returned if
// there is not enough memory to initialize an iconv descriptor
func NewConverter(fromEncoding string, toEncoding string) (converter *Converter, err error) {
    converter = new(Converter)

    // convert to C strings
    toEncodingC := C.CString(toEncoding)
    fromEncodingC := C.CString(fromEncoding)

    // open an iconv descriptor
    converter.context, err = C.iconv_open(toEncodingC, fromEncodingC)

    // free the C Strings
    C.free(unsafe.Pointer(toEncodingC))
    C.free(unsafe.Pointer(fromEncodingC))

    // check err
    if err == nil {
        // no error, mark the context as open
        converter.open = true
    }

    return
}

As you can see, underneath it calls the C iconv library through cgo to do the conversion:

converter.context, err = C.iconv_open(toEncodingC, fromEncodingC)

The DESCRIPTION section of the C library documentation (man iconv_open) says:

The empty encoding name "" is equivalent to "char": it denotes the locale dependent character encoding.

When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.

When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.

The resulting conversion descriptor can be used with iconv any number of times. It remains valid until deallocated using iconv_close.

A conversion descriptor contains a conversion state. After creation using iconv_open, the state is in the initial state. Using iconv modifies the descriptor's conversion state. (This implies that a conversion descriptor can not be used in multiple threads simultaneously.) To bring the state back to the initial state, use iconv with NULL as inbuf argument.

The key sentence is this one:

When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.

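In other words, when "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set are silently dropped. Good, that is exactly what I wanted.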

Given this chain of calls:

ConvertString(input string, fromEncoding string, toEncoding string)
NewConverter(fromEncoding string, toEncoding string) (converter *Converter, err error)
C.iconv_open(toEncodingC, fromEncodingC)

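toEncoding is handed to the C library unchanged, so all we need to do is pass //IGNORE through in that string.

So the code becomes: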

body, err = iconv.ConvertString(body, "GBK", "utf-8//IGNORE")

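In testing, no err was returned, so the problem is solved. Keep in mind that any characters iconv discards are simply missing from the output.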
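Because //IGNORE drops data silently, one option is to try the strict conversion first and retry with //IGNORE only when it fails, so the caller knows the result may be lossy. The following is a minimal sketch under stated assumptions: withIgnore, convertLenient and fakeConvert are hypothetical names, not part of iconv-go. The convertFunc type mirrors the signature of iconv.ConvertString shown earlier, and fakeConvert is a stand-in so the sketch runs without cgo.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// convertFunc mirrors the signature of iconv.ConvertString from
// github.com/djimenez/iconv-go, so that function can be passed in directly.
type convertFunc func(input, fromEncoding, toEncoding string) (string, error)

// withIgnore (hypothetical helper) appends "//IGNORE" to an encoding name
// unless the name already contains it.
func withIgnore(toEncoding string) string {
	if strings.Contains(strings.ToUpper(toEncoding), "//IGNORE") {
		return toEncoding
	}
	return toEncoding + "//IGNORE"
}

// convertLenient (hypothetical helper) tries a strict conversion first and
// retries with //IGNORE only if that fails. lossy reports whether the
// fallback was used, meaning some characters may have been discarded.
func convertLenient(conv convertFunc, input, from, to string) (out string, lossy bool, err error) {
	out, err = conv(input, from, to)
	if err == nil {
		return out, false, nil
	}
	out, err = conv(input, from, withIgnore(to))
	return out, true, err
}

// fakeConvert is a stand-in for iconv.ConvertString used only for this demo:
// it fails unless the target encoding carries the //IGNORE suffix.
func fakeConvert(input, from, to string) (string, error) {
	if !strings.HasSuffix(to, "//IGNORE") {
		return "", errors.New("invalid or incomplete multibyte or wide character")
	}
	return input, nil
}

func main() {
	fmt.Println(withIgnore("utf-8"))         // utf-8//IGNORE
	fmt.Println(withIgnore("utf-8//IGNORE")) // utf-8//IGNORE

	out, lossy, err := convertLenient(fakeConvert, "hello", "GBK", "utf-8")
	fmt.Println(out, lossy, err) // hello true <nil>
}
```

In real code, iconv.ConvertString would be passed as conv in place of fakeConvert, and the lossy flag could be logged so that dropped characters do not go unnoticed.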


To restate the solution:

body, err = iconv.ConvertString(body, "GBK", "utf-8//IGNORE")
