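The Inet control (Internet Transfer Control) in VB makes it easy to read a web page's HTML, but there is one problem: pages encoded in UTF-8 come back garbled. This is not surprising. VB strings are Unicode (UTF-16) internally, and when Inet returns the data as a string the bytes are interpreted using the system's default code page (gb2312 on a Simplified Chinese system) rather than as UTF-8. So how can UTF-8 page data be converted into text that displays correctly on a gb2312 system?
First, Inet must fetch the data in binary form. The Utf8ToUnicode function then converts the byte array received by Inet into a Unicode string. It takes the byte array as its argument and returns the converted string.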
The module code is as follows:
' UTF-8 to Unicode conversion code
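Option Explicit

' Win32 API: converts a multi-byte buffer (UTF-8 here) to UTF-16.
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Const CP_UTF8 As Long = 65001

' Takes a UTF-8 byte array and returns the decoded string ("" on failure).
Function Utf8ToUnicode(ByRef Utf() As Byte) As String
    Dim lRet As Long
    Dim lLength As Long
    Dim lBufferSize As Long

    lLength = UBound(Utf) - LBound(Utf) + 1
    If lLength <= 0 Then Exit Function

    ' Buffer size in characters. This is generous: the decoded text
    ' never needs more characters than there are input bytes.
    lBufferSize = lLength * 2
    Utf8ToUnicode = String$(lBufferSize, vbNullChar)

    ' Pass the address of the first element, whatever the array's lower bound is.
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Utf(LBound(Utf))), lLength, StrPtr(Utf8ToUnicode), lBufferSize)
    If lRet <> 0 Then
        ' lRet is the number of characters written; trim the unused padding.
        Utf8ToUnicode = Left$(Utf8ToUnicode, lRet)
    Else
        Utf8ToUnicode = ""
    End If
End Function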
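The function above allocates a buffer that is larger than needed and trims it afterwards. As an alternative, MultiByteToWideChar can be called twice: once with an output size of 0 to ask how many characters are required, and once to do the conversion. The following is only a sketch of that approach, written as a separate standard module; the name Utf8ToUnicodeExact is made up for illustration and does not come from the original code.

```vb
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Const CP_UTF8 As Long = 65001

' Hypothetical variant of Utf8ToUnicode that sizes the buffer exactly.
' Returns "" if the array is empty or the conversion fails.
Function Utf8ToUnicodeExact(ByRef Utf() As Byte) As String
    Dim lLength As Long
    Dim lChars As Long
    Dim lSrc As Long

    lLength = UBound(Utf) - LBound(Utf) + 1
    If lLength <= 0 Then Exit Function
    lSrc = VarPtr(Utf(LBound(Utf)))

    ' First call: cchWideChar = 0 only asks for the required size in characters.
    lChars = MultiByteToWideChar(CP_UTF8, 0, lSrc, lLength, 0, 0)
    If lChars = 0 Then Exit Function

    ' Second call: convert into a string of exactly that length.
    Utf8ToUnicodeExact = String$(lChars, vbNullChar)
    If MultiByteToWideChar(CP_UTF8, 0, lSrc, lLength, StrPtr(Utf8ToUnicodeExact), lChars) = 0 Then
        Utf8ToUnicodeExact = ""
    End If
End Function
```

The extra API call costs a second pass over the data but avoids the oversized temporary string and the Left$ copy. For ordinary web pages either version should be fine.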
The form code is as follows:
Private Sub Command1_Click()
    Dim BinBuff() As Byte
    ' Request the page as raw bytes (icByteArray) so that no code-page conversion is applied.
    BinBuff = Inet1.OpenURL(Text3.Text, icByteArray)
    RichTextBox1.Text = Utf8ToUnicode(BinBuff)
End Sub
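The code above yields a VB Unicode string, which is normally all that is needed for display. If actual gb2312 bytes are wanted, for example to write the page to a file in that encoding, one possible approach is StrConv with vbFromUnicode and an explicit locale ID. This is a sketch under assumptions: the helper name UnicodeToGbBytes is hypothetical, &H804 is the LCID for Chinese (Simplified, PRC), and the matching code page (936, GBK, a superset of gb2312) is assumed to be installed on the system.

```vb
Option Explicit

' Hypothetical helper, not part of the original post.
' Converts a VB (Unicode) string to bytes in the ANSI code page
' of the Chinese (Simplified, PRC) locale, LCID &H804.
Function UnicodeToGbBytes(ByVal sText As String) As Byte()
    Dim Result() As Byte
    Result = StrConv(sText, vbFromUnicode, &H804)
    UnicodeToGbBytes = Result
End Function
```

In the form it could be used as GbBytes = UnicodeToGbBytes(Utf8ToUnicode(BinBuff)), where GbBytes is declared as Dim GbBytes() As Byte. Characters that have no equivalent in the target code page cannot be represented and will be replaced.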