CSV encoding conversion: convert a CSV file from any encoding to UTF-8

Hello, I am creating a simple console application in VB.NET to convert a file from any encoding to UTF-8, but I can't figure out how the encoding part works. I know that the source file is in Unicode, but when I convert it to the new format I get junk. Any suggestions? I am not sure whether my code is correct.

This is my code:

```vbnet
Imports System.IO
Imports System.Text

Module Module1

    Sub Main()
        Console.Write("Please give the filepath (example: c:\testfile.csv): ")
        Dim filepath As String = Console.ReadLine()
        Dim sEncoding As String = DetermineFileType(filepath)
        Dim strContents As String
        Dim strEncodedContents As String
        Dim objReader As StreamReader
        Dim ErrInfo As String
        Dim bString As Byte()

        Try
            'Read the file
            objReader = New StreamReader(filepath)
            'Read until the end
            strContents = objReader.ReadToEnd()
            'Close the file
            objReader.Close()

            'Write the contents to the console
            Console.WriteLine(strContents)
            Console.WriteLine("")

            bString = EncodeString(strContents, "UTF-8")
            strEncodedContents = System.Text.Encoding.UTF8.GetString(bString)

            Dim objWriter As New System.IO.StreamWriter(filepath.Replace(".csv", "_encoded.csv"))
            objWriter.WriteLine(strEncodedContents)
            objWriter.Close()

            Console.WriteLine("Encoding Finished")
        Catch Ex As Exception
            ErrInfo = Ex.Message
            Console.WriteLine(ErrInfo)
        End Try

        Console.ReadKey()
    End Sub

    Public Function DetermineFileType(ByVal aFileName As String) As String
        Dim sEncoding As String = String.Empty
        Dim oSR As New StreamReader(aFileName, True)
        'Read the file so that CurrentEncoding reflects any detected byte order mark
        oSR.ReadToEnd()
        sEncoding = oSR.CurrentEncoding.EncodingName
        Return sEncoding
    End Function

    Function EncodeString(ByRef SourceData As String, ByRef CharSet As String) As Byte()
        'Get the source data as bytes
        Dim bSourceData As Byte() = System.Text.Encoding.Unicode.GetBytes(SourceData)
        'Get the destination encoding
        Dim OutEncoding As System.Text.Encoding = System.Text.Encoding.GetEncoding(CharSet)
        'Encode the data to the destination code page/charset
        Return System.Text.Encoding.Convert(OutEncoding, System.Text.Encoding.UTF8, bSourceData)
    End Function

End Module
```
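One concrete problem in `EncodeString` above: `Encoding.Convert` takes its arguments as `(srcEncoding, dstEncoding, bytes)`, and the first argument must describe how the input bytes were produced. The call passes `OutEncoding` (UTF-8) as the source even though `bSourceData` came from `Encoding.Unicode` (UTF-16 little-endian), so the UTF-16 bytes get reinterpreted as UTF-8. A minimal sketch with the arguments in the intended order, assuming the input is already a correctly decoded .NET `String`; `ToUtf8Bytes` is a hypothetical helper name:

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Module EncodeDemo

    ' Hypothetical helper: converts a .NET string to UTF-8 bytes via Encoding.Convert.
    ' Encoding.Convert(srcEncoding, dstEncoding, bytes): srcEncoding must match
    ' the encoding that actually produced the input bytes.
    Function ToUtf8Bytes(ByVal sourceData As String) As Byte()
        ' .NET strings are UTF-16 internally; Encoding.Unicode is UTF-16 little-endian.
        Dim utf16Bytes As Byte() = Encoding.Unicode.GetBytes(sourceData)
        Return Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes)
    End Function

    Sub Main()
        Dim text As String = "caf" & ChrW(&HE9) & " " & ChrW(&H20AC)   ' "café €"
        Dim viaConvert As Byte() = ToUtf8Bytes(text)
        ' Encoding.UTF8.GetBytes produces the same bytes in a single step.
        Dim direct As Byte() = Encoding.UTF8.GetBytes(text)
        Console.WriteLine(viaConvert.SequenceEqual(direct))            ' True
        Console.WriteLine(Encoding.UTF8.GetString(viaConvert) = text)  ' True
    End Sub

End Module
```

Even with the order fixed, turning the bytes back into a `String` with `UTF8.GetString`, as `Main` does, discards them again. The bytes that end up on disk are decided by the `StreamWriter`'s encoding (UTF-8 without a BOM when none is given), so the step that really matters is decoding the input file correctly in the first place.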

Solution

StreamReader has a constructor that takes an Encoding. If you know the encoding of the file, pass it to the StreamReader constructor:

```vbnet
objReader = New StreamReader(filepath, Encoding.UTF32)
```
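A minimal sketch of the whole conversion built around that constructor, assuming the caller knows the source encoding; `ConvertFileToUtf8` is a hypothetical name. Passing `True` for `detectEncodingFromByteOrderMarks` lets a byte order mark in the file override the supplied encoding. The writer uses `New UTF8Encoding(False)`, i.e. UTF-8 without a BOM; pass `True` instead if the program consuming the CSV relies on a BOM to recognise UTF-8.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ConvertDemo

    ' Hypothetical helper: re-encodes a text file as UTF-8.
    ' sourceEncoding must match how the input file was actually written.
    Sub ConvertFileToUtf8(ByVal inputPath As String,
                          ByVal outputPath As String,
                          ByVal sourceEncoding As Encoding)
        Dim contents As String
        ' True = if the file starts with a byte order mark, trust the BOM
        ' instead of sourceEncoding.
        Using reader As New StreamReader(inputPath, sourceEncoding, True)
            contents = reader.ReadToEnd()
        End Using

        ' The writer's encoding decides the bytes on disk, so no manual
        ' byte conversion is needed. False = do not emit a UTF-8 BOM.
        Using writer As New StreamWriter(outputPath, False, New UTF8Encoding(False))
            writer.Write(contents)
        End Using
    End Sub

    Sub Main()
        Console.Write("Please give the filepath (example: c:\testfile.csv): ")
        Dim filepath As String = Console.ReadLine()
        Try
            Dim outputPath As String = Path.Combine(
                Path.GetDirectoryName(filepath),
                Path.GetFileNameWithoutExtension(filepath) & "_encoded.csv")
            ' Encoding.UTF32 mirrors the answer above; use the file's real encoding.
            ConvertFileToUtf8(filepath, outputPath, Encoding.UTF32)
            Console.WriteLine("Encoding finished: " & outputPath)
        Catch ex As Exception
            Console.WriteLine(ex.Message)
        End Try
    End Sub

End Module
```

Using `Write` rather than `WriteLine` avoids appending an extra line break that was not in the source file.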

EDIT

You say in a comment that the file is encoded as UCS-2. From Wikipedia:

> The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996. It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time.
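One way to see where the 96.9% figure comes from: the range 0 to 0xFFFF holds 65,536 code points, and the 2,048 values U+D800 to U+DFFF are reserved as UTF-16 surrogates, so (65536 - 2048) / 65536 = 0.96875. Characters above U+FFFF need a surrogate pair (two 16-bit code units) in UTF-16, which UCS-2 cannot express. A small sketch checking both points; `SameAsUtf16Share` is a hypothetical name:

```vbnet
Imports System
Imports System.Text

Module Ucs2Demo

    ' Share of code points in 0..0xFFFF that UCS-2 and UTF-16 encode identically:
    ' everything except the surrogate range U+D800..U+DFFF.
    Function SameAsUtf16Share() As Double
        Dim total As Integer = &H10000               ' 65,536 code points
        Dim surrogates As Integer = &HE000 - &HD800  ' 2,048 reserved values
        Return (total - surrogates) / total
    End Function

    Sub Main()
        Console.WriteLine(SameAsUtf16Share())   ' value is 0.96875 (about 96.9%)

        ' A character inside 0..0xFFFF: one 16-bit code unit, same as UCS-2.
        Dim bmp As String = ChrW(&HE9).ToString()   ' U+00E9
        Console.WriteLine(bmp.Length)                            ' 1
        Console.WriteLine(Encoding.Unicode.GetByteCount(bmp))    ' 2

        ' A character above U+FFFF: UTF-16 needs a surrogate pair.
        Dim astral As String = Char.ConvertFromUtf32(&H1F600)
        Console.WriteLine(astral.Length)                         ' 2
        Console.WriteLine(Encoding.Unicode.GetByteCount(astral)) ' 4
    End Sub

End Module
```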

In that case, you can try decoding with UTF-16, which System.Text.Encoding calls Unicode, so try:

```vbnet
objReader = New StreamReader(filepath, Encoding.Unicode)
```
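A small sketch that reproduces the "junk" symptom, under the assumption that the file holds UTF-16 little-endian (UCS-2) text with no byte order mark. `StreamReader(path)` starts out assuming UTF-8 and only switches when it finds a BOM, so each ASCII character is followed by a NUL character; passing `Encoding.Unicode` recovers the text. The temporary file, its contents and the `ReadWith` helper are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module Utf16Demo

    ' Hypothetical helper: reads a whole file with the given encoding.
    Function ReadWith(ByVal filePath As String, ByVal enc As Encoding) As String
        Using reader As New StreamReader(filePath, enc)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        ' Assumption: UTF-16LE data written WITHOUT a byte order mark
        ' (Encoding.GetBytes never emits the BOM itself).
        File.WriteAllBytes(tempFile, Encoding.Unicode.GetBytes("a,b"))

        ' Default StreamReader decodes the bytes 61 00 2C 00 62 00 as UTF-8:
        ' "a", NUL, ",", NUL, "b", NUL.
        Using reader As New StreamReader(tempFile)
            Console.WriteLine(reader.ReadToEnd().Length)        ' 6
        End Using

        ' Telling StreamReader the real encoding recovers the text.
        Console.WriteLine(ReadWith(tempFile, Encoding.Unicode))  ' a,b
        File.Delete(tempFile)
    End Sub

End Module
```

If the file does start with the `FF FE` byte order mark, both constructors detect it and decode the file the same way.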

FYI

Unicode is a standard with a variety of encodings, including:

- UTF-8
- UTF-16 (big-endian)
- UTF-16 (little-endian)
- UTF-32 (big-endian)
- UTF-32 (little-endian)

Microsoft calling UTF-16 "Unicode" is a little misleading but not inaccurate: UTF-16 is one possible encoding of Unicode.
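To make the naming concrete: in System.Text.Encoding, `Unicode` specifically means UTF-16 little-endian, `BigEndianUnicode` is the big-endian variant, and `UTF32` is UTF-32 little-endian; a big-endian UTF-32 encoding can be built with `New UTF32Encoding(True, False)`. A short sketch printing the bytes each encoding in the list produces for U+00E9 ("é"); `HexBytes` is a hypothetical helper:

```vbnet
Imports System
Imports System.Text

Module EncodingsDemo

    ' Formats the bytes an encoding produces for a string as hex, e.g. "C3-A9".
    ' GetBytes never includes a byte order mark.
    Function HexBytes(ByVal enc As Encoding, ByVal s As String) As String
        Return BitConverter.ToString(enc.GetBytes(s))
    End Function

    Sub Main()
        Dim ch As String = ChrW(&HE9).ToString()   ' U+00E9 LATIN SMALL LETTER E WITH ACUTE
        Console.WriteLine("UTF-8     " & HexBytes(Encoding.UTF8, ch))                  ' C3-A9
        Console.WriteLine("UTF-16 BE " & HexBytes(Encoding.BigEndianUnicode, ch))      ' 00-E9
        Console.WriteLine("UTF-16 LE " & HexBytes(Encoding.Unicode, ch))               ' E9-00
        Console.WriteLine("UTF-32 BE " & HexBytes(New UTF32Encoding(True, False), ch)) ' 00-00-00-E9
        Console.WriteLine("UTF-32 LE " & HexBytes(Encoding.UTF32, ch))                 ' E9-00-00-00
    End Sub

End Module
```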
