做个了文件分割软件,但是在分割文本小说时,发现分割出来的子文件并没有使用源文件的编码,
却是使用的UTF-8,在editplus中看倒是也没有什么,但是在MOBIPORKET READER中打开却是乱码,
因此,决定找到问题,修改一下,经过搜索和查看MSDN,
总结如下,
使用
![ContractedBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedBlock.gif)
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
public StreamReader (
string path,
bool detectEncodingFromByteOrderMarks
)
构造函数使用StreamReader就可以了,这样在StreamReader的实例中属性CurrentEncoding就包含了,文件的原始编码信息
扩展一下:
搜索的时候发现文本文件的开头几个字节就可以用来文件使用的编码情况,在C#中使用下面:
![ContractedBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedBlock.gif)
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
1
![](https://www.cnblogs.com/Images/OutliningIndicators/None.gif)
2
![ExpandedBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockStart.gif)
![ContractedBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedBlock.gif)
![](https://www.cnblogs.com/Images/dot.gif)
3
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
4
![ExpandedSubBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![ContractedSubBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedSubBlock.gif)
![](https://www.cnblogs.com/Images/dot.gif)
5
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
6
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
7
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
8
![ExpandedSubBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![ContractedSubBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedSubBlock.gif)
![](https://www.cnblogs.com/Images/dot.gif)
9
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
10
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
11
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
12
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
13
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
14
![ExpandedSubBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![ContractedSubBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedSubBlock.gif)
![](https://www.cnblogs.com/Images/dot.gif)
15
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
16
![ExpandedSubBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![ContractedSubBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedSubBlock.gif)
![](https://www.cnblogs.com/Images/dot.gif)
17
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
18
![](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
19
![](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
20
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
21
![ExpandedSubBlockStart.gif](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockStart.gif)
![ContractedSubBlock.gif](https://www.cnblogs.com/Images/OutliningIndicators/ContractedSubBlock.gif)
![](https://www.cnblogs.com/Images/dot.gif)
22
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
23
![](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
24
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
25
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
26
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
27
![](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
28
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
29
![](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedSubBlockEnd.gif)
30
![](https://www.cnblogs.com/Images/OutliningIndicators/InBlock.gif)
31
![](https://www.cnblogs.com/Images/OutliningIndicators/ExpandedBlockEnd.gif)
utf-16( 1200 ) | FF FE |
unicodeFFFE( 1201 ) | FE FF |
utf-32( 12000 ) | FF FE 00 00 |
utf-32BE( 12001 ) | 00 00 FE FF |
utf-8( 65001 ) | EF BB BF |
得到的记过如上表,很多的编码没有对应的Preamble字节数组...