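Detecting charset mismatches when creating a Reader

Normally, creating a Reader with an explicitly specified charset does not throw an exception, even when that charset does not match the file's actual encoding.

For example: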

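// Fragment: input is an already opened InputStream, reader is a Reader declared earlier
try {
	reader = new InputStreamReader(input, "GBK");
} catch (UnsupportedEncodingException e1) {
	// Thrown only when the charset name is unsupported, not when the file's encoding differs
	e1.printStackTrace();
}
BufferedReader bfr = new BufferedReader(reader);
try {
	// Undecodable bytes are silently replaced here, so no exception is raised
	System.out.println(bfr.readLine());
} catch (IOException e) {
	e.printStackTrace();
}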

The output is:

锘�閮ㄧ讲鏃舵敞鎰忎簨椤�

The text is garbled because the specified charset does not match the file's actual encoding.
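The silent garbling can be reproduced without a file. The following is a minimal sketch, not code from the original post: the class name `LenientDecodeDemo`, the helper `readFirstLine`, and the sample string (written as Unicode escapes for two Chinese characters) are all assumptions made for illustration. It encodes the sample as UTF-8 and then decodes it as GBK through a plain `InputStreamReader`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LenientDecodeDemo {

    // Hypothetical helper: decodes the bytes with InputStreamReader's default
    // behavior, which replaces undecodable input instead of reporting it.
    static String readFirstLine(byte[] bytes, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "\u90e8\u7f72";                       // sample text, assumed for illustration
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // the "file" is really UTF-8
        String decoded = readFirstLine(utf8, Charset.forName("GBK"));
        // No exception is thrown, yet the decoded text differs from the original.
        System.out.println("same text: " + original.equals(decoded));
    }
}
```

Nothing in this code path signals the mismatch; the only symptom is the wrong text.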

A charset check on the file is therefore needed. While studying the Lucene source code, I found that a suitable method already exists:

public static Reader getDecodingReader(InputStream stream, Charset charSet) {
	final CharsetDecoder charSetDecoder = charSet.newDecoder()
		.onMalformedInput(CodingErrorAction.REPORT)
		.onUnmappableCharacter(CodingErrorAction.REPORT);
	return new BufferedReader(new InputStreamReader(stream, charSetDecoder));
}

With this check added when the Reader is created, reading a file whose encoding does not match the specified charset throws an exception:

java.nio.charset.UnmappableCharacterException: Input length = 2
at java.nio.charset.CoderResult.throwException(CoderResult.java:278)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:338)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:203)
at java.io.BufferedReader.read(BufferedReader.java:279)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.apache.lucene.util.DecodingReaderTest.test(DecodingReaderTest.java:34)
at org.apache.lucene.util.DecodingReaderTest.main(DecodingReaderTest.java:18)
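Because a mismatch now surfaces as an exception, the same strict decoder settings can be used to probe which of several candidate charsets decodes the data cleanly. The following is a minimal sketch under stated assumptions: `CharsetProbe`, `decodesCleanly`, and `firstMatching` are hypothetical names, and the check works on an in-memory byte array rather than on a Reader. A clean decode only shows that the bytes are valid in that charset, not that the charset is the one the author used, so stricter encodings such as UTF-8 are best tried first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CharsetProbe {

    // Hypothetical helper: true if every byte sequence decodes in the given charset.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // Covers both MalformedInputException and UnmappableCharacterException
            return false;
        }
    }

    // Hypothetical helper: returns the first candidate that decodes without error.
    static Optional<Charset> firstMatching(byte[] data, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            if (decodesCleanly(data, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] data = "\u4e2d\u6587".getBytes(gbk); // sample bytes, assumed to be GBK-encoded
        System.out.println(firstMatching(data, Arrays.asList(StandardCharsets.UTF_8, gbk)));
    }
}
```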

The complete code is as follows:

package org.apache.lucene.util;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DecodingReaderTest {

	public static void main(String[] args) {
		test();

	}

	private static void test() {
		String path = "./mytest/configuration.properties";
		// try-with-resources closes the stream and the reader even if decoding fails,
		// and avoids passing a null stream on when the file cannot be opened
		try (InputStream input = Files.newInputStream(Paths.get(path));
				BufferedReader bfr = new BufferedReader(getDecodingReader(input, Charset.forName("GBK")))) {
			// readLine() throws a CharacterCodingException (an IOException)
			// when the bytes cannot be decoded as GBK
			System.out.println(bfr.readLine());
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	/**
	 * Wraps the given {@link InputStream} in a reader using a {@link CharsetDecoder}.
	 * Unlike Java's defaults, this reader will throw an exception if it detects that
	 * the read charset doesn't match the expected {@link Charset}.
	 * <p>
	 * Decoding readers are useful to load configuration files, stopword lists or synonym files
	 * to detect character set problems. However, it's not recommended to use as a common purpose
	 * reader.
	 * <p>
	 * In short: checks whether the charset of configuration files, dictionary files and
	 * similar resources matches the configured charset.
	 *
	 * @param stream the stream to wrap in a reader
	 * @param charSet the expected charset
	 * @return a wrapping reader
	 */
	public static Reader getDecodingReader(InputStream stream, Charset charSet) {
		final CharsetDecoder charSetDecoder = charSet.newDecoder()
			.onMalformedInput(CodingErrorAction.REPORT)
			.onUnmappableCharacter(CodingErrorAction.REPORT);
		return new BufferedReader(new InputStreamReader(stream, charSetDecoder));
	}
}
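
For plain files, the JDK's own `Files.newBufferedReader(Path, Charset)` also reports malformed or unmappable input with an `IOException` instead of replacing it, so it may serve as an alternative when Lucene is not on the classpath. The following is a minimal sketch, assuming a temporary file written with GBK bytes; `StrictFileReadDemo` and `firstLineDecodes` are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StrictFileReadDemo {

    // Hypothetical helper: true if the first line of the file decodes in the given charset.
    static boolean firstLineDecodes(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            reader.readLine();
            return true;
        } catch (CharacterCodingException e) {
            // Raised while reading when the bytes do not fit the charset
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        Path file = Files.createTempFile("charset-demo", ".properties");
        try {
            Files.write(file, "\u4e2d\u6587".getBytes(gbk)); // sample content, assumed GBK-encoded
            System.out.println("UTF-8 ok: " + firstLineDecodes(file, StandardCharsets.UTF_8));
            System.out.println("GBK ok: " + firstLineDecodes(file, gbk));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```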

