NIO Official Example: Regular Expression Matching and the MalformedInputException Decoding Problem


capslk84


Features of NIO:

1. Buffers for data of primitive types
2. Character-set encoders and decoders
3. A pattern-matching facility based on Perl-style regular expressions
4. Channels, a new primitive I/O abstraction
5. A file interface that supports locks and memory mapping
6. A multiplexed, non-blocking I/O facility for writing scalable servers

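Most books only cover NIO features 1, 4, and 5 and say little about feature 2; the few books I have read essentially never mention features 3 and 6. For feature 6, though, plenty of material can be found online: advanced socket programming, the MINA framework, and the Lucene framework all make heavy use of NIO.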

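While browsing the official NIO API documentation recently, I noticed that the very first official NIO example, Grep.java, already makes use of feature 3: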

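import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class Grep {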
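    // A Charset is a named mapping between sequences of 16-bit Unicode code units and sequences of bytes.
    // The class defines methods for creating decoders and encoders and for retrieving the names associated with a charset.
    // Charset and decoder (the official example uses ISO-8859-15; changed to UTF-8 here to test Chinese text)
    private static Charset charset = Charset.forName("UTF-8");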
    private static CharsetDecoder decoder = charset.newDecoder();

    // Pattern used to parse lines
    private static final Pattern linePattern = Pattern.compile(".*\r?\n");

    // The input pattern that we're looking for
    private static Pattern pattern; 

    // Compile the pattern from the command line
    private static void compile(String pat) {
	try {
	    pattern = Pattern.compile(pat);
	} catch (PatternSyntaxException x) {
	    System.err.println(x.getMessage());
	    System.exit(1);
	}
    }

    // Use the linePattern to break the given CharBuffer into lines, applying
    // the input pattern to each line to see if we have a match
    private static void grep(File f, CharBuffer cb) {
	Matcher lm = linePattern.matcher(cb);	// Line matcher
	Matcher pm = null;			// Pattern matcher
	int lines = 0;
	while (lm.find()) {
	    lines++;
	    CharSequence cs = lm.group(); 	// The current line
	    if (pm == null)
		pm = pattern.matcher(cs);
	    else
		pm.reset(cs);
	    if (pm.find())
		System.out.print(f + ":" + lines + ":" + cs);
	    if (lm.end() == cb.limit())
		break;
	}
    }

    // Search for occurrences of the input pattern in the given file
    //
    private static void grep(File f) throws IOException {

	// Open the file and get a channel from the stream; try-with-resources
	// closes both even if mapping or decoding throws
	try (FileInputStream fis = new FileInputStream(f);
	     FileChannel fc = fis.getChannel()) {

	    // Get the file's size and then map it into memory
	    // (the int cast limits this example to files under 2 GB)
	    int sz = (int)fc.size();
	    MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, sz);

	    // Decode the whole file into a char buffer
	    CharBuffer cb = decoder.decode(bb);

	    // Perform the search
	    grep(f, cb);
	}
    }

    public static void main(String[] args) {
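    	// Arguments hard-coded for testing; remove this line to take the pattern and files from the command line
    	args = new String[]{"int","test.log"};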
	if (args.length < 2) {
	    System.err.println("Usage: java Grep pattern file...");
	    return;
	}
	compile(args[0]);
	for (int i = 1; i < args.length; i++) {
	    File f = new File(args[i]);
	    try {
		grep(f);
	    } catch (IOException x) {
		System.err.println(f + ": " + x);
	    }
	}
    }

}

Summary:

Grep.java uses a regular expression to split the input into lines, which gives it the same functionality as BufferedReader's readLine() method.
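To make the line-splitting idea concrete, here is a minimal sketch that applies the same line pattern to an in-memory CharBuffer. The class name LineSplitSketch and the helper splitLines are hypothetical names introduced only for illustration. One difference from readLine() that the sketch makes visible: because the pattern requires a trailing line terminator, a final line without one is not matched.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineSplitSketch {
    // Same line pattern as Grep.java: any characters up to and including \n (optionally preceded by \r)
    private static final Pattern LINE = Pattern.compile(".*\r?\n");

    // Hypothetical helper: collects every terminated line found in the text
    static List<String> splitLines(CharSequence text) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            lines.add(m.group()); // the match includes the line terminator
        }
        return lines;
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap("first\nsecond\r\nlast line without newline");
        List<String> lines = splitLines(cb);
        // Only the two terminated lines are found; the unterminated tail is skipped
        System.out.println("lines found: " + lines.size());
    }
}
```

If the last line matters, one option (an assumption, not something the official example does) is to make the terminator optional at end of input, for instance with a pattern such as ".+(\r?\n|$)", at the cost of skipping empty lines.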

Using regular expression matching, it can look up the content of the matching lines, so it can serve as a log search tool.

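The example relies directly on the Charset class for character decoding. However, after I adapted my own code to decode this way, reading any file larger than 10 MB threw java.nio.charset.MalformedInputException.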

public class MalformedInputException extends CharacterCodingException: a checked exception thrown when an input byte sequence is not legal for the given charset, or when an input character sequence is not a legal 16-bit Unicode sequence.
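As an illustration of when this exception appears, the following sketch decodes a UTF-8 byte array that has been cut in the middle of a multi-byte character. It is not a reproduction of the 10 MB case above, whose exact cause is not known here (a file that is not really UTF-8 would produce the same exception). The class name MalformedDemo and the helper failsStrictDecode are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MalformedDemo {
    // Hypothetical helper: true if strict UTF-8 decoding fails with MalformedInputException
    static boolean failsStrictDecode(byte[] bytes) {
        try {
            // A new decoder reports malformed input by default instead of replacing it
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (MalformedInputException e) {
            return true;
        } catch (CharacterCodingException e) {
            return false; // some other coding error, not the one shown here
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), 3 bytes each in UTF-8
        byte[] full = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        // Keep only 4 of the 6 bytes, cutting the second character after its first byte
        byte[] cut = Arrays.copyOf(full, 4);

        System.out.println("complete input fails:  " + failsStrictDecode(full));
        System.out.println("truncated input fails: " + failsStrictDecode(cut));
    }
}
```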

Proposed solution 1: this is only an idea and does not match my problem here, so I did not look into it further.

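Solution 2, as described: this problem shows up when converting the character encoding of large text files. When CharsetDecoder.decode() is used to decode a MappedByteBuffer and the length of that MappedByteBuffer is poorly chosen, the error "java.nio.charset.MalformedInputException:Malformed input length is N" (where N is an integer) may occur. If Charset.decode() is used directly instead, the error does not occur.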
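The difference between the two calls can be sketched as follows. According to its documentation, Charset.decode() always replaces malformed-input and unmappable-character sequences with the charset's default replacement string, so it does not throw; the bad bytes are silently turned into replacement characters rather than decoded correctly. The class and method names below are hypothetical, and the second method shows what appears to be the equivalent configuration on an explicit CharsetDecoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplaceOnErrorSketch {
    // Charset.decode() never throws on bad input; it substitutes a replacement string
    static String lenientDecode(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // The same leniency requested explicitly on a CharsetDecoder
    static String lenientDecodeWithDecoder(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // U+4E2D U+6587 in UTF-8, truncated in the middle of the second character
        byte[] cut = Arrays.copyOf("\u4e2d\u6587".getBytes(StandardCharsets.UTF_8), 4);

        String viaCharset = lenientDecode(cut);
        String viaDecoder = lenientDecodeWithDecoder(cut);

        // U+FFFD is the Unicode replacement character
        System.out.println("Charset.decode has U+FFFD: " + (viaCharset.indexOf('\uFFFD') >= 0));
        System.out.println("both approaches agree:     " + viaCharset.equals(viaDecoder));
    }
}
```

So "no exception" does not necessarily mean the text was decoded faithfully; checking the result for U+FFFD is one way to see whether data was lost.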

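Solution 2 decoded successfully, but when I tested it with a 20 MB file I got an OutOfMemoryError. (20 MB was just an arbitrary value I picked for testing.)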
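One possible way around both problems, offered as a sketch rather than a verified fix for the case above, is to avoid decoding the whole file into a single CharBuffer and instead decode it in fixed-size chunks with the three-argument CharsetDecoder.decode(in, out, endOfInput). Bytes of a character that is split across two chunks stay in the byte buffer and are completed by the next read, and memory use is bounded by the chunk size. The class name, the helper decodeInChunks, and the Appendable sink are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedDecodeSketch {
    // Hypothetical helper: decodes a UTF-8 file chunk by chunk and appends the text to 'sink'
    static void decodeInChunks(Path file, int chunkSize, Appendable sink) throws IOException {
        if (chunkSize < 4) {
            // A UTF-8 character needs up to 4 bytes; a smaller buffer could never complete one
            throw new IllegalArgumentException("chunkSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // strict: reports malformed input
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize);
        CharBuffer chars = CharBuffer.allocate(chunkSize);

        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            boolean eof = false;
            while (!eof) {
                eof = fc.read(bytes) == -1;
                bytes.flip();
                CoderResult cr;
                do {
                    // endOfInput is false until the last chunk, so an incomplete
                    // trailing sequence is left in 'bytes' instead of being an error
                    cr = decoder.decode(bytes, chars, eof);
                    if (cr.isError()) {
                        cr.throwException(); // genuinely malformed or unmappable input
                    }
                    drain(chars, sink);
                } while (cr.isOverflow());
                bytes.compact(); // keep undecoded tail bytes for the next read
            }
            CoderResult cr;
            do {
                cr = decoder.flush(chars);
                drain(chars, sink);
            } while (cr.isOverflow());
        }
    }

    // Moves the decoded characters into the sink and empties the char buffer
    private static void drain(CharBuffer chars, Appendable sink) throws IOException {
        chars.flip();
        sink.append(chars);
        chars.clear();
    }
}
```

A call such as decodeInChunks(path, 8192, new StringBuilder()) still accumulates the whole text, so for a grep-style search the sink would instead need to process completed lines as they arrive and carry any partial line over to the next chunk.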
