Thoughts on a Text-Processing Problem

Latest recommended article published on 2022-09-09 22:51:00

weixin_34174422

Tags: java

Original link: http://www.cnblogs.com/benshan/p/3577229.html

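I came across this question from someone online:

Using Java to work with a txt file, I want to modify the first number on every line of the file by adding 1.

For example, if the txt file contains: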

1 1 5

2 2 10

3 3 15

it should become:

2 1 5

3 2 10

4 3 15

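My first reaction was that this probably calls for a regular expression, and that in Java replaceAll("regex", "replacement") would more or less take care of it. But there is a catch. The match is easy to write: reg = "^\\d+" matches the first number on a line (when the whole file is held in one string, ^ only matches at every line start if the MULTILINE flag is on). But what should the replacement be? Each number has to be increased by 1, and a replacement string cannot do arithmetic. A capture group gives access to the matched text, but a replacement string can only copy that text back, not process it further. With that route closed, I reluctantly turned to another approach: processing the file line by line.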
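To make the dead end concrete, here is a minimal sketch of what a plain replacement string does with a capture group. The class and method names (PlainReplaceDemo, naive) are made up for illustration and are not part of the original code.

```java
class PlainReplaceDemo {
    // Tries to "add 1" using only a replacement string.
    static String naive(String line) {
        // "$1" copies the captured digits back; "+1" is just literal text
        return line.replaceAll("^(\\d+)", "$1+1");
    }

    public static void main(String[] args) {
        System.out.println(naive("1 1 5")); // prints "1+1 1 5", not "2 1 5"
    }
}
```

The replacement syntax only knows how to paste captured text, so the arithmetic has to happen in Java code outside the replacement string.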

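Scan the text to be processed; for each line, match it, process the matched data, and write the line to a new file. Once the whole text has been scanned, the data has been processed too. Is this approach clumsy? Perhaps, but I had no better way at hand (Excel might handle this more easily, but the task has to be done in code), so I started implementing it: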

// code version 1.0

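When I started writing the code, I noticed that the fields are separated by one or more spaces or tabs, so for convenience I used the split function.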

// Java: parse the text and add 1 to the first number on each line
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public static void writeFile() {
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the reader and the writer, even if an exception is thrown
    try (BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
         BufferedWriter writer = new BufferedWriter(new FileWriter("new.txt"))) {
        String line;
        // read line by line
        while ((line = reader.readLine()) != null) {
            String[] arr = line.split("[ \t]++");
            if (arr.length < 3) {
                // fewer than three fields: copy the line through unchanged
                sb.append(line).append("\r\n");
                continue;
            }
            // take the first number and add 1
            int num = Integer.valueOf(arr[0]);
            num++;
            sb.append(num).append("\t").append(arr[1]).append("\t").append(arr[2]).append("\r\n");
        }
        // write everything to the new file
        writer.write(sb.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}

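The code was straightforward to write, but one question came up once the data had been processed:

int num = Integer.valueOf(arr[0]);
num++;

How do I write the new content back into the current line, that is, into the file being read? The data is read line by line through a reader, and writing requires a writer; writing into the same file with a writer while it is still being read is not easy. To keep things simple, I decided to create a new file, write the new data to it, and delete the old file once processing is done.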
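The "write a new file, then get rid of the old one" idea can be sketched as follows. This assumes java.nio.file is available (Java 7 or later); the class name ReplaceViaTempFile and the method incrementFirstColumn are hypothetical, and the per-line logic simply mirrors the split-based version above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class ReplaceViaTempFile {
    // Rewrites `target` by writing to a temp file next to it, then swapping it in.
    static void incrementFirstColumn(Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "rewrite", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(target, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] arr = line.split("[ \t]+");
                if (arr.length >= 3) {
                    // same rule as version 1.0: first field plus 1, tab-separated output
                    line = (Integer.parseInt(arr[0]) + 1) + "\t" + arr[1] + "\t" + arr[2];
                }
                writer.write(line);
                writer.newLine();
            }
        }
        // Both streams are closed here; only now is the old file replaced
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the original is replaced only after the new content has been fully written, a failure halfway through leaves the original file untouched. This sketch does not clean up the temp file in that case.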

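After implementing it, I discussed it with a friend, who said it could be done with a regular expression and that split is somewhat inefficient. OK, so I reimplemented the main processing step: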

// code version 1.1

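// requires java.util.regex.Matcher and java.util.regex.Pattern
Pattern pattern = Pattern.compile("^\\d+"); // compiled once, outside the loop
while (...) {
    Matcher matcher = pattern.matcher(line);
    if (matcher.find()) {
        String str = matcher.group();
        int n = Integer.parseInt(str);
        n++;
        // str contains only digits, so it is safe to use as the regex in replaceFirst
        line = line.replaceFirst(str, String.valueOf(n));
        sb.append(line).append("\r\n");
    }
}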

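Besides replacing split with Pattern, this version also keeps the rest of each line in its original format:

line = line.replaceFirst(str, String.valueOf(n));

But why would Pattern be more efficient than split?

For a regular-expression separator such as this one, the implementation of split calls Pattern.compile("..."), so using split on every line of the text compiles a new Pattern object each time. With the Pattern class, the Pattern.compile("...") object is created only once at the start, which saves the repeated compilation and the allocations that come with it.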
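If split is still the most convenient tool, the same saving is available by compiling the separator once and calling Pattern.split on it. A minimal sketch follows; the names PrecompiledSplit, SEPARATOR and fields are illustrative. As far as I know, OpenJDK's String.split also has a fast path for single-character literal separators that skips the regex engine, but that does not apply to a character class like this one.

```java
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once and reused for every line
    private static final Pattern SEPARATOR = Pattern.compile("[ \t]+");

    // Splits a line into fields on runs of spaces and tabs.
    static String[] fields(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        for (String field : fields("1 \t 1   5")) {
            System.out.println(field);
        }
    }
}
```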

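A more senior developer said that no regular expression is needed and that indexOf and substring would be more efficient. He also suggested using PrintStream instead of BufferedWriter.

Right, time to modify the code: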

// code version 1.2

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;

public static void writeFile() throws IOException {
    // try-with-resources closes both streams when the block exits
    try (BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
         PrintStream writer = new PrintStream(new FileOutputStream("new.txt"))) {
        String line;
        // read line by line
        while ((line = reader.readLine()) != null) {
            // the first space marks the end of the number to process
            int index = line.indexOf(" ");
            if (index == -1) {
                continue; // no space: the line is skipped and does not reach the output
            }
            int num = Integer.parseInt(line.substring(0, index)) + 1;
            line = num + line.substring(index);
            writer.println(line);
        }
        // ....
    }
}

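With indexOf and substring in place of the regular expression, the logic also seems much clearer, and because the string is handled directly without any regex processing, efficiency should improve somewhat. But is replacing BufferedWriter with PrintStream a good idea? Not necessarily. BufferedWriter is a wrapper class that buffers output, so it usually performs better than writing to an unbuffered stream. Moreover, writing to the file after every processed line is unlikely to perform well. On balance: use indexOf and substring to speed up the data processing, and use BufferedWriter when writing the file so that the data goes through a buffer.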
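For completeness, the two options are not mutually exclusive. A PrintStream created directly over a FileOutputStream has no byte buffer in between, so, as I understand its implementation, each println can reach the file as a separate write. Wrapping the file stream in a BufferedOutputStream is one way to keep println while still buffering. The sketch below uses made-up names (BufferedPrintStreamDemo, writeLines) and is not the version the article settles on.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

class BufferedPrintStreamDemo {
    // Writes each line with println, but through a byte buffer.
    static void writeLines(String path, String... lines) throws IOException {
        try (PrintStream out = new PrintStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            for (String line : lines) {
                out.println(line);
            }
            // PrintStream swallows IOExceptions; checkError() flushes and reports them
            if (out.checkError()) {
                throw new IOException("write failed: " + path);
            }
        }
    }
}
```

Note that PrintStream reports failures only through checkError(), which is one more reason a BufferedWriter can be the simpler choice for text output.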

// code version 1.3

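import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public static void writeFile() throws IOException {
    // try-with-resources closes both streams, which also flushes the write buffer
    try (BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
         BufferedWriter writer = new BufferedWriter(new FileWriter("new.txt"))) {
        String line;
        // read line by line
        while ((line = reader.readLine()) != null) {
            // the first space marks the end of the number to process
            int index = line.indexOf(" ");
            if (index == -1) {
                continue; // no space: the line is skipped and does not reach the output
            }
            int num = Integer.parseInt(line.substring(0, index)) + 1;
            line = num + line.substring(index);
            writer.write(line);
            writer.newLine();
        }
        //...
    }
}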
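One caveat with indexOf(" "): the sample data may also be separated by tabs, and a line with no space at all is skipped. A possible variation, sketched here with hypothetical names (FirstNumber, increment), scans the leading digits instead of searching for a separator, so the rest of the line is kept exactly as it was.

```java
class FirstNumber {
    // Returns the line with its leading integer increased by 1.
    // Lines that do not start with a digit are returned unchanged.
    static String increment(String line) {
        int end = 0;
        while (end < line.length() && line.charAt(end) >= '0' && line.charAt(end) <= '9') {
            end++;
        }
        if (end == 0) {
            return line;
        }
        // parseInt throws NumberFormatException if the number does not fit in an int
        int num = Integer.parseInt(line.substring(0, end)) + 1;
        return num + line.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(increment("1 1 5"));
        System.out.println(increment("9\t2\t10"));
    }
}
```

Inside the read loop this would replace the indexOf/substring lines with a single call, for example writer.write(FirstNumber.increment(line)).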

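With that, the coding for this problem is finally settled. It took a few twists and turns along the way, from split to Pattern to indexOf and then to the choice of stream, and a fair amount of agonizing. To summarize:

1. Avoid regular expressions where possible; indexOf and substring can take their place. Where a regular expression is necessary, using the Pattern class improves efficiency.

2. Buffered wrapper classes can improve efficiency.

Still, a few questions are left to think over later.

1. Can a regular expression be used directly, or is there some other simpler way to do this?

2. How can the result be written directly into the current file?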
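On the first open question, one possible answer on newer JDKs: from Java 9 onward, Matcher.replaceAll also accepts a function, so the replacement can be computed from each match. The sketch below assumes Java 9 or later and that the whole text fits in memory as a single string; RegexIncrement and incrementAll are illustrative names.

```java
import java.util.regex.Pattern;

class RegexIncrement {
    // (?m) turns on MULTILINE so that ^ matches at the start of every line
    private static final Pattern FIRST_NUMBER = Pattern.compile("(?m)^\\d+");

    // Adds 1 to the number at the start of each line of the text.
    static String incrementAll(String text) {
        // Matcher.replaceAll(Function) computes each replacement from the match
        return FIRST_NUMBER.matcher(text)
                .replaceAll(m -> String.valueOf(Integer.parseInt(m.group()) + 1));
    }

    public static void main(String[] args) {
        System.out.println(incrementAll("1 1 5\n2 2 10\n3 3 15"));
    }
}
```

On older JDKs, a loop over Matcher.find() with appendReplacement and appendTail can achieve the same effect.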
