Java: getting the newline character, reading a file with embedded newlines in Java

Latest recommended article published 2022-10-22 13:53:33

weixin_39737492

Tags: java, get newline character

I have a Unicode file that needs to be exported to a database (Vertica). The column delimiter is CTRL+B and the record delimiter is a newline (\n). Whenever there is a newline within a column value, CTRL+A is used as the escape character.

When I use BufferedReader.readLine() to read this file, the records with IDs 2 and 4 are each read as two records, whereas I want each of them to be read as a single record, as shown in the output.

Here is the example input file. | stands for CTRL+B and ^ stands for CTRL+A.

Input:

    ID|Name|Job Desc
    ----------------
    1|xxxx|SO Job
    2|YYYY|SO Careers^
    Job
    3|RRRRR|SO
    4|ZZZZ^
    ZZ|SO Job
    5|AAAA|YU

Output:

    ID|Name|Job Desc
    ----------------
    1|xxxx|SO Job
    2|YYYY|SO Careers Job
    3|RRRRR|SO
    4|ZZZZ ZZ|SO Job
    5|AAAA|YU

The file is huge, so I can't use StringEscapeUtils. Any suggestions?

Solution

You can use a Scanner with a custom delimiter. The delimiter is set to match \n but not \u0001\n (where \u0001 represents CTRL+A):

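    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.PrintWriter;
    import java.util.Scanner;
    import java.util.regex.Pattern;

    // try-with-resources closes both the Scanner and the PrintWriter.
    try (Scanner sc = new Scanner(new File("dbinput.txt"));
         PrintWriter writer = new PrintWriter("dboutput.txt")) {
        // Split on \n only when it is NOT preceded by CTRL+A (\u0001).
        sc.useDelimiter(Pattern.compile("(?<!\\u0001)\\n"));
        while (sc.hasNext()) {
            // Replace each escaped newline with a space, as in the expected output.
            writer.println(sc.next().replace("\u0001\n", " "));
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }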
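
Since the question mentions that the file is huge and that BufferedReader.readLine() was the starting point, an alternative that keeps only one record in memory at a time is to stay with readLine() and join any line that ends with CTRL+A to the line that follows it. The following is a minimal sketch of that idea, not a tested loader: the class name EscapedNewlineJoiner and the method joinRecords are hypothetical, and it assumes that CTRL+A always appears immediately before the escaped newline and that the escaped newline should become a single space, as in the sample output. Note that readLine() also treats \r and \r\n as line terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class EscapedNewlineJoiner {
    // CTRL+A, used in the input as the escape character before an embedded newline.
    private static final char ESCAPE = '\u0001';

    /**
     * Copies records from in to out. A physical line ending with CTRL+A is
     * treated as unfinished: the CTRL+A is dropped, a space is added, and the
     * next physical line is appended to the same record.
     */
    static void joinRecords(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder record = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty() && line.charAt(line.length() - 1) == ESCAPE) {
                // Escaped newline: keep accumulating into the current record.
                record.append(line, 0, line.length() - 1).append(' ');
            } else {
                // Real record delimiter: emit the completed record.
                record.append(line);
                out.append(record).append('\n');
                record.setLength(0);
            }
        }
        if (record.length() > 0) {
            // The input ended right after an escaped newline.
            out.append(record).append('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // In-memory demo; \u0002 is CTRL+B, the column delimiter.
        String input = "1\u0002xxxx\u0002SO Job\n"
                     + "2\u0002YYYY\u0002SO Careers\u0001\nJob\n";
        StringWriter out = new StringWriter();
        joinRecords(new StringReader(input), out);
        System.out.print(out);
    }
}
```

For real files, the same method could be called with a reader and writer opened on dbinput.txt and dboutput.txt in whatever encoding the file actually uses; the encoding is not stated in the question, so it is left out of the sketch.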
