How to avoid an OutOfMemory error when reading large files in Java

I am working on an application that reads large amounts of data from a file. Basically, I have a huge file (around 1.5 - 2 GB) containing different objects (roughly 5 to 10 million of them per file). I need to read all of them and put them into different maps in the app. The problem is that at some point the app runs out of memory while reading the objects. Only when I set it to use -Xmx4096m can it handle the file, and if the file gets any larger, it won't be able to cope.

Here's the code snippet:

String sampleFileName = "sample.file";
FileInputStream fileInputStream = null;
ObjectInputStream objectInputStream = null;
try {
    fileInputStream = new FileInputStream(new File(sampleFileName));
    int bufferSize = 16 * 1024;
    objectInputStream = new ObjectInputStream(new BufferedInputStream(fileInputStream, bufferSize));
    while (true) {
        try {
            Object objectToRead = objectInputStream.readUnshared();
            if (objectToRead == null) {
                break;
            }
            // doing something with the object
        } catch (EOFException eofe) {
            eofe.printStackTrace();
            break;
        } catch (Exception e) {
            e.printStackTrace();
            continue;
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (objectInputStream != null) {
        try {
            objectInputStream.close();
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
    if (fileInputStream != null) {
        try {
            fileInputStream.close();
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    }
}

First of all, I was using objectInputStream.readObject(); switching to objectInputStream.readUnshared() solved the issue partially. When I increased the memory from 2048 to 4096 MB, it started parsing the file. BufferedInputStream is already in use. On the web I've only found examples of how to read lines or bytes, but nothing about reading objects efficiently.

How can I read the file without increasing the JVM memory and without hitting the OutOfMemory error? Is there any way to read objects from the file without keeping anything else in memory?

Solution

When reading big files, parsing objects, and keeping them in memory, there are several solutions with different tradeoffs:

You can fit all parsed objects into memory for the app deployed on one server. This either requires storing all objects very compactly, for example using a byte or an int to hold two numbers, or some kind of bit shifting in other data structures; in other words, fitting all objects into the minimum possible space. Or you can increase the memory on that server (scale vertically).
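The compact-storage idea above can be sketched as follows. This is a minimal illustration, not from the original post: it packs two values that each fit in 16 bits into a single int, halving the per-pair memory cost. The class and method names are made up for the example.

```java
// Sketch: packing two non-negative 16-bit values into one int via bit shifting.
public class PackedPair {
    // Store 'high' in the upper 16 bits and 'low' in the lower 16 bits.
    static int pack(int high, int low) {
        return (high << 16) | (low & 0xFFFF);
    }

    static int high(int packed) {
        return packed >>> 16; // unsigned shift recovers the upper half
    }

    static int low(int packed) {
        return packed & 0xFFFF; // mask recovers the lower half
    }

    public static void main(String[] args) {
        int p = pack(12345, 54321);
        System.out.println(high(p) + " " + low(p)); // prints "12345 54321"
    }
}
```

The same trick generalizes: a long can hold four 16-bit fields, and primitive arrays of packed values avoid the per-object overhead of boxed types.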

a) However, reading the files themselves can take too much memory, so you have to read them in chunks. For example, this is what I was doing with JSON files:

JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
if (reader.hasNext()) {
    reader.beginObject();
    String name = reader.nextName();
    if ("content".equals(name)) {
        reader.beginArray();
        parseContentJsonArray(reader, name2ContentMap);
        reader.endArray();
    }
    name = reader.nextName();
    if ("ad".equals(name)) {
        reader.beginArray();
        parsePrerollJsonArray(reader, prerollMap);
        reader.endArray();
    }
}

The idea is to have a way to identify where a given object starts and ends, and to read only that part.
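The same streaming approach works for the serialized-object file in the question: read one object at a time, process it, and let it become garbage instead of accumulating everything. A sketch, assuming the objects were written with writeUnshared() plus reset() on the producer side (the file name and the counting "processing" step are illustrative); note that readUnshared() signals end-of-file by throwing EOFException rather than returning null:

```java
import java.io.*;

// Sketch of streaming deserialization: heap use stays flat regardless of
// file size because only one object is live at a time.
public class StreamingReader {
    static long countObjects(File file) throws IOException, ClassNotFoundException {
        long count = 0;
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                try {
                    Object obj = in.readUnshared();
                    count++; // process obj here, keep only what you need
                } catch (EOFException eof) {
                    break; // readUnshared throws at end of stream
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("sample", ".ser");
        file.deleteOnExit();
        // Write some sample objects the way the producer side should:
        // reset() clears the stream's back-reference table so it cannot grow.
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int i = 0; i < 1000; i++) {
                out.writeUnshared("record-" + i);
                out.reset();
            }
        }
        System.out.println(countObjects(file)); // prints 1000
    }
}
```

If you control the writer, the reset() call matters as much as readUnshared() on the reader: a plain writeObject() stream keeps every written object in its handle table, which is one common source of the OutOfMemory error described in the question.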

b) You can also split the files into smaller ones at the source, if you can; then it will be easier to read them.
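Splitting at the source might look like the following sketch. Every chunk gets its own ObjectOutputStream, because each serialized file needs its own stream header; the chunk size and file naming are assumptions for the example:

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Sketch: write records into multiple smaller files, perFile records each.
public class ChunkedWriter {
    static List<File> writeChunks(List<?> records, int perFile, File dir) throws IOException {
        List<File> files = new ArrayList<>();
        ObjectOutputStream out = null;
        try {
            for (int i = 0; i < records.size(); i++) {
                if (i % perFile == 0) { // start a new chunk file
                    if (out != null) out.close();
                    File f = new File(dir, "chunk-" + files.size() + ".ser");
                    files.add(f);
                    out = new ObjectOutputStream(
                            new BufferedOutputStream(new FileOutputStream(f)));
                }
                out.writeUnshared(records.get(i));
                out.reset(); // keep the stream's handle table from growing
            }
        } finally {
            if (out != null) out.close();
        }
        return files;
    }
}
```

Each resulting file can then be read independently (and even in parallel), and a failure partway through only forces rereading one chunk.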

If you can't fit all parsed objects for the app on one server, you have to shard based on some object property, for example splitting the data across multiple servers by US state.

Hopefully this helps with your solution.
