I ran into a JSON file that consists of a single line, and that line is a JSON array like this:
[{"CreateTime":"2021-04-27T10:02:38.243","UpdateTime":null,"SupplierCode":"xxx","SupplierName":"xxxx","HotelId":"xxxxx","RoomCode":"xxxx"},{"CreateTime":"2021-04-27T10:02:38.243","UpdateTime":null,"SupplierCode":"xxx","SupplierName":"xxxx","HotelId":"xxxxx","RoomCode":"xxxx"},{"CreateTime":"2021-04-27T10:02:38.243","UpdateTime":null,"SupplierCode":"xxx","SupplierName":"xxxx","HotelId":"xxxxx","RoomCode":"xxxx"}]
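The file has only one line, so reading it line by line produces a single character array of more than 100 million characters (over 140 million, in fact).
In production there is no guarantee that it will not keep growing. If the string length exceeds the maximum value of int, the read will fail.
The read will also fail if memory cannot hold the whole line.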
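To put rough numbers on these two limits, here is a small back-of-the-envelope sketch. It assumes 2 bytes per char when the line is held in a plain char[]; the class and method names are made up for illustration and are not part of the original code.

```java
class LineSizeEstimate {
    /** Heap bytes needed to hold the line as a char[] (2 bytes per char, array header ignored). */
    static long charArrayBytes(long chars) {
        return chars * Character.BYTES;
    }

    /** How many times the line can grow before its length no longer fits in an int. */
    static double growthHeadroom(long chars) {
        return (double) Integer.MAX_VALUE / chars;
    }

    public static void main(String[] args) {
        long chars = 140_000_000L; // the line length mentioned above
        System.out.println(charArrayBytes(chars) / 1_000_000 + " MB for the char[] alone");
        System.out.printf("fits in an int until it grows about %.1f times%n", growthHeadroom(chars));
    }
}
```

So a single read of the line already needs roughly 280 MB for the characters alone, and the int limit is reached once the file grows about 15 times.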
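Approach:
Pick an estimated buffer size, read the file into the buffer chunk by chunk, and parse it yourself.
Currently one JSON object is about 500 characters long, so a small fixed-size buffer is enough to hold it, and the file can be parsed with NIO. The code is as follows: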
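import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SplitJsonArray {
    public static void main(String[] args) throws IOException {
        String path = "/Users/wuss/Downloads/20210428/Rooms_2021-04-28.json";
        String wPath = "/Users/wuss/Downloads/20210428/Rooms_2021-04-28-new.json";
        int size = 2048;
        ByteBuffer byteBuffer = ByteBuffer.allocate(size);
        // Twice the byte buffer's size: room for the carried-over tail plus a freshly decoded chunk.
        CharBuffer charBuffer = CharBuffer.allocate(size << 1);
        // Decode through a UTF-8 CharsetDecoder so that multi-byte characters (e.g. Chinese)
        // split across two reads are not garbled.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        String string = "";
        boolean start = false;
        try (FileChannel fileChannel = FileChannel.open(Paths.get(path), StandardOpenOption.READ);
             BufferedWriter bw = Files.newBufferedWriter(Paths.get(wPath), StandardCharsets.UTF_8)) {
            while (-1 != fileChannel.read(byteBuffer)) {
                byteBuffer.flip();
                // endOfInput is always false: a multi-byte character cut off at the end of this
                // chunk stays in byteBuffer and is completed by the next read.
                decoder.decode(byteBuffer, charBuffer, false);
                charBuffer.flip();
                string = charBuffer.toString();
                // Assumes "},{" only ever occurs between two top-level objects,
                // never inside a string value or a nested array.
                int index;
                while ((index = string.indexOf("},{")) > 0) {
                    if (!start) {
                        // First object: skip the leading '[' of the array.
                        bw.write(string.substring(1, index + 1));
                        start = true;
                    } else {
                        bw.write(string.substring(0, index + 1));
                    }
                    bw.newLine();
                    string = string.substring(index + 2);
                }
                if (string.length() >= size) {
                    throw new IllegalStateException("JSON object does not fit in the buffer; increase size");
                }
                charBuffer.clear();
                // Keep the unparsed tail for the next round.
                charBuffer.put(string);
                // Keep any bytes that have not been decoded yet.
                byteBuffer.compact();
            }
            // What is left is the last object followed by the closing ']'.
            int end = string.lastIndexOf(']');
            if (end > 0) {
                bw.write(string.substring(start ? 0 : 1, end));
            }
        }
    }
}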
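The reason the code above goes through a CharsetDecoder and calls compact() on the byte buffer can be shown in isolation. The following is a minimal sketch, assuming valid UTF-8 input; the class and method names are hypothetical. It decodes the same bytes in small chunks twice: once by turning each chunk into a String on its own, and once with a decoder that carries incomplete byte sequences over to the next chunk.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class SplitCharDecodeDemo {
    /** Decodes every chunk independently; a character cut in half becomes U+FFFD. */
    static String decodeNaively(byte[] data, int chunkSize) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            out.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    /** Decodes chunk by chunk, carrying an incomplete byte sequence over to the next chunk. */
    static String decodeInChunks(byte[] data, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // 4 spare slots: an incomplete UTF-8 sequence carried over is at most 3 bytes long.
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer chars = CharBuffer.allocate(chunkSize + 4);
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            bytes.put(data, off, len);
            bytes.flip();
            // endOfInput = false: trailing bytes of a cut-off character are left unread.
            decoder.decode(bytes, chars, false);
            chars.flip();
            out.append(chars);
            chars.clear();
            bytes.compact(); // move the unread bytes to the front for the next chunk
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "a", a 3-byte Chinese character, "b", another 3-byte Chinese character
        byte[] data = "a\u4e2db\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeNaively(data, 2));  // contains replacement characters
        System.out.println(decodeInChunks(data, 2)); // same text as the input
    }
}
```

With a chunk size of 2, every 3-byte character straddles a chunk boundary, which is the situation a fixed 2048-byte read buffer runs into now and then on a large file.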
Even when a single line of data is extremely large, this approach can serve as a reference.
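The "},{" search in the code above treats that three-character sequence as an object boundary, which would misfire if it ever appeared inside a string value or in a nested array of objects. As a hedged alternative (a different technique, not the method used above), object boundaries can be found by tracking brace depth and string state one character at a time. The sketch below assumes well-formed JSON whose objects contain no raw line breaks; JsonArraySplitter and its methods are names invented for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JsonArraySplitter {
    /** Copies each top-level object of a JSON array from in to out, one per line; returns the count. */
    static long split(Reader in, Writer out) throws IOException {
        char[] buf = new char[2048];
        int depth = 0;            // nesting depth of { } outside string literals
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous char was a backslash inside a string
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (depth > 0) out.write(c); // inside an object: copy verbatim
                if (inString) {
                    if (escaped) escaped = false;
                    else if (c == '\\') escaped = true;
                    else if (c == '"') inString = false;
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{') {
                    if (depth++ == 0) out.write(c); // opening brace of a top-level object
                } else if (c == '}') {
                    if (--depth == 0) { out.write('\n'); count++; } // top-level object closed
                }
            }
        }
        return count;
    }

    /** Convenience wrapper for small inputs, mainly useful for trying the splitter out. */
    static List<String> splitToList(String json) {
        StringWriter out = new StringWriter();
        try {
            split(new StringReader(json), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String s = out.toString();
        return s.isEmpty() ? new ArrayList<>() : Arrays.asList(s.split("\n"));
    }

    public static void main(String[] args) {
        // The first object contains "},{" inside a string value.
        System.out.println(splitToList("[{\"a\":\"},{\"},{\"b\":1}]"));
    }
}
```

For a real file, split could be given a reader from Files.newBufferedReader and a writer from Files.newBufferedWriter, both opened as UTF-8; memory use then stays at the size of the 2048-char buffer no matter how long the line is.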