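The reason for the observed behavior is that, as the name suggests, the `BufferedReader` is buffered. It reads a larger chunk of the data at once (into a buffer) and returns only the relevant part of the buffer contents, namely the part up to the next line separator (`\n`). The file pointer of the underlying `RandomAccessFile` therefore reflects the end of the chunk that was read, not the end of the line that was returned.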
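To see this effect in isolation, the following small demo reads a single line through a `BufferedReader` that wraps the `RandomAccessFile`'s file descriptor, and then prints the file pointer. It is only a sketch: the class name `BufferingDemo` and the temporary test file are made up for illustration. Assuming the file is larger than the reader's internal buffers, the file pointer will typically be several kilobytes past the end of the first line; the exact value depends on the JDK's internal buffer sizes.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class BufferingDemo {

    // Reads one line through a BufferedReader that shares the RandomAccessFile's
    // file descriptor and returns {length of that line, file pointer afterwards}.
    public static long[] readFirstLine(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             BufferedReader reader = new BufferedReader(new FileReader(raf.getFD()))) {
            String line = reader.readLine();
            return new long[] { line.length(), raf.getFilePointer() };
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file: 10,000 short ASCII lines in a temp file.
        File file = File.createTempFile("buffering-demo", ".txt");
        file.deleteOnExit();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        Files.write(file.toPath(), sb.toString().getBytes(StandardCharsets.US_ASCII));

        long[] r = readFirstLine(file);
        System.out.println("First line has " + r[0]
            + " chars, but the file pointer is already at " + r[1]);
    }
}
```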
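Broadly speaking, I think there are two possible approaches:

1. You could implement your own buffering logic.
2. You could use some ugly reflection hack to obtain the required buffer offset.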
For 1., you would no longer use `RandomAccessFile#readLine` (which reads one byte at a time). Instead, you could do the buffering yourself:

```java
byte[] buffer = new byte[8192];
...
// In a loop:
int read = randomAccessFile.read(buffer);
// Figure out where a line break `\n` appears in the buffer,
// return the resulting lines, and take the position of the `\n`
// into account when storing the "file pointer"
```
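A fuller version of that sketch might look like the following. It is one possible implementation under these assumptions: lines are terminated by `\n` (a trailing `\r` is stripped, so `\r\n` works too), and the charset is one in which the byte `0x0A` only ever appears as a line feed. That holds for ASCII, ISO-8859-1 and UTF-8, but not for UTF-16, for example. The class `OffsetLineReader` and its `Line` type are hypothetical names, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

public class OffsetLineReader {

    /** A line together with the byte offset in the file where it starts. */
    public static final class Line {
        public final long offset;
        public final String text;

        Line(long offset, String text) {
            this.offset = offset;
            this.text = text;
        }
    }

    private final RandomAccessFile file;
    private final Charset charset;
    private final byte[] buffer = new byte[8192];
    private int bufferLength = 0; // number of valid bytes in buffer
    private int bufferPos = 0;    // index of the next unread byte in buffer
    private long bufferStart;     // file offset of buffer[0]

    public OffsetLineReader(RandomAccessFile file, Charset charset) throws IOException {
        this.file = file;
        this.charset = charset;
        this.bufferStart = file.getFilePointer();
    }

    /** Returns the next line, or null at the end of the file. */
    public Line readLine() throws IOException {
        ByteArrayOutputStream lineBytes = new ByteArrayOutputStream();
        long lineStart = bufferStart + bufferPos;
        while (true) {
            if (bufferPos == bufferLength) {
                // Buffer exhausted: the next chunk starts right after the current one.
                bufferStart += bufferLength;
                bufferLength = file.read(buffer);
                bufferPos = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    // End of file: return the trailing unterminated line, if any.
                    return lineBytes.size() == 0 ? null : toLine(lineStart, lineBytes);
                }
            }
            // Scan the unread part of the buffer for '\n'.
            int start = bufferPos;
            while (bufferPos < bufferLength && buffer[bufferPos] != '\n') {
                bufferPos++;
            }
            lineBytes.write(buffer, start, bufferPos - start);
            if (bufferPos < bufferLength) {
                bufferPos++; // consume the '\n' itself
                return toLine(lineStart, lineBytes);
            }
            // No '\n' in this chunk: the line continues in the next one.
        }
    }

    /** File offset of the first byte that has not been returned yet. */
    public long position() {
        return bufferStart + bufferPos;
    }

    private Line toLine(long offset, ByteArrayOutputStream bytes) {
        byte[] b = bytes.toByteArray();
        int len = b.length;
        if (len > 0 && b[len - 1] == '\r') {
            len--; // treat "\r\n" like "\n"
        }
        return new Line(offset, new String(b, 0, len, charset));
    }
}
```

Each returned `Line` carries the byte offset at which it starts, which is what the "file pointer" bookkeeping is about. Because a line's bytes are collected first and decoded afterwards, a multi-byte character is never split at a buffer boundary. The reader assumes that nothing else moves the `RandomAccessFile`'s file pointer while it is in use.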
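As the vague comments suggest, this can be tedious and fiddly. You would basically re-implement what the `readLine` method of the `BufferedReader` class does. And at this point, I don't even want to mention the headaches that different line separators or character sets could cause.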
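For 2., you could simply access the field of the `BufferedReader` that stores the buffer offset. This is implemented in the following example. Of course, this is a somewhat crude solution, but it is mentioned and shown here as a simple alternative, depending on how "sustainable" the solution should be and how much effort you are willing to invest. Keep in mind that `nextChar` is a private implementation detail of OpenJDK's `BufferedReader`, and that since JDK 16 reflective access to it is denied by default unless the program is started with `--add-opens java.base/java.io=ALL-UNNAMED`.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class LargeFileRead {

    public static void main(String[] args) throws Exception {
        String fileName = "myBigFile.txt";
        long before = System.nanoTime();
        List<String> result = readBuffered(fileName);
        //List<String> result = readDefault(fileName);
        long after = System.nanoTime();
        double ms = (after - before) / 1e6;
        System.out.println("Reading took " + ms + "ms "
            + "for " + result.size() + " lines");
    }

    // Reads lines through a BufferedReader that shares the RandomAccessFile's
    // file descriptor, and estimates the position after each line as
    // (file offset where the current chunk starts) + (offset inside the buffer).
    // The sum is only meaningful if one char corresponds to one byte (e.g. ASCII).
    private static List<String> readBuffered(String fileName) throws Exception {
        List<String> lines = new ArrayList<>();
        RandomAccessFile randomAccessFile = new RandomAccessFile(fileName, "r");
        BufferedReader brRafReader = new BufferedReader(
            new FileReader(randomAccessFile.getFD()));
        String line = null;
        long currentOffset = 0;   // file offset where the current chunk starts
        long previousOffset = -1; // file pointer after the most recent chunk was read
        while ((line = brRafReader.readLine()) != null) {
            long fileOffset = randomAccessFile.getFilePointer();
            if (fileOffset != previousOffset) {
                // The file pointer moved, i.e. a new chunk was read from the file.
                // That chunk starts where the previous one ended.
                if (previousOffset != -1) {
                    currentOffset = previousOffset;
                }
                previousOffset = fileOffset;
            }
            int bufferOffset = getOffset(brRafReader);
            long realPosition = currentOffset + bufferOffset;
            System.out.println("Position : " + realPosition
                + " with FP " + randomAccessFile.getFilePointer()
                + " and offset " + bufferOffset);
            lines.add(line);
        }
        return lines;
    }

    // Reads the private "nextChar" field of BufferedReader (the index of the
    // next char in its internal buffer) via reflection.
    private static int getOffset(BufferedReader bufferedReader) throws Exception {
        Field field = BufferedReader.class.getDeclaredField("nextChar");
        int result = 0;
        try {
            field.setAccessible(true);
            result = (Integer) field.get(bufferedReader);
        } finally {
            field.setAccessible(false);
        }
        return result;
    }

    // For comparison: RandomAccessFile#readLine reports the exact file pointer,
    // but reads one byte at a time.
    private static List<String> readDefault(String fileName) throws Exception {
        List<String> lines = new ArrayList<>();
        RandomAccessFile randomAccessFile = new RandomAccessFile(fileName, "r");
        String line = null;
        while ((line = randomAccessFile.readLine()) != null) {
            System.out.println("Position : " + randomAccessFile.getFilePointer());
            lines.add(line);
        }
        return lines;
    }
}
```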
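(Note: The offsets may still appear to be off by 1, but this is because the line separator is not taken into account in the position. This can be adjusted if necessary.)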
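Note: This is only a sketch. The `RandomAccessFile` should be closed properly when reading is finished, but how exactly to do that depends on how the reading is supposed to be interrupted when the time limit is exceeded, as described in the question.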
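As one possible way to handle this, the following sketch uses try-with-resources so that the file is closed both when the end of the file is reached and when a time limit stops the loop early. The method name `readWithTimeLimit` and its `maxMillis` parameter are hypothetical, and the sketch assumes it is acceptable to check the limit only between lines. For brevity it uses a plain `BufferedReader` and does not track positions.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class TimeLimitedRead {

    // Hypothetical helper: reads lines until the end of the file or until
    // maxMillis have elapsed, whichever comes first. The resources are closed
    // in both cases when the try block is left.
    public static List<String> readWithTimeLimit(String fileName, long maxMillis)
            throws IOException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r");
             BufferedReader reader = new BufferedReader(new FileReader(raf.getFD()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
                // Compare via subtraction, as recommended for System.nanoTime values.
                if (System.nanoTime() - deadline >= 0) {
                    break; // time limit exceeded: stop reading early
                }
            }
        }
        return lines;
    }
}
```

Closing the `BufferedReader` also closes the file descriptor it shares with the `RandomAccessFile`, so the subsequent `close()` of the `RandomAccessFile` should be harmless.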