I'm trying to extract text from a large PDF, but I only get the first pages. I need all of the text to be passed to a String variable.
This is the code:
import java.io.File;
import org.apache.tika.Tika;

public class ParsePDF {
    public static void main(String[] args) {
        try {
            File file = new File("C:/vlarge.pdf");
            // parseToString() truncates its result at getMaxStringLength() characters
            String content = new Tika().parseToString(file);
            System.out.println("The Content: " + content);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Solution
From the Javadocs:
To avoid unpredictable excess memory use, the returned string contains
only up to getMaxStringLength() first characters extracted from the
input document. Use the setMaxStringLength(int) method to adjust this
limitation.
Calling setMaxStringLength(-1) will disable this limit.
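
Applied to the code from the question, that could look like the sketch below. The class name ParsePDFFull and the helper unlimitedTika() are made up for illustration; setMaxStringLength, getMaxStringLength and parseToString are the Tika facade methods the Javadoc refers to. The path is the one from the question.

```java
import java.io.File;
import org.apache.tika.Tika;

public class ParsePDFFull {
    // Hypothetical helper: returns a Tika facade with the output length cap removed.
    static Tika unlimitedTika() {
        Tika tika = new Tika();
        tika.setMaxStringLength(-1); // -1 disables the limit, per the Javadoc
        return tika;
    }

    public static void main(String[] args) throws Exception {
        File file = new File("C:/vlarge.pdf");
        // With the cap disabled, the whole extracted text is returned as one String
        String content = unlimitedTika().parseToString(file);
        System.out.println("Characters extracted: " + content.length());
    }
}
```

Keep in mind that the limit exists to protect memory use, so with it disabled the entire text of the document has to fit in the heap. For a very large PDF you may need to raise -Xmx.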