java tika pdf: Extracting text from a large PDF using Tika

Latest recommended article published on 2024-07-06 13:29:25

梓莘彦

Tags: java tika pdf

I am trying to extract text from a large PDF, but I only get the text of the first pages. I need all of the text to be stored in a single String variable.

This is the code:

import java.io.File;
import org.apache.tika.Tika;

public class ParsePDF {
    public static void main(String[] args) throws Exception {
        try {
            File file = new File("C:/vlarge.pdf");
            // Parse the file and return the extracted text as a String
            String content = new Tika().parseToString(file);
            System.out.println("The Content: " + content);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Solution

From the Javadocs:

To avoid unpredictable excess memory use, the returned string contains only up to getMaxStringLength() first characters extracted from the input document. Use the setMaxStringLength(int) method to adjust this limitation.

Calling setMaxStringLength(-1) will disable this limit.
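Applied to the code from the question, the change might look like the sketch below. It assumes the Tika core and parser modules are on the classpath. The class name ParsePDFFull and the helper method extractAll are made up for illustration; setMaxStringLength and parseToString are the Tika facade methods quoted above.

```java
import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class ParsePDFFull {

    // Hypothetical helper: extracts the text of a file with the length cap disabled.
    static String extractAll(File file) throws IOException, TikaException {
        Tika tika = new Tika();
        // -1 disables the limit on the returned string
        tika.setMaxStringLength(-1);
        return tika.parseToString(file);
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the question; a different path can be passed as the first argument.
        File file = new File(args.length > 0 ? args[0] : "C:/vlarge.pdf");
        String content = extractAll(file);
        System.out.println("Characters extracted: " + content.length());
    }
}
```

With the limit disabled, the whole extracted text is held in memory at once, which is the excess memory use the Javadoc warns about. For very large documents a finite but larger value passed to setMaxStringLength may be the safer choice.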
