Single-page PDF containing one table
com.aistrong.analysis.pdf.service
Class ReaderTextService
Method Detail:
public ArrayList> readWordWithTextPositions(String path)
Arguments:
path - path where the PDF file is stored
Returns:
ArrayList>
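Each WordWithTextPositions object stores all the characters of one line (see the Note). Each character corresponds to one TextPosition object, which holds all the information about that character, including the character itself and its coordinates. For details, see Class TextPosition in the PDFBox API documentation.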
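To illustrate the per-character layout described above, here is a minimal sketch of how the text of one line might be rebuilt from its characters and coordinates. It does not use PDFBox or ReaderTextService: Glyph is a hypothetical stand-in for TextPosition that keeps only a character and its x/y position, and lineText is an illustrative helper, not part of this API. With the real TextPosition, the corresponding values would come from getUnicode(), getX() and getY().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LineTextSketch {

    // Hypothetical stand-in for PDFBox's TextPosition:
    // one character plus its coordinates on the page.
    static final class Glyph {
        final String unicode;
        final float x;
        final float y;

        Glyph(String unicode, float x, float y) {
            this.unicode = unicode;
            this.x = x;
            this.y = y;
        }
    }

    // Rebuilds the text of one line by ordering its glyphs from left to right.
    // The input list is copied, so the caller's list is not modified.
    static String lineText(List<Glyph> line) {
        List<Glyph> sorted = new ArrayList<>(line);
        sorted.sort(Comparator.comparingDouble(g -> g.x));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) {
            sb.append(g.unicode);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three characters of the same line, deliberately out of order.
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph("b", 20f, 100f));
        line.add(new Glyph("a", 10f, 100f));
        line.add(new Glyph("c", 30f, 100f));
        System.out.println(lineText(line)); // prints "abc"
    }
}
```

Sorting by x is only meaningful within a single line; for a table, the same coordinates could also be used to decide which column a character belongs to, but that step is not shown here.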
Example:
package com;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.text.PDFLocalStripper.WordWithTextPositions;
import org.apache.pdfbox.text.TextPosition;
import com.aistrong.analysis.pdf.service.ReaderTextService;
public class TestReadWordWithTextPositions {
public static void main(String[] args) throws IOException {
ReaderTextService