Dropping headers and footers in Java: how to remove headers and footers from a PDF file using iText in Java

Latest recommended article published on 2024-05-27 12:07:34

芙蓉塘外有轻雷


I am using the PDF iText library to convert PDF to text.

Below is my code to convert PDF to text file using Java.

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

public class PdfConverter {

    /** The original PDF that will be parsed. */
    public static final String pdfFileName = "jdbc_tutorial.pdf";

    /** The resulting text file. */
    public static final String RESULT = "preface.txt";

    /**
     * Parses a PDF to a plain text file.
     * @param pdf the original PDF
     * @param txt the resulting text
     * @throws IOException
     */
    public void parsePdf(String pdf, String txt) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        PrintWriter out = new PrintWriter(new FileOutputStream(txt));
        TextExtractionStrategy strategy;
        // iText page numbers start at 1.
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            out.println(strategy.getResultantText());
            System.out.println(strategy.getResultantText());
        }
        out.flush();
        out.close();
        reader.close();
    }

    /**
     * Main method.
     * @param args no arguments needed
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        new PdfConverter().parsePdf(pdfFileName, RESULT);
    }
}

The code above extracts the PDF's text correctly, but I need to ignore the headers and footers and extract only the body content of the PDF file.

Solution

Because your PDF has headers and footers, they should be marked as artifacts (if they are not, they are just ordinary text or content placed at the position of a header or footer). If they are marked as artifacts, you can handle them with ParseTaggedPdf. If ParseTaggedPdf doesn't work, you can use ExtractPageContentArea instead. Look up a few examples of each.
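The area-based approach can be sketched as follows, assuming iText 5 (the `com.itextpdf.text` packages used in the question). It keeps only text that falls inside a rectangle covering the page body. The class name `BodyAreaExtractor` and the 60 pt header and footer heights are assumptions for illustration; measure the real bands in your own PDF. The sketch also ignores page rotation and crop boxes.

```java
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.FilteredTextRenderListener;
import com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.RegionTextRenderFilter;
import com.itextpdf.text.pdf.parser.RenderFilter;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

import java.io.IOException;
import java.io.PrintWriter;

public class BodyAreaExtractor {

    // Assumed band heights in points (72 pt = 1 inch); adjust for your document.
    static final float HEADER_HEIGHT = 60f;
    static final float FOOTER_HEIGHT = 60f;

    /** Shrinks a page rectangle so the header and footer bands fall outside it. */
    static Rectangle bodyArea(Rectangle page) {
        // PDF coordinates have their origin at the bottom-left corner.
        return new Rectangle(page.getLeft(), page.getBottom() + FOOTER_HEIGHT,
                             page.getRight(), page.getTop() - HEADER_HEIGHT);
    }

    /** Writes the body text of every page of the PDF to a text file. */
    public static void extract(String pdf, String txt) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        try (PrintWriter out = new PrintWriter(txt, "UTF-8")) {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                // Only text intersecting the body rectangle passes the filter.
                RenderFilter filter = new RegionTextRenderFilter(bodyArea(reader.getPageSize(i)));
                TextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                out.println(PdfTextExtractor.getTextFromPage(reader, i, strategy));
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        extract("jdbc_tutorial.pdf", "preface.txt");
    }
}
```

This only works when the headers and footers sit in a fixed band on every page. If their position varies, the rectangle has to be chosen per page.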

The solution above is general, and whether it works depends on the file. If you really need an alternative, you can use other libraries such as Apache PDFBox, Apache Tika, or PDFTextStream. Note that this alternative won't help if you have to stay with iText and can't move to another library. In PDFBox you can use PDFTextStripperByArea or PDFTextStripper. Look at the JavaDoc or some examples to see how to use them.
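For the PDFBox route, a minimal sketch with PDFTextStripperByArea might look like the following. It assumes PDFBox 2.x (in 3.x, documents are opened with `Loader.loadPDF` instead of `PDDocument.load`). The class name `PdfBoxBodyExtractor`, the region name `"body"`, and the 60 pt band heights are illustrative assumptions, not values from the original answer.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.text.PDFTextStripperByArea;

import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;

public class PdfBoxBodyExtractor {

    // Assumed band heights in points; adjust for your document.
    static final float HEADER_HEIGHT = 60f;
    static final float FOOTER_HEIGHT = 60f;

    /** Body region in the stripper's coordinates, whose origin is the top-left corner. */
    static Rectangle2D bodyRegion(float pageWidth, float pageHeight) {
        return new Rectangle2D.Float(0f, HEADER_HEIGHT,
                pageWidth, pageHeight - HEADER_HEIGHT - FOOTER_HEIGHT);
    }

    /** Returns the body text of all pages, skipping the header and footer bands. */
    public static String extract(String pdf) throws IOException {
        StringBuilder text = new StringBuilder();
        try (PDDocument doc = PDDocument.load(new File(pdf))) {
            for (PDPage page : doc.getPages()) {
                PDRectangle box = page.getMediaBox();
                // A fresh stripper per page, since page sizes may differ.
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);
                stripper.addRegion("body", bodyRegion(box.getWidth(), box.getHeight()));
                stripper.extractRegions(page);
                text.append(stripper.getTextForRegion("body"));
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract("jdbc_tutorial.pdf"));
    }
}
```

Note that the region's y coordinate is measured from the top of the page here, so the header height is the offset and the footer is cut off by the reduced region height.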
