Is there a C++ library for extracting text from PDF files (like PDFBox for Java)?

虚幻自习室

Published 2021-02-27 11:55:41


Last year, I made an application in Java using PDFBox to get the raw text in some PDF files and I need to port that application to C++ now.

I would like to know what the best C++ alternative is for what I need.

I'll give an example in case it helps:

With PDFBox, using that file, each row read on page 2 and most of page 3 is output as a single line, with its values separated by spaces rather than kept in a grid as they are in the PDF.

So the first relevant line in page 2 would look like this:

FB 847 - Tremblay, Gérard 179,63 56 16167 90 268 s27 p3 669 s14 199 223 193 615

or something like that, since there are minor changes in the order in which the values appear. I don't care about that as long as similar lines are output in the same way, since I just parse them and put the values I need into different variables.

So, knowing all of that, is there a library that I can use in a C++ program to get similar results?

Edit: After looking at sacredFaith's link at http://www.codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file and trying it, I get weird output like this for the example file I mentioned earlier:

The parts I actually need are in the weird characters at the beginning. Using Adobe Acrobat Reader X and using Save As... Text (accessible), I get the following result:

That is approximately what I get in Java using PDFBox, and what I want to get as output in C++.

Solution

Xpdf is an open-source PDF viewer and toolkit written in C++. It includes command-line tools, such as pdftotext, that extract plain text from a PDF file.
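One low-effort way to try this from a C++ program is to run the pdftotext tool as a child process and read its standard output. The following is only a sketch under some assumptions: pdftotext is installed and on the PATH, the platform is POSIX (it uses popen), and the helper names shellQuote, buildPdftotextCommand and runAndCapture are made up for illustration. The -layout option asks pdftotext to preserve the physical layout, which tends to keep each table row on one line, but the output is not guaranteed to match what PDFBox produces.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: wrap a string in single quotes for a POSIX shell,
// escaping any single quotes it contains.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a pdftotext command line that writes the text of
// pages firstPage..lastPage to stdout ("-" as the output file) as UTF-8.
std::string buildPdftotextCommand(const std::string& pdfPath,
                                  int firstPage, int lastPage) {
    return "pdftotext -f " + std::to_string(firstPage) +
           " -l " + std::to_string(lastPage) +
           " -layout -enc UTF-8 " + shellQuote(pdfPath) + " -";
}

// Hypothetical helper: run a shell command and return everything it printed
// on stdout. Assumes a POSIX system where popen/pclose are available.
std::string runAndCapture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed for: " + command);
    std::string output;
    char buffer[4096];
    std::size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);
    }
    pclose(pipe);
    return output;
}
```

Calling `runAndCapture(buildPdftotextCommand("report.pdf", 2, 3))`, where `report.pdf` is a placeholder file name, would return the text of pages 2 and 3 as one string, which can then be split into lines and parsed the same way as the PDFBox output.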
