java c s怎么输出pdf,是否有一个C ++库从PDF文件(如PDFBox for Java)中提取文本?

Last year, I made an application in Java using PDFBox to get the raw text in some PDF files and I need to port that application to C++ now.

I wanted to know what was the best C++ alternative to accomplish what I need.

I'll give an example in case it helps:

With PDFBox, using that file, each line read on page 2 and most of page 3 would output all the data of a line, separated by a space instead of keeping it in a grid like it is now.

So the first relevant line in page 2 would look like this:

FB 847 - Tremblay, Gérard 179,63 56 16167 90 268 s27 p3 669 s14 199 223 193 615

or something like that since there are minor changes in the order they appear, but I don't care about that as long as similar lines output the same since I just parse them and put the values I need in different variables.

So, knowing all of that, is there a library that I can use in a C++ program to get similar results?

Edit: After looking at sacredFaith's link at http://www.codeproject.com/Articles/7056/Code-to-extract-plain-text-from-a-PDF-file and trying it, I'm getting a weird output like such for the example file I mentioned earlier:

The parts I actually need are in the weird characters at the beginning. Using Adobe Acrobat Reader X and using Save As... Text (accessible), I get the following result:

Which is approximately what I get in Java using PDFBox and what I want to get as output in C++.

解决方案

Xpdf is a C++ application/library which includes tools to extract plain text from a PDF file.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值