Reading PDF and Word DOC Files Using PHP

One of my customers has an enormous number of PDF and Microsoft Word DOC files on their website. The files are core to their online services, so they aren't just clutter sitting on the server. My customer wanted the website's search engine (Sphider) to read these PDF and DOC files so that their clients could get at the documents they need without clicking through a bunch of summary pages. I was successful in the task, so let me show you how to read PDF and DOC files using PHP.

Reading PDF Files

To read PDF files, you will need to install the Xpdf package, which includes the "pdftotext" command-line tool. Once pdftotext is installed, the following PHP statement returns the text of the PDF:

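// escapeshellarg() keeps spaces and shell metacharacters in the filename from breaking the command
$content = shell_exec('/usr/local/bin/pdftotext ' . escapeshellarg($filename) . ' -'); // the trailing dash sends the text to standard output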
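
A slightly more defensive version of the call above might wrap it in a function that checks the file first and treats a failed command as "no text". This is only a sketch: the function names `build_pdftotext_command` and `read_pdf_text` are hypothetical, the binary path is the one assumed above, and the quoting produced by `escapeshellarg()` is described for Unix-like systems.

```php
<?php
// Hypothetical helper: builds the pdftotext command line for one file.
// The default binary path is the one used in this article; adjust it for your system.
function build_pdftotext_command(string $filename, string $binary = '/usr/local/bin/pdftotext'): string
{
    // escapeshellarg() quotes the filename so spaces and shell
    // metacharacters are passed to pdftotext literally.
    // The trailing dash tells pdftotext to write the text to standard output.
    return $binary . ' ' . escapeshellarg($filename) . ' -';
}

// Hypothetical helper: returns the extracted text, or null when extraction fails.
function read_pdf_text(string $filename): ?string
{
    if (!is_readable($filename)) {
        return null; // missing or unreadable file
    }

    $output = shell_exec(build_pdftotext_command($filename));

    // shell_exec() does not return a string when the command fails
    // or produces no output, so anything else is treated as a failure.
    return is_string($output) ? $output : null;
}
```

Calling `read_pdf_text($filename)` then gives either the document text or `null`, which makes it easier for the indexing code to skip files that could not be read.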

Reading DOC Files

As with the PDF example above, you'll need to install another package, this one called Antiword. Here's the code to grab the Word DOC content:

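// antiword prints the document text to standard output; the filename is escaped for the shell
$content = shell_exec('/usr/local/bin/antiword ' . escapeshellarg($filename));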

The above code does NOT read DOCX files and, by design, does not preserve formatting. There are other libraries that will preserve formatting, but in our case we just want to get at the text.
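
Since the two tools cover different file types, an indexer could pick the command from the file extension and skip anything else, including DOCX. The following is a minimal sketch under those assumptions; `build_extract_command` and `extract_text` are hypothetical names, and the binary paths are the ones used earlier in this article.

```php
<?php
// Hypothetical helper: chooses the extraction command from the file extension.
// Returns null for types these tools do not handle (DOCX included).
function build_extract_command(string $filename): ?string
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    switch ($extension) {
        case 'pdf':
            // The trailing dash makes pdftotext write to standard output.
            return '/usr/local/bin/pdftotext ' . escapeshellarg($filename) . ' -';
        case 'doc':
            // antiword writes to standard output by default.
            return '/usr/local/bin/antiword ' . escapeshellarg($filename);
        default:
            return null;
    }
}

// Hypothetical helper: returns plain text for a PDF or DOC file, or null otherwise.
function extract_text(string $filename): ?string
{
    $command = build_extract_command($filename);
    if ($command === null || !is_readable($filename)) {
        return null; // unsupported type, or missing/unreadable file
    }

    $output = shell_exec($command);

    // Treat a failed command or empty output as "no text".
    return is_string($output) ? $output : null;
}
```

Matching on the extension is a simplification: it trusts the file name rather than inspecting the file contents, which is usually acceptable for documents the site owner uploaded.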

A special thank you to Jeremy Parrish for his help and insight with this task.

Original article: https://davidwalsh.name/read-pdf-doc-file-php