How to Convert PDF Files to HTML Files

The pdftohtml program translates PDF documents into HTML or XML format, combined with PNG images. It also supports encrypted PDF files. In Ubuntu Gutsy, pdftohtml is bundled with the poppler-utils package, so that is the package we need to install.

Install poppler-utils in Ubuntu

sudo aptitude install poppler-utils

This completes the installation.
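
To confirm that the install put pdftohtml on your PATH, a quick check like the following can help. This is only a sketch: the variable name pdftohtml_state is made up for illustration and is not part of the package.

```shell
# Check whether the pdftohtml binary can be found on PATH.
# pdftohtml_state is a hypothetical variable name used only in this example.
if command -v pdftohtml >/dev/null 2>&1; then
    pdftohtml_state="installed"
else
    pdftohtml_state="missing"
fi
echo "pdftohtml is $pdftohtml_state"
```

If it reports "missing", rerun the install command above and check its output for errors.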

Using pdftohtml

pdftohtml Syntax

pdftohtml [options] [pdf file] [html file]
Available options

A summary of the options is included below.

-h, -help - Show summary of options.

-f - first page to print

-l - last page to print

-q - don’t print any messages or errors

-v - print copyright and version info

-p - exchange .pdf links with .html

-c - generate complex output

-i - ignore images

-noframes - generate no frames. Not supported in complex output mode.

-stdout - use standard output

-zoom - zoom the pdf document (default 1.5)

-xml - output for XML post-processing

-enc - output text encoding name

-opw - owner password (for encrypted files)

-upw - user password (for encrypted files)

-hidden - force hidden text extraction

-dev - output device name for Ghostscript (png16m, jpeg etc)

-nomerge - do not merge paragraphs

-nodrm - override document DRM settings
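
Several of the options above can be combined in one call. The sketch below wraps a page-range conversion in a small shell function; the function name convert_pages and the DRY_RUN variable are assumptions made for this example, not something pdftohtml provides. It uses only the -q, -noframes, -f and -l options from the list.

```shell
# Convert pages FIRST..LAST of a PDF into a single frameless HTML file.
# convert_pages and DRY_RUN are hypothetical names for this sketch.
# With DRY_RUN=1 the command is only printed, not executed.
convert_pages() {
    first=$1
    last=$2
    pdf=$3
    html=$4
    # Build the command line from options listed in the summary above.
    set -- pdftohtml -q -noframes -f "$first" -l "$last" "$pdf" "$html"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, convert_pages 2 5 test.pdf test.html would convert pages 2 through 5 of test.pdf. Setting DRY_RUN=1 first lets you inspect the command before running it.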

pdftohtml Examples

pdftohtml test.pdf test.html

This command gives you a simple HTML file suitable for reading or copying the textual content of the PDF file. You can grab the text from your browser and paste it into other applications. It doesn’t produce any PNG files, so you won’t be able to see any embedded graphics. It’s a great utility if you just want to extract the text from a PDF file.

If you want to see graphics, you’ll need to use the -c (as in “complex”) option:

pdftohtml -c test.pdf test.html

This option produces individual HTML files, one for each page of the PDF file, with the PNG references mixed in. The graphics in the original PDF file show up in a browser, and the text can be cut and pasted. The total size of the HTML and PNG files generated with the -c option tends to be roughly equivalent to that of the original PDF.
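
If you have a whole directory of PDF files, the same -c call can be run in a loop. The following is a minimal sketch, assuming the files end in a lowercase .pdf extension; the helper names html_name and convert_dir are invented for this example.

```shell
# Derive the output name from the input name: report.pdf -> report.html
# html_name and convert_dir are hypothetical helper names.
html_name() {
    printf '%s\n' "${1%.pdf}.html"
}

# Convert every .pdf in a directory (default: current directory) with -c.
convert_dir() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip when the glob matched nothing
        pdftohtml -c "$pdf" "$(html_name "$pdf")"
    done
}
```

Calling convert_dir ~/docs would process each PDF in that directory in turn and write the output next to the original file.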
