【Background】
While working on:
I tried using xpdf to convert a copy-protected PDF file into text or HTML.
【Process】
1. Reference:
Go to:
Then, from the directory:
D:\tmp\dev_tools\python\pdf\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64
I ran:
pdftotext.exe
The result: the PDF is still protected and its text cannot be copied:
D:\tmp\dev_tools\python\pdf\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64>pdftotext.exe
pdftotext version 3.03
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-layout : maintain original physical layout
-fixed <fp> : assume fixed-pitch (or tabular) text
-raw : keep strings in content stream order
-htmlmeta : generate a simple HTML file, including the meta information
-enc <string> : output text encoding name
-eol <string> : output end-of-line convention (unix, dos, or mac)
-nopgbrk : don't insert page breaks between pages
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
D:\tmp\dev_tools\python\pdf\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64>pdftotext.exe -htmlmeta D:\tmp\tmp_dev_root\python\answer_question\self\pdf_table_to_xml\pdf\spec183r21.0.pdf hart183.html
Permission Error: Copying of text from this document is not allowed.
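To make the attempt above repeatable, here is a minimal Python sketch that builds the same pdftotext command line and checks the output for the "Permission Error" message. The function names (build_pdftotext_cmd, is_permission_error, convert) are my own and hypothetical, not part of xpdf. It assumes pdftotext is on PATH (or that you pass the full path to pdftotext.exe as exe), and it checks both stdout and stderr because I have not verified which stream the message goes to. The -opw option comes from the usage text above; it only helps if you actually know the owner password, which was not the case here.

```python
import subprocess

# Text that pdftotext printed in the run above when copying is disallowed.
PERMISSION_MARKER = "Permission Error"


def build_pdftotext_cmd(pdf_path, out_path, exe="pdftotext",
                        html_meta=True, owner_password=None):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = [exe]
    if html_meta:
        # -htmlmeta: generate a simple HTML file, including the meta information
        cmd.append("-htmlmeta")
    if owner_password is not None:
        # -opw: owner password (for encrypted files); only useful if known
        cmd += ["-opw", owner_password]
    cmd += [pdf_path, out_path]
    return cmd


def is_permission_error(output_text):
    """Return True if the tool output contains the permission message."""
    return PERMISSION_MARKER in output_text


def convert(pdf_path, out_path, **kwargs):
    """Run pdftotext and classify the result as a short status string."""
    cmd = build_pdftotext_cmd(pdf_path, out_path, **kwargs)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Assumption: the message may appear on either stream, so check both.
    if is_permission_error(proc.stdout + proc.stderr):
        return "permission_denied"
    return "ok" if proc.returncode == 0 else "failed"


# Example call (not executed here):
# status = convert(r"spec183r21.0.pdf", "hart183.html")
```

With the PDF from this post, convert() would be expected to report "permission_denied", matching the console output above.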
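2. So I went on to try to solve the above problem:
But I did not manage to solve it...
【Summary】
For now, I still cannot use xpdf to convert this PDF into the HTML I want.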