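pdf2htmlEX was written by a Chinese developer; the project lives at https://github.com/coolwanglu/pdf2htmlEX. It converts a PDF file into a single HTML file, and the coolest part is that it fully preserves the PDF's pagination, text encodings, graphics, and even formula layout. In short, the PDF really does become an identical-looking HTML file. Cool! Plenty of people outside China use it too, and at the moment it seems to be one of a kind (if you know of something better, please leave me a comment).
The one drawback is that building it is genuinely hard. It took me more than half a day to get it working, so I am sharing the steps here.
My initial test environment was Ubuntu, but production runs on Amazon Linux, so this post covers installation on both systems.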
Amazon Linux
1. Enable EPEL
yum-config-manager --enable epel
2. Update
yum -y update
3. Install the GPG key
cd /etc/pki/rpm-gpg/
wget http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
cd /etc/yum.repos.d
wget http://linuxsoft.cern.ch/cern/scl/slc6-scl.repo
4. Upgrade pip
pip install --upgrade setuptools
pip install lxml
Download https://github.com/cms-sw/cms-docker/blob/master/slc6-vanilla/RPM-GPG-KEY-cern and save it as /etc/pki/rpm-gpg/RPM-GPG-KEY-cern
Edit /etc/yum/pluginconf.d/priorities.conf and set enabled = 0
wget http://ftp5.gwdg.de/pub/opensuse/repositories/server:/mail/CentOS_6/x86_64/python-lxml-2.3.3-20.1.x86_64.rpm
rpm -Uvh --nosignature python-lxml-2.3.3-20.1.x86_64.rpm
5. Install package dependencies
yum -y install libtool-ltdl-devel.x86_64 zlib-devel.x86_64 glib2-devel.x86_64 freetype-devel.x86_64 poppler-glib-devel.x86_64 git cmake mk-configure.noarch libjpeg-turbo.x86_64 libtiff.x86_64 libpng-devel.x86_64 giflib-devel.x86_64 libXt-devel.x86_64 autoconf automake libtool bzip2 libxml2.x86_64 libuninameslist-devel.x86_64 libspiro.x86_64 dbus-python-devel.x86_64 pango-devel.x86_64 chrpath uuid-c++.x86_64 uuid.x86_64 uthash-devel.noarch cmake gcc java-1.8.0-openjdk libpng-devel.x86_64 fontforge-devel.x86_64 cairo-devel.x86_64 poppler-devel.x86_64 libspiro-devel.x86_64 freetype-devel.x86_64 poppler-data libjpeg-turbo-devel git gcc-c++ libjpeg-turbo-devel.x86_64 poppler-data.noarch jpackage-utils.noarch gettext.x86_64 jpackage-utils.noarch python27-python-devel.x86_64 libxml2-python27.x86_64 libxml2-python26.x86_64 python27-python-devel.x86_64 libxslt-devel.x86_64 libxslt-python26.x86_64 libxslt.x86_64 libxml2-devel libxslt-devel python-devel python-javapackages.noarch
yum --nogpgcheck install poppler-cpp.x86_64 poppler-cpp-devel.x86_64 libstdc++48-static.x86_64 openjpeg-devel.x86_64
6. Install library dependencies
wget http://downloads.sourceforge.net/openjpeg.mirror/openjpeg-2.1.0.tar.gz
tar -xzf openjpeg-2.1.0.tar.gz; cd openjpeg-2.1.0
cmake . && make && make install
wget http://poppler.freedesktop.org/poppler-0.35.0.tar.xz
tar -xf poppler-0.35.0.tar.xz; cd poppler-0.35.0
./configure --enable-xpdf-headers --enable-libjpeg
make && make install
git clone https://github.com/coolwanglu/fontforge.git fontforge.git
cd fontforge.git && git checkout pdf2htmlEX && ./autogen.sh && ./configure && make V=1 && sudo make install
cp fontforge.pc /usr/local/lib/pkgconfig/
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
vim CMakeLists.txt #adjust version
wget https://www.softwarecollections.org/repos/rhscl/mongodb24/epel-6-x86_64/javapackages-tools-3.4.1-1.1.el6.noarch.rpm
wget ftp://bo.mirror.garr.it/pub/1/slc/centos/7.0.1406/os/x86_64/Packages/libxml2-2.9.1-5.el7.x86_64.rpm
git clone git://github.com/coolwanglu/pdf2htmlEX.git
cd pdf2htmlEX && cmake . && make && sudo make install
pkg-config --print-provides --cflags --libs poppler
7. Fix up the library paths
ln -s /usr/local/lib/libpoppler.so.54 /usr/lib64/libpoppler.so.54
ln -s /usr/local/lib/libfontforge.so.2 /usr/lib64/libfontforge.so.2
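To find out which symlinks are actually needed, it can help to ask the dynamic linker which libraries it fails to resolve. The following is a minimal sketch, assuming `ldd` is available; `list_missing_libs` is a helper name made up for this example.

```shell
# list_missing_libs: read `ldd` output on stdin and print the names of
# shared libraries that the dynamic linker could not resolve.
list_missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example (commented out; needs pdf2htmlEX on the PATH):
# ldd "$(command -v pdf2htmlEX)" | list_missing_libs
```

Each name printed is a candidate for a symlink like the ones in this step; empty output suggests nothing is missing.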
8. Test command:
pdf2htmlEX --hdpi 144 --vdpi 144 a.pdf a.html
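Once a single file converts correctly, the same settings can be applied to a whole directory. This is only a sketch: `html_name` and `convert_all` are hypothetical helper names, and it assumes the input-then-output argument order used in the test command.

```shell
# html_name FILE.pdf: print the matching output name, FILE.html.
html_name() {
    printf '%s\n' "${1%.pdf}.html"
}

# convert_all DIR: convert every *.pdf directly inside DIR, using the
# same DPI settings as the single-file test command.
convert_all() {
    (
        cd "$1" || exit 1
        for pdf in *.pdf; do
            [ -e "$pdf" ] || continue   # skip the literal pattern when DIR has no PDFs
            pdf2htmlEX --hdpi 144 --vdpi 144 "$pdf" "$(html_name "$pdf")"
        done
    )
}
```

The loop runs inside a subshell so that the `cd` does not change the caller's working directory.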
Ubuntu:
1 Install git
yum install git -y
2 Download the pdf2htmlEX source (latest version: 0.12)
git clone https://github.com/coolwanglu/pdf2htmlEX.git
3 Download fontforge
git clone https://github.com/coolwanglu/fontforge.git
4 Install autoconf from source
Version 2.68 or later is required. Do not install it with yum, because the yum package is version 2.63, which is too old.
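A quick way to check whether the autoconf already on the system meets the 2.68 minimum is to compare version strings. The sketch below assumes GNU `sort` with its `-V` (version sort) option; `version_ge` is a helper name made up for this example.

```shell
# version_ge A B: succeed if version A >= version B.
# Assumes GNU sort, whose -V option orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (commented out; needs autoconf on the PATH):
# installed=$(autoconf --version | head -n 1 | awk '{print $NF}')
# version_ge "$installed" 2.68 || echo "autoconf $installed is too old"
```

The same helper could be reused for the other minimum versions mentioned in this post, such as harfbuzz >= 0.9.19.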
5 yum install libtool patch libtool-ltdl-devel
This fixes the errors about libtoolize and patch not being found, as well as the following error:
copying.lib' not found in /usr/share/libtool/libltdl'
6 Upgrade Python to 2.7 or later
See http://www.gowhich.com/blog/553
7 Upgrade pkg-config to version 0.28
###########################################################
tar -zxvf pkg-config-0.28.tar.gz
cd pkg-config-0.28
./configure --with-internal-glib
make && make install
mv /usr/bin/pkg-config /usr/bin/pkg-config0.23
ln -s /usr/local/bin/pkg-config /usr/bin/pkg-config
pkg-config --version  # verify the installed version
###########################################################
8 yum install glib2-devel
###########################################################
This fixes the following error:
checking for GLIB... configure: error: Package requirements (glib-2.0 >= 2.6 gio-2.0) were not met:
No package 'glib-2.0' found
No package 'gio-2.0' found
See https://github.com/fontforge/fontforge/issues/564
###########################################################
9 Install fontforge
###########################################################
cd fontforge
yum install gettext
gettext provides the msgfmt command; installing it avoids the error "msgfmt: Command not found" during make. See https://github.com/coolwanglu/pdf2htmlEX/issues/118
./bootstrap
./configure --without-libzmq --without-x --without-iconv --disable-python-scripting --disable-python-extension #yum install pango-devel (if ./configure fails)
make
make install
###########################################################
10 Optional: install Cairo (needed to generate SVG background images and to handle Type 3 fonts)
###########################################################
Install freetype2
Do not install version 2.5.1: with fonts such as ahronbd.ttf, running fc-cache -fv crashes freetype 2.5.1. Use 2.5.2 instead: http://sourceforge.net/projects/freetype/files/freetype2/2.5.2/
tar -zxvf freetype-2.5.2.tar.gz
cd freetype-2.5.2
./configure
make
make install
Install pixman
wget http://cairographics.org/releases/pixman-0.32.4.tar.gz
tar -zxvf pixman-0.32.4.tar.gz
cd pixman-0.32.4
./configure
make
make install
export png_REQUIRES="libpng"
This fixes "configure: error: recommended PNG functions feature could not be enabled". See http://mattgwwalker.wordpress.com/2010/01/07/cairo-configure-issues/
xz -d cairo-1.12.14.tar.xz
tar -xvf cairo-1.12.14.tar
cd cairo-1.12.14
./configure
make
make install
11 Optional: add hinting to TTF fonts
###########################################################
Install harfbuzz >= 0.9.19
wget http://www.freedesktop.org/software/harfbuzz/release/harfbuzz-0.9.33.tar.bz2
tar jxf harfbuzz-0.9.33.tar.bz2
cd harfbuzz-0.9.33
./configure
make
make install
Install qt4
yum install qt4-devel
Install ttfautohint
tar -zxvf ttfautohint-1.00.tar.gz
cd ttfautohint-1.00
./configure
make
make install
###########################################################
12 Install pdf2htmlEX
###########################################################
cd pdf2htmlEX
Make sure pkg-config can find poppler and libfontforge
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/share/fontconfig:/usr/local/lib/pkgconfig:/usr/lib64/pkgconfig:/usr/share/pkgconfig:/usr/share/glib-2.0:/usr/lib64/gio
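pkg-config only looks for .pc files in the directories listed in PKG_CONFIG_PATH, so repeated exports in the same shell tend to accumulate duplicate entries. The sketch below appends a directory only when it is not already listed; `pkg_path_add` is a hypothetical helper name, not part of pkg-config.

```shell
# pkg_path_add DIR: append DIR to PKG_CONFIG_PATH unless it is already listed.
pkg_path_add() {
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) PKG_CONFIG_PATH="${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}$1" ;;
    esac
    export PKG_CONFIG_PATH
}

# Example (commented out; needs pkg-config and poppler installed):
# pkg_path_add /usr/local/lib/pkgconfig
# pkg-config --exists poppler && echo "poppler found"
```

Running `pkg-config --exists` before cmake is a cheap way to confirm that the path is right before starting a long build.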
Upgrade gcc so that it supports lambda expressions. See http://wecoding.cn/?p=190
wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtools-2.repo
yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
mv /usr/bin/gcc /usr/bin/gcc.backup
ln -s /opt/rh/devtoolset-2/root/usr/bin/gcc /usr/bin/gcc
mv /usr/bin/g++ /usr/bin/g++.backup
ln -s /opt/rh/devtoolset-2/root/usr/bin/g++ /usr/bin/g++
gcc -v  # check the version and whether C++ is supported
If CMakeCache.txt exists, delete it first: rm CMakeCache.txt
cmake . -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_CXX_COMPILER=/usr/bin/g++
make
If make reports "fatal error: glib.h: No such file or directory":
cp /usr/include/glib-2.0/glib.h /usr/include/
If make reports "fatal error: glib/galloca.h: No such file or directory":
mkdir /usr/include/glib
cp /usr/include/glib-2.0/glib/*.h /usr/include/glib
If make reports "fatal error: glibconfig.h: No such file or directory":
cp /usr/lib64/glibconfig.h /usr/include
If make reports "fatal error: glib-object.h: No such file or directory":
mkdir /usr/include/gobject
cp /usr/include/glib-2.0/gobject/*.h /usr/include/gobject/
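These errors mean the compiler was not given glib's include directories. Before copying headers around, it may be worth checking where pkg-config thinks they are. A small sketch, assuming pkg-config and glib2-devel are installed; `include_dirs` is a helper name invented for this example.

```shell
# include_dirs: read compiler flags on stdin and print each -I directory
# on its own line.
include_dirs() {
    tr ' ' '\n' | sed -n 's/^-I//p'
}

# Example (commented out; needs pkg-config and glib2-devel):
# pkg-config --cflags glib-2.0 | include_dirs
```

If the directories it prints exist and contain glib.h and glibconfig.h, the problem is more likely in how the build picks up its flags than in the glib installation.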
Add cairo to PKG_CONFIG_PATH
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/cairo:/usr/local/include/cairo
make install
pdf2htmlEX -v
If this reports "pdf2htmlEX: error while loading shared libraries: libfontforge.so.2: cannot open shared object file: No such file or directory":
cp /opt/fontforge/fontforge/.libs/libfontforge.so.2 /usr/local/include/poppler/
If this reports "pdf2htmlEX: error while loading shared libraries: libpoppler.so.45: cannot open shared object file: No such file or directory":
cp /opt/poppler-0.25.0/poppler/.libs/libpoppler.so.45 /usr/local/include/poppler/
# Add the poppler directory to LD_LIBRARY_PATH (this can be avoided by building poppler with ./configure --prefix=/usr)
export LD_LIBRARY_PATH=/usr/local/include/poppler/
To make this permanent, add LD_LIBRARY_PATH to /etc/profile
vi /etc/profile
LD_LIBRARY_PATH=/usr/local/include/poppler/
export LD_LIBRARY_PATH
source /etc/profile
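An alternative to LD_LIBRARY_PATH is to register the library directory with the dynamic linker itself, which also covers processes that never read /etc/profile. The sketch below assumes the system's /etc/ld.so.conf includes the files under /etc/ld.so.conf.d, as it commonly does; `register_lib_dir` and the file name pdf2htmlEX.conf are made up for this example.

```shell
# register_lib_dir LIB_DIR [CONF_DIR]: write LIB_DIR into a linker
# configuration file. CONF_DIR defaults to /etc/ld.so.conf.d; the file
# name pdf2htmlEX.conf is hypothetical.
register_lib_dir() {
    conf_dir="${2:-/etc/ld.so.conf.d}"
    printf '%s\n' "$1" > "$conf_dir/pdf2htmlEX.conf"
}

# Example (commented out; must be run as root):
# register_lib_dir /usr/local/lib
# ldconfig   # rebuild the linker cache so the new directory is searched
```

The optional second argument exists only so the function can be tried against a scratch directory before touching /etc.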
If a crontab job fails with:
sh: java: command not found
First check that the Java environment variables are configured. See http://www.cnblogs.com/zhoulf/archive/2013/02/04/2891608.html
Then symlink the java binary to /usr/bin/java
For example: ln -s /usr/local/jdk/java/bin/java /usr/bin/java
If you still get "pdf2htmlEX: error while loading shared libraries: libfontforge.so.2: cannot open shared object file: No such file or directory":
cp /usr/local/include/poppler/libpoppler.so.45 /usr/lib
cp /usr/local/include/poppler/libpoppler.so.45 /usr/lib64
References:
http://www.ibm.com/developerworks/cn/linux/l-cn-cmake/
http://blog.atime.me/note/cmake.html
http://www.hyzgame.com.cn/?p=1631
http://www.cmake.org/cmake-tutorial/
http://sewm.pku.edu.cn/src/paradise/reference/CMake%20Practice.pdf
http://www.linuxfromscratch.org/blfs/view/svn/general/poppler.html
http://www.linuxfromscratch.org/blfs/view/svn/general/openjpeg2.html
https://github.com/coolwanglu/pdf2htmlEX/issues/420
http://unix.stackexchange.com/questions/118550/autoreconf-fails-with-cant-exec-libtoolize
https://github.com/klokoy/pdf2htmlEX_docker/blob/master/Dockerfile
https://github.com/aur-archive/pdf2htmlex/blob/master/PKGBUILD
http://blog.mathieu-leplatre.info/static-build-of-cairo-and-librsvg.html
https://github.com/coolwanglu/pdf2htmlEX