Converting a PDF to images in Java

yiluoAK_47

Article link: https://blog.csdn.net/yiluoak_47/article/details/25150419

First I tried Apache PDFBox, version 1.8.4:

package pdf;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Date;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class PDFBox {
	
	@SuppressWarnings("rawtypes")
	public static void main(String[] args) throws IOException {
	 	String p=System.getProperty("user.dir") + "/"+"zk.pdf";   
	 	
        PDDocument doc = PDDocument.load(p);
        int pageCount = doc.getNumberOfPages();
        System.out.println(pageCount);
        Date start = new Date();
        try {
        	// Make sure the output directory exists before writing into it
        	new File("img").mkdirs();
        	List pages = doc.getDocumentCatalog().getAllPages();
            for(int i=0;i<pages.size();i++){
                PDPage page = (PDPage) pages.get(i);
                // Render the page to an image using PDFBox's default settings
                BufferedImage image = page.convertToImage();
				// Pages are indexed from 0 here, so name the files 1.jpg, 2.jpg, ...
				ImageIO.write(image, "jpg", new File("img" + File.separator + (i + 1) + ".jpg"));
				System.out.println("image in the page -->"+(i+1));
            }
		} catch (Exception e) {
			e.printStackTrace();
		}finally{
			if(doc != null){
				doc.close();
			}
		}
        Date end = new Date();
        System.out.println(end.getTime()-start.getTime());
        System.out.println("over");
    }
	
}

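However, this approach has two problems:

When the PDF file is about 180 MB, parsing fails immediately with an exception.

When the PDF has more than 500 pages, processing is very slow.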

Next I tried pdf-renderer, version 1.0.5:

package pdf;

import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

import com.sun.image.codec.jpeg.JPEGCodec;
import com.sun.image.codec.jpeg.JPEGEncodeParam;
import com.sun.image.codec.jpeg.JPEGImageEncoder;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

public class PDFRenderer {
	
	public static void main(String[] args) throws IOException{
		String pdfRealePath=System.getProperty("user.dir") + "/"+"zk.pdf";
		File file = new File(pdfRealePath);
		RandomAccessFile raf = new RandomAccessFile(file, "r");
		FileChannel channel = raf.getChannel();
		MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY,
				0, channel.size());
		PDFFile pdffile = new PDFFile(buf);
		
		// Make sure the output directory exists before writing into it
		new File("img").mkdirs();
		// This loop counts pages from 1
		for (int i = 1; i <= pdffile.getNumPages(); i++) {
			PDFPage page = pdffile.getPage(i);
			Rectangle rect = new Rectangle(0, 0, ((int) page.getBBox()
					.getWidth()), ((int) page.getBBox().getHeight()));
			Image img = page.getImage(rect.width, rect.height, rect, null,true,true);
			BufferedImage tag = new BufferedImage(rect.width, rect.height,
					BufferedImage.TYPE_INT_RGB);
			tag.getGraphics().drawImage(img, 0, 0, rect.width, rect.height,null);
			
			// i is already 1-based, so use it directly as the file name
			FileOutputStream out = new FileOutputStream("img" + File.separator + i + ".jpg"); // write to a file stream
			JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
			JPEGEncodeParam param2 = encoder.getDefaultJPEGEncodeParam(tag);
			param2.setQuality(1f, false); // 1f requests the highest output image quality
			encoder.setJPEGEncodeParam(param2);
			encoder.encode(tag); // JPEG-encode the image
			out.close();
			System.out.println("image in the page -->"+i);
		}
	}
}
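
A side note on the JPEG encoding step in the pdf-renderer listing: `com.sun.image.codec.jpeg` is a non-standard Sun API and is not available on current JDKs. The sketch below is one possible replacement built only on the standard `javax.imageio` classes. `JpegWriter` and its `write` method are hypothetical names introduced here for illustration; they are not part of the original post or of either library.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Iterator;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper (not from the original post): writes a JPEG with an explicit quality.
final class JpegWriter {

    /** Writes {@code image} to {@code target} as a JPEG; quality ranges from 0.0 to 1.0. */
    static void write(BufferedImage image, File target, float quality) throws IOException {
        // Create the output directory (e.g. "img") if it does not exist yet
        File parent = target.getAbsoluteFile().getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Cannot create directory " + parent);
        }
        // A file-based ImageOutputStream does not truncate an existing file, so remove it first
        Files.deleteIfExists(target.toPath());

        // JPEG has no alpha channel: copy the image onto an opaque RGB canvas with a white background
        BufferedImage rgb = new BufferedImage(image.getWidth(), image.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.drawImage(image, 0, 0, Color.WHITE, null);
        } finally {
            g.dispose();
        }

        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(target)) {
            if (out == null) {
                throw new IOException("Cannot open " + target + " for writing");
            }
            // Switch to explicit compression so the requested quality is honoured
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
    }
}
```

Inside the page loop, a call such as `JpegWriter.write(tag, new File("img", i + ".jpg"), 1f);` could stand in for the `JPEGCodec` lines, assuming the helper is on the classpath.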

However, pdf-renderer has its own problem: when the PDF version is not 1.4, it fails immediately with the error: Expected 'xref' at start of table

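Compared with pdfrenderer, pdfbox is much slower at conversion: a PDF of about 200 pages takes roughly 6 times as long. pdfbox also has some problems with Chinese font support.

On the other hand, pdfbox does not fail to convert a file because of its PDF version.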

pdfrenderer cannot convert PDFs with a version above 1.4. I searched for a solution but did not find one.
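
Given that trade-off, one option is to check the version declared in the file header before choosing a library. The sketch below reads the `%PDF-x.y` marker at the start of the file. `PdfVersionSniffer` and its methods are hypothetical names used only for illustration, and the "1.4 or lower" rule is taken from the observation above rather than from pdf-renderer's documentation. One possible explanation for the error, not confirmed by the post, is that PDF 1.5 introduced cross-reference streams that can replace the classic `xref` table, so the header version is only a rough indicator.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original post): inspects the PDF header version.
final class PdfVersionSniffer {

    // The header looks like "%PDF-1.4" and normally sits at the very start of the file
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns the declared version such as "1.4", or null if no header marker is found. */
    static String parseHeaderVersion(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so binary bytes cannot break decoding
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        return m.find() ? m.group(1) + "." + m.group(2) : null;
    }

    /** Reads up to the first 1024 bytes of the file and parses the header version from them. */
    static String readHeaderVersion(File pdf) throws IOException {
        byte[] head = new byte[1024];
        int total = 0;
        try (InputStream in = new FileInputStream(pdf)) {
            int n;
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) > 0) {
                total += n;
            }
        }
        return parseHeaderVersion(Arrays.copyOf(head, total));
    }

    /** Assumption from the post: only files declaring version 1.4 or lower are attempted. */
    static boolean likelyOkForPdfRenderer(String version) {
        if (version == null) {
            return false;
        }
        int major = version.charAt(0) - '0';
        int minor = version.charAt(2) - '0';
        return major == 1 && minor <= 4;
    }
}
```

For example, a caller might use pdf-renderer when `likelyOkForPdfRenderer(readHeaderVersion(file))` is true and fall back to PDFBox otherwise, accepting the slower conversion for newer files.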