Java: Converting HTML to PDF, Getting the Page Count, and Merging PDFs

SoXiaTea

Article link: https://blog.csdn.net/q438944209/article/details/82920301

Our company recently needed to convert HTML to PDF, so I am writing down the approach and tools we used, both for others and for my own future reference.

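The first step is converting HTML to PDF. Several libraries can do this, including PD4ML, iText, and Flying Saucer.

Why PD4ML: it converts HTML to PDF quickly, tolerates malformed HTML well, and supports multiple Chinese fonts. I found it considerably more convenient than iText and Flying Saucer.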

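Dependencies: pd4ml.jar, fonts.jar, ss_css2.jar
If Chinese text comes out garbled, create a package named fonts under the src root, add a configuration file named pd4fonts.properties to it, and put the following line in that file: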

KaiTi_GB2312=SIMKAI.TTF
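Garbled output is often just a sign that the properties file never made it onto the classpath. The following is a minimal sketch of a sanity check using only the JDK. The class name `FontConfigCheck` and its `parse` helper are hypothetical, and the resource path `fonts/pd4fonts.properties` assumes the package layout described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not part of PD4ML: lists the font mappings found on the classpath.
class FontConfigCheck {
    // Parses pd4fonts.properties-style text, where each entry maps a font name to a TTF file name.
    static Properties parse(Reader reader) {
        Properties fonts = new Properties();
        try {
            fonts.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fonts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: the "fonts" package under the source root.
        String resource = "fonts/pd4fonts.properties";
        try (InputStream in = FontConfigCheck.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                System.out.println(resource + " is not on the classpath");
                return;
            }
            Properties fonts = parse(new InputStreamReader(in, StandardCharsets.ISO_8859_1));
            fonts.forEach((name, file) -> System.out.println(name + " -> " + file));
        }
    }
}
```

If the file is found, each mapping is printed as a font name followed by its TTF file name. If it is missing, the build is probably not copying the fonts package to the output directory.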

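import java.awt.Insets;
import java.io.File;
import java.io.FileOutputStream;

import org.zefer.pd4ml.PD4Constants;
import org.zefer.pd4ml.PD4ML;
import org.zefer.pd4ml.PD4PageMark;

public class LLL_HTMLToPDF {
	public static void main(String[] args) throws Exception {
		File pdfFile = new File("D:/pdf/index.pdf");
		htmltopdf(pdfFile, "D:/pdf/index.html");
	}

	private static void htmltopdf(File outputPDFFile, String inputHTMLFileName) throws Exception {
		// try-with-resources closes the output stream even if rendering fails
		try (FileOutputStream fos = new FileOutputStream(outputPDFFile)) {
			PD4ML pd4ml = new PD4ML();
			pd4ml.setPageInsets(new Insets(40, 30, 30, 40)); // top, left, bottom, right
			pd4ml.setHtmlWidth(960);
			PD4PageMark p = new PD4PageMark();
			pd4ml.setPageHeader(p);
			pd4ml.setPageSize(PD4Constants.A4);
			// use the fonts configured in the fonts package (pd4fonts.properties)
			pd4ml.useTTF("java:fonts", true);
			pd4ml.setDefaultTTFs("KaiTi_GB2312", "KaiTi_GB2312", "KaiTi_GB2312");
			pd4ml.enableDebugInfo();
			pd4ml.render("file:" + inputHTMLFileName, fos);
		}
	}
}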

After running this, the generated PDF file appears in the target directory.
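The listing builds the source URL by prefixing the path with "file:", which works for simple paths such as D:/pdf/index.html. For paths containing spaces or other special characters, one option is to let the JDK build the URL. This is a small sketch using only `java.io.File`. The class name `HtmlSource` is hypothetical, and I have not verified how every PD4ML version treats percent-encoded file URLs, so take it as a suggestion to test rather than a required change.

```java
import java.io.File;

// Hypothetical helper: turns a local path into a file: URL string.
class HtmlSource {
    // File.toURI() makes the path absolute and percent-encodes characters such as spaces.
    static String toFileUrl(String path) {
        return new File(path).toURI().toString();
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("D:/pdf/my page.html"));
    }
}
```

The resulting string could then be passed to `pd4ml.render(...)` in place of `"file:" + inputHTMLFileName`.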

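Next, getting the page count of a PDF. Apart from the basic libraries, the only extra jar needed is pdfbox-app-1.7.1.jar (use 1.7.1 specifically if you want the code below to work as written).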

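import java.io.File;
import java.io.FileInputStream;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;

public class LLL_getPDFpage {
	public static void main(String[] args) {
		File file = new File("D:/pdf/printer/mergedTest.pdf");
		COSDocument cosDoc = null;
		PDDocument pdDoc = null;
		try (FileInputStream in = new FileInputStream(file)) {
			PDFParser parser = new PDFParser(in);
			parser.parse();
			cosDoc = parser.getDocument();
			pdDoc = new PDDocument(cosDoc);
			// the catalog's page list has one entry per page
			System.out.println(pdDoc.getDocumentCatalog().getAllPages().size());
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// release the document whether or not parsing succeeded
			try {
				if (pdDoc != null) {
					pdDoc.close(); // also closes the underlying COSDocument
				} else if (cosDoc != null) {
					cosDoc.close();
				}
			} catch (Exception e1) {
				e1.printStackTrace();
			}
		}
	}
}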

This prints the number of pages in the PDF.

Finally, merging PDFs. Our project needs to combine multiple PDF files into a single PDF, and the same pdfbox jar handles this as well.

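import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.util.PDFMergerUtility;

public class LLL_MergePDF {
	public static void main(String[] args) throws Exception {
		PDFMergerUtility mergePdf = new PDFMergerUtility();
		String folder = "D:/pdf/";
		String destinationFileName = "mergedTest.pdf";
		String[] filesInFolder = getFiles(folder);
		for (int i = 0; i < filesInFolder.length; i++) {
			String name = filesInFolder[i];
			// skip non-PDF files (such as index.html) and the output of an earlier run
			if (!name.toLowerCase().endsWith(".pdf") || name.equals(destinationFileName)) {
				continue;
			}
			mergePdf.addSource(new File(folder, name).getPath());
		}
		mergePdf.setDestinationFileName(new File(folder, destinationFileName).getPath());
		mergePdf.mergeDocuments();
		System.out.print("Merge complete__LLL丶禾羊__blog");
	}

	private static String[] getFiles(String folder) throws IOException {
		File _folder = new File(folder);
		if (!_folder.isDirectory()) {
			throw new IOException("Path is not a directory");
		}
		String[] filesInFolder = _folder.list();
		if (filesInFolder == null) { // list() returns null on an I/O error
			throw new IOException("Could not list directory: " + folder);
		}
		return filesInFolder;
	}
}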
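One thing to watch with the merge: `File.list()` makes no promise about the order of the names it returns, and the merged PDF follows the order in which sources are added. If page order matters, the names can be sorted first. Below is a sketch of a natural-order sort (so that 2.pdf comes before 10.pdf) using only the JDK. The class `MergeOrder` and its members are hypothetical helpers for illustration, not part of pdfbox.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of pdfbox: orders file names so that 2.pdf precedes 10.pdf.
class MergeOrder {
    // A chunk is either a run of ASCII digits or a run of anything else.
    private static final Pattern CHUNK = Pattern.compile("[0-9]+|[^0-9]+");

    static final Comparator<String> NATURAL = MergeOrder::compareNatural;

    private static boolean isNumber(String chunk) {
        char c = chunk.charAt(0);
        return c >= '0' && c <= '9';
    }

    static int compareNatural(String a, String b) {
        Matcher ma = CHUNK.matcher(a);
        Matcher mb = CHUNK.matcher(b);
        while (true) {
            boolean hasA = ma.find();
            boolean hasB = mb.find();
            if (!hasA || !hasB) {
                if (hasA != hasB) {
                    return hasA ? 1 : -1; // the name with fewer chunks sorts first
                }
                return a.compareTo(b); // all chunks equal: fall back to plain string order
            }
            String ca = ma.group();
            String cb = mb.group();
            int c = isNumber(ca) && isNumber(cb)
                    ? new BigInteger(ca).compareTo(new BigInteger(cb)) // digit runs compare by value
                    : ca.compareToIgnoreCase(cb);                      // everything else ignores case
            if (c != 0) {
                return c;
            }
        }
    }

    // Returns a sorted copy and leaves the input array untouched.
    static String[] sorted(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy, NATURAL);
        return copy;
    }

    public static void main(String[] args) {
        String[] listing = {"10.pdf", "2.pdf", "1.pdf"};
        System.out.println(Arrays.toString(sorted(listing))); // prints [1.pdf, 2.pdf, 10.pdf]
    }
}
```

In the merge program, the array returned by `getFiles` could be passed through `MergeOrder.sorted(...)` before the loop that calls `addSource`.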

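That covers a few PDF operations in Java: converting HTML to PDF, getting a PDF's page count, and merging multiple PDFs into one. If this helped you, please share it with others. Thanks!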