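What if you don't want the articles on your web pages to be copied (yes, I mean a certain site), or you want documents to be previewable online without downloading them (common on paid document-download sites and for email attachment previews)? The usual approach is to convert them into images. The code below uses aspose-words (to convert txt and Word files to images) and PDFBox (to convert PDF files to images), wrapped in a utility class that converts txt, Word, and PDF files to images.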
First, add the following two dependencies to your project's pom file:
<dependency>
    <groupId>com.luhuiguo</groupId>
    <artifactId>aspose-words</artifactId>
    <version>23.1</version>
</dependency>
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.4</version>
</dependency>
I. Convert files to images and save them locally
1. Convert a Word file to images
public static void wordToImage(String wordPath, String imagePath) throws Exception {
    Document doc = new Document(wordPath);
    File file = new File(wordPath);
    String filename = file.getName();
    // Output prefix: <imagePath>/<file name without extension>
    String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
    for (int i = 0; i < doc.getPageCount(); i++) {
        // Saving as PNG produces a single image, so extract each page into its own document first
        Document extractedPage = doc.extractPages(i, 1);
        String path = pathPre + (i + 1) + ".png";
        extractedPage.save(path, SaveFormat.PNG);
    }
}
Test:
public static void main(String[] args) throws Exception {
    FileConvertUtil.wordToImage("D:\\书籍\\电子书\\其它\\《山海经》异兽图.doc", "D:\\test\\word");
}
Result: [screenshot of the generated images]
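One detail in the naming scheme above: pages are saved as `name1.png`, `name2.png`, ..., `name10.png`, which a plain string sort orders as 1, 10, 11, ..., 2. Also, `filename.lastIndexOf(".")` returns -1 for a file with no extension, and `substring(0, -1)` then throws. The sketch below is a hypothetical helper (`PageImageNames.pagePath`, not part of the original utility) that zero-pads the page number to the width of the page count and tolerates a missing extension; it uses only the JDK.

```java
import java.io.File;

// Hypothetical helper (not part of the original FileConvertUtil): builds the
// output path for one page image, zero-padding the page number so that a
// plain string sort keeps pages in order (name01.png ... name12.png).
class PageImageNames {

    static String pagePath(String imageDir, String sourcePath, int pageIndex, int pageCount) {
        String name = new File(sourcePath).getName();
        int dot = name.lastIndexOf('.');
        // Keep the whole name when there is no extension instead of throwing.
        String base = dot > 0 ? name.substring(0, dot) : name;
        // Pad to the number of digits in the largest page number.
        int width = String.valueOf(pageCount).length();
        String pageNo = String.format("%0" + width + "d", pageIndex + 1);
        return imageDir + File.separator + base + pageNo + ".png";
    }

    public static void main(String[] args) {
        System.out.println(pagePath("out", "docs/report.doc", 0, 12));
    }
}
```

In `wordToImage`, `pathPre + (i + 1) + ".png"` could then become `pagePath(imagePath, wordPath, i, doc.getPageCount())`.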
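2. Convert a txt file to images (same approach as Word)
public static void txtToImage(String txtPath, String imagePath) throws Exception {
    // Aspose.Words opens .txt files as documents, so the Word conversion is reused
    wordToImage(txtPath, imagePath);
}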
Test:
public static void main(String[] args) throws Exception {
    // Hypothetical sample path; any .txt file works
    FileConvertUtil.txtToImage("D:\\test\\sample.txt", "D:\\test\\txt");
}
Result: [screenshot of the generated images]
3. Convert a PDF file to images
public static void pdfToImage(String pdfPath, String imagePath) throws Exception {
    File file = new File(pdfPath);
    String filename = file.getName();
    String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);
        for (int i = 0; i < doc.getNumberOfPages(); i++) {
            // 144 DPI = 2x PDF's 72 points per inch (Windows' default screen DPI is 96)
            BufferedImage image = renderer.renderImageWithDPI(i, 144);
            String pathname = pathPre + (i + 1) + ".png";
            ImageIO.write(image, "PNG", new File(pathname));
        }
    }
}
Test:
public static void main(String[] args) throws Exception {
    FileConvertUtil.pdfToImage("D:\\书籍\\电子书\\其它\\自然哲学的数学原理.pdf", "D:\\test\\pdf");
}
Result: [screenshot of the generated images]
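On the `144` passed to `renderImageWithDPI`: PDF page sizes are measured in points, 72 per inch, so rendering at a given DPI scales each side by `dpi / 72`. At 144 DPI each side doubles and the pixel count quadruples compared with 72 DPI. The hypothetical `DpiMath` class below works this through for a US Letter page (612 × 792 pt) and estimates the uncompressed in-memory size, assuming 4 bytes per pixel; PDFBox's own rounding of the pixel size may differ by a pixel.

```java
// Sketch of the arithmetic behind renderImageWithDPI(page, dpi): PDF sizes are
// in points (1/72 inch), so the raster is scaled by dpi / 72.
class DpiMath {

    // Approximate pixel length for a length given in PDF points.
    static int pointsToPixels(double points, float dpi) {
        return (int) Math.floor(points * dpi / 72.0);
    }

    // Approximate size of an uncompressed raster at 4 bytes per pixel (e.g. int-packed RGB).
    static long rasterBytes(int widthPx, int heightPx) {
        return (long) widthPx * heightPx * 4;
    }

    public static void main(String[] args) {
        int w = pointsToPixels(612, 144); // US Letter width in points
        int h = pointsToPixels(792, 144); // US Letter height in points
        // prints "1224 x 1584 px, ~7 MiB in memory"
        System.out.println(w + " x " + h + " px, ~" + rasterBytes(w, h) / (1024 * 1024) + " MiB in memory");
    }
}
```

For a Letter page that is 1224 × 1584 px, roughly 7.4 MiB per uncompressed page, which matters once many pages are rendered at the same time.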
4. Support several file types in one method
public static void fileToImage(String sourceFilePath, String imagePath) throws Exception {
    // Lower-case the extension so ".PDF" matches; treat a missing extension as unsupported
    int dot = sourceFilePath.lastIndexOf(".");
    String ext = dot < 0 ? "" : sourceFilePath.substring(dot).toLowerCase();
    switch (ext) {
        case ".doc":
        case ".docx":
            wordToImage(sourceFilePath, imagePath);
            break;
        case ".pdf":
            pdfToImage(sourceFilePath, imagePath);
            break;
        case ".txt":
            txtToImage(sourceFilePath, imagePath);
            break;
        default:
            System.out.println("Unsupported file format");
    }
}
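A plain string `switch` on the extension is easy to get subtly wrong: without lower-casing, `Book.PDF` falls into the unsupported branch, and a path with no dot makes `substring` throw. A minimal JDK-only sketch of a more explicit lookup follows; the `SourceType` enum is a hypothetical name, not something from the original code.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical alternative to the switch in fileToImage: maps a path to a
// supported source type, case-insensitively, without throwing on names that
// have no extension.
enum SourceType {
    WORD, PDF, TXT;

    static Optional<SourceType> of(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        switch (path.substring(dot).toLowerCase(Locale.ROOT)) {
            case ".doc":
            case ".docx":
                return Optional.of(WORD);
            case ".pdf":
                return Optional.of(PDF);
            case ".txt":
                return Optional.of(TXT);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String p : new String[] {"a/Book.PDF", "notes.txt", "README"}) {
            System.out.println(p + " -> " + of(p).map(Enum::name).orElse("unsupported"));
        }
    }
}
```

The dispatcher can then switch on the enum and throw an `IllegalArgumentException` when the `Optional` is empty, so callers find out that nothing was converted.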
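II. Use multiple threads to speed up the conversion
When converting Newton's 669-page masterpiece, Mathematical Principles of Natural Philosophy, I found that it ran slowly: the conversion took 140,281 ms. Each page is rendered, PNG-encoded, and written independently of the others, so the work can be spread across several threads, and I wrote a multithreaded version.
Single-threaded export time for 自然哲学的数学原理.pdf: [screenshot]
The optimized code: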
public static void pdfToImageAsync(String pdfPath, String imagePath) throws Exception {
    long old = System.currentTimeMillis();
    File file = new File(pdfPath);
    PDDocument doc = PDDocument.load(file);
    PDFRenderer renderer = new PDFRenderer(doc);
    int pageCount = doc.getNumberOfPages();
    int numCores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(numCores);
    // Caveat: PDFBox advises against using one PDDocument from several threads;
    // if some files fail here, give each worker its own PDDocument instead
    for (int i = 0; i < pageCount; i++) {
        int finalI = i;
        executorService.submit(() -> {
            try {
                // 144 DPI = 2x PDF's 72 points per inch
                BufferedImage image = renderer.renderImageWithDPI(finalI, 144);
                String filename = file.getName();
                filename = filename.substring(0, filename.lastIndexOf("."));
                String pathname = imagePath + File.separator + filename + (finalI + 1) + ".png";
                ImageIO.write(image, "PNG", new File(pathname));
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        });
    }
    // Stop accepting tasks and wait for every page before closing the document
    executorService.shutdown();
    executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    doc.close();
    long now = System.currentTimeMillis();
    System.out.println("pdfToImage multithreaded conversion finished, time: " + (now - old) + "ms");
}
Multithreaded export time for 自然哲学的数学原理.pdf: [screenshot]
This run took only 24,045 ms, roughly one-sixth of the original time, a large improvement. Word and txt conversion can be parallelized the same way:
//Convert a Word file to images (multithreaded)
public static void wordToImageAsync(String wordPath, String imagePath) throws Exception {
    Document doc = new Document(wordPath);
    File file = new File(wordPath);
    String filename = file.getName();
    String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
    int numCores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(numCores);
    // Caveat: all tasks share one Aspose Document; if that proves unsafe, load a copy per worker
    for (int i = 0; i < doc.getPageCount(); i++) {
        int finalI = i;
        executorService.submit(() -> {
            try {
                Document extractedPage = doc.extractPages(finalI, 1);
                String path = pathPre + (finalI + 1) + ".png";
                extractedPage.save(path, SaveFormat.PNG);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        });
    }
    // Without shutdown() the pool's threads never exit, and without awaitTermination()
    // the method returns before the images have been written
    executorService.shutdown();
    executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}
//Convert a txt file to images (multithreaded)
public static void txtToImageAsync(String txtPath, String imagePath) throws Exception {
    wordToImageAsync(txtPath, imagePath);
}
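Two things the multithreaded versions above leave open are whether the shared document object may be used from several threads, and how a failed page is reported (`printStackTrace` only logs it, and the method still returns normally). The PDFBox FAQ advises against using the same `PDDocument` from more than one thread; this sketch assumes the same caution applies to an Aspose `Document`. A common workaround is one task per worker, where each task opens its own copy of the document and handles a strided subset of pages. The sketch below shows only the scheduling part, with the JDK alone: `renderPage` is a hypothetical callback standing in for "render page i and write it out", `invokeAll` waits for every task, and `Future.get()` rethrows a worker's failure as an `ExecutionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Sketch: worker w handles pages w, w + workers, w + 2 * workers, ...
// In a real converter each worker would open its own PDDocument / Document
// before its loop. renderPage is a hypothetical stand-in for the real work.
class StridedPages {

    static void renderAll(int pageCount, int workers, IntConsumer renderPage) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int first = w;
                tasks.add(() -> {
                    // Per-worker setup (e.g. loading a private document copy) would go here.
                    for (int page = first; page < pageCount; page += workers) {
                        renderPage.accept(page);
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has completed, normally or not.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows a worker's exception wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Set<Integer> done = ConcurrentHashMap.newKeySet();
        renderAll(669, Runtime.getRuntime().availableProcessors(), done::add);
        System.out.println("pages rendered: " + done.size()); // prints "pages rendered: 669"
    }
}
```

Opening one document per worker costs extra memory and load time, which is one more reason to cap the worker count at the number of cores as the original code does.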
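III. Convert files to image streams
Sometimes we don't need the images on local disk at all; instead we need to return them to the caller or upload them to an image server. In that case it is easier to return each converted page as an in-memory byte array, ready for transfer. Examples:
1. Convert a Word file to an image stream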
public static List<byte[]> wordToImageStream(String wordPath) throws Exception {
    Document doc = new Document(wordPath);
    List<byte[]> list = new ArrayList<>();
    for (int i = 0; i < doc.getPageCount(); i++) {
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            Document extractedPage = doc.extractPages(i, 1);
            extractedPage.save(outputStream, SaveFormat.PNG);
            list.add(outputStream.toByteArray());
        }
    }
    return list;
}
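Because the stream methods return bare `byte[]` values, a cheap check before uploading is whether each entry actually starts like a PNG. Every PNG file begins with the same 8-byte signature (`89 50 4E 47 0D 0A 1A 0A`), so a hypothetical JDK-only helper such as `PngBytes.looksLikePng` can test for it. Matching the signature is necessary but not sufficient for a valid image.

```java
import java.util.Arrays;

// Hypothetical helper: checks whether a byte[] (e.g. one entry returned by
// wordToImageStream) starts with the fixed 8-byte PNG file signature.
class PngBytes {

    // 0x89 'P' 'N' 'G' CR LF 0x1A LF
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    static boolean looksLikePng(byte[] data) {
        if (data == null || data.length < PNG_SIGNATURE.length) {
            return false;
        }
        // Compare only the first 8 bytes with the signature.
        return Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    public static void main(String[] args) {
        byte[] fake = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00};
        System.out.println("looks like PNG: " + looksLikePng(fake));
    }
}
```

For example, `list.stream().allMatch(PngBytes::looksLikePng)` checks a whole result list.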
2. Convert a txt file to an image stream
public static List<byte[]> txtToImageStream(String txtPath) throws Exception {
    return wordToImageStream(txtPath);
}
3. Convert a PDF file to an image stream
public static List<byte[]> pdfToImageStream(String pdfPath) throws Exception {
    File file = new File(pdfPath);
    List<byte[]> list = new ArrayList<>();
    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);
        for (int i = 0; i < doc.getNumberOfPages(); i++) {
            try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                // 144 DPI = 2x PDF's 72 points per inch
                BufferedImage image = renderer.renderImageWithDPI(i, 144);
                ImageIO.write(image, "PNG", outputStream);
                list.add(outputStream.toByteArray());
            }
        }
    }
    return list;
}
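The core of `pdfToImageStream` is encoding a `BufferedImage` into an in-memory PNG with `ImageIO.write(image, "PNG", outputStream)`. The sketch below runs the same step on a synthetic image instead of a rendered page, so it needs only the JDK, and decodes the bytes again with `ImageIO.read` to show they form a complete PNG. It also checks the `boolean` returned by `ImageIO.write`, which is `false` when no writer for the format is available; the code above ignores it. `PngRoundTrip` is a hypothetical name.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Sketch of the encode step inside pdfToImageStream, using a synthetic image
// in place of a rendered PDF page so it runs with the JDK alone.
class PngRoundTrip {

    static byte[] toPng(BufferedImage image) throws IOException {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            // ImageIO.write returns false if no writer exists for the format.
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 100);
        g.dispose();

        byte[] png = toPng(page);
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(png));
        System.out.println(png.length + " bytes, decoded as " + decoded.getWidth() + "x" + decoded.getHeight());
    }
}
```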
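4. Convert several file types to image streams
public static List<byte[]> fileToImageStream(String sourceFilePath) throws Exception {
    // Lower-case the extension so ".PDF" matches; treat a missing extension as unsupported
    int dot = sourceFilePath.lastIndexOf(".");
    String ext = dot < 0 ? "" : sourceFilePath.substring(dot).toLowerCase();
    switch (ext) {
        case ".doc":
        case ".docx":
            return wordToImageStream(sourceFilePath);
        case ".pdf":
            return pdfToImageStream(sourceFilePath);
        case ".txt":
            return txtToImageStream(sourceFilePath);
        default:
            System.out.println("Unsupported file format");
    }
    return null;
}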
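If the images are returned to a browser rather than uploaded, one option is to send each page as a data URI that can go straight into an `<img src="...">` attribute. This is an assumption about how the results might be consumed, not part of the original utility, and `DataUris.toDataUris` is a hypothetical helper. Base64 inflates the payload by about a third, so for long documents uploading the bytes to an image server and returning URLs is usually lighter.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

// Sketch: turn the List<byte[]> returned by fileToImageStream into data URIs
// that a front end could use directly as <img> sources.
class DataUris {

    static List<String> toDataUris(List<byte[]> pngPages) {
        List<String> uris = new ArrayList<>(pngPages.size());
        Base64.Encoder encoder = Base64.getEncoder();
        for (byte[] page : pngPages) {
            uris.add("data:image/png;base64," + encoder.encodeToString(page));
        }
        return uris;
    }

    public static void main(String[] args) {
        List<byte[]> pages = Collections.singletonList(new byte[] {1, 2, 3});
        System.out.println(toDataUris(pages).get(0)); // prints "data:image/png;base64,AQID"
    }
}
```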
Finally, here is the complete utility class:
package com.fhey.service.common.utils.file;

import com.aspose.words.Document;
import com.aspose.words.SaveFormat;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileConvertUtil {
    // Convert a file to images, choosing the converter by extension
    public static void fileToImage(String sourceFilePath, String imagePath) throws Exception {
        int dot = sourceFilePath.lastIndexOf(".");
        String ext = dot < 0 ? "" : sourceFilePath.substring(dot).toLowerCase();
        switch (ext) {
            case ".doc":
            case ".docx":
                wordToImage(sourceFilePath, imagePath);
                break;
            case ".pdf":
                pdfToImage(sourceFilePath, imagePath);
                break;
            case ".txt":
                txtToImage(sourceFilePath, imagePath);
                break;
            default:
                System.out.println("Unsupported file format");
        }
    }
    // Convert a PDF file to images
    public static void pdfToImage(String pdfPath, String imagePath) throws Exception {
        File file = new File(pdfPath);
        String filename = file.getName();
        String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
        try (PDDocument doc = PDDocument.load(file)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                BufferedImage image = renderer.renderImageWithDPI(i, 144); // 2x PDF's 72 points per inch
                String pathname = pathPre + (i + 1) + ".png";
                ImageIO.write(image, "PNG", new File(pathname));
            }
        }
    }
    // Convert a txt file to images (Aspose.Words opens .txt files as documents)
    public static void txtToImage(String txtPath, String imagePath) throws Exception {
        wordToImage(txtPath, imagePath);
    }
    // Convert a Word file to images, one PNG per page
    public static void wordToImage(String wordPath, String imagePath) throws Exception {
        Document doc = new Document(wordPath);
        File file = new File(wordPath);
        String filename = file.getName();
        String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
        for (int i = 0; i < doc.getPageCount(); i++) {
            Document extractedPage = doc.extractPages(i, 1);
            String path = pathPre + (i + 1) + ".png";
            extractedPage.save(path, SaveFormat.PNG);
        }
    }
    // Convert a PDF file to images (multithreaded)
    public static void pdfToImageAsync(String pdfPath, String imagePath) throws Exception {
        long old = System.currentTimeMillis();
        File file = new File(pdfPath);
        PDDocument doc = PDDocument.load(file);
        PDFRenderer renderer = new PDFRenderer(doc);
        int pageCount = doc.getNumberOfPages();
        int numCores = Runtime.getRuntime().availableProcessors();
        ExecutorService executorService = Executors.newFixedThreadPool(numCores);
        // Caveat: PDFBox advises against using one PDDocument from several threads
        for (int i = 0; i < pageCount; i++) {
            int finalI = i;
            executorService.submit(() -> {
                try {
                    BufferedImage image = renderer.renderImageWithDPI(finalI, 144); // 2x PDF's 72 points per inch
                    String filename = file.getName();
                    filename = filename.substring(0, filename.lastIndexOf("."));
                    String pathname = imagePath + File.separator + filename + (finalI + 1) + ".png";
                    ImageIO.write(image, "PNG", new File(pathname));
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        doc.close();
        long now = System.currentTimeMillis();
        System.out.println("pdfToImage multithreaded conversion finished, time: " + (now - old) + "ms");
    }
    // Convert a Word file to images (multithreaded)
    public static void wordToImageAsync(String wordPath, String imagePath) throws Exception {
        Document doc = new Document(wordPath);
        File file = new File(wordPath);
        String filename = file.getName();
        String pathPre = imagePath + File.separator + filename.substring(0, filename.lastIndexOf("."));
        int numCores = Runtime.getRuntime().availableProcessors();
        ExecutorService executorService = Executors.newFixedThreadPool(numCores);
        for (int i = 0; i < doc.getPageCount(); i++) {
            int finalI = i;
            executorService.submit(() -> {
                try {
                    Document extractedPage = doc.extractPages(finalI, 1);
                    String path = pathPre + (finalI + 1) + ".png";
                    extractedPage.save(path, SaveFormat.PNG);
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            });
        }
        // Shut the pool down and wait, otherwise its threads never exit and
        // the method returns before the images are written
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }
    // Convert a txt file to images (multithreaded)
    public static void txtToImageAsync(String txtPath, String imagePath) throws Exception {
        wordToImageAsync(txtPath, imagePath);
    }
    // Convert a file to a list of PNG byte arrays, choosing the converter by extension
    public static List<byte[]> fileToImageStream(String sourceFilePath) throws Exception {
        int dot = sourceFilePath.lastIndexOf(".");
        String ext = dot < 0 ? "" : sourceFilePath.substring(dot).toLowerCase();
        switch (ext) {
            case ".doc":
            case ".docx":
                return wordToImageStream(sourceFilePath);
            case ".pdf":
                return pdfToImageStream(sourceFilePath);
            case ".txt":
                return txtToImageStream(sourceFilePath);
            default:
                System.out.println("Unsupported file format");
        }
        return null;
    }
    // Convert a PDF file to a list of PNG byte arrays, one per page
    public static List<byte[]> pdfToImageStream(String pdfPath) throws Exception {
        File file = new File(pdfPath);
        List<byte[]> list = new ArrayList<>();
        try (PDDocument doc = PDDocument.load(file)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                    BufferedImage image = renderer.renderImageWithDPI(i, 144); // 2x PDF's 72 points per inch
                    ImageIO.write(image, "PNG", outputStream);
                    list.add(outputStream.toByteArray());
                }
            }
        }
        return list;
    }
    // Convert a Word file to a list of PNG byte arrays, one per page
    public static List<byte[]> wordToImageStream(String wordPath) throws Exception {
        Document doc = new Document(wordPath);
        List<byte[]> list = new ArrayList<>();
        for (int i = 0; i < doc.getPageCount(); i++) {
            try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                Document extractedPage = doc.extractPages(i, 1);
                extractedPage.save(outputStream, SaveFormat.PNG);
                list.add(outputStream.toByteArray());
            }
        }
        return list;
    }
    // Convert a txt file to a list of PNG byte arrays
    public static List<byte[]> txtToImageStream(String txtPath) throws Exception {
        return wordToImageStream(txtPath);
    }
}