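import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import org.apache.poi.xwpf.usermodel.Document;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPicture;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;
// MockMultipartFile ships with spring-test (org.springframework:spring-test)
import org.springframework.mock.web.MockMultipartFile;

public class WordParser {
    public static void main(String[] args) throws Exception {
        // Path of the Word file to parse
        String filePath = "path/to/your/file.docx";
        // try-with-resources closes the document and the stream even if parsing throws
        try (InputStream fis = new FileInputStream(filePath);
             XWPFDocument doc = new XWPFDocument(fis)) {
            for (XWPFParagraph paragraph : doc.getParagraphs()) {
                // Paragraph text (empty if the paragraph has no text); not used further here
                String t = paragraph.getParagraphText();
                for (XWPFRun run : paragraph.getRuns()) {
                    List<XWPFPicture> embeddedPictures = run.getEmbeddedPictures();
                    // A run may contain several pictures; an empty list simply skips the loop
                    for (XWPFPicture pic : embeddedPictures) {
                        XWPFPictureData data = pic.getPictureData();
                        int type = data.getPictureType();
                        byte[] img = data.getData();
                        String extension;
                        switch (type) {
                            case Document.PICTURE_TYPE_EMF:
                                extension = ".emf";
                                break;
                            case Document.PICTURE_TYPE_WMF:
                                extension = ".wmf";
                                break;
                            case Document.PICTURE_TYPE_PICT:
                                extension = ".pict";
                                break;
                            case Document.PICTURE_TYPE_PNG:
                                extension = ".png";
                                break;
                            case Document.PICTURE_TYPE_DIB:
                                extension = ".dib";
                                break;
                            case Document.PICTURE_TYPE_GIF:
                                extension = ".gif";
                                break;
                            default:
                                // JPEG and any type not listed above; for non-JPEG types this is only a guess
                                extension = ".jpg";
                                break;
                        }
                        // contentType must be a MIME type such as "image/png", not a file extension
                        String contentType = data.getPackagePart().getContentType();
                        // Convert the byte[] into a MultipartFile for uploading or further processing
                        MockMultipartFile multipartFile = new MockMultipartFile(
                                "fileName" + extension, "fileName" + extension, contentType, img);
                    }
                }
            }
        }
    }
}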
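The `switch` above maps POI's integer picture type to a file extension. Below is a minimal, dependency-free sketch of the same mapping that also returns a MIME type, which is what the third (`contentType`) argument of `MockMultipartFile` expects. The numeric values are assumed to mirror the `PICTURE_TYPE_*` constants of `org.apache.poi.xwpf.usermodel.Document` so the sketch compiles without POI; the class name `PictureTypes` and the `.bin` fallback are hypothetical choices, not part of the original code.

```java
import java.util.Map;

// Hypothetical helper. The numeric values are assumed to mirror the PICTURE_TYPE_* constants
// of org.apache.poi.xwpf.usermodel.Document; with POI available, reference those directly.
public class PictureTypes {
    static final int PICTURE_TYPE_EMF = 2;
    static final int PICTURE_TYPE_WMF = 3;
    static final int PICTURE_TYPE_PICT = 4;
    static final int PICTURE_TYPE_JPEG = 5;
    static final int PICTURE_TYPE_PNG = 6;
    static final int PICTURE_TYPE_DIB = 7;
    static final int PICTURE_TYPE_GIF = 8;

    // picture type -> {file extension, MIME type}
    private static final Map<Integer, String[]> TYPES = Map.of(
            PICTURE_TYPE_EMF, new String[] {".emf", "image/x-emf"},
            PICTURE_TYPE_WMF, new String[] {".wmf", "image/x-wmf"},
            PICTURE_TYPE_PICT, new String[] {".pict", "image/x-pict"},
            PICTURE_TYPE_JPEG, new String[] {".jpg", "image/jpeg"},
            PICTURE_TYPE_PNG, new String[] {".png", "image/png"},
            PICTURE_TYPE_DIB, new String[] {".dib", "image/bmp"},
            PICTURE_TYPE_GIF, new String[] {".gif", "image/gif"});

    // Extension for a picture type; unknown types get ".bin" rather than a guessed ".jpg".
    static String extension(int pictureType) {
        String[] entry = TYPES.get(pictureType);
        return entry != null ? entry[0] : ".bin";
    }

    // MIME type suitable for MockMultipartFile's contentType argument.
    static String mimeType(int pictureType) {
        String[] entry = TYPES.get(pictureType);
        return entry != null ? entry[1] : "application/octet-stream";
    }

    public static void main(String[] args) {
        int type = PICTURE_TYPE_PNG;
        System.out.println("fileName" + extension(type) + " -> " + mimeType(type));
    }
}
```

Inside the POI loop this would be called as `PictureTypes.mimeType(pic.getPictureData().getPictureType())`. POI's `XWPFPictureData.suggestFileExtension()` (which returns the extension without a leading dot) may make a hand-written table unnecessary.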
Parsing Word files
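This article presents a Java program that uses the XWPFDocument class from the Apache POI library to parse a Word document, extract the embedded images, and convert them into MultipartFile objects for uploading or further processing.
Summary generated by CSDN using AI.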
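Because a `.docx` file is a ZIP package, its embedded images can also be pulled out without POI. This is an alternative technique, not the article's method: a minimal sketch using only `java.util.zip`, assuming (as Word normally does) that images are stored as entries under `word/media/`. The class and method names are hypothetical, and `sampleDocx()` builds a fake in-memory document only so the demo runs without a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxMediaExtractor {
    // Assumption: Word-generated files store images under this folder; other producers may differ.
    static final String MEDIA_PREFIX = "word/media/";

    // Returns entry name -> bytes for every file under word/media/, in archive order.
    // IOExceptions are rethrown unchecked to keep call sites short.
    static Map<String, byte[]> extractMedia(InputStream docx) {
        Map<String, byte[]> media = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().startsWith(MEDIA_PREFIX)) {
                    // readAllBytes() stops at the end of the current ZIP entry
                    media.put(entry.getName(), zip.readAllBytes());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return media;
    }

    // Hypothetical demo data: one XML part plus one 4-byte "image" entry.
    static byte[] sampleDocx() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            out.putNextEntry(new ZipEntry("word/document.xml"));
            out.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("word/media/image1.png"));
            out.write(new byte[] {(byte) 0x89, 'P', 'N', 'G'});
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The archive is complete only after the ZipOutputStream is closed
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> media = extractMedia(new ByteArrayInputStream(sampleDocx()));
        media.forEach((name, bytes) -> System.out.println(name + " (" + bytes.length + " bytes)"));
    }
}
```

For a real document, pass `new FileInputStream(filePath)` instead of the sample bytes; each byte array could then be wrapped in a `MockMultipartFile` as in the POI version. Unlike the POI loop, this approach does not reveal which paragraph a picture belongs to, and it may also return images used only in headers or footers.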