1: Cause of the error
The file is downloaded from MinIO as a byte stream, and that stream is passed to PDDocument.load(in), which throws:
Caused by: java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1119) ~[pdfbox-2.0.8.jar:2.0.8]
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2442) ~[pdfbox-2.0.8.jar:2.0.8]
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2425) ~[pdfbox-2.0.8.jar:2.0.8]
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:233) ~[pdfbox-2.0.8.jar:2.0.8]
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1145) ~[pdfbox-2.0.8.jar:2.0.8]
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1042) ~[pdfbox-2.0.8.jar:2.0.8]
at com.es.canal.SysFileHandler.ocrPdf(SysFileHandler.java:173) ~[classes/:na]
... 9 common frames omitted
1.1 Environment dependencies
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.6.7</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<dependencies>
<!-- minio -->
<!-- When importing minio via Maven, exclude its okhttp dependency and add a newer okhttp version -->
<dependency>
<groupId>io.minio</groupId>
<artifactId>minio</artifactId>
<version>8.5.2</version>
<exclusions>
<exclusion>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>4.9.0</version>
</dependency>
<!-- PDF document processing -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.8</version>
</dependency>
<!-- ocr-->
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>4.1.1</version>
</dependency>
<dependency>
<groupId>top.javatool</groupId>
<artifactId>canal-spring-boot-starter</artifactId>
<version>1.2.1-RELEASE</version>
</dependency>
<!-- elasticsearch -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- IOUtils -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.8.0</version>
</dependency>
<!-- JSON parser and generator -->
<dependency>
<groupId>com.alibaba.fastjson2</groupId>
<artifactId>fastjson2</artifactId>
<version>2.0.7</version>
</dependency>
<!--web-->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-devtools</artifactId>
<scope>runtime</scope>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
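1.2 Example of the failing code
public void a() throws Exception {
    // Download the file from MinIO
    GetObjectResponse in = minioClient.getObject(
            GetObjectArgs.builder().bucket(sysFile.getBucketName()).object(sysFile.getObjectName()).build());
    String base64;
    // Reads the stream to its end, so nothing is left for PDFBox afterwards
    byte[] bytes = IOUtils.toByteArray(in);
    // Image parsing: this call fails with "End-of-File, expected line"
    PDDocument pdf = PDDocument.load(in);
    PDFRenderer renderer = new PDFRenderer(pdf);
}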
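The behaviour behind the failure can be reproduced without MinIO or PDFBox. The following is a minimal sketch using only the JDK: `StreamConsumedDemo`, `drain`, and the fake PDF content are hypothetical names and data invented for illustration, and `drain` merely stands in for a read-everything helper such as IOUtils.toByteArray. It shows that once a stream has been read to its end, a second reader gets nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamConsumedDemo {

    // Reads whatever is still unread in the stream (stand-in for a read-everything helper).
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up content standing in for the downloaded file.
        byte[] fakePdf = "%PDF-1.4 hypothetical content".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(fakePdf);

        byte[] first = drain(in);   // consumes the whole stream
        byte[] second = drain(in);  // the stream is already at its end

        System.out.println("first read:  " + first.length + " bytes");
        System.out.println("second read: " + second.length + " bytes");
    }
}
```

A parser handed the same stream after the first read is in the position of the second `drain` call: it cannot even find the header line.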
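2: Fix
Move IOUtils.toByteArray(in) below PDDocument.load(in), so that nothing reads from the stream before it is passed to PDDocument.load(in). Note that PDDocument.load(in) reads the stream itself, so a toByteArray(in) call placed after it should not be expected to return the file content. If the raw bytes are also needed, read them once and hand PDFBox the array through PDDocument.load(byte[]).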
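public void a() throws Exception {
    // Download the file from MinIO
    GetObjectResponse in = minioClient.getObject(
            GetObjectArgs.builder().bucket(sysFile.getBucketName()).object(sysFile.getObjectName()).build());
    String base64;
    // Image parsing: PDFBox now receives the untouched stream
    PDDocument pdf = PDDocument.load(in);
    PDFRenderer renderer = new PDFRenderer(pdf);
    // Caution: load(in) has already read the stream, so this likely yields no data
    byte[] bytes = IOUtils.toByteArray(in);
}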
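If both the raw bytes and a parsed document are needed, one alternative (not the fix used above) is to read the download exactly once into a byte array and give every consumer its own view of that array. The sketch below uses only the JDK; `BufferOnceDemo`, `readAll`, and the fake content are hypothetical and stand in for the MinIO response and the PDF parser. In the real method the buffered array could presumably be passed to PDDocument.load(byte[]), an overload that exists in PDFBox 2.0.x.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BufferOnceDemo {

    // Reads the stream to its end and returns everything that was read.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the MinIO download; the content is made up for the demo.
        byte[] fakePdf = "%PDF-1.4 hypothetical content".getBytes(StandardCharsets.US_ASCII);

        byte[] bytes;
        try (InputStream in = new ByteArrayInputStream(fakePdf)) {
            bytes = readAll(in); // the only read of the original stream
        }

        // Consumer 1: Base64 text built from the buffered bytes.
        String base64 = Base64.getEncoder().encodeToString(bytes);

        // Consumer 2: a fresh stream over the same bytes, e.g. for a parser.
        InputStream forParser = new ByteArrayInputStream(bytes);
        byte[] seenByParser = readAll(forParser);

        System.out.println("base64 length: " + base64.length());
        System.out.println("parser sees:   " + seenByParser.length + " bytes");
    }
}
```

Buffering costs memory proportional to the file size, so this suits documents that comfortably fit in the heap; the try-with-resources block also makes sure the original stream is closed.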
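Other notes
The cause is most likely not an unclosed stream. IOUtils.toByteArray(in) reads the stream to its end, so PDDocument.load(in) finds nothing left to read and fails while parsing the PDF header, which matches the "End-of-File, expected line" message raised from parseHeader in the stack trace.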