[Java] Cell misalignment consistently occurs when converting Word to PDF with pdf-gae
Problem description
Versions
JDK -> 1.8
fr.opensagres.poi.xwpf.converter.pdf-gae -> 2.0.2
POI -> 4.1.1
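When fr.opensagres.poi.xwpf.converter.pdf-gae converts Word to PDF, cells end up misaligned for certain table layouts.
More precisely: when a table contains vertically merged cells, and a horizontally split cell sits before them in a row other than the last row of the vertical merge, the table is misaligned after conversion to PDF.
Dependencies used: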
<properties>
    <poi.version>4.1.1</poi.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>ooxml-schemas</artifactId>
        <version>1.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>${poi.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>3.0.1</version>
    </dependency>
    <!-- Word to PDF -->
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.poi.xwpf.converter.pdf-gae</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>
Because of project constraints the latest version, 2.0.3, cannot be used, so I tried to fix the problem on top of 2.0.2.
Code:
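import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// PdfConverter and PdfOptions come from the pdf-gae converter artifact, and FileUtil is
// the project's file helper; their imports and the JUnit import are omitted here.
public class TestWordUtil {
    private static final String docFilePath = "C:\\Users\\test.docx";
    @Test
    public void test01() throws Exception {
        // try-with-resources closes both the input stream and the document
        try (InputStream is = new FileInputStream(docFilePath);
             XWPFDocument xwpfDocument = new XWPFDocument(is)) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            PdfOptions pdfOptions = PdfOptions.create();
            PdfConverter.getInstance().convert(xwpfDocument, bos, pdfOptions);
            FileUtil.writeBytes(bos.toByteArray(), "D:\\output.pdf");
        }
    }
}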
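Cause analysis
By adjusting the table layout and tracing the source code, I found the cause of the problem.
A simplified table is used for illustration:
The cell A2 was originally vertically merged with B2, but after conversion C1 from the next row has moved up into its place and C2 has disappeared.
This is because, when the source code parses A2, it should add the cell that is vertically merged with it to the container, but it fails to find that cell.
Source code: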
// fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor#visitTableRow
protected void visitTableRow( XWPFTableRow row, float[] colWidths, T tableContainer, boolean firstRow,
                              boolean lastRowIfNoneVMerge, int rowIndex, int rowsSize )
    throws Exception
{
    boolean headerRow = stylesDocument.isTableRowHeader( row );
    startVisitTableRow( row, tableContainer, rowIndex, headerRow );
    int nbColumns = colWidths.length;
    // Process cell
    boolean firstCol = true;
    boolean lastCol = false;
    boolean lastRow = false;
    List<XWPFTableCell> vMergedCells = null;
    List<XWPFTableCell> cells = row.getTableCells();
    XWPFTableCell firstCell = cells.get(0);
    // If the column count is greater than the cell count,
    // this row contains merged cells
    if ( nbColumns > cells.size() )
    {
        // Columns number is not equal to cells number.
        // POI have a bug with
        // <w:tr w:rsidR="00C55C20">
        // <w:tc>
        // <w:tc>...
        // <w:sdt>
        // <w:sdtContent>
        // <w:tc> <= this tc which is a XWPFTableCell is not included in the row.getTableCells();
        firstCol = true;
        int cellIndex = 0;
        CTRow ctRow = row.getCtRow();
        XmlCursor c = ctRow.newCursor();
        c.selectPath( "./*" );
        while ( c.toNextSelection() )
        {
            XmlObject o = c.getObject();
            if ( o instanceof CTTc )
            {
                CTTc tc = (CTTc) o;
                XWPFTableCell cell = row.getTableCell( tc );
                int nextCellIndex = getCellIndex( cellIndex, cell );
                lastCol = ( nextCellIndex == nbColumns );
                // Check here whether the cell is vertically merged;
                // if it is, the merged cells are put into the container together
                System.out.println("hasMerged1");
                vMergedCells = getVMergedCells( cell, rowIndex, cellIndex, nbColumns );
                if (vMergedCells != null) {
                    for (XWPFTableCell vMergedCell : vMergedCells) {
                        System.out.println(" vMerged>" + vMergedCell.getText());
                    }
                } else {
                    System.out.println(" >" + cell.getText());
                }
                if ( vMergedCells == null || vMergedCells.size() > 0 )
                {
                    lastRow = isLastRow( lastRowIfNoneVMerge, rowIndex, rowsSize, vMergedCells );
                    visitCell( cell, tableContainer, firstRow, lastRow, firstCol, lastCol, rowIndex, cellIndex,
                               vMergedCells );
                }
                cellIndex = nextCellIndex;
                firstCol = false;
            }
            else if ( o instanceof CTSdtCell )
            {
                // Fix bug of POI
                CTSdtCell sdtCell = (CTSdtCell) o;
                List<CTTc> tcList = sdtCell.getSdtContent().getTcList();
                for ( CTTc ctTc : tcList )
                {
                    XWPFTableCell cell = new XWPFTableCell( ctTc, row, row.getTable().getBody() );
                    int nextCellIndex = getCellIndex( cellIndex, cell );
                    lastCol = ( nextCellIndex == nbColumns );
                    List<XWPFTableCell> rowCells = row.getTableCells();
                    if ( !rowCells.contains( cell ) )
                    {
                        rowCells.add( cell );
                    }
                    // Check here whether the cell is vertically merged;
                    // if it is, the merged cells are put into the container together
                    System.out.println("hasMerged2");
                    vMergedCells = getVMergedCells( cell, rowIndex, cellIndex, nbColumns );
                    if (vMergedCells != null) {
                        for (XWPFTableCell vMergedCell : vMergedCells) {
                            System.out.println(" vMerged>" + vMergedCell.getText());
                        }
                    } else {
                        System.out.println(" >" + cell.getText());
                    }
                    if ( vMergedCells == null || vMergedCells.size() > 0 )
                    {
                        lastRow = isLastRow( lastRowIfNoneVMerge, rowIndex, rowsSize, vMergedCells );
                        visitCell( cell, tableContainer, firstRow, lastRow, firstCol, lastCol, rowIndex, cellIndex,
                                   vMergedCells );
                    }
                    cellIndex = nextCellIndex;
                    firstCol = false;
                }
            }
        }
        c.dispose();
    }
    else
    {
        // Column number is equal to cells number.
        for ( int i = 0; i < cells.size(); i++ )
        {
            lastCol = ( i == cells.size() - 1 );
            XWPFTableCell cell = cells.get( i );
            // Check here whether the cell is vertically merged;
            // if it is, the merged cells are put into the container together
            System.out.println("noMerged");
            vMergedCells = getVMergedCells( cell, rowIndex, i, nbColumns );
            if (vMergedCells != null) {
                for (XWPFTableCell vMergedCell : vMergedCells) {
                    System.out.println(" vMerged>" + vMergedCell.getText());
                }
            } else {
                System.out.println(" >" + cell.getText());
            }
            if ( vMergedCells == null || vMergedCells.size() > 0 )
            {
                lastRow = isLastRow( lastRowIfNoneVMerge, rowIndex, rowsSize, vMergedCells );
                visitCell( cell, tableContainer, firstRow, lastRow, firstCol, lastCol, rowIndex, i, vMergedCells );
            }
            firstCol = false;
        }
    }
    endVisitTableRow( row, tableContainer, firstRow, lastRow, headerRow );
}
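This code iterates over the cells of a table row, checks whether each one is merged, and groups merged cells together so that they can be rendered when they are later added to the PDF table.
The problem lies in the getVMergedCells method.
When the traversal reaches A2 in the first row, the method does recognize that the cell is merged, but it then fails to find the cell that is vertically merged with it.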
// fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor#getVMergedCells
private List<XWPFTableCell> getVMergedCells( XWPFTableCell cell, int rowIndex, int cellIndex, int nbColumns )
{
    List<XWPFTableCell> vMergedCells = null;
    STMerge.Enum vMerge = stylesDocument.getTableCellVMerge( cell );
    if ( vMerge != null )
    {
        if ( vMerge.equals( STMerge.RESTART ) )
        {
            // vMerge="restart"
            // Loop for each table cell of each row upon vMerge="restart" was found or cell without vMerge
            // was declared.
            vMergedCells = new ArrayList<XWPFTableCell>();
            vMergedCells.add( cell );
            XWPFTableRow row = null;
            XWPFTableCell c;
            XWPFTable table = cell.getTableRow().getTable();
            // The key code is here:
            // looking up the cell at the same cellIndex in the next row, to check whether
            // it belongs to the merge, returns the wrong result
            for ( int i = rowIndex + 1; i < table.getRows().size(); i++ )
            {
                row = table.getRow( i );
                c = row.getCell( cellIndex );
                if (c == null) {
                    break;
                }
                vMerge = stylesDocument.getTableCellVMerge( c );
                if ( vMerge != null && vMerge.equals( STMerge.CONTINUE ) )
                {
                    vMergedCells.add( c );
                }
                else
                {
                    return vMergedCells;
                }
            }
        }
        else
        {
            // vMerge="continue", ignore the cell because it was already processed
            return Collections.emptyList();
        }
    }
    return vMergedCells;
}
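The three possible results of getVMergedCells drive the check `vMergedCells == null || vMergedCells.size() > 0` in visitTableRow. A minimal sketch of that contract, with plain strings standing in for XWPFTableCell (the class and method names are illustrative and not part of the library):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class VMergeContractDemo {
    /** Mirrors the caller's check: visit when the result is null or non-empty, skip when it is empty. */
    static boolean shouldVisit(List<String> vMergedCells) {
        return vMergedCells == null || vMergedCells.size() > 0;
    }

    public static void main(String[] args) {
        // null: the cell has no vMerge, so it is visited as an ordinary cell
        System.out.println(shouldVisit(null));
        // non-empty: a vMerge="restart" cell together with its continuation cells
        System.out.println(shouldVisit(Arrays.asList("A2", "B2")));
        // empty: a vMerge="continue" cell, already covered by its restart cell
        System.out.println(shouldVisit(Collections.<String>emptyList()));
    }
}
```

If the continuation cells are not found, the restart cell still yields a non-empty list (containing only itself), so it is visited, but as a group of one.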
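The key is the for loop in getVMergedCells: in each following row it looks up the cell at the same index as the first vertically merged cell, checks whether that cell is also part of the merge, and if so adds it to the container.
The problem is that the row containing A1 has 3 cells and 3 columns, while the row containing B1 has only 2 cells but still 3 columns.
This is because, once A1 is split horizontally, B1 and the cells below it are in effect split as well and then merged back automatically (in the underlying XML, B1 spans two grid columns). This is how the document model behind XWPFDocument works.
So the B1 row effectively also has 3 cells, of which the second and third are merged with B1 and A2 respectively.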
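In the source code, the step c = row.getCell( cellIndex ); tries to fetch, from the B1 row, the cell at A2's index (2).
However, row.getTableCells() on the B1 row returns only two cells, so c == null, and B2, the cell that is vertically merged with A2, is never found.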
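The failed lookup can be reproduced without POI. The sketch below is only a model under assumptions: `Cell`, `getCell`, `collectVMerged` and `sampleTable` are illustrative names, `A1b` is a made-up label for the second half of the split cell, and the layout (row A with three cells, rows B and C with two cells each) is inferred from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VMergeLookupDemo {
    /** Stand-in for a table cell: its text and its vMerge state (null, "restart" or "continue"). */
    static final class Cell {
        final String text;
        final String vMerge;

        Cell(String text, String vMerge) {
            this.text = text;
            this.vMerge = vMerge;
        }
    }

    /** Like a list-index lookup on a row: null when the index is outside the row's cell list. */
    static Cell getCell(List<Cell> row, int index) {
        return (index >= 0 && index < row.size()) ? row.get(index) : null;
    }

    /** Same loop shape as the original method: probe the same index in every following row. */
    static List<String> collectVMerged(List<List<Cell>> table, int rowIndex, int cellIndex) {
        List<String> merged = new ArrayList<String>();
        merged.add(table.get(rowIndex).get(cellIndex).text);
        for (int i = rowIndex + 1; i < table.size(); i++) {
            Cell c = getCell(table.get(i), cellIndex);
            if (c == null || !"continue".equals(c.vMerge)) {
                break;
            }
            merged.add(c.text);
        }
        return merged;
    }

    /** Assumed layout: 3 grid columns; B1 and C1 each span two of them, so rows B and C list 2 cells. */
    static List<List<Cell>> sampleTable() {
        List<List<Cell>> table = new ArrayList<List<Cell>>();
        table.add(Arrays.asList(new Cell("A1", null), new Cell("A1b", null), new Cell("A2", "restart")));
        table.add(Arrays.asList(new Cell("B1", null), new Cell("B2", "continue")));
        table.add(Arrays.asList(new Cell("C1", null), new Cell("C2", null)));
        return table;
    }

    public static void main(String[] args) {
        // A2 sits at index 2 in row A, but row B only has indexes 0 and 1
        System.out.println(collectVMerged(sampleTable(), 0, 2)); // prints [A2]
    }
}
```

The merge group ends up containing only A2, which matches the symptom: B2 is never attached to A2.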
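Solution
When c == null, check whether the row contains merged cells, that is, whether the column count is greater than the cell count. If it does, try to fetch the cell at index ( cellIndex - ( column count - cell count ) ) in that row. In other words, compensate for the difference in order to reach the vertically merged cell.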
// Modified getVMergedCells source
private List<XWPFTableCell> getVMergedCells( XWPFTableCell cell, int rowIndex, int cellIndex, int nbColumns )
{
    List<XWPFTableCell> vMergedCells = null;
    STMerge.Enum vMerge = stylesDocument.getTableCellVMerge( cell );
    if ( vMerge != null )
    {
        if ( vMerge.equals( STMerge.RESTART ) )
        {
            // vMerge="restart"
            // Loop for each table cell of each row upon vMerge="restart" was found or cell without vMerge
            // was declared.
            vMergedCells = new ArrayList<XWPFTableCell>();
            vMergedCells.add( cell );
            XWPFTableRow row = null;
            XWPFTableCell c;
            XWPFTable table = cell.getTableRow().getTable();
            for ( int i = rowIndex + 1; i < table.getRows().size(); i++ )
            {
                row = table.getRow( i );
                c = row.getCell( cellIndex );
                if ( c == null )
                {
                    // The key change is here
                    if (nbColumns > row.getTableCells().size()) {
                        // Column count and cell count differ, so some cells in this row are merged
                        int tempCellIdx = cellIndex - (nbColumns - row.getTableCells().size());
                        c = row.getCell( tempCellIdx );
                    } else {
                        break;
                    }
                }
                if (c == null) {
                    break;
                }
                vMerge = stylesDocument.getTableCellVMerge( c );
                if ( vMerge != null && vMerge.equals( STMerge.CONTINUE ) )
                {
                    vMergedCells.add( c );
                }
                else
                {
                    return vMergedCells;
                }
            }
        }
        else
        {
            // vMerge="continue", ignore the cell because it was already processed
            return Collections.emptyList();
        }
    }
    return vMergedCells;
}
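The effect of the index compensation can be traced on a small model. This is a sketch under assumptions, not the library code: the names are illustrative, cells are plain objects, `A1b` is a made-up label, and the layout is the simplified table as inferred from the description (3 columns; row A has three cells, rows B and C have two).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompensatedLookupDemo {
    /** Stand-in for a table cell: its text and its vMerge state (null, "restart" or "continue"). */
    static final class Cell {
        final String text;
        final String vMerge;

        Cell(String text, String vMerge) {
            this.text = text;
            this.vMerge = vMerge;
        }
    }

    /** Null when the index is outside the row's cell list (including negative indexes). */
    static Cell getCell(List<Cell> row, int index) {
        return (index >= 0 && index < row.size()) ? row.get(index) : null;
    }

    /** Loop with the compensation step: retry with a shifted index when the direct lookup fails. */
    static List<String> collectVMerged(List<List<Cell>> table, int rowIndex, int cellIndex, int nbColumns) {
        List<String> merged = new ArrayList<String>();
        merged.add(table.get(rowIndex).get(cellIndex).text);
        for (int i = rowIndex + 1; i < table.size(); i++) {
            List<Cell> row = table.get(i);
            Cell c = getCell(row, cellIndex);
            if (c == null && nbColumns > row.size()) {
                // Shift left by the number of cells this row is short of the column count
                c = getCell(row, cellIndex - (nbColumns - row.size()));
            }
            if (c == null || !"continue".equals(c.vMerge)) {
                break;
            }
            merged.add(c.text);
        }
        return merged;
    }

    static List<List<Cell>> sampleTable() {
        List<List<Cell>> table = new ArrayList<List<Cell>>();
        table.add(Arrays.asList(new Cell("A1", null), new Cell("A1b", null), new Cell("A2", "restart")));
        table.add(Arrays.asList(new Cell("B1", null), new Cell("B2", "continue")));
        table.add(Arrays.asList(new Cell("C1", null), new Cell("C2", null)));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(collectVMerged(sampleTable(), 0, 2, 3)); // prints [A2, B2]
    }
}
```

For row B the direct lookup at index 2 returns null; since 3 > 2, the index is shifted to 2 - (3 - 2) = 1, which is B2. Row C is probed the same way and yields C2, which has no vMerge, so the loop stops there.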
Testing confirms that this fixes the cell misalignment problem.
Works perfectly.
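One thing to keep in mind: the compensated index assumes that every cell missing from the row's list was absorbed by a span to the left of the target column. A more general lookup could walk the row and accumulate each cell's span (the `w:gridSpan` value in the XML) until it reaches the target grid column. A sketch of that idea on plain integers, where `cellIndexAtGridColumn` is a hypothetical helper that is not part of POI or xdocreport:

```java
class GridColumnLookupDemo {
    /**
     * Returns the list index of the cell that covers the given grid column,
     * or -1 if the row does not reach that column.
     * gridSpans[i] is the number of grid columns occupied by the i-th cell of the row.
     */
    static int cellIndexAtGridColumn(int[] gridSpans, int gridColumn) {
        int start = 0;
        for (int i = 0; i < gridSpans.length; i++) {
            int end = start + gridSpans[i]; // exclusive end of this cell's column range
            if (gridColumn >= start && gridColumn < end) {
                return i;
            }
            start = end;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Row B of the simplified table: B1 spans 2 columns, B2 spans 1
        System.out.println(cellIndexAtGridColumn(new int[] { 2, 1 }, 2)); // prints 1
        // A row whose wide cell sits on the right: columns 2..4 belong to the third cell
        System.out.println(cellIndexAtGridColumn(new int[] { 1, 1, 3 }, 3)); // prints 2
    }
}
```

In the second case the table has 5 columns and the row has 3 cells, so the shift-based formula would compute 3 - (5 - 3) = 1 and pick the second cell, whereas the walk lands on the third. Whether such layouts occur in practice depends on the documents being converted.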