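Aspose.Words is used in much the same way on every platform; this post works with Aspose.Words for Java.
The project needs to strip every page break from a Word document and then re-paginate it so that each level-1 outline heading starts a new page.
State of the target file:
Approach: walk the run nodes under every paragraph node, find the runs that hold a page break, and remove those nodes.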
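import com.aspose.words.*;
import java.io.InputStream;

public Document deletePageBreaker(String fileName) throws Exception {
    // Load <fileName>.docx from the classpath
    InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream(fileName + ".docx");
    Document doc = new Document(inputStream);
    for (Section section : doc.getSections()) {
        Body body = section.getBody();
        for (Paragraph paragraph : body.getParagraphs()) {
            // Iterate over a snapshot (toArray) so that removing a run does not disturb the loop
            for (Run run : paragraph.getRuns().toArray()) {
                // A run whose entire text is "\f" is a page break
                if ("\f".equals(run.getText())) {
                    run.remove();
                }
            }
        }
    }
    return doc;
}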
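However, removing the node this way leaves a stray line break where each page break used to be. In terms of the document's node tree, the run removes itself, but its parent paragraph still exists (now with no content) and is rendered as a single line break.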
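A toy model in plain Java can make this residue visible. Nothing in it is Aspose API: `PageBreakModel`, `Para`, `render`, `removeRunsOnly`, and `removeParagraphs` are hypothetical names, and a paragraph is reduced to a list of run texts that always renders with one trailing line break. It is only a sketch of the node-tree reasoning, under the assumption that each page break sits alone in its own paragraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageBreakModel {
    /** Toy stand-in for a paragraph: just an ordered list of run texts. */
    static final class Para {
        final List<String> runs = new ArrayList<>();
        Para(String... texts) {
            runs.addAll(Arrays.asList(texts));
        }
    }

    /** Renders the paragraphs as text; every paragraph, even an empty one, ends with a line break. */
    static String render(List<Para> paras) {
        StringBuilder sb = new StringBuilder();
        for (Para p : paras) {
            sb.append(String.join("", p.runs)).append('\n');
        }
        return sb.toString();
    }

    /** First approach: drop only the page-break runs and keep their (now empty) paragraphs. */
    static List<Para> removeRunsOnly(List<Para> paras) {
        List<Para> out = new ArrayList<>();
        for (Para p : paras) {
            Para copy = new Para();
            for (String r : p.runs) {
                if (!"\f".equals(r)) {
                    copy.runs.add(r);
                }
            }
            out.add(copy);
        }
        return out;
    }

    /** Second approach: drop the whole paragraph when its only run is a page break. */
    static List<Para> removeParagraphs(List<Para> paras) {
        List<Para> out = new ArrayList<>();
        for (Para p : paras) {
            boolean onlyPageBreak = p.runs.size() == 1 && "\f".equals(p.runs.get(0));
            if (!onlyPageBreak) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Para> doc = Arrays.asList(new Para("Chapter 1"), new Para("\f"), new Para("Chapter 2"));
        System.out.print(render(removeRunsOnly(doc)));
        System.out.println("----");
        System.out.print(render(removeParagraphs(doc)));
    }
}
```

With this sample input, the first approach renders an empty line between "Chapter 1" and "Chapter 2", while the second renders the two lines back to back.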
InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream(fileName + ".docx");
Document doc = new Document(inputStream);
for (Section section : doc.getSections()) {
    Body body = section.getBody();
    // Iterate over a snapshot (toArray) because paragraphs are removed inside the loop
    for (Paragraph paragraph : body.getParagraphs().toArray()) {
        RunCollection runs = paragraph.getRuns();
        // Remove the whole paragraph when its only run is a page break
        if (runs.getCount() == 1 && "\f".equals(runs.get(0).getText())) {
            paragraph.remove();
        }
    }
}
doc.save(HOME + "tee.docx");
So the approach that actually works is to remove the parent paragraph node.
After conversion:
The result matches expectations.
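One case the check above does not cover is a page break that shares its paragraph with real text: such a paragraph has more than one run, or a run whose text is longer than "\f", so it is skipped. The sketch below shows one possible per-paragraph rule in plain Java, working on run texts only. `PageBreakCleaner` and `cleanParagraph` are hypothetical names, not Aspose API, and whether mixed paragraphs occur at all depends on the input documents, which is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

public class PageBreakCleaner {
    /**
     * Cleans one paragraph given as its run texts.
     * Returns null when the paragraph held nothing but page breaks and should be dropped;
     * otherwise returns the run texts with every '\f' stripped out.
     */
    static List<String> cleanParagraph(List<String> runTexts) {
        List<String> kept = new ArrayList<>();
        boolean sawPageBreak = false;
        for (String text : runTexts) {
            boolean hadBreak = text.indexOf('\f') >= 0;
            String stripped = text.replace("\f", "");
            if (hadBreak) {
                sawPageBreak = true;
                if (stripped.isEmpty()) {
                    continue; // the run was only a page break: drop the run
                }
            }
            kept.add(stripped);
        }
        // Drop the paragraph only if it existed solely to carry page breaks
        if (sawPageBreak && kept.isEmpty()) {
            return null;
        }
        return kept;
    }
}
```

In Aspose.Words terms, the two outcomes would presumably map to `paragraph.remove()` for the null case and to `run.remove()` or `run.setText(...)` for the surviving runs. Paragraphs that were already empty are left alone, since they never carried a page break.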