docx4j usage notes

1. These notes mainly cover using docx4j to replace text and table data in a docx file.

 

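2. Placeholders in the template are written as ${xxx}. To replace them, build a parameter map and pass it to variableReplace:

paramMap: {xxx: xxxx}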

mainDocumentPart.variableReplace(paramMap);
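To make the effect of the substitution concrete, here is a minimal sketch in plain Java that applies the same ${key} convention to an ordinary string. It is not docx4j's implementation: the class name PlaceholderSketch and its replace method are hypothetical, and leaving unknown keys untouched is a choice made for this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of docx4j.
class PlaceholderSketch {

    // Matches ${key}; the key may not contain '}'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} in text with params.get(key); unknown keys are left unchanged. */
    static String replace(String text, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> paramMap = new LinkedHashMap<>();
        paramMap.put("name", "Alice");
        // prints "Dear Alice, ref ${id}"
        System.out.println(replace("Dear ${name}, ref ${id}", paramMap));
    }
}
```

The map keys are the bare names (name), not the full ${name} token, which matches the paramMap shape shown above.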

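3. If the document contains several tables and you want to fill them from the template:

  3.1 You need to know the order in which the tables appear in the document.

  3.2 Every map in the List<Map> must contain the keys used in the template row.

 Result set: [{bs.xm:111,bs.fz:xxx,bs.nmsamt:xxx,bs.ncsamt:xxx},.....]

Procedure for the replacement:

First locate the table, then locate the table's template row. For each record, add a copy of the template row and replace the placeholders in that copy. Fill the records in order, and finally delete the template row; the table is then complete.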
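The fill procedure can be sketched in plain Java with rows modeled as lists of cell strings standing in for Tr and Tc. TableFillSketch, fillCell, and fillTable are hypothetical names used only for this illustration. With docx4j the row copy would typically be made with XmlUtils.deepCopy on the template Tr, but that part is an assumption about how the steps map onto the library, not code from these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; rows and cells are plain strings, not docx4j objects.
class TableFillSketch {

    /** Replaces every ${key} in one cell with the record's value for that key. */
    static String fillCell(String cell, Map<String, String> record) {
        String result = cell;
        for (Map.Entry<String, String> e : record.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    /** Inserts one filled copy of the template row per record, then removes the template row. */
    static void fillTable(List<List<String>> rows, int templateIndex,
                          List<Map<String, String>> records) {
        List<String> template = rows.get(templateIndex);
        int insertAt = templateIndex + 1;
        for (Map<String, String> record : records) {
            List<String> copy = new ArrayList<>();
            for (String cell : template) {
                copy.add(fillCell(cell, record));
            }
            rows.add(insertAt++, copy);
        }
        // The template row itself is only a pattern, so it is dropped at the end.
        rows.remove(templateIndex);
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("Item", "Score"));
        rows.add(Arrays.asList("${bs.xm}", "${bs.fz}"));

        Map<String, String> first = new LinkedHashMap<>();
        first.put("bs.xm", "111");
        first.put("bs.fz", "A");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("bs.xm", "222");
        second.put("bs.fz", "B");

        fillTable(rows, 1, Arrays.asList(first, second));
        // prints [[Item, Score], [111, A], [222, B]]
        System.out.println(rows);
    }
}
```

Inserting the copies directly after the template row keeps the records in their original order and leaves any header or footer rows where they were.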

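PS: When the filled table also needs styling, my approach is to find each cell (Tc) and set the style on it.

Below are some utility methods.

This class normalizes the placeholders in a template docx: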

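import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Older docx4j releases use javax.xml.bind; newer Jakarta-based releases use jakarta.xml.bind instead.
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;

import org.docx4j.XmlUtils;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.Document;

public class DocxUtils {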


    /**
     * Matches any XML tag
     */
    private static final Pattern XML_PATTERN = Pattern.compile("<[^>]*>");

    /**
     * Placeholder start character
     */
    private static final char PREFIX = '$';

    /**
     * Opening brace of a placeholder
     */
    private static final char LEFT_BRACE = '{';

    /**
     * Closing brace of a placeholder
     */
    private static final char RIGHT_BRACE = '}';

    /**
     * Status: not inside a placeholder
     */
    private static final int NONE_START = -1;

    /**
     * Index value meaning that no placeholder has started
     */
    private static final int NONE_START_INDEX = -1;

    /**
     * Status: prefix seen
     */
    private static final int PREFIX_STATUS = 1;

    /**
     * Status: opening brace seen
     */
    private static final int LEFT_BRACE_STATUS = 2;

    /**
     * Status: closing brace seen
     */
    private static final int RIGHT_BRACE_STATUS = 3;



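    /**
     * Normalizes the placeholders in the main document part: marshals the
     * document to XML, strips any XML tags found inside ${...} placeholders,
     * and sets the cleaned document back on the part.
     *
     * @param documentPart the main document part of the template
     * @return false if documentPart is null, otherwise true
     */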
    public static boolean cleanDocumentPart(MainDocumentPart documentPart) throws Exception {
        if (documentPart == null) {
            return false;
        }
        Document document = documentPart.getContents();
        String wmlTemplate =
                XmlUtils.marshaltoString(document, true, false, Context.jc);
        document = (Document) XmlUtils.unwrap(doCleanDocumentPart(wmlTemplate, Context.jc));
        documentPart.setContents(document);
        return true;
    }


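    /**
     * Recursively collects all elements of the given class below obj.
     *
     * @param obj      the element to search from
     * @param toSearch the class to look for
     * @return the matching elements, in document order
     */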
    public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<Object>();
        if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();

        if (obj.getClass().equals(toSearch))
            result.add(obj);
        else if (obj instanceof ContentAccessor) {
            List<?> children = ((ContentAccessor) obj).getContent();
            for (Object child : children) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        }
        return result;
    }

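    /**
     * Scans the document XML for ${...} placeholders and removes any XML tags
     * inside each one, so that the placeholder becomes one contiguous string.
     *
     * @param wmlTemplate the document marshalled to an XML string
     * @param jc          the JAXB context used to unmarshal the result
     * @return the unmarshalled cleaned document
     * @throws JAXBException if the cleaned XML cannot be unmarshalled
     */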
    private static Object doCleanDocumentPart(String wmlTemplate, JAXBContext jc) throws JAXBException {
        // Current scan status
        int curStatus = NONE_START;
        // Index where the current placeholder starts
        int keyStartIndex = NONE_START_INDEX;
        // Current index
        int curIndex = 0;
        char[] textCharacters = wmlTemplate.toCharArray();
        StringBuilder documentBuilder = new StringBuilder(textCharacters.length);
        documentBuilder.append(textCharacters);
        // New document
        StringBuilder newDocumentBuilder = new StringBuilder(textCharacters.length);
        // Index just past the last text written
        int lastWriteIndex = 0;
        for (char c : textCharacters) {
            switch (c) {
                case PREFIX:
                    // TODO The pointer is reset regardless of the current status, which also means a variable name cannot contain PREFIX
                    keyStartIndex = curIndex;
                    curStatus = PREFIX_STATUS;
                    break;
                case LEFT_BRACE:
                    if (curStatus == PREFIX_STATUS) {
                        curStatus = LEFT_BRACE_STATUS;
                    }
                    break;
                case RIGHT_BRACE:
                    if (curStatus == LEFT_BRACE_STATUS) {
                        // Append the text that precedes the placeholder
                        newDocumentBuilder.append(documentBuilder.substring(lastWriteIndex, keyStartIndex));
                        // End index (exclusive)
                        int keyEndIndex = curIndex + 1;
                        // Raw placeholder text, possibly with XML tags inside
                        String rawKey = documentBuilder.substring(keyStartIndex, keyEndIndex);
                        // Strip the redundant tags
                        String mappingKey = XML_PATTERN.matcher(rawKey).replaceAll("");
                        /* if (!mappingKey.equals(rawKey)) {
                            char[] rawKeyChars = rawKey.toCharArray();
                            // Keep the original formatting
                            StringBuilder rawStringBuilder = new StringBuilder(rawKey.length());
                            // Remove the placeholder delimiter characters
                            for (char rawChar : rawKeyChars) {
                                if (rawChar == PREFIX || rawChar == LEFT_BRACE || rawChar == RIGHT_BRACE) {
                                    continue;
                                }
                                rawStringBuilder.append(rawChar);
                            }
                            // FIXME Requires the variable name to be contiguous
                            String variable = mappingKey.substring(2, mappingKey.length() - 1);
                            int variableStart = rawStringBuilder.indexOf(variable);
                            if (variableStart > 0) {
                                rawStringBuilder = rawStringBuilder.replace(variableStart, variableStart + variable.length(), mappingKey);
                            }
                            newDocumentBuilder.append(rawStringBuilder.toString());
                        } else { */
                        newDocumentBuilder.append(mappingKey);
                        // }
                        lastWriteIndex = keyEndIndex;

                        curStatus = NONE_START;
                        keyStartIndex = NONE_START_INDEX;
                    }
                    break;
                default:
                    break;
            }
            curIndex++;
        }
        // Append the remainder of the document
        if (lastWriteIndex < documentBuilder.length()) {
            newDocumentBuilder.append(documentBuilder.substring(lastWriteIndex));
        }
        return XmlUtils.unmarshalString(newDocumentBuilder.toString(), jc);
    }

}
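To show why this cleanup matters, the sketch below runs a compact standalone version of the same scan on a small XML fragment in which a placeholder is split across several runs. It works on a plain string and needs no docx4j; SplitPlaceholderSketch and clean are hypothetical names, and the fragment is a hand-written example rather than output captured from Word.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; mirrors the scan in doCleanDocumentPart on a plain string.
class SplitPlaceholderSketch {

    private static final Pattern XML_TAG = Pattern.compile("<[^>]*>");

    static String clean(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        int status = 0;     // 0 = outside, 1 = '$' seen, 2 = '{' seen after '$'
        int keyStart = -1;  // index of the '$' that opened the current placeholder
        int lastWrite = 0;  // index just past the last text copied to out
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (c == '$') {
                keyStart = i;
                status = 1;
            } else if (c == '{' && status == 1) {
                status = 2;
            } else if (c == '}' && status == 2) {
                // Copy the text before the placeholder, then the placeholder without tags.
                out.append(xml, lastWrite, keyStart);
                String rawKey = xml.substring(keyStart, i + 1);
                out.append(XML_TAG.matcher(rawKey).replaceAll(""));
                lastWrite = i + 1;
                status = 0;
                keyStart = -1;
            }
        }
        out.append(xml, lastWrite, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String split = "<w:r><w:t>${</w:t></w:r><w:r><w:t>name</w:t></w:r><w:r><w:t>}</w:t></w:r>";
        // prints <w:r><w:t>${name}</w:t></w:r>
        System.out.println(clean(split));
    }
}
```

After the cleanup the placeholder sits in a single text element, which is the form that a plain ${name} lookup can match. As in the class above, the run formatting that was inside the placeholder is discarded.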

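// The following are some styling helper methods.
// StringUtil, GfrTool, BORDER_LIST, BT and the BORDER_* constants are project-specific and not shown here.

// Set the cell style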
public static void fillCellData(Tc tc, String fontFamily, boolean isBold) {
    ObjectFactory factory = Context.getWmlObjectFactory();
    P p = (P) XmlUtils.unwrap(tc.getContent().get(0));
    Text t = null;
    R run = null;
    List<Object> texts = DocxUtils.getAllElementFromObject(p, Text.class);
    List<Object> rs = DocxUtils.getAllElementFromObject(p, R.class);
    if (StringUtil.isNotEmpty(texts)) {
        t = (Text) texts.get(0);
        String value = t.getValue();
        if (GfrTool.GfrIsEmpty(value)) {
            t.setValue("");
        }
    } else {
        return;
    }
    boolean isNewR = false;
    if (StringUtil.isNotEmpty(rs)) {
        run = (R) rs.get(0);
    } else {
        // No run exists yet, so create a new one
        run = factory.createR();
        isNewR = true;
    }
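    // Set the font style of the cell text
    run.setRPr(getRpr(fontFamily, isBold));
    TcPr tcPr = tc.getTcPr();
    if (tcPr == null) {
        // Guard against cells that have no properties element yet
        tcPr = factory.createTcPr();
        tc.setTcPr(tcPr);
    }
    BooleanDefaultTrue bdt = factory.createBooleanDefaultTrue();
    tcPr.setNoWrap(bdt);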
    if (isNewR) {
        run.getContent().add(t);
        p.getContent().add(run);
    }
}

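/**
 * Set the indentation of the first paragraph in a cell
 *
 * @param tc       the table cell
 * @param rowLevel the nesting level of the row; defaults to 1 when empty
 */
public static void setIdent(Tc tc, Object rowLevel) {
    ObjectFactory factory = Context.getWmlObjectFactory();
    P p = (P) XmlUtils.unwrap(tc.getContent().get(0));
    PPr pPr = p.getPPr();
    if (pPr == null) {
        // Guard against paragraphs that have no properties element yet
        pPr = factory.createPPr();
        p.setPPr(pPr);
    }
    PPrBase.Ind ind = pPr.getInd();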
    Boolean isNew = false;
    if (StringUtil.isEmpty(ind)) {
        ind = new PPrBase.Ind();
        isNew = true;
    }
    PPrBase.Spacing spacing = pPr.getSpacing();
    if (StringUtil.isEmpty(spacing)) {
        spacing = new PPrBase.Spacing();
        isNew = true;
    }
    Integer level = Integer.valueOf(StringUtil.nvl(rowLevel, "1"));
    String ident = String.valueOf((level - 1) * 150);
    spacing.setBefore(new BigInteger(ident));
    ind.setFirstLine(new BigInteger(ident));
    ind.setFirstLineChars(new BigInteger(ident));
    if (isNew) {
        pPr.setInd(ind);
        pPr.setSpacing(spacing);
    }
}

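/**
 * Set the borders of a table cell
 *
 * @param tc      the table cell
 * @param dataMap the queried data
 * @param isFirst whether this is the first row, which always gets a top border
 */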
public static void setTcBorder(Tc tc, Map<String, Object> dataMap,Boolean isFirst) {
    if (!GfrTool.valuesNotEmpty(tc, dataMap)) {
        return;
    }
    List<String> bottoms = getBorderData(dataMap,isFirst);
    if (!StringUtil.isNotEmpty(bottoms)) {
        return;
    }
    TcPr tcPr = tc.getTcPr();
    TcPrInner.TcBorders tcBorders = tcPr.getTcBorders();
    Boolean isNew = false;
    if (StringUtil.isEmpty(tcBorders)) {
        tcBorders = new TcPrInner.TcBorders();
        isNew = true;
    }
    CTBorder border = getBorder();
    for (String flag : bottoms) {
        flag = StringUtil.nvl(flag, BORDER_BOTTOM);
        switch (flag) {
            case BORDER_BOTTOM:
                tcBorders.setBottom(border);
                break;
            case BORDER_LEFT:
                tcBorders.setLeft(border);
                break;
            case BORDER_TOP:
                tcBorders.setTop(border);
                break;
            case BORDER_RIGHT:
                tcBorders.setRight(border);
                break;
        }
    }
    if (isNew) {
        tcPr.setTcBorders(tcBorders);
    }
}


/**
 * Build the CTBorder used for cell borders
 *
 * @return a single-line black border
 */
public static CTBorder getBorder() {
    CTBorder ctBorder = new CTBorder();
    ctBorder.setSz(new BigInteger("4"));
    ctBorder.setColor("black");
    //ctBorder.setSpace(new BigInteger("10"));
    ctBorder.setVal(STBorder.SINGLE);
    return ctBorder;

}

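/**
 * Get the list of sides on which the cell should have a border
 *
 * @param dataMap the queried data
 * @param isFirst whether this is the first row
 * @return the border sides to set, or null if dataMap is empty
 */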
private static List<String> getBorderData(Map<String, Object> dataMap,Boolean isFirst) {
    if (StringUtil.isEmpty(dataMap)) {
        return null;
    }
    List<String> vars = new ArrayList<>();
    BORDER_LIST.stream().forEach(v -> {
        Object o = dataMap.get(v);
        // Compare int values; == on two Integer objects compares references
        if (!StringUtil.isEmpty(o) && Integer.parseInt(o.toString()) == Integer.parseInt(BT.toString())) {
            vars.add(v);
        }
    });
    if (Boolean.TRUE.equals(isFirst)) {
        if(!vars.contains(BORDER_TOP)){
            vars.add(BORDER_TOP);
        }
    }
    return vars;
}


/**
 * Build the run properties (font and bold)
 *
 * @param fontFamily the font name
 * @param isBold     whether the text is bold
 * @return the run properties
 */
private static RPr getRpr(String fontFamily, boolean isBold) {
    ObjectFactory factory = Context.getWmlObjectFactory();
    RPr rPr = factory.createRPr();
    RFonts rf = new RFonts();
    rf.setAscii(fontFamily);
    rf.setHAnsi(fontFamily);
    rPr.setRFonts(rf);
    BooleanDefaultTrue bdt = Context.getWmlObjectFactory().createBooleanDefaultTrue();
    rPr.setBCs(bdt);
    if (isBold) {
        rPr.setB(bdt);
    }
    return rPr;
}

/**
 * @Description: merge cells across columns (horizontal merge) in the given row of a table
 */
public void mergeCellsHorizontal(Tbl tbl, int row, int fromCell, int toCell) {
    if (row < 0 || fromCell < 0 || toCell < 0) {
        return;
    }
    List<Object> trs = DocxUtils.getAllElementFromObject(tbl,Tr.class);
    if (row >= trs.size()) {
        return;
    }
    Tr tr = (Tr) trs.get(row);
    List<Object> tcList = DocxUtils.getAllElementFromObject(tr,Tc.class);
    for (int cellIndex = fromCell, len = Math
            .min(tcList.size() - 1, toCell); cellIndex <= len; cellIndex++) {
        Tc tc = (Tc) tcList.get(cellIndex);
        TcPr tcPr = tc.getTcPr();
        TcPrInner.HMerge hMerge = tcPr.getHMerge();
        if (hMerge == null) {
            hMerge = new TcPrInner.HMerge();
            tcPr.setHMerge(hMerge);
        }
        if (cellIndex == fromCell) {
            hMerge.setVal("restart");
        } else {
            hMerge.setVal("continue");
        }
    }
}

/**
 * @Description: merge cells across columns (horizontal merge) in the given row
 */
public static void mergeCellsHorizontal(Tr tr, int fromCell, int toCell) {
    List<Object> tcList = DocxUtils.getAllElementFromObject(tr,Tc.class);
    for (int cellIndex = fromCell, len = Math
            .min(tcList.size() - 1, toCell); cellIndex <= len; cellIndex++) {
        Tc tc = (Tc) tcList.get(cellIndex);
        TcPr tcPr = tc.getTcPr();
        TcPrInner.HMerge hMerge = tcPr.getHMerge();
        if (hMerge == null) {
            hMerge = new TcPrInner.HMerge();
            tcPr.setHMerge(hMerge);
        }
        if (cellIndex == fromCell) {
            hMerge.setVal("restart");
        } else {
            hMerge.setVal("continue");
        }
    }
}


/**
 * @Description: merge cells across rows (vertical merge)
 */
public static void mergeCellsVertically(Tbl tbl, int col, int fromRow, int toRow) {
    if (col < 0 || fromRow < 0 || toRow < 0) {
        return;
    }
    for (int rowIndex = fromRow; rowIndex <= toRow; rowIndex++) {
        Tc tc = getTc(tbl, rowIndex, col);
        if (tc == null) {
            break;
        }
        TcPr tcPr = tc.getTcPr();
        TcPrInner.VMerge vMerge = tcPr.getVMerge();
        if (vMerge == null) {
            vMerge = new TcPrInner.VMerge();
            tcPr.setVMerge(vMerge);
        }
        if (rowIndex == fromRow) {
            vMerge.setVal("restart");
        } else {
            vMerge.setVal("continue");
        }
    }
}


/**
 * @Description: get the cell at the given row and column of a table
 */
public static Tc getTc(Tbl tbl, int row, int cell) {
    if (row < 0 || cell < 0) {
        return null;
    }
    List<Object> trList = DocxUtils.getAllElementFromObject(tbl,Tr.class);
    if (row >= trList.size()) {
        return null;
    }
    List<Object> tcList = DocxUtils.getAllElementFromObject((Tr)trList.get(row),Tc.class);
    if (cell >= tcList.size()) {
        return null;
    }
    return (Tc)tcList.get(cell);
}

This has been enough to cover my needs so far, so I have only used docx4j to this extent and have not looked into it more deeply.

 

 

 
