easyexcel解析excel文件原理解析

easyexcel是对poi基于事件机制解析excel文件的封装,poi又对sax解析xml文件进行了封装,所以要理解easyexcel解析的过程,就要先从sax开始。

  • sax解析xml文件
    使用sax解析xml文件时只要注册ContentHandler就行,sax是按行解析的,解析时会调用ContentHandler接口的钩子函数。
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();
        xmlReader.setContentHandler(new DefaultHandler(){
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes){
                log.info("startElement-->>uri:{},localName:{},qName:{}",uri, localName,qName);
            }
            @Override
            public void endElement(String uri, String localName, String qName){
                log.info("endElement-->>uri:{},localName:{},qName:{}",uri, localName,qName);
            }
            @Override
            public void characters(char[] ch, int start, int length){
                log.info("characters-->>ch:{}, start:{}, length:{}", new String(ch,start,length), start, length);
            }
        });
        InputStream inputStream = new FileInputStream(Var.xmlFile);
        xmlReader.parse(new InputSource(inputStream));
        inputStream.close();
  • poi解析excel文件
    poi基于事件解析excel文件是对sax技术的封装,打开excel文件后,获取到所有sheet页,然后再类似xml逐行解析。
        OPCPackage opcPackage = OPCPackage.open(Var.excelFile, PackageAccess.READ);
        XSSFReader reader = new XSSFReader(opcPackage);
        XMLReader xmlReader = XMLReaderFactory.createXMLReader();
        xmlReader.setContentHandler(new SheetHandler());
        Iterator<InputStream> sheetsData = reader.getSheetsData();
        while (sheetsData.hasNext()){
            InputStream next = sheetsData.next();
            InputSource inputSource = new InputSource(next);
            xmlReader.parse(inputSource);
            next.close();
        }

在poi解析大excel文件的基础上出现了easyexcel,先看下easyexcel的用法,真的很easy啊,调静态方法read,传文件路径就行,获取到每行数据时,会帮我们封装成实体类,我们在invoke中进行处理即可。

EasyExcel.read(Var.excelFile, WfqData.class,
                    new AnalysisEventListener<WfqData>() {
                        @Override
                        public void invoke(WfqData data, AnalysisContext analysisContext) {
                            // 按行读文件
                        }

                        @Override
                        public void doAfterAllAnalysed(AnalysisContext analysisContext) {
                            log.info("处理文件结束");
                        }
                    }).headRowNumber(2)// 表头第二行,第一行是免责申明
                    .doReadAll();

接下来开始看easyexcel的源码。read()方法中会构造ReadWorkbook对象,然后设置excel文件的File对象,注册读监听器。

    public static ExcelReaderBuilder read(String pathName, Class head, ReadListener readListener) {
        ExcelReaderBuilder excelReaderBuilder = new ExcelReaderBuilder();
        excelReaderBuilder.file(pathName);
        if (head != null) {
            excelReaderBuilder.head(head);
        }
        if (readListener != null) {
            excelReaderBuilder.registerReadListener(readListener);
        }
        return excelReaderBuilder;
    }

然后构造ExcelReader对象用来读取excel文件。

    public void doReadAll() {
        ExcelReader excelReader = build();
        excelReader.readAll();
        excelReader.finish();
    }

接下来根据excel文件类型确定解析器,

private void choiceExcelExecutor(ReadWorkbook readWorkbook) throws Exception {
        ExcelTypeEnum excelType = ExcelTypeEnum.valueOf(readWorkbook);
        switch (excelType) {
            case XLS:
                ......
                break;
            case XLSX:
                XlsxReadContext xlsxReadContext = new DefaultXlsxReadContext(readWorkbook, ExcelTypeEnum.XLSX);
                analysisContext = xlsxReadContext;
                excelReadExecutor = new XlsxSaxAnalyser(xlsxReadContext, null);
                break;
            default:
                break;
        }
    }

然后设置一堆的配置,在解析的时候会用到,这些配置都保存在DefaultXlsxReadContext中,使用默认的解析事件处理器DefaultAnalysisEventProcessor,在解析excel的过程中会通过这个事件处理器调用我们注册进来的监听器方法。

    public AnalysisContextImpl(ReadWorkbook readWorkbook, ExcelTypeEnum actualExcelType) {
        if (readWorkbook == null) {
            throw new IllegalArgumentException("Workbook argument cannot be null");
        }
        switch (actualExcelType) {
            case XLS:
                readWorkbookHolder = new XlsReadWorkbookHolder(readWorkbook);
                break;
            case XLSX:
                readWorkbookHolder = new XlsxReadWorkbookHolder(readWorkbook);
                break;
            default:
                break;
        }
        currentReadHolder = readWorkbookHolder;
        analysisEventProcessor = new DefaultAnalysisEventProcessor();
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Initialization 'AnalysisContextImpl' complete");
        }
    }

解析前最重要的一环就是打开excel文档OPCPackage.open(xlsxReadWorkbookHolder.getFile());,这里就是我们熟悉的poi解析excel文件做的事情,然后获取每个要解析的sheet页,使用ReadSheet包装。

    public XlsxSaxAnalyser(XlsxReadContext xlsxReadContext, InputStream decryptedStream) throws Exception {
        this.xlsxReadContext = xlsxReadContext;
        // Initialize cache
        XlsxReadWorkbookHolder xlsxReadWorkbookHolder = xlsxReadContext.xlsxReadWorkbookHolder();

        OPCPackage pkg = readOpcPackage(xlsxReadWorkbookHolder, decryptedStream);
        xlsxReadWorkbookHolder.setOpcPackage(pkg);

        ArrayList<PackagePart> packageParts = pkg.getPartsByContentType(XSSFRelation.SHARED_STRINGS.getContentType());

        if (!CollectionUtils.isEmpty(packageParts)) {
            PackagePart sharedStringsTablePackagePart = packageParts.get(0);

            // Specify default cache
            defaultReadCache(xlsxReadWorkbookHolder, sharedStringsTablePackagePart);

            // Analysis sharedStringsTable.xml
            analysisSharedStringsTable(sharedStringsTablePackagePart.getInputStream(), xlsxReadWorkbookHolder);
        }

        XSSFReader xssfReader = new XSSFReader(pkg);
        analysisUse1904WindowDate(xssfReader, xlsxReadWorkbookHolder);

        xlsxReadWorkbookHolder.setStylesTable(xssfReader.getStylesTable());
        sheetList = new ArrayList<ReadSheet>();
        sheetMap = new HashMap<Integer, InputStream>();
        commentsTableMap = new HashMap<Integer, CommentsTable>();
        XSSFReader.SheetIterator ite = (XSSFReader.SheetIterator)xssfReader.getSheetsData();
        int index = 0;
        if (!ite.hasNext()) {
            throw new ExcelAnalysisException("Can not find any sheet!");
        }
        while (ite.hasNext()) {
            InputStream inputStream = ite.next();
            sheetList.add(new ReadSheet(index, ite.getSheetName()));
            sheetMap.put(index, inputStream);
            if (xlsxReadContext.readWorkbookHolder().getExtraReadSet().contains(CellExtraTypeEnum.COMMENT)) {
                CommentsTable commentsTable = ite.getSheetComments();
                if (null != commentsTable) {
                    commentsTableMap.put(index, commentsTable);
                }
            }
            index++;
        }
    }

解析是在XlsxSaxAnalyser#execute方法中进行的,依次遍历获取的每个sheet页,保存下来后进行解析,就是poi解析excel文件的过程了。

    public void execute() {
        for (ReadSheet readSheet : sheetList) {
            readSheet = SheetUtils.match(readSheet, xlsxReadContext);
            if (readSheet != null) {
                xlsxReadContext.currentSheet(readSheet);
                parseXmlSource(sheetMap.get(readSheet.getSheetNo()), new XlsxRowHandler(xlsxReadContext));
                // Read comments
                readComments(readSheet);
                // The last sheet is read
                xlsxReadContext.analysisEventProcessor().endSheet(xlsxReadContext);
            }
        }
    }

        private void parseXmlSource(InputStream inputStream, ContentHandler handler) {
        InputSource inputSource = new InputSource(inputStream);
        try {
            SAXParserFactory saxFactory;
            String xlsxSAXParserFactoryName = xlsxReadContext.xlsxReadWorkbookHolder().getSaxParserFactoryName();
            if (StringUtils.isEmpty(xlsxSAXParserFactoryName)) {
                saxFactory = SAXParserFactory.newInstance();
            } else {
                saxFactory = SAXParserFactory.newInstance(xlsxSAXParserFactoryName, null);
            }
            ...
            SAXParser saxParser = saxFactory.newSAXParser();
            XMLReader xmlReader = saxParser.getXMLReader();
            xmlReader.setContentHandler(handler);
            xmlReader.parse(inputSource);
            inputStream.close();
        }
    }

每次解析时都会回调XlsxRowHandler中的startElement、characters、endElement方法。
RowTagHandler#startElement()方法中会先获取当前要解析的行,然后与之前解析到的行进行对比,防止遗漏。

    public void startElement(XlsxReadContext xlsxReadContext, String name, Attributes attributes) {
        XlsxReadSheetHolder xlsxReadSheetHolder = xlsxReadContext.xlsxReadSheetHolder();
        int rowIndex = PositionUtils.getRowByRowTagt(attributes.getValue(ExcelXmlConstants.ATTRIBUTE_R),
            xlsxReadSheetHolder.getRowIndex());
        Integer lastRowIndex = xlsxReadContext.readSheetHolder().getRowIndex();
        while (lastRowIndex + 1 < rowIndex) {
            xlsxReadContext.readRowHolder(new ReadRowHolder(lastRowIndex + 1, RowTypeEnum.EMPTY,
                xlsxReadSheetHolder.getGlobalConfiguration(), new LinkedHashMap<Integer, Cell>()));
            xlsxReadContext.analysisEventProcessor().endRow(xlsxReadContext);
            xlsxReadSheetHolder.setColumnIndex(null);
            xlsxReadSheetHolder.setCellMap(new LinkedHashMap<Integer, Cell>());
            lastRowIndex++;
        }
        xlsxReadSheetHolder.setRowIndex(rowIndex);
    }

解析完一行后,在AbstractCellValueTagHandler#endElement中会将当前行的列数和对应的值保存到cellMap(private Map<Integer, Cell> cellMap;)中。

public void endElement(XlsxReadContext xlsxReadContext, String name) {
        XlsxReadSheetHolder xlsxReadSheetHolder = xlsxReadContext.xlsxReadSheetHolder();
        CellData tempCellData = xlsxReadSheetHolder.getTempCellData();
        StringBuilder tempData = xlsxReadSheetHolder.getTempData();
        CellDataTypeEnum oldType = tempCellData.getType();
        switch (oldType) {
            case DIRECT_STRING:
            case STRING:
            case ERROR:
                tempCellData.setStringValue(tempData.toString());
                break;
            case BOOLEAN:
                tempCellData.setBooleanValue(BooleanUtils.valueOf(tempData.toString()));
                break;
            case NUMBER:
            case EMPTY:
                tempCellData.setType(CellDataTypeEnum.NUMBER);
                tempCellData.setNumberValue(new BigDecimal(tempData.toString()));
                break;
            default:
                throw new IllegalStateException("Cannot set values now");
        }

        // set string value
        setStringValue(xlsxReadContext);

        if (tempCellData.getStringValue() != null
            && xlsxReadContext.currentReadHolder().globalConfiguration().getAutoTrim()) {
            tempCellData.setStringValue(tempCellData.getStringValue());
        }

        tempCellData.checkEmpty();
        xlsxReadSheetHolder.getCellMap().put(xlsxReadSheetHolder.getColumnIndex(), tempCellData);
    }

然后在RowTagHandler#endElement中调用xlsxReadContext.analysisEventProcessor().endRow(xlsxReadContext);方法进行处理,这里会调用DefaultAnalysisEventProcessor#dealData处理每行的实际数据,其实就是调用读监听器的invoke方法,也就是我们在EasyExcel.read中传的第三个参数。

    private void dealData(AnalysisContext analysisContext) {
        ReadRowHolder readRowHolder = analysisContext.readRowHolder();
        Map<Integer, CellData> cellDataMap = (Map)readRowHolder.getCellMap();
        readRowHolder.setCurrentRowAnalysisResult(cellDataMap);
        int rowIndex = readRowHolder.getRowIndex();
        int currentHeadRowNumber = analysisContext.readSheetHolder().getHeadRowNumber();

        boolean isData = rowIndex >= currentHeadRowNumber;

        // Last head column
        if (!isData && currentHeadRowNumber == rowIndex + 1) {
            buildHead(analysisContext, cellDataMap);
        }
        // Now is data
        for (ReadListener readListener : analysisContext.currentReadHolder().readListenerList()) {
            try {
                if (isData) {
                    readListener.invoke(readRowHolder.getCurrentRowAnalysisResult(), analysisContext);
                } else {
                    readListener.invokeHead(cellDataMap, analysisContext);
                }
            } catch (Exception e) {
                onException(analysisContext, e);
                break;
            }
            if (!readListener.hasNext(analysisContext)) {
                throw new ExcelAnalysisStopException();
            }
        }
    }

完成一个sheet页的解析,就会调用每个监听的doAfterAllAnalysed()方法。

    public void endSheet(AnalysisContext analysisContext) {
        for (ReadListener readListener : analysisContext.currentReadHolder().readListenerList()) {
            readListener.doAfterAllAnalysed(analysisContext);
        }
    }

至此完成了对excel文件的解析。
总结下,easyexcel解析excel文件,先初始化用来解析excel文件用到的参数,比如文件路径、表头行、行数据class等,然后使用poi基于事件的机制进行解析。获取到数据后会把列索引和cell数据保存到map,然后调用每个监听器的invoke方法进行处理。有不对的地方请大神指出,欢迎大家一起讨论交流,共同进步。

  • 6
    点赞
  • 14
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值