Parsing large xlsx Excel files in Java

江湖行骗老中医

Original link: https://blog.csdn.net/qq_28603127/article/details/102872170

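I searched Baidu for a long time for material on parsing Excel and never found what I needed, so I hope this article saves a few detours for anyone with the same requirement.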
　　
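Excel has two file formats: xls (Excel 97-2003) and xlsx (Excel 2007 and later). The first Excel API most people think of is Apache POI, which can read and write every kind of Excel file, but for each format POI offers several programming models: UserModel, EventModel and UserEventModel.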

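I used the usermodel to parse a 10 MB file and got an OOM (OutOfMemoryError). The usermodel loads the entire workbook into memory at once and creates one object per Cell, which exhausts the heap. (Is 10 MB even large? An xlsx file is a ZIP archive of compressed XML, so the XML that actually has to be parsed is usually several times bigger than the file on disk.) When memory is tight or the file is large, you should use the EventModel or UserEventModel APIs instead. I haven't studied them (the official documentation is hard to read), and both APIs are fairly complex.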
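To see why a "mere" 10 MB workbook can exhaust the heap, it helps to remember that an .xlsx file is a ZIP archive of XML parts. The minimal sketch below uses only `java.util.zip` to print each part's compressed and uncompressed size. The class and method names (`XlsxSizeInspector`, `totalUncompressedSize`) are made up for illustration, and the numbers you see will depend entirely on your file.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists every part inside an .xlsx package with its
// compressed and uncompressed size. An .xlsx is a ZIP of XML parts
// (e.g. xl/worksheets/sheet1.xml), so the size on disk understates how
// much XML a parser has to process.
final class XlsxSizeInspector {

    /** Returns the total uncompressed size in bytes, or -1 if any entry size is unknown. */
    static long totalUncompressedSize(String path) throws IOException {
        long total = 0;
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                long size = e.getSize(); // -1 when the size is not recorded
                if (size < 0) {
                    return -1;
                }
                System.out.printf("%-40s %10d -> %10d bytes%n",
                        e.getName(), e.getCompressedSize(), size);
                total += size;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsxSizeInspector <file.xlsx>");
            return;
        }
        long total = totalUncompressedSize(args[0]);
        System.out.println("Total uncompressed XML: " + total + " bytes");
    }
}
```

For example, `java XlsxSizeInspector "src/190917_MEAC Aug Database - v1.xlsx"` shows how much XML sits behind the file used later in this post.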
　　
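Since my requirement was only to read/parse xlsx files, I didn't need to work with POI's low-level APIs directly. Streaming-Reader (the xlsx-streamer library) is a wrapper built on top of POI, and it is simple and easy to understand. It can only read xlsx files (xls and xlsx are fundamentally different formats underneath), but for this requirement it neatly solves the out-of-memory problem.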
　　
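The key to solving the memory problem is stream processing: read one batch of data, parse it, release it, then move on to the next batch. That is exactly how Streaming-Reader works.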
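The read-a-batch, release, read-the-next-batch idea can be sketched with the JDK's own StAX parser (`javax.xml.stream`). This is a simplified illustration, not xlsx-streamer's actual code: `SheetXmlStreamer` and `streamRows` are hypothetical names, the input is assumed to be the raw XML of a single worksheet part, and shared strings, styles and cell types are deliberately ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: streams <row> elements out of worksheet XML and hands
// each batch of at most batchSize rows to a callback, then drops it.
// Simplified: only the raw text of <v> elements is collected.
final class SheetXmlStreamer {

    static void streamRows(InputStream sheetXml, int batchSize,
                           Consumer<List<List<String>>> onBatch) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
        List<List<String>> batch = new ArrayList<>();
        List<String> currentRow = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        currentRow = new ArrayList<>();
                    } else if ("v".equals(name) && currentRow != null) {
                        currentRow.add(reader.getElementText()); // moves to </v>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    batch.add(currentRow);
                    currentRow = null;
                    if (batch.size() == batchSize) {
                        onBatch.accept(batch);
                        batch = new ArrayList<>(); // old batch can now be garbage-collected
                    }
                }
            }
            if (!batch.isEmpty()) {
                onBatch.accept(batch); // last, partially filled batch
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c><v>1</v></c><c><v>2</v></c></row>"
                + "<row r=\"2\"><c><v>3</v></c></row>"
                + "<row r=\"3\"><c><v>4</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        streamRows(in, 2, batch -> System.out.println("batch: " + batch));
    }
}
```

Only the current batch (plus the parser's small internal buffer) is reachable at any moment, so memory use stays roughly constant no matter how many rows the sheet contains.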

```xml
<dependency>
    <groupId>com.monitorjbl</groupId>
    <artifactId>xlsx-streamer</artifactId>
    <version>2.0.0</version>
</dependency>
```

Here is the example code:

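```java
import com.monitorjbl.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes both the input stream and the workbook
        try (InputStream in = new FileInputStream(new File("src/190917_MEAC Aug Database - v1.xlsx"));
             Workbook workbook = StreamingReader.builder()
                     .rowCacheSize(100) // number of rows kept in memory at a time (default 10)
                     .bufferSize(1024)  // buffer size in bytes used when reading the InputStream (default 1024)
                     .open(in)) {
            for (Sheet sheet : workbook) {
                for (Row row : sheet) {
                    for (Cell cell : row) {
                        // process each cell here, e.g. print its text
                        System.out.println(cell.getStringCellValue());
                    }
                }
            }
        }
    }
}
```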

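That's the basic logic; pretty simple, right? You can fetch a specific sheet by its name, but you cannot fetch a row by a given row number, because the workbook is read as a stream. The stream advances in units of the number of rows set by .rowCacheSize(100), so while the current 100 rows are in memory there is no way to ask for row 101. For the other APIs, browse the source code yourself; it is straightforward and easy to follow.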
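To make the "no row 101 while rows 1 to 100 are cached" point concrete, here is a toy model of a forward-only row cache. `ForwardOnlyWindow` is a hypothetical class written only for illustration; it mimics the idea behind `rowCacheSize` but is not how xlsx-streamer is implemented internally. Row indexes are 0-based, so the 101st row has index 100.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.IntStream;

// Illustrative model (not xlsx-streamer code) of a forward-only row cache:
// at most cacheSize rows are held, and rows outside the current window
// cannot be fetched by index.
final class ForwardOnlyWindow<T> {
    private final Iterator<T> source;
    private final int cacheSize;
    private final List<T> window = new ArrayList<>();
    private int windowStart = 0; // absolute index of window.get(0)

    ForwardOnlyWindow(Iterator<T> source, int cacheSize) {
        this.source = source;
        this.cacheSize = cacheSize;
    }

    /** Drops the current window and loads the next one; returns false at end of stream. */
    boolean advance() {
        windowStart += window.size();
        window.clear();
        while (window.size() < cacheSize && source.hasNext()) {
            window.add(source.next());
        }
        return !window.isEmpty();
    }

    /** Returns the row at an absolute index, only if it is inside the current window. */
    T get(int index) {
        int offset = index - windowStart;
        if (offset < 0 || offset >= window.size()) {
            throw new NoSuchElementException("row " + index + " is not in the current window");
        }
        return window.get(offset);
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = IntStream.range(0, 250).boxed().iterator();
        ForwardOnlyWindow<Integer> window = new ForwardOnlyWindow<>(rows, 100);
        window.advance();                   // holds rows 0..99
        System.out.println(window.get(99)); // the 100th row is available
        try {
            window.get(100);                // the 101st row is not loaded yet
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If you really need something from a particular row, the practical approach with a streaming reader is to pick it up while the single forward pass reaches that row.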

Below is a real example from my own parsing job.


```java
import com.monitorjbl.xlsx.StreamingReader;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.springframework.util.DigestUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
// DataImportJdbc, SMMEFields, MonthMatch, DateUtils and OneData are classes from the author's own project.

public class DataImport {

    // Returns the cell text, or "" when the column is unknown or the cell is missing
    // (a streamed row only contains the cells that are actually present in the file).
    private static String text(Row row, Integer columnIndex) {
        if (columnIndex == null) {
            return "";
        }
        Cell cell = row.getCell(columnIndex);
        return cell == null ? "" : cell.getStringCellValue();
    }

    public static void main(String[] args) throws Exception {
        DataImportJdbc dataImportJdbc = new DataImportJdbc();
        // Excel header name (lower case) -> database field name
        HashMap<String, String> excDBMapping = dataImportJdbc.readMapping("smme");
        // database field name -> column index in the sheet
        Map<String, Integer> fieldsIndexMapping = new HashMap<>();
        HashMap<String, Integer> monthMapping = SMMEFields.monthMapping;
        // "year" or normalized month name -> column index
        HashMap<String, Integer> realMonthMapping = new HashMap<>();
        try (InputStream in = new FileInputStream(new File("src/190917_MEAC Aug Database - v1.xlsx"));
             Workbook open = StreamingReader.builder()
                     .rowCacheSize(100)
                     .bufferSize(1024)
                     .open(in)) {
            Sheet sheet = open.getSheet("Raw");
            for (Row row : sheet) {
                if (row.getRowNum() == 0) {
                    // Header row: remember which column holds which field / month
                    for (Cell cell : row) {
                        String stringCellValue = cell.getStringCellValue().toLowerCase();
                        if (excDBMapping.containsKey(stringCellValue)) {
                            fieldsIndexMapping.put(excDBMapping.get(stringCellValue), cell.getColumnIndex());
                        } else if (monthMapping.containsKey(stringCellValue)) {
                            if ("year".equals(stringCellValue)) {
                                realMonthMapping.put(stringCellValue, cell.getColumnIndex());
                            } else {
                                String month = MonthMatch.SMMEMonthMatch(stringCellValue);
                                realMonthMapping.put(month, cell.getColumnIndex());
                            }
                        }
                    }
                    System.out.println(realMonthMapping);
                } else {
                    // Data row: skip countries listed in SMMEFields.countryList
                    String country = text(row, fieldsIndexMapping.get("country"));
                    if (SMMEFields.countryList.contains(country)) {
                        System.out.println(country);
                        continue;
                    }
                    HashMap<String, String> stringStringHashMap = new HashMap<>();
                    for (Map.Entry<String, Integer> field : fieldsIndexMapping.entrySet()) {
                        stringStringHashMap.put(field.getKey(), text(row, field.getValue()));
                    }
                    String year = text(row, realMonthMapping.get("year"));
                    // One record per month column
                    for (Map.Entry<String, Integer> month : realMonthMapping.entrySet()) {
                        if ("year".equals(month.getKey()) || month.getValue() == null) {
                            continue;
                        }
                        String volume = text(row, month.getValue());
                        if (volume.trim().isEmpty() || volume.equals("- 0")) {
                            volume = "0";
                        } else if (volume.contains(",")) {
                            volume = volume.replace(",", "");
                        }
                        OneData oneData = new OneData();
                        oneData.setMap(stringStringHashMap);
                        oneData.setTime(DateUtils.getStringTime());
                        oneData.setYearmonth(year + month.getKey());
                        oneData.setVolume(volume);
                        // The record id is the MD5 hex digest of oneData.toString()
                        String id = DigestUtils.md5DigestAsHex(oneData.toString().getBytes());
                        oneData.setId(id);
                        dataImportJdbc.upsertData(oneData);
                    }
                }
            }
        }
    }
}
```

Don't worry about what the code above is doing for my business; just look at the basic logic and how the API is used.
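Stripped of the project-specific classes, the example boils down to two steps: build a header-name to column-index map from row 0, then read every data row by those indices and clean up the numeric text. The stdlib-only sketch below models a row as a `List<String>` (with `null` meaning an empty cell). `HeaderMappingSketch` and its methods are hypothetical helpers; the volume rules mirror the ones in `DataImport`, and header text is additionally trimmed, which the original code does not do.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stdlib-only sketch of the DataImport logic:
// 1) map trimmed, lower-cased header names from row 0 to column indices,
// 2) read data rows by those indices and normalize "volume" strings.
final class HeaderMappingSketch {

    /** Maps each wanted header (trimmed, lower-cased) to its column index. */
    static Map<String, Integer> indexHeaders(List<String> headerRow, List<String> wanted) {
        Map<String, Integer> index = new HashMap<>();
        for (int col = 0; col < headerRow.size(); col++) {
            String name = headerRow.get(col);
            if (name == null) {
                continue; // empty header cell
            }
            String key = name.trim().toLowerCase(Locale.ROOT);
            if (wanted.contains(key)) {
                index.put(key, col);
            }
        }
        return index;
    }

    /** Blank or "- 0" becomes "0"; thousands separators are removed. */
    static String normalizeVolume(String raw) {
        if (raw == null || raw.trim().isEmpty() || raw.equals("- 0")) {
            return "0";
        }
        return raw.replace(",", "");
    }

    /** Null-safe lookup: an unknown column or a short row yields null. */
    static String cellAt(List<String> row, Integer col) {
        if (col == null || col >= row.size()) {
            return null;
        }
        return row.get(col);
    }

    public static void main(String[] args) {
        List<String> header = List.of("Country", "Year", "Jan");
        Map<String, Integer> idx = indexHeaders(header, List.of("country", "year", "jan"));
        List<String> row = List.of("CountryA", "2019", "1,234");
        String volume = normalizeVolume(cellAt(row, idx.get("jan")));
        System.out.println(cellAt(row, idx.get("country")) + " " + volume);
    }
}
```

Building the index map once from the header row means the data rows never depend on column order, which is the same reason `DataImport` reads row 0 first.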

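If you get this exception while using it:

```
Exception in thread "main" java.lang.UnsupportedOperationException
	at com.monitorjbl.xlsx.impl.StreamingSheet.getFirstRowNum(StreamingSheet.java:118)
	at com.parsh.Test.main(Test.java:21)
```

it is because that method in the source has no implementation: it simply throws this exception. Seeing it means the method exists (StreamingSheet has to implement POI's Sheet interface) but its functionality is not implemented, so don't call it.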
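Because methods like `getFirstRowNum()` are not available on a streaming sheet (at least in the version used here), one option is to record that information yourself during the single forward pass over the rows. `RowStats` below is a hypothetical helper, not part of xlsx-streamer; inside your `for (Row row : sheet)` loop you would call `stats.accept(row.getRowNum())`.

```java
// Hypothetical helper (not part of xlsx-streamer): collects first/last row
// numbers and a row count while you iterate a streaming sheet once, instead
// of calling unsupported methods such as getFirstRowNum().
final class RowStats {
    private int first = -1;
    private int last = -1;
    private int count = 0;

    /** Call once per row, e.g. stats.accept(row.getRowNum()). */
    void accept(int rowNum) {
        if (first < 0 || rowNum < first) {
            first = rowNum;
        }
        if (rowNum > last) {
            last = rowNum;
        }
        count++;
    }

    int firstRowNum() { return first; } // -1 if no rows were seen
    int lastRowNum() { return last; }   // -1 if no rows were seen
    int rowCount() { return count; }

    public static void main(String[] args) {
        RowStats stats = new RowStats();
        // Row numbers as a streaming reader might report them; empty rows may be absent.
        for (int rowNum : new int[] {0, 1, 4, 5}) {
            stats.accept(rowNum);
        }
        System.out.println(stats.firstRowNum() + " " + stats.lastRowNum() + " " + stats.rowCount());
    }
}
```

Because a streaming sheet is meant to be read front to back once, gathering these values in the same loop that processes the data avoids a second pass over the file.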