Java: batch import of large CSV text files and file splitting

小柴_

Article link: https://blog.csdn.net/weixin_43687353/article/details/107064050

Generating a large file for testing

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    /**
     * Create a large test file: 10,000,000 identical two-column lines, no header line.
     * @throws IOException if the file cannot be written
     */
    public static void createBigFile() throws IOException {
        File file = new File("F:\\data\\big_file");
        String str = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa123,xxxxxxxxxxxxxxxxxx123";
        // try-with-resources flushes and closes the writer even if a write fails
        try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(file))) {
            for (int i = 0; i < 10000000; i++) {
                bufferedWriter.write(str);
                bufferedWriter.newLine();
            }
        }
    }

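Importing a CSV file in batches

Persist the rows in batches to avoid running out of memory.
Note: List<String> list = new ArrayList<>(1024); can be changed to hold your own business entity, with its properties set from the columns of each row.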

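    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.text.ParseException;
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Import a file in batches.
     * @param filePath path of the file to read
     * @param size number of rows to read before each persist
     */
    public void importFile(String filePath, Integer size) throws IOException, ParseException {
        // log is assumed to be a logger field of the enclosing class (for example SLF4J)
        // read the text file through a 5 MB buffer; try-with-resources closes the reader
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), StandardCharsets.UTF_8), 5 * 1024 * 1024)) {
            List<String> list = new ArrayList<>(1024);
            int count = 0;
            String line;
            // skip the header line
            br.readLine();
            // read the data lines
            while ((line = br.readLine()) != null) {
                // one row of data
                String[] arr = line.split(",");
                // assemble the entity data
                list.add(arr[0]);
                if (list.size() >= size) {
                    count++;
                    // call the batch insert
                    //service.batchInsert(list);
                    log.info("Row list length: " + list.size());
                    log.info("Persisted: " + count + " time(s)");
                    list.clear(); // empty the list so the rows can be garbage-collected
                }
            }
            // persist whatever is left after the last full batch
            if (!list.isEmpty()) {
                count++;
                // call the batch insert
                //service.batchInsert(list);
                log.info("Row list length: " + list.size());
                log.info("Final persist: " + count);
            }
        }
    }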
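
The same batching loop can be written against a typed row instead of a bare String. What follows is only a minimal sketch and not the author's code: the Row class, the importInBatches name, and the Consumer callback (standing in for service.batchInsert) are all assumptions made for illustration. It takes a BufferedReader so that it can be tried on an in-memory string, and it uses a plain split(","), so quoted fields containing commas are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchImportSketch {

    /** Hypothetical entity for a two-column CSV row. */
    static final class Row {
        final String first;
        final String second;

        Row(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    /**
     * Reads CSV lines, skips the header line, and hands rows to the sink in batches.
     * Returns the number of batches passed to the sink.
     */
    static int importInBatches(BufferedReader reader, int batchSize,
                               Consumer<List<Row>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Row> batch = new ArrayList<>(batchSize);
        int batches = 0;
        reader.readLine(); // skip the header line
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] cols = line.split(",", -1); // -1 keeps trailing empty columns
            batch.add(new Row(cols[0], cols.length > 1 ? cols[1] : ""));
            if (batch.size() >= batchSize) {
                sink.accept(new ArrayList<>(batch)); // pass a copy so clear() cannot affect the sink
                batches++;
                batch.clear();
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            sink.accept(new ArrayList<>(batch));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        String csv = "col1,col2\n1,x\n2,y\n3,z\n";
        int n = importInBatches(new BufferedReader(new StringReader(csv)), 2,
                rows -> System.out.println("persisting " + rows.size() + " rows"));
        System.out.println("batches: " + n);
    }
}
```

Passing the sink as a parameter keeps the reading loop independent of any particular persistence layer; in a real service the lambda would presumably delegate to the batch-insert method.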

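Splitting the file

Note: this split does not produce equal-sized parts; adjust it to your own needs.

Approach

Given the number of files to split into, compute the average number of bytes per file, then loop over the file count and cut out one sub-file per iteration. For the first sub-file, start from the average byte count and read a block of roughly one line's worth of bytes, then loop over those bytes checking whether each is \r or \n. A \r or \n byte means the end of a line has been reached, so record the position of that end-of-line byte. With both the start position and the end position known, the data between them can be written out as a sub-file. The loop then moves on to the next sub-file, computing its end position from the end position recorded for the previous one, until the requested number of files has been produced or the large file has been read to the end.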
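
The boundary search can be illustrated on an in-memory byte array before dealing with channels and buffers. This is a simplified sketch under stated assumptions, not the author's implementation: the splitPoints name is hypothetical, the whole input is assumed to fit in memory, and each cut is placed on the first byte after a run of \r or \n bytes so that line terminators stay with the preceding part.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitPointSketch {

    /**
     * Returns the exclusive end offset of each part. Every cut falls on the
     * first byte after a run of line-terminator bytes, or at the end of the data.
     */
    static List<Integer> splitPoints(byte[] data, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<Integer> ends = new ArrayList<>();
        int average = data.length / parts; // target bytes per part
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int end;
            if (i == parts - 1) {
                end = data.length; // the last part takes the remainder
            } else {
                end = Math.min(start + average, data.length);
                if (end > start) {
                    end--; // step back one byte so a cut already sitting on a line boundary is kept
                }
                // advance to the next line terminator
                while (end < data.length && data[end] != '\n' && data[end] != '\r') {
                    end++;
                }
                // then move past it (covers \n, \r and \r\n)
                while (end < data.length && (data[end] == '\n' || data[end] == '\r')) {
                    end++;
                }
            }
            ends.add(end);
            start = end;
        }
        return ends;
    }

    public static void main(String[] args) {
        // six 4-byte lines, split into three parts
        byte[] data = "abc\nabc\nabc\nabc\nabc\nabc\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(splitPoints(data, 3)); // prints [8, 16, 24]
    }
}
```

Because each cut is pushed forward to the next line boundary, parts are only approximately equal when line lengths vary, which matches the note above that the split is not into equal parts.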

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    /**
     * Split a large file into fileCount sub-files, cutting only at line boundaries.
     * Sub-files are written next to the source as filePath1, filePath2, ...
     */
    public static void splitFile(String filePath, int fileCount) throws IOException {
        try (FileInputStream fis = new FileInputStream(filePath);
             FileChannel inputChannel = fis.getChannel()) {
            final long fileSize = inputChannel.size();
            long average = fileSize / fileCount; // average bytes per sub-file
            int bufferSize = 200; // scan block size; adjust as needed
            ByteBuffer byteBuffer = ByteBuffer.allocate(bufferSize); // scan buffer
            long startPosition = 0; // start of the current sub-file
            long endPosition = average < bufferSize ? 0 : average - bufferSize; // where the line-end search begins
            for (int i = 0; i < fileCount; i++) {
                if (i + 1 != fileCount && endPosition < fileSize) {
                    boolean sawLineEnd = false;
                    boolean found = false;
                    while (!found) {
                        byteBuffer.clear(); // reset the buffer before each read
                        int read = inputChannel.read(byteBuffer, endPosition);
                        if (read <= 0) break; // end of file
                        int j = 0;
                        for (; j < read; j++) { // look only at the bytes just read
                            byte b = byteBuffer.get(j);
                            if (b == '\n' || b == '\r') {
                                sawLineEnd = true;
                            } else if (sawLineEnd) { // first byte of the next line: cut here
                                found = true;
                                break;
                            }
                        }
                        endPosition += j;
                    }
                } else {
                    endPosition = fileSize; // the last sub-file runs to the end of the file
                }

                try (FileOutputStream fos = new FileOutputStream(filePath + (i + 1));
                     FileChannel outputChannel = fos.getChannel()) {
                    long position = startPosition;
                    while (position < endPosition) { // transferTo may copy fewer bytes than requested
                        long n = inputChannel.transferTo(position, endPosition - position, outputChannel);
                        if (n <= 0) break;
                        position += n;
                    }
                }
                startPosition = endPosition; // the line terminator stays in the previous sub-file
                endPosition += average;
            }
        }
    }
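
One quick sanity check after splitting is to compare the combined size of the sub-files with the size of the original. The helper below is a hedged sketch: the class and method names are hypothetical, and it only assumes the naming used by splitFile above (filePath followed by 1, 2, ...). A result of 0 means no bytes were lost; a positive result means some bytes, for example line terminators at the cut points, did not make it into any sub-file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SplitCheckSketch {

    /** Sums the sizes of the sub-files named filePath + 1 .. filePath + fileCount. */
    static long totalPartSize(String filePath, int fileCount) throws IOException {
        long total = 0;
        for (int i = 1; i <= fileCount; i++) {
            Path part = Paths.get(filePath + i);
            if (Files.exists(part)) { // a missing sub-file simply contributes nothing
                total += Files.size(part);
            }
        }
        return total;
    }

    /** Returns how many bytes of the original are absent from the sub-files (0 means none). */
    static long missingBytes(String filePath, int fileCount) throws IOException {
        return Files.size(Paths.get(filePath)) - totalPartSize(filePath, fileCount);
    }

    public static void main(String[] args) throws IOException {
        // defaults mirror the path and count used in the test method of this article
        String path = args.length > 0 ? args[0] : "F:\\data\\big_file";
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 5;
        System.out.println("missing bytes: " + missingBytes(path, count));
    }
}
```

A size comparison does not prove the contents are identical, but it is cheap and catches dropped or duplicated bytes at the cut points.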

Test method

//    public static void main(String[] args) throws Exception {
//        Scanner scanner = new Scanner(System.in);
//        scanner.nextLine();
//        long startTime = System.currentTimeMillis();
//        splitFile("F:\\data\\big_file",5);
//        long endTime = System.currentTimeMillis();
//        System.out.println("Elapsed time: " + (endTime - startTime) + " ms");
//        scanner.nextLine();
//    }

    public static void main(String[] args) throws IOException {
        createBigFile();
    }

Reference: https://blog.csdn.net/u013632755/article/details/80467324 (thanks to the author)