How POI Unveils the Mystery of Excel Files (A Source-Code Walkthrough of How POI Detects the Excel File Format)

Latest recommended article published on 2024-08-01 17:26:45

春秋战国程序猿

Category columns: easyexcel; Integrating easyexcel with Spring Cloud

Article link: https://blog.csdn.net/reggergdsg/article/details/104431901

To understand how POI determines the format of an Excel file, we first need to understand FileMagic.

So what is FileMagic? Here is the official Javadoc description:

The file magic number, i.e. the file identification based on the first bytes of the file

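In other words, the file's magic number is an identifier derived from the first few bytes of the file (not just a single byte; POI reads the first 8). Put plainly, an Excel file's format is identified by the values of its leading bytes. You won't see these bytes when you open the file in Excel, but an application API can read them.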

We run into this all the time in everyday use. My Excel file clearly ends in .xls, so why does EasyExcel (which relies on POI's FileMagic) tell me: Convert excel format exception.You can try specifying the 'excelType' yourself

The reason: the Excel format is determined primarily from FileMagic, not from the file extension.
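
To make the extension-versus-content point concrete, here is a minimal standalone sketch using only the JDK (no POI). It compares a file's leading bytes against two well-known signatures: the OLE2 compound-document header that legacy .xls files start with, and the ZIP local-file header that .xlsx packages start with. The names `MagicSniffDemo`, `classify`, and `sniff` are made up for illustration and are not POI's API. Keep in mind that a ZIP signature alone only says "ZIP-based", not "definitely xlsx".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class MagicSniffDemo {
    // OLE2 compound-document signature (legacy .xls files start with it).
    static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local-file-header signature "PK\003\004" (.xlsx is a ZIP package).
    static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Hypothetical helper: classifies leading bytes, ignoring any file name. */
    static String classify(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "OOXML";
        return "UNKNOWN";
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static String sniff(Path path) throws IOException {
        byte[] buf = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int r;
            while (n < buf.length && (r = in.read(buf, n, buf.length - n)) != -1) {
                n += r;
            }
        }
        return classify(Arrays.copyOf(buf, n));
    }

    public static void main(String[] args) throws IOException {
        // A file named *.xls whose content actually starts with the ZIP signature.
        Path misnamed = Files.createTempFile("report", ".xls");
        try {
            Files.write(misnamed, new byte[]{0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00});
            System.out.println("extension says .xls, content says " + sniff(misnamed));
        } finally {
            Files.deleteIfExists(misnamed);
        }
    }
}
```

A file like this is exactly the case where the extension and the detected format disagree.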

OK, let's look at the actual detection logic.

Step 1: Obtain the file's FileMagic value

The FileMagic class provides two ways to obtain a FileMagic value:

1. Obtain the FileMagic value from a byte array

    // Detection logic for a byte array
    public static FileMagic valueOf(byte[] magic) {
        for (FileMagic fm : values()) {
            int i=0;
            boolean found = true;
            for (byte[] ma : fm.magic) {
                for (byte m : ma) {
                    byte d = magic[i++];
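                    // Check whether the input byte matches the magic pattern (the "secret code");
                    // a pattern byte of 0x70 also accepts 0x10, 0x20 or 0x40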
                    if (!(d == m || (m == 0x70 && (d == 0x10 || d == 0x20 || d == 0x40)))) {
                        found = false;
                        break;
                    }
                }

                // If the magic pattern (the "secret code") matched, return this FileMagic
                if (found) {
                    return fm;
                }
            }
        }

        // If no magic pattern matched, return UNKNOWN: no FileMagic could be determined for this file
        return UNKNOWN;
    }
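
The core of the quoted method is the byte-by-byte comparison with its `0x70` special case. The sketch below isolates that comparison as a standalone prefix matcher. It is not POI's code: unlike the quoted version, where `i` and `found` are initialized once per enum constant, this sketch starts from index 0 for every pattern and rejects inputs shorter than the pattern. The class name `MagicMatchDemo` and the sample pattern are illustrative assumptions, not values taken from POI.

```java
class MagicMatchDemo {
    /**
     * Returns true if data starts with pattern. As in the quoted loop,
     * a pattern byte of 0x70 also accepts 0x10, 0x20 or 0x40.
     */
    static boolean matches(byte[] pattern, byte[] data) {
        if (data.length < pattern.length) {
            return false; // too short to contain the pattern
        }
        for (int i = 0; i < pattern.length; i++) {
            byte m = pattern[i];
            byte d = data[i];
            boolean wildcardHit = m == 0x70 && (d == 0x10 || d == 0x20 || d == 0x40);
            if (d != m && !wildcardHit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative pattern only; not claimed to be one of POI's constants.
        byte[] pattern = {0x09, 0x00, 0x04, 0x00, 0x00, 0x00, 0x70, 0x00};
        byte[] hit = {0x09, 0x00, 0x04, 0x00, 0x00, 0x00, 0x20, 0x00};
        byte[] miss = {0x09, 0x00, 0x04, 0x00, 0x00, 0x00, 0x30, 0x00};
        System.out.println(matches(pattern, hit));  // true: 0x20 is accepted for 0x70
        System.out.println(matches(pattern, miss)); // false: 0x30 is not accepted
    }
}
```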

2. Obtain the FileMagic value from an input stream

    /**
     * Obtains the FileMagic from a file input stream.
     * 
     * Get the file magic of the supplied InputStream (which MUST
     *  support mark and reset).<p>
     *
     * If unsure if your InputStream does support mark / reset,
     *  use {@link #prepareToCheckMagic(InputStream)} to wrap it and make
     *  sure to always use that, and not the original!<p>
     *
     * Even if this method returns {@link FileMagic#UNKNOWN} it could potentially mean,
     *  that the ZIP stream has leading junk bytes
     *
     * @param inp An InputStream which supports either mark/reset
     */
    public static FileMagic valueOf(InputStream inp) throws IOException {
        if (!inp.markSupported()) {
            throw new IOException("getFileMagic() only operates on streams which support mark(int)");
        }

        // Grab the first 8 bytes of the stream
        byte[] data = IOUtils.peekFirst8Bytes(inp);

        return FileMagic.valueOf(data);
    }
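
Why does the stream have to support mark/reset? Because the magic bytes must be inspected without consuming them, so that whoever parses the file afterwards still sees it from the first byte. Below is a rough JDK-only sketch of that "peek" idea. The helper name `peek` and the class `PeekDemo` are hypothetical; this is not how `IOUtils.peekFirst8Bytes` is actually implemented, only the same general technique.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class PeekDemo {
    /** Reads up to n leading bytes, then rewinds the stream to where it was. */
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support mark/reset");
        }
        in.mark(n); // remember the current position
        byte[] buf = new byte[n];
        int total = 0;
        try {
            int r;
            while (total < n && (r = in.read(buf, total, n - total)) != -1) {
                total += r;
            }
        } finally {
            in.reset(); // rewind so later readers see the whole stream
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] content = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00, 0x08, 0x00};
        // Wrapping in BufferedInputStream guarantees mark/reset support.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(content));
        byte[] head = peek(in, 8);
        System.out.println("peeked " + head.length + " bytes; next read() = " + in.read());
    }
}
```

If you are handed a stream that may not support mark/reset, the Javadoc quoted above points to `prepareToCheckMagic(InputStream)` for wrapping it; the `BufferedInputStream` wrapper here plays a similar role in the sketch.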

That completes step 1: we now have the FileMagic value. Next, let's see how the Excel file format is determined from it.

Step 2: Determine the Excel file format from the FileMagic value

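Note: even if the user specifies excelType, it will not necessarily be trusted. The format decision is driven by FileMagic, and the user-specified value only comes into play as a fallback.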

public static ExcelTypeEnum valueOf(File file, InputStream inputStream, ExcelTypeEnum excelType) {

    try {
        // Determine the Excel file format from the file's magic bytes.
        // Note: even if the user specified excelType, it may not be trusted; the detection logic is FileMagic
        FileMagic fileMagic;

        // 1. Prefer a BufferedInputStream over the File to obtain the FileMagic
        if (file != null) {
            BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(file));
            try {
                fileMagic = FileMagic.valueOf(bufferedInputStream);
            } finally {
                bufferedInputStream.close();
            }

            // If the FileMagic is neither OLE2 nor OOXML, fall back to the file extension
            if (!FileMagic.OLE2.equals(fileMagic) && !FileMagic.OOXML.equals(fileMagic)) {
                String fileName = file.getName();
                if (fileName.endsWith(XLSX.getValue())) {
                    return XLSX;
                } else if (fileName.endsWith(XLS.getValue())) {
                    return XLS;
                } else {
                    throw new ExcelCommonException("Unknown excel type.");
                }
            }
        // 2. If no File object was passed in, obtain the FileMagic from the InputStream
        } else {
            fileMagic = FileMagic.valueOf(inputStream);
        }

        // If the FileMagic is OLE2, the file is in XLS format
        if (FileMagic.OLE2.equals(fileMagic)) {
            return XLS;
        }

        // If the FileMagic is OOXML, the file is in XLSX format
        if (FileMagic.OOXML.equals(fileMagic)) {
            return XLSX;
        }
    } catch (IOException e) {
        if (excelType != null) {
            return excelType;
        }
        throw new ExcelCommonException(
            "Convert excel format exception.You can try specifying the 'excelType' yourself", e);
    }

    // 3. If FileMagic could not decide, there is no choice but to trust the user and return the specified excelType
    if (excelType != null) {
        return excelType;
    }
    throw new ExcelCommonException(
        "Convert excel format exception.You can try specifying the 'excelType' yourself");
}
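
The order of precedence in the method above is easy to lose among the try/catch blocks, so here is a condensed sketch of it as a pure function. Everything in it is a simplification for illustration: the enums `Magic` and `ExcelType`, the method `resolve`, and the use of `IllegalArgumentException` are hypothetical stand-ins, the extensions ".xlsx" and ".xls" are assumed values for `XLSX.getValue()` and `XLS.getValue()`, and the `IOException` path is left out.

```java
class TypeResolutionDemo {
    enum Magic { OLE2, OOXML, UNKNOWN }
    enum ExcelType { XLS, XLSX }

    /**
     * Hypothetical condensation of the precedence in the quoted method.
     * fileName is null when only a stream was supplied; userType may be null.
     */
    static ExcelType resolve(Magic magic, String fileName, ExcelType userType) {
        // 1. A recognised magic number always wins.
        if (magic == Magic.OLE2) return ExcelType.XLS;
        if (magic == Magic.OOXML) return ExcelType.XLSX;
        // 2. With a File, fall back to the extension; an unknown extension fails.
        if (fileName != null) {
            if (fileName.endsWith(".xlsx")) return ExcelType.XLSX;
            if (fileName.endsWith(".xls")) return ExcelType.XLS;
            throw new IllegalArgumentException("Unknown excel type.");
        }
        // 3. With only a stream, trust the caller-supplied type if there is one.
        if (userType != null) return userType;
        throw new IllegalArgumentException("Cannot determine the Excel type");
    }

    public static void main(String[] args) {
        // Named .xls and declared XLS by the caller, but the content is ZIP-based.
        System.out.println(resolve(Magic.OOXML, "report.xls", ExcelType.XLS)); // XLSX
        // Stream only, unrecognised content: the caller-supplied type is used.
        System.out.println(resolve(Magic.UNKNOWN, null, ExcelType.XLS));       // XLS
    }
}
```

One detail this makes visible: on the File path, when neither the magic number nor the extension is recognised, the method throws without ever consulting the user-specified type.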

From this source we can also see that the detection logic only ever resolves to xls or xlsx; Excel files in any other format cannot be handled here.

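This is also one of the reasons Microsoft's products draw criticism: they offer plenty of flashy visual features, but the formats are so complex that parsing them from an application is hard work.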

So how many Excel formats are there? Let's take a look.