Analysis and Fix: BOM Causing Errors in XML Parsing

微寒Super

Article link: https://blog.csdn.net/supercooly/article/details/46724005

Error message: org.dom4j.DocumentException: Error on line 1 of document : Content is not allowed in prolog.

Nested exception: Content is not allowed in prolog.
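The "Nested exception" line suggests the message comes from the underlying SAX parser that dom4j wraps. As a minimal sketch, the JDK's built-in DOM parser (used here instead of dom4j only so the example needs no extra library; the class name PrologErrorDemo is made up) fails on the same document once a BOM character sits in front of the XML declaration. The error text can be localized depending on the JVM locale, so the sketch only reports whether parsing fails.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class PrologErrorDemo {
    /** Parses xml from a String; returns the parser's error message, or null if parsing succeeds. */
    public static String parseError(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        System.out.println(parseError(xml));            // null: the clean document parses
        System.out.println(parseError("\uFEFF" + xml)); // BOM character in front of <?xml
    }
}
```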

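XML encoding error:
[Screenshot: Beyond Compare 4 comparison, failing XML file on the left, working XML file on the right]
The failing file was saved as UTF-8 with a BOM, so three invisible bytes (EF BB BF) come before "<?xml"; the parser treats them as content in the prolog and rejects the document.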
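Why the three bytes matter once the file is read into a String: Java's UTF-8 decoder does not strip a BOM, so EF BB BF turns into the character U+FEFF at index 0. A minimal sketch (the class and method names are illustrative, not from the original post):

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // A BOM (EF BB BF) followed by a tiny XML document, as the bytes of a "UTF-8 with BOM" file
    public static byte[] xmlWithBom() {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] result = new byte[xml.length + 3];
        result[0] = (byte) 0xEF;
        result[1] = (byte) 0xBB;
        result[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, result, 3, xml.length);
        return result;
    }

    // Decodes the bytes as UTF-8; the BOM survives as the first character
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(xmlWithBom());
        System.out.printf("first char: U+%04X%n", (int) s.charAt(0)); // first char: U+FEFF
        System.out.println("starts with <?xml: " + s.startsWith("<?xml")); // starts with <?xml: false
    }
}
```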

Solutions:
1. In Notepad++, convert the file from "UTF-8" encoding to "UTF-8 without BOM" encoding, then save it.

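If many files are affected, the same conversion can be scripted. A minimal sketch, assuming Java 7+ and a hypothetical helper class BomFileFixer (not part of any library): it reads the file and, if the first three bytes are the UTF-8 BOM, writes the file back without them, leaving every other byte unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomFileFixer {
    /**
     * Rewrites a file in place without its UTF-8 BOM (EF BB BF), if it has one.
     * @return true if a BOM was found and removed
     */
    public static boolean stripBomFromFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        boolean hasBom = bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
        if (hasBom) {
            // Same content, minus the first three bytes
            Files.write(file, Arrays.copyOfRange(bytes, 3, bytes.length));
        }
        return hasBom;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            boolean removed = stripBomFromFile(Paths.get(arg));
            System.out.println(arg + ": " + (removed ? "BOM removed" : "no BOM"));
        }
    }
}
```

Running it a second time on the same file is harmless: the BOM is already gone, so nothing is rewritten.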

2. For an XML string (xmlString) received through a web service, clean it up with the following method:

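    /**
     * Checks an XML string for illegal characters in front of the XML declaration
     * (for example a BOM, U+FEFF) and removes them.
     * @param xmlStr the XML string to check
     * @return the string starting at "<?xml", or "" if it is null or has no XML declaration
     */
    public String checkXMLStr(String xmlStr) {
        if (xmlStr == null) {
            return "";
        }
        int index = xmlStr.indexOf("<?xml");
        if (index > 0) {
            // Drop everything before the declaration
            return xmlStr.substring(index);
        } else if (index == -1) {
            // No XML declaration found: treat the string as invalid
            return "";
        }
        return xmlStr;
    }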
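Note that checkXMLStr discards everything before "<?xml" and returns an empty string when the XML has no declaration at all, even though a declaration is optional in XML. A narrower alternative, sketched here with the hypothetical class XmlStringCleaner, removes only a leading U+FEFF (the character a UTF-8 BOM becomes once the bytes are decoded to a String) and leaves any other input as-is.

```java
public class XmlStringCleaner {
    // U+FEFF: what a UTF-8 BOM turns into after the bytes are decoded to a String
    private static final char BOM = '\uFEFF';

    /** Removes a single leading BOM character; all other input is returned unchanged. */
    public static String stripLeadingBom(String xml) {
        if (xml != null && !xml.isEmpty() && xml.charAt(0) == BOM) {
            return xml.substring(1);
        }
        return xml;
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingBom("\uFEFF<?xml version=\"1.0\"?><root/>")); // <?xml version="1.0"?><root/>
        System.out.println(stripLeadingBom("<root/>")); // <root/>
    }
}
```

The trade-off: this only handles the BOM case, whereas checkXMLStr also drops other stray text before the declaration.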

3. For robustness, check for a BOM when reading the file and, if one is present, drop it when building the string:

    /**
     * Checks whether a byte array starts with a BOM.
     * A UTF-8 file may begin with the 3-byte header "EF BB BF" (the BOM, Byte Order Mark).
     * @param bytes the file content
     * @return true if bytes starts with the UTF-8 BOM
     */
    private static boolean CheckBOM(byte[] bytes) {
        // Use >= 3 so that a file consisting only of a BOM is detected too;
        // mask with 0xff because Java bytes are signed
        return bytes != null && bytes.length >= 3
                && 0xef == (bytes[0] & 0xff)
                && 0xbb == (bytes[1] & 0xff)
                && 0xbf == (bytes[2] & 0xff);
    }
    /**
     * Reads a file into a UTF-8 string.
     * @param filePath path of the file
     * @return the file content without a leading BOM, or "" if the file cannot be read
     */
    public String getXMLFileText(String filePath) {
        byte[] bt = fileToByteArray(filePath);
        if (bt == null) {
            // fileToByteArray returns null for a missing file or a read error
            return "";
        }
        // If the byte stream starts with a BOM, skip its 3 bytes
        int offset = CheckBOM(bt) ? 3 : 0;
        return new String(bt, offset, bt.length - offset, java.nio.charset.StandardCharsets.UTF_8);
    }
    // Reads a file into a byte[]; returns null if the file does not exist, is a directory, or cannot be read.
    // Needs java.io.BufferedInputStream, ByteArrayOutputStream, File, FileInputStream, IOException, InputStream.
    public byte[] fileToByteArray(String filePath) {
        filePath = filePath.replaceAll("\\\\", "/");
        File file = new File(filePath);
        if (!file.exists() || file.isDirectory()) {
            return null;
        }
        // try-with-resources closes both streams, even when reading fails
        try (InputStream in = new BufferedInputStream(new FileInputStream(file));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] temp = new byte[1024 * 1024]; // read up to 1 MB at a time
            int size;
            while ((size = in.read(temp)) != -1) {
                out.write(temp, 0, size);
            }
            return out.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
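On Java 7 and later, the three methods above can be condensed with java.nio. This is a sketch with a hypothetical class name Utf8FileReader, not the author's code: Files.readAllBytes replaces the manual stream loop, and the BOM check masks each byte with 0xFF because Java's byte type is signed ((byte) 0xEF is -17, which never equals the int literal 0xEF).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8FileReader {
    /** True if bytes begin with the UTF-8 BOM EF BB BF. */
    public static boolean hasUtf8Bom(byte[] bytes) {
        // Java bytes are signed, so mask with 0xFF before comparing
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Reads a whole file as UTF-8 text, skipping a leading BOM if present. */
    public static String readUtf8WithoutBom(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        int offset = hasUtf8Bom(bytes) ? 3 : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readUtf8WithoutBom(Paths.get(args[0])));
    }
}
```

Unlike fileToByteArray, this version reports I/O problems by throwing IOException instead of returning null, leaving the caller to decide how to handle them.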