Garbled Chinese Text When Parsing XML with the Java DOM API

Latest recommended article published on 2024-06-30 03:27:48

ubooks


Original article: click to open the link

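Today, while writing a shared utility method that reads the values or attribute values of specified nodes in an XML file, I ran into garbled Chinese text. I eventually sorted it out with the help of Baidu and Google, and I am writing it up here partly as a note for myself and partly in the hope that it helps someone else.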

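The XML file already declares UTF-8 as its encoding and can contain multiple data records (which I call data units here). A character stream, a BufferedReader, is used as the input of the InputSource.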

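The parameter String dataUnitTag is the tag name of a data unit, String[] commArr lists the common properties shared by all data units, and String[] detailArr lists the specific properties needed from each data unit. As the code shows, each entry is either a bare tag name (read that element's text) or tag:attr1,attr2 (read the named attributes of that element).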
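To make the entry format concrete, here is a minimal sketch of how one entry is split. The class SpecEntry and the sample names title, book, id, and lang are hypothetical and only serve as illustration; the splitting on ":" and "," mirrors what the parse method does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original article: splits one entry
// of the form "tag" or "tag:attr1,attr2" the same way parse() does.
final class SpecEntry {
    final String tag;
    final List<String> attributes; // empty means "read the element's text"

    SpecEntry(String spec) {
        String[] parts = spec.split(":");
        this.tag = parts[0];
        this.attributes = parts.length > 1
                ? Arrays.asList(parts[1].split(","))
                : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        SpecEntry text = new SpecEntry("title");
        SpecEntry attrs = new SpecEntry("book:id,lang");
        System.out.println(text.tag + " -> " + text.attributes);   // title -> []
        System.out.println(attrs.tag + " -> " + attrs.attributes); // book -> [id, lang]
    }
}
```

With entries like these, commArr = {"title"} would read the text of the first title element, and detailArr = {"book:id,lang"} would read the id and lang attributes of the first book element inside each data unit.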

The parsing method is as follows:

import java.io.BufferedReader;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public static List<Map<String, String>> parse(BufferedReader br, String dataUnitTag,
        String[] commArr, String[] detailArr) throws Exception {
    List<Map<String, String>> list = new LinkedList<>();
    String[] line = null; // entry being processed; used in the error message
    try {
        // The parser reads this character stream as-is, so br must already decode correctly.
        InputSource is = new InputSource(br);
        DocumentBuilderFactory domfac = DocumentBuilderFactory.newInstance();
        DocumentBuilder dombuilder = domfac.newDocumentBuilder();
        Document document = dombuilder.parse(is);
        document.normalize();

        Map<String, String> commMap = new HashMap<>();

        // Read the common properties.
        for (int i = 0; i < commArr.length; i++) {
            line = commArr[i].split(":");
            NodeList nodelist = document.getElementsByTagName(line[0]);
            if (line.length > 1) {
                // "tag:attr1,attr2": read the listed attributes of the first matching element.
                for (String attr : line[1].split(",")) {
                    commMap.put(attr, ((Element) nodelist.item(0)).getAttribute(attr));
                }
            } else {
                // "tag": read the text of the first matching element.
                commMap.put(commArr[i], nodelist.item(0).getFirstChild().getNodeValue());
            }
        }

        // Read the details of each data unit.
        NodeList recordslist = document.getElementsByTagName(dataUnitTag);
        for (int i = 0; i < recordslist.getLength(); i++) {
            // Every data unit starts from a copy of the common properties.
            Map<String, String> detailMap = new HashMap<>(commMap);
            Element element = (Element) recordslist.item(i);
            for (int j = 0; j < detailArr.length; j++) {
                line = detailArr[j].split(":");
                NodeList nodelist = element.getElementsByTagName(line[0]);
                if (line.length > 1) {
                    for (String attr : line[1].split(",")) {
                        detailMap.put(attr, ((Element) nodelist.item(0)).getAttribute(attr));
                    }
                } else {
                    detailMap.put(detailArr[j], nodelist.item(0).getFirstChild().getNodeValue());
                }
            }
            list.add(detailMap);
        }
    } catch (Exception e) {
        // line is still null if parsing failed before any entry was processed.
        String tag = (line == null) ? "(unknown)" : line[0];
        throw new Exception("Failed to read node " + tag
                + "; check that the tag and attribute names use the correct case.", e);
    }
    return list;
}

The XML files I had handled before contained no Chinese, so I used to read the file contents directly with BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));

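So I did the same thing this time, and the character stream built this way garbled the Chinese text in the XML file. (An InputStreamReader created without a charset decodes with the platform default encoding, which is not necessarily UTF-8.)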
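The effect is easy to reproduce without any XML at all. The sketch below decodes UTF-8 bytes with a mismatched charset; ISO-8859-1 is only a stand-in here, since the article does not say what the platform default was on the author's machine, and the class name MojibakeDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587"; // the two characters of the word "Chinese"
        // This is the charset an InputStreamReader uses when none is given.
        System.out.println("platform default: " + Charset.defaultCharset());
        // Matching charset: the text survives.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8)));      // true
        // Mismatched charset (assumed stand-in): the text is garbled.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.ISO_8859_1))); // false
    }
}
```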

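At first I suspected the line InputSource is = new InputSource(br); because InputSource has a setEncoding(String encoding) method. I tried many encodings and none of them worked, so I looked up the API documentation and found this passage: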

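The SAX parser will use the InputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly, disregarding any text encoding declaration found in that stream. If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification. If neither a character stream nor a byte stream is available, the parser will attempt to open a URI connection to the resource identified by the system identifier.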

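As this passage shows, setEncoding(String encoding) has no effect when a character stream is supplied. Still, I am very fond of character streams: for one thing, they are generally regarded as efficient for text, and for another, I use them more often and know them better than byte streams, ha ha.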
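The documented behavior can be checked with a small experiment. This is a sketch under stated assumptions: the class InputSourceDemo and its method names are hypothetical, the XML is built in memory, and ISO-8859-1 again stands in for a wrong reader charset. It relies on the JDK's default DOM parser following the InputSource rules quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class InputSourceDemo {
    static final String TEXT = "\u4F60\u597D"; // a two-character Chinese greeting
    static final byte[] XML =
            ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + TEXT + "</root>")
                    .getBytes(StandardCharsets.UTF_8);

    // Byte stream: the parser works out the encoding itself.
    static String parseBytes() {
        return rootText(new InputSource(new ByteArrayInputStream(XML)));
    }

    // Character stream: the reader's charset decides everything.
    static String parseChars(Charset readerCharset) {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(XML), readerCharset);
        InputSource is = new InputSource(reader);
        is.setEncoding("UTF-8"); // has no effect once a character stream is set
        return rootText(is);
    }

    private static String rootText(InputSource is) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("byte stream ok:       " + TEXT.equals(parseBytes()));
        System.out.println("UTF-8 reader ok:      " + TEXT.equals(parseChars(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 reader ok: " + TEXT.equals(parseChars(StandardCharsets.ISO_8859_1)));
    }
}
```

Only the last case should come back garbled: the damage is done by the reader before the parser sees the text, which is why no setEncoding value can repair it.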

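So I wondered whether the encoding should be specified at the point where the bytes read from the file are converted into a character stream. The JDK does provide a way to do that, for example BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "utf-8"));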

After recompiling and running it again, everything was fine: the Chinese text was displayed correctly.
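Put together, the fix might look like the following sketch. The class Utf8ReaderDemo and the helpers readRootText and writeSample are hypothetical names, and the temporary file only exists to keep the example self-contained. It uses StandardCharsets.UTF_8 rather than the string "utf-8", which avoids a checked UnsupportedEncodingException, and closes the reader with try-with-resources.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class Utf8ReaderDemo {
    // Reads the root element's text through a character stream that decodes as UTF-8.
    static String readRootText(File file) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(br));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + file, e);
        }
    }

    // Writes a small UTF-8 sample document to a temporary file.
    static File writeSample(String text) {
        try {
            File file = File.createTempFile("sample", ".xml");
            file.deleteOnExit();
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + text + "</root>";
            Files.write(file.toPath(), xml.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // the two characters of the word "Chinese"
        System.out.println(text.equals(readRootText(writeSample(text)))); // true
    }
}
```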

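To summarize: converting a byte stream into a character stream (wrapping a FileInputStream in an InputStreamReader) seemed harmless for plain TXT files, since I had never specified an encoding when processing them, presumably because those files happened to match the platform default encoding. A UTF-8 XML file containing Chinese, however, needs the encoding specified at that step to avoid garbled text. The other lesson is that InputSource handles byte streams and character streams differently.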

As an aside, when I run into a problem I feel excited rather than discouraged, because it means I am about to learn something new.

Done!