Parsing an XML File in Java with the SAX Parser

1. Overview

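SAX (Simple API for XML) is an API for parsing XML documents that ships with the JDK. It is a streaming parser: it reads the XML sequentially and hands data to the caller through event callbacks as it goes. Because the parser never builds the whole document tree in memory, its own memory footprint stays small regardless of the size of the XML. Overall memory use then depends only on what the handler chooses to keep.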
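To make the event-callback model concrete, here is a minimal sketch that records the callbacks fired for a small in-memory document. The class name `SaxEventDemo` and the method `collectEvents` are made up for this illustration and are not part of the rest of the article; only the JDK's own SAX classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX events in the order the parser fires them.
public class SaxEventDemo {

    /** Parses the given XML text and returns the events as readable strings. */
    public static List<String> collectEvents(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Ignore whitespace-only text between elements
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling
            throw new IllegalStateException("Failed to parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(collectEvents("<Row><Cell>a</Cell><Cell>b</Cell></Row>"));
    }
}
```

The events arrive strictly in document order, so a handler has to keep track of where it is in the document itself; the parser retains nothing once a callback returns.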

2. Maven dependency

    <dependencies>
        <dependency>
            <groupId>com.alibaba.fastjson2</groupId>
            <artifactId>fastjson2</artifactId>
            <version>2.0.39</version>
        </dependency>
    </dependencies>

3. The XML file to parse

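<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">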
    <Styles>
        <Style ss:ID="Default" ss:Name="Normal">
            <Alignment ss:Vertical="Bottom"/>
            <Borders/>
            <Font ss:FontName="宋体" x:CharSet="134" ss:Size="12"/>
            <Interior/>
            <NumberFormat/>
            <Protection/>
        </Style>
        <Style ss:ID="s79" ss:Name="常规 2">
            <Alignment ss:Vertical="Center"/>
            <Borders/>
            <Font ss:FontName="等线" x:CharSet="134" ss:Size="11" ss:Color="#000000"/>
            <Interior/>
            <NumberFormat/>
            <Protection/>
        </Style>
    </Styles>
    <Worksheet ss:Name="薪酬发放表1">
        <Row ss:AutoFitHeight="0" ss:Height="33.75">
            <Cell ss:MergeAcross="7" ss:StyleID="s64">
                <Data ss:Type="String">2022年5月某公司托管人员五险一金明细表</Data>
            </Cell>
        </Row>
        <Row ss:AutoFitHeight="0" ss:Height="21">
            <Cell ss:StyleID="s65">
                <Data ss:Type="String">部门</Data>
            </Cell>
            <Cell ss:StyleID="s65" ss:MergeAcross="3" ss:MergeDown="2">
                <Data ss:Type="String">序号</Data>
            </Cell>
            <Cell ss:MergeAcross="1" ss:StyleID="m3181415230988">
                <Data ss:Type="String">姓名</Data>
            </Cell>
            <Cell ss:StyleID="s66" ss:MergeDown="2">
                <Data ss:Type="String">公积金单位部分(社保)</Data>
            </Cell>
            <Cell ss:Index="16371" ss:StyleID="Default"/>
            <Cell ss:StyleID="Default"/>
            <Cell ss:StyleID="Default"/>
        </Row>
        <Row ss:AutoFitHeight="0" ss:Height="21"/>
    </Worksheet>
    <Worksheet ss:Name="薪酬发放表2">
        <Row ss:Height="22.5">
            <Cell ss:MergeAcross="7" ss:StyleID="s64">
                <Data ss:Type="String">2022年6月某公司托管人员五险一金明细表</Data>
            </Cell>
        </Row>
        <Row ss:Index="3">
            <Cell ss:StyleID="s67" ss:MergeAcross="3">
                <Data ss:Type="String">信息中心</Data>
            </Cell>
            <Cell ss:StyleID="s68" ss:Formula=" ">
                <Data ss:Type="Number">1</Data>
            </Cell>
            <Cell ss:StyleID="s68">
                <Data ss:Type="String">张三</Data>
            </Cell>
            <Cell ss:StyleID="s69" ss:MergeDown="6">
                <Data ss:Type="Number">1252.43</Data>
            </Cell>
            <Cell ss:StyleID="s69">
                <Data ss:Type="Number">313.11</Data>
            </Cell>
            <Cell ss:StyleID="s68">
                <Data ss:Type="String"/>
            </Cell>
        </Row>
    </Worksheet>
</Workbook>

4. The XmlRow class

package com.xmlutil;

import java.util.ArrayList;

public class XmlRow {
    // Text of the Data element inside each Cell
    public ArrayList<String> cellList = new ArrayList<>();
    // Value of each Cell's ss:MergeAcross attribute (0 if absent)
    public ArrayList<Integer> cellMergeAcrossList = new ArrayList<>();
    // Value of each Cell's ss:MergeDown attribute (0 if absent)
    public ArrayList<Integer> cellMergeDownList = new ArrayList<>();

    @Override
    public String toString() {
        return cellList.toString();
    }
}

5. The SAXHandler class

package com.xmlutil;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;

public class SAXHandler extends DefaultHandler {

    // Rows of the Worksheet currently being parsed
    private List<XmlRow> xmlRowList = new ArrayList<>();
    // Contents of every Worksheet
    public List<List<XmlRow>> xmlSheetList = new ArrayList<>();
    // ss:Name attribute value of every Worksheet
    public List<String> sheetNameList = new ArrayList<>();

    XmlRow xmlRow = null;
    // Text of the current Data element; characters() may deliver it in several chunks
    StringBuilder content = new StringBuilder();
    Integer mergeAcross = null;
    Integer mergeDown = null;

    // Called when a start tag is found
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        int size = attributes.getLength();
        switch (qName) {
            case "Worksheet":
                // Start collecting rows for a new sheet and record its name
                xmlRowList.clear();
                for (int i = 0; i < size; i++) {
                    String attName = attributes.getQName(i);
                    if ("ss:Name".equals(attName)) {
                        sheetNameList.add(attributes.getValue(i));
                    }
                }
                break;
            case "Row":
                // Create a new XmlRow object when a Row start tag is found
                xmlRow = new XmlRow();
                break;
            case "Cell":
                // Cells without merge attributes are recorded as 0
                mergeAcross = mergeDown = 0;
                for (int i = 0; i < size; i++) {
                    String attName = attributes.getQName(i);
                    if ("ss:MergeAcross".equals(attName)) {
                        mergeAcross = Integer.parseInt(attributes.getValue(i));
                    }
                    if ("ss:MergeDown".equals(attName)) {
                        mergeDown = Integer.parseInt(attributes.getValue(i));
                    }
                }
                break;
            case "Data":
                // Discard whitespace collected between elements
                content.setLength(0);
                break;
        }
    }

    // Called when an end tag is found
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        switch (qName) {
            case "Worksheet":
                // Copy xmlRowList, because it is cleared and reused for the next Worksheet.
                // A shallow copy is enough: every Row creates a fresh XmlRow object.
                xmlSheetList.add(new ArrayList<>(xmlRowList));
                break;
            case "Row":
                xmlRowList.add(xmlRow);
                break;
            case "Cell":
                xmlRow.cellMergeAcrossList.add(mergeAcross);
                xmlRow.cellMergeDownList.add(mergeDown);
                break;
            case "Data":
                xmlRow.cellList.add(content.toString().trim());
                break;
        }
    }

    // Called for character data; one text node may arrive in more than one call
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        content.append(ch, start, length);
    }
}
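The handler above matches on qualified names such as `ss:Name`, which ties it to the prefix a particular file happens to use. As a hedged alternative, the sketch below turns on namespace awareness and matches on the namespace URI plus local name instead. The class `NamespaceAwareSheetNames` and its method are hypothetical names for this illustration. Note that namespace-aware parsing requires every prefix used in the file (here `ss` and `x`) to be declared, otherwise the parser reports an error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects Worksheet names independently of the prefix in use.
public class NamespaceAwareSheetNames {

    // Namespace URI taken from the Workbook element of the sample file
    private static final String SS_NS = "urn:schemas-microsoft-com:office:spreadsheet";

    public static List<String> sheetNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (SS_NS.equals(uri) && "Worksheet".equals(localName)) {
                    // Look the attribute up by namespace URI and local name, not by "ss:Name"
                    String name = attributes.getValue(SS_NS, "Name");
                    if (name != null) {
                        names.add(name);
                    }
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling
            throw new IllegalStateException("Failed to parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // The prefix "s" is used on purpose to show that the prefix itself does not matter
        String xml = "<Workbook xmlns=\"" + SS_NS + "\" xmlns:s=\"" + SS_NS + "\">"
                + "<Worksheet s:Name=\"A\"/><Worksheet s:Name=\"B\"/></Workbook>";
        System.out.println(sheetNames(xml));
    }
}
```

Whether this is worth the extra setup depends on where the files come from; if they are always exported by the same tool with the same prefixes, matching on `qName` as the article does is simpler.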

6. Test


import com.alibaba.fastjson2.JSON;
import com.xmlutil.SAXHandler;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MainServer {
    public static void main(String[] args) throws Exception {

        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SAXParser parser = parserFactory.newSAXParser();
        SAXHandler handler = new SAXHandler();
        String filePath = "F:\\excels\\textXml.xml";
        // Stream the file to the parser instead of loading it into memory first.
        // The parser detects the encoding itself (UTF-8 unless the XML declaration says otherwise).
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            parser.parse(in, handler);
        }

        System.out.println(JSON.toJSONString(handler.xmlSheetList));
    }
}

Output:

[
	[{
		"cellList": ["2022年5月某公司托管人员五险一金明细表"],
		"cellMergeAcrossList": [7],
		"cellMergeDownList": [0]
	}, {
		"cellList": ["部门", "序号", "姓名", "公积金单位部分(社保)"],
		"cellMergeAcrossList": [0, 3, 1, 0, 0, 0, 0],
		"cellMergeDownList": [0, 2, 0, 2, 0, 0, 0]
	}, {
		"cellList": [],
		"cellMergeAcrossList": [],
		"cellMergeDownList": []
	}],
	[{
		"cellList": ["2022年6月某公司托管人员五险一金明细表"],
		"cellMergeAcrossList": [7],
		"cellMergeDownList": [0]
	}, {
		"cellList": ["信息中心", "1", "张三", "1252.43", "313.11", ""],
		"cellMergeAcrossList": [3, 0, 0, 0, 0, 0],
		"cellMergeDownList": [0, 0, 0, 6, 0, 0]
	}]
]
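The handler in section 5 keeps every row in memory, which is fine for small files. When only an aggregate is needed, a handler can discard data as it goes, and that is where SAX's low memory use pays off. The following is a minimal sketch under that assumption: it counts the rows of each worksheet without storing any cell. The class `RowCounter`, the method `countRows`, and the `(unnamed)` fallback key are hypothetical choices for this illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts Row elements per Worksheet while retaining no row data.
public class RowCounter {

    public static Map<String, Integer> countRows(String xml) {
        // LinkedHashMap keeps the sheets in document order
        Map<String, Integer> counts = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentSheet;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("Worksheet".equals(qName)) {
                    // getValue(qName) returns null when the attribute is absent
                    String name = attributes.getValue("ss:Name");
                    currentSheet = (name != null) ? name : "(unnamed)";
                    counts.put(currentSheet, 0);
                } else if ("Row".equals(qName) && currentSheet != null) {
                    counts.merge(currentSheet, 1, Integer::sum);
                }
            }
        };
        try {
            // Default factory settings: not namespace-aware, so qName is "Worksheet" / "ss:Name"
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no checked-exception handling
            throw new IllegalStateException("Failed to parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<Workbook><Worksheet ss:Name=\"A\"><Row/><Row/></Worksheet>"
                + "<Worksheet ss:Name=\"B\"><Row/></Worksheet></Workbook>";
        System.out.println(countRows(xml));
    }
}
```

The sketch also uses `attributes.getValue("ss:Name")` to look an attribute up directly by its qualified name, which could replace the index loops in the handler if preferred.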

