Parsing large XML files with the StAX parser

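Preface

Most of this article is excerpted from IBM developerWorks (mainly the theory); see the three articles below. I copied the material mainly to deepen my own understanding and to keep it as notes for later reference. The excerpts are not complete and the originals are much richer, so please refer to them.

References:

Parsing XML with StAX, Part 1: Introduction to the Streaming API for XML (StAX): http://www.ibm.com/developerworks/cn/xml/x-stax1.html
Parsing XML with StAX, Part 2: Pull parsing and events: http://www.ibm.com/developerworks/cn/xml/x-stax2.html
Parsing XML with StAX, Part 3: Using custom events and writing XML: http://www.ibm.com/developerworks/cn/xml/x-stax3.html
----------------
Original link: https://blog.csdn.net/zhyh1986/article/details/8528649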

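I won't describe StAX itself any further here. Instead, here is the problem I ran into when parsing an XML file.

Requirement:
Extract every element tagged entity (including its nested child elements and their content) from a 4 GB XML file, and distribute the extracted entity data evenly across 7 new XML files.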

Broadly speaking, there are four ways to parse XML:

  • DOM
  • SAX
  • DOM4J
  • JDOM

Pros and cons of the four approaches:


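1. SAX (Simple API for XML)

How SAX parses: it reads the document sequentially from start to end and reports events (element start, element end, text) to callbacks as it goes. Unlike DOM, a SAX parse can be stopped at any point, and it is generally faster and lighter on memory.

Advantages: the document does not have to be loaded up front, so resource usage is low. Parsing can start immediately, and the parser itself creates no memory pressure.

Disadvantages: nodes cannot be modified.

Suitable for: reading XML files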
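To make the callback style concrete, here is a minimal sketch that counts entity elements with the JDK's built-in SAX parser. The class name SaxEntityCounter, the method name and the sample XML string are made up for illustration; only the counter is kept in memory while the document streams past.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original article.
class SaxEntityCounter {

    // Counts <entity> start tags; no part of the document is retained.
    static int countEntities(String xml) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    // The default factory is not namespace-aware, so qName holds the tag name.
                    if ("entity".equals(qName)) {
                        count[0]++;
                    }
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<gwl><entities><entity id=\"1\"/><entity id=\"2\"/></entities></gwl>";
        System.out.println("entities: " + countEntities(xml));
    }
}
```

Because the handler only receives events, anything that has to outlive a callback must be stored by the handler itself, which is where memory use can grow if whole records are accumulated.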
 

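2. DOM (Document Object Model)

How DOM parses: DOM defines a set of interfaces for working with a parsed XML document. The parser reads in the entire document and builds a tree structure in memory, which can then be manipulated through the DOM interfaces.

Advantages: the whole document tree is in memory, which makes it easy to work with; deleting, modifying and reordering nodes are all supported.

Disadvantages: with a large file, memory comes under pressure and parsing takes longer. The entire document is loaded into memory (including nodes that are never used), which wastes time and space.

Suitable for: modifying XML data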
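As a small illustration of the edit-in-place strength described above, the sketch below parses a tiny in-memory document with the JDK's DOM API and removes one entity by its id attribute. The class name DomEditDemo, its method names and the sample XML are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original article.
class DomEditDemo {

    // Builds the complete tree in memory; this is the step that fails on very large files.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Removes every <entity> whose id attribute equals the given id; returns how many were removed.
    static int removeEntityById(Document doc, String id) {
        NodeList entities = doc.getElementsByTagName("entity");
        int removed = 0;
        // Iterate backwards: the NodeList is live and shrinks as nodes are removed.
        for (int i = entities.getLength() - 1; i >= 0; i--) {
            Element entity = (Element) entities.item(i);
            if (id.equals(entity.getAttribute("id"))) {
                entity.getParentNode().removeChild(entity);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<entities><entity id=\"1\"/><entity id=\"2\"/></entities>");
        removeEntityById(doc, "1");
        System.out.println("remaining: " + doc.getElementsByTagName("entity").getLength());
    }
}
```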
 


 

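3. JDOM

JDOM is a pure Java API for processing XML. It combines tree-style traversal with the Java conventions familiar from SAX. JDOM differs from DOM in two main ways.
First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some respects but also limits flexibility.
Second, the API makes heavy use of the Collections classes, which makes it easier for Java developers who already know them.

JDOM does not include a parser of its own. It usually uses a SAX2 parser to parse and validate input XML documents (although it can also take a previously constructed DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document.

Advantages: 1. It is a tree-based Java API for processing XML; the tree is loaded into memory.

2. It has no backward-compatibility constraints, so it is simpler than DOM.

3. It is fast.

4. It follows the Java conventions of SAX.

Disadvantages: 1. It cannot handle documents larger than memory.

2. JDOM represents the logical model of an XML document, so a byte-for-byte round trip is not guaranteed.

3. It provides no actual model of the DTD or schema for an instance document.

4. It does not support the traversal package that DOM offers.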
 

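4. DOM4J

DOM4J has a more complex API and is therefore more flexible than JDOM. It is commonly cited as having excellent performance, and Sun's JAXM reportedly used it as well. Many open-source projects rely on DOM4J; the well-known Hibernate, for example, uses it to read its XML configuration files. If portability is not a concern, DOM4J is a reasonable choice.

Advantages: most flexible, easy to use, feature-rich, good performance

Disadvantages: complex API, poor portability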

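I tried essentially all four of these approaches on the requirement above.

DOM came first, but it only works for smaller XML files. A large file causes an out-of-memory error, because DOM loads the entire document at once.

I then tried DOM4J and SAX, but on my machine both still ended in JVM out-of-memory errors.

With no other option left, I eventually found that StAX can also be used to parse large XML files.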
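Before the full program, here is a minimal sketch of the StAX cursor loop on its own: the application pulls one event at a time with next() and inspects only what it needs. The class name StaxPullDemo and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original article.
class StaxPullDemo {

    // Pulls events one by one and counts <entity> start tags.
    static int countEntities(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "entity".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            // Closing the reader releases parser resources; it does not close the underlying source.
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<gwl><entities><entity id=\"1\"/><entity id=\"2\"/></entities></gwl>";
        System.out.println("entities: " + countEntities(xml));
    }
}
```

The difference from SAX is the direction of control: here the loop decides when to ask for the next event, which makes it straightforward to hand the reader to a helper method in the middle of the document.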

An excerpt of the XML file to be parsed:

<?xml version='1.0' encoding='UTF-8'?>
<gwl>
<version>20230417084108</version>
<entities>
<entity id="1123831" version="20230414163503">
    <name>ALMOND, LINCOLN CARTER</name>
    <listId>1021</listId>
    <listCode>USP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>USP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1936">06/16/1936</dob>
    </dobs>
    <pobs>
        <pob>Pawtucket, Rhode Island, United States</pob>
    </pobs>
    <titles>
        <title>FORMER GOVERNOR OF RHODE ISLAND (JANUARY 3, 1995 - JANUARY 7, 2003). DECEASED JANUARY 02, 2023.</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Career: Governor of Rhode Island (January 03, 1995 - January 07, 2003); United State Attorney for the District of Rhode Island (October 09, 1981 - January 20, 1993); United State Attorney for the District of Rhode Island (1969 - 1978).</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=d14d930f-7943-4363-b4d0-aa2c59437e1b</sdf>
        <sdf name="EffectiveDate">1981</sdf>
        <sdf name="EntityLevel">State</sdf>
        <sdf name="ExpirationDate">1993</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">1706394</sdf>
        <sdf name="OriginalID">7031</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
    </sdfs>
    <addresses>
        <address>
            <country>US</country>
            <countryName>UNITED STATES</countryName>
        </address>
    </addresses>
</entity>
<entity id="1124766" version="20230414163503">
    <name>BAUCUS, MAX SIEBEN</name>
    <listId>1021</listId>
    <listCode>USP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>USP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1941">12/11/1941</dob>
    </dobs>
    <pobs>
        <pob>Helena, Montana, United States</pob>
    </pobs>
    <aliases>
        <alias type="Alias">ENKE, MAX SIEBEN</alias>
    </aliases>
    <titles>
        <title>FORMER AMBASSADOR OF THE UNITED STATES TO CHINA (MARCH 20, 2014 - JANUARY 16, 2017).</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Political Party: Democratic. Career: Ambassador Extraordinary and Plenipotentiary of the United States to China, (March 20, 2014 - January 16, 2017); Member of the United States Congress, Senate from Montana (December 15, 1978 - February 06, 2014);</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=945fd382-f5b7-42c4-ad1f-a40c4bf0e285</sdf>
        <sdf name="EffectiveDate">1978</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2014</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">548118</sdf>
        <sdf name="OriginalID">7542</sdf>
        <sdf name="Relationship">Brother</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
    </sdfs>
    <addresses>
        <address>
            <country>US</country>
            <countryName>UNITED STATES</countryName>
            <province>WASHINGTON, DC</province>
            <postalCode>20515</postalCode>
        </address>
        <address>
            <country>US</country>
            <countryName>UNITED STATES</countryName>
            <province>WASHINGTON, D.C.</province>
            <postalCode>20510</postalCode>
        </address>
        <address>
            <address1>55 ANJIALOU RD</address1>
            <city>BEIJING</city>
            <country>CN</country>
            <countryName>CHINA</countryName>
            <postalCode>100600</postalCode>
        </address>
    </addresses>
</entity>
<entity id="1124842" version="20230414163503">
    <name>THOMAS, CRAIG LYLE</name>
    <listId>1021</listId>
    <listCode>USP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>USP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1933">02/17/1933</dob>
    </dobs>
    <pobs>
        <pob>Cody, Wyoming, United States</pob>
    </pobs>
    <titles>
        <title>FORMER MEMBER OF THE UNITED STATES CONGRESS (JANUARY 03, 1995 - JUNE 04, 2007). DECEASED JUNE 04, 2007.</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Political Party: Republican. Career: Member of the United States Congress, Senate, Class I (January 03, 1995 - June 04, 2007); Member of the United States Congress, House of Representatives, At-Large (April 27, 1989 - January 03, 1995). Member of the</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=4e7b1050-36b5-4b1c-9037-c2349c519d40</sdf>
        <sdf name="EffectiveDate">1989</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">1995</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">1817490</sdf>
        <sdf name="OriginalID">7629</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
    </sdfs>
    <addresses>
        <address>
            <country>US</country>
            <countryName>UNITED STATES</countryName>
            <province>WASHINGTON D.C.</province>
            <postalCode>20510</postalCode>
        </address>
        <address>
            <address1>200 WEST 24TH STREET</address1>
            <city>CHEYENNE</city>
            <state>WY</state>
            <stateName>WYOMING</stateName>
            <country>US</country>
            <countryName>UNITED STATES</countryName>
            <postalCode>82002</postalCode>
        </address>
    </addresses>
</entity>
<entity id="1125230" version="20230414163051">
    <name>PATRIAT, FRANCOIS</name>
    <listId>1020</listId>
    <listCode>PEP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>PEP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1943">03/21/1943</dob>
    </dobs>
    <pobs>
        <pob>Semur-en-Auxois, , France</pob>
    </pobs>
    <titles>
        <title>MEMBER OF THE FRENCH PARLIAMENT (OCTOBER 01, 2008 - 2026).</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Political party: La Republique en marche (LREM) (currently known as Renaissance). Career: Member of the Executive Bureau of La Republique en Marche (LREM), The Republic on the Move (currently known as Renaissance), effective from November 18, 2017;</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=a4ffd4f3-5c75-440b-aeca-4e3a7d2ef642</sdf>
        <sdf name="EffectiveDate">2008</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2026</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">3759009</sdf>
        <sdf name="OriginalID">8117</sdf>
        <sdf name="Relationship">Associate</sdf>
        <sdf name="SubCategory">Govt Branch Member</sdf>
    </sdfs>
    <addresses>
        <address>
            <address1>15, RUE DE VAUGIRARD</address1>
            <city>PARIS</city>
            <country>FR</country>
            <countryName>FRANCE</countryName>
            <postalCode>75291</postalCode>
        </address>
    </addresses>
</entity>
<entity id="1125282" version="20230414163052">
    <name>BENOUTIQ, ABDELKRIM</name>
    <listId>1020</listId>
    <listCode>PEP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>PEP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1959">08/19/1959</dob>
    </dobs>
    <pobs>
        <pob>Rabat, Rabat-Sale-Kenitra Region, Morocco</pob>
    </pobs>
    <aliases>
        <alias type="Alias">BEN ATIQ, ABDELKRIM</alias>
        <alias type="Alias">BENATIQ, ABDELKRIM</alias>
    </aliases>
    <nativeCharNames>
        <nativeCharName charSet="" latinCharName="BEN ATIQ, ABDELKRIM" type="Alias">??? ?????? ?? ????</nativeCharName>
        <nativeCharName charSet="" latinCharName="BENATIQ, ABDELKRIM" type="Alias">??? ?????? ??????</nativeCharName>
        <nativeCharName charSet="" latinCharName="BENOUTIQ, ABDELKRIM" type="Primary">??? ?????? ??????</nativeCharName>
    </nativeCharNames>
    <titles>
        <title>FORMER MEMBER OF THE POLITICAL BUREAU OF SOCIALIST UNION OF POPULAR FORCES PARTY, MOROCCO, ELECTED JUNE 10, 2017, EFFECTIVE UNTIL APRIL 24, 2022.</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Political Party: Union Socialiste Des Forces Populaires (USFP) Career: Member of the Political Bureau of Union Socialiste Des Forces Populaires (USFP), Socialist Union of Popular Forces Party, elected June 10, 2017, effective until April 24, 2022;</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=35f8bcea-6169-4a8f-9715-81de730d1c17</sdf>
        <sdf name="EffectiveDate">2000</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2001</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="OriginalID">8181</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
    </sdfs>
    <addresses>
        <address>
            <address1>9, AVENUE AL ARAAR</address1>
            <city>RABAT</city>
            <country>MA</country>
            <countryName>MOROCCO</countryName>
            <province>RABAT-SALE-KENITRA REGION</province>
        </address>
        <address>
            <address1>AVENUE F.ROOSEVELT</address1>
            <city>RABAT</city>
            <country>MA</country>
            <countryName>MOROCCO</countryName>
            <province>RABAT-SALE-KENITRA REGION</province>
        </address>
        <address>
            <address1>NO. 9 ARAR STREET</address1>
            <city>RABAT</city>
            <country>MA</country>
            <countryName>MOROCCO</countryName>
            <province>RABAT-SALE-KENITRA REGION</province>
        </address>
    </addresses>
</entity>
<entity id="1125443" version="20230414163053">
    <name>OLLING, SVEND</name>
    <listId>1020</listId>
    <listCode>PEP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>PEP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1967">11/09/1967</dob>
    </dobs>
    <pobs>
        <pob>Glostrup, , Denmark</pob>
    </pobs>
    <titles>
        <title>AMBASSADOR OF DENMARK TO SOUTH KOREA, AS OF MARCH 30, 2023.</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Career: Ambassador of Denmark to South Korea, as of March 30, 2023; Ambassador of Denmark to Egypt, as of May 28, 2020, expiration reported March 20, 2023; Non-Resident Ambassador of Denmark to Azerbaijan, effective from March 26, 2017, expiration</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=ef160921-f06b-4942-9527-0ee7565467c0</sdf>
        <sdf name="EffectiveDate">2023</sdf>
        <sdf name="EntityLevel">International</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">8698914</sdf>
        <sdf name="OriginalID">8384</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Diplomat</sdf>
    </sdfs>
    <addresses>
        <address>
            <address1>416, HANGANG-DAERO, JUNG-GU</address1>
            <city>SEOUL</city>
            <country>KR</country>
            <countryName>KOREA, REPUBLIC OF</countryName>
            <postalCode>04637</postalCode>
        </address>
        <address>
            <address1>TURAN GUENES BULVARI 106</address1>
            <city>ANKARA</city>
            <country>TR</country>
            <countryName>TURKEY</countryName>
            <postalCode>06550</postalCode>
        </address>
        <address>
            <address1>ASIATISK PLADS 2</address1>
            <city>COPENHAGEN</city>
            <country>DK</country>
            <countryName>DENMARK</countryName>
            <postalCode>1448</postalCode>
        </address>
        <address>
            <address1>NORTH AVENUE</address1>
            <city>DHAKA</city>
            <country>BD</country>
            <countryName>BANGLADESH</countryName>
            <postalCode>1212</postalCode>
        </address>
        <address>
            <city>CAIRO</city>
            <country>EG</country>
            <countryName>EGYPT</countryName>
        </address>
    </addresses>
</entity>
<entity id="1125610" version="20230414163054">
    <name>TAKAHASHI, KOICHI</name>
    <listId>1020</listId>
    <listCode>PEP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>PEP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1944">1944</dob>
    </dobs>
    <nativeCharNames>
        <nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">たかはし こういち</nativeCharName>
        <nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">高橋 恒一</nativeCharName>
    </nativeCharNames>
    <titles>
        <title>FORMER AMBASSADOR OF JAPAN TO THE CZECH REPUBLIC (FEBRUARY 03, 2003 - 2005).</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Career: Ambassador of Japan to the Czech Republic (February 03, 2003 - 2005); Deputy Vice-Minister in charge of Immigration Bureau, Ministry of Justice (1999 - 2001); Consul-General of Japan to Berlin City, Germany (1995 - 1997); Minister of Japan to</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=9b2a063e-8d55-4806-b2f2-f2c79d815a33</sdf>
        <sdf name="EffectiveDate">1999</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="ExpirationDate">2001</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="OriginalID">8483</sdf>
        <sdf name="SubCategory">Former PEP</sdf>
    </sdfs>
    <addresses>
        <address>
            <country>JP</country>
            <countryName>JAPAN</countryName>
        </address>
    </addresses>
</entity>
<entity id="1125925" version="20230414163054">
    <name>PINTER, SANDOR</name>
    <listId>1020</listId>
    <listCode>PEP</listCode>
    <entityType>03</entityType>
    <createdDate>09/02/2004</createdDate>
    <lastUpdateDate>04/14/2023</lastUpdateDate>
    <source>PEP</source>
    <OriginalSource>PEP</OriginalSource>
    <dobs>
        <dob Y="1948">07/03/1948</dob>
    </dobs>
    <pobs>
        <pob>Budapest, , Hungary</pob>
    </pobs>
    <titles>
        <title>DEPUTY PRIME MINISTER OF HUNGARY, EFFECTIVE FROM MAY 04, 2018.</title>
    </titles>
    <sdfs>
        <sdf name="OtherInformation">Career: Deputy Prime Minister, effective from May 04, 2018; Minister of Interior, effective from May 29, 2010; Minister of Interior (July 08, 1998 - May 27, 2002); Chief of the Hungarian National Police (September 18, 1991 - 1996).</sdf>
        <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=cd135a22-6242-4999-bc6f-5aae5b0f92e2</sdf>
        <sdf name="EffectiveDate">2018</sdf>
        <sdf name="EntityLevel">National</sdf>
        <sdf name="Gender">MALE</sdf>
        <sdf name="NameSource">Website</sdf>
        <sdf name="Org_PID">2544374</sdf>
        <sdf name="OriginalID">11549</sdf>
        <sdf name="Relationship">Father</sdf>
        <sdf name="SubCategory">Govt Branch Member</sdf>
    </sdfs>
    <addresses>
        <address>
            <address1>TEVE U. 4-6.</address1>
            <city>BUDAPEST</city>
            <country>HU</country>
            <countryName>HUNGARY</countryName>
            <postalCode>1139</postalCode>
        </address>
        <address>
            <address1>JOZSEF ATTILA U. 2-4.</address1>
            <city>BUDAPEST</city>
            <country>HU</country>
            <countryName>HUNGARY</countryName>
            <postalCode>1051</postalCode>
        </address>
    </addresses>
</entity>
</entities>
</gwl>

Below, StAX is used to extract every entity element from the XML file above and distribute them evenly across 7 new XML files, each of which follows a fixed, custom layout:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StAXParserTest {
    public static void main(String[] args) {
        String inputFile = "D:\\Desktop\\PEP\\ENTITY.XML"; // path of the input XML file
        String outputPrefix = "D:\\Desktop\\PEP\\"; // directory prefix for the output XML files
        int numFiles = 7; // number of output files

        try {
            // Create the XML input factory, the input stream and the XMLStreamReader
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            InputStream inputStream = new FileInputStream(inputFile);
            XMLStreamReader reader = inputFactory.createXMLStreamReader(inputStream);

            // Create the XML output factory, the output streams and the XMLStreamWriters
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            OutputStream[] outputStreams = new OutputStream[numFiles];
            XMLStreamWriter[] writers = new XMLStreamWriter[numFiles];

            for (int i = 0; i < numFiles; i++) {
                String outputFileName = outputPrefix + (i + 1) + ".xml";
                outputStreams[i] = new FileOutputStream(outputFileName);
                // Pass the encoding explicitly so the bytes written match the declaration below
                writers[i] = outputFactory.createXMLStreamWriter(outputStreams[i], "UTF-8");
                // Write the XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>
                writers[i].writeStartDocument("UTF-8", "1.0");
                // Add a line break
                writers[i].writeCharacters("\n");
                // Open the <gwl> root element
                writers[i].writeStartElement("gwl");
                writers[i].writeCharacters("\n");
                // Write <version> with its value
                writers[i].writeStartElement("version");
                writers[i].writeCharacters("20230417084108");
                // Close </version>
                writers[i].writeEndElement();
                writers[i].writeCharacters("\n");
                // Open <entities>; it is closed after all entities have been written
                writers[i].writeStartElement("entities");
            }

            // Parse the input and distribute the entities round-robin
            int currentFileIndex = 0;
            int entityCount = 0;

            while (reader.hasNext()) {
                int event = reader.next();

                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        String elementName = reader.getLocalName();
                        if ("entity".equals(elementName)) {
                            // Copy the entity element and its children to the current file
                            writeEntityElement(reader, writers[currentFileIndex]);
                            entityCount++;

                            // Switch to the next file
                            currentFileIndex = (currentFileIndex + 1) % numFiles;
                        }
                        break;
                }
            }

            // Close the open elements, then the writers and the output streams
            for (int i = 0; i < numFiles; i++) {
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </entities>
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </gwl>
                writers[i].writeCharacters("\n");
                writers[i].writeEndDocument();
                writers[i].flush();
                writers[i].close();
                outputStreams[i].close();
            }

            // Close the reader and the input stream (reader.close() does not close the stream)
            reader.close();
            inputStream.close();

            System.out.println("Total entity count: " + entityCount);
            System.out.println("Entities per file (rounded down): " + (entityCount / numFiles));

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void writeEntityElement(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        writer.writeCharacters("\n");
        // The reader is positioned on the <entity> start tag: write it with its attributes (id, version)
        writer.writeStartElement("entity");
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
        }

        // Copy everything up to the matching </entity>
        int depth = 1; // number of currently open elements, counting <entity> itself
        while (reader.hasNext()) {
            int event = reader.next();
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    writer.writeStartElement(reader.getLocalName());
                    // Child elements carry attributes too (e.g. <dob Y="1936">, <sdf name="Gender">),
                    // so they must be copied here or they are lost in the output
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    depth++;
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    depth--;
                    if (depth == 0) {
                        // The matching </entity> has been written; this entity is done
                        return;
                    }
                    break;

                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    // Text content; the writer escapes it as needed
                    writer.writeCharacters(reader.getText());
                    break;
            }
        }
    }
}

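The XML excerpt above contains 8 entity elements. After parsing, each of the 7 output files receives one entity, and the one left over goes to the next file in turn, so the first file ends up with 2 entities and the other 6 files with 1 each.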
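The distribution can be checked with a few lines of arithmetic. This is only a sketch of the round-robin rule used by the program (entity number i goes to file i % numFiles); the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original article.
class RoundRobinCounts {

    // Returns how many items each output receives when item i is assigned to output i % numFiles.
    static int[] perFile(int total, int numFiles) {
        int[] counts = new int[numFiles];
        for (int i = 0; i < total; i++) {
            counts[i % numFiles]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 8 entities over 7 files: prints [2, 1, 1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(perFile(8, 7)));
    }
}
```

In general the first total % numFiles files hold one entity more than the rest, so file sizes differ by at most one entity in count (though not necessarily in bytes).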

I ran this over the complete 4 GB Entity.xml file: there was no out-of-memory error, and parsing was fast.
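As an alternative to copying names, attributes and text by hand with the cursor API, the StAX event API (XMLEventReader and XMLEventWriter) can forward whole events, so attributes travel with their start tags automatically. The sketch below is not the program I ran on the 4 GB file; the class name EventSplitDemo and the in-memory StringWriter outputs are assumptions made to keep the example self-contained. It writes bare entity fragments only, so a real output file would still need the surrounding gwl and entities wrapper.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of the original article.
class EventSplitDemo {

    // Distributes <entity> subtrees round-robin over numFiles in-memory outputs.
    static String[] split(String xml, int numFiles) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        StringWriter[] sinks = new StringWriter[numFiles];
        XMLEventWriter[] writers = new XMLEventWriter[numFiles];
        for (int i = 0; i < numFiles; i++) {
            sinks[i] = new StringWriter();
            writers[i] = outputFactory.createXMLEventWriter(sinks[i]);
        }

        int current = 0; // index of the output receiving the current entity
        int depth = 0;   // open elements inside the current entity; 0 means outside any entity
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth == 0) {
                // Outside an entity: ignore everything until the next <entity> start tag.
                if (event.isStartElement()
                        && "entity".equals(event.asStartElement().getName().getLocalPart())) {
                    depth = 1;
                    writers[current].add(event);
                }
            } else {
                // Inside an entity: forward the event unchanged, attributes included.
                writers[current].add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    current = (current + 1) % numFiles;
                }
            }
        }
        reader.close();

        String[] results = new String[numFiles];
        for (int i = 0; i < numFiles; i++) {
            writers[i].flush();
            writers[i].close();
            results[i] = sinks[i].toString();
        }
        return results;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<gwl><entities>"
            + "<entity id=\"1\"><dob Y=\"1936\">06/16/1936</dob></entity>"
            + "<entity id=\"2\"><name>B</name></entity>"
            + "</entities></gwl>";
        for (String part : split(xml, 2)) {
            System.out.println(part);
        }
    }
}
```

The event API allocates an object per event, so it is usually somewhat slower than the cursor API; the trade-off is less copying code to get wrong.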
