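Preface
Most of this article is excerpted from IBM developerWorks (mainly the theory; see the three articles below). I excerpted it mainly to deepen my own understanding, so treat it only as notes, and as a reference for the next time I need it. The excerpt is far from complete; the original articles are much richer, so see them for details.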
References:
Parsing XML with StAX, Part 1: An introduction to the Streaming API for XML (StAX): http://www.ibm.com/developerworks/cn/xml/x-stax1.html
Parsing XML with StAX, Part 2: Pull parsing and events: http://www.ibm.com/developerworks/cn/xml/x-stax2.html
Parsing XML with StAX, Part 3: Using custom events and writing XML: http://www.ibm.com/developerworks/cn/xml/x-stax3.html
Original article: https://blog.csdn.net/zhyh1986/article/details/8528649
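I won't describe StAX itself any further here; instead I'll talk about the problem I ran into while parsing an XML file.
Requirement:
Parse every element named entity in a 4 GB XML file, including all of its nested child elements and their content, and write the parsed entities evenly into 7 new XML files.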
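Broadly speaking, there are four common ways to parse XML in Java:
- DOM parsing
- SAX parsing
- DOM4J parsing
- JDOM parsing
Pros and cons of these four approaches: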
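1. SAX parsing (Simple API for XML)
How SAX works: it scans the document sequentially from start to end and parses as it goes, reporting each start tag, piece of text, and end tag to a handler. Unlike DOM, SAX can stop parsing at any point in the document, and it is a faster, more efficient approach.
Pros: the whole document does not have to be loaded first, so it uses few resources; parsing starts immediately, it is fast, and memory pressure stays low.
Cons: nodes cannot be modified (the parser only reads forward).
Best for: reading XML files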
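For comparison, here is a minimal sketch of the SAX push model using only the JDK's built-in javax.xml.parsers.SAXParserFactory: the parser calls the handler's startElement method for every start tag, and the handler keeps nothing but a counter, so memory use does not grow with the size of the document. The class name SaxEntityCounter and the inline sample string are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEntityCounter {
    // Counts <entity> start tags; only the counter is kept, never the document itself
    static int countEntities(byte[] xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0}; // mutable holder that the anonymous handler can update
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The factory is not namespace-aware here, so the tag name arrives in qName
                if ("entity".equals(qName)) {
                    count[0]++;
                    System.out.println("entity id=" + attrs.getValue("id"));
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<gwl><entities>"
                + "<entity id=\"1123831\"><name>ALMOND, LINCOLN CARTER</name></entity>"
                + "<entity id=\"1124766\"><name>BAUCUS, MAX SIEBEN</name></entity>"
                + "</entities></gwl>";
        System.out.println("entities: " + countEntities(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because the parser pushes events to the handler, stopping early in SAX is usually done by throwing a SAXException from a callback.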
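2. DOM parsing (Document Object Model)
How DOM works: it defines a set of interfaces for working with XML documents. The parser reads the entire document and builds a tree structure in memory, and you then manipulate that tree through the DOM interfaces.
Pros: the whole document tree is in memory, so it is easy to work with; supports deleting, modifying, rearranging, and more.
Cons: with large files, memory becomes a problem and parsing takes longer. The entire document (including nodes you don't need) is loaded into memory, which wastes time and space.
Best for: modifying XML data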
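A minimal DOM sketch, assuming only the JDK's javax.xml.parsers and javax.xml.transform APIs: the whole document is parsed into an org.w3c.dom.Document tree, any node can be reached and edited in place, and the modified tree is serialized back to text. The class DomEditExample, the method renameFirstEntity, and the sample document are hypothetical. This load-everything model is what makes editing easy, and also why DOM struggles with very large files.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomEditExample {
    // Parses the whole document into memory, edits the first <entity>, and serializes it back
    static String renameFirstEntity(String xml, String newName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Random access: every <entity> in the tree is available at once
        NodeList entities = doc.getElementsByTagName("entity");
        Element first = (Element) entities.item(0);
        // Modify the text of the first <name> child and one attribute
        first.getElementsByTagName("name").item(0).setTextContent(newName);
        first.setAttribute("version", "edited");

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<gwl><entities>"
                + "<entity id=\"1123831\" version=\"20230414163503\"><name>ALMOND, LINCOLN CARTER</name></entity>"
                + "</entities></gwl>";
        System.out.println(renameFirstEntity(xml, "NEW NAME"));
    }
}
```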
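3. JDOM
JDOM is a pure Java API for processing XML. It uses concrete classes rather than interfaces, and it combines DOM-style tree traversal with SAX-style Java conventions. JDOM differs from DOM in two main ways.
First, JDOM uses only concrete classes and no interfaces. This simplifies the API in some respects but also limits flexibility.
Second, the API makes heavy use of the Collections classes, which makes it easier to use for Java developers who already know them.
JDOM does not include a parser of its own. It usually relies on a SAX2 parser to parse and validate the input XML document (although it can also take a previously built DOM representation as input). It includes outputters that convert a JDOM representation into a SAX2 event stream, a DOM model, or an XML text document.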
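Pros: 1. A tree-based Java API for processing XML that loads the tree into memory.
2. No backward-compatibility constraints, so it is simpler than DOM.
3. Fast.
4. Follows SAX's Java conventions.
Cons: 1. Cannot handle documents larger than available memory.
2. JDOM represents the logical model of an XML document, so it cannot guarantee a byte-for-byte faithful transformation.
3. Provides no actual model of the DTD or schema for instance documents.
4. Does not support DOM's corresponding traversal package.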
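4. DOM4J
DOM4J has a richer (and more complex) API, which gives it more flexibility than JDOM. DOM4J is often cited as the best performer of the four; even Sun's JAXM uses DOM4J. Many open-source projects use DOM4J extensively; for example, the well-known Hibernate also uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J. Note that, like DOM and JDOM, DOM4J builds its document tree in memory by default.
Pros: most flexible, easy to use, powerful, and excellent performance
Cons: complex API, poor portability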
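I tried essentially all four of these approaches for the requirement above.
The first was DOM parsing, but it only works for fairly small XML files: because it loads the entire document at once, larger files cause an out-of-memory error.
Later I tried DOM4J and SAX, but because of my machine's limited memory, both still failed with JVM out-of-memory errors.
With no other option, I eventually found that StAX can also handle large XML files.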
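A minimal sketch of StAX's cursor (pull) API, assuming only the JDK's javax.xml.stream package: unlike SAX, the application itself calls next() to advance through the document, and getElementText() reads a text-only element. The class StaxPullExample and the method entityNames are hypothetical, and the sketch assumes, as in the sample file, that name elements only occur inside entity. It collects the names into a List only so the result can be inspected; on a 4 GB input you would process each value and let it go rather than accumulate them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullExample {
    // The caller drives parsing with next(); the reader keeps the current event, not the whole tree
    static List<String> entityNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // pull the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Reads the element's text and leaves the cursor on </name>
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close the StringReader
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<gwl><entities>"
                + "<entity id=\"1123831\"><name>ALMOND, LINCOLN CARTER</name></entity>"
                + "<entity id=\"1124766\"><name>BAUCUS, MAX SIEBEN</name></entity>"
                + "</entities></gwl>";
        System.out.println(entityNames(xml));
    }
}
```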
Here is an excerpt of the XML file to be parsed:
<?xml version='1.0' encoding='UTF-8'?>
<gwl>
<version>20230417084108</version>
<entities>
<entity id="1123831" version="20230414163503">
<name>ALMOND, LINCOLN CARTER</name>
<listId>1021</listId>
<listCode>USP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>USP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1936">06/16/1936</dob>
</dobs>
<pobs>
<pob>Pawtucket, Rhode Island, United States</pob>
</pobs>
<titles>
<title>FORMER GOVERNOR OF RHODE ISLAND (JANUARY 3, 1995 - JANUARY 7, 2003). DECEASED JANUARY 02, 2023.</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Career: Governor of Rhode Island (January 03, 1995 - January 07, 2003); United State Attorney for the District of Rhode Island (October 09, 1981 - January 20, 1993); United State Attorney for the District of Rhode Island (1969 - 1978).</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=d14d930f-7943-4363-b4d0-aa2c59437e1b</sdf>
<sdf name="EffectiveDate">1981</sdf>
<sdf name="EntityLevel">State</sdf>
<sdf name="ExpirationDate">1993</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="Org_PID">1706394</sdf>
<sdf name="OriginalID">7031</sdf>
<sdf name="Relationship">Father</sdf>
<sdf name="SubCategory">Former PEP</sdf>
</sdfs>
<addresses>
<address>
<country>US</country>
<countryName>UNITED STATES</countryName>
</address>
</addresses>
</entity>
<entity id="1124766" version="20230414163503">
<name>BAUCUS, MAX SIEBEN</name>
<listId>1021</listId>
<listCode>USP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>USP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1941">12/11/1941</dob>
</dobs>
<pobs>
<pob>Helena, Montana, United States</pob>
</pobs>
<aliases>
<alias type="Alias">ENKE, MAX SIEBEN</alias>
</aliases>
<titles>
<title>FORMER AMBASSADOR OF THE UNITED STATES TO CHINA (MARCH 20, 2014 - JANUARY 16, 2017).</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Political Party: Democratic. Career: Ambassador Extraordinary and Plenipotentiary of the United States to China, (March 20, 2014 - January 16, 2017); Member of the United States Congress, Senate from Montana (December 15, 1978 - February 06, 2014);</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=945fd382-f5b7-42c4-ad1f-a40c4bf0e285</sdf>
<sdf name="EffectiveDate">1978</sdf>
<sdf name="EntityLevel">National</sdf>
<sdf name="ExpirationDate">2014</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="Org_PID">548118</sdf>
<sdf name="OriginalID">7542</sdf>
<sdf name="Relationship">Brother</sdf>
<sdf name="SubCategory">Former PEP</sdf>
</sdfs>
<addresses>
<address>
<country>US</country>
<countryName>UNITED STATES</countryName>
<province>WASHINGTON, DC</province>
<postalCode>20515</postalCode>
</address>
<address>
<country>US</country>
<countryName>UNITED STATES</countryName>
<province>WASHINGTON, D.C.</province>
<postalCode>20510</postalCode>
</address>
<address>
<address1>55 ANJIALOU RD</address1>
<city>BEIJING</city>
<country>CN</country>
<countryName>CHINA</countryName>
<postalCode>100600</postalCode>
</address>
</addresses>
</entity>
<entity id="1124842" version="20230414163503">
<name>THOMAS, CRAIG LYLE</name>
<listId>1021</listId>
<listCode>USP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>USP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1933">02/17/1933</dob>
</dobs>
<pobs>
<pob>Cody, Wyoming, United States</pob>
</pobs>
<titles>
<title>FORMER MEMBER OF THE UNITED STATES CONGRESS (JANUARY 03, 1995 - JUNE 04, 2007). DECEASED JUNE 04, 2007.</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Political Party: Republican. Career: Member of the United States Congress, Senate, Class I (January 03, 1995 - June 04, 2007); Member of the United States Congress, House of Representatives, At-Large (April 27, 1989 - January 03, 1995). Member of the</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=4e7b1050-36b5-4b1c-9037-c2349c519d40</sdf>
<sdf name="EffectiveDate">1989</sdf>
<sdf name="EntityLevel">National</sdf>
<sdf name="ExpirationDate">1995</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="Org_PID">1817490</sdf>
<sdf name="OriginalID">7629</sdf>
<sdf name="Relationship">Father</sdf>
<sdf name="SubCategory">Former PEP</sdf>
</sdfs>
<addresses>
<address>
<country>US</country>
<countryName>UNITED STATES</countryName>
<province>WASHINGTON D.C.</province>
<postalCode>20510</postalCode>
</address>
<address>
<address1>200 WEST 24TH STREET</address1>
<city>CHEYENNE</city>
<state>WY</state>
<stateName>WYOMING</stateName>
<country>US</country>
<countryName>UNITED STATES</countryName>
<postalCode>82002</postalCode>
</address>
</addresses>
</entity>
<entity id="1125230" version="20230414163051">
<name>PATRIAT, FRANCOIS</name>
<listId>1020</listId>
<listCode>PEP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>PEP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1943">03/21/1943</dob>
</dobs>
<pobs>
<pob>Semur-en-Auxois, , France</pob>
</pobs>
<titles>
<title>MEMBER OF THE FRENCH PARLIAMENT (OCTOBER 01, 2008 - 2026).</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Political party: La Republique en marche (LREM) (currently known as Renaissance). Career: Member of the Executive Bureau of La Republique en Marche (LREM), The Republic on the Move (currently known as Renaissance), effective from November 18, 2017;</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=a4ffd4f3-5c75-440b-aeca-4e3a7d2ef642</sdf>
<sdf name="EffectiveDate">2008</sdf>
<sdf name="EntityLevel">National</sdf>
<sdf name="ExpirationDate">2026</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="Org_PID">3759009</sdf>
<sdf name="OriginalID">8117</sdf>
<sdf name="Relationship">Associate</sdf>
<sdf name="SubCategory">Govt Branch Member</sdf>
</sdfs>
<addresses>
<address>
<address1>15, RUE DE VAUGIRARD</address1>
<city>PARIS</city>
<country>FR</country>
<countryName>FRANCE</countryName>
<postalCode>75291</postalCode>
</address>
</addresses>
</entity>
<entity id="1125282" version="20230414163052">
<name>BENOUTIQ, ABDELKRIM</name>
<listId>1020</listId>
<listCode>PEP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>PEP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1959">08/19/1959</dob>
</dobs>
<pobs>
<pob>Rabat, Rabat-Sale-Kenitra Region, Morocco</pob>
</pobs>
<aliases>
<alias type="Alias">BEN ATIQ, ABDELKRIM</alias>
<alias type="Alias">BENATIQ, ABDELKRIM</alias>
</aliases>
<nativeCharNames>
<nativeCharName charSet="" latinCharName="BEN ATIQ, ABDELKRIM" type="Alias">??? ?????? ?? ????</nativeCharName>
<nativeCharName charSet="" latinCharName="BENATIQ, ABDELKRIM" type="Alias">??? ?????? ??????</nativeCharName>
<nativeCharName charSet="" latinCharName="BENOUTIQ, ABDELKRIM" type="Primary">??? ?????? ??????</nativeCharName>
</nativeCharNames>
<titles>
<title>FORMER MEMBER OF THE POLITICAL BUREAU OF SOCIALIST UNION OF POPULAR FORCES PARTY, MOROCCO, ELECTED JUNE 10, 2017, EFFECTIVE UNTIL APRIL 24, 2022.</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Political Party: Union Socialiste Des Forces Populaires (USFP) Career: Member of the Political Bureau of Union Socialiste Des Forces Populaires (USFP), Socialist Union of Popular Forces Party, elected June 10, 2017, effective until April 24, 2022;</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=35f8bcea-6169-4a8f-9715-81de730d1c17</sdf>
<sdf name="EffectiveDate">2000</sdf>
<sdf name="EntityLevel">National</sdf>
<sdf name="ExpirationDate">2001</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="OriginalID">8181</sdf>
<sdf name="SubCategory">Former PEP</sdf>
</sdfs>
<addresses>
<address>
<address1>9, AVENUE AL ARAAR</address1>
<city>RABAT</city>
<country>MA</country>
<countryName>MOROCCO</countryName>
<province>RABAT-SALE-KENITRA REGION</province>
</address>
<address>
<address1>AVENUE F.ROOSEVELT</address1>
<city>RABAT</city>
<country>MA</country>
<countryName>MOROCCO</countryName>
<province>RABAT-SALE-KENITRA REGION</province>
</address>
<address>
<address1>NO. 9 ARAR STREET</address1>
<city>RABAT</city>
<country>MA</country>
<countryName>MOROCCO</countryName>
<province>RABAT-SALE-KENITRA REGION</province>
</address>
</addresses>
</entity>
<entity id="1125443" version="20230414163053">
<name>OLLING, SVEND</name>
<listId>1020</listId>
<listCode>PEP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>PEP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1967">11/09/1967</dob>
</dobs>
<pobs>
<pob>Glostrup, , Denmark</pob>
</pobs>
<titles>
<title>AMBASSADOR OF DENMARK TO SOUTH KOREA, AS OF MARCH 30, 2023.</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Career: Ambassador of Denmark to South Korea, as of March 30, 2023; Ambassador of Denmark to Egypt, as of May 28, 2020, expiration reported March 20, 2023; Non-Resident Ambassador of Denmark to Azerbaijan, effective from March 26, 2017, expiration</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=ef160921-f06b-4942-9527-0ee7565467c0</sdf>
<sdf name="EffectiveDate">2023</sdf>
<sdf name="EntityLevel">International</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="Org_PID">8698914</sdf>
<sdf name="OriginalID">8384</sdf>
<sdf name="Relationship">Father</sdf>
<sdf name="SubCategory">Diplomat</sdf>
</sdfs>
<addresses>
<address>
<address1>416, HANGANG-DAERO, JUNG-GU</address1>
<city>SEOUL</city>
<country>KR</country>
<countryName>KOREA, REPUBLIC OF</countryName>
<postalCode>04637</postalCode>
</address>
<address>
<address1>TURAN GUENES BULVARI 106</address1>
<city>ANKARA</city>
<country>TR</country>
<countryName>TURKEY</countryName>
<postalCode>06550</postalCode>
</address>
<address>
<address1>ASIATISK PLADS 2</address1>
<city>COPENHAGEN</city>
<country>DK</country>
<countryName>DENMARK</countryName>
<postalCode>1448</postalCode>
</address>
<address>
<address1>NORTH AVENUE</address1>
<city>DHAKA</city>
<country>BD</country>
<countryName>BANGLADESH</countryName>
<postalCode>1212</postalCode>
</address>
<address>
<city>CAIRO</city>
<country>EG</country>
<countryName>EGYPT</countryName>
</address>
</addresses>
</entity>
<entity id="1125610" version="20230414163054">
<name>TAKAHASHI, KOICHI</name>
<listId>1020</listId>
<listCode>PEP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>PEP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1944">1944</dob>
</dobs>
<nativeCharNames>
<nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">たかはし こういち</nativeCharName>
<nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">高橋 恒一</nativeCharName>
</nativeCharNames>
<titles>
<title>FORMER AMBASSADOR OF JAPAN TO THE CZECH REPUBLIC (FEBRUARY 03, 2003 - 2005).</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Career: Ambassador of Japan to the Czech Republic (February 03, 2003 - 2005); Deputy Vice-Minister in charge of Immigration Bureau, Ministry of Justice (1999 - 2001); Consul-General of Japan to Berlin City, Germany (1995 - 1997); Minister of Japan to</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=9b2a063e-8d55-4806-b2f2-f2c79d815a33</sdf>
<sdf name="EffectiveDate">1999</sdf>
<sdf name="EntityLevel">National</sdf>
<sdf name="ExpirationDate">2001</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="OriginalID">8483</sdf>
<sdf name="SubCategory">Former PEP</sdf>
</sdfs>
<addresses>
<address>
<country>JP</country>
<countryName>JAPAN</countryName>
</address>
</addresses>
</entity>
<entity id="1125925" version="20230414163054">
<name>PINTER, SANDOR</name>
<listId>1020</listId>
<listCode>PEP</listCode>
<entityType>03</entityType>
<createdDate>09/02/2004</createdDate>
<lastUpdateDate>04/14/2023</lastUpdateDate>
<source>PEP</source>
<OriginalSource>PEP</OriginalSource>
<dobs>
<dob Y="1948">07/03/1948</dob>
</dobs>
<pobs>
<pob>Budapest, , Hungary</pob>
</pobs>
<titles>
<title>DEPUTY PRIME MINISTER OF HUNGARY, EFFECTIVE FROM MAY 04, 2018.</title>
</titles>
<sdfs>
<sdf name="OtherInformation">Career: Deputy Prime Minister, effective from May 04, 2018; Minister of Interior, effective from May 29, 2010; Minister of Interior (July 08, 1998 - May 27, 2002); Chief of the Hungarian National Police (September 18, 1991 - 1996).</sdf>
<sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=cd135a22-6242-4999-bc6f-5aae5b0f92e2</sdf>
<sdf name="EffectiveDate">2018</sdf>
<sdf name="EntityLevel">National</sdf>
<sdf name="Gender">MALE</sdf>
<sdf name="NameSource">Website</sdf>
<sdf name="Org_PID">2544374</sdf>
<sdf name="OriginalID">11549</sdf>
<sdf name="Relationship">Father</sdf>
<sdf name="SubCategory">Govt Branch Member</sdf>
</sdfs>
<addresses>
<address>
<address1>TEVE U. 4-6.</address1>
<city>BUDAPEST</city>
<country>HU</country>
<countryName>HUNGARY</countryName>
<postalCode>1139</postalCode>
</address>
<address>
<address1>JOZSEF ATTILA U. 2-4.</address1>
<city>BUDAPEST</city>
<country>HU</country>
<countryName>HUNGARY</countryName>
<postalCode>1051</postalCode>
</address>
</addresses>
</entity>
</entities>
</gwl>
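Below is the StAX code that extracts every entity element (with all of its content) from the XML file above and writes the entities evenly into 7 new XML files, each wrapped in the same fixed, custom format: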
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
public class StAXParserTest {
    public static void main(String[] args) {
        String inputFile = "D:\\Desktop\\PEP\\ENTITY.XML"; // input XML file path
        String outputPrefix = "D:\\Desktop\\PEP\\";        // output XML file path prefix
        int numFiles = 7;                                  // number of new files
        // Output streams and one XMLStreamWriter per output file
        OutputStream[] outputStreams = new OutputStream[numFiles];
        XMLStreamWriter[] writers = new XMLStreamWriter[numFiles];
        XMLStreamReader reader = null;
        // try-with-resources closes the input stream even if parsing fails
        try (InputStream inputStream = new BufferedInputStream(new FileInputStream(inputFile))) {
            // Create the XML input factory and use it to create an XMLStreamReader
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            reader = inputFactory.createXMLStreamReader(inputStream);
            // Create the XML output factory
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            for (int i = 0; i < numFiles; i++) {
                String outputFileName = outputPrefix + (i + 1) + ".xml";
                outputStreams[i] = new BufferedOutputStream(new FileOutputStream(outputFileName));
                // Pass the encoding explicitly so the bytes written match the UTF-8 declaration
                writers[i] = outputFactory.createXMLStreamWriter(outputStreams[i], "UTF-8");
                // XML declaration at the top of the file, e.g. <?xml version="1.0" encoding="UTF-8"?>
                writers[i].writeStartDocument("UTF-8", "1.0");
                // Add a line break
                writers[i].writeCharacters("\n");
                // Open the <gwl> root element
                writers[i].writeStartElement("gwl");
                writers[i].writeCharacters("\n");
                // Write the <version> element with its value
                writers[i].writeStartElement("version");
                writers[i].writeCharacters("20230417084108");
                // Close </version>
                writers[i].writeEndElement();
                writers[i].writeCharacters("\n");
                writers[i].writeStartElement("entities");
            }
            // Parse the XML and write each entity to the next file in turn
            int currentFileIndex = 0;
            int entityCount = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && "entity".equals(reader.getLocalName())) {
                    // Copy the entity element and all of its descendants
                    writeEntityElement(reader, writers[currentFileIndex]);
                    entityCount++;
                    // Switch to the next file (round-robin)
                    currentFileIndex = (currentFileIndex + 1) % numFiles;
                }
            }
            // Close </entities> and </gwl> and finish each document
            for (int i = 0; i < numFiles; i++) {
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </entities>
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </gwl>
                writers[i].writeCharacters("\n");
                writers[i].writeEndDocument();
                writers[i].flush();
            }
            System.out.println("Total entities: " + entityCount);
            // Round-robin: every file gets either floor(n / numFiles) or ceil(n / numFiles) entities
            System.out.println("Entities per file: " + (entityCount / numFiles)
                    + " to " + ((entityCount + numFiles - 1) / numFiles));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // XMLStreamWriter.close() and XMLStreamReader.close() do not close the underlying
            // streams, so close the writers and then the output streams here
            // (the input stream is closed by try-with-resources)
            for (int i = 0; i < numFiles; i++) {
                try {
                    if (writers[i] != null) writers[i].close();
                } catch (XMLStreamException e) {
                    e.printStackTrace();
                }
                try {
                    if (outputStreams[i] != null) outputStreams[i].close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            try {
                if (reader != null) reader.close();
            } catch (XMLStreamException e) {
                e.printStackTrace();
            }
        }
    }
    // Copies the <entity> element the reader is positioned on, including all nested
    // child elements, their attributes and their text, to the given writer
    private static void writeEntityElement(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        writer.writeCharacters("\n");
        // Write the <entity> start tag
        writer.writeStartElement("entity");
        // Copy the entity's attributes (id and version)
        copyAttributes(reader, writer);
        // Nesting depth relative to <entity>; the method returns at the matching </entity>
        int depth = 1;
        while (reader.hasNext()) {
            int event = reader.next();
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    // Write the start tag of the child element under the same name
                    writer.writeStartElement(reader.getLocalName());
                    // Child elements have attributes too, e.g. <dob Y="..."> and <sdf name="...">
                    copyAttributes(reader, writer);
                    depth++;
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    // Write the matching end tag
                    writer.writeEndElement();
                    depth--;
                    if (depth == 0) {
                        // </entity> reached: this entity is done
                        return;
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    // Text content; writeCharacters escapes characters such as & and <
                    writer.writeCharacters(reader.getText());
                    break;
                default:
                    // Comments and processing instructions are not copied
                    break;
            }
        }
    }
    // Copies the attributes of the current START_ELEMENT: attribute name (e.g. id/version) and value
    private static void copyAttributes(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        int attributeCount = reader.getAttributeCount();
        for (int i = 0; i < attributeCount; i++) {
            writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
        }
    }
}
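The sample XML above contains 8 entity elements in total. The code hands them out in turn (round-robin), so each of the 7 files receives one entity and the leftover eighth goes back to the first file: the first XML file ends up with 2 entities and each of the other 6 files with one.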
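The distribution can be checked with a small sketch that applies the same rule as currentFileIndex = (currentFileIndex + 1) % numFiles, namely that the k-th entity (counting from 0) goes to file k % numFiles. The class RoundRobinCounts and the method perFileCounts are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

public class RoundRobinCounts {
    // Entity k (0-based) goes to file k % numFiles, as in the main loop of StAXParserTest
    static int[] perFileCounts(int entityCount, int numFiles) {
        int[] counts = new int[numFiles];
        for (int k = 0; k < entityCount; k++) {
            counts[k % numFiles]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 8 entities in the sample, 7 output files
        System.out.println(Arrays.toString(perFileCounts(8, 7))); // prints [2, 1, 1, 1, 1, 1, 1]
    }
}
```

In general every file receives either entityCount / numFiles entities or one more than that, so the file sizes (in entities) differ by at most one.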
I parsed the complete 4 GB Entity.xml file this way without any out-of-memory errors, and parsing was fast.
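An alternative sketch, not the method used in the code above: StAX also has an event iterator API (XMLEventReader/XMLEventWriter), with which each entity subtree can be copied by passing every event to writer.add(event). Because attributes travel inside the StartElement event, they are copied without a separate attribute loop. The class StaxEventCopyExample, the method split, and the use of in-memory StringWriters instead of files are assumptions made to keep the example self-contained; for a real file you would create the reader from a FileInputStream and the writers from FileOutputStreams.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxEventCopyExample {
    // Round-robin split of <entity> subtrees into numFiles in-memory documents.
    // Attributes are part of each StartElement event, so writer.add(event) copies them.
    static String[] split(String xml, int numFiles) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter[] buffers = new StringWriter[numFiles];
        XMLEventWriter[] outs = new XMLEventWriter[numFiles];
        for (int i = 0; i < numFiles; i++) {
            buffers[i] = new StringWriter();
            outs[i] = outFactory.createXMLEventWriter(buffers[i]);
            outs[i].add(events.createStartElement("", "", "entities")); // wrapper element
        }

        int depth = 0;  // > 0 while inside an <entity> subtree
        int target = 0; // writer that receives the current entity
        int next = 0;   // writer for the next entity
        while (in.hasNext()) {
            XMLEvent e = in.nextEvent();
            if (depth == 0) {
                // Outside any entity: wait for the next <entity> start tag, skip everything else
                if (e.isStartElement()
                        && "entity".equals(e.asStartElement().getName().getLocalPart())) {
                    target = next;
                    next = (next + 1) % numFiles;
                    outs[target].add(e);
                    depth = 1;
                }
                continue;
            }
            outs[target].add(e); // copy start tags, text and end tags unchanged
            if (e.isStartElement()) {
                depth++;
            } else if (e.isEndElement()) {
                depth--; // back to 0 on the matching </entity>
            }
        }
        in.close();

        String[] result = new String[numFiles];
        for (int i = 0; i < numFiles; i++) {
            outs[i].add(events.createEndElement("", "", "entities"));
            outs[i].flush();
            outs[i].close();
            result[i] = buffers[i].toString();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<gwl><entities>"
                + "<entity id=\"1\"><dob Y=\"1936\">06/16/1936</dob></entity>"
                + "<entity id=\"2\"><name>B</name></entity>"
                + "</entities></gwl>";
        for (String doc : split(xml, 2)) {
            System.out.println(doc);
        }
    }
}
```

The event API allocates an object per event, so it produces somewhat more garbage than the cursor API used above, but both read the input as a stream rather than loading it whole.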