Java: Splitting a large XML file using SAXParser

I am trying to split a large XML file into smaller files using Java's SAXParser (specifically the Wikipedia dump, which is about 28GB uncompressed).

I have a PageHandler class which extends DefaultHandler:

private class PageHandler extends DefaultHandler {

    private StringBuffer text;

    ...

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.append("<" + qName + ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        text.append("</" + qName + ">");
        if (qName.equals("page")) {
            // A complete page has been buffered: count it and write it out
            text.append("\n");
            pageCount++;
            writePage();
        }
        if (pageCount >= maxPages) {
            // Enough pages in the current output file: switch to a new one
            rollFile();
        }
    }

    @Override
    public void characters(char[] chars, int start, int length) {
        // Append the reported slice of the parser's buffer in one call
        text.append(chars, start, length);
    }
}

So I can write out element content with no problem. My problem is how to get the element tags and attributes: these characters do not seem to be reported. At best I will have to reconstruct them from what is passed as arguments to startElement, which seems like a bit of a pain. Or is there an easier way?

All I want to do is loop through the file and write it out, rolling the output file every so often. How hard can this be :)

Thanks

Solution

I'm not sure I fully understand what you are trying to do, but the qualified name is already passed to you as a String (qName), so no conversion is needed. To get each attribute's name, call attributes.getQName(int index); to get its value, call attributes.getValue(int index).
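As a rough illustration of that suggestion, here is a minimal sketch of a handler that rebuilds the start tag, including attributes, from the arguments SAX passes in. The names TagEchoSketch, EchoHandler, escape and echo are made up for this example and are not from the question. It assumes the default, non-namespace-aware parser configuration, and it re-escapes the characters that the parser has already unescaped, so that the output stays well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagEchoSketch {

    /** Hypothetical handler that turns SAX events back into markup. */
    static class EchoHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.append('<').append(qName);
            // Attributes are only available through this argument, so rebuild them here
            for (int i = 0; i < attributes.getLength(); i++) {
                text.append(' ').append(attributes.getQName(i))
                    .append("=\"").append(escape(attributes.getValue(i), true)).append('"');
            }
            text.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            text.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] chars, int start, int length) {
            // The parser reports "&amp;" as "&", so escape again before writing
            text.append(escape(new String(chars, start, length), false));
        }

        /** Escapes markup characters; quotes only matter inside attribute values. */
        static String escape(String s, boolean inAttribute) {
            String out = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
            return inAttribute ? out.replace("\"", "&quot;") : out;
        }
    }

    /** Parses the given XML string and returns the markup rebuilt by the handler. */
    static String echo(String xml) throws Exception {
        EchoHandler handler = new EchoHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("<page id=\"1\"><title>A &amp; B</title></page>"));
    }
}
```

Note that SAX does not distinguish an empty element written as `<a/>` from `<a></a>`, so a sketch like this always writes the latter. Comments, processing instructions and the XML declaration are also not echoed here.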
