I am trying to split a large XML file into smaller files using Java's SAXParser (specifically the Wikipedia dump, which is about 28 GB uncompressed).
I have a PageHandler class which extends DefaultHandler:
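```java
// imports: org.xml.sax.Attributes, org.xml.sax.helpers.DefaultHandler
private class PageHandler extends DefaultHandler {
    private StringBuilder text = new StringBuilder(); // XML of the current page
    ...

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.append("<" + qName + ">"); // attributes are not written out yet
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        text.append("</" + qName + ">");
        if (qName.equals("page")) {
            text.append("\n");
            pageCount++;
            writePage();
            // roll only on a page boundary, right after a page is written
            if (pageCount >= maxPages) {
                rollFile();
            }
        }
    }

    @Override
    public void characters(char[] chars, int start, int length) {
        // the parser has already decoded &amp;, &lt; and &gt;, so re-escape
        // them or the output will not be well-formed XML
        for (int i = start; i < start + length; i++) {
            char c = chars[i];
            switch (c) {
                case '<': text.append("&lt;"); break;
                case '>': text.append("&gt;"); break;
                case '&': text.append("&amp;"); break;
                default:  text.append(c);
            }
        }
    }
}
```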
So I can write out element content without any problem. My problem is getting the element tags and attributes: those characters are never passed to characters(). At best I will have to reconstruct them from the arguments passed to startElement, which seems a bit of a pain. Or is there an easier way?
All I want to do is loop through the file and write it out, rolling the output file every so often. How hard can this be :)
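If the goal is only to copy the file through and cut it every N pages, a pull parser may be less work than SAX, because an `XMLEventWriter` re-serialises start tags, attributes and escaping on its own. The sketch below swaps SAX for StAX (`javax.xml.stream`, part of the JDK since Java 6). The class name `StaxSplitter`, the `chunk-N.xml` file names and the bare `<mediawiki>` wrapper element are illustrative assumptions; the dump's `<siteinfo>` block and its root namespace declaration are not copied into the chunks.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.IntFunction;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StaxSplitter {

    // Copies each top-level <page> element into chunk documents, starting a new
    // chunk after maxPages pages. Returns the number of chunks written.
    static int split(Reader in, int maxPages, IntFunction<Writer> openChunk)
            throws XMLStreamException, IOException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputs = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        Writer sink = null;
        XMLEventWriter out = null;
        int chunks = 0, pagesInChunk = 0, depth = 0; // depth > 0 means "inside a <page>"
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (depth == 0 && e.isStartElement()
                    && e.asStartElement().getName().getLocalPart().equals("page")) {
                if (out == null) { // lazily open the next chunk with its own root element
                    sink = openChunk.apply(chunks++);
                    out = outputs.createXMLEventWriter(sink);
                    out.add(events.createStartElement("", "", "mediawiki"));
                }
                depth = 1;
                out.add(e);
            } else if (depth > 0) {
                out.add(e); // tags, attributes and escaping are re-serialised by StAX
                if (e.isStartElement()) {
                    depth++;
                } else if (e.isEndElement() && --depth == 0 && ++pagesInChunk == maxPages) {
                    closeChunk(out, sink, events);
                    out = null;
                    pagesInChunk = 0;
                }
            }
            // events outside <page> (siteinfo, whitespace) are skipped
        }
        if (out != null) {
            closeChunk(out, sink, events); // last, partially filled chunk
        }
        reader.close();
        return chunks;
    }

    private static void closeChunk(XMLEventWriter out, Writer sink, XMLEventFactory events)
            throws XMLStreamException, IOException {
        out.add(events.createEndElement("", "", "mediawiki"));
        out.flush();
        out.close(); // does not close the underlying Writer
        sink.close();
    }

    // Usage: java StaxSplitter <input.xml> <pagesPerChunk>
    public static void main(String[] args) throws Exception {
        int maxPages = Integer.parseInt(args[1]);
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            int n = split(in, maxPages, i -> {
                try {
                    return Files.newBufferedWriter(Paths.get("chunk-" + i + ".xml"));
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            });
            System.out.println(n + " chunk(s) written");
        }
    }
}
```

Each chunk is a well-formed document on its own (no XML declaration, so UTF-8 is assumed). Run it as, e.g., `java StaxSplitter pages.xml 10000` (file name illustrative); since events are streamed one at a time, memory use should stay roughly constant regardless of the input size.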
Thanks
Solution
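I'm not quite sure I fully understand what you are trying to do, but qName is already the qualified element name as a String, so there is no need to call toString() on it. For the attributes, loop from 0 to attributes.getLength() - 1 and use attributes.getQName(i) for each attribute's name and attributes.getValue(i) for its value.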
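Following that approach, a small helper can rebuild the start tag, attributes included, from the arguments to `startElement`. This is a sketch: `TagWriter`, `startTag` and `escape` are hypothetical names, and it assumes the default (non-namespace-aware) `SAXParserFactory`, where `qName` is always filled in. Attribute values and text arrive from the parser already unescaped, so they have to be escaped again before being written out.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

final class TagWriter {

    // Escapes characters that may not appear verbatim in XML text
    // or in a double-quoted attribute value.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Rebuilds the start tag reported to startElement(), including its attributes.
    static String startTag(String qName, Attributes attributes) {
        StringBuilder sb = new StringBuilder("<").append(qName);
        for (int i = 0; i < attributes.getLength(); i++) {
            sb.append(' ').append(attributes.getQName(i))
              .append("=\"").append(escape(attributes.getValue(i))).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "space", "xml:space", "CDATA", "preserve");
        System.out.println(startTag("text", atts)); // prints <text xml:space="preserve">
    }
}
```

Inside `PageHandler`, `startElement` would then append `TagWriter.startTag(qName, attributes)`, and `characters` would append `TagWriter.escape(new String(chars, start, length))`.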