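1. What is SAX?
SAX (Simple API for XML) is an event-driven, "push"-style API for processing XML. It is not a W3C standard, but it is widely accepted. Unlike DOM, a SAX parser does not build a complete document tree. Instead, it fires a sequence of events as it reads the document. These events are pushed to event handlers, and the handlers give the application access to the document's content.
Event handler types:
- DTDHandler, which receives the notation and unparsed-entity declarations from the DTD;
- ErrorHandler, for low-level access to parsing errors;
- ContentHandler, for access to the document content; this is the most commonly used handler.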
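To make the push model and the three handler types concrete, here is a minimal sketch. The names HandlerTypesDemo, RecordingHandler and run() are made up for illustration. It relies on org.xml.sax.helpers.DefaultHandler, which implements ContentHandler, DTDHandler and ErrorHandler (plus EntityResolver) with empty methods, so one object can be registered in all three roles and the parser calls back into it while reading. Validation is switched on only so that the deliberately invalid document reaches ErrorHandler.error().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerTypesDemo {

    // One object acting as ContentHandler, DTDHandler and ErrorHandler
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override // ContentHandler: called for each start tag
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override // DTDHandler: called for each unparsed (NDATA) entity declaration
        public void unparsedEntityDecl(String name, String publicId,
                                       String systemId, String notationName) {
            events.add("unparsedEntity:" + name);
        }

        @Override // ErrorHandler: recoverable errors such as DTD validity errors
        public void error(SAXParseException e) {
            events.add("error");
        }
    }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validate against the DTD so error() gets called
        XMLReader reader = factory.newSAXParser().getXMLReader();
        RecordingHandler handler = new RecordingHandler();
        reader.setContentHandler(handler);
        reader.setDTDHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE a ["
                + "<!ELEMENT a EMPTY>"
                + "<!NOTATION gif SYSTEM \"image/gif\">"
                + "<!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>"
                + "]>"
                + "<a><b/></a>"; // <b> is not allowed by the DTD
        System.out.println(run(xml));
    }
}
```

Because validity errors are recoverable, parsing continues after error(), so both start tags are still reported to the ContentHandler.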
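Advantages
- Efficient, low-level access to the content of an XML document;
- Low memory consumption, because the whole document never has to be loaded into memory at once;
- No need to create an object for every node, as DOM does;
- Suitable for broadcast scenarios: a single pass over the document can feed several consumers. An XMLReader holds only one ContentHandler at a time, so the events are fanned out by a handler that forwards each event to the other consumers in turn.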
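The broadcast point can be sketched as follows. XMLReader.setContentHandler() replaces any handler set earlier, so the fan-out lives in a small forwarding handler. TeeHandler, ElementCounter and TextCollector are hypothetical names, and the tee forwards only the three events these consumers need; a complete version would forward every ContentHandler method.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class BroadcastDemo {

    // Forwards each event it receives to every target handler, one after another
    static class TeeHandler extends DefaultHandler {
        private final List<ContentHandler> targets;

        TeeHandler(ContentHandler... targets) {
            this.targets = Arrays.asList(targets);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            for (ContentHandler h : targets) h.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            for (ContentHandler h : targets) h.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (ContentHandler h : targets) h.characters(ch, start, length);
        }
    }

    // Consumer 1: counts start tags
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    // Consumer 2: concatenates all character data
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // One parse, two consumers; returns "<element count> <collected text>"
    static String broadcast(String xml) throws Exception {
        ElementCounter counter = new ElementCounter();
        TextCollector collector = new TextCollector();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new TeeHandler(counter, collector));
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count + " " + collector.text;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(broadcast("<cities><city>GuangZhou</city><city>ShenZhen</city></cities>"));
    }
}
```

Running main() prints `3 GuangZhouShenZhen`: all three start tags reach the counter, and both text nodes reach the collector, from a single parse.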
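Disadvantages
- Handlers for all incoming events must be implemented (extending org.xml.sax.helpers.DefaultHandler, which provides empty implementations, reduces this to overriding only the methods you need);
- The application code has to keep track of parsing state itself, for example which element it is currently inside;
- No random access to the document.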
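A minimal sketch of the "state in application code" point, assuming DefaultHandler as the base class: each callback reports only the current event, so the handler keeps its own stack of open elements to know where a piece of text belongs. PathTrackingDemo, PathHandler and paths() are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PathTrackingDemo {

    static class PathHandler extends DefaultHandler {
        private final Deque<String> openElements = new ArrayDeque<>(); // root first
        private final StringBuilder text = new StringBuilder();
        final List<String> results = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            openElements.addLast(qName); // remember where we are
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                results.add(String.join("/", openElements) + "=" + value);
            }
            text.setLength(0);
            openElements.removeLast(); // leave the element
        }
    }

    static List<String> paths(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PathHandler handler = new PathHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        // prints [country/province/city=ChangSha]
        System.out.println(paths("<country><province><city>ChangSha</city></province></country>"));
    }
}
```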
2. SAX classes used
org.xml.sax:
[Figure: classes and interfaces in org.xml.sax]
org.xml.sax.ext:
org.xml.sax.helpers:
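As a small illustration of how the three packages work together (a sketch; LexicalDemo and CommentCollector are hypothetical names): org.xml.sax.ext.DefaultHandler2 extends DefaultHandler from org.xml.sax.helpers with LexicalHandler callbacks such as comment() and startDTD(). XMLReader has no setter for a LexicalHandler; it is registered through the standard SAX2 property http://xml.org/sax/properties/lexical-handler. ContentHandler has no comment callback, so this is how a SAX application sees comments such as the commented-out DOCTYPE in the example file of section 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class LexicalDemo {

    // DefaultHandler2 (org.xml.sax.ext) = DefaultHandler (org.xml.sax.helpers)
    // plus LexicalHandler, DeclHandler and EntityResolver2 methods
    static class CommentCollector extends DefaultHandler2 {
        final List<String> comments = new ArrayList<>();
        String doctypeName;

        @Override // LexicalHandler: one call per comment
        public void comment(char[] ch, int start, int length) {
            comments.add(new String(ch, start, length));
        }

        @Override // LexicalHandler: start of the DOCTYPE declaration
        public void startDTD(String name, String publicId, String systemId) {
            doctypeName = name;
        }
    }

    static CommentCollector collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector handler = new CommentCollector();
        reader.setContentHandler(handler);
        // Standard SAX2 property for registering a LexicalHandler
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CommentCollector c = collect("<!-- hello --><!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>");
        System.out.println(c.doctypeName + " " + c.comments); // prints: a [ hello ]
    }
}
```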
3. Example
Test file test.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- <!DOCTYPE country SYSTEM "country.dtd"> -->
<!DOCTYPE country [
  <!ELEMENT country (provinces?,states?,municipalites?)>
  <!ATTLIST country name CDATA #REQUIRED>
  <!ELEMENT provinces (province+)>
  <!ELEMENT province (cities)>
  <!ATTLIST province name CDATA #REQUIRED>
  <!ELEMENT cities (city+)>
  <!ELEMENT city (#PCDATA)>
  <!ATTLIST city name CDATA #REQUIRED>
]>
<country name="China">
  <provinces>
    <province name="GuangDong">
      <cities>
        <city name="GuangZhou">广州</city>
        <city name="ShenZhen">深圳</city>
        <city name="ZhuHai">珠海</city>
      </cities>
    </province>
    <province name="HuNan">
      <cities>
        <city name="ChangSha">长沙</city>
        <city name="HengYang">衡阳</city>
        <city name="ChangDe">常德</city>
      </cities>
    </province>
  </provinces>
</country>
Define a MyContentHandler class that implements the ContentHandler interface:
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

// Static nested class of the test class that also contains main()
public static class MyContentHandler implements ContentHandler {
    private Locator locator;
    // Indentation width for printed lines, so the output is easier to follow;
    // updated in ignorableWhitespace() from the whitespace before each tag
    private int tentLength = 0;

    // Print tentLength spaces before each message
    private void printIndent() {
        for (int i = 0; i < tentLength; i++) {
            System.out.print(" ");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        printIndent();
        System.out.println("characters():\"" + String.copyValueOf(ch, start, length) + "\"");
    }

    @Override
    public void endDocument() throws SAXException {
        printIndent();
        System.out.println("endDocument() called");
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        printIndent();
        System.out.println("endElement():</" + qName + ">");
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        printIndent();
        System.out.println("endPrefixMapping():" + prefix);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        // Whitespace in element-only content (as declared by the DTD) is not
        // printed; its length is reused as the indentation width instead
        tentLength = length;
    }

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        printIndent();
        System.out.println("processingInstruction():<" + target + "," + data + ">");
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
        printIndent();
        System.out.println("setDocumentLocator():[" + locator + "]");
    }

    @Override
    public void skippedEntity(String name) throws SAXException {
        printIndent();
        System.out.println("skippedEntity():" + name);
    }

    @Override
    public void startDocument() throws SAXException {
        printIndent();
        System.out.println("startDocument() called");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        printIndent();
        System.out.println("startElement():<" + qName + ">");
    }

    @Override
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        printIndent();
        System.out.println("startPrefixMapping():" + prefix);
    }
}
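First, to see the order in which these methods are called, each method above prints its name and the arguments it receives. The main method that runs the program is: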
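import java.io.File;
import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory; // deprecated since Java 9, still functional

public static void main(String[] args) throws SAXException, IOException {
    File srcFile = new File("./test.xml");
    XMLReader xmlReader = XMLReaderFactory.createXMLReader();
    xmlReader.setFeature("http://xml.org/sax/features/validation", false); // no DTD validation
    xmlReader.setContentHandler(new MyContentHandler());
    // parse(String) takes a system ID (a URI), so convert the File to a URI string
    xmlReader.parse(srcFile.toURI().toString());
}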
Output:
setDocumentLocator():[com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser$LocatorProxy@1a2961b]
startDocument() called
startElement():<country>
startElement():<provinces>
startElement():<province>
startElement():<cities>
startElement():<city>
characters():"广州"
endElement():</city>
startElement():<city>
characters():"深圳"
endElement():</city>
startElement():<city>
characters():"珠海"
endElement():</city>
endElement():</cities>
endElement():</province>
startElement():<province>
startElement():<cities>
startElement():<city>
characters():"长沙"
endElement():</city>
startElement():<city>
characters():"衡阳"
endElement():</city>
startElement():<city>
characters():"常德"
endElement():</city>
endElement():</cities>
endElement():</province>
endElement():</provinces>
endElement():</country>
endDocument() called
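XMLReaderFactory, used in main() above, is deprecated as of Java 9. Below is a minimal sketch of the JAXP alternative, javax.xml.parsers.SAXParserFactory; NamespaceAwareDemo and firstLocalName() are hypothetical names. One difference matters for handler code that tests localName: SAXParserFactory is not namespace-aware by default, and without namespace processing SAX reports an empty localName, whereas readers created by XMLReaderFactory have namespace processing switched on. Calling setNamespaceAware(true) before getXMLReader() should give a reader that could stand in for XMLReaderFactory.createXMLReader() in main().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareDemo {

    // Returns the localName reported for the first start tag
    static String firstLocalName(boolean namespaceAware, String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the factory default is false
        SAXParser parser = factory.newSAXParser();
        final String[] seen = new String[1];
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (seen[0] == null) {
                    seen[0] = localName;
                }
            }
        });
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<city name=\"ChangSha\">ChangSha</city>";
        System.out.println("namespace-aware:     '" + firstLocalName(true, xml) + "'");
        System.out.println("not namespace-aware: '" + firstLocalName(false, xml) + "'");
    }
}
```

qName is filled in either way, so matching on qName is the simpler choice when a document uses no namespaces.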
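Next, we parse the XML file into a Country object. First define three model classes:
import java.util.ArrayList;

public static class City {
    public String enName; // English name, from the name attribute
    public String chName; // Chinese name, from the element text
    public City() {
        enName = "";
        chName = "";
    }
}
public static class Province {
    public String name;
    public ArrayList<City> cities;
    public Province() {
        name = "";
        cities = new ArrayList<City>(5);
    }
}
public static class Country {
    public String name;
    public ArrayList<Province> provinces;
    public Country() {
        name = "";
        provinces = new ArrayList<Province>();
    }
}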
Declare the following member variables in MyContentHandler:
private Country country;
private Province curProvince;
private City curCity;
private boolean isInCityElement = false; // true while inside a <city> element; used to capture the city's Chinese name
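Because the XML file has a simple structure, only four methods need to change: startElement(), characters(), endElement() and endDocument(). The remaining ContentHandler methods are not needed for this task and can be left with empty bodies: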
@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    // Inside a <city> element: collect the city's Chinese name.
    // A parser may split one text node across several characters() calls, so append.
    if (isInCityElement && curCity != null) {
        String text = String.copyValueOf(ch, start, length);
        curCity.chName = (curCity.chName == null) ? text : curCity.chName + text;
    }
}

@Override
public void endDocument() throws SAXException {
    // Print the parsed Country object to check that parsing worked
    System.out.println("country:" + country.name);
    for (Province prc : country.provinces) {
        System.out.println(" |--" + prc.name);
        for (City city : prc.cities) {
            System.out.println(" | |--" + city.enName + "(" + city.chName + ")");
        }
    }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    // localName is non-empty here because readers from XMLReaderFactory
    // have namespace processing enabled by default
    if (localName.equalsIgnoreCase("province")) {
        if (curProvince != null) {
            country.provinces.add(curProvince);
            curProvince = null;
        }
    } else if (localName.equalsIgnoreCase("city")) {
        if (curProvince != null && curCity != null) {
            curProvince.cities.add(curCity);
            curCity = null;
        }
        isInCityElement = false; // leaving the <city> element
    }
}

@Override
public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
    if (localName.equalsIgnoreCase("country")) {
        country = new Country();
        country.name = atts.getValue("name");
    } else if (localName.equalsIgnoreCase("province")) {
        curProvince = new Province();
        curProvince.name = atts.getValue("name");
    } else if (localName.equalsIgnoreCase("city")) {
        isInCityElement = true; // entering a <city> element
        curCity = new City();
        curCity.enName = atts.getValue("name");
    }
}
Output:
country:China
|--GuangDong
| |--GuangZhou(广州)
| |--ShenZhen(深圳)
| |--ZhuHai(珠海)
|--HuNan
| |--ChangSha(长沙)
| |--HengYang(衡阳)
| |--ChangDe(常德)
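For comparison, here is a compact, self-contained sketch of the same mapping written against DefaultHandler, so only the callbacks it needs are overridden. It is an alternative, not the code above: it stores the result in a hypothetical map from province name to "EnName(text)" strings instead of the Country/Province/City classes, buffers city text in a StringBuilder because characters() may deliver one text node in several chunks, and matches on qName so it does not depend on namespace processing. CountryParserSketch, CountryHandler and parse() are illustrative names, and the sample uses ASCII city text to avoid source-encoding issues.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CountryParserSketch {

    static class CountryHandler extends DefaultHandler {
        String countryName;
        // province name -> "EnName(text)" entries, in document order
        final Map<String, List<String>> provinces = new LinkedHashMap<>();
        private List<String> curCities;  // cities of the province being parsed
        private String curCityName;      // non-null only inside <city>
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            switch (qName) {
                case "country":
                    countryName = atts.getValue("name");
                    break;
                case "province":
                    curCities = new ArrayList<>();
                    provinces.put(atts.getValue("name"), curCities);
                    break;
                case "city":
                    curCityName = atts.getValue("name");
                    text.setLength(0); // start buffering this city's text
                    break;
                default:
                    break;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (curCityName != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("city")) {
                if (curCities != null) {
                    curCities.add(curCityName + "(" + text.toString().trim() + ")");
                }
                curCityName = null;
            } else if (qName.equals("province")) {
                curCities = null;
            }
        }
    }

    static CountryHandler parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CountryHandler handler = new CountryHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<country name=\"China\"><provinces>"
                + "<province name=\"HuNan\"><cities>"
                + "<city name=\"ChangSha\">Changsha</city>"
                + "<city name=\"HengYang\">Hengyang</city>"
                + "</cities></province></provinces></country>";
        CountryHandler h = parse(xml);
        System.out.println(h.countryName + " " + h.provinces);
    }
}
```

Running main() prints `China {HuNan=[ChangSha(Changsha), HengYang(Hengyang)]}`.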