How to get a node's line and column numbers in dom4j

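When parsing an XML file with dom4j, you may want to check certain nodes against a schema and report the ones that do not conform. To make those reports friendlier, the message should also say where the error is. dom4j does not offer this directly, or at least hides it very deep, and a search turned up very few answers. A post on the dom4j-user mailing list (https://www.mail-archive.com/dom4j-user@lists.sourceforge.net/msg02769.html) pointed me in the right direction, and I worked out the rest from there, which was quite painful. The post outlines the main steps: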

org.xml.sax.Locator locator = new …;
DocumentFactory documentFactory = new DocumentFactoryWithLocator(locator);
SAXContentHandler contentHandler = new SAXContentHandler(documentFactory);
contentHandler.setDocumentLocator(locator);
org.xml.sax.XMLReader reader = …;
reader.setContentHandler(contentHandler );
reader.parse(…);
Document document = contentHandler.getDocument();


In the DocumentFactory:

public Element createElement(QName qname) {
    ElementWithLocation element = new ElementWithLocation(qname);
    element.setLocation(locator.getLineNumber(),
            locator.getColumnNumber());
    return element;
}


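The direction is right, but following it literally does not produce correct results. The reason is that when the XMLReader inside SAXReader runs parse, it hands the content handler a new locator, replacing the one I passed in, so the locator held by the DocumentFactory is no longer the one the XMLReader updates. The XMLReader that dom4j uses here is the SAXParser from org.apache.xerces, which extends AbstractSAXParser, and AbstractSAXParser has this method: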

public void startDocument(XMLLocator locator, String encoding, 
                              NamespaceContext namespaceContext, Augmentations augs)
        throws XNIException {
        
        fNamespaceContext = namespaceContext;

        try {
            // SAX1
            if (fDocumentHandler != null) {
                if (locator != null) {
                    fDocumentHandler.setDocumentLocator(new LocatorProxy(locator));
                }
                fDocumentHandler.startDocument();
            }

            // SAX2
            if (fContentHandler != null) {
                if (locator != null) {
                    fContentHandler.setDocumentLocator(new LocatorProxy(locator));
                }
                fContentHandler.startDocument();
            }
        }
        catch (SAXException e) {
            throw new XNIException(e);
        }

    }

This is the method that overwrites the contentHandler's locator. My approach is to capture the new locator at the moment AbstractSAXParser assigns it and pass it on to the DocumentFactory.
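
The same behaviour can be observed with the JDK's built-in SAX parser alone, without dom4j. The following is a minimal sketch rather than part of the solution itself: `LocatorDemo` and `locate` are hypothetical names chosen for illustration. The handler never creates a Locator of its own; it stores whichever one the parser passes to `setDocumentLocator` and reads it on every start tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records "name@line:column" for every start tag.
final class LocatorDemo {

    static List<String> locate(String xml) throws Exception {
        final List<String> positions = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Supplied by the parser, not by us.
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Called by the parser before startDocument().
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The Locator is "live": read it now and keep only the numbers.
                int line = (locator == null) ? -1 : locator.getLineNumber();
                int col = (locator == null) ? -1 : locator.getColumnNumber();
                positions.add(qName + "@" + line + ":" + col);
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return positions;
    }

    public static void main(String[] args) throws Exception {
        // One entry per element, in document order.
        System.out.println(locate("<root>\n  <child/>\n</root>"));
    }
}
```

Note that SAX defines the reported position as the first character after the text of the current event, so for an element it is the position just past the end of the start tag, not where the tag begins.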

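On to the code. First, extend the original Element class so that it can record a node's position (the same applies if you also need positions for Attribute and other node types).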

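import org.dom4j.Namespace;
import org.dom4j.QName;
import org.dom4j.tree.DefaultElement;

public class GokuElement extends DefaultElement {
	// Position reported by the SAX Locator when this element was created.
	private int lineNum = 0, colNum = 0;

	public GokuElement(QName qname) {
		super(qname);
	}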

	public GokuElement(QName qname, int attrCount) {
		super(qname, attrCount);
	}

	public GokuElement(String name) {
		super(name);
	}

	public GokuElement(String name, Namespace namespace) {
		super(name, namespace);
	}

	public int getColumnNumber() {
		return this.colNum;
	}

	public int getLineNumber() {
		return this.lineNum;
	}

	public void setLocation(int lineNum, int colNum) {
		this.lineNum = lineNum;
		this.colNum = colNum;
	}
}

Next, extend DocumentFactory so that the factory produces our own Element type:

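import org.dom4j.DocumentFactory;
import org.dom4j.Element;
import org.dom4j.QName;
import org.xml.sax.Locator;

public class DocumentFactoryWithLocator extends DocumentFactory {
	// Replaced via setLocator() with the parser's own Locator once parsing starts.
	private Locator locator;

	public DocumentFactoryWithLocator(Locator locator) {
		super();
		this.locator = locator;
	}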

	@Override
	public Element createElement(QName qname) {
		GokuElement element = new GokuElement(qname);
		element.setLocation(this.locator.getLineNumber(), this.locator.getColumnNumber());
		return element;
	}

	@Override
	public Element createElement(String name) {
		GokuElement element = new GokuElement(name);
		element.setLocation(this.locator.getLineNumber(), this.locator.getColumnNumber());
		return element;
	}

	public void setLocator(Locator locator) {
		this.locator = locator;
	}
}

Then extend SAXContentHandler:

import org.dom4j.DocumentFactory;
import org.dom4j.ElementHandler;
import org.dom4j.io.SAXContentHandler;
import org.xml.sax.Locator;

public class GokuSAXContentHandler extends SAXContentHandler {
	private DocumentFactoryWithLocator documentFactory = null;

	public GokuSAXContentHandler(DocumentFactory documentFactory2, ElementHandler dispatchHandler) {
		super(documentFactory2, dispatchHandler);
	}

	public void setDocFactory(DocumentFactoryWithLocator fac) {
		this.documentFactory = fac;
	}

	@Override
	public void setDocumentLocator(Locator documentLocator) {
		super.setDocumentLocator(documentLocator);
		if (this.documentFactory != null)
			this.documentFactory.setLocator(documentLocator);
	}
}

Finally, extend SAXReader:

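import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentFactory;
import org.dom4j.io.SAXContentHandler;
import org.dom4j.io.SAXReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class GokuSAXReader extends SAXReader {
	DocumentFactory docFactory;
	Locator locator;

	public GokuSAXReader(DocumentFactory docFactory) {
		super(docFactory);
		this.docFactory = docFactory;
	}

	public GokuSAXReader(DocumentFactory docFactory, Locator locator) {
		super(docFactory);
		this.locator = locator;
		this.docFactory = docFactory;
	}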

	@Override
	protected SAXContentHandler createContentHandler(XMLReader reader) {
		return new GokuSAXContentHandler(this.getDocumentFactory(), super.getDispatchHandler());
	}

	@Override
	public Document read(InputSource in) throws DocumentException {
		try {
			XMLReader reader = this.getXMLReader();

			reader = this.installXMLFilter(reader);

			EntityResolver thatEntityResolver = super.getEntityResolver();

			if (thatEntityResolver == null) {
				thatEntityResolver = this.createDefaultEntityResolver(in.getSystemId());
				super.setEntityResolver(thatEntityResolver);
			}

			reader.setEntityResolver(thatEntityResolver);

			SAXContentHandler contentHandler = this.createContentHandler(reader);
			contentHandler.setEntityResolver(thatEntityResolver);
			contentHandler.setInputSource(in);

			boolean internal = this.isIncludeInternalDTDDeclarations();
			boolean external = this.isIncludeExternalDTDDeclarations();

			contentHandler.setIncludeInternalDTDDeclarations(internal);
			contentHandler.setIncludeExternalDTDDeclarations(external);
			contentHandler.setMergeAdjacentText(this.isMergeAdjacentText());
			contentHandler.setStripWhitespaceText(this.isStripWhitespaceText());
			contentHandler.setIgnoreComments(this.isIgnoreComments());
			reader.setContentHandler(contentHandler);

			this.configureReader(reader, contentHandler);
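			// Give the handler the factory so that setDocumentLocator() can forward
			// the parser's own Locator to it once parsing starts.
			// The factory must be a DocumentFactoryWithLocator, otherwise this cast fails.
			((GokuSAXContentHandler) contentHandler).setDocFactory((DocumentFactoryWithLocator) this.docFactory);
			contentHandler.setDocumentLocator(this.locator);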
			reader.parse(in);

			return contentHandler.getDocument();
		} catch (Exception e) {
			if (e instanceof SAXParseException) {
				// e.printStackTrace();
				SAXParseException parseException = (SAXParseException) e;
				String systemId = parseException.getSystemId();

				if (systemId == null) {
					systemId = "";
				}

				String message = "Error on line " + parseException.getLineNumber() + " of document " + systemId + " : "
						+ parseException.getMessage();

				throw new DocumentException(message, e);
			} else {
				throw new DocumentException(e.getMessage(), e);
			}
		}
	}
}

This overrides read(InputSource) so that it uses our own SAXContentHandler. The read overloads with other signatures all end up calling this one.


Code to parse a file:

Locator locator = new LocatorImpl();
DocumentFactory docFactory = new DocumentFactoryWithLocator(locator);
SAXReader reader = new GokuSAXReader(docFactory, locator);
Document doc = reader.read(new File("goku.xml"));

When you need an Element's position:

System.out.println(((GokuElement) element).getLineNumber());


To report a node's full path together with its line number, combine an XPath query with the location-aware elements described above.

First, select the nodes and get their paths. XPath is a language for locating nodes in an XML document, and dom4j's `XPath` interface evaluates such expressions against a parsed `Document` (assumed here to be in the variable `document`):

1. Create an `XPath` object for the nodes you want, for example `//node` for every element named `node`.

```java
XPath xpath = DocumentHelper.createXPath("//node");
```

2. Call `selectNodes()` with `document` as the context to get the list of matching nodes. (`evaluate()` returns a plain `Object`, so its result cannot be assigned to a list directly.)

```java
List<Node> nodeList = xpath.selectNodes(document);
```

3. Iterate over the list and call `getPath()` on each node to get its full path.

```java
for (Node node : nodeList) {
    String path = node.getPath();
    System.out.println("Node path: " + path);
}
```

Next, the line number. dom4j's `Node` interface has no `getLineNumber()` method, and `Dom4jXPath` (which belongs to Jaxen's `org.jaxen.dom4j` package, not `org.dom4j.io`) does not track positions either. Line numbers are available only when the document was read with `GokuSAXReader`, in which case each matched element can be cast to `GokuElement`:

```java
for (Node node : nodeList) {
    if (node instanceof GokuElement) {
        GokuElement element = (GokuElement) node;
        System.out.println("Node path: " + element.getPath()
                + ", line number: " + element.getLineNumber());
    }
}
```

Remember to replace `//node` with the XPath expression for the nodes you actually need.