(To be continued)
I. Learning resources
1. https://lxml.de/xpathxslt.html#xpath-return-values
2.https://www.w3.org/TR/xpath/all/
II. lxml - XML and HTML with Python
lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.
Parsers are represented by parser objects. There is support for parsing both XML and (broken) HTML. Note that XHTML is best parsed as XML, parsing it with the HTML parser can lead to unexpected results.
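The difference between the two parsers can be seen with a small sketch. The sample markup is invented for illustration: the HTML parser repairs unclosed tags, while the strict XML parser rejects the same input with an XMLSyntaxError.

```python
from lxml import etree

# Unclosed <p> and <b> tags: not well-formed XML (sample input made up for this sketch)
broken_html = "<html><body><p>Hello <b>world</body></html>"

# The HTML parser repairs the markup instead of raising an error
html_root = etree.fromstring(broken_html, etree.HTMLParser())
print(etree.tostring(html_root).decode())

# The strict XML parser (used by fromstring when no parser is given) rejects it
xml_rejected = False
try:
    etree.fromstring(broken_html)
except etree.XMLSyntaxError as exc:
    xml_rejected = True
    print("XMLSyntaxError:", exc)

# Well-formed XML is parsed normally
xml_root = etree.fromstring("<root><item>1</item></root>")
print(xml_root.tag, xml_root[0].text)
```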
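2.1 Available parsers
The available parsers are lxml.etree.XMLParser() and lxml.etree.HTMLParser(). Both can be used through the parsing functions in etree. Let us first look at the parsers themselves.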
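As a minimal sketch of how the parsers are handed to etree functions (the inline documents are made up for illustration): etree.parse() and etree.fromstring() both accept a parser object as an optional argument, and etree.HTML() is a shortcut that parses with an HTML parser.

```python
from io import BytesIO
from lxml import etree

xml_parser = etree.XMLParser()
html_parser = etree.HTMLParser()

# etree.parse() takes a filename, URL or file-like object, plus an optional parser
xml_tree = etree.parse(BytesIO(b"<doc><title>lxml</title></doc>"), xml_parser)
print(xml_tree.getroot().findtext("title"))

# etree.fromstring() takes a string plus an optional parser
html_root = etree.fromstring("<p>para<p>another", html_parser)
print([p.text for p in html_root.iter("p")])

# etree.HTML() is a shortcut that parses with an HTML parser by default
print(etree.HTML("<title>t</title>").findtext(".//title"))
```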
1. class lxml.etree.XMLParser()
class lxml.etree.XMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, schema: XMLSchema = None, huge_tree=False, remove_blank_text=False, resolve_entities=True, remove_comments=False, remove_pis=False, strip_cdata=True, collect_ids=True, target=None, compact=True)
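A few of the keyword arguments in the signature can be tried out directly. This sketch uses an invented sample document: with the defaults, comments and processing instructions are kept as children; remove_blank_text, remove_comments and remove_pis drop them during parsing, and recover=True lets the parser try to repair malformed input instead of raising.

```python
from lxml import etree

data = b"""<root>
    <!-- a comment -->
    <?proc instruction?>
    <item>1</item>
</root>"""

# Default settings keep comments and processing instructions as children
default_root = etree.fromstring(data)

# Keyword options from the signature strip unwanted nodes while parsing
clean_parser = etree.XMLParser(remove_blank_text=True,
                               remove_comments=True,
                               remove_pis=True)
clean_root = etree.fromstring(data, clean_parser)

# recover=True makes the parser try to repair malformed XML instead of raising
recover_parser = etree.XMLParser(recover=True)
recovered = etree.fromstring(b"<root><a>1</root>", recover_parser)

print(len(default_root), len(clean_root))
print(etree.tostring(clean_root))
print(recovered.tag)
```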
Bases: lxml.etree._FeedParser
The XML parser.
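The base class _FeedParser gives XMLParser a feed interface: data can be passed in chunks with feed(), and close() finishes parsing and returns the root element. A minimal sketch, with chunk boundaries chosen arbitrarily (even mid-tag) for illustration:

```python
from lxml import etree

parser = etree.XMLParser()

# feed() accepts the document in arbitrary chunks, even split mid-tag
for chunk in [b"<root><it", b"em>1</item>", b"<item>2</item></root>"]:
    parser.feed(chunk)

# close() finishes parsing and returns the root element
root = parser.close()
print([item.text for item in root])  # ['1', '2']
```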
Parsers can be supplied as additional argument to various parse functions of the lxml API. A default parser is always available and can be replaced by a call to the global function ‘set_default_parser’. New parsers can be created at any time without a major run-time overhead.
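Both ways of choosing a parser can be sketched as follows (sample document invented for illustration): passing the parser to one call, or installing it with set_default_parser() so that calls without a parser argument use it. Calling set_default_parser() without an argument restores the original default.

```python
from lxml import etree

data = "<root><!-- note --><item>1</item></root>"
no_comments = etree.XMLParser(remove_comments=True)

# 1) Supply the parser as an extra argument to one call only
with_parser = etree.fromstring(data, no_comments)

# 2) Replace the default parser used when no parser argument is given
etree.set_default_parser(no_comments)
with_default = etree.fromstring(data)

# 3) Calling set_default_parser() without an argument restores the original default
etree.set_default_parser()
after_reset = etree.fromstring(data)

print(len(with_parser), len(with_default), len(after_reset))  # 1 1 2
```

Resetting the default at the end keeps the change from leaking into unrelated code that relies on the standard parser.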
For details, see: