-
XML
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
The design goals of XML emphasize simplicity, generality, and usability across the Internet.
A markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. When the document is processed for display, the markup itself is not shown and is used only to format the text.
Markdown is a lightweight markup language with plain-text-formatting syntax, created in 2004 by John Gruber and Aaron Swartz.
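To make the split between markup and text concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The bookstore document and all of its element and attribute names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative.
XML_DOC = """<bookstore>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML_DOC)
title = root.find("book/title")

# Tags and attributes are markup; the text between the tags is the content.
print(root.tag)       # bookstore
print(title.attrib)   # {'lang': 'en'}
print(title.text)     # Learning XML
```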
-
XPath
XPath (XML Path Language) is a query language for selecting nodes from an XML document. XPath was defined by the World Wide Web Consortium (W3C). The XPath language is based on a tree representation of the XML document. An XPath expression is often referred to simply as “an XPath”.
-
XPath syntax & tutorial
-
Terminology
XPath has several kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document nodes.
XML documents are treated as trees of nodes. The topmost element of the tree is called the root element.
Atomic values are values with no children or parent, such as a string or a number.
Items are atomic values or nodes.
XPath axes represent a relationship to the context (current) node and are used to locate nodes relative to that node in the tree.
-
Relationship of Nodes
-
Parent
Each element and attribute has one parent.
-
Children
Element nodes may have zero, one or more children.
-
Siblings
Nodes that have the same parent.
-
Ancestors
A node’s parent, parent’s parent, etc.
-
Descendants
A node’s children, children’s children, etc.
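The relationships listed above can be explored with a short sketch. It uses only the standard library's xml.etree.ElementTree, which does not store a parent pointer on each element, so the sketch builds a child-to-parent map first. The sample document and the helper name `ancestors` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Learning XML</title><author>A. Author</author>"
    "<price>39.95</price></book>"
    "</bookstore>"
)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Return the parent, the parent's parent, and so on up to the root."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain

title = root.find("book/title")
book = parent_of[title]                                      # parent

children = [c.tag for c in book]                             # direct children
siblings = [c.tag for c in book if c is not title]           # same parent, node itself excluded
ancestor_tags = [a.tag for a in ancestors(title)]            # parent, parent's parent, ...
descendants = [d.tag for d in root.iter() if d is not root]  # children, children's children, ...

print(book.tag)        # book
print(children)        # ['title', 'author', 'price']
print(siblings)        # ['author', 'price']
print(ancestor_tags)   # ['book', 'bookstore']
print(descendants)     # ['book', 'title', 'author', 'price']
```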
-
-
XPath Syntax
XPath uses path expressions to select nodes or node-sets in an XML document.
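As a rough illustration, the sketch below evaluates a handful of the path expressions from this section with lxml's `xpath()` method. It assumes the third-party lxml package is installed. The sample document is hypothetical, and `text()` is a standard XPath node test that this section does not list.

```python
from lxml import etree  # third-party package: pip install lxml

# Hypothetical sample document; names and values are illustrative only.
root = etree.fromstring(
    '<bookstore>'
    '<book category="web"><title lang="en">Learning XML</title><price>39.95</price></book>'
    '<book category="kids"><title lang="en">Picture Stories</title><price>9.80</price></book>'
    '</bookstore>'
)

first_title = root.xpath("/bookstore/book[1]/title/text()")   # absolute path with a predicate
all_titles = root.xpath("//title/text()")                     # matches at any depth
categories = root.xpath("//book/@category")                   # attribute selection
parent_tag = root.xpath("/bookstore/book[1]/title/..")[0].tag # ".." steps up to the parent
title_attrs = root.xpath("/bookstore/book[1]/title/@*")       # any attribute
second_children = [e.tag for e in root.xpath("/bookstore/book[2]/*")]  # any child element
titles_and_prices = [e.tag for e in root.xpath("//title | //price")]   # several paths at once

print(first_title)        # ['Learning XML']
print(all_titles)         # ['Learning XML', 'Picture Stories']
print(categories)         # ['web', 'kids']
print(parent_tag)         # book
print(title_attrs)        # ['en']
print(second_children)    # ['title', 'price']
```

Attribute and text results come back as string objects, while element results come back as lxml elements, which is why the sketch reads `.tag` on some of them.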
Expression           Description
nodename             Selects all nodes with the name “nodename”
/                    Selects from the root node
//                   Selects nodes in the document from the current node that match the selection, no matter where they are
.                    Selects the current node
..                   Selects the parent of the current node
@                    Selects attributes
/bookstore/book[1]   Predicate ([...]), used to find a specific node
*                    Matches any element node
@*                   Matches any attribute node
node()               Matches any node of any kind
|                    Selects several paths at once (one path and another)
-
Axes
Step syntax: axisname::node[predicate], for example child::book
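The step syntax above can be tried out with a small sketch. It assumes the third-party lxml package. The sample document and the helper `tags` are hypothetical and exist only for illustration.

```python
from lxml import etree  # third-party package: pip install lxml

# Hypothetical sample document used only for illustration.
root = etree.fromstring(
    '<bookstore>'
    '<book id="b1"><title>Learning XML</title><price>39.95</price></book>'
    '<book id="b2"><title>Picture Stories</title><price>9.80</price></book>'
    '</bookstore>'
)

def tags(nodes):
    """Return the tag names of a list of element nodes."""
    return [n.tag for n in nodes]

# child::book[1] is the unabbreviated form of book[1].
first_book = root.xpath("child::book[1]")[0]
title = first_book.xpath("child::title")[0]

title_parent = tags(title.xpath("parent::*"))
title_ancestors = tags(title.xpath("ancestor::*"))             # book and bookstore
title_next_siblings = tags(title.xpath("following-sibling::*"))
book_descendants = tags(first_book.xpath("descendant::*"))
later_book_ids = first_book.xpath("following::book/attribute::id")

print(title_parent)          # ['book']
print(title_next_siblings)   # ['price']
print(book_descendants)      # ['title', 'price']
print(later_book_ids)        # ['b2']
```

Each expression is evaluated relative to the element that `xpath()` is called on, which plays the role of the context (current) node.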
AxisName             Result
ancestor             Selects all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self     Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute            Selects all attributes of the current node
child                Selects all children of the current node
descendant           Selects all descendants (children, grandchildren, etc.) of the current node
descendant-or-self   Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following            Selects everything in the document after the closing tag of the current node
following-sibling    Selects all siblings after the current node
namespace            Selects all namespace nodes of the current node
parent               Selects the parent of the current node
preceding            Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling    Selects all siblings before the current node
self                 Selects the current node
-
XPath Operators
An XPath expression returns either a node-set, a string, a Boolean, or a number.
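The four result types can be observed directly in a short sketch, again assuming the third-party lxml package. The sample document is hypothetical, and `string()` and `count()` are standard XPath 1.0 functions that this section does not otherwise cover.

```python
from lxml import etree  # third-party package: pip install lxml

# Hypothetical sample document used only for illustration.
root = etree.fromstring(
    "<bookstore>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "<book><title>Picture Stories</title><price>9.80</price></book>"
    "<cd><title>Quiet Songs</title><price>9.70</price></cd>"
    "</bookstore>"
)

node_set = root.xpath("//book | //cd")              # node-set -> Python list of elements
as_string = root.xpath("string(//book[1]/title)")   # string   -> str
as_boolean = root.xpath("count(//book) > 1")        # Boolean  -> bool
as_number = root.xpath("8 div 4")                   # number   -> float

# Comparison and logical operators are typically used inside predicates.
in_range = root.xpath("//*[price > 9.00 and price < 9.90]/title/text()")

print(len(node_set))   # 3
print(as_string)       # Learning XML
print(as_boolean)      # True
print(as_number)       # 2.0
print(in_range)        # ['Picture Stories', 'Quiet Songs']
```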
Operator   Description                    Example
|          Union of two node-sets         //book | //cd
+          Addition                       6 + 4
-          Subtraction                    6 - 4
*          Multiplication                 6 * 4
div        Division                       8 div 4
=          Equal                          price=9.80
!=         Not equal                      price!=9.80
<          Less than                      price<9.80
<=         Less than or equal to          price<=9.80
>          Greater than                   price>9.80
>=         Greater than or equal to       price>=9.80
or         Logical or                     price=9.80 or price=9.70
and        Logical and                    price>9.00 and price<9.90
mod        Modulus (division remainder)   5 mod 2
-
XSLT
XPath is a major element in the XSLT standard. XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, or into other formats such as HTML for web pages, plain text, or XSL Formatting Objects, which may subsequently be converted to other formats such as PDF, PostScript, and PNG.
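A minimal sketch of such a transformation, assuming the third-party lxml package and its `etree.XSLT` class. Both the input document and the stylesheet are hypothetical. Note how the stylesheet relies on XPath expressions in its `match` and `select` attributes.

```python
from lxml import etree  # third-party package: pip install lxml

# Hypothetical input document.
xml_doc = etree.fromstring(
    "<bookstore>"
    "<book><title>Learning XML</title></book>"
    "<book><title>Picture Stories</title></book>"
    "</bookstore>"
)

# Hypothetical stylesheet that turns the book titles into an HTML list.
xslt_doc = etree.fromstring(
    """<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
      <xsl:template match="/bookstore">
        <ul>
          <xsl:for-each select="book/title">
            <li><xsl:value-of select="."/></li>
          </xsl:for-each>
        </ul>
      </xsl:template>
    </xsl:stylesheet>"""
)

transform = etree.XSLT(xslt_doc)   # compile the stylesheet once
result = transform(xml_doc)        # apply it to the input tree
html = str(result)                 # serialize according to xsl:output

print(html)
```

The printed result is a `<ul>` element with one `<li>` per book title.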
-
Python module : lxml
Understanding XPath || lxml || markup || Markdown