Understanding XPath || lxml || markup || Markdown

  • XML

    Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

    The design goals of XML emphasize simplicity, generality, and usability across the Internet.

    Markup language

    A markup language is a system for annotating a document in a way that is syntactically distinguishable from the text. When the document is processed for display, the markup itself is not shown; it is only used to format the text.

    Markdown

    Markdown is a lightweight markup language with plain-text-formatting syntax, created in 2004 by John Gruber and Aaron Swartz.

  • XPath

    XPath (XML Path Language) is a query language for selecting nodes from an XML document. XPath was defined by the World Wide Web Consortium (W3C).

    The XPath language is based on a tree representation of the XML document. An XPath expression is often referred to simply as “an XPath”.

  • XPath syntax & tutorial

    XML Path Language (XPath) 3.0

    XPath Tutorial

  • Terminology

    XPath defines seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document nodes.

    XML documents are treated as trees of nodes. The topmost element of the tree is called the root element.

    Atomic values are single values with no children or parent, such as a string or a number.

    Items are atomic values or nodes.

    An XPath axis represents a relationship to the context (current) node and is used to locate nodes relative to that node in the tree.

  • Relationship of Nodes
    1. Parent

      Each element and attribute has one parent.

    2. Children

      Element nodes may have zero, one or more children.

    3. Siblings

      Nodes that have the same parent.

    4. Ancestors

      A node’s parent, parent’s parent, etc.

    5. Descendants

      A node’s children, children’s children, etc.
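
    The five relationships above can be sketched with Python's standard-library xml.etree.ElementTree. The sample document and the helper names (parent_of, ancestors) are made up for illustration. ElementTree elements do not keep a reference to their parent, so this sketch assumes a child-to-parent map built up front.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """
<bookstore>
  <book>
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)
book = root.find("book")
title = book.find("title")

# Elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Return the parent, the parent's parent, and so on up to the root."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


parent = parent_of[title]                                    # Parent
children = [c.tag for c in book]                             # Children
siblings = [c.tag for c in parent if c is not title]         # Siblings
ancestor_tags = [a.tag for a in ancestors(title)]            # Ancestors
descendant_tags = [d.tag for d in root.iter() if d is not root]  # Descendants
```

    Here title has book as its parent, author, year and price as its siblings, and book and bookstore as its ancestors.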

  • XPath Syntax

    XPath uses path expressions to select nodes or node-sets in an XML document.

    Expression          Description
    nodename            Selects all nodes with the name “nodename”
    /                   Selects from the root node
    //                  Selects nodes in the document from the current node that match the selection, no matter where they are
    .                   Selects the current node
    ..                  Selects the parent of the current node
    @                   Selects attributes
    /bookstore/book[1]  Predicate (in square brackets), used to find a specific node; here, the first book child of bookstore
    *                   Matches any element node
    @*                  Matches any attribute node
    node()              Matches any node of any kind
    |                   Combines the results of two or more paths (union)
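
    A few of these path expressions can be tried with Python's standard-library xml.etree.ElementTree. This is only a sketch: ElementTree implements a subset of XPath, paths are evaluated relative to the element they are called on, and the bookstore document is a hypothetical sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """
<bookstore>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# nodename: child elements named "book" of the current node
books = root.findall("book")

# //: every <title> below the current node, at any depth
titles = [t.text for t in root.findall(".//title")]

# [1]: predicate selecting the first <book> child (positions start at 1)
first_title = root.find("book[1]/title").text

# @: predicate on an attribute value
web_titles = [t.text for t in root.findall("book[@category='web']/title")]

# *: any child element of each <book>
child_tags = [e.tag for e in root.findall("book/*")]

# ..: the parent of each <price>, reached from below
parents = [e.tag for e in root.findall(".//price/..")]
```

    Note that positions in predicates are 1-based, so book[1] is the first book, not the second.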
  • Axes
    A location step has the form axisname::nodetest[predicate], for example child::book.

    AxisName            Result
    ancestor            Selects all ancestors (parent, grandparent, etc.) of the current node
    ancestor-or-self    Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
    attribute           Selects all attributes of the current node
    child               Selects all children of the current node
    descendant          Selects all descendants (children, grandchildren, etc.) of the current node
    descendant-or-self  Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
    following           Selects everything in the document after the closing tag of the current node
    following-sibling   Selects all siblings after the current node
    namespace           Selects all namespace nodes of the current node
    parent              Selects the parent of the current node
    preceding           Selects all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
    preceding-sibling   Selects all siblings before the current node
    self                Selects the current node
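
    Several axes have an abbreviated form in XPath: child is the default axis, "." stands for self::node(), ".." for parent::node(), "@" for attribute::, and "//" for /descendant-or-self::node()/. Python's standard-library xml.etree.ElementTree does not accept the axisname:: syntax, so the sketch below uses only these abbreviations, with the full form noted in the comments. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """
<bookstore>
  <book category="web">
    <title lang="en">Learning XML</title>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)
book = root.find("book")

# child::title  ==  title  (child is the default axis)
child_titles = book.findall("title")

# self::node()  ==  .
self_node = book.find(".")

# descendant-or-self::node()/child::title  ==  .//title
deep_titles = root.findall(".//title")

# parent::node()  ==  ..  (from <title> back up to <book>)
title_parent = root.find(".//title/..")

# attribute::lang  ==  @lang  (ElementTree accepts it only inside a predicate)
with_lang = root.findall(".//title[@lang]")
```

    Axes without an abbreviation, such as ancestor or following-sibling, need a full XPath engine.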
  • XPath Operators

    An XPath expression returns either a node-set, a string, a Boolean, or a number.

    Operator  Description                          Example
    |         Computes the union of two node-sets  //book | //cd
    +         Addition                             6 + 4
    -         Subtraction                          6 - 4
    *         Multiplication                       6 * 4
    div       Division                             8 div 4
    =         Equal                                price=9.80
    !=        Not equal                            price!=9.80
    <         Less than                            price<9.80
    <=        Less than or equal to                price<=9.80
    >         Greater than                         price>9.80
    >=        Greater than or equal to             price>=9.80
    or        Logical or                           price=9.80 or price=9.70
    and       Logical and                          price>9.00 and price<9.90
    mod       Modulus (division remainder)         5 mod 2
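
    Python's standard-library xml.etree.ElementTree does not evaluate XPath arithmetic or numeric comparisons, so the sketch below does not run these operators as XPath. It only mirrors, in plain Python over a hypothetical catalog, what a few of the example expressions would select or compute in a full XPath engine. The price helper is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """
<catalog>
  <book><title>A</title><price>9.80</price></book>
  <book><title>B</title><price>9.95</price></book>
  <cd><title>C</title><price>8.50</price></cd>
</catalog>
"""

root = ET.fromstring(XML)


def price(node):
    """Return the numeric value of the node's <price> child."""
    return float(node.findtext("price"))


# Mirrors //book[price>9.00 and price<9.90]
in_range = [b.findtext("title") for b in root.findall(".//book")
            if 9.00 < price(b) < 9.90]

# Mirrors //book | //cd : the union, in document order
union = [e for e in root.iter() if e.tag in ("book", "cd")]

# Mirrors 5 mod 2 and 8 div 4 (Python's % matches mod for non-negative operands)
remainder = 5 % 2
quotient = 8 / 4
```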
  • XSLT

    XPath is a major element in the XSLT standard.

    XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, or into other formats such as HTML for web pages, plain text, or XSL Formatting Objects, which may subsequently be converted to formats such as PDF, PostScript and PNG.

  • Python module: lxml

    See “Understanding the lxml module in Python”.
