Parsing and traversing a Document

[b]To parse an HTML document:[/b]

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head>"
    + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);

(See parsing a document from a string for more info.)

The parser will make every attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. It handles:

[list]
[*]unclosed tags (e.g. <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>)
[*]implicit tags (e.g. a naked <td>Table data</td> is wrapped into a <table><tr><td>...)
[*]reliably creating the document structure (html containing a head and body, and only appropriate elements within the head)
[/list]
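As a rough illustration of the lenient parsing described above, the following sketch feeds the parser a fragment with unclosed <p> tags and no explicit html, head, or body. The class name LenientParseDemo and its helper methods are hypothetical names chosen for this example; it assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; only Jsoup.parse and the Document/Element calls come from jsoup.
public class LenientParseDemo {

    // Counts the <p> elements the parser creates from the given HTML.
    static int countParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p").size();
    }

    // Returns true when the parsed document has both a head and a body element.
    static boolean hasHeadAndBody(String html) {
        Document doc = Jsoup.parse(html);
        return doc.head() != null && doc.body() != null;
    }

    public static void main(String[] args) {
        // Unclosed <p> tags, and no <html>, <head>, or <body> in the input.
        String messy = "<title>First parse</title><p>Lorem <p>Ipsum";
        Document doc = Jsoup.parse(messy);

        // Print the normalized document to inspect the structure the parser built.
        System.out.println(doc.outerHtml());
        System.out.println("paragraphs: " + countParagraphs(messy));
        System.out.println("title: " + doc.title());
    }
}
```

Printing doc.outerHtml() is a convenient way to check what structure the parser actually produced for a given input.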
[b]The object model of a document[/b]
Documents consist of Elements and TextNodes (and a couple of other misc nodes: see the nodes package tree).
The inheritance chain is: Document extends Element extends Node. TextNode extends Node.
An Element contains a list of children Nodes, and has one parent Element. Elements also provide a filtered list of child Elements only.
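The difference between the full list of child Nodes and the filtered list of child Elements can be sketched as follows. ObjectModelDemo and its helper methods are hypothetical names for this example; the jsoup calls used are childNodes(), children(), and parent(), and the jsoup library is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class illustrating the Node / Element / TextNode relationship.
public class ObjectModelDemo {

    // Returns the first <div> in the parsed HTML (null if there is none).
    static Element firstDiv(String html) {
        return Jsoup.parse(html).select("div").first();
    }

    // All child nodes of the first <div>, including text nodes.
    static int countChildNodes(String html) {
        return firstDiv(html).childNodes().size();
    }

    // Only the child elements of the first <div>; text nodes are filtered out.
    static int countChildElements(String html) {
        return firstDiv(html).children().size();
    }

    // Tag name of the parent Element of the first <div>.
    static String parentTag(String html) {
        return firstDiv(html).parent().tagName();
    }

    public static void main(String[] args) {
        String html = "<div>Hello <b>bold</b> world</div>";
        Element div = firstDiv(html);

        // Walk every child Node and report whether it is text or an element.
        for (Node n : div.childNodes()) {
            String kind = (n instanceof TextNode) ? "TextNode"
                        : (n instanceof Element) ? "Element" : "other Node";
            System.out.println(kind + ": " + n.outerHtml());
        }

        System.out.println("child nodes: " + countChildNodes(html));
        System.out.println("child elements: " + countChildElements(html));
        System.out.println("parent: " + parentTag(html));

        // Document extends Element extends Node, so a Document is usable as either.
        Document doc = Jsoup.parse(html);
        System.out.println(doc instanceof Element);
    }
}
```

Here the <div> has three child Nodes (two TextNodes around one <b> Element) but only one child Element, which is why children() is usually the more convenient call when only markup structure matters.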
[b]See also[/b]
Extracting data: DOM navigation
Extracting data: Selector syntax