Web Crawlers
十二分热爱
Using parsing libraries: pyquery

    from pyquery import PyQuery as pq
    html = '''<div> <ul class="list"> <li class="item-0">first item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li clas…

Original · 2021-02-03 23:10:01 · 103 views · 0 comments
Using parsing libraries: BeautifulSoup

    from bs4 import BeautifulSoup
    html = """<html><head><title>The Dormouse's story</title></head><body><p class="title"><b>The Dormouse's story</b></p><p class="story">Once upon a time ther…

Original · 2021-02-03 22:43:48 · 80 views · 0 comments
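As a rough illustration of where the truncated excerpt is heading, the sketch below parses a small document with BeautifulSoup and reads a few nodes. The `story` paragraph is a shortened placeholder (an assumption, since the original text is cut off), and the built-in `html.parser` backend is chosen here only to avoid extra dependencies; the post may use a different parser.

```python
from bs4 import BeautifulSoup

# Sample markup: the head and title paragraph follow the excerpt;
# the "story" paragraph is a shortened placeholder (assumption).
html = """<html><head><title>The Dormouse's story</title></head>
<body><p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time</p></body></html>"""

# Parse with the standard-library backend.
soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns the first matching tag.
title = soup.title.string

# "class" is multi-valued, so it comes back as a list.
first_p_class = soup.p["class"]

# find() with class_ filters by CSS class; get_text() extracts the text.
story = soup.find("p", class_="story").get_text()

print(title)
print(first_p_class)
print(story)
```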
Using parsing libraries: XPath

    from lxml import etree
    html = etree.parse('./test.html', etree.HTMLParser())
    result = etree.tostring(html)  # missing tags are completed automatically
    print(result.decode('utf-8'))
    # Get all nodes
    html = etree.parse('./test.html', etree.HTMLParser())
    result = html.xpath('//*')
    print(result)
    # Get all d…

Original · 2021-02-03 19:41:14 · 199 views · 0 comments
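The excerpt reads from `./test.html`, which is not included here. A self-contained sketch of the same XPath queries, assuming the markup is supplied as a string and parsed with `etree.HTML` instead of `etree.parse`, might look like this. The sample markup and variable names are illustrative assumptions.

```python
from lxml import etree

# Hypothetical sample markup (assumption). The second <li> is left
# unclosed on purpose to show the HTML parser repairing it.
text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a>
</ul>
</div>
'''

# etree.HTML parses a string with the HTML parser and returns the root
# <html> element, adding missing <html>/<body> and closing tags.
root = etree.HTML(text)

# '//*' selects every element node in the document.
all_tags = [el.tag for el in root.xpath('//*')]

# '@href' selects attribute values; 'text()' selects text nodes.
hrefs = root.xpath('//li/a/@href')
texts = root.xpath('//li[@class="item-1"]/a/text()')

print(all_tags)
print(hrefs)
print(texts)
```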