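PyQuery Usage Explained
● PyQuery is a powerful and flexible library for parsing web pages. If you find regular expressions too tedious to write and BeautifulSoup's syntax hard to remember, and you already know jQuery's syntax, PyQuery is an excellent choice.
01. Initialization
1.1 Initializing from a string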
from pyquery import PyQuery as pq
html = '''
<div>
<ul>
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
'''
# Create a PyQuery object from the HTML string
doc = pq(html)
# Select all li elements
print(doc('li'))
1.2 Initializing from a URL
from pyquery import PyQuery as pq
doc = pq(url='http://www.baidu.com')
print(doc('head'))
1.3 Initializing from a file
demo.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>demo</title>
</head>
<body>
<div>
<ul>
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</body>
</html>
from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
print(doc('li'))
02. Basic CSS selectors
Selecting a specific element
from pyquery import PyQuery as pq
html = '''
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li cl