Python pyquery library: basic usage of pyquery for Python web scraping

weixin_39602637

Tags: python, pyquery library

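Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.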

Article link: https://blog.csdn.net/weixin_39602637/article/details/111856862

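# Sibling elements
# sample markup (assumed): class names match the selectors used in this article,
# item texts match the pseudo-class example, href values are placeholders
html = '''
<ul class="list">
    <li class="item-0"><a href="link1.html">first item</a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html">third item</a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
'''
from pyquery import PyQuery as pq
doc = pq(html)
# note: in the selector below there is no space between item-0 and the following ".",
# so it matches a single element that has both classes
li = doc('.list .item-0.active')
print(li.siblings())
print(li.siblings('.active'))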

# Traversal
# A single element
from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.item-0.active')
print(li)

# Multiple elements: items() returns a generator of PyQuery objects
from pyquery import PyQuery as pq
doc = pq(html)
lis = doc('li').items()
print(type(lis))
for li in lis:
    print(li)

# Getting information
# Getting attributes
from pyquery import PyQuery as pq
doc = pq(html)
a = doc('.item-0.active a')
print(a)
# two ways to get an attribute
print(a.attr('href'))
print(a.attr.href)

# Getting text
print(a.text())

# Getting HTML
from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.item-0.active')
print(li)
# get the HTML code inside the <li> tag
print(li.html())

# DOM manipulation
# addClass / removeClass (pyquery also accepts the snake_case names used below)
from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.item-0.active')
print(li)
li.remove_class('active')
print(li)
li.add_class('active')
print(li)

# attr and css
li.attr('name', 'link')
print(li)
li.css('font-size', '14px')
print(li)

# remove
html = '''
<div class="wrap">
    Hello, World
    <p>This is a paragraph</p>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
wrap = doc('.wrap')
print(wrap.text())
wrap.find('p').remove()
print(wrap.text())

# Pseudo-class selectors
html = '''
<ul>
    <li>first item</li>
    <li>second item</li>
    <li>third item</li>
    <li>fourth item</li>
    <li>fifth item</li>
</ul>
'''
from pyquery import PyQuery as pq
doc = pq(html)
# get the first element
li = doc('li:first-child')
print(li)
# get the last element
li = doc('li:last-child')
print(li)
# get the second element
li = doc('li:nth-child(2)')
print(li)
# get all elements after index 2 (indices start at 0)
li = doc('li:gt(2)')
print(li)
# get the elements at even positions (2nd, 4th, ...; nth-child counts from 1)
li = doc('li:nth-child(2n)')
print(li)
# get the elements whose text contains "second"
li = doc('li:contains(second)')
print(li)
