CSS Selectors

Reference documentation
https://www.w3.org/TR/selectors-3/

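How it works:
CSS selector syntax is simpler than XPath, but it is also less powerful.
When we call the css method of a Selector object, the Python library cssselect translates the CSS selector expression into an XPath expression internally, and the Selector object's xpath method is then called with that expression.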
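To make the translation step concrete, here is a minimal hand-written sketch that converts a very small CSS subset (tag names, the `,` group, the `>` child combinator and the whitespace descendant combinator) into XPath. The function name `css_to_xpath` is hypothetical and this is not cssselect's actual code; it only reproduces the shape of the XPath strings that appear in the Selector output for simple selectors.

```python
import re


def css_to_xpath(css: str) -> str:
    """Toy translator for a tiny CSS subset: tag names, ',', '>' and ' '."""
    branches = []
    for group in css.split(","):
        # Split the group into tag names and '>' combinators.
        tokens = re.findall(r"[A-Za-z][\w-]*|>", group)
        xpath = "descendant-or-self::"
        previous_was_tag = False
        for token in tokens:
            if token == ">":
                # Child combinator: step down exactly one level.
                xpath += "/"
                previous_was_tag = False
            else:
                if previous_was_tag:
                    # Two tags separated only by whitespace: any depth below.
                    xpath += "/descendant-or-self::*/"
                xpath += token
                previous_was_tag = True
        branches.append(xpath)
    # A comma-separated group becomes an XPath union.
    return " | ".join(branches)


for css in ("img", "base,title", "div img", "body>div"):
    print(css, "->", css_to_xpath(css))
```

Attribute selectors, pseudo-classes and pseudo-elements are deliberately left out; the real library handles the full selector grammar.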

Basic syntax
[Figure: basic CSS selector syntax]

Building an HtmlResponse object

from scrapy.selector import Selector
from scrapy.http import HtmlResponse

body='''
<html>
	<head>
		<base href='http://example.com/'>
		<title>Example website</title>
	</head>
	<body>
		<div id='images-1' style="width:1230px;">
			<a href='image1.html'>Name:Image 1 <br/><img src='image1.jpg' /></a>
			<a href='image2.html'>Name:Image 2 <br/><img src='image2.jpg' /></a>
			<a href='image3.html'>Name:Image 3 <br/><img src='image3.jpg' /></a>
		</div>
		<div id="images-2" class="small">
			<a href='image4.html'>Name:Image 4 <br/><img src='image4.jpg' /></a>
			<a href='image5.html'>Name:Image 5 <br/><img src='image5.jpg' /></a>
		</div>
	</body>
</html>
'''

response = HtmlResponse(url='http://www.example.com', body=body, encoding='utf8')

Select all img elements

print(response.css('img'))
[<Selector xpath='descendant-or-self::img' data='<img src="image1.jpg">'>,
<Selector xpath='descendant-or-self::img' data='<img src="image2.jpg">'>,
<Selector xpath='descendant-or-self::img' data='<img src="image3.jpg">'>,
<Selector xpath='descendant-or-self::img' data='<img src="image4.jpg">'>,
<Selector xpath='descendant-or-self::img' data='<img src="image5.jpg">'>]

Select all base and title elements

print(response.css('base,title'))
[<Selector xpath='descendant-or-self::base | descendant-or-self::title' data='<base href="http://example.com/">'>,
<Selector xpath='descendant-or-self::base | descendant-or-self::title' data='<title>Example website</title>'>]

Select img elements that are descendants of a div

print(response.css('div img'))
[<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image1.jpg">'>, 
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image2.jpg">'>, 
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image3.jpg">'>, 
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image4.jpg">'>, 
<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image5.jpg">'>]

Select div elements that are direct children of body

print(response.css('body>div'))
[<Selector xpath='descendant-or-self::body/div' data='<div id="images-1" style="width:1230p...'>, 
<Selector xpath='descendant-or-self::body/div' data='<div id="images-2" class="small">\n\t\t\t...'>]

Select elements that have a style attribute

print(response.css('[style]'))
[<Selector xpath='descendant-or-self::*[@style]' data='<div id="images-1" style="width:1230p...'>]

Select elements whose id attribute equals images-1

print(response.css('[id=images-1]'))
[<Selector xpath="descendant-or-self::*[@id = 'images-1']" data='<div id="images-1" style="width:1230p...'>]

Select the a tag that is the first child of each div

print(response.css('div>a:nth-child(1)'))
[<Selector xpath='descendant-or-self::div/a[count(preceding-sibling::*) = 0]' data='<a href="image1.html">Name:Image 1 <b...'>, 
<Selector xpath='descendant-or-self::div/a[count(preceding-sibling::*) = 0]' data='<a href="image4.html">Name:Image 4 <b...'>]

Select the a tag that is the first child of the second div

print(response.css('div:nth-child(2)>a:nth-child(1)'))
[<Selector xpath='descendant-or-self::div[count(preceding-sibling::*) = 1]/a[count(preceding-sibling::*) = 0]' data='<a href="image4.html">Name:Image 4 <b...'>]
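The `count(preceding-sibling::*)` tests in the generated XPath show that `:nth-child(n)` counts an element's position among all of its element siblings, not only among siblings with the same tag. The sketch below mimics that rule with the standard library's ElementTree on a simplified, well-formed copy of the sample page. The helper `nth_child` is hypothetical and written only for illustration; it is not part of Scrapy or cssselect.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the sample page body (assumption: the <br/> and <img/>
# children of each link are omitted because they do not affect the result).
body = ET.fromstring(
    "<body>"
    "<div id='images-1'>"
    "<a href='image1.html'/><a href='image2.html'/><a href='image3.html'/>"
    "</div>"
    "<div id='images-2'>"
    "<a href='image4.html'/><a href='image5.html'/>"
    "</div>"
    "</body>"
)


def nth_child(parent, tag, n):
    """Return parent's n-th element child (1-based) if it is a <tag>, else None."""
    children = list(parent)  # every element child, whatever its tag
    if 1 <= n <= len(children) and children[n - 1].tag == tag:
        return children[n - 1]
    return None


# div>a:nth-child(1): for each div, its first child, provided it is an <a>.
first_links = []
for div in body.iter("div"):
    link = nth_child(div, "a", 1)
    if link is not None:
        first_links.append(link.get("href"))
print(first_links)  # ['image1.html', 'image4.html']

# div:nth-child(2)>a:nth-child(1): the div must itself be its parent's 2nd child.
second_div = nth_child(body, "div", 2)
link = nth_child(second_div, "a", 1)
print(link.get("href"))  # image4.html
```

If another element, such as a p, were placed before the first a inside a div, `a:nth-child(1)` would match nothing in that div, because the a would then be the second child.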

Select the a tag that is the last child of the first div

print(response.css('div:first-child>a:last-child'))
[<Selector xpath='descendant-or-self::div[count(preceding-sibling::*) = 0]/a[count(following-sibling::*) = 0]' data='<a href="image3.html">Name:Image 3 <b...'>]

Select the text of all a tags

print(response.css('a::text').extract())
['Name:Image 1 ', 'Name:Image 2 ', 'Name:Image 3 ', 'Name:Image 4 ', 'Name:Image 5 ']