scrapy_selector_css basic syntax

CSS selectors

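| Expression | Description | Example |
|---|---|---|
| * | Selects all elements | * |
| E | Selects E elements | p |
| E1,E2 | Selects E1 and E2 elements | div,pre |
| E1 E2 | Selects E2 elements that are descendants of E1 | div p |
| E1>E2 | Selects E2 elements that are children of E1 | div>p |
| E1+E2 | Selects the E2 element that immediately follows E1 as a sibling | p+strong |
| .class | Selects elements whose class attribute contains class | .info |
| #ID | Selects the element whose id attribute is ID | #main |
| [ATTR] | Selects elements that have an ATTR attribute | [href] |
| [ATTR=VALUE] | Selects elements that have an ATTR attribute whose value is VALUE | [method=post] |
| [ATTR~=VALUE] | Selects elements that have an ATTR attribute whose whitespace-separated value list contains VALUE | [class~=clearfix] |
| E:nth-child(n) | Selects E elements that are the nth child of their parent | a:nth-child(1) |
| E:nth-last-child(n) | Selects E elements that are the nth child of their parent, **(counting from the end)** | a:nth-last-child(2) |
| E:first-child | Selects E elements that are the first child of their parent | a:first-child |
| E:last-child | Selects E elements that are the first child of their parent **(counting from the end)**, i.e. the last child | a:last-child |
| E:empty | Selects E elements that have no children | div:empty |
| E::text | Selects the text nodes (Text Node) of E elements | p::text |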
# CSS example from Chapter 3, Section 4 (i.e. 3.4) of 《精通 Scrapy 网络爬虫》 (Mastering Scrapy Web Crawlers)
from scrapy.http import HtmlResponse

body = '''
<html>
	<head>
		<base href='http://example.com'/>
		<title>Example website</title>
	</head>
	<body>
		<div id='images-1' style="width:1230px;">
			<a href='image1.html'>Name:Image 1 <br/><img src="image1.jpg"/></a>
			<a href='image2.html'>Name:Image 2 <br/><img src="image2.jpg"/></a>
			<a href='image3.html'>Name:Image 3 <br/><img src="image3.jpg"/></a>
		</div>

        <div id="images-2" class="small">
            <a href='image4.html'>Name:Image 4 <br/><img src="image4.jpg"/></a>
			<a href='image5.html'>Name:Image 5 <br/><img src="image5.jpg"/></a>
        </div>
	</body>
</html>
'''
response = HtmlResponse(url='http://www.example.com/', body=body, encoding='utf-8')

# E: selects E elements
print('[1]==========E: selects E elements==========')
print(response.css('img'))  # equivalent to print(response.xpath('//img'))
# E1,E2: selects E1 and E2 elements
print('[2]==========E1,E2: selects E1 and E2 elements==========')
print(response.css('base,title'))
# E1 E2: selects E2 elements that are descendants of E1
print('[3]==========E1 E2: selects E2 elements that are descendants of E1==========')
print(response.css('div img'))  # equivalent to print(response.xpath('//div//img'))
# E1>E2: selects E2 elements that are children of E1
print('[4]==========E1>E2: selects E2 elements that are children of E1==========')
print(response.css('body>div'))
# [ATTR]: selects elements that have an ATTR attribute
print('[5]==========[ATTR]: selects elements that have an ATTR attribute==========')
print(response.css('[style]'))  # equivalent to print(response.xpath('//*[@style]'))
# [ATTR=VALUE]: selects elements that have an ATTR attribute whose value is VALUE
print('[6]==========[ATTR=VALUE]: selects elements that have an ATTR attribute whose value is VALUE==========')
print(response.css('[id="images-1"]'))  # equivalent to print(response.xpath('//*[@id="images-1"]'))
# E:nth-child(n): selects E elements that are the nth child of their parent
print('[7]==========E:nth-child(n): selects E elements that are the nth child of their parent==========')
# The first <a> in each div
print(response.css('div>a:nth-child(1)'))
# The first <a> in the second div
print(response.css('div:nth-child(2)>a:nth-child(1)'))
# E:first-child: selects E elements that are the first child of their parent
# E:last-child: selects E elements that are the last child of their parent
print(response.css('div:first-child>a:first-child'))
print(response.css('div:last-child>a:last-child'))
# E::text: selects the text nodes of E elements
print('[8]==========E::text: selects the text nodes of E elements==========')
print(response.css('a::text').extract())  # equivalent to print(response.xpath('//a/text()').extract())
---------------------------
D:\Python38\python.exe D:/Project0611/ScrapyBook/practise/scrapySelectorCSSTest.py
[1]==========E: selects E elements==========
[<Selector xpath='descendant-or-self::img' data='<img src="image1.jpg">'>, <Selector xpath='descendant-or-self::img' data='<img src="image2.jpg">'>, <Selector xpath='descendant-or-self::img' data='<img src="image3.jpg">'>, <Selector xpath='descendant-or-self::img' data='<img src="image4.jpg">'>, <Selector xpath='descendant-or-self::img' data='<img src="image5.jpg">'>]
[2]==========E1,E2: selects E1 and E2 elements==========
[<Selector xpath='descendant-or-self::base | descendant-or-self::title' data='<base href="http://example.com">'>, <Selector xpath='descendant-or-self::base | descendant-or-self::title' data='<title>Example website</title>'>]
[3]==========E1 E2: selects E2 elements that are descendants of E1==========
[<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image1.jpg">'>, <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image2.jpg">'>, <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image3.jpg">'>, <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image4.jpg">'>, <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image5.jpg">'>]
[4]==========E1>E2: selects E2 elements that are children of E1==========
[<Selector xpath='descendant-or-self::body/div' data='<div id="images-1" style="width:1230p...'>, <Selector xpath='descendant-or-self::body/div' data='<div id="images-2" class="small">\n   ...'>]
[5]==========[ATTR]: selects elements that have an ATTR attribute==========
[<Selector xpath='descendant-or-self::*[@style]' data='<div id="images-1" style="width:1230p...'>]
[6]==========[ATTR=VALUE]: selects elements that have an ATTR attribute whose value is VALUE==========
[<Selector xpath="descendant-or-self::*[@id = 'images-1']" data='<div id="images-1" style="width:1230p...'>]
[7]==========E:nth-child(n): selects E elements that are the nth child of their parent==========
[<Selector xpath='descendant-or-self::div/a[count(preceding-sibling::*) = 0]' data='<a href="image1.html">Name:Image 1 <b...'>, <Selector xpath='descendant-or-self::div/a[count(preceding-sibling::*) = 0]' data='<a href="image4.html">Name:Image 4 <b...'>]
[<Selector xpath='descendant-or-self::div[count(preceding-sibling::*) = 1]/a[count(preceding-sibling::*) = 0]' data='<a href="image4.html">Name:Image 4 <b...'>]
[<Selector xpath='descendant-or-self::div[count(preceding-sibling::*) = 0]/a[count(preceding-sibling::*) = 0]' data='<a href="image1.html">Name:Image 1 <b...'>]
[<Selector xpath='descendant-or-self::div[count(following-sibling::*) = 0]/a[count(following-sibling::*) = 0]' data='<a href="image5.html">Name:Image 5 <b...'>]
[8]==========E::text: selects the text nodes of E elements==========
['Name:Image 1 ', 'Name:Image 2 ', 'Name:Image 3 ', 'Name:Image 4 ', 'Name:Image 5 ']

Process finished with exit code 0
