A Detailed Guide to Beautiful Soup's select()

### Passing a tag name to select

1. soup.select("title")

2. soup.select("p")
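
The bracketed outputs shown throughout this post come from the "three sisters" document used in the Beautiful Soup documentation. A minimal runnable sketch of tag-name selection, assuming a trimmed-down version of that HTML:

```python
from bs4 import BeautifulSoup

# A trimmed-down version of the "three sisters" HTML from the Beautiful Soup docs
html_doc = (
    "<html><head><title>The Dormouse's story</title></head><body>"
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    "</p></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

titles = soup.select("title")  # a list of every <title> tag
paras = soup.select("p")       # a list of every <p> tag
print(titles)
print(paras)
```

Note that select() always returns a list, even when only one tag matches, so take `[0]` (or use `select_one`) when you want a single result.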

### Searching down through tags level by level

1. soup.select("body a")
2. soup.select("html head title")

### Finding the direct children of a tag

1. soup.select("p > a")
2. soup.select("p > #link1")
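
To make the difference between `body a` and `p > a` concrete, here is a small sketch; the extra `<div>`-wrapped link is an assumption added for illustration:

```python
from bs4 import BeautifulSoup

html_doc = (
    "<html><body>"
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    "</p>"
    '<div><a href="http://example.com/lacie" id="link2">Lacie</a></div>'
    "</body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

# "body a": every <a> anywhere below <body> (descendant combinator)
all_links = soup.select("body a")
# "p > a": only <a> tags that are *direct* children of a <p>
child_links = soup.select("p > a")
# "p > #link1": a direct child of <p> whose id is "link1"
link1 = soup.select("p > #link1")
print(len(all_links), len(child_links), len(link1))
```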

### Finding sibling tags

1. soup.select("#link1 ~ .sister")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

2. soup.select("#link1 + .sister")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
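
A sketch of the two sibling combinators under the same assumed "three sisters" markup: `~` matches every later sibling, while `+` matches only the one immediately after.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p class="story">'
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
    '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
    '<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'
    "</p>"
)
soup = BeautifulSoup(html_doc, "html.parser")

# "~" (general sibling): every .sister that comes after #link1
later = soup.select("#link1 ~ .sister")
# "+" (adjacent sibling): only the .sister directly after #link1
adjacent = soup.select("#link1 + .sister")
print([a["id"] for a in later])     # link2 and link3
print([a["id"] for a in adjacent])  # link2 only
```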

### Finding by CSS class name

1. soup.select(".sister")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

2. soup.select("[class~=sister]")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
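
`.sister` and `[class~=sister]` behave the same way here: both match any tag whose whitespace-separated class list contains the word `sister`. A sketch; the `class="sister tall"` and `class="stepsister"` links are assumptions added to show the word matching:

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p><a class="sister" id="link1">Elsie</a>'
    '<a class="sister tall" id="link2">Lacie</a>'
    '<a class="stepsister" id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

by_class = soup.select(".sister")
by_word = soup.select("[class~=sister]")
# Both match link1 and link2; "stepsister" is a different word, so link3 is excluded
print([a["id"] for a in by_class])
```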

### Finding by tag id

1. soup.select("#link1")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

2. soup.select("a#link2")

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
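
Since ids should be unique in a document, `#link2` alone already identifies the tag; the `a` prefix in `a#link2` just restricts the match to `<a>` tags. A sketch under the same assumed markup:

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p><a class="sister" id="link1">Elsie</a>'
    '<a class="sister" id="link2">Lacie</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.select("#link1"))   # any tag with id="link1"
print(soup.select("a#link2"))  # only an <a> tag with id="link2"
print(soup.select("p#link2"))  # [] -- link2 is an <a>, not a <p>
```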

### Finding by whether an attribute exists

soup.select('a[href]')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
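
A sketch of filtering on attribute presence; the anchor without an `href` is an assumption added to show what gets excluded:

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p><a href="http://example.com/elsie" id="link1">Elsie</a>'
    '<a id="nolink">no href here</a>'
    '<a href="http://example.com/lacie" id="link2">Lacie</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

with_href = soup.select("a[href]")  # only <a> tags that HAVE an href attribute
print([a["id"] for a in with_href])
```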

### Finding by attribute value

1. soup.select('a[href="http://example.com/elsie"]')

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

2. soup.select('a[href^="http://example.com/"]')

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

3. soup.select('a[href$="tillie"]')

[<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

4. soup.select('a[href*=".com/el"]')

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
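
These four forms are CSS attribute selectors, not regular expressions: `=` is an exact match, `^=` a prefix match, `$=` a suffix match, and `*=` a substring match. A sketch with the same assumed links:

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p><a href="http://example.com/elsie" id="link1">Elsie</a>'
    '<a href="http://example.com/lacie" id="link2">Lacie</a>'
    '<a href="http://example.com/tillie" id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

exact = soup.select('a[href="http://example.com/elsie"]')  # = : whole value
prefix = soup.select('a[href^="http://example.com/"]')     # ^= : starts with
suffix = soup.select('a[href$="tillie"]')                  # $= : ends with
inside = soup.select('a[href*=".com/el"]')                 # *= : contains
print(len(exact), len(prefix), len(suffix), len(inside))
```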

### A complete example

#coding:utf-8
"""
@author: songhao
@software: PyCharm
@file: demo.py
@time: 2017/7/5 5:26 PM
"""
import requests
from bs4 import BeautifulSoup as bs4

r = requests.get("").content  # fill in the target URL

soup = bs4(r, 'lxml')
alists = soup.select('a')  # list of all <a> tags

for a in alists:
    # get the text
    print(a.get_text())
    print(a.string)
    # get the link
    try:
        print(a['href'])
    except KeyError:
        pass

imges = soup.select('img')
for a in imges:
    try:
        # get the src
        print(a['src'])
    except KeyError:
        pass

ip = soup.select('.article-content')
for i in ip:
    for p in i.select('p'):
        print(p)

# every img that has a src attribute
imgz = soup.select('img[src]')
print(imgz)
for u in imgz:
    print(u['src'])

# src starting with a given prefix
print(soup.select('img[src^="http://qiniu."]'))

# src ending with a given suffix
print(soup.select('img[src$=".jpg"]'))

# class containing a substring
print(soup.select('div[class*="crayon"]'))

Commonly used CSS selectors: http://www.168seo.cn/python/23660.html




zeropython WeChat official account · QQ: 5868037 · Email: 5868037@qq.com