Learning to scrape specific content from a web page with requests and bs4



# Check a city's PM2.5 value and air quality; if the given city does not exist, display the result for Beijing.
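'''

As the title says, I am still learning. The goal of scraping PM2.5 information is borrowed from a post found online; the implementation details are my own attempt, and I have not checked how much they overlap with other similar posts.

If you have comments, please send me a private message. Thanks.
keypoint:
1. use requests to fetch the web page and read its text;
2. pass the page text to BeautifulSoup, which builds a DOM-like object;
3. find out where the target information is stored; keep trying until the result is correct;
4. members of the returned collections can be accessed one by one, e.g. select('abc') for elements or abc['abc'] for attributes;
5. analyse the page source in Chrome, not IE.

result:
works well in win7, python 3.0, requests module, bs4 module
'''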
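from bs4 import BeautifulSoup
import requests

#checkcity='jiangmen'
checkcity='abc'
find_checkcity=''
pm25url='http://www.pm25.com/'
tempurl=pm25url+checkcity+'.html'
#print (tempurl) #test step
res=requests.get(tempurl)
res.encoding='utf-8'

#print(res.text)
# build the soup first; it must exist before the city list can be searched
soup=BeautifulSoup(res.text,'html.parser')
#print (soup.text) #works

#if checkcity is not in the list, then the beijing page is fetched instead
for city1 in soup.select('.city_province_item'):
    for href1 in city1.select('a'):
        # .get avoids a KeyError on links without an href attribute
        if checkcity in href1.get('href',''):
            find_checkcity='yes'  # assignment; the original '==' only compared
if find_checkcity=='':
    find_checkcity='beijing'
    # re-fetch so that the Beijing result is the one actually displayed
    res=requests.get(pm25url+find_checkcity+'.html')
    res.encoding='utf-8'
    soup=BeautifulSoup(res.text,'html.parser')

# the third link in the banner carries the pm25 and qua attributes
for city in soup.select('.banner_index'):
    mycity=city.select('h2')[0].text
    mypm25=city.select('a')[2]['pm25']
    myqua=city.select('a')[2]['qua']
    print(mycity, ": PM2.5 - ",mypm25, ", air quality - " , myqua, sep="")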
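
The key points above can also be tried without touching the network. What follows is only a minimal sketch under stated assumptions: SAMPLE_HTML is made-up markup that imitates the class names and attributes used in the script (the real page layout may differ), and the helpers city_is_listed, resolve_city and read_banner are hypothetical names introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; it only imitates the selectors used in the
# script above and is NOT a copy of the real page.
SAMPLE_HTML = """
<div class="city_province_item">
  <a href="/beijing.html">Beijing</a>
  <a href="/jiangmen.html">Jiangmen</a>
</div>
<div class="banner_index">
  <h2>Jiangmen</h2>
  <a href="#">day</a>
  <a href="#">week</a>
  <a href="#" pm25="35" qua="good">now</a>
</div>
"""

def city_is_listed(soup, slug):
    """Return True if a link in the city list points at '/<slug>.html'."""
    target = "/" + slug + ".html"
    return any(a.get("href", "").endswith(target)
               for a in soup.select(".city_province_item a"))

def resolve_city(soup, slug, default="beijing"):
    """Return slug if it is a listed city, otherwise the default city."""
    return slug if city_is_listed(soup, slug) else default

def read_banner(soup):
    """Read city name, PM2.5 value and quality label from the banner."""
    banner = soup.select(".banner_index")[0]
    name = banner.select("h2")[0].text
    link = banner.select("a")[2]  # third link holds the custom attributes
    return name, link["pm25"], link["qua"]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(resolve_city(soup, "jiangmen"))
print(resolve_city(soup, "abc"))
print(read_banner(soup))
```

One design choice in this sketch: the link is matched against the full '/<slug>.html' ending instead of a plain substring test, because a substring test would accept a short input such as 'a' for almost any city link.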
