Beginner Python Crawler Example 1.1: Targeted Crawling of 30-Day City Weather Data

Latest recommended article published 2024-01-15 12:59:47

百练霓裳

Tags: python, crawler, html

Article link: https://blog.csdn.net/m0_47105676/article/details/119953772

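I put this example together myself after working through the previous post. It mostly follows the template from that earlier example, which I will probably keep reusing. Since it is still a targeted crawler, I numbered it Example 1.1 in the title.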

I. Source Code

import requests
from bs4 import BeautifulSoup

def getHTML(url):
    """Fetch url and return the page text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.encoding = r.apparent_encoding
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        # Covers timeouts, connection errors and HTTP error statuses
        return ""

def fillWeather(wlist, html):
    """Append [date, high, low] for each calendar day found in html."""
    soup = BeautifulSoup(html, "html.parser")
    dlt = soup.find_all('span', class_="calendar__date")
    tlt = soup.find_all('span', class_="calendar__tmp")
    # zip stops at the shorter list, so a missing temperature cannot raise IndexError
    for d, t in zip(dlt, tlt):
        dates = d.get_text()
        temp = t.get_text()  # text of the form "high~low"
        high = temp.split("~")[0]
        low = temp.split("~")[1]
        wlist.append([dates, high, low])

def printWeather(wlist):
    # One format string for header and rows keeps the columns aligned
    row = "{:^10}\t{:^10}\t{:^10}"
    print(row.format("Date", "High", "Low"))
    for w in wlist:
        print(row.format(w[0], w[1], w[2]))

def main():
    print("30-day weather lookup: targeted crawler example")
    loc = input("Enter the city to look up (lowercase pinyin): ")
    wlist = []
    url = "https://www.qweather.com/weather30d/" + loc + "-101020100.html"
    html = getHTML(url)
    fillWeather(wlist, html)
    printWeather(wlist)

if __name__ == "__main__":
    main()

Test code

# Test code
import requests
from bs4 import BeautifulSoup

wlist = []
kv = {'user-agent': 'Mozilla/5.0'}  # set a browser-like User-Agent so the site does not reject the crawler
url = "https://www.qweather.com/weather30d/shanghai-101020100.html"
r = requests.get(url, headers=kv)
r.encoding = r.apparent_encoding  # apparent_encoding guesses the encoding from the response body
print(r.status_code)  # 200 means the target URL was fetched successfully
html = r.text  # store the full page content in html
soup = BeautifulSoup(html, "html.parser")  # create a BeautifulSoup object
dlt = soup.find_all('span', class_="calendar__date")
tlt = soup.find_all('span', class_="calendar__tmp")
dates = dlt[0].get_text()
temp = tlt[0].get_text()
high = temp.split("~")[0]
low = temp.split("~")[1]
print(dates)
print(temp)
print(high)
print(low)

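II. Function Walkthrough

1. getHTML(url): takes the target page as its argument, fetches the page text, and returns it.

url: the target URL passed into the function;

"try... except...": timeout = 30 makes the request give up after 30 seconds so the program does not hang on an unresponsive site; the except branch catches that failure (and any HTTP error raised by raise_for_status()) and returns an empty string instead;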

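def getHTML(url):
    """Fetch url and return the page text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.encoding = r.apparent_encoding
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        # Covers timeouts, connection errors and HTTP error statuses
        return ""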
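
Because getHTML() signals failure by returning an empty string, a caller can retry a few times before giving up. The following is only a minimal sketch of that idea: fetch_with_retry, its parameters, and the retry count and delay are names and values I made up for illustration, not part of the original program.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     retries: int = 3, delay: float = 1.0) -> str:
    """Call fetch(url) up to `retries` times until it returns non-empty text.

    `fetch` is any function that, like getHTML(), returns "" on failure.
    """
    for attempt in range(retries):
        html = fetch(url)
        if html:  # non-empty text means the request succeeded
            return html
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause before the next attempt
    return ""  # every attempt failed
```

With the function above, main() could call fetch_with_retry(getHTML, url) in place of getHTML(url).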

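2. fillWeather(wlist,html): extracts the needed content (date, high temperature, low temperature) from the weather page and stores it in the list wlist.

wlist: the empty list passed in, used to hold the date, high temperature, and low temperature for each of the 30 days;

html: holds the page content of the target site;

soup: the BeautifulSoup object built from html (look up how to use the BeautifulSoup library yourself; it is not explained here);

dlt: the list of date elements, found with BeautifulSoup's find_all() function;

tlt: the list of temperature elements, found with BeautifulSoup's find_all() function;

dates: the text content of one matched date element;

temp: the text content of one matched temperature element;

high: temp has the form "high~low", so the string's split() function is used to cut out the high temperature, which is stored in high;

low: in the same way, split() cuts out the low temperature, which is stored in low;

wlist: each day is appended to it as a [dates, high, low] entry.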

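def fillWeather(wlist, html):
    """Append [date, high, low] for each calendar day found in html."""
    soup = BeautifulSoup(html, "html.parser")
    dlt = soup.find_all('span', class_="calendar__date")
    tlt = soup.find_all('span', class_="calendar__tmp")
    # zip stops at the shorter list, so a missing temperature cannot raise IndexError
    for d, t in zip(dlt, tlt):
        dates = d.get_text()
        temp = t.get_text()  # text of the form "high~low"
        high = temp.split("~")[0]
        low = temp.split("~")[1]
        wlist.append([dates, high, low])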
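
The split("~")[1] lookup raises IndexError if a temperature string ever lacks the "~" separator. As a rough sketch of a more forgiving split, the helper below uses str.partition instead; split_temp is a hypothetical name, and the sample strings are invented rather than copied from the site.

```python
from typing import Tuple


def split_temp(temp: str) -> Tuple[str, str]:
    """Split "high~low" into (high, low).

    When "~" is missing, the whole string is returned as high and low is "".
    """
    high, _sep, low = temp.partition("~")
    return high.strip(), low.strip()


# Invented sample values, only to show the behaviour
print(split_temp("31°~25°"))
print(split_temp("31°"))
```

Inside fillWeather(), `high, low = split_temp(temp)` could then replace the two split() lines.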

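3. printWeather(wlist): prints the date, high temperature, and low temperature stored in the list.

wlist: the list, passed in as an argument, that already holds the required data;

The function loops over the list and prints each entry.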

def printWeather(wlist):
    # One format string for header and rows keeps the columns aligned
    row = "{:^10}\t{:^10}\t{:^10}"
    print(row.format("Date", "High", "Low"))
    for w in wlist:
        print(row.format(w[0], w[1], w[2]))
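
In a format field such as {:^10}, the ^ centers the value within a width of 10 characters. The sketch below shows the same idea with one shared format string for the header and every row; ROW_FORMAT, format_rows, the column widths, and the sample data are all assumptions chosen for illustration.

```python
# Centered columns: 12 characters for the date, 8 for each temperature
ROW_FORMAT = "{:^12}\t{:^8}\t{:^8}"


def format_rows(wlist):
    """Return the header line followed by one formatted line per entry."""
    lines = [ROW_FORMAT.format("Date", "High", "Low")]
    for dates, high, low in wlist:
        lines.append(ROW_FORMAT.format(dates, high, low))
    return lines


# Invented sample data in the same [date, high, low] shape as wlist
sample = [["08-27", "31", "25"], ["08-28", "30", "24"]]
for line in format_rows(sample):
    print(line)
```

Returning the lines instead of printing them directly makes the formatting easy to check without running the crawler.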

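III. Page Analysis

Opening the page "https://www.qweather.com/weather30d/shanghai-101020100.html" shows that "shanghai" in the URL denotes the region being queried, so main can offer a way to look up the weather for different regions.

Select any day in the calendar, right-click, and choose "Inspect" to view the page source for the selected element.

Locating the element in the page source shows that both the date and the temperature sit inside span tags. BeautifulSoup's find_all() function can therefore pick out the text of span tags whose class is "calendar__date" and of span tags whose class is "calendar__tmp".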
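
To try out this selection logic without fetching the live page, here is a small offline sketch. It deliberately uses only the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages. The SpanCollector class, the HTML snippet, the "calendar__item" wrapper, and all the values in it are invented for illustration; only the two span class names come from the page analysis above.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect the text of <span> elements, grouped by class name."""

    def __init__(self, wanted):
        super().__init__()
        self.found = {name: [] for name in wanted}
        self._current = None  # class of the span being read, if it is wanted

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in self.found:
            if name in classes:
                self._current = name
                self.found[name].append("")  # start a new text entry

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current][-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


# Invented snippet that mimics the structure described in the text
SAMPLE_HTML = """
<div class="calendar__item">
  <span class="calendar__date">27</span>
  <span class="calendar__tmp">31°~25°</span>
</div>
<div class="calendar__item">
  <span class="calendar__date">28</span>
  <span class="calendar__tmp">30°~24°</span>
</div>
"""

collector = SpanCollector(["calendar__date", "calendar__tmp"])
collector.feed(SAMPLE_HTML)
print(collector.found["calendar__date"])
print(collector.found["calendar__tmp"])
```

This sketch does not handle nested span tags; for real pages, find_all() with class_ remains the simpler route.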

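In the main function, the location is read from user input and stored in the variable loc. (Testing showed that the input can be either the full pinyin of the place or its initials.) Note that the numeric ID 101020100 from the Shanghai URL stays fixed in the code; if the site selects the city by that ID rather than by the name, that would explain why any spelling is accepted, and every query would then return Shanghai's data.

The final target address is stored in url and passed to getHTML();

The text returned by getHTML() is stored in html;

The empty list wlist and html are passed to fillWeather();

The filled wlist is passed to printWeather(), and finally main() is executed.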

def main():
    print("30-day weather lookup: targeted crawler example")
    loc = input("Enter the city to look up (lowercase pinyin): ")
    wlist = []
    url = "https://www.qweather.com/weather30d/" + loc + "-101020100.html"
    html = getHTML(url)
    fillWeather(wlist, html)
    printWeather(wlist)
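
The URL is assembled from a city name and a numeric ID, so one possible refinement is to build it in a small helper that validates both parts. This is only a sketch under assumptions: build_url and BASE are hypothetical names, and the only ID known from this article is 101020100 from the Shanghai URL; IDs for other cities would have to be looked up on the site.

```python
import re

BASE = "https://www.qweather.com/weather30d/"


def build_url(loc: str, city_id: str) -> str:
    """Build a 30-day forecast URL from a pinyin city name and a numeric ID."""
    loc = loc.strip().lower()
    if not re.fullmatch(r"[a-z]+", loc):
        raise ValueError("city must contain pinyin letters only")
    if not city_id.isdigit():
        raise ValueError("city ID must contain digits only")
    return f"{BASE}{loc}-{city_id}.html"


# The Shanghai ID is taken from the URL used in this article
print(build_url("shanghai", "101020100"))
```

Validating the input this way also keeps stray spaces or symbols typed by the user out of the request URL.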

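IV. Note

I am a beginner on the road to learning Python crawlers. If anything here is wrong (go easy, beginners need encouragement), corrections from more experienced readers are welcome.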
