1. Sample output

(Screenshot omitted.) The script writes all_city_weather.txt with one comma-separated record per day, in the column order 城市,日期,星期,最高温度,最低温度,天气,风向.
2. Installation

pip install beautifulsoup4
pip install pypinyin
pip install requests

(urllib3 is pulled in automatically as a dependency of requests, so it needs no separate install.)
3. Approach
- Inspect how the URL of the target site changes after selecting a specific city and year
- Build those URLs by hand and collect them in a list (the pattern is sketched after this list)
- Fetch each URL with requests to get the page source
- Parse the HTML with BeautifulSoup
- Write the extracted rows to a text file
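A quick sketch of the URL pattern the crawler relies on; the /<pinyin>/<yyyymm>.html layout is the same one used in the full script below:

import pypinyin

city = "南昌"
city_pinyin = pypinyin.slug(city, separator='')  # 南昌 -> "nanchang"
# One history page per city per month on lishi.tianqi.com:
url = "http://lishi.tianqi.com/%s/%d%02d.html" % (city_pinyin, 2020, 1)
print(url)  # http://lishi.tianqi.com/nanchang/202001.html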
4. Full implementation
import requests
from bs4 import BeautifulSoup
import pypinyin

# Pretend to be a normal browser so the site does not block the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}

urllist = []

def get_data():
    # 'a+' appends, so re-running the script adds another header line;
    # encoding='utf-8' keeps the Chinese column names readable.
    file = open('all_city_weather.txt', 'a+', encoding='utf-8', newline='')
    file.write("城市,日期,星期,最高温度,最低温度,天气,风向" + "\n")
    for url in urllist:
        # Each entry is "<city>,<page url>"; take the URL part.
        real_url = url.split(",")[1]
        print(real_url)
        response = requests.get(real_url, headers=headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        # The daily records sit in <div class="tian_three">, one <li> per day.
        for weather in soup.select('div[class="tian_three"]'):
            for ul in weather.select('ul'):
                for li in ul.select('li'):
                    string_div = f'{url.split(",")[0]},'
                    for div in li.select('div'):
                        row_data = div.string
                        if row_data is None:  # skip divs that wrap nested tags
                            continue
                        if "℃" in row_data:  # highest / lowest temperature
                            string_div += f'{row_data},'
                        elif " 星期" in row_data:
                            # e.g. "2020-01-01 星期三" -> separate date and weekday columns
                            date = row_data.split(" ")[0]
                            week = row_data.split(" ")[1]
                            string_div += f'{date},{week},'
                        else:  # weather description and wind direction
                            string_div += f'{row_data},'
                    # Drop the trailing comma so the row matches the header.
                    string_div = string_div.rstrip(',')
                    print(string_div)
                    file.write(string_div + "\n")
    file.close()

def main(areas, years):
    for area in areas:
        # Convert the Chinese city name to pinyin for the URL, e.g. 南昌 -> nanchang.
        area_pinyin = pypinyin.slug(area, separator='')
        for year in years:
            for month in range(1, 13):
                # One history page per city per month: /<pinyin>/<yyyymm>.html
                urllist.append("%s,http://lishi.tianqi.com/%s/%d%02d.html"
                               % (area, area_pinyin, year, month))
    get_data()

if __name__ == '__main__':
    areas = ["南昌"]
    years = [2020]
    main(areas, years)
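To sanity-check the parsing logic without hitting the network, you can feed BeautifulSoup a hand-written fragment in the shape the selectors above assume; the values here are made up for illustration, and the live page layout may of course differ:

from bs4 import BeautifulSoup

html = '''
<div class="tian_three">
  <ul><li>
    <div>2020-01-01 星期三</div>
    <div>11℃</div>
    <div>6℃</div>
    <div>多云</div>
    <div>东北风</div>
  </li></ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
for li in soup.select('div[class="tian_three"] ul li'):
    print([div.string for div in li.select('div')])
# ['2020-01-01 星期三', '11℃', '6℃', '多云', '东北风']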
5. Usage
1. Add the target cities to areas; currently only "南昌" is listed
2. Add the target years to years; currently only 2020 is listed
3. The results are written to all_city_weather.txt in the same directory as the script (see the read-back sketch below)
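Because the output is plain comma-separated text, it can be read back with the standard csv module. A minimal sketch, assuming the seven-column layout written by the script:

import csv

with open('all_city_weather.txt', encoding='utf-8') as f:
    for row in csv.reader(f):
        # Skip header lines (one is appended per run) and malformed rows.
        if len(row) != 7 or row[0] == "城市":
            continue
        city, date, week, high, low, weather, wind = row
        print(city, date, high, low)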