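1. Fetching the weather data
On 中国天气网 (China Weather, weather.com.cn), the provincial forecasts are grouped into seven regions. Put the URL of each region page into a list and fetch the pages one by one.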
import requests
from bs4 import BeautifulSoup

# The seven regional forecast pages, crawled in order
urllist = ['http://www.weather.com.cn/textFC/hb.shtml', 'http://www.weather.com.cn/textFC/db.shtml',
           'http://www.weather.com.cn/textFC/hd.shtml', 'http://www.weather.com.cn/textFC/hz.shtml',
           'http://www.weather.com.cn/textFC/hn.shtml', 'http://www.weather.com.cn/textFC/xb.shtml',
           'http://www.weather.com.cn/textFC/xn.shtml']
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.41'}
add, temp, templow = [], [], []  # place names, high temperatures, low temperatures
for url in urllist:
    response = requests.get(url, headers=headers)  # request the page
    html = response.content.decode('utf-8')
    soup = BeautifulSoup(html, 'lxml')  # parse the HTML so the data is easy to query
    all_weathers = soup.find('div', class_='hanml')  # start from the outermost div
    weather = all_weathers.find_all('div', class_="conMidtab")[1]  # the conMidtab block at index 1 holds the weather tables used here
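As a side note, the seven region URLs differ only in the two-letter code before `.shtml`, so the list can also be generated instead of typed out. This is a minimal sketch; the names `BASE_URL`, `REGION_CODES` and `build_region_urls` are made up for illustration, and the codes are simply read off the URLs in `urllist`.

```python
# Hypothetical helper names; the codes are copied from the URLs in urllist.
BASE_URL = "http://www.weather.com.cn/textFC/{}.shtml"
REGION_CODES = ["hb", "db", "hd", "hz", "hn", "xb", "xn"]


def build_region_urls(codes=REGION_CODES):
    """Return one forecast-page URL per region code."""
    return [BASE_URL.format(code) for code in codes]


urllist = build_region_urls()
```

The result is the same seven-item list, which keeps the crawl loop unchanged.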
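To keep things simple, we only collect data for the capital city of each province. After analysing the page structure, we extract the three values we need: city, high temperature and low temperature.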
    for each_weather in weather.find_all('div', class_="conMidtab2"):
        all_tr_tag = each_weather.find_all('tr')  # all rows of this province's table
        # all_tr_tag[2] is the row used for the provincial capital
        for i, td in enumerate(all_tr_tag[2].find_all('td')):  # walk the cells of that row
            if i == 0:  # store each value in its list
                print(td.text.strip('\n'))
                add.append(td.text.strip('\n'))
            if i == 4:  # high temperature
                print(td.text)
                temp.append(td.text)
            if i == 7:  # low temperature
                print(td.text)
                templow.append(td.text)
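The loop above stores the raw cell text. If numeric values are preferred for the colour scale later on, the conversion could look like the sketch below. The function names and the sample row are invented for illustration; only the cell positions 0, 4 and 7 come from the loop above.

```python
def to_int_or_none(text):
    """Convert a temperature cell such as '31' or '-5' to int; return None if it is not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def pick_fields(cells):
    """Pick (name, high, low) from the cell texts of one row, using positions 0, 4 and 7."""
    return cells[0].strip(), to_int_or_none(cells[4]), to_int_or_none(cells[7])


# Made-up sample row, not real data from the site
sample_row = ["\n北京\n", "北京", "晴", "南风", "31", "晴", "北风", "19", "详情"]
name, high, low = pick_fields(sample_row)
```

Returning `None` for a non-numeric cell leaves the decision about missing values to the caller instead of failing in the middle of the crawl.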
2. Data visualization
from pyecharts import Map, Page  # the keyword arguments below match the pyecharts 0.5.x API

# Chart title: "High temperature heat map"
map_high = Map('最高气温热力图', title_pos='center', width=1200, height=600)
map_high.add("", add, temp, visual_range=[0, 40], maptype='china', is_visualmap=True, is_label_show=True,
             visual_text_color='#000')
# Chart title: "Low temperature heat map"
map_low = Map('最低气温热力图', title_pos='center', width=1200, height=600)
map_low.add("", add, templow, visual_range=[0, 40], maptype='china', is_visualmap=True, is_label_show=True,
            visual_text_color='#000')
page = Page(page_title="中国天气")  # combine both charts on one page
page.add(map_high)
page.add(map_low)
page.render("天气.html")
import time

nowtime = time.localtime(time.time())
with open("天气.html", "r+", encoding='utf-8') as html:  # adjust the page layout
    html_bf = BeautifulSoup(html, 'lxml')
    divs = html_bf.select('div')
    # Place the two charts side by side
    divs[0]['style'] = "width:700px;height:800px;position:absolute;top:5px;left:0px;border-style:solid;border-color:#444444;border-width:0px;"
    divs[1]["style"] = "width:700px;height:800px;position:absolute;top:5px;left:750px;border-style:solid;border-color:#444444;border-width:0px;"
    body = html_bf.find("body")
    # Page title: "<year>年<month>月<day>日 中国天气情况分析" (China weather analysis for that date)
    div_title = "<div align=\"center\" style=\"width:1500px;\">\n<span style=\"font-size:32px;font-family:黑体;color:#000000\"><b>{}年{}月{}日 中国天气情况分析</b></span></div>".format(nowtime.tm_year, nowtime.tm_mon, nowtime.tm_mday)
    body["style"] = "background-color:#ffffff;"
    body.insert(0, BeautifulSoup(div_title, "lxml").div)
    html_new = str(html_bf)
    # Overwrite the file with the modified markup
    html.seek(0, 0)
    html.truncate()
    html.write(html_new)
    # The with block closes the file, so no explicit close() is needed
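The title line is easier to check if the date is passed in rather than read from the clock. Below is a small sketch with a hypothetical `build_title` helper that takes a `datetime.date` in place of `time.localtime`; the markup follows the `div_title` string above, and the `width_px` parameter is an assumption added for illustration.

```python
import datetime


def build_title(day, width_px=1500):
    """Return the title <div> for the given date (hypothetical helper)."""
    return (
        '<div align="center" style="width:{}px;">'
        '<span style="font-size:32px;font-family:黑体;color:#000000">'
        '<b>{}年{}月{}日 中国天气情况分析</b></span></div>'
    ).format(width_px, day.year, day.month, day.day)


title_html = build_title(datetime.date(2021, 6, 1))
```

In the script, the result of `build_title(datetime.date.today())` could be parsed with BeautifulSoup and inserted at the top of `body` in the same way as `div_title`.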
3. Final result