Table of Contents
Introduction
The basic idea of a crawler: fetch the page at the given URL, parse the response to extract the data you need, and finally save the data.
Web-scraping example: https://blog.csdn.net/qq_39979646/article/details/104510843
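The three steps above (fetch, parse, save) can be sketched as a minimal pipeline. The `fetch`/`parse`/`save` names, the `li` selector, and the output path are illustrative placeholders, not the weather site's actual markup:

```python
import requests
from bs4 import BeautifulSoup

def fetch(url):
    """Step 1: download the raw HTML for the given URL."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def parse(html_doc):
    """Step 2: parse the HTML and pull out the pieces we want."""
    soup = BeautifulSoup(html_doc, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("li")]

def save(items, path):
    """Step 3: persist the extracted data to disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(items))
```

Each stage stays independent, so `parse` can be tested on a static HTML string without any network access.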
Getting the Data
China Weather Network: https://tianqi.so.com/weather/
The code to fetch the weather information is as follows:
import requests
from bs4 import BeautifulSoup

data_list = []
response = requests.get(url)
html_doc = response.text
soup = BeautifulSoup(html_doc, 'lxml')
temp = soup.find('div', class_='temperature').get_text()  # temperature
wea = soup.find('div', class_='weather-icon-wrap').get_text()  # weather
data_list.append("Current temperature: %s°C\nCurrent weather: %s" % (temp, wea))
list = soup.find_all('ul', class_='weather-columns')
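Note that `soup.find()` returns `None` when no element matches (for example, if the site renames its CSS classes), and calling `.get_text()` on `None` raises an `AttributeError`. A small hypothetical helper, `extract_text`, makes lookups like the ones above more defensive:

```python
from bs4 import BeautifulSoup

def extract_text(html_doc, tag, class_name):
    """Return the matched element's text, or None if it is missing."""
    soup = BeautifulSoup(html_doc, "html.parser")
    node = soup.find(tag, class_=class_name)
    return node.get_text(strip=True) if node is not None else None
```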
Table Output
Third-party library: PrettyTable
The code to print the data as a table is as follows:
for i in list:
    data_list.append(i.get_text())
print("List data:", data_list)
a = 1
tb_object = pt.PrettyTable()
tb_object.field_names = ["Date", "Weather", "Details"]
for item in data_list:
    parts = item.strip().split()
    if a != 1:
        tb_object.add_row([parts[0] + parts[1], parts[2], parts[3]])
    else:
        print(item.strip())
    a += 1
print(tb_object)
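The indexing in the loop assumes each scraped line splits into at least four whitespace-separated tokens (two for the date, one for the weather, one for the details). If the site changes its layout, that indexing raises `IndexError`; a hypothetical `to_row` helper makes the assumption explicit and fails gracefully:

```python
def to_row(raw):
    """Split one scraped line into [date, weather, details].

    Assumes at least four whitespace-separated tokens; returns None
    for lines that do not match, instead of raising IndexError.
    """
    parts = raw.strip().split()
    if len(parts) < 4:
        return None
    return [parts[0] + parts[1], parts[2], parts[3]]
```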
The output is as follows:
Email Sending
Tip: email the data to the people you care about most~
Email-sending example: https://blog.csdn.net/qq_39979646/article/details/103933617
Full Code
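The linked post covers email sending in detail; as a quick sketch, the table text produced by the scraper can be mailed with the standard-library `smtplib` and `email` modules. The host, port, and credentials below are placeholders — substitute your provider's SMTP settings and an app-specific password:

```python
import smtplib
from email.mime.text import MIMEText
from email.header import Header

def build_message(sender, receiver, subject, body):
    """Wrap the weather report in a plain-text email message."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = Header(subject, "utf-8")
    return msg

def send_report(sender, password, receiver, body,
                host="smtp.example.com", port=465):
    """Log in over SSL and send the report (settings are placeholders)."""
    msg = build_message(sender, receiver, "Weather Report", body)
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(sender, password)
        server.sendmail(sender, [receiver], msg.as_string())
```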
import requests
from bs4 import BeautifulSoup
import prettytable as pt


# Fetch the weather data and print it as a table
def get_Data(url):
    data_list = []
    response = requests.get(url)
    html_doc = response.text
    soup = BeautifulSoup(html_doc, 'lxml')
    temp = soup.find('div', class_='temperature').get_text()  # temperature
    wea = soup.find('div', class_='weather-icon-wrap').get_text()  # weather
    data_list.append("Current temperature: %s°C\nCurrent weather: %s" % (temp, wea))
    for column in soup.find_all('ul', class_='weather-columns'):
        data_list.append(column.get_text())
    # print("List data:", data_list)
    a = 1
    tb_object = pt.PrettyTable()
    tb_object.field_names = ["Date", "Weather", "Details"]
    for item in data_list:
        parts = item.strip().split()
        if a != 1:
            tb_object.add_row([parts[0] + parts[1], parts[2], parts[3]])
        else:
            print(item.strip())
        a += 1
    print(tb_object)
    return tb_object


if __name__ == '__main__':
    # Pick a region on the site and copy its link
    url1 = "https://tianqi.so.com/weather/101281601"
    get_Data(url1)