Building a Small Web Crawler

老街下着雨

Tags: python

Article link: https://blog.csdn.net/l15767016983/article/details/106099464

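Use requests and XPath to write a crawler that does the following:

(1) Fetch the weather forecast for a given location for the coming week, including the date, the temperature, the weather conditions, and so on.

(2) Save the data in JSON format.

(3) Weather URL: http://www.weather.com.cn/weather/10128100101A.shtml

Approach:

(1) First locate the enclosing div; it contains 7 li tags, one for each of the next 7 days of weather.

(2) Extract all the data under each li with XPath.

(3) Save the result to a JSON file.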
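
As a minimal offline sketch of steps (1) and (2), the fragment below is a hand-written stand-in for the page structure. The real page has seven li elements and may differ in detail; the contents of `sample_html` and the `first_text` helper are illustrative assumptions, not part of the original script.

```python
from lxml import etree

# Hypothetical fragment mimicking the structure the crawler expects;
# the values are made up for illustration.
sample_html = """
<div class="c7d">
  <ul class="t clearfix">
    <li><h1>Day 1</h1><p class="wea">Cloudy</p>
        <p class="tem"><span>25</span>/<i>18</i></p></li>
    <li><h1>Day 2</h1><p class="wea">Rain</p>
        <p class="tem"><i>17</i></p></li>
  </ul>
</div>
"""


def first_text(node, path):
    """Return the first result of an XPath text query, or None if it is empty."""
    found = node.xpath(path)
    return found[0] if found else None


tree = etree.HTML(sample_html)
# Step (1): locate the li elements under the forecast div
items = tree.xpath("//div[@class='c7d']/ul[@class='t clearfix']/li")
# Step (2): extract the fields of each li with relative XPath expressions
rows = [
    {
        "date": first_text(li, "./h1/text()"),
        "weather": first_text(li, "./p[@class='wea']/text()"),
        "low_tem": first_text(li, "./p[@class='tem']/i/text()"),
        "high_tem": first_text(li, "./p[@class='tem']/span/text()"),
    }
    for li in items
]
print(rows)
```

Returning None for a missing element, instead of indexing the result list directly, keeps one incomplete li from stopping the whole run with an IndexError.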

import json

import requests
from lxml import etree

# Request headers: send a browser-like User-Agent
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"}
# Request URL
url = "http://www.weather.com.cn/weather/10128100101A.shtml"
# Pass the headers by keyword: the second positional argument of requests.get is params
response = requests.get(url, headers=header)
res = response.content.decode("utf-8")
# Parse the whole page so that XPath can extract the parts we need
parser = etree.HTML(res, etree.HTMLParser())
# Get the li elements, one per forecast day
result = parser.xpath("//div[@class='c7d']/ul[@class='t clearfix']/li")
# Use XPath to extract the fields under each li
data = []
for i in result:
    dict_date = {}
    dict_date["date"] = i.xpath("./h1/text()")[0]
    dict_date["weather"] = i.xpath("./p[@class='wea']/text()")[0]
    dict_date["low_tem"] = i.xpath("./p[@class='tem']/i/text()")[0]
    # The span holding the high temperature may be absent, so do not index an empty list
    high = i.xpath("./p[@class='tem']/span/text()")
    dict_date["high_tem"] = high[0] if high else None
    data.append(dict_date)
# Save as a JSON file
# ensure_ascii=False writes non-ASCII characters as-is (the default, True, escapes them);
# indent sets the number of spaces used for indentation
json_str = json.dumps(data, ensure_ascii=False, indent=2)
with open("./json_test.json", "w", encoding="utf-8") as fs:
    fs.write(json_str)
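
One detail of the request step is worth checking: the second positional parameter of requests.get is params, so a headers dict has to be passed with the headers keyword to be sent as HTTP headers. The sketch below only builds the requests with requests.Request(...).prepare() and never sends them, so it runs offline; the shortened User-Agent string is a placeholder for illustration.

```python
import requests

url = "http://www.weather.com.cn/weather/10128100101A.shtml"
header = {"User-Agent": "Mozilla/5.0 (placeholder for illustration)"}

# Passed as params, the dict is encoded into the URL query string
as_params = requests.Request("GET", url, params=header).prepare()
# Passed as headers, the dict becomes actual HTTP request headers
as_headers = requests.Request("GET", url, headers=header).prepare()

print(as_params.url)       # the URL now carries a ?User-Agent=... query string
print(as_headers.url)      # the URL is unchanged
print(as_headers.headers)  # contains the User-Agent header
```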

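(4) Result: running the script writes the extracted forecast data to json_test.json in the current directory.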
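
To check what the JSON step produces, here is a small sketch that uses only the standard library. The record in `sample` is a made-up placeholder rather than real forecast data, and the file is written to a temporary directory instead of ./json_test.json.

```python
import json
import os
import tempfile

# Made-up record in the same shape the crawler produces
sample = [
    {"date": "16日（今天）", "weather": "多云", "low_tem": "18℃"},
]

# The default ensure_ascii=True escapes non-ASCII characters as \uXXXX
escaped = json.dumps(sample)
# ensure_ascii=False keeps them readable; indent=2 pretty-prints the output
readable = json.dumps(sample, ensure_ascii=False, indent=2)

path = os.path.join(tempfile.mkdtemp(), "json_test.json")
with open(path, "w", encoding="utf-8") as fs:
    fs.write(readable)

# Reading the file back yields the original data
with open(path, encoding="utf-8") as fs:
    loaded = json.load(fs)

print(readable)
print(loaded == sample)
```

Opening the file with encoding="utf-8" on both the write and the read matters here, because with ensure_ascii=False the Chinese text is stored as raw characters rather than escapes.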
