Crawler Example 5 (Scraping Second-Hand Housing Listings from Fang.com) (Page Redirects, Writing Dict Data to a CSV File, bs4)

Latest recommended article published on 2021-12-15 10:13:42

岁月如梭518

Categories: python, crawler web page parsing. Article tags: python

Article link: https://blog.csdn.net/weixin_47476051/article/details/106032574

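This article shows how to handle the page redirects on the Fang.com (房天下) website with a Python crawler: the redirect URL is extracted from the page source, and the page content is parsed with BeautifulSoup (bs4). The scraped data is then saved to a CSV file. The code is for learning purposes.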

Key Points of the Crawl

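***Disclaimer:*** this is purely for learning purposes.
1. Page redirects
Inspecting the Fang.com site shows that every listing page goes through a redirect.
For example, visiting https://cd.esf.fang.com/chushou/3_211293494.htm redirects to https://cd.esf.fang.com/chushou/3_211293494.htm?rfss=1-b71f212cbb874a451c-3a
Solution: find the redirect URL in the source of the original page, then request that new URL.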

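import re
import requests

def get_html(url, headers):  # function name assumed; the source shows only the body
    response = requests.get(url, headers=headers)
    html = response.text
    # Page redirect: the first response links to the real URL
    pat = re.compile(r'<a class="btn-redir".*?href="(.*?)">点击跳转')
    found = re.findall(pat, html)
    if found:
        # Request the redirect target and use its HTML instead
        response = requests.get(found[0], headers=headers)
    return response.text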
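To see what the regular expression above captures, here is a small offline check that needs no network access. The sample HTML is an assumption modeled on the pattern itself (an `<a class="btn-redir">` link whose text is 点击跳转), not a copy of the real page, and the helper name `extract_redirect_url` is hypothetical.

```python
import re

# Same pattern as in the snippet above: capture the href of the redirect link.
REDIRECT_PATTERN = re.compile(r'<a class="btn-redir".*?href="(.*?)">点击跳转')


def extract_redirect_url(html):
    """Return the redirect target found in html, or None if there is none."""
    match = REDIRECT_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical sample of an interstitial page, for illustration only.
sample_html = (
    '<div><a class="btn-redir" style="font-size: 14pt;" '
    'href="https://cd.esf.fang.com/chushou/3_211293494.htm?rfss=1-b71f212cbb874a451c-3a">'
    '点击跳转</a></div>'
)

redirect_url = extract_redirect_url(sample_html)
print(redirect_url)
```

Returning `None` instead of indexing `[0]` directly lets the caller decide what to do when a page has no redirect link, for example when the site serves the listing page immediately.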

2. Extracting tag content with bs4 (partial code)

# Keys: '房源' = listing title, '小区' = residential community, '总价' = total price
temp_dict['房源'] = soup.find('title').string
temp_dict['小区'] = soup.find('div', id="xq_message").get_text()
temp_dict['总价'] = soup.find('div', class_="tab-cont-right").find('div', class_="trl-item price_esf sty1").get_text()
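The lookups above raise `AttributeError` when a tag is missing, and a multi-class string such as `class_="trl-item price_esf sty1"` only matches when the `class` attribute has exactly that value. A minimal sketch of a more defensive variant follows. The sample HTML is invented to mirror the selectors used above, and the helpers `text_or_none` and `parse_listing` are hypothetical names, not part of the original code.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_listing(html):
    """Extract the three fields shown above into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "房源": text_or_none(soup.find("title")),
        "小区": text_or_none(soup.find("div", id="xq_message")),
        # A CSS selector matches on individual classes, whatever their order.
        "总价": text_or_none(
            soup.select_one("div.tab-cont-right div.trl-item.price_esf")
        ),
    }


# Hypothetical page fragment, for illustration only.
sample_html = """
<html><head><title>Sample listing</title></head><body>
<div class="tab-cont-right">
  <div class="trl-item price_esf sty1"><i>120</i>万</div>
</div>
<div id="xq_message"> Sample Community </div>
</body></html>
"""

listing = parse_listing(sample_html)
print(listing)
```

With this shape, a page that lacks one of the blocks yields `None` for that field rather than stopping the whole crawl.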

3. Saving the data to a CSV file

def save_data_csv(keyword_list,dict_data):

    if not os.path.exists('fang.csv'):
        with open('fang.csv', "w", newline='', encoding='utf-8') as
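The listing above breaks off in the source, so the author's full function is not available. As a rough sketch of one way to write dictionary rows to a CSV file with a header that is created only once, the following uses `csv.DictWriter` from the standard library. The function name `save_row_csv` and the `path` parameter are assumptions added for illustration; only `keyword_list`, `dict_data`, and the file name `fang.csv` come from the original.

```python
import csv
import os


def save_row_csv(keyword_list, dict_data, path="fang.csv"):
    """Append one dict as a CSV row, writing the header if the file is new."""
    write_header = not os.path.exists(path)
    # newline="" avoids blank lines between rows on Windows.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=keyword_list,
            restval="",             # missing keys become empty cells
            extrasaction="ignore",  # keys not in keyword_list are skipped
        )
        if write_header:
            writer.writeheader()
        writer.writerow(dict_data)
```

A call might look like `save_row_csv(['房源', '小区', '总价'], temp_dict)`, repeated once per scraped listing. Opening in append mode keeps earlier rows, while the existence check ensures the header line appears only at the top of the file.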