Scraping Home-Sale Listings with Python and Saving Them to a CSV File

DataLaboratory

Last modified: 2024-01-24 14:05:01

First published: 2022-03-18 19:23:04

Article link: https://blog.csdn.net/weixin_49742236/article/details/123580414

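The previous article, "Scraping Rental Listings with Python and Saving Them to an Excel File", showed how to scrape rental listings with Python and save them to an Excel file. This example scrapes home-sale listings instead and saves them to a CSV file. Two things differ from the previous article: the data is extracted with parsel's Selector and CSS selectors, and the output is a CSV file rather than an Excel file.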

The code is as follows:

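import csv
import time

import parsel
import requests

# Send a browser-like User-Agent with every request.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}

# CSV columns: title, address, layout, area, orientation, decoration, floor,
# year built, followers and posting date, tags, total price, unit price, detail link.
FIELDNAMES = ['标题', '地址', '户型', '面积', '朝向', '装修', '楼层', '年代', '关注及发布', '其它', '总价', '单价', '详情']

# utf_8_sig writes a BOM so that Excel recognizes the file as UTF-8;
# newline='' lets the csv module control line endings.
with open('静安区售房信息.csv', mode='a', encoding='utf_8_sig', newline='') as f:
    csv_write = csv.DictWriter(f, fieldnames=FIELDNAMES)
    if f.tell() == 0:  # append mode: write the header only if the file is empty
        csv_write.writeheader()

    for page in range(1, 29):
        time.sleep(3)  # pause between requests to keep the load on the site low
        print(f'======================Scraping page {page}======================')
        url = f'https://sh.lianjia.com/ershoufang/jingan/pg{page}/'
        response = requests.get(url=url, headers=HEADERS, timeout=10)
        # print(response.text)
        selector = parsel.Selector(response.text)
        # Each listing on the page is a <div class="info clear"> element.
        divs = selector.css('div.info.clear')
        for div in divs:
            title = div.css('.title a::text').get()
            area_list = div.css('.positionInfo a::text').getall()
            area = '-'.join(area_list)
            # houseInfo is a single '|'-separated string; split it and trim each part.
            house_info = [s.strip() for s in (div.css('.houseInfo::text').get() or '').split('|')]
            house_info += [''] * (6 - len(house_info))  # pad listings that have fewer than six parts
            house_type, house_area, house_face, decoration, floor, years = house_info[:6]
            # .get() returns None when nothing matches, so fall back to '' before calling str methods.
            follow_info = (div.css('.followInfo::text').get() or '').replace(' / ', ',')
            tag_list = div.css('.tag span::text').getall()
            tag = '|'.join(tag_list)
            # Append the unit 万 (10,000 yuan) to the total price.
            totalprice = (div.css('.totalPrice span::text').get() or '') + '万'
            # Strip the "单价" (unit price) label, keeping only the value.
            unitprice = (div.css('.unitPrice span::text').get() or '').replace('单价', '')
            href = div.css('.title a::attr(href)').get()
            dit = {
                '标题': title,
                '地址': area,
                '户型': house_type,
                '面积': house_area,
                '朝向': house_face,
                '装修': decoration,
                '楼层': floor,
                '年代': years,
                '关注及发布': follow_info,
                '其它': tag,
                '总价': totalprice,
                '单价': unitprice,
                '详情': href,
            }
            csv_write.writerow(dit)
            print(title, area, house_type, house_area, house_face, decoration, floor, years, follow_info, tag, totalprice,
                  unitprice, href, sep='|')
print('Scraping finished!')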
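To check that a file written this way can be read back correctly, a small round trip with the standard csv module is enough. The sketch below is independent of the scraper: the column names, the sample row, and the temporary file path are all made up for illustration. It uses the same `utf_8_sig` encoding and `newline=''` settings as the script.

```python
import csv
import os
import tempfile

# Hypothetical column names and sample row, for illustration only.
fieldnames = ['title', 'total_price']
rows = [{'title': 'Sample listing, with comma', 'total_price': '100'}]

# Write into a throwaway directory so no real output file is touched.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

with open(path, mode='w', encoding='utf_8_sig', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# utf_8_sig puts the three-byte UTF-8 BOM at the start of the file.
with open(path, 'rb') as f:
    has_bom = f.read(3) == b'\xef\xbb\xbf'

# Reading with utf_8_sig strips the BOM again, so the first header stays clean.
with open(path, encoding='utf_8_sig', newline='') as f:
    read_back = list(csv.DictReader(f))

print(has_bom)
print(read_back)
```

Reading the file with plain `utf-8` instead would leave the BOM attached to the first column name, which is a common cause of a "missing" first column when looking rows up by key.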