Scraping Rental Listings from 58.com (58同城)

百年੭ ᐕ)੭*⁾⁾

Last modified: 2023-12-30 14:54:49

Category: 58.com rental data scraping. Tags: python, web scraping

First published: 2023-12-27 11:29:12

Article link: https://blog.csdn.net/weixin_45938063/article/details/135240477

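Scraping 58.com listings

Scraping property listings is a common task, and a moderately tricky one. This post shares one way to scrape rental listings from 58.com. It is meant only as a reference and a starting point.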

Listing source

58.com Shanghai rentals

Goals

Collect rental listings for Shanghai
Save them to a CSV file

Core code (partial)

Imports

import csv
import random
import requests
from lxml import etree

XPath expressions (locating the target data with XPath is one of the key steps)

//div[@class="house-title"]/h1/text()    title
//div[@class="house-pay-way f16"]/span/b/text()    price
//div[@class="house-desc-item fl c_333"]/ul/li[1]/span[2]/text()    rental type
//div[@class="house-desc-item fl c_333"]/ul/li[2]/span[2]/text()    house type
//div[@class="house-desc-item fl c_333"]/ul/li[3]/span[2]/text()    orientation and floor
//div[@class="house-desc-item fl c_333"]/ul/li[4]/span[2]/a/text()    residential community
//div[@class="house-desc-item fl c_333"]/ul/li[5]/span[2]/a[1]/text()    district
//div[@class="house-desc-item fl c_333"]/ul/li[5]/span[2]/a[2]/text()    road
//div[@class="house-desc-item fl c_333"]/ul/li[@class="li_br"]/span[2]/text()    full address
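To show how these expressions behave, here is a minimal sketch that runs three of them against a small hand-written HTML fragment. The fragment and its values are assumptions made up for illustration; only the class names and paths come from the list above, and the real page markup may differ.

```python
from lxml import etree

# Hand-written stand-in for a detail page. The class names mirror the XPath
# list above, but the markup and values are assumptions for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="house-title"><h1>Sample listing title</h1></div>
  <div class="house-pay-way f16"><span><b>4500</b></span></div>
  <div class="house-desc-item fl c_333">
    <ul>
      <li><span>Rental type:</span><span>Whole unit</span></li>
      <li><span>House type:</span><span>2 bedrooms</span></li>
    </ul>
  </div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# xpath() always returns a list; an empty list means nothing matched.
title = tree.xpath('//div[@class="house-title"]/h1/text()')
price = tree.xpath('//div[@class="house-pay-way f16"]/span/b/text()')
rent_type = tree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[1]/span[2]/text()')
# The sample has no fourth <li>, so this lookup comes back empty.
missing = tree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[4]/span[2]/a/text()')

print(title, price, rent_type, missing)
```

Because a failed match yields an empty list, indexing with [0] raises IndexError, which is why the extraction code needs a fallback for missing fields.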

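Proxy pool configuration (to reduce the chance of being blocked by anti-scraping measures; not covered in detail here. Ideally use proxies you rent yourself)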

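# Note: requests picks a proxy by URL scheme. These entries only define 'http',
# so requests to https:// URLs will not go through them unless an 'https' key is added.
proxies_pool = [
    {'http': '188.132.221.27:8080'},
    {'http': '103.152.232.134:8080'},
    {'http': '103.227.252.102:8080'},
    {'http': '79.106.170.34:8989'},
    {'http': '190.110.99.189:999'},
]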
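One possible way to make a pool cover both URL schemes is sketched below. The addresses are placeholders from the RFC 5737 documentation range, and `build_proxy_pool` and `pick_proxy` are hypothetical helper names, not part of the original code.

```python
import random

# Placeholder addresses from the RFC 5737 documentation range, not real proxies.
RAW_PROXIES = ["192.0.2.10:8080", "192.0.2.11:8080", "192.0.2.12:3128"]


def build_proxy_pool(addresses):
    """Return requests-style proxy dicts that cover both http and https URLs."""
    pool = []
    for addr in addresses:
        proxy_url = f"http://{addr}"
        pool.append({"http": proxy_url, "https": proxy_url})
    return pool


def pick_proxy(pool, rng=random):
    """Choose one proxy mapping at random; call once per request to rotate."""
    return rng.choice(pool)


proxy_pool = build_proxy_pool(RAW_PROXIES)
chosen = pick_proxy(proxy_pool)
print(chosen)
```

The chosen mapping can be passed as the `proxies` argument of `requests.get`. Picking inside the request loop rather than once at startup spreads requests across the pool.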

Pick a proxy and build the request headers

proxies = random.choice(proxies_pool)  # picked once and reused for every request
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
}

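Read the page range from user input and open the CSV file for writing

base_url = 'https://sh.58.com/zufang/pn'
i = int(input("Enter the first page: "))
end = int(input("Enter the last page: "))
# Append mode keeps rows from earlier runs; newline='' lets the csv module control line endings
fp = open('./wuba.csv', 'a+', encoding='utf-8', newline='')
writer = csv.writer(fp)  # used by writer.writerow(...) in the final step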

Parse each results page for the detail-page links and print them

for i in range(i, end + 1):
    url = base_url + str(i) + '/'
    print(url)  # the results page being fetched
    response = requests.get(url=url, headers=headers, proxies=proxies)
    content = response.text
    tree = etree.HTML(content)
    # link to each listing's detail page
    hrefs = tree.xpath('//div[@class="des"]/h2/a/@href')
    print(hrefs)
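The URL handling in this step can be sketched without any network access. The sample hrefs below are made up, and the assumption that links may be protocol-relative or repeated is mine, not something the original states. `page_urls` and `absolute_links` are hypothetical helper names.

```python
from urllib.parse import urljoin

BASE_URL = "https://sh.58.com/zufang/pn"  # same base URL as in the code above


def page_urls(start, end):
    """Build the results-page URLs for pages start..end inclusive."""
    return [f"{BASE_URL}{page}/" for page in range(start, end + 1)]


def absolute_links(page_url, hrefs):
    """Resolve each href against the page URL and drop duplicates, keeping order."""
    seen = set()
    links = []
    for href in hrefs:
        full = urljoin(page_url, href.strip())
        if full not in seen:
            seen.add(full)
            links.append(full)
    return links


urls = page_urls(1, 2)
# Hypothetical hrefs standing in for the output of tree.xpath(...) above.
sample_hrefs = [
    "//sh.58.com/zufang/123.shtml",
    "/zufang/456.shtml",
    "//sh.58.com/zufang/123.shtml",
]
print(absolute_links(urls[0], sample_hrefs))
```

Resolving with `urljoin` means the detail-page request always receives a full URL, whatever form the href takes in the page source.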

Then extract each field in turn

        # Excerpt from inside the loop over detail-page links. fintree is assumed to be
        # the parsed tree of one detail page; its construction is not shown here.
        # total price
        try:
            totalprice = fintree.xpath('//div[@class="house-pay-way f16"]/span/b/text()')[0]
        except IndexError:
            totalprice = ''
        # rental type
        try:
            typeofrent = fintree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[1]/span[2]/text()')[0]
        except IndexError:
            typeofrent = ''
        # residential community
        try:
            commname = fintree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[4]/span[2]/a/text()')[0]
        except IndexError:
            commname = ''
        # road
        try:
            roadinfo = fintree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[5]/span[2]/a[2]/text()')[0]
        except IndexError:
            roadinfo = ''
        # house type
        try:
            house_type = fintree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[2]/span[2]/text()')[0]
        except IndexError:
            house_type = ''
        # district
        try:
            area = fintree.xpath('//div[@class="house-desc-item fl c_333"]/ul/li[5]/span[2]/a[1]/text()')[0]
        except IndexError:
            area = ''
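The repeated try/except pattern above could be collapsed into one lookup table and a small helper. This is a sketch of an alternative, not the author's code: `first_text` and `extract_fields` are hypothetical names, and stripping whitespace is an extra step the original does not perform.

```python
# XPaths copied from the excerpt above; the keys reuse its variable names.
ITEM = '//div[@class="house-desc-item fl c_333"]/ul'
FIELD_XPATHS = {
    "totalprice": '//div[@class="house-pay-way f16"]/span/b/text()',
    "typeofrent": ITEM + '/li[1]/span[2]/text()',
    "commname": ITEM + '/li[4]/span[2]/a/text()',
    "roadinfo": ITEM + '/li[5]/span[2]/a[2]/text()',
    "house_type": ITEM + '/li[2]/span[2]/text()',
    "area": ITEM + '/li[5]/span[2]/a[1]/text()',
}


def first_text(tree, xpath, default=""):
    """Return the first XPath match, stripped of whitespace, or default if none."""
    matches = tree.xpath(xpath)
    return matches[0].strip() if matches else default


def extract_fields(tree, field_xpaths=FIELD_XPATHS):
    """Map each field name to its first match on one parsed detail page."""
    return {name: first_text(tree, xp) for name, xp in field_xpaths.items()}
```

With a parsed detail page such as `fintree`, `extract_fields(fintree)` would return a dict whose values can be turned into a row with `list(...)`. Adding a field then means adding one dictionary entry instead of another try/except block.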

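Write and save

        # houselst is the list of fields collected for one listing (its construction is not shown here)
        writer.writerow(houselst)
        print(houselst)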
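Since the file is opened in append mode, a header row needs care so that it is not repeated on every run. The sketch below shows one way to handle that. The column names are assumptions taken from the variable names above, the rows are made up, and `append_rows` is a hypothetical helper.

```python
import csv
import os
import tempfile

# Column names are assumptions based on the variable names in the excerpts above.
HEADER = ["totalprice", "typeofrent", "commname", "roadinfo", "house_type", "area"]


def append_rows(path, rows, header=HEADER):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # The with-block closes the file even if a write fails.
    with open(path, "a", encoding="utf-8", newline="") as fp:
        writer = csv.writer(fp)
        if needs_header:
            writer.writerow(header)
        writer.writerows(rows)


# Demo in a temporary directory with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "wuba.csv")
    append_rows(demo_path, [["4500", "Whole unit", "Sample Estate", "Sample Road", "2 bedrooms", "Sample District"]])
    append_rows(demo_path, [["3200", "Shared", "", "", "1 bedroom", ""]])
    with open(demo_path, encoding="utf-8", newline="") as fp:
        saved = list(csv.reader(fp))

print(saved)
```

Opening the file inside a `with` block also guarantees it is flushed and closed, which the bare `open(...)` call earlier leaves to the end of the script.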