Scraping Property Listing Information for a Region

gitee: https://gitee.com/livingbody/district-information-crawling

URL analysis

https://xiangxi.loupan.com/info/7075218.html
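The detail-page URL above ends in a numeric id followed by `.html`. A minimal sketch of pulling that id out and rebuilding the URL, assuming every detail page follows this same shape (the helper names `extract_info_id` and `build_info_url` are hypothetical, not part of the original code):

```python
import posixpath
from urllib.parse import urlparse


def extract_info_id(url: str) -> str:
    """Return the file stem of the URL path, e.g. '7075218'."""
    path = urlparse(url).path                # "/info/7075218.html"
    filename = posixpath.basename(path)      # "7075218.html"
    stem, _ext = posixpath.splitext(filename)
    return stem


def build_info_url(info_id: str) -> str:
    """Rebuild the detail-page URL from an id (assumed URL pattern)."""
    return "https://xiangxi.loupan.com/info/" + info_id + ".html"
```

Parsing the path with `urlparse` rather than splitting the raw string keeps a query string or fragment from leaking into the id.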

1. Workflow analysis

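Search for the property by name. If it exists, open its detail page and scrape the property information;
if it does not exist, stop.
Fields collected: property name, building type, land area, reference starting price, plot ratio.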
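The control flow described above can be sketched as a small function that takes the search step and the scrape step as parameters. This is only an illustration of the branching: `crawl_property`, `FIELDS`, and the two callables are hypothetical names, and the English field keys are my own labels for the five fields listed.

```python
from typing import Callable, List, Optional

# Assumed English keys for the five fields listed in the workflow.
FIELDS = ["name", "building_type", "land_area", "reference_price", "plot_ratio"]


def crawl_property(
    name: str,
    search: Callable[[str], Optional[str]],
    scrape: Callable[[str, str], List[str]],
) -> Optional[List[str]]:
    """Search first; scrape the detail page only if the search found a link."""
    detail_url = search(name)
    if detail_url is None:
        # Property not found in the search results: stop here.
        return None
    return scrape(name, detail_url)
```

Passing the two steps in as functions makes the branch easy to exercise with stubs, without touching the network.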

2. Submit the search (a GET request with the property name in the q parameter)

https://xiangxi.loupan.com/xinfang/?q=吉首碧桂园
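The search URL above carries the property name in the `q` query parameter. Because the name contains non-ASCII characters, it should be percent-encoded before the request is sent. A minimal sketch using only the standard library (`build_search_url` and `SEARCH_BASE` are hypothetical names):

```python
from urllib.parse import quote

SEARCH_BASE = "https://xiangxi.loupan.com/xinfang/"


def build_search_url(name: str) -> str:
    """Percent-encode the property name (UTF-8) and append it as ?q=..."""
    return SEARCH_BASE + "?q=" + quote(name)
```

`quote` encodes the name as UTF-8 by default, so the resulting URL is pure ASCII and decodes back to the original name.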

3. Get the search results

  • Building type: high-rise, low-rise, slab block

4. Visit the property detail page

5. Extract the property data

    import re
    import pandas as pd
    import requests
    import collections
    from bs4 import BeautifulSoup
    import urllib.parse as urp
    import json
    import csv
    
    
    def read_name(filename):
        data = pd.read_csv(filename)
        data = data["name"]
        return data.to_numpy()
    
    
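    # Search the site for the property and return the link of the exact match
    def get_district_info(district_name):
        url = "https://xiangxi.loupan.com/xinfang/?q=" + urp.quote(district_name)
        response = requests.get(url)
        if response.status_code == 200:
            html = response.text
            text = BeautifulSoup(html, 'lxml')
            for link in text.find_all(name='a'):
                # Accept only an anchor whose text equals the property name
                # and that actually carries an href.
                if link.get_text() == district_name and link.get('href'):
                    return link['href']
        # Request failed or no exact match: report "not found" to the caller
        # instead of returning the search URL itself.
        return None
    
    
    def main(district_name):
        tmp = []
        url = get_district_info(district_name)
        if url is None:
            # Property not found in the search results: stop here.
            return tmp
        # The detail page reuses the numeric id from the search-result link.
        info_id = url.split("/")[-1].split(".")[0]
        url = "https://xiangxi.loupan.com/info/" + info_id + ".html"
        print(url)
        response = requests.get(url)
        if response.status_code == 200:
            html = response.text
            text = BeautifulSoup(html, 'html.parser')
            tmp.append(district_name)
            for li in text.find_all('li'):
                li_text = li.get_text()
                # str.strip(chars) removes a set of characters from both ends,
                # not a prefix, so it can eat characters that belong to the
                # value. Split on the label instead.
                if "建筑类型:" in li_text:
                    jzlx = li_text.split("建筑类型:", 1)[1].strip()
                    tmp.append(jzlx)
                elif "占地面积:" in li_text:
                    zdmj = li_text.split("占地面积:", 1)[1].strip()
                    tmp.append(zdmj)
                elif "参考起价:" in li_text:
                    ckjg = li_text.split("参考起价:", 1)[1].strip()
                    tmp.append(ckjg)
                elif "容积率:" in li_text:
                    # The original listing is cut off at this branch; it is
                    # completed here by analogy with the branches above.
                    rjl = li_text.split("容积率:", 1)[1].strip()
                    tmp.append(rjl)
        # Assumption: the rest of the original main() is not shown, so this
        # version simply returns the collected row.
        return tmp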
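One detail of the extraction step is worth isolating. `str.strip(chars)` treats its argument as a set of characters to remove from both ends, not as a prefix, so it can also delete characters that belong to the value. The sketch below shows the difference with a made-up sample string and a hypothetical helper `extract_value` that splits on the label instead. It assumes each `li` holds one "label:value" pair, as the listing above does.

```python
from typing import Optional


def extract_value(text: str, label: str) -> Optional[str]:
    """Return the text after `label`, or None if the label is absent."""
    _before, sep, after = text.partition(label)
    if not sep:
        return None
    return after.strip()


sample = "建筑类型:高层建筑"  # made-up value that ends with label characters

# strip() also removes the trailing "建筑" because both characters are in the set.
stripped = sample.strip("建筑类型:")

# partition() cuts exactly at the label and keeps the full value.
extracted = extract_value(sample, "建筑类型:")
```

Here `stripped` loses the last two characters of the value, while `extracted` keeps it whole. On Python 3.9 and later, `str.removeprefix` is another option when the label is known to sit at the very start of the text.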