An Anjuke Scraper in Python Based on bs4 + requests - Alibaba Cloud Developer Community

This article shows how to scrape property-development listings for Xiamen from Anjuke using Python's BeautifulSoup4 and requests libraries. The code collects each development's name, price, address, floor area, sale status, and unit layout, and saves the data to an Excel file.

1. The code can be run as-is. Download and install Anaconda; Spyder makes it convenient to inspect variables. Alternatively, just open the generated Excel file.

2. Dependencies, installed from the command line (Win10 shortcut: press Windows+X, then A):

pip install beautifulsoup4

pip install requests

The script additionally uses lxml (the HTML parser passed to BeautifulSoup), pandas, and openpyxl (required by pandas for Excel output), so install those as well.

3. The target site is Anjuke (Xiamen); you can browse https://xm.fang.anjuke.com/loupan/all/ to inspect the page structure.

4. To tell whether code is Python 2 or Python 3 from its print calls: print('') with parentheses is Python 3 syntax, while the statement form print '' without parentheses only works in Python 2.
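The scraper below leans entirely on BeautifulSoup's CSS-selector lookup, select(). A minimal self-contained sketch of that pattern (the HTML snippet and its contents are made up for illustration, and html.parser is used here so no lxml install is needed):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one Anjuke listing card
html = '''
<div class="item-mod">
  <span class="items-name">Example Court</span>
  <span class="price">12000 yuan/m2</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
card = soup.select('.item-mod')[0]           # select() returns a list of matching tags
name = card.select('.items-name')[0].text    # .text extracts the text inside the tag
price = card.select('.price')[0].text
print(name, price)                           # prints: Example Court 12000 yuan/m2
```

Calling select() on a tag (here, card) rather than on the whole soup scopes the search to that tag's subtree, which is exactly how the per-listing fields are pulled out below.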

```python
# -*- coding: utf-8 -*-
"""
Created on Sun Jan 14 19:07:39 2018

@author: Steven Lei
"""
import requests
from bs4 import BeautifulSoup
import pandas

def getHousesDetails(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'lxml')
    # Skip the first three .item-mod elements, which are not listings
    houses = soup.select('.item-mod')[3:]
    housesDetails = []
    for house in houses:
        # Development name
        houseName = house.select('.items-name')[0].text
        # Price: some listings use .price, others .price-txt
        priceBefore = house.select('.price')
        if len(priceBefore) == 0:
            priceBefore = house.select('.price-txt')
        price = priceBefore[0].text
        # Address: if truncated (ends with '.'), fetch the full
        # address from the listing's detail page
        address = house.select('.list-map')[0].text
        if address[-1] == '.':
            href = house.select('.pic')[0]['href']
            detailResponse = requests.get(href)
            detailResponse.encoding = 'utf-8'
            detailSoup = BeautifulSoup(detailResponse.text, 'lxml')
            address = detailSoup.select('.lpAddr-text')[0].text
        # Floor area (last span in the layout section, if present)
        houseSizeBefore = house.select('.huxing span')
        if len(houseSizeBefore) > 0:
            houseSize = houseSizeBefore[-1].text
        else:
            houseSize = ''
        # Sale status
        saleStatus = house.select('.tag-panel i')[0].text
        # Unit layout
        if len(house.select('.tag-panel i')) == 2:
            houseType = house.select('.tag-panel i')[1].text
        else:
            houseType = house.select('.tag-panel span')[0].text
        # Collect the fields into a dict for this listing
        houseDetail = {
            'houseName': houseName,
            'price': price,
            'address': address,
            'houseSize': houseSize,
            'saleStatus': saleStatus,
            'houseType': houseType,
        }
        print(houseDetail)
        housesDetails.append(houseDetail)
    return housesDetails

def getAllHouseDetails():
    urlBefore = 'https://xm.fang.anjuke.com/loupan/all/p{}/'
    allHouseDetails = []
    # Pages 1 through 7 of the Xiamen listings
    for i in range(1, 8):
        url = urlBefore.format(i)
        allHouseDetails.extend(getHousesDetails(url))
    return pandas.DataFrame(allHouseDetails)

if __name__ == '__main__':
    #houseDetails = getHousesDetails('https://xm.fang.anjuke.com/loupan/all/p1/')
    allHouseDetails = getAllHouseDetails()
    allHouseDetails.to_excel('anjukeHousesDetails.xlsx')
    print(allHouseDetails.head(10))
```
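getAllHouseDetails() turns its scraped results into a spreadsheet simply by handing pandas a list of dicts, one per development. A small offline sketch of that step (the rows here are invented sample data, not real listings):

```python
import pandas

# Invented sample rows mirroring the dict built per listing above
rows = [
    {'houseName': 'Court A', 'price': '12000', 'address': 'Siming', 'saleStatus': 'on sale'},
    {'houseName': 'Court B', 'price': '9000', 'address': 'Jimei', 'saleStatus': 'sold out'},
]

df = pandas.DataFrame(rows)  # one column per dict key, one row per dict
print(df.shape)              # prints: (2, 4)
```

From there, df.to_excel('anjukeHousesDetails.xlsx') writes the file, provided openpyxl is installed.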
