Source Code Share: Scraping Second-Hand Housing Listings from Lianjia (Shanghai)

努力学习各种软件

Category: Crawler Examples. Tags: python

Article link: https://blog.csdn.net/m0_57265868/article/details/134789961

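import csv

import requests
from lxml import etree

# Output file ('数据.csv' means "data.csv"). Append mode keeps rows from earlier runs,
# so the header row below is written again on every run.
f = open('数据.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.writer(f)
# Header: description, address, layout, area, decoration, floor, building style, total price, unit price
csv_writer.writerow(['介绍','地址','户型','面积','装修','楼层','样式','总价','均价'])
'''
Data that appears directly in the page source is called static data.
1. Decide what to scrape
2. Work out where the data comes from
   (inspect the requests with the browser's developer tools)
3. Parse the data
4. Save the data
5. Scrape multiple pages
'''
def down_load(page):
    # Scrape listing pages 1..page and write one CSV row per listing.
    for page_num in range(1, page + 1):
        # The city is selected by the subdomain ('cm' here); page 1 has no 'pg' suffix.
        if page_num == 1:
            url = 'https://cm.lianjia.com/ershoufang/'
        else:
            url = 'https://cm.lianjia.com/ershoufang/pg' + str(page_num) + '/'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
        }
        # A timeout keeps the loop from hanging forever on a stalled connection.
        response = requests.get(url=url, headers=headers, timeout=10)
        response.encoding = 'utf-8'
        tree = etree.HTML(response.text)
        # print(response.text)
        mark_list = tree.xpath('//ul[@class="sellListContent"]//div[@class="info clear"]//div[@class="title"]//a//text()')
        address_list = tree.xpath('//ul[@class="sellListContent"]//div[@class="flood"]//div//a[1]//text()')
        introduce_list = tree.xpath('//ul[@class="sellListContent"]//div[@class="address"]//div//text()')
        Sum_price_list = tree.xpath('//div[@class="priceInfo"]//div[@class="totalPrice totalPrice2"]//span//text()')
        avg_price_list = tree.xpath('//div[@class="priceInfo"]//div[@class="unitPrice"]//span//text()')
        # print(avg_price_list)
        # The five lists are parallel: index i refers to the same listing in each.
        for i in range(len(mark_list)):
            mark = mark_list[i]  # listing description
            address = address_list[i]  # listing address
            introduce = introduce_list[i].split('|')  # '|'-separated summary fields
            sum_price = Sum_price_list[i] + '万'  # total price; '万' is the unit (10,000 yuan)
            avg_price = avg_price_list[i]  # unit price
            unit_type = introduce[0]  # layout (rooms and halls)
            acreage = introduce[1]  # floor area
            decorate_type = introduce[2]  # decoration
            flood = introduce[3]  # floor
            try:
                build_type = introduce[4]  # building style
            except IndexError:
                build_type = '无数据'  # "no data": the summary has no fifth field
            # dit = {
            #     '介绍': mark,
            #     '地址': address,
            #     '户型': unit_type,
            #     '面积': acreage,
            #     '装修': decorate_type,
            #     '楼层': flood,
            #     '样式': build_type,
            #     '总价': sum_price,
            #     '均价': avg_price,
            # }
            csv_writer.writerow([mark, address, unit_type, acreage, decorate_type, flood, build_type, sum_price, avg_price])

            # print(mark, address, unit_type, acreage, decorate_type, flood, build_type, sum_price, avg_price, sep='|')
down_load(30)
f.close()  # flush buffered rows to disk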
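
The `introduce[...]` indexing in `down_load` assumes every summary string has at least four `|`-separated fields, and only the fifth is guarded. A minimal sketch of a more forgiving parser is shown here; `parse_introduce` and the sample string are hypothetical, not part of the original script, and the field order simply mirrors the one used above.

```python
def parse_introduce(text, default='无数据'):
    """Split a '|'-separated listing summary into five named fields.

    Hypothetical helper: the field names mirror the variables used in
    down_load, and `default` ("no data") fills any field that is missing.
    """
    field_names = ['unit_type', 'acreage', 'decorate_type', 'flood', 'build_type']
    # Strip the padding spaces that surround each '|' separator.
    parts = [part.strip() for part in text.split('|')]
    # Pad short summaries so a missing field never raises IndexError.
    parts += [default] * (len(field_names) - len(parts))
    # zip stops at the shorter input, so extra trailing fields are ignored.
    return dict(zip(field_names, parts))


# Made-up sample with only four fields; build_type falls back to the default.
sample = '2室1厅 | 89.5平米 | 精装 | 中楼层(共18层)'
fields = parse_introduce(sample)
print(fields['unit_type'], fields['build_type'])
```

Returning a dict keyed by name also makes it easier to switch to `csv.DictWriter` later, should the column order need to change.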

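# Pass the number of pages you want to down_load (no more than 70). Swap in your own headers and it works; in my testing there was no anti-scraping block.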
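
The page limit mentioned above can be enforced in code rather than left to the caller. The following is a rough sketch, assuming the same URL pattern as `down_load`; `build_page_urls`, `BASE_URL`, and the `max_pages` parameter are hypothetical names introduced for illustration, and 70 is taken from the note above rather than from any documented site limit.

```python
BASE_URL = 'https://cm.lianjia.com/ershoufang/'  # same base URL as in down_load


def build_page_urls(pages, max_pages=70):
    """Return the listing URLs for pages 1..pages, capped at max_pages.

    Hypothetical helper: the cap of 70 comes from the author's note.
    """
    # Clamp the request into the range 0..max_pages.
    pages = max(0, min(pages, max_pages))
    urls = []
    for page_num in range(1, pages + 1):
        # Page 1 is the bare listing URL; later pages add a 'pgN/' suffix.
        if page_num == 1:
            urls.append(BASE_URL)
        else:
            urls.append(f'{BASE_URL}pg{page_num}/')
    return urls


for url in build_page_urls(3):
    print(url)
```

Separating URL construction from the request loop also makes the paging logic easy to check without touching the network.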
