Scraping Web Page Data with a Python Crawler

离异带俩娃

Last modified: 2022-02-24 15:42:59

Category column: 日常大数据分享 (Everyday Big Data Sharing). Tags: crawler, python, data mining

First published: 2022-02-16 21:12:25

Original link: https://blog.csdn.net/qq_42155078/article/details/123113199?spm=1001.2014.3001.5502

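This article shows how to scrape data from a rental-listing website with Python's requests and parsel libraries. It sets a User-Agent header, requests a given URL, parses the returned HTML to extract each listing's title, district, address, and price, then assembles the fields and prints them. The method is intended for learning; the data is not used commercially.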

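This article walks through scraping listing data from a rental website with a crawler. The data is used for learning only and has no commercial purpose.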

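First, install the requests and parsel modules from a terminal (pip is a command-line tool, so these commands go in a shell rather than at the Python >>> prompt). requests sends the HTTP request and fetches the page; parsel parses the returned HTML.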

pip install requests

pip install parsel
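
Before moving on, it can help to confirm that both installs actually landed in the interpreter you will run the script with. The following is a minimal sketch using only the standard library; the helper name missing_modules is made up for this illustration and is not part of requests or parsel.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are importable from this interpreter.
print(missing_modules(["requests", "parsel"]))
```

If a name shows up in the printed list, pip most likely installed into a different Python environment than the one running the script.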

Now for the hands-on code:

import requests
import parsel

# Optional: append the results to a file instead of only printing them.
# file = open("C:\\Users\\AUSU\\Desktop\\租房数据.txt", "a")
# Optional: loop over many result pages instead of a single one.
# for i in range(98):
#     url = "https://hz.lianjia.com/zufang/pg" + str(i + 2) + "rt200600000002/#contentList"
url = "https://nj.lianjia.com/zufang/pg3/#contentList"
# Send a browser-like User-Agent along with the request.
header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36"
}
response = requests.get(url=url, headers=header, timeout=10)
selector = parsel.Selector(response.text)

# Each listing on the page is one element with this class.
lis = selector.css(".content__list--item--main")
for li in lis:
    # Title: join the text nodes and drop all whitespace.
    title = li.css(".content__list--item--title a::text").getall()
    info = "".join("".join(title).split())
    # District and neighbourhood: the link texts of the description line.
    location = li.css(".content__list--item--des a::text").getall()
    area = "-".join(location)
    # Full description line: strip whitespace and skip the bare "-" separators.
    address = li.css(".content__list--item--des ::text").getall()
    parts = ["".join(text.split()) for text in address]
    addressInfo = "".join(part for part in parts if part != "-")
    # Price: fall back to an empty string when the element is missing.
    price = li.css(".content__list--item-price em::text").get(default="")
    # Fields are separated by "|"; "元" is the currency unit (yuan).
    result = info + "|" + area + "|" + addressInfo + "|" + price + "元"
    # file.write(result)
    # file.write("\n")
    print(result)
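
The commented-out lines in the script hint at two extensions: looping over several result pages and writing each record to a file. Here is a rough sketch of both, kept separate from the network code so it can be tried offline. The helper names page_urls and append_lines are hypothetical, and the sketch assumes every result page follows the same "pg<N>" URL pattern as the one used above; that pattern and the valid page range depend on the site and may change.

```python
# Assumption: every result page follows the "pg<N>" pattern from the article's URL.
URL_TEMPLATE = "https://nj.lianjia.com/zufang/pg{page}/#contentList"


def page_urls(first_page, last_page, template=URL_TEMPLATE):
    """Return the listing URLs for first_page..last_page (inclusive)."""
    return [template.format(page=n) for n in range(first_page, last_page + 1)]


def append_lines(path, lines):
    """Append each record to a UTF-8 text file, one record per line."""
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


if __name__ == "__main__":
    # Print the URLs that a multi-page crawl would visit.
    for url in page_urls(2, 4):
        print(url)
```

In a full crawl, each URL from page_urls would go through the same requests.get and parsel steps as the single page above, and the resulting "|"-separated strings would be handed to append_lines. Opening the file with an explicit encoding="utf-8" avoids encoding errors on Windows when the records contain Chinese text, and pausing briefly between requests keeps the load on the site low.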
