Python web scraping: deep-crawling Lianjia rental listings with the Scrapy framework and storing the results in Redis, MySQL, and MongoDB

This article shows how to use Python's Scrapy framework to deep-crawl Lianjia rental listings. It walks through the implementation of four key files (items.py, lianjia.py, pipelines.py, and main.py), and finally stores the scraped data in Redis, MySQL, and MongoDB.

1. items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class LianjiaItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()

    pic = scrapy.Field()
    title = scrapy.Field()
    detail_url = scrapy.Field()
    price = scrapy.Field()
    publish_info = scrapy.Field()
    pic_list = scrapy.Field()
    house_code = scrapy.Field()
    ucid = scrapy.Field()
    agent_name = scrapy.Field()
    agent_phone = scrapy.Field()
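
Since a scrapy.Item behaves like a dict, spider callbacks fill these fields with plain key assignments. An illustrative sketch of how a detail-page callback might populate a few of them (the method name and selectors are placeholders, not the article's actual code):

# Inside LianjiaSpider; the selectors below are placeholders for illustration
def parse_detail(self, response):
    item = LianjiaItem()
    item["title"] = response.xpath("//title/text()").get()
    item["detail_url"] = response.url
    item["price"] = response.xpath("//span[@class='price']/text()").get()  # assumed class name
    yield item  # Scrapy routes the yielded item to the enabled pipelines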

2. lianjia.py

# -*- coding: utf-8 -*-
import scrapy
from LianJia.items import LianjiaItem
import re
import json
import requests


class LianjiaSpider(scrapy.Spider):
    name = 'lianjia'
    allowed_domains = ['lianjia.com']
    start_urls = ['https://www.lianjia.com/city/']

    def parse(self, response):
        # The URLs obtained here point to each city's new-home (xinfang) site
        city_url_list = response.xpath("//div[@class='city_province']//li/a/@href").extract()
        # print(city_url_list)
        city_name_list = response.xpath("//div[@class='city_province']//li/a/text()").extract()
        for index in range(len(city_url_list)):
            city_name = city_name_list[index]
            city_url = city_url_list[index]
            # print(city_url)
            # City subdomain prefix (e.g. "bj" for Beijing)
            city_alp = re.findall(r"https://(\w*)\.", city_url)[0]  # escape the dot so it matches literally
            # print(city_alp)
            # Build the city's rental (zufang) URL
            city_url = "https://" + city_alp + ".lianjia.com/zufang/"
            # print("--------------------{}开始下载-------------------------------".format(city_name))
            # The original listing is cut off after `url=city_url`; the callback
            # and meta below are assumed continuations (parse_city is a
            # hypothetical method that would parse each city's rental listings)
            yield scrapy.Request(url=city_url,
                                 callback=self.parse_city,
                                 meta={"city_name": city_name})
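
3. pipelines.py

The pipelines are what persist each LianjiaItem into the three stores named in the abstract. A minimal sketch, assuming Redis, MySQL, and MongoDB running locally on their default ports; the "lianjia" database name, the "house" table/collection, and the Redis key are hypothetical, not the article's actual schema:

# -*- coding: utf-8 -*-
# Sketch only: database names, table schema, and credentials are assumptions
import json

import pymongo
import pymysql
import redis


class RedisPipeline(object):
    def open_spider(self, spider):
        self.client = redis.StrictRedis(host="localhost", port=6379, db=0)

    def process_item(self, item, spider):
        # Push each item onto a Redis list as a JSON string
        self.client.lpush("lianjia:items", json.dumps(dict(item), ensure_ascii=False))
        return item


class MysqlPipeline(object):
    def open_spider(self, spider):
        self.conn = pymysql.connect(host="localhost", user="root", password="root",
                                    database="lianjia", charset="utf8mb4")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Assumed table: house(title, price, detail_url, agent_name, agent_phone)
        sql = ("INSERT INTO house (title, price, detail_url, agent_name, agent_phone) "
               "VALUES (%s, %s, %s, %s, %s)")
        self.cursor.execute(sql, (item.get("title"), item.get("price"),
                                  item.get("detail_url"), item.get("agent_name"),
                                  item.get("agent_phone")))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()


class MongoPipeline(object):
    def open_spider(self, spider):
        self.client = pymongo.MongoClient("mongodb://localhost:27017")
        self.collection = self.client["lianjia"]["house"]

    def process_item(self, item, spider):
        # insert_one accepts a plain dict, so convert the Item first
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()

To take effect, the three classes must be registered in settings.py, e.g. ITEM_PIPELINES = {"LianJia.pipelines.RedisPipeline": 300, "LianJia.pipelines.MysqlPipeline": 301, "LianJia.pipelines.MongoPipeline": 302}.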
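4. main.py

A typical entry point for launching the spider from an IDE instead of the command line; a sketch that assumes the project layout used above:

# -*- coding: utf-8 -*-
from scrapy import cmdline

# Equivalent to running `scrapy crawl lianjia` in the project root
cmdline.execute("scrapy crawl lianjia".split())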