Scraping Lianjia Rental Listings with Python

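This post presents a Python script that uses the requests and BeautifulSoup libraries to scrape rental listings from Lianjia, extracting fields such as title, location, district, and unit type, and stores them in a MySQL database. The script picks a User-Agent at random from a small pool and decides which scraped fields to insert according to how many strings each listing yields.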

Scrape rental listings from Lianjia and store them in a database.
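The post does not show the definition of the `lianjia` table that the script writes to. The following is a minimal sketch of what it might look like: the column names are taken from the script's INSERT statement, while the column types and lengths, and the helper names `create_table` and `column_names`, are assumptions made for illustration. It uses the standard-library `sqlite3` module as a stand-in so that it runs without a MySQL server; the same DDL should also be accepted by MySQL.

```python
import sqlite3

# Assumed schema: column names come from the script's INSERT statement,
# but the types and lengths are guesses, not the author's definition.
CREATE_LIANJIA = """
CREATE TABLE IF NOT EXISTS lianjia (
    title   VARCHAR(255),
    village VARCHAR(255),
    are     VARCHAR(255),
    type    VARCHAR(64),
    size    VARCHAR(64),
    ori     VARCHAR(64),
    info    VARCHAR(255),
    rent    VARCHAR(64),
    people  VARCHAR(64)
)
"""


def create_table(conn):
    """Create the lianjia table on a DB-API connection if it is missing."""
    cursor = conn.cursor()
    cursor.execute(CREATE_LIANJIA)
    conn.commit()


def column_names(conn):
    """Return the column names of the lianjia table, in order."""
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM lianjia")
    return [col[0] for col in cursor.description]


if __name__ == "__main__":
    # In-memory SQLite database used only for demonstration.
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    print(column_names(conn))
```

Every column is stored as text here because the script inserts the scraped strings unchanged; numeric types for `size` or `rent` would require parsing the values first.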

#-*- coding:utf-8 -*-

import random

import MySQLdb  # on Python 3 this module is provided by the mysqlclient package
import requests
from bs4 import BeautifulSoup


class house():

    def get_house(self):
        # Pool of User-Agent strings; one is chosen at random for this run.
        user_agent = [
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.87 Safari/537.36',
            'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36',
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER',
            'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)',
        ]
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            # 'sdch' is dropped: requests cannot decode that encoding.
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.8',
            'User-Agent': random.choice(user_agent),
        }
        # Placeholders let the driver quote the values, so a title that
        # contains a quote character cannot break the statement.
        sql = ("insert into lianjia(title,village,are,type,size,ori,info,rent,people) "
               "values(%s,%s,%s,%s,%s,%s,%s,%s,%s)")
        db = MySQLdb.connect('localhost', 'root', 'liao1234', 'liao', charset='utf8')
        cursor = db.cursor()
        for i in range(2, 72):
            url = 'http://hz.lianjia.com/zufang/pg' + str(i) + '/'
            r = requests.get(url, headers=headers, timeout=10)
            r.encoding = 'utf8'
            soup = BeautifulSoup(r.text, 'html.parser')
            listing = soup.find('ul', id='house-lst')
            if listing is None:
                # The page has no listing block (layout change or blocked request).
                continue
            for tag in listing.find_all('div', class_='info-panel'):
                # Collect the text of every <a> and <span> in the listing panel.
                ss = []
                for aa in tag.find_all('a'):
                    print(aa.string)
                    ss.append(aa.string)
                for bb in tag.find_all('span'):
                    print(bb.string)
                    ss.append(bb.string)
                print(len(ss))
                # The positions of rent and people depend on how many strings were found.
                if len(ss) == 15:
                    row = (ss[0], ss[1], ss[2], ss[4], ss[6], ss[7], ss[11], ss[13], ss[14])
                elif len(ss) == 18:
                    row = (ss[0], ss[1], ss[2], ss[4], ss[6], ss[7], ss[11], ss[16], ss[17])
                else:
                    continue
                cursor.execute(sql, row)
            # Commit once per page; without a commit the rows are not saved
            # when autocommit is off (the MySQLdb default).
            db.commit()
        db.close()


test = house()
test.get_house()
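The script chooses which positions of the scraped list `ss` to store depending on whether a listing yields 15 or 18 strings. The sketch below isolates that step and pairs it with a parameterized insert, so it can be tried without touching the network. The names `FIELD_POSITIONS`, `pick_fields`, and `insert_row` are hypothetical and not part of the original script, and `sqlite3` stands in for MySQL here. MySQLdb expects `%s` placeholders, whereas `sqlite3` expects `?`, which is why the placeholder is passed as an argument.

```python
import sqlite3

COLUMNS = ("title", "village", "are", "type", "size",
           "ori", "info", "rent", "people")

# Index of each column's value in the scraped list, keyed by list length.
# The indices are the ones used in the script above.
FIELD_POSITIONS = {
    15: (0, 1, 2, 4, 6, 7, 11, 13, 14),
    18: (0, 1, 2, 4, 6, 7, 11, 16, 17),
}


def pick_fields(ss):
    """Return the nine values to store, or None for an unexpected length."""
    positions = FIELD_POSITIONS.get(len(ss))
    if positions is None:
        return None
    return tuple(ss[i] for i in positions)


def insert_row(cursor, row, placeholder="%s"):
    """Insert one row, letting the driver quote the values."""
    sql = "insert into lianjia(%s) values(%s)" % (
        ",".join(COLUMNS),
        ",".join([placeholder] * len(COLUMNS)),
    )
    cursor.execute(sql, row)


if __name__ == "__main__":
    # In-memory SQLite table used only for demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table lianjia(%s)" % ",".join(COLUMNS))
    cur = conn.cursor()
    sample = ["field%d" % i for i in range(15)]  # made-up values
    row = pick_fields(sample)
    if row is not None:
        insert_row(cur, row, placeholder="?")
        conn.commit()
    cur.execute("select title, rent, people from lianjia")
    print(cur.fetchall())
```

Keeping the index mapping in a dictionary means a new layout can be supported by adding one entry rather than another `elif` branch.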
