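The Dutch rental market has been extremely tight lately; in Eindhoven even an unremarkable place goes for 2,000+ euros. A friend casually asked whether I could write a crawler that monitors listings and notifies him promptly. That is how this crawler came about.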
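Framework
Basic framework:
1) The crawler fetches the page from the rental website
2) It processes the page content it fetched
3) It pushes the result to the target mailbox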
Implementation
1) Fetch the page
response = requests.get(url)
Use requests to fetch the page's HTML.
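The bare call above has no timeout and does not check the HTTP status. A slightly more defensive fetch might look like the sketch below. The helper name fetch_html and the 10-second timeout are my own illustrative choices, not part of the original script.

```python
import logging
from typing import Optional

import requests


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page HTML, or None if the request fails.

    fetch_html and the 10-second default are hypothetical choices
    made for this sketch.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        logging.error("request failed: %s", exc)
        return None
    return response.text
```

Returning None on failure lets the caller skip one polling round instead of crashing.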
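2) Parse the page and extract the needed content
Pass the returned HTML to BeautifulSoup to build a BeautifulSoup object holding the page content, select the elements with the target class from that object, and process their content.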
soup = BeautifulSoup(response.text, 'html.parser')
residences = soup.select('.regi-list')  # list of residences
if not residences:
    # Dutch for "no house available"
    logging.info("geen huis beschikbaar")
for residence in residences:
    title = residence.text
    email_alert(title)
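To see what the selector returns without hitting the live site, the parsing step can be tried on a small HTML string. This is only a sketch: the markup and addresses in SAMPLE_HTML are made up, the real page structure may differ, and extract_titles is a hypothetical helper name. Only the .regi-list class comes from the script above.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may look different.
SAMPLE_HTML = """
<div class="regi-list"><h4> Kastanjelaan 1 </h4><span>Eindhoven</span></div>
<div class="regi-list"><h4>Strijp-S 12</h4><span>Eindhoven</span></div>
"""


def extract_titles(html: str) -> List[str]:
    """Return the whitespace-normalised text of every .regi-list element."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        node.get_text(" ", strip=True)  # join inner strings with one space
        for node in soup.select(".regi-list")
    ]
```

Using get_text(" ", strip=True) instead of .text avoids sending e-mails full of stray newlines and indentation from the page source.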
3) Assemble the SMTP request and send it
Create the SMTP server object, assemble the e-mail message, send it, and close the connection.
def email_alert(message):
    # Assemble the e-mail
    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = receiver_email
    msg['Subject'] = "huis alert"
    msg.attach(MIMEText(message, 'plain'))
    # SMTP(host, port) already connects, so no separate connect() call is needed;
    # the with block closes the connection on exit
    with smtplib.SMTP(smtp_server, port) as server:
        server.starttls()
        server.login(sender_email, password)
        server.sendmail(sender_email, receiver_email, msg.as_string())
    logging.warning("Email body: %s", message)
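Building the message and sending it are separate concerns, and splitting them makes the first part easy to check without an SMTP server. The sketch below uses the standard library's EmailMessage class instead of the MIMEMultipart and MIMEText pair in email_alert above. That is a substitution on my part, and build_alert is a hypothetical helper name.

```python
from email.message import EmailMessage


def build_alert(sender: str, receiver: str, body: str) -> EmailMessage:
    """Assemble a plain-text alert e-mail without sending it.

    build_alert is a hypothetical helper; the subject line is the one
    used by email_alert.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = "huis alert"
    msg.set_content(body)  # plain-text body
    return msg
```

A message built this way can be handed to server.send_message(msg) on an smtplib.SMTP connection, which reads the sender and recipient from the headers.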
4) Create a scheduled job that runs periodically
schedule.every(60).seconds.do(main)
while True:
    schedule.run_pending()
    time.sleep(1)
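One thing to watch with this loop: schedule does not catch exceptions raised inside a job, so a single network error in main would end the whole monitor. A possible guard is a small decorator that logs the error and lets the next run proceed. This is a sketch, and log_errors is a hypothetical name, not part of schedule.

```python
import functools
import logging


def log_errors(job):
    """Wrap a job so an exception is logged instead of ending the loop."""
    @functools.wraps(job)
    def wrapper(*args, **kwargs):
        try:
            return job(*args, **kwargs)
        except Exception:
            # logging.exception records the traceback as well
            logging.exception("job %s failed", job.__name__)
            return None
    return wrapper
```

With this in place the job would be registered as schedule.every(60).seconds.do(log_errors(main)).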
The code above uses the following variables:
url = "https://holland2stay.com/residences.html?_=1700477697025&available_to_book=179&city=29"
sender_email = "foo@hotmail.com"
receiver_email = "bar@gmail.com"
password = "password"
# Any other SMTP server can be used instead
smtp_server = "smtp.office365.com"
# The port depends on the SMTP server; see your mail provider's documentation
port = 587
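Hard-coding the password is fine for a quick test but awkward once the script is shared or committed. One option is to read the settings from environment variables. The variable names below (ALERT_SENDER and so on) and the helper load_smtp_settings are hypothetical, and the defaults are simply the values listed above.

```python
import os


def load_smtp_settings(env=os.environ):
    """Read mail settings from environment variables.

    The ALERT_* names are hypothetical; only ALERT_PASSWORD is required.
    """
    return {
        "sender_email": env.get("ALERT_SENDER", "foo@hotmail.com"),
        "receiver_email": env.get("ALERT_RECEIVER", "bar@gmail.com"),
        "password": env["ALERT_PASSWORD"],  # raises KeyError if unset
        "smtp_server": env.get("ALERT_SMTP_SERVER", "smtp.office365.com"),
        "port": int(env.get("ALERT_SMTP_PORT", "587")),
    }
```

Failing early with a KeyError when the password is missing is deliberate here: it is easier to spot than a login failure 60 seconds later.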
Full code
import time
import requests
from bs4 import BeautifulSoup
import smtplib
import schedule
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import logging

url = "https://holland2stay.com/residences.html?_=1700477697025&available_to_book=179&city=29"
sender_email = "foo@hotmail.com"
receiver_email = "bar@gmail.com"
password = "password"
smtp_server = "smtp.office365.com"
port = 587

logging.basicConfig(filename='email_alerts.log', level=logging.INFO,
                    format='%(asctime)s:%(levelname)s:%(message)s')


def main():
    # Fetch the listing page and parse it
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    residences = soup.select('.regi-list')
    if not residences:
        # Dutch for "no house available"
        logging.info("geen huis beschikbaar")
    # Send one e-mail per listing found
    for residence in residences:
        title = residence.text
        email_alert(title)


def email_alert(message):
    # Assemble the e-mail
    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = receiver_email
    msg['Subject'] = "huis alert"
    msg.attach(MIMEText(message, 'plain'))
    # SMTP(host, port) already connects, so no separate connect() call is needed;
    # the with block closes the connection on exit
    with smtplib.SMTP(smtp_server, port) as server:
        server.starttls()
        server.login(sender_email, password)
        server.sendmail(sender_email, receiver_email, msg.as_string())
    logging.warning("Email body: %s", message)


# Run main every 60 seconds
schedule.every(60).seconds.do(main)
while True:
    schedule.run_pending()
    time.sleep(1)
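Summary
This crawler monitors listings for a single city on a single rental website and sends whatever it finds to the user's mailbox. If needed, the monitoring service could be extended with more elaborate features:
1) Extract the price and filter on it, so that only listings meeting the price condition are sent
2) Aggregate e-mail alerts, so that a listing that matches on every run does not trigger an endless stream of e-mails
3) Monitor several rental websites
4) Send notifications on the client side, for example with the Notifier module of pync on a Mac to post system notifications, which is quicker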
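Items 1) and 2) can be sketched together. The code below assumes that each listing's text contains a euro amount such as "€1,250" or "€ 2.400", which is my guess at the format rather than something checked against the site. The names parse_price and new_affordable are hypothetical helpers.

```python
import re
from typing import Iterable, List, Optional, Set

# Matches a euro amount with optional thousands separators and an
# optional two-digit decimal part, e.g. "€1,250", "€ 2.400", "€980,50".
PRICE_RE = re.compile(
    r"€\s*(\d{1,3}(?:[.,]\d{3})+|\d+)(?:[.,]\d{2})?(?!\d)"
)


def parse_price(text: str) -> Optional[int]:
    """Return the whole-euro price found in text, or None if there is none."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators; the decimal part is outside group 1
    return int(re.sub(r"[.,]", "", match.group(1)))


def new_affordable(listings: Iterable[str], max_price: int,
                   seen: Set[str]) -> List[str]:
    """Return listings at or below max_price that were not reported before.

    seen is updated in place so that later calls skip the same listings.
    """
    fresh = []
    for text in listings:
        price = parse_price(text)
        if price is None or price > max_price or text in seen:
            continue
        seen.add(text)
        fresh.append(text)
    return fresh
```

In main, the loop over residences would then call email_alert only for the items that new_affordable returns. Note that seen lives in memory, so a restart of the script would report current listings once more.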