[Python] Crawling 51job (前程无忧) job listings with the PySpider crawler framework

Latest recommended article published 2024-05-01 14:35:23

weixin_39562606


#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2018-01-29 11:56:33
# Project: qcwy

from pyspider.libs.base_handler import *
import pymongo


class Handler(BaseHandler):
    crawl_config = {
    }

    client = pymongo.MongoClient("localhost")  # local MongoDB server
    db = client["tb_qcwy"]  # database name

    @every(minutes=24 * 60)
    def on_start(self):
        # Search-results page for the keyword "python" (jobarea=030200)
        self.crawl('http://search.51job.com/jobsearch/search_result.php?fromJs=1&jobarea=030200&keyword=python&keywordtype=2&lang=c&stype=2&postchannel=0000&fromType=1&confirmdate=9',
                   callback=self.index_page,
                   validate_cert=False,
                   connect_timeout=50,
                   timeout=500)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('p > span > a').items():  # link to each job's detail page
            self.crawl(each.attr.href, callback=self.detail_page, validate_cert=False)

        # Renamed from "next" so the built-in next() is not shadowed
        next_page = response.doc('.bk > a').attr.href  # next-page link
        if next_page:  # skip when no next-page link is found
            self.crawl(next_page, callback=self.index_page, validate_cert=False)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,  # page URL
            "location": response.doc('h1').text(),  # location
            "company": response.doc('.cname > a').text(),  # company name
            "work_location": response.doc('.lname').text(),  # work location
            "salary": response.doc('.cn > strong').text(),  # salary
            "requirements": response.doc('.sp4').text(),  # job requirements
            "zhiweixinxi": response.doc('.job_msg').text(),  # job information (pinyin for 职位信息)
            "address": response.doc('.bmsg > .fp').text(),  # company address
        }

    # Save results to MongoDB
    def on_result(self, result):
        if result:  # callbacks that return nothing (e.g. index_page) pass None here
            self.save_to_mongo(result)

    def save_to_mongo(self, result):
        # Collection.insert() was deprecated in PyMongo 3 and removed in PyMongo 4;
        # insert_one() is available in both.
        self.db["qcwy20180129"].insert_one(result)  # collection name
        print("save to mongo", result)
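
The search URL in on_start hard-codes its whole query string. The sketch below rebuilds that same URL with the standard library's urllib.parse.urlencode, so that only the keyword and the area code need to change. It assumes those are the only two values you want to vary. The helper name build_search_url is hypothetical. The other parameters are copied verbatim from the original URL, and the sketch does not try to interpret them.

```python
from urllib.parse import urlencode

# Endpoint taken from the original on_start URL
SEARCH_ENDPOINT = "http://search.51job.com/jobsearch/search_result.php"


def build_search_url(keyword="python", jobarea="030200"):
    """Hypothetical helper: rebuild the 51job search URL from on_start.

    Only keyword and jobarea are parameters; every other value is copied
    unchanged from the original URL. Values are strings so that leading
    zeros (e.g. "030200", "0000") are preserved.
    """
    params = {
        "fromJs": "1",
        "jobarea": jobarea,  # area code; "030200" is the value in the original URL
        "keyword": keyword,
        "keywordtype": "2",
        "lang": "c",
        "stype": "2",
        "postchannel": "0000",
        "fromType": "1",
        "confirmdate": "9",
    }
    # Dicts keep insertion order (Python 3.7+), so the parameter order matches the original
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(keyword="java"))
```

In on_start, self.crawl(build_search_url(), ...) would then replace the literal URL. Non-ASCII keywords are percent-encoded as UTF-8 by urlencode. The original post does not say whether the site expects that encoding.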

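index_page finds the next-page link with the PyQuery child selector .bk > a. Pagination continues only while such a link exists. The sketch below shows that selector logic with nothing but the standard library's html.parser.HTMLParser. It only approximates the selector: it matches an a element whose direct parent has bk in its class list. It is not how PySpider or PyQuery work internally. NextPageLinkParser and find_next_page are hypothetical names, and the HTML in the example is invented for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NextPageLinkParser(HTMLParser):
    """Hypothetical helper: record the href of the first <a> whose direct
    parent element has the CSS class "bk" (roughly '.bk > a')."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, classes) for each currently open element
        self.href = None  # first matching href, or None if nothing matched

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent_classes = self.stack[-1][1] if self.stack else []
        if tag == "a" and self.href is None and "bk" in parent_classes:
            self.href = attrs.get("href")
        if tag not in VOID_TAGS:
            classes = (attrs.get("class") or "").split()
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_next_page(html):
    """Return the next-page href, or None when the page has no match."""
    parser = NextPageLinkParser()
    parser.feed(html)
    parser.close()
    return parser.href


if __name__ == "__main__":
    print(find_next_page('<li class="bk"><a href="/page/2">next</a></li>'))  # prints /page/2
```

When nothing matches, find_next_page returns None instead of raising. This makes the stop condition for pagination explicit for the caller.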