Python crawler projects on GitHub | GitHub - ahaharry/PythonCrawler: a collection of crawler projects written in Python

Latest recommended article published 2021-03-14 11:26:56

weixin_39950764


PythonCrawler: a collection of crawler projects written in Python

(

)\ ) ) ) ( (

(()/( ( ( /( ( /( )\ ( ) ( ( )\ ( (

/(_)))\ ) )\()))\()) ( ( (((_) )( ( /( )\))( ((_) ))\ )(

(_)) (()/( (_))/((_)\ )\ )\ ) )\___ (()\ )(_))((_)()\ _ /((_)(()\

| _ \ )(_))| |_ | |(_) ((_) _(_/(((/ __| ((_)((_)_ _(()((_)| |(_)) ((_)

| _/| || || _|| ' \ / _ \| ' \))| (__ | '_|/ _` |\ V V /| |/ -_) | '_|

|_| \_, | \__||_||_|\___/|_||_| \___||_| \__,_| \_/\_/ |_|\___| |_|

|__/

—————— by yanghangfeng

Overview of the spiderFile module

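1. baidu_sy_img.py: scrapes images from Baidu's "HD Photography" (高清摄影) section.

2. baidu_wm_img.py: scrapes the "Aesthetic Mood" (唯美意境) section of Baidu Images.

3. get_photos.py: scrapes all images under a given Baidu Tieba topic.

5. lagou_position_spider.py: takes any keyword, scrapes the job postings related to it in one step, and saves them to a local file.

6. student_img.py: exploits a URL vulnerability in the author's school website to retrieve the student ID photos of all registered students.

7. JD_spider.py: bulk-scrapes JD.com (京东) product IDs and tags.

8. ECUT_pos_html.py: scrapes all campus recruitment postings from the school website and saves them as HTML, with the images embedded in the HTML.

9. ECUT_get_grade.py: simulates a login to the school website, scrapes the grades, and computes the credit-weighted average grade.

10. github_hot.py: scrapes the projects listed for GitHub's popular languages and saves each project's description and homepage URL to a local file.

11. xz_picture_spider.py: written at the request of a Zhihu user; scrapes all portrait photos from a certain website.

12. one_img.py: scrapes images from the ONE literary website.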
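The credit-weighted average mentioned in item 9 can be sketched as below. This is only an illustration of the arithmetic: the function name `weighted_average` and the `(credits, grade)` record layout are assumptions, not taken from ECUT_get_grade.py, and the sample records are made up.

```python
def weighted_average(records):
    """Return the credit-weighted mean of (credits, grade) pairs.

    `records` is a hypothetical layout: one (credits, grade) tuple per course.
    """
    total_credits = sum(credits for credits, _ in records)
    if total_credits <= 0:
        raise ValueError("total credits must be positive")
    weighted_sum = sum(credits * grade for credits, grade in records)
    return weighted_sum / total_credits


# Made-up sample data: three courses worth 3, 2 and 1 credits.
sample = [(3.0, 90.0), (2.0, 80.0), (1.0, 60.0)]
print(round(weighted_average(sample), 2))  # 490 / 6, rounded to 81.67
```

Weighting by credits means a low grade in a one-credit course moves the result far less than the same grade in a three-credit course.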

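Overview of the spiderAPI module

This module provides crawler API interfaces for several websites. The functionality is probably incomplete, which leaves plenty of room for extension; if you are interested, feel free to keep improving it.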

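1. Dianping (大众点评)

from spiderAPI.dianping import *

'''
Reference tables: city name -> cityId, ranking name -> rankType.

citys = {
    # Beijing, Shanghai, Guangzhou, Shenzhen, Chengdu, Chongqing, Hangzhou
    '北京': '2', '上海': '1', '广州': '4', '深圳': '7', '成都': '8', '重庆': '9', '杭州': '3',
    # Nanjing, Shenyang, Suzhou, Tianjin, Wuhan, Xi'an, Changsha
    '南京': '5', '沈阳': '18', '苏州': '6', '天津': '10', '武汉': '16', '西安': '17', '长沙': '344',
    # Dalian, Jinan, Ningbo, Qingdao, Wuxi, Xiamen, Zhengzhou
    '大连': '19', '济南': '22', '宁波': '11', '青岛': '21', '无锡': '13', '厦门': '15', '郑州': '160'
}

ranktype = {
    # best restaurants, most popular, best taste, best environment, best service
    '最佳餐厅': 'score', '人气餐厅': 'popscore', '口味最佳': 'score1', '环境最佳': 'score2', '服务最佳': 'score3'
}
'''

result = bestRestaurant(cityId=1, rankType='popscore')  # most popular restaurants (cityId 1 is Shanghai)
shoplist = dpindex(cityId=1, page=1)  # merchant leaderboard
restaurantlist = restaurantList('http://www.dianping.com/search/category/2/10/p2')  # restaurant listing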
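The city and ranking tables above are plain lookups, so a small helper can turn readable names into the arguments used in the calls. The sketch below is hypothetical and not part of spiderAPI: `ranking_args` is an invented name, only a few table entries are copied, and converting the ID to `int` is an assumption based on the `cityId=1` call shown above.

```python
# A few entries copied from the tables above (not the full list).
CITY_IDS = {"北京": "2", "上海": "1", "广州": "4", "深圳": "7"}
RANK_TYPES = {
    "最佳餐厅": "score",
    "人气餐厅": "popscore",
    "口味最佳": "score1",
    "环境最佳": "score2",
    "服务最佳": "score3",
}


def ranking_args(city_name, rank_name):
    """Map a city name and a ranking name to keyword arguments (hypothetical helper)."""
    if city_name not in CITY_IDS:
        raise ValueError(f"unknown city: {city_name}")
    if rank_name not in RANK_TYPES:
        raise ValueError(f"unknown ranking: {rank_name}")
    # Assumption: the API accepts an integer cityId, as in the call shown above.
    return {"cityId": int(CITY_IDS[city_name]), "rankType": RANK_TYPES[rank_name]}


print(ranking_args("上海", "人气餐厅"))  # {'cityId': 1, 'rankType': 'popscore'}
```

With the module imported, the result could presumably be passed on as `bestRestaurant(**ranking_args("上海", "人气餐厅"))`.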

2. Fetching proxy IPs

from spiderAPI.proxyip import get_enableips

enableips = get_enableips()
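The source does not say what `get_enableips()` returns. Assuming it yields a list of `host:port` strings (an assumption, not confirmed by the project), one entry could be turned into the scheme-to-URL mapping that HTTP clients such as `requests` accept for their `proxies` argument. The helper name `pick_proxy` is hypothetical.

```python
import random


def pick_proxy(enableips, rng=random):
    """Pick one proxy at random and return a scheme -> proxy URL mapping.

    Assumes each entry looks like "host:port"; adjust if the real format differs.
    """
    if not enableips:
        raise ValueError("no usable proxy addresses")
    address = rng.choice(enableips)
    # Route both HTTP and HTTPS traffic through the same HTTP proxy.
    return {"http": f"http://{address}", "https": f"http://{address}"}


# Made-up address, used only to show the shape of the result.
print(pick_proxy(["127.0.0.1:8080"]))
```

Choosing at random on every request spreads traffic across the list, so one dead or blocked proxy does not stall the whole crawl.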

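3. Baidu Maps

The official Baidu Maps API places some limits on queries, so this module uses the query interface found on the web version instead.

from spiderAPI.baidumap import *

city_list = citys()  # get the city list (stored under a new name so the citys() function is not overwritten)
result = search(keyword="美食", citycode="257", page=1)  # get search results for the keyword "美食" (food)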

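4. Simulated GitHub login

from spiderAPI.github import GitHub

github = GitHub()
github.login()  # this step prompts for your username and password
github.show_timeline()  # fetch the timeline from the GitHub home page
# more features are left for you to discover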

5. Lagou (拉勾网)

from spiderAPI.lagou import *

lagou_spider(key='数据挖掘', page=1)  # fetch job postings for the keyword "数据挖掘" (data mining)
