CVPR2017_Papers: a crawler for downloading the papers

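Every year CVPR brings a pile of papers worth reading, so rather than hunting for them online one at a time, why not download them all and sift through them locally? The downloader is about as simple as a crawler gets: it fetches the CVPR 2017 open-access index page, pulls every link ending in paper.pdf out of the HTML with a regular expression, and saves each PDF in turn. After all, as the saying goes, a mountain need not be high to be renowned if an immortal lives on it, and water need not be deep to be hallowed if a dragon dwells in it; likewise, code need not be complete, it just has to work!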

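#!/usr/bin/env python3
import os
import re
import urllib.request


def get_html(url):
    # Fetch the index page and decode the bytes to text so the regex can search it
    with urllib.request.urlopen(url, timeout=30) as page:
        return page.read().decode('utf-8', errors='replace')


def download_file(download_url, file_name, count):
    with urllib.request.urlopen(download_url, timeout=60) as response:
        data = response.read()
    # PDFs are binary, so the file must be opened in 'wb' mode, not 'w'
    with open(file_name, 'wb') as f:
        f.write(data)
    print("Completed " + str(count).zfill(4))


save_path = '/home/nick/cvpr2017/'  # Output folder, created if it does not exist
os.makedirs(save_path, exist_ok=True)
url = 'http://openaccess.thecvf.com/CVPR2017.py'
html = get_html(url)
# Non-greedy [^"]*? stops at the end of each href, so two links on one line are not merged
pattern = re.compile(r'\bcontent_cvpr_2017[^"]*?paper\.pdf\b')
url_list = pattern.findall(html)
print(len(url_list))  # Should be 783
count = 0
resume_from = 0  # Renamed from `breakpoint`, which shadows a Python 3.7+ built-in
for url in url_list:
    count += 1
    # Downloads sometimes time out; set resume_from to the last completed count and rerun
    if count > resume_from:
        name = url.split('/')[-1]
        file_name = os.path.join(save_path, name)
        download_file('http://openaccess.thecvf.com/' + url, file_name, count)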
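
A quick check of the link-matching regular expression, run on a hypothetical HTML fragment (the link layout and file names below are made up for illustration, not copied from the CVF page). If several PDF links share one line, the greedy `.*` in `\bcontent_cvpr_2017.*paper\.pdf\b` runs on to the last `paper.pdf` on that line and returns one long, unusable string. The non-greedy variant with `[^"]*?` cannot run past the closing quote of an `href`, so each paper link comes out on its own and the supplemental link is skipped.

```python
import re

# Hypothetical HTML line: paper link, supplemental link, and a second paper link
sample = ('<a href="content_cvpr_2017/papers/A_CVPR_2017_paper.pdf">pdf</a> '
          '<a href="content_cvpr_2017/supplemental/A_2017_supplemental.pdf">supp</a> '
          '<a href="content_cvpr_2017/papers/B_CVPR_2017_paper.pdf">pdf</a>')

greedy = re.compile(r'\bcontent_cvpr_2017.*paper\.pdf\b')
lazy = re.compile(r'\bcontent_cvpr_2017[^"]*?paper\.pdf\b')

greedy_hits = greedy.findall(sample)  # one match spanning all three links
lazy_hits = lazy.findall(sample)      # one match per paper link

# The file name is the last path component, as in the download loop
names = [u.split('/')[-1] for u in lazy_hits]

print(len(greedy_hits))
print(lazy_hits)
print(names)
```

Whether the real index page ever puts two links on one line is not verified here; the non-greedy pattern simply removes that risk.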

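The hand-edited breakpoint counter means changing the script and rerunning it after every timeout. One alternative, sketched here under the assumption that any file already present in the save folder is complete, is to skip existing files and retry each download a few times. `download_with_retry`, `fetch_url`, and the `retries`/`timeout` defaults are hypothetical names and values chosen for illustration; the fetch step is passed in as a callable so the retry logic can be exercised without network access.

```python
import os
import urllib.request


def fetch_url(full_url, timeout=60):
    """Hypothetical helper: download full_url and return the raw bytes."""
    with urllib.request.urlopen(full_url, timeout=timeout) as resp:
        return resp.read()


def download_with_retry(fetch, file_name, retries=3):
    """Hypothetical helper: save fetch() output to file_name, retrying on errors.

    `fetch` is any zero-argument callable returning bytes. Returns True if the
    file was written or already existed, False if every attempt failed.
    """
    # Resume support, assuming a file already on disk is complete
    if os.path.exists(file_name):
        return True
    for attempt in range(1, retries + 1):
        try:
            data = fetch()
        except OSError as err:
            # urllib.error.URLError and socket timeouts are both OSError subclasses
            print("Attempt %d failed for %s: %s" % (attempt, file_name, err))
            continue
        # Write under a temporary name, then rename, so an interrupted run
        # never leaves a half-written PDF that would be skipped next time
        tmp_name = file_name + '.part'
        with open(tmp_name, 'wb') as f:
            f.write(data)
        os.replace(tmp_name, file_name)
        return True
    return False


# Example use inside the download loop (needs network access):
#     full_url = 'http://openaccess.thecvf.com/' + url
#     if not download_with_retry(lambda: fetch_url(full_url), file_name):
#         print("Giving up on " + file_name)
```

With this approach the script can simply be rerun after a failure: finished PDFs are skipped and only the missing ones are fetched again. The `.part` rename matters because existing files are skipped; without it, a file cut short by an interrupted run would be mistaken for a finished one.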

