Python Web Crawler Notes and Resources

Latest recommended article published on 2021-05-10 23:33:02

Qclover



Article link: https://blog.csdn.net/Hq_Dream/article/details/51035964


Fetching page content:

We can use Python's urllib2 module (Python 2) to fetch a web page:

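# Python 2 only: urllib2 was merged into urllib.request in Python 3.
import urllib2

# Open the URL and read the whole response body.
response = urllib2.urlopen('http://www.laitaolun.com')
html = response.read()
print(html)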
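Since urllib2 exists only in Python 2, here is a rough Python 3 equivalent of the snippet above, for reference. It is a minimal sketch: the helper name `fetch`, the 10-second timeout, and the UTF-8 fallback are assumptions for illustration, not part of the original notes.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the body of `url` decoded to text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling `print(fetch('http://www.laitaolun.com'))` would then play the role of the three urllib2 lines above. Decoding with `errors="replace"` keeps the sketch from raising on pages whose bytes do not match the declared encoding.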

Automating a website login:

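import urllib
import urllib2

# Login endpoint; replace it with the real form action of the target site.
url = 'http://www.yourwebsite.com/login.asp?action=chk'
# Form fields expected by the login page.
values = {'username': 'admin', 'password': 'admin'}
# URL-encode the fields; passing data makes urllib2 send a POST request.
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
print(the_page)
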
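A login is usually only useful if the session cookie is kept for later requests, which the snippet above does not do. The following Python 3 sketch shows one way to do that with the standard library. The helper names `make_opener` and `build_login_request` are hypothetical, and the field names `username` and `password` are simply carried over from the example above; a real site may expect different fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Build an opener that stores cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(url, username, password):
    """Encode the form fields and wrap them in a POST request."""
    fields = {"username": username, "password": password}
    # Request bodies must be bytes in Python 3.
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data)
```

With these in place, `opener.open(build_login_request(url, 'admin', 'admin'))` would send the login form, and any later `opener.open(...)` call on the same opener would send back the cookies collected in `jar`.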
Useful blog: http://www.elias.cn/Python/HomePage
Python web crawler resources: http://www.aboutyun.com/thread-10626-1-1.html

Bulk resource links: http://pan.gfsousou.cn/python%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB%E5%AE%9E%E6%88%98_%E8%B0%B7%E7%B2%89%E7%9B%98%E6%90%9C.html

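Crawler example: https://segmentfault.com/a/1190000000657305
It implements the following:

Log in to Coursera;

Find the resource links on the course resources page;

Choose a suitable tool to download each resource based on its link.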
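The second and third steps above can be sketched roughly as follows in Python 3. This is not the code from the linked article: the class `LinkCollector`, the function `find_resources`, and the extension table `KINDS` are all hypothetical, and the sketch assumes the page HTML has already been downloaded after logging in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical mapping from file extension to a resource kind;
# a real crawler would map each kind to a concrete download tool.
KINDS = {".mp4": "video", ".pdf": "document", ".srt": "subtitle"}


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def find_resources(html, base_url):
    """Return (url, kind) pairs for links with a known extension."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    resources = []
    for link in collector.links:
        # Match on the path only, so query strings are ignored.
        path = urlparse(link).path.lower()
        for ext, kind in KINDS.items():
            if path.endswith(ext):
                resources.append((link, kind))
    return resources
```

Each returned pair could then be dispatched to whichever downloader suits its kind, which is the "choose a suitable tool" step.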

Python search crawler video tutorial: http://pan.baidu.com/s/1eQxQuNg

Crawling Baidu cloud drive resources with Python: http://www.oschina.net/code/snippet_2391943_52647