Python Web Crawlers
One-Shell
The best preparation for tomorrow is doing my best today.
Python Web Crawler (1): Fetching a Web Page
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
def getHTML(url):
    try:
        html = urlopen(url)
    exc...
Original · 2017-01-07 17:21:57 · 527 views · 0 comments
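The excerpt for this post is cut off just as the error handling begins. As a minimal sketch of the pattern it appears to set up, the function below wraps `urlopen` and catches both error types. The name `get_html` and the convention of returning bytes on success and `None` on failure are assumptions made for this illustration, not taken from the post.

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError


def get_html(url):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urlopen(url) as response:
            return response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print("HTTP error:", e.code)
    except URLError as e:
        # The resource could not be reached (bad host, missing file, no network).
        print("URL error:", e.reason)
    return None
```

Callers can then test the result against `None` before passing it on to `BeautifulSoup`.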
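Python Web Crawler (2): Processing Page Data with the find and findAll Functions
findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)
The tag parameter takes a single tag name, or a Python list of several tag names. The attributes parameter takes a Python dictionary that holds a tag's attributes and their corresponding attribute...
Original · 2017-01-07 17:28:32 · 4544 views · 0 comments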
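The parameters listed above can be tried out on a small document. The sketch below assumes Beautiful Soup 4 is installed and uses `find_all`, the current spelling of `findAll`. The sample HTML and its class names are invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this example.
SAMPLE = """
<html><body>
<h1>Chapter 1</h1>
<h2>Scene 1</h2>
<span class="green">Anna</span>
<span class="red">Well, Prince</span>
<span class="green">Pierre</span>
</body></html>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# tag: a single tag name, or a list of tag names.
headings = [t.name for t in soup.find_all(["h1", "h2"])]

# attributes: a dictionary of attribute names and the values to match.
green = [t.get_text() for t in soup.find_all("span", {"class": "green"})]

# limit: stop after the first N matches.
first_only = [t.get_text() for t in soup.find_all("span", {"class": "green"}, limit=1)]

# find returns only the first match (or None when nothing matches).
first_red = soup.find("span", {"class": "red"}).get_text()

# recursive=False looks at direct children only; <h1> is nested deeper.
top_level = soup.find_all("h1", recursive=False)
```

Here `find` behaves like `find_all` with `limit=1`, except that it returns the tag itself rather than a one-element list.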
Python Web Crawler (3): Regular Expressions
http://blog.csdn.net/drdairen/article/details/51134816
Repost · 2017-01-07 19:21:59 · 574 views · 0 comments
Python Web Crawler (4): Scraping Girl Pictures from Jandan (煎蛋网)
from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
import urllib.request
import re
import os
def get_html(url):...
Original · 2017-01-08 11:09:33 · 589 views · 0 comments
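The excerpt imports `re` but is cut off before showing how it is used. One plausible use in an image scraper is pulling image URLs out of the page source. The sketch below is an assumption about that step, not the post's actual code: the pattern, the helper name `extract_image_urls`, and the sample markup are all hypothetical.

```python
import re
from urllib.parse import urljoin

# Captures the src value of <img> tags that point to common image formats.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+\.(?:jpg|jpeg|png|gif))"', re.IGNORECASE)


def extract_image_urls(html, base_url):
    """Return absolute image URLs in page order, without duplicates."""
    seen = []
    for src in IMG_SRC.findall(html):
        # urljoin resolves protocol-relative ("//host/a.jpg") and
        # root-relative ("/a.jpg") links against the page URL.
        absolute = urljoin(base_url, src)
        if absolute not in seen:
            seen.append(absolute)
    return seen


# Hypothetical markup, invented for this example.
sample = (
    '<img src="//img.example.com/a.jpg">'
    '<img class="x" src="/static/b.PNG">'
    '<img src="//img.example.com/a.jpg">'
    '<img src="/icons/logo.svg">'
)
urls = extract_image_urls(sample, "http://example.com/page/1")
```

A regular expression is adequate for simple, regular markup like this; for messier pages, `BeautifulSoup` (also imported in the excerpt) is usually the more robust choice.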
Python Web Crawler (5): Scraping Jokes from Qiushibaike
def get_html(url):
    try:
        html = urlopen(url)
    except HTTPError as e:
        print(e)
        return None
    except URLError as e:
        print(e)
        return None
    try: ...
Original · 2017-01-09 09:17:09 · 445 views · 0 comments
Python Web Crawler (6): Scraping Qiushibaike Images and Saving Them by Topic Name
from urllib.request import urlopen
from urllib.request import urlretrieve
from bs4 import BeautifulSoup
import requests
import re
import os
url = "http://www.qiushibaike.com/imgrank/"
path = "D:/2/"
Original · 2017-01-09 20:18:36 · 647 views · 0 comments
Python Web Crawler (7): Logging In to the Southwest University of Science and Technology Unified Authentication Platform
import requests
from http.cookiejar import CookieJar
from bs4 import BeautifulSoup
urlBefore = "http://cas.swust.edu.cn/authserver/login"
def getResopnseAfterLogin():
    head ={
        "User-Agent":"M...
Original · 2017-01-11 16:06:27 · 2764 views · 0 comments
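Login pages often embed hidden form fields that have to be sent back together with the credentials. Whether this platform does so is not visible in the truncated excerpt, so the sketch below is a generic illustration under that assumption. It uses the standard library's `html.parser` rather than the `BeautifulSoup` import shown in the excerpt, and the field names `username` and `password` are placeholders.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_page_html, username, password):
    """Merge the page's hidden fields with the user's credentials."""
    parser = HiddenInputParser()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    # Placeholder field names; inspect the real form to find the right ones.
    payload["username"] = username
    payload["password"] = password
    return payload
```

The resulting dictionary could then be posted through a `requests.Session`, which keeps the cookies set by the login page for later requests.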
Python Web Crawler: Simulating a CSDN Login
import re
import requests
from bs4 import BeautifulSoup
url = "http://passport.csdn.net/account/login"
def Login():
    head={
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36...
Original · 2017-01-16 21:22:00 · 427 views · 0 comments
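The excerpt builds a `head` dictionary with a browser-like `User-Agent`, presumably so that the site treats the script as an ordinary browser. The post uses `requests`; the sketch below swaps in the standard library's `urllib.request.Request` so it can be inspected without sending anything. The helper name `build_request` and the User-Agent value are placeholders, not the string from the post.

```python
from urllib.request import Request

LOGIN_URL = "http://passport.csdn.net/account/login"  # URL shown in the excerpt

# Placeholder value; substitute the User-Agent string of a real browser.
HEADERS = {"User-Agent": "example-crawler/0.1"}


def build_request(url, headers):
    """Create a Request that carries custom headers, without sending it."""
    return Request(url, headers=headers)


req = build_request(LOGIN_URL, HEADERS)

# Request normalizes header names with str.capitalize(), so the stored
# key is "User-agent" rather than "User-Agent".
user_agent = req.get_header("User-agent")
```

Passing `req` to `urlopen` would send the request with that header. With `requests`, the same dictionary goes in the `headers=` argument.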