Python Web Crawling and Information Extraction
Topic: web crawlers
cycy小陈
"Every step forward brings a step's worth of joy."
Python crawler: an example of fetching a web page
>>> import requests
>>> r = requests.get("https://item.jd.com/2967929.html")
>>> r.status_code
200
>>> r.encoding
'gbk'
...
Original · 2018-09-11 18:16:38 · 2095 views · 0 comments
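The interactive session above can be wrapped into a small, self-contained function. This is a minimal sketch, not the post's own code; the JD product URL and the `200`/`'gbk'` results are from the post and may differ today, so the sketch returns `None` on any network failure instead of assuming success:

```python
import requests

def fetch(url):
    """Fetch a page; return (status_code, encoding, text) or None on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()          # turn 4xx/5xx responses into exceptions
        return r.status_code, r.encoding, r.text
    except requests.RequestException:
        return None

# As in the post: fetch("https://item.jd.com/2967929.html")
# returned status 200 with encoding 'gbk' at the time of writing.
```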
Python crawler: a targeted crawler for Taobao product listings
Code:
import requests
import re

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
...
Original · 2018-09-13 17:25:50 · 704 views · 1 comment
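The excerpt cuts off before the parsing step. The core of this kind of Taobao crawler is pulling `view_price`/`raw_title` pairs out of the JSON-like fields embedded in the search-result page. A hedged sketch of that step (the field names follow the original post; Taobao's page format has since changed and now requires login, so treat this as illustrative only):

```python
import re

def parse_items(html):
    """Extract (price, title) pairs from Taobao search-result HTML."""
    # The page embeds fields like "view_price":"129.00" and "raw_title":"...",
    # which plain regular expressions can pick out without a full JSON parse.
    prices = re.findall(r'"view_price":"[\d.]*"', html)
    titles = re.findall(r'"raw_title":".*?"', html)
    items = []
    for p, t in zip(prices, titles):
        price = p.split(':')[1].strip('"')   # '"129.00"' -> '129.00'
        title = t.split(':')[1].strip('"')
        items.append((price, title))
    return items

sample = '"view_price":"129.00","raw_title":"python book"'
print(parse_items(sample))   # [('129.00', 'python book')]
```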
Greedy matching in re
Original · 2018-09-13 16:46:48 · 251 views · 0 comments
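The body of this post did not survive extraction. The standard illustration of greedy vs. non-greedy matching (the `PY.*N` example used throughout the course this series follows) is:

```python
import re

# Greedy: .* grabs as much as possible, so the match runs to the LAST N.
greedy = re.search(r'PY.*N', 'PYANBNCNDN')
print(greedy.group(0))   # PYANBNCNDN

# Non-greedy: .*? stops at the FIRST N that lets the pattern succeed.
lazy = re.search(r'PY.*?N', 'PYANBNCNDN')
print(lazy.group(0))     # PYAN
```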
Python crawler: the match object
Original · 2018-09-13 16:34:01 · 356 views · 0 comments
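Only the metadata survived here. For reference, the match object returned by `re.search`/`re.match` carries both the matched text and its position, e.g. with the postcode pattern used elsewhere in this series:

```python
import re

m = re.search(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print(m.string)    # the original text that was searched
print(m.group(0))  # '100081' — the matched substring (first match only)
print(m.start())   # 3 — index where the match begins
print(m.end())     # 9 — index one past the end of the match
print(m.span())    # (3, 9)
```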
Python crawler: regular expressions (re)
>>> import re
>>> match = re.search(r'[1-9]\d{5}', 'BIT 100081')
>>> if match:
        print(match.group(0))

100081
# re.match anchors the match at the start of the string
>>> match = re.ma...
Original · 2018-09-13 16:27:34 · 218 views · 0 comments
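Beyond `re.search`, the other main functions of the `re` library behave the same way on the postcode pattern above; a quick offline demonstration:

```python
import re

text = 'BIT100081 TSU100084'

# findall returns every non-overlapping match as a list.
codes = re.findall(r'[1-9]\d{5}', text)
print(codes)                                    # ['100081', '100084']

# split removes the matches and returns the pieces between them.
print(re.split(r'[1-9]\d{5}', text))            # ['BIT', ' TSU', '']

# sub replaces each match with the given string.
print(re.sub(r'[1-9]\d{5}', ':zipcode', text))  # BIT:zipcode TSU:zipcode
```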
Python crawler: several methods of the requests library
The post method:
>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  "args": {},
...
Original · 2018-09-11 18:00:42 · 805 views · 0 comments
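The httpbin echo shown in the excerpt needs network access, but the form-encoding that `requests.post(..., data=payload)` performs can be inspected offline with a prepared request (same payload as the post):

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

# .prepare() builds the request without sending it; the body is encoded
# exactly as requests.post(url, data=payload) would encode it.
req = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()

print(req.body)                     # key1=value1&key2=value2&key3=value3
print(req.headers['Content-Type'])  # application/x-www-form-urlencoded
```

This is why httpbin echoes the payload back under `"form"` rather than `"args"`: `data=` goes in the body as a URL-encoded form, not in the query string.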
Python crawler: a general-purpose code framework
import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
...
Original · 2018-09-10 21:50:48 · 1656 views · 0 comments
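The excerpt truncates the except branch. A complete version of the framework, as taught in the course this series follows (I've narrowed the bare `except` to `requests.RequestException`, which is the idiomatic form; the error-marker string is the one the course uses):

```python
import requests

def getHTMLText(url):
    """Generic fetch wrapper: return the page text, or an error marker."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()              # raise HTTPError on 4xx/5xx status
        r.encoding = r.apparent_encoding  # guess the encoding from content
        return r.text
    except requests.RequestException:
        return "产生异常"  # "an exception occurred", as in the original course code
```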
Python crawler: a crawler for Chinese university rankings
Example:
import requests
from bs4 import BeautifulSoup
import bs4

def getHTMLText(url):  # fetch the content of the Best Chinese Universities ranking site
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.enco...
Original · 2018-09-12 23:05:28 · 2433 views · 1 comment
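The ranking-specific part the excerpt cuts off walks the table body with bs4. A sketch of that step on a toy table (the structure mirrors the course's fillUnivList; the real ranking site's markup and class names may differ):

```python
from bs4 import BeautifulSoup
import bs4

def fill_univ_list(html):
    """Collect [rank, name, score] rows from a ranking table's <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    ulist = []
    for tr in soup.find('tbody').children:
        if isinstance(tr, bs4.element.Tag):   # skip whitespace text nodes
            tds = tr('td')                    # tr('td') == tr.find_all('td')
            ulist.append([tds[0].string, tds[1].string, tds[2].string])
    return ulist

html = ("<table><tbody>"
        "<tr><td>1</td><td>Tsinghua</td><td>94.6</td></tr>"
        "<tr><td>2</td><td>Peking</td><td>76.5</td></tr>"
        "</tbody></table>")
print(fill_univ_list(html))   # [['1', 'Tsinghua', '94.6'], ['2', 'Peking', '76.5']]
```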
Python crawler: crawling stock data with Scrapy
PS G:\pycourse> scrapy startproject BaiduStocks
New Scrapy project 'BaiduStocks', using template directory 'c:\\python\\python37\\lib\\site-packages\\scrapy\\templates\\project', created in:
    G...
Original · 2018-09-15 17:20:24 · 1013 views · 0 comments
scrapy crawl douban_spider fails with "def write(self, data, async=False)"
......
  from twisted.conch import manhole, telnet
  File "d:\jsuk\python37\lib\site-packages\twisted\conch\manhole.py", line 154
    def write(self, data, async=False):
...
Original · 2018-09-15 16:08:07 · 194 views · 0 comments
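This SyntaxError is not a Scrapy bug: `async` became a reserved keyword in Python 3.7, so older Twisted releases, whose manhole.py still uses `async=False` as a parameter name, fail to import on 3.7. Upgrading Twisted (`pip install -U twisted`) normally resolves it. The keyword change itself is easy to confirm:

```python
import keyword
import sys

# In Python 3.7+, 'async' and 'await' are hard keywords, so any module
# using 'async' as a parameter name raises SyntaxError at import time.
print(sys.version_info >= (3, 7))
print(keyword.iskeyword("async"))
```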
Python crawler: the HTML tag tree
>>> r = requests.get("https://python123.io/ws/demo.html")
>>> from bs4 import BeautifulSoup
>>> demo = r.text
>>> soup = BeautifulSoup(demo, "html.parser")
Original · 2018-09-11 22:50:01 · 1294 views · 0 comments
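Once the soup is built as above, the tag tree can be walked both downward and upward. A small offline example (a snippet shaped like the demo page is quoted inline so it runs without network access):

```python
from bs4 import BeautifulSoup

demo = ("<html><head><title>This is a python demo page</title></head>"
        "<body><p class='title'><b>The demo python introduces several"
        " python courses.</b></p></body></html>")
soup = BeautifulSoup(demo, "html.parser")

# Downward: .contents lists a tag's direct children.
print(soup.head.contents)        # [<title>This is a python demo page</title>]

# Upward: .parent climbs one level in the tree.
print(soup.title.parent.name)    # head

# .name and .attrs describe the tag itself.
print(soup.p.attrs)              # {'class': ['title']}
```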
Python crawler: BeautifulSoup
>>> import requests
>>> r = requests.get("https://python123.io/ws/demo.html")
>>> r.text
'<html><head><title>This is a python demo page...
Original · 2018-09-11 22:38:16 · 242 views · 0 comments
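The excerpt is cut off before the parsing step. With HTML shaped like the demo page quoted inline (so the example runs offline), BeautifulSoup's basic accessors look like this:

```python
from bs4 import BeautifulSoup

demo = ("<html><head><title>This is a python demo page</title></head>"
        "<body><p class='course'>Python courses:"
        "<a href='https://python123.io/ws/demo1' id='link1'>Basic Python</a>"
        " and "
        "<a href='https://python123.io/ws/demo2' id='link2'>Advanced Python</a>"
        "</p></body></html>")
soup = BeautifulSoup(demo, "html.parser")

print(soup.title.string)          # This is a python demo page
for link in soup.find_all('a'):   # iterate over every <a> tag
    print(link.get('href'))
```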
Python crawler: IP address lookup
Original · 2018-09-11 20:33:32 · 1131 views · 0 comments
Python crawler: downloading a web image and saving it locally
Code:
import requests
import os

url = "https://gss0.baidu.com/-Po3dSag_xI4khGko9WTAnF6hhy/zhidao/wh%3D600%2C800/sign=bc75fc5640a7d933bffdec759d7bfd2b/d009b3de9c82d1587f799ff3820a19d8bd3e42fd.jpg"
root...
Original · 2018-09-11 18:41:35 · 1881 views · 1 comment
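The excerpt stops at the root-directory setup. The full pattern names the file after the URL's last path segment, creates the directory if needed, and writes `r.content` in binary mode. A sketch of that logic (the directory name `pics/` is my placeholder, not the post's; the failure handling is added for safety):

```python
import os
import requests

def image_path(url, root):
    """Local path an image URL will be saved under: root + last URL segment."""
    return root + url.split('/')[-1]

def save_image(url, root="pics/"):
    """Download url into root unless the file exists; return path, or None on failure."""
    path = image_path(url, root)
    os.makedirs(root, exist_ok=True)
    if not os.path.exists(path):
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
        except requests.RequestException:
            return None
        with open(path, 'wb') as f:
            f.write(r.content)   # r.content is the raw image bytes
    return path
```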
Python crawler: searching Baidu/360 and crawling the results
Submit a search query to Baidu and crawl the returned page:
import requests

keyword = "Python"
try:
    kv = {'wd': keyword}
    r = requests.get('https://www.baidu.com/s', params=kv)
    print(r.request.url)
    r.raise_for_status()
...
Original · 2018-09-11 18:27:53 · 1995 views · 0 comments
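The essential trick here is passing the query through `params`, which requests URL-encodes onto the base URL; that encoding can be checked offline with a prepared request:

```python
import requests

keyword = "Python"
kv = {'wd': keyword}   # Baidu's query parameter is 'wd'

# .prepare() builds the final URL exactly as requests.get(url, params=kv) would.
req = requests.Request('GET', 'https://www.baidu.com/s', params=kv).prepare()
print(req.url)   # https://www.baidu.com/s?wd=Python
```

For 360 search the same pattern applies with a different parameter name (the course uses `{'q': keyword}` against so.com's search endpoint).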
Python crawler: a targeted crawler for stock data
import requests
from bs4 import BeautifulSoup
import traceback
import re

def getHTMLText(url, code='utf-8'):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding =...
Original · 2018-09-13 21:06:09 · 301 views · 0 comments
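The truncated part of this crawler collects stock codes from link hrefs with a regex before visiting each stock's page. A sketch of that extraction step (the `sh`/`sz` six-digit code pattern is the one the course uses; the sample hrefs below are hypothetical):

```python
import re

def stock_codes(html):
    """Pull Shanghai (sh) / Shenzhen (sz) stock codes out of link hrefs."""
    a_hrefs = re.findall(r'href="[^"]*"', html)
    codes = []
    for href in a_hrefs:
        m = re.search(r'[s][hz]\d{6}', href)   # e.g. sh600000 or sz000001
        if m:
            codes.append(m.group(0))
    return codes

html = ('<a href="/stock/sh600000.html">A</a>'
        '<a href="/stock/sz000001.html">B</a>')
print(stock_codes(html))   # ['sh600000', 'sz000001']
```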