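1. Set up the environment
1. Download the Anaconda distribution
2. Download Python (the link is the 64-bit Windows installer for Python 3.8.0): https://www.python.org/ftp/python/3.8.0/python-3.8.0-amd64.exe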
2. Install the required libraries:
1. Python standard-library modules
1. urllib:
import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
response.read().decode('utf-8')
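A slightly fuller `urllib` sketch: it builds a `Request` with a User-Agent header, opens it with a timeout, and decodes the bytes using the charset the server declares. The helper name `fetch_html`, the User-Agent string, and the UTF-8 fallback are assumptions for illustration, not part of the original notes.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download url with urllib and return the body as text (hypothetical helper)."""
    # Assumed browser-like User-Agent; some sites reject urllib's default one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset from the Content-Type header; fall back to UTF-8 (an assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Usage: `fetch_html('http://www.baidu.com')[:200]` returns the first 200 characters of the decoded page.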
2. re: regular expressions, used to extract data from page text
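`re` is often used to pull fields straight out of raw HTML text. A minimal sketch, assuming a made-up HTML snippet; `SAMPLE_HTML`, `extract_title`, and `extract_links` are hypothetical names. Regular expressions break easily on irregular real-world markup, so this approach suits small, predictable pages.

```python
import re

# Made-up sample HTML; in practice it would come from urllib or requests.
SAMPLE_HTML = """
<html><head><title>Demo page</title></head>
<body>
  <a href="/news/1">First</a>
  <a href="/news/2">Second</a>
</body></html>
"""

# Non-greedy groups capture the href value and the link text of each <a> tag.
LINK_PATTERN = re.compile(r'<a href="(.*?)">(.*?)</a>', re.S)


def extract_links(html):
    """Return (href, text) pairs for every <a href="...">...</a> in html."""
    return LINK_PATTERN.findall(html)


def extract_title(html):
    """Return the stripped <title> text, or None if the page has no title."""
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_title(SAMPLE_HTML))  # Demo page
    print(extract_links(SAMPLE_HTML))  # [('/news/1', 'First'), ('/news/2', 'Second')]
```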
2. Request libraries
1. requests: pip3 install requests; fetch a page with requests.get()
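A sketch of a typical `requests.get()` call with query parameters, a header, a timeout, and an HTTP-status check. `get_page`, `preview_url`, and the User-Agent value are hypothetical; `preview_url` only builds the final URL with `requests.Request(...).prepare()` and sends nothing over the network.

```python
import requests

# Assumed browser-like User-Agent; many sites reject the default one.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_page(url, params=None, timeout=10):
    """Fetch url with requests.get and return the decoded text (hypothetical helper)."""
    response = requests.get(url, params=params, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    response.encoding = response.apparent_encoding  # guess the charset from the body
    return response.text


def preview_url(url, params=None):
    """Return the URL requests would send, without making a request."""
    return requests.Request("GET", url, params=params, headers=HEADERS).prepare().url


if __name__ == "__main__":
    print(preview_url("http://www.baidu.com/s", {"wd": "python"}))
    # http://www.baidu.com/s?wd=python
```

Usage: `get_page('http://www.baidu.com/s', {'wd': 'python'})` performs the actual request. Setting `encoding` from `apparent_encoding` helps with pages that do not declare a charset.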
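3. Browser automation libraries: get the page after JavaScript has rendered it
1. selenium: pip3 install selenium
2. Browser drivers
1. chromedriver: lets selenium control Chrome
2. phantomjs: a headless browser (no longer supported by Selenium 4)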
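4. Web page parsing libraries
1. lxml: parses HTML/XML pages and supports XPath. pip3 install lxml
2. beautifulsoup: pip3 install beautifulsoup4
3. pyquery: a parsing library with jQuery-style selectors. pip3 install pyquery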
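With lxml, elements are usually selected with XPath. A minimal sketch, assuming a made-up HTML fragment; `SAMPLE_HTML`, the `item` class, and `parse_items` are hypothetical.

```python
from lxml import etree

# Made-up fragment standing in for a downloaded page.
SAMPLE_HTML = """
<ul id="list">
  <li class="item"><a href="/a">Alpha</a></li>
  <li class="item"><a href="/b">Beta</a></li>
</ul>
"""


def parse_items(html):
    """Return (text, href) pairs for the link inside each <li class="item">."""
    tree = etree.HTML(html)  # lenient HTML parser; wraps fragments in <html><body>
    return [(a.text, a.get("href")) for a in tree.xpath('//li[@class="item"]/a')]


if __name__ == "__main__":
    print(parse_items(SAMPLE_HTML))  # [('Alpha', '/a'), ('Beta', '/b')]
```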
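// Use a browser driver to get the page source after JavaScript has rendered it.
```python
from selenium import webdriver
# webdriver.PhantomJS() was removed in Selenium 4; headless Chrome plays the same role.
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
driver.get('http://www.baidu.com')
driver.page_source
driver.quit()
```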
```python
from bs4 import BeautifulSoup
soup = BeautifulSoup('<html></html>', 'lxml')
# pyquery
from pyquery import PyQuery as pq
doc = pq('<html>Hello</html>')
result = doc('html').text()
result
```
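5. Database connection and storage libraries
pymysql: connects to MySQL. pip3 install pymysql
pymongo: connects to MongoDB. pip3 install pymongo
redis: client for Redis, a non-relational database often used for distributed crawlers. pip3 install redis
flask: a web framework. pip3 install flask
django: a web server framework, used to build a website around the crawler. pip3 install django
jupyter: a notebook that runs in the browser. pip3 install jupyter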
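After running the install commands above, one quick check is to ask the interpreter whether each module can be found. A sketch using the standard `importlib.util.find_spec`. The import name differs from the pip name for some packages (`beautifulsoup4` is imported as `bs4`). Mapping `jupyter` to `jupyter_core`, a package the metapackage pulls in, is an assumption. The names `LIBRARIES` and `check_installed` are hypothetical.

```python
import importlib.util

# pip package name -> module name used in `import`.
LIBRARIES = {
    "requests": "requests",
    "selenium": "selenium",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "django": "django",
    "jupyter": "jupyter_core",  # assumption: checked via a package the metapackage installs
}


def check_installed(modules):
    """Map each pip name to True if its module is importable by this interpreter."""
    return {pip_name: importlib.util.find_spec(mod) is not None
            for pip_name, mod in modules.items()}


if __name__ == "__main__":
    for name, ok in check_installed(LIBRARIES).items():
        status = "OK" if ok else f"missing -> pip3 install {name}"
        print(f"{name:<15} {status}")
```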
//pymysql
import pymysql
conn = pymysql.connect(host='localhost', user='root', password='123456', port=3306, db='mysql')
cursor = conn.cursor()
cursor.execute('select * from db')
cursor.fetchone()
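In a crawler, pymysql is mostly used to write scraped rows. A sketch of a parameterized batch insert; the `spider` database, the `articles` table with a `title` column, and the names `save_titles` and `open_connection` are hypothetical. Passing values separately with `%s` placeholders lets pymysql escape them instead of building SQL strings by hand.

```python
def save_titles(conn, titles):
    """Insert each title into the hypothetical `articles` table and commit; returns the count."""
    sql = "INSERT INTO articles (title) VALUES (%s)"
    with conn.cursor() as cursor:
        # Values are passed separately from the SQL, so the driver escapes them.
        cursor.executemany(sql, [(title,) for title in titles])
    conn.commit()
    return len(titles)


def open_connection():
    """Open a pymysql connection; credentials follow the snippet above, the database is assumed."""
    import pymysql  # imported here so save_titles can run where pymysql is not installed

    return pymysql.connect(host="localhost", user="root", password="123456",
                           port=3306, database="spider", charset="utf8mb4")
```

Usage, with a MySQL server and that table in place: `conn = open_connection()`, then `save_titles(conn, ['Hello', 'World'])`, then `conn.close()`.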
//mongo
import pymongo
client = pymongo.MongoClient('localhost')
db = client['newtestdb']
db['table'].insert_one({'name': 'Bob'})
db['table'].find_one({'name': 'Bob'})
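Re-crawling a page should usually update its stored document rather than add a second copy. A sketch using pymongo's `update_one(..., upsert=True)`; the `articles` collection, the `url` key, and the names `upsert_article` and `open_collection` are hypothetical.

```python
def upsert_article(collection, article):
    """Insert article, or update the stored document with the same url (hypothetical helper)."""
    # upsert=True creates the document when no match exists, otherwise $set updates it.
    result = collection.update_one({"url": article["url"]}, {"$set": article}, upsert=True)
    # upserted_id is set only when a new document was inserted.
    return result.upserted_id is not None


def open_collection(name="articles"):
    """Connect to a local MongoDB and return a collection in the newtestdb database."""
    import pymongo  # imported here so upsert_article can run where pymongo is not installed

    client = pymongo.MongoClient("localhost", 27017)
    return client["newtestdb"][name]
```

Usage, with MongoDB running: `upsert_article(open_collection(), {'url': 'http://example.com/1', 'title': 'Hello'})` returns `True` the first time and `False` on later calls with the same URL.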
//redis
import redis
r = redis.Redis('localhost', 6379)
r.set('name', 'Bob')
r.get('name')
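The notes mention Redis for distributed crawlers. One common use is a shared "already seen" set, so several crawler processes avoid fetching the same URL twice. A sketch built on `SADD`, which returns the number of members actually added; the key `spider:seen_urls` and the names `mark_if_new` and `open_redis` are hypothetical.

```python
SEEN_KEY = "spider:seen_urls"  # hypothetical key shared by all crawler processes


def mark_if_new(r, url):
    """Return True the first time url is recorded, False if any process already recorded it."""
    # SADD is atomic and returns how many members were actually added (1 or 0 here).
    return r.sadd(SEEN_KEY, url) == 1


def open_redis():
    """Connect to a local Redis server; decode_responses=True returns str instead of bytes."""
    import redis  # imported here so mark_if_new can run where redis-py is not installed

    return redis.Redis(host="localhost", port=6379, decode_responses=True)
```

Usage, with Redis running: create `r = open_redis()` once, call `mark_if_new(r, url)` before fetching each URL, and skip the URL when it returns `False`.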
//jupyter
jupyter notebook