![12ba479a7bce3233b92b36e8d3a2046b.png](https://img-blog.csdnimg.cn/img_convert/12ba479a7bce3233b92b36e8d3a2046b.png)
A Simple Python Web Scraper Tutorial
![65564d055484b1ce65cf32d6ebbdc6f0.png](https://img-blog.csdnimg.cn/img_convert/65564d055484b1ce65cf32d6ebbdc6f0.png)
Preparation
You need to install three third-party libraries: requests, beautifulsoup4, and lxml.
pip install requests
I did not take a screenshot of the requests installation; just run the command above to install it.
pip install beautifulsoup4

    D:\20180801\script\python>pip install beautifulsoup4
    Collecting beautifulsoup4
      Downloading https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl (97kB)
         |████████████████████████████████| 102kB 65kB/s
    Collecting soupsieve>=1.2 (from beautifulsoup4)
      Downloading https://files.pythonhosted.org/packages/35/e3/25079e8911085ab76a6f2facae0771078260c930216ab0b0c44dc5c9bf31/soupsieve-1.9.2-py2.py3-none-any.whl
    Installing collected packages: soupsieve, beautifulsoup4
    Successfully installed beautifulsoup4-4.8.0 soupsieve-1.9.2
    WARNING: You are using pip version 19.1.1, however version 19.2.1 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.

    D:\20180801\script\python>

pip install lxml

    C:\Users>pip install lxml
    Collecting lxml
      Downloading https://files.pythonhosted.org/packages/21/ba/ca19058e1ae455c0425f72bd9fe1a0493e89f19f494b46a5c88867371def/lxml-4.4.0-cp37-cp37m-win_amd64.whl (3.7MB)
         |████████████████████████████████| 3.7MB 59kB/s
    Installing collected packages: lxml
    Successfully installed lxml-4.4.0
    WARNING: You are using pip version 19.1.1, however version 19.2.1 is available.
    You should consider upgrading via the 'python -m pip install --upgrade pip' command.

    C:\Users>

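If you want to confirm that all three installs worked before moving on, a quick check like the one below can help. This is only a sketch: the function name `missing_modules` is my own, and it relies on the fact that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that Python cannot find for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # beautifulsoup4 installs a package that is imported as "bs4"
    missing = missing_modules(["requests", "bs4", "lxml"])
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All three libraries are importable.")
```

If anything is reported as missing, rerun the matching `pip install` command with the same Python interpreter you use to run the script.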
Let's look at the code first
This snippet scrapes plain text from a static page, which is about the simplest scraping case there is.
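
    # -*- coding: utf-8 -*-
    import requests
    from bs4 import BeautifulSoup

    # Fetch the page
    req = requests.get('http://www.huanyue123.com/book/37/37849/22075553.html')
    # req.encoding = 'GBK'  # set the response encoding (left commented out)
    html = req.text  # decoded body of the response
    bf = BeautifulSoup(html, 'lxml')  # parse the HTML with the lxml parser
    # Find every <div> whose class attribute is 'contentbox clear'
    texts = bf.find_all('div', class_='contentbox clear')
    print(texts[0].text)  # print the text of the first match
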
Printed output
![04f63fb7dab9ee4108d0b583bb15b5aa.png](https://img-blog.csdnimg.cn/img_convert/04f63fb7dab9ee4108d0b583bb15b5aa.png)
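The script assumes that the request succeeds and that at least one matching `div` exists; if nothing matches, `texts[0]` raises an `IndexError`. Below is a slightly more defensive sketch of the same idea. The helper names `fetch_html` and `extract_text`, the 10-second timeout, and the sample HTML are my own assumptions for illustration, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, encoding=None, timeout=10):
    """Download a page and return its decoded text."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    if encoding:
        resp.encoding = encoding  # override the detected charset, e.g. 'GBK'
    return resp.text


def extract_text(html, class_name, parser="lxml"):
    """Return the text of the first <div> with the given class, or None."""
    soup = BeautifulSoup(html, parser)
    matches = soup.find_all("div", class_=class_name)
    if not matches:
        return None  # nothing matched, so avoid indexing into an empty list
    return matches[0].get_text()


if __name__ == "__main__":
    # Made-up HTML so the parsing step can be tried without a network call
    sample = '<html><body><div class="contentbox clear">Chapter text</div></body></html>'
    print(extract_text(sample, "contentbox clear", parser="html.parser"))
```

Keeping the download and the parsing in separate functions makes it possible to try the parsing on a saved or hand-written HTML string first, and only then point `fetch_html` at a real URL.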
For now I am only showing the output; the next post will walk through the details.
![d2d0d06d5fdf2f3c4b0439df2828c5f0.png](https://img-blog.csdnimg.cn/img_convert/d2d0d06d5fdf2f3c4b0439df2828c5f0.png)