Writing a web crawler demo in Python


shuzhuang25


Article link: https://blog.csdn.net/qq_33429225/article/details/79616050


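This article walks through a hands-on web crawling exercise in Python, using the requests and BeautifulSoup libraries to scrape data from NetEase's stock website. The code lives in StaticStock.py and DynamicStock.py, with detailed comments to make it easy to follow.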


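Python is particularly well suited to string processing.

It also has a large collection of libraries, such as requests for fetching web pages and BeautifulSoup for parsing them.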
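To illustrate the string-handling point using only the standard library, here is a minimal sketch of the two operations the demo relies on: taking the file name from a URL and pulling a date out of text with a regular expression. The helper names `filename_from_url` and `first_date` and the sample strings are hypothetical and are not part of the original code.

```python
import re
from urllib.parse import urlparse

# Hypothetical helpers for illustration; not part of the original demo.
DATE_PATTERN = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. '0600795.html'."""
    return urlparse(url).path.split("/")[-1]


def first_date(strings):
    """Return the first YYYY-M-D style date found in the strings, or None."""
    for text in strings:
        match = DATE_PATTERN.search(text)
        if match:
            return match.group()
    return None


if __name__ == "__main__":
    print(filename_from_url("http://quotes.money.163.com/0600795.html"))
    # Made-up sample strings standing in for text extracted from a page.
    print(first_date(["Listing date", "\n", "2001-02-03"]))
```

Scanning every string for the pattern, rather than reading one fixed position, is a design choice that may hold up better if the page layout changes.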

This demo uses Python to crawl NetEase's stock website:

http://quotes.money.163.com/

Let's go straight to the code; the comments in it are detailed.

The code is available here: click to open the link

Code:

StaticStock.py

import requests
import re
from bs4 import BeautifulSoup

####   Question 1   ######################################
# Fetch the page with requests
url = 'http://quotes.money.163.com/0600795.html'  # target URL
res = requests.get(url)
res.encoding = "utf-8"  # set the page encoding
# String handling: keep the last path segment, i.e. 0600795.html
filename = url.split('/')[-1]
# Save the page to disk; the with block closes the file even on error
with open(filename, 'w', encoding='utf-8', errors='ignore') as fd:
    fd.write(res.text)
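##########    Question 2   ###########################
# Parse the saved page with BeautifulSoup
with open(filename, encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, "html.parser")
# Find the table whose class is corp_info
tag1 = soup.select(".corp_info")
# Collect every text string inside that table
ch = list(tag1[0].strings)
# Regex match to extract the date
# (the hard-coded index 19 assumes the table layout the page had when this was written)
mat = re.search(r"(\d{4}-\d{1,2}-\d{1,2})", ch[19])
# Print the date
print('Date this stock was first listed:\n', mat.group())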
#############   Question 3       ################################
# Fetch the page with requests
url = 'http://