Classic Crawler: Scraping Baidu Stocks

utopianist

Category: Crawlers

Article link: https://blog.csdn.net/qq_35193302/article/details/83718203

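This article shows how to use a Python crawler to fetch stock information from Baidu. It first scrapes stock codes from Eastmoney (东方财富网), then uses a URL generator to build the per-stock page URLs and an HTML downloader to fetch them. Each page is then parsed and the stock information saved, with exception handling to keep the run stable. The stock data ends up in a local file.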

Keywords: Baidu stocks, crawler, saving to file

Course URL: http://www.icourse163.org/course/BIT-1001870001

GitHub: https://github.com/utopianist/CrawBaiduStock

Preface

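Baidu stock URL: https://gupiao.baidu.com/stock/ + sz300059 + .html. Codes starting with sh denote stocks listed on the Shanghai Stock Exchange, and codes starting with sz denote stocks listed on the Shenzhen Stock Exchange.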
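The URL pattern described here can be written as a small helper. This is only a sketch: the names BASE_URL, EXCHANGES and build_stock_url are hypothetical and do not appear in the original code, and the prefix check is an added assumption about what counts as a valid code.

```python
# Hypothetical names for illustration; not part of the original project.
BASE_URL = "https://gupiao.baidu.com/stock/"
EXCHANGES = {
    "sh": "Shanghai Stock Exchange",
    "sz": "Shenzhen Stock Exchange",
}


def build_stock_url(code):
    """Return the Baidu stock page URL for a code such as 'sz300059'."""
    if code[:2] not in EXCHANGES:
        raise ValueError("expected a code starting with 'sh' or 'sz': %r" % code)
    return "%s%s.html" % (BASE_URL, code)


url = build_stock_url("sz300059")
exchange = EXCHANGES["sz300059"[:2]]
print(url)
print(exchange)
```

Rejecting unknown prefixes early keeps malformed codes from turning into requests for pages that cannot exist.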

The first step is to scrape stock codes like sz300059 from Eastmoney (东方财富网):

HTML downloader

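import requests

def getHTML(url):
    """Download a page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise on 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        return r.text
    except requests.RequestException:
        return ""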

URL generator

import re

def getStockURL(nameurl, urllist):
    html = getHTML(nameurl)  # call the HTML downloader
    name = re.findall(r'[s][hz]\d{6}', html)  # raw string so \d is not treated as a string escape
    for item in name:
        urllist.append("https://gupiao.baidu.com/stock/%s.html" % item)

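To extract stock codes like sz300059 from the Eastmoney page, we use the re library with the regular expression [s][hz]\d{6}, which matches an s, then h or z, then exactly six digits.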
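The matching step can be tried offline on a small string. In the sketch below, sample_html is a made-up fragment standing in for the Eastmoney listing page (its markup is an assumption, not the real page), and the de-duplication step is an addition that the original getStockURL does not perform.

```python
import re

# Hypothetical fragment standing in for the Eastmoney listing page.
sample_html = """
<li><a href="http://quote.eastmoney.com/sh600000.html">stock A (600000)</a></li>
<li><a href="http://quote.eastmoney.com/sz300059.html">stock B (300059)</a></li>
<li><a href="http://quote.eastmoney.com/sz300059.html">stock B again</a></li>
"""

# 's', then 'h' or 'z', then exactly six digits.
STOCK_CODE = re.compile(r"s[hz]\d{6}")

codes = STOCK_CODE.findall(sample_html)
# dict.fromkeys keeps first-seen order while dropping repeated codes.
unique_codes = list(dict.fromkeys(codes))

print(codes)
print(unique_codes)
```

If the listing page links to the same stock more than once, dropping repeats before building URLs avoids downloading the same page twice.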

Fetching and saving the stock information

from bs4 import BeautifulSoup

def getStockInfo(urllist, fpath):
    for i in range(len(urllist)):
        html = getHTML(urllist[i])  # call the HTML downloader
        soup = BeautifulSoup(html, "html.parser")
        try:
            info = {}
            title = soup.find_all('a', attrs={'class':
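The saving half of this step can be sketched on its own. This is a minimal sketch, assuming each stock's information is a plain dict and that records are appended to fpath one per line. The helper name append_stock_info, the sample records and the temporary file path are all hypothetical and are not taken from the original code.

```python
import os
import tempfile


def append_stock_info(info, fpath):
    """Append one stock record to fpath as a single line of text."""
    # Mode "a" appends, so earlier records survive if a later page fails.
    with open(fpath, "a", encoding="utf-8") as f:
        f.write(str(info) + "\n")


# Made-up records standing in for parsed stock pages.
records = [
    {"name": "example-a", "open": "1.00"},
    {"name": "example-b", "open": "2.00"},
]

fpath = os.path.join(tempfile.mkdtemp(), "stock_info.txt")
for record in records:
    append_stock_info(record, fpath)

with open(fpath, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines))
```

Writing each record as soon as it is parsed means a failure partway through the URL list still leaves the earlier results on disk.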
