Scraping game data from Steam, Epic, and Origin with a Python crawler

Latest recommended article published on 2025-01-05 21:59:05

萧瑟1

Category: python    Tags: python, web scraping

Article link: https://blog.csdn.net/qq_41410799/article/details/112195318

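This is a functional module from our course practicum. It scrapes game price information from Steam, Epic, and Origin. Because the three sites are structured differently and load their data in different ways, each platform needs its own scraping method.

Packages used

BeautifulSoup: extracts attribute values and text from the tags of a fetched page (game prices and so on)
selenium webdriver: drives a browser by script so that dynamically loaded data gets rendered
requests: fetches the page data

MySQL table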

(Image: screenshot of the MySQL table)

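Implementation steps

1. Extracting Steam data

Home page URL
https://store.steampowered.com/search/?specials=1&page=1

Steam's search results are paginated, so we can crawl them page by page by appending the page number to the URL.

Fetching the page HTML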

import requests

# Fetch the HTML of one page of search results
def getPage(pagenum):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/51.0.2704.63 Safari/537.36'}
    # e.g. https://store.steampowered.com/search/?specials=1&page=1
    urlh = "https://store.steampowered.com/search/?specials=1&page="
    url = urlh + str(pagenum)
    print(url)
    response = requests.get(url, headers=headers)
    # Decode the response body as UTF-8 before reading .text
    response.encoding = 'utf-8'
    return response.text
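
The URL concatenation described above can also be written with the standard library's `urlencode`, which keeps the query parameters in one place. This is only a minimal sketch: `BASE_URL` and `build_search_url` are names made up for illustration, not part of the original module.

```python
from urllib.parse import urlencode

# Search endpoint taken from the home page URL shown earlier in the post.
BASE_URL = "https://store.steampowered.com/search/"


def build_search_url(page_num):
    """Return the specials search URL for a 1-based page number."""
    query = urlencode({"specials": 1, "page": page_num})
    return f"{BASE_URL}?{query}"


# Build the URLs for the first three result pages.
urls = [build_search_url(n) for n in range(1, 4)]
```

Each entry in `urls` has the same form as the URL that `getPage` prints, so the two approaches are interchangeable.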

Saving the HTML to a txt file
This step makes testing easier, and it also leaves a backup in case the scraped page changes or is updated.

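def saveHtmlCode(html, path):
    # The HTML is a str, so open the file in text mode with an explicit
    # encoding; the with-block closes the file once writing is done
    with open(path, "w", encoding="utf-8") as file:
        file.write(html)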
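
To show how such a backup might be used in testing, here is a small sketch that saves a page and reads it back, so the parsing code can run without hitting the site again. The helper names `save_html` and `load_html`, the file name, and the sample HTML string are all assumptions for illustration.

```python
from pathlib import Path
import tempfile


def save_html(html, path):
    """Write the HTML text to disk as UTF-8."""
    Path(path).write_text(html, encoding="utf-8")


def load_html(path):
    """Read a previously saved HTML snapshot back into a string."""
    return Path(path).read_text(encoding="utf-8")


# Round trip in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    snapshot = Path(tmp_dir) / "steam_page_1.txt"
    save_html("<html><body>特惠</body></html>", snapshot)
    restored = load_html(snapshot)
```

The string returned by `load_html` can be handed to the parsing function in place of a live response.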

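Extracting the game data from the page
The data needs cleaning here: strip the extra characters (spaces, newlines, and so on) from the game prices.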

from bs4 import BeautifulSoup

# Extract game information
def getGameInfo1(html, game_list):
    global count
    soup = BeautifulSoup(html, 'html.parser')

    # Container holding the list of games
    games_Info = soup.find(id='search_resultsRows')
    games_a = games_Info.find_all('a')
    for game_a in games_a:
        #print(game_a)
        # Skip items that are not discounted: they carry a plain price div
        regular_price_div = game_a.find('div', class_="col search_price responsive_secondrow")
        if regular_price_div is not None: continue
        # Image URL of the item
        game_src = game_a.find('img')['src']
        #print("src = "+game_src)
        # Name of the item
        game_name = game_a.find('span', class_='title').get_text()
        #print("name = "+game_name)
        # Discount of the item
        game_discount = stripAndreplace(game_a.find('div', class_="col search_discount responsive_secondrow").get_text())
        #print(game_discount)
        # Price text, containing both the original and the discounted price
        priceText = stripAndreplace(game_a.find('div', class_="col search_price discounted responsive_secondrow").get_text())
        priceText = split(priceText)
        if len(priceText) < 2: continue
        # Original price of the item
        game_original_price = priceText[0].replace(' ', '')
        # Discounted price of the item
        game_final_price
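
The cleaning step described above depends on the helpers `stripAndreplace` and `split`, whose source does not appear in the post. The sketch below shows one way such cleaning could work; it is not the author's implementation. The names `clean_text` and `split_prices`, the sample string, and the assumption that every price is preceded by a `¥` symbol are all hypothetical.

```python
import re


def clean_text(raw):
    """Collapse newlines, tabs and repeated spaces, then trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()


def split_prices(text, symbol="¥"):
    """Split a string such as '¥ 68 ¥ 34' into ['¥68', '¥34'].

    Assumes each price is preceded by the currency symbol.
    """
    parts = [p.replace(" ", "") for p in text.split(symbol)]
    return [symbol + p for p in parts if p]


# Hypothetical price text as it might come out of get_text().
raw = "\r\n   ¥ 68\n   ¥ 34   \r\n"
prices = split_prices(clean_text(raw))
```

With two entries in the result, the first would be the original price and the second the discounted one, which matches the `len(priceText) < 2` check in the function above.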
