Scraping the Steam Specials Top Sellers List with Python

forthenight996

Last modified 2022-01-28 10:21:08

Category: Python crawlers. Tags: python, crawler, html

First published 2021-04-14 20:02:01

Article link: https://blog.csdn.net/forthenight996/article/details/115707403

This post uses Python to scrape Steam's discounted top-seller listings from https://store.steampowered.com/search/?os=win&specials=1&filter=topsellers

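import requests
from bs4 import BeautifulSoup


def Get_html(url):
    """Fetch url and return the page HTML, or an empty string on failure."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the response body instead of the headers.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Connection errors, timeouts and HTTP error statuses all land here.
        return ""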
    
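def Fill_html(gList, html):
    """Append five blocks to gList, in this order: titles, discounts,
    original prices, current prices, links. Print_html assumes that all
    five blocks have the same length."""
    soup = BeautifulSoup(html, "html.parser")
    # One <a> per search result row; its href is the store link.
    links = soup.find_all(name="a", attrs={"data-search-page": "1"})
    # Game titles.
    titles = soup.find_all(name="span", attrs={"class": "title"})
    # Grey <span> that wraps the struck-through original price.
    old_prices = soup.find_all(name="span", attrs={"style": "color: #888888;"})
    # Discount percentage block.
    discounts = soup.find_all(name="div", attrs={"class": "col search_discount responsive_secondrow"})
    for i in titles:
        gList.append(i.string)
    for i in discounts:
        gList.append(i.text)
    for i in old_prices:
        gList.append(i.string)
    for i in old_prices:
        # Skip the <br> after the span; the next node is the current price text.
        gList.append(i.next_sibling.next_sibling)
    for i in links:
        gList.append(i["href"])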
   
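def Print_html(gList):
    # chr(12288) is the full-width space; it is passed as argument 6 and used
    # as the fill character so columns with Chinese text stay aligned.
    fmt = "{0:{6}^2}\t{1:{6}^3}\t{2:{6}^3}\t{3:{6}^3}\t{4:{6}<50}\t{5:{6}>15}"
    print(fmt.format("Rank", "Discount", "Original", "Current", "Game", "Link", chr(12288)))
    # gList holds five blocks of n items each, in the order Fill_html added
    # them: titles, discounts, original prices, current prices, links.
    n = len(gList) // 5
    for i in range(n):
        print(fmt.format(i + 1, gList[n + i].strip(), gList[n * 2 + i], gList[n * 3 + i].strip(), gList[i].strip(), gList[n * 4 + i].strip(), chr(12288)))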

def main():
    url = "https://store.steampowered.com/search/?os=win&specials=1&filter=topsellers"
    getinfo = []
    html = Get_html(url)
    Fill_html(getinfo, html)
    Print_html(getinfo)


# Run the crawler only when the file is executed as a script.
if __name__ == "__main__":
    main()

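The crawler is built on the requests library and BeautifulSoup. It has three parts: a function that fetches the page and returns its HTML (Get_html), a function that extracts the target fields (Fill_html), and a function that prints the results (Print_html).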
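
The flat list built by Fill_html only lines up if every query returns the same number of matches. As a hedged alternative, and not the method used in this post, the extraction step could be done one result row at a time so that each game's fields stay together. The function name parse_rows and the dictionary keys are hypothetical; the class names are taken from the sample markup shown later in this post and may differ on the live site.

```python
from bs4 import BeautifulSoup


def parse_rows(html):
    """Return one dict per search result row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # class_ matches a single class token, so extra classes on the tag are fine.
    for a in soup.find_all("a", class_="search_result_row"):
        title = a.find("span", class_="title")
        discount = a.find("div", class_="search_discount")
        strike = a.find("strike")
        price_div = a.find("div", class_="search_price")
        final = None
        if price_div is not None:
            # The current price is the last non-empty text fragment in the block.
            parts = list(price_div.stripped_strings)
            final = parts[-1] if parts else None
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "discount": discount.get_text(strip=True) if discount else None,
            "original": strike.get_text(strip=True) if strike else None,
            "final": final,
            "link": a.get("href"),
        })
    return rows
```

A row with a missing field yields None for that field instead of shifting every later value by one position.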

Source analysis

The HTML source for a single discounted item on Steam looks like this:

				<a href="https://store.steampowered.com/app/1118010/Monster_Hunter_World_Iceborne/?snr=1_7_7_2300_150_1"
			 data-ds-appid="1118010" data-ds-itemkey="App_1118010" data-ds-tagids="[19,3859,1695,1685,9564,4026,1697]" data-ds-crtrids="[33273264,34827959]" onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;app&quot;,&quot;id&quot;:1118010,&quot;public&quot;:1,&quot;v6&quot;:1} );" onmouseout="HideGameHover( this, event, 'global_hover' )" class="search_result_row ds_collapse_flag "
		   data-search-page="1" data-gpnav="item">
            <div class="col search_capsule"><img src="https://media.st.dl.pinyuncloud.com/steam/apps/1118010/capsule_sm_120.jpg?t=1605143784" srcset="https://media.st.dl.pinyuncloud.com/steam/apps/1118010/capsule_sm_120.jpg?t=1605143784 1x, https://media.st.dl.pinyuncloud.com/steam/apps/1118010/capsule_231x87.jpg?t=1605143784 2x"></div>
            <div class="responsive_search_name_combined">
                <div class="col search_name ellipsis">
                    <span class="title">Monster Hunter World: Iceborne</span>
                    <p>
                        <span class="platform_img win"></span>                    </p>
                </div>
                <div class="col search_released responsive_secondrow">2020年1月9日</div>
                <div class="col search_reviewscore responsive_secondrow">
                                            <span class="search_review_summary mixed" data-tooltip-html="褒贬不一&lt;br&gt;13,454 篇用户的游戏评测中有 52% 为好评。&lt;br&gt;&lt;br&gt;此产品在一个或多个时间段内出现跑题评测活动。这些时间段内的评测已按您的偏好设置不计入此产品的评测分数。">
								</span>
                                    </div>


                <div class="col search_price_discount_combined responsive_secondrow" data-price-final="16800">
                    <div class="col search_discount responsive_secondrow">
                        <span>-38%</span>
                    </div>
                    <div class="col search_price discounted responsive_secondrow">
                        <span style="color: #888888;"><strike>¥ 271</strike></span><br>¥ 168                    </div>
                </div>
            </div>


            <div style="clear: left;"></div>
        </a>

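Reading this source tells us how to write the function that extracts each field.
We use soup.find_all(name, attrs={}) to search for specific HTML elements, where name is the tag name and attrs holds the attributes that distinguish the target. Taking the link as an example: we find the a tags whose attributes include {"data-search-page": "1"}, and their href attribute is the link we want. The other elements are found the same way.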
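
As a small illustration of the lookups described above, the sketch below runs find_all against a trimmed copy of the sample markup rather than the live page. The variable name sample_html and the shortened href are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of one search result row from the sample markup.
sample_html = """
<a href="https://store.steampowered.com/app/1118010/" data-search-page="1">
  <span class="title">Monster Hunter World: Iceborne</span>
  <div class="col search_discount responsive_secondrow"><span>-38%</span></div>
</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# name is the tag name; attrs lists the attributes a match must carry.
links = [a["href"] for a in soup.find_all(name="a", attrs={"data-search-page": "1"})]
titles = [s.string for s in soup.find_all(name="span", attrs={"class": "title"})]
discounts = [
    d.text.strip()
    for d in soup.find_all(name="div", attrs={"class": "col search_discount responsive_secondrow"})
]

print(links, titles, discounts)
```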
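When one tag contains several values to extract and those values are separated by child tags, that is, the nodes are siblings of one another, we can use the following attributes to move between sibling nodes, for example:

			i.next_sibling        the next sibling node
			i.previous_sibling    the previous sibling node
			i.next_siblings       all following sibling nodes
			i.previous_siblings   all preceding sibling nodes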
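
To see why Fill_html calls next_sibling twice, the sketch below walks the price block from the sample markup. A sibling can be a text node as well as a tag: the span holding the struck-through price is followed by a br tag and then by the text node that holds the current price. The variable names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Price block copied from the sample markup above.
price_html = (
    '<div class="col search_price discounted responsive_secondrow">'
    '<span style="color: #888888;"><strike>¥ 271</strike></span><br>¥ 168    </div>'
)

soup = BeautifulSoup(price_html, "html.parser")
span = soup.find(name="span", attrs={"style": "color: #888888;"})

original_price = span.string                # text of the nested <strike>
first_sibling = span.next_sibling           # the <br> tag
current_price = first_sibling.next_sibling  # the text node after <br>

print(original_price, first_sibling.name, current_price.strip())
```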
