Scraping the Steam Specials Top-Sellers List with Python


This post uses Python to scrape discount information from the Steam specials list at https://store.steampowered.com/search/?os=win&specials=1&filter=topsellers

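import requests
from bs4 import BeautifulSoup


def Get_html(url):
    """Fetch the page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        # Only swallow network/HTTP errors, not every exception.
        return ""


def Fill_html(gList, html):
    """Append titles, discounts, original prices, current prices and links to gList, in that order."""
    soup = BeautifulSoup(html, "html.parser")
    a = soup.find_all(name='a', attrs={"data-search-page": "1"})    # result rows (links)
    b = soup.find_all(name='span', attrs={"class": "title"})        # game titles
    c = soup.find_all(name="span", attrs={"style": "color: #888888;"})  # struck-through original prices
    d = soup.find_all(name="div", attrs={"class": "col search_discount responsive_secondrow"})  # discounts
    for i in b:
        gList.append(i.string)
    for i in d:
        gList.append(i.text)
    for i in c:
        gList.append(i.string)
    for i in c:
        # Skip the <br> that follows the span; the text node after it is the current price.
        gList.append(i.next_sibling.next_sibling)
    for i in a:
        gList.append(i["href"])


def Print_html(gList):
    # chr(12288) is the full-width space, used as the fill character for the centered columns.
    d = "{0:{6}^2}\t{1:{6}^3}\t{2:{6}^3}\t{3:{6}^3}\t{4:{6}<50}\t{5:{6}>15}"
    print(d.format("Rank", "Discount", "Original", "Current", "Game", "Link", chr(12288)))
    # gList holds five blocks of equal length; this assumes every row yielded all five fields.
    a = len(gList) // 5
    for i in range(a):
        print(d.format(i+1, gList[a+i].strip(), gList[a*2+i], gList[a*3+i].strip(), gList[i].strip(), gList[a*4+i].strip(), chr(12288)))


def main():
    url = "https://store.steampowered.com/search/?os=win&specials=1&filter=topsellers"
    getinfo = []
    html = Get_html(url)
    Fill_html(getinfo, html)
    Print_html(getinfo)


if __name__ == "__main__":
    main()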

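The crawler is built on the requests library and BeautifulSoup. It has three parts: a function that fetches the page and returns its text (Get_html), a function that extracts the target fields (Fill_html), and a function that prints the results (Print_html).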

Source Analysis

The HTML source for one item on the Steam specials list looks like this:

				<a href="https://store.steampowered.com/app/1118010/Monster_Hunter_World_Iceborne/?snr=1_7_7_2300_150_1"
			 data-ds-appid="1118010" data-ds-itemkey="App_1118010" data-ds-tagids="[19,3859,1695,1685,9564,4026,1697]" data-ds-crtrids="[33273264,34827959]" onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;app&quot;,&quot;id&quot;:1118010,&quot;public&quot;:1,&quot;v6&quot;:1} );" onmouseout="HideGameHover( this, event, 'global_hover' )" class="search_result_row ds_collapse_flag "
		   data-search-page="1" data-gpnav="item">
            <div class="col search_capsule"><img src="https://media.st.dl.pinyuncloud.com/steam/apps/1118010/capsule_sm_120.jpg?t=1605143784" srcset="https://media.st.dl.pinyuncloud.com/steam/apps/1118010/capsule_sm_120.jpg?t=1605143784 1x, https://media.st.dl.pinyuncloud.com/steam/apps/1118010/capsule_231x87.jpg?t=1605143784 2x"></div>
            <div class="responsive_search_name_combined">
                <div class="col search_name ellipsis">
                    <span class="title">Monster Hunter World: Iceborne</span>
                    <p>
                        <span class="platform_img win"></span>                    </p>
                </div>
                <div class="col search_released responsive_secondrow">2020年1月9日</div>
                <div class="col search_reviewscore responsive_secondrow">
                                            <span class="search_review_summary mixed" data-tooltip-html="褒贬不一&lt;br&gt;13,454 篇用户的游戏评测中有 52% 为好评。&lt;br&gt;&lt;br&gt;此产品在一个或多个时间段内出现跑题评测活动。这些时间段内的评测已按您的偏好设置不计入此产品的评测分数。">
								</span>
                                    </div>


                <div class="col search_price_discount_combined responsive_secondrow" data-price-final="16800">
                    <div class="col search_discount responsive_secondrow">
                        <span>-38%</span>
                    </div>
                    <div class="col search_price discounted responsive_secondrow">
                        <span style="color: #888888;"><strike>¥ 271</strike></span><br>¥ 168                    </div>
                </div>
            </div>


            <div style="clear: left;"></div>
        </a>

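By reading this source we can work out how the extraction function should pick out each piece of data.
We use soup.find_all(name, attrs={}) to search for specific HTML elements: name is the tag name, and attrs is a dict of attributes that distinguish the element from others. Take the link as an example: we look for tags whose name is a and whose attributes include {"data-search-page": "1"}; the href attribute of each match is the link we want. The other elements are found the same way.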
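As a small illustration of the find_all calls described above, the sketch below runs them against a trimmed copy of the sample markup rather than the live page. The SAMPLE_HTML constant and the list names are made up for this example, and the discount lookup matches on the single class search_discount instead of the full class string, which is an assumption made here for brevity.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the search-result markup shown above (illustrative only).
SAMPLE_HTML = """
<a href="https://store.steampowered.com/app/1118010/Monster_Hunter_World_Iceborne/?snr=1_7_7_2300_150_1"
   class="search_result_row ds_collapse_flag " data-search-page="1">
  <span class="title">Monster Hunter World: Iceborne</span>
  <div class="col search_discount responsive_secondrow">
    <span>-38%</span>
  </div>
</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# name selects the tag, attrs narrows the match to tags carrying those attributes.
links = [a["href"] for a in soup.find_all(name="a", attrs={"data-search-page": "1"})]
titles = [s.get_text(strip=True) for s in soup.find_all(name="span", attrs={"class": "title"})]
# Matching one class name is enough; the tag may carry other classes as well.
discounts = [d.get_text(strip=True) for d in soup.find_all(name="div", attrs={"class": "search_discount"})]

print(links, titles, discounts)
```

Each list holds one entry per matching tag, so on a full results page the three lists line up only if every row contains all three elements.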
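When one tag contains several items to extract and those items are separated by other tags, that is, when they are siblings at the same level, we can move between sibling nodes with dedicated attributes, for example: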

			i.next_sibling        the next sibling node
			i.previous_sibling    the previous sibling node
			i.next_siblings       all following sibling nodes
			i.previous_siblings   all preceding sibling nodes
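The sibling attributes can be tried out on the price cell from the sample markup. This is a minimal sketch, not the full crawler: PRICE_HTML and the variable names are invented for the example. It shows why the current price needs two hops from the grey span: the first sibling is the <br> tag, and the text node after it holds the price.

```python
from bs4 import BeautifulSoup

# The price cell from the sample markup above (illustrative copy).
PRICE_HTML = (
    '<div class="col search_price discounted responsive_secondrow">'
    '<span style="color: #888888;"><strike>¥ 271</strike></span><br>¥ 168    </div>'
)

soup = BeautifulSoup(PRICE_HTML, "html.parser")
old_span = soup.find(name="span", attrs={"style": "color: #888888;"})

# .string reaches through the single <strike> child to its text.
original_price = old_span.string

# First hop lands on the <br> tag, second hop on the text node after it.
br_tag = old_span.next_sibling
current_price = br_tag.next_sibling.strip()

# next_siblings is a generator over everything that follows the span.
following = list(old_span.next_siblings)

print(original_price, current_price, len(following))
```

Siblings include text nodes as well as tags, so on markup with whitespace between tags an extra hop may be needed.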