Project: crawl the titles of all free games on Steam (Part II)

Python modules: requests, re 

Objective: get needed information from websites

In the previous section, we crawled the titles of the free games on one page of Steam. The remaining question is how to get the titles from the other pages.

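import re

import requests

# Pattern for the game title inside each tab item; compiled once, reused for every page
title_pat = re.compile(r'<div class="tab_item_name">(.*?)</div>')

titles = []
for i in range(12):  # page numbers 0 to 11, i.e. 12 pages
    # Note: everything after '#' is a URL fragment, which HTTP clients do not send
    # to the server, so check that each request really returns a different page.
    freegame_url = f'https://store.steampowered.com/genre/Free%20to%20Play/#p={i}&tab=TopSellers'

    # If error 418 is raised, try sending a browser-like User-Agent header:
    # headers = {'User-Agent': 'Mozilla/5.0'}
    # freegame_content = requests.get(freegame_url, headers=headers)

    freegame_content = requests.get(freegame_url)
    freegame_html = freegame_content.text
    # Find the titles on this page and add them to the overall list
    titles_page = title_pat.findall(freegame_html)
    titles.extend(titles_page)

print(titles)
print(f'{len(titles)} free games in total!')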

The answer is simple: we add a loop!

Steam's website shows that there are 12 pages of free games. The URLs of these pages differ only in the page parameter, p=page, so we enumerate the page numbers from 0 to 11. In each iteration, we generate a new URL and send a GET request to it. Each time, we get the list of titles on that page and add every element of it to the list titles that we created at the start. In the end, we have one list containing all the titles. Finally, we print the length of the list to see how many free games there are in total.
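The accumulation step described above can also be tried without any network access. The following is only a minimal sketch: the helper name collect_titles and the two sample HTML strings are made up for illustration and stand in for pages that would normally be downloaded with requests.

```python
import re

# Same title pattern as in the crawler
TITLE_PAT = re.compile(r'<div class="tab_item_name">(.*?)</div>')


def collect_titles(pages_html):
    """Extract the titles from each page's HTML and merge them into one list."""
    titles = []
    for html in pages_html:
        # findall returns the list of titles on this page
        titles.extend(TITLE_PAT.findall(html))
    return titles


# Hypothetical HTML snippets standing in for two downloaded pages
sample_pages = [
    '<div class="tab_item_name">Game A</div><div class="tab_item_name">Game B</div>',
    '<div class="tab_item_name">Game C</div>',
]

all_titles = collect_titles(sample_pages)
print(all_titles)
print(f'{len(all_titles)} free games in total!')
```

Keeping the parsing in its own function makes it easy to check the loop logic on small samples before pointing it at the real site.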

It is worthwhile to try this on other websites. We can start by inspecting the HTML source of the site, then find where the elements we need are located. We use Ctrl+F to check whether the pattern matches only the elements we want, and modify it if that is not the case. There are other tools to help when the situation is more complicated: we will learn to use CSS, XPath, and relative-location selectors when we introduce Scrapy, Selenium, Helium, etc.
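The Ctrl+F check can be repeated in Python by running a candidate pattern against a small piece of HTML and looking at what it returns. This is a sketch under assumptions: the sample HTML is invented, and the class name tab_item_discount is hypothetical, used only to show how a pattern that is too broad picks up unwanted elements.

```python
import re

# Invented sample HTML; "tab_item_discount" is a hypothetical class name
sample_html = '''
<div class="tab_item_name">Game A</div>
<div class="tab_item_discount">Free</div>
<div class="tab_item_name">Game B</div>
'''

# Too broad: matches any div whose class starts with "tab_item_"
broad_pat = re.compile(r'<div class="tab_item_.*?">(.*?)</div>')
# Narrow: matches only the title divs
narrow_pat = re.compile(r'<div class="tab_item_name">(.*?)</div>')

broad = broad_pat.findall(sample_html)
narrow = narrow_pat.findall(sample_html)

print(broad)   # includes the unwanted 'Free' entry
print(narrow)  # only the two titles
```

If the broad and narrow results differ, the pattern needs to be tightened before it is used in the crawler.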
