Project: Crawl the titles of all free games on Steam (Part II)
Python modules: requests, re
Objective: extract the needed information from web pages
In the previous section, we successfully crawled the titles of free games on Steam from a single page. The question left is how to get the titles from the other pages.
import requests
import re

titles = []
for i in range(12):  # 12 pages, numbered p=0 to p=11
    freegame_url = f'https://store.steampowered.com/genre/Free%20to%20Play/#p={i}&tab=TopSellers'
    # headers = {'User-Agent': 'Mozilla/5.0'}  # try adding a browser-like header if error 418 is raised
    # freegame_content = requests.get(freegame_url, headers=headers)
    freegame_content = requests.get(freegame_url)
    freegame_html = freegame_content.text
    title_pat = '<div class="tab_item_name">(.*?)</div>'
    titles_page = re.compile(title_pat).findall(freegame_html)
    for t in titles_page:
        titles.append(t)

print(titles)
print(f'{len(titles)} free games in total!')
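Steam sometimes answers scripted requests with error 418, which is why the commented-out headers line is there. A minimal sketch of attaching a browser-like User-Agent header (the header value below is an illustrative example, not a required one); here we prepare the request without sending it, so the headers can be inspected offline:

```python
import requests

# An illustrative browser-like User-Agent string (any common value works);
# it makes the request look like it comes from a regular browser.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Prepare the request without sending it, so we can inspect the headers
# that would go out on the wire.
req = requests.Request(
    'GET',
    'https://store.steampowered.com/genre/Free%20to%20Play/',
    headers=headers,
)
prepared = req.prepare()
print(prepared.headers['User-Agent'])
```

In the crawler itself, you would simply pass the same dictionary via requests.get(url, headers=headers).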
The answer is simple: we add a loop!
We can observe on Steam's website that there are 12 pages of free games, and from the URL we can spot the difference between pages: the p=<page> part. So we enumerate from 0 to 11. In each iteration we build a new URL and send a GET request to it. Each time we get the title list of that page, we append every element of it to the list titles[] that we created in the first place. In the end we obtain a list with all the titles included, and we print its length so we know how many free games there are in total. One caveat: the part of a URL after # is a fragment, which browsers handle client-side rather than send to the server, so if every iteration returns identical titles, check the browser's network tab for the real paging parameter the page uses.
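The accumulation step described above can be sketched offline with dummy per-page results (the page lists below are made up purely for illustration):

```python
# Pretend these are the per-page title lists returned by re.findall
# for three pages (made-up data for illustration).
pages = [
    ['Game A', 'Game B'],
    ['Game C'],
    ['Game D', 'Game E', 'Game F'],
]

titles = []
for titles_page in pages:
    for t in titles_page:
        titles.append(t)

print(titles)
print(f'{len(titles)} free games in total!')
```

The inner loop can also be written more idiomatically as titles.extend(titles_page), which appends every element of titles_page in one call.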
It is worthwhile to try this with other websites. We may start by inspecting the HTML of the site, then try to find the location of the elements we need. We can use Ctrl+F to check whether the pattern matches only the elements we want, and modify it if that is not the case. There are other tools to help when the situation is more complicated: we will learn to use CSS, XPath, and relative-location selectors when we introduce Scrapy, Selenium, Helium, etc.
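Before running a full crawl, a pattern can be tested by hand against a small snippet. The HTML fragment below is made up, in the same shape as Steam's markup, just to show that the non-greedy pattern picks out only the tab_item_name contents:

```python
import re

# A made-up HTML fragment mimicking the structure we target:
# two title divs and one unrelated div that must NOT match.
sample_html = '''
<div class="tab_item_name">Team Fortress 2</div>
<div class="tab_item_discount">-50%</div>
<div class="tab_item_name">Dota 2</div>
'''

title_pat = '<div class="tab_item_name">(.*?)</div>'
titles = re.compile(title_pat).findall(sample_html)
print(titles)
```

If the pattern also matched unwanted divs, we would tighten it, exactly as the Ctrl+F check suggests.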