Project: Crawl the titles of all free games on Steam (Part II)
Python modules: requests, re
Objective: extract the needed information from web pages
In the previous section, we successfully crawled the titles of free games on Steam from a single page. The question left is how to get the titles from the other pages.
import requests
import re

titles = []
for i in range(12):  # 12 pages, numbered p=0 to p=11
    freegame_url = f'https://store.steampowered.com/genre/Free%20to%20Play/#p={i}&tab=TopSellers'
    # headers = {'User-Agent': 'Mozilla/5.0'}  # try adding a browser-like header if error 418 is raised
    # freegame_content = requests.get(freegame_url, headers=headers)
    freegame_content = requests.get(freegame_url)
    freegame_html = freegame_content.text
    title_pat = '<div class="tab_item_name">(.*?)</div>'
    titles_page = re.compile(title_pat).findall(freegame_html)
    for t in titles_page:
        titles.append(t)

print(titles)
print(f'{len(titles)} free games in total!')
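Steam sometimes answers scripted requests with error 418, which is why the commented-out headers line is there. A minimal sketch of attaching a browser-like User-Agent header (the header value below is an illustrative example, not a required one); here we prepare the request without sending it, so the headers can be inspected offline:

```python
import requests

# An illustrative browser-like User-Agent string (any common value works);
# it makes the request look like it comes from a regular browser.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Prepare the request without sending it, so we can inspect the headers
# that would go out on the wire.
req = requests.Request(
    'GET',
    'https://store.steampowered.com/genre/Free%20to%20Play/',
    headers=headers,
)
prepared = req.prepare()
print(prepared.headers['User-Agent'])
```

In the crawler itself, you would simply pass the same dictionary via requests.get(url, headers=headers).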
The answer is simple: we add a loop!
We can observe on Steam's website that there are 12 pages of free games, and from the URL we can spot the difference between pages: the p=<page> part. So we enumerate from 0 to 11. In each iteration we build a new URL and send a GET request to it. Each time we get the title list of that page, we append every element of it to the list titles[] that we created in the first place. In the end we obtain a list with all the titles included, and we print its length so we know how many free games there are in total. One caveat: the part of a URL after # is a fragment, which browsers handle client-side rather than send to the server, so if every iteration returns identical titles, check the browser's network tab for the real paging parameter the page uses.
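The accumulation step described above can be sketched offline with dummy per-page results (the page lists below are made up purely for illustration):

```python
# Pretend these are the per-page title lists returned by re.findall
# for three pages (made-up data for illustration).
pages = [
    ['Game A', 'Game B'],
    ['Game C'],
    ['Game D', 'Game E', 'Game F'],
]

titles = []
for titles_page in pages:
    for t in titles_page:
        titles.append(t)

print(titles)
print(f'{len(titles)} free games in total!')
```

The inner loop can also be written more idiomatically as titles.extend(titles_page), which appends every element of titles_page in one call.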
It is worthwhile to try this with other websites. We may start by inspecting the HTML of the site, then try to find the location of the elements we need. We can use Ctrl+F to check whether the pattern matches only the elements we want, and modify it if that is not the case. There are other tools to help when the situation is more complicated: we will learn to use CSS, XPath, and relative-location selectors when we introduce Scrapy, Selenium, Helium, etc.
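Before running a full crawl, a pattern can be tested by hand against a small snippet. The HTML fragment below is made up, in the same shape as Steam's markup, just to show that the non-greedy pattern picks out only the tab_item_name contents:

```python
import re

# A made-up HTML fragment mimicking the structure we target:
# two title divs and one unrelated div that must NOT match.
sample_html = '''
<div class="tab_item_name">Team Fortress 2</div>
<div class="tab_item_discount">-50%</div>
<div class="tab_item_name">Dota 2</div>
'''

title_pat = '<div class="tab_item_name">(.*?)</div>'
titles = re.compile(title_pat).findall(sample_html)
print(titles)
```

If the pattern also matched unwanted divs, we would tighten it, exactly as the Ctrl+F check suggests.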