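1. Downloading from an image site
Write a program that goes to a photo-sharing site such as Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images. You could write a program that works with any photo site that has a search feature.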
# -*- coding: utf-8 -*-
import os
from urllib.parse import urljoin

import requests
import bs4

baseUrl = 'http://imgur.com'
dirName = 'image'
os.makedirs(dirName, exist_ok=True)

# Search parameters
url = baseUrl + '/search/score?q=' + 'movie'
response = requests.get(url)
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, "html.parser")
imageUrls = soup.select(".image-list-link img")
if not imageUrls:
    print('Could not find image.')
else:
    for imageUrl in imageUrls:
        downloadUrl = imageUrl.get('src')
        print("Download image %s..." % downloadUrl)
        # The file name is the last path segment of the image URL
        fileName = os.path.basename(downloadUrl)
        filePath = os.path.join(dirName, fileName)
        print("FilePath is %s..." % filePath)
        if not os.path.exists(filePath):
            # src may be protocol-relative (//host/path); urljoin supplies the scheme
            imageStream = requests.get(urljoin(baseUrl, downloadUrl))
            imageStream.raise_for_status()
            # The with block closes the file even if a write fails
            with open(filePath, 'wb') as imageFile:
                for chunk in imageStream.iter_content(100000):
                    imageFile.write(chunk)
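The src attribute of an img tag on a search page is often protocol-relative (it starts with //) and may carry a query string, so it can help to derive the download URL and the local file name separately. The following is a minimal sketch of that idea, not part of the original script: the helper names resolve_image_url and local_file_name are hypothetical, and the example URL is made up for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def resolve_image_url(page_url, src):
    """Turn a relative or protocol-relative src into an absolute URL."""
    return urljoin(page_url, src)


def local_file_name(image_url):
    """Return the last path segment, ignoring any query string or fragment."""
    return posixpath.basename(urlsplit(image_url).path)


# Hypothetical src value, used only to show the two helpers together.
full_url = resolve_image_url('http://imgur.com/search/score?q=movie',
                             '//i.imgur.com/abc123b.jpg?1')
print(full_url)
print(local_file_name(full_url))
```

Keeping these two steps as small functions makes them easy to check without touching the network.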
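2. The 2048 game
2048 is a simple game in which you combine tiles by sliding them up, down, left, or right with the arrow keys. You can actually get a fairly high score by repeating an "up, right, down, left" pattern over and over. Write a program that opens the game at https://gabrielecirulli.github.io/2048/ and keeps sending up, right, down, and left keystrokes to play the game automatically.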
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
url = 'https://gabrielecirulli.github.io/2048/'
browser.get(url)
# Send keys to the <html> element; a plain <div> may not accept keyboard input
game_elem = browser.find_element(By.TAG_NAME, 'html')
while True:
    retry_elem = browser.find_element(By.CLASS_NAME, 'retry-button')
    # Start a new game once the "Try again" button is shown
    if retry_elem.text == 'Try again':
        retry_elem.click()
    game_elem.send_keys(Keys.UP)
    game_elem.send_keys(Keys.RIGHT)
    game_elem.send_keys(Keys.DOWN)
    game_elem.send_keys(Keys.LEFT)
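The "up, right, down, left" loop can also be separated from the browser, which makes the strategy easy to try without opening Firefox. The sketch below is an assumption-laden illustration rather than the script above: the play function and its send_key, game_over, and restart callables are hypothetical names, and the dry run uses stand-ins instead of Selenium.

```python
from itertools import cycle, islice

MOVES = ('UP', 'RIGHT', 'DOWN', 'LEFT')


def play(send_key, game_over, restart, max_moves):
    """Send the fixed move cycle, restarting whenever the game is over."""
    sent = 0
    for move in islice(cycle(MOVES), max_moves):
        if game_over():
            restart()
        send_key(move)
        sent += 1
    return sent


# Dry run with stand-in callables instead of a real browser.
log = []
play(send_key=log.append, game_over=lambda: False,
     restart=lambda: None, max_moves=6)
print(log)
```

With Selenium, send_key could map each name to the matching constant (Keys.UP and so on) and call send_keys on the page element, while max_moves gives the loop a stopping point that the while True version lacks.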
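3. Link verification
Write a program that, given the URL of a web page, attempts to download every page linked from it. The program should flag any page that returns a 404 "Not Found" status code and print it out as a broken link.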
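# -*- coding: utf-8 -*-
from urllib.parse import urljoin

import requests
import bs4

url = 'http://ifeve.com/'
response = requests.get(url)
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, "html.parser")
# Only consider <a> tags that actually have an href attribute
a_list = soup.select('a[href]')
for a in a_list:
    # Resolve relative links against the page URL
    a_url = urljoin(url, a.get('href'))
    try:
        response = requests.get(a_url)
        if response.status_code == requests.codes.not_found:
            print("Page %s is broken link" % a_url)
        else:
            print("Page %s is other type link" % a_url)
            # Any other 4xx/5xx status is reported as an error below
            response.raise_for_status()
    except requests.exceptions.RequestException:
        # Covers connection failures, unsupported schemes (mailto:, javascript:), and HTTP errors
        print("Page %s is Error" % a_url)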
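Before any link is requested, it helps to normalize the href values: relative links need the page URL as a base, and entries such as mailto: links cannot be fetched over HTTP at all. The sketch below shows one way to do that step on its own. It deliberately swaps bs4 for the standard library's html.parser so it runs without extra packages; the LinkCollector class, the collect_links function, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from the href of every <a> tag."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if not href:
            return
        absolute = urljoin(self.page_url, href)
        # Keep only fetchable links and skip duplicates
        if (urlsplit(absolute).scheme in ('http', 'https')
                and absolute not in self.links):
            self.links.append(absolute)


def collect_links(page_url, html):
    parser = LinkCollector(page_url)
    parser.feed(html)
    return parser.links


# Made-up HTML covering relative, mailto, missing-href, and duplicate links.
sample = ('<a href="/about">A</a><a href="mailto:x@example.com">M</a>'
          '<a>no href</a><a href="http://example.com/p">P</a>'
          '<a href="/about">dup</a>')
print(collect_links('http://example.com/index.html', sample))
```

Each URL returned by collect_links could then be passed to requests.get and checked against requests.codes.not_found, as in the script above.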