Automate the Boring Stuff with Python, Chapter 11: Answers to the Practice Questions and Practice Projects


月月吃喝


Original article: https://blog.csdn.net/weixin_43840640/article/details/100077243


1 Practice Questions

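1. The webbrowser module has an open() function that launches a web browser to a specified URL, and that's it. The requests module can download files and pages from the Web. The BeautifulSoup module parses HTML. Finally, the selenium module can launch and control a browser.
2. The requests.get() function returns a Response object, which has a text attribute containing the downloaded content as a string.
3. The raise_for_status() method raises an exception if the download had problems and does nothing if the download succeeded.
4. The status_code attribute of the Response object contains the HTTP status code.
5. After opening a new file on your computer in 'wb' (write binary) mode, use a for loop that iterates over the Response object's iter_content() method to write the chunks to the file. Here's an example: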

# res is the Response returned by requests.get(); 'with' closes the file when done
with open('filename.html', 'wb') as saveFile:
    for chunk in res.iter_content(100000):
        saveFile.write(chunk)

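6. F12 opens the developer tools in Chrome. Pressing Ctrl-Shift-C (on Windows and Linux) or ⌘-Option-C (on OS X) opens the developer tools in Firefox.
7. Right-click the element on the page and select Inspect Element from the menu.
8. '#main'
9. '.highlight'
10. 'div div'
11. 'button[value="favorite"]'
12. spam.getText()
13. linkElem.attrs
14. The selenium module is imported with from selenium import webdriver.
15. The find_element_* methods return the first matching element as a WebElement object. The find_elements_* methods return all matching elements as a list of WebElement objects. (Selenium 4.3 and later removed the find_element_by_* helpers. The equivalent calls are find_element(By.ID, ...) and find_elements(By.ID, ...), with By imported from selenium.webdriver.common.by.)
16. The click() and send_keys() methods simulate mouse clicks and keyboard keys, respectively.
17. Calling the submit() method on any element within a form submits the form.
18. The forward(), back(), and refresh() WebDriver object methods simulate these browser buttons.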
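Answers 2 to 5 describe one download pattern: check the HTTP status, then write the body to disk chunk by chunk. The minimal sketch below splits those two steps into plain-Python helpers so they run without a network connection. `is_error_status` and `save_chunks` are hypothetical names and are not part of requests. The 400 to 599 range corresponds to the client and server error codes for which `raise_for_status()` raises `requests.exceptions.HTTPError`. `http.HTTPStatus` from the standard library supplies the reason phrase for each code.

```python
import os
import tempfile
from http import HTTPStatus


def is_error_status(code: int) -> bool:
    """Return True for 4xx/5xx codes, the range in which raise_for_status() raises."""
    return 400 <= code < 600


def save_chunks(chunks, path: str) -> int:
    """Write an iterable of bytes chunks to path in 'wb' mode and return the byte count.

    With requests, res.iter_content(100000) can be passed as `chunks`.
    """
    written = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# HTTPStatus maps a numeric code to its standard reason phrase
print(404, HTTPStatus(404).phrase, is_error_status(404))  # 404 Not Found True
print(200, HTTPStatus(200).phrase, is_error_status(200))  # 200 OK False

# Simulated download: three chunks, one of them empty
path = os.path.join(tempfile.mkdtemp(), 'filename.html')
print(save_chunks([b'<html>', b'', b'</html>'], path))  # 13
```

With a real download, you would call `res.raise_for_status()` first and then `save_chunks(res.iter_content(100000), 'filename.html')`.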

2 Practice Projects

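2.1 Command Line Emailer
Write a program that takes an email address and a string of text on the command line and then, using selenium, logs in to your email account and sends an email of the string to the provided address. (You might want to set up a separate email account for this program.)
This would be a nice way to add a notification feature to your programs. You could also write a similar program to send messages from a Facebook or Twitter account.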

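from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
import time, sys

# Usage: python mailer.py recipient@example.com message text...
if len(sys.argv) < 3:
    sys.exit('Usage: python mailer.py <recipient> <message>')
receiver_addr = sys.argv[1]
message = ' '.join(sys.argv[2:])

# Raw string so the backslashes in the Windows path are not read as escapes.
# Recent Selenium 4 releases can also locate the driver themselves: webdriver.Chrome()
driver = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\Chrome-bin\chromedriver.exe"))
driver.get("https://mail.126.com/")
time.sleep(2)
# Choose the username/password login method
driver.find_element(By.ID, "lbNormal").click()
time.sleep(2)

# Switch into the login iframe
iframe = driver.find_elements(By.TAG_NAME, "iframe")[0]
driver.switch_to.frame(iframe)

# Enter the username and password
email_name = driver.find_element(By.NAME, "email")
email_pwd = driver.find_element(By.NAME, "password")
email_name.send_keys('******')
email_pwd.send_keys("*********")
# Click the login button
driver.find_element(By.ID, "dologin").click()
# Wait 2 seconds for the page to finish loading
time.sleep(2)
# Tick "It's me, don't ask again"
#driver.find_element(By.ID, "ismyphonebox").click()
# Click login
#driver.find_element(By.XPATH, "//*[@class='u-btn u-btn-middle3 f-ib bgcolor f-fl']").click()
# Switch from the frame back to the main document
driver.switch_to.default_content()

time.sleep(2)
# Click "Write" (compose a new message)
driver.find_element(By.ID, "_mail_component_24_24").click()
time.sleep(2)
# Fill in the recipient from the command line
receiver = driver.find_element(By.XPATH, "//div[contains(@id,'_mail_emailinput')]/input")
receiver.send_keys(receiver_addr)
time.sleep(2)
# Fill in the subject
title = driver.find_element(By.XPATH, "//input[contains(@id,'_subjectInput')]")
title.send_keys("Command line emailer")

# Switch into the editor iframe
#iframe = driver.find_elements(By.TAG_NAME, "iframe")[2]
iframe = driver.find_element(By.XPATH, "//iframe[@class='APP-editor-iframe']")
driver.switch_to.frame(iframe)
# Type the message body from the command line
write_body = driver.find_element(By.XPATH, "/html/body")
write_body.send_keys(message)
# Leave the frame
driver.switch_to.default_content()
time.sleep(2)
# Click the Send button
driver.find_element(By.XPATH, "//*[@class='jp0']/div/span[@class='nui-btn-text']").click()
time.sleep(2)
# quit() closes the browser and ends the WebDriver session
driver.quit()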
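The exercise takes the recipient and the message from the command line. Below is a minimal sketch of that argument handling, kept separate from the Selenium steps so it can be tested on its own. `parse_mail_args` and the loose `EMAIL_RE` pattern are assumptions made for illustration. The pattern only rejects obviously malformed addresses and does not implement full RFC 5322 validation.

```python
import re

# Deliberately loose check (something@something.something), not full RFC 5322 validation
EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def parse_mail_args(argv):
    """Return (address, message) from argv = [script, address, word, word, ...].

    Raises ValueError if arguments are missing, the address looks malformed,
    or the message is empty.
    """
    if len(argv) < 3:
        raise ValueError('usage: mailer.py <address> <message...>')
    address = argv[1]
    if not EMAIL_RE.match(address):
        raise ValueError('not an email address: %r' % address)
    # Join the remaining words so the message does not have to be quoted
    message = ' '.join(argv[2:]).strip()
    if not message:
        raise ValueError('message is empty')
    return address, message


# Equivalent of running: python mailer.py friend@example.com Build finished
print(parse_mail_args(['mailer.py', 'friend@example.com', 'Build', 'finished']))
# ('friend@example.com', 'Build finished')
```

In the script itself you would call `parse_mail_args(sys.argv)` and pass the two values to the recipient and body `send_keys()` calls.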

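2.2 Image Site Downloader
Write a program that goes to a photo-sharing site like Flickr or Imgur, searches for a category of photos, and then downloads all the resulting images. You could write a program that works with any photo site that has a search feature.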

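import sys, time, os
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

# Search keyword from the command line (defaults to 'hello')
keyword = ' '.join(sys.argv[1:]) or 'hello'
os.makedirs('picture', exist_ok=True)

pictureUrl = 'http://image.baidu.com'

browser = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\Chrome-bin\chromedriver.exe"))
browser.get(pictureUrl)                                              # open the page in the controlled browser

word_elem = browser.find_element(By.XPATH, '//input[@name="word"]')  # locate the search box
word_elem.send_keys(keyword)                                         # type the search keyword
word_elem.submit()                                                   # submit the form to search

time.sleep(2)

# Find the <img> element of each search result
image_elem = browser.find_elements(By.XPATH, '//ul/li/div/a/img')
if not image_elem:
    print("Could not find images.")
else:
    # Note: the first few results Baidu returns are ads; skip them, or the download fails.
    # Download results 11 through 20 (ten images).
    for URL in image_elem[10:20]:
        target_url = URL.get_attribute('src')
        # Skip empty src values and data: URIs, which requests cannot fetch
        if not target_url or not target_url.startswith('http'):
            continue
        res = requests.get(target_url, stream=True, timeout=60)      # download the image
        res.raise_for_status()
        img_name = target_url.split('/')[-1]
        filename = os.path.join('picture', img_name[-25:])
        # Open the file in write-binary mode and save the image in chunks
        with open(filename, 'wb') as f:
            for chunk in res.iter_content(100000):
                if chunk:
                    f.write(chunk)

print('Done!')

time.sleep(2)
browser.quit()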
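Two details of the downloader are easy to get wrong. First, some result `<img>` elements may carry an empty `src`, a lazy-loading placeholder, or a `data:` URI that requests cannot fetch. Second, a URL's query string can contain characters such as `?`, which Windows does not allow in filenames. The sketch below handles both cases with the standard library's `urllib.parse.urlsplit`. The helper names and the example.com URLs are illustrative assumptions. The `picture` folder and the 25-character limit simply mirror the code above.

```python
import os
from urllib.parse import urlsplit


def usable_image_urls(srcs):
    """Keep only http(s) URLs; drop None, empty strings and data: URIs."""
    return [s for s in srcs if s and urlsplit(s).scheme in ('http', 'https')]


def image_filename(url, index, folder='picture', max_len=25):
    """Build a local filename from the last path segment of url.

    The query string is discarded (so no '?' reaches the filename) and an
    index prefix keeps names unique when several URLs end the same way.
    """
    name = os.path.basename(urlsplit(url).path) or 'image'
    return os.path.join(folder, '%03d_%s' % (index, name[-max_len:]))


srcs = [None,
        'data:image/gif;base64,R0lGODlhAQABAAAAACw=',
        'https://example.com/img/cat.jpg?w=500&h=500',
        'http://example.com/dog.png']
for i, url in enumerate(usable_image_urls(srcs), start=1):
    print(image_filename(url, i))
```

In the Selenium loop, `usable_image_urls([img.get_attribute('src') for img in image_elem[10:20]])` would yield only URLs that requests can fetch.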

2.3 2048
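2048 is a simple game where you combine tiles by sliding them up, down, left, or right with the arrow keys. You can actually get a fairly high score by repeating an up, right, down, left pattern over and over again. Write a program that opens the game at https://gabrielecirulli.github.io/2048/ and keeps sending up, right, down, and left keystrokes to play the game automatically.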

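from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
import time

# Open the game page in the controlled browser
driver = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\Chrome-bin\chromedriver.exe"))
driver.get("https://gabrielecirulli.github.io/2048/")
time.sleep(2)

# Up, right, down, left, as the exercise describes
keyList = [Keys.ARROW_UP, Keys.ARROW_RIGHT, Keys.ARROW_DOWN, Keys.ARROW_LEFT]
# Start a new game
driver.find_element(By.LINK_TEXT, "New Game").click()
time.sleep(2)
# Key presses are sent to the <html> element, which receives the game's keyboard events
htmlElem = driver.find_element(By.TAG_NAME, 'html')
count = 0

while True:
    # find_elements() returns a list, which stays empty until the "Try again" link is shown
    game_over = driver.find_elements(By.LINK_TEXT, "Try again")
    if game_over:
        print("Game Over!")
        break
    htmlElem.send_keys(keyList[count % 4])
    count += 1

time.sleep(5)
driver.quit()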
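You can also step through the key pattern with `itertools.cycle`, which repeats a sequence indefinitely instead of indexing it with `count % 4`. The sketch below uses plain strings as stand-ins for Selenium's `Keys.ARROW_UP`, `Keys.ARROW_RIGHT`, `Keys.ARROW_DOWN` and `Keys.ARROW_LEFT`, so it runs without a browser. `key_sequence` is a hypothetical helper.

```python
from itertools import cycle, islice

# Stand-in names; in the Selenium program these would be
# Keys.ARROW_UP, Keys.ARROW_RIGHT, Keys.ARROW_DOWN, Keys.ARROW_LEFT
PATTERN = ['up', 'right', 'down', 'left']


def key_sequence(n, pattern=PATTERN):
    """Return the first n keys of the endlessly repeated pattern."""
    return list(islice(cycle(pattern), n))


print(key_sequence(6))  # ['up', 'right', 'down', 'left', 'up', 'right']
```

In the game loop this becomes `for key in cycle(keyList):`. Check for the game-over link before each `htmlElem.send_keys(key)` and break when it appears. This removes the need for a manual counter.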

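2.4 Link Verification
Write a program that, given the URL of a web page, attempts to download every page linked from it. The program should flag any pages that return a 404 "Not Found" status code and print them as broken links.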

import sys
from urllib.parse import urljoin
import requests, bs4

# Page to check; can be given on the command line
url = sys.argv[1] if len(sys.argv) > 1 else "http://news.baidu.com/guonei/"
res = requests.get(url)
res.raise_for_status()
res_soup = bs4.BeautifulSoup(res.text, 'html.parser')
res_soup_href = res_soup.select("a[href]")

print('There are %d <a href=""> links in %s' % (len(res_soup_href), url))
for link in res_soup_href:
    a_href_url = link.get('href')
    # Resolve relative links (e.g. "a.html" or "/guonei/a.html") against the page URL
    a_href_url_fullname = urljoin(url, a_href_url)
    # Skip mailto:, javascript: and other non-HTTP links
    if not a_href_url_fullname.startswith(('http://', 'https://')):
        continue
    try:
        a_res = requests.get(a_href_url_fullname, timeout=10)
        if a_res.status_code == 404:
            print('Broken link (404 Not Found): %s\n  %s' % (link.getText().strip(), a_href_url_fullname))
    except requests.exceptions.Timeout:
        print('Request timed out: %s' % a_href_url_fullname)
    except requests.exceptions.RequestException as exc:
        print('Request failed: %s (%s)' % (a_href_url_fullname, exc))
print("Done!")
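Two parts of the link checker do not need the network: resolving each `href` against the page URL, and deciding which responses count as broken. Both can be sketched and tested separately. `urllib.parse.urljoin` resolves relative paths such as `a.html` or `/missing`. Links with other schemes (`mailto:`, `javascript:`) are dropped. `resolve_links`, `find_broken` and the `news.example.com` URLs are hypothetical, and the fake status table stands in for real `requests.get()` calls.

```python
from urllib.parse import urljoin, urlsplit


def resolve_links(page_url, hrefs):
    """Turn href values into unique absolute http(s) URLs, dropping other schemes."""
    links = []
    for href in hrefs:
        full = urljoin(page_url, href.strip())
        if urlsplit(full).scheme in ('http', 'https') and full not in links:
            links.append(full)
    return links


def find_broken(links, get_status):
    """Return the links for which get_status(url) reports 404."""
    return [u for u in links if get_status(u) == 404]


# Offline demo: a fake status table stands in for requests.get(url).status_code
fake_status = {'http://news.example.com/guonei/a.html': 200,
               'http://news.example.com/missing': 404}
page = 'http://news.example.com/guonei/'
links = resolve_links(page, ['a.html', '/missing', 'mailto:x@example.com', 'javascript:void(0)'])
print(links)
print(find_broken(links, lambda u: fake_status.get(u, 200)))
# ['http://news.example.com/missing']
```

Against a live site, `hrefs` would be `[a.get('href') for a in res_soup.select('a[href]')]`. `get_status` would be a small function that returns `requests.get(u, timeout=10).status_code` and catches `requests.exceptions.RequestException`.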
