Reachable web pages with Python: learning to use selenium to tell which web pages are reachable and fetch their content...

Latest recommended article published on 2024-03-02 13:52:21

weixin_39684235


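I. Preface

In earlier posts I collected a number of domain names. Not every collected domain is useful, so the ones whose sites cannot be reached need to be filtered out. Today I am therefore learning to use Python's selenium module to launch Chromium and request each site, and this post records what I learned along the way.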

II. Learning process

1. Development tools:

Python version: 3.7.1

Modules used:

selenium

pymysql

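2. How it works

Read the domains to visit from the database -> visit each domain with selenium and collect the page title, content length, and a screenshot -> store the results in the database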
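
To make the data handed from the second stage to the third more concrete, here is a minimal sketch of one result record. `PageInfo`, `make_record`, and the `img_dir` parameter are illustrative names that do not appear in the original program, which keeps the same five values in a plain list.

```python
import os
from typing import NamedTuple


class PageInfo(NamedTuple):
    """One row of results for a domain that responded (hypothetical type)."""
    hostname: str
    title: str
    http_length: int
    img_path: str
    http_status: str


def make_record(hostname, title, page_source, img_dir):
    """Build a record from values the browser would report for one page."""
    return PageInfo(
        hostname=hostname,
        title=title or '',                  # a missing title becomes ''
        http_length=len(page_source),       # length of the page source in characters
        img_path=os.path.join(img_dir, hostname + '.png'),
        http_status='success',
    )
```

Because a named tuple still supports indexing, code that reads `record[0]` or `record[2]` would keep working while new code can use `record.hostname` or `record.http_length`.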

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pymysql

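# Collect the domains that respond
def run(cursor):
    # Read the domains to check from the database
    domains = get_domains(cursor)
    # Chrome options
    chrome_options = Options()
    # Run headless (no visible browser window)
    chrome_options.add_argument('--headless')
    # Use the Chromium binary at this path
    chrome_options.binary_location = "/Applications/Chromium.app/Contents/MacOS/Chromium"
    # Create the Chrome instance. executable_path is the Selenium 3 style;
    # Selenium 4 passes the driver path through a Service object instead.
    driver = webdriver.Chrome(executable_path='/Users/hello/Desktop/chromedriver/chromedriver', options=chrome_options)
    # Give up on a page after 20 seconds
    driver.set_page_load_timeout(20)
    success_list = []
    try:
        for i in domains:
            try:
                # Request the site
                driver.get('https://' + i[0])
                # Collect information about the page
                http_length = len(driver.page_source)
                http_status = 'success'
                img_path = "/Users/hello/Desktop/py test/%s.png" % i[0]
                driver.get_screenshot_as_file(img_path)
                # Fall back to an empty string when the page has no title
                title = driver.title or ''
                success_list.append([i[0], title, http_length, img_path, http_status])
            except Exception:
                print('%s failed to respond' % i[0])
    finally:
        # Close the browser even if something unexpected goes wrong
        driver.quit()
    return success_list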

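# Query the domains from the database
def get_domains(cursor):
    # `数据库` ("database") is the author's placeholder for the real table name
    sql = "SELECT hostname FROM 数据库"
    cursor.execute(sql)
    domain_lists = cursor.fetchall()
    return domain_lists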

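# Write the reachable domains back to the database
def insert(cursor, rows, db):
    for i in rows:
        # Pass values as parameters so the driver escapes them,
        # instead of formatting them into the SQL text
        cursor.execute("SELECT id FROM 数据库 WHERE hostname = %s", (i[0],))
        result = cursor.fetchone()
        if result is None:
            # No matching row for this hostname; nothing to update
            continue
        update_sql = "UPDATE 数据库 SET page_title = %s, http_length = %s, page_jietu_path = %s, http_status = %s WHERE id = %s"
        cursor.execute(update_sql, (i[1], i[2], i[3], i[4], result[0]))
    db.commit()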

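if __name__ == "__main__":
    # 'account' and 'password' are placeholders for real credentials
    db = pymysql.connect(host='localhost', user='account', password='password', database='test')
    cursor = db.cursor()
    results = run(cursor)
    insert(cursor, results, db)
    db.close()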
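
Page titles often contain quotation marks, which break SQL text assembled with `%` string formatting and also leave it open to SQL injection. The sketch below shows the placeholder-based alternative. It uses the standard-library `sqlite3` module with an in-memory database purely so that it runs without a MySQL server; the `domains` table and the `save_result` and `demo` functions are hypothetical names. Note that `sqlite3` writes placeholders as `?`, while `pymysql` writes them as `%s`.

```python
import sqlite3


def save_result(conn, hostname, title, http_length):
    """Update one row, passing the values separately from the SQL text."""
    cur = conn.execute("SELECT id FROM domains WHERE hostname = ?", (hostname,))
    row = cur.fetchone()
    if row is None:
        # Unknown hostname: nothing to update
        return False
    conn.execute(
        "UPDATE domains SET page_title = ?, http_length = ? WHERE id = ?",
        (title, http_length, row[0]),
    )
    conn.commit()
    return True


def demo():
    """Store a title containing both quote characters and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domains ("
        "id INTEGER PRIMARY KEY, hostname TEXT, page_title TEXT, http_length INTEGER)"
    )
    conn.execute("INSERT INTO domains (hostname) VALUES (?)", ("example.com",))
    save_result(conn, "example.com", 'Bob\'s "home" page', 1234)
    return conn.execute("SELECT page_title, http_length FROM domains").fetchone()
```

The title in `demo` contains both a single and a double quote, which is the kind of value that would produce a syntax error if it were formatted directly into the statement.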

III. Results

IV. Summary

The program runs slowly, and my programming skills still need work.
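
One possible way to reduce the run time is to discard dead hosts cheaply before the browser is involved at all. The following is a minimal sketch and not part of the original program: it opens plain TCP connections in parallel using only the standard library. `can_connect`, `filter_reachable`, and the default port, timeout, and worker count are all assumptions chosen for illustration. A host that accepts a TCP connection may still fail to render a page, so this only narrows the list handed to selenium.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


def filter_reachable(hosts, port=443, timeout=3.0, workers=20):
    """Check hosts in parallel and keep those that accept a connection.

    The result preserves the order of the input.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda h: can_connect(h, port, timeout), hosts))
    return [h for h, ok in zip(hosts, flags) if ok]
```

With the rows returned by `get_domains`, a call such as `filter_reachable(row[0] for row in domains)` would yield the hostnames worth opening in the browser.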
