Searching an HTML file and returning links in Python: extracting information from HTML pages with BeautifulSoup


weixin_39941732


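This is a record of the Beautiful Soup assignment from the py4e course. I wrote the code myself, but it should really count as only half original.

The task: find the links on a web page, follow the one at a given position, and repeat this a given number of times.

Steps:

Import the required libraries

Ignore SSL errors

Open the site and use BS4 to extract the relevant content directly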
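The extraction step can be tried without any network access. The following is a minimal sketch that parses an inline HTML string instead of a downloaded page; the HTML snippet and the `example.com` URLs are made up for illustration and are not part of the assignment.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, standing in for what urlopen(...).read() returns
html = """
<html><body>
<a href="http://example.com/first.html">First</a>
<a href="http://example.com/second.html">Second</a>
<a name="no-href">Anchor without href</a>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# soup('a') is shorthand for soup.find_all('a')
links = [tag.get('href') for tag in soup('a')]
print(links)

# Positions in the assignment are 1-based, list indexes are 0-based
position = 2
print(links[position - 1])
```

Note that `tag.get('href')` returns `None` for an anchor with no `href` attribute, so the resulting list can contain `None` entries.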

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter url - ')
# url = 'http://py4e-data.dr-chuck.net/known_by_Elita.html'
position = int(input('enter position - '))
times = int(input('enter times - '))

for time in range(times):
    # The first pass opens the starting URL; later passes follow the link
    # found at the requested position on the previous page
    if time == 0:
        openurl = url
    else:
        openurl = get_urls[position-1]
    html = urllib.request.urlopen(openurl, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    # Collect the href of every anchor tag on the page
    tags = soup('a')
    get_urls = []
    for tag in tags:
        get_urls.append(tag.get('href', None))
    # position is 1-based, so subtract 1 to get the list index
    print(get_urls[position-1])
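The link-following loop can also be wrapped in a function so that it is testable without the network. The sketch below is an alternative, not the assignment's method: it swaps BeautifulSoup for the standard library's `html.parser`, resolves relative links with `urljoin`, and takes the page-fetching function as a parameter. The names `AnchorCollector`, `follow_links`, `fetch`, and the `PAGES` dictionary are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; href may be missing
            self.hrefs.append(dict(attrs).get('href'))


def follow_links(start_url, position, times, fetch):
    """Follow the link at 1-based `position`, `times` times; return the URLs visited."""
    visited = []
    url = start_url
    for _ in range(times):
        parser = AnchorCollector()
        parser.feed(fetch(url))  # fetch must return the page as a str
        hrefs = parser.hrefs
        if position < 1 or position > len(hrefs) or hrefs[position - 1] is None:
            raise ValueError(f'no usable link at position {position} on {url}')
        # Resolve relative hrefs against the page they were found on
        url = urljoin(url, hrefs[position - 1])
        visited.append(url)
    return visited


# Hypothetical in-memory "site" standing in for real pages
PAGES = {
    'http://example.com/a.html': '<a href="b.html">B</a><a href="c.html">C</a>',
    'http://example.com/b.html': '<a href="a.html">A</a><a href="c.html">C</a>',
    'http://example.com/c.html': '<a href="a.html">A</a><a href="b.html">B</a>',
}

path = follow_links('http://example.com/a.html', position=2, times=3,
                    fetch=PAGES.__getitem__)
print(path)
```

For real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u, context=ctx).read().decode()`, assuming the pages are UTF-8 encoded.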

2018.4.27
