Python code for parsing a web page and extracting its links

学魔学编程

Last modified: 2022-04-04 13:03:45

Category: python. Tags: python, web crawler

First published: 2021-09-19 10:15:21

Article link: https://blog.csdn.net/qq_59717525/article/details/120377867

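This article shows how to write a simple web crawler in Python that extracts links by parsing a page's HTML source. It fetches the page content with the requests library, parses the HTML with BeautifulSoup, and then collects and processes the links found on the page.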
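
The link-filtering step can be tried on its own, without any network access. The sketch below is only an illustration under stated assumptions: `absolute_http_links` is a hypothetical helper that does not appear in the original script, and it swaps the script's substring check for the standard library's `urllib.parse`, so relative links are resolved against the page URL rather than dropped and non-web schemes such as `javascript:` are skipped.

```python
from urllib.parse import urljoin, urlparse


def absolute_http_links(base_url, hrefs):
    """Return the http(s) links in hrefs, resolved against base_url.

    Hypothetical helper for illustration; not part of the original script.
    """
    links = []
    for href in hrefs:
        if not href:  # skip <a> tags without an href (None or empty string)
            continue
        full = urljoin(base_url, href.strip())  # resolve relative paths
        if urlparse(full).scheme in ("http", "https"):
            links.append(full)
    return links


# Sample href values of the kind a.get('href') can return.
sample = [
    "https://www.pcauto.com.cn/news/",
    "/price/",
    None,
    "javascript:void(0)",
    "//www.pcauto.com.cn/a.html",
]
for link in absolute_http_links("https://www.pcauto.com.cn/", sample):
    print(link)
```

With BeautifulSoup, the same helper could be fed the `href` of each tag returned by `find_all('a')`, for example `absolute_http_links(url, [a.get('href') for a in a_list])`.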

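# coding=utf-8
import os
import time
import winreg  # Windows-only standard-library module for reading the registry

import requests
from bs4 import BeautifulSoup

# Use the winreg module to look up the desktop path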


def desktop_path():
    """Return the current user's Desktop folder path from the Windows registry."""
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER,
                        r'Software\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders') as key:
        return winreg.QueryValueEx(key, "Desktop")[0]


ticks = time.time()
url = 'https://www.pcauto.com.cn/'
resp = requests.get(url)  # Request the PCauto home page (the URL above)
"""
print(resp) # Print the response object, which shows the status code
print(resp.content) # Print the raw source of the fetched page
"""
bsobj = BeautifulSoup(
    resp.content, 'html.parser')  # Build a BeautifulSoup object from the page source for easier handling
a_list = bsobj.find_all('a')  # Collect every <a> tag in the page
time1Str = "Parsing started. Start time: "+time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
str1 = ""
i = 0
for a in a_list:
    if 'http' in str(a.get('href')):
        i += 1