A First Taste of Python Web Crawling

予,人乐后飘零

Category: Python. Tags: Python, web crawler, requests, regular expressions, web links

Article link: https://blog.csdn.net/qq_32535455/article/details/113713466

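With some free time on my hands, I tried writing a small crawler that extracts the URL links from a web page. Without further ado, here is the code: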

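import re

import requests  # third-party HTTP library; install it first with: pip install requests

# Matches http(s) URLs made of a host followed by a path.
# A raw string keeps \w and \d from being read as string escapes.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+/[a-zA-Z0-9/.]+')


# Extract all URLs from a string
def Find(string):
    return URL_PATTERN.findall(string)


# Fetch the page content; return the URLs found, or None on failure
def Url(string):
    try:
        response = requests.get(string, timeout=10)  # returns a Response object
    except requests.RequestException:
        return None
    # Decode the body with the encoding guessed from its content
    response.encoding = response.apparent_encoding
    if response.status_code == 200:
        return Find(response.text)
    return None


# Read a URL from the user and start crawling
if __name__ == "__main__":
    string = input("Enter a URL that starts with http: ")
    result = Url(string)
    if result is None:
        print("No information was retrieved")
    else:
        print("URLs found:")
        print(result)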
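
To see what the regular expression actually picks up, it can help to run it on a small string without touching the network. The sketch below is only an illustration: `sample_html` is a made-up snippet, and the pattern follows the one in the script, written as a raw string with no commas in the path character class.

```python
import re

# Pattern modeled on the one in the script: scheme, host, then a path
# made of letters, digits, slashes and dots.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+/[a-zA-Z0-9/.]+')

# Hypothetical HTML snippet, invented for this example.
sample_html = '''
<a href="https://example.com/docs/index.html">Docs</a>
<a href="http://example.org/a/b?page=2">Query string</a>
<a href="https://example.net">No path</a>
<a href="/about.html">Relative link</a>
'''

matches = URL_PATTERN.findall(sample_html)
print(matches)
# ['https://example.com/docs/index.html', 'http://example.org/a/b']
```

The output shows the limits of this approach: the query string is cut off at `?`, a URL with no path after the host is skipped because the pattern requires a `/`, and relative links are not matched at all. For a first experiment that is probably fine, but it is worth knowing before relying on the results.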

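Summary: writing Python code is quite comfortable. There are not many restrictions, and you can write things pretty much however you like.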
