Crawling a website's home page: procedural vs. object-oriented code in Python

菜小饼

Category: Crawlers | Tags: python

Article link: https://blog.csdn.net/caism8877/article/details/105558666

小遛 home page crawler

import requests

# Procedural implementation
url = "https://www.xlgxapp.com"
headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"}

# Pass headers by keyword: the second positional argument of requests.get is params
response = requests.get(url, headers=headers)
response.encoding = "utf-8"
print(response.text)

51job home page crawler
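# Object-oriented implementation
import requests


class spider():
    def __init__(self):
        # The headers are stored once on the instance and reused by every request
        self.headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"}

    def submit(self, url):
        # Pass headers by keyword: the second positional argument of requests.get is params
        response = requests.get(url, headers=self.headers)
        response.encoding = "gbk"
        return response.text


if __name__ == "__main__":
    obj = spider()
    res = obj.submit("https://www.51job.com")
    print(res)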
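
One practical difference between the two styles is reuse: the class keeps the headers in one place, so a single object can fetch several sites. The sketch below is one possible generalization of the class above, not the author's code. The names `PageSpider`, `build_request`, and `USER_AGENT`, the `encoding` parameter, and the 10-second timeout are all assumptions made for illustration.

```python
import requests

# Same browser User-Agent string as in the examples above
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
)


class PageSpider:
    """Hypothetical generalization of the spider class (illustration only)."""

    def __init__(self, user_agent=USER_AGENT, timeout=10):
        self.headers = {"user-agent": user_agent}
        self.timeout = timeout  # seconds; an assumed default

    def build_request(self, url):
        # Build the GET request without sending it, so it can be inspected
        return requests.Request("GET", url, headers=self.headers).prepare()

    def submit(self, url, encoding="utf-8"):
        # Send the prepared request and decode the body with the given encoding
        with requests.Session() as session:
            response = session.send(self.build_request(url), timeout=self.timeout)
        response.encoding = encoding
        return response.text
```

With this shape, `PageSpider().submit("https://www.xlgxapp.com")` and `PageSpider().submit("https://www.51job.com", encoding="gbk")` would cover both crawlers in this post with one class, whereas the procedural version has to be copied and edited for each site.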

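Open the page (for example www.51job.com) in Google Chrome, click the menu in the top-right corner, choose More tools > Developer tools, and reload the page.
In the Network tab, scroll to the top, click the entry for the page address, and open Headers: the request method is shown as GET. After you log in as a user, the request method is shown as POST.
To get the URL and the headers for the crawler: in Network > Headers, scroll to the bottom to find the User-Agent.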
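
The reason for copying the User-Agent from the browser can be checked from Python without sending anything. This is a minimal sketch, assuming the `requests` library used above; the `preview` helper and the `https://example.com/login` URL with its form field are hypothetical and are not the real 51job login endpoint.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"
)


def preview(method, url, headers=None, data=None):
    """Prepare a request without sending it; return (method, User-Agent)."""
    with requests.Session() as session:
        prepared = session.prepare_request(
            requests.Request(method, url, headers=headers, data=data)
        )
    return prepared.method, prepared.headers["User-Agent"]


# Without custom headers, requests identifies itself as python-requests/<version>
print(preview("GET", "https://www.51job.com"))

# With the header copied from DevTools, the request looks like the browser's
print(preview("GET", "https://www.51job.com", headers={"user-agent": BROWSER_UA}))

# A form submission such as a login is sent as POST (hypothetical URL and field)
print(preview("POST", "https://example.com/login",
              headers={"user-agent": BROWSER_UA}, data={"user": "example"}))
```

After a real request, `response.request.method` and `response.request.headers` show the same information for what was actually sent, which can be compared against the Headers panel in the browser.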
