Web Scraping Self-Study, Day 1: Fetching the Sogou Homepage with the requests Module

_Skylar_

Column: Web Scraping | Tags: web scraping

Article link: https://blog.csdn.net/You1022/article/details/109460387

Source: Bilibili tutorial video BV1VV411m78j

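How to use it (the coding workflow of the requests module):

1. Specify the URL
2. Send the request
3. Get the response data
4. Persist the data to storage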

Code:

# Fetch the page data of the Sogou homepage
import requests

if __name__ == "__main__":
    # 1. Specify the URL
    url = 'https://www.sogou.com/'
    # 2. Send the request: get() returns a Response object
    response = requests.get(url=url)
    # 3. Get the response data: .text is the response body decoded as a string
    page_text = response.text
    print(page_text)
    # 4. Persist the data to a local file
    with open('./sougou.html', 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print('Finished scraping')
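The script above is the bare four-step workflow. Below is a slightly more defensive sketch of the same steps. It assumes a browser-like User-Agent string (the value is illustrative and not taken from the tutorial), a 10-second timeout, and the hypothetical helper names `fetch_page` and `save_page`. It also covers one encoding pitfall: when a `text/*` response declares no charset, requests falls back to ISO-8859-1, which garbles Chinese text. In that case the sketch uses `response.apparent_encoding` instead.

```python
import requests

# Assumption: an illustrative browser-like User-Agent. Some sites return
# different content to, or refuse, the default python-requests User-Agent.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
}


def fetch_page(url, timeout=10):
    """Steps 1-3 (hypothetical helper): request `url`, return the decoded page text."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of saving an error page
    response.raise_for_status()
    # With no declared charset, requests assumes ISO-8859-1 for text/* types;
    # guess the encoding from the body instead so Chinese text decodes correctly
    if "charset" not in response.headers.get("Content-Type", "").lower():
        response.encoding = response.apparent_encoding
    return response.text


def save_page(text, path):
    """Step 4 (hypothetical helper): persist the page text as UTF-8."""
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(text)
```

For example: `save_page(fetch_page('https://www.sogou.com/'), './sougou.html')`. Because the sketch raises on HTTP errors, a blocked or missing page fails loudly instead of being written to disk as if it were the homepage.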

The scraped data, printed as a string (excerpt):

<!DOCTYPE html><html lang="cn"><head><meta name="viewport" content="width=device-width,minimum-scale=1,maximum-scale=1,user-scalable=no"><script>window._speedMark = new Date();  window.lead_ip = '140.243.16.165';
    window.now = 1604332613712;</script><script type="text/javascript">/*file=static/js/resourceErrorReport.js*/!function(a){var n=(new Date).getTime(),r=a.location.protocol;function c(e,t){var o=(new Date).getTime()-n;(new Image).src=["//pb.sogou.com/pv.gif?uigs_productid=wapapp&type=resource-error&stype=",e,"&timestamp=",o,"&protocol=",r,"&host=",encodeURIComponent(a.location.host),"&path=",encodeURIComponent(a.location.pathname),"&resource=",encodeURIComponent(t)].join("")}function e(e){if((e=e||a.event)&&"error"===e.type){var t=e.srcElement?e.srcElement:e.target;if(t){var o,n,r=t.tagName;"LINK"===r?(n="css",(o=t.getAttribute("href"))&&o.match(/\.css($|\?)/)&&c(n,o)):"SCRIPT"===r&&(n="js",(o=t.getAttribute("src"))&&o.match(/\.js($|\?)/)&&c(n,o))}}}r&&(r=r.substring(0,r.length-1)),a.addEventListener?a.addEventListener("error",e,!0):a.attachEvent&&a.attachEvent("onerror",e)}(window);</script><meta charset="utf-8"><link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dlweb.sogoucdn.com"><title>搜狗搜索引擎 - 上网从搜狗开始</title><link rel="shortcut icon" href="/images/logo/new/favicon.ico?v=4" type="image/x-icon"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="搜狗搜索"><meta name="keywords" content="搜狗搜索,网页搜索,微信搜索,视频搜索,图片搜索,音乐搜索,新闻搜索,软件搜索,问答搜索,百科搜索,购物搜索"><meta name="description" content="搜狗搜索是全球第三代互动式搜索引擎，支持微信公众号和文章搜索、知乎搜索、英文搜索及翻译等，通过自主研发的人工智能算法为用户提供专业、精准、便捷的搜索服务。"><link
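`response.text` is the body already decoded to a `str` using `response.encoding`, which is why the output above prints as a string. `response.content` holds the raw bytes the server sent. Writing those bytes in binary mode keeps the saved file byte-for-byte identical to the original, so the page's own `<meta charset>` declaration stays accurate. The same pattern is used for images and other binary files. The following is a minimal sketch with the hypothetical helper names `fetch_bytes` and `save_bytes`:

```python
import requests


def fetch_bytes(url, timeout=10):
    """Hypothetical helper: return the raw response body (response.content) as bytes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
    return response.content


def save_bytes(data, path):
    """Hypothetical helper: write the body exactly as received, with no decode/encode round trip."""
    with open(path, "wb") as fp:  # binary mode takes no `encoding` argument
        fp.write(data)
```

Choosing `.text` or `.content` comes down to what happens next. Use `.text` when you want to process the page as a string in Python. Use `.content` when you only need to store the page unchanged.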

The resulting sougou.html file (screenshot not preserved).
