Crawling a Web Page with Python and Saving It Locally

Latest recommended article published on 2024-07-27 03:55:04

视觉闫小亘

Category: Python Web Crawlers. Tags: Python, crawler, local storage

Article link: https://blog.csdn.net/two_ye/article/details/94162401

This post uses Python to fetch a web page and save it to a local directory.

Note: the page crawled in this post is a Sina news page (mil.news.sina.com.cn).

The code is as follows:

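import urllib.request


def getHTML(url):
    # read() returns the raw response body as bytes
    with urllib.request.urlopen(url) as response:
        return response.read()


def saveHTML(file_name, file_content):
    # Mind the characters Windows forbids in file names, such as /
    with open(file_name.replace('/', '_') + ".html", "wb") as f:
        # The file is opened in binary mode and the content is already bytes,
        # so it is written as-is with no encoding step
        f.write(file_content)


aurl = "https://mil.news.sina.com.cn/2019-06-27/doc-ihytcerk9733591.shtml"

html = getHTML(aurl)
print("Page fetched")

saveHTML("sina", html)
print("Page saved locally")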
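The save step above only replaces `/`, but Windows forbids several other characters in file names as well. The following is a minimal sketch of a more thorough version, not part of the original code: `safe_file_name` and `save_html` are hypothetical helper names, and the `directory` parameter is an assumption added so the output location can be chosen.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names: \ / : * ? " < > |
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def safe_file_name(name, replacement="_"):
    """Hypothetical helper: replace every Windows-forbidden character."""
    return _FORBIDDEN.sub(replacement, name)


def save_html(file_name, file_content, directory="."):
    """Write the page bytes to <directory>/<safe name>.html and return the path."""
    path = Path(directory) / (safe_file_name(file_name) + ".html")
    # file_content is expected to be bytes, as returned by read()
    path.write_bytes(file_content)
    return path
```

Calling `save_html("sina", html)` would then behave like the original `saveHTML("sina", html)`, while a name such as `"news/2019"` would be stored as `news_2019.html`.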

The target page is shown below (image: the crawled web page).
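In practice a request can fail or hang, and some sites reject the default `urllib` User-Agent. As a rough sketch under those assumptions, the fetch step could be made more defensive as shown here. The name `fetch_html`, the 10-second timeout, and the `Mozilla/5.0` User-Agent value are illustrative choices, not something the original code uses.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    # The User-Agent value is only an example; adjust it as needed
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as err:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too
        print("Request failed:", err)
        return None
```

With this variant the caller can check for `None` before saving, for example `html = fetch_html(aurl)` followed by `if html is not None: saveHTML("sina", html)`.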