Python Web Scraping Study Notes (Part 1)

Latest recommended article published on 2024-07-17 14:00:00

Nyte2018

Article tags: python, web crawler, web data collection

Article link: https://blog.csdn.net/Nyte2018/article/details/88713447

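This post is the author's study notes on web scraping with Python. It introduces making a network connection in Python, fetching a page's HTML with the urllib.request module, and the role of the BeautifulSoup library in parsing HTML structure. It also discusses how to handle network connection exceptions and tag lookup problems in BeautifulSoup.

(Summary auto-generated by CSDN.)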

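Python has become one of the mainstream programming languages, and at our age learning a bit more is always worthwhile. I often feel that I drift through my days without achieving much. Graduate life may not be exciting, but I can't give up on myself. From now on I will study something new every day and post my study notes as a record to keep myself motivated. Everyone is welcome to work hard along with me.
I learned some Python basics earlier from Python Crash Course (《Python编程：从入门到实践》). Baidu Cloud link: https://pan.baidu.com/s/1CL7qy7fSmcjaUQfz3DhDjQ extraction code: nkkk
I am now learning web scraping with Python, using the book Web Scraping with Python (《python网络数据采集》). Baidu Cloud link: https://pan.baidu.com/s/1SMxVqjM7aU7BBmGIn3CYtQ extraction code: ekuj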

1. Network connection

First, look at Code 1 below:

from urllib.request import urlopen
# Request the page; urlopen returns a response object
html = urlopen("http://pythonscraping.com/pages/page1.html")
# read() returns the raw page body as bytes
print(html.read())

The output is:

b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
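The leading b in the output shows that read() returns bytes rather than a string, and a request like Code 1 can also fail if the server is unreachable or returns an error status. A minimal sketch of one way to handle both is given here. The helper name fetch_html, the 10-second timeout, and the choice of UTF-8 are my own assumptions for illustration, not something the notes specify.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    fetch_html is a hypothetical helper; the timeout and the UTF-8
    decoding are assumptions, not taken from the original notes.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            raw = response.read()  # bytes, like the b'...' output above
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"Server returned an error status: {err.code}")
        return None
    except URLError as err:
        # The server could not be reached at all (bad host, no network, ...)
        print(f"Could not reach the server: {err.reason}")
        return None
    # Decode bytes to str; undecodable bytes are replaced instead of raising
    return raw.decode("utf-8", errors="replace")
```

Calling fetch_html("http://pythonscraping.com/pages/page1.html") and printing the result, after checking that it is not None, should show the same HTML as above but as ordinary text with real line breaks instead of \n escapes.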
