Python Web Scraping Basics: Study Notes

myhuisir

Article link: https://blog.csdn.net/weixin_41945913/article/details/119655804


1. Fetching a web page with Python

from urllib.request import urlopen

# read() returns bytes; decode them as UTF-8 so non-ASCII text (e.g. Chinese) displays correctly
html = urlopen(
    "https://mofanpy.com/static/scraping/basic-structure.html"
).read().decode('utf-8')
print(html)
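
The fetch step above hard-codes UTF-8. As a minimal sketch, assuming the server may declare a different charset in its Content-Type header, the download can be wrapped in a small helper. The names `decode_body` and `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the original notes.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None):
    """Decode response bytes; fall back to UTF-8 when no charset is declared."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url, timeout=10):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # The charset declared in the Content-Type header, or None if absent
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Usage (requires network access):
# html = fetch_html("https://mofanpy.com/static/scraping/basic-structure.html")
```

Using `errors="replace"` means a stray undecodable byte does not abort the whole download, which is usually acceptable for a first exploratory scrape.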

2. Matching page content
**2.1 Regular expressions**

## Regular expressions
import re
res = re.findall(r"<title>(.+?)</title>", html)
print("\nPage title is: ", res[0])
## The paragraph spans several lines in the HTML (it contains tabs and newlines), so pass flags=re.DOTALL to let "." match newline characters as well.
res = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)    # re.DOTALL if multi line
print("\nPage paragraph is: ", res[0])
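
To see what `re.DOTALL` changes, the following sketch runs the same paragraph pattern with and without the flag. The `sample` string is a made-up stand-in for the downloaded page, so the snippet runs without network access.

```python
import re

# Hypothetical sample HTML: the paragraph body contains newlines and tabs
sample = "<title>Demo</title>\n<p>\n\tfirst line\n\tsecond line\n</p>"

# Without DOTALL, "." stops at newlines, so a multi-line paragraph is not matched
without_flag = re.findall(r"<p>(.*?)</p>", sample)

# With DOTALL, "." also matches newlines, so the whole paragraph is captured
with_flag = re.findall(r"<p>(.*?)</p>", sample, flags=re.DOTALL)

print(without_flag)
print(with_flag)

# Collapse the captured tabs and newlines into single spaces for display
paragraph = " ".join(with_flag[0].split())
print(paragraph)
```

Tabs are matched by `.` even without the flag; only the newline characters require `re.DOTALL`.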
