11. BeautifulSoup Basics

Latest recommended article published on 2021-07-30 17:08:52

zmjames2000

Categories: Web Scraping, Python. Tags: python, BeautifulSoup, xpath, regex

Article link: https://blog.csdn.net/zmjames2000/article/details/100690828

Regular expressions, XPath, and BeautifulSoup are interchangeable for this job: each one extracts data from a page.
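
To make the equivalence concrete, here is a minimal sketch that pulls the same title out of a small page twice, once with a regular expression and once with BeautifulSoup. The inline HTML string and the variable names are assumptions made for illustration only; XPath is left out because it needs an extra library such as lxml.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for this illustration.
html = "<html><head><title>hello</title></head><body><a class='link' href='/x'>x</a></body></html>"

# Regular expression: capture the text between <title> and </title>.
match = re.search(r"<title>(.*?)</title>", html, re.S)
title_by_regex = match.group(1) if match else None

# BeautifulSoup: parse the document, then read the tag's string.
soup = BeautifulSoup(html, "html.parser")
title_by_bs = soup.title.string

print(title_by_regex, title_by_bs)
```

Both approaches return the same text here. The regular expression depends on the exact markup, while BeautifulSoup works on the parsed tree.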

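from bs4 import BeautifulSoup as bsf
import urllib.request

# 'http://xxxx.com' is a placeholder; urlopen needs a URL that includes a scheme
data = urllib.request.urlopen('http://xxxx.com').read().decode('utf-8', 'ignore')
bs = bsf(data, 'html.parser')  # name the parser explicitly to avoid a warning
print(bs.prettify())  # pretty-printed output

bs.title  # bs.<tag name> returns the first matching tag, e.g. <title>hello</title>
bs.title.name  # 'title'
bs.title.string  # 'hello'

bs.a.attrs  # dict of all attributes of the first <a>
bs.a["class"]  # like bs.a.get("class"): the xxx in class="xxx", returned as a list

bs.find_all('a')
bs.find_all(['a', 'u'])  # all <a> and <u> nodes

k1 = bs.ul.contents  # list of direct children
k2 = bs.ul.children  # iterator over the same children
children = list(k2)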
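
The calls above can be tried without a network connection. The following is a minimal sketch that runs them against an inline page; the HTML string and every variable name are assumptions standing in for the downloaded data, not part of the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; it stands in for the downloaded data.
html = """
<html><head><title>hello</title></head>
<body>
<a class="nav" href="/home">Home</a>
<u>underlined</u>
<ul><li>one</li><li>two</li></ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title_name = soup.title.name        # tag name of <title>
title_text = soup.title.string      # text inside <title>
link_attrs = soup.a.attrs           # dict of all attributes on the first <a>
link_class = soup.a.get("class")    # class is multi-valued, so this is a list
tags = [t.name for t in soup.find_all(["a", "u"])]  # matches in document order
items = [li.string for li in soup.ul.children]      # direct children of <ul>

print(title_name, title_text, link_attrs, link_class, tags, items)
```

If the `<li>` elements were written on separate lines, `.contents` and `.children` would also include the whitespace between them as text nodes, so real pages often need filtering.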
