A Simple Python Crawler and Basic BeautifulSoup Usage

Latest recommended article published 2024-05-07 01:00:00

qq_33930703

I mainly learned this from:

http://www.jb51.net/article/65287.htm

Scraping download links from 电影天堂 (dytt8.net); the code collects the ftp:// link from each movie page

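# -*- coding: utf-8 -*-
import re
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://www.dytt8.net/'

# Fetch the index page and decode it as GBK, skipping any bytes that are not valid GBK.
raw = urllib.request.urlopen(urljoin(BASE_URL, 'index.htm')).read()
html = raw.decode('gbk', errors='ignore')

soup = BeautifulSoup(html, "html.parser")

# Collect every element whose href points at a 2017 movie detail page.
links = soup.find_all(href=re.compile('/html/gndy/dyzz/2017'))

for link in links:
    # urljoin avoids the double slash that plain string concatenation produces.
    page_url = urljoin(BASE_URL, link['href'])
    html2 = urllib.request.urlopen(page_url).read().decode('gbk', errors='ignore')
    soup2 = BeautifulSoup(html2, "html.parser")
    # On the detail page, look for links whose href contains ftp://
    data = soup2.find_all(href=re.compile('ftp://'))
    if data:  # skip pages without an ftp link instead of raising IndexError
        print(data[0]['href'])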

Creating a BeautifulSoup object

soup = BeautifulSoup(html, "html.parser")
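
The constructor also accepts raw bytes, which can save the manual decode step used in the crawler. The sketch below assumes a made-up GBK-encoded snippet (not taken from the real site) and uses the `from_encoding` argument to tell BeautifulSoup how to decode it. It also checks that the result of find_all() behaves like a list.

```python
from bs4 import BeautifulSoup

# Hypothetical GBK-encoded markup, made up for illustration only.
raw_bytes = '<html><head><title>电影</title></head></html>'.encode('gbk')

# Pass the parser explicitly and tell BeautifulSoup which encoding to try first.
soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding="gbk")

title_text = soup.title.string      # text inside <title>
titles = soup.find_all("title")     # list-like collection of matching tags
print(title_text, len(titles))
```

Naming the parser keeps the result the same across machines, since BeautifulSoup otherwise picks whichever parser happens to be installed.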

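find_all() usage: it returns a list

A. The name argument: finds every tag whose name matches name; string objects (text nodes) are ignored automatically

1. Pass a string

2. Pass a regular expression

3. Pass a list

4. Pass True (I have not used this)

5. Pass a function (I have not used this)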
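
The five kinds of name filter listed above can be sketched as follows. The sample markup and the helper `has_id` are made up for illustration and are not from the original post.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration only.
SAMPLE = """
<html><head><title>Demo</title></head>
<body>
<p class="intro">Hello <b>world</b></p>
<a href="/a.html" id="link1">first</a>
<a href="/b.html" id="link2">second</a>
</body></html>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# 1. String: exact tag name.
by_string = [t.name for t in soup.find_all("a")]

# 2. Regular expression: tag names are searched with the pattern,
#    here every name that starts with "b".
by_regex = [t.name for t in soup.find_all(re.compile("^b"))]

# 3. List: any tag whose name appears in the list.
by_list = [t.name for t in soup.find_all(["title", "b"])]

# 4. True: every tag in the document, but no text strings.
by_true = [t.name for t in soup.find_all(True)]

# 5. Function: called once per tag; tags for which it returns True are kept.
def has_id(tag):  # hypothetical helper
    return tag.has_attr("id")

by_function = [t["id"] for t in soup.find_all(has_id)]

print(by_string, by_regex, by_list, by_true, by_function)
```

Results come back in document order, which is why `body` appears before `b` in the regular-expression case.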

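B. Keyword arguments: a keyword argument that is not one of find_all()'s built-in parameter names is treated as a tag attribute with that name and used as a filter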
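
A minimal sketch of attribute filtering with keyword arguments, using made-up markup and example.com URLs as assumptions. It shows an exact match, a regular expression (the same pattern the crawler uses for href), the `class_` spelling needed because `class` is a Python keyword, and the `attrs` dictionary form.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration only.
SAMPLE = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="ftp://example.com/file.mkv" id="link3">download</a>
</p>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Exact match on the id attribute.
by_id = soup.find_all(id="link2")

# Regular expression on the href attribute.
by_href = soup.find_all(href=re.compile("^ftp://"))

# class is a reserved word in Python, so the keyword is spelled class_.
by_class = soup.find_all("a", class_="sister")

# The same kind of filter expressed as a dictionary.
by_attrs = soup.find_all(attrs={"id": "link1"})

print(by_id, by_href, by_class, by_attrs)
```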

C. The text argument: searches the document's strings, so it returns strings rather than tags

# Assumes soup was built from the "three sisters" sample document in the Beautiful Soup documentation.
soup.find_all(text="Elsie")
# ['Elsie']

soup.find_all(text=["Tillie", "Elsie", "Lacie"])
# ['Elsie', 'Lacie', 'Tillie']

soup.find_all(text=re.compile("Dormouse"))
# ["The Dormouse's story", "The Dormouse's story"]
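
Since the results are strings rather than tags, getting back to the enclosing tag takes one more step. The sketch below uses a made-up snippet and the `string` argument, which is the name Beautiful Soup 4.4.0 and later use for what older versions call `text`.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical sample markup, made up for illustration only.
SAMPLE = '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(SAMPLE, "html.parser")

# Searching by string alone returns NavigableString objects, not tags.
matches = soup.find_all(string="Elsie")
first = matches[0]
is_string = isinstance(first, NavigableString)

# .parent walks back up to the tag that contains the string.
parent_name = first.parent.name
parent_id = first.parent["id"]

# Combining a tag name with string= returns the tags whose .string matches.
tags = soup.find_all("a", string="Lacie")

print(is_string, parent_name, parent_id, tags)
```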
