Scraping with Python's requests library, extracting with the re library, and using the os library (downloading the wrecking_Ball song)

Latest recommended article published on 2024-04-27 00:44:19

不羁_神话

Category: Python web scraping

Article link: https://blog.csdn.net/weixin_43408020/article/details/115291885


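Today I'm continuing the series with a walkthrough of scraping wrecking_Ball. I had too much on my plate yesterday and did not post, so I'm making up for it now.
Some of my followers wanted to know how to scrape audio with the requests library, so I'll publish a few more articles for reference. Everyone is welcome to read them, and I hope they help.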

The complete code is as follows:

import os
import random
import re

import requests


def spyder3():  # wrecking ball
    headers1 = {  # request headers
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.57'
    }
    headers2 = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0'
    }
    headers3 = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'
    }
    headers_ls = [headers1, headers2, headers3]  # list of candidate request headers
    url = 'http://www.333ttt.com/up/yy1092490.html'
    # Pick one set of headers at random so requests look less uniform (anti-scraping measure)
    headers = random.choice(headers_ls)
    response = requests.get(url, headers=headers)
    html = response.text  # body of the response, i.e. the page source
    print(html)
    href = re.findall('<a href="(.*?)"', html)  # every link on the page
    music = ''  # holds the audio link
    music_name = re.findall('<meta property="og:site_name" content="(.*?)" />', html)[0]
    print(music_name)
    for music_href in href:
        if 'mp3' in music_href:
            # Keep the first mp3 link; concatenating several links would give an invalid URL
            music = music_href
            break
    print(music)
    os.makedirs('F:/music', exist_ok=True)  # create a music folder on drive F if it does not exist yet
    Music = requests.get(music, headers=headers)  # send a GET request to the audio link
    with open('F:/music/{}.mp3'.format(music_name), 'wb') as f:  # write the audio file in binary mode
        f.write(Music.content)


spyder3()  # call the function
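
The random choice of request headers in the function above can be pulled out into a small helper. This is only a minimal sketch and not part of the original script: the names `USER_AGENTS` and `pick_headers` are made up for illustration, and the strings are the three User-Agent values already used above.

```python
import random

# The three User-Agent strings from the script above.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.57',
    'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0',
]


def pick_headers():
    """Return a headers dict with a randomly chosen User-Agent (hypothetical helper)."""
    return {'User-Agent': random.choice(USER_AGENTS)}


headers = pick_headers()
print(headers['User-Agent'])
```

The returned dict can be passed as `headers=` to `requests.get`, so each run may present a different browser identity.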

The results of the run are shown below:
Extracting the links with the regular expression:
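
To see what the link-extraction step does without touching the network, here is a small sketch run on a made-up page fragment. The HTML in `sample_html` and the `example.com` URL are assumptions for illustration; the real page's markup may differ.

```python
import re

# Hypothetical page fragment; the real page's markup may differ.
sample_html = '''
<a href="/index.html">Home</a>
<a href="http://example.com/audio/song.mp3">Download</a>
<a href="/about.html">About</a>
'''

# Non-greedy capture of every href value, the same pattern the script uses.
hrefs = re.findall(r'<a href="(.*?)"', sample_html)

# Keep only the links that mention mp3.
mp3_links = [h for h in hrefs if 'mp3' in h]

print(hrefs)
print(mp3_links)
```

The non-greedy `(.*?)` stops at the first closing quote, so each match is a single href value rather than everything up to the last quote on the line.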

Extracting the song name with the regular expression:
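
The name extraction can be sketched the same way. The helpers `extract_name` and `safe_filename` are hypothetical additions, not part of the original script: the first avoids an `IndexError` when the meta tag is missing, and the second replaces characters that Windows does not allow in file names, since the name ends up in a path on drive F. The sample tag content is invented.

```python
import re

# Hypothetical page head; the real page's markup may differ.
sample_html = '<meta property="og:site_name" content="Wrecking Ball: Live?" />'

NAME_PATTERN = r'<meta property="og:site_name" content="(.*?)" />'


def extract_name(html, default='unknown'):
    """Return the first og:site_name value, or `default` when the tag is absent."""
    match = re.search(NAME_PATTERN, html)
    return match.group(1) if match else default


def safe_filename(name):
    """Replace characters that Windows forbids in file names with '_'."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


name = extract_name(sample_html)
print(name)
print(safe_filename(name))
print(extract_name('<html></html>'))
```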

Next, go to drive F and open the folder; the music we scraped is inside:
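
The folder handling can also be checked from Python instead of the file explorer. In this minimal sketch a temporary directory stands in for `F:/music` so it runs on any machine, the bytes are fake, and `save_audio` is a hypothetical helper name.

```python
import os
import tempfile


def save_audio(data, folder, name):
    """Write `data` (bytes) to folder/name.mp3, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)  # unlike os.mkdir, no error if the folder exists
    path = os.path.join(folder, name + '.mp3')
    with open(path, 'wb') as f:
        f.write(data)
    return path


# A temporary directory stands in for F:/music so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'music')
    path = save_audio(b'fake audio bytes', folder, 'wrecking_ball')
    saved = os.listdir(folder)      # names of the files now in the folder
    size = os.path.getsize(path)    # size of the written file in bytes

print(saved)
print(size)
```

Using `os.makedirs(..., exist_ok=True)` means the script can be run a second time without failing because the folder is already there.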

Click the file to play it:

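The whole track plays, so we're done! I'm listening to Wrecking Ball right now, and it's a treat. If you run into problems while scraping, feel free to get in touch and discuss them with me.
Finally, thank you for reading my article. It may well contain shortcomings; please point them out and bear with me.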