Scraping song titles and links for a NetEase Cloud Music artist with requests and selenium

Latest recommended article published on 2021-05-26 17:26:41

lingyuncelia

Tags: python, xpath, web scraping

Article link: https://blog.csdn.net/lingyuncelia/article/details/115022868

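This post describes how to use requests and selenium to scrape the names and links of all songs by a given artist on NetEase Cloud Music. The author warns that, because of the "#" in the browser URL, XPath may not be able to work against the correct request URL directly. After adjusting the code, the details of many songs, including "富士山下" and "好久不见", were retrieved successfully.

(Summary generated automatically by CSDN.)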

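Press F12 to open the developer tools.
Press Ctrl+U to view the page source.
XPath cannot extract anything that does not exist in the page source. For example, run the following code: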

import time

import requests
from lxml import etree


def open_url(s):
    url = "https://music.163.com/#/artist?id=2116"
    res = requests.get(url)
    time.sleep(1)
    r = res.text
    selector = etree.HTML(r)
    # /text() extracts the text content
    # x = "(//span[@class='txt']//b)[{}]/text()".format(str(s))  # output: ['${soil(x.name)}']
    # @ extracts an attribute
    x = "(//span[@class='txt']//a)[{}]/@href".format(str(s))  # output: ['/song?id=${x.id}']
    a = selector.xpath(x)
    print(a)


open_url(1)
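The same effect can be reproduced without touching the network. The sketch below is only an illustration: it swaps lxml for the standard library's xml.etree.ElementTree (which supports a limited XPath subset) so that it has no dependencies, and RAW_SOURCE and RENDERED_DOM are made-up fragments. The first imitates the unrendered template the code above received; the second imitates what the browser shows after JavaScript has filled it in. The song id 12345 and the title "Example Song" are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: what the server sends (template placeholders not yet filled in)
RAW_SOURCE = """
<div>
  <span class="txt"><a href="/song?id=${x.id}"><b>${soil(x.name)}</b></a></span>
</div>
"""

# Hypothetical fragment: what the browser shows after JavaScript has rendered the template
RENDERED_DOM = """
<div>
  <span class="txt"><a href="/song?id=12345"><b>Example Song</b></a></span>
</div>
"""


def first_song(markup):
    """Return (href, title) of the first song link found in the markup."""
    root = ET.fromstring(markup.strip())
    link = root.find(".//span[@class='txt']//a")
    return link.get("href"), link.find("b").text


print(first_song(RAW_SOURCE))    # the placeholders come back verbatim
print(first_song(RENDERED_DOM))  # real values exist only in the rendered DOM
```

The query is identical in both calls; only the input differs. XPath can return nothing more than what the parsed text contains.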

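Note the "#" in the URL: the browser address bar shows a "#", but the Request URL in the developer tools does not. I suspect the developers did this on purpose.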
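The part of a URL after "#" is the fragment, which the browser keeps to itself and does not send to the server. That is why the Request URL has no "#". A small sketch with the standard library's urllib.parse makes this visible. The helper to_request_url is a hypothetical name, and it assumes the fragment carries the real path and query, which matches the artist URL in this post but is not guaranteed for every page on the site.

```python
from urllib.parse import urlsplit


def to_request_url(browser_url):
    """Turn a '#/path?query' style browser URL into the URL that is actually requested."""
    parts = urlsplit(browser_url)
    if parts.fragment.startswith("/"):
        # Assumption: the fragment holds the real path and query, as in the artist URL above
        return f"{parts.scheme}://{parts.netloc}{parts.fragment}"
    return browser_url


parts = urlsplit("https://music.163.com/#/artist?id=2116")
print(parts.path)      # /
print(parts.fragment)  # /artist?id=2116
print(to_request_url("https://music.163.com/#/artist?id=2116"))
# https://music.163.com/artist?id=2116
```

Requesting the address-bar URL as it stands therefore asks the server for "/" only, not for the artist page.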
Comment out 'Accept-Encoding': 'gzip, deflate, br'; otherwise the response text comes back garbled.
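A likely cause of the garbling is the "br" entry: urllib3, which requests uses underneath, can only decode Brotli-compressed responses when the brotli or brotlicffi package is installed. Leaving the header out, as the post does, is the simplest fix. If the header has to be set by hand, a sketch like the following advertises only what the local environment can decode. The function name accept_encoding is hypothetical.

```python
import importlib.util


def accept_encoding():
    """Return an Accept-Encoding value that only lists codecs this environment can decode."""
    encodings = ["gzip", "deflate"]
    # urllib3 (used by requests) decodes "br" only when brotli or brotlicffi is importable
    if any(importlib.util.find_spec(name) for name in ("brotli", "brotlicffi")):
        encodings.append("br")
    return ", ".join(encodings)


headers = {"accept-encoding": accept_encoding()}
print(headers)
```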

import requests
from bs4 import BeautifulSoup

url = "https://music.163.com/artist?id=2116"
headers = {
    'authority': 'music.163.com',
    'method': 'GET',
    'path': '/artist?id=2116',
    'scheme': 'https',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'accept-encoding': 'gzip,deflate,br',
    'accept-language': 'zh-CN,zh;q=0.9',
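The listing above is cut off in the source. What follows is not the author's code but a minimal sketch of the same idea under stated assumptions: fetch the "#"-free URL, then collect every link whose href looks like /song?id=<digits>. It uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free for the parsing step. The names SongLinkParser, parse_songs and fetch_artist_page are hypothetical, and the User-Agent and Referer values are assumptions, since the post's full header set is not visible.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://music.163.com"
SONG_HREF = re.compile(r"/song\?id=\d+")  # skips unrendered '/song?id=${x.id}' placeholders


class SongLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for <a> tags that point at a song."""

    def __init__(self):
        super().__init__()
        self.songs = []
        self._href = None  # href of the song link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and SONG_HREF.fullmatch(href):
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.songs.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def parse_songs(html):
    """Return a list of (title, url) tuples found in the given HTML text."""
    parser = SongLinkParser()
    parser.feed(html)
    return parser.songs


def fetch_artist_page(artist_id):
    """Download the artist page from the '#'-free URL (needs network access)."""
    import requests  # third-party; imported here so parse_songs works without it

    # Assumed header values; the headers used in the post are truncated
    headers = {"User-Agent": "Mozilla/5.0", "Referer": BASE_URL + "/"}
    response = requests.get(
        BASE_URL + "/artist", params={"id": artist_id}, headers=headers, timeout=10
    )
    response.raise_for_status()
    return response.text
```

With network access, parse_songs(fetch_artist_page(2116)) would be the way to try it against the artist used in this post. Whether the server returns the song list in the raw HTML for a given set of headers is something to verify rather than assume.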
