Python crawler: TypeError: list indices must be integers or slices, not str

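This post shows how to use Python's requests and BeautifulSoup libraries to scrape the src of book cover images from a Douban Books page. Working through a real bug in the code, it shows how to select the first element of a list and read its attribute. The final run prints four book cover links.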

Goal: get the src attribute of the img inside each element with class subject-item.

import requests
from bs4 import BeautifulSoup

url2 = 'https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4?start=20&type=T'
headers = {
    'Cookie': 'bid=PZvLUOLGEXA; gr_user_id=058ae679-f073-4439-8fee-e150845fc5d1; gr_session_id_22c937bbd8ebd703f2d8e9445f7dfd03=718f9611-f56d-4157-83c0-fa0630d23f54; gr_cs1_718f9611-f56d-4157-83c0-fa0630d23f54=user_id%3A0; _vwo_uuid_v2=DE8315B475926BC0EC8073A25BB5E417F|ffb24e86a47c3f06a4183acca38babfc; ap_v=0,6.0; _pk_ref.100001.3ac3=%5B%22%22%2C%22%22%2C1640518435%2C%22https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DsIBOcf95tPXFgrZgUKpJSPufDaP4TslGv5VL2CyxOi0tPKZETuh2YFEaJP8FNwIY%26wd%3D%26eqid%3Db947cc3b00004f400000000661c85312%22%5D; _pk_ses.100001.3ac3=*; __utma=30149280.900502028.1640518436.1640518436.1640518436.1; __utmc=30149280; __utmz=30149280.1640518436.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utmt_douban=1; __utma=81379588.849952259.1640518436.1640518436.1640518436.1; __utmc=81379588; __utmz=81379588.1640518436.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utmt=1; gr_session_id_22c937bbd8ebd703f2d8e9445f7dfd03_718f9611-f56d-4157-83c0-fa0630d23f54=true; Hm_lvt_16a14f3002af32bf3a75dfe352478639=1640518553; Hm_lpvt_16a14f3002af32bf3a75dfe352478639=1640518553; __yadk_uid=RT3p6hcsmxbifVYimtJJSxCo516tSqMP; __gads=ID=4b7d058c20e70905-223b74b087cf0020:T=1640518613:RT=1640518613:S=ALNI_MbDGG3WX_yK5S2yX8OTnynMikqx1w; __utmb=30149280.7.10.1640518436; __utmb=81379588.7.10.1640518436; _pk_id.100001.3ac3=6c02ab20909a7a05.1640518435.1.1640518830.1640518435.',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'
    }
html = requests.get(url2, headers=headers).text
soup = BeautifulSoup(html, 'lxml')
books = soup.select('div .subject-item')
for b in books:
    img = b.select(' img')['src']  # TypeError is raised here
    print(img)

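Solution:
img = b.select(' img')[0]['src']
select() returns a list of matching tags, not a single tag, so add [0] to take the first element of the list before reading its src attribute.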
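
To make the fix easier to check without hitting the live site, here is a minimal sketch that reproduces the error and the fix on a small inline HTML snippet. The markup, the example.com URLs, and the helper name cover_urls are made up for illustration and are not taken from Douban. It also uses select_one(), which returns the first match or None, as one possible alternative to [0] when some items might have no img.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a result page (not real Douban data).
html = """
<div>
  <ul>
    <li class="subject-item"><a href="#"><img src="https://example.com/a.jpg"></a></li>
    <li class="subject-item"><a href="#"><img src="https://example.com/b.jpg"></a></li>
    <li class="subject-item"><a href="#">no cover</a></li>
  </ul>
</div>
"""

# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(html, "html.parser")
items = soup.select("div .subject-item")

# Reproduce the bug: select() returns a list, and a list cannot be indexed by a string.
error = None
try:
    items[0].select("img")["src"]
except TypeError as exc:
    error = exc

# The fix from the post: take the first element, then read its attribute.
first_src = items[0].select("img")[0]["src"]


def cover_urls(parsed):
    """Collect the src of the first img in each .subject-item, skipping items without one."""
    urls = []
    for item in parsed.select("div .subject-item"):
        img = item.select_one("img")  # first matching tag, or None if there is none
        if img is not None and img.has_attr("src"):
            urls.append(img["src"])
    return urls


print(type(error).__name__)
print(first_src)
print(cover_urls(soup))
```

Note that [0] raises IndexError when an item has no img at all, which is why the helper checks for None instead.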

Printed output:
https://img9.doubanio.com/view/subject/s/public/s4468484.jpg
https://img2.doubanio.com/view/subject/s/public/s33946803.jpg
https://img2.doubanio.com/view/subject/s/public/s1103152.jpg
https://img2.doubanio.com/view/subject/s/public/s27264181.jpg
