When scraping a page with BeautifulSoup, I kept running into TypeError: list indices must be integers or slices, not str
The relevant part of the code looks like this:
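from bs4 import BeautifulSoup

def loadHTML(html):
    soup = BeautifulSoup(html, "html.parser")
    # This line raises the TypeError: find_all returns a list, which is indexed here with a str
    text = soup.find_all("img", "origin_image zh-lightbox-thumb lazy")["data-actualsrc"]
    for each in text:
        print(each)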
The URL I wanted to extract is in the tag shown below:
<img src="https://pic3.zhimg.com/80/v2-9ad4fdb451e2dd3ee2d124bcd55dbf09_hd.jpg" data-size="normal" data-rawwidth="2660" data-rawheight="806" data-default-watermark-src="https://pic4.zhimg.com/50/v2-704439adebbd0b4015026dbaf198b946_hd.jpg" class="origin_image zh-lightbox-thumb lazy" data-original="https://pic3.zhimg.com/v2-9ad4fdb451e2dd3ee2d124bcd55dbf09_r.jpg" data-actualsrc="https://pic3.zhimg.com/50/v2-9ad4fdb451e2dd3ee2d124bcd55dbf09_hd.jpg" data-lazy-status="ok" width="2660">
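It turned out that soup.find_all("img", "origin_image zh-lightbox-thumb lazy") returns a list-like ResultSet of Tag objects, not a single tag. A list can only be indexed with an integer or a slice, so putting ["data-actualsrc"] directly on the result raises the TypeError. The attribute lookup has to happen on an individual Tag inside the list, which makes this a two-level structure: a list of tags, each with its own dictionary-like attributes.
So adjust the indexing to what you need. If you only want the URL from the first matching tag, use soup.find_all("img", "origin_image zh-lightbox-thumb lazy")[0]["data-actualsrc"]. Note that [0] raises IndexError if nothing matched.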
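If you want every URL rather than just the first, one way to do it is to loop over the tags and read the attribute from each one. The following is only a sketch: the function name load_image_urls and the sample HTML string are made up for illustration, and it assumes the class string on the page matches exactly as written in the question.

```python
from bs4 import BeautifulSoup


def load_image_urls(html):
    """Collect the data-actualsrc value of every matching <img> tag.

    load_image_urls is a hypothetical helper name, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like ResultSet of Tag objects
    tags = soup.find_all("img", "origin_image zh-lightbox-thumb lazy")
    urls = []
    for tag in tags:
        # Tag.get returns None instead of raising KeyError when the attribute is missing
        url = tag.get("data-actualsrc")
        if url:
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Shortened sample markup, assumed for illustration only
    sample = (
        '<img class="origin_image zh-lightbox-thumb lazy" '
        'data-actualsrc="https://pic3.zhimg.com/50/v2-9ad4fdb451e2dd3ee2d124bcd55dbf09_hd.jpg">'
    )
    for each in load_image_urls(sample):
        print(each)
```

Using tag.get instead of tag["data-actualsrc"] means a tag without that attribute is skipped rather than stopping the whole loop with a KeyError.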