I ran into the following problem while scraping a comic site.
Here is my selector:
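from bs4 import BeautifulSoup

# Assumption: `html` is the fetched page source and the parser is html.parser; neither is shown in the original post
soup = BeautifulSoup(html, "html.parser")
# Selector: collect every <img> element (find_all is the current name for findAll)
comicelm = soup.find_all("img")
print(comicelm)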
Here is the page source:
<!DOCTYPE HTML>
<html lang="zh-CN">
<head lang="zh-CN">
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="renderer" content="webkit">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
<meta http-equiv="content-language" content="zh-CN">
<title>(C101)SWEET CANDY POT! 7 (オリジナル)漫画 全一话免费观看-爱国漫</title>
<meta name="keywords" lang="zh-CN" content="(C101)SWEET CANDY POT! 7 (オリジナル)漫画,(C101)SWEET CANDY POT! 7 (オリジナル)全一话免费阅读,(C101)SWEET CANDY POT! 7 (オリジナル)漫画在线观看">
<meta name="description" lang="zh-CN" content="爱国漫为您更新(C101)SWEET CANDY POT! 7 (オリジナル)漫画,(C101)SWEET CANDY POT! 7 (オリジナル)全一话免费阅读,(C101)SWEET CANDY POT! 7 (オリジナル)漫画在线观看,(C101)SWEET CANDY POT! 7 (オリジナル)全一话漫画情节,更多精彩漫画尽在爱国漫漫画网!">
<meta http-equiv="Cache-Control" content="no-transform">
<meta http-equiv="Cache-Control" content="no-siteapp">
<link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/lit-reader-5976e0f0f2.css?v=1695730296">
<link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/common-d795625f09.css?v=1695730296">
<link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/reader-e8e4175d18.css?v=1695730296">
<script type="text/javascript">
function isMobileHanddle() {
var e = navigator.userAgent;
return (screen.width / screen.height < 1 || /AppleWebKit.*Mobile/i.test(e) || /MIDP|SymbianOS|NOKIA|SAMSUNG|LG|NEC|TCL|Alcatel|BIRD|DBTEL|Dopod|PHILIPS|HAIER|LENOVO|MOT-|Nokia|SonyEricsson|SIE-|Amoi|ZTE/.test(e)) && !/ipad/gi.test(e)
}
isMobileHanddle() && (window.location.href = 'https://m.aiguoman.com/' + window.location.pathname.substr(1));
</script>
</head>
<body class="toolbar">
<!-- 顶部 -->
<div class="nav-top-wrap J_nav-top J_block" data-block="810100" data-blockname="工具视图">
<div class="logo-wrap logo-sub-wrap">
<h1><a href="javascript:;" class="logo" data-pageend="quit" p-rseat="mgchapter" title="叭嗒"></a></h1>
</div>
<div class="nav-top">
<div class="logo-wrap">
<a href="/" data-pageend="quit" p-rseat="mgchapter" class="logo"></a>
<span class="cartoon-title">
<a href="/comic/40c10141sweetcandypot74041" target="_blank" class="chapter" data-rseat="mgchapter" data-bookid="18yzme91z9">(C101)SWEET CANDY POT! 7 (オリジナル)<span class="icon-arr-top"></span></a>
<!-- 右箭头与题目需要连在一起 -->
<!-- <span class="icon-arr-top"></span> -->
<a href="javascript:;" class="chapter-sub">全一话 </a>
</span>
</div>
</div>
</div> <!-- 中间主体漫画部分 -->
<!-- 滚动模式 -->
<div class="main main-scroll_mode J_scroll_mode J_block" data-block="810100" data-blockname="阅读视图">
<ul class="main-container">
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908611yioH1MrtvZ2D6LpP.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908611VAfrYl73yiemkdvW.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/16749086106lZHGxc612L3c2n0.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908610FKXNG5OEiqPo9NeA.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/167490860994dnW80_stZ8pi-z.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908609HSJ9BlnnEiz9trJb.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908608dnK2-1T5_FJme1TU.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908608N3psToVy_xqdDuGP.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908607z_7JMzahbDs5m0Ic.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/16749086076bsaHtURI1zTU-wG.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908606awNk3_sPuZhH7iPz.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908606BtSyyCYoBIou9rY6.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908605CsK4ohyHtKybpppr.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908605zP0qpB-zum0u_B0O.jpg"></li>
<li class="main-item">
<p class="next-subtit">即将进入下一话</p><a data-rseat="tnextchp" href="" class="next-title">没有了<i class="icon-next"></i></a>
</li>
</ul>
</div> <!-- 底部 -->
<script src="/poster/pc-agm-sy-hengfu.js?v=1695730296"></script>
<div class="nav-bottom-wrap J_nav-bottom J_block" data-block="810100" data-blockname="工具视图">
<div class="nav-bottom nav-bottom-toolbar">
<ul class="page-container">=
</ul>
<ul class="nav-bottom-ul">
<li class="catalog-item"><a href="/app/" target="_blank" class="collect-form-btn-4">下载APP</a></li>
<li class="splite-item"></li>
<li class="catalog-item"><a href="/comic/40c10141sweetcandypot74041" class="J_catalog_button" data-rseat="catalog"><i class="icon-catalog"></i>目录</a></li>
<li class><a data-rseat="nechp" class="J_next_eposide_btn " href="">下一话<i class="icon-nextpage"></i></a></li>
<li class><a data-rseat="bachp" class="J_prev_eposide_btn " href=""><i class="icon-uppage"></i>上一话</a></li>
</ul>
</div>
</div>
<script src="/template/pc/mangabz/js/jquery-1-4f775cb966.11.1.min.js?v=1695730296"></script>
<script type="text/javascript">
window.jquery = window.jQuery
</script>
<div class="footer">
<p>Copyright (C) 2005-2018 </p>
<p> 爱国漫(www.aiguoman.com)是一家漫画免费分享以及在线浏览平台</p>
<p>版权投诉 manhuahao@gmail.com</p>
</div>
<script type="text/javascript">
$(function() {
$("#btnSearch").click(function() {
newsearch(0);
});
//回车事件
$("#txtKeywords").bind("keyup", function(event) {
var e = event || window.event;
if (e && e.keyCode === 13 && $('.header-search-list li.active').index() === -1 &&
$.trim($(this).val()) !== '') {
newsearch($(this).data("isnew"));
}
});
});
function newsearch(isnew) {
var $keywords = $("#txtKeywords");
$keywords.focusout();
var title = $keywords.val();
if (title === "") {
title = $keywords.attr("data");
}
if (isnew && isnew === 1) {
window.location.href = "/search?key=" + encodeURIComponent(title);
} else {
window.open("/search?key=" + encodeURIComponent(title));
}
}
</script>
<script src="/template/pc/mangabz/js/vendor-6a7044.js?v=1695730296"></script>
<script src="/template/pc/mangabz/js/reader-6a7044.js?v=1695730296"></script>
<script src="/api/hits/comic/199421"></script></body>
</html>
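The function returned an empty list.
Strangely, another page on the same site with a similar structure correctly returned a list of all the <img> elements: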
<!DOCTYPE HTML>
<html lang="zh-CN">
<head lang="zh-CN">
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="renderer" content="webkit">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
<meta http-equiv="content-language" content="zh-CN">
<title>世界终焉的世界录漫画 第01话免费观看-爱国漫</title>
<meta name="keywords" lang="zh-CN" content="世界终焉的世界录漫画,世界终焉的世界录第01话免费阅读,世界终焉的世界录漫画在线观看">
<meta name="description" lang="zh-CN" content="爱国漫为您更新世界终焉的世界录漫画,世界终焉的世界录第01话免费阅读,世界终焉的世界录漫画在线观看,世界终焉的世界录第01话漫画情节,更多精彩漫画尽在爱国漫漫画网!">
<meta http-equiv="Cache-Control" content="no-transform">
<meta http-equiv="Cache-Control" content="no-siteapp">
<link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/lit-reader-5976e0f0f2.css?v=1695730296">
<link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/common-d795625f09.css?v=1695730296">
<link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/reader-e8e4175d18.css?v=1695730296">
<script type="text/javascript">
function isMobileHanddle() {
var e = navigator.userAgent;
return (screen.width / screen.height < 1 || /AppleWebKit.*Mobile/i.test(e) || /MIDP|SymbianOS|NOKIA|SAMSUNG|LG|NEC|TCL|Alcatel|BIRD|DBTEL|Dopod|PHILIPS|HAIER|LENOVO|MOT-|Nokia|SonyEricsson|SIE-|Amoi|ZTE/.test(e)) && !/ipad/gi.test(e)
}
isMobileHanddle() && (window.location.href = 'https://m.aiguoman.com/' + window.location.pathname.substr(1));
</script>
</head>
<body class="toolbar">
<!-- 顶部 -->
<div class="nav-top-wrap J_nav-top J_block" data-block="810100" data-blockname="工具视图">
<div class="logo-wrap logo-sub-wrap">
<h1><a href="javascript:;" class="logo" data-pageend="quit" p-rseat="mgchapter" title="叭嗒"></a></h1>
</div>
<div class="nav-top">
<div class="logo-wrap">
<a href="/" data-pageend="quit" p-rseat="mgchapter" class="logo"></a>
<span class="cartoon-title">
<a href="/comic/shijiezhongyandeshijielu" target="_blank" class="chapter" data-rseat="mgchapter" data-bookid="18yzme91z9">世界终焉的世界录<span class="icon-arr-top"></span></a>
<!-- 右箭头与题目需要连在一起 -->
<!-- <span class="icon-arr-top"></span> -->
<a href="javascript:;" class="chapter-sub">第01话 </a>
</span>
</div>
</div>
</div> <!-- 中间主体漫画部分 -->
<!-- 滚动模式 -->
<div class="main main-scroll_mode J_scroll_mode J_block" data-block="810100" data-blockname="阅读视图">
<ul class="main-container">
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176135eSRYn46P2zra-nuW.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176141iy0LxT2e69YV8yfK.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176144w2qZhrtnH4qOutTC.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176148r7z-zlGbWZEihczX.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176152krBBCwliMuUIiFGh.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176155Ft58vCT-Q6lSaqTo.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176158Ab2dZKEXguZWNpR3.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176162kw2dMATy3bhfhcu2.jpg"></li>
<li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176165AWEVd4vKmkGpVSnM.jpg"></li>
<li class="main-item">
<p class="next-subtit">即将进入下一话</p><a data-rseat="tnextchp" href="/chapter/142355-35474.html" class="next-title">第02话<i class="icon-next"></i></a>
</li>
</ul>
</div> <!-- 底部 -->
<script src="/poster/pc-agm-sy-hengfu.js?v=1695730296"></script>
<div class="nav-bottom-wrap J_nav-bottom J_block" data-block="810100" data-blockname="工具视图">
<div class="nav-bottom nav-bottom-toolbar">
<ul class="page-container">=
</ul>
<ul class="nav-bottom-ul">
<li class="catalog-item"><a href="/app/" target="_blank" class="collect-form-btn-4">下载APP</a></li>
<li class="splite-item"></li>
<li class="catalog-item"><a href="/comic/shijiezhongyandeshijielu" class="J_catalog_button" data-rseat="catalog"><i class="icon-catalog"></i>目录</a></li>
<li class><a data-rseat="nechp" class="J_next_eposide_btn " href="/chapter/142355-35474.html">下一话<i class="icon-nextpage"></i></a></li>
<li class><a data-rseat="bachp" class="J_prev_eposide_btn " href=""><i class="icon-uppage"></i>上一话</a></li>
</ul>
</div>
</div>
<script src="/template/pc/mangabz/js/jquery-1-4f775cb966.11.1.min.js?v=1695730296"></script>
<script type="text/javascript">
window.jquery = window.jQuery
</script>
<div class="footer">
<p>Copyright (C) 2005-2018 </p>
<p> 爱国漫(www.aiguoman.com)是一家漫画免费分享以及在线浏览平台</p>
<p>版权投诉 manhuahao@gmail.com</p>
</div>
<script type="text/javascript">
$(function() {
$("#btnSearch").click(function() {
newsearch(0);
});
//回车事件
$("#txtKeywords").bind("keyup", function(event) {
var e = event || window.event;
if (e && e.keyCode === 13 && $('.header-search-list li.active').index() === -1 &&
$.trim($(this).val()) !== '') {
newsearch($(this).data("isnew"));
}
});
});
function newsearch(isnew) {
var $keywords = $("#txtKeywords");
$keywords.focusout();
var title = $keywords.val();
if (title === "") {
title = $keywords.attr("data");
}
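Since the two pages share almost all of their markup, a line diff is one quick way to see what actually differs between the failing page and the working one. The sketch below is only an illustration: `differing_lines` and the two short sample strings are hypothetical stand-ins for the full page sources above, and it uses only the standard library.

```python
import difflib
from itertools import islice


def differing_lines(failing: str, working: str) -> list:
    """Return the removed (-) and added (+) lines between two page sources."""
    diff = difflib.unified_diff(
        failing.splitlines(),
        working.splitlines(),
        fromfile="failing",
        tofile="working",
        lineterm="",
        n=0,  # no context lines, only the differences
    )
    # Skip the two file-header lines, then keep only the +/- lines
    # (this also drops the @@ hunk markers).
    return [line for line in islice(diff, 2, None) if line.startswith(("+", "-"))]


# Hypothetical, shortened stand-ins for the two page sources
failing_page = (
    "<title>(C101)SWEET CANDY POT! 7 (オリジナル)漫画</title>\n"
    "<ul>\n"
    '<li><img src="a.jpg"></li>\n'
    "</ul>"
)
working_page = (
    "<title>世界终焉的世界录漫画</title>\n"
    "<ul>\n"
    '<li><img src="a.jpg"></li>\n'
    "</ul>"
)

for line in differing_lines(failing_page, working_page):
    print(line)
```

On the real pages the diff would also list the differing image URLs and chapter links, so the output still has to be read by eye, but it narrows the search to the lines that are not identical.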
In the end I found the cause: the page's <title> contains Japanese characters.
<title>(C101)SWEET CANDY POT! 7 (オリジナル)漫画 全一话免费观看-爱国漫</title>
After deleting these three lines from the page source, the function correctly returns all the <img> elements.
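To separate "the fetched markup has no <img> tags" from "the parser dropped them", it can help to count the tags without BeautifulSoup at all. The following is a minimal sketch under stated assumptions: `ImgCollector`, `img_sources`, and the example.com URLs are made up for illustration, and the bytes are decoded as UTF-8 only because the page declares `<meta charset="UTF-8">`. It is a cross-check, not an explanation of why the original call returned an empty list.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


def img_sources(raw: bytes) -> list:
    # Decode explicitly instead of letting a library guess the encoding
    text = raw.decode("utf-8", errors="replace")
    collector = ImgCollector()
    collector.feed(text)
    collector.close()
    return collector.sources


# Hypothetical, shortened page that keeps the Japanese title
sample = (
    '<html><head><meta charset="UTF-8">'
    "<title>(C101)SWEET CANDY POT! 7 (オリジナル)漫画</title></head>"
    '<body><ul><li class="main-item"><img src="https://example.com/1.jpg"></li>'
    '<li class="main-item"><img src="https://example.com/2.jpg"></li></ul></body></html>'
).encode("utf-8")

print(img_sources(sample))
```

If this finds the images in the raw response while `soup.find_all("img")` does not, the difference most likely lies in how the text was decoded or which parser was used, rather than in the markup itself.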