soup.findAll returns an empty list

I ran into the following problem while scraping a comic website.

Here is my selector:

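from bs4 import BeautifulSoup

# `html` is the page source shown below; the parser name is an assumption,
# since the original snippet did not show how `soup` was built
soup = BeautifulSoup(html, 'html.parser')
# Selector: every <img> tag (find_all is the current name for findAll)
comicelm = soup.find_all('img')
print(comicelm)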
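
For reference, this is roughly what the selector is expected to do on a well-behaved page. It is a minimal sketch, assuming the built-in `html.parser` backend and a made-up two-image snippet with `example.com` URLs standing in for the real page source:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page source.
html = """
<ul class="main-container">
    <li class="main-item"><img src="https://example.com/page-1.jpg"></li>
    <li class="main-item"><img src="https://example.com/page-2.jpg"></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the src attribute of every <img> tag that has one.
image_urls = [img["src"] for img in soup.find_all("img") if img.has_attr("src")]
print(image_urls)
```

An empty list from this kind of call means the parser never produced any `<img>` tags, which points at the parsing step rather than at the selector.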

Here is the page source:

<!DOCTYPE HTML>
<html lang="zh-CN">
<head lang="zh-CN">
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="renderer" content="webkit">
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
    <meta http-equiv="content-language" content="zh-CN">
    <title>&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41漫画 全一话免费观看-爱国漫</title>
    <meta name="keywords" lang="zh-CN" content="&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41漫画,&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41全一话免费阅读,&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41漫画在线观看">
    <meta name="description" lang="zh-CN" content="爱国漫为您更新&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41漫画,&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41全一话免费阅读,&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41漫画在线观看,&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41全一话漫画情节,更多精彩漫画尽在爱国漫漫画网!">
    <meta http-equiv="Cache-Control" content="no-transform">
    <meta http-equiv="Cache-Control" content="no-siteapp">
    <link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/lit-reader-5976e0f0f2.css?v=1695730296">
    <link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/common-d795625f09.css?v=1695730296">
    <link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/reader-e8e4175d18.css?v=1695730296">
    <script type="text/javascript">
        function isMobileHanddle() {
            var e = navigator.userAgent;
            return (screen.width / screen.height < 1 || /AppleWebKit.*Mobile/i.test(e) || /MIDP|SymbianOS|NOKIA|SAMSUNG|LG|NEC|TCL|Alcatel|BIRD|DBTEL|Dopod|PHILIPS|HAIER|LENOVO|MOT-|Nokia|SonyEricsson|SIE-|Amoi|ZTE/.test(e)) && !/ipad/gi.test(e)
        }

        isMobileHanddle() && (window.location.href = 'https://m.aiguoman.com/' + window.location.pathname.substr(1));
    </script>
</head>
<body class="toolbar">
<!-- 顶部 -->
<div class="nav-top-wrap J_nav-top J_block" data-block="810100" data-blockname="工具视图">
    <div class="logo-wrap logo-sub-wrap">
        <h1><a href="javascript:;" class="logo" data-pageend="quit" p-rseat="mgchapter" title="叭嗒"></a></h1>
    </div>
    <div class="nav-top">
        <div class="logo-wrap">
            <a href="/" data-pageend="quit" p-rseat="mgchapter" class="logo"></a>
            <span class="cartoon-title">
                <a href="/comic/40c10141sweetcandypot74041" target="_blank" class="chapter" data-rseat="mgchapter" data-bookid="18yzme91z9">&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41<span class="icon-arr-top"></span></a>
                <!-- 右箭头与题目需要连在一起 -->
                <!-- <span class="icon-arr-top"></span> -->
                <a href="javascript:;" class="chapter-sub">全一话 </a>
            </span>
        </div>
    </div>
</div> <!-- 中间主体漫画部分 -->
<!-- 滚动模式 -->
<div class="main main-scroll_mode J_scroll_mode J_block" data-block="810100" data-blockname="阅读视图">
    <ul class="main-container">
        
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908611yioH1MrtvZ2D6LpP.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908611VAfrYl73yiemkdvW.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/16749086106lZHGxc612L3c2n0.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908610FKXNG5OEiqPo9NeA.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/167490860994dnW80_stZ8pi-z.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908609HSJ9BlnnEiz9trJb.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908608dnK2-1T5_FJme1TU.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908608N3psToVy_xqdDuGP.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908607z_7JMzahbDs5m0Ic.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/16749086076bsaHtURI1zTU-wG.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908606awNk3_sPuZhH7iPz.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908606BtSyyCYoBIou9rY6.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908605CsK4ohyHtKybpppr.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1014/2027038/1674908605zP0qpB-zum0u_B0O.jpg"></li>
        
        
        <li class="main-item">
            <p class="next-subtit">即将进入下一话</p><a data-rseat="tnextchp" href="" class="next-title">没有了<i class="icon-next"></i></a>
        </li>
    </ul>
</div> <!-- 底部 -->
<script src="/poster/pc-agm-sy-hengfu.js?v=1695730296"></script>
<div class="nav-bottom-wrap J_nav-bottom J_block" data-block="810100" data-blockname="工具视图">
    <div class="nav-bottom nav-bottom-toolbar">
        <ul class="page-container">=
        </ul>
        <ul class="nav-bottom-ul">
            <li class="catalog-item"><a href="/app/" target="_blank" class="collect-form-btn-4">下载APP</a></li>
            <li class="splite-item"></li>
            <li class="catalog-item"><a href="/comic/40c10141sweetcandypot74041" class="J_catalog_button" data-rseat="catalog"><i class="icon-catalog"></i>目录</a></li>
            <li class><a data-rseat="nechp" class="J_next_eposide_btn " href="">下一话<i class="icon-nextpage"></i></a></li>
            <li class><a data-rseat="bachp" class="J_prev_eposide_btn " href=""><i class="icon-uppage"></i>上一话</a></li>
        </ul>
    </div>
</div>
<script src="/template/pc/mangabz/js/jquery-1-4f775cb966.11.1.min.js?v=1695730296"></script>
<script type="text/javascript">
    window.jquery = window.jQuery
</script>
<div class="footer">
  <p>Copyright (C) 2005-2018  </p>
  <p> 爱国漫(www.aiguoman.com)是一家漫画免费分享以及在线浏览平台</p>
  <p>版权投诉 manhuahao@gmail.com</p>
</div>
<script type="text/javascript">
  $(function() {
    $("#btnSearch").click(function() {
      newsearch(0);
    });
    //回车事件
    $("#txtKeywords").bind("keyup", function(event) {
      var e = event || window.event;
      if (e && e.keyCode === 13 && $('.header-search-list li.active').index() === -1 &&
              $.trim($(this).val()) !== '') {
        newsearch($(this).data("isnew"));
      }
    });
  });
  function newsearch(isnew) {
    var $keywords = $("#txtKeywords");
    $keywords.focusout();
    var title = $keywords.val();
    if (title === "") {
      title = $keywords.attr("data");
    }
    if (isnew && isnew === 1) {
      window.location.href = "/search?key=" + encodeURIComponent(title);
    } else {
      window.open("/search?key=" + encodeURIComponent(title));
    }
  }
</script>
<script src="/template/pc/mangabz/js/vendor-6a7044.js?v=1695730296"></script>
<script src="/template/pc/mangabz/js/reader-6a7044.js?v=1695730296"></script>
<script src="/api/hits/comic/199421"></script></body>
</html>

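The call returned an empty list.

Strangely, another page on the same site with a similar structure worked as expected and returned a list of all its <img> elements: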

<!DOCTYPE HTML>
<html lang="zh-CN">
<head lang="zh-CN">
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="renderer" content="webkit">
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
    <meta http-equiv="content-language" content="zh-CN">
    <title>世界终焉的世界录漫画 第01话免费观看-爱国漫</title>
    <meta name="keywords" lang="zh-CN" content="世界终焉的世界录漫画,世界终焉的世界录第01话免费阅读,世界终焉的世界录漫画在线观看">
    <meta name="description" lang="zh-CN" content="爱国漫为您更新世界终焉的世界录漫画,世界终焉的世界录第01话免费阅读,世界终焉的世界录漫画在线观看,世界终焉的世界录第01话漫画情节,更多精彩漫画尽在爱国漫漫画网!">
    <meta http-equiv="Cache-Control" content="no-transform">
    <meta http-equiv="Cache-Control" content="no-siteapp">
    <link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/lit-reader-5976e0f0f2.css?v=1695730296">
    <link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/common-d795625f09.css?v=1695730296">
    <link rel="stylesheet" type="text/css" href="/template/pc/mangabz/css/reader-e8e4175d18.css?v=1695730296">
    <script type="text/javascript">
        function isMobileHanddle() {
            var e = navigator.userAgent;
            return (screen.width / screen.height < 1 || /AppleWebKit.*Mobile/i.test(e) || /MIDP|SymbianOS|NOKIA|SAMSUNG|LG|NEC|TCL|Alcatel|BIRD|DBTEL|Dopod|PHILIPS|HAIER|LENOVO|MOT-|Nokia|SonyEricsson|SIE-|Amoi|ZTE/.test(e)) && !/ipad/gi.test(e)
        }

        isMobileHanddle() && (window.location.href = 'https://m.aiguoman.com/' + window.location.pathname.substr(1));
    </script>
</head>
<body class="toolbar">
<!-- 顶部 -->
<div class="nav-top-wrap J_nav-top J_block" data-block="810100" data-blockname="工具视图">
    <div class="logo-wrap logo-sub-wrap">
        <h1><a href="javascript:;" class="logo" data-pageend="quit" p-rseat="mgchapter" title="叭嗒"></a></h1>
    </div>
    <div class="nav-top">
        <div class="logo-wrap">
            <a href="/" data-pageend="quit" p-rseat="mgchapter" class="logo"></a>
            <span class="cartoon-title">
                <a href="/comic/shijiezhongyandeshijielu" target="_blank" class="chapter" data-rseat="mgchapter" data-bookid="18yzme91z9">世界终焉的世界录<span class="icon-arr-top"></span></a>
                <!-- 右箭头与题目需要连在一起 -->
                <!-- <span class="icon-arr-top"></span> -->
                <a href="javascript:;" class="chapter-sub">第01话 </a>
            </span>
        </div>
    </div>
</div> <!-- 中间主体漫画部分 -->
<!-- 滚动模式 -->
<div class="main main-scroll_mode J_scroll_mode J_block" data-block="810100" data-blockname="阅读视图">
    <ul class="main-container">
        
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176135eSRYn46P2zra-nuW.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176141iy0LxT2e69YV8yfK.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176144w2qZhrtnH4qOutTC.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176148r7z-zlGbWZEihczX.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176152krBBCwliMuUIiFGh.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176155Ft58vCT-Q6lSaqTo.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176158Ab2dZKEXguZWNpR3.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176162kw2dMATy3bhfhcu2.jpg"></li>
        
        <li class="main-item"><img src="https://res.xiaoqinre.com/images/comic/1/1964/1481176165AWEVd4vKmkGpVSnM.jpg"></li>
                
        <li class="main-item">
            <p class="next-subtit">即将进入下一话</p><a data-rseat="tnextchp" href="/chapter/142355-35474.html" class="next-title">第02话<i class="icon-next"></i></a>
        </li>
    </ul>
</div> <!-- 底部 -->
<script src="/poster/pc-agm-sy-hengfu.js?v=1695730296"></script>
<div class="nav-bottom-wrap J_nav-bottom J_block" data-block="810100" data-blockname="工具视图">
    <div class="nav-bottom nav-bottom-toolbar">
        <ul class="page-container">=
        </ul>
        <ul class="nav-bottom-ul">
            <li class="catalog-item"><a href="/app/" target="_blank" class="collect-form-btn-4">下载APP</a></li>
            <li class="splite-item"></li>
            <li class="catalog-item"><a href="/comic/shijiezhongyandeshijielu" class="J_catalog_button" data-rseat="catalog"><i class="icon-catalog"></i>目录</a></li>
            <li class><a data-rseat="nechp" class="J_next_eposide_btn " href="/chapter/142355-35474.html">下一话<i class="icon-nextpage"></i></a></li>
            <li class><a data-rseat="bachp" class="J_prev_eposide_btn " href=""><i class="icon-uppage"></i>上一话</a></li>
        </ul>
    </div>
</div>
<script src="/template/pc/mangabz/js/jquery-1-4f775cb966.11.1.min.js?v=1695730296"></script>
<script type="text/javascript">
    window.jquery = window.jQuery
</script>
<div class="footer">
  <p>Copyright (C) 2005-2018  </p>
  <p> 爱国漫(www.aiguoman.com)是一家漫画免费分享以及在线浏览平台</p>
  <p>版权投诉 manhuahao@gmail.com</p>
</div>
<script type="text/javascript">
  $(function() {
    $("#btnSearch").click(function() {
      newsearch(0);
    });
    //回车事件
    $("#txtKeywords").bind("keyup", function(event) {
      var e = event || window.event;
      if (e && e.keyCode === 13 && $('.header-search-list li.active').index() === -1 &&
              $.trim($(this).val()) !== '') {
        newsearch($(this).data("isnew"));
      }
    });
  });
  function newsearch(isnew) {
    var $keywords = $("#txtKeywords");
    $keywords.focusout();
    var title = $keywords.val();
    if (title === "") {
      title = $keywords.attr("data");
    }
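
Because both pages come from the same template, one way to narrow down the culprit is to diff the working page against the failing one and look only at the lines unique to the failing page. A minimal sketch using the standard library `difflib`; the helper name `differing_lines` and the two short strings are hypothetical stand-ins for the full sources:

```python
import difflib


def differing_lines(working_html, failing_html):
    """Return the lines that appear only in the failing page."""
    diff = difflib.ndiff(working_html.splitlines(), failing_html.splitlines())
    # ndiff prefixes lines unique to the second input with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


# Hypothetical, heavily shortened versions of the two pages.
working = '<title>世界终焉的世界录漫画</title>\n<img src="a.jpg">'
failing = '<title>&#40C101&#41SWEET CANDY POT! 7</title>\n<img src="a.jpg">'

for line in differing_lines(working, failing):
    print(line)
```

On the real pages the image URLs and chapter links differ as well, so the output still has to be read by eye, but it is far shorter than the two full sources.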

In the end I traced the problem to the page's <title>, which contains Japanese characters:

<title>&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41漫画 全一话免费观看-爱国漫</title>

After deleting these three lines from the page source, the call returned all the <img> elements as expected.
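
The same clean-up can be done in code before parsing. The sketch below assumes the three lines are the `<title>` and the keywords and description `<meta>` tags, which all repeat the same title text; the post does not spell this out. It also lists numeric character references written without a terminating semicolon, such as `&#40` and `&#41` in this title. Those, rather than the Japanese text itself, may be what the parser stumbles over, but that is a guess and not something verified here. The helper names are hypothetical and only the standard library `re` module is used:

```python
import re

# Numeric character references (decimal or hex) with no closing ";".
_UNTERMINATED_CHARREF = re.compile(
    r"&#(?:[0-9]+(?![0-9;])|[xX][0-9a-fA-F]+(?![0-9a-fA-F;]))"
)
_TITLE = re.compile(r"<title\b[^>]*>.*?</title>", re.IGNORECASE | re.DOTALL)
_SEO_META = re.compile(
    r'<meta\b[^>]*\bname=["\'](?:keywords|description)["\'][^>]*>',
    re.IGNORECASE,
)


def find_unterminated_charrefs(html):
    """List references such as '&#40' that lack a terminating semicolon."""
    return _UNTERMINATED_CHARREF.findall(html)


def strip_title_and_seo_meta(html):
    """Remove the <title> element and the keywords/description <meta> tags."""
    return _SEO_META.sub("", _TITLE.sub("", html))


# Hypothetical, shortened version of the failing page.
sample = (
    '<head><title>&#40C101&#41SWEET CANDY POT! 7 &#40オリジナル&#41</title>'
    '<meta name="keywords" lang="zh-CN" content="&#40C101&#41SWEET CANDY POT! 7">'
    '<meta charset="UTF-8"></head>'
    '<body><img src="a.jpg"></body>'
)

print(find_unterminated_charrefs(sample))
cleaned = strip_title_and_seo_meta(sample)
print(cleaned)
```

The cleaned string can then be passed to BeautifulSoup in place of the raw page source. Regular expressions are used here only because the goal is to remove text before any HTML parser sees it.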
