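We know that some websites are dynamic pages: their data loads without generating a new URL. Toutiao (今日头条), for example, loads data as you scroll down, while the Credit Chengdu (信用成都) website loads its data by opening a new window.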
Take scraping the Credit Chengdu exception list (列异名录) as an example. The address to scrape is: "http://credit.chengdu.gov.cn/www/index.html#/m///exceptionList/1/10/"
Press F12 to view the page source; the data turns out to be hidden.
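One likely contributor to the missing data is the shape of the URL itself. Everything after `#` is a URL fragment, which HTTP clients do not send to the server, so the server only ever sees the request for `index.html`, and the `exceptionList/1/10/` part is presumably interpreted by JavaScript in the browser. The following is a minimal sketch using only the standard library; the variable names `base_url` and `fragment` are illustrative and not from the original post.

```python
from urllib.parse import urldefrag

url = "http://credit.chengdu.gov.cn/www/index.html#/m///exceptionList/1/10/"

# Split the URL into the part that goes over HTTP and the client-side fragment.
base_url, fragment = urldefrag(url)

print("requested from the server:", base_url)
print("left to the browser:", fragment)
```

Whatever the list page and page size are, the server receives the same request, so the list data has to arrive through some other request made by the page.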
Open PyCharm and try downloading the page source.
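import requests
from bs4 import BeautifulSoup

url = "http://credit.chengdu.gov.cn/www/index.html#/m///exceptionList/1/10/"
# Fetch the raw HTML; the timeout keeps a stalled connection from hanging forever.
r = requests.get(url, timeout=10)
# Parse the response body (the "lxml" parser must be installed separately).
soup = BeautifulSoup(r.text, "lxml")
print(soup)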
The printed page looks like this:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<style>
body{ background:#fff; font-family: microsoft yahei; color:#969696; font-size:14px;}
.online-desc-con { text-align:center; }
.r-tip01 { color: #333; font-size: 18px; display: block; text-align: center; width: 500px; padding: 0 10px; overflow: hidden; text-overflow: ellipsis; margin: 0 auto 15px; }
.r-tip02 { color: #585858; font-size: 14px; display: block; margin-top: 20px; margin-bottom: 20px; }
#notice-jiasule {
word-wrap: break-word;
word-break: normal;
color:#585858;
border:1px solid #ddd;
padding:0px 20px 0px 20px
}
img { border: 0; }
.u-ico{ vertical-align: middle; margin-right: 12px;}
.btn{ padding: 8px 22px; border-radius: 3px; border: 0; display: inline-block;vertical-align: middle;text-decoration: none;}
.btn-g{ background-color: #61b25e; color: #fff;}
.report {color: #858585; text-decoration: none;}
.report:hover {text-decoration: underline; color: #0088CC;}
hr{ border-top: 1px dashed #ddd;}
c