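Fetching dynamically loaded web page data with Python
0. What is XHR?
XHR is the XMLHttpRequest object, that is, the object that Ajax functionality relies on. The Ajax functions in jQuery are a wrapper around XHR.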
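Because an XHR call is an ordinary HTTP request, a script can issue a similar one itself. jQuery marks its same-origin Ajax requests with the header `X-Requested-With: XMLHttpRequest`, and some servers look for it. The sketch below only builds such a request with the standard library and does not send it; the function name `build_xhr_style_request` is hypothetical, and whether this particular site checks the header is an assumption.

```python
from urllib.request import Request


def build_xhr_style_request(url):
    """Build (but do not send) a GET request carrying the header jQuery adds to Ajax calls."""
    headers = {
        # Conventional marker that jQuery sets on same-origin Ajax requests.
        "X-Requested-With": "XMLHttpRequest",
    }
    return Request(url, headers=headers)


req = build_xhr_style_request("https://knewone.com/discover?page=2")
print(req.get_method())    # HTTP method that would be used
print(req.full_url)        # URL the request targets
print(req.header_items())  # headers attached to the request
```

Passing `req` to `urllib.request.urlopen` would actually send it; the same header can be supplied to `requests.get` through its `headers` argument.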
1. Find the Request URL of the asynchronously loaded data
Example screenshot:
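For this site the Request URL has the form `https://knewone.com/discover?page=N`, where only the page number changes. As a minimal sketch, the URLs for a range of pages could be generated like this; the helper name `build_page_urls` is hypothetical.

```python
from urllib.parse import urlencode

# Base of the Request URL used in this article.
BASE_URL = "https://knewone.com/discover"


def build_page_urls(start, end):
    """Return the Request URLs for pages start .. end-1."""
    return [f"{BASE_URL}?{urlencode({'page': page})}" for page in range(start, end)]


for page_url in build_page_urls(1, 4):
    print(page_url)
```

Using `urlencode` rather than string concatenation keeps the query string valid if more parameters are added later.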
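2. Locate the image's absolute position (selector path) in the HTML page
Example screenshot: you can see that the dynamic JS adds new div tags.
Copy the img element's absolute selector path in the HTML page.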
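A selector path such as `a.cover-inner > img` means "an `img` whose direct parent is an `a` with class `cover-inner`". To make that rule concrete, here is a small standard-library stand-in that applies the same parent/child check to a hand-written HTML fragment. It is not how BeautifulSoup implements `select`; the class name `CoverImageParser`, the sample markup, and the `example.com` URLs are all made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class CoverImageParser(HTMLParser):
    """Collect src of every <img> that is a direct child of <a class="cover-inner">."""

    def __init__(self):
        super().__init__()
        self._stack = []  # (tag, class list) for each currently open element
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self._stack:
            parent_tag, parent_classes = self._stack[-1]
            if parent_tag == "a" and "cover-inner" in parent_classes:
                self.srcs.append(attrs.get("src"))
        if tag not in VOID_TAGS:
            self._stack.append((tag, (attrs.get("class") or "").split()))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


SAMPLE_HTML = """
<div class="thing">
  <a class="cover-inner" href="/things/tbot"><img src="https://example.com/tbot.jpg"></a>
  <section class="content"><h4><a title="TBot" href="/things/tbot">TBot</a></h4></section>
  <p><img src="https://example.com/unrelated.jpg"></p>
</div>
"""

parser = CoverImageParser()
parser.feed(SAMPLE_HTML)
print(parser.srcs)
```

Only the first image is collected, because the second one sits inside a `p` rather than directly inside `a.cover-inner`.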
3. Scrape the asynchronously loaded data
This approach works for sites that keep loading more content page by page.
Code example:
from bs4 import BeautifulSoup
import requests
import time

url = 'https://knewone.com/discover?page='


def get_page(Url, data=None):
    print(Url)
    wb_data = requests.get(Url)
    soup = BeautifulSoup(wb_data.text, 'lxml')
    imgs = soup.select('a.cover-inner > img')  # all img elements on the page
    titles = soup.select('section.content > h4 > a')  # title of each item
    links = soup.select('section.content > h4 > a')  # link of each item
    if data is None:
        for img, title, link in zip(imgs, titles, links):
            data = {
                'img': img.get('src'),
                'title': title.get('title'),
                'link': link.get('href')
            }
            print(data)


def get_more_pages(Url, start, end):
    for one in range(start, end):
        get_page(Url + str(one))  # append the page number
        time.sleep(1)  # pause for 1 second to avoid getting the IP banned


get_more_pages(url, 1, 10)  # fetch pages 1 to 9
Output of running the code:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3 /Users/mac/Desktop/data/cloudbility/四周爬虫/2-KneWOne.py
https://knewone.com/discover?page=1
{'img': 'https://making-photos.b0.upaiyun.com/photos/dfaec1d3ba6df86562f9699869ababd4.jpg!thing.fixed.big', 'title': 'Osmo 儿童游戏套件', 'link': '/things/osmo-er-tong-you-xi-tao-jian'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/2883a5c06b4da12a0cde1b7dff26b104.jpg!thing.fixed.big', 'title': 'TBot', 'link': '/things/tbot'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/aaeb7de0751ebc627c0971deb633b265.jpg!thing.fixed.big', 'title': 'olloclip 四合一摄像镜头 iPhone 6/6 Plus 版', 'link': '/things/olloclip-si-he-she-xiang-jing-tou-iphone-6-slash-6-plus-ban'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/91063c24d62dead12a8d9e2a54887f51.jpg!thing.fixed.big', 'title': 'Momax SelfiFit Mini 蓝牙自拍器', 'link': '/things/momax-selfifit-lan-ya-zi-pai-qi'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/a97bf7f2a200bace8bd1d629b6436b85.jpg!thing.fixed.big', 'title': '贱驴 007', 'link': '/things/jian-lu-007-1'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/8c7c2008ebb9844a6a86123e8554b8e4.jpg!thing.fixed.big', 'title': 'Moshi Xync Lightning Keychain 连接线', 'link': '/things/moshi-xync-lightning-keychain-lian-jie-xian'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/9fb1f1372d5b2c486e3ca903ca11826e.jpg!thing.fixed.big', 'title': 'RainDesign iLevel 2 支架', 'link': '/things/raindesign-ilevel-2'}
{'img': 'https://making-photos.b0.upaiyun.com/photos/5ad99a590f7be4ba4b0df26f94e2c8a4.jpg!thing.fixed.big'
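
The `link` values in the output above are site-relative paths such as `/things/tbot`. If absolute URLs are needed for follow-up requests, one option is to resolve each link against the page URL with `urljoin`. This is a sketch only; the helper name `absolutize` is hypothetical and the record is a shortened copy of one printed above.

```python
from urllib.parse import urljoin

# URL of the page the record was scraped from.
PAGE_URL = "https://knewone.com/discover?page=1"


def absolutize(record, page_url=PAGE_URL):
    """Return a copy of one scraped record with its 'link' made absolute."""
    fixed = dict(record)  # copy so the original record is left untouched
    fixed["link"] = urljoin(page_url, record["link"])
    return fixed


record = {"title": "TBot", "link": "/things/tbot"}
print(absolutize(record))
```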