Mind where the increment goes in a loop (using scraping ** articles as an example)

hyperbola_001

Tags: python

Article link: https://blog.csdn.net/hyperbola_001/article/details/106962310

1. Process first, then increment (two pages are printed)

import requests
from bs4 import BeautifulSoup

url1 = 'https://www.……articles'  # target URL

headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}  # request headers

offset = 10  # initial offset
while True:  # loop to fetch multiple pages
    params = {
        'include': 'data[*].comment_count,suggest_edit,is_normal,thumbnail_extra_info,thumbnail,can_comment,comment_permission,admin_closed_comment,content,voteup_count,created,updated,upvoted_followees,voting,review_info,is_labeled,label_info;data[*].author.badge[?(type=best_answerer)].topics',
        'offset': str(offset),
        'limit': '10',
        'sort_by': 'created'}
    
    res = requests.get(url1, headers=headers, params=params)

    articles1 = res.json()
    articles = articles1['data']

    for i in articles:
        title = [i['title']]
        print(title)

    offset = offset + 20  # note: increment once after each page has been printed
    if offset > 30:
        break
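
To see why this version prints two pages, the loop can be replayed without any network access. The following is only a minimal sketch: the function name `trace_increment_after_processing` and its parameters are hypothetical, and appending to a list stands in for the real request and the title-printing loop.

```python
def trace_increment_after_processing(start=10, step=20, stop_after=30):
    """Replay the loop above offline; return the offsets whose page gets printed."""
    printed = []
    offset = start
    while True:
        page_offset = offset         # stands in for the request made with this offset
        printed.append(page_offset)  # stands in for the inner for-loop that prints titles
        offset = offset + step       # increment only after the page has been handled
        if offset > stop_after:
            break
    return printed


print(trace_increment_after_processing())  # [10, 30]
```

The check `offset > 30` runs after each page has already been handled, so both offset 10 and offset 30 are printed before the loop exits.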

2. Increment first, then process (only one page is printed)

import requests
from bs4 import BeautifulSoup

url1 = 'https://www.……articles'  # target URL

headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}  # request headers

offset = 10  # initial offset
while True:  # loop to fetch multiple pages
    params = {
        'include': 'data[*].comment_count,suggest_edit,is_normal,thumbnail_extra_info,thumbnail,can_comment,comment_permission,admin_closed_comment,content,voteup_count,created,updated,upvoted_followees,voting,review_info,is_labeled,label_info;data[*].author.badge[?(type=best_answerer)].topics',
        'offset': str(offset),
        'limit': '10',
        'sort_by': 'created'}
    
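    res = requests.get(url1, headers=headers, params=params)

    articles1 = res.json()
    articles = articles1['data']

    offset = offset + 20  # note: incremented before printing, so the loop stops before the second page is printed
    if offset > 30:
        break  # on the second pass the page has been requested but is never printed

    for i in articles:
        title = [i['title']]
        print(title)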
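
The same kind of offline replay shows what goes wrong when the increment and the `break` check come before the printing step. This is a sketch under stated assumptions: both function names are hypothetical, and list appends stand in for the real request and the printing loop. The second helper illustrates one possible way to sidestep the placement problem by computing the offsets up front with `range`; it is not the approach used in the original code.

```python
def trace_increment_before_processing(start=10, step=20, stop_after=30):
    """Replay the second loop offline; return (offsets requested, offsets printed)."""
    requested, printed = [], []
    offset = start
    while True:
        requested.append(offset)     # stands in for requests.get(...) with this offset
        page_offset = offset
        offset = offset + step       # increment happens before the page is handled
        if offset > stop_after:
            break                    # leaves the loop before the printing step
        printed.append(page_offset)  # stands in for the inner for-loop that prints titles
    return requested, printed


def page_offsets(start=10, step=20, stop_after=30):
    """Hypothetical helper: list the offsets up front so no in-loop increment is needed."""
    return list(range(start, stop_after + 1, step))


print(trace_increment_before_processing())  # ([10, 30], [10])
print(page_offsets())                       # [10, 30]
```

The second page (offset 30) is requested, but the `break` fires before its titles are printed. Iterating with `for offset in page_offsets():` would leave no increment statement to misplace.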
