Scraping web headlines with Java: scraping partial Toutiao (今日头条) data

Latest recommended article published on 2021-03-11 16:47:38

weixin_39719476


Views: 612


Article tags: scraping web headlines with Java

Article link: https://blog.csdn.net/weixin_39719476/article/details/114815888


import requests
import re
from urllib.parse import urlencode
from requests.exceptions import RequestException
import json
from bs4 import BeautifulSoup
import codecs
from conm import *

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
}

# Define the function that fetches the content (one page of search results)
def get_page_index(offset, keyword):
    data = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': 20,
        'cur_tab': 3,
    }
    # Build the URL
    url = 'http://www.toutiao.com/search_content/?' + urlencode(data)
    # Handle request errors
    try:
        response = requests.get(url, headers=header)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        print('Error requesting the index page')
        return None
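To see what URL get_page_index actually requests, the URL-building step can be tried on its own without touching the network. The following is a minimal sketch: the helper name build_index_url and the sample keyword '街拍' are assumptions made for illustration, and only the parameter names and base URL come from the code above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from get_page_index above
BASE_URL = 'http://www.toutiao.com/search_content/?'


def build_index_url(offset, keyword):
    """Hypothetical helper: build the search URL the same way get_page_index does."""
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': 20,
        'cur_tab': 3,
    }
    # urlencode percent-encodes non-ASCII values (as UTF-8) and joins pairs with '&'
    return BASE_URL + urlencode(params)


# '街拍' is only a sample keyword, not one given in the article
url = build_index_url(0, '街拍')
print(url)

# Decoding the query string recovers the original parameters as lists of strings
query = parse_qs(urlsplit(url).query)
print(query['keyword'], query['offset'])
```

Because count is fixed at 20, paging through results would presumably mean calling the function with offset values of 0, 20, 40 and so on, although the excerpt does not show that loop.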

def parse_page_index(html):
    data = json.loads(html)
    if data and 'data
