About the scraper error AttributeError: 'NoneType' object has no attribute 'text'

海hong

Category: Problems

Article link: https://blog.csdn.net/haihonga/article/details/98755227

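Python web scraping: error log
On the error AttributeError: 'NoneType' object has no attribute 'text'
While learning to scrape news headlines from the Baidu News site, I followed an online tutorial. The code looked correct to me, yet it would not run to completion: Python raised AttributeError: 'NoneType' object has no attribute 'text'. This is a runtime error, not a compile error.
I searched various forums for a solution, but nothing I found helped.
So I re-read the code word by word and found that I had written two extra lines. One of them called find without passing a CSS selector for the data to scrape, so the lookup evidently matched nothing and returned None, which the next line appended to the list. Accessing .text on that None is what raised the error. Once I removed those lines, the script ran.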
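The failure can be reproduced without any network access. The following is a minimal sketch, not the scraper itself: SimpleNamespace objects stand in for matched elements, and the names print_titles, matched, and missing are hypothetical, chosen only for illustration.

```python
from types import SimpleNamespace


def print_titles(elements):
    """Return the .text of every element, like the print loop in the scraper."""
    return [element.text for element in elements]


# Stand-in for an element that a selector matched (illustrative only).
matched = SimpleNamespace(text="Headline A")
# Stand-in for a lookup that matched nothing and therefore yielded None.
missing = None

message = ""
try:
    print_titles([matched, missing])
except AttributeError as exc:
    # None has no .text attribute, so the loop fails on the second entry.
    message = str(exc)
    print(message)
```

The exception only appears when the list is read, which is why the traceback points at the print loop and not at the line that stored None.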
Here is the scraping code:

# get_baiduNews_everything.py

from requests_html import HTMLSession
from apscheduler.schedulers.blocking import BlockingScheduler

def get_news():
    # Create an empty list to hold the news titles
    ans_news_titles = []

    session = HTMLSession()
    # Fetch the source of the news page
    r = session.get('http://news.baidu.com/')
    # Look up the news title (the selector is empty, so nothing is matched)
    title_baidu_news = r.html.find('', first=True)
    # Append the result to the list (this stores None)
    ans_news_titles.append(title_baidu_news)

    for i in range(6):
        # Build one selector per headline slot
        links = '#pane-news > div > ul > li.hdline{} > strong > a'.format(i)
        titles_baidu_news = r.html.find(links)
        # Add the matched elements to the list
        ans_news_titles += titles_baidu_news
    # Print the text of every collected element
    for title in ans_news_titles:
        print(title.text)

# Run get_news()
if __name__ == '__main__':
    get_news()

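As a beginner, I worked through the code line by line and annotated it. The code above is the failing version; the problem is on line 14, the r.html.find('', first=True) call. After the fix, the code looks like this (the offending lines are commented out):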

# get_baiduNews_everything.py

from requests_html import HTMLSession
from apscheduler.schedulers.blocking import BlockingScheduler

def get_news():
    # Create an empty list to hold the news titles
    ans_news_titles = []

    session = HTMLSession()
    # Fetch the source of the news page
    r = session.get('http://news.baidu.com/')
    # Look up the news title (disabled: the empty selector matched nothing)
    #title_baidu_news = r.html.find('', first=True)
    # Append the result to the list (disabled: it stored None)
    #ans_news_titles.append(title_baidu_news)

    for i in range(6):
        # Build one selector per headline slot
        links = '#pane-news > div > ul > li.hdline{} > strong > a'.format(i)
        titles_baidu_news = r.html.find(links)
        # Add the matched elements to the list
        ans_news_titles += titles_baidu_news
    # Print the text of every collected element
    for title in ans_news_titles:
        print(title.text)

# Run get_news()
if __name__ == '__main__':
    get_news()
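
Commenting out the lines fixes this particular case, but the same error comes back whenever a lookup matches nothing, for example after the page layout changes. One possible safeguard is to skip None entries before reading .text. This is a rough sketch under stated assumptions, not part of the tutorial: collect_titles is a hypothetical helper, and SimpleNamespace objects again stand in for matched elements so the example runs without a network connection.

```python
from types import SimpleNamespace


def collect_titles(elements):
    """Return the stripped text of each element, skipping None and blank text."""
    titles = []
    for element in elements:
        if element is None:
            # A lookup that matched nothing; there is no .text to read.
            continue
        text = getattr(element, "text", "") or ""
        if text.strip():
            titles.append(text.strip())
    return titles


# Illustrative input: one failed lookup, one blank title, two real titles.
sample = [
    None,
    SimpleNamespace(text=" Headline A "),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Headline B"),
]
titles = collect_titles(sample)
print(titles)
```

In the scraper, the same guard would amount to checking the result of a first=True lookup for None before appending it to ans_news_titles.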