Baidu API: Sentiment Analysis of Large Batches of Data with Python

Latest recommended article published on 2025-03-15 16:52:13

GISer阿兴

Article link: https://blog.csdn.net/qq_43542339/article/details/105181078

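Recently I worked out how to use the Baidu API for batch sentiment analysis of text. The result works well and can process a large amount of data reliably. After reading through a number of blog posts, I mainly solved the following odd problems that show up during batch analysis (corrections are welcome):

The QPS seems to be limited, so some texts come back without a sentiment result (I don't fully understand why, but wrapping the call in a retry loop solves it).
Network access is limited: partway through a run the request itself raises an error, probably because the endpoint could not be reached.

In the end the script completes sentiment analysis of 8,000+ texts and is fairly stable.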
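The retry idea mentioned above can be sketched as a small helper. This is only a minimal sketch: the name `call_with_retry`, its parameters, and the growing pause between attempts are my own assumptions for illustration, not part of the Baidu API or of the script in this post.

```python
import time


def call_with_retry(func, arg, max_tries=5, pause=0.5, is_ok=lambda r: True):
    """Call func(arg) until it returns an acceptable result or max_tries is used up.

    Hypothetical helper: func is any callable (for example the API function),
    is_ok decides whether a returned value counts as a success.
    """
    last_error = None
    for attempt in range(max_tries):
        try:
            result = func(arg)
            if is_ok(result):
                return result
            # The call returned, but the reply is not usable (e.g. a rate-limit error)
            last_error = RuntimeError('unacceptable reply: %r' % (result,))
        except Exception as exc:
            # The request itself failed (e.g. a network error)
            last_error = exc
        # Wait a little longer after each failed attempt
        time.sleep(pause * (attempt + 1))
    raise RuntimeError('gave up after %d attempts' % max_tries) from last_error
```

Both failure modes listed above are covered: an exception from the request and a reply that arrives but carries no result. Passing something like `is_ok=lambda r: 'items' in r` would treat only replies that contain results as a success.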

First, the function that calls the Baidu API to analyze sentiment:

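import json
import urllib.request

def sentiment_classify(text):
    # Request body: the text to analyze
    raw = {"text": text}
    data = json.dumps(raw).encode('utf-8')

    # Sentiment analysis endpoint. [Access Token] is obtained from
    # https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=[API Key]&client_secret=[Secret Key]
    # See the blog post linked below for how to get the API Key and Secret Key
    host = 'https://aip.baidubce.com/rpc/2.0/nlp/v1/sentiment_classify?charset=UTF-8&access_token=[Access Token]'

    request = urllib.request.Request(url=host, data=data)
    request.add_header('Content-Type', 'application/json')
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    # Parse the JSON reply into a dict
    rdata = json.loads(content)
    return rdata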

This code, and the way to obtain the API Key and Secret Key, follow the blog post linked below:

https://blog.csdn.net/ChenVast/article/details/82682750?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
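The OAuth token URL that appears in the code above exchanges the API Key and Secret Key for an access token. A minimal sketch of that step follows. The helper names `build_token_url` and `fetch_access_token` are hypothetical, and I am assuming the JSON reply carries the token in an `access_token` field.

```python
import json
import urllib.parse
import urllib.request

# Token endpoint, taken from the URL used in this post
TOKEN_URL = 'https://aip.baidubce.com/oauth/2.0/token'


def build_token_url(api_key, secret_key):
    """Build the token request URL from the two keys (hypothetical helper)."""
    query = urllib.parse.urlencode({
        'grant_type': 'client_credentials',
        'client_id': api_key,
        'client_secret': secret_key,
    })
    return TOKEN_URL + '?' + query


def fetch_access_token(api_key, secret_key):
    """Request a token; assumes the reply JSON has an 'access_token' field."""
    with urllib.request.urlopen(build_token_url(api_key, secret_key)) as response:
        reply = json.loads(response.read().decode('utf-8'))
    return reply['access_token']
```

Fetching the token once at the start of a run and reusing it for every text avoids one extra request per review.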

Next is the main program:

import time
import pandas as pd

if __name__ == '__main__':
    file = 'F:/... (file path)'
    text = pd.read_excel(file)
    review = text['txt']
    length = len(review)

    # Initialize the lists that store the sentiment results
    sentiment = ['blank'] * length
    confidence = ['blank'] * length
    positive_prob = ['blank'] * length

    time_start = time.time()  # start timing
    # Maximum number of attempts per round; retrying keeps a temporary
    # access restriction from losing the result (adjustable)
    maxTryNum = 50
    for i, content in enumerate(review):
        # The Baidu API is limited to 512 characters here; longer text also causes errors
        content = str(content)[:512]
        result = None
        op = True  # keep looping until a sentiment result is actually returned
        while op:
            for tries in range(maxTryNum):
                try:
                    result = sentiment_classify(content)
                    break
                except Exception:
                    if tries == maxTryNum - 1:
                        print('All %d attempts failed!' % maxTryNum)
            # A successful call returns 3 entries (including 'items'), a failed
            # one returns 2, so the presence of 'items' is the exit condition
            op = not (result is not None and 'items' in result)

        item = result['items'][0]
        sentiment[i] = item['sentiment']
        confidence[i] = item['confidence']
        positive_prob[i] = item['positive_prob']

        # Show progress
        print('Review %d analyzed, %d reviews in total' % (i + 1, length))

    time_end = time.time()
    print('Total time spent analyzing reviews:', time_end - time_start)

    text['sentiment'] = sentiment
    text['confidence'] = confidence
    text['positive_prob'] = positive_prob

    # Save the results
    text.to_excel('F:/互联网＋/data/情绪分析结果/湖北_37_result.xlsx', index=False)
    print(file + '    result written successfully!')

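In the end this completes sentiment analysis on large batches of data. It also borrows the idea from the blog post linked below, namely looping to retry the request several times. The biggest drawback is that it can only analyze about 2 texts per second, so large datasets take a long time.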

https://blog.csdn.net/haoaiqian/article/details/70228025?utm_source=blogxgwz3
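At roughly 2 texts per second, 8,000 texts take about 4,000 seconds, a little over an hour, so losing a run halfway is costly. One way to soften that is to save partial results and resume from them. The sketch below is an assumption of mine, not something the script above does; `load_checkpoint`, `save_checkpoint`, `analyze_all`, and the JSON checkpoint file are all hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return saved results as {row_index: result}, or {} if no file exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding='utf-8') as f:
        # JSON object keys are strings, so convert them back to row indices
        return {int(k): v for k, v in json.load(f).items()}


def save_checkpoint(path, results):
    """Write results to a temporary file first, then swap it into place."""
    tmp = path + '.tmp'
    with open(tmp, 'w', encoding='utf-8') as f:
        json.dump(results, f, ensure_ascii=False)
    os.replace(tmp, path)


def analyze_all(texts, classify, path, save_every=100):
    """Run classify on every text, skipping rows already in the checkpoint."""
    results = load_checkpoint(path)
    try:
        for i, content in enumerate(texts):
            if i in results:
                continue  # already analyzed in an earlier run
            results[i] = classify(content)
            if (i + 1) % save_every == 0:
                save_checkpoint(path, results)
    finally:
        # Save whatever has been collected, even if an error interrupts the loop
        save_checkpoint(path, results)
    return results
```

Here `classify` would be the API function from this post, or a retrying wrapper around it. Running `analyze_all` again with the same checkpoint path only sends the texts that are still missing.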