[python] Test code for checking a BERTopic installation

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

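# Load all 20 Newsgroups posts, stripping headers, footers, and quoted replies
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Fit a topic model with default settings
topic_model = BERTopic()
print('start fit transform...')
topics, probs = topic_model.fit_transform(docs)
print('fit done')
# Print one summary row per discovered topic
print(topic_model.get_topic_info())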

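The fetch_20newsgroups call above downloads the dataset from a server hosted outside China, so the download is often very slow or fails. In that case the dataset has to be downloaded manually and loaded offline. For the loading method, see this article: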

[python] Loading the fetch_20newsgroups dataset offline (CSDN blog). In outline: first download the data package manually, place the downloaded file next to your script, and then open the twenty_newsgroups.py file. https://blog.csdn.net/FL1623863129/article/details/134654050
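Independently of the approach in the linked article, the documents can also be read straight from a manually downloaded and extracted copy of the dataset. The following is a minimal sketch under stated assumptions: the helper name `load_local_docs` and the folder name `20news-bydate-train` are hypothetical and depend on how the archive was unpacked. Unlike `remove=('headers', 'footers', 'quotes')`, this sketch does not strip headers, footers, or quotes, so the resulting topics may differ from the output shown in this post.

```python
from pathlib import Path


def load_local_docs(root):
    """Read every file under `root` (recursively) into a list of strings.

    Assumes `root` is a manually downloaded and extracted copy of the
    dataset, with one plain-text post per file.
    """
    docs = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # latin-1 maps every byte to a character, so decoding never fails.
            docs.append(path.read_text(encoding="latin-1"))
    return docs


if __name__ == "__main__":
    # Hypothetical folder name; adjust to wherever the archive was unpacked.
    docs = load_local_docs("20news-bydate-train")
    print(len(docs), "documents loaded")
```

The returned list can be passed to `topic_model.fit_transform(docs)` in place of the `fetch_20newsgroups` result.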

Test output:

start fit transform...
fit done
     Topic  ...                                Representative_Docs
0       -1  ...  [This is a periodic posting intended to answer...
1        0  ...  [I thought I'd post my predicted standings sin...
2        1  ...  [\nI am not an expert in the cryptography scie...
3        2  ...                            [Hello,, Hello,, ites:]
4        3  ...  [*********************************************...
..     ...  ...                                                ...
227    226  ...  [\n\nTrue, coach Matikainen is ready to keep a...
228    227  ...  [Archive-name: typing-injury-faq/software\nVer...
229    228  ...  [\n\nIn this era of AIDS, isn't someone's fuck...
230    229  ...  [Hi, I am doing a term paper on the syringe an...
231    230  ...  [\n\n\n\n\nSounds to me like your dealer reall...

[232 rows x 5 columns]
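In the table above, the row with Topic -1 is the label BERTopic uses for outlier documents that were not assigned to any topic. One quick way to check how many documents ended up there is to count the labels in the `topics` list returned by `fit_transform`. This is only a sketch: the helper name `outlier_share` is hypothetical, and the sample labels are made up for illustration, not taken from the run above.

```python
from collections import Counter


def outlier_share(topics):
    """Return (documents per topic, fraction of documents labelled -1)."""
    counts = Counter(topics)
    total = len(topics)
    share = counts.get(-1, 0) / total if total else 0.0
    return counts, share


# Made-up labels for illustration; in practice pass the `topics` list
# returned by topic_model.fit_transform(docs).
counts, share = outlier_share([-1, 0, 0, 1, -1, 2, 0, 1])
print(counts.most_common(3))
print(f"outliers: {share:.0%}")
```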

 
