How to turn LDA topics into a list of the top 20 words for each topic in Python

from gensim import corpora
import gensim
from gensim.models.ldamodel import LdaModel
from gensim.parsing.preprocessing import STOPWORDS

# example docs
doc1 = """
Java (Indonesian: Jawa; Javanese: ꦗꦮ; Sundanese: ᮏᮝ) is an island of Indonesia.\
With a population of over 141 million (the island itself) or 145 million (the \
administrative region), Java is home to 56.7 percent of the Indonesian population \
and is the most populous island on Earth.[1] The Indonesian capital city, Jakarta, \
is located on western Java. Much of Indonesian history took place on Java. It was \
the center of powerful Hindu-Buddhist empires, the Islamic sultanates, and the core \
of the colonial Dutch East Indies. Java was also the center of the Indonesian struggle \
for independence during the 1930s and 1940s. Java dominates Indonesia politically, \
economically and culturally.
"""

doc2 = """
Hydrogen fuel is a zero-emission fuel when burned with oxygen, if one considers water \
not to be an emission. It often uses electrochemical cells, or combustion in internal \
engines, to power vehicles and electric devices. It is also used in the propulsion of \
spacecraft and might potentially be mass-produced and commercialized for passenger vehicles \
and aircraft.Hydrogen lies in the first group and first period in the periodic table, i.e. \
it is the first element on the periodic table, making it the lightest element. Since \
hydrogen gas is so light, it rises in the atmosphere and is therefore rarely found in \
its pure form, H2."""

doc3 = """
The giraffe (Giraffa) is a genus of African even-toed ungulate mammals, the tallest living \
terrestrial animals and the largest ruminants. The genus currently consists of one species, \
Giraffa camelopardalis, the type species. Seven other species are extinct, prehistoric \
species known from fossils. Taxonomic classifications of one to eight extant giraffe species\
have been described, based upon research into the mitochondrial and nuclear DNA, as well \
as morphological measurements of Giraffa, but the IUCN currently recognizes only one \
species with nine subspecies.
"""

documents = [doc1, doc2, doc3]

# lowercase each document, split on whitespace, and drop stop words
document_wrd_splt = [[word for word in document.lower().split() if word not in STOPWORDS]
                     for document in documents]

# map each unique token to an integer id
dictionary = corpora.Dictionary(document_wrd_splt)
print(dictionary.token2id)

# bag-of-words corpus built from the tokenized documents
# (the original snippet iterated over an undefined name, `texts`)
corpus = [dictionary.doc2bow(text) for text in document_wrd_splt]

lda = LdaModel(corpus, num_topics=3, id2word=dictionary, passes=50)

# collect the top 20 words of each topic
num_topics = 3
topic_words = []
for i in range(num_topics):
    tt = lda.get_topic_terms(i, 20)  # list of (word_id, probability) pairs
    topic_words.append([dictionary[pair[0]] for pair in tt])

# output (LDA training is randomized, so the exact lists can differ between runs)
>>> topic_words[0]
['indonesian', 'java', 'species', 'island', 'population', 'million', '(the', 'java.', 'center', 'giraffe', 'currently', 'genus', 'city,', 'economically', 'administrative', 'east', 'sundanese:', 'itself)', 'took', '1940s.']
>>> topic_words[1]
['vehicles', 'fuel', 'hydrogen', 'periodic', 'table,', 'i.e.', 'uses', 'form,', 'considers', 'zero-emission', 'internal', 'period', 'burned', 'cells,', 'rises', 'pure', 'atmosphere', 'aircraft.hydrogen', 'water', 'engines,']
>>> topic_words[2]
['giraffa,', 'even-toed', 'living', 'described,', 'camelopardalis,', 'consists', 'extinct,', 'seven', 'fossils.', 'morphological', 'terrestrial', '(giraffa)', 'dna,', 'mitochondrial', 'nuclear', 'ruminants.', 'classifications', 'species,', 'prehistoric', 'known']

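The topic lists in the output contain tokens such as 'java.', '(the' and 'city,' because the documents are split on whitespace only, so punctuation stays attached to the words. A minimal sketch of one way to strip punctuation before building the dictionary is given below. The `tokenize` helper, its regular expression, and the small `SAMPLE_STOPWORDS` set are assumptions made for illustration and are not part of the original post; in the script itself, gensim's `STOPWORDS` would presumably take the place of the sample set.

```python
import re

# Hypothetical stand-in for gensim's STOPWORDS so this sketch runs on its own.
SAMPLE_STOPWORDS = frozenset({"the", "is", "an", "of", "on", "and", "a", "in", "to"})

# A token is a run of letters/digits, optionally joined by internal
# hyphens or apostrophes (so "zero-emission" stays one token).
TOKEN_RE = re.compile(r"[^\W_]+(?:[-'][^\W_]+)*")


def tokenize(text, stopwords=SAMPLE_STOPWORDS):
    """Lowercase the text, extract word-like tokens, and drop stop words."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in stopwords]


if __name__ == "__main__":
    print(tokenize("Java (Indonesian: Jawa) is an island of Indonesia."))
    print(tokenize("Hydrogen fuel is a zero-emission fuel, H2."))
```

Under these assumptions, the list comprehension that builds `document_wrd_splt` could call `tokenize(document, STOPWORDS)` instead of `document.lower().split()`, so that 'java' and 'java.' are counted as the same word.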