from gensim import corpora
import gensim
from gensim.models.ldamodel import LdaModel
from gensim.parsing.preprocessing import STOPWORDS

# example docs
doc1 = """
Java (Indonesian: Jawa; Javanese: ꦗꦮ; Sundanese: ᮏᮝ) is an island of Indonesia.\
With a population of over 141 million (the island itself) or 145 million (the \
administrative region), Java is home to 56.7 percent of the Indonesian population \
and is the most populous island on Earth.[1] The Indonesian capital city, Jakarta, \
is located on western Java. Much of Indonesian history took place on Java. It was \
the center of powerful Hindu-Buddhist empires, the Islamic sultanates, and the core \
of the colonial Dutch East Indies. Java was also the center of the Indonesian struggle \
for independence during the 1930s and 1940s. Java dominates Indonesia politically, \
economically and culturally.
"""
doc2 = """
Hydrogen fuel is a zero-emission fuel when burned with oxygen, if one considers water \
not to be an emission. It often uses electrochemical cells, or combustion in internal \
engines, to power vehicles and electric devices. It is also used in the propulsion of \
spacecraft and might potentially be mass-produced and commercialized for passenger vehicles \
and aircraft.Hydrogen lies in the first group and first period in the periodic table, i.e. \
it is the first element on the periodic table, making it the lightest element. Since \
hydrogen gas is so light, it rises in the atmosphere and is therefore rarely found in \
its pure form, H2."""
doc3 = """
The giraffe (Giraffa) is a genus of African even-toed ungulate mammals, the tallest living \
terrestrial animals and the largest ruminants. The genus currently consists of one species, \
Giraffa camelopardalis, the type species. Seven other species are extinct, prehistoric \
species known from fossils. Taxonomic classifications of one to eight extant giraffe species\
have been described, based upon research into the mitochondrial and nuclear DNA, as well \
as morphological measurements of Giraffa, but the IUCN currently recognizes only one \
species with nine subspecies.
"""

documents = [doc1, doc2, doc3]

# lowercase, split on whitespace, and drop stop words
document_wrd_splt = [[word for word in document.lower().split() if word not in STOPWORDS]
                     for document in documents]

dictionary = corpora.Dictionary(document_wrd_splt)
print(dictionary.token2id)

# bag-of-words corpus built from the tokenized documents
corpus = [dictionary.doc2bow(text) for text in document_wrd_splt]

lda = LdaModel(corpus, num_topics=3, id2word=dictionary, passes=50)

# collect the 20 highest-weighted words for each topic
num_topics = 3
topic_words = []
for i in range(num_topics):
    tt = lda.get_topic_terms(i, 20)
    topic_words.append([dictionary[pair[0]] for pair in tt])

# output
>>> topic_words[0]
['indonesian', 'java', 'species', 'island', 'population', 'million', '(the', 'java.', 'center', 'giraffe', 'currently', 'genus', 'city,', 'economically', 'administrative', 'east', 'sundanese:', 'itself)', 'took', '1940s.']
>>> topic_words[1]
['vehicles', 'fuel', 'hydrogen', 'periodic', 'table,', 'i.e.', 'uses', 'form,', 'considers', 'zero-emission', 'internal', 'period', 'burned', 'cells,', 'rises', 'pure', 'atmosphere', 'aircraft.hydrogen', 'water', 'engines,']
>>> topic_words[2]
['giraffa,', 'even-toed', 'living', 'described,', 'camelopardalis,', 'consists', 'extinct,', 'seven', 'fossils.', 'morphological', 'terrestrial', '(giraffa)', 'dna,', 'mitochondrial', 'nuclear', 'ruminants.', 'classifications', 'species,', 'prehistoric', 'known']