http://blog.csdn.net/pipisorry
Browse LDA Topic Models
This package allows you to create a set of HTML files to browse a topic model. It creates a word cloud and time graph per topic, and annotates a selection of documents with the topic for each word.
Installing
Enter the following at the R command line:
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
The first step prompts you to install Rtools:
Loading required package: devtools
WARNING: Rtools is required to build R packages, but is not currently installed.
Please download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
> find_rtools()
[1] TRUE
...
Once Rtools is installed, the first step no longer reports an error.
Note:
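1. I am currently using R 3.2.0. Rtools 3.3 raises an error with it, so install Rtools 3.1 instead. Compare that with what the official page says, which left me speechless.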
[http://cran.r-project.org/bin/windows/Rtools/]
2. Pay attention to compatibility between the Rtools and R versions. A successful install looks like this:
> if (!require(devtools)) {install.packages("devtools"); library(devtools)}
> install_github("vanatteveldt/topicbrowser")
Downloading github repo vanatteveldt/topicbrowser@master
Installing topicbrowser
"C:/PROGRA~1/R/R-32~1.0/bin/x64/R" --vanilla CMD INSTALL \
  "C:/Users/pi/AppData/Local/Temp/RtmpcvsU6M/devtools11d0fc638d5/vanatteveldt-topicbrowser-cfa62a3" \
  --library="C:/Users/pi/Documents/R/win-library/3.2" --install-tests
* installing *source* package 'topicbrowser' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (topicbrowser)
Reloading installed topicbrowser
> library(topicbrowser)
>
Creating a topic browser
1. First install the topicmodels package, the R implementation of topic models:
> install.packages("topicmodels")
Installing package into 'C:/Users/pi/Documents/R/win-library/3.2'
(as 'lib' is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/topicmodels_0.2-1.zip'
Content type 'application/zip' length 1308321 bytes (1.2 MB)
downloaded 1.2 MB
package 'topicmodels' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\pi\AppData\Local\Temp\RtmpcvsU6M\downloaded_packages
[How can I install topicmodels package in R?]
2. To create a topic browser, you need to have:
- A model fit using topicmodels::LDA
- The set of original tokens used to create the document term matrix, and the document ids these tokens are from
- The metadata of the documents, containing aid, headline, and date
Note:
A fix for the error "Failed with error: ‘package ‘topicmodels’ was built before R 3.0.0: please re-install it’" is to run the following sequence of commands from the R console:
require(devtools)
install_url("http://cran.r-project.org/src/contrib/topicmodels_0.2-1.tar.gz")
require(topicmodels)
ls("package:topicmodels")
[Failed with error: ‘package ‘sentiment’ was built before R 3.0.0: please re-install it’]
[topicmodels: An R Package for Fitting Topic Models]
However, installing the topicmodels R package this way fails with: ERROR: compilation failed for package 'topicmodels'
3. The provided data file 'sotu' contains this data from the State of the Union addresses. Make sure that the tokens are ordered in the way they appeared in the article:
> data(sotu)
> tokens = tokens[order(tokens$aid, tokens$id), ]
> class(m)
[1] "LDA_Gibbs"
attr(,"package")
[1] "topicmodels"
> head(tokens)
         aid      lemma       word sentence  pos offset id pos1 freq
20 111541965         it         It        1  PRP      0  1    O    1
10 111541965         be         is        1  VBZ      3  2    V    1
40 111541965         we        our        1 PRP$      6  3    O    1
39 111541965 unfinished unfinished        1   JJ     10  4    A    1
32 111541965       task       task        1   NN     21  5    N    1
38 111541965         to         to        1   TO     26  6    ?    1
> head(meta)
         id       date   medium     headline
1 111541965 2013-02-12 Speeches Barack Obama
2 111541995 2013-02-12 Speeches Barack Obama
3 111542001 2013-02-12 Speeches Barack Obama
4 111542006 2013-02-12 Speeches Barack Obama
5 111542013 2013-02-12 Speeches Barack Obama
6 111542018 2013-02-12 Speeches Barack Obama
4. With these data, you can create a topic browser as follows:
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
## Writing html to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
## Preparing variables
## Rendering overview
## Rendering topic 1
## Rendering topic 2
## Rendering topic 3
## Rendering topic 4
## Rendering topic 5
## Rendering topic 6
## Rendering topic 7
## Rendering topic 8
## Rendering topic 9
## Rendering topic 10
## HTML written to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
You can also publish the output file directly using markdown::rpubsUpload:
library(markdown)
result = rpubsUpload("Example topic browser", output)
browseURL(result$continueUrl)
See [the example](http://rpubs.com/vanatteveldt/topicbrowser) for a collection of State of the Union addresses.
[/topicbrowser]
Complete code:
#download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
find_rtools()
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
#install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
#install.packages("topicmodels")
library(topicmodels)
topicmodels::LDA
data(sotu)
tokens = tokens[order(tokens$aid, tokens$id), ]
class(m)
head(tokens)
head(meta)
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
Output: ...
wordcloud
Test run (simple.py)
1. Download the corresponding font.
2. To run on Windows, change font_path:
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\DejaVuSansMono.ttf', ranks_only=True).generate(text)
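A hard-coded Windows font path stops working as soon as the script runs on another machine. Here is a minimal sketch of one way around that, assuming a hypothetical helper named pick_font_path and an illustrative list of candidate paths (neither is part of wordcloud): it returns the first font file that actually exists.

```python
import os


def pick_font_path(candidates):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Illustrative candidates only; adjust to the fonts installed on your machine.
CANDIDATE_FONTS = [
    r"C:\Windows\Fonts\DejaVuSansMono.ttf",
    "/usr/share/fonts/truetype/droid/DroidSansMono.ttf",
]

font_path = pick_font_path(CANDIDATE_FONTS)
```

If font_path comes back as None, none of the candidates is installed and the font from step 1 still needs to be downloaded. Otherwise the value can be passed as the font_path argument in the WordCloud call.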
Related settings (wordcloud.py)
FONT_PATH = os.environ.get("FONT_PATH", "/usr/share/fonts/truetype/droid/DroidSansMono.ttf")
STOPWORDS = set([x.strip() for x in open(os.path.join(os.path.dirname(__file__), 'stopwords')).read().split('\n')])
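The settings above take the font path from the FONT_PATH environment variable, with a Linux default, and read the stopwords from a file next to the module. The following is a minimal sketch of the same idea, not wordcloud's own code: resolve_font_path and load_stopwords are hypothetical helper names, and unlike the original line this version closes the file and skips blank lines.

```python
import os

DEFAULT_FONT = "/usr/share/fonts/truetype/droid/DroidSansMono.ttf"


def resolve_font_path(environ=os.environ):
    """Return FONT_PATH from the given environment, or the default if unset."""
    return environ.get("FONT_PATH", DEFAULT_FONT)


def load_stopwords(path):
    """Read one stopword per line into a set, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}
```

Because the setting is read from the environment, defining FONT_PATH before the module is imported should be an alternative to editing font_path in the script on Windows, assuming the installed version still reads it this way.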