Topic Models: Visualizing Topic Models

http://blog.csdn.net/pipisorry

Browse LDA Topic Models
This package allows you to create a set of HTML files to browse a topic model. It creates a word cloud and a time graph per topic, and annotates a selection of documents with the topic for each word.
Installing

Enter the following at the R prompt:

if (!require(devtools)) {install.packages("devtools"); library(devtools)}
install_github("vanatteveldt/topicbrowser")
library(topicbrowser)
The first step warns that Rtools needs to be installed:
Loading required package: devtools
WARNING: Rtools is required to build R packages, but is not currently installed.
Please download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/, then run
> find_rtools()
[1] TRUE
...
Once Rtools is installed, the first step no longer reports an error.

Note:

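1. I am currently using R 3.2.0. Rtools 3.3 fails with an error on this setup, so install Rtools 3.1 instead. Compare that with what the official page says and be amazed!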

[http://cran.r-project.org/bin/windows/Rtools/]

2. Pay attention to the compatibility between the Rtools and R versions. This is what an error-free install looks like:

> if (!require(devtools)) {install.packages("devtools"); library(devtools)}
> install_github("vanatteveldt/topicbrowser")
Downloading github repo vanatteveldt/topicbrowser@master
Installing topicbrowser
"C:/PROGRA~1/R/R-32~1.0/bin/x64/R" --vanilla CMD INSTALL  \
  "C:/Users/pi/AppData/Local/Temp/RtmpcvsU6M/devtools11d0fc638d5/vanatteveldt-topicbrowser-cfa62a3"  \
  --library="C:/Users/pi/Documents/R/win-library/3.2" --install-tests 

* installing *source* package 'topicbrowser' ...
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (topicbrowser)
Reloading installed topicbrowser
> library(topicbrowser)
>

Creating a topic browser

1. First install the topicmodels package, the R implementation of topic models

> install.packages("topicmodels")
Installing package into ‘C:/Users/pi/Documents/R/win-library/3.2’ (as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/topicmodels_0.2-1.zip'
Content type 'application/zip' length 1308321 bytes (1.2 MB)
downloaded 1.2 MB

package ‘topicmodels’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\pi\AppData\Local\Temp\RtmpcvsU6M\downloaded_packages

[How can I install topicmodels package in R?]

2. To create a topic browser, you need to have:

  • A model fit using topicmodels::LDA
  • The set of original tokens used to create the document term matrix, and the document ids these tokens are from
  • The metadata of the documents, containing aid, headline, and date
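
To make the first requirement concrete, here is a minimal sketch of going from a token list to a fitted model. The toy data, the choice of k = 2 and the control settings are assumptions made for illustration; they are not how the sotu model shipped with the package was built. The sketch relies on topicmodels::LDA accepting a plain integer count matrix (it documents its input as a DocumentTermMatrix or anything coercible to a simple_triplet_matrix), and it skips the fitting step when topicmodels is not installed.

```r
# Toy token list: one row per token, with its document id (aid) and position (id).
tokens_toy <- data.frame(
  aid   = c(1, 1, 1, 2, 2, 2),
  id    = c(1, 2, 3, 1, 2, 3),
  lemma = c("tax", "cut", "job", "war", "peace", "war"),
  stringsAsFactors = FALSE
)

# Document-term matrix: rows are documents, columns are lemmas, cells are counts.
counts <- table(tokens_toy$aid, tokens_toy$lemma)
dtm <- matrix(as.integer(counts), nrow = nrow(counts),
              dimnames = unname(dimnames(counts)))
print(dtm)

# Fit a small Gibbs-sampled LDA model if topicmodels is available.
# k, seed and iter are illustrative values, not recommendations.
if (requireNamespace("topicmodels", quietly = TRUE)) {
  m_toy <- topicmodels::LDA(dtm, k = 2, method = "Gibbs",
                            control = list(seed = 1, iter = 100))
  print(class(m_toy))
}
```

The row names of the matrix carry the document ids and the column names carry the lemmas, which is what lets a fitted model be matched back to the original tokens later on.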

Note:

If you run into the error "Failed with error: ‘package ‘topicmodels’ was built before R 3.0.0: please re-install it’", one suggested fix is to run the following sequence of commands from the R console:

require(devtools)
install_url("http://cran.r-project.org/src/contrib/topicmodels_0.2-1.tar.gz")
require(topicmodels)
ls("package:topicmodels")
[ Failed with error: ‘package ‘sentiment’ was built before R 3.0.0: please re-install it’]

[topicmodels: Topic models]

[topicmodels: An R Package for Fitting Topic Models]

However, installing the topicmodels R package this way fails with: ERROR: compilation failed for package 'topicmodels'
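
Before reinstalling anything, it can help to see which installed packages actually trigger the "built before R 3.0.0" message. The following is a small sketch using only base R; it reads the "Built" column reported by installed.packages(). The variable names are my own, and the reinstall line is left commented out because whether a binary or a source install works depends on your setup.

```r
# List installed packages whose recorded build version predates R 3.0.0.
pkgs  <- installed.packages()
built <- package_version(pkgs[, "Built"])
stale <- unique(pkgs[which(built < "3.0.0"), "Package"])
print(stale)

# Reinstall them from the configured repository if any were found:
# if (length(stale) > 0) install.packages(stale)
```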

3. The provided data file 'sotu' contains this data for the State of the Union addresses. Make sure that the tokens are ordered in the way they appeared in the article:

> data(sotu)
> tokens = tokens[order(tokens$aid, tokens$id), ]

> class(m)
[1] "LDA_Gibbs"
attr(,"package")
[1] "topicmodels"

> head(tokens)
         aid      lemma       word sentence  pos offset id pos1 freq
20 111541965         it         It        1  PRP      0  1    O    1
10 111541965         be         is        1  VBZ      3  2    V    1
40 111541965         we        our        1 PRP$      6  3    O    1
39 111541965 unfinished unfinished        1   JJ     10  4    A    1
32 111541965       task       task        1   NN     21  5    N    1
38 111541965         to         to        1   TO     26  6    ?    1
> head(meta)
         id       date   medium     headline
1 111541965 2013-02-12 Speeches Barack Obama
2 111541995 2013-02-12 Speeches Barack Obama
3 111542001 2013-02-12 Speeches Barack Obama
4 111542006 2013-02-12 Speeches Barack Obama
5 111542013 2013-02-12 Speeches Barack Obama
6 111542018 2013-02-12 Speeches Barack Obama
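
The ordering step in item 3 is easy to verify on a toy example. The sketch below uses made-up data with the same column names as the head() output above, sorts it the same way, and rebuilds each document's text to confirm the reading order. It also checks that every document with tokens has a metadata row, assuming (as the printed ids suggest) that tokens$aid and meta$id refer to the same document ids.

```r
# Toy stand-ins for the sotu objects; deliberately shuffled.
toy_tokens <- data.frame(
  aid  = c(2, 1, 1, 2),
  id   = c(1, 2, 1, 2),
  word = c("We", "is", "It", "can"),
  stringsAsFactors = FALSE
)
toy_meta <- data.frame(id = c(1, 2), headline = c("Doc A", "Doc B"),
                       stringsAsFactors = FALSE)

# Sort by document, then by position within the document.
toy_tokens <- toy_tokens[order(toy_tokens$aid, toy_tokens$id), ]

# Rebuild each document's text to confirm the reading order survived.
texts <- tapply(toy_tokens$word, toy_tokens$aid, paste, collapse = " ")
print(texts)

# Every document that has tokens should also have a metadata row.
missing_meta <- setdiff(unique(toy_tokens$aid), toy_meta$id)
stopifnot(length(missing_meta) == 0)
```
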
4. With these data, you can create a topic browser as follows:

output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)
## Writing html to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
## Preparing variables
## Rendering overview
## Rendering topic 1
## Rendering topic 2
## Rendering topic 3
## Rendering topic 4
## Rendering topic 5
## Rendering topic 6
## Rendering topic 7
## Rendering topic 8
## Rendering topic 9
## Rendering topic 10
## HTML written to /tmp/Rtmp7o5E48/topicbrowser_3f047fbf0d1e.html
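
Note that the log above shows the HTML being written under the R session's temporary directory, which R removes when the session ends. A minimal sketch for keeping a copy follows. It assumes that createTopicBrowser returns the path of the generated HTML file (the rpubsUpload call in this post uses it that way) and that the file can be moved on its own; the placeholder file and the destination folder are hypothetical and only there to keep the example runnable.

```r
# Stand-in for the path returned by createTopicBrowser(); in a real session,
# skip these two lines and use `output` directly.
output <- tempfile(fileext = ".html")
writeLines("<html><body>placeholder</body></html>", output)

# Hypothetical destination; replace with a permanent folder of your choice.
dest_dir <- file.path(tempdir(), "topicbrowser_export")
dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
dest <- file.path(dest_dir, "topicbrowser.html")

# Copy the file and, in an interactive session, open it in the default browser.
copied <- file.copy(output, dest, overwrite = TRUE)
if (copied && interactive()) browseURL(dest)
```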

You can also publish the output file directly using markdown::rpubsUpload:

library(markdown)
result = rpubsUpload("Example topic browser", output)
browseURL(result$continueUrl)

See [the example](http://rpubs.com/vanatteveldt/topicbrowser) for a collection of State of the Union addresses.

[vanatteveldt/topicbrowser]

Complete code:

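# Download and install Rtools 3.1 from http://cran.r-project.org/bin/windows/Rtools/ first
if (!require(devtools)) {install.packages("devtools"); library(devtools)}
find_rtools()  # needs devtools loaded; should return TRUE
# install_github("vanatteveldt/topicbrowser")  # run once, then keep commented out
library(topicbrowser)
# install.packages("topicmodels")  # run once, then keep commented out
library(topicmodels)
topicmodels::LDA  # prints the model-fitting function; optional
data(sotu)  # provides the m, tokens and meta objects used below
# Restore the original reading order: by document, then by token position
tokens = tokens[order(tokens$aid, tokens$id), ]
class(m)
head(tokens)
head(meta)
output = createTopicBrowser(m, tokens$lemma, tokens$aid, words=tokens$word, meta=meta)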
Output: ...



wordcloud

Test run (simple.py)

1. Download the required font

2. When running on Windows, change font_path

wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\DejaVuSansMono.ttf', ranks_only=True).generate(text)

Related settings (wordcloud.py)

FONT_PATH = os.environ.get("FONT_PATH", "/usr/share/fonts/truetype/droid/DroidSansMono.ttf")
STOPWORDS = set([x.strip() for x in open(os.path.join(os.path.dirname(__file__), 'stopwords')).read().split('\n')])

