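Problem
https://mp.weixin.qq.com/s/MGRIJAcSsePtMdLD6hwDwQ: generate a word cloud from the titles of 45 articles.
Word cloud
A word cloud displays words in a single image, with the size of each word determined by its frequency. A good word cloud lets the reader quickly grasp the key points of the underlying data.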
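To make the frequency-to-size idea concrete, here is a minimal base-R sketch. The three toy titles and the 1 to 4 size range are assumptions chosen for illustration; they are not taken from data.txt, and real word cloud packages use their own scaling.

```r
# Hypothetical toy input: three short titles (not taken from data.txt)
titles <- c("a novel miRNA", "a novel copolymer", "miRNA in cancer")

# Split on whitespace, lower-case, and count how often each word occurs
words <- tolower(unlist(strsplit(titles, "\\s+")))
freq <- sort(table(words), decreasing = TRUE)

# Map frequency linearly onto a display size between 1 and 4 (arbitrary range)
size <- 1 + 3 * (freq - min(freq)) / (max(freq) - min(freq))

print(freq)
print(size)
```

Words that occur more often get the larger size, which is the only information a word cloud encodes in its font sizes.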
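Drawing a word cloud can be broken down into three steps:
- Data collection: obtain the data to be analyzed with the word cloud
- Data processing: clean the data to arrive at usable word frequencies
- Plotting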
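The three steps can be sketched end to end as follows. This is only an outline under stated assumptions: it uses the tm package for cleaning and assumes the wordcloud package for plotting, and it writes two of the titles from this post to a temporary file as a stand-in for data.txt. The exact cleaning steps are a guess at a reasonable minimum, not a definitive recipe.

```r
library(tm)         # text mining: corpus handling and cleaning
library(wordcloud)  # assumption: the wordcloud package does the plotting

# 1. Data collection: a temporary stand-in for data.txt, one title per line
path <- tempfile(fileext = ".txt")
writeLines(c("A Novel Copolymer Poly(Lactide-co-b-Malic Acid).pdf",
             "B4GALT3 up-regulation by miR-27a contributes to the oncogenic.pdf"),
           path)
lines <- readLines(path)

# 2. Data processing: drop the ".pdf" suffix, clean the text, count terms
lines <- sub("\\.pdf$", "", lines)
docs <- Corpus(VectorSource(lines))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
tdm <- TermDocumentMatrix(docs)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# 3. Plotting: word size follows frequency
wordcloud(names(freq), freq, min.freq = 1, random.order = FALSE)
```

With only two titles every term occurs once, so the plot is not informative; the point is the shape of the pipeline, which stays the same for the full set of 45 titles.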
Data collection
The titles of all 45 articles are saved in a file: data.txt
A Novel Copolymer Poly(Lactide-co-b-Malic Acid).pdf
A novel miRNA identified in GRSF1 complex drives the metastasis via the PIK3R3_AKT_NF-κB and TIMP3_MMP9 pathways in cervical cancer cells.pdf
A novel microRNA identified in hepatocellular carcinomas is __responsive to LEF1 and facilitates proliferation and epithelial- mesenchymal transition via targeting of NFIX.pdf
B4GALT3 up-regulation by miR-27a contributes to the oncogenic.pdf
......
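The titles above end in .pdf, which suggests they are file names. One possible way to produce such a file is to list a folder of PDFs, sketched below. The folder name papers and the temporary paths are hypothetical, and the post does not say how data.txt was actually created.

```r
# Hypothetical setup: a temporary folder standing in for the directory of PDFs
pdf_dir <- file.path(tempdir(), "papers")
dir.create(pdf_dir, showWarnings = FALSE)
file.create(file.path(pdf_dir, c(
  "A Novel Copolymer Poly(Lactide-co-b-Malic Acid).pdf",
  "B4GALT3 up-regulation by miR-27a contributes to the oncogenic.pdf"
)))

# List the PDF file names and save them, one title per line
titles <- list.files(pdf_dir, pattern = "\\.pdf$")
out <- file.path(tempdir(), "data.txt")
writeLines(titles, out)

print(readLines(out))
```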
Data processing
Data import
install.packages("tm") ## install tm (text mining); only needed once
library("tm")
### Read the data into a character vector, one element per line
lines <- readLines("data.txt")
### Build the corpus
### VectorSource creates a Source object from a vector
txt_source <- VectorSource(lines)
docs <- tm::Corpus(txt_source)
#### inspect() can be used to view information about the corpus
inspect(docs[1:3])
-------------------------------------------------
> inspect(docs[1:3])
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 3
[1] A Novel Copolymer Poly(Lactide-co-b-Malic Acid).pdf
[2] A novel miRNA identified