Social Network Analysis

http://www.rdatamining.com/examples/social-network-analysis

This post presents an example of social network analysis in R using the igraph package.

The data analyzed here is Twitter text data from @RDataMining, the same data used in the Text Mining example; it can be downloaded as the file "termDocMatrix.rdata" from the Data webpage. To put it in a general social-network setting, the terms can be viewed as people and the tweets as groups on LinkedIn, so the term-document matrix plays the role of a group-membership matrix. We will build a network of terms based on their co-occurrence in the same tweets, which is similar to building a network of people based on their shared group memberships.

First, a term-document matrix, termDocMatrix, is loaded into R. It is then transformed into a term-term adjacency matrix, from which a graph is built. Finally, we plot the graph to show the relationships between frequent terms, and make it more readable by setting the colors, font sizes and transparency of vertices and edges.

Load Data

> # load termDocMatrix
> load("data/termDocMatrix.rdata")
> # inspect part of the matrix
> termDocMatrix[5:10,1:20]



Note that termDocMatrix above is a standard R matrix rather than a term-document matrix object as used in text mining packages. To try the code with your own term-document matrix built with the tm package, run the line below to convert it before going to the next step.

termDocMatrix  <- as.matrix( termDocMatrix )

Transform Data into an Adjacency Matrix

> # change it to a Boolean matrix
> termDocMatrix[termDocMatrix>=1] <- 1
> # transform into a term-term adjacency matrix
> termMatrix <- termDocMatrix %*% t(termDocMatrix)
> # inspect terms numbered 5 to 10
> termMatrix[5:10,5:10]
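
To see what the matrix product computes, here is a minimal sketch on a made-up matrix of three terms and four tweets. The term names, tweet names and counts are illustrative assumptions and are not taken from termDocMatrix.rdata.

```r
# Toy term-document matrix: rows are terms, columns are tweets (made-up data)
tdm <- matrix(c(2, 0, 1, 0,
                1, 1, 0, 0,
                0, 1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("r", "mining", "data"), paste0("tweet", 1:4)))

# Binarize: a term either appears in a tweet (1) or it does not (0)
tdm[tdm >= 1] <- 1

# Entry [i, j] counts the tweets in which terms i and j both appear;
# the diagonal entry [i, i] counts the tweets that contain term i
cooc <- tdm %*% t(tdm)
print(cooc)
```

In this toy case "r" and "mining" share exactly one tweet, so their off-diagonal entry is 1, while "data" appears in three tweets, so its diagonal entry is 3. The result is symmetric, which is why it can be read as the adjacency matrix of an undirected graph.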

Build a Graph

We have now built a term-term adjacency matrix, in which the rows and columns represent terms and every entry is the number of co-occurrences of two terms. Next we can build a graph with graph.adjacency() from the igraph package (igraph 1.0 and later also provide this function under the name graph_from_adjacency_matrix()).

> library(igraph)
> # build a graph from the above matrix
> g <- graph.adjacency(termMatrix, weighted=TRUE, mode = "undirected")
> # remove loops
> g <- simplify(g)
> # set labels and degrees of vertices
> V(g)$label <- V(g)$name
> V(g)$degree <- degree(g)
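
The role of simplify() is easier to see on a tiny graph. The sketch below uses a made-up 3x3 co-occurrence matrix (an assumption for illustration, not the real data) and the newer function name graph_from_adjacency_matrix(); which_loop() is used only to count self-loops.

```r
library(igraph)

# Made-up co-occurrence matrix for three terms (diagonal = term frequency)
terms <- c("r", "mining", "data")
cooc <- matrix(c(2, 1, 1,
                 1, 2, 0,
                 1, 0, 3),
               nrow = 3, byrow = TRUE,
               dimnames = list(terms, terms))

# graph_from_adjacency_matrix() is the newer name of graph.adjacency()
g <- graph_from_adjacency_matrix(cooc, weighted = TRUE, mode = "undirected")

# The non-zero diagonal has produced one self-loop per term
loops_before <- sum(which_loop(g))

g <- simplify(g)            # drop the self-loops
V(g)$degree <- degree(g)    # number of distinct neighbours of each term

print(loops_before)
print(E(g)$weight)
print(V(g)$degree)
```

After simplify(), only the two edges between distinct terms remain, each keeping its co-occurrence count as its weight, and "r" has degree 2 because it co-occurs with both other terms.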

Plot the Graph

> # set seed to make the layout reproducible
> set.seed(3952)
> layout1 <- layout.fruchterman.reingold(g)
> plot(g, layout=layout1)


A different layout can be generated with the first line of code below. The second line produces an interactive plot, which allows us to manually rearrange the layout. Details about other layout options can be obtained by running ?igraph::layout in R.

> plot(g, layout=layout.kamada.kawai)
> tkplot(g, layout=layout.kamada.kawai)
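
As a rough illustration of what a layout object is, the sketch below computes two layouts for a small made-up ring graph. It uses the newer function names layout_with_fr() and layout_with_kk(), which correspond to layout.fruchterman.reingold() and layout.kamada.kawai(); the graph itself is an assumption chosen only to keep the example short.

```r
library(igraph)

# Small made-up graph: a ring of five vertices
g <- make_ring(5)

# Fix the seed so the randomized layout is reproducible
set.seed(3952)
layout_fr <- layout_with_fr(g)   # Fruchterman-Reingold
layout_kk <- layout_with_kk(g)   # Kamada-Kawai

# Each layout is a numeric matrix with one (x, y) row per vertex
print(dim(layout_fr))

# Passing a stored matrix draws the vertices at the same positions every time
plot(g, layout = layout_fr)
plot(g, layout = layout_kk)
```

Storing the layout matrix in a variable, rather than passing the layout function to plot(), is what allows later plots to reuse exactly the same vertex positions.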

Make it Look Better

Next, we set the label size of each vertex according to its degree, so that important terms stand out. Similarly, we set the width and transparency of each edge according to its weight. This is useful when a graph is crowded with many vertices and edges. In the code below, vertices and edges are accessed with V() and E(). The function rgb(red, green, blue, alpha) defines a color with the given alpha transparency, where all four values lie between 0 and 1. We plot the graph with the same layout as in the figure above.

> V(g)$label.cex <- 2.2 * V(g)$degree / max(V(g)$degree) + .2
> V(g)$label.color <- rgb(0, 0, .2, .8)
> V(g)$frame.color <- NA
> egam <- (log(E(g)$weight)+.4) / max(log(E(g)$weight)+.4)
> E(g)$color <- rgb(.5, .5, 0, egam)
> E(g)$width <- egam
> # plot the graph in layout1
> plot(g, layout=layout1)
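
The scaling formulas can be checked on a few numbers without building a graph. The degrees and weights below are made-up values for illustration only; the formulas are the ones used in the code above.

```r
# Made-up degrees and edge weights, for illustration only
degree_toy <- c(1, 5, 10)
weight_toy <- c(1, 3, 20)

# Label size grows linearly with degree, up to 2.4 for the highest degree
label_cex <- 2.2 * degree_toy / max(degree_toy) + 0.2

# Log scaling compresses large weights; the +0.4 keeps weight-1 edges
# (log(1) = 0) from becoming fully transparent
egam <- (log(weight_toy) + 0.4) / max(log(weight_toy) + 0.4)

# rgb() is vectorized over alpha, so this yields one color string per edge
edge_color <- rgb(0.5, 0.5, 0, egam)

print(label_cex)
print(round(egam, 3))
print(edge_color)
```

With these inputs the label sizes come out as 0.42, 1.3 and 2.4, and the heaviest edge gets an egam of exactly 1, so it is drawn fully opaque and at the largest width.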


More Examples

More examples of social network analysis with R, along with other data mining techniques, can be found in my book "R and Data Mining: Examples and Case Studies", which is downloadable as a PDF file at the link.



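### Network Analysis with Jupyter Notebook

#### Install the required libraries

To do network analysis with Python in Jupyter Notebook, first install a few commonly used libraries: `networkx` for creating, manipulating and studying the properties of complex network structures, and `matplotlib` or `plotly` for visualizing graphs.

```bash
pip install networkx matplotlib plotly jupyter
```

#### Create and draw a simple undirected graph

The following simple example shows how to build a basic undirected graph with NetworkX and display it with Matplotlib:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Initialize a new, empty graph object
G = nx.Graph()

# Add nodes to the graph
nodes_list = ['A', 'B', 'C']
G.add_nodes_from(nodes_list)

# Add edges to the graph
edges_list = [('A', 'B'), ('B', 'C')]
G.add_edges_from(edges_list)

# Draw the graph
plt.figure(figsize=(8, 6))
nx.draw(G, with_labels=True, node_color='skyblue')
plt.show()
```

This snippet shows how to define a set of vertices (nodes) and then specify which vertices are connected to each other (edges)[^1].

#### Load a real data set for further study

For more complex cases, a real-world data set can be loaded from an external file for exploratory analysis. For example, a list of social network contacts in a CSV file can serve as the input from which a weighted directed graph is built.

```python
import networkx as nx
import pandas as pd

# Assumes a CSV file named social_network.csv exists
df = pd.read_csv('social_network.csv')

DG = nx.from_pandas_edgelist(df,
                             source='source_column_name',
                             target='target_column_name',
                             edge_attr=['weight'],
                             create_using=nx.DiGraph())
```

This assumes that the CSV file has two columns giving the source and target of each record, plus a `weight` column, which `edge_attr=['weight']` requires. Given these parameters, the call converts the table into a graph object suitable for further processing.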