Mining Twitter Friend Relationships

Reposted from: http://blog.ouseful.info/2010/09/page/2/

OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for September 2010

Digging Deeper into the Structure of My Twitter Friends Network: Librarian Spotting

A couple of days ago, I grabbed the Twitter friends lists of all my Twitter friends (that is, lists of all the people that the people I follow on Twitter follow…) and plotted the connections between them, filtered through the people I follow (see the earlier post, Small World? A Snapshot of How My Twitter “Friends” Follow Each Other…). That is, for all of the people I follow on Twitter, I plotted the extent to which they follow each other… got that?
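To make the filtering step concrete, here is a minimal sketch in Python. It assumes the friends lists have already been downloaded into a plain dictionary; the function name, the data layout and the sample usernames are all made up for illustration, and this is not the code used to produce the original plots.

```python
def friends_network(my_friends, friends_of):
    """Return directed edges (a, b), meaning "a follows b",
    keeping only edges where both ends are people I follow."""
    keep = set(my_friends)
    edges = []
    for user in sorted(keep):
        for followed in friends_of.get(user, []):
            # Drop anyone outside my own friends list, and self-loops.
            if followed in keep and followed != user:
                edges.append((user, followed))
    return edges


# Hypothetical sample data: who each of my friends follows.
my_friends = ["alice", "bob", "carol"]
friends_of = {
    "alice": ["bob", "dave"],
    "bob": ["alice", "carol"],
    "carol": ["erin"],
}

edges = friends_network(my_friends, friends_of)
print(edges)
```

Here "dave" and "erin" drop out because I don't follow them, which leaves only the links between my own friends.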

Running the resulting network through Gephi’s modularity statistic (some sort of clustering algorithm; I really need to find out which), several distinct clusters of people turned up: OU folk, data journalism folk, ed techies, JISC/Museums/library folk, and open gov data folk.
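Whatever algorithm sits behind the statistic, a modularity score compares the share of links that fall inside clusters with the share expected if links were placed at random. The sketch below computes that score for a given partition. It assumes an undirected, unweighted view of the network (the follow graph is really directed), and the function name and toy graph are hypothetical; it does not search for the best partition, which is what a clustering tool has to do on top of this.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (a, b) pairs, each undirected edge listed once.
    community: dict mapping each node to its cluster label.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends in the same cluster
    degree_sum = defaultdict(int)  # total degree of the nodes in each cluster
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            intra[community[a]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

q_split = modularity(toy_edges, two_clusters)
q_lumped = modularity(toy_edges, one_cluster)
print(q_split, q_lumped)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the sense in which the clusters are "distinct".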

(Gephi allows you to export the graph file for the current project, including any annotations (such as modularity class) that are added by running Gephi’s statistics. Extracting the list of nodes (i.e. Twitter users) and filtering them by modularity class means we can create separate lists of individuals based on which cluster they appear in, which in turn means that we could generate a Twitter list from those individuals.)
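As a rough sketch of that extraction step, the following Python groups node labels by modularity class from a GEXF-style export. The attribute id `modularity_class`, the function names and the sample XML are assumptions for illustration; check the attributes section of your own export for the id actually used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip any XML namespace so the sketch is not tied to one GEXF version."""
    return tag.rsplit("}", 1)[-1]


def users_by_class(gexf_text, attr_id="modularity_class"):
    """Group node labels by the value of one node attribute."""
    groups = defaultdict(list)
    root = ET.fromstring(gexf_text)
    for el in root.iter():
        if local(el.tag) != "node":
            continue
        for child in el.iter():
            if local(child.tag) == "attvalue" and child.get("for") == attr_id:
                groups[child.get("value")].append(el.get("label", el.get("id")))
    return dict(groups)


# Hypothetical, hand-written sample in the general shape of a GEXF export.
sample = """<gexf xmlns="http://www.gexf.net/1.2draft">
  <graph>
    <nodes>
      <node id="1" label="alice">
        <attvalues><attvalue for="modularity_class" value="0"/></attvalues>
      </node>
      <node id="2" label="bob">
        <attvalues><attvalue for="modularity_class" value="0"/></attvalues>
      </node>
      <node id="3" label="carol">
        <attvalues><attvalue for="modularity_class" value="1"/></attvalues>
      </node>
    </nodes>
  </graph>
</gexf>"""

clusters = users_by_class(sample)
print(clusters)
```

Each list of usernames could then be used to populate a separate Twitter list.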

From my “curated” list of Twitter friends, we can identify a set of “OU twitterers” through a cluster analysis of the mass action of their own friending behaviour, and I could use this to automatically generate a Twitter list of (potential) OU Twitterers that other people can follow.

Here’s the total set of my friends, coloured by modularity class and sized by in-degree (that is, the number of my friends who follow that person).

My Twitter friends, coloured by modularity class
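For reference, in-degree within the filtered network is just a count of incoming "follows" edges. A minimal sketch, assuming edges are stored as (follower, followed) pairs; the names and sample data are hypothetical:

```python
from collections import Counter


def in_degree(edges):
    """Count, for each user, how many users in the network follow them."""
    return Counter(followed for _follower, followed in edges)


# Hypothetical edges: (follower, followed).
sample_edges = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "bob"),
    ("carol", "alice"),
    ("alice", "carol"),
]

sizes = in_degree(sample_edges)
print(sizes)
```

A user nobody in the network follows simply gets a count of zero when looked up.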

If we filter on modularity class, we can just look at the folk in what I have labelled “OU Twitterers”. There are one or two folk in there who don’t quite fit this label (e.g. University of Leicester folk, and a handful of otherwise “disconnected” folk…), but it’s not bad.

OU Twitterers

Note that if I grab the complete friends and followers lists of these individuals, and look for users who are commonly followed, who also tend to follow back, and who don’t have huge numbers of followers (i.e. they aren’t celebrities who automatically follow back…), I may discover other OU Twitterers that I don’t follow…
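The three tests described here (commonly followed, tends to follow back, not a celebrity) can be sketched as a simple filter. Everything in this example is an assumption for illustration: the function name, the thresholds, the data layout and the sample users. Sensible thresholds would need tuning against real data.

```python
from collections import Counter


def candidate_members(cluster, friends_of, followers_count, exclude=(),
                      min_followed_by=2, min_follow_back=1,
                      max_followers=5000):
    """Suggest users outside the cluster who look like they belong to it.

    Returns (user, followed_by, follows_back) tuples, where followed_by is
    the number of cluster members following the user and follows_back is
    the number of cluster members the user follows in return.
    """
    members = set(cluster)
    skip = members | set(exclude)
    followed_by = Counter()
    for member in members:
        for user in set(friends_of.get(member, [])):
            if user not in skip:
                followed_by[user] += 1

    candidates = []
    for user, count in followed_by.items():
        if count < min_followed_by:
            continue  # not commonly followed by the cluster
        follows_back = len(members & set(friends_of.get(user, [])))
        if follows_back < min_follow_back:
            continue  # does not follow cluster members back
        if followers_count.get(user, 0) > max_followers:
            continue  # probably a celebrity or auto-follow account
        candidates.append((user, count, follows_back))
    return sorted(candidates, key=lambda c: (-c[1], -c[2], c[0]))


# Hypothetical sample data.
cluster = ["alice", "bob", "carol"]
friends_of = {
    "alice": ["bob", "dan", "celeb"],
    "bob": ["alice", "dan", "celeb"],
    "carol": ["dan"],
    "dan": ["alice", "bob"],
    "celeb": ["alice", "bob", "carol"],
}
followers_count = {"dan": 300, "celeb": 2_000_000}

suggestions = candidate_members(cluster, friends_of, followers_count)
print(suggestions)
```

In the sample, "dan" is suggested, while "celeb" is dropped because of the follower cap even though that account follows everyone back. Passing my own friends list as `exclude` would restrict the output to people I don't already follow.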

If we run the modularity stat over this group of people, the “OU Twitterers” (most easily done by generating a new workspace from the filtered group), we see three more partitions fall out. Broadly, this first one corresponds to OU Library folk (ish…):

OU Library folk...

Twitterers from my faculty (several of whom rarely, if ever, tweet):

Twitterers I follow in my faculty

And the rest (the vast majority, in fact):

OU folk

(Note that a couple of folk are completely disconnected, and have nothing to do with the OU…)

Running the modularity statistic over this larger group turns up nothing of interest.

So… so what? So this. Firstly, I can mine the friends lists of the friends of arbitrary people on Twitter and pull out clusters that may tell me something about the interests of those people. (For example, we might grab their Twitter biography statements and run them through a word cloud as a first approximation; or grab their recent tweets and do some text mining on those to see if there is any common interest. Hashtag analysis might also be revealing…) Secondly, we could use the members of a cluster as a first approximation to a list of connected members of a community interested in a particular topic area; for these community members we could then pull down lists of all their friends and followers and look to see if we can grow the list through other individuals they are commonly connected to.
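As a first pass at the hashtag idea, a simple frequency count over the recent tweets of a cluster's members may be enough to hint at a shared interest. A minimal sketch, assuming the tweets are already available as plain strings; the function name and the sample tweets are invented for illustration:

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags, ignoring case."""
    counts = Counter(tag.lower()
                     for text in tweets
                     for tag in HASHTAG.findall(text))
    return counts.most_common(n)


# Hypothetical tweets from the members of one cluster.
cluster_tweets = [
    "Plotting with #rstats today",
    "More #Rstats and #dataviz",
    "Cycling in #Utah this weekend",
    "Some handy #rstats tips",
]

top = top_hashtags(cluster_tweets, n=1)
print(top)
```

The same counting approach would work on the words of biography statements, ideally after removing common stop words.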

PS after tweeting the original post, a couple of people asked if I could grab the data from their friends lists. For example, @neilkod’s turned up clusters relating to “Utah tweeps, my cycling ones, and of course data/#rstats.” So the approach appears to work in general…:-)
