kaggle[6] - recommend missing links in a social network

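Competition page: http://www.kaggle.com/c/FacebookRecruiting

The dataset is very simple.

Training set (train): two columns (source_node, destination_node), meaning source follows destination.

Test set (test): one column (source_node). For each source_node, predict 10 destination_node values (the 10 follow relations it is most likely missing).

The evaluation metric is Mean Average Precision; see the competition page for the details.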
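Since everything downstream works on the follow graph, here is a minimal sketch of loading the training pairs into adjacency sets. It assumes the file is a CSV with a header row, which the description above does not state; `load_graph` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_graph(lines):
    """Build successor and predecessor sets from (source_node, destination_node) rows."""
    successors = defaultdict(set)    # node -> nodes it follows
    predecessors = defaultdict(set)  # node -> nodes that follow it
    reader = csv.reader(lines)
    next(reader)  # skip the header row (assumed to be present)
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed rows
        source, destination = int(row[0]), int(row[1])
        successors[source].add(destination)
        predecessors[destination].add(source)
    return successors, predecessors


# Tiny made-up sample in the assumed layout of the training file.
sample = io.StringIO("source_node,destination_node\n1,2\n1,3\n2,3\n3,1\n")
successors, predecessors = load_graph(sample)
```

Keeping both directions is convenient because many of the features discussed later depend on who follows a node as well as whom it follows.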
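To make the metric concrete, here is a minimal sketch of Mean Average Precision at 10. The normalization by min(len(actual), k) follows a common convention and is an assumption here; the competition page is the authority on the exact definition. The names `average_precision_at_k` and `mean_average_precision` are illustrative.

```python
def average_precision_at_k(actual, predicted, k=10):
    """Average precision of one ranked prediction list against a set of true nodes."""
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, node in enumerate(predicted[:k], start=1):
        if node in actual and node not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(node)
    # Assumed normalization: the best achievable number of hits within k.
    return score / min(len(actual), k)


def mean_average_precision(pairs, k=10):
    """Mean of the per-source average precision over (actual, predicted) pairs."""
    if not pairs:
        return 0.0
    return sum(average_precision_at_k(a, p, k) for a, p in pairs) / len(pairs)


single = average_precision_at_k({2, 3}, [2, 5, 3])
overall = mean_average_precision([({2, 3}, [2, 5, 3]), ({1}, [4])])
```

Because precision is accumulated at each hit, placing a correct node earlier in the list always scores at least as well as placing it later, so the order of the 10 predictions matters.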


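Let me start with the ideas. Since the data is so simple, the first thought is BFS: take the nodes closest to source_node as the predictions. The second is a random walk: compute transition probabilities from train, then for each source_node compute the stationary probability of being at each node and sort by it. The third is to consider recommendation models, since the goal in the end is to find the top-10 missing edges. Finally, the task can be framed as a classification problem, provided you construct positive and negative samples, and then rank by the output probability.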
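The first two ideas can be sketched roughly as follows. This is only an illustration on a toy graph: the restart probability in the random walk is my own assumption (without some restart or personalization, the stationary distribution would not depend on source_node), and `bfs_candidates` and `random_walk_scores` are hypothetical names.

```python
from collections import defaultdict, deque


def bfs_candidates(successors, source, limit=10):
    """Return up to `limit` nodes nearest to `source` that it does not already follow."""
    already = successors.get(source, set())
    seen = {source}
    queue = deque([source])
    result = []
    while queue and len(result) < limit:
        node = queue.popleft()
        for nxt in sorted(successors.get(node, ())):
            if nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
            if nxt not in already:
                result.append(nxt)
                if len(result) == limit:
                    break
    return result


def random_walk_scores(successors, source, restart=0.15, iterations=50):
    """Approximate the stationary distribution of a walk that restarts at `source`."""
    prob = {source: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        nxt[source] += restart  # total mass is 1, so this is the restart share
        for node, p in prob.items():
            outs = successors.get(node)
            if outs:
                share = (1.0 - restart) * p / len(outs)
                for out in outs:
                    nxt[out] += share
            else:
                # Dangling node: send its mass back to the source.
                nxt[source] += (1.0 - restart) * p
        prob = dict(nxt)
    return prob


# Toy graph: node -> set of nodes it follows.
graph = {1: {2, 5}, 2: {3}, 5: {3}, 3: {4}}
nearest = bfs_candidates(graph, 1)
scores = random_walk_scores(graph, 1)
```

In both cases the final prediction would be the top 10 nodes after dropping source_node itself and the nodes it already follows.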


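Judging from what people tried, the first idea is only a basic approach that can serve as a benchmark. The second produced good results (6th place) as well as poor ones. For the third, the main method people used was edgerank, and for the fourth everyone had their own tricks. Now let's look at what the first-place finisher did:

1. Candidate selection. He used several edgerank variants to extract the top 30 destination_node candidates for each source_node (a candidate pair may or may not already have a follow relation).

2. For each pair (A, B), build a number of features, mainly: whether A follows B, some similarity features between A and B, and some other features.

3. Build the training set. Each sample is one of the pairs from step 2, labeled 1 if A really follows B and 0 otherwise (that is, one of the features from step 2). He also randomly removes 4% of the edges here, reportedly to make the training data more robust; I did not understand this part.

4. Finally, feed the training set into the models. The main ones were MatrixNet (reportedly not publicly available), GBM, and RandomForest.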
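One possible reading of the edge-removal step, and it is only my guess rather than the winner's confirmed method, is that hiding a small fraction of known edges lets the features be computed on a graph where the positive pairs really are missing, as they are for the test set. A minimal sketch under that assumption, with hypothetical helper names and a made-up chain graph:

```python
import random


def hold_out_edges(edges, fraction=0.04, seed=0):
    """Split edges into a visible set and a small hidden set (assumed 4% by default)."""
    rng = random.Random(seed)
    shuffled = sorted(edges)
    rng.shuffle(shuffled)
    cut = max(1, int(len(shuffled) * fraction))
    return set(shuffled[cut:]), set(shuffled[:cut])


def label_pairs(candidate_pairs, hidden):
    """Label a candidate pair 1 if it is a hidden (removed) edge, else 0."""
    return [(a, b, 1 if (a, b) in hidden else 0) for a, b in candidate_pairs]


# Made-up chain graph: 0 -> 1 -> 2 -> ... -> 100.
edges = {(i, i + 1) for i in range(100)}
visible, hidden = hold_out_edges(edges)
some_hidden = sorted(hidden)[0]
samples = label_pairs([some_hidden, (0, 50)], hidden)
```

Under this reading, candidates and features would be computed from `visible` only, and the labeled samples would then go to whatever classifier is used.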


Finally, here are the features other participants used:

Existence of a reverse link between nodes. (1=yes/0=no)
Count of forward-forward links between nodes.
Count of forward-reverse links between nodes.
Count of forward-bidirectional links between nodes.
Count of reverse-forward links between nodes.
Count of reverse-reverse links between nodes.
Count of reverse-bidirectional links between nodes.
Count of bidirectional-forward links between nodes.
Count of bidirectional-reverse links between nodes.
Count of bidirectional-bidirectional links between nodes.
Count of common neighbors.
Number of links ending at the node to be predicted / Number of links starting at the node to be predicted.
Number of links starting at the node to be ranked.
Number of links ending at the node to be ranked.
Number of links ending at the node to be ranked / Number of links starting at the node to be ranked.
Count of common neighbors / Count of all neighbors.
Count of paths with exactly three links between nodes / Count of paths with exactly three links from node to be predicted to any node.
Count of forward-forward-forward links between nodes / Count of all length three paths between nodes.
Count of forward-forward-reverse links between nodes / Count of all length three paths between nodes.
Count of forward-reverse-forward links between nodes / Count of all length three paths between nodes.
Count of forward-reverse-reverse links between nodes / Count of all length three paths between nodes.
Count of reverse-forward-forward links between nodes / Count of all length three paths between nodes.
Count of reverse-forward-reverse links between nodes / Count of all length three paths between nodes.
Count of reverse-reverse-forward links between nodes / Count of all length three paths between nodes.
Count of reverse-reverse-reverse links between nodes / Count of all length three paths between nodes.
Average length of all unique paths from a node to its immediate successors.
Average length of all unique paths from a node to its immediate predecessors.
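A few of the features above can be sketched for a pair (a, b) as follows. The interpretation of "forward" and "reverse" as the direction of each hop along a two-link path a - x - b is my assumption, and this sketch does not separate out bidirectional links the way the list does; `build_graph` and `pair_features` are illustrative names.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (successors, predecessors) adjacency sets for directed edges."""
    successors, predecessors = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
        predecessors[dst].add(src)
    return successors, predecessors


def pair_features(successors, predecessors, a, b):
    """Compute a handful of link features for the candidate edge a -> b."""
    out_a, in_a = successors.get(a, set()), predecessors.get(a, set())
    out_b, in_b = successors.get(b, set()), predecessors.get(b, set())
    neighbors_a, neighbors_b = out_a | in_a, out_b | in_b
    common = neighbors_a & neighbors_b
    union = neighbors_a | neighbors_b
    return {
        "reverse_link": int(a in out_b),        # b already follows a
        "forward_forward": len(out_a & in_b),   # a -> x -> b
        "forward_reverse": len(out_a & out_b),  # a -> x <- b
        "reverse_forward": len(in_a & in_b),    # a <- x -> b
        "reverse_reverse": len(in_a & out_b),   # a <- x <- b
        "common_neighbors": len(common),
        "common_over_all": len(common) / len(union) if union else 0.0,
        "in_out_ratio_b": len(in_b) / len(out_b) if out_b else 0.0,
    }


# Made-up toy graph.
succ, pred = build_graph([(1, 2), (1, 3), (2, 3), (3, 1), (4, 1), (4, 3)])
features = pair_features(succ, pred, 1, 3)
```

The three-link path features follow the same pattern with one more hop, at a noticeably higher cost on a large graph.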

