Paper notes: Leveraging network topology for better fake account detection in social networks

麦地与诗人

Category: anomaly detection

Article link: https://blog.csdn.net/YPP0229/article/details/106224449


Ten tweet features are used to determine how a tweet was generated:

They are:

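isReply ∈ {0, 1}: indicates whether the tweet is a reply
isRetweet ∈ {0, 1}: indicates whether the tweet is a retweet
accountReputation: the number of followers divided by the sum of friends and followers
hashtagDensity (#), urlDensity (http://), mentionDensity (@): how often each item occurs in a tweet, normalized by the tweet's length:
$\frac{\text{number of occurrences}}{\text{number of words in the tweet}}$
statusesPerDay: the number of status updates posted per day
favoritesPerDay: the number of tweets favorited per day
deviceType ∈ {web, mobile, app, bot, …}: the type of device used to log in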
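
As a rough illustration, the features listed above could be computed along the following lines. This is a minimal sketch, not the paper's implementation. The input dictionary and its field names (`text`, `is_reply`, `followers`, `account_age_days`, and so on) are hypothetical. Densities are computed per whitespace-separated word, URLs are matched on both `http://` and `https://`, and the per-day rates assume the account age in days is known.

```python
def density(tokens, prefixes):
    """Fraction of tokens starting with one of the prefixes (0.0 for an empty tweet)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t.startswith(prefixes)) / len(tokens)


def tweet_features(tweet):
    """Build a feature dict for one tweet.

    `tweet` is a hypothetical plain dict; its field names are assumptions
    made for this sketch, not a real Twitter API schema.
    """
    tokens = tweet["text"].split()
    followers = tweet["followers"]
    friends = tweet["friends"]
    total = followers + friends
    age_days = max(tweet["account_age_days"], 1)  # guard against division by zero
    return {
        "isReply": int(tweet["is_reply"]),
        "isRetweet": int(tweet["is_retweet"]),
        "accountReputation": followers / total if total else 0.0,
        "hashtagDensity": density(tokens, "#"),
        "urlDensity": density(tokens, ("http://", "https://")),
        "mentionDensity": density(tokens, "@"),
        "statusesPerDay": tweet["statuses_count"] / age_days,
        "favoritesPerDay": tweet["favorites_count"] / age_days,
        "deviceType": tweet["device_type"],
    }


# Invented example record, for illustration only.
example = {
    "text": "Check this out http://example.com #news @alice",
    "is_reply": False,
    "is_retweet": False,
    "followers": 50,
    "friends": 950,
    "account_age_days": 100,
    "statuses_count": 2000,
    "favorites_count": 10,
    "device_type": "bot",
}
features = tweet_features(example)
for name, value in features.items():
    print(name, value)
```

The categorical deviceType value is passed through unchanged here; most classifiers would need it encoded numerically first.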

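The algorithm performs worse than monolingual classifiers. If sufficient resources are available, it is wiser to train a separate monolingual classifier for each language to identify automatically generated tweets than to use a multilingual model. The model was trained only on small datasets in two languages, and it might perform better if additional languages were used. In addition, the authors evaluated the model on only one other language, so a broader evaluation may be needed.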

For example, celebrities have many followers, yet they follow comparatively few accounts themselves.

The hypothesis, then, is that bot accounts follow a large number of other accounts but usually do not have many followers themselves.
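
To make this intuition concrete, the accountReputation feature defined earlier (followers divided by friends plus followers) can be evaluated for two made-up accounts. The follower and friend counts below are invented purely for illustration and do not come from the paper.

```python
def account_reputation(followers, friends):
    """followers / (followers + friends); 0.0 when the account has neither."""
    total = followers + friends
    return followers / total if total else 0.0


# Invented counts, for illustration only.
celebrity = account_reputation(followers=1_000_000, friends=200)
suspected_bot = account_reputation(followers=15, friends=4_000)

print(f"celebrity:     {celebrity:.4f}")
print(f"suspected bot: {suspected_bot:.4f}")
```

Under these assumed counts the celebrity's score lies close to 1 and the suspected bot's close to 0, which is the separation the hypothesis relies on.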

Characteristics of the real-world data: a large, real-life, class-imbalanced network dataset
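
One practical consequence of class imbalance is that plain accuracy can look good for a detector that never flags anything. The sketch below uses invented counts (990 genuine accounts, 10 fake ones) that are not drawn from the paper's data. It only shows why precision and recall on the fake class are more informative than accuracy in this setting.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) with respect to the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Invented labels: 1 = fake account, 0 = genuine account.
y_true = [0] * 990 + [1] * 10
always_genuine = [0] * 1000  # a "detector" that never flags any account

accuracy, precision, recall = evaluate(y_true, always_genuine)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```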

Generalized bot detection methods perform better than botnet-specific methods.

A number of different supervised learning algorithms:

The results show:
