Network Representation Learning
Chen Weizheng, Zhang Yan, Li Xiaoming
(School of Electronics Engineering and Computer Science, Peking University, Beijing 100871)
4 Evaluation methods and application scenarios
The performance of network representation learning algorithms can be evaluated from both quantitative and qualitative angles, which requires concrete application scenarios. In current work on network representation learning, the most common quantitative evaluation method is the node label prediction task; Table 1 lists some common network datasets together with their corresponding nodes and labels.
Table 1 Examples of network data
In the label prediction task, the labels of some nodes in the network are known, and the goal is to infer the labels of the remaining nodes. The usual procedure is to first learn a vector representation for each node, then train a classifier (commonly logistic regression or SVM) on the nodes with known labels, and finally feed the vector representations of the remaining nodes to the classifier to predict their labels. If label information is exploited during network representation learning, performance on the label prediction task usually improves [15,45]; in other words, semi-supervised network representation learning may outperform unsupervised network representation learning, because the representations learned by the former are tailored to the prediction task.
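As a minimal sketch of this protocol (purely illustrative: the random `embeddings` and `labels` arrays stand in for the output of a representation learner and the known node labels, and are not from any cited paper):

```python
# Sketch of the node label prediction protocol, assuming node embeddings
# and partially known labels are available. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_nodes, k = 1000, 128                       # k: embedding dimension
embeddings = rng.normal(size=(n_nodes, k))   # stand-in for learned node vectors
labels = rng.integers(0, 4, size=n_nodes)    # stand-in for node labels

# Nodes with known labels form the training set; the rest are to be predicted.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, train_size=0.5, random_state=0)

# Train a classifier on the labeled nodes, then predict the remaining nodes.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Micro-F1:", f1_score(y_test, clf.predict(X_test), average="micro"))
```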
A common qualitative evaluation method is to visualize the network [6,9,31,45]. Concretely, the node vector representations are reduced in dimensionality so that every node in the network is mapped to a point in two-dimensional space. Node label information is usually required: in the visualization, nodes with the same label should lie as close together as possible. Common tools for reducing high-dimensional vectors include t-SNE [45], parametric embedding (PE) [20], and PCA. Reference [31] gives an example of visualizing a co-authorship network with t-SNE, in which each node represents an author, each color corresponds to a research field, and the more tightly the nodes of one color cluster, the better the performance of the network representation learning algorithm.
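A minimal sketch of this visualization step, reusing the illustrative placeholders from the previous example:

```python
# Project node embeddings to 2-D with t-SNE and color points by label;
# tight same-color clusters suggest the representations separate classes well.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 128))   # stand-in for learned node vectors
labels = rng.integers(0, 4, size=1000)      # stand-in for node labels

coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE projection of node embeddings")
plt.show()
```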
An important parameter in network representation learning is the dimensionality k of the vector space, which needs to be tuned experimentally. If k is too small, the learned node vectors lack representational capacity; if k is too large, the computational cost of the algorithm grows substantially. In probabilistic generative algorithms, k is the number of topics or communities. In deep-learning-based network representation learning algorithms, k is typically a three-digit number, i.e., on the order of a few hundred.
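One simple way to tune k is to sweep it against cross-validated label prediction accuracy; in this sketch, `train_embeddings` is a hypothetical placeholder for rerunning any representation learner at a given dimension:

```python
# Sweep the embedding dimension k and score each setting by cross-validated
# label prediction accuracy. Synthetic placeholders throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=1000)

def train_embeddings(k):
    # Placeholder: in practice, run the representation learner with dimension k.
    return rng.normal(size=(1000, k))

for k in (32, 64, 128, 256):
    X = train_embeddings(k)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"k={k:4d}  mean CV accuracy={acc:.3f}")
```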
5 Conclusion
This paper has surveyed the main approaches to network representation learning, with particular attention to recent progress in deep-learning-based methods. The application of deep learning to social network analysis is still at an early exploratory stage, but the results of existing work are encouraging: on the node label prediction task, deep-learning-based network representation learning algorithms such as DeepWalk and LINE already outperform traditional spectral methods [13] and optimization-based methods [46]. However, these two algorithms lack application cases in other social network analysis tasks, so their generality is still limited. Based on the above analysis, this paper envisions two research directions worth exploring.
First, although the label prediction task is very common, the SNA field has many other important tasks, such as influence maximization, link prediction, and community detection, which differ considerably from label prediction. On the one hand, network representation learning models can be designed for specific social network analysis tasks; on the other hand, whether unsupervised network representation learning algorithms generalize across multiple tasks is also worth investigating.
Second, reality abounds with heterogeneous networks whose nodes carry rich labels, text, images, audio, and other multimedia information. How to enable deep-learning-based network representation learning algorithms to exploit both the network structure and the nodes' own feature information is an important problem, involving details such as how to design the network architecture and how to sample node pairs from the network. The TADW algorithm [43] incorporates nodes' text information into representation learning on homogeneous networks through a matrix factorization that is equivalent to DeepWalk, which also offers useful inspiration for extending network representation learning from the matrix factorization perspective.
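As a sketch of this matrix factorization view (notation paraphrased from [43], so details may differ: M is a DeepWalk-style node affinity matrix, T is the node text feature matrix, and the learned factors W and H yield the node representations):

$$\min_{W,H}\; \bigl\lVert M - W^{\top} H T \bigr\rVert_F^2 + \frac{\lambda}{2}\bigl(\lVert W \rVert_F^2 + \lVert H \rVert_F^2\bigr)$$

Compared with factorizing M alone, the extra factor T ties each node's representation to its text features, which is what allows structure and node content to be learned jointly.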
References
[1] Mairal J, Ponce J, Sapiro G, et al. Supervised dictionary learning. Proceedings of the 2009 Conference on Neural Information Processing Systems, Vancouver, Canada, 2009: 1033~1040
[2] Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323~2326
[3] Hyvärinen A, Oja E. Independent component analysis: algorithms and applications. Neural Networks, 2000, 13(4~5): 411~430
[4] Lee H, Battle A, Raina R, et al. Efficient sparse coding algorithms. Proceedings of the 2006 Conference on Neural Information Processing Systems, Vancouver, Canada, 2006: 801~808
[5] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798~1828
[6] Chen M, Yang Q, Tang X O. Directed graph embedding. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007: 2707~2712
[7] Kannan R, Vempala S. Spectral algorithms. Foundations and Trends in Theoretical Computer Science, 2009, 4(3~4): 157~288
[8] Brand M, Huang K. A unifying theorem for spectral embedding and clustering. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Florida, USA, 2003
[9] Le T, Lauw H W. Probabilistic latent document network embedding. Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM), Shenzhen, China, 2014: 270~279
[10] Chojnacki W, Brooks M J. A note on the locally linear embedding algorithm. International Journal of Pattern Recognition and Artificial Intelligence, 2009, 23(8): 1739~1752
[11] Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2001: 585~591
[12] Tang L, Liu H. Relational learning via latent social dimensions. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 2009: 817~826
[13] Newman M. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 2006, 103(23): 8577~8582
[14] Zhou D Y, Huang J Y, Schölkopf B. Learning from labeled and unlabeled data on a directed graph. Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005: 1036~1043
[15] Jacob Y, Denoyer L, Gallinari P. Learning latent representations of nodes for classifying in heterogeneous social networks. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, USA, 2014: 373~382
[16] Yang J, Leskovec J. Modeling information diffusion in implicit networks. Proceedings of the 2010 IEEE 10th International Conference on Data Mining (ICDM), Sydney, Australia, 2010: 599~608
[17] Bourigault S, Lagnier C, Lamprier S, et al. Learning social network embeddings for predicting information diffusion. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, USA, 2014: 393~402
[18] Nallapati R, Ahmed A, Xing E, et al. Joint latent topic models for text and citations. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, 2008: 542~550
[19] Chang J, Blei D. Relational topic models for document networks. Proceedings of the International Conference on Artificial Intelligence and Statistics, Clearwater Beach, Florida, USA, 2009: 81~88
[20] Iwata T, Saito K, Ueda N, et al. Parametric embedding for class visualization. Neural Computation, 2007, 19(9): 2536~2556
[21] Gopalan P, Blei D. Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 2013, 110(36): 14534~14539
[22] Gopalan P, Mimno D, Gerrish S, et al. Scalable inference of overlapping communities. Proceedings of the 2012 Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 2249~2257
[23] Hu Z T, Yao J J, Cui B, et al. Community level diffusion extraction. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 2015: 1555~1569
[24] Kobourov S. Spring embedders and force directed graph drawing algorithms. arXiv preprint arXiv:1201.3011, 2012
[25] Fruchterman T, Reingold E. Graph drawing by force-directed placement. Software: Practice and Experience, 1991, 21(11): 1129~1164
[26] Kamada T, Kawai S. An algorithm for drawing general undirected graphs. Information Processing Letters, 1989, 31(1): 7~15
[27] Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. Proceedings of the Third International Conference on Weblogs and Social Media, San Jose, California, USA, 2009: 361~362
[28] Ellson J, Gansner E, Koutsofios L, et al. Graphviz: open source graph drawing tools. Graph Drawing. Berlin, Heidelberg: Springer, 2002
[29] Bengio Y, Goodfellow I, Courville A. Deep Learning, 2015
[30] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York City, USA, 2014: 701~710
[31] Tang J, Qu M, Wang M Z, et al. LINE: large-scale information network embedding. Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 2015: 1067~1077
[32] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Proceedings of the 2013 Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2013: 3111~3119
[33] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013
[34] Mikolov T, Yih W T, Zweig G. Linguistic regularities in continuous space word representations. Proceedings of NAACL-HLT 2013, Atlanta, USA, 2013: 746~751
[35] Hinton G E. Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, USA, 1986: 1~12
[36] Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. The Journal of Machine Learning Research, 2003(3): 1137~1155
[37] Morin F, Bengio Y. Hierarchical probabilistic neural network language model. Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, Barbados, 2005: 246~252
[38] Collobert R, Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008: 160~167
[39] Gutmann M, Hyvärinen A. Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. Proceedings of the International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 2010: 297~304
[40] Yang C, Liu Z Y. Comprehend DeepWalk as matrix factorization. arXiv preprint arXiv:1501.00358, 2015
[41] Goldberg Y, Levy O. word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722, 2014
[42] Li Y T, Xu L L, Tian F, et al. Word embedding revisited: a new representation learning and explicit matrix factorization perspective. Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015: 3650~3656
[43] Yang C, Liu Z Y, Zhao D L, et al. Network representation learning with rich text information. Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015: 2111~2117
[44] Yu H F, Jain P, Kar P, et al. Large-scale multi-label learning with missing labels. arXiv preprint arXiv:1307.5101, 2013
[45] Tang J, Qu M, Mei Q Z. PTE: predictive text embedding through large-scale heterogeneous text networks. Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2015
[46] Ahmed A, Shervashidze N, Narayanamurthy S, et al. Distributed large-scale natural graph factorization. Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013: 37~48