Vector space
node classification
link prediction
community detection
case study: a small-sample task analyzed on a concrete example (e.g. a graph with 77 nodes and 254 edges)
word representations
language model: predicts the probability that a sentence occurs in a language
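The idea of "probability of a sentence" can be sketched with a minimal bigram model; the tiny corpus below is a made-up example, not from the source:

```python
# Minimal bigram language model sketch over a hypothetical toy corpus.
from collections import Counter

corpus = ["the cat sat", "the cat ran", "the dog sat"]
bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    tokens = ["<s>"] + sent.split()          # <s> marks sentence start
    for a, b in zip(tokens, tokens[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def sentence_prob(sentence):
    """P(sentence) as a product of bigram probabilities P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for a, b in zip(tokens, tokens[1:]):
        p *= bigrams[(a, b)] / unigrams[a]
    return p

print(sentence_prob("the cat sat"))   # ≈ 0.333: seen word pairs, high probability
print(sentence_prob("the dog ran"))   # 0.0: the bigram (dog, ran) never occurred
```

Real language models smooth these counts (or replace them with neural networks), but the quantity being estimated is the same.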
One-hot representation: the simplest word vector; each word is encoded as a sparse binary vector with a single 1. Drawbacks:
semantic gap: similarity between words cannot be measured
curse of dimensionality; extreme sparsity
cannot represent out-of-vocabulary words
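The semantic-gap and sparsity drawbacks are easy to see in code; the three-word vocabulary below is an illustrative example:

```python
# One-hot word vectors over a hypothetical vocabulary.
vocab = ["king", "queen", "apple"]

def one_hot(word):
    # A |V|-dimensional vector with a single 1 at the word's index.
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Any two distinct words are orthogonal, so their similarity is always 0:
# "king" is no closer to "queen" than to "apple" (the semantic gap).
print(dot(one_hot("king"), one_hot("queen")))  # 0
print(dot(one_hot("king"), one_hot("king")))   # 1
```

The dimensionality also grows linearly with the vocabulary, and a word outside `vocab` simply has no vector, which is the out-of-vocabulary problem.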
Distributed word representation: words with similar meanings lie close together in the vector space
Word-vector spaces built from different languages place semantically similar words in similar positions, suggesting that word-vector construction is independent of a language's surface form
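Unlike one-hot vectors, dense distributed vectors support graded similarity, usually measured with cosine similarity; the 3-d embeddings below are made-up numbers for illustration:

```python
import math

# Toy 3-d distributed embeddings (hypothetical values, not trained vectors).
emb = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.05, 0.90],
}

def cosine(u, v):
    # Cosine similarity: dot product normalized by vector lengths.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

# Semantically related words end up close; unrelated words far apart.
print(cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"]))  # True
```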
data augmentation
backpropagation through time
gradient vanishing: makes training extremely slow
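Why gradients vanish through time can be sketched in a few lines: backpropagation through time multiplies one Jacobian factor per step, and if each factor has magnitude below 1 the product decays exponentially (the 0.5 factor below is a hypothetical stand-in for those Jacobians):

```python
# Vanishing-gradient sketch: a hypothetical per-step gradient factor < 1
# is multiplied once per unrolled time step.
factor = 0.5
grad = 1.0
for step in range(20):
    grad *= factor

# After 20 steps the gradient is 0.5**20 ~ 1e-6: almost no learning
# signal reaches the early time steps, so training is very slow.
print(grad)
```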
word hashing
hidden Markov model
initial probabilities: probabilities of the initial hidden state
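An HMM is specified by initial hidden-state probabilities, transition probabilities, and emission probabilities; the weather/activity numbers below are a hypothetical example, and the forward algorithm computes the probability of an observation sequence:

```python
# Minimal HMM sketch (hypothetical parameters).
states = ["Rainy", "Sunny"]

pi = {"Rainy": 0.6, "Sunny": 0.4}                       # initial probabilities
A = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},             # transition probabilities
     "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
B = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5}, # emission probabilities
     "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def forward(observations):
    """P(observations) via the forward algorithm."""
    alpha = {s: pi[s] * B[s][observations[0]] for s in states}
    for o in observations[1:]:
        alpha = {s: sum(alpha[p] * A[p][s] for p in states) * B[s][o]
                 for s in states}
    return sum(alpha.values())

print(round(forward(["walk", "shop"]), 4))  # 0.1038
```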
conditional random field
characteristic (feature) function
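In a linear-chain CRF, a feature function is typically a binary indicator over the previous label, current label, observation sequence, and position; the name-tagging feature below is a hypothetical illustration:

```python
# Sketch of one CRF feature (characteristic) function: a binary indicator
# f(y_prev, y_curr, x, i) that the model weights and sums over positions.
def feature_capitalized_is_name(y_prev, y_curr, sentence, i):
    # Fires when the current word is capitalized and labeled NAME.
    return 1 if sentence[i][0].isupper() and y_curr == "NAME" else 0

sent = ["Alice", "visits", "Paris"]
print(feature_capitalized_is_name("O", "NAME", sent, 0))  # 1: fires
print(feature_capitalized_is_name("O", "O", sent, 1))     # 0: does not fire
```

A trained CRF combines many such functions with learned weights inside an exponential scoring model.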
node2vec: Scalable Feature Learning for Networks (representation learning for nodes in large-scale networks)
downstream tasks
first-order proximity: local pairwise similarity between directly connected nodes; by itself insufficient to characterize the whole network structure
second-order proximity: similarity between nodes' neighborhood structures
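The intuition behind the two proximities can be sketched on a toy graph; the graph, and the use of Jaccard neighborhood overlap as a stand-in for second-order proximity, are illustrative assumptions (LINE itself optimizes probabilistic objectives rather than Jaccard scores):

```python
# Hypothetical undirected graph as adjacency sets.
graph = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2},
    4: {1, 2},
    5: {6},
    6: {5},
}

def first_order(u, v):
    # First-order proximity: do u and v share a direct edge?
    return 1 if v in graph[u] else 0

def second_order(u, v):
    # Crude proxy for second-order proximity: overlap of neighborhoods.
    return len(graph[u] & graph[v]) / len(graph[u] | graph[v])

# Nodes 3 and 4 are not connected (first-order proximity 0), yet they have
# identical neighborhoods {1, 2}, so their second-order proximity is maximal.
print(first_order(3, 4), second_order(3, 4))  # 0 1.0
```

This is why first-order proximity alone is insufficient: it would treat 3 and 4 as unrelated even though the network structure says they play the same role.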
word analogy
document classification
node classification
visualization
corpus
Research background:
Much real-world data is generated in non-Euclidean spaces; applying deep learning to graph data is hard, chiefly because of the irregularity of graphs (unordered nodes, varying numbers of neighbors)
Broad application areas: e-commerce, financial risk control, recommender systems
Task description:
Sequence tagging, including part-of-speech tagging (POS), chunking, and named entity recognition (NER), has been a classic NLP task.
Given a network/graph G=(V, E, W), where V is the set of nodes, E is the set of edges between the nodes, and W is the set of edge weights, the goal of node embedding is to represent each node i with a vector that preserves the structure of the network.
That is, node embedding expresses each node as a vector that implicitly encodes the network's structural information.
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction.
That is, the paper studies how to represent large-scale information networks in low-dimensional vector spaces, which is useful in many tasks.
Survey of prior work:
Most existing sequence tagging models are linear statistical models, including Hidden Markov Models (HMMs), Maximum Entropy Markov Models (MEMMs) (McCallum et al., 2000), and Conditional Random Fields (CRFs).
Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between