The English here is typed by hand, summarizing and paraphrasing the original paper. Unavoidable typos and grammar mistakes may appear; corrections in the comments are welcome! This post is written as study notes, so read it with that in mind.
1. TL;DR
1.1. Takeaways
(1)On 2.2's point that "the aggregation step commonly assumes network homophily": indeed, aggregating reinforcing neighbors and conflicting neighbors in the same way is problematic, though a correlation-based view does not suffer from this. It also "breaks the i.i.d. assumption"
2. Section-by-Section Reading
2.1. Abstract
①Both GNNs and Label Propagation (LP) can predict node labels
②Existing methods lack uncertainty estimates for non-independent node-level predictions
axiom n. [math] a self-evident truth; postulate; principle
2.2. Introduction
①Uncertainty can be categorized into aleatoric and epistemic uncertainty, where aleatoric uncertainty (AU) is the irreducible randomness in the data and epistemic uncertainty (EU) reflects a lack of knowledge (e.g., insufficient or inaccurate data)
②Uncertainty estimation is applied to out-of-distribution (OOD) or shift detection, active learning, continual learning, and reinforcement learning
③⭐"The aggregation step commonly assumes network homophily"
④They derive three axioms, propose the Graph Posterior Network (GPN), and build a general framework for assessing uncertainty
aleatoric adj. depending on chance; random  epistemic adj. of or relating to knowledge or cognition
account for: to make up (a proportion of); to explain (a fact or situation); to be the cause of; to be responsible for (an action or policy); to budget for (funds)
2.3. Related Work
(1)Uncertainty for i.i.d. inputs
(2)Uncertainty for graphs
2.4. Uncertainty Quantification for Node Classification
①They consider a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with adjacency matrix $\mathbf{A}$ and node attribute matrix $\mathbf{X}$
②The node set $\mathcal{V}$ splits into labelled nodes $\mathcal{T}$ and unlabelled nodes $\mathcal{U}$
③Task: infer the labels of the unlabelled nodes in $\mathcal{U}$
2.4.1. Axioms
①Overview of the axioms, which characterize how uncertainty estimates should behave with and without network effects
②This subsection is abstract and hard to follow, so I paraphrase it in plain terms
(1)Axiom 3.1. (network effects)
①An analysis based only on a node's own attributes should assign higher uncertainty to nodes whose features differ strongly from the training nodes
②If a node has anomalous features, either classify it with the network-free prior, or treat it as lying far from the i.i.d. training data (convoluted, I know; I have not found a simpler way to put it)
(2)Axiom 3.2. (propagation to neighbors)
①Epistemic certainty at a node w/o network effects ⇒ epistemic certainty at its neighbors w/ network effects, all other conditions being equal
②In a homophilic graph, a node predicted with high epistemic confidence passes that confidence on to its neighbors. But if the node has anomalous features, its neighbors are affected too and become harder to be certain about
③For non-attributed/plain graphs (graphs that contain no node attributes), they still expect confidence to influence the neighbors
(3)Axiom 3.3. (neighborhood aggregation)
①High aleatoric uncertainty at a node's neighbors w/o network effects ⇒ high aleatoric uncertainty at that node w/ network effects, all other conditions being equal
②If the classifications of a node's neighbors are already ambiguous, do not expect a confident classification of the node itself
③If the neighbors' classifications conflict (presumably falling into different classes, in which case the node may sit on a decision boundary), classifying the node is also hard
④A node can end up with high aleatoric uncertainty when its neighbors have ② low aleatoric uncertainty but ③ belong to different classes; caught in the middle, such a node is very hard to classify
anomalous adj. deviating from what is standard; irregular; inappropriate
2.4.2. Graph Posterior Network
①The Bayesian approach focuses on the uncertainty of individual predictions
②The Bayesian treatment directly updates a single categorical distribution $y \sim \mathrm{Cat}(\mathbf{p})$. The natural choice for a prior distribution over $\mathbf{p}$ is its conjugate prior, the Dirichlet distribution $\mathbb{P}(\mathbf{p}) = \mathrm{Dir}(\boldsymbol{\alpha}^{prior})$ with $\alpha_c^{prior} \in \mathbb{R}_+$
③Bayesian update given the observations $y^{(1)}, \dots, y^{(N)}$:
the posterior distribution is $\mathbb{P}(\mathbf{p} \mid \{y^{(j)}\}_j) = \mathrm{Dir}(\boldsymbol{\alpha}^{post})$ with posterior parameter $\boldsymbol{\alpha}^{post} = \boldsymbol{\alpha}^{prior} + \boldsymbol{\beta}$ and class counts $\beta_c = \sum_j \mathbb{1}_{\{y^{(j)} = c\}}$
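The conjugate update above can be sketched numerically (a minimal sketch with NumPy; the uniform prior and the toy labels are my own illustration, not from the paper):

```python
import numpy as np

def dirichlet_posterior(alpha_prior, labels, num_classes):
    """Conjugate Bayesian update: Dir(alpha_prior) + class counts -> Dir(alpha_post)."""
    beta = np.bincount(labels, minlength=num_classes)  # class counts beta_c
    return alpha_prior + beta

# Uniform Dirichlet prior over 3 classes and a handful of observed labels
alpha_prior = np.ones(3)
labels = np.array([0, 0, 2, 1, 0])
alpha_post = dirichlet_posterior(alpha_prior, labels, 3)
print(alpha_post)                      # alpha_prior + [3, 1, 1]
print(alpha_post / alpha_post.sum())   # posterior mean = alpha_post / alpha_0
```

Because the Dirichlet is conjugate to the categorical, the update is literally just adding counts, which is what makes the "evidence count" reading of the parameters possible.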
④They address the AU and EU problems via the Dirichlet mean $\bar{\mathbf{p}} = \boldsymbol{\alpha}^{post} / \alpha_0^{post}$ and the total evidence count $\alpha_0^{post} = \sum_c \alpha_c^{post}$
⑤"The aleatoric uncertainty is commonly measured by the entropy of the categorical distribution", e.g. $u_{alea} = \mathbb{H}\left[\mathrm{Cat}(\bar{\mathbf{p}})\right]$
⑥The epistemic uncertainty can be measured by the total evidence count $\alpha_0^{post}$ or by the Dirichlet differential entropy
⑦For classification, the evidence per class label also matters. The Dirichlet prior is usually set to $\alpha_c^{prior} = 1$; prediction/updating then yields the posterior with parameter $\boldsymbol{\alpha}^{post} = \boldsymbol{\alpha}^{prior} + \boldsymbol{\beta}$, where $\boldsymbol{\beta}$ can be regarded as class pseudo-counts
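The two uncertainty measures above can be sketched as follows (a toy illustration; the epsilon and the example evidence vectors are my own choices):

```python
import numpy as np

def aleatoric_uncertainty(alpha_post):
    """AU: entropy of the categorical distribution at the Dirichlet mean."""
    p_bar = alpha_post / alpha_post.sum()
    return -np.sum(p_bar * np.log(p_bar + 1e-12))

def epistemic_uncertainty(alpha_post):
    """EU proxy: negative total evidence count alpha_0 (less evidence = more EU)."""
    return -alpha_post.sum()

confident = np.array([100.0, 1.0, 1.0])  # lots of evidence, concentrated on class 0
no_data = np.array([1.0, 1.0, 1.0])      # flat prior: no observed evidence at all
print(aleatoric_uncertainty(confident), aleatoric_uncertainty(no_data))
print(epistemic_uncertainty(confident), epistemic_uncertainty(no_data))
```

Note the two measures can disagree: a node deep in a class-overlap region may have high AU (flat mean) yet low EU (lots of evidence), which is exactly why both are reported.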
(1)Bayesian Update for Interdependent Inputs
①Their improvement: pseudo-counts $\boldsymbol{\beta}^{ft,(v)}$ predicted by an independent, network-free model on each node's own features are diffused over the graph into $\boldsymbol{\beta}^{agg,(v)}$ based on the features of the neighbors (so, roughly, $\boldsymbol{\beta}^{ft}$ is the evidence without network structure and $\boldsymbol{\beta}^{agg}$ the evidence with it)
②Schematic of GPN:
left: the total feature evidence $\beta_0^{ft,(v)} = \sum_c \beta_c^{ft,(v)}$ and the corresponding categorical mean give EU and AU based only on node features;
middle: Personalized PageRank (PPR) message passing yields aggregated class pseudo-counts $\beta_c^{agg,(v)} = \sum_u \Pi_{v,u}^{ppr}\, \beta_c^{ft,(u)}$, where the $\Pi_{v,u}^{ppr}$ are dense PPR scores implicitly reflecting the importance of node $u$ for node $v$ (the authors say they replace exact PPR with a similar power-iteration approximation; why replace it — is the dense computation too expensive? No idea, they don't say). Note that PPR uses only the edge connections, while $\boldsymbol{\beta}^{ft}$ uses only the node features; the two are then combined in:
right: the Bayesian update $\boldsymbol{\alpha}^{post,(v)} = \boldsymbol{\alpha}^{prior} + \boldsymbol{\beta}^{agg,(v)}$.
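The PPR diffusion step can be sketched with an APPNP-style power iteration (a toy sketch under my own assumptions: symmetric degree normalization, teleport probability alpha=0.1, K=10 iterations; the paper's exact propagation may differ in detail):

```python
import numpy as np

def ppr_diffuse(beta_ft, A, alpha=0.1, K=10):
    """Approximate PPR diffusion of per-node class pseudo-counts via power
    iteration, instead of materializing the dense PPR matrix Pi."""
    deg = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1)))
    A_hat = D_inv_sqrt @ A @ D_inv_sqrt           # symmetric normalized adjacency
    beta = beta_ft.copy()
    for _ in range(K):                            # beta <- (1-a)*A_hat@beta + a*beta_ft
        beta = (1 - alpha) * (A_hat @ beta) + alpha * beta_ft
    return beta

# Toy path graph 0-1-2 with 2 classes; node 0 carries all the feature evidence
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
beta_ft = np.array([[10., 0.], [0., 0.], [0., 0.]])
print(ppr_diffuse(beta_ft, A))  # class-0 evidence spreads from node 0 along the path
```

Each iteration only needs a sparse matrix-vector product, which is presumably the point of preferring power iteration over computing the dense PPR scores.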
③Loss function (Bayesian loss): $\mathcal{L}^{(v)} = \mathbb{E}_{\mathbf{p}^{(v)} \sim \mathbb{Q}^{(v)}}\left[\mathrm{CE}(\mathbf{p}^{(v)}, y^{(v)})\right] - \lambda\, \mathbb{H}\left[\mathbb{Q}^{(v)}\right]$
with $\mathbb{Q}^{(v)} = \mathrm{Dir}(\boldsymbol{\alpha}^{post,(v)})$ and regularization factor $\lambda$
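Both terms of this loss have closed forms for a Dirichlet, so it can be sketched directly (assuming the standard identities $\mathbb{E}_{\mathrm{Dir}(\alpha)}[-\log p_y] = \psi(\alpha_0) - \psi(\alpha_y)$ and the closed-form Dirichlet entropy; the value of lam and the toy alpha are my own choices):

```python
import numpy as np
from scipy.special import digamma, gammaln

def uce(alpha_post, y):
    """Expected cross-entropy under q = Dir(alpha_post):
    E_q[-log p_y] = psi(alpha_0) - psi(alpha_y)."""
    return digamma(alpha_post.sum()) - digamma(alpha_post[y])

def dirichlet_entropy(alpha):
    """Differential entropy of Dir(alpha), in closed form."""
    a0, K = alpha.sum(), len(alpha)
    log_B = gammaln(alpha).sum() - gammaln(a0)        # log of the beta function
    return log_B + (a0 - K) * digamma(a0) - ((alpha - 1) * digamma(alpha)).sum()

def bayesian_loss(alpha_post, y, lam=1e-4):
    # expected cross-entropy minus lambda times the entropy regularizer
    return uce(alpha_post, y) - lam * dirichlet_entropy(alpha_post)

alpha = np.array([5.0, 1.0, 1.0])
print(bayesian_loss(alpha, y=0))
```

The entropy term rewards spread-out (less overconfident) Dirichlets, which is the regularizing role $\lambda$ controls.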
2.4.3. Uncertainty Estimation Guarantees
①They analyze a parameterized GPN model with "a (feature) encoder with piecewise ReLU activations, a PPR diffusion, and a density estimator"
2.4.4. Limitations & Impact
(1)OOD data close to ID data
①GPN guarantees uncertainty estimates for extreme OOD inputs, but cannot give guarantees for OOD data that lies close to the ID data
(2)Non-homophilic uncertainty
①They did not consider heterophilic graphs
(3)Task-specific OOD
①In some feature spaces, density estimation cannot detect OOD data
tabular adj. flat like a table-top; arranged in rows and columns
(4)Broader Impact
①......data breach......privacy......
2.5. Experiments
2.5.1. Set-up
(1)Ablation
①Ablation study on module:
②Ablation study on misclassification:
(2)Baselines
They compare against a great many baselines, but the full list is in the appendix and too long to reproduce here
(3)Datasets
①CoraML, CiteSeer, PubMed, CoauthorPhysics, CoauthorCS, AmazonPhotos, AmazonComputers, OGBN Arxiv
2.5.2. Results
(1)OOD Detection
(2)Attributed Graph Shifts
(3)Qualitative Evaluation
(4)Inference & training time
2.6. Conclusion
3. 知识补充
3.1. Label Propagation (LP)
Reference: 半监督学习之labelPropagation原理与实现 - 知乎 (zhihu.com)
3.2. Pseudo-counts
(1)Definition:
In artificial intelligence, and in machine learning and statistical modeling in particular, pseudo-counts are a technique for handling data sparsity and smoothing probability distributions. When working with discrete data (text, categorical data, etc.), some events or categories appear rarely or never in the training set, which can lead to unreasonable probabilities or estimates at test or prediction time.
To address this, pseudo-counts can be used to "smooth" these probabilities. Concretely, a pseudo-count is a small fixed value added to the observed counts, which raises the probability of events or categories that are rare or unseen. This keeps the model from making overly extreme predictions about events absent from the training set.
For example, in a naive Bayes classifier we may use Laplace smoothing (also called add-one smoothing), where the pseudo-count is set to 1. If a category never appears in the training set, Laplace smoothing still guarantees it a non-zero probability at prediction time.
In short, pseudo-counts handle data sparsity and smooth probability distributions by adding a small value to the observed counts, increasing the probability of rare or unseen events or categories.
(2)It feels a bit like injecting simulated noise
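The add-one smoothing described above, as a minimal sketch (the toy counts are my own illustration):

```python
def smoothed_probs(counts, pseudo_count=1.0):
    """Laplace (add-one) smoothing: add a pseudo-count to every category so that
    unseen categories still receive non-zero probability."""
    total = sum(counts) + pseudo_count * len(counts)
    return [(c + pseudo_count) / total for c in counts]

# Category 2 was never observed, yet it still gets probability > 0
print(smoothed_probs([3, 1, 0]))
```

With raw counts, category 2 would get probability exactly 0 and veto any prediction involving it; the pseudo-count is what prevents that extreme.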
3.3. Radial normalizing flows
?
4. Reference List
Stadler, M. et al. (2021) 'Graph Posterior Network: Bayesian Predictive Uncertainty for Node Classification', Neural Information Processing Systems. doi: https://doi.org/10.48550/arXiv.2110.14012