RGCN Study Notes
Notes on the basic principles of RGCN and how to run the example code
1. Difference between GCN and RGCN
For GCN, the self-connection is treated the same as connections to other nodes: every edge shares a single projection matrix.
For RGCN, there are R projection matrices $W_r$, one per relation type, plus a separate matrix $W_0$ for the self-connection.
RGCN is widely used on knowledge graphs and other heterogeneous graphs for
- link prediction
- node classification
Looking at the propagation model:
- each relation type $r$ has its own projection matrix $W_r$
- $W_0$ is the projection matrix for the self-connection
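The layer-wise update rule from the paper is

$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$

where $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under relation $r$ and $c_{i,r}$ is a normalization constant such as $|\mathcal{N}_i^r|$.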
In the case of RGCN we do the aggregation and projection per relation type first; the idea is that the relation-specific projection matrices map the messages into a common space, so that summing them together still makes semantic sense.
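A minimal NumPy sketch of one such layer under the update rule above (dense adjacency matrices, mean normalization; the names here are illustrative, not from the paper's code):

```python
import numpy as np

def rgcn_layer(H, A_per_rel, W_per_rel, W0, activation=np.tanh):
    """One R-GCN layer (dense sketch).

    H          : (N, d_in)  node feature matrix
    A_per_rel  : list of (N, N) adjacency matrices, one per relation type
    W_per_rel  : list of (d_in, d_out) projection matrices, one per relation type
    W0         : (d_in, d_out) projection matrix for the self-connection
    """
    out = H @ W0                                   # self-connection term W_0 h_i
    for A_r, W_r in zip(A_per_rel, W_per_rel):
        deg = A_r.sum(axis=1, keepdims=True)       # c_{i,r} = |N_i^r|
        norm = 1.0 / np.maximum(deg, 1)            # 1 / c_{i,r}; rows with no r-neighbors stay zero
        out += (norm * (A_r @ H)) @ W_r            # aggregate neighbors, then project per relation
    return activation(out)
```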
2. Regularization in the R-GCN model
RGCNs have more parameters than GCNs because each relation type has its own $W_r$. That can be a problem if you have a lot of relation types, especially if some of them are rare, so the paper gets around this by proposing two different ways of regularizing these weights.
1. Basis Decomposition Regularization
The first approach is called basis decomposition.
In this framework you specify the number of basis matrices you want for the layer, and each relation's $W_r$ is then computed as a linear combination of those shared components, with a learned coefficient per relation for each component. With no regularization, each relation type has its own projection matrix; with maximal regularization you say you only want one shared weight matrix, and each relation type just learns a coefficient to scale it. For example, if you had a hundred relation types and said you only want two bases, each of the 100 relation types would learn two coefficients, one to scale the first basis and one to scale the second. So it is a linear combination over the basis components, which is basically just a weight-sharing scheme.
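Formally, each relation's weight matrix is written as a linear combination of $B$ shared basis matrices $V_b$ with relation-specific coefficients $a_{rb}$:

$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}$$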
In the Twitter example from earlier, where you have a "blocks" relation and a "follows" relation, it might not make sense to have two totally separate weight matrices for those two, because they are somewhat semantically similar. Instead you could have a single shared weight matrix, and it might turn out that the coefficient for "blocks" is negative one and the coefficient for "follows" is positive one: the same weight matrix, but different coefficients.
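A minimal NumPy sketch of this weight-sharing scheme (the sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_rels, num_bases, d_in, d_out = 100, 2, 16, 16

# shared basis matrices V_b and per-relation coefficients a_rb (both learned in practice)
V = rng.normal(size=(num_bases, d_in, d_out))
a = rng.normal(size=(num_rels, num_bases))

# each relation's weight matrix is a linear combination of the bases: W_r = sum_b a[r, b] * V[b]
W = np.einsum('rb,bio->rio', a, V)      # shape (num_rels, d_in, d_out)

# parameters: num_rels * num_bases + num_bases * d_in * d_out
# versus num_rels * d_in * d_out without any sharing
```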
2. Block Diagonal Matrix Regularization
The other regularization technique they present is called block-diagonal decomposition. It basically takes small matrices and stacks them along the diagonal of a bigger matrix, so that many of the entries of that matrix are zero. The way they motivate this as a sensible choice is the claim that some variables are strongly interconnected within a group but do not interact much outside of that group, and this structure of the matrix simply codifies that.
As an example, to represent a person, physical characteristics like height and weight might be important, and maybe their political affiliation is important as well, but you might not expect much interaction between those two groups of variables (political affiliation and physical characteristics), so the elements of the matrix that would codify those interactions are just set to zero. At the end of the day this is simply a way to reduce the number of parameters in the matrices and therefore avoid overfitting.
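In the paper each $W_r$ is built as a direct sum of $B$ small blocks, $W_r^{(l)} = \bigoplus_{b=1}^{B} Q_{br}^{(l)} = \mathrm{diag}(Q_{1r}^{(l)}, \ldots, Q_{Br}^{(l)})$. A minimal NumPy sketch of building one such block-diagonal weight matrix (sizes are illustrative):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
num_blocks, block_size = 4, 4     # W_r ends up (16, 16), but most entries are fixed to zero

# one small dense block per group of strongly interconnected variables
Q = [rng.normal(size=(block_size, block_size)) for _ in range(num_blocks)]

# stack the blocks along the diagonal; cross-group interactions are zero by construction
W_r = block_diag(*Q)              # shape (16, 16)

# free parameters: num_blocks * block_size**2 = 64 instead of 16 * 16 = 256
```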
⏩ Paper Title: Modeling Relational Data with Graph Convolutional Networks
⏩ Paper: https://arxiv.org/abs/1703.06103
⏩ Authors: Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling
⏩ Organisation: University of Amsterdam, VU Amsterdam, CIFAR