Knowledge Graph Completion: Triple Classification

6) Triple Classification: Triple classification determines whether the facts in the test data are correct, and is typically treated as a binary classification problem.

 

The decision rule is based on a scoring function with a specific threshold. The embedding methods mentioned earlier can be applied to triple classification, including translational distance-based methods such as TransH [20] and TransR [17] and semantic matching-based methods such as NTN [18], HolE [21], and ANALOGY [22].
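
As a minimal sketch of such a threshold-based decision rule, the snippet below scores triples with a TransE-style distance and picks the threshold that maximises accuracy on a small labelled validation set. The function names (`transe_score`, `fit_threshold`, `classify`) and the toy embeddings are assumptions made for illustration; they are not taken from any of the cited methods' reference implementations.

```python
import numpy as np


def transe_score(h, r, t):
    """TransE-style plausibility: negative L2 distance (higher = more plausible)."""
    return -np.linalg.norm(h + r - t, axis=-1)


def fit_threshold(scores, labels):
    """Return the candidate threshold with the best accuracy on validation data."""
    best_thr, best_acc = None, -1.0
    for thr in np.sort(scores):
        acc = np.mean((scores >= thr) == labels)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr


def classify(scores, threshold):
    """A triple is predicted correct when its score reaches the threshold."""
    return scores >= threshold


# Toy 2-D embeddings (assumed values, for illustration only).
h = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
r = np.array([[1.0, 0.0]] * 4)
t = np.array([[1.0, 0.5], [3.0, 3.0], [2.0, 2.0], [-2.0, -2.0]])
labels = np.array([True, False, True, False])

scores = transe_score(h, r, t)
threshold = fit_threshold(scores, labels)
predictions = classify(scores, threshold)
```

In practice the threshold is often tuned per relation on a validation split; a single global threshold is used here only to keep the sketch short.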

Vanilla vector-based embedding methods fail to handle 1-to-n relations. Recently, Dong et al. [79] extended the embedding space to region-based n-dimensional balls, in which the tail region lies inside the head region for a 1-to-n relation, using fine-grained type chains, i.e., tree-structured conceptual clusterings.

This relaxation of embeddings to n-balls turns triple classification into a geometric containment problem and improves performance for entities with long type chains. However, it relies on the type chains of entities and suffers from scalability problems.
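
To make the geometric containment idea concrete, here is a small sketch of the test itself: a ball with centre c_t and radius r_t lies inside a ball with centre c_h and radius r_h exactly when ||c_t - c_h|| + r_t <= r_h. The helper name `ball_contains` and the example balls are hypothetical, and the sketch covers only the containment check, not how Dong et al. [79] construct or train the balls.

```python
import numpy as np


def ball_contains(outer_center, outer_radius, inner_center, inner_radius):
    """Return True if the inner n-ball lies entirely inside the outer n-ball."""
    gap = np.linalg.norm(np.asarray(inner_center) - np.asarray(outer_center))
    return bool(gap + inner_radius <= outer_radius)


# Hypothetical head region and two candidate tail regions in 2-D.
head_center, head_radius = [0.0, 0.0], 3.0
tail_a = ([1.0, 1.0], 1.0)   # fully inside the head ball
tail_b = ([3.0, 0.0], 1.0)   # sticks out of the head ball

inside_a = ball_contains(head_center, head_radius, *tail_a)
inside_b = ball_contains(head_center, head_radius, *tail_b)
```

Under this reading, a triple is classified as correct when the tail ball is contained in the head ball (`inside_a`) and as incorrect otherwise (`inside_b`).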

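### The STAR knowledge graph completion method

STAR (Sparse Tensor-based Attention Relation prediction) is a technique for knowledge graph completion whose core idea is to predict missing relations using a sparse tensor representation and an attention mechanism. It builds a sparse tensor from the existing triples and then applies a multi-layer perceptron (MLP) model on top of it for relation prediction.

#### Data preprocessing

To prepare the training set, the raw knowledge graph must first be converted into a form suitable for machine learning. Specifically, each fact (h, r, t), consisting of a head entity, a relation, and a tail entity, is encoded as a vector. These vectors can be obtained from word embeddings[^1] or from other feature extraction methods.

```python
from sklearn.preprocessing import LabelEncoder


def preprocess_knowledge_graph(kg_data):
    """
    Preprocess the knowledge graph data.

    Args:
        kg_data (list of tuples): Triples in the format [(h, r, t), ...].

    Returns:
        tuple: Encoded head entities, relations, and tail entities,
            followed by the fitted entity encoder and relation encoder.
    """
    heads = [triple[0] for triple in kg_data]
    rels = [triple[1] for triple in kg_data]
    tails = [triple[2] for triple in kg_data]

    # Heads and tails share one encoder so that an entity gets the same
    # integer ID regardless of the position it appears in.
    le_entities = LabelEncoder()
    le_rels = LabelEncoder()
    le_entities.fit(heads + tails)

    encoded_heads = le_entities.transform(heads)
    encoded_rels = le_rels.fit_transform(rels)
    encoded_tails = le_entities.transform(tails)

    return (encoded_heads, encoded_rels, encoded_tails, le_entities, le_rels)
```

#### Building the sparse tensor and computing similarity scores

After preprocessing, the next step is to create a three-dimensional sparse tensor S ∈ R^(N×M×K), where N is the number of entities, M is the number of distinct relation types, and K is the timestamp dimension (if applicable). For each (h_i, r_j) pair, all tails t_l that co-occur with the pair are looked up and the corresponding entries are updated. A similarity measure such as cosine similarity, or the more elaborate LIN measure[^2], is then used to gauge how strongly candidate edges are associated.

#### Training and inference

The final step involves choosing the learning algorithm and tuning its parameters. A neural network architecture is typically chosen for end-to-end optimization, though traditional statistical methods such as matrix factorization are also an option. Whichever strategy is used, the goal is to minimize a loss function so that the predicted links match the true connectivity pattern as closely as possible.

```python
import tensorflow as tf

EMBEDDING_DIM = 50
# Placeholder sizes; replace them with the counts from your own dataset.
NUM_ENTITIES = 1000
NUM_RELATIONS = 20


class STARMODEL(tf.Module):
    def __init__(self, num_entities, num_relations):
        super().__init__()
        self.entity_embeddings = tf.Variable(
            initial_value=tf.random.uniform((num_entities, EMBEDDING_DIM)),
            trainable=True,
        )
        self.relation_embeddings = tf.Variable(
            initial_value=tf.random.uniform((num_relations, EMBEDDING_DIM)),
            trainable=True,
        )

    # input_signature expects tf.TensorSpec objects, not Keras Input layers.
    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None], dtype=tf.int32),
        tf.TensorSpec(shape=[None], dtype=tf.int32),
    ])
    def __call__(self, h_idx, r_idx):
        e_h = tf.nn.embedding_lookup(self.entity_embeddings, h_idx)
        e_r = tf.nn.embedding_lookup(self.relation_embeddings, r_idx)
        # Score each (head, relation) pair by the dot product of its embeddings.
        scores = tf.reduce_sum(e_h * e_r, axis=-1)
        return scores


model = STARMODEL(num_entities=NUM_ENTITIES, num_relations=NUM_RELATIONS)
optimizer = tf.optimizers.Adam()


@tf.function
def train_step(model, optimizer, batch_x, batch_y_true):
    # batch_x: int32 tensor of shape (batch, 2) holding [head_id, relation_id];
    # batch_y_true: float32 tensor of shape (batch,) holding target scores.
    with tf.GradientTape() as tape:
        predictions = model(batch_x[:, 0], batch_x[:, 1])
        # Mean squared error between predicted and target scores.
        loss = tf.reduce_mean(tf.square(predictions - batch_y_true))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```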
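
The sparse tensor construction and cosine-similarity step described above can be sketched as follows. This is only one possible reading of the description: it assumes there is no timestamp dimension, so the third axis indexes tail entities, and it stores only the non-zero entries as a mapping from each (head, relation) pair to its observed tails. All names (`build_sparse_tensor`, `tail_vector`, `cosine_similarity`) and the toy triples are hypothetical.

```python
from collections import defaultdict

import numpy as np


def build_sparse_tensor(triples):
    """Store non-zero entries only: (head, relation) -> set of tail indices."""
    tensor = defaultdict(set)
    for h, r, t in triples:
        tensor[(h, r)].add(t)
    return tensor


def tail_vector(tensor, h, r, num_entities):
    """Dense 0/1 indicator over tail entities for one (head, relation) fibre."""
    vec = np.zeros(num_entities)
    for t in tensor.get((h, r), ()):
        vec[t] = 1.0
    return vec


def cosine_similarity(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy integer-encoded triples (assumed data): 5 entities, 1 relation.
triples = [(0, 0, 2), (0, 0, 3), (1, 0, 2), (1, 0, 3), (4, 0, 3)]
tensor = build_sparse_tensor(triples)

v0 = tail_vector(tensor, 0, 0, num_entities=5)
v1 = tail_vector(tensor, 1, 0, num_entities=5)
v4 = tail_vector(tensor, 4, 0, num_entities=5)

sim_same = cosine_similarity(v0, v1)     # entities 0 and 1 share all tails
sim_partial = cosine_similarity(v0, v4)  # entities 0 and 4 share one tail
```

Entities whose (relation, tail) profiles overlap heavily receive a similarity close to 1, which is the signal a downstream model could use when ranking candidate edges.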