Paper sources
https://relbench.stanford.edu/paper.pdf
https://github.com/snap-stanford/relbench
Abstract
Background: Much of the world's most valued data is stored in data warehouses, where the data is spread across many tables connected by primary-foreign key relations.
Problem: The core problem is that no machine learning method is capable of learning directly on data spread across multiple relational tables.
Method: Here we introduce an end-to-end deep representation learning approach to directly learn on data spread across multiple tables. We name our approach Relational Deep Learning. The core idea is to view relational tables as a heterogeneous graph, with a node for each row in each table, and edges specified by primary-foreign key relations. Message Passing Neural Networks can then automatically learn across multiple tables to extract representations that leverage all input data, without any manual feature engineering.
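The table-to-graph construction can be illustrated with a minimal sketch in plain Python. The schema here (a `users` table and an `orders` table linked by `user_id`) is hypothetical, used only to show how rows become typed nodes and primary-foreign key matches become edges:

```python
# Two toy tables: "users" (primary key: user_id) and
# "orders" (primary key: order_id, foreign key: user_id).
users = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 27},
]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 99.0},
    {"order_id": 11, "user_id": 1, "amount": 15.5},
    {"order_id": 12, "user_id": 2, "amount": 42.0},
]

# Nodes: one per row, keyed by (table name, primary key);
# the remaining columns serve as node features.
nodes = {("users", r["user_id"]): r for r in users}
nodes.update({("orders", r["order_id"]): r for r in orders})

# Edges: one per primary-foreign key match (orders.user_id -> users.user_id).
edges = [(("orders", o["order_id"]), ("users", o["user_id"])) for o in orders]

print(len(nodes))  # 5 nodes (2 users + 3 orders)
print(len(edges))  # 3 edges (one per order)
```

Note that no joins or aggregations are needed: every row of every table survives as its own node, which is exactly what lets a message-passing model see the raw, fine-grained data.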
Results: To facilitate research, we also develop RELBENCH, a set of benchmark datasets and an implementation of Relational Deep Learning. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability to a wide set of AI use cases.
1. Introduction
Many predictive problems over relational data have significant implications for human decision making. However, existing learning paradigms, notably tabular learning, cannot be directly applied to interlinked relational tables.
There are several issues with feature engineering: (1) it is a manual, slow, and labor-intensive process; (2) feature choices are likely highly suboptimal; (3) only a small fraction of the overall space of possible features can be manually explored; (4) by forcing data into a single table, information is aggregated into lower-granularity features, thus losing valuable fine-grained signal; (5) whenever the data distribution changes or drifts, current features become obsolete and new features have to be manually reinvented.
The core of our approach is to represent relational tables as a heterogeneous Relational Entity Graph, where each row defines a node, columns define node features, and primary-foreign key relations define edges.
Graph Neural Networks (GNNs) can then be applied to build end-to-end predictive models.
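A single message-passing step over such an entity graph can be sketched without any deep learning library. In this hedged, illustrative example (same hypothetical users/orders schema as above, plain mean aggregation standing in for a learned GNN layer), each user node aggregates the `amount` feature of its linked order nodes:

```python
from collections import defaultdict

users = {1: {"age": 34}, 2: {"age": 27}}
orders = [
    {"order_id": 10, "user_id": 1, "amount": 99.0},
    {"order_id": 11, "user_id": 1, "amount": 15.5},
    {"order_id": 12, "user_id": 2, "amount": 42.0},
]

# Collect messages along foreign-key edges: order -> owning user.
messages = defaultdict(list)
for o in orders:
    messages[o["user_id"]].append(o["amount"])

# One message-passing round: combine each user's own features with the
# mean of its neighbors' features (a stand-in for a learned aggregation).
user_embedding = {}
for uid, feats in users.items():
    msgs = messages[uid]
    agg = sum(msgs) / len(msgs) if msgs else 0.0
    user_embedding[uid] = (feats["age"], agg)

print(user_embedding[1])  # (34, 57.25): mean of 99.0 and 15.5
print(user_embedding[2])  # (27, 42.0)
```

In a real model the mean would be replaced by a learned, per-edge-type aggregation (e.g. a heterogeneous GNN layer), and several such rounds would propagate information across many tables.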
2. Predictive Tasks on Relational Tables
Thus, when the model is trained on a training example that was sampled at a specific time t in the past, it is of utmost importance to ensure that the model only sees the state of the database as it was before that time t.
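This temporal constraint amounts to filtering the database by timestamp before any features are computed. A minimal sketch (hypothetical rows with an integer `ts` column standing in for real timestamps):

```python
# Toy table rows, each carrying a creation timestamp "ts".
rows = [
    {"order_id": 1, "user_id": 7, "ts": 5},
    {"order_id": 2, "user_id": 7, "ts": 12},
    {"order_id": 3, "user_id": 7, "ts": 20},
]

def visible_rows(rows, t):
    """Return only rows created strictly before the sample time t,
    so a training example sampled at time t cannot leak future data."""
    return [r for r in rows if r["ts"] < t]

print([r["order_id"] for r in visible_rows(rows, 12)])  # [1]
print([r["order_id"] for r in visible_rows(rows, 21)])  # [1, 2, 3]
```

Applying this filter consistently to every table (and to the edges derived from them) is what prevents temporal leakage during training.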
3. Predictive Tasks as Graph Representation Learning Problems
Here, we formulate a generic machine learning architecture based on Graph Neural Networks, which solves predictive tasks on relational databases.
4. RELBENCH: A Benchmark for Relational Deep Learning
RELBENCH enables training and evaluation of machine learning models on relational data. RELBENCH supports deep-learning-framework-agnostic data loading, task specification, standardized data splitting, and transforming data into graph format. RELBENCH provides standardized evaluation metric computations, and a leaderboard for tracking progress. We additionally provide example training scripts built using PyTorch Geometric and PyTorch Frame.
The goal of RELBENCH is to facilitate scalable, robust, and reproducible machine learning research on relational tables.