Paper Notes: Graph Convolutional Neural Networks for Web-Scale Recommender Systems

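This post introduces PinSage, a graph convolutional network (GCN) algorithm designed to address the challenge of training and running inference with GCNs on large graphs containing billions of nodes and edges. PinSage combines random walks with graph convolutions to generate node (item) embeddings that take both graph structure and node features into account. Techniques such as random-walk sampling and importance pooling make training efficient, and inference runs as an efficient MapReduce pipeline.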
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
  • LINK: https://arxiv.org/abs/1806.01973

  • CLASSIFICATION: RECOMMENDER-SYSTEM, GCN

  • YEAR: Submitted on 6 Jun 2018

  • FROM: KDD 2018

  • WHAT PROBLEM TO SOLVE: The main challenge is to scale both the training and the inference of GCN-based node embeddings to graphs with billions of nodes and tens of billions of edges. Scaling up GCNs is difficult because many of the core assumptions underlying their design are violated in a big data environment. For example, operating on the full graph Laplacian during training is infeasible when the graph has billions of nodes.

  • SOLUTION: We develop a data-efficient Graph Convolutional Network (GCN) algorithm, PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure and node feature information.

  • CORE POINT:

    • Key insights to drastically improve the scalability of GCNs

      • On-the-fly convolutions

        PinSage algorithm performs efficient, localized convolutions by sampling the neighborhood around a node and dynamically constructing a computation graph from this sampled neighborhood.

      • Producer-consumer minibatch construction

        A large-memory, CPU-bound producer efficiently samples node network neighborhoods and fetches the necessary features to define local convolutions, while a GPU-bound TensorFlow model consumes these pre-defined computation graphs to efficiently run stochastic gradient descent.

      • Efficient MapReduce inference

        Given a fully-trained GCN model, we design an efficient MapReduce pipeline that can distribute the trained model to generate embeddings for billions of nodes, while minimizing repeated computations.
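
The producer-consumer idea above can be illustrated with a minimal sketch using only the Python standard library. It is not the paper's implementation: the toy `GRAPH`, the `FEATURES` table, and the functions `build_minibatch`, `producer`, `consumer`, and `run_pipeline` are all hypothetical names chosen for illustration. Uniform neighbor sampling is used only as a placeholder, and the "GPU step" is a stand-in that just counts examples.

```python
import queue
import random
import threading

# Hypothetical toy graph (adjacency lists) and per-node feature vectors.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
FEATURES = {n: [float(n), float(n) * 2.0] for n in GRAPH}


def build_minibatch(nodes, num_samples, rng):
    """CPU-side work: sample a neighborhood per node and fetch its features."""
    batch = []
    for node in nodes:
        # Placeholder: uniform sampling with replacement from direct neighbors.
        sampled = [rng.choice(GRAPH[node]) for _ in range(num_samples)]
        batch.append({
            "node": node,
            "neighbors": sampled,
            "features": [FEATURES[n] for n in [node] + sampled],
        })
    return batch


def producer(out_queue, num_batches, batch_size, seed=0):
    """Build minibatches ahead of the consumer and push them onto the queue."""
    rng = random.Random(seed)
    all_nodes = list(GRAPH)
    for _ in range(num_batches):
        nodes = rng.sample(all_nodes, batch_size)
        out_queue.put(build_minibatch(nodes, num_samples=2, rng=rng))
    out_queue.put(None)  # sentinel: no more batches


def consumer(in_queue):
    """Stand-in for the GPU-bound model: pull ready batches and 'train' on them."""
    processed = 0
    while True:
        batch = in_queue.get()
        if batch is None:
            break
        processed += len(batch)  # a real model would run an SGD step here
    return processed


def run_pipeline(num_batches=5, batch_size=2):
    # A bounded queue keeps the producer only a few batches ahead.
    q = queue.Queue(maxsize=2)
    worker = threading.Thread(target=producer, args=(q, num_batches, batch_size))
    worker.start()
    processed = consumer(q)
    worker.join()
    return processed
```

The bounded queue is the main design point: the producer prepares the next batches while the consumer is busy, so neither side waits on the other for long.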

    • New training techniques and algorithmic innovations

      • Constructing convolutions via random walks

        Random sampling is suboptimal, and we develop a new technique using short random walks to sample the computation graph. An additional benefit is that each node now has an importance score, which we use in the pooling/aggregation step.

      • Importance pooling

        We introduce a method to weigh the importance of node features in this aggregation based upon random-walk similarity measures.

      • Curriculum training

        The algorithm is fed harder-and-harder examples during training.
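
As a rough illustration of random-walk neighborhoods and importance pooling, the sketch below simulates short random walks from a node, counts visits, keeps the most-visited nodes as its neighborhood, and uses the normalized visit counts as weights in a weighted-mean aggregation. This is a simplified sketch under assumptions, not the authors' code: the toy `GRAPH`, `FEATURES`, and the function names `importance_neighborhood` and `importance_pool` are hypothetical, and the real model applies learned transformations before and after pooling.

```python
import random
from collections import Counter

# Hypothetical toy graph (adjacency lists) and per-node feature vectors.
GRAPH = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
FEATURES = {n: [float(n), float(n) * 2.0] for n in GRAPH}


def importance_neighborhood(graph, start, num_walks, walk_length, top_k, rng):
    """Return up to top_k (node, weight) pairs ranked by random-walk visit count.

    Weights are visit counts normalized to sum to 1 over the kept nodes.
    """
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            node = rng.choice(graph[node])
            if node != start:  # do not count the start node as its own neighbor
                visits[node] += 1
    top = visits.most_common(top_k)
    total = sum(count for _, count in top)
    if total == 0:
        return []
    return [(node, count / total) for node, count in top]


def importance_pool(features, weighted_neighbors):
    """Weighted mean of neighbor feature vectors using the importance weights."""
    dim = len(next(iter(features.values())))
    pooled = [0.0] * dim
    for node, weight in weighted_neighbors:
        for i, value in enumerate(features[node]):
            pooled[i] += weight * value
    return pooled


rng = random.Random(0)
neighbors = importance_neighborhood(GRAPH, start=0, num_walks=50,
                                    walk_length=3, top_k=2, rng=rng)
pooled = importance_pool(FEATURES, neighbors)
```

Compared with uniform sampling, nodes that the walks reach more often contribute more to the pooled vector, which is the intuition behind weighting the aggregation by random-walk similarity.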

    • Difference between GraphSage and PinSage

      We fundamentally im
