Architecting Big Data Spark with the Neo4j Graph Database

  1. Introduction

   This project combines distributed big data technology with the Neo4j graph database. Because Neo4j runs as a single node, performance suffers in two ways:

  1. The insert rate drops as the amount of data in the graph grows; the two are inversely related.
  2. Querying node-edge relationships from the front-end page takes more than 10 s per record in tests.

 

    We therefore redesigned the architecture, replacing part of the single-node Neo4j functionality with distributed middleware. Testing showed that several such architectures satisfy both Spark offline processing and real-time computation requirements.

 

  2. Code Overview
import java.util.UUID
import java.util.concurrent.TimeUnit
import org.neo4j.driver.v1.{AuthTokens, Config, Driver, GraphDatabase, Session}
import org.neo4j.driver.v1.Values.parameters

  // Build a Driver with a bounded idle-session pool and a liveness check
  // on pooled connections, so stale connections are not handed out.
  def getDriver(): Driver = {
    val url = Contants.NEO4j_URL
    val user = Contants.NEO4J_USER
    val password = Contants.NEO4j_PWD
    GraphDatabase.driver(url, AuthTokens.basic(user, password), Config.build()
      .withMaxIdleSessions(1000)
      .withConnectionLivenessCheckTimeout(10, TimeUnit.SECONDS)
      .toConfig)
  }


  // Sessions are lightweight: acquire one per unit of work and close it when done.
  def getSession(driver: Driver): Session = driver.session()

 

 

def relationShip(session: Session, msisdn: String, touser: String, capdate: String): String = {
    // Look up the file relationship between two persons.
    val result = session.run(
      "match (p1:person)-[r:fileTofile]-(p2:person) where p1.msisdn={msisdn} and p2.touser={touser} return r.uuid as uuid,p1.wxid as wxid1,p1.name as name1,p2.msisdn as msisdn2,p2.wxid as wxid2,p2.name as name2",
      parameters("msisdn", msisdn, "touser", touser))
    if (result.hasNext) {
      // The relationship already exists: return its uuid plus the node properties.
      val record = result.next()
      val uuid = record.get("uuid").asString()
      val wxid1 = record.get("wxid1").asString()
      val name1 = record.get("name1").asString()
      val msisdn2 = record.get("msisdn2").asString()
      val wxid2 = record.get("wxid2").asString()
      val name2 = record.get("name2").asString()
      uuid + "|" + wxid1 + "|" + name1 + "|" + msisdn2 + "|" + wxid2 + "|" + name2
    } else {
      // No relationship yet: create one, stamping it with a fresh uuid and the capture date.
      val uuid = UUID.randomUUID().toString.replaceAll("-", "")
      val rel = session.run(
        "match (p1:person),(p2:person) where p1.msisdn={msisdn} and p2.touser={touser} merge (p1)-[r:fileTofile]-(p2) on create set r.uuid={uuid},r.capdate={capdate} return p1.wxid as wxid1,p1.name as name1,p2.msisdn as msisdn2,p2.wxid as wxid2,p2.name as name2",
        parameters("msisdn", msisdn, "touser", touser, "uuid", uuid, "capdate", capdate))
      if (rel.hasNext) {
        val record = rel.next()
        val wxid1 = record.get("wxid1").asString()
        val name1 = record.get("name1").asString()
        val msisdn2 = record.get("msisdn2").asString()
        val wxid2 = record.get("wxid2").asString()
        val name2 = record.get("name2").asString()
        uuid + "|" + wxid1 + "|" + name1 + "|" + msisdn2 + "|" + wxid2 + "|" + name2
      } else {
        // Either endpoint node was missing, so no relationship could be created.
        ""
      }
    }
  }

  1. Pass in msisdn and touser to query whether the relationship exists.

   If it exists, the relationship plus the node properties are returned.

If it does not exist, a new relationship is created, with a unique UUID set as its identifying property, and the relationship plus its properties are likewise returned.
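On the consumer side the pipe-delimited string returned above has to be unpacked again. A minimal sketch; the `FileRelation` case class and `parseRelation` helper are illustrative names, not part of the original project:

```scala
// Hypothetical holder for the fields packed into the "|"-joined result string.
case class FileRelation(uuid: String, wxid1: String, name1: String,
                        msisdn2: String, wxid2: String, name2: String)

// Split the result of relationShip back into named fields.
// The -1 limit keeps trailing empty fields when a node property was missing.
def parseRelation(s: String): FileRelation = {
  val parts = s.split("\\|", -1)
  require(parts.length == 6, s"expected 6 fields, got ${parts.length}")
  FileRelation(parts(0), parts(1), parts(2), parts(3), parts(4), parts(5))
}
```

Keeping the limit at -1 matters because `relationShip` can emit empty fields (for example when a `wxid` is unset) and Scala's default `split` silently drops trailing empties.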

 

Note that when iterating over an RDD you must close the Neo4j Driver after each round of queries, or memory will be exhausted.
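The close-after-use discipline can be captured in a small loan-pattern helper. So the sketch stays self-contained, a stub stands in for the real `org.neo4j.driver.v1.Driver` (only `close()` matters here); in the actual job the same shape wraps `getDriver()` inside `foreachPartition`, so each task opens and closes at most one driver:

```scala
// Stub standing in for the Neo4j Driver; only the close() behaviour is relevant.
class StubDriver extends AutoCloseable {
  var closed = false
  override def close(): Unit = closed = true
}

// Run one unit of work and guarantee the driver is closed even if the work throws.
// Inside rdd.foreachPartition this bounds the job to one open driver per partition.
def withDriver[A](driver: StubDriver)(work: StubDriver => A): A = {
  try work(driver)
  finally driver.close()
}
```

Opening the driver per partition rather than per record is the key point: the connection-pool setup cost is paid once per task instead of once per row, while the `finally` still guarantees nothing leaks across batches.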

try {
      // Register the output action: write each micro-batch of node+relationship rows to ES.
      resultMappRdd.foreachRDD(rdd => {
        rdd.saveToEs("wechat_neo4j/file")
      })
    } catch {
      case e: InterruptedException =>
        Thread.currentThread().interrupt()
    } finally {
      try {
        if (ssc != null) {
          ssc.start()
          ssc.awaitTermination()
        }
      } catch {
        case e: Exception =>
          e.printStackTrace()
      }
    }
The node + relationship data returned by the method above is then iterated over and inserted into ES.

Note that when this approach is used for real-time computation, a traffic spike can mean the next batch enters the thread before the previous batch has finished processing, which leads to OOM errors. Thread.currentThread().interrupt() can be used so that data processing starts over after an OOM.

To cope with traffic spikes in real-time computation:

  1. Use Redis to perform a single lookup or relationship creation at the end of each batch; this requires synchronizing node properties and relationship properties into Redis.
  2. Enable back-pressure and limit the rate at which the Kafka consumer reads.
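The back-pressure and rate-limit knobs in option 2 live in the Spark configuration rather than in Kafka itself. A sketch for Spark Streaming with the direct Kafka source; the app name and the 1000 records/s cap are placeholder values to tune per workload:

```scala
import org.apache.spark.SparkConf

// Let Spark Streaming adapt the ingestion rate to actual processing speed,
// and cap how fast each Kafka partition is read (records per second),
// so a traffic spike cannot enqueue batches faster than they complete.
val conf = new SparkConf()
  .setAppName("wechat-neo4j")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
```

With back-pressure enabled, the rate cap acts as an upper bound while the internal rate estimator throttles ingestion whenever batch processing time approaches the batch interval.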