【Big Data 头歌 Lab】Spark Programming (Spark Case Study: Google's PageRank Web Ranking Engine in Practice)

Level 1: Massive Data Import: Big Data Import and Processing with Spark SQL

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql._

object SparkSQLHive {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("PageRank")
    val sc = new SparkContext(sparkConf)
    val spark = SparkSession.builder.master("local").appName("tester").enableHiveSupport().getOrCreate()
    spark.sql("use default")

    import spark.implicits._
    // drop the tables if they already exist
    spark.sql("DROP TABLE IF EXISTS vertices")
    spark.sql("DROP TABLE IF EXISTS edges")
    // create the vertices table
    spark.sql("CREATE TABLE IF NOT EXISTS vertices(ID BigInt, Title String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
    // load data from the local file system
    spark.sql("LOAD DATA LOCAL INPATH 'file:///root/graphx-wiki-vertices.txt' INTO TABLE vertices")

    //***************begin***************//
    println("begin to create table in databases")
    // create the edges table; the column names are assumed to match the tab-separated edge file
    spark.sql("CREATE TABLE IF NOT EXISTS edges(SrcID BigInt, DistID BigInt) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
    //***********end***********//
    //***************begin***************//
    println("begin to load data in text file")
    // load the edge data from the local file system
    spark.sql("LOAD DATA LOCAL INPATH 'file:///root/graphx-wiki-edges.txt' INTO TABLE edges")
    //***********end***********//
    println("success")
  }

}
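
After both LOAD DATA statements run, it is worth sanity-checking that Hive actually registered the tables and rows. A minimal sketch, run inside the same main method and assuming the vertices/edges tables created above:

    // quick sanity check after the loads
    spark.sql("SHOW TABLES").show()
    spark.sql("SELECT COUNT(*) FROM vertices").show()
    spark.sql("SELECT COUNT(*) FROM edges").show()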

Level 2: Going Through the Records: Big Data Queries with Spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql._

object SparkSQLHive2 {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("PageRank")
    val sc = new SparkContext(sparkConf)
    val spark = SparkSession.builder.master("local").appName("tester").enableHiveSupport().getOrCreate()

    // choose the database
    spark.sql("use default")
    import spark.implicits._
    spark.sql("DROP TABLE IF EXISTS vertices")
    // create the table
    spark.sql("CREATE TABLE IF NOT EXISTS vertices(ID BigInt, Title String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
    // load data from the local file system
    spark.sql("LOAD DATA LOCAL INPATH 'file:///root/graphx-wiki-vertices.txt' INTO TABLE vertices")
    //***********begin***********//
    // query the first five rows from the table
    val res1 = spark.sql("SELECT * FROM vertices LIMIT 5")
    //***********end***********//

    res1.collect().foreach(println)
  }

}
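
The same query can be expressed through the DataFrame API instead of an SQL string. A minimal sketch, assuming the vertices table created above and the same SparkSession:

    // DataFrame API equivalent of the SQL query above
    val res2 = spark.table("vertices").limit(5)
    res2.show(truncate = false) // show() prints a formatted table instead of raw Row objects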

Level 3: Finding Gold in the Garbage: The Web Page Ranking Algorithm

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PageRank {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val conf = new SparkConf().setAppName("PageRank").setMaster("local")
    val sc = new SparkContext(conf)
    // adjacency list: each page and the pages it links to
    val links = sc.parallelize(List(
      ("A", List("B", "C")),
      ("B", List("A", "D")),
      ("C", List("A")),
      ("D", List("A", "B", "C"))
    )).partitionBy(new HashPartitioner(10)).persist()
    // every page starts with rank 1/N = 0.25 (four pages)
    var ranks = links.mapValues(v => 0.25)
    //***********begin***********//
    for (i <- 0 until 10) {
      // each page sends rank/outDegree to every page it links to
      val contributions = links.join(ranks).flatMap {
        case (pageId, (pageLinks, rank)) =>
          pageLinks.map(link => (link, rank / pageLinks.size))
      }
      //***********end***********//
      //***********begin***********//
      // new rank = (1 - d)/N + d * sum(contributions), with damping factor d = 0.8
      ranks = contributions
        .reduceByKey((x, y) => x + y)
        .mapValues(v => 0.2 * 0.25 + 0.8 * v)
    }
    //***********end***********//
    ranks.collect().foreach(println)
  }
}
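
The loop above always runs a fixed ten iterations. A common variation is to iterate until the ranks stop changing. A minimal sketch of that stopping rule, reusing the links RDD and the same update step; the tolerance value is an arbitrary choice:

    // iterate until the total rank change falls below a tolerance
    var ranksConv = links.mapValues(v => 0.25)
    var delta = Double.MaxValue
    val tol = 1e-6
    while (delta > tol) {
      val contribs = links.join(ranksConv).flatMap {
        case (_, (pageLinks, rank)) => pageLinks.map(link => (link, rank / pageLinks.size))
      }
      val newRanks = contribs.reduceByKey(_ + _).mapValues(v => 0.2 * 0.25 + 0.8 * v)
      // sum of absolute rank changes across all pages
      delta = newRanks.join(ranksConv).map { case (_, (n, o)) => math.abs(n - o) }.sum()
      ranksConv = newRanks
    }
    ranksConv.collect().foreach(println)

Note that each pass triggers two joins, so for anything beyond a toy graph the fixed-iteration version with a checkpointed lineage is usually the cheaper choice.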
