Processing Game Data with Scala and Spark


I. Scala and Spark

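Scala is a multi-paradigm programming language that resembles Java in many respects. It was designed to be a scalable language that integrates the features of object-oriented and functional programming.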
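As a rough illustration of how the two paradigms combine, the sketch below pairs a small case class (the object-oriented side) with a filter/map pipeline over an immutable list (the functional side). The names `MonthlyStat`, `ParadigmDemo` and `popularGames`, as well as the sample values, are made up for this example and do not come from the dataset used later.

```scala
// Object-oriented side: a small class bundling data with a method.
final case class MonthlyStat(game: String, month: String, peakPlayers: Long) {
  def isPopular(threshold: Long): Boolean = peakPlayers > threshold
}

object ParadigmDemo {
  // Functional side: an immutable list transformed with higher-order functions.
  def popularGames(stats: List[MonthlyStat], threshold: Long): List[String] =
    stats.filter(_.isPopular(threshold)).map(_.game).distinct

  def main(args: Array[String]): Unit = {
    // Made-up sample values, for illustration only.
    val stats = List(
      MonthlyStat("Game A", "January", 12000L),
      MonthlyStat("Game B", "January", 800L),
      MonthlyStat("Game A", "February", 15000L)
    )
    println(popularGames(stats, 10000L)) // prints List(Game A)
  }
}
```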

Spark is a fast, general-purpose, scalable engine for large-scale data analytics that computes in memory.

II. Environment setup

1. Scala environment setup

First define SCALA_HOME as a system environment variable, then add the bin directory under SCALA_HOME to PATH.
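One way to confirm that both variables are set as described is a small check like the following. It is only a sketch: `ScalaEnvCheck` and its helper names are invented for this example, and it simply compares each PATH entry with the `bin` directory under SCALA_HOME.

```scala
import java.io.File

object ScalaEnvCheck {
  // The directory that should appear on PATH for a given SCALA_HOME.
  def expectedBinDir(scalaHome: String): String =
    new File(scalaHome, "bin").getPath

  // True if some PATH entry, once normalized by java.io.File, equals binDir.
  def pathContains(path: String, binDir: String): Boolean =
    path.split(File.pathSeparator).exists(entry => new File(entry).getPath == binDir)

  def main(args: Array[String]): Unit = {
    // System.getenv returns null for unset variables, hence the Option wrapper.
    Option(System.getenv("SCALA_HOME")) match {
      case None => println("SCALA_HOME is not set")
      case Some(home) =>
        val path = Option(System.getenv("PATH")).getOrElse("")
        println(s"SCALA_HOME = $home")
        println(s"bin on PATH: ${pathContains(path, expectedBinDir(home))}")
    }
  }
}
```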


2. The Scala plugin in IDEA

Development is done in IntelliJ IDEA, whose Scala plugin makes it quick to configure the Scala environment.

3. Add the dependencies to pom.xml

<dependencies>
        
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>

    </dependencies>
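The `_2.12` suffix on each artifactId is the Scala binary version the Spark jars were built for, so the project should compile and run with a Scala 2.12.x release. A minimal sketch of a runtime check follows; `ScalaVersionCheck` and its members are names made up for this example.

```scala
object ScalaVersionCheck {
  // Binary version implied by the "_2.12" artifact suffix in pom.xml.
  val expectedBinaryVersion = "2.12"

  // Extracts "major.minor" from a full version string such as "2.12.10".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True if the given full Scala version belongs to the expected binary version.
  def matches(full: String): Boolean =
    binaryVersion(full) == expectedBinaryVersion

  def main(args: Array[String]): Unit = {
    // Version of the Scala library currently on the classpath.
    val running = scala.util.Properties.versionNumberString
    if (matches(running)) println(s"Scala $running matches the _2.12 Spark artifacts")
    else println(s"Scala $running does not match the _2.12 Spark artifacts")
  }
}
```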

III. Writing code to process the game data

1. Compute the highest peak player count for Counter-Strike in each month

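  import org.apache.spark.SparkConf
  import org.apache.spark.sql.{DataFrame, SparkSession}

  def main(args: Array[String]): Unit = {

    // Run locally, using all available cores.
    val conf: SparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
    val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()

    // Load the GBK-encoded CSV (with a header row) and register it as a view.
    val read: DataFrame = spark.read
      .format("csv")
      .option("encoding","GBK")
      .option("sep", ",")
      .option("inferSchema", "true")
      .option("header", "true")
      .load("datas/AllSteamData.csv")
    read.createOrReplaceTempView("steam")

    // Keep only Counter-Strike rows, drop the 'Last 30 Days' summary rows,
    // and take the part of Month before the first '月' as mon.
    val frame: DataFrame = spark.sql("select " +
      "*,substring_index(Month,'月',1) as mon " +
      "from steam " +
      "where Month not like 'Last 30 Days' " +
      "and " +
      "Name='Counter-Strike'"
    )
    frame.createOrReplaceTempView("steam1")

    // Highest peak per month, largest first. The cast makes max() compare
    // numerically even if PeakPlayers was inferred as a string column.
    spark.sql("select " +
      "mon,max(cast(PeakPlayers as bigint)) as max_Players " +
      "from steam1 " +
      "group by mon " +
      "order by max_Players desc").show()

    spark.close()

  }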

Final result
[Figure: screenshot of the query output]
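To make the logic of the two SQL statements easier to follow, the sketch below mirrors them on plain Scala collections, without Spark. It is not the code used in this post: `MonthlyPeakSketch`, `Record`, `beforeFirst` and `monthlyPeaks` are names invented for the illustration, and `beforeFirst` only imitates what `substring_index(Month,'月',1)` does (everything before the first occurrence of the delimiter, or the whole string if it is absent).

```scala
object MonthlyPeakSketch {
  // Hypothetical row shape with just the three columns the query uses.
  final case class Record(name: String, month: String, peakPlayers: Long)

  // Imitates substring_index(s, delim, 1).
  def beforeFirst(s: String, delim: String): String = {
    val i = s.indexOf(delim)
    if (i < 0) s else s.substring(0, i)
  }

  // Highest peak per month for one game, largest first.
  def monthlyPeaks(records: Seq[Record], game: String): Seq[(String, Long)] =
    records
      .filter(r => r.month != "Last 30 Days" && r.name == game) // the where clause
      .groupBy(r => beforeFirst(r.month, "月"))                  // group by mon
      .map { case (mon, rows) => mon -> rows.map(_.peakPlayers).max }
      .toSeq
      .sortBy { case (_, peak) => -peak }                       // order by ... desc
}
```

Calling `MonthlyPeakSketch.monthlyPeaks(rows, "Counter-Strike")` on a sequence of such records returns one `(mon, max)` pair per month.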

2. Find the ten games with the highest peaks in the month that has the most records with a peak above 10000 players

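  import org.apache.spark.SparkConf
  import org.apache.spark.sql.{DataFrame, SparkSession}

  def main(args: Array[String]): Unit = {

    // Run locally, using all available cores.
    val conf: SparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
    val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()

    // Load the GBK-encoded CSV (with a header row) and register it as a view.
    val read: DataFrame = spark.read
      .format("csv")
      .option("encoding","GBK")
      .option("sep", ",")
      .option("inferSchema", "true")
      .option("header", "true")
      .load("datas/AllSteamData.csv")
    read.createOrReplaceTempView("steam")

    // Step 1: keep rows with a peak above 10000 and drop the
    // 'Last 30 Days' summary rows.
    val frame: DataFrame = spark.sql("select " +
      "Name,PeakPlayers,substring_index(Month,'月',1) as mon " +
      "from steam " +
      "where Month not like 'Last 30 Days' " +
      "and " +
      "PeakPlayers > 10000")
    frame.createOrReplaceTempView("steam1")

    // Step 2: count the remaining rows per month.
    val frame1: DataFrame = spark.sql("select " +
      "mon,count(mon) as ct " +
      "from steam1 " +
      "group by mon")
    frame1.createOrReplaceTempView("steam2")

    // Step 3: within the month with the largest count, take the ten highest
    // per-game peaks. The scalar subquery assumes exactly one month holds
    // the maximum count; a tie would make it return more than one row.
    spark.sql("select " +
      "Name,mon,max(cast(PeakPlayers as bigint)) as max_players " +
      "from steam1 " +
      "where mon=(select mon from steam2 where ct=(select max(ct) from steam2)) " +
      "group by Name,mon " +
      "order by max_players desc limit 10").show()

    spark.close()

  }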

Final result
[Figure: screenshot of the query output]
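The three steps of this task (filter, count per month, rank games within the busiest month) can also be traced on plain Scala collections. The following is a sketch for understanding only, not the Spark job itself. `BusiestMonthSketch`, `Record` and the helper names are invented for this example. One difference to be aware of: when several months share the highest count, `maxBy` here silently picks one of them, whereas the scalar subquery in the SQL version would fail.

```scala
object BusiestMonthSketch {
  // Hypothetical row shape with just the three columns the queries use.
  final case class Record(name: String, month: String, peakPlayers: Long)

  // Imitates substring_index(s, delim, 1).
  def beforeFirst(s: String, delim: String): String = {
    val i = s.indexOf(delim)
    if (i < 0) s else s.substring(0, i)
  }

  // Returns up to n rows of (name, mon, maxPeak) for the busiest month.
  def topGamesInBusiestMonth(records: Seq[Record], threshold: Long, n: Int): Seq[(String, String, Long)] = {
    // Step 1: the "steam1" view.
    val kept = records
      .filter(r => r.month != "Last 30 Days" && r.peakPlayers > threshold)
      .map(r => r.copy(month = beforeFirst(r.month, "月")))

    if (kept.isEmpty) Seq.empty
    else {
      // Step 2: the "steam2" view, reduced to the month with the largest count.
      val (busiest, _) = kept.groupBy(_.month).map { case (m, rows) => m -> rows.size }.maxBy(_._2)

      // Step 3: highest peak per game within that month, top n.
      kept
        .filter(_.month == busiest)
        .groupBy(_.name)
        .map { case (name, rows) => (name, busiest, rows.map(_.peakPlayers).max) }
        .toSeq
        .sortBy { case (_, _, peak) => -peak }
        .take(n)
    }
  }
}
```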

Summary

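The dataset I found contained some dirty data, and working around the resulting pitfalls, such as having to convert data types, took quite a while. Overall, though, the exercise was not especially difficult.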
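The post does not say what the dirty values looked like. Assuming they were things like blank cells, stray whitespace, thousands separators or non-numeric text in a numeric column, a defensive conversion could be sketched as follows (`DirtyNumbers` and `parsePlayers` are hypothetical names):

```scala
import scala.util.Try

object DirtyNumbers {
  // Parses a raw CSV cell into a Long.
  // Returns None for null, blank, or non-numeric input instead of throwing.
  def parsePlayers(raw: String): Option[Long] =
    Option(raw)
      .map(_.trim.replace(",", ""))        // drop whitespace and thousands separators
      .filter(_.nonEmpty)
      .flatMap(s => Try(s.toLong).toOption)

  def main(args: Array[String]): Unit = {
    // Made-up cell values, for illustration only.
    val cells = Seq("12345", " 12,345 ", "", "N/A")
    cells.foreach(c => println(s"'$c' -> ${parsePlayers(c)}"))
  }
}
```

Returning an `Option` keeps the bad rows visible to the caller, who can then decide whether to drop them or report them.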
