SparkSQL Knowledge Review (Part 3)

Creating a SparkSession

Method 1: create via a Builder object (SparkSession's inner class)

```scala
import org.apache.spark.sql.SparkSession
val spark = new SparkSession.Builder().master("local[*]").appName("test").getOrCreate()
```

Method 2: create via the builder() method of the SparkSession companion object

```scala
import org.apache.spark.sql.SparkSession
val spark2 = SparkSession.builder().master("local[*]").appName("test").getOrCreate()
```
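For context, here is a minimal self-contained skeleton showing where the builder call fits in a runnable program (the object name SparkSqlDemo is just an illustration, not from the original notes):

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlDemo { // illustrative name
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")   // run locally with all available cores
      .appName("test")
      .getOrCreate()        // reuses an existing session if one exists

    // ... create and query DataFrames/Datasets here ...

    spark.stop()            // release resources when done
  }
}
```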

Creating a DataFrame

1. Via the toDF() method
2. By reading a file
3. Derived from another DataFrame
4. Via the createDataFrame method (not recommended)

1. Via the toDF method
* collection.toDF
* rdd.toDF

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int, sex: String)

val spark2 = SparkSession.builder().master("local[*]").appName("test").getOrCreate()
// bring Spark's implicit conversions into scope (required for toDF/toDS)
import spark2.implicits._

val list = List(Person("lisi", 20, "man"), Person("wagnwu", 30, "woman"), Person("zhaoliu", 45, "man"), Person("hanmeimei", 18, "woman"))

val df = list.toDF()
// The field names of the case class Person automatically become the column names.
// If the List holds tuples, the columns are _1, _2, _3; if it holds Strings, the single column is named "value".
df.show()

// TODO: the overloaded toDF can redefine the column names; the number of names must match the number of columns.
list.toDF("NAME", "AGE", "SEX").show()

// rdd.toDF: get the SparkContext directly from the SparkSession
val rd = spark2.sparkContext.parallelize(list)
rd.toDF().show()
```
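A quick sketch of the default naming behavior mentioned above (assuming spark2.implicits._ is already in scope as shown):

```scala
// tuples: columns default to _1, _2, _3
List(("lisi", 20, "man")).toDF().show()

// Strings: the single column is named "value"
List("hello", "world").toDF().show()
```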

2. By reading a file

```scala
val df = spark.read.json("file path")
df.printSchema() // print the schema
df.show()
```
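As a concrete sketch, assuming a hypothetical file datas/people.json with one JSON object per line:

```scala
// datas/people.json (hypothetical contents):
//   {"name":"lisi","age":20,"sex":"man"}
//   {"name":"wagnwu","age":30,"sex":"woman"}
val people = spark.read.json("datas/people.json")
people.printSchema() // note: JSON numbers are inferred as long
people.show()
```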

3. Derived from another DataFrame

```scala
val list = List(Person("lisi", 20, "man"), Person("wagnwu", 30, "woman"), Person("zhaoliu", 45, "man"), Person("hanmeimei", 18, "woman"))

val df = list.toDF()

// keep "sex" in the projection so the derived frame can still be filtered on it
val df2 = df.selectExpr("name", "age", "sex")

val df3 = df2.where("sex='man'")
```

4. Via the createDataFrame method (not recommended)

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// all the row data of the DataFrame
val rdd = spark.sparkContext.parallelize(List(Row("lisi", 20, "shenzhen"), Row("wangwu", 33, "beijing"), Row("zhaoliu", 43, "beijing")))

// specify the DataFrame's column information
val fields = Array[StructField](StructField("name", StringType), StructField("age", IntegerType), StructField("address", StringType))
val schema = StructType(fields)
val fd = spark.createDataFrame(rdd, schema)
```
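The result can be inspected like any other DataFrame; the schema comes entirely from the hand-written StructType, which is what makes this route verbose compared to toDF:

```scala
fd.printSchema() // name: string, age: integer, address: string
fd.show()
```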

Creating a Dataset

1. Via the toDS() method
2. By reading a file
3. Derived from another Dataset
4. Via the createDataset method (not recommended)

1. Via the toDS method
* collection.toDS
* rdd.toDS

```scala
val list = List(Person("lisi", 20, "man"), Person("wagnwu", 30, "woman"), Person("zhaoliu", 45, "man"), Person("hanmeimei", 18, "woman"))

val ds = list.toDS()

ds.printSchema()

ds.show()

// to rename the columns of a tuple Dataset, convert it with toDF first
val list2 = List(("lisi", 20, "man"), ("wagnwu", 30, "woman"), ("zhaoliu", 45, "man"), ("hanmeimei", 18, "woman"))
list2.toDS().toDF("NAME", "AGE", "SEX").show()
```

2. By reading a file

```scala
spark.read.textFile("file path") // returns Dataset[String]; read.text would return a DataFrame
```

3. Derived from another Dataset

```scala
val ds = spark2.read.textFile("datas/wc.txt")

val ds2 = ds.flatMap(x => x.split(" "))
```
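Continuing the derivation into a minimal word count (a sketch, assuming datas/wc.txt holds space-separated words):

```scala
// group identical words and count each group
val counts = ds2.groupByKey(identity).count()
counts.show()
```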

4. Via the createDataset method (not recommended)


```scala
val list = List(Person("lisi", 20, "man"), Person("wagnwu", 30, "woman"), Person("zhaoliu", 45, "man"), Person("hanmeimei", 18, "woman"))

val ds = spark2.createDataset(list)
```
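Like toDS, createDataset requires an implicit Encoder for Person; it is supplied by the import spark2.implicits._ shown earlier, which is why that import matters even for this method.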

SparkSQL offers two programming styles: 1. imperative, 2. declarative

```scala
/**
 * Imperative: operate on the data with methods [good to know]
 * Column pruning: selectExpr
 * Filtering: where / filter (the two methods are identical)
 * Deduplication:
 *   distinct: removes a row when ALL of its columns match another row
 *   dropDuplicates: removes a row when the SPECIFIED columns match
 */

val list = List(
  ("lisi", 20, "man", "2020-06-14 11:25:25"),
  ("wagnwu", 30, "woman", "2020-06-17 11:25:25"),
  ("wagnwu", 30, "woman", "2020-06-17 11:25:25"),
  ("wagnwu", 99, "woman", "2020-06-22 11:25:25"),
  ("zhaoliu", 45, "man", "2020-06-15 11:25:25"),
  ("hanmeimei", 18, "woman", "2020-06-16 11:25:25"))

val df = list.toDF("name", "age", "sex", "date_str")

// TODO column pruning: selectExpr(column, function(..) as alias, ...)
//import org.apache.spark.sql.functions._
//val df2 = df.select('name, 'age, date_format('date_str, "yyyy-MM-dd"))
//df2.show
df.selectExpr("name", "age", "date_format(date_str,'yyyy-MM-dd') date_str").show()

// TODO filtering
df.where("age>=25").show()

df.filter("age>=25").show()

// TODO deduplication
df.distinct().show()

df.dropDuplicates("name").show()
```
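On the sample list above the two deduplication calls should behave differently; a sketch of the expected counts:

```scala
// distinct drops only the fully identical ("wagnwu", 30, ...) row -> 5 rows left
df.distinct().count() // expected: 5

// dropDuplicates("name") keeps one row per name -> 4 rows left
df.dropDuplicates("name").count() // expected: 4
```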

```scala
/**
 * Declarative: operate on the data with SQL
 * 1. Register a table:
 *    createOrReplaceTempView: creates a temp view, replacing it if one already exists
 * 2. Write SQL: spark.sql("sql statement")
 */

val list = List(
  ("lisi", 20, "man", "2020-06-14 11:25:25"),
  ("wagnwu", 30, "woman", "2020-06-17 11:25:25"),
  ("wagnwu", 30, "woman", "2020-06-17 11:25:25"),
  ("wagnwu", 99, "woman", "2020-06-22 11:25:25"),
  ("zhaoliu", 45, "man", "2020-06-15 11:25:25"),
  ("hanmeimei", 18, "woman", "2020-06-16 11:25:25"))

val df = list.toDF("name", "age", "sex", "date_str")

// select ... from <table> where ...
// TODO register the table
df.createOrReplaceTempView("person")

// TODO write the SQL
spark.sql(
  """
    |select
    | name,age,date_format(date_str,'yyyy-MM-dd') date_str
    |from person where age>=25
  """.stripMargin).show()
```