Invocation methods (performing data analysis by calling DataFrame methods)
show: displays the first N rows of the dataset in tabular form (default 20)
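As a minimal sketch of `show` (the local SparkSession and sample data below are assumed for illustration, not taken from the notes):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ShowDemo")
  .master("local[*]") // assumed local mode for experimentation
  .getOrCreate()
import spark.implicits._

val df = List((1, "zs"), (2, "ls"), (3, "ww")).toDF("id", "name")
df.show()                     // prints up to 20 rows (the default)
df.show(2)                    // prints only the first 2 rows
df.show(2, truncate = false)  // do not truncate long cell values
```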
select: projection query; specifies the columns to return
selectExpr: projection query that supports expressions (basic arithmetic or aliases), e.g.:
df.selectExpr("id + 10", "name as username")
withColumn: adds an extra (derived) column
withColumnRenamed: renames a column, equivalent to an AS alias in SQL
val rdd = spark.sparkContext.makeRDD(List((1, "zs", true, 2000D), (2, "ls", false, 3000D)))
// convert the RDD to a Dataset or DataFrame
import spark.implicits._ // Scala implicit conversions
val df = rdd.toDF("id", "name", "sex", "salary")
df
  //.select("id", "name")
  //.selectExpr("id+10", "name as username")
  .withColumn("year_salary", $"salary" * 12)
  .withColumnRenamed("name", "username")
  .show()
// equivalent SQL: select id, name, sex, salary, salary * 12 as year_salary from t_user
Output with only withColumn("year_salary", ...) applied (before the rename):
+---+----+-----+------+-----------+
| id|name| sex|salary|year_salary|
+---+----+-----+------+-----------+
|  1|  zs| true|2000.0|    24000.0|
|  2|  ls|false|3000.0|    36000.0|
+---+----+-----+------+-----------+
Output after withColumnRenamed("name", "username") is also applied:
+---+--------+-----+------+-----------+
| id|username| sex|salary|year_salary|
+---+--------+-----+------+-----------+
|  1|      zs| true|2000.0|    24000.0|
|  2|      ls|false|3000.0|    36000.0|
+---+--------+-----+------+-----------+
printSchema: prints the schema of the DataFrame (column names, types, nullability), not the data
// the output looks like this:
root
|-- id: integer (nullable = false)
|-- username: string (nullable = true)
|-- sex: boolean (nullable = false)
|-- salary: double (nullable = false)
|-- year_salary: double (nullable = false)
drop: removes the specified column(s)
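A minimal sketch of `drop`, assuming the `df` from the example above (columns id, name, sex, salary):

```scala
// dropping a single column — the result no longer contains "sex"
df.drop("sex").show()

// several columns can be dropped in one call
df.drop("sex", "salary").show()
```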
dropDuplicates: when several rows have the same values in the specified columns, keeps only one of them (deduplicates the result)
import spark.implicits._
val df = List(
(1, "zs", false, 1, 15000),
(2, "ls", false, 1,
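The example above is cut off; a complete sketch of `dropDuplicates` might look like the following (the sample values and column names are assumed for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DropDuplicatesDemo")
  .master("local[*]") // assumed local mode
  .getOrCreate()
import spark.implicits._

// sample data (id, name, sex, dept, salary) — values are assumed
val df = List(
  (1, "zs", false, 1, 15000),
  (2, "ls", false, 1, 15000),
  (3, "ww", true, 2, 18000)
).toDF("id", "name", "sex", "dept", "salary")

// keep only one row per distinct (sex, dept) combination
df.dropDuplicates("sex", "dept").show()
```

Rows 1 and 2 share the same (sex, dept) pair, so only one of them survives the deduplication.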