1、Spark SQL基本操作
(一)创建DataFrame
(二)查询所有数据
(三)查询所有数据并去除重复的数据
(四)查询所有数据打印时去除id字段
(五)筛选出age>30的记录
(六)将数据按age分组
(七)将数据按name升序排序
(八)取出前3行数据
(九)查询所有记录的name列,并为其取别名为username
(十)查询年龄age的平均值
(十一)查询年龄age的最小值
2、编程实现将RDD转换为DataFrame
(一) 代码编写
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val peopleRDD = spark.sparkContext.textFile("file:///root/wyqWork/Demo6/employee.txt")
val schemaString = "id name age"
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim, attributes(2).trim))
val peopleDF = spark.createDataFrame(rowRDD, schema)
peopleDF.createOrReplaceTempView("people")
val results = spark.sql("SELECT id,name,age FROM people")
results.map(attributes => "id: " + attributes(0)+","+"name:"+attributes(1)+","+"age:"+attributes(2)).show()
(二)结果输出
3、利用DataFrame读写MySQL的数据
(一)新建数据库sparktest,再创建表employee,并输入数据
(二)查询当前表中所有数据
4、配置Sp a r k通过JDBC连接数据库MySQL,编程实现利用Da t aFr a m e插入如表6 -2所示的两行数据到MySQL中,最后打印出a g e的最大值和a g e的总和。
(一)代码编写
import java.util.Properties
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
val spark=SparkSession.builder().appName("TestMySQL").master("local").getOrCreate()
import spark.implicits._
val employeeRDD=spark.sparkContext.parallelize(Array("3 Mary F 26","4 Tom M 23")).map(_.split(" "))
val schema=StructType(List(StructField("id",IntegerType,
true),StructField("name",StringType,true),StructField("gender",StringType,true),
StructField("age",IntegerType,true)))
val rowRDD=employeeRDD.map(p=>Row(p(0).toInt,p(1).trim,p(2).trim,p(3).toInt))
val employeeDF=spark.createDataFrame(rowRDD,schema)
val prop=new Properties()
prop.put("user","root")
prop.put("password","10086CYc#")
prop.put("driver","com.mysql.cj.jdbc.Driver")
employeeDF.write.mode("append").jdbc("jdbc:mysql://localhost:3306/sparktest","sparktest.employee",prop)
val jdbcDF = spark.read.format("jdbc").
option("url","jdbc:mysql://localhost:3306/sparktest").
option("driver","com.mysql.cj.jdbc.Driver").
option("dbtable", "employee").
option("user", "root").
option("password", "10086CYc#").
load()
jdbcDF.agg("age" -> "max", "age" -> "sum").show()
(二)结果输出