1. Start Spark
spark-shell --master local[2]
2. Create a simple RDD
val foodRDD = sc.makeRDD(List((1,"大虾","1元"),(2,"大闸蟹","8角"),(3,"三文鱼","5毛")))
3. Convert the RDD to a DataFrame (the "Frame" refers to the schema, i.e. the structure of the data)
val foodDF = foodRDD.toDF("id","name","price")
4. Display the DataFrame with show()
foodDF.show()
+---+----+-----+
| id|name|price|
+---+----+-----+
| 1| 大虾| 1元|
| 2| 大闸蟹| 8角|
| 3| 三文鱼| 5毛|
+---+----+-----+
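To check the schema that toDF inferred, printSchema() can be used; the output below is a sketch based on the tuple types from step 2 (id is an Int, the other two fields are Strings):
foodDF.printSchema()
// root
//  |-- id: integer (nullable = false)
//  |-- name: string (nullable = true)
//  |-- price: string (nullable = true)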
5. Query the DataFrame
5.1 Query via the DataFrame API
foodDF.filter(foodDF.col("name").equalTo("大闸蟹")).show()
+---+----+-----+
| id|name|price|
+---+----+-----+
| 2| 大闸蟹| 8角|
+---+----+-----+
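With the implicits that spark-shell imports by default (import spark.implicits._), the same filter can also be written with column expressions; a minimal equivalent sketch:
foodDF.filter($"name" === "大闸蟹").show()   // same result as the equalTo form above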
5.2 Query via Spark SQL
First register foodDF as a temporary view (a table):
foodDF.createOrReplaceTempView("foods")
Then query it with Spark SQL:
spark.sql("select * from foods where name ='大闸蟹'").show()
Output:
+---+----+-----+
| id|name|price|
+---+----+-----+
| 2| 大闸蟹| 8角|
+---+----+-----+
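Any SQL that Spark supports can be run against the registered view; for example, a simple aggregation over the same foods view:
spark.sql("select count(*) as total from foods").show()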
6. Convert a DataFrame back to an RDD
val foodDF2RDD = foodDF.rdd
foodDF2RDD.foreach(println(_))
[1,大虾,1元]
[2,大闸蟹,8角]
[3,三文鱼,5毛]
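Each element of the resulting RDD is a Row, so individual fields can be read back out by name; a minimal sketch using the column names from step 3:
foodDF2RDD.map(row => (row.getAs[Int]("id"), row.getAs[String]("name"))).foreach(println)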
7. Convert a text file into a DataFrame
val rdd = sc.textFile("file:///root/hadoop/data/food.txt")
That is, read the file in first, then convert the RDD into a DataFrame based on the shape of the data, as sketched below.
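A minimal sketch of the conversion, assuming food.txt holds comma-separated lines with the same three fields as above (the file layout is an assumption, it is not given in the original):
// assumes lines like: 1,大虾,1元
val foodTxtDF = rdd
  .map(_.split(","))                      // split each line into fields
  .map(f => (f(0).toInt, f(1), f(2)))     // build a typed tuple per row
  .toDF("id", "name", "price")            // name the columns
foodTxtDF.show()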
8. Read a JSON file directly into a DataFrame
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val foodDF = sqlContext.read.json("file:///home/hadoop/data/food.json")
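Note that read.json expects one JSON object per line (JSON Lines), not a single pretty-printed array. A hypothetical food.json matching the schema above:
{"id":1,"name":"大虾","price":"1元"}
{"id":2,"name":"大闸蟹","price":"8角"}
{"id":3,"name":"三文鱼","price":"5毛"}
In Spark 2.x, SQLContext is deprecated; spark.read.json(...) on the built-in session gives the same result.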
9. Read a Parquet file directly into a DataFrame
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val foodDF = sqlContext.read.parquet("file:///home/hadoop/data/food.parquet")
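If no Parquet file exists yet, one can be produced from the DataFrame built earlier (the path here is an assumption):
// hypothetical: persist the earlier foodDF as Parquet so it can be read back
foodDF.write.parquet("file:///home/hadoop/data/food.parquet")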
10. Read a MySQL table into a DataFrame
The MySQL driver jar must be on the classpath when spark-shell starts:
./spark-shell --master local --driver-class-path /root/work/mysql-connector-java-5.1.38-bin.jar
Then run:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val prop = new java.util.Properties
prop.put("user","root")
prop.put("password","root")
val foodDF = sqlContext.read.jdbc("jdbc:mysql://hadoop001:3306/sparkdb","food",prop)
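Once loaded, the JDBC-backed DataFrame behaves like any other; a brief example, assuming the sparkdb.food table mirrors the earlier schema:
foodDF.show()
foodDF.filter("name = '大闸蟹'").show()   // simple filters like this can be pushed down to MySQL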