Create a SQLContext object
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
Create a DataFrame object
- Create a DataFrame from structured data files
(1) Parquet file:
val dfusers = sqlContext.read.load("/xxx.parquet")
(2) JSON file:
val sfusers = sqlContext.read.format("json").load("/xxx.json")
- Create a DataFrame from an external database
via JDBC or ODBC
val url = "jdbc:mysql://192.168.1.1/test"
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> url,
      "user" -> "root",
      "password" -> "root",
      "dbtable" -> "people")).load()
- Create a DataFrame from an RDD
(1) Infer the RDD's schema via reflection
import sqlContext.implicits._
case class Person(name: String, age: Int)
val data = sc.textFile("xxx.txt").map(_.split(","))
val people = data.map(p => Person(p(0), p(1).trim.toInt)).toDF()
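The reflection step boils down to splitting each text line on commas and mapping the fields onto the case class, whose field names and types become the inferred schema. A minimal plain-Scala sketch of that mapping, with made-up sample lines standing in for `sc.textFile(...)`:

```scala
// Sketch of the map step without Spark: split each CSV line and
// build a Person from its fields (the sample data is invented).
case class Person(name: String, age: Int)

val lines = Seq("Alice,30", "Bob,19")   // stand-in for sc.textFile("xxx.txt")
val people = lines
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

people.foreach(println)
```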
Query a DataFrame
people.registerTempTable("peopleTempTab")
val personsRDD = sqlContext.sql("select name, age from peopleTempTab where age > 20").rdd
personsRDD.collect()
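What that SQL statement returns can be mirrored with an ordinary collection filter: keep the rows whose `age` exceeds 20 and project the two columns. A plain-Scala sketch with invented rows (no Spark needed):

```scala
// Sketch: "select name, age ... where age > 20" as a filter + projection
// over a local collection. Sample rows are made up for illustration.
case class Person(name: String, age: Int)

val rows = Seq(Person("Alice", 30), Person("Bob", 19), Person("Carol", 25))
val over20 = rows.filter(_.age > 20).map(p => (p.name, p.age))

over20.foreach(println)
```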