1. Start the shell on the client
Change into the Spark installation directory, then run:
bin/spark-shell --master spark://IP:7077 --executor-memory 1g
2. Scala operations
(1) Map a file on HDFS to a table
Create a SparkSession instance:
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("SparkSessionZipsExample")
  .config("spark.sql.crossJoin.enabled", "true")
  .getOrCreate()
Define a case class that describes the table schema:
case class User(id: String, name: String, age: String)
Map the file to the table and filter out rows that do not have enough fields (fields are separated by "\t"; the mapped view is named user):
spark.read.textFile("hdf
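The line above is cut off, so here is a minimal sketch of the whole mapping step. It assumes the `spark` session and `User` case class defined above; the HDFS path, port, and the three-field layout are assumptions for illustration only:

```scala
// Encoders for User and Array[String] come from the session's implicits
import spark.implicits._

val userDS = spark.read
  .textFile("hdfs://namenode:9000/data/user.txt")  // hypothetical path
  .map(_.split("\t"))          // fields are tab-separated
  .filter(_.length >= 3)       // drop rows with too few fields
  .map(a => User(a(0), a(1), a(2)))

// Register the Dataset as a temporary view named "user"
userDS.createOrReplaceTempView("user")
```

After registration, the view can be queried with SQL, e.g. `spark.sql("select name, age from user").show()`.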