The fields in the data below are: date, user, keyword, address, platform, version
2017-03-13,leo,barbecue,beijing,android,1.0
2017-03-13,leo,barbecue,beijing,android,1.0
2017-03-13,leo,barbecue,beijing,android,1.0
2017-03-13,leo,cloth,beijing,android,1.0
2017-03-13,leo2,cloth,beijing,android,1.0
2017-03-13,jack,barbecue,shanghai,android,1.1
2017-03-13,leo,paper,beijing,ios,1.0
2017-03-13,tom,barbecue,beijing,android,1.2
2017-03-13,leo,cup,beijing,android,1.0
2017-03-13,mary,barbecue,beijing,android,1.2
2017-03-13,leo,barbecue,beijing,ios,1.3
2017-03-13,leo,cup,beijing,android,1.0
2017-03-13,leo1,cup,beijing,android,1.0
2017-03-13,leo2,cup,beijing,android,1.2
2017-03-13,leo3,cup,beijing,android,1.2
2017-03-13,leo4,cup,beijing,android,1.3
Task: use Spark SQL to compute the UV (unique visitor) count for each keyword.
CREATE TABLE IF NOT EXISTS UVTEST (
  sdate    date,
  suser    String,
  keyword  String,
  address  String,
  platform String,
  version  double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/data/uvtest.txt' OVERWRITE INTO TABLE UVTEST;
Answer (Hive SQL):
select keyword, count(distinct suser) as uv
from uvtest
group by keyword;
Spark SQL (Scala) version:
import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveDemo extends App {
  private val session: SparkSession = SparkSession.builder()
    .enableHiveSupport()
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .master("local")
    .appName("hive")
    .getOrCreate()

  // Register the Hive table as a temp view under the same name the SQL below uses.
  private val frame: DataFrame = session.table("default.uvtest")
  frame.createOrReplaceTempView("uvtest")

  val sql =
    """
      |select
      |keyword, count(distinct suser) as uv
      |from uvtest
      |group by keyword
      |""".stripMargin

  session.sql(sql).show()
}
Output:
+--------+---+
| keyword| uv|
+--------+---+
|barbecue|  4|
|   paper|  1|
|     cup|  5|
|   cloth|  2|
+--------+---+
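As a quick sanity check, the same aggregation can be reproduced in plain Scala (no Spark required) over the sample rows above. This is a minimal sketch; `UvCheck` is just an illustrative name, and it mirrors `count(distinct suser) ... group by keyword` with `groupBy` plus `distinct`:

```scala
object UvCheck extends App {
  // The sample rows from the post, split into fields:
  // date, user, keyword, address, platform, version
  val rows: Seq[Array[String]] = Seq(
    "2017-03-13,leo,barbecue,beijing,android,1.0",
    "2017-03-13,leo,barbecue,beijing,android,1.0",
    "2017-03-13,leo,barbecue,beijing,android,1.0",
    "2017-03-13,leo,cloth,beijing,android,1.0",
    "2017-03-13,leo2,cloth,beijing,android,1.0",
    "2017-03-13,jack,barbecue,shanghai,android,1.1",
    "2017-03-13,leo,paper,beijing,ios,1.0",
    "2017-03-13,tom,barbecue,beijing,android,1.2",
    "2017-03-13,leo,cup,beijing,android,1.0",
    "2017-03-13,mary,barbecue,beijing,android,1.2",
    "2017-03-13,leo,barbecue,beijing,ios,1.3",
    "2017-03-13,leo,cup,beijing,android,1.0",
    "2017-03-13,leo1,cup,beijing,android,1.0",
    "2017-03-13,leo2,cup,beijing,android,1.2",
    "2017-03-13,leo3,cup,beijing,android,1.2",
    "2017-03-13,leo4,cup,beijing,android,1.3"
  ).map(_.split(","))

  // Group by keyword (field 2), count distinct users (field 1) per group.
  val uv: Map[String, Int] =
    rows.groupBy(_(2)).map { case (kw, rs) => kw -> rs.map(_(1)).distinct.size }

  // Prints barbecue 4, cloth 2, cup 5, paper 1 — matching the Spark result.
  uv.toSeq.sortBy(_._1).foreach { case (kw, n) => println(s"$kw\t$n") }
}
```

The ordering differs from the Spark output only because the rows here are sorted by keyword; `show()` makes no ordering guarantee without an `order by`.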