SparkSQL: loading a DataFrame from a Dataset of tuples

Test data (an excerpt of the input file; the counts in the output below come from the full file):

146.202.84.90	江西	2020-10-28	1603879301437	6285032924209569490	www.jd.com	Login
146.202.84.90	江西	2020-10-28	1603879301438	6285032924209569490	www.gome.com.cn	Login
146.202.84.90	江西	2020-10-28	1603879301438	6285032924209569490	www.taobao.com	Comment
118.62.67.216	北京	2020-10-28	1603879301438	2988409670998681798	www.dangdang.com	Click
118.62.67.216	北京	2020-10-28	1603879301438	2988409670998681798	www.suning.com	Click
118.62.67.216	北京	2020-10-28	1603879301439	2988409670998681798	www.gome.com.cn	Comment
100.214.27.58	河北	2020-10-28	1603879301441	6531278323337129900	www.taobao.com	View
100.214.27.58	河北	2020-10-28	1603879301444	6531278323337129900	www.taobao.com	Click
100.214.27.58	河北	2020-10-28	1603879301444	6531278323337129900	www.mi.com	Regist
42.222.37.182	香港	2020-10-28	1603879301444	4579529561379204385	www.dangdang.com	View
42.222.37.182	香港	2020-10-28	1603879301444	4579529561379204385	www.baidu.com	Regist
42.222.37.182	香港	2020-10-28	1603879301445	4579529561379204385	www.suning.com	Comment
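Each line holds seven tab-separated fields. The program below names them ip, local (the province), date, ts (a millisecond epoch timestamp), uid, site and operator (the user action). What follows is a minimal plain-Scala sketch of the split-into-a-tuple step, which runs without Spark. `ParseRecord` and its `parse` helper are hypothetical names. Skipping lines that lack exactly seven fields is an added assumption: the original code would instead throw an `ArrayIndexOutOfBoundsException` on such a line.

```scala
// Hypothetical helper: parses one raw record into the same 7-tuple the Spark job builds.
object ParseRecord {
  type Record = (String, String, String, String, String, String, String)

  // Returns None unless the line has exactly 7 tab-separated fields (assumption: skip bad lines).
  // Note: String.split drops trailing empty fields, so a record whose last field is empty
  // is rejected as well.
  def parse(line: String): Option[Record] = {
    val arr = line.split("\t")
    if (arr.length == 7) Some((arr(0), arr(1), arr(2), arr(3), arr(4), arr(5), arr(6)))
    else None
  }

  def main(args: Array[String]): Unit = {
    val good = "146.202.84.90\tJiangxi\t2020-10-28\t1603879301437\t6285032924209569490\twww.jd.com\tLogin"
    println(parse(good))
    println(parse("incomplete\tline"))
  }
}
```

In the Spark job, the same idea could be applied with `ds.flatMap(line => ParseRecord.parse(line))`, because an `Option` behaves like a zero- or one-element collection.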

Sample code:

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

/**
 * Load a DataFrame from a Dataset of tuples
 */
object ReadTupleDataSetToDF {
  def main(args: Array[String]): Unit = {
    val session: SparkSession = SparkSession.builder()
      .master("local")
      .appName("ReadTupleDataSetToDF")
      .getOrCreate()
    session.sparkContext.setLogLevel("ERROR")
    // Read the file as a Dataset[String]: one element per raw line
    val ds: Dataset[String] = session.read.textFile("T:/code/spark_scala/data/pvuvdata")
    // Brings into scope the implicit Encoder for the tuple type that ds.map needs
    import session.implicits._
    val tupleDs: Dataset[(String, String, String, String, String, String, String)] = ds.map(line => {
      // Example line: 126.54.121.136	浙江	2020-07-13	1594648118250	4218643484448902621	www.jd.com	Comment
      val arr: Array[String] = line.split("\t")
      (arr(0), arr(1), arr(2), arr(3), arr(4), arr(5), arr(6))
    })
    // Tuple columns are named _1 ... _7 by default; toDF gives them meaningful names
    val frame: DataFrame = tupleDs.toDF("ip", "local", "date", "ts", "uid", "site", "operator")
    frame.createTempView("t")
    // pv: total number of records per site
    session.sql(
      """
        |select site, count(*) as pv from t group by site order by pv
        |""".stripMargin).show()
    // uv: number of distinct IPs per site
    session.sql(
      """
        |select site, count(*) as uv from (select distinct ip, site from t) t1 group by site order by uv
        |""".stripMargin).show()
    session.stop()
  }
}

Output (PV per site, then UV per site, each sorted in ascending order):

+----------------+-----+
|            site|   pv|
+----------------+-----+
|   www.baidu.com|18293|
|  www.suning.com|18320|
|  www.taobao.com|18375|
|www.dangdang.com|18576|
| www.gome.com.cn|18587|
|      www.jd.com|18600|
|      www.mi.com|18667|
+----------------+-----+

+----------------+-----+
|            site|   uv|
+----------------+-----+
|  www.suning.com|15442|
|   www.baidu.com|15489|
|  www.taobao.com|15582|
|www.dangdang.com|15609|
|      www.mi.com|15619|
| www.gome.com.cn|15672|
|      www.jd.com|15683|
+----------------+-----+
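UV is lower than PV for every site because repeated visits from the same IP count once in the `select distinct ip, site` subquery. The sketch below expresses the same two aggregations over plain Scala collections, using the twelve (ip, site) pairs from the test-data excerpt above. It is an illustration under stated assumptions, not the Spark execution path. `PvUvSketch` and `Visit` are hypothetical names. Ties are broken by site name here only to make the output deterministic; the SQL `order by pv` / `order by uv` leaves tie order unspecified.

```scala
// Hypothetical plain-Scala equivalent of the pv/uv SQL queries.
object PvUvSketch {
  // Only ip and site matter for these two metrics
  case class Visit(ip: String, site: String)

  // (ip, site) pairs taken from the 12-line test-data excerpt
  val sample: Seq[Visit] = Seq(
    Visit("146.202.84.90", "www.jd.com"), Visit("146.202.84.90", "www.gome.com.cn"),
    Visit("146.202.84.90", "www.taobao.com"), Visit("118.62.67.216", "www.dangdang.com"),
    Visit("118.62.67.216", "www.suning.com"), Visit("118.62.67.216", "www.gome.com.cn"),
    Visit("100.214.27.58", "www.taobao.com"), Visit("100.214.27.58", "www.taobao.com"),
    Visit("100.214.27.58", "www.mi.com"), Visit("42.222.37.182", "www.dangdang.com"),
    Visit("42.222.37.182", "www.baidu.com"), Visit("42.222.37.182", "www.suning.com")
  )

  // Counts elements per site, sorted ascending by count (ties broken by site name)
  private def countBySite(visits: Seq[Visit]): Seq[(String, Int)] =
    visits.groupBy(_.site)
      .map { case (site, vs) => (site, vs.size) }
      .toSeq
      .sortBy { case (site, n) => (n, site) }

  // pv: like `select site, count(*) as pv from t group by site`
  def pv(visits: Seq[Visit]): Seq[(String, Int)] = countBySite(visits)

  // uv: like counting rows of `select distinct ip, site from t` per site
  def uv(visits: Seq[Visit]): Seq[(String, Int)] = countBySite(visits.distinct)

  def main(args: Array[String]): Unit = {
    println("pv: " + pv(sample).mkString(", "))
    println("uv: " + uv(sample).mkString(", "))
  }
}
```

On this excerpt, www.taobao.com has a PV of 3 but a UV of 2, because 100.214.27.58 visits it twice.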
