【spark内置函数】window

目录

一、说明

二、数据样式

三、代码示例

四、运行结果


一、说明

  • 这里的window函数是属于functions.scala文件中的内置函数。
  • window函数是在spark2.0.0中新增的API函数。
  • 这里选取股票数据为测试数据:测试数据下载地址
  • 时间一定要整理成yyyy-MM-dd hh:mm:ss格式,其他格式内置函数不识别。

二、数据样式

--表头
id,date,openPrice,highestPrice,lowestPrice,closePrice,exchangeNum
代码,时间,开盘价,最高价,最低价,收盘价,成交量(手)
--数据
SH600000,2015-7-1 9:31:00,137.5999,137.5999,136.5358,136.7814,64912
SH600000,2015-7-1 9:32:00,136.8632,137.5181,136.6995,137.5181,31363
SH600000,2015-7-1 9:33:00,137.4362,137.8455,137.1088,137.6818,25057
SH600000,2015-7-1 9:34:00,137.5999,137.5999,136.7814,136.8632,32136
SH600000,2015-7-1 9:35:00,137.1088,137.3544,136.8632,137.0269,29211
SH600000,2015-7-1 9:36:00,137.1088,137.3544,136.9451,136.9451,23235
SH600000,2015-7-1 9:37:00,136.9451,137.2725,136.9451,137.1088,15110
SH600000,2015-7-1 9:38:00,137.3544,137.6818,137.0269,137.3544,17345
...

三、代码示例

package com.dt.spark.Test

import java.sql.Date

import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object windowTest {

  case class Ticket(id: String, date: String, openPrice: Double, highestPrice: Double, lowestPrice: Double, closePrice: Double, exchangeNum: Int)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().master("local[1]").appName("windowtest").getOrCreate()
    val sc: SparkContext = spark.sparkContext
    sc.setLogLevel("WARN")
    import org.apache.spark.sql.functions._
    import spark.implicits._
    val ticketDf: DataFrame = spark.read
      //推断数据类型
      .option("inferSchema", "true")
      //表示有表头,若没有则为false
      .option("header", true)
      .csv("data/test/ticket.csv")

    /**
      * 由于dataSet是强类型,故DF转化为DS时,DF的推断类型要和上面的实体类一致。
      * 可以打印schema看自动推断类型
      */
    println(ticketDf.schema)
    val ticketDs: Dataset[Ticket] = ticketDf.as[Ticket]
    ticketDf.show(false)

    /**
      * window时间窗口函数的应用一:
      * 十分钟滚动窗口的平均收盘价
      */
    ticketDf.groupBy(window($"date", "10 minute").alias("time"), $"id")
      .agg(round(mean($"closePrice"), 2).alias("avg"))
      .orderBy($"time")
      .show(false) //false可以将列的值显示全

    /**
      * window时间窗口函数的应用二:
      * 滑动窗口每隔5min,每10min窗口的平均股价
      */
    ticketDf.groupBy(window(col("date"), "10 minute", "5 minute").alias("time"), col("id"))
      .agg(round(mean(col("closePrice")), 2).alias("avg"))
      .orderBy(col("time"))
      .show(false)

    /**
      * window时间窗口函数应用三:
      * 滑动窗口每5min的后3分钟,每10min窗口的平均股价
      */
    ticketDf.groupBy(window('date, "10 minute", "5 minute", "3 minute").alias("time"), 'id)
      .agg(round(mean('closePrice), 2).alias("avg"))
      .orderBy('time)
      .show(false)
  }
}

四、运行结果

StructType(StructField(id,StringType,true), StructField(date,TimestampType,true), StructField(openPrice,DoubleType,true), StructField(highestPrice,DoubleType,true), StructField(lowestPrice,DoubleType,true), StructField(closePrice,DoubleType,true), StructField(exchangeNum,IntegerType,true))
+--------+---------------------+---------+------------+-----------+----------+-----------+
|id      |date                 |openPrice|highestPrice|lowestPrice|closePrice|exchangeNum|
+--------+---------------------+---------+------------+-----------+----------+-----------+
|SH600000|2015-07-01 09:31:00.0|137.5999 |137.5999    |136.5358   |136.7814  |64912      |
|SH600000|2015-07-01 09:32:00.0|136.8632 |137.5181    |136.6995   |137.5181  |31363      |
|SH600000|2015-07-01 09:33:00.0|137.4362 |137.8455    |137.1088   |137.6818  |25057      |
|SH600000|2015-07-01 09:34:00.0|137.5999 |137.5999    |136.7814   |136.8632  |32136      |
|SH600000|2015-07-01 09:35:00.0|137.1088 |137.3544    |136.8632   |137.0269  |29211      |
|SH600000|2015-07-01 09:36:00.0|137.1088 |137.3544    |136.9451   |136.9451  |23235      |
|SH600000|2015-07-01 09:37:00.0|136.9451 |137.2725    |136.9451   |137.1088  |15110      |
|SH600000|2015-07-01 09:38:00.0|137.3544 |137.6818    |137.0269   |137.3544  |17345      |
|SH600000|2015-07-01 09:39:00.0|137.3544 |137.5181    |136.7814   |136.7814  |30376      |
|SH600000|2015-07-01 09:40:00.0|136.9451 |138.0092    |136.7814   |137.1088  |25467      |
|SH600000|2015-07-01 09:41:00.0|137.1088 |137.5999    |136.8632   |137.1907  |41187      |
|SH600000|2015-07-01 09:42:00.0|137.1907 |137.5181    |137.0269   |137.5181  |16239      |
|SH600000|2015-07-01 09:43:00.0|137.3544 |137.3544    |136.8632   |136.9451  |27479      |
|SH600000|2015-07-01 09:44:00.0|137.0269 |137.5181    |137.0269   |137.4362  |11787      |
|SH600000|2015-07-01 09:45:00.0|137.5181 |137.5181    |137.2725   |137.3544  |11257      |
|SH600000|2015-07-01 09:46:00.0|137.2725 |137.4362    |137.1088   |137.1088  |14879      |
|SH600000|2015-07-01 09:47:00.0|137.1907 |137.1907    |136.8632   |136.9451  |37616      |
|SH600000|2015-07-01 09:48:00.0|137.0269 |137.1088    |136.8632   |137.0269  |15868      |
|SH600000|2015-07-01 09:49:00.0|137.1088 |137.1088    |136.8632   |136.9451  |14866      |
|SH600000|2015-07-01 09:50:00.0|137.3544 |137.5181    |137.3544   |137.5181  |28107      |
+--------+---------------------+---------+------------+-----------+----------+-----------+
only showing top 20 rows

+---------------------------------------------+--------+------+
|time                                         |id      |avg   |
+---------------------------------------------+--------+------+
|[2015-07-01 09:30:00.0,2015-07-01 09:40:00.0]|SH600000|137.12|
|[2015-07-01 09:40:00.0,2015-07-01 09:50:00.0]|SH600000|137.16|
|[2015-07-01 09:50:00.0,2015-07-01 10:00:00.0]|SH600000|137.39|
|[2015-07-01 10:00:00.0,2015-07-01 10:10:00.0]|SH600000|137.07|
|[2015-07-01 10:10:00.0,2015-07-01 10:20:00.0]|SH600000|136.42|
|[2015-07-01 10:20:00.0,2015-07-01 10:30:00.0]|SH600000|136.69|
|[2015-07-01 10:30:00.0,2015-07-01 10:40:00.0]|SH600000|136.78|
|[2015-07-01 10:40:00.0,2015-07-01 10:50:00.0]|SH600000|137.02|
|[2015-07-01 10:50:00.0,2015-07-01 11:00:00.0]|SH600000|136.83|
|[2015-07-01 11:00:00.0,2015-07-01 11:10:00.0]|SH600000|137.04|
|[2015-07-01 11:10:00.0,2015-07-01 11:20:00.0]|SH600000|136.96|
|[2015-07-01 11:20:00.0,2015-07-01 11:30:00.0]|SH600000|136.51|
|[2015-07-01 11:30:00.0,2015-07-01 11:40:00.0]|SH600000|136.29|
|[2015-07-01 13:00:00.0,2015-07-01 13:10:00.0]|SH600000|136.39|
|[2015-07-01 13:10:00.0,2015-07-01 13:20:00.0]|SH600000|136.24|
|[2015-07-01 13:20:00.0,2015-07-01 13:30:00.0]|SH600000|137.24|
|[2015-07-01 13:30:00.0,2015-07-01 13:40:00.0]|SH600000|137.95|
|[2015-07-01 13:40:00.0,2015-07-01 13:50:00.0]|SH600000|137.25|
|[2015-07-01 13:50:00.0,2015-07-01 14:00:00.0]|SH600000|137.81|
|[2015-07-01 14:00:00.0,2015-07-01 14:10:00.0]|SH600000|137.73|
+---------------------------------------------+--------+------+
only showing top 20 rows

+---------------------------------------------+--------+------+
|time                                         |id      |avg   |
+---------------------------------------------+--------+------+
|[2015-07-01 09:25:00.0,2015-07-01 09:35:00.0]|SH600000|137.21|
|[2015-07-01 09:30:00.0,2015-07-01 09:40:00.0]|SH600000|137.12|
|[2015-07-01 09:35:00.0,2015-07-01 09:45:00.0]|SH600000|137.14|
|[2015-07-01 09:40:00.0,2015-07-01 09:50:00.0]|SH600000|137.16|
|[2015-07-01 09:45:00.0,2015-07-01 09:55:00.0]|SH600000|137.15|
|[2015-07-01 09:50:00.0,2015-07-01 10:00:00.0]|SH600000|137.39|
|[2015-07-01 09:55:00.0,2015-07-01 10:05:00.0]|SH600000|137.38|
|[2015-07-01 10:00:00.0,2015-07-01 10:10:00.0]|SH600000|137.07|
|[2015-07-01 10:05:00.0,2015-07-01 10:15:00.0]|SH600000|136.67|
|[2015-07-01 10:10:00.0,2015-07-01 10:20:00.0]|SH600000|136.42|
|[2015-07-01 10:15:00.0,2015-07-01 10:25:00.0]|SH600000|136.44|
|[2015-07-01 10:20:00.0,2015-07-01 10:30:00.0]|SH600000|136.69|
|[2015-07-01 10:25:00.0,2015-07-01 10:35:00.0]|SH600000|136.89|
|[2015-07-01 10:30:00.0,2015-07-01 10:40:00.0]|SH600000|136.78|
|[2015-07-01 10:35:00.0,2015-07-01 10:45:00.0]|SH600000|136.97|
|[2015-07-01 10:40:00.0,2015-07-01 10:50:00.0]|SH600000|137.02|
|[2015-07-01 10:45:00.0,2015-07-01 10:55:00.0]|SH600000|136.8 |
|[2015-07-01 10:50:00.0,2015-07-01 11:00:00.0]|SH600000|136.83|
|[2015-07-01 10:55:00.0,2015-07-01 11:05:00.0]|SH600000|136.9 |
|[2015-07-01 11:00:00.0,2015-07-01 11:10:00.0]|SH600000|137.04|
+---------------------------------------------+--------+------+
only showing top 20 rows

+---------------------------------------------+--------+------+
|time                                         |id      |avg   |
+---------------------------------------------+--------+------+
|[2015-07-01 09:23:00.0,2015-07-01 09:33:00.0]|SH600000|137.15|
|[2015-07-01 09:28:00.0,2015-07-01 09:38:00.0]|SH600000|137.13|
|[2015-07-01 09:33:00.0,2015-07-01 09:43:00.0]|SH600000|137.16|
|[2015-07-01 09:38:00.0,2015-07-01 09:48:00.0]|SH600000|137.17|
|[2015-07-01 09:43:00.0,2015-07-01 09:53:00.0]|SH600000|137.15|
|[2015-07-01 09:48:00.0,2015-07-01 09:58:00.0]|SH600000|137.31|
|[2015-07-01 09:53:00.0,2015-07-01 10:03:00.0]|SH600000|137.4 |
|[2015-07-01 09:58:00.0,2015-07-01 10:08:00.0]|SH600000|137.18|
|[2015-07-01 10:03:00.0,2015-07-01 10:13:00.0]|SH600000|136.84|
|[2015-07-01 10:08:00.0,2015-07-01 10:18:00.0]|SH600000|136.52|
|[2015-07-01 10:13:00.0,2015-07-01 10:23:00.0]|SH600000|136.37|
|[2015-07-01 10:18:00.0,2015-07-01 10:28:00.0]|SH600000|136.58|
|[2015-07-01 10:23:00.0,2015-07-01 10:33:00.0]|SH600000|136.84|
|[2015-07-01 10:28:00.0,2015-07-01 10:38:00.0]|SH600000|136.81|
|[2015-07-01 10:33:00.0,2015-07-01 10:43:00.0]|SH600000|136.9 |
|[2015-07-01 10:38:00.0,2015-07-01 10:48:00.0]|SH600000|137.02|
|[2015-07-01 10:43:00.0,2015-07-01 10:53:00.0]|SH600000|136.89|
|[2015-07-01 10:48:00.0,2015-07-01 10:58:00.0]|SH600000|136.84|
|[2015-07-01 10:53:00.0,2015-07-01 11:03:00.0]|SH600000|136.86|
|[2015-07-01 10:58:00.0,2015-07-01 11:08:00.0]|SH600000|136.95|
+---------------------------------------------+--------+------+
only showing top 20 rows
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

郝少

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值