目录
一、说明
- 这里的window函数是属于functions.scala文件中的内置函数。
- window函数是在spark2.0.0中新增的API函数。
- 这里选取股票数据为测试数据:测试数据下载地址。
- 时间一定要整理成yyyy-MM-dd hh:mm:ss格式,其他格式内置函数不识别。
二、数据样式
--表头
id,date,openPrice,highestPrice,lowestPrice,closePrice,exchangeNum
代码,时间,开盘价,最高价,最低价,收盘价,成交量(手)
--数据
SH600000,2015-7-1 9:31:00,137.5999,137.5999,136.5358,136.7814,64912
SH600000,2015-7-1 9:32:00,136.8632,137.5181,136.6995,137.5181,31363
SH600000,2015-7-1 9:33:00,137.4362,137.8455,137.1088,137.6818,25057
SH600000,2015-7-1 9:34:00,137.5999,137.5999,136.7814,136.8632,32136
SH600000,2015-7-1 9:35:00,137.1088,137.3544,136.8632,137.0269,29211
SH600000,2015-7-1 9:36:00,137.1088,137.3544,136.9451,136.9451,23235
SH600000,2015-7-1 9:37:00,136.9451,137.2725,136.9451,137.1088,15110
SH600000,2015-7-1 9:38:00,137.3544,137.6818,137.0269,137.3544,17345
...
三、代码示例
package com.dt.spark.Test
import java.sql.Date
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
object windowTest {
case class Ticket(id: String, date: String, openPrice: Double, highestPrice: Double, lowestPrice: Double, closePrice: Double, exchangeNum: Int)
def main(args: Array[String]): Unit = {
val spark: SparkSession = SparkSession.builder().master("local[1]").appName("windowtest").getOrCreate()
val sc: SparkContext = spark.sparkContext
sc.setLogLevel("WARN")
import org.apache.spark.sql.functions._
import spark.implicits._
val ticketDf: DataFrame = spark.read
//推断数据类型
.option("inferSchema", "true")
//表示有表头,若没有则为false
.option("header", true)
.csv("data/test/ticket.csv")
/**
* 由于dataSet是强类型,故DF转化为DS时,DF的推断类型要和上面的实体类一致。
* 可以打印schema看自动推断类型
*/
println(ticketDf.schema)
val ticketDs: Dataset[Ticket] = ticketDf.as[Ticket]
ticketDf.show(false)
/**
* window时间窗口函数的应用一:
* 十分钟滚动窗口的平均收盘价
*/
ticketDf.groupBy(window($"date", "10 minute").alias("time"), $"id")
.agg(round(mean($"closePrice"), 2).alias("avg"))
.orderBy($"time")
.show(false) //false可以将列的值显示全
/**
* window时间窗口函数的应用二:
* 滑动窗口每隔5min,每10min窗口的平均股价
*/
ticketDf.groupBy(window(col("date"), "10 minute", "5 minute").alias("time"), col("id"))
.agg(round(mean(col("closePrice")), 2).alias("avg"))
.orderBy(col("time"))
.show(false)
/**
* window时间窗口函数应用三:
* 滑动窗口每5min的后3分钟,每10min窗口的平均股价
*/
ticketDf.groupBy(window('date, "10 minute", "5 minute", "3 minute").alias("time"), 'id)
.agg(round(mean('closePrice), 2).alias("avg"))
.orderBy('time)
.show(false)
}
}
四、运行结果
StructType(StructField(id,StringType,true), StructField(date,TimestampType,true), StructField(openPrice,DoubleType,true), StructField(highestPrice,DoubleType,true), StructField(lowestPrice,DoubleType,true), StructField(closePrice,DoubleType,true), StructField(exchangeNum,IntegerType,true))
+--------+---------------------+---------+------------+-----------+----------+-----------+
|id |date |openPrice|highestPrice|lowestPrice|closePrice|exchangeNum|
+--------+---------------------+---------+------------+-----------+----------+-----------+
|SH600000|2015-07-01 09:31:00.0|137.5999 |137.5999 |136.5358 |136.7814 |64912 |
|SH600000|2015-07-01 09:32:00.0|136.8632 |137.5181 |136.6995 |137.5181 |31363 |
|SH600000|2015-07-01 09:33:00.0|137.4362 |137.8455 |137.1088 |137.6818 |25057 |
|SH600000|2015-07-01 09:34:00.0|137.5999 |137.5999 |136.7814 |136.8632 |32136 |
|SH600000|2015-07-01 09:35:00.0|137.1088 |137.3544 |136.8632 |137.0269 |29211 |
|SH600000|2015-07-01 09:36:00.0|137.1088 |137.3544 |136.9451 |136.9451 |23235 |
|SH600000|2015-07-01 09:37:00.0|136.9451 |137.2725 |136.9451 |137.1088 |15110 |
|SH600000|2015-07-01 09:38:00.0|137.3544 |137.6818 |137.0269 |137.3544 |17345 |
|SH600000|2015-07-01 09:39:00.0|137.3544 |137.5181 |136.7814 |136.7814 |30376 |
|SH600000|2015-07-01 09:40:00.0|136.9451 |138.0092 |136.7814 |137.1088 |25467 |
|SH600000|2015-07-01 09:41:00.0|137.1088 |137.5999 |136.8632 |137.1907 |41187 |
|SH600000|2015-07-01 09:42:00.0|137.1907 |137.5181 |137.0269 |137.5181 |16239 |
|SH600000|2015-07-01 09:43:00.0|137.3544 |137.3544 |136.8632 |136.9451 |27479 |
|SH600000|2015-07-01 09:44:00.0|137.0269 |137.5181 |137.0269 |137.4362 |11787 |
|SH600000|2015-07-01 09:45:00.0|137.5181 |137.5181 |137.2725 |137.3544 |11257 |
|SH600000|2015-07-01 09:46:00.0|137.2725 |137.4362 |137.1088 |137.1088 |14879 |
|SH600000|2015-07-01 09:47:00.0|137.1907 |137.1907 |136.8632 |136.9451 |37616 |
|SH600000|2015-07-01 09:48:00.0|137.0269 |137.1088 |136.8632 |137.0269 |15868 |
|SH600000|2015-07-01 09:49:00.0|137.1088 |137.1088 |136.8632 |136.9451 |14866 |
|SH600000|2015-07-01 09:50:00.0|137.3544 |137.5181 |137.3544 |137.5181 |28107 |
+--------+---------------------+---------+------------+-----------+----------+-----------+
only showing top 20 rows
+---------------------------------------------+--------+------+
|time |id |avg |
+---------------------------------------------+--------+------+
|[2015-07-01 09:30:00.0,2015-07-01 09:40:00.0]|SH600000|137.12|
|[2015-07-01 09:40:00.0,2015-07-01 09:50:00.0]|SH600000|137.16|
|[2015-07-01 09:50:00.0,2015-07-01 10:00:00.0]|SH600000|137.39|
|[2015-07-01 10:00:00.0,2015-07-01 10:10:00.0]|SH600000|137.07|
|[2015-07-01 10:10:00.0,2015-07-01 10:20:00.0]|SH600000|136.42|
|[2015-07-01 10:20:00.0,2015-07-01 10:30:00.0]|SH600000|136.69|
|[2015-07-01 10:30:00.0,2015-07-01 10:40:00.0]|SH600000|136.78|
|[2015-07-01 10:40:00.0,2015-07-01 10:50:00.0]|SH600000|137.02|
|[2015-07-01 10:50:00.0,2015-07-01 11:00:00.0]|SH600000|136.83|
|[2015-07-01 11:00:00.0,2015-07-01 11:10:00.0]|SH600000|137.04|
|[2015-07-01 11:10:00.0,2015-07-01 11:20:00.0]|SH600000|136.96|
|[2015-07-01 11:20:00.0,2015-07-01 11:30:00.0]|SH600000|136.51|
|[2015-07-01 11:30:00.0,2015-07-01 11:40:00.0]|SH600000|136.29|
|[2015-07-01 13:00:00.0,2015-07-01 13:10:00.0]|SH600000|136.39|
|[2015-07-01 13:10:00.0,2015-07-01 13:20:00.0]|SH600000|136.24|
|[2015-07-01 13:20:00.0,2015-07-01 13:30:00.0]|SH600000|137.24|
|[2015-07-01 13:30:00.0,2015-07-01 13:40:00.0]|SH600000|137.95|
|[2015-07-01 13:40:00.0,2015-07-01 13:50:00.0]|SH600000|137.25|
|[2015-07-01 13:50:00.0,2015-07-01 14:00:00.0]|SH600000|137.81|
|[2015-07-01 14:00:00.0,2015-07-01 14:10:00.0]|SH600000|137.73|
+---------------------------------------------+--------+------+
only showing top 20 rows
+---------------------------------------------+--------+------+
|time |id |avg |
+---------------------------------------------+--------+------+
|[2015-07-01 09:25:00.0,2015-07-01 09:35:00.0]|SH600000|137.21|
|[2015-07-01 09:30:00.0,2015-07-01 09:40:00.0]|SH600000|137.12|
|[2015-07-01 09:35:00.0,2015-07-01 09:45:00.0]|SH600000|137.14|
|[2015-07-01 09:40:00.0,2015-07-01 09:50:00.0]|SH600000|137.16|
|[2015-07-01 09:45:00.0,2015-07-01 09:55:00.0]|SH600000|137.15|
|[2015-07-01 09:50:00.0,2015-07-01 10:00:00.0]|SH600000|137.39|
|[2015-07-01 09:55:00.0,2015-07-01 10:05:00.0]|SH600000|137.38|
|[2015-07-01 10:00:00.0,2015-07-01 10:10:00.0]|SH600000|137.07|
|[2015-07-01 10:05:00.0,2015-07-01 10:15:00.0]|SH600000|136.67|
|[2015-07-01 10:10:00.0,2015-07-01 10:20:00.0]|SH600000|136.42|
|[2015-07-01 10:15:00.0,2015-07-01 10:25:00.0]|SH600000|136.44|
|[2015-07-01 10:20:00.0,2015-07-01 10:30:00.0]|SH600000|136.69|
|[2015-07-01 10:25:00.0,2015-07-01 10:35:00.0]|SH600000|136.89|
|[2015-07-01 10:30:00.0,2015-07-01 10:40:00.0]|SH600000|136.78|
|[2015-07-01 10:35:00.0,2015-07-01 10:45:00.0]|SH600000|136.97|
|[2015-07-01 10:40:00.0,2015-07-01 10:50:00.0]|SH600000|137.02|
|[2015-07-01 10:45:00.0,2015-07-01 10:55:00.0]|SH600000|136.8 |
|[2015-07-01 10:50:00.0,2015-07-01 11:00:00.0]|SH600000|136.83|
|[2015-07-01 10:55:00.0,2015-07-01 11:05:00.0]|SH600000|136.9 |
|[2015-07-01 11:00:00.0,2015-07-01 11:10:00.0]|SH600000|137.04|
+---------------------------------------------+--------+------+
only showing top 20 rows
+---------------------------------------------+--------+------+
|time |id |avg |
+---------------------------------------------+--------+------+
|[2015-07-01 09:23:00.0,2015-07-01 09:33:00.0]|SH600000|137.15|
|[2015-07-01 09:28:00.0,2015-07-01 09:38:00.0]|SH600000|137.13|
|[2015-07-01 09:33:00.0,2015-07-01 09:43:00.0]|SH600000|137.16|
|[2015-07-01 09:38:00.0,2015-07-01 09:48:00.0]|SH600000|137.17|
|[2015-07-01 09:43:00.0,2015-07-01 09:53:00.0]|SH600000|137.15|
|[2015-07-01 09:48:00.0,2015-07-01 09:58:00.0]|SH600000|137.31|
|[2015-07-01 09:53:00.0,2015-07-01 10:03:00.0]|SH600000|137.4 |
|[2015-07-01 09:58:00.0,2015-07-01 10:08:00.0]|SH600000|137.18|
|[2015-07-01 10:03:00.0,2015-07-01 10:13:00.0]|SH600000|136.84|
|[2015-07-01 10:08:00.0,2015-07-01 10:18:00.0]|SH600000|136.52|
|[2015-07-01 10:13:00.0,2015-07-01 10:23:00.0]|SH600000|136.37|
|[2015-07-01 10:18:00.0,2015-07-01 10:28:00.0]|SH600000|136.58|
|[2015-07-01 10:23:00.0,2015-07-01 10:33:00.0]|SH600000|136.84|
|[2015-07-01 10:28:00.0,2015-07-01 10:38:00.0]|SH600000|136.81|
|[2015-07-01 10:33:00.0,2015-07-01 10:43:00.0]|SH600000|136.9 |
|[2015-07-01 10:38:00.0,2015-07-01 10:48:00.0]|SH600000|137.02|
|[2015-07-01 10:43:00.0,2015-07-01 10:53:00.0]|SH600000|136.89|
|[2015-07-01 10:48:00.0,2015-07-01 10:58:00.0]|SH600000|136.84|
|[2015-07-01 10:53:00.0,2015-07-01 11:03:00.0]|SH600000|136.86|
|[2015-07-01 10:58:00.0,2015-07-01 11:08:00.0]|SH600000|136.95|
+---------------------------------------------+--------+------+
only showing top 20 rows