17.
Putting Spark SQL date functions to work:
https://www.cnblogs.com/feiyumo/p/8760846.html
https://www.obstkel.com/blog/spark-sql-date-functions
The two posts above document how the individual date functions are used.
Now for the main point: for a given day, compute its position within the year, month, and week. Pseudocode follows.
// new columns: day_of_year, week_of_year, month_of_year, day_of_week, day_of_month, days_of_month, days_of_year
// day of the year
.withColumn("day_of_year", dayofyear(from_unixtime(col("unix_time"), "yyyy-MM-dd")).cast(LongType))
// ISO week of the year
.withColumn("week_of_year", weekofyear(from_unixtime(col("unix_time"), "yyyy-MM-dd")).cast(LongType))
// month of the year
.withColumn("month_of_year", month(from_unixtime(col("unix_time"))).cast(LongType))
// day of the week (Monday = 1; the dayofweek function is not used here because it treats Sunday as the first day of the week)
.withColumn("day_of_week", from_unixtime(col("unix_time"), "u").cast(LongType))
// day of the month
.withColumn("day_of_month", dayofmonth(from_unixtime(col("unix_time"))).cast(LongType))
// number of days in the month
.withColumn("days_of_month", datediff(last_day(trunc(from_unixtime(col("unix_time"), "yyyy-MM-dd"), "MM")), trunc(from_unixtime(col("unix_time"), "yyyy-MM-dd"), "MM")).cast(LongType) + 1)
// number of days in the year
.withColumn("days_of_year", datediff(concat(year(from_unixtime(col("unix_time"), "yyyy-MM-dd")), lit("-12-31")), trunc(from_unixtime(col("unix_time"), "yyyy-MM-dd"), "year")).cast(LongType) + 1)
// last day of the quarter that the month falls in
.withColumn("quarter_date",
  last_day(concat(
    year(from_unixtime(col("unix_time"))).cast(StringType),
    lit("-"),
    when(quarter(from_unixtime(col("unix_time"))) === 1, lit("03"))
      .when(quarter(from_unixtime(col("unix_time"))) === 2, lit("06"))
      .when(quarter(from_unixtime(col("unix_time"))) === 3, lit("09"))
      .otherwise(lit("12")),
    lit("-01"))))
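The date arithmetic above (ISO day-of-week, days in the month/year, last day of the quarter) can be sanity-checked outside Spark with plain Python. A minimal stand-alone sketch using only the standard library; the date 2021-03-15 is just an illustrative input:

```python
import calendar
from datetime import date

d = date(2021, 3, 15)  # illustrative input date (a Monday)

day_of_year = d.timetuple().tm_yday    # position within the year -> 74
week_of_year = d.isocalendar()[1]      # ISO week number, like Spark's weekofyear -> 11
day_of_week = d.isoweekday()           # Monday = 1, matching the "u" pattern -> 1
days_of_month = calendar.monthrange(d.year, d.month)[1]  # days in March -> 31
days_of_year = 366 if calendar.isleap(d.year) else 365   # 2021 is not a leap year -> 365

# Last day of the quarter: quarter end months are 3, 6, 9, 12,
# mirroring the when/otherwise chain in the Spark snippet.
quarter = (d.month - 1) // 3 + 1
end_month = quarter * 3
quarter_date = date(d.year, end_month, calendar.monthrange(d.year, end_month)[1])
print(quarter_date)  # 2021-03-31
```

This is only a reference implementation of the same logic, useful for verifying the Spark output on a few sample dates.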
A supplementary note on one more function:
date_trunc. I consulted the official documentation for it and found that its example is incorrect.
https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc
date_trunc(fmt, ts) - Returns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"].
Correct usage:
https://www.iteblog.com/archives/2336.html
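To make the date_trunc(fmt, ts) semantics concrete without a Spark session, here is an illustrative Python re-implementation of a few of the units; this is a sketch of the behavior, not the Spark source:

```python
from datetime import datetime

def date_trunc(fmt: str, ts: datetime) -> datetime:
    """Truncate ts to the unit given by fmt, mimicking Spark SQL's date_trunc(fmt, ts)."""
    fmt = fmt.upper()
    if fmt in ("YEAR", "YYYY", "YY"):
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if fmt in ("MON", "MONTH", "MM"):
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if fmt in ("DAY", "DD"):
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if fmt == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported fmt: {fmt}")

ts = datetime(2015, 3, 5, 9, 32, 5, 359000)
print(date_trunc("HOUR", ts))  # 2015-03-05 09:00:00
print(date_trunc("YEAR", ts))  # 2015-01-01 00:00:00
```

Note the argument order: the unit string comes first, then the timestamp, as stated in the signature above.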
This post is a sub-topic split out of the summary post, kept separate for easier lookup.
For the full roundup, see the pinned post:
"pyspark and Spark error troubleshooting roundup, plus usage notes for certain functions."