Scala104-Spark.sql的内置日期时间函数

有些时候我们会直接用df.createOrReplaceTempView(temp)创建临时表,用sql去计算。sparkSQL有些语法和hql不一样,做个笔记。

  • <scala.version>2.11.12</scala.version>
  • <spark.version>2.4.3</spark.version>
import org.apache.spark.sql.functions._
import spark.implicits._
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions._
import spark.implicits._
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
val builder = SparkSession
      .builder()
      .appName("learningScala")
      .config("spark.executor.heartbeatInterval","60s")
      .config("spark.network.timeout","120s")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryoserializer.buffer.max","512m")
      .config("spark.dynamicAllocation.enabled", false)
      .config("spark.sql.inMemoryColumnarStorage.compressed", true)
      .config("spark.sql.inMemoryColumnarStorage.batchSize", 10000)
      .config("spark.sql.broadcastTimeout", 600)
      .config("spark.sql.autoBroadcastJoinThreshold", -1)
      .config("spark.sql.crossJoin.enabled", true)
      .master("local[*]") 
val spark = builder.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
builder: org.apache.spark.sql.SparkSession.Builder = org.apache.spark.sql.SparkSession$Builder@5418e964
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@15775384
var df1 = Seq(
(1,"2019-04-01 11:45:50",11.15,"2019-04-02 11:45:49"),
(2,"2019-05-02 11:56:50",10.37,"2019-05-02 11:56:51"),
(3,"2019-07-21 12:45:50",12.11,"2019-08-21 12:45:50"),
(4,"2019-08-01 12:40:50",14.50,"2020-08-03 12:40:50"),
(5,"2019-01-06 10:00:50",16.39,"2019-01-05 10:00:50")
).toDF("id","startTimeStr", "payamount","endTimeStr")
df1 = df1.withColumn("startTime",$"startTimeStr".cast("Timestamp"))
         .withColumn("endTime",$"endTimeStr".cast("Timestamp"))
df1.printSchema
df1.show()
root
 |-- id: integer (nullable = false)
 |-- startTimeStr: string (nullable = true)
 |-- payamount: double (nullable = false)
 |-- endTimeStr: string (nullable = true)
 |-- startTime: timestamp (nullable = true)
 |-- endTime: timestamp (nullable = true)

+---+-------------------+---------+-------------------+-------------------+-------------------+
| id|       startTimeStr|payamount|         endTimeStr|          startTime|            endTime|
+---+-------------------+---------+-------------------+-------------------+-------------------+
|  1|2019-04-01 11:45:50|    11.15|2019-04-02 11:45:49|2019-04-01 11:45:50|2019-04-02 11:45:49|
|  2|2019-05-02 11:56:50|    10.37|2019-05-02 11:56:51|2019-05-02 11:56:50|2019-05-02 11:56:51|
|  3|2019-07-21 12:45:50|    12.11|2019-08-21 12:45:50|2019-07-21 12:45:50|2019-08-21 12:45:50|
|  4|2019-08-01 12:40:50|     14.5|2020-08-03 12:40:50|2019-08-01 12:40:50|2020-08-03 12:40:50|
|  5|2019-01-06 10:00:50|    16.39|2019-01-05 10:00:50|2019-01-06 10:00:50|2019-01-05 10:00:50|
+---+-------------------+---------+-------------------+-------------------+-------------------+






df1: org.apache.spark.sql.DataFrame = [id: int, startTimeStr: string ... 4 more fields]
df1: org.apache.spark.sql.DataFrame = [id: int, startTimeStr: string ... 4 more fields]

timestamp转string

把timestamp转换成对应格式字符串

  • date_format把timestamp转换成对应的字符串
  • 字符串格式用"yyyyMMdd"表示
df1.createOrReplaceTempView("temp")
var sql = """
SELECT date_format(startTime,'yyyyMMdd') AS yyyyMMdd,
       date_format(startTime,'yyyy-MM-dd') AS yyyy_MM_dd,
       date_format(startTime,'yyyy') AS yyyy
FROM TEMP
"""
spark.sql(sql).printSchema
spark.sql(sql).show()
root
 |-- yyyyMMdd: string (nullable = true)
 |-- yyyy_MM_dd: string (nullable = true)
 |-- yyyy: string (nullable = true)

+--------+----------+----+
|yyyyMMdd|yyyy_MM_dd|yyyy|
+--------+----------+----+
|20190401|2019-04-01|2019|
|20190502|2019-05-02|2019|
|20190721|2019-07-21|2019|
|20190801|2019-08-01|2019|
|20190106|2019-01-06|2019|
+--------+----------+----+






sql: String =
"
SELECT date_format(startTime,'yyyyMMdd') AS yyyyMMdd,
       date_format(startTime,'yyyy-MM-dd') AS yyyy_MM_dd,
       date_format(startTime,'yyyy') AS yyyy
FROM TEMP
"

timestamp转date

  • to_date可以把timestamp转换成date类型
sql = """
SELECT startTime,endTime,
       to_date(startTime) AS startDate,
       to_date(endTime) AS endDate
FROM TEMP
"""
var df2 = spark.sql(sql)
df2.printSchema
df2.show()
root
 |-- startTime: timestamp (nullable = true)
 |-- endTime: timestamp (nullable = true)
 |-- startDate: date (nullable = true)
 |-- endDate: date (nullable = true)

+-------------------+-------------------+----------+----------+
|          startTime|            endTime| startDate|   endDate|
+-------------------+-------------------+----------+----------+
|2019-04-01 11:45:50|2019-04-02 11:45:49|2019-04-01|2019-04-02|
|2019-05-02 11:56:50|2019-05-02 11:56:51|2019-05-02|2019-05-02|
|2019-07-21 12:45:50|2019-08-21 12:45:50|2019-07-21|2019-08-21|
|2019-08-01 12:40:50|2020-08-03 12:40:50|2019-08-01|2020-08-03|
|2019-01-06 10:00:50|2019-01-05 10:00:50|2019-01-06|2019-01-05|
+-------------------+-------------------+----------+----------+






sql: String =
SELECT startTime,endTime,
       to_date(startTime) AS startDate,
       to_date(endTime) AS endDate
FROM TEMP

df2: org.apache.spark.sql.DataFrame = [startTime: timestamp, endTime: timestamp ... 2 more fields]

求时间差

  • 天数差函数datediff可以应用在timestamp中,也可应用在date类型中,单位是自然天,而不是24小时
  • 月份差函数months_between同样可以,月度的单位好像是不固定的,即31天or30天
df2.createOrReplaceTempView("temp")

var sql = """
SELECT startTime,
       endTime,
       datediff(endTime,startTime) AS dayInterval1,
       datediff(endDate,startDate) AS dayInterval2,
       months_between(endTime,startTime) AS monthInterval1,
       months_between(endDate,startDate) AS monthInterval2
FROM TEMP
"""
// spark.sql(sql).printSchema
spark.sql(sql).show()
+-------------------+-------------------+------------+------------+--------------+--------------+
|          startTime|            endTime|dayInterval1|dayInterval2|monthInterval1|monthInterval2|
+-------------------+-------------------+------------+------------+--------------+--------------+
|2019-04-01 11:45:50|2019-04-02 11:45:49|           1|           1|    0.03225769|    0.03225806|
|2019-05-02 11:56:50|2019-05-02 11:56:51|           0|           0|           0.0|           0.0|
|2019-07-21 12:45:50|2019-08-21 12:45:50|          31|          31|           1.0|           1.0|
|2019-08-01 12:40:50|2020-08-03 12:40:50|         368|         368|   12.06451613|   12.06451613|
|2019-01-06 10:00:50|2019-01-05 10:00:50|          -1|          -1|   -0.03225806|   -0.03225806|
+-------------------+-------------------+------------+------------+--------------+--------------+






sql: String =
"
SELECT startTime,
       endTime,
       datediff(endTime,startTime) AS dayInterval1,
       datediff(endDate,startDate) AS dayInterval2,
       months_between(endTime,startTime) AS monthInterval1,
       months_between(endDate,startDate) AS monthInterval2
FROM TEMP
"

Ref

[1] https://www.cnblogs.com/feiyumo/p/8760846.html

                                2020-03-24 于南京市江宁区九龙湖

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
你可以使用Android系统自带的日历提供者(ContentProvider)来实现这个功能。具体步骤如下: 1. 首先获取日历提供者的URI: ``` Uri uri = CalendarContract.Events.CONTENT_URI; ``` 2. 构建查询条件,包括要查询的日历ID和时间范围: ``` String selection = "((" + CalendarContract.Events.CALENDAR_ID + " = ?) OR (" + CalendarContract.Events.CALENDAR_ID + " = ?) OR (" + CalendarContract.Events.CALENDAR_ID + " = ?)) AND (" + CalendarContract.Events.DTSTART + " >= ?) AND (" + CalendarContract.Events.DTEND + " <= ?)"; String[] selectionArgs = new String[] {"1", "2", "3", String.valueOf(startTime), String.valueOf(endTime)}; ``` 其中,`startTime` 和 `endTime` 分别为开始时间和结束时间时间戳值。 3. 执行查询操作: ``` Cursor cursor = getContentResolver().query(uri, null, selection, selectionArgs, null); ``` 4. 遍历查询结果: ``` while (cursor.moveToNext()) { // 处理查询结果 } ``` 在处理查询结果时,你可以获取相关的日程信息,例如: ``` long eventId = cursor.getLong(cursor.getColumnIndex(CalendarContract.Events._ID)); String title = cursor.getString(cursor.getColumnIndex(CalendarContract.Events.TITLE)); long startTime = cursor.getLong(cursor.getColumnIndex(CalendarContract.Events.DTSTART)); long endTime = cursor.getLong(cursor.getColumnIndex(CalendarContract.Events.DTEND)); ``` 以上是实现查找本地日历id为1,2,3并且查询条件有开始时间和结束时间的日程的基本步骤。需要注意的是,你需要在AndroidManifest.xml文件中添加相应的权限声明,例如: ``` <uses-permission android:name="android.permission.READ_CALENDAR" /> ```

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值