DateTime utility class

package com.huc.utils;

import java.time.*;
import java.time.format.DateTimeFormatter;
import java.util.Date;

/**
 * LocalDate      year-month-day
 * LocalTime      hour-minute-second
 * LocalDateTime  year-month-day hour-minute-second
 */
public class DateTimeUtil {
    private final static DateTimeFormatter for
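The preview above is cut off at the formatter field. As a minimal sketch of the java.time types the utility wraps (written in Scala to match the Spark notes below; the object name DateTimeSketch and the pattern string are illustrative assumptions, not taken from the original class):

import java.time.{LocalDate, LocalDateTime, LocalTime}
import java.time.format.DateTimeFormatter

object DateTimeSketch {
  def main(args: Array[String]): Unit = {
    // a formatter like the one the utility class presumably declares
    val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

    val date: LocalDate = LocalDate.now()             // year-month-day
    val time: LocalTime = LocalTime.now()             // hour-minute-second
    val dateTime: LocalDateTime = LocalDateTime.now() // year-month-day hour-minute-second

    println(dateTime.format(formatter))                            // LocalDateTime -> String
    println(LocalDateTime.parse("2020-12-11 10:30:00", formatter)) // String -> LocalDateTime
  }
}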
Common Spring Boot annotations: @Controller, @ResponseBody, @RestController, @RequestMapping, @RequestParam

package com.huc.demo.controller;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

//@Controller  // marks the class as part of the Controller layer
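The original controller is Java and is truncated above. As a rough, hedged sketch of what the listed annotations do (the class name, route and parameter below are made up; written in Scala, which accepts the same Spring annotations on the JVM):

import org.springframework.web.bind.annotation.{RequestMapping, RequestParam, RestController}

// @RestController = @Controller + @ResponseBody: the class handles web requests
// and its return values are written directly to the HTTP response body.
@RestController
class HelloController {

  // @RequestMapping maps the URL path to this handler method;
  // @RequestParam binds the query parameter "name" to the method argument.
  @RequestMapping(Array("/hello"))
  def hello(@RequestParam(name = "name", defaultValue = "world") name: String): String =
    s"hello, $name"
}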
SQL: day difference between two dates with DATEDIFF()

select DATEDIFF("2020-12-10", "2020-12-11")
-- result: -1
select DATEDIFF("2020-12-12", "2020-12-11")
-- result: 1
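Since the rest of these notes use Spark SQL, a hedged aside: Spark SQL provides an equivalent datediff(endDate, startDate) with the same argument order and day-difference semantics, e.g.:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object DateDiffSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("datediff")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // date strings are implicitly cast to dates; d1 = -1, d2 = 1
    spark.sql(
      "select datediff('2020-12-10', '2020-12-11') as d1, " +
        "datediff('2020-12-12', '2020-12-11') as d2").show()

    spark.stop()
  }
}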
Converting between DataFrame and Dataset in Spark SQL

package com.huc.sparkSql

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object Test04_DSAndDF {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkSession configuration object
    v
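The preview above is truncated. A minimal sketch of the two conversions, assuming an illustrative User case class and sample data (not the original Test04_DSAndDF code):

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

case class User(name: String, age: Int)

object DFAndDSSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("df_ds")
    val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq(("zhangsan", 20), ("lisi", 30)).toDF("name", "age")

    // DataFrame -> Dataset: attach the case-class type with as[T]
    val ds: Dataset[User] = df.as[User]

    // Dataset -> DataFrame: drop the type, keep the schema
    val df2: DataFrame = ds.toDF()

    df2.show()
    spark.stop()
  }
}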
Converting between RDD and Dataset in Spark SQL

package com.huc.sparkSql

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

/**
 * 1. RDD to Dataset:
 *    RDD.map{x => User(x._1, x._2)}.toDS()
 *    Spark SQL can automatically convert an RDD that contains
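The doc comment above is truncated (Spark SQL can automatically infer the schema of an RDD of case classes). A hedged sketch of both directions, with an illustrative User case class and sample data rather than the original test code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

case class User(name: String, age: Int)

object RDDAndDSSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("rdd_ds")
    val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc: SparkContext = spark.sparkContext

    val rdd: RDD[(String, Int)] = sc.makeRDD(List(("zhangsan", 20), ("lisi", 30)))

    // RDD -> Dataset: wrap each tuple in the case class, then call toDS()
    val ds: Dataset[User] = rdd.map { x => User(x._1, x._2) }.toDS()

    // Dataset -> RDD: the typed rows come back as case-class instances
    val rdd2: RDD[User] = ds.rdd

    ds.show()
    spark.stop()
  }
}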
Converting between RDD and DataFrame in Spark SQL

package com.huc.sparkSql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

/**
 * 1. RDD to DataFrame:
 *    Manual conversion: RDD.toDF("列名1", "列名2")
 *    Conversion via case-class reflection: UserRDD.map{x => U
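The comment above is cut off mid-expression. A hedged sketch of both conversion styles plus the reverse direction, again with an illustrative User case class and sample data:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

case class User(name: String, age: Int)

object RDDAndDFSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("rdd_df")
    val spark: SparkSession = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._
    val sc: SparkContext = spark.sparkContext

    val rdd: RDD[(String, Int)] = sc.makeRDD(List(("zhangsan", 20), ("lisi", 30)))

    // manual conversion: name the columns explicitly
    val df1: DataFrame = rdd.toDF("name", "age")

    // conversion via case-class reflection: the field names become the columns
    val df2: DataFrame = rdd.map { x => User(x._1, x._2) }.toDF()

    // DataFrame -> RDD: rows come back as untyped Row objects
    val rowRDD: RDD[Row] = df1.rdd

    df2.show()
    spark.stop()
  }
}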
Custom functions in Spark SQL: UDAF

package com.huc.sparkSql

import org.apache.spark.SparkConf
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession, functions}

object Test07_UDAF {
  def main(args: Array[String]): Unit =
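The body of Test07_UDAF is not shown above. A hedged sketch of the technique its imports point at, assuming Spark 3.x, where functions.udaf turns a typed Aggregator into a SQL-callable UDAF (MyAvg, AvgBuffer and the sample data are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession, functions}

// intermediate aggregation buffer
case class AvgBuffer(var sum: Long, var count: Long)

// Aggregator[IN, BUF, OUT]: input value, mutable buffer, final result
class MyAvg extends Aggregator[Long, AvgBuffer, Double] {
  override def zero: AvgBuffer = AvgBuffer(0L, 0L)
  override def reduce(buf: AvgBuffer, age: Long): AvgBuffer = {
    buf.sum += age; buf.count += 1; buf
  }
  override def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer = {
    b1.sum += b2.sum; b1.count += b2.count; b1
  }
  override def finish(buf: AvgBuffer): Double = buf.sum.toDouble / buf.count
  override def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  override def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object UDAFSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("udaf")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq(("zhangsan", 20L), ("lisi", 30L)).toDF("name", "age")
    df.createOrReplaceTempView("user")

    // register the typed Aggregator as an untyped, SQL-callable UDAF
    spark.udf.register("myAvg", functions.udaf(new MyAvg))
    spark.sql("select myAvg(age) as avg_age from user").show()

    spark.stop()
  }
}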
Custom functions in Spark SQL: UDF

package com.huc.sparkSql

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object Test06_CustomUDF {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkSession configuration object
    val conf: SparkConf = new SparkConf().se
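The preview stops at the SparkConf setup. A minimal sketch of registering and calling a one-in-one-out UDF (the function name addPrefix and the sample data are assumptions, not the original Test06_CustomUDF code):

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object UDFSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("udf")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq(("zhangsan", 20), ("lisi", 30)).toDF("name", "age")
    df.createOrReplaceTempView("user")

    // register a scalar function and call it from SQL
    spark.udf.register("addPrefix", (name: String) => "Name: " + name)
    spark.sql("select addPrefix(name) as name, age from user").show()

    spark.stop()
  }
}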
Saving data in Spark SQL

package com.huc.sparkSql

import org.apache.spark.{SPARK_BRANCH, SparkConf}
import org.apache.spark.sql.{DataFrame, DataFrameReader, SaveMode, SparkSession}

object Test10_Write {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkSession configuration object
    v
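Only the header of Test10_Write survives above. A hedged sketch of the DataFrameWriter API it presumably exercises; the output paths, formats and SaveMode choices here are illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object WriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("write")
    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq(("zhangsan", 20), ("lisi", 30)).toDF("name", "age")

    // general form: pick the format, the save mode, and the path
    df.write.format("json").mode(SaveMode.Overwrite).save("output/user_json")

    // shortcut writers for the built-in formats
    df.write.mode(SaveMode.Overwrite).csv("output/user_csv")
    df.write.mode(SaveMode.Overwrite).parquet("output/user_parquet")

    spark.stop()
  }
}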
Loading data in Spark SQL

package com.huc.sparkSql

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}

object Test09_Read {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkSession configuration object
    val conf: SparkConf = new Sp
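As above, the body is missing. A hedged sketch of the DataFrameReader API; the input paths and options are illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("read")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // general form: pick the format, set options, give the path
    val df1: DataFrame = spark.read.format("json").load("input/user.json")

    // shortcut readers for the built-in formats
    val df2: DataFrame = spark.read.json("input/user.json")
    val df3: DataFrame = spark.read.option("header", "true").csv("input/user.csv")

    df1.show()
    spark.stop()
  }
}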
Writing a custom partitioner for Spark in Scala

Reading the source of the built-in HashPartitioner:

/*
class HashPartitioner(partitions: Int) extends Partitioner {
  // The number of partitions passed in must be >= 0, otherwise it throws an error
  require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

  // Override the partitioner's abstract methods
  // numPartitions records how many partitions there are: simply the value passed in from outside
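The HashPartitioner walkthrough is cut off above. A hedged sketch of a hand-written partitioner following the same shape (MyPartitioner, its routing rule and the sample data are made up for illustration):

import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// route the key "spark" to partition 0 and every other key to partition 1
class MyPartitioner(numParts: Int) extends Partitioner {
  require(numParts >= 2, s"this sketch needs at least 2 partitions, got $numParts")
  override def numPartitions: Int = numParts
  override def getPartition(key: Any): Int = key match {
    case "spark" => 0
    case _       => 1
  }
}

object PartitionerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("partitioner")
    val sc = new SparkContext(conf)

    val rdd = sc.makeRDD(List(("spark", 1), ("hadoop", 1), ("flink", 1)))

    // partitionBy reshuffles the pair RDD according to the custom partitioner
    val partitioned = rdd.partitionBy(new MyPartitioner(2))
    partitioned
      .mapPartitionsWithIndex((idx, it) => it.map(kv => (idx, kv)))
      .collect()
      .foreach(println)

    sc.stop()
  }
}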
combineByKey(): transform the value structure, then aggregate within and across partitions

1) Function signature:

def combineByKey[C](
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C): RDD[(K, C)]

(1) createCombiner (transforms the structure of the data): combineByKey() walks through all the elements of a partition, so each element's key either has not been seen before or is the same as the key of some earlier element. If it is a new key, combineByKey() uses a function called createCombiner() to create the initial value of the accumulator for that key
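As the signature shows, the three parameters split the work into per-element initialization, merging within a partition, and merging across partitions. A hedged sketch using the classic per-key average (the data and names are illustrative):

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CombineByKeySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("combineByKey")
    val sc = new SparkContext(conf)

    val rdd: RDD[(String, Int)] =
      sc.makeRDD(List(("a", 88), ("b", 95), ("a", 91), ("b", 93), ("a", 95), ("b", 98)), 2)

    // value -> (sum, count); merge inside a partition; merge across partitions
    val sumCount: RDD[(String, (Int, Int))] = rdd.combineByKey(
      (v: Int) => (v, 1),                                           // createCombiner
      (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),        // mergeValue
      (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)  // mergeCombiners
    )

    val avg: RDD[(String, Double)] = sumCount.mapValues { case (sum, cnt) => sum.toDouble / cnt }
    avg.collect().foreach(println)

    sc.stop()
  }
}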