SparkSQL
遥遥晚风点点
Tags: big data, Java
SparkSQL User-Defined Functions

1. UDF: takes one row of input and returns one row (custom string-concatenation example):

```scala
object MyUDF {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName(this.getClass.getSimpleName)
      .master("local[*]")
      .getOrCreate()
    import spark.impli…
```

Original · 2020-08-21 20:36:29 · 328 views · 0 comments
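The preview above cuts off before the registration call. The function wrapped by a UDF is an ordinary Scala function; here is a minimal sketch of the concatenation logic, with the registration shown only as a comment since it needs a live SparkSession (the names `concatWithSep` and `concat2` are hypothetical, not from the post):

```scala
// The plain function a string-join UDF would wrap (hypothetical name).
def concatWithSep(sep: String)(a: String, b: String): String = s"$a$sep$b"

// With a live SparkSession it would be registered roughly like this:
//   spark.udf.register("concat2", (a: String, b: String) => concatWithSep("-")(a, b))
//   spark.sql("SELECT concat2(col1, col2) FROM t")

println(concatWithSep("-")("spark", "sql"))  // spark-sql
```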
SparkSQL Integration with Hive

1. In the MySQL database that Hive connects to, create a regular user and grant it privileges:

```sql
CREATE USER 'spark'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON hivedb.* TO 'spark'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
FLUSH PRIVILEGES;
```

2. Add a hive-site.xml under Spark's conf directory, configuring the MySQL metastore database and the username and password created in step 1: `<…`

Original · 2020-08-20 18:50:03 · 220 views · 0 comments
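The hive-site.xml preview is cut off at its opening tag. For step 2, the file typically carries the standard Hive metastore JDBC properties; a sketch with placeholder values (the hostname is an assumption for this example, only the user and password come from step 1):

```xml
<configuration>
  <!-- JDBC URL of the MySQL database holding the Hive metastore -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hivedb?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- User and password created in step 1 -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>spark</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
```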
SparkSQL Traffic-Statistics Case

Sample data:

```
+---+-------------------+-------------------+----+
| id|          startTime|            endTime|flow|
+---+-------------------+-------------------+----+
|  1|2020-02-18 14:20:30|2020-02-18 14:46:30|  20|
|  1|2020-02-18 14:47:20|2020-02-18 15:20:30|…
```

Original · 2020-08-19 23:43:00 · 339 views · 0 comments
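The preview stops at the sample data, but this kind of problem is usually solved by merging adjacent intervals of the same id when the gap between them falls under a threshold, then summing the flow. The core merge logic can be sketched in plain Scala, independent of Spark (the 10-minute threshold and the interval values below are illustrative assumptions, not the post's hidden solution):

```scala
// One usage interval: [start, end] in epoch seconds, with a flow amount.
case class Interval(start: Long, end: Long, flow: Long)

// Merge time-sorted intervals whose gap to the previous one is under gapSec.
def mergeSessions(xs: List[Interval], gapSec: Long = 600): List[Interval] =
  xs.foldLeft(List.empty[Interval]) {
    case (prev :: done, cur) if cur.start - prev.end < gapSec =>
      // Gap under threshold: extend the previous session, accumulate flow.
      Interval(prev.start, math.max(prev.end, cur.end), prev.flow + cur.flow) :: done
    case (acc, cur) => cur :: acc
  }.reverse

// Illustrative data: a 50-second gap (merged) and a 1-hour gap (kept apart).
val merged = mergeSessions(List(
  Interval(0L, 1560L, 20),     // 26-minute interval
  Interval(1610L, 3600L, 30),  // starts 50s later  -> same session
  Interval(7200L, 9000L, 10)   // starts 1h later   -> new session
))
println(merged.map(_.flow))  // List(50, 10)
```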
Creating a DataFrame from Structured Files

1. CSV:

```scala
object CreateDataFrameFromCSV {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName(this.getClass.getSimpleName)
      .master("local[*]")
      .getOrCreate()
    // Get the schema information of the data; every row has to be read…
```

Original · 2020-08-18 21:41:40 · 372 views · 0 comments
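Spark's actual reader is `spark.read.option("header", "true").csv(path)`; what header handling does conceptually is take the first line as the column names (the schema) and key each remaining row by column. A plain-Scala mimic of that idea (the sample rows and the `parseCsv` helper are hypothetical, not from the post):

```scala
// Header-aware CSV read, conceptually: first line becomes the column
// names; each remaining line becomes a row keyed by column name.
def parseCsv(text: String): (Array[String], List[Map[String, String]]) = {
  val lines  = text.trim.split("\n").toList
  val header = lines.head.split(",").map(_.trim)
  val rows   = lines.tail.map(l => header.zip(l.split(",").map(_.trim)).toMap)
  (header, rows)
}

val (schema, rows) = parseCsv(
  """name,age
    |zhangsan,20
    |lisi,30""".stripMargin)
println(schema.toList)     // List(name, age)
println(rows.head("age"))  // 20
```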
Several Ways to Create a DataFrame

1. Create a DataFrame from an RDD of case-class instances:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkSQL02 {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().appName(this.getClas…
```

Original · 2020-08-18 21:20:06 · 793 views · 0 comments
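When `toDF` is called on an RDD of case-class instances, the column names are derived from the case class's field names. That idea can be illustrated in plain Scala via `Product` (the `Person` class and `toRow` helper are hypothetical examples, not the post's hidden code):

```scala
// The case class plays the role of the schema: fields become columns.
case class Person(name: String, age: Int)

// What toDF does conceptually: pair each field name with its value.
// (productElementNames requires Scala 2.13+.)
def toRow(p: Product): Map[String, Any] =
  p.productElementNames.zip(p.productIterator).toMap

println(toRow(Person("zhangsan", 20)))  // Map(name -> zhangsan, age -> 20)
```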
First SparkSQL Program: Word Count

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object SparkSQL01 {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().appName(this.getClass.getName).master("local[*]").getOrCreate()
    .…
```

Original · 2020-08-18 20:18:50 · 195 views · 0 comments
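A Dataset word count typically flat-maps lines into words and then groups and counts. The same aggregation shape in plain Scala collections, as a sketch of the logic (the input lines are illustrative, not from the post):

```scala
// Same shape as the Dataset word count: split lines into words,
// group by the word itself, count each group.
def wordCount(lines: Seq[String]): Map[String, Int] =
  lines.flatMap(_.split("\\s+")).groupBy(identity).view.mapValues(_.size).toMap

// Illustrative input lines.
val counts = wordCount(Seq("spark sql spark", "hello sql"))
println(counts("spark"))  // 2
```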