/**
 * Basic SparkSQL-on-Hive operations.
 *
 * Goal: replay the earlier JDBC-style workflow, but with Hive as the data source.
 * Input files (comma-delimited):
 *   teacher_basic.txt -> name,age,married,children
 *   teacher_info.txt  -> name,height
 * Desired result: teacher_basic LEFT JOIN teacher_info on name, producing
 * (name, age, married, height, children), saved to the Hive table
 * spark_hive.teachers.
 */
object ScalaSparkSQLHiveOps {

  /** Entry point. Explicit main() instead of `extends App` to avoid the
    * delayed-initialization pitfalls of the App trait. */
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ScalaSparkSQLHiveOps")
    /*
     * NOTE(review): the master is hard-coded because, per the original author,
     * submitting with `--master spark://master:7077` on the spark-submit
     * command line did not work for this job while setting it programmatically
     * did. Prefer the submit-time flag once the root cause is understood.
     */
    conf.setMaster("spark://master:7077")

    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    try {
      // step 1: create the target database and the two source tables in Hive.
      hiveContext.sql("create database if not exists spark_hive")
      // columns: name,age,married,children
      hiveContext.sql("create table if not exists spark_hive.teacher_basic(name string, age int, married boolean, children int) row format delimited fields terminated by ','")
      // columns: name,height
      hiveContext.sql("create table if not exists spark_hive.teacher_info(name string, height int) row format delimited fields terminated by ','")

      // step 2: (re)load the local data files into the two tables.
      hiveContext.sql("load data local inpath '/opt/data/spark/sql/teacher_basic.txt' overwrite into table spark_hive.teacher_basic")
      hiveContext.sql("load data local inpath '/opt/data/spark/sql/teacher_info.txt' overwrite into table spark_hive.teacher_info")

      // step 3: left-join the two tables on name
      // -> (name, age, married, height, children).
      val joinedDF = hiveContext.sql("select tb.name, tb.age, tb.married, ti.height, tb.children from spark_hive.teacher_basic tb left join spark_hive.teacher_info ti on tb.name = ti.name")

      // step 4: persist the joined result as a managed Hive table.
      joinedDF.write.saveAsTable("spark_hive.teachers")
    } finally {
      // Always release cluster resources, even if a query above fails.
      sc.stop()
    }
  }
}