Module B: Offline Data Processing
Task 1: Data Extraction
Problem: Write Scala engineering code to fully extract the data of tables CUSTOMER, NATION, PART, PARTSUPP, REGION, and SUPPLIER from the MySQL shtd_store database into the corresponding Hive ods tables customer, nation, part, partsupp, region, and supplier, and to incrementally extract the data of tables ORDERS and LINEITEM into the corresponding Hive ods tables ORDERS and LINEITEM.
Complete code:
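The incremental part of the task (ORDERS, LINEITEM) is not shown in the code below. A minimal sketch of one common approach: read the highest key already loaded into Hive, then pull only newer rows over JDBC. The table and column names (ods.orders, o_orderkey) and the monotonically increasing key are assumptions for illustration, not confirmed by the task text.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: ods.orders and o_orderkey are assumed names; adapt to the real schema.
object IncrementalExtract {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IncrementalExtract")
      .enableHiveSupport()
      .getOrCreate()

    // Highest key already present in the Hive ods table (0 on the first run)
    val maxKey = spark.sql("select coalesce(max(o_orderkey), 0) from ods.orders")
      .first().get(0).toString.toLong

    // Pull only rows newer than the last load by pushing a filter into the JDBC query
    spark.read
      .format("jdbc")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("url", "jdbc:mysql://localhost:3306/shtd_store")
      .option("user", "root")
      .option("password", "123456")
      .option("dbtable", s"(select * from ORDERS where o_orderkey > $maxKey) t")
      .load()
      .write
      .mode("append")          // append, not overwrite: keep previously loaded rows
      .insertInto("ods.orders")

    spark.stop()
  }
}
```

Using a subquery as `dbtable` lets MySQL evaluate the `where` clause, so only the new rows cross the network.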
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  // Run as root so HDFS writes are permitted
  System.setProperty("HADOOP_USER_NAME", "root")
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("DataExtraction")
    .enableHiveSupport()
    // Use the NameNode RPC port (commonly 8020 or 9000); 50070 is the HDFS web UI port
    .config("spark.sql.warehouse.dir", "hdfs://master:8020/usr/hive/warehouse")
    .config("hive.metastore.uris", "thrift://master:9083")
    .getOrCreate()
  // Read the source table from MySQL over JDBC and register it as a temp view
  spark.read
    .format("jdbc")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("url", "jdbc:mysql://localhost:3306/spark-sql")
    .option("user", "root")
    .option("password", "123456")
    .option("dbtable", "user")
    .load()
    .createTempView("data")
  spark.sql("select * from data").show()
  println("*********")
  spark.sql("use study")
  // Static partition: create the target table partitioned by `time`
  spark.sql(
    """
      |create table if not exists customer(
      |id int,
      |name string,
      |age int
      |)
      |partitioned by(time string)
      |row format delimited fields terminated by '\t'
      |""".stripMargin)
  println("*************")
  // Overwrite a single static partition with the extracted data
  spark.sql(
    """
      |insert overwrite table customer partition (time='1001')
      |select id, name, age
      |from data
      |""".stripMargin)
  spark.sql("select * from customer").show()
  println("------------------------")
  spark.stop()
}
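For large source tables such as ORDERS and LINEITEM, the single-connection JDBC read above can become a bottleneck. Spark's JDBC source supports range-partitioned reads via `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. A sketch, where the key column name and bounds are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: o_orderkey and the bound values are assumptions chosen for illustration.
object PartitionedJdbcRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PartitionedRead")
      .getOrCreate()

    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/shtd_store")
      .option("user", "root")
      .option("password", "123456")
      .option("dbtable", "ORDERS")
      // Split the read into 8 concurrent queries over ranges of o_orderkey
      .option("partitionColumn", "o_orderkey")
      .option("lowerBound", "1")
      .option("upperBound", "6000000")
      .option("numPartitions", "8")
      .load()

    println(orders.rdd.getNumPartitions)
    spark.stop()
  }
}
```

The bounds only control how the key range is split across tasks; rows outside `[lowerBound, upperBound]` are still read, just by the edge partitions.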