Connecting to a Postgres database with Spark and JDBC (Scala implementation)

ZeroXu0

Category: Spark学习记录; tags: spark, jdbc, postgresql

Article link: https://blog.csdn.net/weixin_41579433/article/details/119148511

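While developing a Spark application, I needed to connect to a Postgres (pg) database and run create, read, update and delete operations against it. You can connect to pg through plain JDBC, or you can let Spark connect to pg directly and work with the table as a DataFrame. All of the code below is written in Scala.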

1. Plain JDBC connection

The following connects to pg through DriverManager:

import java.sql.{Connection, DriverManager, ResultSet, Statement}
import java.util.Properties

// Put the user name and password into a Properties object
val conn_prop = new Properties()
conn_prop.put("user", "xxxx")
conn_prop.put("password", "xxxx")
// Connection string for the pg database: 'xxxx' is the database name
val conn_url = "jdbc:postgresql://127.0.0.1:5432/xxxx"
// Open the connection before the try block so that finally can see it
val con: Connection = DriverManager.getConnection(conn_url, conn_prop)
try {
  // Query statement
  val query_stm: Statement = con.createStatement()
  val rs1: ResultSet = query_stm.executeQuery("select id,name,money from test_zero where id<1")
  while (rs1.next()) {
    val r_id = rs1.getInt("id")
    val r_name = rs1.getString("name")
    val r_money = rs1.getDouble("money")
    println(s"$r_id,$r_name,$r_money")
  }
  // Update statement; executeUpdate returns the number of affected rows
  val update_stm: Statement = con.createStatement()
  val rs2: Int = update_stm.executeUpdate("update test_zero set name='zero_update',money=100000.00 where name='zero'")
} finally {
  // Close the connection
  con.close()
}
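
The section above only shows a query and an update, while the introduction also mentions inserts and deletes. As a minimal sketch of those two operations, the block below uses java.sql.PreparedStatement with bound parameters instead of string-built SQL. The object name PgCrudSketch and the helpers jdbcUrl, connProps, withConnection, insertRow and deleteByName are hypothetical names introduced for illustration, and the column types (id as integer, name as text, money as a numeric type) are assumed from the getters used above.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import java.util.Properties

object PgCrudSketch {
  // Hypothetical helper: builds a URL of the same shape as the one used above
  def jdbcUrl(host: String, port: Int, db: String): String =
    s"jdbc:postgresql://$host:$port/$db"

  // Hypothetical helper: user name and password as Properties
  def connProps(user: String, password: String): Properties = {
    val p = new Properties()
    p.setProperty("user", user)
    p.setProperty("password", password)
    p
  }

  // Opens a connection, runs f, and always closes the connection afterwards
  def withConnection[A](url: String, props: Properties)(f: Connection => A): A = {
    val con = DriverManager.getConnection(url, props)
    try f(con) finally con.close()
  }

  // Inserts one row with bound parameters; returns the affected row count.
  // Assumes test_zero has columns (id, name, money) with compatible types.
  def insertRow(con: Connection, id: Int, name: String, money: Double): Int = {
    val ps: PreparedStatement =
      con.prepareStatement("insert into test_zero (id, name, money) values (?, ?, ?)")
    try {
      ps.setInt(1, id)
      ps.setString(2, name)
      ps.setDouble(3, money)
      ps.executeUpdate()
    } finally ps.close()
  }

  // Deletes rows by name; returns the affected row count
  def deleteByName(con: Connection, name: String): Int = {
    val ps = con.prepareStatement("delete from test_zero where name = ?")
    try {
      ps.setString(1, name)
      ps.executeUpdate()
    } finally ps.close()
  }
}
```

A call could look like PgCrudSketch.withConnection(PgCrudSketch.jdbcUrl("127.0.0.1", 5432, "xxxx"), PgCrudSketch.connProps("xxxx", "xxxx"))(con => PgCrudSketch.insertRow(con, 1, "zero", 100.0)). Binding values through setInt/setString/setDouble keeps user input out of the SQL text, which is the main reason to prefer it over concatenated statements.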

2. Connecting directly from Spark

Spark can also connect to pg directly. There are two ways:
Approach 1: sparkSession.read.jdbc(url, table, properties)
Approach 2: sparkSession.read.format("jdbc").option("", "").load()

import java.util.Properties
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Create the SparkSession
val conf = new SparkConf().setAppName("connectPG").setMaster("local[*]")
val spark = SparkSession.builder().config(conf).getOrCreate()
/**
 * Approach 1: spark.read.jdbc()
 */
// Put the user name, password and driver class into a Properties object
val conn_prop = new Properties()
conn_prop.put("user", "xxxx")
conn_prop.put("password", "xxxx")
conn_prop.put("driver", "org.postgresql.Driver")
// Connect to pg, passing the properties together with the table to query; the result is a DataFrame
val df1 = spark.read.jdbc("jdbc:postgresql://127.0.0.1:5432/xxxx", "test_zero", conn_prop)
// Query the resulting DataFrame
df1.select("id", "name", "money").show()
/**
 * Approach 2: spark.read.format("jdbc").option("","").load()
 */
// Pass the database parameters directly through .option(), then call load() to produce a DataFrame
val df2 = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://127.0.0.1:5432/xxxx")
  .option("dbtable", "test_zero")
  .option("user", "xxxx")
  .option("password", "xxxx")
  .load()
// Select columns from the result DataFrame and filter the rows
val n_df2 = df2.select("id", "name", "money").filter("name='zero'")
/**
 * The result DataFrame can also be registered as a temporary view,
 * which lets you query it with plain SQL, for example sparkSession.sql("query_sql")
 */
// Create the temporary view so that spark.sql() can be used directly
n_df2.createOrReplaceTempView("tmp_test_zero")
// Run the Spark SQL query
spark.sql("select * from tmp_test_zero where name like '%zero%'").show()
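
Both approaches above only read from pg. As a rough sketch of the write direction, the block below appends a DataFrame to a pg table through DataFrameWriter.jdbc. The object name PgWriteSketch, the helpers connProps and appendToPg, and any target table name you pass in are hypothetical and not part of the original post.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object PgWriteSketch {
  // Hypothetical helper: same properties as in approach 1 (user, password, driver class)
  def connProps(user: String, password: String): Properties = {
    val p = new Properties()
    p.setProperty("user", user)
    p.setProperty("password", password)
    p.setProperty("driver", "org.postgresql.Driver")
    p
  }

  // Appends every row of df to the given pg table.
  // SaveMode.Append inserts rows; SaveMode.Overwrite would replace the table contents.
  def appendToPg(df: DataFrame, url: String, table: String, props: Properties): Unit =
    df.write.mode(SaveMode.Append).jdbc(url, table, props)
}
```

For example, PgWriteSketch.appendToPg(n_df2, "jdbc:postgresql://127.0.0.1:5432/xxxx", "test_zero_copy", PgWriteSketch.connProps("xxxx", "xxxx")), where test_zero_copy is an assumed table name. The Spark JDBC writer inserts or overwrites whole sets of rows; for row-level updates and deletes, the plain JDBC connection from section 1 is the simpler tool.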
