1. Code:
import org.apache.spark.sql.SparkSession;

public class SparkSQL_Hive {
    public static void main(String[] args) {
        SparkSession ss = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .master("local[2]")
                //.config("spark.sql.warehouse.dir", warehouseLocation)
                .enableHiveSupport()
                .getOrCreate();

        // Switch to the target database and query the table
        ss.sql("use test");
        ss.sql("select * from user").show();

        ss.stop();
    }
}
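To compile and run the example, the Spark SQL and Hive support artifacts need to be on the classpath, along with the MySQL driver for the metastore. A minimal sketch of the Maven dependencies, assuming Spark 2.x built for Scala 2.11 (the versions here are illustrative and should match your cluster):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.38</version>
</dependency>
```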
2. Directory structure, and copying hive-site.xml, core-site.xml, and hdfs-site.xml
The resource folder must be marked as a Resources folder in IDEA's Project Structure; otherwise the XML files inside it will not be loaded.
All three XML files are required.
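Assuming a standard Maven layout, the structure described above might look like this (the file placement is the point; names are illustrative):

```
src/
└── main/
    ├── java/
    │   └── SparkSQL_Hive.java
    └── resources/
        ├── core-site.xml
        ├── hdfs-site.xml
        └── hive-site.xml
```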
hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>spark.sql.warehouse.dir</name>
<value>hdfs://bigdata-pro01.kfk.com:9000/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://bigdata-pro01.kfk.com/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
<description>Whether to print the names of the columns in query output.</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>Whether to include the current database in the Hive prompt.</description>
</property>
</configuration>
3. References:
How to use Hive remote mode (beeline + hiveserver2)
https://www.cnblogs.com/tq03/p/5107949.html
Java-side JDBC connection:
https://www.cnblogs.com/shysky77/p/6971967.html
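The second reference covers connecting to HiveServer2 from plain Java over JDBC rather than through a SparkSession. A minimal sketch of that approach, assuming the cluster host above, HiveServer2 on its default port 10000, and the `org.apache.hive:hive-jdbc` driver on the classpath (the credentials are placeholders and depend on your HiveServer2 authentication setup):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {

    // Build a HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<db>
    static String hiveJdbcUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws Exception {
        String url = hiveJdbcUrl("bigdata-pro01.kfk.com", 10000, "test");

        // Requires a running HiveServer2 and the hive-jdbc driver jar
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(url, "root", "root");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from user")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```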