Background
Notes from connecting Spark to Hive and HBase a while back. The key point is keeping the hostname mapping consistent between the host machine and the virtual machine.
Steps
1、First, make sure the hostname mapped to the target IP is identical in the Windows hosts file, the CentOS hosts file, and the CentOS hostname file.
For example, if the IP to connect to is 192.168.57.141, the corresponding entry in C:\Windows\System32\drivers\etc\hosts on Windows is
192.168.57.141 scentos
The corresponding entries in /etc/hosts on the VM are (keep the localhost lines below as well, otherwise Windows still cannot connect):
192.168.57.141 scentos
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
The corresponding content of /etc/hostname on the VM is
scentos
On the VM, changes to these two files only take effect after a reboot; the hostname can also be changed temporarily with the following command:
[root@scentos spark-2.1.0]# hostname scentos
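Once the files are in place, the mapping can also be sanity-checked from the JVM side. A minimal sketch (not from the original setup; `scentos` is this walkthrough's hostname, and `localhost` is used as the default only so the snippet runs anywhere):

```java
import java.net.InetAddress;

public class HostCheck {
    public static void main(String[] args) throws Exception {
        // Pass your mapped hostname (e.g. "scentos") as the first argument;
        // falls back to "localhost" so the snippet is runnable as-is.
        String hostname = args.length > 0 ? args[0] : "localhost";
        InetAddress addr = InetAddress.getByName(hostname);
        // Prints the IP the JVM resolves the hostname to; for the mapped
        // hostname this should be the VM's real IP, e.g. 192.168.57.141
        System.out.println(hostname + " -> " + addr.getHostAddress());
    }
}
```

If the printed IP is not the VM's real address, the hosts files above are not consistent yet.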
2、Copy hive-site.xml into the conf directory of the Spark installation and disable the Tez engine.
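Disabling Tez normally means switching the execution engine back to MapReduce in the copied hive-site.xml; a minimal sketch of the relevant property (adjust to your own configuration):

```xml
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
```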
3、Check Hive's SDS and DBS tables in MySQL; if data was previously stored in HDFS under localhost, replace every localhost with the real IP:
mysql> update SDS set LOCATION=REPLACE(LOCATION,'hdfs://localhost:8020/user/hive/warehouse','hdfs://192.168.57.141:8020/user/hive/warehouse');
mysql> update DBS set DB_LOCATION_URI=REPLACE(DB_LOCATION_URI,'hdfs://localhost:8020/user/hive/warehouse','hdfs://192.168.57.141:8020/user/hive/warehouse');
4、Copy hive-site.xml, core-site.xml, and hdfs-site.xml into the resources directory of the IDEA project; disable the Tez engine in the copied hive-site.xml as well.
5、Add the following dependencies to the project's pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-hbase-handler</artifactId>
    <version>2.3.5</version>
</dependency>
6、After starting the Spark cluster, the Hive metastore, and the HBase cluster, add the following code to the project, replacing zookeeperIp, zookeeperPort, and hbaseMasterURL with your own ZooKeeper address, ZooKeeper client port, and HBase master address:
SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("ActionConsumer")
        .set("spark.serializer", KryoSerializer.class.getCanonicalName())
        .registerKryoClasses(new Class[]{ConsumerRecord.class})
        .set("spark.kryoserializer.buffer.max", "512m")
        .set("hbase.zookeeper.quorum", zookeeperIp)
        .set("hbase.zookeeper.property.clientPort", zookeeperPort)
        .set("hbase.master", hbaseMasterURL);
SparkSession session = SparkSession
        .builder()
        .config(conf)
        .enableHiveSupport()
        .getOrCreate();
Dataset<Row> rawData = session.sql("select * from profile");
rawData.show();
Replace the argument of session.sql() with your own SQL statement, then compile and run.