Several ways to run HBase code on Linux
Running a Scala object
- Compile the Scala object file
export HBASE_CLASSPATH=`hbase classpath`
scalac -classpath "$HBASE_CLASSPATH" ${your scala file}
# Note: the scala file must define a main function (see the minimal sketch below)
- Run the compiled class file with scala
scala -classpath "$HBASE_CLASSPATH:." ${the Object class that holds your main function}
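A minimal sketch of such an object, assuming a table named "test" with column family "cf", qualifier "col", and rowkey "row1" (all placeholders to adapt):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

object HBaseGetDemo {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()                   // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("test"))
    val result = table.get(new Get(Bytes.toBytes("row1")))   // point read by rowkey
    println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))
    table.close()
    connection.close()
  }
}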
Running a Scala script
scala -classpath "$HBASE_CLASSPATH" ${your scala script}
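Unlike the object variant, a script is plain top-level statements with no object or main wrapper. A minimal sketch with the same placeholder table and column names:

// get.scala -- run with: scala -classpath "$HBASE_CLASSPATH" get.scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = connection.getTable(TableName.valueOf("test"))
println(table.get(new Get(Bytes.toBytes("row1"))))   // prints the raw Result
table.close()
connection.close()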
Invoking a Java class in a jar via the hbase command
- Write the HBase client code in Java, e.g. a Get
- Package the code into a jar locally
- Upload the jar to the Linux environment
- Add the jar to HBASE_CLASSPATH
export HBASE_CLASSPATH=`hbase classpath`
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:${path to your jar}"
- Invoke your class through the hbase command
hbase ${fully-qualified name of your class}
Note 1: the class must be written in Java and implement a main function; in my tests, a Scala Object's class could not be found under these conditions (see the next section for the extra setup Scala needs)
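For example, assuming the jar is /opt/jobs/hbase-demo.jar and contains a Java class com.example.HBaseGetJob with a main method (both names hypothetical), the full sequence is:

export HBASE_CLASSPATH=`hbase classpath`
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:/opt/jobs/hbase-demo.jar"
hbase com.example.HBaseGetJob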
Invoking a Scala class in a jar via the hbase command
- Add the following to the Maven project:
(1) dependencies section
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.12</version>
</dependency>
(2) build section
<build>
    <plugins>
        <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <version>2.15.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.19</version>
            <configuration>
                <skip>true</skip>
            </configuration>
        </plugin>
    </plugins>
</build>
- Add the Scala runtime jar to HBASE_CLASSPATH
Mainly $SCALA_HOME/lib/scala-library.jar:
export HBASE_CLASSPATH=`hbase classpath`
export HBASE_CLASSPATH=$HBASE_CLASSPATH:$SCALA_HOME/lib/scala-library.jar:./HBaseTest.art-1.0-SNAPSHOT.jar
- Run the class
hbase ${fully-qualified name of your main Object}
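For example, if the jar's main Object were com.example.ScalaGetJob (a hypothetical name):

hbase com.example.ScalaGetJob

This works because scalac emits, for every object that defines main, a same-named class with a static main forwarder, so the hbase launcher can start it exactly like a Java class once scala-library.jar supplies the Scala runtime.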
Operating HBase from Spark
- Spark code
package hbase

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object SparkHBaseTest {
  def main(args: Array[String]): Unit = {
    // Spark variables
    val sparkConf: SparkConf = new SparkConf().setAppName("Spark-HBase")
    val sc: SparkContext = new SparkContext(sparkConf)
    val textRDD = sc.textFile("test data path")
    // Write out each partition with its own HBase connection
    textRDD.foreachPartition { iterator =>
      val hbaseConf = HBaseConfiguration.create()
      hbaseConf.set("hbase.zookeeper.quorum", "test")
      hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
      val connection: Connection = ConnectionFactory.createConnection(hbaseConf)
      val table = connection.getTable(TableName.valueOf("test"))
      val putList: java.util.ArrayList[Put] = new java.util.ArrayList[Put]
      while (iterator.hasNext) {
        val line = iterator.next()
        val rowKey = ${your rowkey-generation logic} // placeholder: derive the rowkey from the line
        val put: Put = new Put(Bytes.toBytes(rowKey))
        put.addColumn(Bytes.toBytes("test"), Bytes.toBytes("test"), Bytes.toBytes(1))
        putList.add(put)
      }
      table.put(putList) // one batched write for the whole partition
      table.close()
      connection.close()
    }
    sc.stop()
  }
}
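Note that the connection is opened once per partition rather than once per record, and the Puts for a partition are flushed in a single batched table.put call; both keep ZooKeeper and RegionServer connection churn low.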
- Package it into a jar; the Maven setup is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>HBaseTest</groupId>
    <artifactId>HBaseTest.art</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>
        <!-- the Spark artifact suffix must match the Scala binary version (2.11 here) -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.9.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19</version>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
- Submit the job with spark-submit
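A typical invocation, assuming a YARN cluster and that `hbase classpath` resolves on the machines involved (master, deploy mode, and paths are placeholders to adapt):

spark-submit \
  --class hbase.SparkHBaseTest \
  --master yarn \
  --driver-class-path "$(hbase classpath)" \
  --conf spark.executor.extraClassPath="$(hbase classpath)" \
  HBaseTest.art-1.0-SNAPSHOT.jar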
Getting a table in HBase 2.X
The HBase 2.2 API seems to have changed slightly; the following way of obtaining a table is what I found in the official HBase 2.2 documentation
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client._
...
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
val connection: Connection = ConnectionFactory.createConnection(hbaseConf)
val table = connection.getTable(TableName.valueOf("test:test")) // replaces the 1.x-style new HTable(hbaseConf, "ctg:pvuv"), which is no longer public API in 2.x
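To complete the example, a point read through the table handle obtained above (rowkey, column family, and qualifier are placeholder names, and org.apache.hadoop.hbase.util.Bytes is assumed to be imported as well):

val result = table.get(new Get(Bytes.toBytes("row1"))) // point read by rowkey
println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))
table.close()       // release the table handle
connection.close()  // and the underlying connection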