Preparation
Environment: CentOS 7.5
Installing Kudu 1.4 with yum
Download cloudera-kudu.repo
After saving the cloudera-kudu.repo file to /etc/yum.repos.d/, run the following commands to install the Kudu packages on each host.
sudo yum install kudu # Base Kudu files
sudo yum install kudu-master # Kudu master init.d service script and default configuration
sudo yum install kudu-tserver # Kudu tablet server init.d service script and default configuration
sudo yum install kudu-client0 # Kudu C++ client shared library
sudo yum install kudu-client-devel # Kudu C++ client SDK
Start the services
$ sudo service kudu-master start
$ sudo service kudu-tserver start
Stop the services
$ sudo service kudu-master stop
$ sudo service kudu-tserver stop
Enable start on boot
$ sudo chkconfig kudu-master on # RHEL / CentOS / SLES
$ sudo chkconfig kudu-tserver on # RHEL / CentOS / SLES
Web interfaces
Kudu Master Web Interface: ip:8051
Kudu Tablet Server Web Interface: ip:8050
RPC address: ip:7051
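Before running anything against the cluster, it can be useful to confirm the master's RPC port is actually reachable from the client machine. A minimal sketch in Scala, using the example master address 192.168.5.132 from the demo below (`portOpen` is a hypothetical helper written here, not part of any Kudu API):

```scala
import java.net.{InetSocketAddress, Socket}

object PortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def portOpen(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: Exception => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // 7051 is the master RPC port; 8051 would check the web UI instead.
    println(s"master RPC reachable: ${portOpen("192.168.5.132", 7051)}")
  }
}
```

A plain TCP connect only proves the port is open, not that Kudu is healthy, but it rules out the most common firewall and address mistakes quickly.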
Using the Spark API
pom.xml configuration
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-client</artifactId>
    <version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-spark2 -->
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-spark2_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
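If the project uses sbt rather than Maven, the equivalent dependency lines (same coordinates and versions as the POM above; `%%` appends the Scala binary suffix, resolving to kudu-spark2_2.11 on Scala 2.11) would be:

```scala
libraryDependencies ++= Seq(
  "org.apache.kudu" %  "kudu-client" % "1.8.0",
  "org.apache.kudu" %% "kudu-spark2" % "1.8.0"
)
```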
KuduDemo
package kudu

import org.apache.kudu.client.CreateTableOptions
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.kudu.{ColumnSchema, Schema, Type}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable.ArrayBuffer

/**
  * @Description
  * @Author pengj <ugeg@163.com>
  * @Version V1.0.0
  * @Date 2018/12/7
  */
case class Person(name: String, sex: String, age: Int)

object KuduDemo {
  val tableName = "impala::default.student"
  val kuduMaster1 = "192.168.5.132:7051"

  def main(args: Array[String]): Unit = {
    val sparkConf: SparkConf = new SparkConf().setAppName("kududemo").setMaster("local[*]")
    sparkConf.set("spark.sql.shuffle.partitions", "1")
    val sparkContext: SparkContext = new SparkContext(sparkConf)
    val sparkSession: SparkSession = SparkSession.builder().getOrCreate()

    val kuduMasters = Seq(kuduMaster1).mkString(",")
    val kuduContext = new KuduContext(kuduMasters, sparkContext)

    // new Schema(...) takes a java.util.List, so convert the Scala buffer implicitly.
    import scala.collection.JavaConversions._
    val columnSchemas: ArrayBuffer[ColumnSchema] = ArrayBuffer[ColumnSchema]()
    // All three columns are marked key(true), so together they form the composite
    // primary key; the hash-partition column ("name") must be part of the key.
    columnSchemas += new ColumnSchema.ColumnSchemaBuilder("name", Type.STRING).key(true).build()
    columnSchemas += new ColumnSchema.ColumnSchemaBuilder("sex", Type.STRING).key(true).build()
    columnSchemas += new ColumnSchema.ColumnSchemaBuilder("age", Type.INT32).key(true).build()
    val schema: Schema = new Schema(columnSchemas)

    // Drop and recreate the table so the demo is repeatable.
    if (kuduContext.tableExists(tableName)) {
      kuduContext.deleteTable(tableName)
    }
    if (!kuduContext.tableExists(tableName)) {
      kuduContext.createTable(tableName, schema,
        new CreateTableOptions().addHashPartitions(List("name"), 2).setNumReplicas(1))
    }

    import sparkSession.implicits._
    val person: DataFrame = Seq(Person("张三", "男", 17), Person("李四", "男", 18)).toDF()
    kuduContext.insertRows(person, tableName)

    // Read the table back as an RDD[Row], projecting only the name and age columns.
    val rows: RDD[Row] = kuduContext.kuduRDD(sparkContext, tableName, Seq("name", "age"))
    rows.map { case Row(name: String, age: Int) => (name, age) }.collect().foreach(println)
  }
}
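A note on the addHashPartitions(List("name"), 2) call above: it splits the table into two tablets by hashing the name column, so all rows with the same name land in the same tablet. The idea can be sketched in plain Scala (using String.hashCode as a stand-in; Kudu's real hash function differs, so actual bucket assignments will not match):

```scala
object HashBucketSketch {
  // Assign a key to one of `buckets` partitions; floorMod keeps the result non-negative.
  def bucketFor(key: String, buckets: Int): Int =
    math.floorMod(key.hashCode, buckets)

  def main(args: Array[String]): Unit = {
    // Same key always maps to the same bucket; different keys spread across buckets.
    Seq("张三", "李四").foreach { name =>
      println(s"$name -> bucket ${bucketFor(name, 2)}")
    }
  }
}
```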
Output
(张三,17)
(李四,18)
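The demo reads the table back through kuduContext.kuduRDD. With the same kudu-spark2 dependency, the table can also be read as a DataFrame. A sketch under the same master and table names as above, assuming the sparkSession from the demo is in scope (it needs a live cluster to run):

```scala
// Options required by the kudu-spark data source.
val kuduOptions = Map(
  "kudu.master" -> "192.168.5.132:7051",
  "kudu.table"  -> "impala::default.student")

// Read the Kudu table as a DataFrame rather than an RDD of Rows.
val studentDF = sparkSession.read
  .options(kuduOptions)
  .format("org.apache.kudu.spark.kudu")
  .load()

studentDF.select("name", "age").show()
```

The DataFrame path supports predicate pushdown through Spark SQL, which is usually more convenient than pattern-matching Rows by hand.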
Building Kudu 1.8.0 from source
Download the kudu-1.8 source
Install the required build dependencies
$ sudo yum install autoconf automake cyrus-sasl-devel cyrus-sasl-gssapi \
cyrus-sasl-plain flex gcc gcc-c++ gdb git java-1.8.0-openjdk-devel \
krb5-server krb5-workstation libtool make openssl-devel patch \
pkgconfig redhat-lsb-core rsync unzip vim-common which
Optional: needed only to build the documentation
yum install doxygen gem graphviz ruby-devel zlib-devel
Build the third-party dependencies (this can take a while)
$ build-support/enable_devtoolset.sh thirdparty/build-if-necessary.sh
Create the build directory and compile
mkdir -p build/release
cd build/release
../../build-support/enable_devtoolset.sh \
../../thirdparty/installed/common/bin/cmake \
-DCMAKE_BUILD_TYPE=release \
../..
make -j4
Install. Use DESTDIR to choose the install location; the default is /usr/local.
sudo make DESTDIR=/opt/kudu install
Optional: build the docs
make docs
References
https://kudu.apache.org/docs/installation.html#rhel_from_source
https://blog.cloudera.com/blog/2017/02/up-and-running-with-apache-spark-on-apache-kudu/