CarbonData official website
I. Build
1. Get the source code
git clone https://github.com/apache/carbondata.git
Building requires Maven and JDK 1.7 or 1.8 to be installed.
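Before starting the build, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch; the function name require_tools is hypothetical and not part of CarbonData, and it only checks that each command exists, not which version is installed.

```shell
# Hypothetical helper: report every named command that is missing from PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `require_tools git mvn java` could be run before cloning and building.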
2. Build
mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.0 clean package
3. Locate the packaged jar. The build produces a single large jar:
$carbindata_home/carbondata/assembly/target/scala-2.11/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.6.0.jar
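The version and Hadoop suffix in the jar name depend on the build, so the exact file name may differ from the one above. As a rough sketch, the jar can be located with a glob instead of a hard-coded name. The function name find_shaded_jar is hypothetical, and the directory layout is assumed to match the path shown above.

```shell
# Hypothetical helper: print the path of the shaded CarbonData jar.
# $1 is the directory that contains the carbondata checkout
# (the role played by $carbindata_home in the path above).
find_shaded_jar() {
  src_root="$1"
  target_dir="$src_root/carbondata/assembly/target/scala-2.11"
  # Version and Hadoop suffix vary per build, so match them with a glob.
  for jar in "$target_dir"/carbondata_*-shade-hadoop*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "no shaded jar found under $target_dir" >&2
  return 1
}
```

If several jars match, this prints only the first one, so clean out old builds before relying on it.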
II. Installation on a Spark 2.2 cluster
1. Copy carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.6.0.jar to $spark_home/jars/
cp $carbindata_home/carbondata/assembly/target/scala-2.11/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.6.0.jar $spark_home/jars/
2. Copy the CarbonData configuration file to $spark_home/conf
cp carbon.properties $spark_home/conf
3. Edit carbon.properties and tune its parameters at the same time; these parameters have a considerable effect on query performance.
#################### System Configuration ##################
#Mandatory. Carbon Store path
carbon.storelocation=hdfs://hacluster/data/CarbonData/CarbonStore
#Base directory for Data files
carbon.ddl.base.hdfs.url=hdfs://hacluster/data/CarbonData/data
#Path where the bad records are stored
carbon.badRecords.location=/home/biadmin/tmp/wuzl/carbondata/Spark/badrecords
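A typo or a commented-out line in carbon.properties is easy to miss, so a quick check that the three settings above are present can be useful. The sketch below is only illustrative: the function name check_carbon_properties is hypothetical, and it checks only that each key has a non-empty value, not that the paths exist.

```shell
# Hypothetical helper: verify that the three settings shown above are set.
# $1 is the path to carbon.properties. Returns 0 if all keys are present.
check_carbon_properties() {
  props_file="$1"
  missing=0
  for key in carbon.storelocation carbon.ddl.base.hdfs.url carbon.badRecords.location; do
    # Escape the dots so they match literally in the regular expression.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    # A key counts as set if an uncommented "key=value" line with a non-empty value exists.
    if ! grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*[^[:space:]]" "$props_file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `check_carbon_properties "$spark_home/conf/carbon.properties"` once the file has been copied and edited.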
4. Create a carbonlib directory under $spark_home
mkdir carbonlib
5. Copy carbon.properties, spark-defaults.conf, and spark-env.sh from under $spark_home into carbonlib
6. Copy $spark_home/jars/carbondata_2.11-1.3.0-SNAPSHOT-shade-hadoop2.6.0.jar into carbonlib
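Steps 4 to 6 can be combined into one small function, sketched below. The name setup_carbonlib is hypothetical. The sketch assumes the three configuration files live in $spark_home/conf (where step 2 placed carbon.properties) and that the jar matches the shaded-jar naming pattern used above; adjust the paths if your layout differs.

```shell
# Hypothetical helper: create $spark_home/carbonlib and populate it.
# $1 is the Spark installation directory ($spark_home in the steps above).
setup_carbonlib() {
  spark_home="$1"
  lib_dir="$spark_home/carbonlib"
  # Step 4: create the directory (no error if it already exists).
  mkdir -p "$lib_dir" || return 1
  # Step 5: copy the configuration files (assumed to be in conf/).
  for f in carbon.properties spark-defaults.conf spark-env.sh; do
    cp "$spark_home/conf/$f" "$lib_dir/" || return 1
  done
  # Step 6: copy the shaded CarbonData jar from jars/.
  cp "$spark_home"/jars/carbondata_*-shade-hadoop*.jar "$lib_dir/" || return 1
}
```

Running `setup_carbonlib "$spark_home"` stops at the first copy that fails, so a missing file is reported by cp rather than silently skipped.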