Learning CarbonData 1.3.0 (Part 1)

Apache CarbonData releases milestone version 1.3

http://www.infoq.com/cn/news/2018/02/apache-carbondata-1.3

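I came across this article at a point when I did not know where to start with big data analysis. CarbonData supports automatic roll-up of queried data, so I wanted to study it in more depth.

Build and configure it following the official documentation: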

http://carbondata.apache.org/quick-start-guide.html

mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.1 clean package
The build produces carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar.

The project now also provides prebuilt packages:

https://dist.apache.org/repos/dist/release/carbondata/1.3.0/

Unfortunately, with both my own build and the official 1.3.0 package, the official example unexpectedly fails: the CarbonSessionStateBuilder class cannot be found.

  • Import the following :
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
  • Create a CarbonSession :
val carbon = SparkSession.builder().config(sc.getConf)
             .getOrCreateCarbonSession("<hdfs store path>")


scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:9000/Opt")
18/03/02 01:22:34 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
18/03/02 01:22:34 WARN CarbonProperties: main The enable unsafe sort value "null" is invalid. Using the default value "false
18/03/02 01:22:34 WARN CarbonProperties: main The custom block distribution value "null" is invalid. Using the default value "false
18/03/02 01:22:34 WARN CarbonProperties: main The enable vector reader value "null" is invalid. Using the default value "true
18/03/02 01:22:34 WARN CarbonProperties: main The carbon task distribution value "null" is invalid. Using the default value "block
18/03/02 01:22:34 WARN CarbonProperties: main The enable auto handoff value "null" is invalid. Using the default value "true
18/03/02 01:22:34 ERROR CarbonProperties: main The specified value for property sort.inmemory.size.inmbis Invalid. Taking the default value.1024
java.lang.ClassNotFoundException: org.apache.spark.sql.hive.CarbonSessionStateBuilder
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
  at org.apache.spark.util.CarbonReflectionUtils$.createObject(CarbonReflectionUtils.scala:238)
  at org.apache.spark.util.CarbonReflectionUtils$.getSessionState(CarbonReflectionUtils.scala:205)
  at org.apache.spark.sql.CarbonSession.sessionState$lzycompute(CarbonSession.scala:49)
  at org.apache.spark.sql.CarbonSession.sessionState(CarbonSession.scala:48)
  at org.apache.spark.sql.CarbonSession$CarbonBuilder$$anonfun$getOrCreateCarbonSession$2.apply(CarbonSession.scala:173)
  at org.apache.spark.sql.CarbonSession$CarbonBuilder$$anonfun$getOrCreateCarbonSession$2.apply(CarbonSession.scala:173)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.CarbonSession$CarbonBuilder.getOrCreateCarbonSession(CarbonSession.scala:173)
  at org.apache.spark.sql.CarbonSession$CarbonBuilder.getOrCreateCarbonSession(CarbonSession.scala:85)
  ... 50 elided

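I decompiled the generated jar and confirmed that org.apache.spark.sql.hive.CarbonSessionStateBuilder really is not in it. That was frustrating: was the release published without being properly tested? With no other option, I fetched the latest development source from GitHub and rebuilt it.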
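
Decompiling is not strictly necessary for this check, because a jar is a zip archive and can be searched for the class entry directly. The following is a minimal sketch using only the JDK's java.util.jar API; the object name JarClassCheck and its two helpers are hypothetical names introduced for illustration.

```scala
import java.util.jar.JarFile

object JarClassCheck {
  /** Convert a fully qualified class name into the entry path used inside a jar. */
  def entryPath(className: String): String =
    className.replace('.', '/') + ".class"

  /** Return true if the jar at jarPath contains an entry for className. */
  def containsClass(jarPath: String, className: String): Boolean = {
    val jar = new JarFile(jarPath)
    try jar.getEntry(entryPath(className)) != null
    finally jar.close()
  }
}
```

Called as JarClassCheck.containsClass(jarPath, "org.apache.spark.sql.hive.CarbonSessionStateBuilder"), where jarPath points at the jar under test, it gives a quick yes or no before the jar is handed to spark-shell.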

GitHub: https://github.com/apache/carbondata.git

After the build, the version becomes 1.4: carbondata_2.11-1.4.0-SNAPSHOT-shade-hadoop2.7.2.jar


The rebuilt jar now contains the previously missing classes.

Start testing

spark-shell --jars /opt/spark-2.2.1-bin-hadoop2.7/carbonlib/carbondata_2.11-1.4.0-SNAPSHOT-shade-hadoop2.7.2.jar

scala> import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:9000/Opt")
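
The LOAD DATA statement that follows assumes test_table already exists, but the session never shows it being created. A minimal sketch of that step, in the form used by the official quick start guide: the column names match the SELECT output later in this post, while the column types are an assumption taken from the quick start example rather than from this session.

```scala
// Run in the same spark-shell session, after `carbon` has been created.
// Column types (string/Int) are assumed from the quick start example.
carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
```

The CSV at the LOAD DATA path would then need a header line matching these column names, or the load options would have to describe the layout.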

scala> carbon.sql("LOAD DATA INPATH 'hdfs://localhost:9000/user/root/input/sample.csv'   INTO TABLE test_table")
18/03/02 00:46:14 AUDIT CarbonDataRDDFactory$: [localhost.localdomain][root][Thread-1]Data load request has been received for table default.test_table
18/03/02 00:46:14 WARN CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
18/03/02 00:46:17 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:table;queryID:1114726938817] sort scope is set to LOCAL_SORT
18/03/02 00:46:17 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:table;queryID:1114726938817] batch sort size is set to 0
18/03/02 00:46:17 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:table;queryID:1114726938817] sort scope is set to LOCAL_SORT
18/03/02 00:46:17 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:table;queryID:1114726938817] Error occurs while creating dirs: /tmp/carbon1114860648927_0/Fact/Part0/Segment_0/0
18/03/02 00:46:17 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:table;queryID:1114726938817] sort scope is set to LOCAL_SORT
18/03/02 00:46:20 AUDIT CarbonDataRDDFactory$: [localhost.localdomain][root][Thread-1]Data load is successful for default.test_table
res3: org.apache.spark.sql.DataFrame = []
scala> carbon.sql("SELECT * FROM test_table").show()
+---+----+----+---+
| id|name|city|age|
+---+----+----+---+
|  1|aaaa|  xm| 20|
|  2|bbbb|  xm| 21|
|  3|cccc|  zz| 30|
|  4|dddd|  hh| 20|
+---+----+----+---+

Now try an aggregate query

scala> carbon.sql("SELECT city, avg(age), sum(age)  FROM test_table  GROUP BY city").show()
+----+--------+--------+                                                        
|city|avg(age)|sum(age)|
+----+--------+--------+
|  hh|    20.0|      20|
|  zz|    30.0|      30|
|  xm|    20.5|      41|
+----+--------+--------+
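
As a sanity check on those numbers, the same aggregation can be reproduced on plain Scala collections using the four rows printed by the SELECT above. This is only an illustrative sketch; the names CityAggregates, Person, and avgAndSumByCity are hypothetical and have nothing to do with CarbonData's API.

```scala
object CityAggregates {
  // One row of the sample table shown above.
  final case class Person(id: Int, name: String, city: String, age: Int)

  // The four rows returned by SELECT * FROM test_table.
  val rows: Seq[Person] = Seq(
    Person(1, "aaaa", "xm", 20),
    Person(2, "bbbb", "xm", 21),
    Person(3, "cccc", "zz", 30),
    Person(4, "dddd", "hh", 20)
  )

  // Equivalent of: SELECT city, avg(age), sum(age) ... GROUP BY city
  // Returns city -> (average age, summed age).
  def avgAndSumByCity(data: Seq[Person]): Map[String, (Double, Int)] =
    data.groupBy(_.city).map { case (city, group) =>
      val total = group.map(_.age).sum
      (city, (total.toDouble / group.size, total))
    }
}
```

For city xm the two ages 20 and 21 sum to 41 and average to 20.5, which agrees with the row CarbonData returned.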

It finally works!

When using HDFS mode, remember to set the lock type property LOCK_TYPE to HDFSLOCK; the system default is LOCALLOCK.
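
The automatic roll-up that motivated this experiment is exposed in CarbonData 1.3 as pre-aggregate datamaps. The sketch below assumes the CREATE DATAMAP ... USING 'preaggregate' syntax described in the 1.3 documentation; agg_city is a hypothetical datamap name, and none of this was run in the session above.

```scala
// Assumption: pre-aggregate datamap syntax as documented for CarbonData 1.3.
// agg_city is a hypothetical name chosen for this example.
carbon.sql(
  """CREATE DATAMAP agg_city
    |ON TABLE test_table
    |USING 'preaggregate'
    |AS SELECT city, avg(age), sum(age)
    |FROM test_table
    |GROUP BY city""".stripMargin)

// The query itself stays the same as before; inspect the plan to see
// whether it is answered from the pre-aggregate table.
carbon.sql("EXPLAIN SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show(false)
```

The point of the design is that queries keep targeting test_table, and the roll-up table is meant to be picked up by the engine rather than referenced explicitly.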
scala> import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.util.CarbonProperties

scala> import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.constants.CarbonCommonConstants

scala> CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, "HDFSLOCK")

