Using CarbonData with Spark on CDH 5.14 (Spark on YARN mode on CDH)

1. Deployment (Spark on YARN mode on CDH)

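Download the source code and build it with one of the commands below, depending on the Spark version. Prebuilt JARs are now also available from the official site: https://dist.apache.org/repos/dist/release/carbondata/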

mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package

mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.1 clean package

Edit the carbon.properties configuration file:

############## System Configuration ##################

#Mandatory. Carbon Store path

carbon.storelocation=hdfs://hostname:8020/carbon

#Base directory for Data files

#carbon.ddl.base.hdfs.url=hdfs://hostname:8020/carbon/data

#Path where the bad records are stored

carbon.badRecords.location=/opt/test

################################################
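Before distributing the file, it can be useful to check that the keys shown above are actually set. The following is only a sketch under stated assumptions: it parses the properties text with java.util.Properties and lists the keys that are missing or blank. The object name CarbonPropsCheck and its helpers are hypothetical and are not part of CarbonData.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper, not part of CarbonData.
object CarbonPropsCheck {
  // Keys taken from the carbon.properties fragment shown above.
  val expectedKeys: Seq[String] =
    Seq("carbon.storelocation", "carbon.badRecords.location")

  // Sample text mirroring the fragment above (commented lines start with '#').
  val sample: String =
    """#Mandatory. Carbon Store path
      |carbon.storelocation=hdfs://hostname:8020/carbon
      |#carbon.ddl.base.hdfs.url=hdfs://hostname:8020/carbon/data
      |carbon.badRecords.location=/opt/test
      |""".stripMargin

  // Parse properties text with the standard Java parser.
  def parse(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  // Return the expected keys that are absent or have a blank value.
  def missingKeys(text: String): Seq[String] = {
    val props = parse(text)
    expectedKeys.filter { key =>
      val value = props.getProperty(key)
      value == null || value.trim.isEmpty
    }
  }
}
```

In spark2-shell, CarbonPropsCheck.missingKeys(CarbonPropsCheck.sample) returns an empty list for the fragment above; a commented-out or deleted key shows up in the result instead.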

Open the Cloudera Manager web UI.

Click the Spark2 service, open its Configuration tab, and type spark-default in the search box.

Then add the following settings:

spark.master=yarn-client

spark.yarn.dist.files=/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/spark/conf/carbon.properties

spark.yarn.dist.archives=/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/spark/carbonlib/carbondata.tar.gz

spark.executor.extraJavaOptions=-Dcarbon.properties.filepath=carbon.properties

spark.executor.extraClassPath= carbondata.tar.gz/carbonlib/*

spark.driver.extraClassPath=/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/spark/carbonlib/*

spark.driver.extraJavaOptions=-Dcarbon.properties.filepath=/opt/cloudera/parcels/CDH-5.10.2-1.cdh5.10.2.p0.5/lib/spark/conf/carbon.properties

spark.dynamicAllocation.enabled true

spark.shuffle.service.enabled true

spark.dynamicAllocation.minExecutors 0

spark.dynamicAllocation.maxExecutors 20

Save the changes and restart the service.
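The settings above mix the key=value form with the whitespace-separated form. As far as I know, Spark reads spark-defaults.conf with the standard Java properties parser, which accepts both. The sketch below illustrates that and adds a simple sanity check that dynamic allocation is paired with the external shuffle service. SparkDefaultsCheck and its methods are hypothetical names used only for illustration.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper for sanity-checking spark-defaults style text.
object SparkDefaultsCheck {
  // Parse "key=value" and "key value" lines into an immutable Map.
  def parse(text: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    var result = Map.empty[String, String]
    val names = props.stringPropertyNames().iterator()
    while (names.hasNext) {
      val key = names.next()
      result += (key -> props.getProperty(key).trim)
    }
    result
  }

  // Dynamic allocation, as configured above, relies on the shuffle service.
  def dynamicAllocationOk(conf: Map[String, String]): Boolean = {
    val dynamic = conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean
    val shuffle = conf.getOrElse("spark.shuffle.service.enabled", "false").toBoolean
    !dynamic || shuffle
  }
}
```

For example, SparkDefaultsCheck.parse("spark.master=yarn-client\nspark.dynamicAllocation.enabled true") yields a map with both keys, regardless of which separator was used.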

2. Startup:

spark2-shell

3. Usage

3.1 Create the session context

import org.apache.carbondata.core.util.CarbonProperties

import org.apache.carbondata.core.constants.CarbonCommonConstants

CarbonProperties.getInstance().addProperty(CarbonCommonConstants.LOCK_TYPE, "HDFSLOCK")

import org.apache.spark.sql.SparkSession

import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://hostname:8020/carbon")

3.2 Create a table

// The column list must be closed with ")" before the "stored by" clause.
carbon.sql("create table if not exists log_T (IP string," +
  "website string," +
  "cnid string," +
  "uid string," +
  "version string) " +
  "stored by 'carbondata'")
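Hand-concatenated DDL strings easily pick up a stray comma or lose a parenthesis. One way to avoid that is to generate the column list from data. This is a minimal sketch, assuming the same table and columns as above; CarbonDdl and createTable are hypothetical names, not a CarbonData API.

```scala
// Hypothetical helper that assembles the CREATE TABLE statement used above.
object CarbonDdl {
  // Columns of log_T as (name, type) pairs, taken from the statement above.
  val logTColumns: Seq[(String, String)] = Seq(
    "IP" -> "string",
    "website" -> "string",
    "cnid" -> "string",
    "uid" -> "string",
    "version" -> "string"
  )

  // Join the columns with commas so no trailing comma can slip in,
  // and always close the column list before the "stored by" clause.
  def createTable(table: String, columns: Seq[(String, String)]): String = {
    require(columns.nonEmpty, "at least one column is required")
    val cols = columns.map { case (name, tpe) => name + " " + tpe }.mkString(", ")
    "create table if not exists " + table + " (" + cols + ") stored by 'carbondata'"
  }
}
```

The result can be passed straight to the session, for example carbon.sql(CarbonDdl.createTable("log_T", CarbonDdl.logTColumns)).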

3.3 Load data

carbon.sql("load data inpath '/carbon/carbontest.gz' " +
  "into table log_T " +
  "options('header'='false'," +
  "'fileheader'='ip,website,datestr,timestr,cnid,uid,version'," +
  "'DELIMITER'='\t'," +
  "'MULTILINE'='true'," +
  "'ESCAPECHAR'='\t'," +
  "'SKIP_EMPTY_LINE'='TRUE'," +
  "'SINGLE_PASS'='TRUE')")
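The options clause is just a comma-separated list of quoted key/value pairs, so it can be assembled from a sequence instead of being concatenated by hand. The sketch below only builds the SQL text; CarbonLoad, fileHeader and loadData are hypothetical helper names, and the option names are the ones used in the statement above.

```scala
// Hypothetical helper that assembles a LOAD DATA statement.
object CarbonLoad {
  // Join CSV column names into the value expected by 'fileheader'.
  def fileHeader(columns: Seq[String]): String = columns.mkString(",")

  // Render each option as 'key'='value' and join them with commas.
  def loadData(path: String, table: String, options: Seq[(String, String)]): String = {
    val opts = options.map { case (k, v) => "'" + k + "'='" + v + "'" }.mkString(", ")
    "load data inpath '" + path + "' into table " + table + " options(" + opts + ")"
  }
}
```

A call such as CarbonLoad.loadData("/carbon/carbontest.gz", "log_T", Seq("header" -> "false", "fileheader" -> CarbonLoad.fileHeader(Seq("ip", "website")))) produces a string that can be handed to carbon.sql.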

Writing from a DataFrame:

import org.apache.spark.sql.SaveMode

df.write.format("carbondata").option("tableName", "log_T").mode(SaveMode.Overwrite).save()

3.4 Query:

carbon.sql("select * from log_T")

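3.5 Update and delete

Update one column (CarbonData expects both the column list and the expression list in parentheses):

UPDATE table1 log_T
SET (log_T.REVENUE) = (log_T.REVENUE - 10) WHERE log_T.PRODUCT = 'uid'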

Update two columns in table1:

UPDATE table1 log_T
SET (log_T.PRODUCT, log_T.REVENUE) =
(
SELECT PRODUCT, REVENUE
FROM table2 log_B
WHERE log_B.CITY = log_T.CITY AND log_B.BROKER = log_T.BROKER
)
WHERE log_T.DATE BETWEEN '2017-01-01' AND '2017-01-31'

3.6 Delete

carbon.sql("DELETE FROM table1 log_T WHERE log_T.CUSTOMERID = '123'")

scala> carbon.sql("select * from dtwave_dev.carbon_tablename_new").show
+----+-----------+
|name|PhoneNumber|
+----+-----------+
|   1|          2|
|   3|          4|
+----+-----------+

scala> carbon.sql("delete from dtwave_dev.carbon_tablename_new a WHERE a.name='1'")

scala> carbon.sql("select * from dtwave_dev.carbon_tablename_new").show
+----+-----------+
|name|PhoneNumber|
+----+-----------+
|   3|          4|
+----+-----------+

carbon.sql("update dtwave_dev.carbon_tablename_new A SET (A.name) = (A.name) WHERE A.PhoneNumber = '4'")

carbon.sql("UPDATE dtwave_dev.carbon_tablename_new a SET (a.name, a.PhoneNumber) = ( SELECT '5' as name ,'6' from dtwave_dev.carbon_tablename_new b)")

carbon.sql("UPDATE dtwave_dev.carbon_tablename_new a SET (a.name, a.PhoneNumber) = ( SELECT '5' as name ,'6' as PhoneNumber)")

4. Corresponding files on HDFS

4.1 HDFS directory:

/carbon/default/log_tab

Permission  Owner  Group  Size  Last Modified                   Replication  Block Size  Name
drwxr-xr-x  root   spark  0 B   Fri Mar 30 19:38:11 +0800 2018  0            0 B
drwxr-xr-x  root   spark  0 B   Fri Mar 30 19:39:05 +0800 2018  0            0 B

4.2 Data files

/carbon/default/log_tab/Fact/Part0/Segment_0

4.3 Metadata files

/carbon/carbonstore/dtwave_dev/carbon_tablename_new/Metadata
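The paths above follow a regular pattern: store location, database, table, then either Fact/Part0/Segment_N for data or Metadata for the schema files. The sketch below simply rebuilds those paths from their parts. It is inferred only from the two examples in this section, and CarbonPaths is a hypothetical helper rather than a CarbonData class.

```scala
// Hypothetical helper reproducing the directory layout shown above.
object CarbonPaths {
  // <store>/<database>/<table>, tolerating a trailing slash on the store path.
  def tableDir(store: String, db: String, table: String): String = {
    val base = store.stripSuffix("/")
    base + "/" + db + "/" + table
  }

  // Data files of one load live under Fact/Part0/Segment_<n>.
  def segmentDir(store: String, db: String, table: String, segment: Int): String =
    tableDir(store, db, table) + "/Fact/Part0/Segment_" + segment

  // Schema and table status files live under Metadata.
  def metadataDir(store: String, db: String, table: String): String =
    tableDir(store, db, table) + "/Metadata"
}
```

For instance, CarbonPaths.segmentDir("/carbon", "default", "log_tab", 0) gives the data path listed in 4.2.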
