Apache CarbonData Quick Start Guide

How to Use It?

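CarbonData is a columnar storage file format for Apache Hadoop, developed and open-sourced by Huawei. It supports indexing, compression, and encoding/decoding.
Its goal is to let a single copy of the data serve multiple kinds of workloads while enabling faster interactive queries.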

Follow the steps in the CarbonData Quick Start.

  • Put the *.csv file into HDFS, for example:
$ cd carbondata
Create a sample.csv file in this directory (sample contents are shown in the notes below), then put it into HDFS, for example at 'hdfs://presto00:9000/carbon/sample.csv'.
  • Start Spark (a standalone master and one worker), for example:
$ ./sbin/start-master.sh
$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://presto00:7077
  • Start spark-shell with the CarbonData jar, for example:
$ ./bin/spark-shell --jars ../carbondata-1.2.0/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.3.jar --executor-memory 6G

Note: --executor-memory 6G sets the Java heap size of each executor. If the data to load is small, you can omit it.

  • Run the following Scala statements in spark-shell:
$ import org.apache.spark.sql.SparkSession
$ import org.apache.spark.sql.CarbonSession._
$ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://presto00:9000/carbon/db")
$ carbon.sql("CREATE TABLE IF NOT EXISTS test(id string, name string, city string, age Int) STORED BY 'carbondata'")
$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('DELIMITER'=',', 'FILEHEADER'='id,name,city,age')")

Note:
1. /carbon/db is the HDFS store path where the tables are stored.
2. CREATE TABLE defines the columns and their types.
3. 'DELIMITER'=',' or 'DELIMITER'='\t' specifies the field separator used in the *.csv file.
4. The LOAD DATA options depend on whether the csv file has a header line. If the file starts with a header, like:

id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

run:

$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test")

If the file has no header line, like:

1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

run:

$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('FILEHEADER'='id,name,city,age')")
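The first step above, creating sample.csv and putting it into HDFS, might look like the following sketch. The rows are the sample data shown above. The `hdfs dfs` upload is an assumption about how the file reaches HDFS (the original does not give the command), and it is skipped when no `hdfs` client is on the PATH.

```shell
# Write the sample rows (with a header line) to sample.csv in the current directory.
cat > sample.csv << 'EOF'
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

# Assumption: upload with the standard HDFS client. presto00:9000 is the
# NameNode address used throughout this guide; adjust it for your cluster.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p hdfs://presto00:9000/carbon
  hdfs dfs -put -f sample.csv hdfs://presto00:9000/carbon/sample.csv
fi
```

Because this file keeps its header line, it matches the LOAD DATA call that passes no FILEHEADER option.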

More Usage

  • If the fields in the file are separated by '\t', like:

    1 david shenzhen 31
    2 eason shenzhen 27
    3 jarry wuhan 35

  • then you must run:

$ carbon.sql("CREATE TABLE IF NOT EXISTS test(id string, name string, age Int) STORED BY 'carbondata'")
$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('DELIMITER'='\t','FILEHEADER'='id,name,city,age')")

Note: CREATE TABLE does not need to contain every column of the file, but LOAD DATA must list all of the file's columns in FILEHEADER. See the Programming Guide for more.
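A tab-separated input like the one in this example could be produced as sketched below. The file name sample_tab.csv is hypothetical and not from the original guide. The rows are the same sample data, with commas replaced by tabs and no header line.

```shell
# Turn comma-separated rows into tab-separated rows (no header line).
# sample_tab.csv is a hypothetical name chosen for this sketch.
printf '%s\n' \
  '1,david,shenzhen,31' \
  '2,eason,shenzhen,27' \
  '3,jarry,wuhan,35' | tr ',' '\t' > sample_tab.csv

# Print the number of tab-separated fields in each row as a sanity check.
awk -F '\t' '{ print NF }' sample_tab.csv
```

Each row should report four fields, one for each name listed in FILEHEADER. The file then needs to be uploaded to the HDFS path that the LOAD DATA statement reads.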

If you have any questions, leave a comment below.