The pros and cons of bulk loading have already been analyzed in detail in another article on this blog, so they will not be repeated here:
HBase Bulk Loading vs. the HBase API: Analysis and Comparison
The bulk loading process consists of two parts: data generation and data loading.
Let's first walk through how a bulk load is executed, and then analyze the code. Below is a complete shell script that performs a bulk load:
CLASSPATH=./bulkload.jar:/etc/yarn1/conf:/etc/hdfs1/conf:/etc/hbase/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/contrib/capacity-scheduler/*.jar
TABLE_NAME=CZSJ_hbase
RECORD_COUNT_FILE=/tmp/data1     # input path passed to HFileGenerator
RECORDS_PER_REGION=/tmp/output   # HDFS directory for the generated HFiles (loaded, then removed below)
echo "Start bulk load at `date`"
start=`date +%s`
echo "create hbase table=========="
# Drop and recreate the table, pre-split according to the keys in region_start_row_info
hbase shell << EOF
disable 'CZSJ_hbase'
drop 'CZSJ_hbase'
create 'CZSJ_hbase', 'cf', {SPLITS_FILE => 'region_start_row_info'}
EOF
jars=`ls ./lib`
for jar in $jars
do
CLASSPATH="$CLASSPATH:./lib/$jar"
done
CLASSPATH=/usr/lib/hadoop/conf:/usr/lib/hbase/conf:$CLASSPATH
# Step 1: generate the HFiles
java -cp $CLASSPATH com.learn.hbasebulkload.HFileGenerator $TABLE_NAME $RECORD_COUNT_FILE $RECORDS_PER_REGION
# Step 2: load the generated HFiles into the table
java -cp $CLASSPATH com.learn.hbasebulkload.HFileLoader $TABLE_NAME $RECORDS_PER_REGION
hdfs dfs -rm -r $RECORDS_PER_REGION   # clean up the HFile output directory
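One detail worth noting: the `SPLITS_FILE => 'region_start_row_info'` option pre-splits the table at creation time, so the generated HFiles are distributed across regions instead of all landing in a single one. The splits file simply lists one region start rowkey per line. Below is a minimal sketch of producing such a file; the zero-padded numeric key format, the total key count, and the region count are assumptions for illustration, not taken from the original job:

```shell
#!/bin/sh
# Sketch: generate a splits file with one start rowkey per line.
# Assumes rowkeys are zero-padded 10-digit numbers covering TOTAL keys;
# REGIONS regions need REGIONS-1 split points at even intervals.
TOTAL=1000000
REGIONS=4
SPLITS_FILE=region_start_row_info

: > "$SPLITS_FILE"          # truncate/create the file
i=1
while [ "$i" -lt "$REGIONS" ]; do
    # i-th split point: TOTAL * i / REGIONS, zero-padded to 10 digits
    printf '%010d\n' $((TOTAL * i / REGIONS)) >> "$SPLITS_FILE"
    i=$((i + 1))
done

cat "$SPLITS_FILE"          # 0000250000, 0000500000, 0000750000
```

Even splits like this only balance the load if the rowkeys themselves are uniformly distributed; for skewed keys the split points should instead be sampled from the actual data.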