Setting Up and Using an HBase Cluster

疯流小子

Category: HBase. Tags: cluster, hbase, hadoop, distributed storage, zk

Article link: https://blog.csdn.net/wo890726/article/details/41021951

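1. Introduction to HBase

HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. HBase uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive amounts of data stored in HBase, and ZooKeeper as its coordination service.

Every table in HBase is a so-called BigTable, that is, a sparse table.

RowKey and ColumnKey are binary values (byte[]) and are sorted lexicographically;

Timestamp is a 64-bit integer;

value is an uninterpreted byte array (byte[]).

Different rows of a table can have different numbers of members; in other words, HBase supports a "dynamic schema" model.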
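The logical model described above can be pictured as a sorted map of maps: row, then column, then timestamp, then value. The following is a minimal sketch in plain Java and not HBase's actual implementation. The class name SparseTableSketch and its methods are hypothetical, and String keys stand in for the byte[] keys that HBase really uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical illustration of the logical model: row -> column -> timestamp -> value.
class SparseTableSketch {
    // Rows are kept sorted by key; a row holds only the columns that were written to it.
    private final TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> rows =
            new TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>>();

    void put(String row, String column, long timestamp, byte[] value) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) {
            columns = new TreeMap<String, TreeMap<Long, byte[]>>();
            rows.put(row, columns);
        }
        TreeMap<Long, byte[]> versions = columns.get(column);
        if (versions == null) {
            // Reverse order so that the newest timestamp comes first.
            versions = new TreeMap<Long, byte[]>(Comparator.<Long>reverseOrder());
            columns.put(column, versions);
        }
        versions.put(timestamp, value);
    }

    // Returns the newest value of a cell, or null if the cell was never written.
    byte[] getLatest(String row, String column) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    // Number of columns actually stored for a row (0 for an unknown row).
    int columnCount(String row) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(row);
        return columns == null ? 0 : columns.size();
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("xiaoming", "info:age", 1L, "29".getBytes(StandardCharsets.UTF_8));
        table.put("xiaoming", "info:age", 2L, "30".getBytes(StandardCharsets.UTF_8));
        table.put("xiaoming", "address:city", 1L, "beijing".getBytes(StandardCharsets.UTF_8));
        table.put("zhangsan", "info:age", 1L, "25".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(table.getLatest("xiaoming", "info:age"), StandardCharsets.UTF_8));
        System.out.println(table.columnCount("xiaoming") + " vs " + table.columnCount("zhangsan"));
    }
}
```

The two rows in main carry different numbers of columns, which is what "sparse" means here: a missing cell takes no space and simply reads as absent.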

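2. Data model: rows

Strings, integers, binary strings, and even serialized structures can all be used as row keys.

A table keeps its rows ordered by row key, compared byte by byte.

Table data is very sparse: the number of columns can differ greatly from one row to another.

A lock can be taken on a single row.

Writes to a row are always atomic.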
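Byte-by-byte ordering has a practical consequence for numeric row keys, which the sketch below tries to show. It is plain Java with no HBase dependency; the class RowKeyOrderSketch and the sample keys are made up for illustration, and the comparator treats each byte as an unsigned value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper that sorts keys the way a byte-by-byte comparison would.
class RowKeyOrderSketch {
    // Compares two keys byte by byte, treating every byte as unsigned (0..255).
    static final Comparator<byte[]> BYTE_ORDER = new Comparator<byte[]>() {
        @Override
        public int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int x = a[i] & 0xff;
                int y = b[i] & 0xff;
                if (x != y) {
                    return x - y;
                }
            }
            // All shared bytes are equal: the shorter key (a prefix) sorts first.
            return a.length - b.length;
        }
    };

    static String[] sortKeys(String... keys) {
        byte[][] raw = new byte[keys.length][];
        for (int i = 0; i < keys.length; i++) {
            raw[i] = keys[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(raw, BYTE_ORDER);
        String[] sorted = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            sorted[i] = new String(raw[i], StandardCharsets.UTF_8);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Unpadded numbers do not sort numerically: prints [row1, row10, row2]
        System.out.println(Arrays.toString(sortKeys("row2", "row10", "row1")));
        // Zero-padded numbers do: prints [row01, row02, row10]
        System.out.println(Arrays.toString(sortKeys("row02", "row10", "row01")));
    }
}
```

If rows should come back in numeric order during a scan, padding the numeric part of the key to a fixed width is one common way to get there.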

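3. Data model: columns

Columns must be defined under a family.

Every column has the form

"family:label"

where both the family and the label can be arbitrary strings.

Data of the same family is physically stored together.

Versions of the data are distinguished by timestamp.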
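To make the "family:label" naming concrete, here is a small sketch that splits a column name at its first colon and groups columns by family, mirroring the idea that data of one family lives together. ColumnNameSketch and its methods are hypothetical helpers written for this article, not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for working with "family:label" column names.
class ColumnNameSketch {
    // Splits at the first colon; a name without a colon is treated as a bare family.
    static String[] split(String column) {
        int i = column.indexOf(':');
        if (i < 0) {
            return new String[] { column, "" };
        }
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    // Groups the labels of the given columns under their family name.
    static Map<String, List<String>> groupByFamily(List<String> columns) {
        Map<String, List<String>> byFamily = new TreeMap<String, List<String>>();
        for (String column : columns) {
            String[] parts = split(column);
            List<String> labels = byFamily.get(parts[0]);
            if (labels == null) {
                labels = new ArrayList<String>();
                byFamily.put(parts[0], labels);
            }
            labels.add(parts[1]);
        }
        return byFamily;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("info:age", "address:city", "info:name");
        // Prints {address=[city], info=[age, name]}
        System.out.println(groupByFamily(columns));
    }
}
```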

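4. Tables

A table stores data and is made up of rows and columns.

Data model

1. Row Key: the primary key of the table; the records in a table are sorted by Row Key.

2. Timestamp: the timestamp of each data operation; it can be regarded as the version number of the data.

3. Column Family: horizontally, a table consists of one or more Column Families, and a Column Family can contain any number of Columns. Column Families therefore support dynamic extension: the number and types of Columns do not have to be defined in advance. All Columns are stored in binary form, so users must convert types themselves.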
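Because every value is stored as raw bytes, the writer and the reader have to agree on the encoding. The sketch below uses java.nio.ByteBuffer from the standard library as a stand-in for that conversion step; in real client code the Bytes utility used later in this article plays this role. The class name CellCodecSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical codec showing that the same logical value can have different byte forms.
class CellCodecSketch {
    // Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    static long decodeLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asText = encodeString("29");   // 2 bytes: the characters '2' and '9'
        byte[] asLong = encodeLong(29L);      // 8 bytes: a binary number
        System.out.println(asText.length + " bytes as text, " + asLong.length + " bytes as long");
        System.out.println(decodeString(asText) + " / " + decodeLong(asLong));
    }
}
```

The store cannot tell these two encodings apart, so a cell written as text has to be read back as text, and a cell written as a binary long has to be read back as a long.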

5. Setting up the cluster

5.1 Extract and rename

tar -zxvf hbase-0.94.2-security.tar.gz

mv hbase-0.94.2-security hbase

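5.2 Edit /etc/profile

#vi /etc/profile

Add the following. HBASE_HOME must point to the renamed hbase directory and must be the same path on every node (note that step 5.6 copies the directory to /usr/local/ on the slave nodes):

export HBASE_HOME=/home/hbase

Change PATH to:

export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HBASE_HOME/bin

Save and exit, then reload the profile:

#source /etc/profile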

5.3 Edit $HBASE_HOME/conf/hbase-env.sh

export JAVA_HOME=/usr/local/jdk

export HBASE_MANAGES_ZK=false

Save and exit.

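5.4 Edit $HBASE_HOME/conf/hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <!-- the value is missing from the original post; a fully distributed cluster requires true -->
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slaves1,slaves2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <!-- the value is missing from the original post; use the replication factor of your HDFS cluster -->
  </property>
</configuration>

Note: the host and port of hbase.rootdir in $HBASE_HOME/conf/hbase-site.xml must match the host and port of fs.default.name in $HADOOP_HOME/conf/core-site.xml.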
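The host and port rule above is easy to check mechanically. The following sketch compares the two settings with java.net.URI; the class RootDirCheckSketch and the sample values are assumptions for illustration, and the two strings would normally be read from hbase-site.xml and core-site.xml.

```java
import java.net.URI;

// Hypothetical check that hbase.rootdir and fs.default.name point at the same NameNode.
class RootDirCheckSketch {
    // True when both URIs carry the same host (case-insensitive) and the same port.
    static boolean sameNameNode(String hbaseRootDir, String fsDefaultName) {
        URI root = URI.create(hbaseRootDir);
        URI fs = URI.create(fsDefaultName);
        return root.getHost() != null
                && root.getHost().equalsIgnoreCase(fs.getHost())
                && root.getPort() == fs.getPort();
    }

    public static void main(String[] args) {
        // Matching host and port
        System.out.println(sameNameNode("hdfs://master:9000/hbase", "hdfs://master:9000"));
        // Same host, different port
        System.out.println(sameNameNode("hdfs://master:9000/hbase", "hdfs://master:8020"));
    }
}
```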

5.5 Add the following to $HBASE_HOME/conf/regionservers

slaves1

slaves2

Save and exit.

5.6 Copy the hbase directory to the slave nodes

scp -r hbase slaves1:/usr/local/

scp -r hbase slaves2:/usr/local/

5.7 Copy /etc/profile to the slave nodes

scp /etc/profile slaves1:/etc/

scp /etc/profile slaves2:/etc/

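5.8 Start Hadoop first, then start HBase

Because HBASE_MANAGES_ZK is set to false, the ZooKeeper ensemble must also be running before HBase is started.

#cd $HBASE_HOME/bin

#./start-hbase.sh

To stop HBase:

#cd $HBASE_HOME/bin

#./stop-hbase.sh

5.9 List the HDFS root directory; a new hbase directory should now appear under it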

#hadoop fs -ls /

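5.10 Check the running processes with jps

An HMaster process should be listed on the master node and an HRegionServer process on each slave node.

5.11 Web Console

http://hadoop0:60010/master-status

5.12 HBase provides a shell for interacting with the cluster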

#$HBASE_HOME/bin/hbase shell

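Name	Command
Create a table	create 'table name', 'column family 1','column family 2','column family N'
Add a record	put 'table name', 'row key', 'column family:column', 'value'
View a record	get 'table name', 'row key'
Count the records in a table	count 'table name'
Delete a record	delete 'table name', 'row key', 'column family:column'
Delete a table	The table must be disabled before it can be dropped: first disable 'table name', then drop 'table name'
View all records	scan 'table name'
View all data in one column of a table	scan 'table name', {COLUMNS=>'column family:column'}
Update a record	Put the value again; the new value overwrites the old one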

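5.13 DDL operations in the HBase shell

Create a table

>create 'users','user_id','address','info'

The table users has three column families: user_id, address, and info.

§ List all tables

>list

§ Get the description of a table

>describe 'users'

§ Create a table

>create 'users_tmp','user_id','address','info'

§ Drop a table

>disable 'users_tmp'

>drop 'users_tmp'

5.14 DML operations in the HBase shell

Add records (with put, using the syntax listed in 5.12)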

5.15 Get a record

1. Get all data for one id

>get 'users','xiaoming'

2. Get all data in one column family for one id

>get 'users','xiaoming','info'

3. Get all data in one column of one column family for one id

>get 'users','xiaoming','info:age'

5.16 Update a record

>put 'users','xiaoming','info:age' ,'29'

>get 'users','xiaoming','info:age'

>put 'users','xiaoming','info:age' ,'30'

>get 'users','xiaoming','info:age'

§ Get several versions of a cell

>get 'users','xiaoming',{COLUMN=>'info:age',VERSIONS=>1}

>get 'users','xiaoming',{COLUMN=>'info:age',VERSIONS=>2}

>get 'users','xiaoming',{COLUMN=>'info:age',VERSIONS=>3}

§ Get one specific version of a cell

>get 'users','xiaoming',{COLUMN=>'info:age',TIMESTAMP=>1364874937056}
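The version queries above can be modeled with a map ordered newest-first. This is a simplified sketch in plain Java, not how HBase stores versions internally: VersionedCellSketch, its maxVersions parameter, and the timestamps in main are all hypothetical, and old versions are dropped immediately on write for simplicity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one cell that keeps a bounded number of timestamped versions.
class VersionedCellSketch {
    private final int maxVersions;
    // Sorted by timestamp, newest first.
    private final TreeMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // An "update" is just another put with a newer timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in reverse order is the oldest version
        }
    }

    // Like VERSIONS=>n: returns up to n values, newest first.
    List<String> get(int n) {
        List<String> out = new ArrayList<String>();
        for (String value : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    // Like TIMESTAMP=>ts: returns the value written at exactly that timestamp, or null.
    String getAt(long timestamp) {
        return versions.get(timestamp);
    }

    public static void main(String[] args) {
        VersionedCellSketch age = new VersionedCellSketch(3);
        age.put(100L, "29");
        age.put(200L, "30");
        System.out.println(age.get(1));     // prints [30]
        System.out.println(age.get(2));     // prints [30, 29]
        System.out.println(age.getAt(100L)); // prints 29
    }
}
```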

§ Full table scan

>scan 'users'

5.17 Delete the 'info:age' field of row xiaoming

>delete 'users','xiaoming','info:age'

>get 'users','xiaoming'

§ Delete an entire row

>deleteall 'users','xiaoming'

§ Count the rows in a table

>count 'users'

§ Truncate a table

>truncate 'users'

5.18 The HBase Java API

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTestCase {

    // Configuration needed by every HBase operation
    private static Configuration getConfiguration() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", "hdfs://hadoop0:9000/hbase");
        // Required when running from Eclipse; otherwise the cluster cannot be located
        conf.set("hbase.zookeeper.quorum", "hadoop0");
        return conf;
    }

    // Create a table with one column family
    public static void create(String tableName, String columnFamily) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(getConfiguration());
        try {
            if (admin.tableExists(tableName)) {
                System.out.println("table exists!");
            } else {
                HTableDescriptor tableDesc = new HTableDescriptor(tableName);
                tableDesc.addFamily(new HColumnDescriptor(columnFamily));
                admin.createTable(tableDesc);
                System.out.println("create table success!");
            }
        } finally {
            admin.close();
        }
    }

    // Add one record
    public static void put(String tableName, String row, String columnFamily, String column, String data) throws IOException {
        HTable table = new HTable(getConfiguration(), tableName);
        try {
            Put p1 = new Put(Bytes.toBytes(row));
            p1.add(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(data));
            table.put(p1);
            System.out.println("put '" + row + "','" + columnFamily + ":" + column + "','" + data + "'");
        } finally {
            table.close();
        }
    }

    // Read one record
    public static void get(String tableName, String row) throws IOException {
        HTable table = new HTable(getConfiguration(), tableName);
        try {
            Get get = new Get(Bytes.toBytes(row));
            Result result = table.get(get);
            System.out.println("Get: " + result);
        } finally {
            table.close();
        }
    }

    // Print all data in the table
    public static void scan(String tableName) throws IOException {
        HTable table = new HTable(getConfiguration(), tableName);
        try {
            Scan scan = new Scan();
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result result : scanner) {
                    System.out.println("Scan: " + result);
                }
            } finally {
                scanner.close();
            }
        } finally {
            table.close();
        }
    }

    // Delete a table (it must be disabled first)
    public static void delete(String tableName) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(getConfiguration());
        try {
            if (admin.tableExists(tableName)) {
                try {
                    admin.disableTable(tableName);
                    admin.deleteTable(tableName);
                    // Report success only when both calls completed
                    System.out.println("Delete " + tableName + " succeeded");
                } catch (IOException e) {
                    e.printStackTrace();
                    System.out.println("Delete " + tableName + " failed");
                }
            }
        } finally {
            admin.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String tableName = "hbase_tb";
        String columnFamily = "cf";
        HBaseTestCase.create(tableName, columnFamily);
        HBaseTestCase.put(tableName, "row1", columnFamily, "cl1", "data");
        HBaseTestCase.get(tableName, "row1");
        HBaseTestCase.scan(tableName);
        HBaseTestCase.delete(tableName);
    }
}

5.19 Bulk import into HBase with MapReduce

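import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// The original omits the enclosing class; the name HBaseBatchImport is assumed here (taken from the job name).
public class HBaseBatchImport {

    static class BatchImportMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        SimpleDateFormat dateformat1 = new SimpleDateFormat("yyyyMMddHHmmss");
        Text v2 = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws java.io.IOException, InterruptedException {
            final String[] splited = value.toString().split("\t");
            try {
                // Field 0 is parsed as a millisecond timestamp; the row key is field 1 plus the formatted time
                final Date date = new Date(Long.parseLong(splited[0].trim()));
                final String dateFormat = dateformat1.format(date);
                String rowKey = splited[1] + ":" + dateFormat;
                // Prepend the row key to the original line
                v2.set(rowKey + "\t" + value.toString());
                context.write(key, v2);
            } catch (NumberFormatException e) {
                // Count malformed lines instead of failing the job
                final Counter counter = context.getCounter("BatchImport", "ErrorFormat");
                counter.increment(1L);
                System.out.println("Error: " + splited[0] + " " + e.getMessage());
            }
        }
    }

    static class BatchImportReducer extends TableReducer<LongWritable, Text, NullWritable> {
        @Override
        protected void reduce(LongWritable key, java.lang.Iterable<Text> values, Context context) throws java.io.IOException, InterruptedException {
            for (Text text : values) {
                final String[] splited = text.toString().split("\t");
                // splited[0] is the row key built by the mapper
                final Put put = new Put(Bytes.toBytes(splited[0]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("date"), Bytes.toBytes(splited[1]));
                // Other fields are omitted; add them with further put.add(....) calls
                context.write(NullWritable.get(), put);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final Configuration configuration = new Configuration();
        // Set the ZooKeeper quorum
        configuration.set("hbase.zookeeper.quorum", "hadoop0");
        // Set the name of the target HBase table
        configuration.set(TableOutputFormat.OUTPUT_TABLE, "wlan_log");
        // Raise this value so that HBase does not time out and exit
        configuration.set("dfs.socket.timeout", "180000");

        final Job job = new Job(configuration, "HBaseBatchImport");
        // Lets Hadoop find the job jar when the job is submitted to a cluster
        job.setJarByClass(HBaseBatchImport.class);
        job.setMapperClass(BatchImportMapper.class);
        job.setReducerClass(BatchImportReducer.class);
        // Set the map output types; the reduce output types are not set
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        // No output path is set; the output format class is set instead
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.setInputPaths(job, "hdfs://hadoop0:9000/input");
        job.waitForCompletion(true);
    }
}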
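The row key built in the mapper can be tried out without a cluster. The sketch below repeats that step in plain Java: it takes a tab-separated line, parses the first field as a millisecond timestamp, and joins the second field with the formatted time. RowKeySketch and the sample line are hypothetical, and the time zone is pinned to UTC here only to make the result reproducible (the mapper above uses the JVM default time zone).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical stand-alone version of the row key construction used in the mapper.
class RowKeySketch {
    // Builds "<field 1>:<yyyyMMddHHmmss of field 0>" from a tab-separated line.
    static String buildRowKey(String line) {
        String[] fields = line.split("\t");
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC for reproducibility
        Date date = new Date(Long.parseLong(fields[0].trim()));
        return fields[1] + ":" + format.format(date);
    }

    public static void main(String[] args) {
        // Timestamp 0 is 1970-01-01 00:00:00 UTC; prints 13800000000:19700101000000
        System.out.println(buildRowKey("0\t13800000000\tother fields"));
        try {
            buildRowKey("not-a-number\t13800000000");
        } catch (NumberFormatException e) {
            // The mapper counts such lines with a counter instead of failing.
            System.out.println("bad timestamp: " + e.getMessage());
        }
    }
}
```

Putting the identifier first and the time second keeps all records of one identifier adjacent in the table and ordered by time, since rows are sorted by key.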
