Hadoop and HBase usage notes

=======================================================================
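I'm writing these down because when I first learned them I had every command memorized, but after a stretch of not using them I had pretty much forgotten them all again…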

=======================================================================

Environment Variables

These are the environment variables configured on my system.

# java Environment variables
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el7_6.x86_64

# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:/usr/local/hbase/bin


export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"

# HBase Environment Variables
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$PATH
export PATH=$HBASE_HOME/lib:$PATH

# Solr Environment Variables
export SOLR_HOME=/usr/local/solr
export SOLR_INSTALL=$SOLR_HOME
export PATH=$SOLR_HOME/bin:$PATH
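
The configuration above ends up adding /usr/local/hbase/bin to PATH twice (once in the Hadoop block and once in the HBase block). That is harmless, but a small helper can keep PATH free of duplicates. This is only a sketch; path_prepend is a name I made up for illustration and is not part of Hadoop or HBase.

```shell
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name, not part of Hadoop or HBase.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;        # otherwise put it at the front
    esac
}

# Example: calling it twice adds the directory exactly once.
#   path_prepend /usr/local/hbase/bin
#   path_prepend /usr/local/hbase/bin
```

With this in the profile, the export PATH lines could be replaced by path_prepend calls followed by a single export PATH.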

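I. Startup

Because I added the directories of the executables to PATH, I can start everything directly without changing into a specific directory.

1 Start Hadoop first

Hadoop includes an implementation of a distributed file system (HDFS) and acts as a platform. HBase and Hive are both built on top of it.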

[hadoop@localhost ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [account.jetbrains.com]

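2 Start HBase

HBase is a distributed database. HBase cannot provide distributed storage on its own, but with Hadoop underneath it can.
Here HBase stores its data on Hadoop, and I use the ZooKeeper instance that ships with HBase.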

[hadoop@localhost ~]$ start-hbase.sh
localhost: running zookeeper, logging to /usr/local/hbase/logs/hbase-hadoop-zookeeper-localhost.localdomain.out
running master, logging to /usr/local/hbase/logs/hbase-hadoop-master-localhost.localdomain.out
: running regionserver, logging to /usr/local/hbase/logs/hbase-hadoop-regionserver-localhost.localdomain.out


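The HRegionServer process

HMaster tracks the state of each HRegionServer through ZooKeeper.
When an HRegionServer comes online, it first creates its own file in the server directory in ZooKeeper and takes an exclusive lock on that file.
Because HMaster subscribes to the server directory, ZooKeeper notifies it in real time whenever a file there is added or deleted, so HMaster learns immediately when an HRegionServer comes online.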

http://www.cnblogs.com/yanzibuaa/p/7521624.html

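3 Start the YARN daemons

Apache Hadoop YARN (Yet Another Resource Negotiator) is the newer Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it, which benefits cluster utilization, unified resource management, and data sharing.

https://baike.baidu.com/item/yarn/16075826?fr=aladdin
Source: Baidu Baike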

[hadoop@localhost ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Verification

Check that the Hadoop, HBase, and YARN processes started properly.

[hadoop@localhost ~]$ jps
3872 NameNode             # Hadoop NameNode
4738 HQuorumPeer          # ZooKeeper process bundled with HBase
4885 HMaster              # HBase master
4266 SecondaryNameNode    # Hadoop secondary NameNode
5723 NodeManager          # YARN node manager
6267 Jps
5052 HRegionServer        # HBase region server
4013 DataNode             # Hadoop DataNode
5582 ResourceManager      # YARN resource manager
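
Reading the jps list by eye gets tedious, so here is a rough sketch of a check that reports which of the daemons listed above are missing. missing_daemons is a hypothetical helper name, and the daemon list simply mirrors my single-node setup; adjust it for a different layout.

```shell
# Print the expected daemons that do not appear in jps output read from stdin.
# missing_daemons is a hypothetical helper; the list mirrors the jps output above.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode \
                  HQuorumPeer HMaster HRegionServer \
                  ResourceManager NodeManager; do
        # jps prints "<pid> <class name>"; compare the second field exactly
        # so that NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_output" |
             awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Typical use on a running node:
#   jps | missing_daemons
```

An empty result means every expected process is present.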

II. Usage

1 Using the HBase shell

1.1 Entering the shell
[hadoop@localhost ~]$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.0.5, r76458dd074df17520ad451ded198cd832138e929, Mon Mar 18 00:41:49 UTC 2019
Took 0.0042 seconds                                                                                                                               
hbase(main):001:0> 

1.2 Using the help command
hbase(main):001:0> help
HBase Shell, version 2.0.5, r76458dd074df17520ad451ded198cd832138e929, Mon Mar 18 00:41:49 UTC 2019
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:             # lists all command groups
  Group name: general       # name of the command group
  Commands: processlist, status, table_help, version, whoami        # all commands in this group

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
............
..............
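(1) Show a short description and simple examples for every command in a group

help 'COMMAND_GROUP'

Example: help 'dml'

(2) Show the detailed usage of a single command

help 'COMMAND'

Example: help 'create'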

1.3 Commands in the general group
(1) View the server-side task list
Command: processlist
Show regionserver task list.

  hbase> processlist
  hbase> processlist 'all'
  hbase> processlist 'general'
  hbase> processlist 'handler'
  hbase> processlist 'rpc'
  hbase> processlist 'operation'
  hbase> processlist 'all','host187.example.com'
  hbase> processlist 'all','host187.example.com,16020'
  hbase> processlist 'all','host187.example.com,16020,1289493121758'

(2) View the cluster status
Command: status
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:

  hbase> status
  hbase> status 'simple'
  hbase> status 'summary'
  hbase> status 'detailed'
  hbase> status 'replication'
  hbase> status 'replication', 'source'
  hbase> status 'replication', 'sink'

(3) View the basic commands for working with a table
Command: table_help
Help for table-reference commands.

You can either create a table via 'create' and then manipulate the table via commands like 'put', 'get', etc.
See the standard help information for how to use each of these commands.

However, as of 0.96, you can also get a reference to a table, on which you can invoke commands.
For instance, you can get create a table and keep around a reference to it via:

   hbase> t = create 't', 'cf'

Or, if you have already created the table, you can get a reference to it:

   hbase> t = get_table 't'

You can do things like call 'put' on the table:

  hbase> t.put 'r', 'cf:q', 'v'

which puts a row 'r' with column family 'cf', qualifier 'q' and value 'v' into table t.

To read the data out, you can scan the table:

  hbase> t.scan

which will read all the rows in table 't'.

Essentially, any command that takes a table name can also be done via table reference.
Other commands include things like: get, delete, deleteall,
get_all_columns, get_counter, count, incr. These functions, along with
the standard JRuby object methods are also available via tab completion.

For more information on how to use each of these commands, you can also just type:

   hbase> t.help 'scan'

which will output more information on how to use that command.

You can also do general admin actions directly on a table; things like enable, disable,
flush and drop just by typing:

   hbase> t.enable
   hbase> t.flush
   hbase> t.disable
   hbase> t.drop

Note that after dropping a table, your reference to it becomes useless and further usage
is undefined (and not recommended).
(4) View the cluster version
Command: version
Output this HBase version
(5) Show the currently logged-in user
Command: whoami
Show the current hbase user.
Syntax : whoami
For example:

    hbase> whoami

1.4 Commands in the ddl group
(1) alter: modify a table
Command: alter
Alter a table. Tables can be altered without disabling them first.
Altering enabled tables has caused problems
in the past, so use caution and test it before using in production.

You can use the alter command to add,
modify or delete column families or change table configuration options.
Column families work in a similar way as the 'create' command.
The column family specification can either be a name string, or a dictionary with the NAME attribute.
Dictionaries are described in the output of the 'help' command, with no arguments.

For example, to change or add the 'f1' column family in table 't1' from
current value to keep a maximum of 5 cell VERSIONS, do:

  hbase> alter 't1', NAME => 'f1', VERSIONS => 5

Examples:

--(1) Add a column family
alter 'table_name', 'add_family'
# or
alter 'table_name', {NAME => 'add_family'}
# The new column family can also be given attributes, for example
alter 'table_name', {NAME => 'add_family', VERSIONS => 3}

--(2) Delete a column family
alter 'table_name', {NAME => 'delete_family', METHOD => 'delete'}
# or
alter 'table_name', 'delete' => 'delete_family'

--(3) Add column family f1 and delete column family f2 in one command
alter 'user', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}

--(4) Modify a column family
# Set the maximum number of versions kept by column family f1 of table user to 5
alter 'user', NAME => 'f1', VERSIONS => 5

(2) create: create a table
Command: create
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:

Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

(3) list: show which tables exist in the database
hbase(main):007:0> list
TABLE                                                                                                                                             
teacher                                                                                                                                           
test                                                                                                                                              
2 row(s)
Took 0.7881 seconds                                                                                                                               
=> ["teacher", "test"]
hbase(main):008:0> 
(4) describe: show a table's attributes
hbase(main):009:0> describe 'test'
Table test is ENABLED                                                                                                                             
test                                                                                                                                              
COLUMN FAMILIES DESCRIPTION                                                                                                                       
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WR
ITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_
ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE
 => 'true', BLOCKSIZE => '65536'}                                                                                                                 
1 row(s)
Took 0.2891 seconds  
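NAME: column family name
VERSIONS: maximum number of versions kept per cell
MIN_VERSIONS: minimum number of versions to keep
TTL (Time To Live): how long cells live before they expire
IN_MEMORY: whether the blocks of this family get in-memory priority in the block cache; false by default
BLOCKCACHE: whether the read (block) cache is enabled for this family; enabled by default. The 65536 in the output above is BLOCKSIZE, i.e. 64 KB per block
(5) exists: check whether a table exists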
hbase(main):010:0> exists 'test'
Table test does exist                                                                                                                             
Took 0.0155 seconds                                                                                                                               
=> true
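
For scripts it helps to turn this answer into an exit status. The sketch below only inspects the text that exists prints, using the "does exist" wording shown above. reports_table_exists is a hypothetical name, and I am assuming that a missing table is reported as "does not exist" and that the installed hbase supports the non-interactive -n flag.

```shell
# Succeed if `exists` output read from stdin reports that the table exists.
# reports_table_exists is a hypothetical helper name.
reports_table_exists() {
    # "does exist" does not occur in the assumed "does not exist" message
    grep -q 'does exist'
}

# Intended use (assumes hbase is on PATH and accepts -n for non-interactive mode):
#   echo "exists 'test'" | hbase shell -n 2>/dev/null | reports_table_exists && echo "found"
```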

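(6) Disabling and enabling a table
disable: disable a table

    disable 'table_name'

is_disabled: check whether a table is disabled

    is_disabled 'table_name'

enable: enable a table

    enable 'table_name'

is_enabled: check whether a table is enabled

    is_enabled 'table_name'

(7) drop: delete a table

HBase requires a table to be disabled before it can be dropped.

    disable 'table_name'
    drop 'table_name'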

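1.5 Commands in the dml group
(1) Inserting data
put 'table name', 'rowkey', 'column family:column qualifier', 'value'
put 'person','0001','name:firstname', 'Jed'
A timestamp can be given explicitly; otherwise the current system time is used
put 'person','0002','info:age',20,1482077777778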
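
Typing many put commands by hand is slow. As a rough sketch, the helper below turns comma-separated lines into put commands that can be piped into the shell. gen_puts and the "rowkey,family:qualifier,value" input format are my own assumptions for illustration, and the fields must not contain commas or single quotes.

```shell
# Turn "rowkey,family:qualifier,value" lines from stdin into HBase shell put commands.
# gen_puts and the comma-separated input format are assumptions for illustration;
# fields must not contain commas or single quotes.
gen_puts() {
    table=$1
    while IFS=, read -r rowkey column value; do
        [ -n "$rowkey" ] || continue          # skip blank lines
        printf "put '%s','%s','%s','%s'\n" "$table" "$rowkey" "$column" "$value"
    done
}

# Intended use (rows.csv is a hypothetical file; assumes hbase accepts -n):
#   gen_puts person < rows.csv | hbase shell -n
```

Every value is written as a quoted string, which matches how the shell stores the unquoted 20 in the example above (as the bytes of the text "20").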
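(2) Reading a row
get 'person', '0001'
Read a row, restricted to one column
get 'person', '0001', 'name:firstname'
Read a row with additional constraints

Example 1:
From the person table, read the row with rowkey '0001', showing only the name:firstname column and only its 3 most recent versions

get 'person', '0001', {COLUMNS => 'name:firstname', VERSIONS => 3}

Example 2:
Show a given column, limited to the 3 most recent versions and to a time range

get 'person', '0001', {COLUMN => 'name:first', VERSIONS => 3, TIMERANGE => [1392368783980, 1392380169184]}

Example 3:
From the person table, read the row with rowkey 'rk0001', returning only the cells whose value is '中国'. This has to be a get, because scan does not take a rowkey as its second argument

get 'person', 'rk0001', {FILTER => "ValueFilter(=, 'binary:中国')"}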
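(3) Scanning a table
Scan the whole table
scan 'person'
Scan a single column family
scan 'person', {COLUMNS => 'name'}
Scan a single column family, showing the 5 most recent versions
scan 'person', {COLUMNS => 'name', VERSIONS => 5}
Enable raw mode. Raw mode also shows cells that carry a delete marker but have not been physically removed yet
scan 'person', {COLUMNS => 'name', RAW => true}
Filtering columns

Example:
From the user table, read the column families info and data

scan 'user', {COLUMNS => ['info', 'data']}

Example:
From the user table, read column name of family info and column pic of family data

scan 'user', {COLUMNS => ['info:name', 'data:pic']}

Example:
From the user table, read column name of family info, with the 5 most recent versions

scan 'user', {COLUMNS => 'info:name', VERSIONS => 5}

Example:
From the user table, read the families info and data, keeping only columns whose qualifier contains the character a

scan 'user', {COLUMNS => ['info', 'data'], FILTER => "(QualifierFilter(=,'substring:a'))"}

Example:
From the user table, read family info for rowkeys in the range [rk0001, rk0003)

scan 'user', {COLUMNS => 'info', STARTROW => 'rk0001', ENDROW => 'rk0003'}

Example:
From the user table, read the rows whose rowkey starts with rk

scan 'user',{FILTER=>"PrefixFilter('rk')"}

Example:
From the user table, read the data within a given time range

scan 'user', {TIMERANGE => [1392368783980, 1392380169184]}

scan accepts many parameters and filters that can be combined in many ways. I won't list more examples here; see help 'scan'.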
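
The TIMERANGE bounds are epoch timestamps in milliseconds, which are awkward to write by hand. A small sketch for producing them from a readable UTC time follows. to_millis is a hypothetical helper name, and it relies on the -d option as provided by GNU date.

```shell
# Convert a UTC "YYYY-MM-DD hh:mm:ss" time into epoch milliseconds for TIMERANGE.
# to_millis is a hypothetical helper; it relies on `date -d`, as provided by GNU date.
to_millis() {
    seconds=$(date -u -d "$1" +%s) || return 1
    echo "${seconds}000"             # append milliseconds (always .000 here)
}

# to_millis '2014-02-14 09:06:23'    -> 1392368783000
```

The lower bound 1392368783980 used above falls within that same second, 2014-02-14 09:06:23 UTC.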

(4) Deleting data
delete 'table_name', 'rowkey', 'family:column'
(5) Emptying a table
truncate 'table_name'

III. Shutdown

Hadoop is stopped last.

1 Stop HBase first

[hadoop@localhost ~]$ stop-hbase.sh
stopping hbase...........
localhost: running zookeeper, logging to /usr/local/hbase/logs/hbase-hadoop-zookeeper-localhost.localdomain.out
localhost: stopping zookeeper.

2 Stop YARN

[hadoop@localhost ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager

3 Stop Hadoop

[hadoop@localhost ~]$ stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [account.jetbrains.com]
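
Since the order matters in both directions, it can be wrapped in one function. This is only a sketch of the sequence used in this post; cluster_ctl and the DRY_RUN variable are names I made up, and the scripts are assumed to be on PATH as configured at the top.

```shell
# Run the start or stop scripts in the order used in this post.
# cluster_ctl and the DRY_RUN variable are hypothetical names for this sketch.
cluster_ctl() {
    case "$1" in
        start) scripts="start-dfs.sh start-hbase.sh start-yarn.sh" ;;
        stop)  scripts="stop-hbase.sh stop-yarn.sh stop-dfs.sh" ;;
        *)     echo "usage: cluster_ctl start|stop" >&2; return 2 ;;
    esac
    for script in $scripts; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$script"                 # only show what would run
        else
            "$script" || return 1          # stop at the first failure
        fi
    done
}

# Preview the shutdown order without running anything:
#   DRY_RUN=1 cluster_ctl stop
```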
