29 hbase & hive & hdfs (好程序)

木生火18624

Category column: 大数据学习路程 (Big Data Learning Path)

Article link: https://blog.csdn.net/penghao_1/article/details/104550037

These settings are only needed for a high-availability (HA) cluster; without HA they can be skipped.

Combining MapReduce with HBase
TableMapper
TableReducer
TableMapReduceUtil

Error: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.Scan

Solutions:
1. Temporarily add the HBase dependency jars to Hadoop's classpath:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hbase-1.2.1/lib/*

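2. Package all dependencies into the job jar. Note that the jar will be about 200 MB.
3. Append the export command above to the end of hadoop-env.sh, then restart the cluster.
4. Copy every jar under $HBASE_HOME/lib into $HADOOP_HOME/lib. This easily causes jar conflicts and is not recommended.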

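Combining HBase with Hive
Goals of the integration:
Table data in HBase is visible in Hive.
Table data in Hive is visible in HBase.

Integration steps:
1. Create a table in Hive that HBase can see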

create table if not exists hbase2hive(
uid int,
uname string,
uage int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with 
serdeproperties(
"hbase.columns.mapping"=":key,cf1:name,cf1:age"
)
tblproperties("hbase.table.name"="h2h")
;

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
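Fix:
The missing addFamily method points to a version mismatch between the hive-hbase jar and the installed HBase. Rebuild the hive-hbase jar, then restart Hive.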

Loading data into this table from Hive:
load data cannot load data into it; use this form instead:
insert into ...
select ...
;

2. If the table already exists in HBase and already contains data

create EXTERNAL table if not exists hbase2hive2(
uid string,
uname string,
uage int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with 
serdeproperties(
"hbase.columns.mapping"=":key,base_info:name,base_info:age"
)
tblproperties("hbase.table.name"="ns1:t_userinfo")
;

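Notes:
When mapping HBase columns, either write the rowKey mapping as :key or leave it out entirely; otherwise the column counts do not match.
When the table already exists in HBase, the Hive table must be created with the keyword external.
If the corresponding HBase table is dropped, the data can no longer be queried from Hive.
The HBase and Hive columns should match in number and type. Hive maps to HBase by column order, not by column name.
HBase, Hive, MySQL and similar systems can also exchange data through third-party tools (蓝灯, shell scripts, phoenix).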
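
The notes about column mapping can be illustrated with a small Java sketch. This is not Hive's own parser: the class ColumnMappingSketch and its method pair are hypothetical names made up for this example. It pairs Hive columns with the entries of hbase.columns.mapping purely by position and, following the note about :key, treats the first column as the row key when :key is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics positional pairing, it is NOT Hive's actual parser.
class ColumnMappingSketch {
    /** Pairs each Hive column with the mapping entry at the same position. */
    static List<String> pair(String[] hiveColumns, String mappingSpec) {
        List<String> entries = new ArrayList<>();
        for (String entry : mappingSpec.split(",")) {
            entries.add(entry.trim());
        }
        // Assumption taken from the notes: if :key is omitted, the first column is the row key.
        if (!entries.contains(":key")) {
            entries.add(0, ":key");
        }
        if (entries.size() != hiveColumns.length) {
            throw new IllegalArgumentException("column count mismatch: "
                    + hiveColumns.length + " Hive columns vs " + entries.size() + " mapping entries");
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < hiveColumns.length; i++) {
            // Only the position matters; the Hive column name is never compared.
            pairs.add(hiveColumns[i] + " -> " + entries.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] hiveColumns = {"uid", "uname", "uage"};
        pair(hiveColumns, ":key,cf1:name,cf1:age").forEach(System.out::println);
    }
}
```

Reordering the Hive columns in this sketch changes which HBase column each one reads, which is why the order in the create table statement matters more than the names.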

Coprocessors:
observer
endpoint

Example:
create 'ns1:t_guanzhu','cf1','cf2'
create 't_fensi','cf1'

Load the coprocessor onto the table:
alter 'ns1:t_guanzhu',METHOD => 'table_att','coprocessor'=>'hdfs://gp1923/demo/gp1923demo-1.0-SNAPSHOT.jar|qfedu.com.bigdata.hbaseObServer.InverIndexCoprocessor|1001|'
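
The source does not show what InverIndexCoprocessor does. Judging only by the table names (t_guanzhu for "follows", t_fensi for "fans"), a plausible reading is that an observer mirrors every write on the follow table into the fan table. The sketch below simulates that idea with in-memory maps. It does not use the HBase coprocessor API, and the class InvertedIndexSketch with its methods is a hypothetical stand-in.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the two tables; this does NOT use the HBase coprocessor API.
class InvertedIndexSketch {
    // rowKey -> (column qualifier -> value), kept sorted by row key like an HBase table
    final Map<String, Map<String, String>> guanzhu = new TreeMap<>(); // whom a user follows
    final Map<String, Map<String, String>> fensi = new TreeMap<>();   // who follows a user

    /** Writes "follower follows star", then runs the observer-style hook. */
    void putFollow(String follower, String star) {
        guanzhu.computeIfAbsent(follower, k -> new TreeMap<>()).put(star, star);
        afterPut(follower, star);
    }

    /** Mirrors the write into the inverse table, as an observer might do after a put. */
    private void afterPut(String follower, String star) {
        fensi.computeIfAbsent(star, k -> new TreeMap<>()).put(follower, follower);
    }

    public static void main(String[] args) {
        InvertedIndexSketch tables = new InvertedIndexSketch();
        tables.putFollow("tom", "jerry");
        tables.putFollow("amy", "jerry");
        // Looking up jerry's fans is now a single-row read on the inverse table.
        System.out.println(tables.fensi.get("jerry").keySet());
    }
}
```

The point of the inverse table is that "who follows X" becomes a read by row key instead of a full scan of the follow table.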

HBase points to watch
Memstore flush threshold:
Property setting:

<property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>   <!-- 128 MB -->
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes.  Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.</description>
  </property>

HRegion size threshold:

<property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>   <!-- 10 GB -->
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.</description>
  </property>
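
Both size properties take plain byte counts, so a quick calculation helps when changing them. The helper below is only a convenience sketch; SizeCheck, mib and gib are hypothetical names, and it assumes the binary units (1 MB = 1024 * 1024 bytes) that the values above imply.

```java
// Converts human-readable sizes to the byte counts used in hbase-site.xml.
class SizeCheck {
    /** n mebibytes expressed in bytes. */
    static long mib(long n) {
        return n * 1024L * 1024L;
    }

    /** n gibibytes expressed in bytes. */
    static long gib(long n) {
        return n * 1024L * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println("hbase.hregion.memstore.flush.size = " + mib(128));
        System.out.println("hbase.hregion.max.filesize = " + gib(10));
    }
}
```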

Number of RegionServer handler threads:

<property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
    <description>Count of RPC Listener instances spun up on RegionServers.
    Same property is used by the Master for count of master handlers.</description>
  </property>

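Client-side optimizations
1. Turn off auto flush
HTable ht = (HTable) table;
ht.setAutoFlush(false,true);

2. Write data in batches where possible (List<Put>, List<Delete>)

3. Be careful about skipping the write-ahead log (WAL): unflushed writes are lost if the RegionServer crashes. Durability is set per mutation, for example on a Put named put:
put.setDurability(Durability.SKIP_WAL);

4. Keep data in cache where possible (hc is an HColumnDescriptor):
hc.setInMemory(true);

5. Avoid many column families; use at most 2.
When HBase flushes data, it flushes neighboring column families at the same time.

6. Keep the rowKey as short as possible. The maximum is 64 KB.
7. Close every object that should be closed,
for example: admin, table, connection, resultScanner, and so on.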
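
The batching advice in item 2 can be sketched without any HBase dependency. BatchBuffer below is a hypothetical helper, not part of the HBase client: it collects items and hands them to a sink in fixed-size lists. In a real client the sink would be a call such as table.put(List<Put>); here it is just a Consumer so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic write buffer; the sink stands in for a batched call such as table.put(List<Put>).
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and flushes as soon as a full batch has accumulated. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as one batch; call this once more before closing. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its list
        pending.clear();
    }
}
```

The final flush matters for the same reason as item 7: anything still buffered when the client exits is never written.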

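rowKey design (use cases, four principles):
Length principle
Hashing principle
Sort-order principle
Uniqueness principle

Mobile carrier data:
Calls
Internet usage
SMS
....

Query one user's call detail records for the current month:
How to design the rowkey: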
phonenum_type_year_month_day_timestamp
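
A minimal sketch of building such a key in Java follows. The class CallRowKey, the type label "call", the UTC time zone and the use of epoch milliseconds for the timestamp part are all assumptions made for illustration. Because row keys are sorted, every record of one phone number, type and month shares a prefix, so the monthly query becomes a prefix scan.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Builds keys of the form phonenum_type_year_month_day_timestamp (UTC is an assumption).
class CallRowKey {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy_MM_dd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter MONTH =
            DateTimeFormatter.ofPattern("yyyy_MM").withZone(ZoneOffset.UTC);

    /** Full row key for one record. */
    static String rowKey(String phone, String type, Instant when) {
        return phone + "_" + type + "_" + DAY.format(when) + "_" + when.toEpochMilli();
    }

    /** Prefix shared by all records of this phone and type in the given month. */
    static String monthPrefix(String phone, String type, Instant anyTimeInMonth) {
        return phone + "_" + type + "_" + MONTH.format(anyTimeInMonth) + "_";
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(rowKey("13800000000", "call", now));
        System.out.println(monthPrefix("13800000000", "call", now));
    }
}
```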

Solving data hotspots:
1. Hashing
2. Salting
3. Reversing
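
The three techniques can be sketched as plain string transformations. HotspotKeys and its method names are hypothetical, and the MD5 prefix length and two-digit bucket format are arbitrary choices for this example. Hashing derives the prefix from the key, so a reader can recompute it; salting picks a random bucket, so a reader has to check every bucket; reversing moves the fast-changing tail of the key to the front.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Three ways to spread sequential keys across regions; formats are illustrative only.
class HotspotKeys {
    /** Hashing: prefix the key with the first two bytes of its MD5 digest, in hex. */
    static String hashed(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x_%s", digest[0], digest[1], key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Salting: prefix the key with a randomly chosen bucket number. */
    static String salted(String key, int buckets, Random random) {
        return String.format("%02d_%s", random.nextInt(buckets), key);
    }

    /** Reversing: store the key back to front. */
    static String reversed(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    public static void main(String[] args) {
        String phone = "13812345678";
        System.out.println(hashed(phone));
        System.out.println(salted(phone, 10, new Random()));
        System.out.println(reversed(phone));
    }
}
```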

Query efficiency:
Secondary indexes

Preview:
Combining MapReduce with HBase (summary)
Combining Hive with HBase

Secondary indexes
Coprocessors
rowKey design
Optimization

flume
http://flume.apache.org/

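Flume architecture

HBase summary
hbase shell
create alter drop
put get scan delete

tools

java api:
admin
table

rowkey design (must understand)
Hotspot problems
Performance problems
Secondary indexes
Coprocessors
Wide tables, tall tables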
