Learning Hadoop: Using HBase with MapReduce


Author: 我非英雄


Original article: https://blog.csdn.net/y521263/article/details/23608827


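HBase stores data in tables. Each table consists of rows and columns, and every column belongs to a specific column family. The storage unit identified by a row and a column is called a cell; each cell can hold several versions of the same piece of data, distinguished by timestamp.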
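To make the row / column family / cell / version model concrete, here is a small in-memory sketch in plain Java. It illustrates only the logical model under simplifying assumptions: the class name ToyHTable is made up, it does not use the HBase API, and unlike real HBase (which keeps a configurable maximum number of versions per column family) it keeps every version.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of HBase's *logical* data model (illustration only; HBase's real
// storage is very different):
//   row key -> "family:qualifier" -> (timestamp -> value), versions sorted newest first.
class ToyHTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Stores one version of a cell; writes with different timestamps become separate versions.
    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier,
                             c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(ts, value);
    }

    // Returns every version of a cell (newest first), or null if the cell does not exist.
    NavigableMap<Long, String> getVersions(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(family + ":" + qualifier);
    }

    // Returns the newest version of a cell, which is what a plain get/scan returns by default.
    String getLatest(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = getVersions(row, family, qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ToyHTable table = new ToyHTable();
        // Example timestamps, chosen for illustration.
        table.put("hello", "content", "count", 1397376976400L, "1");
        table.put("hello", "content", "count", 1397376976500L, "2");
        System.out.println(table.getLatest("hello", "content", "count"));          // prints 2
        System.out.println(table.getVersions("hello", "content", "count").size()); // prints 2
    }
}
```

The nested sorted maps mirror the way a cell is addressed by (row, family:qualifier, timestamp); sorting timestamps in reverse order makes "newest version" simply the first entry.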

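Let's start with installation.

1. Download and installation

Choose an Apache download mirror and download an HBase release: open the stable directory and download the file ending in .tar.gz, for example hbase-0.94.18.tar.gz (the version used in the rest of this article).

Extract the archive and change into the extracted directory:

$ tar xfz hbase-0.94.18.tar.gz
$ cd hbase-0.94.18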

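Edit conf/hbase-site.xml (all of the following settings are for pseudo-distributed mode). hbase.rootdir must use the same host and port as the HDFS NameNode, i.e. the fs.default.name value in Hadoop's core-site.xml (hdfs://localhost:9000 here), and dfs.replication is set to 1 because a pseudo-distributed HDFS has only one DataNode: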

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
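As a sketch of what this file holds, the following plain-Java reader (the class SiteXmlReader is hypothetical and uses only the JDK's DOM parser) collects the name/value pairs and extracts the host:port part of hbase.rootdir, which is the part that has to agree with Hadoop's NameNode address. It is a simplification: Hadoop's own Configuration class also handles <final>, ${...} expansion and layering of several resource files.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Simplified reader for a Hadoop-style *-site.xml: collects each <property>'s <name>/<value>.
// Assumption: ignores <final>, ${...} variable expansion and multi-file layering.
class SiteXmlReader {
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>(); // keeps file order
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
    }

    // host:port of an hdfs:// URI; for hbase.rootdir it should equal the NameNode
    // address configured as fs.default.name in Hadoop's core-site.xml.
    static String namenodeAddress(String rootdir) {
        return URI.create(rootdir).getAuthority();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hbase.rootdir</name><value>hdfs://localhost:9000/hbase</value></property>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "</configuration>";
        Map<String, String> props = readProperties(xml);
        System.out.println(props); // prints {hbase.rootdir=hdfs://localhost:9000/hbase, dfs.replication=1}
        System.out.println(namenodeAddress(props.get("hbase.rootdir"))); // prints localhost:9000
    }
}
```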

Edit conf/hbase-env.sh and add a line that sets JAVA_HOME to your JDK directory:

export JAVA_HOME=/usr/lib/jdk/jdk1.7.0_40

Running HBase

Start HBase in pseudo-distributed mode (HDFS must already be running, since hbase.rootdir points to it):

$ bin/start-hbase.sh

Open the HBase shell:

$ bin/hbase shell

The output looks like this:

root@ubuntu:~/hbase/hbase-0.94.18# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.18, r1577788, Sat Mar 15 04:46:47 UTC 2014


hbase(main):001:0>

To exit:

hbase(main):001:0> exit

Stop HBase:

$ bin/stop-hbase.sh

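Example: using MapReduce together with HBase

We rewrite the earlier WordCount program so that it writes its results into HBase instead of HDFS: each word becomes a row key, and its count is stored in the column content:count.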
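Before looking at the HBase-specific code, it may help to see the data flow on its own. The sketch below (hypothetical class WordCountSimulation, plain Java, no Hadoop or HBase) mimics the map, shuffle and reduce steps in memory; each entry in the result corresponds to one row that the WordCountHbase job writes into the wordcount table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory sketch of the job's data flow:
// map: each line -> (word, 1); shuffle: group values by word (sorted by key, as MapReduce does);
// reduce: sum the 1s -> one HBase row per word, with content:count = sum.
class WordCountSimulation {
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line); // same tokenization as MapClass
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), w -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> rows = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int count = 0;
            for (int one : e.getValue()) {
                count += one;
            }
            rows.put(e.getKey(), count); // row key = word, cell value = count
        }
        return rows;
    }

    public static void main(String[] args) {
        // The two lines of the article's sample input file 1.txt.
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

The result matches the three rows the HBase scan shows for the same input.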

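Java APIs used

1. HBaseConfiguration

Creates and initializes the HBase configuration; it loads hbase-default.xml and then hbase-site.xml from the classpath. The no-argument constructor new HBaseConfiguration() is deprecated in HBase 0.94; use the static factory method instead.

Usage:

Configuration config = HBaseConfiguration.create();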
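HBaseConfiguration.create() layers hbase-site.xml over hbase-default.xml, so a key set in the site file wins and every other key keeps its default. The analogy below uses java.util.Properties rather than Hadoop's Configuration class to show just that override order; the class ConfigLayering is hypothetical and the default value for hbase.rootdir is an illustrative placeholder, not HBase's real default.

```java
import java.util.Properties;

// Analogy only: java.util.Properties standing in for Hadoop's Configuration,
// to show that later resources (hbase-site.xml) override earlier ones (hbase-default.xml).
class ConfigLayering {
    static Properties layered(Properties defaults, Properties site) {
        Properties effective = new Properties(defaults); // lookups fall back to the defaults
        effective.putAll(site);                          // site values shadow the defaults
        return effective;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("hbase.rootdir", "file:///tmp/hbase");       // illustrative placeholder
        defaults.setProperty("hbase.zookeeper.quorum", "localhost");
        Properties site = new Properties();
        site.setProperty("hbase.rootdir", "hdfs://localhost:9000/hbase"); // value from hbase-site.xml
        Properties conf = layered(defaults, site);
        System.out.println(conf.getProperty("hbase.rootdir"));          // prints hdfs://localhost:9000/hbase
        System.out.println(conf.getProperty("hbase.zookeeper.quorum")); // prints localhost
    }
}
```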

2. HBaseAdmin

Used for administrative operations such as creating, disabling and deleting tables.

Usage:

HBaseAdmin admin = new HBaseAdmin(config);
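createHBaseTable() in the code further down drops and recreates the table with HBaseAdmin, and HBase refuses to delete a table that is still enabled. The following hypothetical in-memory stand-in (ToyAdmin is not an HBase class, and it models nothing but this rule) makes the required disable-then-delete order easy to see.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for HBaseAdmin (NOT the real class). It models one rule:
// a table must be disabled before it can be deleted.
class ToyAdmin {
    private final Map<String, Boolean> enabled = new HashMap<>(); // table name -> enabled?

    boolean tableExists(String name) {
        return enabled.containsKey(name);
    }

    void createTable(String name) {
        if (tableExists(name)) throw new IllegalStateException("table already exists: " + name);
        enabled.put(name, Boolean.TRUE); // newly created tables are enabled
    }

    void disableTable(String name) {
        requireTable(name);
        enabled.put(name, Boolean.FALSE);
    }

    void deleteTable(String name) {
        requireTable(name);
        if (enabled.get(name)) throw new IllegalStateException("table not disabled: " + name);
        enabled.remove(name);
    }

    private void requireTable(String name) {
        if (!tableExists(name)) throw new IllegalStateException("no such table: " + name);
    }

    // Same drop-and-recreate sequence as createHBaseTable() in the article.
    static void recreate(ToyAdmin admin, String name) {
        if (admin.tableExists(name)) {
            admin.disableTable(name);
            admin.deleteTable(name);
        }
        admin.createTable(name);
    }

    public static void main(String[] args) {
        ToyAdmin admin = new ToyAdmin();
        recreate(admin, "wordcount"); // first run: the table is simply created
        recreate(admin, "wordcount"); // later runs: disable, delete, create again
        System.out.println(admin.tableExists("wordcount")); // prints true
    }
}
```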

3. HTableDescriptor

HTableDescriptor holds a table's name and its column families; it can be used to add or remove column families.

Usage:

HTableDescriptor htd = new HTableDescriptor(tablename);

4. HColumnDescriptor

HColumnDescriptor describes a column family: its name and settings such as the maximum number of versions to keep.

Usage:

HColumnDescriptor col = new HColumnDescriptor("content");

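5. Put

Performs an insert (a put) on a single row; each add call adds one cell (family, qualifier, value) to that row.

Usage: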

Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("content"),Bytes.toBytes("count"),Bytes.toBytes(String.valueOf(count)));
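The reducer stores the count as Bytes.toBytes(String.valueOf(count)) rather than Bytes.toBytes(count); a plausible reason is readability in the shell. The plain-JDK sketch below shows the difference. Assumptions: asString and asInt reproduce what Bytes.toBytes(String) (UTF-8) and Bytes.toBytes(int) (4 bytes, big-endian) return, and shellStyle is only a simplified approximation of how the HBase shell escapes non-printable bytes. CountEncoding is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Two ways to encode the count as a cell value, and how each would look in the shell.
class CountEncoding {
    static byte[] asString(int count) {
        return String.valueOf(count).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] asInt(int count) {
        return ByteBuffer.allocate(4).putInt(count).array(); // ByteBuffer is big-endian by default
    }

    // Simplified shell-style rendering: printable ASCII as-is (backslash excluded),
    // any other byte as \xNN.
    static String shellStyle(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v <= 0x7E && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shellStyle(asString(2))); // prints 2
        System.out.println(shellStyle(asInt(2)));    // prints \x00\x00\x00\x02
    }
}
```

With the string encoding, the scan at the end of the article can show value=2 directly; the trade-off is that readers of the table must parse the string back into a number.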

Code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WordCountHbase {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class MapClass
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts of one word and writes a single Put for it.
    // Row key = the word, column = content:count, value = the count as a string.
    public static class Reduce
            extends TableReducer<Text, IntWritable, NullWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable val : values) {
                count += val.get();
            }
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("content"), Bytes.toBytes("count"),
                    Bytes.toBytes(String.valueOf(count)));
            // TableOutputFormat ignores the key; only the Put is written to the table.
            context.write(NullWritable.get(), put);
        }
    }

    // (Re)creates the target table with a single column family, "content".
    public static void createHBaseTable(Configuration config, String tablename)
            throws IOException {
        HTableDescriptor htd = new HTableDescriptor(tablename);
        htd.addFamily(new HColumnDescriptor("content"));
        HBaseAdmin admin = new HBaseAdmin(config);
        try {
            if (admin.tableExists(tablename)) {
                System.out.println("table exists, recreating table: " + tablename);
                admin.disableTable(tablename); // a table must be disabled before it can be deleted
                admin.deleteTable(tablename);
            }
            System.out.println("create new table: " + tablename);
            admin.createTable(htd);
        } finally {
            admin.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String tablename = "wordcount";
        // HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml,
        // so both the admin client and TableOutputFormat know where HBase is.
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, tablename);
        createHBaseTable(conf, tablename);

        Job job = new Job(conf, "word count table with " + args[0]);
        job.setJarByClass(WordCountHbase.class);
        job.setMapperClass(MapClass.class);
        // Reduce cannot be reused as a combiner: it emits Puts, not (Text, IntWritable) pairs.
        //job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

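To compile and run the job, Hadoop must be able to see the HBase classes: copy hbase-0.94.18.jar, zookeeper-3.4.5.jar, protobuf-java-2.4.0a.jar and guava-11.0.2.jar from the HBase directory (hbase-0.94.18.jar sits at the top level, the others under lib/) into hadoop/lib. Otherwise you will get a pile of errors, because the job references classes that are not on Hadoop's classpath.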
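If you are unsure whether the jars ended up where they are needed, one quick check is to ask the class loader directly. This is a hypothetical diagnostic (ClasspathCheck is not part of the article), assuming you compile it and run it with the same classpath as the job; each class name listed is one representative class from the corresponding jar.

```java
// Hypothetical diagnostic: reports whether one class from each required jar is loadable.
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only look the class up, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.hbase.client.HBaseAdmin", // hbase-0.94.18.jar
            "org.apache.zookeeper.ZooKeeper",            // zookeeper-3.4.5.jar
            "com.google.protobuf.Message",               // protobuf-java-2.4.0a.jar
            "com.google.common.collect.Lists"            // guava-11.0.2.jar
        };
        for (String c : required) {
            System.out.println((isOnClasspath(c) ? "found   " : "MISSING ") + c);
        }
    }
}
```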

Compile command (run from the Hadoop directory):

$ javac -classpath hadoop-core-1.2.1.jar:lib/commons-cli-1.2.jar:lib/commons-logging-api-1.0.4.jar:lib/hbase-0.94.18.jar:lib/zookeeper-3.4.5.jar -d practise/WordCountHbase/classes practise/WordCountHbase/src/WordCountHbase.java

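Package the compiled classes into a jar, for example:

$ jar -cvf practise/WordCountHbase/WordCountHbase.jar -C practise/WordCountHbase/classes/ .

Then run it: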

$ bin/hadoop jar practise/WordCountHbase/WordCountHbase.jar WordCountHbase 1.txt

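Here 1.txt contains:

hello world
hello hadoop

With Hadoop in pseudo-distributed mode, a relative input path such as 1.txt is resolved in HDFS under /user/<your user name>, so upload the file first, e.g. with bin/hadoop fs -put 1.txt 1.txt.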

Then check in the HBase shell whether MapReduce has written the results into the table:

hbase(main):001:0> list
TABLE                                                                           
tab1                                                                            
wordcount                                                                       
2 row(s) in 8.3730 seconds


hbase(main):002:0> scan 'wordcount'
ROW                   COLUMN+CELL                                               
 hadoop               column=content:count, timestamp=1397376976400, value=1    
 hello                column=content:count, timestamp=1397376976400, value=2    
 world                column=content:count, timestamp=1397376976400, value=1    
3 row(s) in 0.5460 seconds


hbase(main):003:0>

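Below are some articles on fixing errors encountered when using HBase:

Errors caused by combining Hadoop and HBase

HBase exception: how to fix "hbase-default.xml file seems to be for and old version of HBase"

Analysis of an HBase MapReduce example

Things to watch out for when writing MR jobs that run on HBase

Fixing the problem where, after integrating Hive and HBase, the reduce phase cannot find ZooKeeper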