HADOOP-HBase MapReduce Examples

1. HBase MapReduce Read Example

The following is an example of using HBase as a MapReduce source in a read-only manner. Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper. The job would be defined as follows…

Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper
  null,             // mapper output key 
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

…and the mapper instance would extend TableMapper

public static class MyMapper extends TableMapper<Text, Text> {

  public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
    // process data for the row from the Result instance.
   }
}

2. HBase MapReduce Read/Write Example

The following is an example of using HBase both as a source and as a sink with MapReduce. This example will simply copy data from one table to another.

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class);    // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
	sourceTable,      // input table
	scan,	          // Scan instance to control CF and attribute selection
	MyMapper.class,   // mapper class
	null,	          // mapper output key
	null,	          // mapper output value
	job);
TableMapReduceUtil.initTableReducerJob(
	targetTable,      // output table
	null,             // reducer class
	job);
job.setNumReduceTasks(0);

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}

It is worth explaining what TableMapReduceUtil is doing, especially with the reducer. TableOutputFormat is being used as the outputFormat class, and several parameters are being set on the config (e.g., TableOutputFormat.OUTPUT_TABLE), as well as setting the reducer output key to ImmutableBytesWritable and the reducer value to Writable. These could be set by the programmer on the job and conf, but TableMapReduceUtil tries to make things easier.
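As a rough sketch (not the exact implementation), the equivalent manual setup would look roughly like the following, assuming the same targetTable variable as above and an import of org.apache.hadoop.hbase.mapreduce.TableOutputFormat:

// approximately what initTableReducerJob configures for you
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, targetTable);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Writable.class);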

The following is the example mapper, which creates a Put matching the input Result and emits it. Note: this is what the CopyTable utility does.

public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put>  {

	public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
		// this example is just copying the data from the source table...
   		context.write(row, resultToPut(row,value));
   	}

  	private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
  		Put put = new Put(key.get());
 		for (KeyValue kv : result.raw()) {
			put.add(kv);
		}
		return put;
   	}
}

There isn’t actually a reducer step, so TableOutputFormat takes care of sending the Put to the target table.

This is just an example; developers could choose not to use TableOutputFormat and connect to the target table themselves, as sketched below.
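As a rough illustration of that alternative (a sketch only; the target table name "targetTable" is a placeholder), the mapper could open an HTable in setup and write the Puts itself, with the job using NullOutputFormat:

public static class MyDirectWriteMapper extends TableMapper<ImmutableBytesWritable, Put> {

  private HTable targetTable;

  @Override
  protected void setup(Context context) throws IOException {
    // open a connection to the target table once per task
    targetTable = new HTable(HBaseConfiguration.create(), "targetTable");
  }

  @Override
  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
    Put put = new Put(row.get());
    for (KeyValue kv : value.raw()) {
      put.add(kv);
    }
    targetTable.put(put);   // write straight to HBase instead of emitting to the OutputFormat
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    targetTable.close();    // flush buffered writes and release the connection
  }
}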

3. HBase MapReduce Read/Write Example With Multi-Table Output

TODO: example for MultiTableOutputFormat. A short sketch of the idea follows; a fuller example appears as the MultiTableMapper class in section 11 below.
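As a minimal sketch (the table names and Puts below are placeholders): the job sets MultiTableOutputFormat as its output format, and the mapper emits the name of the destination table as the key alongside each Put, so a single job can write to several tables.

job.setOutputFormatClass(MultiTableOutputFormat.class);

// in the mapper: the key tells MultiTableOutputFormat which table the Put goes to
context.write(new ImmutableBytesWritable(Bytes.toBytes("table1")), put1);
context.write(new ImmutableBytesWritable(Bytes.toBytes("table2")), put2);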

4. HBase MapReduce Summary to HBase Example

The following example uses HBase as a MapReduce source and sink with a summarization step. This example will count the number of distinct instances of a value in a table and write those summarized counts in another table.

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummary");
job.setJarByClass(MySummaryJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
	sourceTable,        // input table
	scan,               // Scan instance to control CF and attribute selection
	MyMapper.class,     // mapper class
	Text.class,         // mapper output key
	IntWritable.class,  // mapper output value
	job);
TableMapReduceUtil.initTableReducerJob(
	targetTable,        // output table
	MyTableReducer.class,    // reducer class
	job);
job.setNumReduceTasks(1);   // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
	throw new IOException("error with job!");
}

In this example mapper a column with a String-value is chosen as the value to summarize upon. This value is used as the key to emit from the mapper, and an IntWritable represents an instance counter.

public static class MyMapper extends TableMapper<Text, IntWritable>  {

	private final IntWritable ONE = new IntWritable(1);
   	private Text text = new Text();

   	public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
        	String val = new String(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1")));
          	text.set(val);     // we can only emit Writables...

        	context.write(text, ONE);
   	}
}

In the reducer, the “ones” are counted (just like any other MR example that does this), and then a Put is emitted.

public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable>  {

 	public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    		int i = 0;
    		for (IntWritable val : values) {
    			i += val.get();
    		}
    		Put put = new Put(Bytes.toBytes(key.toString()));
    		put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i));

    		context.write(null, put);
   	}
}

5. HBase MapReduce Summary to File Example

This is very similar to the summary example above, except that it uses HBase as the MapReduce source and HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same.

Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
	sourceTable,        // input table
	scan,               // Scan instance to control CF and attribute selection
	MyMapper.class,     // mapper class
	Text.class,         // mapper output key
	IntWritable.class,  // mapper output value
	job);
job.setReducerClass(MyReducer.class);    // reducer class
job.setNumReduceTasks(1);    // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile"));  // adjust directories as required

boolean b = job.waitForCompletion(true);
if (!b) {
	throw new IOException("error with job!");
}

As stated above, the previous Mapper can run unchanged with this example. As for the Reducer, it is a “generic” Reducer instead of extending TableReducer and emitting Puts.

 public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>  {

	public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		int i = 0;
		for (IntWritable val : values) {
			i += val.get();
		}	
		context.write(key, new IntWritable(i));
	}
}

6. HBase MapReduce Summary to HBase Without Reducer

It is also possible to perform summaries without a reducer – if you use HBase as the reducer.

An HBase target table would need to exist for the job summary. The HTable method incrementColumnValue would be used to atomically increment values. From a performance perspective, it might make sense to keep a Map of values with their counts to be incremented for each map task, and make one update per key during the cleanup method of the mapper, as sketched below. However, your mileage may vary depending on the number of rows to be processed and unique keys.
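A minimal sketch of that approach (assuming a summary table named "summaryTable" with column family "cf" and qualifiers "attr1" and "count", all placeholder names, plus java.util.HashMap); the job itself would be set up like the read example in section 1, with NullOutputFormat and zero reducers:

public static class MySummaryMapper extends TableMapper<ImmutableBytesWritable, Put> {

  private Map<String, Long> counts = new HashMap<String, Long>();
  private HTable summaryTable;

  @Override
  protected void setup(Context context) throws IOException {
    summaryTable = new HTable(HBaseConfiguration.create(), "summaryTable");
  }

  @Override
  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
    String val = Bytes.toString(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr1")));
    Long current = counts.get(val);
    counts.put(val, current == null ? 1L : current + 1L);   // accumulate counts in memory per task
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // one atomic increment per distinct key, instead of one RPC per row
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      summaryTable.incrementColumnValue(Bytes.toBytes(entry.getKey()),
          Bytes.toBytes("cf"), Bytes.toBytes("count"), entry.getValue());
    }
    summaryTable.close();
  }
}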

In the end, the summary results are in HBase.

7. Export an HBase table to File:

package mapred;

/**
* Copyright 2009 The Apache Software Foundation
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
* Export an HBase table.
* Writes content to sequence files up in HDFS.  Use {@link Import} to read it
* back in again.
*/
public class Export {
private static final Log LOG = LogFactory.getLog(Export.class);
final static String NAME = "export";

/**
* Mapper.
*/
static class Exporter extends TableMapper<ImmutableBytesWritable, Result> {
/**
* @param row  The current table row key.
* @param value  The columns.
* @param context  The current context.
* @throws IOException When something is broken with the data.
* @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN,
*   org.apache.hadoop.mapreduce.Mapper.Context)
*/
@Override
public void map(ImmutableBytesWritable row, Result value,
Context context)
throws IOException {
try {
context.write(row, value);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}

/**
* Sets up the actual job.
*
* @param conf  The current configuration.
* @param args  The command line parameters.
* @return The newly created job.
* @throws IOException When setting up the job fails.
*/
public static Job createSubmittableJob(Configuration conf, String[] args)
throws IOException {
String tableName = args[0];
Path outputDir = new Path(args[1]);
Job job = new Job(conf, NAME + "_" + tableName);
job.setJobName(NAME + "_" + tableName);
job.setJarByClass(Exporter.class);
// TODO: Allow passing filter and subset of rows/columns.
Scan s = new Scan();
// Optional arguments.
int versions = args.length > 2? Integer.parseInt(args[2]): 1;
s.setMaxVersions(versions);
long startTime = args.length > 3? Long.parseLong(args[3]): 0L;
long endTime = args.length > 4? Long.parseLong(args[4]): Long.MAX_VALUE;
s.setTimeRange(startTime, endTime);
s.setCacheBlocks(false);
if (conf.get(TableInputFormat.SCAN_COLUMN_FAMILY) != null) {
s.addFamily(Bytes.toBytes(conf.get(TableInputFormat.SCAN_COLUMN_FAMILY)));
}
LOG.info("versions=" + versions + ", starttime=" + startTime +
", endtime=" + endTime);
TableMapReduceUtil.initTableMapperJob(tableName, s, Exporter.class, null,
null, job);
// No reducers.  Just write straight to output files.
job.setNumReduceTasks(0);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Result.class);
FileOutputFormat.setOutputPath(job, outputDir);
return job;
}

/*
* @param errorMsg Error message.  Can be null.
*/
private static void usage(final String errorMsg) {
if (errorMsg != null && errorMsg.length() > 0) {
System.err.println("ERROR: " + errorMsg);
}
System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
"[<starttime> [<endtime>]]]\n");
System.err.println("  Note: -D properties will be applied to the conf used. ");
System.err.println("  For example: ");
System.err.println("   -D mapred.output.compress=true");
System.err.println("   -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec");
System.err.println("   -D mapred.output.compression.type=BLOCK");
System.err.println("  Additionally, the following SCAN properties can be specified");
System.err.println("  to control/limit what is exported..");
System.err.println("   -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
}

/**
* Main entry point.
*
* @param args  The command line parameters.
* @throws Exception When running the job fails.
*/
public static void main(String[] args) throws Exception {
args = new String[]{"test", "Out"};  // hard-coded table name and output directory for local testing
Configuration conf = HBaseConfiguration.create();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
usage("Wrong number of arguments: " + otherArgs.length);
System.exit(-1);
}
Job job = createSubmittableJob(conf, otherArgs);
System.exit(job.waitForCompletion(true)? 0 : 1);
}
}

8. Import Exported Data into a Table:

package mapred;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Import {

final static String NAME = "import";
static class Importer extends TableMapper<ImmutableBytesWritable, Put> {

@Override
public void map(ImmutableBytesWritable row, Result value,Context context) throws IOException {

try {
context.write(row, resultToPut(row, value));
} catch (InterruptedException e) {
e.printStackTrace();
}
}

private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
Put put = new Put(key.get());
for (KeyValue kv : result.raw()) {
put.add(kv);
}
return put;
}
}

public static Job createSubmittableJob(Configuration conf, String[] args) throws IOException {
String tableName = args[0];
Path inputDir = new Path(args[1]);
Job job = new Job(conf, NAME + "_" + tableName);
job.setJarByClass(Importer.class);
FileInputFormat.setInputPaths(job, inputDir);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setMapperClass(Importer.class);

TableMapReduceUtil.initTableReducerJob(tableName, null, job);
job.setNumReduceTasks(0);
return job;
}

private static void usage(final String errorMsg) {
if (errorMsg != null && errorMsg.length() > 0) {
System.err.println("ERROR: " + errorMsg);
}
System.err.println("Usage: Import <tablename> <inputdir>");

}

public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
usage("Wrong number of arguments: " + otherArgs.length);
System.exit(-1);
}
Job job = createSubmittableJob(conf, otherArgs);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

9. HBase Manager Utility:

package util;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseManager {

private static HBaseAdmin admin = null;
public static Configuration conf = null;

static{
try {
conf =  HBaseManager.getHBConnection();
admin = new HBaseAdmin(conf);
} catch (MasterNotRunningException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (ZooKeeperConnectionException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* This method would be used to connect to remote HBase System….
* @throws IOException
*/
public void getRemoteHBaseConnection() throws IOException{
ResultScanner scanner = null;
try {
Configuration config = new Configuration();
config.clear();
config.set("hbase.master", "master:60000");
config.set("hbase.rootdir", "hdfs://localhost:50001/hbase");

/* config.set("hbase.master.info.bindAddress", "0.0.0.0");
config.set("hbase.master.dns.interface", "2888");
config.set("hbase.master.info.port", "60010");
config.set("hbase.rpc.engine", "org.apache.hadoop.hbase.ipc.WritableRpcEngine");
config.set("hbase.zookeeper.peerport", "2888"); */
config.set("hbase.zookeeper.quorum", "master");
config.set("hbase.zookeeper.property.clientPort", "2181");

// HBaseAdmin.checkHBaseAvailable(config);
//creating a new table
HTable table = new HTable(config, "test");
Scan s = new Scan();
scanner = table.getScanner(s);
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
// print out the row we found and the columns we were looking for
System.out.println("Found row: " + rr);
}
System.out.println("Table test obtained");
// addData(table);
} catch (Exception e) {
System.out.println("HBase is not running!");
System.exit(1);
} finally{
if (scanner != null) scanner.close();
}
}

/**
* This method would be used to connect to Local HBase master ….
* @return
*/
public static Configuration getHBConnection(){
Configuration config = null;
try {
config = new Configuration();
config.clear();
config.set("hbase.zookeeper.quorum", "107.108.99.145");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("hbase.master", "107.108.99.145:60000");
//config.set("hbase.hregion.max.filesize", "1");

} catch (Exception e) {
System.out.println("HBase is not running!");
System.exit(1);
}
return config;
}

public void putData(){

}

/**
*
* @param table
* @param splitKeys
* @param colfams
* @throws IOException
*/
public void createTable(String table, byte[][] splitKeys, String… colfams)
throws IOException {
HTableDescriptor desc = new HTableDescriptor(table);
for (String cf : colfams) {
HColumnDescriptor coldef = new HColumnDescriptor(cf);
desc.addFamily(coldef);
}
if (splitKeys != null) {
admin.createTable(desc, splitKeys);
} else {
admin.createTable(desc);
}
}

/**
*
* @param table
* @param startRow
* @param endRow
* @param numCols
* @param pad
* @param setTimestamp
* @param random
* @param colfams
* @throws IOException
*/
public void fillTable(String table, int startRow, int endRow, int numCols,
int pad, boolean setTimestamp, boolean random,
String[] colfams,String[] colVals)
throws IOException {

Configuration conf = HBaseManager.getHBConnection();
HTable tbl = new HTable(conf, table);
for (int row = startRow; row <= endRow; row++) {
for (int col = 1; col <= numCols; col++) {
Put put = new Put(Bytes.toBytes(padNum(row, pad)));
for (int i=0; i< colfams.length; i++) {
String cf = colfams[i];
String val = colVals[i];
String colName = padNum(col, pad);
if (setTimestamp) {
put.add(Bytes.toBytes(cf), Bytes.toBytes(colName),
col, Bytes.toBytes(val));
} else {
put.add(Bytes.toBytes(cf), Bytes.toBytes(colName),
Bytes.toBytes(val));
}
}
tbl.put(put);
}
}
tbl.close();
}

/**
*
* @param tableName
* @return
*/
public void dump(String table, String[] rows, String[] fams, String[] quals)
throws IOException {
HTable tbl = new HTable(conf, table);
List<Get> gets = new ArrayList<Get>();
for (String row : rows) {
Get get = new Get(Bytes.toBytes(row));
get.setMaxVersions();
if (fams != null) {
for (String fam : fams) {
for (String qual : quals) {
get.addColumn(Bytes.toBytes(fam), Bytes.toBytes(qual));
}
}
}
gets.add(get);
}
Result[] results = tbl.get(gets);
for (Result result : results) {
for (KeyValue kv : result.raw()) {
HashMap map = (HashMap) kv.toStringMap();
// System.out.println(kv.toStringMap().toString());
System.out.println(map.get("family") +
": " + Bytes.toString(kv.getValue()));
}
}
}

/**
*
* @param table
* @param row
* @param fam
* @param qual
* @param val
* @throws IOException
*/
public void put(String table, String row, String fam, String qual,
String val) throws IOException {
HTable tbl = new HTable(conf, table);
Put put = new Put(Bytes.toBytes(row));
put.add(Bytes.toBytes(fam), Bytes.toBytes(qual), Bytes.toBytes(val));
tbl.put(put);
tbl.close();
}

/**
*
* @return
*/
public void getContentData(String tableName,String[] colmFmly,String[] qualifiers, int nums){
ResultScanner scanner = null;
int i=1;
Configuration config = null;
try {
config = HBaseManager.getHBConnection();
//creating a new table
HTable table = new HTable(config, tableName);
Scan s = new Scan();
for(String column : colmFmly){
for(String qualifier : qualifiers){
s.addColumn(Bytes.toBytes(column),Bytes.toBytes(qualifier));
}
}
scanner = table.getScanner(s);
for (Result rr = scanner.next(); rr != null && i <= nums; rr = scanner.next()) {
i++;
for(String column : colmFmly){
for(String qualifier : qualifiers){
System.out.println("key : " + column + " Value: " + Bytes.toString(rr.getValue(Bytes.toBytes(column), Bytes.toBytes(qualifier))));
}
}
}
}catch (Exception e) {
// TODO: handle exception
}
}

public void getContentData(String tableName, int limit) throws IOException{
Configuration config = HBaseManager.getHBConnection();

List<Filter> filters = new ArrayList<Filter>();
HTable table = new HTable(config, tableName);

/**
*  Filter to check the User-ID column and value …
*/
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("User_ID"), Bytes.toBytes("01"), CompareFilter.CompareOp.EQUAL, new SubstringComparator("Usr-1"));
filters.add(filter);
/**
*  Filter to check the domainType column and value …
*/
SingleColumnValueFilter filtera = new SingleColumnValueFilter(Bytes.toBytes("DomainType"), Bytes.toBytes("01"), CompareFilter.CompareOp.EQUAL, new SubstringComparator("webapp"));
filters.add(filtera);

List sortedLst = new ArrayList();
FilterList filterList2 = new FilterList(
FilterList.Operator.MUST_PASS_ONE, filters);
Scan scan = new Scan();
scan.setFilter(filterList2);
ResultScanner scanner2 = table.getScanner(scan);
for (Result result : scanner2) {
// for (int i=0; i < limit; i++) {
//Result result = scanner2.next();
for (KeyValue kv : result.raw()) {
HashMap map = (HashMap) kv.toStringMap();
//System.out.println(map.get(“family”) +
//  “: ” + Bytes.toString(kv.getValue()));
sortedLst.add(map.get("family") + "|" + Bytes.toString(kv.getValue()));
}
}
Collections.sort(sortedLst);
for(int i=0; i< sortedLst.size(); i++){
System.out.println(sortedLst.get(i));
}
scanner2.close();
}

/**
*
* @param num
* @param pad
* @return
*/
public static String padNum(int num, int pad) {
String res = Integer.toString(num);
if (pad > 0) {
while (res.length() < pad) {
res = "0" + res;
}
}
return res;
}

/**
*
* @param args
*/
public static void main(String[] args) {
HBaseManager hmanager = new HBaseManager();
try {
//hmanager.createTable(“EvaluatedDB”, null, “Content_ID”, “DomainType”, “Predicted_Rating”,”User_ID”);
//for (int i=1; i<= 20; i++){
//    hmanager.fillTable(“EvaluatedDB”, 3, 23, 1, 2, false, false, new String[]{“Content_ID”, “DomainType”, “Predicted_Rating”,”User_ID”},new String[]{“C-”+i,”webapp”,”5.9″+i,”Usr-”+i});
//}
//hmanager.fillTable(“EvaluatedDB”, 3, 3, 1, 2, false, false, new String[]{“Content_ID”, “DomainType”, “Predicted_Rating”,”User_ID”},new String[]{“C-1″,”webapp”,”5.9″,”Usr-1″});
//hmanager.dump(“EvaluatedDB”, new String[]{“”}, new String[]{“Content_ID”, “DomainType”, “Predicted_Rating”,”User_ID”}, new String[]{“01″});
//hmanager.getContentData(“EvaluatedDB”, new String[]{“Content_ID”, “DomainType”, “Predicted_Rating”,”User_ID”}, new String[]{“01″},2);
//hmanager.getContentData(“EvaluatedDB”, 10);
getHBConnection();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

10. Map Reduce example for MapReduceFileToTable:

package mapred;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import util.HBaseManager;
import java.io.IOException;

public class MapReduceFileToTable{

static class Map extends Mapper<LongWritable, Text, Text, Put> {
/**
* map  driver code
*
* @param key
* @param value
* @param context
* @exception IOException
* @exception InterruptedException
*/

protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String messageStr = value.toString();
Put put = new Put(Bytes.toBytes("1"));
if (messageStr.contains("\t")) {
String[] logRecvArr = messageStr.split("\t");
if (logRecvArr.length >= 10) {

put.add(Bytes.toBytes("User"), Bytes.toBytes("UserId"),
Bytes.toBytes(logRecvArr[0]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("Id"),
Bytes.toBytes(logRecvArr[1]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayerId"),
Bytes.toBytes(logRecvArr[2]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayDate"),
Bytes.toBytes(logRecvArr[4]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayTime"),
Bytes.toBytes(logRecvArr[5]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayStopDate"),
Bytes.toBytes(logRecvArr[6]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("PlayStopTime"),
Bytes.toBytes(logRecvArr[7]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("Longitude"),
Bytes.toBytes(logRecvArr[8]));
put.add(Bytes.toBytes("MusicData"), Bytes.toBytes("Latitude"),
Bytes.toBytes(logRecvArr[9]));

}
} else {
System.out.println("Log is in incorrect format.");
}
context.write(new Text("1"), put);
}

}

/**
* Where jobs and their settings and sequence is set.
*
* @param args
*          arguments with exception of Tools understandable ones.
*/
public int execute() throws Exception {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "TransferHdfsToUserLog");
job.setJarByClass(MapReduceFileToTable.class); // class that contains the mapper
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Put.class);   // the mapper emits Puts

job.setInputFormatClass(TextInputFormat.class);
// the output format (TableOutputFormat) is set by initTableReducerJob below

FileInputFormat.setInputPaths(job, new Path("Out"));

job.setMapperClass(Map.class);
TableMapReduceUtil.initTableReducerJob(
"UserLogTable",        // output table
null,    // reducer class
job);
job.setNumReduceTasks(0);   // map-only job; the Puts go straight to TableOutputFormat
System.out.println("Hello Hadoop 2nd Job!!" + job.waitForCompletion(true));
return 0;
}

public static void main(String[] args) throws Exception {
new MapReduceFileToTable().execute();
}
}

11. Map Reduce example for MultiTableMapper:

package mapred;

import java.io.IOException;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.hbase.client.Put;

public class MultiTableMapper {

static class InnerMapper extends Mapper <LongWritable, Text, ImmutableBytesWritable, Put> {

public void map(LongWritable offset, Text value, Context context) throws IOException {
// contains the line of tab separated data we are working on (needs to be parsed out).
//byte[] lineBytes = value.getBytes();
String[] valuestring = value.toString().split("\t");
String rowid = /*HBaseManager.generateID();*/ "12345";
// rowKey is the hbase rowKey generated from lineBytes
Put put = new Put(rowid.getBytes());
put.add(Bytes.toBytes("UserInfo"), Bytes.toBytes("StudentName"), Bytes.toBytes(valuestring[0]));

try {
context.write(new ImmutableBytesWritable(Bytes.toBytes("Table1")), put);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} // write to the actions table

// rowKey2 is the hbase rowKey
Put put1 = new Put(rowid.getBytes());
put1.add(Bytes.toBytes("MarksInfo"), Bytes.toBytes("Marks"), Bytes.toBytes(valuestring[1]));
// Create your KeyValue object
//put.add(kv);

try {
context.write(new ImmutableBytesWritable(Bytes.toBytes("Table2")), put1);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} // write to the actions table
}
}

public static void createSubmittableJob() throws IOException, ClassNotFoundException, InterruptedException {
Path inputDir = new Path("in");
Configuration conf = /*HBaseManager.getHBConnection();*/ new Configuration();
Job job = new Job(conf, "my_custom_job");
job.setJarByClass(InnerMapper.class);
FileInputFormat.setInputPaths(job, inputDir);
job.setMapperClass(InnerMapper.class);

job.setInputFormatClass(TextInputFormat.class);

// this is the key to writing to multiple tables in hbase
job.setOutputFormatClass(MultiTableOutputFormat.class);
//job.setNumReduceTasks(0);
//TableMapReduceUtil.addDependencyJars(job);
//TableMapReduceUtil.addDependencyJars(job.getConfiguration());
System.out.println(job.waitForCompletion(true));
}

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
// TODO Auto-generated method stub
MultiTableMapper.createSubmittableJob();
System.out.println();
}

}

12. Map Reduce example for ReadFromTableAndWriteToFile:

package mapred;

import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import util.HBaseManager;

public class ReadFromTableAndWriteToFile {

public static class MyMapper extends TableMapper<Text, IntWritable>  {

private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString());     // we can only emit Writables…
context.write(text,ONE);
}
}

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}

public static void main(String[] args) throws Exception{
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadFromTableAndWriteToFile.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don’t set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable",        // input HBase table name
scan,             // Scan instance to control CF and attribute selection
MyMapper.class,   // mapper
Text.class,             // mapper output key
IntWritable.class,             // mapper output value
job);
job.setOutputFormatClass(TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("Out/help.txt"));
//job.setOutputFormatClass(NullOutputFormat.class);
job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(0);
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
}
//ReadFromTableAndWriteToFile.java
}

13. Map Reduce example for ReadingFromTableMapper:

package mapred;

import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

import util.HBaseManager;

public class ReadingFromTableMapper {

/*public static class My1Mapper extends TableMapper<Text, IntWritable>  {

private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println(“RowId is : “+Bytes.toString(value.getRow()));
System.out.println(map.get(“family”));
if(map.get(“qualifier”)!= null && !map.get(“qualifier”).equals(“”)){
System.out.println(map.get(“qualifier”));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString());     // we can only emit Writables…
context.write(text, ONE);
}
}*/

public static class MyMapper extends TableMapper<LongWritable,Text>  {

private final LongWritable ONE = new LongWritable(1);
private Text text = new Text();

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString());     // we can only emit Writables…
context.write(ONE,text);
}
}

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
protected void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable value : values) {
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}

public static void main(String[] args) throws Exception{
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadingFromTableMapper.class);     // class that contains the mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don’t set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
"UserProfileTable",        // input HBase table name
scan,             // Scan instance to control CF and attribute selection
MyMapper.class,   // mapper
null,             // mapper output key
null,             // mapper output value
job);
job.setOutputFormatClass(NullOutputFormat.class);
//FileOutputFormat.setOutputPath(job, new Path(“/tmp/mr/mySummaryFile”));
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
}
//ReadingFromTableMapper.java
}

14. Map Reduce example for ReadWriteWith2Map2Reducer:

package mapred;

import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

import util.HBaseManager;

public class ReadWriteWith2Map2Reducer {

public static class My1Mapper extends TableMapper<Text, IntWritable>  {

private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString());     // we can only emit Writables…
context.write(text, ONE);
}
}

public static class My2Mapper extends TableMapper<Text, IntWritable>  {

private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
System.out.println(row);
for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println("RowId is : " + Bytes.toString(value.getRow()));
System.out.println(map.get("family"));
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
System.out.println(map.get("qualifier"));
}
System.out.println(Bytes.toString(kv.getValue()));
}
text.set(row.toString());     // we can only emit Writables…
context.write(text, ONE);
}
}

/**
*
* @author hadoop-node1
*
*/
public static class My1TableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable>  {

public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
for (IntWritable val : values) {
System.out.println(val);
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("Recommenders"), Bytes.toBytes("Recommenders-1"), Bytes.toBytes(key.toString()));

context.write(null, put);
}
}

public static class My2TableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable>  {

public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
for (IntWritable val : values) {
System.out.println(val);
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("Recommenders"), Bytes.toBytes("Recommenders-1"), Bytes.toBytes(key.toString()));

context.write(null, put);
}
}

public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadWriteWith2Map2Reducer.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don’t set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
"UserProfileTable",        // input table
scan,               // Scan instance to control CF and attribute selection
My1Mapper.class,     // mapper class
Text.class,         // mapper output key
IntWritable.class,  // mapper output value
job);
Job job1 = new Job(config, "UserDataTable");
//Job job1 = new Job(config, "UserProfileTable1");
job1.setJarByClass(ReadWriteWith2Map2Reducer.class);     // class that contains mapper and reducer

Scan scan1 = new Scan();
scan1.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan1.setCacheBlocks(false);
TableMapReduceUtil.initTableMapperJob(
"UserDataTable",        // input table
scan1,               // Scan instance to control CF and attribute selection
My2Mapper.class,     // mapper class
Text.class,         // mapper output key
IntWritable.class,  // mapper output value
job1);
TableMapReduceUtil.initTableReducerJob(
"UserProfileTableCopy",        // output table
My1TableReducer.class,    // reducer class
job);
TableMapReduceUtil.initTableReducerJob(
"UserProfileTableCopy",        // output table
My2TableReducer.class,    // reducer class
job1);
job.setNumReduceTasks(1);   // at least one, adjust as required
job.submit();
job1.submit();
boolean ok = job.waitForCompletion(true) && job1.waitForCompletion(true);
System.exit(ok ? 0 : 1);

}
}

15. Map Reduce example for ReadWriteWithMapReducer:

package mapred;

import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

import util.HBaseManager;

public class ReadWriteWithMapReducer {

/**
*
* @author hadoop-node1
*
*/
public static class MyMapper extends TableMapper<Text, IntWritable>  {

private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
//System.out.println(row);
HashMap<String, String> musicMap = new HashMap<String, String>();
for (KeyValue kv : value.raw()) {
String qualifier = "";
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
String family = (String) map.get("family");
if (family.equalsIgnoreCase("Music")) {
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
qualifier = (String) map.get("qualifier");
}
String qualifierVal = Bytes.toString(kv.getValue());
musicMap.put(qualifier, qualifierVal);
}
}
System.out.println(musicMap.toString());
/*

for (KeyValue kv : value.raw()) {
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
System.out.println(“RowId is : “+Bytes.toString(value.getRow()));
System.out.println(map.get(“family”));
if(map.get(“qualifier”)!= null && !map.get(“qualifier”).equals(“”)){
System.out.println(map.get(“qualifier”));
}
System.out.println(Bytes.toString(kv.getValue()));
}*/
text.set(row.toString());     // we can only emit Writables…
context.write(text, ONE);
}
}

/**
*
* @author hadoop-node1
*
*/
public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable>  {

public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
for (IntWritable val : values) {
System.out.println(val);
}
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(Bytes.toBytes("Recommenders"), Bytes.toBytes("Recommenders-1"), Bytes.toBytes(key.toString()));

context.write(null, put);
}
}

public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
//Job job1 = new Job(config, "UserProfileTable1");
job.setJarByClass(ReadWriteWithMapReducer.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don’t set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
"UserProfileTable",        // input table
scan,               // Scan instance to control CF and attribute selection
MyMapper.class,     // mapper class
Text.class,         // mapper output key
IntWritable.class,  // mapper output value
job);
/*Job job1 = new Job(config,”UserDataTable”);
//Job job1 = new Job(config,”UserProfileTable1″);
job1.setJarByClass(ReadWriteWithMapReducer.class);     // class that contains mapper and reducer

Scan scan1 = new Scan();
scan1.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan1.setCacheBlocks(false);
TableMapReduceUtil.initTableMapperJob(
“UserDataTable”,        // input table
scan1,               // Scan instance to control CF and attribute selection
MyMapper.class,     // mapper class
Text.class,         // mapper output key
IntWritable.class,  // mapper output value
job1);*/

TableMapReduceUtil.initTableReducerJob(
"UserProfileTableCopy",        // output table
MyTableReducer.class,    // reducer class
job);
/*TableMapReduceUtil.initTableReducerJob(
“UserProfileTableCopy”,        // output table
MyTableReducer.class,    // reducer class
job1);
job.setNumReduceTasks(1);   // at least one, adjust as required
job.submit();
job1.submit();*/
System.exit(job.waitForCompletion(true) ? 0 : 1);
//System.exit(job1.waitForCompletion(true) ? 0 : 1);

}
}

16. Map Reduce example for TableCopyAndPaste_Mapper_Reducer:

package mapred;

import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

import util.HBaseManager;

public class TableCopyAndPaste_Mapper_Reducer {

/**
*
* @author hadoop-node1
*
*/
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Writable>  {

public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
Put p = new Put( row.get());
HashMap<String, String> musicMap = new HashMap<String, String>();
for (KeyValue kv : value.raw()) {
String qualifier = "";
HashMap<?, ?> map = (HashMap<?, ?>) kv.toStringMap();
String family = (String) map.get("family");
if (family.equalsIgnoreCase("Music")) {
if (map.get("qualifier") != null && !map.get("qualifier").equals("")) {
qualifier = (String) map.get("qualifier");
}
String qualifierVal = Bytes.toString(kv.getValue());
musicMap.put(qualifier, qualifierVal);
}
p.add(kv);
}
System.out.println(musicMap.toString());
context.write(row, p);
}
}

/**
*
* @author hadoop-node1
*
*/
public static class MyTableReducer extends TableReducer<Writable, Writable, Writable>  {

public void reduce(Writable key, Iterable<Writable> values, Context context)
throws IOException, InterruptedException {
for (Writable putOrDelete : values) {
context.write(key, putOrDelete);
}
}
}

public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration config = HBaseManager.conf;
Job job = new Job(config, "UserProfileTable");
job.setJarByClass(TableCopyAndPaste_Mapper_Reducer.class);     // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don’t set to true for MR jobs

TableMapReduceUtil.initTableMapperJob(
"UserProfileTable", scan,
MyMapper.class, ImmutableBytesWritable.class, Put.class, job);

TableMapReduceUtil.initTableReducerJob("UserProfileTableCopy", MyTableReducer.class, job);

System.exit(job.waitForCompletion(true) ? 0 : 1);

}
}

17. HBase MapReduce Summary to RDBMS

Sometimes it is more appropriate to generate summaries to an RDBMS. For these cases, it is possible to generate summaries directly to an RDBMS via a custom reducer. The setup method can connect to an RDBMS (the connection information can be passed via custom parameters in the context) and the cleanup method can close the connection.

It is critical to understand that the number of reducers for the job affects the summarization implementation, and you'll have to design this into your reducer. Specifically, it can be designed to run as a singleton (one reducer) or with multiple reducers. Neither is right or wrong; it depends on your use case. Recognize that the more reducers are assigned to the job, the more simultaneous connections to the RDBMS will be created; this will scale, but only to a point.

 public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable>  {

	private Connection c = null;

	public void setup(Context context) {
  		// create DB connection...
  	}

	public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		// do summarization
		// in this example the keys are Text, but this is just an example
	}

	public void cleanup(Context context) {
  		// close db connection
  	}

}
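A slightly more concrete sketch of that skeleton using plain JDBC is shown below. The property names, SQL statement, and table/column names are placeholders; java.sql.Connection, DriverManager, PreparedStatement, and SQLException would need to be imported, along with a suitable JDBC driver on the classpath.

 public static class MyRdbmsReducer extends Reducer<Text, IntWritable, Text, IntWritable>  {

	private Connection c = null;

	public void setup(Context context) throws IOException {
  		try {
  			// connection info passed in via custom job parameters (placeholder property names)
  			Configuration conf = context.getConfiguration();
  			c = DriverManager.getConnection(
  				conf.get("summary.jdbc.url"),
  				conf.get("summary.jdbc.user"),
  				conf.get("summary.jdbc.password"));
  		} catch (SQLException e) {
  			throw new IOException("could not open RDBMS connection", e);
  		}
  	}

	public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		try {
			// one summary row per key (placeholder table and column names)
			PreparedStatement ps = c.prepareStatement("INSERT INTO summary (k, cnt) VALUES (?, ?)");
			ps.setString(1, key.toString());
			ps.setInt(2, sum);
			ps.executeUpdate();
			ps.close();
		} catch (SQLException e) {
			throw new IOException("error writing summary row", e);
		}
	}

	public void cleanup(Context context) throws IOException {
  		try {
  			if (c != null) c.close();   // close db connection
  		} catch (SQLException e) {
  			throw new IOException(e);
  		}
  	}

 }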

In the end, the summary results are written to your RDBMS table/s.

Ref:  http://bigdataprocessing.wordpress.com/2012/07/27/hadoop-hbase-mapreduce-examples/

http://hbase.apache.org/book/mapreduce.example.html

http://hbase.apache.org/book/perf.reading.html

http://stackoverflow.com/questions/2431387/how-to-read-data-from-hbase

http://svn.apache.org/repos/asf/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapred
