Testing HBase Applications Using Popular Tools

moxiaomomo

Although more and more end-user applications are being built on Apache HBase, many of them are not well tested. This article covers some easy-to-implement ways to test them.

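We start with JUnit, then move on to Mockito and Apache MRUnit, and finally use an HBase mini-cluster for integration testing. (HBase's own code base is tested with a mini-cluster, so why not use the same approach for upstream applications?)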

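As a basis for discussion, assume you have a data access object (DAO) that inserts data into HBase. The real logic could be much more complicated, but for demonstration purposes the following simple code does the job: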

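import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MyHBaseDAO {

    // Writes one record to the given table. The table is passed in by the
    // caller so that tests can supply a mock or a mini-cluster table.
    public static void insertRecord(HTableInterface table, HBaseTestObj obj)
            throws Exception {
        Put put = createPut(obj);
        table.put(put);
    }

    // Package-private (not private) so that a test class in the same
    // package can call it directly.
    static Put createPut(HBaseTestObj obj) {
        Put put = new Put(Bytes.toBytes(obj.getRowKey()));
        put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"),
                Bytes.toBytes(obj.getData1()));
        put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2"),
                Bytes.toBytes(obj.getData2()));
        return put;
    }
}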

HBaseTestObj is a basic data object with the members rowkey, data1, and data2, along with a getter and setter for each.
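The article does not list HBaseTestObj itself, so the following is only a minimal sketch of what it might look like. The String field types are an assumption, inferred from calls such as setRowKey("ROWKEY-1") in the tests; the real class may differ.

```java
// Hypothetical sketch of the HBaseTestObj bean used throughout this article.
// Assumption: all three members are Strings, as the test calls suggest.
class HBaseTestObj {
    private String rowKey;
    private String data1;
    private String data2;

    public String getRowKey() { return rowKey; }
    public void setRowKey(String rowKey) { this.rowKey = rowKey; }

    public String getData1() { return data1; }
    public void setData1(String data1) { this.data1 = data1; }

    public String getData2() { return data2; }
    public void setData2(String data2) { this.data2 = data2; }
}
```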

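The insertRecord method inserts a record into an HBase table with column family CF and columns CQ-1 and CQ-2. The createPut method simply populates a Put and returns it to the caller.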

Using JUnit

JUnit, which most Java developers already know, is easy to apply to most HBase applications. First, add the dependency to your pom:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>

Then use it in the test class:

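import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

// Must live in the same package as MyHBaseDAO to reach createPut.
public class TestMyHbaseDAOData {

    @Test
    public void testCreatePut() throws Exception {
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");

        Put put = MyHBaseDAO.createPut(obj);

        // Row key and both column values must round-trip unchanged.
        assertEquals(obj.getRowKey(), Bytes.toString(put.getRow()));
        assertEquals(obj.getData1(), Bytes.toString(
                put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")).get(0).getValue()));
        assertEquals(obj.getData2(), Bytes.toString(
                put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")).get(0).getValue()));
    }
}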

This test verifies that createPut correctly creates, populates, and returns a Put object.

Using Mockito

How do you unit test the insertRecord method in the same way? Mockito is one effective approach.

First, add Mockito as a dependency to your pom:

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.9.5</version>
    <scope>test</scope>
</dependency>

Then add the following to the test class:

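import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.ArgumentCaptor;
import org.mockito.Captor;
import org.mockito.Mock;
import org.mockito.runners.MockitoJUnitRunner;

@RunWith(MockitoJUnitRunner.class)
public class TestMyHBaseDAO {
    @Mock
    private HTableInterface table;
    @Mock
    private HTablePool hTablePool;
    @Captor
    private ArgumentCaptor<Put> putCaptor;

    @Test
    public void testInsertRecord() throws Exception {
        // return mock table when getTable is called
        when(hTablePool.getTable("tablename")).thenReturn(table);
        // create test object and make a call to the DAO that needs testing
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");
        MyHBaseDAO.insertRecord(table, obj);

        // capture the Put that the DAO handed to the table
        verify(table).put(putCaptor.capture());
        Put put = putCaptor.getValue();

        assertEquals(obj.getRowKey(), Bytes.toString(put.getRow()));
        // assertTrue instead of the assert keyword, which is a no-op unless -ea is set
        assertTrue(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")));
        assertTrue(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")));
        assertEquals("DATA-1", Bytes.toString(
                put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")).get(0).getValue()));
        assertEquals("DATA-2", Bytes.toString(
                put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")).get(0).getValue()));
    }
}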

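The code above populates HBaseTestObj with "ROWKEY-1", "DATA-1", and "DATA-2", then uses the mocked table and the DAO to insert it. The test captures the Put that the DAO would have inserted and checks that the rowkey, data1, and data2 match what you expect.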

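The key here is to manage the HTable pool and create HTable instances outside the DAO. This lets you mock them cleanly and test Puts as shown above. Get, Scan, Delete, and other operations can be tested in a similar way.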
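The idea of creating the table outside the DAO can be illustrated without any HBase or Mockito dependency. The sketch below is only an illustration: RecordTable, RecordDao, and RecordingTable are hypothetical names that are not part of the HBase API, and the hand-written recording class plays the role that the Mockito mock plays above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table handle; NOT an HBase interface.
interface RecordTable {
    void put(String rowKey, String value);
}

// The DAO receives the table from its caller instead of creating it,
// so a test can pass in any implementation it likes.
class RecordDao {
    static void insert(RecordTable table, String rowKey, String value) {
        table.put(rowKey, value);
    }
}

// Hand-written test double that records every call it receives.
class RecordingTable implements RecordTable {
    final List<String> calls = new ArrayList<>();

    @Override
    public void put(String rowKey, String value) {
        calls.add(rowKey + "=" + value);
    }
}
```

Because the DAO never constructs its own table, the test decides what the DAO talks to: a recording double here, a Mockito mock in the unit test above, or a real table in an integration test.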

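Using MRUnit

With unit testing of regular data access covered, let's turn to MapReduce jobs that run against HBase tables.

Testing MR jobs that work with HBase is just like testing regular MapReduce jobs, and MRUnit makes this kind of unit testing easy.

Suppose you have an MR job that writes to an HBase table named "MyTest", which has one column family, "CF". The reducer of such a job could look like this: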

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
    public static final byte[] CF = Bytes.toBytes("CF");
    public static final byte[] QUALIFIER = Bytes.toBytes("CQ-1");

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // A real job would do more processing here. In this example we simply
        // append all the records received from the mapper for this key and
        // insert them into HBase as a single record.
        StringBuilder data = new StringBuilder();
        for (Text val : values) {
            data.append(val.toString());
        }
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(CF, QUALIFIER, Bytes.toBytes(data.toString()));
        // write to HBase
        context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
    }
}

Now how do you unit test this reducer with MRUnit? First, add MRUnit as a dependency to your pom:

<dependency>
   <groupId>org.apache.mrunit</groupId>
   <artifactId>mrunit</artifactId>
   <version>1.0.0</version>
   <scope>test</scope>
</dependency>

Next, use the ReduceDriver that MRUnit provides in the test class:

import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class MyReducerTest {
    ReduceDriver<Text, Text, ImmutableBytesWritable, Writable> reduceDriver;
    byte[] CF = Bytes.toBytes("CF");
    byte[] QUALIFIER = Bytes.toBytes("CQ-1");

    @Before
    public void setUp() {
        MyReducer reducer = new MyReducer();
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
    }

    @Test
    public void testHBaseInsert() throws IOException {
        String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1",
                strValue2 = "DATA2";
        List<Text> list = new ArrayList<Text>();
        list.add(new Text(strValue));
        list.add(new Text(strValue1));
        list.add(new Text(strValue2));
        // since all the reducer does is append the records that the mapper
        // sends it, we should get the following back
        String expectedOutput = strValue + strValue1 + strValue2;
        // set up input that mimics what the mapper would have passed
        // to the reducer
        reduceDriver.withInput(new Text(strKey), list);
        // run the reducer and get its output
        List<Pair<ImmutableBytesWritable, Writable>> result = reduceDriver.run();

        // extract key from result and verify
        assertEquals(strKey, Bytes.toString(result.get(0).getFirst().get()));

        // extract value for CF/QUALIFIER and verify
        Put a = (Put) result.get(0).getSecond();
        String c = Bytes.toString(a.get(CF, QUALIFIER).get(0).getValue());
        assertEquals(expectedOutput, c);
    }
}

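After the processing in MyReducer, this test verifies that:

The output is what you expect.
The Put that is inserted into the HBase table has "RowKey-1" as its rowkey.
"DATADATA1DATA2" is the value for column family CF and column qualifier CQ-1.

You can similarly use MapDriver to test a mapper that reads data from HBase, or to test an MR job that reads from HBase, processes the data, and writes it to HDFS.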

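Using an HBase Mini-cluster

Now let's look at integration testing. HBase ships with HBaseTestingUtility, which makes it straightforward to write integration tests against an HBase mini-cluster. To pull in the right libraries, add the following dependencies to your pom: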

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.94.2-cdh4.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
        
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.2.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.2.0</version>
    <scope>test</scope>
</dependency>

Now let's look at how to run an integration test for the insert operation of MyHBaseDAO:

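import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyHBaseIntegrationTest {
    private static final byte[] CF = Bytes.toBytes("CF");
    private static final byte[] CQ1 = Bytes.toBytes("CQ-1");
    private static final byte[] CQ2 = Bytes.toBytes("CQ-2");
    private static HBaseTestingUtility utility;

    // Start the mini-cluster once for the whole class, since startup is slow.
    @BeforeClass
    public static void setup() throws Exception {
        utility = new HBaseTestingUtility();
        utility.startMiniCluster();
    }

    @AfterClass
    public static void teardown() throws Exception {
        utility.shutdownMiniCluster();
    }

    @Test
    public void testInsert() throws Exception {
        HTableInterface table = utility.createTable(Bytes.toBytes("MyTest"), CF);
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");
        MyHBaseDAO.insertRecord(table, obj);

        // read back CQ-1 and compare with what was inserted
        Get get1 = new Get(Bytes.toBytes(obj.getRowKey()));
        get1.addColumn(CF, CQ1);
        Result result1 = table.get(get1);
        assertEquals(obj.getRowKey(), Bytes.toString(result1.getRow()));
        assertEquals(obj.getData1(), Bytes.toString(result1.value()));

        // read back CQ-2 and compare with what was inserted
        Get get2 = new Get(Bytes.toBytes(obj.getRowKey()));
        get2.addColumn(CF, CQ2);
        Result result2 = table.get(get2);
        assertEquals(obj.getRowKey(), Bytes.toString(result2.getRow()));
        assertEquals(obj.getData2(), Bytes.toString(result2.value()));
    }
}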

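This code creates an HBase mini-cluster and starts it, creates a table named "MyTest" with a single column family "CF", inserts a record through the DAO, then performs a Get against the same table to verify that the DAO inserted the record correctly.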

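The same approach works for more complicated cases, including MR jobs. You can also access the HDFS and ZooKeeper mini-clusters that are created along with the HBase one, run an MR job, write its output to an HBase table, and verify the inserted data.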

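Note: starting a mini-cluster takes 20 to 30 seconds, and on Windows it requires Cygwin. However, because integration tests should only be run periodically, the longer run time should be acceptable.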

The same sample code is available at https://github.com/sitaula/HBaseTest. Happy testing!

Original article: How-to: Test HBase Applications Using Popular Tools