HBase Practice
Objective: Set up a Hadoop and HBase environment and perform basic operations with it, to gain familiarity with big-data storage and NoSQL databases.
Environment:
Host hardware:
RAM: 32 GB
CPU: Intel® Core™ i5-4460 CPU @ 3.20GHz × 4
OS: Ubuntu 16.04.1 LTS 64-bit
VirtualBox: 5.0
Virtual machines (Master, slave1, slave2, slave3):
RAM: 4 GB
OS: Ubuntu 16.04.1 LTS 64-bit
Hadoop: 2.7.3
Java: 1.8
HBase: 1.2.4
Requirements:
1. Build a cluster of at least three nodes using virtual machines (note: a pseudo-distributed setup does not qualify).
2. Perform insert, delete, and scan operations normally through the client (HBase shell).
3. Convert the following relational tables into a form suitable for HBase storage and insert the data:
Student table (Student)
S_No | S_Name | S_Sex | S_Age
2015001 | Zhangsan | male | 23
2015002 | Mary | female | 22
2015003 | Lisi | male | 24
Course table (Course)
C_No | C_Name | C_Credit
123001 | Math | 2.0
123002 | Computer Science | 5.0
123003 | English | 3.0
SC table (SC)
SC_Sno | SC_Cno | SC_Score
2015001 | 123001 | 86
2015001 | 123003 | 69
2015002 | 123002 | 77
2015002 | 123003 | 99
2015003 | 123001 | 98
2015003 | 123002 | 95
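One way to map these tables onto HBase (a design sketch, not the only option): give each table a single column family, here called Info, use S_No and C_No as row keys, and build the SC row key by concatenating SC_Sno and SC_Cno so that each enrollment gets its own row. In the HBase shell this looks like (the family name Info and the "_" separator are assumptions for illustration):

```
create 'Student', 'Info'
put 'Student', '2015001', 'Info:S_Name', 'Zhangsan'
put 'Student', '2015001', 'Info:S_Sex',  'male'
put 'Student', '2015001', 'Info:S_Age',  '23'

create 'SC', 'Info'
put 'SC', '2015001_123001', 'Info:SC_Score', '86'
```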
In addition, use the HBase Java API to implement the following functions:
(1) createTable(String tableName, String[] fields)
Creates a table. The parameter tableName is the table name; the string array fields holds the names of the record's fields. If a table named tableName already exists in HBase, delete the old table first and then create the new one.
(2) addRecord(String tableName, String row, String[] fields, String[] values)
Adds the data in values to the cells of table tableName specified by row (identified by S_Name) and the string array fields. If an element of fields has a column qualifier under its column family, write it as "columnFamily:column". For example, to add scores to the three columns "Math", "Computer Science", and "English" at once, fields would be {"Score:Math", "Score:Computer Science", "Score:English"} and values would hold the three scores.
(3) scanColumn(String tableName, String column)
Browses the data of one column of table tableName; if a row has no data in that column, return null. When the parameter column is a column-family name with several column qualifiers under it, list the data of every qualified column; when column is a fully qualified column name (e.g. "Score:Math"), list only that column's data.
(4) modifyData(String tableName, String row, String column, String value)
Modifies the cell of table tableName specified by row (which may be the student name S_Name) and column, setting it to value.
(5) deleteRow(String tableName, String row)
Deletes the row specified by row from table tableName.
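The "columnFamily:column" convention used by addRecord and scanColumn can be illustrated with plain Java string handling (a standalone sketch with no HBase dependency; the class and method names here are hypothetical helpers, not part of the HBase API):

```java
public class FieldSpec {
    // Split a field spec into {family, qualifier}. A bare family name is
    // mapped to itself as the qualifier, so "S_Name" behaves like "S_Name:S_Name".
    public static String[] split(String field) {
        int idx = field.indexOf(':');
        if (idx < 0) {
            return new String[] { field, field };
        }
        return new String[] { field.substring(0, idx), field.substring(idx + 1) };
    }

    public static void main(String[] args) {
        for (String f : new String[] { "Score:Math", "Score:Computer Science", "S_Name" }) {
            String[] fq = split(f);
            System.out.println(fq[0] + " -> " + fq[1]);
        }
    }
}
```

Splitting on only the first ":" keeps qualifiers that themselves contain a colon intact.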
Procedure:
1. Hadoop setup
Tutorial: https://thwang1206.gitbooks.io/hadoop-installation/content/install_hadoop.html
Result: a cluster with 1 Master and 3 Slaves.
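For a cluster like this, all nodes typically share a hosts mapping and the master's slaves file lists the workers. A sketch under the assumption of the hostnames above (the IP addresses are illustrative, not from the actual setup):

```
# /etc/hosts (same on all four nodes; IP addresses are examples)
192.168.56.101 Master
192.168.56.102 slave1
192.168.56.103 slave2
192.168.56.104 slave3

# $HADOOP_HOME/etc/hadoop/slaves (on Master, Hadoop 2.x)
slave1
slave2
slave3
```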
2. HBase setup
Tutorial: https://thwang1206.gitbooks.io/hadoop-installation/content/install_hbase.html
Result: HBase started successfully.
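In a fully distributed setup like this, hbase-site.xml points HBase at HDFS and ZooKeeper. A minimal sketch, assuming the NameNode listens on Master:9000 and ZooKeeper runs on all four nodes (the port and the /hbase path are assumptions):

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://Master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>Master,slave1,slave2,slave3</value>
  </property>
</configuration>
```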
Run results:
Use the HBase shell to check that table creation and record insertion succeeded.
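A typical check in the HBase shell might look like this (the row key corresponds to the sample data inserted by the program in the appendix):

```
hbase> list
hbase> scan 'Student'
hbase> get 'Student', 'SA16011096'
hbase> count 'SC'
```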
Appendix: main code
package com.popoaichuiniu.jacy;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
// Example class: creates HBase tables and inserts records through the client API.
// The API used here is available since HBase 1.0.
public class HBaseExample {
public static Configuration config=null;
public static Connection connection=null;
public static void createTable(String tableName, String[] fields) throws IOException
{
    Admin admin = connection.getAdmin();
    TableName name = TableName.valueOf(tableName);
    // As required: if the table already exists, disable and delete it first,
    // then recreate it.
    if (admin.tableExists(name)) {
        admin.disableTable(name);
        admin.deleteTable(name);
        System.out.println("table " + tableName + " existed and was deleted");
    }
    // HTableDescriptor holds the table's metadata: its column families,
    // whether it is read-only, memstore size, region-split policy, coprocessors, etc.
    HTableDescriptor tableDescriptor = new HTableDescriptor(name);
    for (String field : fields) {
        tableDescriptor.addFamily(new HColumnDescriptor(field));
    }
    admin.createTable(tableDescriptor);
    admin.close();
    System.out.println("table " + tableName + " created successfully");
}
public static void addRecord(String tableName, String row, String[] fields, String[] values) throws IOException
{
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(Bytes.toBytes(row));
    for (int i = 0; i < fields.length; i++) {
        // fields[i] is either a bare column-family name or "columnFamily:column".
        String[] fq = fields[i].split(":", 2);
        String family = fq[0];
        String qualifier = fq.length > 1 ? fq[1] : fq[0];
        put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(values[i]));
    }
    table.put(put);
    table.close();
}
public static void scanColumn(String tableName, String column) throws IOException
{
    Table table = connection.getTable(TableName.valueOf(tableName));
    if (column.contains(":")) {
        // A fully qualified "family:qualifier" column: print only that column.
        String[] fq = column.split(":", 2);
        ResultScanner rs = table.getScanner(fq[0].getBytes(), fq[1].getBytes());
        for (Result result : rs) {
            byte[] value = result.getValue(fq[0].getBytes(), fq[1].getBytes());
            System.out.println(value == null ? "null" : Bytes.toString(value));
        }
        rs.close();
    } else {
        // A bare family name: print every qualifier under that family.
        ResultScanner rs = table.getScanner(column.getBytes());
        for (Result result : rs) {
            java.util.NavigableMap<byte[], byte[]> familyMap = result.getFamilyMap(column.getBytes());
            if (familyMap == null || familyMap.isEmpty()) {
                System.out.println("null");
                continue;
            }
            for (java.util.Map.Entry<byte[], byte[]> entry : familyMap.entrySet()) {
                System.out.println(Bytes.toString(entry.getKey()) + " = " + Bytes.toString(entry.getValue()));
            }
        }
        rs.close();
    }
    table.close();
}
public static void modifyData(String tableName, String row, String column, String value) throws IOException
{
    Table table = connection.getTable(TableName.valueOf(tableName));
    Put put = new Put(Bytes.toBytes(row));
    // In HBase, "modifying" a cell is simply putting a newer version of it.
    String[] fq = column.split(":", 2);
    put.addColumn(Bytes.toBytes(fq[0]), Bytes.toBytes(fq[1]), Bytes.toBytes(value));
    table.put(put);
    table.close();
}
public static void deleteRow(String tableName, String row) throws IOException
{
    Table table = connection.getTable(TableName.valueOf(tableName));
    table.delete(new Delete(Bytes.toBytes(row)));
    table.close();
}
public static void main(String[] args) throws IOException {
// You need a configuration object to tell the client where to connect.
// When you create a HBaseConfiguration, it reads in whatever you've set
// into your hbase-site.xml and in hbase-default.xml, as long as these can
// be found on the CLASSPATH
config = HBaseConfiguration.create();
// Next you need a Connection to the cluster. Create one. When done with it,
// close it. A try/finally is a good way to ensure it gets closed or use
// the jdk7 idiom, try-with-resources: see
// https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
//
// Connections are heavyweight. Create one once and keep it around. From a Connection
// you get a Table instance to access Tables, an Admin instance to administer the cluster,
// and RegionLocator to find where regions are out on the cluster. As opposed to Connections,
// Table, Admin and RegionLocator instances are lightweight; create as you need them and then
// close when done.
//
String fields []=new String[]{"S_No","S_Name","S_Sex","S_Age"};
String values []=new String[]{"SA16011096","zms","man","22"};
connection = ConnectionFactory.createConnection(config);
createTable("Student",fields);
addRecord("Student","SA16011096",fields,values);
fields=new String[]{"C_No","C_Name","C_Credit"};
values=new String[]{"123","zuheshuxue","3.5"};
createTable("Course",fields);
addRecord("Course","123",fields,values);
fields=new String[]{"SC_Sno","SC_Cno","SC_Score"};
values=new String[]{"SA16011096","123","85"};
createTable("SC",fields);
addRecord("SC","SA16011096",fields,values);
connection.close();
}
}