I. The flow of an HBase data change into Atlas
The diagram below shows what happens inside Atlas after data changes in HBase:
II. A closer look at each step of the process
1. Code written in Java must first be compiled into bytecode and that bytecode loaded into memory before it can run. The Atlas hook is no different: before it can do anything, it has to be loaded. Loading a .class file into memory requires a matching class loader. The class below is the one that loads Atlas's HBaseAtlasHook:
```java
/**
 * This class takes care of loading HBaseAtlasHook.
 */
public class HBaseAtlasCoprocessor implements MasterCoprocessor, MasterObserver, RegionObserver, RegionServerObserver {

    // Load HBaseAtlasHook when HBaseAtlasCoprocessor is constructed
    public HBaseAtlasCoprocessor() {
        if (LOG.isDebugEnabled()) {
            LOG.debug("==> HBaseAtlasCoprocessor.HBaseAtlasCoprocessor()");
        }

        // While this class initializes, HBaseAtlasHook is loaded into the HBase cluster's memory
        this.init();

        if (LOG.isDebugEnabled()) {
            LOG.debug("<== HBaseAtlasCoprocessor.HBaseAtlasCoprocessor()");
        }
    }

    private void init() {
        if (LOG.isDebugEnabled()) {
            LOG.debug("==> HBaseAtlasCoprocessor.init()");
        }

        try {
            // Obtain the class loader for HBaseAtlasHook
            atlasPluginClassLoader = AtlasPluginClassLoader.getInstance(ATLAS_PLUGIN_TYPE, this.getClass());

            @SuppressWarnings("unchecked")
            Class<?> cls = Class.forName(ATLAS_HBASE_HOOK_IMPL_CLASSNAME, true, atlasPluginClassLoader);

            activatePluginClassLoader();

            impl                     = cls.newInstance();
            implMasterObserver       = (MasterObserver) impl;
            implRegionObserver       = (RegionObserver) impl;
            implRegionServerObserver = (RegionServerObserver) impl;
            implMasterCoprocessor    = (MasterCoprocessor) impl;
        } catch (Exception e) {
            // check what needs to be done
            LOG.error("Error Enabling RangerHbasePlugin", e);
        } finally {
            deactivatePluginClassLoader();
        }

        if (LOG.isDebugEnabled()) {
            LOG.debug("<== HBaseAtlasCoprocessor.init()");
        }
    }
}
```
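The activatePluginClassLoader()/deactivatePluginClassLoader() bodies are not shown above. A common way to implement this kind of switching, and a minimal sketch only (the class and method names here are illustrative, not Atlas's actual AtlasPluginClassLoader API), is to swap the thread's context class loader in and out:

```java
// Sketch: switch the thread context class loader around plugin calls.
// Illustrative names; not the real Atlas implementation.
class PluginClassLoaderSwitch {
    private final ClassLoader pluginClassLoader;
    private final ThreadLocal<ClassLoader> previous = new ThreadLocal<>();

    PluginClassLoaderSwitch(ClassLoader pluginClassLoader) {
        this.pluginClassLoader = pluginClassLoader;
    }

    // Remember the caller's loader and install the plugin's loader
    void activate() {
        previous.set(Thread.currentThread().getContextClassLoader());
        Thread.currentThread().setContextClassLoader(pluginClassLoader);
    }

    // Restore whatever loader was active before activate()
    void deactivate() {
        Thread.currentThread().setContextClassLoader(previous.get());
        previous.remove();
    }
}
```

The point of the switch is that the hook's own dependencies (Atlas client, Kafka, etc.) are visible only to the plugin class loader, so they cannot clash with HBase's own jars.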
Having read the code above, you may well wonder what the four interfaces implemented by HBaseAtlasCoprocessor are for. The first one looks like this:
```java
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)
@InterfaceStability.Evolving
public interface MasterCoprocessor extends Coprocessor {
    default Optional<MasterObserver> getMasterObserver() {
        return Optional.empty();
    }
}
```
This interface lives under HBase's coprocessor package. Annoyingly, it carries no Javadoc of its own, but it extends the Coprocessor interface. Here is the Coprocessor interface together with its Javadoc:
```java
/**
 * Building a coprocessor to observe Master operations.
 * <pre>
 * class MyMasterCoprocessor implements MasterCoprocessor {
 *   @Override
 *   public Optional<MasterObserver> getMasterObserver() {
 *     return new MyMasterObserver();
 *   }
 * }
 * </pre>
 */
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)
@InterfaceStability.Evolving
public interface Coprocessor {
    int VERSION = 1;

    /** Highest installation priority */
    int PRIORITY_HIGHEST = 0;
    /** High (system) installation priority */
    int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;
    /** Default installation priority for user coprocessors */
    int PRIORITY_USER = Integer.MAX_VALUE / 2;
    /** Lowest installation priority */
    int PRIORITY_LOWEST = Integer.MAX_VALUE;

    /**
     * Lifecycle state of a given coprocessor instance.
     */
    enum State {
        UNINSTALLED,
        INSTALLED,
        STARTING,
        ACTIVE,
        STOPPING,
        STOPPED
    }

    /**
     * Called by the {@link CoprocessorEnvironment} during its own startup to initialize the
     * coprocessor.
     */
    default void start(CoprocessorEnvironment env) throws IOException {
    }

    /**
     * Called by the {@link CoprocessorEnvironment} during its own shutdown to stop the
     * coprocessor.
     */
    default void stop(CoprocessorEnvironment env) throws IOException {
    }

    /**
     * Coprocessor endpoints providing protobuf services should override this method.
     * @return Iterable of {@link Service}s or empty collection. Implementations should never
     * return null.
     */
    default Iterable<Service> getServices() {
        return Collections.EMPTY_SET;
    }
}
```
The Javadoc tells us that to observe the HBase Master's operations you implement this interface. Read together with the Atlas source, HBaseAtlasCoprocessor implements it in order to capture the operations HMaster performs on the cluster.
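The getMasterObserver() method above returns the observer wrapped in an Optional; HBase calls it to discover which observer, if any, a coprocessor provides. The pattern can be shown self-contained (the interfaces below are simplified stand-ins, not HBase's real ones):

```java
import java.util.Optional;

// Simplified stand-ins for HBase's coprocessor/observer pairing
interface Observer {
    void onCreateTable(String tableName);
}

interface CoprocessorLike {
    // By default a coprocessor exposes no observer
    default Optional<Observer> getObserver() {
        return Optional.empty();
    }
}

// A coprocessor that is its own observer, just as HBaseAtlasCoprocessor
// implements both the coprocessor and the observer interfaces
class MyCoprocessor implements CoprocessorLike, Observer {
    String lastTable;

    @Override
    public Optional<Observer> getObserver() {
        return Optional.of(this);
    }

    @Override
    public void onCreateTable(String tableName) {
        lastTable = tableName;
    }
}
```

The framework only invokes callbacks on coprocessors whose Optional is non-empty, which is why the default implementation returns Optional.empty().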
The second interface HBaseAtlasCoprocessor implements is MasterObserver. Its source and Javadoc are as follows:
```java
/**
 * Defines coprocessor hooks for interacting with operations on the
 * {@link org.apache.hadoop.hbase.master.HMaster} process.
 */
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)
@InterfaceStability.Evolving
public interface MasterObserver {
    /**
     * Called before a new table is created by
     * {@link org.apache.hadoop.hbase.master.HMaster}. Called as part of create
     * table RPC call.
     * @param ctx the environment to interact with the framework and master
     * @param desc the TableDescriptor for the table
     * @param regions the initial regions created for the table
     */
    default void preCreateTable(final ObserverContext<MasterCoprocessorEnvironment> ctx,
            TableDescriptor desc, RegionInfo[] regions) throws IOException {
    }

    // many more methods follow
}
```
MasterObserver defines the hook methods that interact with the HBase Master. As the method above shows, preCreateTable is invoked before a table is created, so through this interface we can obtain a table's metadata before the table even exists.
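Because preCreateTable runs before the table actually exists, the hook observes the descriptor ahead of creation. A self-contained sketch of this pre-hook dispatch (MiniMaster and its hook list are illustrative stand-ins, not HBase's real ObserverContext machinery):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of how a master-like component drives pre-create hooks.
// Illustrative names; HBase's real dispatch goes through ObserverContext.
class MiniMaster {
    private final List<Consumer<String>> preCreateHooks = new ArrayList<>();
    final List<String> tables = new ArrayList<>();

    void addPreCreateHook(Consumer<String> hook) {
        preCreateHooks.add(hook);
    }

    void createTable(String name) {
        // Hooks run first, so they see the table name before the table exists
        for (Consumer<String> h : preCreateHooks) {
            h.accept(name);
        }
        tables.add(name);
    }
}
```

This ordering is exactly what lets a metadata hook like Atlas's record an entity for a table as part of the create-table call itself.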
The third interface Atlas implements is RegionObserver. Part of its source and Javadoc:
```java
/**
 * Coprocessors implement this interface to observe and mediate client actions on the region.
 */
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)
@InterfaceStability.Evolving
// TODO as method signatures need to break, update to
// ObserverContext<? extends RegionCoprocessorEnvironment>
// so we can use additional environment state that isn't exposed to coprocessors.
public interface RegionObserver {
    /** Mutation type for postMutationBeforeWAL hook */
    enum MutationType {
        APPEND, INCREMENT
    }

    /**
     * Called before the region is reported as open to the master.
     * @param c the environment provided by the region server
     */
    default void preOpen(ObserverContext<RegionCoprocessorEnvironment> c) throws IOException {
    }
}
```
Atlas implements this interface to observe changes to the data held in HBase regions.
The fourth interface is RegionServerObserver. Part of its source and Javadoc:
```java
@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)
@InterfaceStability.Evolving
public interface RegionServerObserver {
    /**
     * Called before stopping region server.
     * @param ctx the environment to interact with the framework and region server.
     */
    default void preStopRegionServer(
            final ObserverContext<RegionServerCoprocessorEnvironment> ctx) throws IOException {
    }
}
```
Implementing this interface likewise helps capture HBase data changes, here at the region-server level.
Putting all this together: HBaseAtlasCoprocessor implements these four interfaces so that it can capture every data change in HBase. The source also contains fragments like the following:
```java
@Override
public void postDeleteNamespace(ObserverContext<MasterCoprocessorEnvironment> ctx, String ns) throws IOException {
    if (LOG.isDebugEnabled()) {
        LOG.debug("==> HBaseAtlasCoprocessor.postDeleteNamespace()");
    }

    try {
        activatePluginClassLoader();
        implMasterObserver.postDeleteNamespace(ctx, ns);
    } finally {
        deactivatePluginClassLoader();
    }

    if (LOG.isDebugEnabled()) {
        LOG.debug("<== HBaseAtlasCoprocessor.postDeleteNamespace()");
    }
}
```

(The debug messages in the original snippet said preDeleteNamespace; they are corrected here to match the method name.)
As the code above shows, every data change in HBase triggers an activation of the plugin class loader; this is necessary because the loader is deactivated again once each call (and the initialization in init()) completes. The line implMasterObserver.postDeleteNamespace(ctx, ns); is less mysterious than it first looks: implMasterObserver is the instance that init() created from the class loaded by the plugin class loader, cast to MasterObserver, so the callback simply forwards (delegates) the event to the real hook implementation.
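The activate/delegate/deactivate shape of each callback can be sketched generically (the interface and class names below are illustrative, not Atlas's):

```java
interface EventHandler {
    void onEvent(String event);
}

// Shim that surrounds each delegated call with class-loader switching,
// mirroring the activatePluginClassLoader()/deactivatePluginClassLoader() pattern
class HandlerShim implements EventHandler {
    private final EventHandler impl;        // real implementation, loaded by the plugin loader
    private final ClassLoader pluginLoader; // loader that can see the plugin's dependencies

    HandlerShim(EventHandler impl, ClassLoader pluginLoader) {
        this.impl = impl;
        this.pluginLoader = pluginLoader;
    }

    @Override
    public void onEvent(String event) {
        ClassLoader prev = Thread.currentThread().getContextClassLoader();
        Thread.currentThread().setContextClassLoader(pluginLoader); // "activate"
        try {
            impl.onEvent(event);                                    // delegate to the real hook
        } finally {
            Thread.currentThread().setContextClassLoader(prev);     // "deactivate"
        }
    }
}
```

The try/finally guarantees that HBase's own class loader is restored even if the hook throws.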
2. With class loading complete, the hook can go to work. The code below captures the changed HBase data and sends it on:
```java
public class HBaseAtlasCoprocessor implements MasterCoprocessor, MasterObserver, RegionObserver, RegionServerObserver {
    private static final Logger LOG = LoggerFactory.getLogger(HBaseAtlasCoprocessor.class);

    final HBaseAtlasHook hbaseAtlasHook;

    public HBaseAtlasCoprocessor() {
        hbaseAtlasHook = HBaseAtlasHook.getInstance();
    }

    @Override
    public void postCreateTable(ObserverContext<MasterCoprocessorEnvironment> observerContext, TableDescriptor tableDescriptor, RegionInfo[] hRegionInfos) throws IOException {
        LOG.info("==> HBaseAtlasCoprocessor.postCreateTable()");

        // Send the captured data to the Kafka message server through the Atlas hook
        hbaseAtlasHook.sendHBaseTableOperation(tableDescriptor, null, HBaseAtlasHook.OPERATION.CREATE_TABLE, observerContext);

        if (LOG.isDebugEnabled()) {
            LOG.debug("<== HBaseAtlasCoprocessor.postCreateTable()");
        }
    }
}
```
This HBaseAtlasCoprocessor has the same name as the one above but lives in a different package: this one captures the data, while the earlier one activates the plugin class loader and delegates to it.
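For either class to run at all, the coprocessor has to be registered with HBase. A typical registration in hbase-site.xml might look like the following; the property name is HBase's standard master-coprocessor key, and the class name follows Atlas's HBase hook documentation, so verify it against your Atlas version:

```xml
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>
```

The Atlas configuration file must also be on HBase's classpath, which is the copying step discussed below.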
3. The relevant HBaseAtlasHook source:
```java
// This will register HBase entities into Atlas
public class HBaseAtlasHook extends AtlasHook {
    private static volatile HBaseAtlasHook me;

    public static HBaseAtlasHook getInstance() {
        HBaseAtlasHook ret = me;

        if (ret == null) {
            try {
                synchronized (HBaseAtlasHook.class) {
                    ret = me;

                    if (ret == null) {
                        me = ret = new HBaseAtlasHook(atlasProperties);
                    }
                }
            } catch (Exception e) {
                LOG.error("Caught exception instantiating the Atlas HBase hook.", e);
            }
        }

        return ret;
    }

    /**
     * Sends the message to Atlas's message server via the Atlas notification framework.
     * @param tableDescriptor the HBase table descriptor
     * @param tableName the table name
     * @param operation the operation performed on the table
     * @param ctx the context of the table operation
     */
    public void sendHBaseTableOperation(TableDescriptor tableDescriptor, final TableName tableName, final OPERATION operation, ObserverContext<MasterCoprocessorEnvironment> ctx) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("==> HBaseAtlasHook.sendHBaseTableOperation()");
        }

        try {
            final UserGroupInformation ugi      = getUGI(ctx);
            final User                 user     = getActiveUser(ctx);
            final String               userName = (user != null) ? user.getShortName() : null;

            // Wrap the HBase operation in Atlas's message context
            HBaseOperationContext hbaseOperationContext = handleHBaseTableOperation(tableDescriptor, tableName, operation, ugi, userName);

            // Send the operation context built above to Atlas's Kafka server
            sendNotification(hbaseOperationContext);
        } catch (Throwable t) {
            LOG.error("<== HBaseAtlasHook.sendHBaseTableOperation(): failed to send notification", t);
        }

        if (LOG.isDebugEnabled()) {
            LOG.debug("<== HBaseAtlasHook.sendHBaseTableOperation()");
        }
    }
}
```
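The getInstance() method above is a standard double-checked-locking singleton. A self-contained sketch of the pattern:

```java
// Double-checked locking: the volatile field plus the second null check
// ensure only one instance is ever constructed, with locking only on first use.
class LazySingleton {
    private static volatile LazySingleton instance;
    static int constructions = 0; // for illustration only

    private LazySingleton() {
        constructions++;
    }

    static LazySingleton getInstance() {
        LazySingleton ret = instance;
        if (ret == null) {
            synchronized (LazySingleton.class) {
                ret = instance;
                if (ret == null) {
                    instance = ret = new LazySingleton();
                }
            }
        }
        return ret;
    }
}
```

The volatile modifier is what makes this safe in Java: without it, another thread could observe a partially constructed instance.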
In the HBaseAtlasHook code above, the instance is obtained via double-checked locking. Its construction, however, needs Atlas's configuration; where is that configuration read from? AtlasHook contains the following:
```java
/**
 * A base class for atlas hooks.
 */
public abstract class AtlasHook {
    static {
        try {
            atlasProperties = ApplicationProperties.get();
        } catch (Exception e) {
            LOG.info("Failed to load application properties", e);
        }
    }
}

/**
 * Application properties used by Atlas.
 */
public final class ApplicationProperties extends PropertiesConfiguration {
    public static Configuration get() throws AtlasException {
        if (instance == null) {
            synchronized (ApplicationProperties.class) {
                if (instance == null) {
                    // public static final String APPLICATION_PROPERTIES = "atlas-application.properties";
                    set(get(APPLICATION_PROPERTIES));
                }
            }
        }

        return instance;
    }
}
```
This finally explains where the configuration comes from: when the Atlas hook is registered with the HBase component, the Atlas configuration file is copied over alongside it, and what is read here is that copied atlas-application.properties. AtlasHook creates the HBaseAtlasHook from this configuration; once created, sendHBaseTableOperation is called to send the data to the Kafka server.
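ApplicationProperties ultimately reads atlas-application.properties from the classpath. The same idea in plain Java, a sketch using java.util.Properties rather than the Commons Configuration classes Atlas actually uses:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

class ConfigLoader {
    // Loads a properties file from the classpath; returns empty properties
    // if the resource is absent or unreadable
    static Properties load(String resourceName) {
        Properties props = new Properties();
        try (InputStream in = Thread.currentThread().getContextClassLoader()
                                    .getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            // treat unreadable config as empty; real code would log or rethrow
        }
        return props;
    }
}
```

Reading through the thread context class loader is also why the activate/deactivate switching earlier matters: it determines which classpath the configuration lookup sees.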
4. Sending the data. The sending code, with comments, follows:
// This will register Hbase entit