0. To integrate Sentry permission checks into Spark SQL, three problems have to be solved:
1. Locate the Sentry authorization logic inside the Hive authorization hook and extract it into a standalone method
2. Build the parameters that the Sentry authorization method needs from the Spark SQL logical plan
3. Hook the authorization check into Spark SQL's execution via Spark SQL extensions
1. Extracting the Sentry authorization method
Stepping through the Hive hook code in the Sentry source eventually leads to the method that performs the actual authorization check:
public void auth(HiveOperation stmtOperation, Set<ReadEntity> inputs,
                 Set<WriteEntity> outputs, String user) {
  HiveAuthzPrivileges stmtAuthObject = HiveAuthzPrivilegesMap.getHiveAuthzPrivileges(stmtOperation);
  List<List<DBModelAuthorizable>> inputHierarchy = new ArrayList<List<DBModelAuthorizable>>();
  List<List<DBModelAuthorizable>> outputHierarchy = new ArrayList<List<DBModelAuthorizable>>();
  switch (stmtAuthObject.getOperationScope()) {
    case SERVER:
      // validate server level privileges if applicable. Eg create UDF, register jar etc.
      List<DBModelAuthorizable> serverHierarchy = new ArrayList<DBModelAuthorizable>();
      serverHierarchy.add(hiveAuthzBinding.getAuthServer());
      inputHierarchy.add(serverHierarchy);
      break;
    case DATABASE:
      // workaround for database scope statements (create/alter/drop db)
      List<DBModelAuthorizable> dbHierarchy = new ArrayList<DBModelAuthorizable>();
      getInputHierarchyFromInputs(inputHierarchy, inputs);
      getOutputHierarchyFromOutputs(outputHierarchy, outputs);
      break;
    case TABLE:
      // workaround for add partitions
      if (partitionURI != null) {
        inputHierarchy.add(ImmutableList.of(hiveAuthzBinding.getAuthServer(), partitionURI));
      }
      if (indexURI != null) {
        outputHierarchy.add(ImmutableList.of(hiveAuthzBinding.getAuthServer(), indexURI));
      }
      getInputHierarchyFromInputs(inputHierarchy, inputs);
      for (WriteEntity writeEntity : outputs) {
        if (filterWriteEntity(writeEntity)) {
          continue;
        }
        List<DBModelAuthorizable> entityHierarchy = new ArrayList<DBModelAuthorizable>();
        entityHierarchy.add(hiveAuthzBinding.getAuthServer());
        entityHierarchy.addAll(getAuthzHierarchyFromEntity(writeEntity));
        outputHierarchy.add(entityHierarchy);
      }
      if (currTab != null) {
        List<DBModelAuthorizable> externalAuthorizableHierarchy = new ArrayList<DBModelAuthorizable>();
        externalAuthorizableHierarchy.add(hiveAuthzBinding.getAuthServer());
        externalAuthorizableHierarchy.add(currDB);
        externalAuthorizableHierarchy.add(currTab);
        inputHierarchy.add(externalAuthorizableHierarchy);
      }
      if (currOutTab != null) {
        List<DBModelAuthorizable> externalAuthorizableHierarchy = new ArrayList<DBModelAuthorizable>();
        externalAuthorizableHierarchy.add(hiveAuthzBinding.getAuthServer());
        externalAuthorizableHierarchy.add(currOutDB);
        externalAuthorizableHierarchy.add(currOutTab);
        outputHierarchy.add(externalAuthorizableHierarchy);
      }
      break;
    default:
      throw new AuthorizationException("Unknown operation scope type "
          + stmtAuthObject.getOperationScope().toString());
  }
  hiveAuthzBinding.authorize(stmtOperation, stmtAuthObject, new Subject(user),
      inputHierarchy, outputHierarchy);
}
The authorization method takes four parameters:
HiveOperation stmtOperation: the type of SQL operation (CREATE, SELECT, etc.)
Set<ReadEntity> inputs: the tables the statement reads
Set<WriteEntity> outputs: the tables the statement creates or writes
String user: the user name to authorize
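To make the contract concrete, here is a hedged sketch of invoking the extracted method for a plain SELECT over one table. The table, database, and user names are illustrative, and the input/output sets are built by hand here; in the real integration they are derived from the logical plan as described in section 2.

```scala
import java.util.{HashSet => JHashSet}
import org.apache.hadoop.hive.ql.hooks.{ReadEntity, WriteEntity}
import org.apache.hadoop.hive.ql.metadata.Table
import org.apache.hadoop.hive.ql.plan.HiveOperation

// Sketch only: read set for "SELECT ... FROM db.tbl", empty write set.
val inputs  = new JHashSet[ReadEntity]()
val outputs = new JHashSet[WriteEntity]()
inputs.add(new ReadEntity(new Table("db", "tbl")))

// HiveOperation.QUERY covers plain SELECT statements; "etl_user" is illustrative.
// A denial surfaces as an AuthorizationException from hiveAuthzBinding.authorize.
auth(HiveOperation.QUERY, inputs, outputs, "etl_user")
```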
The hiveAuthzBinding field used above is initialized as follows:
hiveConf = new HiveConf();
// initialization needs both hive-site.xml and sentry-site.xml
hiveConf.addResource(new FileInputStream("xxx/conf/hive-site.xml"));
authzConf = HiveAuthzBindingHook.loadAuthzConf(hiveConf);
hiveAuthzBinding = new HiveAuthzBinding(hiveConf, authzConf);
2. Building the Sentry authorization parameters from the Spark SQL logical plan
There are several ways to get at a Spark SQL LogicalPlan. Having previously written custom Optimizer rules, my first idea was to define a custom Rule and run the permission check inside it:
class SentryAuthRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    // run the Sentry authorization check here
    plan
  }
}
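Defining the rule is only half of problem 3: it still has to be injected into the session. A minimal sketch using SparkSessionExtensions (available since Spark 2.2); the extension class name and package are illustrative, not from the source:

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}

// Illustrative extension entry point; the name is an assumption.
class SentryAuthExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Register the rule so it runs while every query's plan is optimized.
    extensions.injectOptimizerRule(_ => new SentryAuthRule())
  }
}

// Either wire it up programmatically ...
val spark = SparkSession.builder()
  .withExtensions(new SentryAuthExtension())
  .getOrCreate()
// ... or declaratively, so plain spark-sql/beeline sessions are covered too:
//   spark.sql.extensions=com.example.SentryAuthExtension
```

Note that optimizer rules may be applied more than once per query, so the check inside the rule should be idempotent, which a pure read of the plan is.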
2.1 Building the HiveOperation
def getHiveOperation(plan: LogicalPlan): HiveOperation = {
  plan match {
    case c: Command => c match {
      case _: AlterDatabasePropertiesCommand => HiveOperation.ALTERDATABASE
      case p if p.nodeName == "AlterTableAddColumnsCommand" => HiveOperation.ALTERTABLE_ADDCOLS
      case _: AlterTableAddPartitionCommand => HiveOperation.ALTERTABLE_ADDPARTS
      case p if p.nodeName == "AlterTableChangeColumnCommand" =>
        HiveOperation