How HiveServer2 Brings Security and Concurrency to Apache Hive

A fairly old article.

Reposted from: https://blog.cloudera.com/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/

Apache Hive was one of the first projects to bring higher-level languages to Apache Hadoop. Specifically, Hive enables the legions of trained SQL users to use industry-standard SQL to process their Hadoop data.

However, as you probably have gathered from all the recent community activity in the SQL-over-Hadoop area, Hive has a few limitations for users in the enterprise space. Until recently, two in particular – concurrency and security – were largely unaddressed.

To address these gaps, for Hive release 0.11, Cloudera engineers built and contributed new infrastructure for meeting these needs. In this post, you’ll learn why it’s needed, and how it works.

Customer Requirements

As you probably know, relational databases almost universally have a server process to support clients connecting over IPC or network connections. The clients may be native command-line editors or applications/tools using a driver such as ODBC or JDBC.

In Hive, a component called HiveServer serves this purpose. But over the past few years, as adoption of Hive increased, more and more customers reported two major requirements unaddressed by HiveServer:

  • To run more users concurrently against Hive in traditional client/server architecture
  • To authenticate users to prevent untrusted user access and to enforce authorization around permissions to their data assets

Because Hive is so important for our customers, these requirements motivated us to implement a new server process for Hive 0.11. The goal was to create a framework that handles multiple concurrent clients, supports popular authentication mechanisms, and is easy to adopt for open client implementations like JDBC and ODBC.

The result of that effort, HiveServer2 (HIVE-2935), finally brings concurrency, authentication, and a foundation for authorization to Hive. Next, we’ll provide some details about these new features.

HiveServer2 Architecture

HiveServer2 is now available in Hive 0.11 and all other releases of Hive in CDH 4.1 and later. It implements a new Thrift-based RPC interface that can handle concurrent clients. The current release supports Kerberos, LDAP, and custom pluggable authentication. The new RPC interface also has better options for JDBC and ODBC clients, especially for metadata access.

Like the original HiveServer, HiveServer2 is a container for the Hive execution engine. For each client connection, it creates a new execution context that serves Hive SQL requests from the client. The new RPC interface enables the server to associate this Hive execution context with the thread serving the client’s request.
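The idea of tying an execution context to the thread that serves a request can be illustrated with a small, purely hypothetical sketch. None of the class or method names below come from Hive; they only show the general pattern of binding per-client state to the serving thread with a ThreadLocal, under the assumption that each request is handled start to finish on one thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: not HiveServer2's actual classes.
class SessionPerThreadSketch {
    // Hypothetical stand-in for the per-connection execution context.
    static final class ExecutionContext {
        final String user;
        ExecutionContext(String user) { this.user = user; }
    }

    // Context of the client whose request the current thread is serving.
    private static final ThreadLocal<ExecutionContext> CURRENT = new ThreadLocal<>();

    // Binds the client's context to this thread for the duration of one request.
    static String handleRequest(ExecutionContext ctx, String sql) {
        CURRENT.set(ctx);
        try {
            return execute(sql);
        } finally {
            CURRENT.remove(); // avoid leaking state to the next request on this thread
        }
    }

    // Code deeper in the call stack finds the context without it being passed in.
    private static String execute(String sql) {
        return CURRENT.get().user + " ran: " + sql;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> a = pool.submit(
                () -> handleRequest(new ExecutionContext("alice"), "SELECT 1"));
            Future<String> b = pool.submit(
                () -> handleRequest(new ExecutionContext("bob"), "SELECT 2"));
            System.out.println(a.get());
            System.out.println(b.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each worker thread sees only the context set for its own request, two clients can be served at the same time without their state mixing.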

Clients for HiveServer2

JDBC: Hive 0.11 includes a new JDBC driver that works with HiveServer2, enabling users to write JDBC applications against Hive. The application needs to use the JDBC driver class and specify the network address and port in the connection URL in order to connect to Hive. The following code snippet shows how to connect to HiveServer2 from JDBC:

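  // Load the HiveServer2 JDBC driver, then connect to the "default" database.
  // Requires: import java.sql.Connection; import java.sql.DriverManager;
  Class.forName("org.apache.hive.jdbc.HiveDriver");
  Connection con = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "hive", "passwd");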

You can review a detailed example on the Hive wiki.
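As a fuller sketch of the same idea, the class below builds the connection URL, connects, and lists tables using only the standard java.sql API. The class name, the helper methods, and the SHOW TABLES query are illustrative choices rather than part of the original post, and the host, port, database, and credentials are placeholders. It assumes the Hive JDBC driver is on the classpath and a HiveServer2 instance is reachable.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper; the names here are hypothetical.
class HiveServer2QueryExample {
    // Builds a URL of the form shown above: jdbc:hive2://host:port/database
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Connects, runs SHOW TABLES, and returns the first column of each row.
    static List<String> listTables(String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        List<String> tables = new ArrayList<>();
        // try-with-resources closes the result set, statement, and connection.
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                tables.add(rs.getString(1));
            }
        }
        return tables;
    }

    public static void main(String[] args) throws Exception {
        String url = jdbcUrl("localhost", 10000, "default");
        if (args.length < 2) {
            // Without credentials, only show where the example would connect.
            System.out.println("usage: <user> <password>; target is " + url);
            return;
        }
        for (String table : listTables(url, args[0], args[1])) {
            System.out.println(table);
        }
    }
}
```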

Beeline CLI: Hive 0.11 also includes a new command-line interface (CLI) called Beeline that works with HiveServer2. Beeline is a JDBC application based on the SQLLine CLI that supports embedded and remote-client modes. The embedded mode is where the Hive runtime is part of the client process itself; there’s no server involved. (You can explore the detailed documentation for SQLLine, which is also applicable to Beeline, here.) Note that HiveServer2 doesn’t support the original Hive CLI client, as the Beeline CLI is a functional replacement designed for the HiveServer2 interface.

ODBC: Although Hive 0.11 currently doesn’t include an ODBC driver that works with HiveServer2, Cloudera makes one available.

Metastore Considerations

The Hive metastore service runs in its own JVM process. Clients other than Hive, like Apache Pig, connect to this service via HCatalog for metadata access. HiveServer2 supports local as well as remote metastore modes. Remote mode is useful when you have more than one service (Pig, Cloudera Impala, and so on) that needs access to metadata, and it is the recommended deployment mode with HiveServer2.

Authentication

Authentication support is another major feature of HiveServer2. The original HiveServer performs no authentication of its own: if you can reach its host and port over the network, you can access the data, so access can only be restricted at the network level.

In contrast, HiveServer2 supports Kerberos, pass-through LDAP, and pass-through pluggable custom authentication. All client types (JDBC, ODBC, and the Beeline CLI) support these authentication modes, which lets a Hive deployment integrate easily with existing authentication services.
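From a JDBC client's point of view, the authentication mode mostly changes how the connection is described. The sketch below is an illustration, not taken from the original post: it assumes the commonly documented convention that a Kerberos connection names the server's principal in a ";principal=" URL parameter, while LDAP and custom modes pass a user name and password to the driver. The class, method names, host, and realm are hypothetical placeholders. On the server side, the mode is selected by the hive.server2.authentication property (for example KERBEROS, LDAP, or CUSTOM), to the best of our knowledge.

```java
// Illustrative URL builders; names, hosts, and realms are placeholders.
class HiveServer2AuthUrls {
    // Base URL shared by all authentication modes.
    static String baseUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Kerberos: the client identifies the HiveServer2 service principal in the
    // URL and relies on an existing Kerberos ticket instead of a password.
    static String kerberosUrl(String host, int port, String database,
                              String serverPrincipal) {
        return baseUrl(host, port, database) + ";principal=" + serverPrincipal;
    }

    public static void main(String[] args) {
        // LDAP or custom authentication: plain URL, with the user name and
        // password supplied separately to DriverManager.getConnection.
        System.out.println(baseUrl("hs2.example.com", 10000, "default"));
        // Kerberos: the server principal travels in the URL.
        System.out.println(kerberosUrl("hs2.example.com", 10000, "default",
                "hive/hs2.example.com@EXAMPLE.COM"));
    }
}
```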

Gateway to Secure Hadoop

Today, the Hadoop ecosystem only supports Kerberos for authentication. That means for accessing secure Hadoop, one needs to get a Kerberos ticket. However, enabling Kerberos on every client box can be a very challenging task and thus can restrict access to Hive and Hadoop.

To address that issue, HiveServer2 can authenticate clients over non-Kerberos connections (e.g., LDAP) and run queries against Kerberos-secured Hadoop data. This approach allows users to securely access Hive without setting up Kerberos on every client machine.

Foundation for Fine-grained Authorization

As a stopgap until fine-grained authorization is available, HiveServer2 also supports access to Hadoop as itself or by impersonating the connected user. (This behavior is configurable.) In this so-called impersonation mode, MapReduce jobs are submitted as the user connecting to HiveServer2. If the underlying Hadoop cluster is secure, the service principal used by Hive needs Hadoop proxy privileges to impersonate the connecting users. This interim solution provides coarse-grained authorization based on ownership and permissions on files and directories in HDFS (as opposed to Hive tables and views), which unblocks some usage.
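The choice between the two modes can be pictured with a small sketch. Everything below is illustrative: the class and methods are hypothetical and do not reflect Hive's code. The property names are the ones we understand to control this behavior (hive.server2.enable.doAs on the Hive side, and hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups on the Hadoop side), but the post itself does not name them, so treat them as assumptions and check them against your distribution's documentation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Conceptual sketch of impersonation mode; not Hive's implementation.
class ImpersonationSketch {
    // Returns the user that jobs would run as. This sketch treats an unset
    // property as "false"; the real default may differ between releases.
    static String effectiveUser(Properties conf, String serviceUser,
                                String connectedUser) {
        boolean doAs = Boolean.parseBoolean(
                conf.getProperty("hive.server2.enable.doAs", "false"));
        return doAs ? connectedUser : serviceUser;
    }

    // Hadoop-side settings that grant a service user proxy privileges.
    static List<String> proxyUserKeys(String serviceUser) {
        return Arrays.asList(
                "hadoop.proxyuser." + serviceUser + ".hosts",
                "hadoop.proxyuser." + serviceUser + ".groups");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.server2.enable.doAs", "true");
        System.out.println(effectiveUser(conf, "hive", "alice"));
        System.out.println(proxyUserKeys("hive"));
    }
}
```

With impersonation enabled, HDFS permission checks apply to the connecting user, which is what makes the coarse-grained, file-level authorization described above possible.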

HiveServer2’s strong authentication and revamped server-side architecture also provide the foundation for fine-grained authorization in Hive in the very near future. Stay tuned! (Update: read “With Sentry, Cloudera Closes Hadoop’s Enterprise Security Gap”)

Conclusion

In this post, you have received an overview of how Cloudera’s contribution of HiveServer2 brings concurrency, authentication, and a foundation for fine-grained authorization (more on this in a future post) to Hive. For further reading, you may want to explore the docs on Setting up HiveServer2 and HiveServer2 Clients.

Prasad Mujumdar is a Software Engineer on the Platform team.
