Deploying Hue (companion to Big Data in Practice, Lesson 1)

1. Hue as described on the official site

Hue overview from the official site: gethue.com

  • Hue is a mature open source Query Assistant for querying, visualizing and sharing data in databases & data warehouses.

Read the official site directly; other people's blog posts tend to leave you confused about what they are actually describing.

Note:

All the components we use are CDH builds, so we use the CDH version of Hue as well; see the component repository on the CDH official site.

Hue overview from the CDH documentation:

  • Hue is a graphical user interface to operate and develop applications for Apache Hadoop. Hue applications are collected into a desktop-style environment and delivered as a Web application, requiring no additional installation for individual users.

2.1 Install Hue:

  • Hue consists of a web service that runs on a special node in your cluster. Choose one node where you want to run Hue. This guide refers to that node as the Hue Server. For optimal performance, this should be one of the nodes within your cluster, though it can be a remote node as long as there are no overly restrictive firewalls. For small clusters of less than 10 nodes, you can use your existing master node as the Hue Server.

2.1.1 Hue Dependencies

  • Hue employs some Python modules which use native code and requires certain development libraries be installed on your system. To install from the tarball, you must have the following installed.

Install the required dependencies with the following command:

[root@hadoop001 ~]# yum install -y ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc \
gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

2.1.2 Build

  • Configure $PREFIX with the path where you want to install Hue by running:
$ PREFIX=/usr/share make install
$ cd /usr/share/hue
  • You can install Hue anywhere on your system, and run Hue as a non-root user. The Shell application needs root privileges to launch various sub-processes as the logged in users. $USER should be the group of the user running Hue (same as its username).

  • It is a good practice to create a new user for Hue and either install Hue in that user's home directory, or in a directory within /usr/share.

2.1.3 Troubleshooting the Hue Tarball Installation

After moving the Hue installation directory, the following steps are required:
Q: I moved my Hue installation from one directory to another and now Hue no longer functions correctly.

A: Due to the use of absolute paths by some Python packages, you must run a series of commands if you move your Hue installation. In the new location, run:

rm app.reg
rm -r build
make apps

2.2 Hadoop Configuration

2.2.1 Configure WebHdfs:

  • WebHDFS provides high data-transfer throughput: the client can communicate with the DataNodes directly.
  • You need to enable WebHDFS or run an HttpFS server. To turn on WebHDFS, add this to your hdfs-site.xml and restart your HDFS cluster. Depending on your setup, your hdfs-site.xml might be in /etc/hadoop/conf.
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

You also need to add this to core-site.xml:

<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

If you place your Hue server outside the Hadoop cluster, you can run an HttpFS server to provide Hue access to HDFS. The HttpFS service requires only one port to be opened to the cluster.

Path: httpfs-site.xml under /etc/hadoop/conf:

  • HttpFS is a proxy service that simplifies integration with systems outside the cluster. Note: in HA mode, HttpFS is the only option that can be used.
<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>

2.4 Hive Configuration:

1. Hue's Beeswax application helps you use Hive to query your data. It depends on a HiveServer2 instance running in the cluster. Please read this section to ensure a proper integration.

2. Your Hive data is stored in HDFS, normally under /user/hive/warehouse (or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml). Make sure this location exists and is writable by the users whom you expect to be creating tables. /tmp (on the local file system) must be world-writable (1777), as Hive makes extensive use of it.

  • In hue.ini, modify hive_conf_dir to point to the directory containing hive-site.xml.
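
The 1777 requirement on the local /tmp can be checked programmatically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of Hue or Hive.

```python
import os
import stat


def is_world_writable_with_sticky(mode: int) -> bool:
    """Return True if the permission bits are exactly 1777 (rwxrwxrwt)."""
    # stat.S_IMODE strips the file-type bits and keeps the permission bits.
    return stat.S_IMODE(mode) == 0o1777


def check_local_dir(path: str = "/tmp") -> bool:
    """Check the mode of a local directory (default /tmp, as in the text)."""
    return is_world_writable_with_sticky(os.stat(path).st_mode)
```

Calling check_local_dir() on the Hue/Hive host returns False when /tmp has been tightened, in which case chmod 1777 /tmp restores the expected mode.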

Step 3: Configuring Hue

Hue ships with a default configuration that will work for pseudo-distributed clusters.

If you are running on a real cluster, you must make a few changes to the hue.ini configuration file (/etc/hue/hue.ini when installed from the package version, or pseudo-distributed.ini in desktop/conf when in development mode). The following sections describe the key configuration options you must make to configure Hue.

3.1 Web Server Configuration:

  • These configuration variables are under the desktop section in the hue.ini configuration file.

3.1.1 Specifying the Hue HTTP Address

  • Hue uses the CherryPy web server. You can use the following options to change the IP address and port that the web server listens on. The default setting is port 8888 on all configured IP addresses.
# Webserver listens on this address and port
http_host=0.0.0.0
http_port=8888

3.1.2 Specifying the Secret Key

For security, you should also specify the secret key that is used for secure hashing in the session store. Enter a long series of random characters (30 to 60 characters is recommended).

secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o

If you don't specify a secret key, your session cookies will not be secure. Hue will run but it will also display error messages telling you to set the secret key.
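
Rather than typing random characters by hand, the key can be generated. This is a minimal sketch using Python's standard secrets module; the alphabet and the function name are assumptions made for illustration, not something Hue prescribes.

```python
import secrets
import string

# Assumed alphabet: letters, digits and a few punctuation characters.
# Characters such as '#', ';', '=', '%' and quotes are left out so the value
# can be pasted into an ini file without any ambiguity.
ALPHABET = string.ascii_letters + string.digits + "!@$^&*()-_+"


def generate_secret_key(length: int = 50) -> str:
    """Return a random key; 30 to 60 characters is the recommended range."""
    if not 30 <= length <= 60:
        raise ValueError("length should be between 30 and 60 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Print generate_secret_key() once and paste the result after secret_key= in the [desktop] section.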

3.1.3 Authentication

  • By default, the first user who logs in to Hue can choose any username and password and becomes an administrator automatically. This user can create other user and administrator accounts. User information is stored in the Django database in the Django backend.

The authentication system is pluggable.

3.2 Hue Configuration for Hadoop

  • These configuration variables are under the [hadoop] section in the [hue.ini] configuration file.

3.3 Hive Configuration

  • In the [beeswax] section of the configuration file, you can optionally specify the following:

beeswax_server_host
The hostname or IP that the Hive Server should bind to. By default it binds to [localhost], and therefore only serves local IPC clients.
hive_home_dir
The base directory of your Hive installation.
hive_conf_dir
The directory containing your hive-site.xml Hive configuration file.

Now for the actual configuration:

Edit the hue.ini file

($HUE_HOME/desktop/conf/hue.ini)

[desktop]
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
http_host=hadoop001
time_zone=Asia/Shanghai

Integrating Hue with Hadoop ($HADOOP_HOME/etc/hadoop):

#hdfs-site.xml
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
     
#core-site.xml
    <property>
        <name>hadoop.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
#httpfs-site.xml
    <property>
        <name>httpfs.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>httpfs.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
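
After editing the three files, it is easy to miss a property or mistype a name. The sketch below, which assumes the usual Hadoop layout of a root element containing property entries with name and value children, reads such a file and reports which expected settings are absent or different. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_settings(props: dict, required: dict) -> dict:
    """Return the required entries that are absent or have another value."""
    return {k: v for k, v in required.items() if props.get(k) != v}
```

For example, read the contents of core-site.xml, pass them to read_hadoop_properties, and call missing_settings with {"hadoop.proxyuser.hue.hosts": "*", "hadoop.proxyuser.hue.groups": "*"}; an empty dict means both entries are in place.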

Edit the hue.ini configuration file ($HUE_HOME/desktop/conf/hue.ini)
and uncomment webhdfs_url:

[hadoop]

  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs

    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://hadoop001:8020

      # NameNode logical name.
      ## logical_name=

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://hadoop001:50070/webhdfs/v1
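
To confirm that the webhdfs_url value actually answers, one option is to call the WebHDFS REST API directly. The sketch below builds a request URL of the form <webhdfs_url><path>?op=...&user.name=... and fetches a directory listing with LISTSTATUS. The function names are hypothetical, and the host, port and user are simply the values used in this post.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_request_url(webhdfs_url: str, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("path must be an absolute HDFS path")
    query = urlencode({"op": op, "user.name": user})
    return f"{webhdfs_url.rstrip('/')}{quote(path)}?{query}"


def list_status(webhdfs_url: str, path: str = "/", user: str = "hadoop",
                timeout: float = 5.0) -> dict:
    """Issue a LISTSTATUS request and return the decoded JSON response."""
    url = webhdfs_request_url(webhdfs_url, path, "LISTSTATUS", user)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Running list_status("http://hadoop001:50070/webhdfs/v1") on a machine that can reach the NameNode should return a JSON document; a connection error suggests WebHDFS is disabled or the host and port are wrong.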

Hue also needs to be integrated with Hive:

5.1 Edit hive-site.xml and set the HiveServer2 (hs2) properties:

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop001</value>
</property>
<property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
</property>
<property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
</property>

5.2 Edit the hue.ini configuration file ($HUE_HOME/desktop/conf/hue.ini)

  • Uncomment the following lines and change the hostname:
[beeswax]

  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=hadoop001

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/home/hadoop/app/hive-1.1.0-cdh5.7.0/conf/

With the configuration above in place, start Hue:

# Start Hadoop
[hadoop@hadoop001 ~]$ start-all.sh 

# Start HiveServer2
[hadoop@hadoop001 ~]$ nohup hive --service hiveserver2  >~/app/hive-1.1.0-cdh5.7.0/console.log 2>&1 &

# Connect to hs2 with beeline; in "-n hadoop", hadoop is the user that launches beeline
[hadoop@hadoop001 ~]$ beeline -u jdbc:hive2://hadoop001:10000/default -n hadoop

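The connection failed with the following error: Error: Could not open client transport with JDBC Uri: jdbc:hive2://hadoop001:10000: null
Fix: the cause turned out to be the Hive login (authentication) setting. Changing hive.server2.authentication to NONE resolved it; the value had previously been NOSASL.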

# Start Hue
[hadoop@hadoop001 ~]$ ~/app/hue-3.9.0-cdh5.7.0/build/env/bin/supervisor
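
Once the services are started, a quick reachability check of the ports configured above (10000 for HiveServer2, 8888 for Hue) can save a round of guessing. This is a minimal sketch with the standard socket module; the function name is hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the setup in this post, port_is_open("hadoop001", 10000) and port_is_open("hadoop001", 8888) should both return True before you open the browser.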

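7. Accessing Hue:
URL: http://ip:8888
I created an account named hadoop. It has to be the hadoop user; otherwise HDFS operations are limited to the /usr/xxx directory and Hive cannot be used at all.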
