1. Introduction to Hue
HUE stands for Hadoop User Experience. Put simply, it is an open-source Apache Hadoop UI system. It evolved from Cloudera Desktop, which Cloudera later contributed to the Hadoop community under the Apache Foundation, and it is built on the Python web framework Django. With HUE, we can interact with a Hadoop cluster from a browser-based web console to analyze and process data.
2. Integrating Hue with Other Frameworks
2.1 Hue and HDFS
2.1.1 Cluster Environment
| master | slave1 | slave2 |
| --- | --- | --- |
| zk | zk | zk |
| hbase | hbase | hbase |
| hadoop | hadoop | hadoop |
2.1.2 Configure HDFS
vim hdfs-site.xml
<!-- Allow web (WebHDFS) access to file information on the NameNode and DataNodes -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
vim core-site.xml
<!-- Hosts from which the proxy user hue may connect -->
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user hue may impersonate -->
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<!-- Hosts from which the proxy user hadoop may connect -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user hadoop may impersonate -->
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<!-- If your Hadoop cluster is configured for high availability, HDFS must be accessed through HttpFS and the following properties are required; otherwise they are optional. (They are also required whenever the HUE service and the Hadoop services run on different nodes.) -->
<!-- Hosts from which the proxy user httpfs may connect -->
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user httpfs may impersonate -->
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
The difference: WebHDFS is a built-in HDFS component that already runs inside the NameNode and DataNodes. Reads and writes of HDFS files are redirected to the DataNode that holds the file, so the full bandwidth of HDFS is used. HttpFS is a service independent of HDFS; reads and writes of HDFS files are relayed through it, which allows it to limit bandwidth usage.
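The difference shows up directly in the request URLs. Below is a small sketch (hypothetical helper, host names taken from this cluster) of the REST URLs a client such as HUE would issue for the same file operation: one addressed to the NameNode's WebHDFS endpoint (port 50070 in Hadoop 2.x), one relayed through the standalone HttpFS service (default port 14000). Both speak the same WebHDFS REST protocol.

```python
# Sketch: the same WebHDFS REST operation, once against the NameNode
# (WebHDFS) and once against the standalone HttpFS relay service.
# `rest_url` is a hypothetical helper; host names follow this cluster.

def rest_url(base, path, op, user="hue"):
    """Build a WebHDFS-style REST URL for operation `op` on HDFS path `path`."""
    return "%s%s?op=%s&user.name=%s" % (base, path, op, user)

# Served by the NameNode; the actual read is redirected to a DataNode.
webhdfs = rest_url("http://master:50070/webhdfs/v1", "/tmp/a.txt", "OPEN")
# Served by HttpFS, which relays the file data itself.
httpfs = rest_url("http://master:14000/webhdfs/v1", "/tmp/a.txt", "OPEN")

print(webhdfs)  # http://master:50070/webhdfs/v1/tmp/a.txt?op=OPEN&user.name=hue
print(httpfs)   # http://master:14000/webhdfs/v1/tmp/a.txt?op=OPEN&user.name=hue
```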
vim httpfs-site.xml
<configuration>
<!-- Hosts from which the proxy user hue may connect -->
<property>
<name>httpfs.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user hue may impersonate -->
<property>
<name>httpfs.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
The two properties above are required whenever the HUE service and the Hadoop services do not run on the same node.
2.1.3 Sync the Hadoop configuration files across the cluster with scp
[root@master hadoop]# scp -r /usr/local/hadoop-2.6.1/etc/hadoop root@slave1:/usr/local/hadoop-2.6.1/etc/
[root@master hadoop]# scp -r /usr/local/hadoop-2.6.1/etc/hadoop root@slave2:/usr/local/hadoop-2.6.1/etc/
2.1.4 Start the httpfs service
[root@master hadoop]# /usr/local/hadoop-2.6.1/sbin/httpfs.sh start
2.1.5 Configure the hue.ini file
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Address of the HDFS NameNode
fs_defaultfs=hdfs://master:8020
# If high availability is enabled, configure the following as well
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
# URL that HUE uses to send requests to HDFS
webhdfs_url=http://master:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
## umask=022
# Directory of the Hadoop configuration
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
# Hadoop installation paths
hadoop_conf_dir=/usr/local/hadoop-2.6.1/etc/hadoop
hadoop_hdfs_home=/usr/local/hadoop-2.6.1
hadoop_bin=/usr/local/hadoop-2.6.1/bin
2.2 Hue and YARN
2.2.1 Configure the hue.ini file
[[yarn_clusters]]
[[[default]]]
# Host where the ResourceManager runs
resourcemanager_host=master
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster and monitor their execution
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# Entry point of the ResourceManager web API
resourcemanager_api_url=http://master:8088
# URL of the ProxyServer API
proxy_api_url=http://master:8088
# Entry point of the JobHistory server, for viewing past job runs
history_server_api_url=http://slave2:19888
2.2.2 Start Hadoop's jobhistoryserver so job history can be viewed in the web UI
[root@slave2 hadoop-2.6.1]# ./sbin/mr-jobhistory-daemon.sh start historyserver
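HUE talks to the two endpoints configured above through their standard REST APIs (the `/ws/v1/...` paths are Hadoop's documented web-service paths). A minimal sketch of the URLs involved, with host names and ports taken from this cluster's hue.ini:

```python
# Sketch: the REST endpoints behind the hue.ini settings above.
# resourcemanager_api_url -> ResourceManager web services
# history_server_api_url  -> JobHistory server web services

RM_API = "http://master:8088"
HISTORY_API = "http://slave2:19888"

def rm_apps_url():
    """List applications known to the ResourceManager."""
    return RM_API + "/ws/v1/cluster/apps"

def history_jobs_url():
    """List finished MapReduce jobs tracked by the JobHistory server."""
    return HISTORY_API + "/ws/v1/history/mapreduce/jobs"

print(rm_apps_url())
print(history_jobs_url())
```

Requesting either URL from a browser (or curl) is a quick way to confirm the services are reachable before pointing HUE at them.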
2.3 Hue and Hive
2.3.1 Modify the Hive configuration file hive-site.xml
<!-- Integrating Hue with Hive requires the HiveServer2 service to be enabled -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>master</value>
</property>
<!-- TCP port that the HiveServer2 Thrift service listens on -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.long.polling.timeout</name>
<value>5000</value>
</property>
<!-- Host running the metastore service -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
</property>
2.3.2 Start the Hive services
[root@master apache-hive-1.2.2-bin]# ./bin/hive --service metastore &
[root@master apache-hive-1.2.2-bin]# ./bin/hive --service hiveserver2 &
2.3.3 Configure hue.ini
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=master
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/usr/local/apache-hive-1.2.2-bin/conf
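HUE connects to HiveServer2 through the host and port configured above; any other HiveServer2 client can use the same pair, which is handy for checking the service independently of HUE. The standard JDBC URL form is `jdbc:hive2://host:port/db`. A small sketch (the helper name is hypothetical) building it from these values:

```python
# Sketch: build the standard HiveServer2 JDBC URL from the
# hive_server_host / hive_server_port values configured above.
def hive2_jdbc_url(host, port, db="default"):
    return "jdbc:hive2://%s:%d/%s" % (host, port, db)

print(hive2_jdbc_url("master", 10000))  # jdbc:hive2://master:10000/default
```

For example, in beeline, `!connect jdbc:hive2://master:10000/default` should reach the service if HiveServer2 started correctly.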
2.4 Hue and MySQL
2.4.1 Configure the hue.ini file
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="db_mysql"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql
# IP or hostname of the database to connect to.
host=master
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=xxxxxx
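The fields above combine into a single database URL of the familiar `engine://user:password@host:port/name` shape, which many tools accept directly. A small sketch (hypothetical helper; the password is a placeholder exactly as in the config, and `mysqldb` is the example database name from the commented-out `name` setting):

```python
# Sketch: combine the [[[mysql]]] settings above into a database URL.
# Values mirror the hue.ini fragment; the password is a placeholder.
def db_url(engine, user, password, host, port, name):
    return "%s://%s:%s@%s:%d/%s" % (engine, user, password, host, port, name)

print(db_url("mysql", "root", "xxxxxx", "master", 3306, "mysqldb"))
```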
2.5 Hue and ZooKeeper
2.5.1 Configure the hue.ini file
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=master:2181,slave1:2181,slave2:2181
2.5.2 Start the ZooKeeper cluster
[root@master zookeeper-3.4.11]# ./bin/zkServer.sh start
[root@slave1 zookeeper-3.4.11]# ./bin/zkServer.sh start
[root@slave2 zookeeper-3.4.11]# ./bin/zkServer.sh start
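The `host_ports` value is a single comma-separated ensemble string; a ZooKeeper client splits it into individual host:port members and tries them in turn. A small sketch (hypothetical helper) of that parsing, using the hosts from this cluster:

```python
# Sketch: parse the hue.ini `host_ports` ensemble string into
# (host, port) pairs, the form a ZooKeeper client connects with.
def parse_ensemble(host_ports):
    members = []
    for member in host_ports.split(","):
        host, port = member.strip().rsplit(":", 1)
        members.append((host, int(port)))
    return members

print(parse_ensemble("master:2181,slave1:2181,slave2:2181"))
# [('master', 2181), ('slave1', 2181), ('slave2', 2181)]
```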
2.6 Hue and HBase
2.6.1 Modify the hue.ini file
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
hbase_clusters=(Cluster|master:9090)
# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/usr/local/hbase-1.3.1/conf
2.6.2 Start the HBase service and its Thrift service
[root@master hbase-1.3.1]# ./bin/start-hbase.sh
[root@master hbase-1.3.1]# ./bin/hbase-daemon.sh start thrift
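Each entry in `hbase_clusters` packs a display name and a Thrift server address into the `(name|host:port)` form shown above. A small sketch (hypothetical helper) of pulling those pieces back apart:

```python
import re

# Sketch: parse a hue.ini `hbase_clusters` entry of the form
# '(name|host:port)' into its display name, host, and Thrift port.
def parse_hbase_cluster(entry):
    m = re.match(r"\((?P<name>[^|]+)\|(?P<host>[^:]+):(?P<port>\d+)\)", entry)
    if m is None:
        raise ValueError("not in (name|host:port) form: %r" % entry)
    return m.group("name"), m.group("host"), int(m.group("port"))

print(parse_hbase_cluster("(Cluster|master:9090)"))  # ('Cluster', 'master', 9090)
```

The port 9090 here is the Thrift server started by `hbase-daemon.sh start thrift` above; the name part ("Cluster") is only a label shown in the HUE UI.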