Introduction
Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, which Cloudera later contributed to the Hadoop community under the Apache Foundation, and it is implemented with the Python web framework Django. Through Hue we can interact with a Hadoop cluster from a browser-based web console to analyze and process data: working with files on HDFS, running MapReduce jobs, executing Hive SQL statements, browsing HBase tables, and so on.
Environment preparation
Component | Version
---|---
hadoop | 2.7.5
mysql | 5.7
jdk | 1.8
maven | 3.3.9
python | 2.7
hue | 4.2.0
Install base dependencies
sudo apt-get install git ant gcc g++ libffi-dev libkrb5-dev libmysqlclient-dev libsasl2-dev libsasl2-modules-gssapi-mit libsqlite3-dev libssl-dev libxml2-dev libxslt-dev make maven libldap2-dev python-dev python-setuptools libgmp3-dev
Choosing a Hue version
Hue 4.2 is a stable release, so we chose version 4.2.
Configure MySQL
Start MySQL and create the hue database.
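A minimal sketch of the database setup, assuming the same database name, user name (hue123), and password placeholder used in the [[database]] section further below; adjust them to your own environment:

```shell
# Create the Hue metadata database and a dedicated user (names are assumptions --
# they must match what you put in the [[database]] section of the Hue config).
mysql -uroot -p <<'SQL'
CREATE DATABASE hue123 DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON hue123.* TO 'hue123'@'%' IDENTIFIED BY 'your_password';
FLUSH PRIVILEGES;
SQL
```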
Compile Hue
Extract the Hue package and change into the hue directory.
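For example, assuming the release tarball is named hue-4.2.0.tgz and was downloaded into /opt (both the filename and the path are assumptions):

```shell
# Extract the release tarball and enter the source directory.
tar -zxf hue-4.2.0.tgz -C /opt
cd /opt/hue-4.2.0
```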
Change the character set
To display the UI in Simplified Chinese, edit the settings file as shown below; this must be done before compiling.
vim ./desktop/core/src/desktop/settings.py
LANGUAGE_CODE = 'zh_CN'
#LANGUAGE_CODE = 'en-us'
LANGUAGES = [
    ('en-us', _('English')),
    ('zh_CN', _('Simplified Chinese')),
]
Compile
make
make locales
Note: in version 4.2, the character-set change does not take effect when building with the make apps command, so use the commands above instead.
Edit the configuration file
vim ./desktop/conf/pseudo-distributed.ini
The main items to modify:
1. The host address and port Hue listens on, and the user and group it runs as
2. Hadoop cluster parameters: where HDFS and YARN are running
3. The MySQL address, plus the user, password, and database name for the connection
4. The HiveServer2 address
5. The ZooKeeper ensemble addresses
6. Apps that should not be loaded at startup: app_blacklist
1. Set the [desktop] section:
[desktop]
# Webserver listens on this address and port
http_host=10.0.0.3
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
# Webserver runs as this user
server_user=hue123
server_group=hue123
# This should be the Hue admin and proxy user
default_user=hue123
# This should be the hadoop cluster admin
default_hdfs_superuser=root
2. Set the [[database]] section (nested under [desktop]):
[[database]]
host=10.0.0.3
port=3306
engine=mysql
user=hue123
password=***
name=hue123
3. Set the [hadoop] section:
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://master02.yscredit.com:8020
#fs_defaultfs=hdfs://ns1
# NameNode logical name.
#logical_name=ns1
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://master02.yscredit.com:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
# security_enabled=true
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# Directory of the Hadoop configuration
hadoop_hdfs_home=/opt/hadoop-2.6.0-cdh5.8.0
hadoop_bin=/opt/hadoop-2.6.0-cdh5.8.0/bin
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
hadoop_conf_dir=/opt/hadoop-2.6.0-cdh5.8.0/etc/hadoop
# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=master01.yscredit.com
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://master01.yscredit.com:8088
# URL of the ProxyServer API
proxy_api_url=http://master01.yscredit.com:8088
# URL of the HistoryServer API
history_server_api_url=http://master01.yscredit.com:19888
4. Set Hive in the [beeswax] section:
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=master02.yscredit.com
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
hive_home_dir=/opt/hive-1.1.0-cdh5.8.0
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/hive-1.1.0-cdh5.8.0/conf
5. Set ZooKeeper:
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
## host_ports=localhost:2181
host_ports=zookeeper01.yscredit.com:2181,zookeeper02.yscredit.com:2181,zookeeper03.yscredit.com:2181
# The URL of the REST contrib service (required for znode browsing).
## rest_url=http://localhost:9998
# Name of Kerberos principal when using security.
## principal_name=zookeeper
[libzookeeper]
# ZooKeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
## ensemble=localhost:2181
ensemble=zookeeper01.yscredit.com:2181,zookeeper02.yscredit.com:2181,zookeeper03.yscredit.com:2181
# Name of Kerberos principal when using security.
## principal_name=zookeeper
6. Set the app blacklist (in [desktop]):
# Comma separated list of apps to not load at server startup.
# e.g.: pig,zookeeper
# pig, hbase, search, indexer, sqoop, impala
app_blacklist=pig,hbase,sqoop
Initialize the database
bin/hue syncdb
bin/hue migrate
Notes
1. When running bin/hue syncdb you will be prompted:
You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no):yes
Username (leave blank to use 'root'):root
Email address:
Password:
Password (again):
Superuser created successfully.
Installing custom SQL …
Installing indexes …
Installed 0 object(s) from 0 fixture(s)
This creates an initial superuser for Hue. A corresponding row (the user entered above) is also inserted into the auth_user table of the hue metadata database in MySQL, and that account is used to log in to Hue as the administrator.
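To confirm the account was written to the metadata database, you can query the auth_user table directly (the user and database names here follow the assumed [[database]] settings above):

```shell
# Inspect Hue's user table in the MySQL metadata database.
mysql -uhue123 -p hue123 -e "SELECT username, is_superuser FROM auth_user;"
```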
2. Also note that if you use:
/build/env/bin/hue syncdb --noinput
then you will not be prompted for an initial username and password; instead, the credentials entered on first login become the superuser account.
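In that case a superuser can also be created later from the command line with Django's standard createsuperuser management command (the path assumes you are in the hue directory):

```shell
# Interactively create an admin account after a --noinput sync.
build/env/bin/hue createsuperuser
```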
Start Hue
nohup /opt/hue/build/env/bin/supervisor &
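To verify the web server came up, request the configured address (10.0.0.3:8888, per the [desktop] settings above):

```shell
# A successful start returns an HTTP response (typically a redirect to the login page).
curl -I http://10.0.0.3:8888
```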