1 Hue Overview
1.1 About Hue
Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. Through Hue's browser-based console you can interact with a Hadoop cluster to analyze and process data.
1.2 Features
- Session data, user authentication, and authorization are stored in a lightweight SQLite database by default; MySQL, PostgreSQL, or Oracle can be configured instead
- File Browser access to HDFS
- Hive editor for developing and running Hive queries
- Solr-based search applications with visual data views and dashboards
- Interactive queries through Impala-based applications
- Spark editor and dashboard
- Pig editor with script submission
- Oozie editor; Workflows, Coordinators, and Bundles can be submitted and monitored from a dashboard
- HBase browser for visualizing, querying, and modifying HBase tables
- Metastore browser for accessing Hive metadata and HCatalog
- Job browser for MapReduce jobs (MR1/MR2-YARN)
- Job designer for creating MapReduce/Streaming/Java jobs
- Sqoop 2 editor and dashboard
- ZooKeeper browser and editor
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases
1.3 Hadoop Cluster Service Layout
1.3.1 ZOOKEEPER & HDFS
| IP Address | Hostname | zookeeper | journalnode | namenode | zkfc | datanode |
|---|---|---|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | √ | √ | √ | |
| 192.168.1.202 | hadoop002 | √ | √ | | | √ |
| 192.168.1.203 | hadoop003 | √ | √ | √ | √ | |
| 192.168.1.204 | hadoop004 | | | | | √ |
| 192.168.1.205 | hadoop005 | | | | | √ |
| 192.168.1.206 | hadoop006 | | | | | √ |
| 192.168.1.207 | hue | | | | | |
1.3.2 YARN
| IP Address | Hostname | resourcemanager | nodemanager | timeline | JobHistory |
|---|---|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | | √ | √ |
| 192.168.1.202 | hadoop002 | | √ | | |
| 192.168.1.203 | hadoop003 | √ | | | |
| 192.168.1.204 | hadoop004 | | √ | | |
| 192.168.1.205 | hadoop005 | | √ | | |
| 192.168.1.206 | hadoop006 | | √ | | |
| 192.168.1.207 | hue | | | | |
1.3.3 HIVE & MYSQL
| IP Address | Hostname | metastore | hiveserver2 | mariadb |
|---|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | √ | √ (primary) |
| 192.168.1.202 | hadoop002 | | | |
| 192.168.1.203 | hadoop003 | | | √ (standby) |
| 192.168.1.204 | hadoop004 | | | |
| 192.168.1.205 | hadoop005 | | | |
| 192.168.1.206 | hadoop006 | | | |
| 192.168.1.207 | hue | | | |
1.3.4 HBASE
| IP Address | Hostname | master | regionserver |
|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | |
| 192.168.1.202 | hadoop002 | | √ |
| 192.168.1.203 | hadoop003 | √ | |
| 192.168.1.204 | hadoop004 | | √ |
| 192.168.1.205 | hadoop005 | | √ |
| 192.168.1.206 | hadoop006 | | √ |
| 192.168.1.207 | hue | | |
1.4 Hue Deployment Status (192.168.1.207)
| Component | Status |
|---|---|
| HIVE | √ |
| HBASE | √ |
| SPARKSQL | √ |
| HDFS | √ |
| ZOOKEEPER | √ |
| DBQuery | |
| Impala | |
| Job Browser | √ |
| Oozie | |
| Pig | |
| Notebook | √ |
2 Environment and Dependencies
2.1 Operating System
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
2.2 Configuring a LAN YUM Repository
Configure a LAN YUM repository according to your environment.
A reference configuration follows:
===============================================================================
Create the file local-yum.repo under /etc/yum.repos.d:
[root@hue ~]# vi /etc/yum.repos.d/local-yum.repo
The file contents are as follows:
[local_yum]
name=local_yum
baseurl=ftp://192.168.8.211/vg0_lv1/Packages/
enabled=1
gpgcheck=0
## enabled=1 activates the repo; to verify it works, check that `yum list` shows packages from this repo
## /etc/yum.repos.d/ may still contain the system's stock repo files; back them up and delete them so the custom repo file is picked up.
Next, delete everything under /etc/yum.repos.d except local-yum.repo:
[root@hue yum.repos.d]# ll
total 28
-rw-r--r-- 1 root root 71 Aug 17 14:32 centos_211.repo
-rw-r--r--. 1 root root 1991 Oct 23 2014 CentOS-Base.repo
-rw-r--r--. 1 root root 647 Oct 23 2014 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 289 Oct 23 2014 CentOS-fasttrack.repo
-rw-r--r--. 1 root root 630 Oct 23 2014 CentOS-Media.repo
-rw-r--r--. 1 root root 5394 Oct 23 2014 CentOS-Vault.repo
[root@hue yum.repos.d]# rm -rf CentOS-*
[root@hue yum.repos.d]# ll
With local-yum.repo in place, run the following to verify the repo is configured correctly:
yum clean all
yum makecache
yum list | wc -l ## prints the number of available packages; in this setup it is 6353
6353 ## the count varies with the repo contents
2.3 Firewall
Disable the firewall:
[root@hue ~]# systemctl stop firewalld
[root@hue ~]# systemctl disable firewalld
2.4 Official Dependencies
CentOS:
- Oracle's JDK
- ant
- asciidoc
- cyrus-sasl-devel
- cyrus-sasl-gssapi
- cyrus-sasl-plain
- gcc
- gcc-c++
- krb5-devel
- libffi-devel
- libtidy (for unit tests only)
- libxml2-devel
- libxslt-devel
- make
- mvn (from apache-maven package or maven3 tarball)
- mysql
- mysql-devel
- openldap-devel
- python-devel
- sqlite-devel
- openssl-devel (for version 7+)
- gmp-devel
2.5 Adding a User
- Add the user:
[root@hue ~]# useradd hadoop
- Set its password:
[root@hue ~]# passwd hadoop
Enter the new password twice when prompted.
3 Installation and Deployment
3.1 Cluster
- Install the Hadoop cluster on hadoop001~006. Refer to 《省份大数据平台部署运维规范(贵州模版)》.
3.2 beh Package
- scp a copy of the beh package from any node of the Hadoop cluster to the hue host.
3.3 Installing Dependencies
- Install the dependencies on the hue host:
[root@hue ~]# yum -y install asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel
In particular, check that asciidoc and libffi-devel installed correctly; if no YUM repository is available, download and install them manually.
3.3.1 JDK
[root@hue ~]# vi /etc/profile
export JAVA_HOME=/opt/beh/core/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
[root@hue ~]# source /etc/profile
[root@hue ~]# java -version
3.3.2 ant
- Download and extract:
[root@hue ~]# wget http://mirror.bit.edu.cn/apache/ant/binaries/apache-ant-1.9.7-bin.tar.gz
[root@hue ~]# tar -zxvf apache-ant-1.9.7-bin.tar.gz -C /usr/local
- In /usr/local, rename the directory and set up the environment:
[root@hue local]# mv apache-ant-1.9.7 ant
[root@hue local]# vi /etc/profile
export ANT_HOME=/usr/local/ant
export PATH=$ANT_HOME/bin:$PATH
[root@hue local]# source /etc/profile
[root@hue local]# ant -version
3.3.3 mvn
- Download and extract:
[root@hue ~]# wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
[root@hue ~]# tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local
- In /usr/local, rename the directory and set up the environment:
[root@hue local]# mv apache-maven-3.3.9 maven
[root@hue local]# vi /etc/profile
export M2_HOME=/usr/local/maven
export PATH=$M2_HOME/bin:$PATH
[root@hue local]# source /etc/profile
[root@hue local]# mvn -v
3.3.4 Upgrading pytz
[root@hue ~]# easy_install --upgrade pytz
3.3.5 Upgrading cffi
[root@hue ~]# easy_install --upgrade cffi
3.4 Building and Installing
3.4.1 Installation
- Download and upload
Download the latest Hue release from http://gethue.com/category/release/ (at the time of writing, hue-3.10).
- Extract it into the hue user's home directory and rename the folder:
[root@hue ~]# tar -zxvf hue-3.10.0.tgz -C /home/hue/
[root@hue ~]# cd /home/hue
[root@hue hue]# mv hue-3.10.0 hue
3.4.2 Build
[root@hue hue]# cd hue
[root@hue hue]# make apps
3.4.3 Test
[root@hue hue]# build/env/bin/hue runserver
The Hue login page should now be reachable at http://localhost:8000.
3.4.4 Common Issues
If the build fails due to missing dependencies, install pip and then `pip install` whatever is missing; run `make clean` and build again.
If the build times out, the connection to the upstream server is unstable; run `make clean` and build again.
3.4.5 Permissions
[root@hue hue]# chown -R hadoop:hadoop hue/
4 Configuration
4.1 Component Configuration and Required Services
Default configuration file: $HUE_HOME/desktop/conf/hue.ini
bin directory: $HUE_HOME/build/env/bin/
4.1.1 Webserver Listen Address
http_host=192.168.1.207
http_port=8000
4.1.2 Hadoop User Permissions
# Webserver runs as this user
server_user=hadoop
server_group=hadoop
# This should be the Hue admin and proxy user
default_user=hue
# This should be the hadoop cluster admin
default_hdfs_superuser=hadoop
4.1.3 Using a MySQL Database
4.1.3.1 Edit the configuration file
# Note for MariaDB use the 'mysql' engine.
engine=mysql
host=hadoop001
port=3306
user=hue
password=hue
# Execute this script to produce the database password. This will be used when `password` is not set.
## password_script=/path/script
name=hue
4.1.3.2 Create the Hue MySQL database
[hadoop@hadoop001 ~]$ mysql -u root -p
- Create the database, create the user, and grant privileges:
create database hue;
grant all on hue.* to 'hue'@'%' identified by 'hue';
grant all on hue.* to 'hue'@'localhost' identified by 'hue';
grant all on hue.* to 'hue'@'hadoop001' identified by 'hue';
4.1.3.3 Synchronize and migrate the initial data
[hadoop@hue home]$ hue/hue/build/env/bin/hue syncdb
[hadoop@hue home]$ hue/hue/build/env/bin/hue migrate
4.1.4 HDFS
Configure HDFS, which several other features depend on.
Enable WebHDFS in hdfs-site.xml and restart the service:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Then edit the Hue configuration file:
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://beh
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://192.168.1.201:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
security_enabled=true
# Directory of the Hadoop configuration ($HADOOP_CONF_DIR when set)
hadoop_conf_dir=/opt/beh/core/hadoop/etc/hadoop/conf
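The webhdfs_url above points at a single NameNode. Since this cluster runs NameNode HA (nameservice beh, NameNodes on hadoop001 and hadoop003), an HttpFs gateway can serve WebHDFS regardless of which NameNode is active. A hypothetical alternative, assuming an HttpFs server were deployed on hadoop001 at its default port 14000 (HttpFs is not part of this deployment by default):

```ini
# Assumption: HttpFs running on hadoop001:14000
webhdfs_url=http://hadoop001:14000/webhdfs/v1
```

With HttpFs, Hue keeps working across NameNode failovers without edits to hue.ini.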
4.1.5 YARN
[[yarn_clusters]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=192.168.1.201
# The port where the ResourceManager IPC listens on
resourcemanager_port=23140
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://hadoop001:23188
# URL of the ProxyServer API
proxy_api_url=http://hadoop001:8088
# URL of the HistoryServer API
history_server_api_url=http://hadoop001:19888
# URL of the Spark History Server
spark_history_server_url=http://hadoop001:18088
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
4.1.6 HA
# HA support by specifying multiple clusters.
# Redefine different properties there.
# e.g.
[[[ha]]]
# Resource Manager logical name (required for HA)
logical_name=beh
# Un-comment to enable
submit_to=True
# URL of the ResourceManager API
resourcemanager_api_url=http://hadoop003:23188
4.1.7 Hive
Hive requires the HiveServer2 (Thrift) service to be running on hadoop001:
[hadoop@hadoop001 ~]$ hive_start hiveserver2 &
- Then edit the Hue configuration file:
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=192.168.1.201
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/beh/core/hive/conf
4.1.8 HBase
HBase requires the ThriftServer service to be running on hadoop001:
[hadoop@hadoop001 ~]$ hbase-daemon.sh start thrift
- Then edit the Hue configuration file:
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
hbase_clusters=(Cluster|192.168.1.201:9090)
#hbase_clusters=(hadoop|192.168.1.201:9090)
# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/opt/beh/core/hbase/conf
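As the comment notes, hbase_clusters accepts a comma-separated list, so additional Thrift servers can be exposed side by side. A hypothetical value listing a second, imaginary cluster (the second entry is not part of this deployment):

```ini
# Hypothetical: a second Thrift server on 192.168.1.202
hbase_clusters=(Cluster|192.168.1.201:9090),(Cluster2|192.168.1.202:9090)
```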
4.1.9 ZooKeeper
The only requirement is that the ZooKeeper service is running on the Hadoop cluster.
- Edit the Hue configuration file:
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=192.168.1.201:2181,192.168.1.202:2181,192.168.1.203:2181
# The URL of the REST contrib service (required for znode browsing).
rest_url=http://192.168.1.201:9998
# Name of Kerberos principal when using security.
## principal_name=zookeeper
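For reference, host_ports is a plain comma-separated list of host:port entries. The sketch below (illustrative only, not Hue's actual parsing code) shows how the value above breaks down into the three-member ensemble:

```python
def parse_host_ports(value):
    """Split a hue.ini host_ports string into (host, port) tuples."""
    pairs = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs

ensemble = "192.168.1.201:2181,192.168.1.202:2181,192.168.1.203:2181"
for host, port in parse_host_ports(ensemble):
    print(host, port)
```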
4.1.10 Spark
Spark requires livy-server, spark-jobserver, and spark-thrift-server.
- Install and start livy-server on the hue host (the archive is a zip, so it is fetched with wget and unpacked with unzip):
[hadoop@hue ~]$ wget http://archive.cloudera.com/beta/livy/livy-server-0.2.0.zip
[hadoop@hue ~]$ unzip livy-server-0.2.0.zip
[hadoop@hue ~]$ cd livy-server-0.2.0
[hadoop@hue livy-server-0.2.0]$ ./bin/livy-server
- Start spark-thrift-server on hadoop001:
[hadoop@hadoop001 ~]$ cd /opt/beh/core/spark/sbin
[hadoop@hadoop001 sbin]$ ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001
- Start spark-jobserver on hadoop001 (installing the sbt rpm requires root privileges):
[hadoop@hadoop001 ~]$ sudo rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm
[hadoop@hadoop001 ~]$ git clone https://github.com/ooyala/spark-jobserver.git
[hadoop@hadoop001 ~]$ cd spark-jobserver
[hadoop@hadoop001 spark-jobserver]$ sbt
> re-start
- Edit the Hue configuration file:
[spark]
# Host address of the Livy Server.
livy_server_host=hue
# Port of the Livy Server.
livy_server_port=8998
# Configure livy to start in local 'process' mode, or 'yarn' workers.
livy_server_session_kind=process
# If livy should use proxy users when submitting a job.
## livy_impersonation_enabled=true
# Host of the Sql Server
sql_server_host=192.168.1.201
# Port of the Sql Server
sql_server_port=10001
4.1.11 File Browser
Disable HDFS permission checking in hdfs-site.xml and restart the service; otherwise errors such as "cannot connect" may occur.
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
Add the following proxy-user settings to core-site.xml and restart the service; otherwise errors such as "cannot impersonate" may occur.
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
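With the proxy-user settings above, Hue issues WebHDFS calls as the hue user while impersonating the logged-in user via the doas query parameter. A minimal illustration of how such a request URL is composed (the path and user names here are hypothetical examples, not values Hue is guaranteed to send):

```python
from urllib.parse import urlencode

def webhdfs_request_url(base, path, op, user, doas):
    """Compose a WebHDFS REST URL carrying impersonation parameters."""
    query = urlencode({"op": op, "user.name": user, "doas": doas})
    return "%s%s?%s" % (base, path, query)

url = webhdfs_request_url("http://192.168.1.201:50070/webhdfs/v1",
                          "/user/hadoop", "LISTSTATUS", "hue", "hadoop")
print(url)
```

If the proxy-user settings are missing, the NameNode rejects exactly this kind of doas request with a "cannot impersonate" error.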