1 Hue Overview
1.1 About Hue
Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. Through Hue's browser-based console you can interact with a Hadoop cluster to analyze and process data.
1.2 Features
- Session data, user authentication, and authorization are stored in a lightweight SQLite database by default; MySQL, PostgreSQL, or Oracle can be configured instead
- File Browser access to HDFS
- Hive editor for developing and running Hive queries
- Solr-based search applications with visual data views and dashboards
- Interactive queries through Impala-based applications
- Spark editor and dashboard
- Pig editor with script submission
- Oozie editor; Workflows, Coordinators, and Bundles can be submitted and monitored from a dashboard
- HBase browser for visualizing, querying, and modifying HBase tables
- Metastore browser for accessing Hive metadata and HCatalog
- Job browser for MapReduce jobs (MR1/MR2-YARN)
- Job designer for creating MapReduce/Streaming/Java jobs
- Sqoop 2 editor and dashboard
- ZooKeeper browser and editor
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases
1.3 Hadoop Cluster Service Layout
1.3.1 ZOOKEEPER & HDFS
| IP Address | Hostname | zookeeper | journalnode | namenode | zkfc | datanode |
|---|---|---|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | √ | √ | √ | |
| 192.168.1.202 | hadoop002 | √ | √ | | | √ |
| 192.168.1.203 | hadoop003 | √ | √ | √ | √ | |
| 192.168.1.204 | hadoop004 | | | | | √ |
| 192.168.1.205 | hadoop005 | | | | | √ |
| 192.168.1.206 | hadoop006 | | | | | √ |
| 192.168.1.207 | hue | | | | | |
1.3.2 YARN
| IP Address | Hostname | resourcemanager | nodemanager | timeline | JobHistory |
|---|---|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | | √ | √ |
| 192.168.1.202 | hadoop002 | | √ | | |
| 192.168.1.203 | hadoop003 | √ | | | |
| 192.168.1.204 | hadoop004 | | √ | | |
| 192.168.1.205 | hadoop005 | | √ | | |
| 192.168.1.206 | hadoop006 | | √ | | |
| 192.168.1.207 | hue | | | | |
1.3.3 HIVE & MYSQL
| IP Address | Hostname | metastore | hiveserver2 | mariadb |
|---|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | √ | √ (primary) |
| 192.168.1.202 | hadoop002 | | | |
| 192.168.1.203 | hadoop003 | | | √ (standby) |
| 192.168.1.204 | hadoop004 | | | |
| 192.168.1.205 | hadoop005 | | | |
| 192.168.1.206 | hadoop006 | | | |
| 192.168.1.207 | hue | | | |
1.3.4 HBASE
| IP Address | Hostname | master | regionserver |
|---|---|---|---|
| 192.168.1.201 | hadoop001 | √ | |
| 192.168.1.202 | hadoop002 | | √ |
| 192.168.1.203 | hadoop003 | √ | |
| 192.168.1.204 | hadoop004 | | √ |
| 192.168.1.205 | hadoop005 | | √ |
| 192.168.1.206 | hadoop006 | | √ |
| 192.168.1.207 | hue | | |
1.4 Hue Deployment Status (192.168.1.207)
| Component | Status |
|---|---|
| HIVE | √ |
| HBASE | √ |
| SPARKSQL | √ |
| HDFS | √ |
| ZOOKEEPER | √ |
| DBQuery | |
| Impala | |
| Job Browser | √ |
| Oozie | |
| Pig | |
| Notebook | √ |
2 Environment and Dependencies
2.1 Operating System
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
2.2 Configuring a LAN YUM Repository
Configure a LAN YUM repository according to your environment.
A reference configuration follows:
===============================================================================
Create the file local-yum.repo under /etc/yum.repos.d:
[root@hue ~]# vi /etc/yum.repos.d/local-yum.repo
The file contents are as follows:
[local_yum]
name=local_yum
baseurl=ftp://192.168.8.211/vg0_lv1/Packages/
enabled=1
gpgcheck=0
## enabled=1 activates the repo; to verify it works, check that `yum list` shows packages from this repo
## /etc/yum.repos.d/ may still contain the system's stock repo files; back them up and delete them so the custom repo file is picked up.
Next, delete everything under /etc/yum.repos.d except local-yum.repo:
[root@hue yum.repos.d]# ll
total 28
-rw-r--r-- 1 root root 71 Aug 17 14:32 centos_211.repo
-rw-r--r--. 1 root root 1991 Oct 23 2014 CentOS-Base.repo
-rw-r--r--. 1 root root 647 Oct 23 2014 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 289 Oct 23 2014 CentOS-fasttrack.repo
-rw-r--r--. 1 root root 630 Oct 23 2014 CentOS-Media.repo
-rw-r--r--. 1 root root 5394 Oct 23 2014 CentOS-Vault.repo
[root@hue yum.repos.d]# rm -rf CentOS-*
[root@hue yum.repos.d]# ll
With local-yum.repo in place, run the following to verify the repo is configured correctly:
yum clean all
yum makecache
yum list | wc -l ## prints the number of available packages; in this setup it is 6353
6353 ## the count varies with the repo contents
2.3 Firewall
Disable the firewall:
[root@hue ~]# systemctl stop firewalld
[root@hue ~]# systemctl disable firewalld
2.4 Official Dependencies
CentOS:
- Oracle's JDK
- ant
- asciidoc
- cyrus-sasl-devel
- cyrus-sasl-gssapi
- cyrus-sasl-plain
- gcc
- gcc-c++
- krb5-devel
- libffi-devel
- libtidy (for unit tests only)
- libxml2-devel
- libxslt-devel
- make
- mvn (from apache-maven package or maven3 tarball)
- mysql
- mysql-devel
- openldap-devel
- python-devel
- sqlite-devel
- openssl-devel (for version 7+)
- gmp-devel
2.5 Adding a User
- Add the user:
[root@hue ~]# useradd hadoop
- Set its password:
[root@hue ~]# passwd hadoop
Enter the new password twice when prompted.
3 Installation and Deployment
3.1 Cluster
- Install the Hadoop cluster on hadoop001~006. Refer to 《省份大数据平台部署运维规范(贵州模版)》.
3.2 beh Package
- scp a copy of the beh package from any node of the Hadoop cluster to the hue host.
3.3 Installing Dependencies
- Install the dependencies on the hue host:
[root@hue ~]# yum -y install asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libtidy libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel
In particular, check that asciidoc and libffi-devel installed correctly; if no YUM repository is available, download and install them manually.
3.3.1 JDK
[root@hue ~]# vi /etc/profile
export JAVA_HOME=/opt/beh/core/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
[root@hue ~]# source /etc/profile
[root@hue ~]# java -version
3.3.2 ant
- Download and extract:
[root@hue ~]# wget http://mirror.bit.edu.cn/apache/ant/binaries/apache-ant-1.9.7-bin.tar.gz
[root@hue ~]# tar -zxvf apache-ant-1.9.7-bin.tar.gz -C /usr/local
- In /usr/local, rename the directory and set up the environment:
[root@hue local]# mv apache-ant-1.9.7 ant
[root@hue local]# vi /etc/profile
export ANT_HOME=/usr/local/ant
export PATH=$ANT_HOME/bin:$PATH
[root@hue local]# source /etc/profile
[root@hue local]# ant -version
3.3.3 mvn
- Download and extract:
[root@hue ~]# wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
[root@hue ~]# tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /usr/local
- In /usr/local, rename the directory and set up the environment:
[root@hue local]# mv apache-maven-3.3.9 maven
[root@hue local]# vi /etc/profile
export M2_HOME=/usr/local/maven
export PATH=$M2_HOME/bin:$PATH
[root@hue local]# source /etc/profile
[root@hue local]# mvn -v
3.3.4 Upgrading pytz
[root@hue ~]# easy_install --upgrade pytz
3.3.5 Upgrading cffi
[root@hue ~]# easy_install --upgrade cffi
3.4 Building and Installing
3.4.1 Installation
- Download and upload
Download the latest Hue release from http://gethue.com/category/release/ (at the time of writing, hue-3.10).
- Extract it into the hue user's home directory and rename the folder:
[root@hue ~]# tar -zxvf hue-3.10.0.tgz -C /home/hue/
[root@hue ~]# cd /home/hue
[root@hue hue]# mv hue-3.10.0 hue
3.4.2 Build
[root@hue hue]# cd hue
[root@hue hue]# make apps
3.4.3 Test
[root@hue hue]# build/env/bin/hue runserver
The Hue login page should now be reachable at http://localhost:8000.
3.4.4 Common Issues
If the build fails due to missing dependencies, install pip and then `pip install` whatever is missing; run `make clean` and build again.
If the build times out, the connection to the upstream server is unstable; run `make clean` and build again.
3.4.5 Permissions
[root@hue hue]# chown -R hadoop:hadoop hue/
4 Configuration
4.1 Component Configuration and Required Services
Default configuration file: $HUE_HOME/desktop/conf/hue.ini
bin directory: $HUE_HOME/build/env/bin/
4.1.1 Webserver Listen Address
http_host=192.168.1.207
http_port=8000
4.1.2 Hadoop User Permissions
# Webserver runs as this user
server_user=hadoop
server_group=hadoop
# This should be the Hue admin and proxy user
default_user=hue
# This should be the hadoop cluster admin
default_hdfs_superuser=hadoop
4.1.3 Using a MySQL Database
4.1.3.1 Edit the configuration file
# Note for MariaDB use the 'mysql' engine.
engine=mysql
host=hadoop001
port=3306
user=hue
password=hue
# Execute this script to produce the database password. This will be used when `password` is not set.
## password_script=/path/script
name=hue
4.1.3.2 Create the Hue MySQL database
[hadoop@hadoop001 ~]$ mysql -u root -p
- Create the database, create the user, and grant privileges:
create database hue;
grant all on hue.* to 'hue'@'%' identified by 'hue';
grant all on hue.* to 'hue'@'localhost' identified by 'hue';
grant all on hue.* to 'hue'@'hadoop001' identified by 'hue';
4.1.3.3 Synchronize and migrate the initial data
[hadoop@hue home]$ hue/hue/build/env/bin/hue syncdb
[hadoop@hue home]$ hue/hue/build/env/bin/hue migrate
4.1.4 HDFS
Configure HDFS, which several other features depend on.
Enable WebHDFS in hdfs-site.xml and restart the service:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Then edit the Hue configuration file:
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://beh
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://192.168.1.201:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
security_enabled=true
# Directory of the Hadoop configuration ($HADOOP_CONF_DIR when set)
hadoop_conf_dir=/opt/beh/core/hadoop/etc/hadoop/conf
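The webhdfs_url above points at a single NameNode. Since this cluster runs NameNode HA (nameservice beh, NameNodes on hadoop001 and hadoop003), an HttpFs gateway can serve WebHDFS regardless of which NameNode is active. A hypothetical alternative, assuming an HttpFs server were deployed on hadoop001 at its default port 14000 (HttpFs is not part of this deployment by default):

```ini
# Assumption: HttpFs running on hadoop001:14000
webhdfs_url=http://hadoop001:14000/webhdfs/v1
```

With HttpFs, Hue keeps working across NameNode failovers without edits to hue.ini.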
4.1.5 YARN
[[yarn_clusters]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=192.168.1.201
# The port where the ResourceManager IPC listens on
resourcemanager_port=23140
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://hadoop001:23188
# URL of the ProxyServer API
proxy_api_url=http://hadoop001:8088
# URL of the HistoryServer API
history_server_api_url=http://hadoop001:19888
# URL of the Spark History Server
spark_history_server_url=http://hadoop001:18088
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
4.1.6 HA
# HA support by specifying multiple clusters.
# Redefine different properties there.
# e.g.
[[[ha]]]
# Resource Manager logical name (required for HA)
logical_name=beh
# Un-comment to enable
submit_to=True
# URL of the ResourceManager API
resourcemanager_api_url=http://hadoop003:23188
4.1.7 Hive
Hive requires the HiveServer2 (Thrift) service to be running on hadoop001:
[hadoop@hadoop001 ~]$ hive_start hiveserver2 &
- Then edit the Hue configuration file:
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=192.168.1.201
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/beh/core/hive/conf
4.1.8 HBase
HBase requires the ThriftServer service to be running on hadoop001:
[hadoop@hadoop001 ~]$ hbase-daemon.sh start thrift
- Then edit the Hue configuration file:
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
# If using Kerberos we assume GSSAPI SASL, not PLAIN.
hbase_clusters=(Cluster|192.168.1.201:9090)
#hbase_clusters=(hadoop|192.168.1.201:9090)
# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/opt/beh/core/hbase/conf
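As the comment notes, hbase_clusters accepts a comma-separated list, so additional Thrift servers can be exposed side by side. A hypothetical value listing a second, imaginary cluster (the second entry is not part of this deployment):

```ini
# Hypothetical: a second Thrift server on 192.168.1.202
hbase_clusters=(Cluster|192.168.1.201:9090),(Cluster2|192.168.1.202:9090)
```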
4.1.9 ZooKeeper
The only requirement is that the ZooKeeper service is running on the Hadoop cluster.
- Edit the Hue configuration file:
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=192.168.1.201:2181,192.168.1.202:2181,192.168.1.203:2181
# The URL of the REST contrib service (required for znode browsing).
rest_url=http://192.168.1.201:9998
# Name of Kerberos principal when using security.
## principal_name=zookeeper
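For reference, host_ports is a plain comma-separated list of host:port entries. The sketch below (illustrative only, not Hue's actual parsing code) shows how the value above breaks down into the three-member ensemble:

```python
def parse_host_ports(value):
    """Split a hue.ini host_ports string into (host, port) tuples."""
    pairs = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs

ensemble = "192.168.1.201:2181,192.168.1.202:2181,192.168.1.203:2181"
for host, port in parse_host_ports(ensemble):
    print(host, port)
```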
4.1.10 Spark
Spark requires livy-server, spark-jobserver, and spark-thrift-server.
- Install and start livy-server on the hue host (the archive is a zip, so it is fetched with wget and unpacked with unzip):
[hadoop@hue ~]$ wget http://archive.cloudera.com/beta/livy/livy-server-0.2.0.zip
[hadoop@hue ~]$ unzip livy-server-0.2.0.zip
[hadoop@hue ~]$ cd livy-server-0.2.0
[hadoop@hue livy-server-0.2.0]$ ./bin/livy-server
- Start spark-thrift-server on hadoop001:
[hadoop@hadoop001 ~]$ cd /opt/beh/core/spark/sbin
[hadoop@hadoop001 sbin]$ ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001
- Start spark-jobserver on hadoop001 (installing the sbt rpm requires root privileges):
[hadoop@hadoop001 ~]$ sudo rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.6.rpm
[hadoop@hadoop001 ~]$ git clone https://github.com/ooyala/spark-jobserver.git
[hadoop@hadoop001 ~]$ cd spark-jobserver
[hadoop@hadoop001 spark-jobserver]$ sbt
> re-start
- Edit the Hue configuration file:
[spark]
# Host address of the Livy Server.
livy_server_host=hue
# Port of the Livy Server.
livy_server_port=8998
# Configure livy to start in local 'process' mode, or 'yarn' workers.
livy_server_session_kind=process
# If livy should use proxy users when submitting a job.
## livy_impersonation_enabled=true
# Host of the Sql Server
sql_server_host=192.168.1.201
# Port of the Sql Server
sql_server_port=10001
4.1.11 File Browser
Disable HDFS permission checking in hdfs-site.xml and restart the service; otherwise errors such as "cannot connect" may occur.
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
Add the following proxy-user settings to core-site.xml and restart the service; otherwise errors such as "cannot impersonate" may occur.
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
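With the proxy-user settings above, Hue issues WebHDFS calls as the hue user while impersonating the logged-in user via the doas query parameter. A minimal illustration of how such a request URL is composed (the path and user names here are hypothetical examples, not values Hue is guaranteed to send):

```python
from urllib.parse import urlencode

def webhdfs_request_url(base, path, op, user, doas):
    """Compose a WebHDFS REST URL carrying impersonation parameters."""
    query = urlencode({"op": op, "user.name": user, "doas": doas})
    return "%s%s?%s" % (base, path, query)

url = webhdfs_request_url("http://192.168.1.201:50070/webhdfs/v1",
                          "/user/hadoop", "LISTSTATUS", "hue", "hadoop")
print(url)
```

If the proxy-user settings are missing, the NameNode rejects exactly this kind of doas request with a "cannot impersonate" error.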