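Official site: http://gethue.com/
Source code: http://github.com/cloudera/hue
Hue installation guide: http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/
Hue is an open-source web UI for Apache Hadoop. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on Django, the Python web framework. With Hue you can work with a Hadoop cluster from a web console in the browser to analyze and process data, for example by manipulating data on HDFS or running MapReduce jobs. Put simply, it is a visual front end for Hadoop, and a very handy one. Its main features: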
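- Access HDFS through the File Browser
- Develop and run Hive queries in the Hive editor
- Solr-based search applications, with visual data views and dashboards
- Interactive queries through Impala-based applications
- Spark editor and dashboard
- Pig editor, with the ability to submit script jobs
- Oozie editor; submit and monitor Workflows, Coordinators, and Bundles from the dashboard
- HBase browser for visualizing data, querying data, and modifying HBase tables
- Metastore browser for accessing Hive metadata and HCatalog
- Job browser for accessing MapReduce jobs (MR1/MR2-YARN)
- Job designer for creating MapReduce/Streaming/Java jobs
- Sqoop 2 editor and dashboard
- ZooKeeper browser and editor
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases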
1. Install the build dependencies (as root)
$ su -
# yum -y install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel mysql-devel gmp-devel
2. Extract and build Hue
$ su - tom
$ tar zxvf /opt/softwares/hue-3.7.0-cdh5.3.6.tar.gz
$ cd hue-3.7.0-cdh5.3.6/
$ make apps
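3. Edit desktop/conf/hue.ini
[desktop]
# This key was copied from the official documentation. Any long random string works; it is used for secure hashing in the session store.
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
# Host and port
http_host=blue01.mydomain
http_port=8888
# Time zone
time_zone=Asia/Shanghai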
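Rather than reusing the key from the documentation, you may prefer to generate your own. The following is a minimal sketch in standalone Python 3 (it is not part of Hue); the function name `generate_secret_key` and the default length of 60 are assumptions made for illustration.

```python
import secrets
import string


def generate_secret_key(length: int = 60) -> str:
    """Return a random string that can be pasted after secret_key= in hue.ini."""
    # Letters and digits only, so the value needs no quoting or escaping.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print("secret_key=" + generate_secret_key())
```

Run it once and copy the printed line into the [desktop] section.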
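4. Start Hue
$ build/env/bin/supervisor
$ netstat -an | grep 8888    # check that port 8888 is listening
5. Open a browser
http://[hostname]:8888
The first login creates the superuser (here: username hue, password hue)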
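If you would rather script the port check than read netstat output, a small probe like the one below can help. It is only a sketch: `port_is_open` is a hypothetical helper name, and the host and port you pass in are whatever you set as http_host and http_port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, `port_is_open("blue01.mydomain", 8888)` should return True once the supervisor has started the web server.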
----Integrating Hue with Hadoop---------------------------
1. a) Configure etc/hadoop/hdfs-site.xml:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- Disable HDFS permission checking (convenient for testing; not recommended in production) -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
b) Configure etc/hadoop/core-site.xml:
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
c) Restart the Hadoop services!
2. Edit desktop/conf/hue.ini:
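The two hadoop.proxyuser.hue.* properties let the hue user act on behalf of other users. In the WebHDFS REST API this shows up as a `user.name` parameter (the caller) plus a `doas` parameter (the impersonated user). The sketch below only builds such a URL so that you can try it with curl or a browser after the restart; the helper name `webhdfs_url` and the example host are assumptions, and 50070 is the NameNode HTTP port used elsewhere in these notes.

```python
from typing import Optional
from urllib.parse import urlencode


def webhdfs_url(namenode: str, path: str, op: str, user: str,
                doas: Optional[str] = None, port: int = 50070) -> str:
    """Build a WebHDFS REST URL, optionally impersonating another user."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /user/tom")
    params = {"op": op, "user.name": user}
    if doas:
        # Only accepted by the NameNode if `user` is configured as a proxy user.
        params["doas"] = doas
    return "http://{}:{}/webhdfs/v1{}?{}".format(namenode, port, path, urlencode(params))
```

For instance, `webhdfs_url("namenode.example", "/user/tom", "LISTSTATUS", "hue", doas="tom")` gives a URL that should list /user/tom as tom if the proxy-user settings took effect.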
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://[hostname]:8020
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://[hostname]:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
## umask=022
# Directory of the Hadoop configuration
hadoop_conf_dir=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop
hadoop_hdfs_home=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/
hadoop_bin =/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=[hostname]
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://[hostname]:8088
# URL of the ProxyServer API
proxy_api_url=http://[hostname]:8088
# URL of the HistoryServer API
history_server_api_url=http://[hostname]:19888
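Most of the settings above are URLs, and a small slip (a single slash after the scheme, a missing port) is easy to overlook. The following sketch checks one value at a time; `check_service_url` is a hypothetical helper, not something Hue provides. Pass real host names rather than the [hostname] placeholder, since square brackets are reserved for IPv6 literals in URLs.

```python
from typing import List
from urllib.parse import urlparse


def check_service_url(value: str, expected_scheme: str) -> List[str]:
    """Return a list of problems found in a URL-valued hue.ini setting."""
    problems = []
    parsed = urlparse(value)
    if parsed.scheme != expected_scheme:
        problems.append("expected scheme {!r}, found {!r}".format(expected_scheme, parsed.scheme))
    if not parsed.hostname:
        problems.append("missing host (is the '//' after the scheme present?)")
    try:
        port = parsed.port
    except ValueError:
        port = None  # the port was present but not a valid number
    if port is None:
        problems.append("missing or invalid port")
    return problems
```

An empty list means the value looks well formed. A value such as `hdfs:/namenode.example:8020` (one slash) is reported as missing its host and port.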
3. Start Hue
$ build/env/bin/supervisor
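Note:
Normally you stop Hue with Ctrl+C. If that does not work:
$ ps -ef | grep supervisor
** ps -A lists all processes; -e is equivalent to -A
** ps -f produces the full-format listing shown below (the BSD-style option f, without a dash, draws the process tree instead)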
UID PID PPID C STIME TTY TIME CMD
tom 17604 16613 0 19:48 pts/2 00:00:01 /opt/modules/cdh/hue-3.7.0-cdh5.3.6/build/env/bin/python2.6 build/env/bin/supervisor
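UID    the user that owns the process
PID    the process ID
PPID   the ID of the parent process
C      CPU utilization (percent)
STIME  start time
TTY    the terminal the process is attached to
TIME   cumulative CPU time used
CMD    the command
Kill the supervisor process: $ kill -9 17604
If that still does not stop it, find the process listening on port 8888:
$ netstat -antp | grep 8888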
tcp 0 0 192.168.122.128:8888 0.0.0.0:* LISTEN 17607/python2.6
tcp 0 0 192.168.122.128:8888 192.168.122.1:54763 ESTABLISHED 17607/python2.6
$ kill -9 17607
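`kill -9` gives the process no chance to clean up, so a gentler pattern is to send SIGTERM first and escalate only if the process is still there after a grace period. The sketch below illustrates that pattern on Linux; `stop_process` and `is_running` are hypothetical helper names, and the 5-second grace period is an arbitrary choice.

```python
import os
import signal
import time


def is_running(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    # Note: an exited child that its parent has not reaped yet still counts as running.
    return True


def stop_process(pid: int, grace_seconds: float = 5.0) -> str:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_running(pid):
            return "SIGTERM"
        time.sleep(0.1)
    os.kill(pid, signal.SIGKILL)
    return "SIGKILL"
```

With the PIDs from the listings above, this would be called as `stop_process(17604)`.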
----Integrating Hue with Hive-----------------------------------
Edit hive-site.xml:
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>[hostname]</value>
</property>
<property>
<name>hive.server2.long.polling.timeout</name>
<value>5000</value>
</property>
<!-- Metastore Thrift service. Leave this property out at first and add it only if things do not work without it -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://[hostname]:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
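To confirm which values actually ended up in the file, you can read the properties back with a few lines of Python. This is a sketch, not a Hive tool: `read_hadoop_properties` is a hypothetical helper, and it assumes the property elements sit inside the usual `<configuration>` root element of a complete hive-site.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_hadoop_properties(xml_text: str) -> Dict[str, str]:
    """Map each <property> name to its value in a Hadoop-style XML config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A property without a <value> element is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props
```

Calling it on the contents of hive-site.xml and looking up `hive.server2.thrift.port` should return the same port you later put into hue.ini. The same function works for hdfs-site.xml and core-site.xml, which share this layout.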
Start HiveServer2 and the metastore:
bin/hive --service hiveserver2
bin/hive --service metastore
Configure hue.ini
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=blue01.mydomain
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/conf
Start Hue
$ build/env/bin/supervisor
----Integrating Hue with MySQL-------------------------------------
** Note: make sure the lines in this section of hue.ini are uncommented (remove the leading ##)
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql
# IP or hostname of the database to connect to.
host=blue01.mydomain
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=root
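In hue.ini the number of brackets gives the nesting depth, so `[[[mysql]]]` is a third-level section. The sketch below is a deliberately small parser that makes this nesting visible; it is not how Hue reads its configuration. The enclosing section names `[librdbms]` and `[[databases]]` in the usage note are my assumption about where this block lives in a stock hue.ini, so check them against your own file.

```python
import re
from typing import Any, Dict

SECTION = re.compile(r"^(\[+)\s*([^\[\]]+?)\s*(\]+)$")


def parse_nested_ini(text: str) -> Dict[str, Any]:
    """Parse bracket-nested INI text into nested dictionaries."""
    root = {}  # type: Dict[str, Any]
    stack = [root]  # stack[d] is the dict of the currently open section at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments (including ## disabled settings)
        match = SECTION.match(line)
        if match:
            depth = len(match.group(1))
            if len(match.group(3)) != depth or depth > len(stack):
                raise ValueError("bad section header: " + line)
            del stack[depth:]  # close sections at this depth or deeper
            stack.append(stack[-1].setdefault(match.group(2), {}))
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value: " + line)
        stack[-1][key.strip()] = value.strip().strip('"')
    return root
```

Parsing a file that contains `[librdbms]`, `[[databases]]` and the block above would put the settings under `cfg["librdbms"]["databases"]["mysql"]`. A setting still prefixed with ## is skipped, which is a quick way to spot lines you forgot to uncomment.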
Start Hue
$ build/env/bin/supervisor
----Integrating Hue with ZooKeeper----------------------------------
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=host1:2181,host2:2181
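The host_ports value is a comma-separated list of host:port pairs. As an illustration of the format, the sketch below splits such a string into tuples, for example to probe each server before starting Hue. `parse_host_ports` is a hypothetical helper, and falling back to 2181 (ZooKeeper's usual client port) when no port is given is my assumption, not documented Hue behaviour.

```python
from typing import List, Tuple


def parse_host_ports(value: str, default_port: int = 2181) -> List[Tuple[str, int]]:
    """Split 'host1:2181,host2:2181' into [('host1', 2181), ('host2', 2181)]."""
    servers = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        host, sep, port = item.rpartition(":")
        if not sep:
            # No colon at all: the whole item is the host name.
            host, port = item, str(default_port)
        servers.append((host, int(port)))
    return servers
```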
----Integrating Hue with Oozie--------------------------------------
1. Edit oozie-site.xml:
<!-- This step is optional -->
<property>
<name>oozie.processing.timezone</name>
<value>UTC</value>
</property>
2. Edit hue.ini:
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://vampire04.mydomain:11000/oozie
# Requires FQDN in oozie_url if enabled
## security_enabled=false
# Location on HDFS where the workflows/coordinator are deployed when submitted.
remote_deployement_dir=/user/user01/oozie-apps/
[oozie]
# Location on local FS where the examples are stored.
local_data_dir=/opt/modules/cdh/oozie-4.0.0-cdh5.3.6/examples
# Location on local FS where the data for the examples is stored.
sample_data_dir=/opt/modules/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps
# Location on HDFS where the oozie examples and workflows are stored.
remote_data_dir=/user/vampire04/oozie-apps/
# Maximum of Oozie workflows or coordinators to retrieve in one API call.
## oozie_jobs_count=100
# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
enable_cron_scheduling=true
Start Oozie
$ bin/oozied.sh start
Restart Hue
$ build/env/bin/supervisor
Go to Workflows > Editors and click "Oozie Editor" in the upper-left corner
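Before restarting Hue it can be worth confirming that Oozie is up at the address given in oozie_url. Oozie's web services API exposes an admin status endpoint that reports the system mode. The sketch below queries it; the helper names are hypothetical, and the URL you pass should be your own oozie_url value.

```python
import json
from urllib.request import urlopen


def oozie_status_url(oozie_url: str) -> str:
    """Derive the admin status endpoint from the oozie_url set in hue.ini."""
    return oozie_url.rstrip("/") + "/v1/admin/status"


def parse_system_mode(body: str) -> str:
    """Extract systemMode from a status response such as {"systemMode": "NORMAL"}."""
    return json.loads(body)["systemMode"]


def oozie_is_normal(oozie_url: str, timeout: float = 5.0) -> bool:
    """Return True if the Oozie server answers and reports NORMAL mode."""
    with urlopen(oozie_status_url(oozie_url), timeout=timeout) as resp:
        return parse_system_mode(resp.read().decode("utf-8")) == "NORMAL"
```

`oozie_is_normal("http://vampire04.mydomain:11000/oozie")` raises an exception if the server cannot be reached, which is itself a useful signal that Hue's Oozie pages will not work yet.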
----Integrating Hue with HBase--------------------------------------
Hue talks to HBase through hbase-thrift, so first start the hbase-thrift service on [hostname]:
service hbase-thrift start
Edit desktop/conf/hue.ini
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
hbase_clusters=(Cluster|[hostname]:9090)
# HBase configuration directory, where hbase-site.xml is located.
## hbase_conf_dir=/etc/hbase/conf
# Hard limit of rows or columns per row fetched before truncating.
## truncate_limit = 500
# 'buffered' is the default of the HBase Thrift Server and supports security.
# 'framed' can be used to chunk up responses,
# which is useful when used in conjunction with the nonblocking server in Thrift.
## thrift_transport=buffered
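In (Cluster|host1:9090), "Cluster" is not the name of your HDFS cluster. It is only a label shown in the Hue UI, so you can write anything there; I kept the word Cluster.
The host1:9090 part is the address of the Thrift server. If there are several clusters, separate the entries with commas.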
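To make the '(name|host:port)' format concrete, the sketch below splits an hbase_clusters value into its parts. It only illustrates the syntax described above and is not Hue's own parsing code; `parse_hbase_clusters` is a hypothetical name.

```python
import re
from typing import List, Tuple

# One entry looks like (Label|host:port); entries are separated by commas.
CLUSTER = re.compile(r"\(\s*([^|()]+?)\s*\|\s*([^:()]+?)\s*:\s*(\d+)\s*\)")


def parse_hbase_clusters(value: str) -> List[Tuple[str, str, int]]:
    """Return (label, thrift_host, thrift_port) for every entry in the setting."""
    return [(label, host, int(port)) for label, host, port in CLUSTER.findall(value)]
```

The first element of each tuple is the free-form label shown in the UI; only the host and port have to match a running Thrift server.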
===================================================================================================