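Introduction to the Hue Framework

Official site: http://gethue.com/
Source code: http://github.com/cloudera/hue
Hue installation guide: http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.6/

Hue is an open-source UI system for Apache Hadoop. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example by working with data on HDFS or running MapReduce jobs. Put simply, it is a visual front end for Hadoop. Its features include: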

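  • File Browser for accessing HDFS
  • Hive editor for developing and running Hive queries
  • Solr-based search applications, with visual data views and dashboards
  • Impala-based applications for interactive queries
  • Spark editor and dashboard
  • Pig editor, with support for submitting script jobs
  • Oozie editor, with a dashboard for submitting and monitoring Workflows, Coordinators and Bundles
  • HBase browser for visualizing data, querying data and modifying HBase tables
  • Metastore browser for accessing Hive metadata and HCatalog
  • Job browser for accessing MapReduce jobs (MR1/MR2-YARN)
  • Job designer for creating MapReduce/Streaming/Java jobs
  • Sqoop 2 editor and dashboard
  • ZooKeeper browser and editor
  • Query editors for MySQL, PostgreSQL, SQLite and Oracle databases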



1. Install the dependencies (requires Internet access)
$ su -
# yum -y install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel mysql-devel gmp-devel


2. Unpack and build Hue
$ su - tom
$ tar zxvf /opt/softwares/hue-3.7.0-cdh5.3.6.tar.gz
$ cd hue-3.7.0-cdh5.3.6/
$ make apps


3. Edit desktop/conf/hue.ini
[desktop]
  # This key was copied from the official documentation; any random string will do
  secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
  # Host and port
  http_host=blue01.mydomain
  http_port=8888
  # Time zone
  time_zone=Asia/Shanghai


4. Start Hue
$ build/env/bin/supervisor
$ netstat -an | grep 8888    # check that the port is listening
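
The same port check can be scripted, which is handy when Hue is started from a deployment script. The following is only a minimal sketch: the helper name is_port_open is made up for illustration, and the host and port you pass in are assumed to be the http_host and http_port values from step 3.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the configuration above you would call is_port_open("blue01.mydomain", 8888); a False result right after startup may simply mean the supervisor has not finished starting its child processes yet.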


5. Open a browser
http://[hostname]:8888
Create the superuser: hue/hue




----Integrating Hue with Hadoop---------------------------


1. a) Configure etc/hadoop/hdfs-site.xml:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<!-- Whether HDFS permission checking is enabled (turned off here) -->
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>


b) Configure etc/hadoop/core-site.xml:
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>


c) Restart the Hadoop services!
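
Once Hadoop is back up, the WebHDFS and proxy-user settings can be checked by hand through the WebHDFS REST API, which takes the operation in an op query parameter and the caller in user.name (plus doas for impersonation). The sketch below only builds such a URL; the helper name webhdfs_url and the host "nn" are placeholders, and port 50070 is assumed to be the NameNode HTTP port used in this setup.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, user, doas=None, port=50070):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    params = {"op": op, "user.name": user}
    if doas:
        # doas asks the NameNode to act on behalf of another user; it is only
        # honoured when the hadoop.proxyuser.<user>.* properties allow it.
        params["doas"] = doas
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), urlencode(params))
```

Fetching webhdfs_url("nn", "/tmp", "LISTSTATUS", "hue") with curl or a browser should return a JSON document with a FileStatuses entry if dfs.webhdfs.enabled took effect; adding doas="tom" exercises the hadoop.proxyuser.hue.* settings.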


2. Edit desktop/conf/hue.ini:


  [[hdfs_clusters]]
    # HA support by using HttpFs


    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://[hostname]:8020


      # NameNode logical name.
      ## logical_name=


      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://[hostname]:50070/webhdfs/v1


      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false


      # Default umask for file and directory creation, specified in an octal value.
      ## umask=022


      # Directory of the Hadoop configuration
      hadoop_conf_dir=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop


      hadoop_hdfs_home=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/
      hadoop_bin=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/

  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=[hostname]


      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032


      # Whether to submit jobs to this cluster
      submit_to=True


      # Resource Manager logical name (required for HA)
      ## logical_name=


      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false


      # URL of the ResourceManager API
      resourcemanager_api_url=http://[hostname]:8088


      # URL of the ProxyServer API
      proxy_api_url=http://[hostname]:8088


      # URL of the HistoryServer API
      history_server_api_url=http://[hostname]:19888
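
A malformed URL in these settings (a missing slash after the scheme, a missing port) is easy to overlook and tends to surface only as a vague error in the Hue UI. As a rough sanity check, assuming you copy the values out of hue.ini by hand, something like the following can be used; check_endpoint is an illustrative helper and not part of Hue.

```python
from urllib.parse import urlparse


def check_endpoint(value, scheme):
    """Return (host, port) if value looks like scheme://host:port[/path], else None."""
    parsed = urlparse(value)
    if parsed.scheme != scheme or not parsed.hostname:
        return None
    try:
        port = parsed.port  # raises ValueError for a non-numeric port
    except ValueError:
        return None
    if port is None:
        return None
    return parsed.hostname, port
```

Run it against fs_defaultfs with scheme "hdfs" and against webhdfs_url and the three YARN URLs with scheme "http"; a None result means the value needs another look.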


3. Start Hue
$ build/env/bin/supervisor


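PS:
Normally Ctrl+C stops Hue. If it does not shut down, look up the process:
$ ps -ef | grep supervisor
** ps -A  lists every process; -e has the same effect as -A
** ps -f  prints the full-format listing (the columns shown below); the BSD-style "ps f", without the dash, draws the process tree instead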


UID PID   PPID   C  STIME  TTY    TIME       CMD   
tom  17604 16613  0  19:48  pts/2  00:00:01   /opt/modules/cdh/hue-3.7.0-cdh5.3.6/build/env/bin/python2.6 build/env/bin/supervisor


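UID   the user who owns the process
PID   the ID of the process
PPID  the ID of its parent process
C     CPU utilization, as a percentage
STIME start time
TTY   the terminal the user is logged in on
TIME  CPU time consumed
CMD   the command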


Kill the process: $ kill -9 17604


If it still will not die, find the process that holds the port: $ netstat -antp | grep 8888
tcp        0      0 192.168.122.128:8888        0.0.0.0:*                   LISTEN      17607/python2.6     
tcp        0      0 192.168.122.128:8888        192.168.122.1:54763         ESTABLISHED 17607/python2.6


$ kill -9 17607
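
If you do this often, pulling the PID out of the netstat output can be automated. The helper below is a small sketch that assumes the Linux "netstat -antp" layout shown above, where the last column is "PID/Program name"; the function name is invented for this example.

```python
def pid_from_netstat(line):
    """Extract the PID from the last column of a 'netstat -antp' output line."""
    fields = line.split()
    if not fields:
        return None
    pid, sep, _name = fields[-1].partition("/")
    if not sep or not pid.isdigit():
        # netstat prints "-" when it cannot see the owning process,
        # for example when it is not run as root or as the owner.
        return None
    return int(pid)
```

The returned number can then be passed to kill, or to os.kill in Python. Trying a plain kill (SIGTERM) before kill -9 gives the process a chance to shut down cleanly.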


----Integrating Hue with Hive-----------------------------------


Edit hive-site.xml:


<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>[hostname]</value>
</property>
<property>
  <name>hive.server2.long.polling.timeout</name>
  <value>5000</value>
</property>


<!-- Remote metastore (Thrift) service. Leave this property out at first and add it only if things do not work without it -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://[hostname]:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>


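Start HiveServer2 and the metastore (each command runs in the foreground, so use a separate terminal for each):
$ bin/hive --service hiveserver2
$ bin/hive --service metastore

Configure hue.ini: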
[beeswax]
  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=blue01.mydomain


  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000


  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/conf
  
Start Hue
$ build/env/bin/supervisor


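----Integrating Hue with MySQL-------------------------------------


** Note: the [[[mysql]]] header below (under [librdbms] / [[databases]] in hue.ini) must be uncommented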
    [[[mysql]]]
      # Name to show in the UI.
      nice_name="My SQL DB"


      # For MySQL and PostgreSQL, name is the name of the database.
      # For Oracle, Name is instance of the Oracle server. For express edition
      # this is 'xe' by default.
      ## name=mysqldb


      # Database backend to use. This can be:
      # 1. mysql
      # 2. postgresql
      # 3. oracle
      engine=mysql


      # IP or hostname of the database to connect to.
      host=blue01.mydomain


      # Port the database server is listening to. Defaults are:
      # 1. MySQL: 3306
      # 2. PostgreSQL: 5432
      # 3. Oracle Express Edition: 1521
      port=3306


      # Username to authenticate with when connecting to the database.
      user=root


      # Password matching the username to authenticate with when
      # connecting to the database.
      password=root
 
Start Hue
$ build/env/bin/supervisor




----Integrating Hue with ZooKeeper----------------------------------
[zookeeper]  
  
  [[clusters]]  
  
    [[[default]]]  
      # Zookeeper ensemble. Comma separated list of Host/Port.  
      # e.g. localhost:2181,localhost:2182,localhost:2183  
      host_ports=host1:2181,host2:2181 


----Integrating Hue with Oozie--------------------------------------


1. Edit oozie-site.xml:
<!-- This step is optional -->
<property>
<name>oozie.processing.timezone</name>
<value>UTC</value>
</property>

2. Edit hue.ini:
[liboozie]
  # The URL where the Oozie service runs on. This is required in order for
  # users to submit jobs. Empty value disables the config check.
  oozie_url=http://vampire04.mydomain:11000/oozie


  # Requires FQDN in oozie_url if enabled
  ## security_enabled=false


  # Location on HDFS where the workflows/coordinator are deployed when submitted.
  remote_deployement_dir=/user/user01/oozie-apps/


[oozie]
  # Location on local FS where the examples are stored.
  local_data_dir=/opt/modules/cdh/oozie-4.0.0-cdh5.3.6/examples


  # Location on local FS where the data for the examples is stored.
  sample_data_dir=/opt/modules/cdh/oozie-4.0.0-cdh5.3.6/oozie-apps


  # Location on HDFS where the oozie examples and workflows are stored.
  remote_data_dir=/user/vampire04/oozie-apps/


  # Maximum number of Oozie workflows or coordinators to retrieve in one API call.
  ## oozie_jobs_count=100


  # Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
  enable_cron_scheduling=true


Start Oozie
$ bin/oozied.sh start
Restart Hue
$ build/env/bin/supervisor
Go to Workflow -> Editors and click "Oozie Editor" in the top-left corner
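
If the editor reports that Oozie is unreachable, it helps to confirm that the server behind oozie_url answers at all. Oozie exposes an admin status resource under v1/admin/status in its web-services API; the sketch below builds that URL from the oozie_url value and reads the systemMode field. Both function names are made up for illustration, and the second one needs a running Oozie server to return anything.

```python
import json
from urllib.request import urlopen


def oozie_status_url(oozie_url):
    """Append the admin/status path of the Oozie web-services API to oozie_url."""
    return oozie_url.rstrip("/") + "/v1/admin/status"


def oozie_system_mode(oozie_url, timeout=5):
    """Fetch the status document and return its systemMode field."""
    with urlopen(oozie_status_url(oozie_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8")).get("systemMode")
```

With the configuration above, oozie_system_mode("http://vampire04.mydomain:11000/oozie") is expected to report NORMAL on a healthy server; a connection error points at the Oozie side rather than at hue.ini.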




----Integrating Hue with HBase--------------------------------------
Hue talks to HBase through hbase-thrift, so first start the hbase-thrift service on [hostname]:
service hbase-thrift start 


Edit desktop/conf/hue.ini:


[hbase]  
  # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.  
  # Use full hostname with security.  
  hbase_clusters=(Cluster|[hostname]:9090)  
  
  # HBase configuration directory, where hbase-site.xml is located.  
  ## hbase_conf_dir=/etc/hbase/conf  
  
  # Hard limit of rows or columns per row fetched before truncating.  
  ## truncate_limit = 500  
  
  # 'buffered' is the default of the HBase Thrift Server and supports security.  
  # 'framed' can be used to chunk up responses,  
  # which is useful when used in conjunction with the nonblocking server in Thrift.  
  ## thrift_transport=buffered  


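In (Cluster|host1:9090), "Cluster" is not the name of your HDFS cluster. It is only a label displayed in the Hue UI, so you can write anything there; I kept the word Cluster.
The host1:9090 part that follows is the address of the Thrift server. If there are several, separate the entries with commas.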
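
To make the '(name|host:port)' format concrete, here is a small sketch that splits an hbase_clusters value into its parts. It is an illustrative parser written for this note, not the code Hue itself uses, and the second cluster in the usage example is invented.

```python
import re

# One entry looks like "(Label|host:port)"; entries are separated by commas.
_ENTRY = re.compile(r"\(([^|()]+)\|([^:()]+):(\d+)\)")


def parse_hbase_clusters(value):
    """Turn '(Name|host:port),(Name2|host2:port2)' into [(name, host, port), ...]."""
    return [(name.strip(), host.strip(), int(port))
            for name, host, port in _ENTRY.findall(value)]
```

For example, parse_hbase_clusters("(Cluster|host1:9090),(Backup|host2:9090)") yields two entries, each with its display label, Thrift host and Thrift port.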
===================================================================================================




















