1. Introduction to Hue
HUE stands for Hadoop User Experience. Put simply, it is an open-source Apache Hadoop UI system. It evolved from Cloudera Desktop, which Cloudera later contributed to the Hadoop community under the Apache Foundation, and it is built on the Python web framework Django. With HUE, we can interact with a Hadoop cluster from a browser-based web console to analyze and process data.
2. Integrating Hue with Other Frameworks
2.1 Hue and HDFS
2.1.1 Cluster Environment
| master | slave1 | slave2 |
| --- | --- | --- |
| zk | zk | zk |
| hbase | hbase | hbase |
| hadoop | hadoop | hadoop |
2.1.2 Configure HDFS
vim hdfs-site.xml
<!-- Allow web (WebHDFS) access to file information on the NameNode and DataNodes -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
vim core-site.xml
<!-- Hosts from which the proxy user hue may connect -->
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user hue may impersonate -->
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<!-- Hosts from which the proxy user hadoop may connect -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user hadoop may impersonate -->
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<!-- If your Hadoop cluster is configured for high availability, HDFS must be accessed through HttpFS and the following properties are required; otherwise they are optional. (They are also required whenever the HUE service and the Hadoop services run on different nodes.) -->
<!-- Hosts from which the proxy user httpfs may connect -->
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user httpfs may impersonate -->
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
The difference: WebHDFS is a built-in HDFS component that already runs inside the NameNode and DataNodes. Reads and writes of HDFS files are redirected to the DataNode that holds the file, so the full bandwidth of HDFS is used. HttpFS is a service independent of HDFS; reads and writes of HDFS files are relayed through it, which allows it to limit bandwidth usage.
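The difference shows up directly in the request URLs. Below is a small sketch (hypothetical helper, host names taken from this cluster) of the REST URLs a client such as HUE would issue for the same file operation: one addressed to the NameNode's WebHDFS endpoint (port 50070 in Hadoop 2.x), one relayed through the standalone HttpFS service (default port 14000). Both speak the same WebHDFS REST protocol.

```python
# Sketch: the same WebHDFS REST operation, once against the NameNode
# (WebHDFS) and once against the standalone HttpFS relay service.
# `rest_url` is a hypothetical helper; host names follow this cluster.

def rest_url(base, path, op, user="hue"):
    """Build a WebHDFS-style REST URL for operation `op` on HDFS path `path`."""
    return "%s%s?op=%s&user.name=%s" % (base, path, op, user)

# Served by the NameNode; the actual read is redirected to a DataNode.
webhdfs = rest_url("http://master:50070/webhdfs/v1", "/tmp/a.txt", "OPEN")
# Served by HttpFS, which relays the file data itself.
httpfs = rest_url("http://master:14000/webhdfs/v1", "/tmp/a.txt", "OPEN")

print(webhdfs)  # http://master:50070/webhdfs/v1/tmp/a.txt?op=OPEN&user.name=hue
print(httpfs)   # http://master:14000/webhdfs/v1/tmp/a.txt?op=OPEN&user.name=hue
```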
vim httpfs-site.xml
<configuration>
<!-- Hosts from which the proxy user hue may connect -->
<property>
<name>httpfs.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<!-- Groups the proxy user hue may impersonate -->
<property>
<name>httpfs.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
The two properties above are required whenever the HUE service and the Hadoop services do not run on the same node.
2.1.3 Sync the Hadoop configuration files across the cluster with scp
[root@master hadoop]# scp -r /usr/local/hadoop-2.6.1/etc/hadoop root@slave1:/usr/local/hadoop-2.6.1/etc/
[root@master hadoop]# scp -r /usr/local/hadoop-2.6.1/etc/hadoop root@slave2:/usr/local/hadoop-2.6.1/etc/
2.1.4 Start the httpfs service
[root@master hadoop]# /usr/local/hadoop-2.6.1/sbin/httpfs.sh start
2.1.5 Configure the hue.ini file
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Address of the HDFS NameNode
fs_defaultfs=hdfs://master:8020
# If high availability is enabled, configure the following as well
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
# URL that HUE uses to send requests to HDFS
webhdfs_url=http://master:50070/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
## umask=022
# Directory of the Hadoop configuration
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
# Hadoop installation paths
hadoop_conf_dir=/usr/local/hadoop-2.6.1/etc/hadoop
hadoop_hdfs_home=/usr/local/hadoop-2.6.1
hadoop_bin=/usr/local/hadoop-2.6.1/bin
2.2 Hue and YARN
2.2.1 Configure the hue.ini file
[[yarn_clusters]]
[[[default]]]
# Host where the ResourceManager runs
resourcemanager_host=master
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster and monitor their execution
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# Entry point of the ResourceManager web API
resourcemanager_api_url=http://master:8088
# URL of the ProxyServer API
proxy_api_url=http://master:8088
# Entry point of the JobHistory server, for viewing past job runs
history_server_api_url=http://slave2:19888
2.2.2 Start Hadoop's jobhistoryserver so job history can be viewed in the web UI
[root@slave2 hadoop-2.6.1]# ./sbin/mr-jobhistory-daemon.sh start historyserver
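HUE talks to the two endpoints configured above through their standard REST APIs (the `/ws/v1/...` paths are Hadoop's documented web-service paths). A minimal sketch of the URLs involved, with host names and ports taken from this cluster's hue.ini:

```python
# Sketch: the REST endpoints behind the hue.ini settings above.
# resourcemanager_api_url -> ResourceManager web services
# history_server_api_url  -> JobHistory server web services

RM_API = "http://master:8088"
HISTORY_API = "http://slave2:19888"

def rm_apps_url():
    """List applications known to the ResourceManager."""
    return RM_API + "/ws/v1/cluster/apps"

def history_jobs_url():
    """List finished MapReduce jobs tracked by the JobHistory server."""
    return HISTORY_API + "/ws/v1/history/mapreduce/jobs"

print(rm_apps_url())
print(history_jobs_url())
```

Requesting either URL from a browser (or curl) is a quick way to confirm the services are reachable before pointing HUE at them.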
2.3 Hue and Hive
2.3.1 Modify the Hive configuration file hive-site.xml
<!-- Integrating Hue with Hive requires the HiveServer2 service to be enabled -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>master</value>
</property>
<!-- TCP port that the HiveServer2 Thrift service listens on -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.long.polling.timeout</name>
<value>5000</value>
</property>
<!-- Host running the metastore service -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
</property>
2.3.2 Start the Hive services
[root@master apache-hive-1.2.2-bin]# ./bin/hive --service metastore &
[root@master apache-hive-1.2.2-bin]# ./bin/hive --service hiveserver2 &
2.3.3 Configure hue.ini
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=master
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/usr/local/apache-hive-1.2.2-bin/conf
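HUE connects to HiveServer2 through the host and port configured above; any other HiveServer2 client can use the same pair, which is handy for checking the service independently of HUE. The standard JDBC URL form is `jdbc:hive2://host:port/db`. A small sketch (the helper name is hypothetical) building it from these values:

```python
# Sketch: build the standard HiveServer2 JDBC URL from the
# hive_server_host / hive_server_port values configured above.
def hive2_jdbc_url(host, port, db="default"):
    return "jdbc:hive2://%s:%d/%s" % (host, port, db)

print(hive2_jdbc_url("master", 10000))  # jdbc:hive2://master:10000/default
```

For example, in beeline, `!connect jdbc:hive2://master:10000/default` should reach the service if HiveServer2 started correctly.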
2.4 Hue and MySQL
2.4.1 Configure the hue.ini file
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="db_mysql"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql
# IP or hostname of the database to connect to.
host=master
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=xxxxxx
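The fields above combine into a single database URL of the familiar `engine://user:password@host:port/name` shape, which many tools accept directly. A small sketch (hypothetical helper; the password is a placeholder exactly as in the config, and `mysqldb` is the example database name from the commented-out `name` setting):

```python
# Sketch: combine the [[[mysql]]] settings above into a database URL.
# Values mirror the hue.ini fragment; the password is a placeholder.
def db_url(engine, user, password, host, port, name):
    return "%s://%s:%s@%s:%d/%s" % (engine, user, password, host, port, name)

print(db_url("mysql", "root", "xxxxxx", "master", 3306, "mysqldb"))
```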
2.5 Hue and ZooKeeper
2.5.1 Configure the hue.ini file
[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=master:2181,slave1:2181,slave2:2181
2.5.2 Start the ZooKeeper cluster
[root@master zookeeper-3.4.11]# ./bin/zkServer.sh start
[root@slave1 zookeeper-3.4.11]# ./bin/zkServer.sh start
[root@slave2 zookeeper-3.4.11]# ./bin/zkServer.sh start
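The `host_ports` value is a single comma-separated ensemble string; a ZooKeeper client splits it into individual host:port members and tries them in turn. A small sketch (hypothetical helper) of that parsing, using the hosts from this cluster:

```python
# Sketch: parse the hue.ini `host_ports` ensemble string into
# (host, port) pairs, the form a ZooKeeper client connects with.
def parse_ensemble(host_ports):
    members = []
    for member in host_ports.split(","):
        host, port = member.strip().rsplit(":", 1)
        members.append((host, int(port)))
    return members

print(parse_ensemble("master:2181,slave1:2181,slave2:2181"))
# [('master', 2181), ('slave1', 2181), ('slave2', 2181)]
```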
2.6 Hue and HBase
2.6.1 Modify the hue.ini file
[hbase]
# Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
# Use full hostname with security.
hbase_clusters=(Cluster|master:9090)
# HBase configuration directory, where hbase-site.xml is located.
hbase_conf_dir=/usr/local/hbase-1.3.1/conf
2.6.2 Start the HBase service and its Thrift service
[root@master hbase-1.3.1]# ./bin/start-hbase.sh
[root@master hbase-1.3.1]# ./bin/hbase-daemon.sh start thrift
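Each entry in `hbase_clusters` packs a display name and a Thrift server address into the `(name|host:port)` form shown above. A small sketch (hypothetical helper) of pulling those pieces back apart:

```python
import re

# Sketch: parse a hue.ini `hbase_clusters` entry of the form
# '(name|host:port)' into its display name, host, and Thrift port.
def parse_hbase_cluster(entry):
    m = re.match(r"\((?P<name>[^|]+)\|(?P<host>[^:]+):(?P<port>\d+)\)", entry)
    if m is None:
        raise ValueError("not in (name|host:port) form: %r" % entry)
    return m.group("name"), m.group("host"), int(m.group("port"))

print(parse_hbase_cluster("(Cluster|master:9090)"))  # ('Cluster', 'master', 9090)
```

The port 9090 here is the Thrift server started by `hbase-daemon.sh start thrift` above; the name part ("Cluster") is only a label shown in the HUE UI.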