How to install and configure Hue, how to use workflows once it is set up, and how to fix some common errors
Before configuring, make sure there is a database that Hue can connect to.
Preparation before installing: depending on your own needs, make sure the components you plan to use can be started before installation.
It really depends on what you need: if you only want the Hue web UI to practise some Hive SQL, starting Hive and HDFS is enough. Whenever you integrate another component later, test Hue after each one to see whether it still works. A minimal startup sketch for the Hive-only case is shown below.
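A minimal sketch of that Hive-only startup, assuming the Hadoop and Hive scripts are already on the PATH (running the services with nohup and the log paths are just my habit, adjust to your environment):
start-dfs.sh
nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &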
Installing and configuring Hue
I am using version 3.9.0.
If you want the official introduction to Hue, you can look it up on the Hue website.
The following command finds the line number where a given string appears in a file:
[root@wq1 conf]# grep -n beeswax hue.ini
1022:[beeswax]
Install the required dependencies
I give two commands because I have run both and neither caused problems, so I include both:
yum install -y gcc libxml2-devel libxslt-devel cyrus-sasl-devel mysql-devel python-devel python-setuptools python-simplejson sqlite-devel ant gmp-devel cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi libffi-devel openldap-devel
yum install -y asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make openldap-devel python-devel sqlite-devel gmp-devel
Installation directory: /opt/hue-3.9.0
[root@wq1 hue-3.9.0]# vim desktop/conf/hue.ini
Step 1: vim hue.ini
#General [desktop] settings, visible around line 17
[desktop]
##Any value works here; just copy this one
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
##Hostname
http_host=wq1
is_hue_4=true
time_zone=Asia/Shanghai
server_user=root
server_group=root
default_user=root
default_hdfs_superuser=root
#Configure MySQL as Hue's backing database, around line 617 of hue.ini
[[database]]
engine=mysql
host=wq2
port=3306
user=specialwu
password=specialwu
name=hue
Step 2: Create a database in MySQL to hold Hue's metadata
create database hue default character set utf8 default collate utf8_general_ci;
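If the user configured in hue.ini (specialwu in my case) does not yet have access to this database, you may also need to grant privileges; a sketch in MySQL 5.x syntax, with the user, host and password taken from my own setup:
GRANT ALL PRIVILEGES ON hue.* TO 'specialwu'@'%' IDENTIFIED BY 'specialwu';
FLUSH PRIVILEGES;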
Step 3: Build (run in your own installation directory)
[root@wq1 hue-3.9.0]# make apps
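If Hue later complains that its tables do not exist in MySQL, you can initialize the schema with Hue's management commands before starting it (standard Hue 3.x commands; whether you need them depends on your environment):
build/env/bin/hue syncdb --noinput
build/env/bin/hue migrate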
Step 4: Start (add the hue user first, then start)
[root@wq1 hue-3.9.0]# useradd hue;
[root@wq1 hue-3.9.0]# build/env/bin/supervisor
After finishing the steps above you can open the Hue web page. The login shown here is not my first one; on the very first login, whatever username and password you enter become the account, and you log in with them from then on. Once you can see this page, you can move on to the rest of the setup.
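By default Hue listens on port 8888 (the http_port setting in the [desktop] section, which I left untouched), so with the configuration above the page should be reachable at:
http://wq1:8888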
Integrating HDFS and YARN
vim hue.ini
#Around line 910
[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://wq1:9000
webhdfs_url=http://wq1:50070/webhdfs/v1
hadoop_conf_dir=/opt/hadoop-2.7.7/etc/hadoop
#Around line 937
[[yarn_clusters]]
[[[default]]]
resourcemanager_host=wq1
resourcemanager_port=8032
submit_to=True
resourcemanager_api_url=http://wq1:8088
history_server_api_url=http://wq1:19888
Configuration files on the Hadoop side
Mine are in /opt/hadoop-2.7.7/etc/hadoop
hdfs-site.xml
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
core-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
httpfs-site.xml (after editing it, start the HttpFS service with: httpfs.sh start)
<property>
<name>httpfs.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>httpfs.proxyuser.root.groups</name>
<value>*</value>
</property>
If the settings above are not configured, Hue reports: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
vim yarn-site.xml
<property>
<name>yarn.resourcemanager.address</name>
<value>wq1:8032</value>
</property>
##Whether to enable log aggregation
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
##How long to keep aggregated logs, in seconds
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
vim mapred-site.xml (start the JobHistory server with: mr-jobhistory-daemon.sh start historyserver)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>wq1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>wq1:19888</value>
</property>
start-yarn.sh
start-dfs.sh
Integrating Hive
vim hue.ini
[beeswax]
hive_server_host=wq1
hive_server_port=10000
hive_conf_dir=/opt/hive-1.2.2/conf
server_conn_timeout=120
[metastore]
#Allow creating databases, tables and similar operations through Hive
enable_new_create_table=true
If the Hive services are not running, Hue fails to connect to Hive on port 10000.
Start Hive, then restart Hue (Hue is started from /opt/hue-3.9.0/build/env/bin with supervisor). The commands, in order:
hive
hiveserver2
/opt/hue-3.9.0/build/env/bin/supervisor
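Before going back to Hue you can check that HiveServer2 is really listening on port 10000 with beeline (the user root matches the proxyuser settings above, adjust to yours):
beeline -u jdbc:hive2://wq1:10000 -n root
#a successful connection drops you into a jdbc:hive2 prompt; a refused connection means HiveServer2 is not up yet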
Integrating MySQL
vim hue.ini
Note that this step integrates MySQL as a query source; the MySQL connection at the very beginning of the article is only for storing Hue's metadata.
#Around line 1578
[[[mysql]]]
nice_name="My SQL DB"
engine=mysql
host=wq2
port=3306
user=specialwu
password=specialwu
Integrating Oozie
vim hue.ini
#Around line 1185
[oozie]
# Location on local FS where the examples are stored.
# local_data_dir=/export/servers/oozie-4.1.0-cdh5.14.0/examples/apps
# Location on local FS where the data for the examples is stored.
# sample_data_dir=/export/servers/oozie-4.1.0-cdh5.14.0/examples/input-data
# Location on HDFS where the oozie examples and workflows are stored.
# Parameters are $TIME and $USER, e.g. /user/$USER/hue/workspaces/workflow-$TIME
# remote_data_dir=/user/root/oozie_works/examples/apps
# Maximum of Oozie workflows or coodinators to retrieve in one API call.
oozie_jobs_count=100
# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
enable_cron_scheduling=true
# Flag to enable the saved Editor queries to be dragged and dropped into a workflow.
enable_document_action=true
# Flag to enable Oozie backend filtering instead of doing it at the page level in Javascript. Requires Oozie 4.3+.
enable_oozie_backend_filtering=true
# Flag to enable the Impala action.
enable_impala_action=true
#Around line 1216
[filebrowser]
# Location on local filesystem where the uploaded archives are temporary stored.
archive_upload_tempdir=/tmp
# Show Download Button for HDFS file browser.
show_download_button=true
# Show Upload Button for HDFS file browser.
show_upload_button=true
# Flag to enable the extraction of a uploaded archive in HDFS.
enable_extract_uploaded_archive=true
#Around line 1442
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://wq1:11000/oozie
# Location on HDFS where the workflows/coordinator are deployed when submitted.
remote_deployement_dir=/user/root/oozie_works
vim oozie-site.xml
<property>
<name>oozie.service.ProxyUserService.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.root.groups</name>
<value>*</value>
</property>
Start Oozie (from Oozie's bin directory): oozie-start.sh
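You can verify that Oozie is reachable at the URL configured under [liboozie] with the Oozie client (assuming the oozie command is on your PATH):
oozie admin -oozie http://wq1:11000/oozie -status
#should print: System mode: NORMAL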
After all the services have been added, check the processes with jps:
wq1
89665 Jps
84562 SecondaryNameNode
85924 ResourceManager
86102 NodeManager
5240 EmbeddedOozieServer
84201 NameNode
117755 JobHistoryServer
5097 Bootstrap
84349 DataNode
Resolving problems encountered during use
After submitting a Hive SQL statement, the following error is reported:
The auxService:mapreduce_shuffle does not exist
vim yarn-site.xml, then restart YARN:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
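On some Hadoop 2.x setups the shuffle handler class also has to be declared explicitly; if the error persists you may additionally need the following standard property (whether your distribution requires it varies):
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>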
Using workflows
Drag whatever action you want into the workflow. After placing the first one, the next can be dropped above, below, or to either side of it, which defines the execution order.
Mine is a MapReduce workflow, so it needs some extra properties.
Save it, then submit.
On submission you are first asked to fill in the properties you added; in my case these are the MapReduce input path and output path.
After submitting, check how your workflow is running.
Using a MapReduce workflow
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
/**
 * @Author:wuqi
 * @DATE: 2021/4/4
 * @TIME: 19:14
 * @PROJECT_NAME: MapReduceOnHue
 * This class extracts the IPs that appear fewer than 50 times in the access log.
 */
public class MRfilter {
public static void main(String[] args) throws Exception {
//1. Get a Job instance
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
//2. Set the job's main class; the jar is located by looking up where this class is loaded from
job.setJarByClass(MRfilter.class);
//3. Set the Mapper class
job.setMapperClass(WCMapper.class);
//4. Set the Reducer class
job.setReducerClass(WCReducer.class);
//5. Set the Mapper output types; the map output key/value classes may differ from the final output classes
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
//6. Set the Reducer output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
//7. Set the job input path
FileInputFormat.setInputPaths(job, new Path(args[0]));
//8. Set the job output path; this is required, otherwise you get "Output directory not set"
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//Submit the job to the cluster and wait for it to finish; waitForCompletion returns true on success
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
public static class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
private Text k = new Text();
private LongWritable v = new LongWritable(1);
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// 1. Get the line as a string
String line = value.toString();
// 2. Split it into fields on whitespace (\s+)
String[] words = line.split("\\s+");
// 3. Write (IP, 1) once per line through the context; the IP is the first field of the line
if (words.length > 0 && !"".equals(words[0])) {
k.set(words[0]);
context.write(k, v);
}
}
}
public static class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
long count = 0L;
for (LongWritable value : values) {
count++;
}
System.out.println(count);
if (count<50)
context.write(key, new LongWritable(count));
}
}
}
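To let the workflow find these classes, package them into a jar and put the jar in the lib directory of the workflow's HDFS workspace. A sketch with placeholder names (the jar name and workspace path below are hypothetical; use the workspace path Hue shows for your workflow):
hdfs dfs -mkdir -p /user/hue/oozie/workspaces/my-mr-workflow/lib
hdfs dfs -put MapReduceOnHue.jar /user/hue/oozie/workspaces/my-mr-workflow/lib/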
When the job is run this way, everything is driven by the properties you specify in the workflow, so the main-method part above is not needed. The properties to set are as follows:
Property entered on the left | Description | Value entered on the right
---|---|---
mapreduce.input.fileinputformat.inputdir | Input path the MapReduce job reads from | ${inputDir}
mapreduce.output.fileoutputformat.outputdir | Output path the results are written to | ${outputDir}
mapreduce.job.map.class | Fully qualified Mapper class name | WCMapper
mapreduce.job.reduce.class | Fully qualified Reducer class name | WCReducer
mapreduce.job.output.key.class | Output key type | org.apache.hadoop.io.Text
mapreduce.job.output.value.class | Output value type | org.apache.hadoop.io.LongWritable
mapred.mapper.new-api | Use the new Mapper API | true
mapred.reducer.new-api | Use the new Reducer API | true
mapreduce.job.reduces | Number of reducers | 1
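For reference, these properties end up in the <configuration> block of the map-reduce action in the workflow.xml that Hue generates; roughly like the sketch below (the workflow and action names are made up, and the job-tracker/name-node values are assumed to match this cluster):
<workflow-app name="mr-filter-wf" xmlns="uri:oozie:workflow:0.5">
<start to="mr-filter"/>
<action name="mr-filter">
<map-reduce>
<job-tracker>wq1:8032</job-tracker>
<name-node>hdfs://wq1:9000</name-node>
<configuration>
<property><name>mapred.mapper.new-api</name><value>true</value></property>
<property><name>mapred.reducer.new-api</name><value>true</value></property>
<property><name>mapreduce.job.map.class</name><value>WCMapper</value></property>
<property><name>mapreduce.job.reduce.class</name><value>WCReducer</value></property>
<property><name>mapreduce.job.output.key.class</name><value>org.apache.hadoop.io.Text</value></property>
<property><name>mapreduce.job.output.value.class</name><value>org.apache.hadoop.io.LongWritable</value></property>
<property><name>mapreduce.input.fileinputformat.inputdir</name><value>${inputDir}</value></property>
<property><name>mapreduce.output.fileoutputformat.outputdir</name><value>${outputDir}</value></property>
<property><name>mapreduce.job.reduces</name><value>1</value></property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail"><message>MR filter failed: ${wf:errorMessage(wf:lastErrorNode())}</message></kill>
<end name="end"/>
</workflow-app>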