How to install and configure Hue, how to use workflows once it is set up, and how to fix some common errors
Before configuring, make sure there is a database that Hue can connect to.
Preparation before installing: depending on your own needs, make sure the components you plan to use can be started before installation.
It really depends on what you need: if you only want the Hue web UI to practise some Hive SQL, starting Hive and HDFS is enough. Whenever you integrate another component later, test Hue after each one to see whether it still works. A minimal startup sketch for the Hive-only case is shown below.
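A minimal sketch of that Hive-only startup, assuming the Hadoop and Hive scripts are already on the PATH (running the services with nohup and the log paths are just my habit, adjust to your environment):
start-dfs.sh
nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &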
Installing and configuring Hue
I am using version 3.9.0.
If you want the official introduction to Hue, you can look it up on the Hue website.
The following command finds the line number where a given string appears in a file:
[root@wq1 conf]# grep -n beeswax hue.ini
1022:[beeswax]
Install the required dependencies
I give two commands because I have run both and neither caused problems, so I include both:
yum install -y gcc libxml2-devel libxslt-devel cyrus-sasl-devel mysql-devel python-devel python-setuptools python-simplejson sqlite-devel ant gmp-devel cyrus-sasl-plain cyrus-sasl-devel cyrus-sasl-gssapi libffi-devel openldap-devel
yum install -y asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make openldap-devel python-devel sqlite-devel gmp-devel
Installation directory: /opt/hue-3.9.0
[root@wq1 hue-3.9.0]# vim desktop/conf/hue.ini
Step 1: vim hue.ini
#General [desktop] settings, visible around line 17
[desktop]
##Any value works here; just copy this one
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
##Hostname
http_host=wq1
is_hue_4=true
time_zone=Asia/Shanghai
server_user=root
server_group=root
default_user=root
default_hdfs_superuser=root
#Configure MySQL as Hue's backing database, around line 617 of hue.ini
[[database]]
engine=mysql
host=wq2
port=3306
user=specialwu
password=specialwu
name=hue
Step 2: Create a database in MySQL to hold Hue's metadata
create database hue default character set utf8 default collate utf8_general_ci;
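If the user configured in hue.ini (specialwu in my case) does not yet have access to this database, you may also need to grant privileges; a sketch in MySQL 5.x syntax, with the user, host and password taken from my own setup:
GRANT ALL PRIVILEGES ON hue.* TO 'specialwu'@'%' IDENTIFIED BY 'specialwu';
FLUSH PRIVILEGES;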
Step 3: Build (run in your own installation directory)
[root@wq1 hue-3.9.0]# make apps
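If Hue later complains that its tables do not exist in MySQL, you can initialize the schema with Hue's management commands before starting it (standard Hue 3.x commands; whether you need them depends on your environment):
build/env/bin/hue syncdb --noinput
build/env/bin/hue migrate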
Step 4: Start (add the hue user first, then start)
[root@wq1 hue-3.9.0]# useradd hue;
[root@wq1 hue-3.9.0]# build/env/bin/supervisor
After finishing the steps above you can open the Hue web page. The login shown here is not my first one; on the very first login, whatever username and password you enter become the account, and you log in with them from then on. Once you can see this page, you can move on to the rest of the setup.
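By default Hue listens on port 8888 (the http_port setting in the [desktop] section, which I left untouched), so with the configuration above the page should be reachable at:
http://wq1:8888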
Integrating HDFS and YARN
vim hue.ini
#Around line 910
[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://wq1:9000
webhdfs_url=http://wq1:50070/webhdfs/v1
hadoop_conf_dir=/opt/hadoop-2.7.7/etc/hadoop
#Around line 937
[[yarn_clusters]]
[[[default]]]
resourcemanager_host=wq1
resourcemanager_port=8032
submit_to=True
resourcemanager_api_url=http://wq1:8088
history_server_api_url=http://wq1:19888
Configuration files on the Hadoop side
Mine are in /opt/hadoop-2.7.7/etc/hadoop
hdfs-site.xml
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
core-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
httpfs-site.xml (after editing it, start the HttpFS service with: httpfs.sh start)
<property>
<name>httpfs.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>httpfs.proxyuser.root.groups</name>
<value>*</value>
</property>
If the settings above are not configured, Hue reports: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup".
vim yarn-site.xml
<property>
<name>yarn.resourcemanager.address</name>
<value>wq1:8032</value>
</property>
##Whether to enable log aggregation
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
##How long to keep aggregated logs, in seconds
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>
vim mapred-site.xml (start the JobHistory server with: mr-jobhistory-daemon.sh start historyserver)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>wq1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>wq1:19888</value>
</property>
start-yarn.sh
start-dfs.sh
Integrating Hive
vim hue.ini
[beeswax]
hive_server_host=wq1
hive_server_port=10000
hive_conf_dir=/opt/hive-1.2.2/conf
server_conn_timeout=120
[metastore]
#Allow creating databases, tables and similar operations through Hive
enable_new_create_table=true
If the Hive services are not running, Hue fails to connect to Hive on port 10000.
Start Hive, then restart Hue (Hue is started from /opt/hue-3.9.0/build/env/bin with supervisor). The commands, in order:
hive
hiveserver2
/opt/hue-3.9.0/build/env/bin/supervisor
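Before going back to Hue you can check that HiveServer2 is really listening on port 10000 with beeline (the user root matches the proxyuser settings above, adjust to yours):
beeline -u jdbc:hive2://wq1:10000 -n root
#a successful connection drops you into a jdbc:hive2 prompt; a refused connection means HiveServer2 is not up yet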
Integrating MySQL
vim hue.ini
Note that this step integrates MySQL as a query source; the MySQL connection at the very beginning of the article is only for storing Hue's metadata.
#Around line 1578
[[[mysql]]]
nice_name="My SQL DB"
engine=mysql
host=wq2
port=3306
user=specialwu
password=specialwu
Integrating Oozie
vim hue.ini
#Around line 1185
[oozie]
# Location on local FS where the examples are stored.
# local_data_dir=/export/servers/oozie-4.1.0-cdh5.14.0/examples/apps
# Location on local FS where the data for the examples is stored.
# sample_data_dir=/export/servers/oozie-4.1.0-cdh5.14.0/examples/input-data
# Location on HDFS where the oozie examples and workflows are stored.
# Parameters are $TIME and $USER, e.g. /user/$USER/hue/workspaces/workflow-$TIME
# remote_data_dir=/user/root/oozie_works/examples/apps
# Maximum of Oozie workflows or coodinators to retrieve in one API call.
oozie_jobs_count=100
# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
enable_cron_scheduling=true
# Flag to enable the saved Editor queries to be dragged and dropped into a workflow.
enable_document_action=true
# Flag to enable Oozie backend filtering instead of doing it at the page level in Javascript. Requires Oozie 4.3+.
enable_oozie_backend_filtering=true
# Flag to enable the Impala action.
enable_impala_action=true
#Around line 1216
[filebrowser]
# Location on local filesystem where the uploaded archives are temporary stored.
archive_upload_tempdir=/tmp
# Show Download Button for HDFS file browser.
show_download_button=true
# Show Upload Button for HDFS file browser.
show_upload_button=true
# Flag to enable the extraction of a uploaded archive in HDFS.
enable_extract_uploaded_archive=true
#Around line 1442
[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://wq1:11000/oozie
# Location on HDFS where the workflows/coordinator are deployed when submitted.
remote_deployement_dir=/user/root/oozie_works
vim oozie-site.xml
<property>
<name>oozie.service.ProxyUserService.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.root.groups</name>
<value>*</value>
</property>
Start Oozie (from Oozie's bin directory): oozie-start.sh
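You can verify that Oozie is reachable at the URL configured under [liboozie] with the Oozie client (assuming the oozie command is on your PATH):
oozie admin -oozie http://wq1:11000/oozie -status
#should print: System mode: NORMAL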
After all the services have been added, check the processes with jps:
wq1
89665 Jps
84562 SecondaryNameNode
85924 ResourceManager
86102 NodeManager
5240 EmbeddedOozieServer
84201 NameNode
117755 JobHistoryServer
5097 Bootstrap
84349 DataNode
Resolving problems encountered during use
After submitting a Hive SQL statement, the following error is reported:
The auxService:mapreduce_shuffle does not exist
vim yarn-site.xml, then restart YARN:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
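On some Hadoop 2.x setups the shuffle handler class also has to be declared explicitly; if the error persists you may additionally need the following standard property (whether your distribution requires it varies):
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>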
Using workflows
Drag whatever action you want into the workflow. After placing the first one, the next can be dropped above, below, or to either side of it, which defines the execution order.
Mine is a MapReduce workflow, so it needs some extra properties.
Save it, then submit.
On submission you are first asked to fill in the properties you added; in my case these are the MapReduce input path and output path.
After submitting, check how your workflow is running.
Using a MapReduce workflow
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
/**
 * @Author:wuqi
 * @DATE: 2021/4/4
 * @TIME: 19:14
 * @PROJECT_NAME: MapReduceOnHue
 * This class extracts the IPs that appear fewer than 50 times in the access log.
 */
public class MRfilter {
public static void main(String[] args) throws Exception {
//1. Get a Job instance
Configuration conf = new Configuration();
Job job = Job.getInstance(conf);
//2. Set the job's main class; the jar is located by looking up where this class is loaded from
job.setJarByClass(MRfilter.class);
//3. Set the Mapper class
job.setMapperClass(WCMapper.class);
//4. Set the Reducer class
job.setReducerClass(WCReducer.class);
//5. Set the Mapper output types; the map output key/value classes may differ from the final output classes
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
//6. Set the Reducer output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
//7. Set the job input path
FileInputFormat.setInputPaths(job, new Path(args[0]));
//8. Set the job output path; this is required, otherwise you get "Output directory not set"
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//Submit the job to the cluster and wait for it to finish; waitForCompletion returns true on success
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
public static class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
private Text k = new Text();
private LongWritable v = new LongWritable(1);
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
// 1. Get the line as a string
String line = value.toString();
// 2. Split it into fields on whitespace (\s+)
String[] words = line.split("\\s+");
// 3. Write (IP, 1) once per line through the context; the IP is the first field of the line
if (words.length > 0 && !"".equals(words[0])) {
k.set(words[0]);
context.write(k, v);
}
}
}
public static class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
long count = 0L;
for (LongWritable value : values) {
count++;
}
System.out.println(count);
if (count<50)
context.write(key, new LongWritable(count));
}
}
}
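To let the workflow find these classes, package them into a jar and put the jar in the lib directory of the workflow's HDFS workspace. A sketch with placeholder names (the jar name and workspace path below are hypothetical; use the workspace path Hue shows for your workflow):
hdfs dfs -mkdir -p /user/hue/oozie/workspaces/my-mr-workflow/lib
hdfs dfs -put MapReduceOnHue.jar /user/hue/oozie/workspaces/my-mr-workflow/lib/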
When the job is run this way, everything is driven by the properties you specify in the workflow, so the main-method part above is not needed. The properties to set are as follows:
Property entered on the left | Description | Value entered on the right
---|---|---
mapreduce.input.fileinputformat.inputdir | Input path the MapReduce job reads from | ${inputDir}
mapreduce.output.fileoutputformat.outputdir | Output path the results are written to | ${outputDir}
mapreduce.job.map.class | Fully qualified Mapper class name | WCMapper
mapreduce.job.reduce.class | Fully qualified Reducer class name | WCReducer
mapreduce.job.output.key.class | Output key type | org.apache.hadoop.io.Text
mapreduce.job.output.value.class | Output value type | org.apache.hadoop.io.LongWritable
mapred.mapper.new-api | Use the new Mapper API | true
mapred.reducer.new-api | Use the new Reducer API | true
mapreduce.job.reduces | Number of reducers | 1
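For reference, these properties end up in the <configuration> block of the map-reduce action in the workflow.xml that Hue generates; roughly like the sketch below (the workflow and action names are made up, and the job-tracker/name-node values are assumed to match this cluster):
<workflow-app name="mr-filter-wf" xmlns="uri:oozie:workflow:0.5">
<start to="mr-filter"/>
<action name="mr-filter">
<map-reduce>
<job-tracker>wq1:8032</job-tracker>
<name-node>hdfs://wq1:9000</name-node>
<configuration>
<property><name>mapred.mapper.new-api</name><value>true</value></property>
<property><name>mapred.reducer.new-api</name><value>true</value></property>
<property><name>mapreduce.job.map.class</name><value>WCMapper</value></property>
<property><name>mapreduce.job.reduce.class</name><value>WCReducer</value></property>
<property><name>mapreduce.job.output.key.class</name><value>org.apache.hadoop.io.Text</value></property>
<property><name>mapreduce.job.output.value.class</name><value>org.apache.hadoop.io.LongWritable</value></property>
<property><name>mapreduce.input.fileinputformat.inputdir</name><value>${inputDir}</value></property>
<property><name>mapreduce.output.fileoutputformat.outputdir</name><value>${outputDir}</value></property>
<property><name>mapreduce.job.reduces</name><value>1</value></property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail"><message>MR filter failed: ${wf:errorMessage(wf:lastErrorNode())}</message></kill>
<end name="end"/>
</workflow-app>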