Official reference documentation:
http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/SingleCluster.html
1. Checking software prerequisites
1) Java
2) ssh, used to manage the remote Hadoop daemons
The official documentation also notes:
If your cluster doesn't have the requisite software you will need to install it.
If any of this software is missing, install it yourself; for example, on Ubuntu:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
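Before going further, a quick check like the following can confirm the prerequisites are in place (a sketch; it only verifies that the binaries are on PATH, not that sshd is running):

```shell
# Report whether each prerequisite binary is on PATH
for tool in java ssh rsync; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: ok"
    else
        echo "$tool: missing"
    fi
done
```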
2. Download and set up
I downloaded the r2.8.5 release. If you download a different version, be sure to read that version's documentation: I started out following a Chinese article and only later noticed it described a different version whose directory layout did not match mine.
2.1 Setting environment variables
etc/hadoop/hadoop-env.sh
In the file above, set the JAVA_HOME environment variable:
export JAVA_HOME=/usr/java/latest   # replace with your actual Java path
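If you are unsure what your Java path is, one common way to derive it (an assumption; layouts vary by distro) is to resolve the java binary on PATH and strip the trailing /bin/java:

```shell
# Resolve the real java binary behind any symlinks (e.g. /etc/alternatives/java),
# then strip the trailing /bin/java to get a JAVA_HOME candidate
JAVA_BIN=$(readlink -f "$(command -v java)")
JAVA_HOME=${JAVA_BIN%/bin/java}
echo "$JAVA_HOME"
```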
With the variable set, run the following from the Hadoop installation directory:
$ bin/hadoop
The terminal then prints Hadoop's usage information:
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
At this point, Hadoop is installed.
The documentation then walks through examples for three modes:
Standalone Operation (single node)
Pseudo-Distributed Operation
Fully-Distributed Operation
Here is an example of running in standalone mode:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar grep input output 'dfs[a-z.]+'
$ cat output/*
Note: do not create the output directory ahead of time. If it already exists, the job aborts with a "directory already exists" error, which can make it look as though Hadoop wasn't set up correctly; delete output before re-running the example.
Here is the tail of the run's output:
18/12/21 15:26:58 INFO mapred.LocalJobRunner: Finishing task: attempt_local768800350_0002_r_000000_0
18/12/21 15:26:58 INFO mapred.LocalJobRunner: reduce task executor complete.
18/12/21 15:26:59 INFO mapreduce.Job: Job job_local768800350_0002 running in uber mode : false
18/12/21 15:26:59 INFO mapreduce.Job: map 100% reduce 100%
18/12/21 15:26:59 INFO mapreduce.Job: Job job_local768800350_0002 completed successfully
18/12/21 15:26:59 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1503038
                FILE: Number of bytes written=2692554
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=12
                Map output records=12
                Map output bytes=274
                Map output materialized bytes=304
                Input split bytes=134
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=304
                Reduce input records=12
                Reduce output records=12
                Spilled Records=24
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=4
                Total committed heap usage (bytes)=692060160
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=468
        File Output Format Counters
                Bytes Written=214
$ cat output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
3 dfs.logger
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.log
1 dfs.file
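For intuition, what the grep example job computes can be approximated with ordinary command-line tools: extract every match of the regex onto its own line, then count occurrences per distinct match. A rough non-Hadoop sketch of the same pipeline (it builds a tiny sample input here so the snippet is self-contained; point the grep at input/*.xml in your Hadoop directory for the real data):

```shell
# Build a small sample input file to run against
mkdir -p input_demo
printf '<name>dfs.period</name>\n<name>dfs.period</name>\n<name>dfs.class</name>\n' > input_demo/sample.xml

# Extract each dfs[a-z.]+ match on its own line, then count duplicates,
# mirroring the map (extract) and reduce (count) phases of the example job
grep -hoE 'dfs[a-z.]+' input_demo/*.xml | sort | uniq -c | sort -rn
```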
See also:
Setting up Hadoop on Ubuntu 18.04 LTS
Setting up Hive on Ubuntu 18.04 LTS