1: Configure the Java environment
This is not covered in detail here; there are plenty of tutorials online.
2: Install Hadoop
Download Hadoop from: https://hadoop.apache.org/releases.html
Unpack the archive:
# tar -zxvf hadoop-2.9.2.tar.gz
Configure the Hadoop environment variables:
# vi /etc/profile
Append the following two lines at the end of the file:
export HADOOP_HOME=/root/develop/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload the profile so the changes take effect:
# source /etc/profile
Run hadoop version to verify the configuration:
# hadoop version
Hadoop 2.9.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 826afbeae31ca687bc2f8471dc841b66ed2c6704
Compiled by ajisaka on 2018-11-13T12:42Z
Compiled with protoc 2.5.0
From source with checksum 3a9939967262218aa556c684d107985
This command was run using /root/develop/hadoop-2.9.2/share/hadoop/common/hadoop-common-2.9.2.jar
The following walks through a simple example.
https://download.csdn.net/download/vincent_yuan89/10883834
Note: this is the companion source code for "Hadoop: The Definitive Guide, 4th Edition" (storage and analysis at internet scale).
Package the source project into a jar, e.g. hadoop-examples.jar.
Set the HADOOP_CLASSPATH environment variable so the hadoop command can find the application's classes:
# export HADOOP_CLASSPATH=hadoop-examples.jar
Run the hadoop command:
# hadoop MaxTemperature input/sample.txt output
The arguments are, in order: the application class to run, the input data, and the output directory.
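The MaxTemperature application reads fixed-width NCDC weather records from sample.txt, extracting a year and an air temperature from each line. As a rough illustration of that parsing step, here is a plain-Java sketch (no Hadoop dependencies); the column offsets follow the NCDC layout described in the book, and the class and method names are made up for this example:

```java
// Illustrative sketch of the fixed-width parsing that the book's
// MaxTemperatureMapper performs on an NCDC weather record.
// Plain Java, so it runs without the Hadoop libraries installed.
public class NcdcParseSketch {

    // The four-digit year sits at columns 15-18 of the record.
    static String year(String record) {
        return record.substring(15, 19);
    }

    // The signed air temperature (tenths of a degree Celsius) sits at
    // columns 87-91; a leading '+' is skipped, as in the book's code.
    static int temperature(String record) {
        if (record.charAt(87) == '+') {
            return Integer.parseInt(record.substring(88, 92));
        }
        return Integer.parseInt(record.substring(87, 92));
    }

    public static void main(String[] args) {
        // Synthetic 93-character record: year 1950, temperature +2.2 C,
        // all other columns padded with '0'.
        String record = "0".repeat(15) + "1950"
                + "0".repeat(68) + "+0022" + "1";
        System.out.println(year(record) + "\t" + temperature(record));
    }
}
```

Running this prints the pair 1950 and 22, i.e. one (year, temperature) record of the kind the mapper emits.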
The job finishes with counter output like the following:
File System Counters
	FILE: Number of bytes read=25034
	FILE: Number of bytes written=947710
	FILE: Number of read operations=0
	FILE: Number of large read operations=0
	FILE: Number of write operations=0
Map-Reduce Framework
	Map input records=5
	Map output records=5
	Map output bytes=45
	Map output materialized bytes=61
	Input split bytes=94
	Combine input records=0
	Combine output records=0
	Reduce input groups=2
	Reduce shuffle bytes=61
	Reduce input records=5
	Reduce output records=2
	Spilled Records=10
	Shuffled Maps =1
	Failed Shuffles=0
	Merged Map outputs=1
	GC time elapsed (ms)=39
	Total committed heap usage (bytes)=243048448
Shuffle Errors
	BAD_ID=0
	CONNECTION=0
	IO_ERROR=0
	WRONG_LENGTH=0
	WRONG_MAP=0
	WRONG_REDUCE=0
File Input Format Counters
	Bytes Read=529
File Output Format Counters
	Bytes Written=29
The results are written to the output directory:
# cat output/part-r-00000
1949 111
1950 22
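The two result lines are the per-year maxima of the mapper's (year, temperature) pairs, which is what the reduce step computes. A plain-Java sketch of that aggregation (no Hadoop; the class name is made up, and the five sample temperatures are illustrative values chosen to reproduce the output above):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of what the book's MaxTemperatureReducer computes:
// for each year, the maximum of all temperatures emitted for that year.
public class MaxByYearSketch {

    static Map<String, Integer> maxByYear(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> max = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            // merge keeps the larger of the existing and the new value
            max.merge(p.getKey(), p.getValue(), Math::max);
        }
        return max;
    }

    public static void main(String[] args) {
        // Five pairs, matching the "Map output records=5" counter above;
        // the individual temperatures are illustrative.
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("1950", 0), Map.entry("1950", 22),
                Map.entry("1950", -11),
                Map.entry("1949", 111), Map.entry("1949", 78));
        maxByYear(pairs).forEach((y, t) -> System.out.println(y + "\t" + t));
    }
}
```

With those inputs this prints 1949 111 and 1950 22, the same shape as part-r-00000 above; in the real job the shuffle groups the pairs by year before the reducer sees them.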