Installing a standalone Hadoop environment on macOS
1. Prepare the JDK
I have JDK 1.8 installed; configure the JAVA_HOME environment variable:
vi ~/.bash_profile
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home
source ~/.bash_profile
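A quick sanity check that the variable resolves after sourcing the profile. The path below is this machine's JDK 1.8.0_221 install location, so substitute your own version's path:

```shell
# Assumed install path for JDK 1.8.0_221; adjust to your installed version.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home
echo "$JAVA_HOME"
# If the path is correct, this reports the 1.8 runtime; guarded so it
# degrades gracefully when the JDK lives elsewhere.
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "java not found under $JAVA_HOME; check the install path"
fi
```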
2. Download Hadoop
Hadoop version: hadoop-2.8.4
Official mirror: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
Baidu Netdisk link: https://pan.baidu.com/s/1cy6eZ5StaO0ArzeF9b0QTA  password: ieqq
Install location: /Users/caoan/Downloads/hadoop-2.8.4
After downloading, extract the archive:
tar -zxvf hadoop-2.8.4.tar.gz
Set the environment variables:
vi ~/.bash_profile
export HADOOP_HOME=/Users/caoan/Downloads/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin
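As with the JDK, it is worth confirming in the current shell that the exports took effect (the install path is this machine's; adjust to yours):

```shell
export HADOOP_HOME=/Users/caoan/Downloads/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin
echo "$HADOOP_HOME"
# `hadoop version` should now print the release; guarded in case the
# archive has not actually been extracted to this location yet.
if command -v hadoop >/dev/null; then
  hadoop version
else
  echo "hadoop not on PATH yet; re-check HADOOP_HOME and re-source ~/.bash_profile"
fi
```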
Edit the JAVA_HOME setting in hadoop-env.sh, replacing the default ${JAVA_HOME} line with the absolute path (without this change, I found when checking the logs of a MapReduce job that the Java path could not be found):
vi etc/hadoop/hadoop-env.sh
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home
Next, verify the installation following the official guide: http://hadoop.apache.org/docs/r2.8.4/hadoop-project-dist/hadoop-common/SingleCluster.html
Hadoop can run in three modes: standalone, pseudo-distributed, and fully distributed; this post covers the first two.
Standalone Mode
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-z.]+'
$ cat output/*
With the stock configuration files, the grep job's output should contain something like `1 dfsadmin`.
Pseudo-Distributed Operation
Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configure core-site.xml. First create the temp directory, then edit the file:
mkdir -p hdfs/tmp
vi etc/hadoop/core-site.xml
The configuration is as follows:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/caoan/Downloads/hadoop-2.8.4/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Configure hdfs-site.xml:
vi etc/hadoop/hdfs-site.xml
The configuration is as follows.
Because this is a single-node pseudo-distributed deployment, set replication to 1; if left unconfigured it defaults to 3, meaning each file would be stored as three replicas.
Also configure the namenode and datanode storage directories:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/caoan/Downloads/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/caoan/Downloads/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
</property>
</configuration>
Configure mapred-site.xml.
The distribution does not ship a mapred-site.xml, only a mapred-site.xml.template, so copy the template first:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vi etc/hadoop/mapred-site.xml
The configuration is as follows:
<configuration>
<!-- Tell the MapReduce framework to use YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure yarn-site.xml:
vi etc/hadoop/yarn-site.xml
The configuration is as follows. (Without the shuffle setting, my MapReduce jobs hung after submission. The two disk-health-checker properties relax YARN's disk-usage check, which otherwise marks the node as unhealthy and keeps tasks from being scheduled when the disk is nearly full.)
<configuration>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
<value>0.0</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>95.0</value>
</property>
</configuration>
Format the HDFS filesystem:
./bin/hdfs namenode -format
Set up passwordless SSH login (on macOS, Remote Login must also be enabled under System Preferences → Sharing so that sshd accepts connections):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Start the Hadoop environment, both HDFS and YARN (start-all.sh is deprecated; running sbin/start-dfs.sh followed by sbin/start-yarn.sh is equivalent):
sbin/start-all.sh
Check HDFS status at http://localhost:50070/dfshealth.html#tab-overview and the YARN resource manager at http://localhost:8088/cluster.
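Once start-all.sh returns, five daemons should be running. A small check, assuming `jps` (which ships with the JDK) is on the PATH:

```shell
# The daemons a healthy pseudo-distributed deployment runs.
expected="DataNode NameNode NodeManager ResourceManager SecondaryNameNode"
# Collect running daemon names from jps, skipping the jps process itself;
# guarded so the loop still reports cleanly when jps is unavailable.
running=" $(command -v jps >/dev/null && jps | awk '$2 != "Jps" {print $2}' | sort | tr '\n' ' ')"
for d in $expected; do
  case "$running" in
    *" $d "*) echo "$d: running" ;;
    *)        echo "$d: NOT running" ;;
  esac
done
```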
Now HDFS can be tested.
Create a directory:
hdfs dfs -mkdir /test
Upload a file to HDFS:
hdfs dfs -put test1.txt /test
List the files:
hdfs dfs -ls /
Run the wordcount MapReduce example that ships with hadoop-2.8.4.
jar path: /Users/caoan/Downloads/hadoop-2.8.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar
# create the input directory
hdfs dfs -mkdir /wordCountInput
# create the input file
vi wordcount.txt
# upload the file
hdfs dfs -put wordcount.txt /wordCountInput/
# run the example
hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /wordCountInput /wordCountOutput
Run log:
caoan@caoandeMacBook-Pro:~/Downloads/hadoop-2.8.4/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /wordCountInput /wordCountOutput
20/03/04 20:45:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/04 20:45:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/03/04 20:45:31 INFO input.FileInputFormat: Total input files to process : 1
20/03/04 20:45:31 INFO mapreduce.JobSubmitter: number of splits:1
20/03/04 20:45:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1583308872072_0004
20/03/04 20:45:31 INFO impl.YarnClientImpl: Submitted application application_1583308872072_0004
20/03/04 20:45:31 INFO mapreduce.Job: The url to track the job: http://caoandeMacBook-Pro.local:8088/proxy/application_1583308872072_0004/
20/03/04 20:45:31 INFO mapreduce.Job: Running job: job_1583308872072_0004
20/03/04 20:45:36 INFO mapreduce.Job: Job job_1583308872072_0004 running in uber mode : false
20/03/04 20:45:36 INFO mapreduce.Job: map 0% reduce 0%
20/03/04 20:45:40 INFO mapreduce.Job: map 100% reduce 0%
20/03/04 20:45:45 INFO mapreduce.Job: map 100% reduce 100%
20/03/04 20:45:45 INFO mapreduce.Job: Job job_1583308872072_0004 completed successfully
20/03/04 20:45:45 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=62
        FILE: Number of bytes written=316047
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=153
        HDFS: Number of bytes written=44
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1956
        Total time spent by all reduces in occupied slots (ms)=1841
        Total time spent by all map tasks (ms)=1956
        Total time spent by all reduce tasks (ms)=1841
        Total vcore-milliseconds taken by all map tasks=1956
        Total vcore-milliseconds taken by all reduce tasks=1841
        Total megabyte-milliseconds taken by all map tasks=2002944
        Total megabyte-milliseconds taken by all reduce tasks=1885184
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Map output bytes=50
        Map output materialized bytes=62
        Input split bytes=115
        Combine input records=3
        Combine output records=3
        Reduce input groups=3
        Reduce shuffle bytes=62
        Reduce input records=3
        Reduce output records=3
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=69
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=350748672
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=38
    File Output Format Counters
        Bytes Written=44
View the result:
caoan@caoandeMacBook-Pro:~/Downloads/hadoop-2.8.4/share/hadoop/mapreduce$ hdfs dfs -cat /wordCountOutput/*
20/03/04 20:47:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hello,Hadoop! 1
Hello,world! 1
Hello,you! 1
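For intuition, the same counts can be reproduced locally with standard Unix tools. This sketch recreates the three-line input from above, then mirrors wordcount's phases: tokenize (map), sort (shuffle), count duplicates (reduce):

```shell
# Recreate the input file used for the wordcount job above.
printf 'Hello,Hadoop!\nHello,world!\nHello,you!\n' > wordcount.txt
# "Map": one whitespace-separated token per line; "shuffle": sort;
# "reduce": count identical tokens; then format as "word<TAB>count".
tr -s ' \t' '\n' < wordcount.txt | sort | uniq -c | awk '{print $2"\t"$1}'
```

Since each line of the input is a single token, every word appears once, matching the `hdfs dfs -cat /wordCountOutput/*` output above.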