Installing a standalone Hadoop environment on macOS
1. Prepare the JDK
I have JDK 1.8 installed; configure the JAVA_HOME environment variable:
vi ~/.bash_profile
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home
source ~/.bash_profile
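A quick sanity check that the variable resolves after sourcing the profile. The path below is this machine's JDK 1.8.0_221 install location, so substitute your own version's path:

```shell
# Assumed install path for JDK 1.8.0_221; adjust to your installed version.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home
echo "$JAVA_HOME"
# If the path is correct, this reports the 1.8 runtime; guarded so it
# degrades gracefully when the JDK lives elsewhere.
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "java not found under $JAVA_HOME; check the install path"
fi
```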
2. Download Hadoop
Hadoop version: hadoop-2.8.4
Official mirror: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
Baidu Netdisk link: https://pan.baidu.com/s/1cy6eZ5StaO0ArzeF9b0QTA  password: ieqq
Install location: /Users/caoan/Downloads/hadoop-2.8.4
After downloading, extract the archive:
tar -zxvf hadoop-2.8.4.tar.gz
Set the environment variables:
vi ~/.bash_profile
export HADOOP_HOME=/Users/caoan/Downloads/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin
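As with the JDK, it is worth confirming in the current shell that the exports took effect (the install path is this machine's; adjust to yours):

```shell
export HADOOP_HOME=/Users/caoan/Downloads/hadoop-2.8.4
export PATH=$PATH:$HADOOP_HOME/bin
echo "$HADOOP_HOME"
# `hadoop version` should now print the release; guarded in case the
# archive has not actually been extracted to this location yet.
if command -v hadoop >/dev/null; then
  hadoop version
else
  echo "hadoop not on PATH yet; re-check HADOOP_HOME and re-source ~/.bash_profile"
fi
```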
Edit the JAVA_HOME setting in hadoop-env.sh, replacing the default ${JAVA_HOME} line with the absolute path (without this change, I found when checking the logs of a MapReduce job that the Java path could not be found):
vi etc/hadoop/hadoop-env.sh
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home
Next, verify the installation following the official guide: http://hadoop.apache.org/docs/r2.8.4/hadoop-project-dist/hadoop-common/SingleCluster.html
Hadoop can run in three modes: standalone, pseudo-distributed, and fully distributed; this post covers the first two.
Standalone Mode
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-z.]+'
$ cat output/*
With the stock configuration files, the grep job's output should contain something like `1 dfsadmin`.
Pseudo-Distributed Operation
Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configure core-site.xml. First create the temp directory, then edit the file:
mkdir -p hdfs/tmp
vi etc/hadoop/core-site.xml
The configuration is as follows:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/caoan/Downloads/hadoop-2.8.4/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Configure hdfs-site.xml:
vi etc/hadoop/hdfs-site.xml
The configuration is as follows.
Because this is a single-node pseudo-distributed deployment, set replication to 1; if left unconfigured it defaults to 3, meaning each file would be stored as three replicas.
Also configure the namenode and datanode storage directories:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/caoan/Downloads/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/caoan/Downloads/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
</property>
</configuration>
Configure mapred-site.xml.
The distribution does not ship a mapred-site.xml, only a mapred-site.xml.template, so copy the template first:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vi etc/hadoop/mapred-site.xml
The configuration is as follows:
<configuration>
<!-- Tell the MapReduce framework to use YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure yarn-site.xml:
vi etc/hadoop/yarn-site.xml
The configuration is as follows. (Without the shuffle setting, my MapReduce jobs hung after submission. The two disk-health-checker properties relax YARN's disk-usage check, which otherwise marks the node as unhealthy and keeps tasks from being scheduled when the disk is nearly full.)
<configuration>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
<value>0.0</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>95.0</value>
</property>
</configuration>
Format the HDFS filesystem:
./bin/hdfs namenode -format
Set up passwordless SSH login (on macOS, Remote Login must also be enabled under System Preferences → Sharing so that sshd accepts connections):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Start the Hadoop environment, both HDFS and YARN (start-all.sh is deprecated; running sbin/start-dfs.sh followed by sbin/start-yarn.sh is equivalent):
sbin/start-all.sh
Check HDFS status at http://localhost:50070/dfshealth.html#tab-overview and the YARN resource manager at http://localhost:8088/cluster.
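Once start-all.sh returns, five daemons should be running. A small check, assuming `jps` (which ships with the JDK) is on the PATH:

```shell
# The daemons a healthy pseudo-distributed deployment runs.
expected="DataNode NameNode NodeManager ResourceManager SecondaryNameNode"
# Collect running daemon names from jps, skipping the jps process itself;
# guarded so the loop still reports cleanly when jps is unavailable.
running=" $(command -v jps >/dev/null && jps | awk '$2 != "Jps" {print $2}' | sort | tr '\n' ' ')"
for d in $expected; do
  case "$running" in
    *" $d "*) echo "$d: running" ;;
    *)        echo "$d: NOT running" ;;
  esac
done
```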
Now HDFS can be tested.
Create a directory:
hdfs dfs -mkdir /test
Upload a file to HDFS:
hdfs dfs -put test1.txt /test
List the files:
hdfs dfs -ls /
Run the wordcount MapReduce example that ships with hadoop-2.8.4.
jar path: /Users/caoan/Downloads/hadoop-2.8.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar
# create the input directory
hdfs dfs -mkdir /wordCountInput
# create the input file
vi wordcount.txt
# upload the file
hdfs dfs -put wordcount.txt /wordCountInput/
# run the example
hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /wordCountInput /wordCountOutput
Run log:
caoan@caoandeMacBook-Pro:~/Downloads/hadoop-2.8.4/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /wordCountInput /wordCountOutput
20/03/04 20:45:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/04 20:45:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/03/04 20:45:31 INFO input.FileInputFormat: Total input files to process : 1
20/03/04 20:45:31 INFO mapreduce.JobSubmitter: number of splits:1
20/03/04 20:45:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1583308872072_0004
20/03/04 20:45:31 INFO impl.YarnClientImpl: Submitted application application_1583308872072_0004
20/03/04 20:45:31 INFO mapreduce.Job: The url to track the job: http://caoandeMacBook-Pro.local:8088/proxy/application_1583308872072_0004/
20/03/04 20:45:31 INFO mapreduce.Job: Running job: job_1583308872072_0004
20/03/04 20:45:36 INFO mapreduce.Job: Job job_1583308872072_0004 running in uber mode : false
20/03/04 20:45:36 INFO mapreduce.Job: map 0% reduce 0%
20/03/04 20:45:40 INFO mapreduce.Job: map 100% reduce 0%
20/03/04 20:45:45 INFO mapreduce.Job: map 100% reduce 100%
20/03/04 20:45:45 INFO mapreduce.Job: Job job_1583308872072_0004 completed successfully
20/03/04 20:45:45 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=62
        FILE: Number of bytes written=316047
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=153
        HDFS: Number of bytes written=44
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1956
        Total time spent by all reduces in occupied slots (ms)=1841
        Total time spent by all map tasks (ms)=1956
        Total time spent by all reduce tasks (ms)=1841
        Total vcore-milliseconds taken by all map tasks=1956
        Total vcore-milliseconds taken by all reduce tasks=1841
        Total megabyte-milliseconds taken by all map tasks=2002944
        Total megabyte-milliseconds taken by all reduce tasks=1885184
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Map output bytes=50
        Map output materialized bytes=62
        Input split bytes=115
        Combine input records=3
        Combine output records=3
        Reduce input groups=3
        Reduce shuffle bytes=62
        Reduce input records=3
        Reduce output records=3
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=69
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=350748672
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=38
    File Output Format Counters
        Bytes Written=44
View the result:
caoan@caoandeMacBook-Pro:~/Downloads/hadoop-2.8.4/share/hadoop/mapreduce$ hdfs dfs -cat /wordCountOutput/*
20/03/04 20:47:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hello,Hadoop! 1
Hello,world! 1
Hello,you! 1
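For intuition, the same counts can be reproduced locally with standard Unix tools. This sketch recreates the three-line input from above, then mirrors wordcount's phases: tokenize (map), sort (shuffle), count duplicates (reduce):

```shell
# Recreate the input file used for the wordcount job above.
printf 'Hello,Hadoop!\nHello,world!\nHello,you!\n' > wordcount.txt
# "Map": one whitespace-separated token per line; "shuffle": sort;
# "reduce": count identical tokens; then format as "word<TAB>count".
tr -s ' \t' '\n' < wordcount.txt | sort | uniq -c | awk '{print $2"\t"$1}'
```

Since each line of the input is a single token, every word appears once, matching the `hdfs dfs -cat /wordCountOutput/*` output above.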