Note: after changing any configuration you must restart the service. This applies to every step below; by "service" I mean the black cmd console window running it.
Configure Java
Omitted here.
Recommended combination: Java 8 + Spark 2.4.0 + Scala 2.11 + Hadoop 3.1.1.
Configure Spark
Download the package from the official site, extract it, and, as with Java, set the SPARK_HOME environment variable and add it to Path.
Note the correspondence between Scala and Spark versions, which you can check here:
https://mvnrepository.com/artifact/org.apache.spark/spark-core
Configure Scala
Likewise, download the package and set SCALA_HOME and Path.
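A minimal sketch of the two steps above, assuming Spark was extracted to F:\spark\spark-2.4.0-bin-hadoop2.7 and Scala to F:\scala\scala-2.11.12 (both paths are examples; adjust them to your own layout). setx stores the variables permanently; the bin directories still have to be appended to Path, for example via the system environment variable dialog:
# set the home variables once (newly opened cmd windows will pick them up)
setx SPARK_HOME "F:\spark\spark-2.4.0-bin-hadoop2.7"
setx SCALA_HOME "F:\scala\scala-2.11.12"
# then add %SPARK_HOME%\bin and %SCALA_HOME%\bin to Path, open a new cmd, and verify
scala -version
spark-submit --version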
Configure Hadoop
https://archive.apache.org/dist/hadoop/common/
Download a Hadoop release and configure HADOOP_HOME and Path as before.
Additionally add %HADOOP_HOME%\bin to Path so that winutils.exe can be found there.
Then download the Windows-specific files for your Hadoop version and replace the corresponding ones in the bin directory.
URL: https://github.com/cdarlint/winutils
Note: open a new cmd window after the environment variables are configured.
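To confirm the Hadoop part of the setup, run something like the following in a freshly opened cmd window (so it sees the new variables); the exact version output depends on your install:
# check the environment in a newly opened cmd window
java -version
hadoop version
# winutils.exe should be found on Path via %HADOOP_HOME%\bin
where winutils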
Pseudo-distributed Hadoop configuration starts here.
Modify the following files under etc/hadoop in the Hadoop directory; remember to change the paths to your own.
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:///F:/hadoop/hadoop-3.1.1/data/tmp</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Secondary NameNode address and port (HTTP) -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:50090</value>
</property>
<!-- Secondary NameNode address and port (HTTPS) -->
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>localhost:50091</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///F:/hadoop/hadoop-3.1.1/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///F:/hadoop/hadoop-3.1.1/data/dfs/datanode</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
%HADOOP_HOME%\etc\hadoop,
%HADOOP_HOME%\share\hadoop\common\*,
%HADOOP_HOME%\share\hadoop\common\lib\*,
%HADOOP_HOME%\share\hadoop\mapreduce\*,
%HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
%HADOOP_HOME%\share\hadoop\hdfs\*,
%HADOOP_HOME%\share\hadoop\hdfs\lib\*,
%HADOOP_HOME%\share\hadoop\yarn\*,
%HADOOP_HOME%\share\hadoop\yarn\lib\*
</value>
</property>
</configuration>
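One optional extra, not strictly part of the four files above: if the Hadoop scripts complain that they cannot find Java (typically when JAVA_HOME points to a path containing spaces, e.g. under Program Files), you can set JAVA_HOME explicitly in etc\hadoop\hadoop-env.cmd. The JDK path below is only an example:
@rem etc\hadoop\hadoop-env.cmd
set JAVA_HOME=F:\Java\jdk1.8.0_201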
Note the working directory for the following commands: even though the environment variables are set, I still recommend running them from inside the Hadoop directory.
Initialize HDFS (this only needs to be done once):
F:\hadoop\hadoop-3.1.1\sbin>hdfs namenode -format
Start HDFS:
F:\hadoop\hadoop-3.1.1\sbin>start-dfs
Start YARN:
F:\hadoop\hadoop-3.1.1\sbin>start-yarn
If this step fails, try copying the hadoop-yarn-server-timelineservice-3.1.1 jar from ~\hadoop-3.1.1\share\hadoop\yarn\timelineservice to ~\hadoop-3.1.1\share\hadoop\yarn. An example of the error:
2019-10-30 11:26:10,237 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1507)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 36 more
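Once start-dfs and start-yarn both come up cleanly, you can check which daemons are actually running with jps (it ships with the JDK). The process IDs below are purely illustrative; you should at least see NameNode, DataNode, ResourceManager, and NodeManager:
F:\hadoop\hadoop-3.1.1\sbin>jps
11136 NameNode
12024 DataNode
13248 ResourceManager
14100 NodeManager
15232 Jps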
Web UIs
HDFS web UI: http://localhost:50070
YARN NodeManager web UI: http://localhost:8042 (the ResourceManager UI is at http://localhost:8088)
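The same status information is also available from the command line, which is handy when the web pages do not load; hdfs dfsadmin -report and yarn node -list are standard commands:
# HDFS capacity and DataNode status
hdfs dfsadmin -report
# list the NodeManagers registered with the ResourceManager
yarn node -list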
Test
# Create a directory in HDFS
F:\hadoop\hadoop-3.1.1> hdfs dfs -mkdir /input
Upload a file to the input directory in HDFS.
If you did not start things as an administrator, you will need the privilege to create symbolic links; you probably already saw this error when extracting the archive.
1. Win+R, run gpedit.msc.
2. Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> User Rights Assignment -> Create symbolic links.
3. Add your user to the list (just search for your user name and it will come up), click Apply, then log off or reboot.
# Create test.txt yourself and put some content in it
F:\hadoop\hadoop-3.1.1> hdfs dfs -copyFromLocal F:\test.txt /input
# List the files
hadoop fs -ls /input
# Run the wordcount example
F:\hadoop\hadoop-3.1.1>hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /input/test.txt /out
View the results
F:\hadoop\hadoop-3.1.1>hadoop fs -ls /out
Found 2 items
-rw-r--r-- 1 Lorin supergroup 0 2019-11-15 19:07 /out/_SUCCESS
-rw-r--r-- 1 Lorin supergroup 14 2019-11-15 19:07 /out/part-r-00000
F:\hadoop\hadoop-3.1.1>hadoop fs -get /out F:/
2019-11-15 19:08:59,987 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
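Instead of copying the output to the local disk, you can also print it directly; the actual counts depend on whatever you wrote into test.txt:
F:\hadoop\hadoop-3.1.1>hadoop fs -cat /out/part-r-00000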
end
One last reminder
If you are thinking of deploying Hadoop on a cloud server and then running jobs against the server's Hadoop directly, without packaging a jar, sorry, it is not possible. I searched for a long time before settling on that answer, and even if you did get something working, it still would not be the "click the Run button in IDEA and it just runs" experience you imagine.
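For completeness, a sketch of the workflow that does work, assuming an sbt-based Scala project with a main class com.example.WordCount and a server reachable as user@server (all of these names are hypothetical): build the jar locally, copy it to the machine that hosts the cluster, and submit it there.
# on the Windows machine: build the jar (sbt project assumed)
F:\myproject> sbt package
# copy it to the server (any transfer tool works; scp is just an example)
F:\myproject> scp target\scala-2.11\myproject_2.11-0.1.jar user@server:/home/user/
# on the server: submit the jar to YARN
spark-submit --master yarn --class com.example.WordCount /home/user/myproject_2.11-0.1.jar /input/test.txt /out2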