Setting Up Pseudo-Distributed Hadoop on Windows
A rookie programmer sets out to build a pseudo-distributed cluster!
Download the packages
Hadoop release: http://hadoop.apache.org/releases.html (this walkthrough downloaded hadoop-3.1.3)
Hadoop Windows binaries: https://github.com/steveloughran/winutils (initially tried hadoop-3.0.0)
I couldn't find matching versions on both sides at first, so I made do and gave it a try.
The mismatched versions failed: starting HDFS threw errors.
Lesson learned: get winutils built for the same Hadoop version, for example
https://github.com/s911415/apache-hadoop-3.1.0-winutils
Set up environment variables
HADOOP_HOME=F:\hadoop\hadoop-3.1.3
Path: %HADOOP_HOME%\bin
Extract the packages
Extract as Administrator.
A quick how-to:
Start → WinRAR → More → Run as administrator
then browse to the archive and extract it.
Replace the entire bin directory of the Hadoop installation with the bin for the matching Hadoop version from winutils-master.
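After replacing bin, it's worth a quick check that the Windows native helpers actually landed there. A minimal sketch (the two-file list is my assumption of the essentials; the real winutils bin ships several more .exe/.dll files):

```python
import os

# Two files Hadoop on Windows cannot start without (an assumption;
# the winutils bin directory also contains other helpers).
REQUIRED = ("winutils.exe", "hadoop.dll")

def bin_is_patched(hadoop_home):
    """True if the install's bin directory contains the native Windows helpers."""
    bin_dir = os.path.join(hadoop_home, "bin")
    return all(os.path.isfile(os.path.join(bin_dir, f)) for f in REQUIRED)
```

For example, `bin_is_patched(r"F:\hadoop\hadoop-3.1.3")` should return True once the replacement is done.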
Configure Hadoop
The configuration files are under etc\hadoop.
Add the following settings.
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:///F:/hadoop/hadoop-3.1.3/data/tmp</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Secondary NameNode address and port (HTTP) -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:50090</value>
</property>
<!-- Secondary NameNode address and port (HTTPS) -->
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>localhost:50091</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///F:/hadoop/hadoop-3.1.3/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///F:/hadoop/hadoop-3.1.3/data/dfs/datanode</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
%HADOOP_HOME%\etc\hadoop,
%HADOOP_HOME%\share\hadoop\common\*,
%HADOOP_HOME%\share\hadoop\common\lib\*,
%HADOOP_HOME%\share\hadoop\mapreduce\*,
%HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
%HADOOP_HOME%\share\hadoop\hdfs\*,
%HADOOP_HOME%\share\hadoop\hdfs\lib\*,
%HADOOP_HOME%\share\hadoop\yarn\*,
%HADOOP_HOME%\share\hadoop\yarn\lib\*
</value>
</property>
</configuration>
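A quick way to double-check the four files above is to parse them and dump the name → value pairs. A sketch using only the standard library (`parse_hadoop_conf` is a hypothetical helper name, not part of Hadoop):

```python
import xml.etree.ElementTree as ET

def parse_hadoop_conf(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        # Values may be wrapped in whitespace (see yarn.application.classpath).
        value = (prop.findtext("value") or "").strip()
        conf[name] = value
    return conf

sample = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
</configuration>"""
print(parse_hadoop_conf(sample))  # → {'fs.defaultFS': 'hdfs://localhost:9000'}
```

Pointing it at each file under etc\hadoop makes typos in property names easy to spot before starting the daemons.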
Format HDFS
Without the environment variables set, this command must be run from the bin directory of the Hadoop installation (hdfs lives in bin, not sbin):
F:\hadoop\hadoop-3.1.3\bin>hdfs namenode -format
If the NameNode metadata files appear (under the configured data\dfs\namenode directory), the format succeeded.
Start HDFS
First check that port 9000 is not already bound.
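One way to check whether the port is already taken, without hunting through netstat output (a sketch; 9000 is the fs.defaultFS port from core-site.xml above):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 on a successful connection, an errno otherwise.
        return s.connect_ex((host, port)) == 0
```

If `port_in_use(9000)` returns True before you start HDFS, something else owns the port and the NameNode will fail to bind.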
F:\hadoop\hadoop-3.1.3\sbin>start-dfs
Start YARN
F:\hadoop\hadoop-3.1.3\sbin>start-yarn
Unexpectedly, starting YARN kept failing with the error below.
After some persistent searching I finally found the fix:
★,°:.☆( ̄▽ ̄)/$:.°★ 。
Copy "hadoop-yarn-server-timelineservice-3.1.3" from ~\hadoop-3.1.3\share\hadoop\yarn\timelineservice to ~\hadoop-3.1.3\share\hadoop\yarn.
2019-10-30 11:26:10,237 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethods(Class.java:1975)
at com.google.inject.spi.InjectionPoint.getInjectionPoints(InjectionPoint.java:688)
at com.google.inject.spi.InjectionPoint.forInstanceMethodsAndFields(InjectionPoint.java:380)
at com.google.inject.spi.InjectionPoint.forInstanceMethodsAndFields(InjectionPoint.java:399)
at com.google.inject.internal.BindingBuilder.toInstance(BindingBuilder.java:84)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebApp.setup(RMWebApp.java:60)
at org.apache.hadoop.yarn.webapp.WebApp.configureServlets(WebApp.java:160)
at com.google.inject.servlet.ServletModule.configure(ServletModule.java:55)
at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
at com.google.inject.spi.Elements.getElements(Elements.java:110)
at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
at com.google.inject.Guice.createInjector(Guice.java:96)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:387)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:432)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:1203)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1312)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1507)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 36 more
Verify the startup
If the pages below open in a browser, the daemons are running.
HDFS web UI:
http://localhost:50070
Opening the HDFS address didn't work on the first try, and probing other related ports produced a "Version Mismatch" error. According to what I found on Google, that error means you hit the wrong port number — for example, pointing a web browser at a DataNode's data-transfer port when you meant its web UI. (Note that in Hadoop 3.x the NameNode web UI moved from 50070 to 9870 by default, which is likely why 50070 didn't answer at first.)
Adding the following to hdfs-site.xml and restarting made it work o( ̄▽ ̄)ブ
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
YARN NodeManager page:
http://localhost:8042
(The ResourceManager web UI is at http://localhost:8088 by default.)
Run a test job
Create test.txt with the following content:
test code
# Create a directory in HDFS
F:\hadoop\hadoop-3.1.3> hdfs dfs -mkdir /input
# Upload the local file into /input
F:\hadoop\hadoop-3.1.3> hdfs dfs -copyFromLocal F:\test.txt /input
# List the files
hadoop fs -ls /input
# Run the wordcount example
F:\hadoop\hadoop-3.1.3>hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input/test.txt /out
Running the example failed with the following error:
Exit code: 1
Exception message: CreateSymbolicLink error (1314): ???????????
Shell output: 移动了 1 个文件。
"Setting up env variables"
"Setting up job resources"
The fix I found:
1. Win+R → gpedit.msc
2. Computer Configuration → Windows Settings → Security Settings → Local Policies → User Rights Assignment → Create symbolic links
3. Add your user to the policy, then log off or reboot.
Or simply run the job as the Administrator account.
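You can also test whether your current account actually has the symlink privilege before re-running the job. A sketch (on Windows without the policy change, os.symlink raises OSError with winerror 1314; with the privilege granted, or on an elevated shell, it succeeds):

```python
import os
import tempfile
import uuid

def can_create_symlink():
    """True if the current process is allowed to create symbolic links."""
    target = tempfile.gettempdir()
    # Random name so repeated runs don't collide with stale links.
    link = os.path.join(target, "symlink-test-" + uuid.uuid4().hex)
    try:
        os.symlink(target, link)
    except OSError:
        return False
    os.remove(link)
    return True
```

If this returns False, apply the gpedit.msc policy change above and log off before trying again.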
Check the result
F:\hadoop\hadoop-3.1.3>hadoop fs -ls /out
Found 2 items
-rw-r--r-- 1 Lorin supergroup 0 2019-11-15 19:07 /out/_SUCCESS
-rw-r--r-- 1 Lorin supergroup 14 2019-11-15 19:07 /out/part-r-00000
F:\hadoop\hadoop-3.1.3>hadoop fs -get /out F:/
2019-11-15 19:08:59,987 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
The out folder now appears on the F: drive. Its contents:
code 1
test 1
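For such a tiny input the result is easy to verify by hand, or with a quick local sketch of what the wordcount example computes:

```python
from collections import Counter

def wordcount(text):
    """Split on whitespace and count occurrences, like the MapReduce example."""
    return Counter(text.split())

print(sorted(wordcount("test code").items()))  # → [('code', 1), ('test', 1)]
```

This matches the part-r-00000 output above: each distinct word appears once.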
Everything works! 🤣 Feel free to point out anything that's wrong.
Based on this expert's write-up.