HDFS is the distributed storage framework provided by Hadoop; it is used to store massive data sets. MapReduce is Hadoop's distributed computation framework, used to process and analyze the data stored on HDFS. Hive is "SQL on Hadoop": it exposes a SQL interface, so developers only need to write straightforward SQL statements, and Hive translates the SQL into MapReduce jobs and submits them for execution.
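The SQL-to-MapReduce translation Hive performs can be illustrated with a toy Python sketch (this is not Hive's actual planner; the table name `docs` and the sample rows are made up for the demo). The same aggregation is computed once as SQL and once as the map/reduce steps Hive would generate for it:

```python
# Toy illustration of "SQL on Hadoop": the same GROUP BY aggregation
# expressed as SQL and as explicit map/reduce steps.
import sqlite3
from collections import defaultdict

rows = [("apple",), ("banana",), ("apple",)]  # hypothetical sample data

# SQL view: what the developer writes (sqlite3 stands in for Hive here)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (word TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", rows)
sql_result = dict(conn.execute(
    "SELECT word, COUNT(*) FROM docs GROUP BY word"))

# MapReduce view: the map phase emits (word, 1) pairs; the shuffle
# groups them by key; the reduce phase sums the 1s per key.
counts = defaultdict(int)
for (word,) in rows:
    counts[word] += 1

assert sql_result == dict(counts)  # both yield {'apple': 2, 'banana': 1}
```

Both routes produce identical counts, which is exactly the guarantee Hive gives: the SQL is a description of the result, and the generated MapReduce job is one way to compute it.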
See the following blog posts for reference:
https://blog.csdn.net/qq_35535690/article/details/81976032
https://blog.csdn.net/u013159040/article/details/81939662
https://www.jianshu.com/p/70bd81b2956f
Set the following environment variables:
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_151
HADOOP_HOME = E:\Python\Hadoop\hadoop-2.7.7
Add to the PATH environment variable:
%JAVA_HOME%\bin
(%HADOOP_HOME%\bin and %HADOOP_HOME%\sbin also need to be on PATH so that the hadoop and start-all commands resolve.)
Commands to verify the installation:
hdfs namenode -format
hadoop namenode -format    (older, deprecated form of the same command)
start-all
To stop all running daemons:
stop-all
NameNode web UI: http://localhost:50070/
https://blog.csdn.net/shdxhsq/article/details/105590655
2. Running an example
If the NameNode is not running, it may need to be restarted.
Format the NameNode and start Hadoop:
hdfs namenode -format
1. Create a directory
hadoop fs -mkdir /user
2. Put files into the newly created directory
hadoop fs -put "E:\Python\Hadoop\hadooptestfile\Hadoop-On-Window-master\input_file.txt" /user
hadoop fs -put E:\Python\Hadoop\file*.txt /hdfsinput
3. Inspect the input files
List the directory: hadoop fs -ls /user/
View the uploaded file with:
hadoop fs -cat /user/input_file.txt
4. Run the computation with the MapReduce examples jar
hadoop jar E:\Python\Hadoop\hadooptestfile\Hadoop-On-Window-master\hadoop-mapreduce-examples-2.7.7.jar wordcount /user /output_dir
hadoop jar E:\Python\Hadoop\hadooptestfile\Hadoop-On-Window-master\hadoop-mapreduce-examples-2.7.7.jar wordcount /hdfsinput /output_dir
(The wordcount program lives in the examples jar, not in hadoop-hdfs-2.7.7.jar.)
Running the jar with no arguments lists the available example programs:
hadoop jar E:\Python\Hadoop\hadooptestfile\Hadoop-On-Window-master\hadoop-mapreduce-examples-2.7.7.jar
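What the wordcount job computes can be sketched in a few lines of Python (a simplified stand-in, not the jar's actual code; the sample text is made up). The job writes lines of "word<TAB>count" to files such as part-r-00000 under the output directory:

```python
# Minimal re-implementation of the wordcount example's logic:
# split the input into words, count occurrences, emit "word<TAB>count".
from collections import Counter

text = "hello hadoop hello world"   # hypothetical input contents
counts = Counter(text.split())

for word, n in sorted(counts.items()):
    print(f"{word}\t{n}")
# hadoop	1
# hello	2
# world	1
```

The real job distributes the splitting (map) and summing (reduce) across the cluster, but the output format is the same tab-separated pairs you can inspect with hadoop fs -cat /output_dir/part-r-00000.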
If jobs hang and make no progress,
add the following to "E:\Python\Hadoop\hadoop-2.7.7\etc\hadoop\yarn-site.xml":
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
# Backup (alternative settings)
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>20480</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
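As a rough sanity check on the first set of values (minimum allocation 1024 MB, node memory 4096 MB), the arithmetic below shows how many minimum-sized containers the NodeManager could host; this is an approximation, since the scheduler also rounds each request up to a multiple of the minimum allocation:

```python
# Container math implied by the yarn-site.xml settings above.
memory_mb = 4096       # yarn.nodemanager.resource.memory-mb
min_alloc_mb = 1024    # yarn.scheduler.minimum-allocation-mb
vcores = 2             # yarn.nodemanager.resource.cpu-vcores

# At the minimum container size, memory allows this many containers:
max_containers_by_memory = memory_mb // min_alloc_mb
print(max_containers_by_memory)  # 4
```

If memory-mb were smaller than minimum-allocation-mb, no container could ever be scheduled, which is one common cause of jobs that accept but never run.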
Cluster status can be checked in the web UI:
http://localhost:50070/   (NameNode/HDFS UI; node management is shown in the YARN ResourceManager UI, typically http://localhost:8088/)