Operating on HDFS:
Note: the NameNode must be formatted once before HDFS is started for the first time, otherwise it will not start. Do not reformat before every start: formatting wipes the HDFS metadata and can leave the DataNodes with a mismatched clusterID.
That is:
hadoop namenode -format #format once during setup; partway through it may ask for confirmation, answer y
start-dfs.sh
start-yarn.sh #start-all.sh also works, but is deprecated
Use jps to check whether the NameNode, DataNode, and the other daemons are all running.
hadoop fs -mkdir /input #create the input directory on HDFS
echo "hello adu hello world" > file #create a local file
hadoop fs -put file /input #upload it to HDFS
The state of HDFS can be inspected at localhost:50070 (Utilities > Browse the file system, in the upper right of the page).
Contents of pom.xml in IDEA:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.adu</groupId>
    <artifactId>hadoop</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>apache</id>
            <url>https://repo.maven.apache.org/maven2</url>
        </repository>
    </repositories>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <configuration>
                    <excludeTransitive>false</excludeTransitive>
                    <stripVersion>true</stripVersion>
                    <outputDirectory>./lib</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
The map and reduce functions are the same as the code in the previous chapters.
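For reference, the core of that word-count logic can be sketched in plain Java, without the Hadoop Mapper/Reducer classes (the class name WordCountSketch is illustrative; in the real job, the splitting happens in map() and the summing in reduce()):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // "map" phase: split a line into words; "reduce" phase: sum the
    // count per word. Here both are collapsed into one in-memory map,
    // whereas Hadoop emits (word, 1) pairs and sums them per key.
    public static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // the sample line uploaded to HDFS above
        System.out.println(count("hello adu hello world"));
    }
}
```

Running it on the uploaded sample line yields hello=2, adu=1, world=1, which is what the MapReduce job should also produce.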
Click Run > Edit Configurations.
Set Program arguments to: hdfs://localhost:9000/input/file hdfs://localhost:9000/output
Everything else is the same as the previous chapter. Note the single space between the two paths; the output directory must not be created in advance (the job creates it and fails if it already exists), and file is the file uploaded earlier.
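The two space-separated program arguments arrive in the driver's main as args[0] and args[1]. A minimal sketch of the argument handling (the class name WordCountDriver and the check itself are illustrative; the actual Job setup is the one from the previous chapters):

```java
public class WordCountDriver {
    // args[0] is the HDFS input file, args[1] the not-yet-existing output dir.
    public static String[] requireInOut(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: <input path> <output path>");
        }
        return args;
    }

    public static void main(String[] args) {
        String[] io = requireInOut(args);
        // In the real driver these paths would be passed to
        // FileInputFormat.addInputPath(job, new Path(io[0])) and
        // FileOutputFormat.setOutputPath(job, new Path(io[1])).
        System.out.println("input:  " + io[0]);
        System.out.println("output: " + io[1]);
    }
}
```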