Prerequisites: deploy a single-node Hadoop cluster on a Linux machine. Java and ssh must be installed, and the sshd service must be running; see the separate note on enabling the sshd service.
Steps:
1. Download a Hadoop distribution from the Apache website and unpack it.
2. In the etc/hadoop/hadoop-env.sh file, make the following edit (no spaces around the =; fill in the path to your own JDK):
export JAVA_HOME=/usr/java/latest
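It is worth confirming that the path actually points at a JDK before going further; a minimal sanity check (/usr/java/latest is only an example path, substitute your own):

```shell
# Sanity check: make sure the JAVA_HOME exported in hadoop-env.sh
# points at a real JVM. The path below is an example, not a given.
JAVA_HOME=/usr/java/latest
if [ -x "$JAVA_HOME/bin/java" ]; then
    "$JAVA_HOME/bin/java" -version
else
    echo "JAVA_HOME does not point at a JDK: $JAVA_HOME" >&2
fi
```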
3. Run bin/hadoop in a shell. It prints the usage documentation for the hadoop script:
[hadoop@ip-172-199-0-15 hadoop]$ bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
4. Edit the Hadoop configuration files:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
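If you prefer to script these edits rather than open an editor, the files can be written with a heredoc; a minimal sketch for core-site.xml (HADOOP_DIR is a placeholder for your unpacked distribution directory):

```shell
# Sketch: write etc/hadoop/core-site.xml non-interactively.
# HADOOP_DIR is an example variable; point it at your unpacked
# Hadoop distribution (it falls back to a scratch dir for the demo).
HADOOP_DIR="${HADOOP_DIR:-$(mktemp -d)}"
mkdir -p "$HADOOP_DIR/etc/hadoop"
cat > "$HADOOP_DIR/etc/hadoop/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
```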
5. Set up passwordless ssh login:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
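The chmod step matters: sshd silently ignores an authorized_keys file whose permissions are too loose. A quick way to confirm the mode took effect (a sanity check using GNU stat, not part of the official steps):

```shell
# Print the octal permission bits; authorized_keys should report 600
# after the chmod above. (Uses GNU coreutils `stat`.)
stat -c '%a %n' ~/.ssh/authorized_keys 2>/dev/null || echo "no authorized_keys yet"
```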
To test it:
$ ssh localhost
6. Format the filesystem:
$ bin/hdfs namenode -format
7. Start the HDFS NameNode and DataNode daemons:
$ sbin/start-dfs.sh
8. Create the HDFS directories required to run MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
9. Upload local files (everything under etc/hadoop) into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
10. Run one of the example jobs bundled with the distribution. This grep job scans the files in input for strings matching the given regular expression and writes the match counts to the output directory on the DFS:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
11. Download the output files from the DFS and view the results locally:
$ bin/hdfs dfs -get output output
$ cat output/*
Or view the results directly on the DFS:
$ bin/hdfs dfs -cat output/*
12. Run the job on YARN. Start the ResourceManager and NodeManager daemons:
$ sbin/start-yarn.sh
13. Open the ResourceManager web UI:
Visit http://localhost:8088/
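Note: for the step 12 job to actually execute on YARN rather than in the local runner, the official single-node guide also sets two more properties, sketched below in the same style as step 4 (verify them against the Apache documentation for your Hadoop version):
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>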
14. Shut down the daemons:
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh
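As a closing aside, what the grep example in step 10 computes can be approximated locally with standard tools; a rough sketch on made-up sample input:

```shell
# Local analogue of the MapReduce grep example: extract every match of
# the regex dfs[a-z.]+ and count occurrences of each distinct match.
dir=$(mktemp -d)
printf 'dfs.replication is one\nno match here\ndfs.replication again\n' > "$dir/sample.txt"
grep -ohE 'dfs[a-z.]+' "$dir"/*.txt | sort | uniq -c | sort -rn
```

The real job emits the same kind of (count, matched string) pairs, just computed as a map-reduce over HDFS files.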