This is an old draft that sat around untouched; I got busy and never came back to it. Someone asked about it recently, so I'm publishing it as-is. It's not especially well written.
1. Download Hadoop
Pick a version from the download page: http://hadoop.apache.org/releases.html
You can also get it here: http://download.csdn.net/detail/jack5261314/6896011 --> hadoop-1.2.1-bin.tar.gz
2. Installation
* A prerequisite for Hadoop is the Sun (Oracle) JDK, not OpenJDK. Some distribution packages depend on OpenJDK, but it is not a good fit for development here, so we replace it.
2.1) Remove OpenJDK
[root@localhost ~]# rpm -qa|grep jdk
java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# rpm -qa|grep gcj
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
libgcj-4.1.2-48.el5
[root@localhost ~]# yum -y remove java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
[root@localhost ~]# yum -y remove java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
[root@localhost ~]# yum -y remove java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
[root@localhost ~]# yum -y remove libgcj-4.1.2-48.el5
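The four removals above can be collapsed into a single pipeline: list installed packages, filter for OpenJDK/GCJ, and feed the result to yum. A sketch (the yum line is shown commented out since it needs root and your package names will differ; the filtering step is demonstrated on sample names):

```shell
# One-shot removal of all OpenJDK/GCJ packages (run as root):
#   rpm -qa | grep -Ei 'openjdk|gcj' | xargs -r yum -y remove
# The grep filter itself, demonstrated on sample package names;
# note that an unrelated package does NOT match:
printf '%s\n' \
  java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64 \
  java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64 \
  libgcj-4.1.2-48.el5 \
  tzdata-2013g-1.el6 \
  | grep -Ei 'openjdk|gcj'
```

`xargs -r` makes sure yum is not invoked at all when the grep finds nothing.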
After this, java -version finds nothing, and Eclipse is gone as well (it was removed along with the OpenJDK packages).
2.2) Install the Sun JDK
rpm -ivh jdk-7-linux-x64.rpm   # the RPM package is recommended
(Note: the rest of this guide uses JAVA_HOME=/usr/java/jdk1.8.0_45, so install the RPM that matches the version you actually want.)
The JDK installs into /usr/java by default.
2.3) Configure the Sun JDK
vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
2.4) Install Hadoop
tar -xzf hadoop-1.2.1-bin.tar.gz -C /usr/local   # unpack
ln -s /usr/local/hadoop-1.2.1 /opt/hadoop   # symlink, so everything can use one stable path
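The point of the symlink is that /opt/hadoop stays stable across upgrades: you unpack a new version next to the old one and just repoint the link, leaving PATH and configuration untouched. A throwaway demo of the pattern (the directory names are made up for illustration):

```shell
# Demonstrate the stable-path symlink pattern in a temp directory
tmp=$(mktemp -d)
mkdir "$tmp/hadoop-1.2.1" "$tmp/hadoop-1.2.2"
ln -s "$tmp/hadoop-1.2.1" "$tmp/hadoop"     # initial install
ln -sfn "$tmp/hadoop-1.2.2" "$tmp/hadoop"   # later upgrade: repoint the link only
readlink "$tmp/hadoop"                      # shows the 1.2.2 path
rm -rf "$tmp"
```

The -n flag matters: without it, ln would follow the existing link and create the new link *inside* the old target directory instead of replacing the link itself.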
vim /etc/profile
export HADOOP_HOME=/opt/hadoop
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin
source /etc/profile   # apply the changes
That completes the basic installation, but running hadoop prints a warning. It is harmless and does not get in the way of development:
[root@kong Desktop]# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[root@kong Desktop]# hadoop
Warning: $HADOOP_HOME is deprecated.
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  oiv                  apply the offline fsimage viewer to an fsimage
  fetchdt              fetch a delegation token from the NameNode
If you want to get rid of the warning, edit /etc/profile again:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1   --> this is the key line
export JAVA_HOME=/usr/java/jdk1.8.0_45
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin
source /etc/profile
The warning is gone:
[root@kong Desktop]# hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
3. A quick test
3.1) Estimate pi
hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar pi 4 1000
This job creates four map tasks to estimate pi. It is about as simple as MapReduce jobs get.
[root@localhost home]# hadoop jar /opt/hadoop/hadoop-examples-1.2.1.jar pi 4 1000
Number of Maps  = 4
Samples per Map = 1000
15/05/17 21:49:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
15/05/17 21:49:45 INFO mapred.FileInputFormat: Total input paths to process : 4
15/05/17 21:49:46 INFO mapred.JobClient: Running job: job_local630184568_0001
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Waiting for map tasks
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Starting task: attempt_local630184568_0001_m_000000_0
15/05/17 21:49:46 INFO util.ProcessTree: setsid exited with exit code 0
15/05/17 21:49:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@45f1cdad
15/05/17 21:49:46 INFO mapred.MapTask: Processing split: file:/home/PiEstimator_TMP_3_141592654/in/part0:0+118
15/05/17 21:49:46 INFO mapred.MapTask: numReduceTasks: 1
15/05/17 21:49:46 INFO mapred.MapTask: io.sort.mb = 100
15/05/17 21:49:46 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/17 21:49:46 INFO mapred.MapTask: record buffer = 262144/327680
15/05/17 21:49:46 INFO mapred.MapTask: Starting flush of map output
15/05/17 21:49:46 INFO mapred.MapTask: Finished spill 0
15/05/17 21:49:46 INFO mapred.Task: Task:attempt_local630184568_0001_m_000000_0 is done. And is in the process of commiting
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Generated 1000 samples.
15/05/17 21:49:46 INFO mapred.Task: Task 'attempt_local630184568_0001_m_000000_0' done.
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Finishing task: attempt_local630184568_0001_m_000000_0
15/05/17 21:49:46 INFO mapred.LocalJobRunner: Starting task: attempt_local630184568_0001_m_000001_0
15/05/17 21:49:46 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1abdca6a
15/05/17 21:49:46 INFO mapred.MapTask: Processing split: file:/home/PiEstimator_TMP_3_141592654/in/part1:0+118
15/05/17 21:49:46 INFO mapred.MapTask: numReduceTasks: 1
15/05/17 21:49:46 INFO mapred.MapTask: io.sort.mb = 100
15/05/17 21:49:47 INFO mapred.JobClient:  map 25% reduce 0%
15/05/17 21:49:47 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/17 21:49:47 INFO mapred.MapTask: record buffer = 262144/327680
15/05/17 21:49:47 INFO mapred.MapTask: Starting flush of map output
15/05/17 21:49:47 INFO mapred.MapTask: Finished spill 0
15/05/17 21:49:47 INFO mapred.Task: Task:attempt_local630184568_0001_m_000001_0 is done. And is in the process of commiting
15/05/17 21:49:47 INFO mapred.LocalJobRunner: Generated 1000 samples.
15/05/17 21:49:47 INFO mapred.Task: Task 'attempt_local630184568_0001_m_000001_0' done.
15/05/17 21:49:47 INFO mapred.LocalJobRunner: Finishing task: attempt_local630184568_0001_m_000001_0
15/05/17 21:49:47 INFO mapred.LocalJobRunner: Starting task: attempt_local630184568_0001_m_000002_0
15/05/17 21:49:47 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@142af1bd
15/05/17 21:49:47 INFO mapred.MapTask: Processing split: file:/home/PiEstimator_TMP_3_141592654/in/part2:0+118
15/05/17 21:49:47 INFO mapred.MapTask: numReduceTasks: 1
15/05/17 21:49:47 INFO mapred.MapTask: io.sort.mb = 100
15/05/17 21:49:48 INFO mapred.JobClient:  map 50% reduce 0%
15/05/17 21:49:49 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/17 21:49:49 INFO mapred.MapTask: record buffer = 262144/327680
15/05/17 21:49:49 INFO mapred.MapTask: Starting flush of map output
15/05/17 21:49:49 INFO mapred.MapTask: Finished spill 0
15/05/17 21:49:49 INFO mapred.Task: Task:attempt_local630184568_0001_m_000002_0 is done. And is in the process of commiting
15/05/17 21:49:49 INFO mapred.LocalJobRunner: Generated 1000 samples.
15/05/17 21:49:49 INFO mapred.Task: Task 'attempt_local630184568_0001_m_000002_0' done.
15/05/17 21:49:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local630184568_0001_m_000002_0
15/05/17 21:49:49 INFO mapred.LocalJobRunner: Starting task: attempt_local630184568_0001_m_000003_0
15/05/17 21:49:49 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2ff0a79d
15/05/17 21:49:49 INFO mapred.MapTask: Processing split: file:/home/PiEstimator_TMP_3_141592654/in/part3:0+118
15/05/17 21:49:49 INFO mapred.MapTask: numReduceTasks: 1
15/05/17 21:49:49 INFO mapred.MapTask: io.sort.mb = 100
15/05/17 21:49:50 INFO mapred.JobClient:  map 75% reduce 0%
15/05/17 21:49:50 INFO mapred.MapTask: data buffer = 79691776/99614720
15/05/17 21:49:50 INFO mapred.MapTask: record buffer = 262144/327680
15/05/17 21:49:50 INFO mapred.MapTask: Starting flush of map output
15/05/17 21:49:50 INFO mapred.MapTask: Finished spill 0
15/05/17 21:49:50 INFO mapred.Task: Task:attempt_local630184568_0001_m_000003_0 is done. And is in the process of commiting
15/05/17 21:49:50 INFO mapred.LocalJobRunner: Generated 1000 samples.
15/05/17 21:49:50 INFO mapred.Task: Task 'attempt_local630184568_0001_m_000003_0' done.
15/05/17 21:49:50 INFO mapred.LocalJobRunner: Finishing task: attempt_local630184568_0001_m_000003_0
15/05/17 21:49:50 INFO mapred.LocalJobRunner: Map task executor complete.
15/05/17 21:49:50 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@47588226
15/05/17 21:49:50 INFO mapred.LocalJobRunner:
15/05/17 21:49:50 INFO mapred.Merger: Merging 4 sorted segments
15/05/17 21:49:50 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 96 bytes
15/05/17 21:49:50 INFO mapred.LocalJobRunner:
15/05/17 21:49:50 INFO mapred.Task: Task:attempt_local630184568_0001_r_000000_0 is done. And is in the process of commiting
15/05/17 21:49:50 INFO mapred.LocalJobRunner:
15/05/17 21:49:50 INFO mapred.Task: Task attempt_local630184568_0001_r_000000_0 is allowed to commit now
15/05/17 21:49:50 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local630184568_0001_r_000000_0' to file:/home/PiEstimator_TMP_3_141592654/out
15/05/17 21:49:50 INFO mapred.LocalJobRunner: reduce > reduce
15/05/17 21:49:50 INFO mapred.Task: Task 'attempt_local630184568_0001_r_000000_0' done.
15/05/17 21:49:51 INFO mapred.JobClient:  map 100% reduce 100%
15/05/17 21:49:51 INFO mapred.JobClient: Job complete: job_local630184568_0001
15/05/17 21:49:51 INFO mapred.JobClient: Counters: 21
15/05/17 21:49:51 INFO mapred.JobClient:   Map-Reduce Framework
15/05/17 21:49:51 INFO mapred.JobClient:     Spilled Records=16
15/05/17 21:49:51 INFO mapred.JobClient:     Map output materialized bytes=112
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce input records=8
15/05/17 21:49:51 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
15/05/17 21:49:51 INFO mapred.JobClient:     Map input records=4
15/05/17 21:49:51 INFO mapred.JobClient:     SPLIT_RAW_BYTES=400
15/05/17 21:49:51 INFO mapred.JobClient:     Map output bytes=72
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/05/17 21:49:51 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
15/05/17 21:49:51 INFO mapred.JobClient:     Map input bytes=96
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce input groups=8
15/05/17 21:49:51 INFO mapred.JobClient:     Combine output records=0
15/05/17 21:49:51 INFO mapred.JobClient:     Reduce output records=0
15/05/17 21:49:51 INFO mapred.JobClient:     Map output records=8
15/05/17 21:49:51 INFO mapred.JobClient:     Combine input records=0
15/05/17 21:49:51 INFO mapred.JobClient:     CPU time spent (ms)=0
15/05/17 21:49:51 INFO mapred.JobClient:     Total committed heap usage (bytes)=1698168832
15/05/17 21:49:51 INFO mapred.JobClient:   File Input Format Counters
15/05/17 21:49:51 INFO mapred.JobClient:     Bytes Read=520
15/05/17 21:49:51 INFO mapred.JobClient:   FileSystemCounters
15/05/17 21:49:51 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=981494
15/05/17 21:49:51 INFO mapred.JobClient:     FILE_BYTES_READ=721813
15/05/17 21:49:51 INFO mapred.JobClient:   File Output Format Counters
15/05/17 21:49:51 INFO mapred.JobClient:     Bytes Written=109
Job Finished in 5.846 seconds
Estimated value of Pi is 3.14000000000000000000
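The PiEstimator job works by scattering random points in the unit square and counting how many land inside the quarter circle; pi is then roughly 4 × inside/total. The same idea, minus Hadoop, fits in a one-screen awk sketch:

```shell
# Monte Carlo estimate of pi, the same idea as the PiEstimator example job:
# sample random points in the unit square and count those that fall
# inside the quarter circle of radius 1.
awk 'BEGIN {
  srand(1)                 # fixed seed, repeatable-ish run
  n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++
  }
  printf "%.4f\n", 4 * inside / n   # prints a value near 3.14
}'
```

On the Hadoop side, each of the four map tasks does this sampling independently (1000 samples per map here) and the single reduce task combines the inside/outside counts, which is how the run above arrives at its Pi ≈ 3.14 line.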
3.2) A note: keep the network reachable while the job runs. I once put several VMs on a host-only virtual network (no route to the outside) to try a small distributed setup, and the job failed. So make sure the network is up.
The steps above work the same on CentOS, Fedora, and Red Hat.