Installing Hadoop 1.2.1 on CentOS 6

1-> Download hadoop-1.2.1.tar.gz

tar -zxvf hadoop-1.2.1.tar.gz    # extract; this guide assumes the extracted tree ends up in /root/soft
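
If you still need the tarball, it can be fetched from the Apache archive (assuming the standard archive layout for Hadoop 1.x releases):

wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz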

2-> Create a hadoop account

groupadd hadoop

useradd -g hadoop -d /home/hadoop hadoop

chown -R hadoop:hadoop /home/hadoop

mv /root/soft/hadoop-1.2.1 /home/hadoop/hadoop-1.2.1
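
Note that the chown above ran before the move, so the moved tree is still owned by root; re-run it once more so the hadoop account owns its own installation:

chown -R hadoop:hadoop /home/hadoop/hadoop-1.2.1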

3-> Set up passwordless SSH for the hadoop account

su - hadoop

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa 
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

cd ~/.ssh && chmod 600 authorized_keys
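
Verify that passwordless login now works (the first connection only asks you to accept the host key):

ssh localhost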

4-> Install the JDK

Download jdk-8u40-linux-i586.rpm.

Install it on Linux: rpm -ivh jdk-8u40-linux-i586.rpm

After installation the JDK lands in /usr/java/jdk1.8.0_40.

Set the environment variables.

Add the following to /etc/profile:

export JAVA_HOME=/usr/java/jdk1.8.0_40

export PATH=$JAVA_HOME/bin:$PATH

In the Hadoop root directory, edit conf/hadoop-env.sh and add JAVA_HOME:

export JAVA_HOME=/usr/java/jdk1.8.0_40
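
Verify the JDK is visible before continuing:

java -version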

5-> Set the Hadoop environment variables

Add the following to /etc/profile:

export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$HADOOP_HOME/bin:$PATH
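
Reload the profile and check that the hadoop launcher resolves; hadoop version should report 1.2.1:

source /etc/profile
hadoop version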

Go into conf/ under the Hadoop root directory.

Edit the following three configuration files:

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>


conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

6-> Start Hadoop

su - hadoop
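
Before the very first start, format the NameNode; this is a one-time step in Hadoop 1.x, and HDFS will not come up without it:

hadoop namenode -format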

start-all.sh

Or run start-dfs.sh and then start-mapred.sh.
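
To confirm everything came up, jps (from the JDK) should show the five Hadoop daemons of pseudo-distributed mode:

jps

Expected: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker (plus Jps itself).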

Once startup completes, list the directories and files in the HDFS filesystem; loosely, you can think of HDFS as an FTP-like file server:

hadoop fs -ls /

Now run the hadoop-examples-1.2.1.jar test jar from the Hadoop root directory.

Running hadoop jar hadoop-examples-1.2.1.jar with no arguments shows that it bundles several example programs:

[root@localhost hadoop-1.2.1]# hadoop jar hadoop-examples-1.2.1.jar

An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
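
As a quick smoke test before the wordcount walkthrough, the pi example needs no HDFS input; its two arguments are the number of maps and the number of samples per map (the small values below are just for illustration):

hadoop jar hadoop-examples-1.2.1.jar pi 2 10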

Let's test wordcount. This MapReduce program takes the text files under an HDFS directory and counts every word that appears in them, together with how many times each word occurs.

First, create a directory on the HDFS filesystem. The / root here is the one we defined in

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

In general the full form is hadoop fs -mkdir hdfs://localhost:9000/test, but since fs.default.name supplies the prefix, the shorthand below is enough:

hadoop fs -mkdir /test

hadoop fs -mkdir /test/input

Likewise, /test/input below could be spelled out as hdfs://localhost:9000/test/input.

[hadoop@localhost hadoop-1.2.1]$ hadoop fs -put /home/hadoop/hadoop-1.2.1/conf/*.xml /test/input
[hadoop@localhost hadoop-1.2.1]$ hadoop fs -lsr /
drwxr-xr-x   - hadoop supergroup          0 2015-03-31 14:36 /test
drwxr-xr-x   - hadoop supergroup          0 2015-03-31 14:38 /test/input
-rw-r--r--   1 hadoop supergroup       7457 2015-03-31 14:38 /test/input/capacity-scheduler.xml
-rw-r--r--   1 hadoop supergroup        294 2015-03-31 14:38 /test/input/core-site.xml
-rw-r--r--   1 hadoop supergroup        327 2015-03-31 14:38 /test/input/fair-scheduler.xml
-rw-r--r--   1 hadoop supergroup       4644 2015-03-31 14:38 /test/input/hadoop-policy.xml
-rw-r--r--   1 hadoop supergroup        274 2015-03-31 14:38 /test/input/hdfs-site.xml
-rw-r--r--   1 hadoop supergroup       2033 2015-03-31 14:38 /test/input/mapred-queue-acls.xml
-rw-r--r--   1 hadoop supergroup        285 2015-03-31 14:38 /test/input/mapred-site.xml
drwxr-xr-x   - hadoop supergroup          0 2015-03-31 13:21 /tmp
drwxr-xr-x   - hadoop supergroup          0 2015-03-31 13:21 /tmp/hadoop-hadoop
drwxr-xr-x   - hadoop supergroup          0 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred
drwx------   - hadoop supergroup          0 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred/system
-rw-------   1 hadoop supergroup          4 2015-03-31 13:59 /tmp/hadoop-hadoop/mapred/system/jobtracker.info

Run the job:

hadoop jar hadoop-examples-1.2.1.jar wordcount /test/input /test/output
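
When the job finishes, the result can be read directly from HDFS. With the 1.2.1 wordcount example the output file should be named part-r-00000, but listing the directory first confirms the exact name:

hadoop fs -ls /test/output
hadoop fs -cat /test/output/part-r-00000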

You can inspect the HDFS files through http://192.168.2.88:50070/

There you can see the input directory holds 7 files:

Name                    Type  Size     Replication  Block Size  Modification Time  Permission  Owner   Group
capacity-scheduler.xml  file  7.28 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
core-site.xml           file  0.29 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
fair-scheduler.xml      file  0.32 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
hadoop-policy.xml       file  4.54 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
hdfs-site.xml           file  0.27 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
mapred-queue-acls.xml   file  1.99 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup
mapred-site.xml         file  0.28 KB  1            64 MB       2015-03-31 14:38   rw-r--r--   hadoop  supergroup

You can check the MapReduce job through http://192.168.2.88:50030/

User: hadoop
Job Name: word count
Job File: hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201503311441_0001/job.xml
Submit Host: localhost.localdomain
Submit Host Address: 127.0.0.1
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Succeeded
Started at: Tue Mar 31 14:45:01 PDT 2015
Finished at: Tue Mar 31 14:45:40 PDT 2015
Finished in: 38sec
Job Cleanup: Successful


Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     7          0        0        7         0       0 / 0
reduce  100.00%     1          0        0        1         0       0 / 0

This result shows concretely what map and reduce are. The input directory holds 7 files, so 7 map tasks were launched, one per file.

Each map task independently counts the words, and the occurrences of each word, in its own file.

Finally a single reduce pass merges the 7 partial results: the intermediate output is sorted so that identical words from all the maps land together, and their counts are summed. That merge-and-aggregate step is the reduce.
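
For intuition, here is a rough single-machine analogy in plain shell (not how Hadoop runs, just the same word-count idea: tr plays the mapper, sort the shuffle, uniq -c the reducer; input.txt is a placeholder file):

cat input.txt | tr -s ' ' '\n' | sort | uniq -c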

The map tasks are executed by TaskTrackers, and the file blocks a TaskTracker processes live on DataNodes, so a TaskTracker and a DataNode normally run on the same machine (this is what gives data locality).

The JobTracker can be thought of as the coordinator above the TaskTrackers: it decides how many map and reduce tasks to run and schedules them onto the TaskTrackers.


For a cluster installation, see:

http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html
