1. Install and verify the JDK (skipped here).
2. Set up SSH (script provided by 牛哥; not yet tested):
#!/bin/sh
# Regenerate user admin's SSH key and authorize it for passwordless login.
cd /home/admin
rm -rfv .ssh
mkdir .ssh
chmod 700 .ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod go-rwx ~/.ssh/authorized_keys
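Why the chmod steps matter can be illustrated on a throwaway directory (a demo only, not part of the installation): with its default StrictModes setting, sshd refuses key-based login when ~/.ssh or authorized_keys is writable by group or other.

```shell
# Demonstrate the permissions sshd expects, using a temp dir so this is
# safe to run anywhere (stat -c assumes GNU coreutils, i.e. Linux).
demo=$(mktemp -d)
mkdir -m 700 "$demo/.ssh"
touch "$demo/.ssh/authorized_keys"
chmod go-rwx "$demo/.ssh/authorized_keys"
dir_perm=$(stat -c '%a' "$demo/.ssh")
key_perm=$(stat -c '%a' "$demo/.ssh/authorized_keys")
echo "dir=$dir_perm key=$key_perm"
rm -r "$demo"
```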
3. Configure SSH communication between the nodes (distribute the public key so each machine can log in to the others without a password).
4. On the Windows side, edit the configuration files under the conf directory.
hadoop-env.sh: set the JDK path.
The file already contains this line; just uncomment it:
export JAVA_HOME=/usr/java/jdk1.6.0_20
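Before moving on, it is worth confirming that this path really contains a JDK; a quick hedged check (the path is the one from hadoop-env.sh above and will differ on other machines):

```shell
# Sanity-check that JAVA_HOME points at a usable JDK.
JAVA_HOME=/usr/java/jdk1.6.0_20
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "no JDK found at $JAVA_HOME"
fi
```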
core-site.xml, changed to:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.10.18.9:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/xiaoj/hadoop/tmp</value>
  </property>
</configuration>
mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- host:port, not an http:// URL -->
    <value>10.10.18.9:9001</value>
  </property>
</configuration>
hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/xiaoj/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/xiaoj/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Disable the permissions check -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
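One value in hdfs-site.xml deserves a second look: dfs.replication should not exceed the number of datanodes, or every block will be reported under-replicated. A small sketch of that check, with the counts taken from the config files in this document:

```shell
# dfs.replication is 3 above, but the slaves file lists only two datanodes
# (10.10.18.7 and 10.10.18.8); HDFS cannot place 3 replicas on 2 nodes.
replication=3
datanodes=2
if [ "$replication" -gt "$datanodes" ]; then
  echo "warning: dfs.replication ($replication) exceeds datanode count ($datanodes)"
fi
```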
masters file: set the namenode's IP
10.10.18.9
slaves file: set the datanode IPs
10.10.18.7
10.10.18.8
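Before continuing, it can help to confirm that every host listed in slaves is reachable. A minimal sketch (the temp file stands in for conf/slaves, and the echo makes this a dry run; drop it to actually probe the hosts):

```shell
# Iterate over the hosts in a slaves-style file and print a probe command
# for each one.
slaves=$(mktemp)
printf '10.10.18.7\n10.10.18.8\n' > "$slaves"
count=0
while read -r host; do
  echo ping -c 1 "$host"
  count=$((count + 1))
done < "$slaves"
rm -f "$slaves"
```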
That completes the configuration file changes.
Next, upload the hadoop folder to the virtual machine. As root:

adduser xiaoj                  # create user xiaoj
chown -R xiaoj:xiaoj hadoop    # change the folder's owner
su xiaoj
scp -r hadoop 10.10.18.9:~/    # remote copy (-r because hadoop is a directory); enter the password when prompted

Log out of the VM, log in as xiaoj@10.10.18.9, and copy the folder to the datanodes:

scp -r hadoop 10.10.18.7:~/
scp -r hadoop 10.10.18.8:~/
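The per-node copies above can be generalized into a loop over the datanode list (hosts hard-coded to match the slaves file; the echo turns this into a dry run, so remove it to actually copy):

```shell
# Push the hadoop directory to every datanode in one loop.
nodes="10.10.18.7 10.10.18.8"
for h in $nodes; do
  echo scp -r hadoop "$h":~/
done
```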
Enter the Hadoop installation directory.
1. Format the namenode:
./bin/hadoop namenode -format
Formatting creates files under /tmp and under the configured tmp directory. Files in /tmp may be deleted on shutdown, so the namenode may need to be formatted again after a reboot (pointing hadoop.tmp.dir at a persistent path, as configured above, avoids this).
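Since re-formatting wipes the HDFS metadata, one way to guard this step is to test for the VERSION file that a successful format leaves behind (in Hadoop 1.x it sits under dfs.name.dir/current; treat this as a sketch, with the path taken from hdfs-site.xml above):

```shell
# Skip formatting if the namenode metadata directory already exists.
NAME_DIR=/home/xiaoj/hadoop/name
if [ -f "$NAME_DIR/current/VERSION" ]; then
  echo "namenode already formatted; skipping"
else
  echo "namenode not formatted; run ./bin/hadoop namenode -format"
fi
```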
2. Start Hadoop:
./bin/start-all.sh
This starts the namenode, datanodes, jobtracker, tasktrackers, and the secondary namenode:
[hadoop@y176 conf]$ start-all.sh
starting namenode, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-namenode-y176.out
172.19.121.163: starting datanode, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-datanode-y163.out
172.19.121.162: starting datanode, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-datanode-y162.out
172.19.121.176: starting secondarynamenode, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-y176.out
starting jobtracker, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-y176.out
172.19.121.163: starting tasktracker, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-y163.out
172.19.121.162: starting tasktracker, logging to /usr/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-y162.out
[hadoop@y176 conf]$
Inspect the file system:
[hadoop@y176 conf]$ hadoop dfs -ls
Found 2 items
-rw-r--r-- 3 hadoop supergroup 7368962 2012-12-23 22:27 /user/hadoop/input
drwxr-xr-x - hadoop supergroup 0 2012-12-24 01:47 /user/hadoop/output
[hadoop@y176 conf]$
Add the file core-site.xml to DFS as `intput` (a typo for input, kept as typed throughout this session; since no directory named intput exists, -put stores it as the file /user/hadoop/intput):
[hadoop@y176 conf]$ hadoop dfs -put core-site.xml intput
[hadoop@y176 conf]$ hadoop dfs -ls
Found 3 items
-rw-r--r-- 3 hadoop supergroup 7368962 2012-12-23 22:27 /user/hadoop/input
-rw-r--r-- 3 hadoop supergroup 369 2012-12-24 19:08 /user/hadoop/intput
drwxr-xr-x - hadoop supergroup 0 2012-12-24 01:47 /user/hadoop/output
[hadoop@y176 conf]$
Run the wordcount example that ships with Hadoop to check that the cluster works (the misspelled intput/ouput paths match the names created above):
[hadoop@y176 hadoop]$ hadoop jar hadoop-examples-1.1.1.jar wordcount intput ouput
12/12/24 19:11:09 INFO input.FileInputFormat: Total input paths to process : 1
12/12/24 19:11:09 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/12/24 19:11:09 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/24 19:11:10 INFO mapred.JobClient: Running job: job_201212241902_0002
12/12/24 19:11:11 INFO mapred.JobClient: map 0% reduce 0%
12/12/24 19:11:21 INFO mapred.JobClient: map 100% reduce 0%
12/12/24 19:11:33 INFO mapred.JobClient: map 100% reduce 33%
12/12/24 19:11:34 INFO mapred.JobClient: map 100% reduce 100%
12/12/24 19:11:35 INFO mapred.JobClient: Job complete: job_201212241902_0002
12/12/24 19:11:35 INFO mapred.JobClient: Counters: 29
12/12/24 19:11:35 INFO mapred.JobClient: Job Counters
12/12/24 19:11:35 INFO mapred.JobClient: Launched reduce tasks=1
12/12/24 19:11:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8860
12/12/24 19:11:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/24 19:11:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/12/24 19:11:35 INFO mapred.JobClient: Launched map tasks=1
12/12/24 19:11:35 INFO mapred.JobClient: Data-local map tasks=1
12/12/24 19:11:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=12515
12/12/24 19:11:35 INFO mapred.JobClient: File Output Format Counters
12/12/24 19:11:35 INFO mapred.JobClient: Bytes Written=372
12/12/24 19:11:35 INFO mapred.JobClient: FileSystemCounters
12/12/24 19:11:35 INFO mapred.JobClient: FILE_BYTES_READ=466
12/12/24 19:11:35 INFO mapred.JobClient: HDFS_BYTES_READ=479
12/12/24 19:11:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=48729
12/12/24 19:11:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=372
12/12/24 19:11:35 INFO mapred.JobClient: File Input Format Counters
12/12/24 19:11:35 INFO mapred.JobClient: Bytes Read=369
12/12/24 19:11:35 INFO mapred.JobClient: Map-Reduce Framework
12/12/24 19:11:35 INFO mapred.JobClient: Map output materialized bytes=466
12/12/24 19:11:35 INFO mapred.JobClient: Map input records=14
12/12/24 19:11:35 INFO mapred.JobClient: Reduce shuffle bytes=466
12/12/24 19:11:35 INFO mapred.JobClient: Spilled Records=44
12/12/24 19:11:35 INFO mapred.JobClient: Map output bytes=447
12/12/24 19:11:35 INFO mapred.JobClient: CPU time spent (ms)=1180
12/12/24 19:11:35 INFO mapred.JobClient: Total committed heap usage (bytes)=208404480
12/12/24 19:11:35 INFO mapred.JobClient: Combine input records=24
12/12/24 19:11:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=110
12/12/24 19:11:35 INFO mapred.JobClient: Reduce input records=22
12/12/24 19:11:35 INFO mapred.JobClient: Reduce input groups=22
12/12/24 19:11:35 INFO mapred.JobClient: Combine output records=22
12/12/24 19:11:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=178487296
12/12/24 19:11:35 INFO mapred.JobClient: Reduce output records=22
12/12/24 19:11:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=753090560
12/12/24 19:11:35 INFO mapred.JobClient: Map output records=24
[hadoop@y176 hadoop]$
As the command shows, the results are saved in the `ouput` directory:
[hadoop@y176 hadoop]$ hadoop dfs -ls
Found 4 items
-rw-r--r-- 3 hadoop supergroup 7368962 2012-12-23 22:27 /user/hadoop/input
-rw-r--r-- 3 hadoop supergroup 369 2012-12-24 19:08 /user/hadoop/intput
drwxr-xr-x - hadoop supergroup 0 2012-12-24 19:11 /user/hadoop/ouput
drwxr-xr-x - hadoop supergroup 0 2012-12-24 01:47 /user/hadoop/output
[hadoop@y176 hadoop]$
Download the output folder to the local machine:
[hadoop@y176 hadoop]$ hadoop dfs -get ouput ouput
Check the local `ouput` folder:
[hadoop@y176 hadoop]$ ll |grep ouput
drwxrwxr-x. 3 hadoop hadoop 4096 Dec 24 19:13 ouput
[hadoop@y176 hadoop]$
View the analysis results:
[hadoop@y176 hadoop]$ cd ouput/
[hadoop@y176 ouput]$ ll
total 8
drwxrwxr-x. 3 hadoop hadoop 4096 Dec 24 19:13 _logs
-rw-rw-r--. 1 hadoop hadoop 372 Dec 24 19:13 part-r-00000
-rw-rw-r--. 1 hadoop hadoop 0 Dec 24 19:13 _SUCCESS
View the last 10 lines of the file:
[hadoop@y176 ouput]$ tail -n 10 part-r-00000
Put	1
file.	1
href="configuration.xsl"?>	1
in	1
overrides	1
property	1
site-specific	1
this	1
type="text/xsl"	1
version="1.0"?>	1
[hadoop@y176 ouput]$
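The output format above (one word and its count per line, tab-separated) can be reproduced locally with coreutils, which is a handy way to cross-check a small wordcount run. A sketch on throwaway input:

```shell
# Local equivalent of wordcount: split on whitespace, sort, count.
input=$(mktemp)
printf 'a b a\nc b a\n' > "$input"
result=$(tr -s ' ' '\n' < "$input" | sort | uniq -c | awk '{print $2 "\t" $1}')
echo "$result"
rm -f "$input"
```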
Stop Hadoop:
[hadoop@y176 ~]$ stop-all.sh
stopping jobtracker
172.19.121.163: stopping tasktracker
172.19.121.162: stopping tasktracker
stopping namenode
172.19.121.162: stopping datanode
172.19.121.163: stopping datanode
172.19.121.176: stopping secondarynamenode
[hadoop@y176 ~]$ jps
8039 Jps
[hadoop@y176 ~]$
This completes the Hadoop installation.