A Small Hadoop Cluster on 3 Virtual Machines

Deployment environment:


OS: Red Hat Enterprise Linux Client release 5.5

JDK: JDK 1.6.0_23

Hadoop: Hadoop 0.20.2

VMware: 8.0


Network environment:

192.168.2.86   namenode, secondary namenode

192.168.4.123  datanode

192.168.4.124  datanode


Configuration and installation steps:

1. Network configuration

(1) Turn off the firewall on every host in the cluster: service iptables stop
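
Stopping the service only lasts until the next reboot; on RHEL 5 the firewall can also be disabled permanently with, for example:

chkconfig iptables off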


(2) Append the following to /etc/hosts on every host:

192.168.2.86  master
192.168.4.123  slave1
192.168.4.124  slave2
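
A quick way to confirm the names resolve is to ping each entry from every host (hostnames from the table above):

ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2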


2. Set up passwordless SSH trust


(1) Edit /etc/ssh/sshd_config (for example with vi)

Only the uncommented settings below differ from the stock RHEL 5 sshd_config (the remaining, commented-out lines keep their default values and are omitted here):

Protocol 2
SyslogFacility AUTHPRIV
PasswordAuthentication no
AuthorizedKeysFile .ssh/authorized_keys
ChallengeResponseAuthentication no
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes
UsePAM yes
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL
X11Forwarding yes
Subsystem    sftp    /usr/libexec/openssh/sftp-server

The settings that matter for key-based login are AuthorizedKeysFile .ssh/authorized_keys, PasswordAuthentication no and ChallengeResponseAuthentication no; public-key authentication itself remains enabled through the (commented) default PubkeyAuthentication yes.
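
The post does not mention it, but sshd only picks up these changes after the service is restarted:

service sshd restart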

(2) Generate the key pair

Generate the keys in the user's home directory:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Copy the authorized_keys file to the two slave hosts:

scp authorized_keys slave1:~/.ssh/  
scp authorized_keys slave2:~/.ssh/ 
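
This assumes ~/.ssh already exists on the slaves. If the passwordless login tested below still prompts for a password, the usual cause is permissions that sshd refuses to accept; tightening them on each host generally fixes it:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys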


(3) Start the ssh-agent

ssh-agent bash --login -i

Then load the private key into the agent: ssh-add


[What is ssh-agent?]

ssh-agent is a small agent that holds private keys; keys are added to it with ssh-add, and any client of the agent can then use them.
Benefit 1: no repeated passphrase prompts. When a passphrase-protected key is added with ssh-add, the passphrase is asked for once; after that ssh-agent uses the key directly without further prompts.
Benefit 2: no need to deploy the private key everywhere. Suppose the key can log in to hosts A and B on the same internal network, but for some reason B cannot be reached directly. You could copy the key onto A or set up port forwarding, but you can also forward the authentication agent connection and log in to B from A using the key held by the local ssh-agent.
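
As an illustration of the second point (the -A flag is standard OpenSSH agent forwarding; hostA and hostB are just placeholder names):

ssh -A root@hostA
ssh root@hostB        (run on hostA; authentication is forwarded back to the local ssh-agent)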


Once this is configured, test it by sshing from master to slave1 and slave2. If the connection succeeds without a password prompt, the SSH trust is set up correctly.
[root@master ~]# ssh 192.168.4.123
Last login: Mon Jul 23 23:04:54 2012 from 192.168.2.86
[root@localhost ~]# 

Hadoop environment variables and configuration files

1. Environment variables

Add the following environment variables to ~/.bash_profile:

export JAVA_HOME=/usr/java/jdk1.6.0_23

export HADOOP_HOME=/root/bin/hadoop-0.20.2

export HADOOP_CLASSPATH=/user/root/build/classes

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
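
After editing the file, reload it so the variables take effect in the current shell, and check that the hadoop command resolves:

source ~/.bash_profile
hadoop version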


2. Configuration files

(1) hadoop-env.sh — mainly two settings to configure:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/java/jdk1.6.0_23

# Extra Java CLASSPATH elements.  Optional.
export HADOOP_CLASSPATH=/user/root/build/classes

(2) core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://master:9000</value>
        </property>

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/tmp/hadoop-root</value>
        </property>
</configuration>


When reposting, please credit the source: 博客园 石头儿, http://www.cnblogs.com/shitouer/


Note 1: All HDFS storage locations are relative to hadoop.tmp.dir. The namenode's namespace is kept under ${hadoop.tmp.dir}/dfs/name and the datanode blocks under ${hadoop.tmp.dir}/dfs/data, so once hadoop.tmp.dir is set, the other important directories all live beneath it; it acts as the root directory.


Note 2: fs.default.name specifies the host the namenode runs on; the port is 9000.


Note 3: core-site.xml has a corresponding core-default.xml, hdfs-site.xml a hdfs-default.xml, and mapred-site.xml a mapred-default.xml. The default files hold the default settings; the point of editing the three site files is to override those defaults.


(3) hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>
</configuration>

dfs.replication sets how many replicas of each data block are kept. The default is 3; if there are fewer than 3 slave nodes, set it to 1 or 2 accordingly.


(4) mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>http://master:9001</value>
        </property>
</configuration>

mapred.job.tracker specifies the machine the jobtracker runs on; the port is 9001.


(5) masters (this file lists the host that runs the secondary namenode, one per line)


master


(6) slaves (one slave host per line)


slave1
slave2

Copy the configured hadoop-0.20.2 directory to the other two hosts and apply the same settings there. At this point the cluster configuration is complete; typing hadoop on the command line lists the available hadoop operations.
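
The original post stops at the configuration, but before a job can run the HDFS namespace has to be formatted and the daemons started. On master this is roughly:

hadoop namenode -format
start-all.sh
jps

jps should then show NameNode, SecondaryNameNode and JobTracker on master, and DataNode plus TaskTracker on each slave.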


The MapReduce program

1. Map

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends
		Mapper<LongWritable, Text, Text, IntWritable> {

	// NCDC weather records use 9999 to mark a missing temperature reading
	private static final int MISSING = 9999;

	@Override
	public void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		String line = value.toString();
		String year = line.substring(15, 19);

		// The temperature field may carry an explicit '+' sign
		int airTemperature;
		if (line.charAt(87) == '+') {
			airTemperature = Integer.parseInt(line.substring(88, 92));
		} else {
			airTemperature = Integer.parseInt(line.substring(87, 92));
		}
		String quality = line.substring(92, 93);
		// Emit (year, temperature) only for valid, non-missing readings
		if (airTemperature < MISSING && quality.matches("[01459]")) {
			context.write(new Text(year), new IntWritable(airTemperature));
		}
	}
}

2. Reduce

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends
		Reducer<Text, IntWritable, Text, IntWritable> {

	@Override
	public void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		// Keep the maximum temperature seen for this year
		int maxValue = Integer.MIN_VALUE;
		for (IntWritable value : values) {
			maxValue = Math.max(maxValue, value.get());
		}
		context.write(key, new IntWritable(maxValue));
	}
}

3. Client

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

	public static void main(String[] args) throws Exception {
		if (args.length != 2) {
			System.err.println("Usage: MaxTemperature <input path> <output path>");
			System.exit(-1);
		}
		Job job = new Job();
		job.setJarByClass(MaxTemperature.class);

		job.setMapperClass(MaxTemperatureMapper.class);
		job.setReducerClass(MaxTemperatureReducer.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// Submit the job and wait for it to finish
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}


4. Running the job

Package the classes above into a jar and place it under the $HADOOP_CLASSPATH path.

Then run: hadoop jar test.jar com.tx.weahter.MaxTemperature input/sample.txt output
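
This assumes the sample data has already been uploaded to HDFS; if not, a typical sequence (sample.txt being the example file used above) would be:

hadoop fs -mkdir input
hadoop fs -put sample.txt input/sample.txt
hadoop jar test.jar com.tx.weahter.MaxTemperature input/sample.txt output
hadoop fs -cat output/part-r-00000

The last command prints one line per year with the maximum temperature found.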





