Preparation before installing Hadoop

  1. On the installed Ubuntu system, add a user with sudo privileges.

root@nodeA:~# sudo adduser zyx

Adding user `zyx' ...

Adding new group `zyx' (1001) ...

Adding new user `zyx' (1001) with group `zyx' ...

Creating home directory `/home/zyx' ...

Copying files from `/etc/skel' ...

Enter new UNIX password:

Retype new UNIX password:

passwd: password updated successfully

Changing the user information for zyx

Enter the new value, or press ENTER for the default

        Full Name []: ^Cadduser: `/usr/bin/chfn zyx' exited from signal 2. Exiting.

root@nodeA:~#

root@nodeA:~# sudo  usermod -G admin -a zyx

root@nodeA:~#
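On Ubuntu releases of that era the admin group is what grants sudo rights, so it is worth confirming that the new account can actually use sudo before continuing (a quick sketch; sudo whoami should report root):

root@nodeA:~# su - zyx
zyx@nodeA:~$ sudo whoami
root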

  2. Set up passwordless SSH login

(1) Enable passwordless login to the local machine on the namenode

zyx@nodeA:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Generating public/private dsa key pair.

Created directory '/home/zyx/.ssh'.

Your identification has been saved in /home/zyx/.ssh/id_dsa.

Your public key has been saved in /home/zyx/.ssh/id_dsa.pub.

The key fingerprint is:

65:2e:e0:df:2e:61:a5:19:6a:ab:0e:38:45:a9:6a:2b zyx@nodeA

The key's randomart image is:

+--[ DSA 1024]----+

|                 |

|   .             |

|  o   .   o      |

| o   . ..+.      |

|. .   ..S=.      |

|.o    o.=o       |

|+..  . o...      |

|E...  . ..       |

|.. .o.   ..      |

+-----------------+

zyx@nodeA:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

zyx@nodeA:~$
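Some sshd configurations reject keys when ~/.ssh or authorized_keys is group- or world-writable, so it is worth tightening the permissions and testing the login locally before moving on (a sketch):

zyx@nodeA:~$ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
zyx@nodeA:~$ ssh localhost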

(2) Enable passwordless login from the namenode to the other datanodes

hadoop@nodeB:~$ scp hadoop@nodea:/home/hadoop/.ssh/id_dsa.pub /home/hadoop

hadoop@nodea's password:

id_dsa.pub                                    100%  602     0.6KB/s   00:00   

hadoop@nodeB:~$ cat id_dsa.pub >> .ssh/authorized_keys

hadoop@nodeB:~$ sudo ufw disable
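The copy-and-append above has to be repeated for every datanode. If the same account exists on each node, ssh-copy-id performs both steps in one command run from the namenode (node names here are illustrative):

zyx@nodeA:~$ ssh-copy-id -i ~/.ssh/id_dsa.pub zyx@nodeB
zyx@nodeA:~$ ssh zyx@nodeB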

  3. Copy the JDK file (jdk-6u20-linux-i586.bin) to Linux

Copy it with the F-Secure SSH File Transfer Trial tool by simply dragging and dropping the file.
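If a graphical SFTP client is not available, scp from the source machine works just as well (a sketch, assuming the jdk directory already exists under the target home directory):

$ scp jdk-6u20-linux-i586.bin zyx@nodeA:/home/zyx/jdk/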

  4. Install and configure jdk-6u20-linux-i586.bin

(1) Installation

zyx@nodeA:~$ ls

Examples  jdk

zyx@nodeA:~$ cd jdk

zyx@nodeA:~/jdk$ ls

jdk-6u20-linux-i586.bin

zyx@nodeA:~/jdk$ chmod a+x jdk*

zyx@nodeA:~/jdk$ ./jdk*

The license agreement is displayed next; choose yes and press Enter, and the installation completes.

zyx@nodeA:~/jdk$ ls

jdk1.6.0_20  jdk-6u20-linux-i586.bin

(2) Configuration

Open .bashrc with root@nodeA:/home/zyx# vi .bashrc and append the following lines at the end:

export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20

export JRE_HOME=/home/zyx/jdk/jdk1.6.0_20/jre

export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/bin
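Reload the file and confirm that the shell now finds the new JDK (a quick check, assuming the paths above):

zyx@nodeA:~$ source ~/.bashrc
zyx@nodeA:~$ java -version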

  5. Install Hadoop

Download URL:

http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz

Put hadoop-0.20.2.tar.gz under /home/zyx/hadoop, then extract it:

zyx@nodeB:~/hadoop$ tar -zvxf hadoop-0.20.2.tar.gz

Set the environment variables by appending to /home/zyx/.bashrc:

zyx@nodeA:~$ vi .bashrc

export HADOOP_HOME=/home/zyx/hadoop/hadoop-0.20.2

export PATH=$HADOOP_HOME/bin:$PATH
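Reload .bashrc and check that the hadoop script is now on the PATH (a quick sanity check, assuming the paths above):

zyx@nodeA:~$ source ~/.bashrc
zyx@nodeA:~$ hadoop version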

  6. Configure Hadoop

(1) Configure the Java environment in conf/hadoop-env.sh:

export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20

(2) Configure the conf/masters and conf/slaves files; this only needs to be done on the namenode. conf/masters holds the host that the start scripts use for the SecondaryNameNode, and conf/slaves lists the datanode/tasktracker hosts, one hostname per line.

(3) Configure core-site.xml, hdfs-site.xml, and mapred-site.xml:

zyx@nodeC:~/hadoop-0.20.2/conf$ more core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

 <property>

  <name>fs.default.name</name>

  <!-- <value>hdfs://192.168.1.103:54310</value> -->

   <value>hdfs://192.168.1.103:9000</value>

 </property>

</configuration>

zyx@nodeC:~/hadoop-0.20.2/conf$ more hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

 <property>

  <name>dfs.replication</name>

  <value>1</value>

 </property>

</configuration>

zyx@nodeC:~/hadoop-0.20.2/conf$ more mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

 <property>

  <name>mapred.job.tracker</name>

  <!-- <value>hdfs://192.168.1.103:54320</value> -->

   <value>hdfs://192.168.1.103:9001</value>

 </property>

</configuration>
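If the cluster has more than one machine, the edited configuration files have to be identical on every node; one way to push them out (a sketch, assuming Hadoop is unpacked at the same path on nodeB):

zyx@nodeC:~/hadoop-0.20.2$ scp conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml conf/masters conf/slaves zyx@nodeB:~/hadoop-0.20.2/conf/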

  7. Run Hadoop

(0) Format the namenode:

zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop namenode -format
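The daemons have to be started before they will show up in jps; the start script ships in the same bin directory:

zyx@nodeC:~/hadoop-0.20.2/bin$ start-all.sh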

(1) Check the running processes with jps:

zyx@nodeC:~/hadoop-0.20.2/bin$ jps

31030 NameNode

31488 TaskTracker

31283 SecondaryNameNode

31372 JobTracker

31145 DataNode

31599 Jps

(2) Check the cluster status:

zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop dfsadmin -report

Configured Capacity: 304716488704 (283.79 GB)

Present Capacity: 270065557519 (251.52 GB)

DFS Remaining: 270065532928 (251.52 GB)

DFS Used: 24591 (24.01 KB)

DFS Used%: 0%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

-------------------------------------------------

Datanodes available: 1 (1 total, 0 dead)

Name: 192.168.1.103:50010

Decommission Status : Normal

Configured Capacity: 304716488704 (283.79 GB)

DFS Used: 24591 (24.01 KB)

Non DFS Used: 34650931185 (32.27 GB)

DFS Remaining: 270065532928(251.52 GB)

DFS Used%: 0%

DFS Remaining%: 88.63%

Last contact: Fri Apr 23 15:39:10 CST 2010

(3) Stop the daemons:

zyx@nodeC:~/hadoop-0.20.2/bin$ stop-all.sh

stopping jobtracker

localhost: stopping tasktracker

stopping namenode

localhost: stopping datanode

localhost: stopping secondarynamenode

  8. Run a simple Java program

(1) First create two files, file01 and file02, on the local disk:

[cuijj@station1 ~]$ echo "Hello cuijj bye cuijj" > file01
[cuijj@station1 ~]$ echo "Hello Hadoop Goodbye Hadoop" > file02

(2) Create an input directory in HDFS:

[cuijj@station1 ~]$ hadoop dfs -mkdir input

(3) Copy file01 and file02 to the input directory in HDFS

zyx@nodeC:~$ hadoop dfs -copyFromLocal /home/zyx/file0* input

(4) Check whether the input directory exists in HDFS

zyx@nodeC:~$ hadoop dfs -ls

Found 1 items

drwxr-xr-x   - zyx supergroup          0 2010-04-23 16:40 /user/zyx/input

(5) Check whether file01 and file02 were copied into the input directory successfully

zyx@nodeC:~$ hadoop dfs -ls input

Found 2 items

-rw-r--r--   1 zyx supergroup          0 2010-04-23 16:40 /user/zyx/input/file01

-rw-r--r--   1 zyx supergroup          0 2010-04-23 16:40 /user/zyx/input/file02

(6) Run wordcount (make sure there is no output directory in HDFS)
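If a previous run left an output directory behind, remove it first (hadoop dfs -rmr is the 0.20-era command for a recursive delete):

zyx@nodeC:~$ hadoop dfs -rmr output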

zyx@nodeC:~/hadoop-0.20.2$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output

10/04/24 09:25:10 INFO input.FileInputFormat: Total input paths to process : 2

10/04/24 09:25:11 INFO mapred.JobClient: Running job: job_201004240840_0001

10/04/24 09:25:12 INFO mapred.JobClient:  map 0% reduce 0%

10/04/24 09:25:22 INFO mapred.JobClient:  map 100% reduce 0%

10/04/24 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%

10/04/24 09:25:36 INFO mapred.JobClient: Job complete: job_201004240840_0001

10/04/24 09:25:36 INFO mapred.JobClient: Counters: 17

10/04/24 09:25:36 INFO mapred.JobClient:   Job Counters

10/04/24 09:25:36 INFO mapred.JobClient:     Launched reduce tasks=1

10/04/24 09:25:36 INFO mapred.JobClient:     Launched map tasks=2

10/04/24 09:25:36 INFO mapred.JobClient:     Data-local map tasks=2

10/04/24 09:25:36 INFO mapred.JobClient:   FileSystemCounters

10/04/24 09:25:36 INFO mapred.JobClient:     FILE_BYTES_READ=79

10/04/24 09:25:36 INFO mapred.JobClient:     HDFS_BYTES_READ=50

10/04/24 09:25:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=228

10/04/24 09:25:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41

10/04/24 09:25:36 INFO mapred.JobClient:   Map-Reduce Framework

10/04/24 09:25:36 INFO mapred.JobClient:     Reduce input groups=5

10/04/24 09:25:36 INFO mapred.JobClient:     Combine output records=6

10/04/24 09:25:36 INFO mapred.JobClient:     Map input records=2

10/04/24 09:25:36 INFO mapred.JobClient:     Reduce shuffle bytes=85

10/04/24 09:25:36 INFO mapred.JobClient:     Reduce output records=5

10/04/24 09:25:36 INFO mapred.JobClient:     Spilled Records=12

10/04/24 09:25:36 INFO mapred.JobClient:     Map output bytes=82

10/04/24 09:25:36 INFO mapred.JobClient:     Combine input records=8

10/04/24 09:25:36 INFO mapred.JobClient:     Map output records=8

10/04/24 09:25:36 INFO mapred.JobClient:     Reduce input records=6

(7) View the results

zyx@nodeC:~/hadoop-0.20.2$ hadoop fs -cat output/part-r-00000

Goodbye 1

Hadoop  2

Hello   2

bye     1

cuijj   2

  9. Install MapReduce
  10. Run a MapReduce program

11. Compile a .java program against the Hadoop libraries:
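javac does not create the directory passed to -d, so the class output directory has to exist before compiling (a small preparatory step not shown in the transcript):

root@nodeC:/home/zyx/hadoop-0.20.2# mkdir -p /home/zyx/wordcount_class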

root@nodeC:/home/zyx/hadoop-0.20.2# javac -classpath /home/zyx/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/zyx/hadoop-0.20.2/lib/commons-cli-1.2.jar -d /home/zyx/wordcount_class /home/zyx/hadoop-0.20.2/src/examples/org/apache/hadoop/examples/WordCount.java

12. Package the .class files into a .jar file

root@nodeC:/home/zyx/wordcount_class/org/apache/hadoop/examples# jar -cvf /home/zyx/wordcount.jar /home/zyx/wordcount_class/ .

added manifest

adding: home/zyx/wordcount_class/(in = 0) (out= 0)(stored 0%)

adding: home/zyx/wordcount_class/org/(in = 0) (out= 0)(stored 0%)

adding: home/zyx/wordcount_class/org/apache/(in = 0) (out= 0)(stored 0%)

adding: home/zyx/wordcount_class/org/apache/hadoop/(in = 0) (out= 0)(stored 0%)

adding: home/zyx/wordcount_class/org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)

adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)

adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)

adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1789) (out= 746)(deflated 58%)

adding: WordCount.class(in = 1911) (out= 996)(deflated 47%)

adding: WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
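Because the command above passes the absolute path /home/zyx/wordcount_class/ in addition to ".", the classes end up in the jar twice and both times at the wrong relative path (under home/zyx/... and at the jar root), so the class cannot be resolved by its package name org.apache.hadoop.examples.WordCount. Building with -C from the class output root, as in the Usage section further down, avoids this; the run arguments below are placeholders:

root@nodeC:/home/zyx# jar -cvf /home/zyx/wordcount.jar -C /home/zyx/wordcount_class/ .
root@nodeC:/home/zyx# hadoop jar /home/zyx/wordcount.jar org.apache.hadoop.examples.WordCount input output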

Example: WordCount v1.0

Before diving into the details, let's look at a sample Map/Reduce application to get a first feel for how it works.

WordCount is a simple application that counts the number of occurrences of each word in a given dataset.

This application works with all three Hadoop installation modes: standalone, pseudo-distributed, and fully distributed.

Source code

WordCount.java

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

   public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
     private final static IntWritable one = new IntWritable(1);
     private Text word = new Text();

     public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
       String line = value.toString();
       StringTokenizer tokenizer = new StringTokenizer(line);
       while (tokenizer.hasMoreTokens()) {
         word.set(tokenizer.nextToken());
         output.collect(word, one);
       }
     }
   }

   public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
     public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
       int sum = 0;
       while (values.hasNext()) {
         sum += values.next().get();
       }
       output.collect(key, new IntWritable(sum));
     }
   }

   public static void main(String[] args) throws Exception {
     JobConf conf = new JobConf(WordCount.class);
     conf.setJobName("wordcount");

     conf.setOutputKeyClass(Text.class);
     conf.setOutputValueClass(IntWritable.class);

     conf.setMapperClass(Map.class);
     conf.setCombinerClass(Reduce.class);
     conf.setReducerClass(Reduce.class);

     conf.setInputFormat(TextInputFormat.class);
     conf.setOutputFormat(TextOutputFormat.class);

     FileInputFormat.setInputPaths(conf, new Path(args[0]));
     FileOutputFormat.setOutputPath(conf, new Path(args[1]));

     JobClient.runJob(conf);
   }
}

Usage

Assuming the environment variable HADOOP_HOME points to the Hadoop installation root and HADOOP_VERSION is the installed Hadoop version, compile WordCount.java and build the jar as follows:

$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .

Assume:

  • /usr/joe/wordcount/input - the input path in HDFS
  • /usr/joe/wordcount/output - the output path in HDFS

Using the sample text files as input:

$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02

$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
Hello World Bye World

$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
Hello Hadoop Goodbye Hadoop

Run the application:

$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output

The output is:

$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2

Applications can use the -files option to specify a comma-separated list of paths that will be made available in the task's current working directory. The -libjars option adds jars to the classpath of the map and reduce tasks. The -archives option lets archives be passed as arguments; they are unpacked, and a symbolic link named after the archive, pointing to the unpacked directory, is created in the task's current working directory. See the Commands manual for more details on command-line options.

Running the wordcount example with -libjars and -files:
hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars mylib.jar input output
