1. Installation
Preparation:
Three Ubuntu 14.04 machines with openssh-server installed (you can also set up just one and later clone it, or export/import the VM). The three machines must be on the same subnet.
Installation steps:
1) Boot the three VMs and set a hostname on each:
sudo vim /etc/hostname
Name them, respectively:
HadoopMaster
HadoopSlave1
HadoopSlave2
Note: the change takes effect after a reboot.
2) Install the JDK (same steps on all three machines)
This guide uses Oracle's JDK, installed via the WebUpd8 PPA:
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java7-installer
After the installation finishes, configure the environment variables:
sudo vim ~/.bashrc
Add the following (the JDK installed via the PPA above lives at /usr/lib/jvm/java-7-oracle):
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source ~/.bashrc (to apply the changes)
3) Edit the hosts file (identically on all three machines)
sudo vim /etc/hosts
Append the following:
10.13.7.10 HadoopMaster
10.13.7.11 HadoopSlave1
10.13.7.12 HadoopSlave2
Note: replace these IP addresses with the ones that actually belong to your hosts.
4) Set up passwordless SSH login (repeat on all three machines)
The commands below are run on 10.13.7.10; adjust them for your own hosts.
ssh-keygen (press Enter at every prompt to accept the defaults)
ssh-copy-id persistence@10.13.7.10
ssh-copy-id persistence@10.13.7.11
ssh-copy-id persistence@10.13.7.12 (persistence is the username; the address is the target machine's IP)
Do this on all three machines so that each one can SSH into the others without a password.
5) Download Hadoop 2.6.0 (on all three machines)
wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
6) Extract Hadoop and configure the related environment variables (on all three machines)
sudo tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local (extract into /usr/local)
sudo mv /usr/local/hadoop-2.6.0 /usr/local/hadoop (rename the directory)
sudo chown -R persistence:persistence /usr/local/hadoop (change the owner and group; replace persistence with your own username here and everywhere below)
/usr/local/hadoop/bin/hadoop (run this to check that Hadoop was installed correctly)
Add the following to ~/.bashrc (on all three machines):
sudo vim ~/.bashrc
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
source ~/.bashrc
Verify: run hdfs; if you see its usage message, the installation succeeded.
7) Create the directories Hadoop needs (on all three machines)
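Before committing the exports to ~/.bashrc, you can try them in the current shell; a quick sketch (assuming the /usr/local/hadoop prefix used above):

```shell
# Apply the Hadoop environment variables to the current shell only
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
# Confirm the bin directory made it onto PATH
case ":$PATH:" in
  *:/usr/local/hadoop/bin:*) echo "PATH ok" ;;
  *) echo "PATH is missing hadoop bin" ;;
esac
```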
sudo mkdir /home/hadoop
sudo chown -R persistence:persistence /home/hadoop
mkdir /home/hadoop/hadoop-2.6.0
mkdir /home/hadoop/hadoop-2.6.0/tmp
mkdir /home/hadoop/hadoop-2.6.0/dfs
mkdir /home/hadoop/hadoop-2.6.0/dfs/name
mkdir /home/hadoop/hadoop-2.6.0/dfs/data
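The mkdir commands above can also be collapsed into one with mkdir -p, which creates parent directories as needed. A minimal sketch (illustrated under a temporary directory so it runs without root; in the tutorial the base is /home/hadoop):

```shell
# Build the same tree as step 7 in one command; mkdir -p creates parents.
# BASE stands in for /home/hadoop in this sketch.
BASE=$(mktemp -d)
mkdir -p "$BASE/hadoop-2.6.0/tmp" \
         "$BASE/hadoop-2.6.0/dfs/name" \
         "$BASE/hadoop-2.6.0/dfs/data"
ls "$BASE/hadoop-2.6.0/dfs"
```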
8) Edit the configuration files (important; be careful here) (on all three machines)
①
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add: export JAVA_HOME=/usr/lib/jvm/java-7-oracle
②
vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.6.0/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://HadoopMaster:9000</value>
</property>
③
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following inside <configuration></configuration>:
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop-2.6.0/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop-2.6.0/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
④
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
Add the following inside <configuration></configuration>:
<property>
<name>mapred.job.tracker</name>
<value>HadoopMaster:9001</value>
<description>Host or IP and port of JobTracker.</description>
</property>
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml (Hadoop reads mapred-site.xml, not the .template file, so copy the edited template into place)
Note: mapred.job.tracker is an MRv1-era setting; on Hadoop 2.x, MapReduce jobs normally run on YARN, configured via mapreduce.framework.name.
⑤
vim /usr/local/hadoop/etc/hadoop/slaves
Delete the localhost line and add the following:
HadoopSlave1
HadoopSlave2
⑥
vim /usr/local/hadoop/etc/hadoop/masters
Add the following:
HadoopMaster
9) Format the HDFS NameNode (this only needs to be done on HadoopMaster, where the NameNode runs)
cd /usr/local/hadoop && bin/hdfs namenode -format
10) Start the Hadoop cluster (note: this step is done only on HadoopMaster)
/usr/local/hadoop/sbin/start-dfs.sh //start
/usr/local/hadoop/sbin/stop-dfs.sh //stop
After starting, run jps to check the running processes.
If the Master shows three processes and each Slave shows two, the cluster started successfully.
That completes the Hadoop installation and configuration.
You can view cluster information at HadoopMaster's ip:50070 (the NameNode web UI)
and at ip:8088 (the YARN ResourceManager UI, available once YARN is started with start-yarn.sh).
Below are a few simple HDFS operations (all run on HadoopMaster):
hadoop fs -mkdir /input/ --> create a directory on HDFS
hadoop fs -rmdir /input/ --> remove a directory on HDFS
hadoop fs -ls / --> list the files under / on HDFS
hadoop fs -rm /test.txt --> delete a file
hadoop fs -put test.txt / --> upload test.txt to / on HDFS
hadoop fs -get /test.txt --> download a file from HDFS into the current directory
2. A simple application: counting words
1) Make sure the Hadoop cluster is running:
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
2) Write the Java code:
cd /home/hadoop && mkdir example
cd example && mkdir word_count_class jar
vim WordCount.java
The contents are as follows:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line
    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the 1s emitted for each word
    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
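To see what the map and reduce phases compute without a cluster, the same logic can be run locally on a plain string. A minimal sketch (LocalWordCount is a hypothetical helper, not part of the Hadoop job; it uses the same StringTokenizer splitting as the Mapper above):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // What the job computes end to end: "map" emits (word, 1) per token,
    // and merge() plays the reducer's role of summing the 1s per word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer token = new StringTokenizer(text); // same split as the Mapper
        while (token.hasMoreTokens()) {
            counts.merge(token.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world hello hadoop"));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```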
3) Download the jar files and place them in /home/hadoop/example/jar
Download link: hadoop-common jar
Download link: hadoop-mapreduce-client-core jar
After downloading them locally, upload them to /home/hadoop/example/jar.
4) Compile and run
javac -classpath /home/hadoop/example/jar/hadoop-common-2.6.0.2.2.9.9-2.jar:/home/hadoop/example/jar/hadoop-mapreduce-client-core-2.6.0.2.2.9.9-2.jar -d word_count_class WordCount.java (compile)
cd word_count_class
jar -cvf WordCount.jar *.class (package into a jar)
cd /home/hadoop/example
Create two files named file1 and file2 and put some words in them, then:
hadoop fs -mkdir /input/
hadoop fs -put file* /input/
hadoop jar word_count_class/WordCount.jar WordCount /input /output
When the job finishes, you can view the word-count results:
hadoop fs -ls /output (the job's output lives in this directory; the result we want is in part-r-00000)
hadoop fs -cat /output/part-r-00000
That's all, thanks.