Hadoop 2.8.0 Single-Node Setup and Eclipse Development Configuration: Beginner's Notes

A record of getting Hadoop up and running over the past two days, shared so that others can avoid the same detours.

1. Hadoop Pseudo-Distributed Single-Node Setup

1.1 Environment

Create a new VMware virtual machine:

OS: Red Hat EL 6.2, 64-bit

Network: NAT mode

IP: 192.168.182.140

Hostname: hadoop1

1.2 Download and Install

(1) JDK 1.8

Before downloading, run java -version to check whether the system already ships with a JDK.

Mine came with the older 1.6, which I located and removed as follows:

#check the installed JDK packages
>rpm -qa|grep jdk
xxx-openjdk-yyyy

#remove the bundled JDK
>rpm -e --nodeps xxx-openjdk-yyyy

#verify it is gone
>java -version

(2) hadoop-2.8.0.tar.gz

Download page: http://hadoop.apache.org/releases.html



1.3 Hadoop Deployment Steps

Reference article: http://blog.csdn.net/uq_jin/article/details/51451995

Note: my setup differs slightly from that article in the environment variables:

#set environment variables
>vi /etc/profile
.... (existing content omitted) ....
JAVA_HOME=/software/jdk1.8.0_131
JRE_HOME=$JAVA_HOME/jre
HADOOP_HOME=/software/hadoop-2.8.0
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME:$HADOOP_HOME/etc/hadoop:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/contrib/capacity-scheduler/*.jar
export PATH JAVA_HOME CLASSPATH JRE_HOME HADOOP_HOME
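
The referenced article walks through the config files in detail; for completeness, here is a minimal pseudo-distributed sketch of the two core files. The hostname hadoop1 matches section 1.1, while the tmp directory path is my own assumption; adjust both to your setup:

```xml
<!-- etc/hadoop/core-site.xml (minimal pseudo-distributed sketch) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
  <property>
    <!-- assumed location; pick any writable directory -->
    <name>hadoop.tmp.dir</name>
    <value>/software/hadoop-2.8.0/tmp</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <!-- single node, so keep one replica per block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```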

Finally, start the daemons (screenshots of each step omitted):

start-dfs.sh

sbin/start-yarn.sh

sbin/mr-jobhistory-daemon.sh start historyserver   #job history server, for log tracking

Once everything is up, the NameNode's DFS Overview page and the cluster Applications overview page should both load in a browser.

1.4 Running a MapReduce Test

Prerequisites:

Create the /henry/input/ directory (and upload some text files into it with hadoop fs -put before running the job):

>hadoop fs -mkdir -p /henry/input/


Run the wordcount example:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-xxxxx.jar wordcount /henry/input/ /henry/output/wordcount

Result: (screenshot omitted)
A few things to note:

(1) The leading slash "/" makes /henry/input an absolute path. If you write a bare input, as many examples do, HDFS resolves it relative to the user's home directory, i.e. /user/root/input. Mind the leading slash.

(2) The output directory must not already exist, or the job will fail.

(3) Make sure the VM meets the basic hardware requirements for running the job.
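
As a sanity check on what the example job actually computes, the core of word count can be sketched in a few lines of plain Java. This is an illustrative sketch with no Hadoop involved, not the code inside the examples jar:

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-alone sketch of the word-count logic: split on whitespace, count tokens.
public class WordCountSketch {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the job's output
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop hello hdfs")); // {hadoop=1, hdfs=1, hello=2}
    }
}
```

In the real job, the mapper emits (word, 1) pairs and the reducer sums them per word; the map above collapses both steps into a single process.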

2. Installing the Hadoop Plugin in Eclipse

2.1 Plugin Download

Download link: http://download.csdn.net/download/darkdragonking/9849522

(1) Tested and working for me; my Eclipse version is Luna 4.4.2.

(2) Copy the downloaded jar into the eclipse/plugins/ directory.

(3) Restart Eclipse and you're done!


2.2 Plugin Configuration

Reference: http://www.linuxidc.com/Linux/2015-08/120943.htm


3. Unit Testing with MRUnit

3.1 Downloading the MRUnit jar

Be careful when searching for the jar: you must download the build for Hadoop 2.x, otherwise you will hit a compatibility error.

The error looks like this:

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected
	at org.apache.hadoop.mrunit.internal.mapreduce.AbstractMockContextWrapper.createCommon(AbstractMockContextWrapper.java:59)
	at org.apache.hadoop.mrunit.internal.mapreduce.MockMapContextWrapper.create(MockMapContextWrapper.java:77)
	at org.apache.hadoop.mrunit.internal.mapreduce.MockMapContextWrapper.<init>(MockMapContextWrapper.java:68)
	at org.apache.hadoop.mrunit.mapreduce.MapDriver.getContextWrapper(MapDriver.java:167)
	at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:144)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
	at com.demo.sort.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:35)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)


I downloaded it from: http://download.csdn.net/download/fkbush/9522361#comment

3.2 Writing the Unit Test

Annotate each test method with JUnit's @Test annotation. Example code:

MaxTemperatureMapperTest.java

package com.demo.sort;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public final class MaxTemperatureMapperTest {

	@Test
	public void processValidRecord() throws IOException {
		final Text value = new Text("123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356"
		/*
		 * +
		 * "\r\n123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456"
		 * +
		 * "\r\n123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456"
		 * +
		 * "\r\n123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456"
		 * +
		 * "\r\n123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456"
		 * +
		 * "\r\n123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456"
		 * +
		 * "\r\n123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456"
		 * +
		 * "\r\n123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456"
		 * +
		 * "\r\n123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+0082123567"
		 */);
		new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
				.withInput(new LongWritable(110000), value)
				.withOutput(new Text("1901"), new IntWritable(12)) // "+0012" at offset 87 parses to 12
				.runTest();
	}
}

MaxTemperatureMapper.java

/**
* Mapper for the max-temperature example.
* @author Henry
* Created at July 23, 2017
*/
package com.demo.sort;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Extracts the temperature reading for each year.
// Input: one line of fixed-width text per record.
// Output: the year (characters 15-19) and the temperature (characters 87-92,
// possibly prefixed with a '+' sign); -999 stands in for a missing reading.
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	private static final int MISSED = -999; // sentinel for a missing reading
	@Override
	protected void map(final LongWritable key, final Text value, final Mapper<LongWritable, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		final String line = value.toString();
		final String year = line.substring(15, 19);
		final char symbol = line.charAt(87);
		int airTemplate = MISSED;
		if (symbol == '+') {
			airTemplate = Integer.parseInt(line.substring(88, 92)); // skip the leading '+'
		} else {
			airTemplate = Integer.parseInt(line.substring(87, 92));
		}

		final String quality = line.substring(92, 93); // quality code of the reading
		if (airTemplate != MISSED && quality.matches("[01459]")) {
			context.write(new Text(year), new IntWritable(airTemplate));
		}
	}
	}
}
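
The fixed-width parsing above can be exercised without any Hadoop classes. Here is a small stand-alone sketch using the same offsets (year at characters 15-19, sign at 87, four temperature digits after it); the synthetic record below is my own construction for illustration, not real weather data:

```java
// Stand-alone sketch of the mapper's fixed-width parsing, no Hadoop required.
public class RecordParseSketch {
    static String year(String line) {
        return line.substring(15, 19);                       // 4-digit year
    }

    static int temperature(String line) {
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92)); // skip the '+' sign
        }
        return Integer.parseInt(line.substring(87, 92));     // sign is part of the digits
    }

    public static void main(String[] args) {
        // Build a synthetic 93-character record: 87 filler digits with the
        // year planted at offset 15, then "+0012" and quality code "5".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 87; i++) sb.append('0');
        sb.replace(15, 19, "1901");
        sb.append("+00125");
        String line = sb.toString();
        System.out.println(year(line) + " " + temperature(line)); // prints "1901 12"
    }
}
```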

3.3 Running the Test

It fails! The stack trace shows MRUnit calling into PowerMock, so the downloaded jar alone is not enough; its own dependencies (PowerMock/Mockito) are missing from the classpath:

java.lang.NoClassDefFoundError: org/powermock/api/mockito/PowerMockito
	at org.apache.hadoop.mrunit.mapreduce.MapDriver.run(MapDriver.java:147)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:640)
	at org.apache.hadoop.mrunit.TestDriver.runTest(TestDriver.java:627)
	at com.demo.sort.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:35)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.ClassNotFoundException: org.powermock.api.mockito.PowerMockito
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 27 more

3.4 Troubleshooting, Continued

To avoid wrangling jar dependencies by hand, I switched the project to Maven.

The project structure is as follows (screenshot omitted):

pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>practice.hadoop</groupId>
	<artifactId>simple-examples</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>simple-examples</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.12</version>
			<scope>test</scope>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>2.8.0</version>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.mrunit</groupId>
			<artifactId>mrunit</artifactId>
			<version>1.1.0</version>
			<classifier>hadoop2</classifier>
			<scope>test</scope>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-mapreduce-client-core</artifactId>
			<version>2.8.0</version>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-yarn-api -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-yarn-api</artifactId>
			<version>2.8.0</version>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-auth</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>jdk.tools</groupId>
			<artifactId>jdk.tools</artifactId>
			<version>1.8</version>
			<scope>system</scope>
			<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-minicluster -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-minicluster</artifactId>
			<version>2.8.0</version>
			<scope>test</scope>
		</dependency>
		<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-jobclient -->
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
			<version>2.8.0</version>
			<scope>provided</scope>
		</dependency>

	</dependencies>
</project>


Result:

Sigh, another new error: this time org.apache.hadoop.io.Text cannot be found at runtime, even though the jar clearly contains the class and the project compiles without errors:

java.lang.NoClassDefFoundError: org/apache/hadoop/io/Text
	at practice.hadoop.simple_examples.MaxTemperatureMapperTest.processValidRecord(MaxTemperatureMapperTest.java:15)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.Text
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 24 more


Note: the problem above has since been solved. Thanks for the support!
    Solution:
     1. Delete everything under /src/test/;
     2. Delete everything under /target/;
     3. Download the Maven binaries, unpack them somewhere, and set the environment variables;
     4. cd into the project directory;
     5. Run: mvn clean
     6. Run: mvn assembly:assembly
     7. The build succeeds and a withdependency.jar file appears under target/;
     8. Copy the jar to a directory on the Hadoop server;
     9. Run:
        >export HADOOP_CLASSPATH=<directory containing the jar>/withdependency.jar
        >hadoop your.package.MainClass /henry/input/weather.txt /henry/output/weather
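
Step 6 assumes the maven-assembly-plugin is set up in the pom. If mvn assembly:assembly complains about a missing descriptor, a minimal configuration like the following can be added under the pom's <build> section; this is a sketch, and the mainClass is a placeholder for your own driver class. The jar-with-dependencies descriptor is what produces the single bundled jar copied to the server:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- bundle all runtime dependencies into one jar -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <!-- placeholder: replace with your actual driver class -->
            <mainClass>your.package.MainClass</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>
  </plugins>
</build>
```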


3.5 Result

Temperature list (screenshot omitted)



