HDFS implementation:
1. First, create a Java project.
2. Create a lib folder and add the jar packages (provided as an attachment). Specific paths:
\hadoop-2.6.0\share\hadoop\common — the three jars there;
\hadoop-2.6.0\share\hadoop\common\lib — all jars;
\hadoop-2.6.0\share\hadoop\hdfs — the three jars there
3. Test code:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.Before;
import org.junit.Test;

public class HdfsDemo {
    FileSystem fileSystem = null;

    // Runs before each @Test method to initialize the FileSystem handle
    @Before
    public void init() throws IOException, InterruptedException, URISyntaxException {
        // Connect to HDFS as the "root" user
        fileSystem = FileSystem.get(new URI("hdfs://192.168.1.168:9000"), new Configuration(), "root");
    }

    @Test
    public void testUpload() throws IllegalArgumentException, IOException {
        // Open a file on the local file system and get an input stream
        InputStream inputStream = new FileInputStream("G:\\UU\\FXFK.rar");
        // Create a file on HDFS and get an output stream
        OutputStream outputStream = fileSystem.create(new Path("/fxfk.rar"));
        // Copy input -> output; the last argument closes both streams when done
        IOUtils.copyBytes(inputStream, outputStream, 4096, true);
    }

    @Test
    public void downLoad() throws IllegalArgumentException, IOException {
        // Download /fxfk.rar to the local disk; the first argument (true) deletes the source on HDFS afterwards
        fileSystem.copyToLocalFile(true, new Path("/fxfk.rar"), new Path("G:/f.rar"));
    }

    @Test
    public void delTest() throws IllegalArgumentException, IOException {
        // The second argument enables recursive deletion; if it were false, a non-empty directory could not be deleted
        boolean flag = fileSystem.delete(new Path("/dirs"), true);
        System.out.println(flag);
    }

    @Test
    public void testMkdir() throws IllegalArgumentException, IOException {
        // Create a directory
        boolean flag = fileSystem.mkdirs(new Path("/dirs"));
        System.out.println(flag);
    }

    public static void main(String[] args) throws IOException, URISyntaxException {
        // Read a file from HDFS and write it to the local disk
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://192.168.1.168:9000"), new Configuration());
        InputStream inputStream = fileSystem.open(new Path("/had"));
        OutputStream outputStream = new FileOutputStream("G://hda2.6");
        IOUtils.copyBytes(inputStream, outputStream, 4096, true);
    }
}
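For reference, the same fileSystem handle can also list what is stored on HDFS. The following test method is a minimal sketch that could be added to HdfsDemo (it is not part of the original demo and additionally needs import org.apache.hadoop.fs.FileStatus):

    @Test
    public void testList() throws IllegalArgumentException, IOException {
        // List the entries directly under the root directory
        FileStatus[] statuses = fileSystem.listStatus(new Path("/"));
        for (FileStatus status : statuses) {
            // Print a marker for directories plus the full path of each entry
            System.out.println((status.isDirectory() ? "d " : "- ") + status.getPath());
        }
    }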
MapReduce implementation:
MapReduce word-count example
Summary of steps:
First, copy the hadoop-2.6.0\share\hadoop\mapreduce\*.jar packages into the project
* 1. Analyze the concrete business logic and decide on the format of the input and output data
* 2. Define a class that extends Mapper and override the map method, implementing the business logic and emitting the new key/value pairs
* 3. Define another class that extends Reducer and implement the reduce method
* 4. Wire the custom mapper and reducer classes together through a Job object
1. The custom WCMapper:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Long and String do implement the JDK serialization interface, but JDK serialization
// is relatively slow, so Hadoop provides its own serialization types:
// Long -> LongWritable, String -> Text
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Receive a line of input (v1)
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Loop over the words
        for (String word : words) {
            // Each occurrence counts as 1; emit (word, 1)
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
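As the comments in WCMapper note, Hadoop's Writable types simply wrap the plain Java types. The following standalone snippet is only an illustration of that conversion (it is not part of the word-count code):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
    public static void main(String[] args) {
        Text word = new Text("hello");            // wraps a String
        LongWritable one = new LongWritable(1L);  // wraps a long
        String s = word.toString();               // back to a String
        long n = one.get();                       // back to a long
        System.out.println(s + " -> " + n);
    }
}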
2. The custom WCReducer class:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    // Note: v2s is of type Iterable<LongWritable>;
    // after the shuffle phase, each key is associated with a collection of values like {v2, ...}
    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s,
            Reducer<Text, LongWritable, Text, LongWritable>.Context context)
            throws IOException, InterruptedException {
        // Define a counter for the received values
        long count = 0;
        // Loop over v2s and sum the counts
        for (LongWritable longWritable : v2s) {
            count += longWritable.get();
        }
        // Emit (k3, v3)
        context.write(k2, new LongWritable(count));
    }
}
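For example, if the input file contains the single line "hello world hello", the mappers emit (hello,1), (world,1), (hello,1); after the shuffle the reducer is called once per key with (hello, {1,1}) and (world, {1}) and writes "hello 2" and "world 1".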
3. The WordCount class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Summary of steps:
 * 1. Analyze the concrete business logic and decide on the format of the input and output data
 * 2. Define a class that extends Mapper, override the map method with the business logic, and emit the new key/value pairs
 * 3. Define another class that extends Reducer and implement the reduce method
 * 4. Wire the custom mapper and reducer classes together through a Job object
 * @author xin
 */
public class WordCount {
    public static void main(String[] args) throws Exception {
        // Build the job object
        Job job = Job.getInstance(new Configuration());
        // Note: the class containing the main method
        job.setJarByClass(WordCount.class);
        // Mapper settings
        job.setMapperClass(WCMapper.class);
        // Type of the key (k2) emitted by the mapper
        job.setMapOutputKeyClass(Text.class);
        // Type of the value (v2) emitted by the mapper
        job.setMapOutputValueClass(LongWritable.class);
        // Input file on HDFS to process
        FileInputFormat.setInputPaths(job, new Path("/words.txt"));
        // Reducer settings
        job.setReducerClass(WCReducer.class);
        // Type of the key (k3) emitted by the reducer
        job.setOutputKeyClass(Text.class);
        // Type of the value (v3) emitted by the reducer
        job.setOutputValueClass(LongWritable.class);
        // Output directory on HDFS for the results
        FileOutputFormat.setOutputPath(job, new Path("/wcout601"));
        // Submit the job and print progress while it runs
        job.waitForCompletion(true);
    }
}
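The input and output paths above are hard-coded. A common variant is to take them from the command line so the same jar can be reused for different files; the sketch below is only an illustration (the class name WordCountArgs is made up, and only the two path lines differ from the original):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountArgs {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(WordCountArgs.class);
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Input and output paths are taken from the command line instead of being hard-coded
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with a non-zero status if the job fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}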
Note: In Eclipse, export the project as a plain jar with WordCount as the main class, then copy wc.jar to the virtual machine.
Run wc.jar with the following command (because it is a plain jar, it has to be run with the hadoop jar command):
hadoop jar wc.jar    (plain jar)
java -jar wc.jar     (runnable jar)
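Once the job finishes, the result can be inspected on HDFS with, for example, hdfs dfs -cat /wcout601/part-r-00000 (part-r-00000 is the default output file name when a single reducer is used). Also note that the output directory (/wcout601 here) must not already exist when the job is submitted, otherwise FileOutputFormat rejects it.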