Algorithm 1: Create the file "words.txt" and upload it to HDFS
Code:
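import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateFile {
    public static void main(String[] args) throws Exception {
        // Configuration holding the cluster connection settings
        Configuration conf = new Configuration();
        // HDFS access address on the Linux server
        conf.set("fs.defaultFS", "hdfs://master:8020");
        // Obtain the HDFS file system handle from the configuration
        FileSystem hdfs = FileSystem.get(conf);
        // Text that will become the content of the uploaded file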
        byte[] buf = ("Asia is better than the rest of the world. We should continue to develop and build" +
                " Asia well, show Asia's resilience, wisdom and strength, and build an anchor of peace " +
                "and stability, a source of growth and a new highland of cooperation in the world." +
                "First, firmly safeguard peace in Asia. Today, the five principles of peaceful " +
                "coexistence and the \"Bandung spirit\" initiated by Asia are of more practical " +
                "significance. We should uphold the principles of mutual respect, equality and mutual " +
                "benefit and peaceful coexistence, pursue a policy of good neighborliness and friendship," +
                " and firmly hold our destiny in our own hands." +
                "Second, actively promote Asian cooperation. The agreement on regional comprehensive " +
                "economic partnership has officially entered into force, and the railway between China " +
                "and old fellow has been opened to traffic, effectively enhancing the level of regional " +
                "hard interconnection and soft Unicom. We should take this opportunity to promote the " +
                "formation of a larger and more open market in Asia and take new steps in promoting " +
                "win-win cooperation in Asia." +
                "Third, jointly promote Asian solidarity. We should consolidate the central position of " +
                "ASEAN in the regional structure and maintain a regional order that takes into account" +
                " the demands of all parties and embraces the interests of all parties. Countries, big" +
                " or small, strong or weak, both within and outside the region, should add luster to " +
                "Asia without adding chaos. They should jointly follow the path of peaceful development" +
                ", seek win-win cooperation and create a united and progressive Asian family." +
                "China will fully implement the new development concept, accelerate the construction " +
                "of a new development pattern and strive to promote high-quality development. No matter" +
                " what changes take place in the world, China's confidence and will in reform and opening" +
                " up will not waver. China will unswervingly follow the path of peaceful development " +
                "and always be a builder of world peace, a contributor to global development and a " +
                "defender of the international order." +
                "Often do, not afraid of thousands of things. As long as we join hands and work " +
                "together, we will be able to gather the great strength of win-win cooperation, " +
                "overcome various challenges on the way forward and usher in a brighter and better" +
                " future for mankind.").getBytes();
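        // Target path in HDFS
        Path dst = new Path("/neusoftin/words.txt");
        // Create the file (an existing file is overwritten) and write the bytes;
        // try-with-resources closes the stream even if the write fails
        try (FSDataOutputStream out = hdfs.create(dst)) {
            out.write(buf, 0, buf.length);
        }
        // Verify that the file was created
        System.out.println(hdfs.exists(dst));
    }
}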
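The upload step turns the text into bytes with `getBytes()`, which falls back to the JVM's default charset when none is given. The following is a minimal JDK-only sketch, not part of the original program, of encoding and decoding with an explicit charset so that the bytes written to HDFS do not depend on the client machine. The class name `EncodingSketch` and the sample string are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSketch {
    // Encode text with an explicit charset instead of the platform default.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the bytes the same way a reader of the file would have to.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "the \"Bandung spirit\" initiated by Asia";
        byte[] buf = encode(sample);
        // For pure ASCII text, UTF-8 uses exactly one byte per character.
        System.out.println(buf.length == sample.length());
        // Round trip: decoding with the same charset restores the text.
        System.out.println(decode(buf).equals(sample));
    }
}
```

The sample text in this exercise is plain ASCII, so most default charsets produce the same bytes; the explicit charset starts to matter as soon as the content contains non-ASCII characters.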
Algorithm 2: main method of the MapReduce word count, implemented with the Tool utility class
Code:
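import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountTool extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // Cluster connection settings
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:8020");
        FileSystem hdfs = FileSystem.get(conf);
        // Input files (glob) and output directory
        String input = "/neusoftin/*.txt";
        String output = "/neusoftout"; // final MapReduce result; the path must not exist beforehand
        Path outputPath = new Path(output);
        // Delete a leftover result directory (recursively) before running
        if (hdfs.exists(outputPath)) {
            hdfs.delete(outputPath, true);
        }
        // Launch through ToolRunner, which hands conf to run() via getConf()
        args = new String[]{input, output};
        int re = ToolRunner.run(conf, new WordCountTool(), args);
        System.exit(re);
    }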
    @Override
    public int run(String[] strings) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WordCountTool.class); // class used to locate the job jar
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, strings[0]); // input path(s)
        // Mapper
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Combiner: pre-sums counts on the map side; the summing reducer can be
        // reused here because addition is associative and commutative
        job.setCombinerClass(WordCountReducer.class);
        // Reducer
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Result files, written line by line
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path(strings[1]));
        // Run the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        if (result) {
            FileSystem hdfs = FileSystem.get(getConf());
            // Print every file under the output directory to the console
            for (FileStatus fs : hdfs.listStatus(new Path(strings[1]))) {
                // Wrap the byte stream in a character reader; both are closed automatically
                try (FSDataInputStream dis = hdfs.open(fs.getPath());
                     BufferedReader reader = new BufferedReader(
                             new InputStreamReader(dis, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        }
        // Report job failure to the caller through the exit code
        return result ? 0 : 1;
    }
}
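The driver registers a combiner class between the mapper and the reducer. The sketch below illustrates, in plain Java without Hadoop, why a combiner that sums counts leaves the final totals unchanged while reducing the number of records sent through the shuffle. It assumes the combiner performs the same summation as the reducer; `CombinerSketch`, its method names, and the sample words are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CombinerSketch {
    // (word, 1) pairs that one map task would emit for its split.
    static List<Map.Entry<String, Integer>> mapOutput(String[] words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : words) {
            pairs.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return pairs;
    }

    // Sum counts per word; stands in for both the combiner and the reducer.
    static Map<String, Integer> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // No combiner: every (word, 1) pair travels to the reducer.
    static Map<String, Integer> countDirect(String[][] splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String[] split : splits) {
            shuffled.addAll(mapOutput(split));
        }
        return sum(shuffled);
    }

    // With combiner: each split is pre-summed locally before the shuffle.
    static Map<String, Integer> countWithCombiner(String[][] splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String[] split : splits) {
            shuffled.addAll(sum(mapOutput(split)).entrySet());
        }
        return sum(shuffled);
    }

    public static void main(String[] args) {
        String[][] splits = {{"Asia", "peace", "Asia"}, {"peace", "Asia"}};
        System.out.println(countDirect(splits));
        System.out.println(countWithCombiner(splits));
    }
}
```

Both calls yield the same map because addition is associative and commutative; an operation without those properties (an average, for example) could not be reused as a combiner this way.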
Algorithm 3: the mapper method of the MapReduce job
Code:
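import java.io.IOException;
import java.util.StringTokenizer;

// StringUtils is assumed to come from Apache Commons Lang 3;
// adjust the import if your classpath provides a different version
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Optimization: reuse the output key and value objects instead of allocating per record
    private final Text outMapKey = new Text();
    private static final IntWritable outMapValue = new IntWritable(1);

    /**
     * Splits one line of input into words and emits (word, 1) for each word.
     *
     * @param key     byte offset of the line within the input file
     * @param value   the line of text passed in
     * @param context collects the map output
     * @throws IOException          if writing the output fails
     * @throws InterruptedException if the task is interrupted
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The line of text to be counted
        String line = value.toString();
        // Skip blank lines
        if (StringUtils.isBlank(line)) {
            return;
        }
        // Split the line into words on whitespace
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) { // loop while more words are available
            String word = st.nextToken(); // next word
            outMapKey.set(word);
            context.write(outMapKey, outMapValue); // pass (key, value) on to the reducer
        }
    }
}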
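The mapper relies on `StringTokenizer` with its default delimiters, so words are split on whitespace only. The JDK-only sketch below shows what that means for the counts: punctuation stays attached to a word, and sentences that were concatenated without a space between them end up as a single token. `TokenizeSketch` and the sample lines are illustrative assumptions, not part of the original job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizeSketch {
    // Return the tokens the mapper would emit as keys for one input line.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        if (line == null || line.trim().isEmpty()) {
            return out; // blank lines produce no output
        }
        StringTokenizer st = new StringTokenizer(line); // splits on whitespace
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Asia." and "Asia" are different keys because the period is kept.
        System.out.println(tokens("peace in Asia. Today, Asia"));
        // Two string pieces joined without a space yield one fused token.
        System.out.println(tokens("in the world." + "First, firmly"));
    }
}
```

If punctuation-insensitive counts are wanted, each token could be normalised (for example lowercased and stripped of punctuation) before it is emitted; the original mapper does not do this.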
Algorithm 4: the reducer method of the MapReduce job
Code:
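import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    /**
     * Sums all counts received for one word and emits (word, total).
     *
     * @param key     the word
     * @param values  the counts emitted for this word
     * @param context collects the reducer output
     * @throws IOException          if writing the output fails
     * @throws InterruptedException if the task is interrupted
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0; // running total
        for (IntWritable value : values) { // iterate over the values
            sum += value.get(); // add this count
        }
        context.write(key, new IntWritable(sum)); // emit (word, total)
    }
}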
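The reducer receives each word once, together with an `Iterable` of all counts emitted for it. The sketch below imitates that hand-off in plain Java: a sorted map stands in for the shuffle that groups values by key, and `reduce` repeats the summing loop. `ReduceSketch` and its method names are hypothetical, and the grouping is a simplification of what the framework does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReduceSketch {
    // Shuffle stand-in: collect the 1s emitted for each word, sorted by key.
    static Map<String, List<Integer>> shuffle(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Same loop as the reducer: add up all values for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped = shuffle(Arrays.asList("Asia", "peace", "Asia"));
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            // TextOutputFormat separates key and value with a tab by default.
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```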