今天尝试用C语言在hadoop上编写统计单词的程序,具体过程如下:
一、编写map和reduce程序
mapper.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUF_SIZE 2048
#define DELIM '\n'
/*
 * Word-count mapper for Hadoop Streaming.
 * Reads text lines from stdin, splits each line into whitespace-separated
 * tokens, and emits one "word\t1" record per token on stdout.  The sorted
 * stream of these records is what the reducer consumes.
 */
int main(int argc, char * argv[])
{
    char buffer[BUF_SIZE];

    /* fgets() reads at most size-1 characters and always NUL-terminates,
     * so pass the full buffer size; BUF_SIZE-1 just wasted one byte. */
    while (fgets(buffer, BUF_SIZE, stdin))
    {
        size_t len = strlen(buffer);

        /* Strip the trailing newline.  Guard len > 0: a NUL byte embedded
         * in the input can make the line look empty, and buffer[len-1]
         * would then index out of bounds. */
        if (len > 0 && buffer[len - 1] == DELIM)
            buffer[len - 1] = '\0';

        /* Split on spaces AND tabs: a tab leaking into a key would corrupt
         * the "key\tvalue" record format the reducer expects.  strtok()
         * skips runs of delimiters, so empty tokens are never emitted. */
        for (char *word = strtok(buffer, " \t"); word != NULL;
             word = strtok(NULL, " \t"))
        {
            printf("%s\t1\n", word);
        }
    }
    return 0;
}
reducer.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUFFER_SIZE 1024
#define DELIM "\t"
/*
 * Word-count reducer for Hadoop Streaming.
 * Input on stdin is "key\tcount" lines already sorted by key (Hadoop
 * guarantees this); sums the counts over each run of identical keys and
 * emits one "key\ttotal" line per distinct key on stdout.
 */
int main(int argc, char * argv[])
{
    char str_last_key[BUFFER_SIZE];
    char str_line[BUFFER_SIZE];
    int count = 0;

    str_last_key[0] = '\0';

    /* fgets() reads at most size-1 characters and NUL-terminates,
     * so the full buffer size is safe here. */
    while (fgets(str_line, BUFFER_SIZE, stdin))
    {
        char *str_cur_key = strtok(str_line, DELIM);
        char *str_cur_num = strtok(NULL, DELIM);

        /* Skip blank or malformed lines: the original code passed a NULL
         * token straight to strcpy()/atoi(), which is undefined behavior. */
        if (str_cur_key == NULL || str_cur_num == NULL)
            continue;

        if (str_last_key[0] == '\0')
        {
            /* First key seen: start its running total. */
            strcpy(str_last_key, str_cur_key);
            count = atoi(str_cur_num);
        }
        else if (strcmp(str_cur_key, str_last_key) != 0)
        {
            /* Key changed: flush the previous key's total, start a new one. */
            printf("%s\t%d\n", str_last_key, count);
            strcpy(str_last_key, str_cur_key);
            count = atoi(str_cur_num);
        }
        else
        {
            /* Same key as the previous line: accumulate its count. */
            count += atoi(str_cur_num);
        }
    }

    /* Flush the final key — but only if any input was seen.  The original
     * unconditionally printed a spurious "\t0" line on empty input. */
    if (str_last_key[0] != '\0')
        printf("%s\t%d\n", str_last_key, count);

    return 0;
}
二、编译
gcc mapper.c -o mapper
gcc reducer.c -o reducer
三、运行
(一)启动hadoop后将待统计单词的输入文件放到 input文件夹中:bin/hadoop fs -put 待统计文件 input
(二)使用contrib/streaming/下的jar工具调用上面编译好的mapper/reducer可执行程序:
bin/hadoop jar /home/huangkq/Desktop/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper /home/huangkq/Desktop/hadoop2/mapper -reducer /home/huangkq/Desktop/hadoop2/reducer -input input -output c_output -jobconf mapred.reduce.tasks=2
说明:hadoop-streaming-0.20.203.0.jar 是 Hadoop Streaming 工具,它通过标准输入/标准输出(管道)把任意可执行程序接入 MapReduce 作业,使其充当 mapper 和 reducer。
(三)查看结果:bin/hadoop fs -cat c_output/*