[Java Learning] Using JNI on Hadoop (Part 3)

xceman1997

Categories: hadoop, c/c++, Java

Article link: https://blog.csdn.net/xceman1997/article/details/8270774

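Building on the previous two posts, this one moves on to "stage three": calling a .so shared library from Hadoop, with the library itself reading a file.

Most of the pain was in the earlier stages, so this part went fairly smoothly.

First, add an Init function to the Java wrapper of the .so library; it loads the dictionary file: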

package FakeSegmentForJni;

/**
 * 
 * This class is for verifying the JNI technique. It's a fake segmenter.
 * Its only useful part is the function interface.
 * 
 */

public class FakeSegmentForJni {
	public static native boolean Init (String file);
	public static native String SegmentALine (String line);

	static 
	{
		System.loadLibrary("FakeSegmentForJni");
	}
}

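Next, following the same process described in "[Java Learning] Using JNI on Hadoop (Part 1)", use javac to generate FakeSegmentForJni.class and javah to generate the C++ header FakeSegmentForJni_FakeSegmentForJni.h. Watch the path when running javah: it takes the fully qualified class name (FakeSegmentForJni.FakeSegmentForJni), so the classpath must point at the directory that contains the package directory, not at the package directory itself. The generated FakeSegmentForJni_FakeSegmentForJni.h looks like this: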

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class FakeSegmentForJni_FakeSegmentForJni */

#ifndef _Included_FakeSegmentForJni_FakeSegmentForJni
#define _Included_FakeSegmentForJni_FakeSegmentForJni
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     FakeSegmentForJni_FakeSegmentForJni
 * Method:    Init
 * Signature: (Ljava/lang/String;)Z
 */
JNIEXPORT jboolean JNICALL Java_FakeSegmentForJni_FakeSegmentForJni_Init
  (JNIEnv *, jclass, jstring);

/*
 * Class:     FakeSegmentForJni_FakeSegmentForJni
 * Method:    SegmentALine
 * Signature: (Ljava/lang/String;)Ljava/lang/String;
 */
JNIEXPORT jstring JNICALL Java_FakeSegmentForJni_FakeSegmentForJni_SegmentALine
  (JNIEnv *, jclass, jstring);

#ifdef __cplusplus
}
#endif
#endif

You could also write this header by hand.
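Writing the header by hand means reproducing the exported function names exactly. As a rough sketch of the naming rule in the JNI specification (the prefix Java_, then the escaped fully qualified class name, an underscore, and the escaped method name), the helper below rebuilds the names shown in the generated header. The class JniNameSketch and its methods are hypothetical illustrations, not part of the JDK, and overloaded native methods (which get an extra argument-signature suffix) are not handled.

```java
final class JniNameSketch {
    // Escape one name component following the JNI short-name rules.
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == '/') {
                sb.append('_');                 // package separator
            } else if (c == '_') {
                sb.append("_1");                // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);                   // plain ASCII letters and digits pass through
            } else {
                // any other UTF-16 code unit becomes _0xxxx (lowercase hex)
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Name the native library must export for a non-overloaded native method.
    static String nativeFunctionName(String fqClassName, String methodName) {
        return "Java_" + mangle(fqClassName) + "_" + mangle(methodName);
    }
}
```

For example, JniNameSketch.nativeFunctionName("FakeSegmentForJni.FakeSegmentForJni", "Init") yields Java_FakeSegmentForJni_FakeSegmentForJni_Init, which matches the declaration in the header.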

Second, write FakeSegmentForJni_FakeSegmentForJni.cpp to implement the functionality, as follows:

#include <jni.h>
#include <stdio.h>
#include <string.h>

#include <string>
#include <vector>
#include <fstream>
#include <iostream>

#include "FakeSegmentForJni_FakeSegmentForJni.h"

using namespace std;

// the global variable for lexicon
vector <string> WordVec;

/* * Class:     FakeSegmentForJni_FakeSegmentForJni
 * * Method:    Init
 * * Signature: (Ljava/lang/String;)Z
 * */
JNIEXPORT jboolean JNICALL Java_FakeSegmentForJni_FakeSegmentForJni_Init
	(JNIEnv *env, jclass obj, jstring line)
{
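	// isCopy is a jboolean*; pass NULL because we do not need to know whether the JVM copied the string
	const char *pFileName = env->GetStringUTFChars (line, NULL);
	if (pFileName == NULL)
		return JNI_FALSE;	// an OutOfMemoryError is already pending

	ifstream in (pFileName);
	if (!in)
	{
		cerr << "Cannot open the file " << pFileName << endl;
		env->ReleaseStringUTFChars (line, pFileName);
		return JNI_FALSE;
	}
	// the stream is open, so the JVM-owned file name buffer can be released
	env->ReleaseStringUTFChars (line, pFileName);

	// store every line of the lexicon file in the global vector
	string sWord;
	while (getline (in, sWord))
	{
		WordVec.push_back(sWord);
	}
	return JNI_TRUE;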
}


/* * Class:     FakeSegmentForJni_FakeSegmentForJni
 * * Method:    SegmentALine
 * * Signature: (Ljava/lang/String;)Ljava/lang/String;
 * */

JNIEXPORT jstring JNICALL Java_FakeSegmentForJni_FakeSegmentForJni_SegmentALine
   (JNIEnv *env, jclass obj, jstring line)
{
	const char *str = env->GetStringUTFChars(line, NULL);
	if (str == NULL)
		return NULL;
	// build the result in a std::string so a long input line cannot overflow a fixed-size buffer
	string result (str);
	env->ReleaseStringUTFChars(line, str);
	if (!WordVec.empty())
		result += WordVec.at(0);
	// result += "--copy that\n";
	return env->NewStringUTF(result.c_str());
}

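The functionality is simple: Init opens a file and stores each of its lines in the global variable WordVec; SegmentALine then appends the first element of WordVec to the input string and returns the result. Compile the .cpp file into a .so file with g++, using the following command: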

g++ -I/System/Library/Frameworks/JavaVM.framework/Versions/A/Headers FakeSegmentForJni_FakeSegmentForJni.cpp -fPIC -shared -o libFakeSegmentForJni.so
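To know what output to expect from the job before involving the native library at all, the behaviour of the two functions described above can be mirrored in plain Java. This is only a reference sketch, not the author's code: the class FakeSegmentReference is hypothetical, it assumes the lexicon is UTF-8 text, and unlike the C++ getline it strips a trailing carriage return from lines that end in CRLF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class FakeSegmentReference {
    // Counterpart of the global WordVec in the C++ file.
    private static final List<String> WORDS = new ArrayList<>();

    // Mirrors Init: read every line of the file into WORDS.
    static boolean init(String file) {
        try (BufferedReader in =
                 Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8)) {
            String word;
            while ((word = in.readLine()) != null) {
                WORDS.add(word);
            }
            return true;
        } catch (IOException e) {
            System.err.println("Cannot open the file " + file);
            return false;
        }
    }

    // Mirrors SegmentALine: append the first lexicon entry, if any, to the input.
    static String segmentALine(String line) {
        return WORDS.isEmpty() ? line : line + WORDS.get(0);
    }
}
```

With a lexicon whose first line is word1, segmentALine("jni-value") returns jni-valueword1, which is the value the mapper should emit for every record.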

Step three: write the Hadoop program.

The map function:

public static class MapTestJni extends Mapper<Writable, Text, Text, Text> {
		
		protected String s;
		protected void setup(Context context) throws IOException, InterruptedException
		{
			FakeSegmentForJni.Init("Lex.txt");
			s = FakeSegmentForJni.SegmentALine("jni-value");
		}
		
		protected void map(Writable key, Text value, Context context)
		throws IOException, InterruptedException {
			
			// the format of input value is:
			//    mcid totaltimes item1 item2(itemkey=itemvalue)
			
			context.write(new Text("key"), new Text(s));
		}
	}

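In setup, we call FakeSegmentForJni.Init to load the contents of Lex.txt. Note that the relative path resolves against the task's local current working directory. In the distribution step described below, Lex.txt is shipped to the same local directory as the jar file. The map function then writes out the contents of s. The reduce function, the driver function, and main are the same as in "[Java Learning] Using JNI on Hadoop (Part 2)", so they are not pasted again here.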
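The setup method above ignores the boolean returned by Init, so a missing Lex.txt would only show up as output without the lexicon word appended. One possible way to fail fast is sketched below. The helper LexiconSetupSketch is hypothetical, it assumes Java 8 or later for the method reference, and the native call is passed in as a Predicate so the helper can be tried without the .so file.

```java
import java.io.File;
import java.io.IOException;
import java.util.function.Predicate;

final class LexiconSetupSketch {
    // Checks that the file is visible from the working directory, then runs the loader.
    static void initOrFail(String fileName, Predicate<String> nativeInit)
            throws IOException {
        // A relative name resolves against the process working directory.
        File f = new File(fileName);
        if (!f.isFile()) {
            throw new IOException("Lexicon not found: " + f.getAbsolutePath());
        }
        if (!nativeInit.test(fileName)) {
            throw new IOException("Native Init failed for: " + f.getAbsolutePath());
        }
    }
}
```

Inside setup, the call would then read LexiconSetupSketch.initOrFail("Lex.txt", FakeSegmentForJni::Init); the absolute path in the exception message shows which directory the task actually ran in.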

Step four: on the command line, use the "-files" option to distribute the .so file and Lex.txt to the task nodes. The command is:

hadoop jar /xxx/TestFakeSegmentForJniHadoop.jar -files /xxx/TestJni/libFakeSegmentForJni.so,/xxx/TestJni/Lex.txt FakeSegmentForJni.TestFakeSegmentForJni /input/xxx.txt /outputJNI

Separate multiple files with commas.

Finally, check the job output; the result is correct.
