Emscripten: Calls Between JS and C/C++

Latest recommended article published on 2022-09-05 09:48:06

wopelo

Category: WebAssembly. Tags: javascript, C, c++, webassembly, ecmascript

Article link: https://blog.csdn.net/wopelo/article/details/121730953

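1.Introduction

The previous post introduced Emscripten's glue code. We usually call functions defined in C/C++ from JS, which raises the question of how JS passes arguments to C/C++. This post covers the ways JS and C/C++ call each other in Emscripten. Before reading, it helps to have some familiarity with WebAssembly and some experience using Emscripten.
The examples used in this post are open-sourced on GitHub.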

2.Memory model

2.1.Module.asm.memory

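After Emscripten compiles the code, all data that C/C++ accesses directly by address lives in one block of memory, which is an ArrayBuffer provided by Emscripten. In JS, Module.asm.memory gives access to it: it is the WebAssembly.Memory object whose buffer property is that ArrayBuffer. JS can access the memory addresses used by C/C++, but C/C++ cannot access memory used by JS. The book "C/C++面向WebAssembly编程" (C/C++ Programming for WebAssembly) calls this a one-way transparent memory model:

The data that C/C++ can access directly is in fact confined to Module.buffer; other objects in the JavaScript environment cannot be accessed directly by C/C++. JavaScript, however, can access C/C++ memory: by obtaining the address of a C/C++ variable it can reach that variable. This is called a one-way transparent memory model.

The previous post, when describing how the glue code loads the wasm module, mentioned that the glue code's receiveInstance function attaches the wasm instance's exports to window.Module.asm. receiveInstance has another job as well: handling the memory property exported by the wasm instance: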

// Load the wasm module and create an instance of using native support in the JS engine.
  // handle a generated wasm instance, receiving its exports and
  // performing other necessary setup
  /** @param {WebAssembly.Module=} module*/
  function receiveInstance(instance, module) {
    var exports = instance.exports;

    Module['asm'] = exports;

    wasmMemory = Module['asm']['memory'];
    assert(wasmMemory, "memory not found in wasm exports");
    // This assertion doesn't hold when emscripten is run in --post-link
    // mode.
    // TODO(sbc): Read INITIAL_MEMORY out of the wasm file in post-link mode.
    //assert(wasmMemory.buffer.byteLength === 16777216);
    updateGlobalBufferAndViews(wasmMemory.buffer);

   // ...
  }

The main logic of updateGlobalBufferAndViews is as follows:

function updateGlobalBufferAndViews(buf) {
  buffer = buf;
  Module['HEAP8'] = HEAP8 = new Int8Array(buf);
  Module['HEAP16'] = HEAP16 = new Int16Array(buf);
  Module['HEAP32'] = HEAP32 = new Int32Array(buf);
  Module['HEAPU8'] = HEAPU8 = new Uint8Array(buf);
  Module['HEAPU16'] = HEAPU16 = new Uint16Array(buf);
  Module['HEAPU32'] = HEAPU32 = new Uint32Array(buf);
  Module['HEAPF32'] = HEAPF32 = new Float32Array(buf);
  Module['HEAPF64'] = HEAPF64 = new Float64Array(buf);
}
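
All the views created by updateGlobalBufferAndViews alias the same bytes. The following is a minimal sketch of that idea, assuming a plain 16-byte ArrayBuffer as a stand-in for wasmMemory.buffer; the variable names are illustrative and not part of the glue code.

```javascript
// Stand-in for wasmMemory.buffer (assumption: a real module supplies its own buffer).
const buf = new ArrayBuffer(16);

// Three views over the same bytes, in the spirit of HEAPU8 / HEAP8 / a DataView.
const u8 = new Uint8Array(buf);
const i8 = new Int8Array(buf);
const dv = new DataView(buf);

// Write one 32-bit integer at byte offset 0, explicitly little-endian
// (WebAssembly linear memory is little-endian).
dv.setInt32(0, 0x12345678, true);

// The unsigned byte view sees the four individual bytes, lowest byte first.
const bytes = Array.from(u8.subarray(0, 4));
console.log(bytes.map((b) => b.toString(16)).join(" ")); // 78 56 34 12

// The same byte reads differently through an unsigned and a signed view.
u8[4] = 0xff;
console.log(u8[4], i8[4]); // 255 -1
```

Writing through one view and reading through another is exactly what happens when JS inspects memory that C/C++ has written.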

2.2.Module.HEAPX

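In updateGlobalBufferAndViews, Int8Array, Int16Array, Int32Array and the rest are all TypedArray views. The glue code exports this many views because an ArrayBuffer only represents a chunk of memory holding binary data; it cannot be read or written directly, only through a TypedArray or DataView. The same memory can also be interpreted differently depending on the view, so the one ArrayBuffer is wrapped in several TypedArray objects. All of them are available in JS as Module.HEAPX (Module.HEAP8, Module.HEAP32, and so on).
If C defines a function get_int_ptr that returns the address of an int, JS can read the value like this:

// Get the address of the variable
var int_ptr = Module._get_int_ptr();
// Module.HEAP32[int_ptr >> 2] reads the int32 stored at that address.
// Each Module.HEAP32 element is 4 bytes, so int_ptr must be divided by 4
// (that is, shifted right by 2 bits) to get the correct index.
var int_value = Module.HEAP32[int_ptr >> 2];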
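
To make the address-to-index conversion concrete without a compiled module, here is a small sketch. It assumes a fresh WebAssembly.Memory in place of the module's memory and a made-up address (1024) in place of the value returned by Module._get_int_ptr().

```javascript
// Stand-ins for Module.HEAP32 / Module.HEAPU8 (assumption: no real module is loaded).
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const HEAP32 = new Int32Array(memory.buffer);
const HEAPU8 = new Uint8Array(memory.buffer);

// Hypothetical byte address of a C int.
const intPtr = 1024;

// A byte address becomes an Int32Array index by dividing by 4.
const index = intPtr >> 2;
HEAP32[index] = 42;
console.log(index, HEAP32[index]); // 256 42

// The write touched exactly the four bytes starting at the byte address.
const valueBytes = Array.from(HEAPU8.subarray(intPtr, intPtr + 4));
const byteSum = valueBytes.reduce((a, b) => a + b, 0);
console.log(byteSum, HEAPU8[intPtr - 1], HEAPU8[intPtr + 4]); // 42 0 0
```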

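3.How JS and C/C++ call each other

Broadly, there are two ways for JS and C/C++ to exchange data when calling each other:

Passing Number values directly as parameters
Passing data indirectly through memory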

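3.1.Through numeric parameters

JS and C/C++ each have their own type systems, but numbers are common to both. When JS or C/C++ calls a function on the other side directly, a Number can be used as a parameter or return value.
A JS Number is a 64-bit floating-point value, which can exactly represent integers of 32 bits or fewer, 32-bit floats and 64-bit floats. The numeric types of C/C++ also include 64-bit integers, which a Number cannot represent exactly, so 64-bit integers cannot be used as parameters or return values when JavaScript and C/C++ call each other directly. If the value passed in a direct call is not a number, the argument is not passed correctly.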
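
The precision limit can be checked in plain JS. This sketch uses a BigInt literal only to write down a value that fits in a C int64_t; it illustrates the Number limitation itself and does not call into any compiled code.

```javascript
// 2^53 + 1 fits easily in a 64-bit integer; written here as a BigInt literal.
const big = 9007199254740993n;

// Converting it to a Number rounds to the nearest representable double.
const asNumber = Number(big);
console.log(asNumber); // 9007199254740992
console.log(BigInt(asNumber) === big); // false

// Every 32-bit integer, by contrast, is represented exactly.
const int32Max = 2147483647;
console.log(Number.isSafeInteger(int32Max)); // true
```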

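3.1.1.JS calling C/C++ functions

Because C/C++ is strongly typed, a Number passed in from JS is implicitly converted to the parameter type:

If the target type is int, the value is truncated toward zero
If the target type is float, the conversion may lose precision

Try the following code: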

#include <stdio.h>

EM_PORT_API(void) print_int(int a) {
	printf("C{print_int() a:%d}\n", a);
}

EM_PORT_API(void) print_float(float a) {
	printf("C{print_float() a:%f}\n", a);
}

EM_PORT_API(void) print_double(double a) {
	printf("C{print_double() a:%lf}\n", a);
}

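EM_PORT_API is a function-export macro for C functions. Add the following code at the top of the C file; otherwise the compiler may well decide that the defined functions are never called and strip them:

// Define the function-export macro
// __EMSCRIPTEN__ detects whether we are compiling with Emscripten
// __cplusplus detects whether we are compiling as C++
// EMSCRIPTEN_KEEPALIVE is an Emscripten-specific macro that tells the compiler the following function must be kept during optimization and exported to JavaScript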
#ifndef EM_PORT_API
#	if defined(__EMSCRIPTEN__)
#		include <emscripten.h>
#		if defined(__cplusplus)
#			define EM_PORT_API(rettype) extern "C" rettype EMSCRIPTEN_KEEPALIVE
#		else
#			define EM_PORT_API(rettype) rettype EMSCRIPTEN_KEEPALIVE
#		endif
#	else
#		if defined(__cplusplus)
#			define EM_PORT_API(rettype) extern "C" rettype
#		else
#			define EM_PORT_API(rettype) rettype
#		endif
#	endif
#endif

Call the functions from JS as follows:

Module._print_int(3.4)
Module._print_int(4.6)
Module._print_int(-3.4)
Module._print_int(-4.6)
Module._print_float(2000000.03125)
Module._print_double(2000000.03125)

Console output:

C{print_int() a:3}
C{print_int() a:4}
C{print_int() a:-3}
C{print_int() a:-4}
C{print_float() a:2000000.000000}
C{print_double() a:2000000.031250}
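
The same conversions can be reproduced in plain JS, which helps when predicting what the C side will receive. This is a sketch: toInt and toFloat are illustrative helper names, chosen to mirror how a Number is converted for a 32-bit integer parameter (ToInt32) and a 32-bit float parameter (rounding to single precision).

```javascript
// Conversion applied to a Number passed as a 32-bit integer parameter.
const toInt = (x) => x | 0;
// Rounding applied to a Number passed as a 32-bit float parameter.
const toFloat = (x) => Math.fround(x);

// The fractional part is dropped, toward zero for both signs.
const ints = [3.4, 4.6, -3.4, -4.6].map((x) => toInt(x));
console.log(ints.join(" ")); // 3 4 -3 -4

// A float cannot hold the .03125; a double can.
console.log(toFloat(2000000.03125)); // 2000000
console.log(2000000.03125); // 2000000.03125

// Values outside the 32-bit range wrap around rather than saturate.
console.log(toInt(2 ** 31)); // -2147483648
```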

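3.1.2.C/C++ calling JS functions

By injecting JS functions into C/C++, C/C++ can pass Number values to JS functions. This approach takes a little more work: the JS functions to inject have to be kept in a separate JS file. Here we put them in pkg.js: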

mergeInto(LibraryManager.library, {
  // C passes two ints, JS returns an int
  js_add: function (a, b) {
    console.log('js_add')
    return a + b
  },
  // C passes two floats, JS returns a float
  js_addF: function (a, b) {
    console.log('js_addF')
    return a + b
  },
  // C passes one int, JS returns nothing
  js_console_log_int: function (param) {
    console.log('js_console_log_int:' + param)
  },
  // C passes one float, JS returns nothing
  js_console_log_float: function (param) {
    console.log('js_console_log_float:' + param)
  },
  // C passes a string, to test whether JS receives the string
  js_console_log_string: function (param) {
    console.log('js_console_log_string', param)
  }
})

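Note that the functions to inject are defined as methods of the object passed as the second argument to mergeInto. mergeInto merges that object into LibraryManager.library, which is the library of JavaScript functions injected into the C environment.
At compile time, add the --js-library option, followed by the path of the JS file, to inject the JS functions into C: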

emcc ../index.c -o index.js -s WASM=1 -s "EXPORTED_RUNTIME_METHODS=['ccall']" -s "EXPORTED_FUNCTIONS=['_malloc', '_free', '_main']" --js-library ../pkg.js

In C/C++, the functions defined in JS must be declared before they can be used:

// C calling JS functions
EM_PORT_API(int) js_add(int a, int b);
EM_PORT_API(float) js_addF(float a, float b);
EM_PORT_API(void) js_console_log_int(int param);
EM_PORT_API(void) js_console_log_float(float param);
EM_PORT_API(void) js_console_log_string(char* str);

EM_PORT_API(void) print_the_answer() {
	int i = js_add(21, 21);
	float j = js_addF(1.1, 1.1);
	js_console_log_int(i);
	js_console_log_float(j);
	js_console_log_string("Hello, world! 你好，世界！");
}

You can call print_the_answer directly from C/C++, or from JS via Module._print_the_answer(); the result is the same:

js_add
js_addF
js_console_log_int:42
js_console_log_float:2.200000047683716
js_console_log_string 1024

The last line shows that JS receives the address of the string (1024 here), not the string itself.
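
The value 1024 logged by js_console_log_string is a byte address into the module's memory. The sketch below shows how such an address can be turned back into a JS string. It assumes a fresh WebAssembly.Memory as a stand-in for Module.HEAPU8, and readCString is a hypothetical helper that does roughly what Emscripten's UTF8ToString does, not its actual implementation.

```javascript
// Stand-in for Module.HEAPU8 (assumption: a compiled module would provide the real one).
const HEAPU8 = new Uint8Array(new WebAssembly.Memory({ initial: 1 }).buffer);

// Pretend the C string literal lives at address 1024, as in the log above.
const ptr = 1024;
const encoded = new TextEncoder().encode("Hello, world! 你好，世界！");
HEAPU8.set(encoded, ptr);
HEAPU8[ptr + encoded.length] = 0; // C strings end with a NUL byte

// Hypothetical helper: scan to the NUL terminator, then decode the bytes as UTF-8.
function readCString(heap, address) {
  let end = address;
  while (heap[end] !== 0) end++;
  return new TextDecoder("utf-8").decode(heap.subarray(address, end));
}

console.log(readCString(HEAPU8, ptr)); // Hello, world! 你好，世界！
```

In a real build, js_console_log_string could call UTF8ToString(param) to get the same result.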

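3.2.Through memory

JavaScript and C/C++ can pass numbers or strings through memory. This is typically used when large blocks of data need to be exchanged between JavaScript and C/C++.

3.2.1.From C/C++ to JS

3.2.1.1.Passing numbers

C/C++ returns a pointer to a number, and JS reads the value through one of the common TypedArray views that Emscripten creates over Module.buffer.
C code: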

EM_PORT_API(int) g_int = 42;
EM_PORT_API(double) g_double = 3.1415926;

EM_PORT_API(int*) get_int_ptr() {
  return &g_int;
}

EM_PORT_API(double*) get_double_ptr() {
  return &g_double;
}

EM_PORT_API(void) print_data() {
  printf("C{g_int:%d}\n", g_int);
  printf("C{g_double:%lf}\n", g_double);
}

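JS code:

const int_ptr = Module._get_int_ptr()
// Read the int32 value stored at that address.
// Each Module.HEAP32 element is 4 bytes, so int_ptr must be
// divided by 4 (that is, shifted right by 2 bits) to get the correct index.
const int_value = Module.HEAP32[int_ptr >> 2]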
console.log("JS{int_value:" + int_value + "}")

const double_ptr = Module._get_double_ptr()
const double_value = Module.HEAPF64[double_ptr >> 3]
console.log("JS{double_value:" + double_value + "}")

// JS modifies the variables defined in C
Module.HEAP32[int_ptr >> 2] = 13
Module.HEAPF64[double_ptr >> 3] = 123456.789      
Module._print_data()

Console output:

JS{int_value:42}
JS{double_value:3.1415926}
C{g_int:13}
C{g_double:123456.789000}
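
The shifts used above (>> 2 for int, >> 3 for double) only give the right element when the address is a multiple of the element size. The following sketch, with made-up addresses standing in for the pointers returned by get_int_ptr and get_double_ptr, shows both the normal case and what happens with a misaligned address.

```javascript
// Stand-ins for Module.HEAP32 / Module.HEAPF64 (assumption: no real module is loaded).
const memory = new WebAssembly.Memory({ initial: 1 });
const HEAP32 = new Int32Array(memory.buffer);
const HEAPF64 = new Float64Array(memory.buffer);

// Hypothetical addresses of g_int and g_double.
const intPtr = 1024; // multiple of 4
const doublePtr = 2048; // multiple of 8

// Write from "JS", then read back through the same views.
HEAP32[intPtr >> 2] = 13;
HEAPF64[doublePtr >> 3] = 123456.789;
console.log(HEAP32[intPtr >> 2], HEAPF64[doublePtr >> 3]); // 13 123456.789

// The shift discards the low bits, so a misaligned address silently
// maps to the same element as the aligned address below it.
console.log(doublePtr >> 3, (doublePtr + 4) >> 3); // 256 256
```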

3.2.1.2.Passing strings

Passing a string works the same way as passing a number: C/C++ returns a pointer to the string, and JS calls UTF8ToString to convert it into a JS string.
C code:

// Pass a string to JS
EM_PORT_API(const char*) get_string() {
  static const char str[] = "Hello, world! 你好，世界！";
  return str;
}

JS:

// The C function get_string() returns the address of a string
const ptr = Module._get_string()
// Call UTF8ToString to convert it into a JS string
const str = UTF8ToString(ptr)
console.log(typeof(str))
console.log(str)

Console output:

string
Hello, world! 你好，世界！

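3.2.2.From JS to C/C++

3.2.2.1.Passing numbers

JS calls C's malloc to allocate memory. malloc returns a pointer, and C/C++ uses that pointer to access the memory.
JS:

const count = 50
// Call C's malloc to allocate memory
const ptr = Module._malloc(4 * count)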

for (let i = 0; i < count; i++){
    Module.HEAP32[ptr / 4 + i] = i + 1
}

console.log(Module._sum(ptr, count))
Module._free(ptr)

C:

// Sum the first count elements of the array
EM_PORT_API(int) sum(int* ptr, int count) {
  int total = 0;
  for (int i = 0; i < count; i++){
    total += ptr[i];
  }
  return total;
}

Console output: 1275
Note that to use _malloc in JS code, the EXPORTED_FUNCTIONS setting has to be added at compile time so that the C functions are exported:

emcc ../index.c -o index.js -s WASM=1 -s "EXPORTED_FUNCTIONS=['_malloc', '_free', '_main']" --js-library ../pkg.js

The command above exports the three C functions malloc, free and main.
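
The whole round trip can be simulated in plain JS to check the indexing and the expected result. This is a sketch under stated assumptions: fakeMalloc is a hypothetical bump allocator standing in for Module._malloc, and sum is a JS port of the C function, reading from a stand-in HEAP32.

```javascript
// Stand-in for Module.HEAP32 (assumption: no real module is loaded).
const memory = new WebAssembly.Memory({ initial: 1 });
const HEAP32 = new Int32Array(memory.buffer);

// Hypothetical bump allocator; it is not Emscripten's malloc and never frees.
let nextFree = 1024;
function fakeMalloc(size) {
  const ptr = nextFree;
  nextFree += (size + 7) & ~7; // keep allocations 8-byte aligned
  return ptr;
}

// JS port of the C sum(): reads count ints starting at byte address ptr.
function sum(ptr, count) {
  let total = 0;
  for (let i = 0; i < count; i++) {
    total += HEAP32[(ptr >> 2) + i];
  }
  return total;
}

const count = 50;
const ptr = fakeMalloc(4 * count);
for (let i = 0; i < count; i++) {
  HEAP32[(ptr >> 2) + i] = i + 1; // store 1..50
}
console.log(sum(ptr, count)); // 1275
```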

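3.2.2.2.Passing strings

JS uses allocateUTF8() to copy a string into C/C++ memory. The function returns a pointer, and C/C++ uses that pointer to access the memory.
JS:

// Use allocateUTF8() to copy the string into C/C++ memory
const ptr = allocateUTF8("你好，Emscripten！")
Module._print_string(ptr)
Module._free(ptr)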

C:

// Print the string that JS passed through memory
EM_PORT_API(void) print_string(char* str) {
  printf("%s\n", str);
}

Console output: 你好，Emscripten！
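
What ends up in memory is the UTF-8 encoding of the string followed by a NUL byte. The sketch below writes a string that way by hand; writeCString is a hypothetical helper, and the fixed address 4096 replaces the malloc call that a real allocation would make.

```javascript
// Stand-in for Module.HEAPU8 (assumption: no real module is loaded).
const HEAPU8 = new Uint8Array(new WebAssembly.Memory({ initial: 1 }).buffer);

// Hypothetical helper: copies str as NUL-terminated UTF-8 to the given address
// and returns the number of bytes used, terminator included.
function writeCString(heap, str, address) {
  const bytes = new TextEncoder().encode(str);
  heap.set(bytes, address);
  heap[address + bytes.length] = 0;
  return bytes.length + 1;
}

const text = "你好，Emscripten！";
const ptr = 4096; // made-up address in place of a malloc result
const used = writeCString(HEAPU8, text, ptr);

// 14 UTF-16 code units in JS, but 22 bytes of UTF-8 plus the terminator in C memory.
console.log(text.length, used); // 14 23
console.log(HEAPU8[ptr + used - 1]); // 0
```

The difference between text.length and the byte count is why buffer sizes for C strings should be computed from the encoded bytes, not from the JS string length.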

4.ccall/cwrap

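As the examples above show, passing data between C/C++ and JS as plain numbers or through memory is fairly tedious. To simplify the process, Emscripten provides two functions, ccall and cwrap, for calling C/C++ functions from JS.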

4.1.ccall

The syntax of ccall is:

const result = Module.ccall(ident, returnType, argTypes, args)

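The parameters are:

ident: the name of the exported C function (without the "_" prefix)
returnType: the return type of the exported C function: 'boolean', 'number' or 'string' for a boolean, numeric or string return value, or null (the JavaScript value, not the string 'null') for a function that returns nothing
argTypes: an array of the parameter types of the exported C function. Each type can be 'number', 'string' or 'array'
args: the array of arguments

Taking the print_string function defined in C in the previous section as an example, calling it through ccall takes a single line:

Module.ccall('print_string', null, ['string'], ['你好，Emscripten！'])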

Note that to use ccall in JS code, the EXPORTED_RUNTIME_METHODS setting has to be added at compile time so that the runtime function is exported:

emcc ../index.c -o index.js -s WASM=1 -s "EXPORTED_RUNTIME_METHODS=['ccall']" -s "EXPORTED_FUNCTIONS=['_malloc', '_free', '_main']" --js-library ../pkg.js

4.2.cwrap

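Although ccall takes care of strings and other data types, every call still has to supply the parameter type array, the argument list and so on. cwrap wraps this one step further: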

const func = Module.cwrap(ident, returnType, argTypes)

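Parameters:

ident: the name of the exported C function (without the "_" prefix)
returnType: the return type of the exported C function: 'boolean', 'number' or 'string' for a boolean, numeric or string return value, or null (the JavaScript value, not the string 'null') for a function that returns nothing
argTypes: an array of the parameter types of the exported C function. Each type can be 'number', 'string' or 'array'

Return value: the wrapped function
Likewise, cwrap has to be exported at compile time: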

emcc ../index.c -o index.js -s WASM=1 -s "EXPORTED_RUNTIME_METHODS=['ccall', 'cwrap']" -s "EXPORTED_FUNCTIONS=['_malloc', '_free', '_main']" --js-library ../pkg.js

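Again taking the print_string function defined in C as the example:

// null (the JavaScript value) means the C function returns nothing
const printString = Module.cwrap('print_string', null, ['string'])
printString('你好，Emscripten！')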

We only need to wrap print_string with cwrap once; after that, each call just passes the arguments. In use it is a bit like bind.
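
The relationship between the two functions can be sketched in a few lines. This is a deliberate simplification, not Emscripten's implementation: fakeModule is a hypothetical exports table, ccallLike and cwrapLike are illustrative names, and the conversion of argument and return types is left out entirely.

```javascript
// Hypothetical exports table standing in for Module; real exports come from the wasm instance.
const fakeModule = {
  _add: (a, b) => (a + b) | 0,
};

// Simplified ccall: look up the underscore-prefixed export and call it once.
function ccallLike(mod, ident, args) {
  const fn = mod["_" + ident];
  if (typeof fn !== "function") {
    throw new Error("no export named _" + ident);
  }
  return fn(...args);
}

// Simplified cwrap: fix the function name once and return a reusable function.
function cwrapLike(mod, ident) {
  return (...args) => ccallLike(mod, ident, args);
}

console.log(ccallLike(fakeModule, "add", [20, 22])); // 42

const add = cwrapLike(fakeModule, "add");
console.log(add(1, 2)); // 3
console.log(add(40, 2)); // 42
```

The wrapper captures the lookup details in a closure, which is the sense in which cwrap resembles bind.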

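4.3.Potential risks of ccall/cwrap

According to the book "C/C++面向WebAssembly编程", using ccall/cwrap carries a potential risk:

Although ccall/cwrap simplify the exchange of string parameters, this convenience has a cost: when an input parameter's type is 'string' or 'array', ccall/cwrap allocate space on the C stack, copy the data into it, and then call the exported function.
Compared with the heap, stack space is a scarce resource, so take particular care with the size of the strings/arrays passed in when using ccall/cwrap, to avoid overflowing the stack.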
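
One way to act on this advice is to estimate the size of a string before handing it to ccall/cwrap, and to fall back to heap allocation for large inputs. The sketch below only estimates the encoded size of the string data; the amount actually reserved on the stack may be larger depending on the Emscripten version. STACK_BUDGET is a made-up threshold, since the real stack size depends on build settings, and both function names are illustrative.

```javascript
// Size of the string data itself: the UTF-8 bytes plus one NUL terminator.
function utf8BytesWithTerminator(str) {
  return new TextEncoder().encode(str).length + 1;
}

// Hypothetical budget chosen for illustration only.
const STACK_BUDGET = 1024;

// Decide whether a string is small enough to pass as a 'string' argument.
function fitsInBudget(str) {
  return utf8BytesWithTerminator(str) <= STACK_BUDGET;
}

console.log(utf8BytesWithTerminator("你好，Emscripten！")); // 23
console.log(fitsInBudget("你好，Emscripten！")); // true
console.log(fitsInBudget("x".repeat(10000))); // false
```

Strings that do not fit the budget can be passed through memory instead, as in section 3.2.2.2.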

5.Other ways for C/C++ to call JS

JS can call C/C++ conveniently through ccall/cwrap. C/C++ also has ways to run JS code directly, mainly:

Inlining JavaScript code with the EM_ASM macro
emscripten_run_script

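5.1.EM_ASM macro

The plain EM_ASM form runs the embedded JS code and does not return a result. Arguments can be passed after the code and are read inside it as $0, $1 and so on; the EM_ASM_INT and EM_ASM_DOUBLE variants return a value. The simplest use looks like this: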

#include <emscripten.h>

int main(int argc, char ** argv) {
	EM_ASM(console.log('From EM_ASM', [{a: true}]));
}

We can write any JS code inside EM_ASM and use any data type that JS supports.

5.2.emscripten_run_script

emscripten_run_script is declared in emscripten.h, so include that header before calling it:

#include <emscripten.h>

int main(int argc, char ** argv) {
	emscripten_run_script("console.log('From emscripten_run_script', [{a: true}]);");
}

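Inside emscripten_run_script we can write arbitrary JS code as a string; the function itself returns nothing. To get a return value, use emscripten_run_script_int or emscripten_run_script_string, which return an int or a string respectively and take the same argument as emscripten_run_script.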