1. Strings, bytes and Unicode conversions
Passing Python strings to C++
When a Python str is passed to a C++ function whose parameter is std::string or char *, pybind11 automatically encodes the Python string as UTF-8. Any well-formed Python str can be encoded as UTF-8, so this conversion normally succeeds.
The C++ language is encoding agnostic. It is the responsibility of the programmer to track encodings. It’s often easiest to simply use UTF-8 everywhere.
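#include <iostream>
#include <string>
#include <pybind11/pybind11.h>

PYBIND11_MODULE(py_string_to_cpp, m) {
    // Receives the Python str as UTF-8 encoded bytes.
    m.def("utf8_test", [](const std::string &s) {
        std::cout << "utf-8 is icing on cake!!";
        std::cout << s << std::endl;
    });
    // A const char * parameter receives the same UTF-8 data.
    m.def("utf8_charptr", [](const char *s) {
        std::cout << "my favorite food is " << s << std::endl;
    });
}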
from py_string_to_cpp import utf8_test, utf8_charptr

s = "cake noodles"
utf8_test(s)
utf8_charptr(s)
The test results are the same whether the C++ parameter is passed by value or by reference, and whether or not it is declared const.
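The encoding step described above can be reproduced in plain Python, without pybind11, to see which bytes a std::string parameter would be expected to hold. This is only a sketch: `utf8_payload` is a hypothetical helper, and it assumes the conversion behaves like a strict `str.encode("utf-8")`.

```python
def utf8_payload(s: str) -> bytes:
    """Return the UTF-8 bytes a std::string parameter would be expected to hold.

    Hypothetical helper for illustration; assumes a strict UTF-8 encode.
    """
    return s.encode("utf-8")


ascii_text = "cake noodles"
accented_text = "café"

# ASCII characters take one byte each, so the byte count equals the character count.
print(len(ascii_text), len(utf8_payload(ascii_text)))

# The accented character takes two bytes, so the byte count exceeds the character count.
print(len(accented_text), len(utf8_payload(accented_text)))

# A lone surrogate is a legal Python str but has no UTF-8 encoding,
# which is the kind of input for which the conversion cannot succeed.
try:
    utf8_payload("\udc80")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

On the C++ side, a size such as `s.size()` would therefore count bytes, not Python characters, whenever the text is not pure ASCII.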
Passing bytes to C++
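A Python bytes object is passed to a C++ function that takes std::string or char * as is, without any conversion.
To make a function accept only bytes (and not str) in Python 3, declare the parameter as py::bytes on the C++ side.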
2. Returning C++ strings to Python
When C++ returns a std::string or char * to Python, pybind11 assumes the string is valid UTF-8 and decodes it into a Python str (using the same API as Python uses to perform bytes.decode('utf-8')). If decoding fails, pybind11 raises a UnicodeDecodeError.
// These definitions go inside the PYBIND11_MODULE(py_string_to_cpp, m) block.
m.def("std_string_return", []() {
    return std::string("this std::string needs to be UTF-8 encoded!");
});
m.def("char_ptr_return", []() {
    // A string literal must be bound to const char *, not char *.
    const char *s = "this string needs to be UTF-8 encoded!";
    return s;
});

from py_string_to_cpp import std_string_return, char_ptr_return

print(std_string_return())
print(char_ptr_return())
print(isinstance(std_string_return(), str))
print(isinstance(char_ptr_return(), str))

this std::string needs to be UTF-8 encoded!
this string needs to be UTF-8 encoded!
True
True
Because UTF-8 is inclusive of pure ASCII, there is never any issue with returning a pure ASCII string to Python. If there is any possibility that the string is not pure ASCII, it is necessary to ensure the encoding is valid UTF-8.
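The decoding rule can be tried out in plain Python to see which byte sequences would survive the trip back from C++. The sketch below assumes the conversion is equivalent to a strict `bytes.decode("utf-8")`, as the text states; `to_python_str` is a hypothetical stand-in, not a pybind11 function.

```python
def to_python_str(raw: bytes) -> str:
    """Decode bytes as strict UTF-8, mirroring bytes.decode('utf-8').

    Hypothetical stand-in for the conversion applied to a returned std::string.
    """
    return raw.decode("utf-8")


# Pure ASCII is always valid UTF-8.
print(to_python_str(b"plain ASCII"))

# Non-ASCII text that was encoded as UTF-8 round-trips cleanly.
print(to_python_str("naïve".encode("utf-8")))

# The same text encoded as Latin-1 is not valid UTF-8 and fails to decode.
latin1_bytes = "naïve".encode("latin-1")
try:
    to_python_str(latin1_bytes)
except UnicodeDecodeError as exc:
    print("decode failed at byte offset", exc.start)
```

A C++ function whose data may come from a legacy encoding would need to convert it to UTF-8 before returning it as a std::string, or return it as bytes instead.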
Wide character strings
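When a Python str is passed to a C++ function that takes std::wstring, wchar_t *, std::u16string or std::u32string, the str is encoded as UTF-16 or UTF-32, depending on how the C++ compiler implements each type. When strings of these types are returned from C++ to Python, they are assumed to be valid UTF-16 or UTF-32 and are decoded into a Python str.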
#define UNICODE
#include <windows.h>
m.def("set_window_text",
[](HWND hwnd, std::wstring s) {
// Call SetWindowText with null-terminated UTF-16 string
::SetWindowText(hwnd, s.c_str());
}
);
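The difference between the two wide encodings can be seen in plain Python by counting code units. This sketch assumes little-endian byte order (hence the `-le` codecs) and uses a sample string chosen for illustration; it shows the encodings themselves, not pybind11 internals.

```python
# One ASCII character, one other BMP character, one character outside the BMP.
text = "a€😀"

utf16 = text.encode("utf-16-le")  # layout used by 16-bit wide strings
utf32 = text.encode("utf-32-le")  # layout used by 32-bit wide strings

# UTF-16 uses 2 bytes per code unit; the emoji needs a surrogate pair (2 units).
utf16_units = len(utf16) // 2
print(utf16_units, "UTF-16 code units")

# UTF-32 uses 4 bytes per code unit, exactly one unit per code point.
utf32_units = len(utf32) // 4
print(utf32_units, "UTF-32 code units")

# Both encodings decode back to the same Python str.
print(utf16.decode("utf-16-le") == utf32.decode("utf-32-le") == text)
```

The length of the string on the C++ side can therefore differ from `len()` in Python when the wide type is 16 bits and the text contains characters outside the Basic Multilingual Plane.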
Character literals
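A C++ function that takes char or wchar_t receives a single character from a Python str. One might expect only the first character to be used and the rest to be ignored, but the test below shows that a multi-character string is rejected with an error.
When C++ returns a character literal, it is converted to a Python str containing a single character.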
m.def("pass_char", [](char c){
return c;
});
m.def("pass_wchar", [](wchar_t wc){
return wc;
});
from py_string_to_cpp import pass_char, pass_wchar

try:
    print(pass_char("abcde"))
    print(pass_wchar("abcde"))
except Exception as e:
    print(e)
else:
    print("pass_char can accept multi-character strings")
finally:
    print(pass_char("a"))
    print(pass_wchar("a"))
Expected a character, but multi-character string found
a
a
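Since the test above rejects multi-character strings, a caller that really wants "first character only" behaviour can slice on the Python side before calling the binding. The helper below is a minimal sketch; `first_char` is a hypothetical name and not part of pybind11.

```python
def first_char(s: str) -> str:
    """Return the first character of s, suitable for a char or wchar_t parameter.

    Hypothetical helper for illustration; not part of pybind11.
    """
    if not s:
        raise ValueError("expected a non-empty string")
    return s[0]


# The result is always a one-character str.
print(first_char("abcde"))
```

With the module from this section, a call such as `pass_char(first_char("abcde"))` would then hand the binding a single-character string.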