After some digging, here is what I found. Consider these test files:
a.txt
ΓΔΘΛΞΠΣΦΩαβγδεζηθικλμνξπρςστυφχψω
b.txt
தமிழ்
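To reproduce the experiment, you can recreate both files from their code points, so that your editor's encoding does not matter. This is a minimal sketch under my own assumptions: UTF-8 without a BOM, a.txt ending in CR-LF (the code-point listing later in this answer shows that line ending), and b.txt having no line ending at all. The variable names are just for illustration.

```matlab
% Hypothetical helper script: recreate the two test files as UTF-8 without a BOM.
% a.txt: 9 Greek capitals, the 24 lowercase letters alpha..omega except omicron (959), then CR-LF
greek = [915 916 920 923 926 928 931 934 937, setdiff(945:969, 959), 13 10];
% b.txt: Tamil word, U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD (assumed: no line ending)
tamil = [2980 2990 3007 2996 3021];

files = {'a.txt', greek; 'b.txt', tamil};
for k = 1:size(files, 1)
    fid = fopen(files{k, 1}, 'w');   % 'w' writes bytes as-is (no newline translation)
    fwrite(fid, unicode2native(char(files{k, 2}), 'UTF-8'), 'uint8');
    fclose(fid);
end
```

Each Greek letter takes 2 bytes in UTF-8 and each Tamil code point takes 3, so a.txt should come out at 68 bytes and b.txt at 15.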
First, we read the file:
% open the file in binary mode and read its raw bytes
fid = fopen('a.txt', 'rb');
b = fread(fid, '*uint8')';   % transpose to a row vector of bytes
fclose(fid);
% decode the bytes as a UTF-8 string
str = native2unicode(b, 'UTF-8');
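The decoding step is needed because UTF-8 stores each of these Greek letters as two bytes, so the raw vector `b` is longer than the text it represents. A small self-contained illustration with a single letter (the variable names are mine, not part of the original code):

```matlab
% UTF-8 encodes a Greek letter such as Gamma (U+0393) as two bytes
gamma = char(915);                       % U+0393 GREEK CAPITAL LETTER GAMMA
bytes = unicode2native(gamma, 'UTF-8')   % two bytes: 206 147 (0xCE 0x93)
back  = native2unicode(bytes, 'UTF-8');  % decoding recovers the single character
double(back)                             % 915 again
```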
If you try to print the string, you get gibberish:
>> str
str =
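Nevertheless, str does hold the correct string. We can check the Unicode code point of each character and see that they lie outside the ASCII range (the last two are the non-printable CR-LF line ending):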
>> double(str)
ans =
Columns 1 through 13
915 916 920 923 926 928 931 934 937 945 946 947 948
Columns 14 through 26
949 950 951 952 953 954 955 956 957 958 960 961 962
Columns 27 through 35
963 964 965 966 967 968 969 13 10
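As a further sanity check that the conversion is lossless, here is a self-contained sketch that does not need a.txt on disk. It rebuilds the line from the code points listed above, encodes it to UTF-8 bytes as they would be stored in the file, decodes them again with `native2unicode`, and compares. The variable names are hypothetical.

```matlab
% Rebuild the a.txt line from the code points printed above
% (9 capitals, the 24 lowercase letters alpha..omega minus omicron, then CR-LF)
expected = [915 916 920 923 926 928 931 934 937, setdiff(945:969, 959), 13 10];
txt = char(expected);

% encode to UTF-8 bytes, as stored on disk, then decode them again
bytes   = unicode2native(txt, 'UTF-8');
decoded = native2unicode(bytes, 'UTF-8');

fprintf('%d characters <-> %d bytes\n', numel(decoded), numel(bytes));
assert(isequal(double(decoded), expected), 'round trip changed the code points');
```

It should report 35 characters and 68 bytes: 33 two-byte Greek letters plus one byte each for CR and LF. To test the actual file instead, replace `bytes` with the `b` read from a.txt.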