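Strings in C: a string is a character array terminated by '\0'.
Strings in Java: String is not one of the 8 primitive data types; a String is an object.
In Java, three classes are commonly used to handle strings: String, StringBuffer, and StringBuilder.
Their similarities and differences:
1) All three are final classes, so none of them can be subclassed;
2) A String is immutable (its content and length cannot change), while StringBuffer and StringBuilder are mutable;
3) StringBuffer is thread-safe, StringBuilder is not.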
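The three points above can be checked with a small sketch. The class and method names below are hypothetical and chosen only for illustration; the sketch uses nothing beyond the standard `String`, `StringBuilder`, and `StringBuffer` APIs.

```java
// Hypothetical demo class; names are illustrative, not from the original notes.
final class StringMutabilityDemo {

    // String is immutable: concat() returns a new object and leaves the receiver unchanged.
    static String concatLeavesOriginal(String s) {
        s.concat(" world"); // result discarded on purpose
        return s;
    }

    // StringBuilder is mutable: append() modifies the same object in place.
    static String buildInPlace() {
        StringBuilder sb = new StringBuilder("hello");
        sb.append(" world");
        return sb.toString();
    }

    // StringBuffer's methods are synchronized, so appends from two threads are not lost.
    static int appendFromTwoThreads(int perThread) {
        StringBuffer buf = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                buf.append('x');
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return buf.length();
    }

    public static void main(String[] args) {
        System.out.println(concatLeavesOriginal("hello"));
        System.out.println(buildInPlace());
        System.out.println(appendFromTwoThreads(10_000));
    }
}
```

Replacing the `StringBuffer` in `appendFromTwoThreads` with a `StringBuilder` removes the synchronization, so the final length is then no longer guaranteed.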
An example of building a string:
final String headers = new StringBuilder(512)
    .append("Build: ").append(Build.FINGERPRINT).append("\n")
    .append("Hardware: ").append(Build.BOARD).append("\n")
    .append("Revision: ")
    .append(SystemProperties.get("ro.revision", "")).append("\n")
    .append("Bootloader: ").append(Build.BOOTLOADER).append("\n")
    .append("Radio: ").append(Build.RADIO).append("\n")
    .append("Kernel: ")
    .append(FileUtils.readTextFile(new File("/proc/version"), 1024, "...\n"))
    .append("\n").toString();
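JNIEXPORT jstring JNICALL Java_Test_sayHello
  (JNIEnv *env, jobject obj, jstring s)
{
    /* Get the Java string's contents as a NUL-terminated modified UTF-8 C string. */
    const char *str = (*env)->GetStringUTFChars(env, s, NULL);
    if (str == NULL) {
        return NULL; /* an OutOfMemoryError is already pending */
    }
    printf("%s", str);
    /* Every successful GetStringUTFChars call must be paired with a release. */
    (*env)->ReleaseStringUTFChars(env, s, str);
    ......
}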
//! This is a string holding UTF-16 characters.
class String16 {}
//! This is a string holding UTF-8 characters. Does not allow the value more
// than 0x10FFFF, which is not valid unicode codepoint.
class String8 {}
Strings that arrive from the Java layer are UTF-16 (String16); they are converted with the following function:
static String8 good_old_string(const String16& src)
{
    String8 name8;
    char ch8[2];
    ch8[1] = 0;
    for (unsigned j = 0; j < src.size(); j++) {
        char16_t ch = src[j];
        if (ch < 128) ch8[0] = (char)ch;
        name8.append(ch8);
    }
    return name8;
}
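On the Java side, the same UTF-16 code units are what `String.charAt` returns, so the narrowing idea can be sketched in Java as well. This is a minimal sketch, not a port of the C++ function: the class and method names are hypothetical, and substituting `'?'` for code units of 128 and above is this sketch's own choice.

```java
// Hypothetical demo class; names are illustrative, not part of the Android sources.
final class AsciiNarrowingDemo {

    // Walks the string one UTF-16 code unit at a time, keeps units below 128,
    // and substitutes '?' for every other unit (an assumption of this sketch).
    static String toAsciiOnly(String src) {
        StringBuilder out = new StringBuilder(src.length());
        for (int j = 0; j < src.length(); j++) {
            char ch = src.charAt(j); // one 16-bit code unit, not a full code point
            out.append(ch < 128 ? ch : '?');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiOnly("Build"));
        System.out.println(toAsciiOnly("A\u00C0"));
        // A character outside the BMP occupies two code units in a Java String.
        System.out.println(toAsciiOnly("\uD840\uDC0B"));
    }
}
```

Because the loop works on code units, a character stored as a surrogate pair produces two substitutions rather than one.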
What is the difference between UTF-8 and UTF-16?
UTF-8 uses a minimum of one 8-bit byte to encode a character. For the 128 7-bit characters of the ASCII character set, it is backward-compatible with ASCII: a Roman-alphabet ASCII text encoded in UTF-8 will display normally on a system that does not understand UTF-8. Accented characters are not part of ASCII, so on such a system they will all be more or less garbled. Beyond 1 byte, UTF-8 may use 2, 3 or 4 bytes to encode the rest of the Unicode character set. Because of the way it uses the first byte of multi-byte sequences, UTF-8 uses 3 bytes for some characters that require only 2 bytes in UTF-16.
UTF-16 uses a minimum of 2 bytes (16 bits) per character. This makes it incompatible with ASCII. Given an /A-Za-z/ text in UTF-16, a system that does not understand UTF-16 will make a mess of it (showing a null character before every single character).
A few examples:
- "A" in ASCII is hex 0x41; in UTF-8 it is also 0x41; in UTF-16 it is 0x0041
- "À" in Latin-1 is 0xC0; in UTF-8 it is 0xC3 0x80; in UTF-16 it is 0x00C0
- The Tibetan letter ཨ in UTF-8 is 0xE0 0xBD 0xA8; in UTF-16 it is 0x0F68
- This character*: http://www.fileformat.info/info/... in UTF-8 is 0xF0 0xA0 0x80 0x8B; in UTF-16 it is 0xD840 0xDC0B
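The byte values in the examples above can be reproduced with the standard `String.getBytes(Charset)` call. The following is a minimal sketch: the class and helper names are hypothetical, and big-endian UTF-16 without a byte-order mark (`UTF_16BE`) is assumed so that the bytes line up with the values listed. The last sample is the code point U+2000B, written as its surrogate pair.

```java
// Hypothetical demo class; names are illustrative only.
final class EncodingDemo {

    // Formats bytes as space-separated upper-case hex, e.g. "C3 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static String utf8Hex(String s) {
        return hex(s.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    // Big-endian UTF-16 without a BOM (an assumption made to match the examples).
    static String utf16Hex(String s) {
        return hex(s.getBytes(java.nio.charset.StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00C0", "\u0F68", "\uD840\uDC0B"};
        for (String s : samples) {
            System.out.println("UTF-8: " + utf8Hex(s) + " | UTF-16: " + utf16Hex(s));
        }
    }
}
```

For "A" the UTF-16 output starts with a 00 byte, which is the null character that an ASCII-only system would show before each letter.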