Strings
Overview
VEX includes a string datatype. This is useful in several places:
- Manipulating text
- Referencing filenames and op node names
- Manipulating binary data
String Literals
String literals can be enclosed in either single quotes (') or double quotes ("). Strings may also be specified using the Python or C++ raw-string format.
string s = 'foo';
string t = "bar";
string py = r"Hello world\n"; // Python style, equivalent to "Hello world\\n"
string cpp = R"(Hello world\n)"; // C++ style, equivalent to "Hello world\\n"
Escaped strings (non-raw strings) automatically convert known escape sequences to the byte sequences they represent. For example, "\n" is converted to the single ASCII newline byte.
Raw strings ignore escape sequences. In a raw string, "\n" is interpreted literally as a backslash followed by a lowercase n.
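The difference between escaped and raw literals shows up in their lengths. The snippet below is a minimal sketch, assuming it runs somewhere printf output is visible (for example an Attribute Wrangle); the variable names are illustrative only.

```vex
// Compare an escaped literal with its two raw equivalents.
string escaped = "a\nb";     // \n becomes one newline byte: 3 bytes in total
string pyraw   = r"a\nb";    // backslash and n are kept: 4 bytes in total
string cppraw  = R"(a\nb)";  // same content as pyraw: 4 bytes in total

// Print the three lengths side by side.
printf("%d %d %d\n", len(escaped), len(pyraw), len(cppraw));
```

The two raw forms should report the same length, one byte longer than the escaped form.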
The syntax for strings can be summarized as follows:
- Escaped strings: "text" or 'text'
- Python raw-strings: r"raw text"
- C++ raw-strings: R"delimiter(raw text)delimiter"
Here the delimiter is an optional string of 0 to 16 characters. Unlike Python raw-strings, C++ style raw strings can contain multi-line text and even binary data.
string escaped = 'Line 1\nLine 2';
string raw = r"Line 1\nLine 1 continues"; // "Line 1\\nLine 1 continues"
string cppraw = R"(Line 1\nLine 1 continues)"; // "Line 1\\nLine 1 continues"
string cppmultiline = R"multi(This is a long string
which has multiple lines. The string also contains
an embedded raw string R"(raw string)"
But since the delimiter doesn't match, the string isn't actually ended
until here.)multi";
Declaring string types
To declare a string variable, the general form is string var_name:
// My string is a normal string
string mystring;
To declare a function that returns a string:
// A function which returns a string
string rgb_name() { ... };
To specify a literal string, use double quotes or single quotes.
string a_string = "hello world!";
string another_string = 'good-bye!';
Accessing and setting string values
Use string[index] to look up a character by its position in the string.
The index is a byte offset into the string, not a character offset. This is an important distinction when you deal with Unicode strings. VEX assumes a UTF-8 encoding for all strings. If the given offset isn’t a valid UTF-8 character, an empty string is returned. Otherwise, the full UTF-8 character is returned - this may be a string of length greater than one!
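To make the byte-offset rule concrete, here is a small sketch that applies it to a string containing one two-byte character. It assumes the source text is UTF-8 encoded, so that "é" occupies byte offsets 1 and 2; the results in the comments follow from the rule described above rather than from captured output.

```vex
string s = "aéb";       // bytes: 'a' (offset 0), 'é' (offsets 1-2), 'b' (offset 3)

string first  = s[0];   // "a"
string accent = s[1];   // start of 'é': the full two-byte character is returned
string inside = s[2];   // second byte of 'é', not a valid character start: ""
string last   = s[3];   // "b"
```

Note that the character "b" sits at offset 3, not 2, because the index counts bytes rather than characters.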
String bounds are checked at run time. Reading out of bounds will result in an empty string. This may generate a warning or optional run-time error in the future.
Python-style indexing is used. This means negative indices refer to positions counted from the end of the string.
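A short sketch of positive, negative, and out-of-bounds indexing on an ASCII string, where byte offsets and character positions coincide. The variable names are illustrative only.

```vex
string s = "hello";

string first = s[0];    // "h"
string last  = s[-1];   // "o", counted from the end of the string
string oob   = s[10];   // out of bounds: an empty string

// Brackets in the format string make the empty result visible.
printf("[%s] [%s] [%s]\n", first, last, oob);
```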
Slice notation can be used with the square brackets to extract ranges of a string. Slices operate on byte sequences, sidestepping the UTF-8 requirements of the normal square bracket operation. Thus, if you want the third byte regardless of whether it is a valid UTF-8 character, use:
string a_string = "hello world!";
string thirdbyte = a_string[2:3];
You cannot assign values to positions in a string using square brackets.
(The getcomp function is equivalent to the square bracket notation.)
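Since a string cannot be modified in place through square brackets, one workaround is to build a new string from slices. The following is a sketch of that approach, not an official idiom; it assumes string concatenation with + and the string form of getcomp mentioned above.

```vex
string s = "hello world!";

// s[0] = "J" is not allowed, so assemble a new string instead.
// The slice [1:len(s)] takes every byte after the first one.
string replaced = "J" + s[1:len(s)];   // "Jello world!"

// Reading one character with getcomp, equivalent to s[4].
string fifth = getcomp(s, 4);          // "o"
```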
Looping over a string
See foreach.
Note that you will get empty strings for offsets that do not correspond to valid Unicode characters.
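A minimal sketch of iterating over a string with foreach, assuming the same element-then-container form that foreach uses for arrays. With an ASCII string every offset is a valid character, so no empty strings appear.

```vex
string s = "abc";

// Visit each offset of the string in order; c holds the current character.
foreach (string c; s) {
    printf("[%s]\n", c);
}
```

For strings containing multi-byte characters, a check such as `if (c == "") continue;` inside the loop would skip the offsets that fall inside a character.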
Working with strings
The following functions let you query and manipulate strings.
len Returns the length of a string.
append Adds an item to an array or string.
Adds another array to the end of this one.
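A brief sketch of len and append applied to a string. It assumes append accepts a string variable as its first argument and modifies it in place, as the description above suggests.

```vex
string s = "foo";
int before = len(s);   // 3

// Append more text to the end of the existing string.
append(s, "bar");      // s is now "foobar"

int after = len(s);    // 6
printf("%s: %d -> %d\n", s, before, after);
```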
ord int ord(string value)
Converts a UTF-8 string to a codepoint.
chr string chr(int value)
Converts a codepoint to a UTF-8 string.
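The two functions are inverses of each other, which the following sketch illustrates with ASCII values (where the codepoint equals the byte value).

```vex
int code = ord("A");             // 65
string next = chr(code + 1);     // "B"

// Converting to a codepoint and back should return the original character.
string roundtrip = chr(ord("z")); // "z"

printf("%d %s %s\n", code, next, roundtrip);
```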