Golang Source Code Reading Notes - String

勇敢的菜鸡

Last modified: 2023-10-27 17:36:16

First published: 2021-06-07 00:06:56

Article link: https://blog.csdn.net/qq_39679639/article/details/117639077

About the string type

The file src/builtin/builtin.go documents Go's built-in types in detail. Its description of string reads:

// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string

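From this we can learn the following:
- A string is a sequence of 8-bit bytes
- A string usually, but not necessarily, holds UTF-8 text
  - My understanding: Go source files are UTF-8, so string literals are UTF-8 encoded, but a string value can hold arbitrary bytes and the language does not validate its contents
- A string can be empty (""), but it cannot be nil
- A string is immutable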
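The points above can be checked with a small program. This is only a sketch: the helper name describe is made up for illustration, and it relies on the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length, the rune count, and whether s is valid UTF-8.
// (describe is a hypothetical helper, not part of the Go source.)
func describe(s string) (bytes, runes int, valid bool) {
	return len(s), utf8.RuneCountInString(s), utf8.ValidString(s)
}

func main() {
	fmt.Println(describe("h\u00e9llo")) // UTF-8 text: the accented letter takes two bytes
	fmt.Println(describe("\xff\xfe"))   // arbitrary bytes are allowed in a string

	var empty string // the zero value is "", never nil
	fmt.Println(empty == "")

	// s[0] = 'H' would not compile: strings are immutable.
	// To "modify" a string, copy it into a []byte and convert back.
	b := []byte("hello")
	b[0] = 'H'
	fmt.Println(string(b))
}
```

Note that len counts bytes, not characters, which is why the first call reports 6 bytes for 5 runes.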

Underlying data structure of string

// Location: /src/runtime/string.go
type stringStruct struct {
	str unsafe.Pointer
	len int
}

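A string is made up of two fields:
- str unsafe.Pointer: pointer to the underlying byte array
- len int: length of the string in bytes
Because the length is stored in the header, len(string) runs in O(1) time.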
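To make the two-field layout concrete, here is a minimal sketch that mirrors stringStruct in user code. The type header and the function headerOf are local names chosen for illustration; they imitate the runtime definitions and are not exported by any package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// header mirrors runtime.stringStruct (hypothetical local copy for illustration).
type header struct {
	str unsafe.Pointer
	len int
}

// headerOf reinterprets a *string as a *header, the same trick
// the runtime uses in stringStructOf.
func headerOf(sp *string) *header {
	return (*header)(unsafe.Pointer(sp))
}

func main() {
	s := "hello, world"
	sub := s[7:] // slicing creates a new header, not a new copy of the bytes

	// A string value is exactly one pointer plus one int.
	fmt.Println(unsafe.Sizeof(s) == unsafe.Sizeof(header{}))
	fmt.Println(headerOf(&s).len, headerOf(&sub).len)
	// The data pointer of sub is the data pointer of s advanced by 7 bytes.
	fmt.Println(uintptr(headerOf(&sub).str) - uintptr(headerOf(&s).str))
}
```

Since a substring only needs a new pointer and length, slicing a string is also O(1) and shares the original bytes.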

Basic functions

hasPrefix(s, prefix string): reports whether s starts with prefix. Time complexity is O(len(prefix)), because == has to compare the bytes.

func hasPrefix(s, prefix string) bool {
	return len(s) >= len(prefix) && s[:len(prefix)] == prefix // == compares the lengths first, then the bytes
}

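index(s, t string): returns the position of the first occurrence of t in s, or -1 if t does not occur. Time complexity is O(len(s) * len(t)) in the worst case, since hasPrefix may run at every position.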

func index(s, t string) int {
	if len(t) == 0 {
		return 0
	}
	for i := 0; i < len(s); i++ {
		if s[i] == t[0] && hasPrefix(s[i:], t) {
			return i
		}
	}
	return -1
}

contains(s, t string) bool: reports whether s contains t, by checking the first occurrence returned by index. Time complexity is the same as index.
```
func contains(s, t string) bool {
	return index(s, t) >= 0
}
```
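The three helpers can be tried outside the runtime. The sketch below copies them into a standalone program and compares index against strings.Index from the standard library; the sample inputs are arbitrary.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPrefix, index and contains are copied from the runtime helpers shown above.
func hasPrefix(s, prefix string) bool {
	return len(s) >= len(prefix) && s[:len(prefix)] == prefix
}

func index(s, t string) int {
	if len(t) == 0 {
		return 0
	}
	for i := 0; i < len(s); i++ {
		// Cheap first-byte check before the full prefix comparison.
		if s[i] == t[0] && hasPrefix(s[i:], t) {
			return i
		}
	}
	return -1
}

func contains(s, t string) bool {
	return index(s, t) >= 0
}

func main() {
	s := "golang string"
	for _, t := range []string{"go", "string", "lang ", "", "xyz"} {
		fmt.Printf("%q: index=%d strings.Index=%d contains=%v\n",
			t, index(s, t), strings.Index(s, t), contains(s, t))
	}
}
```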

stringStructOf(sp *string) *stringStruct: returns the underlying stringStruct of the string that sp points to

func stringStructOf(sp *string) *stringStruct {
	return (*stringStruct)(unsafe.Pointer(sp))
}

rawstring(size int) (s string, b []byte): allocates a new string of size bytes

// rawstring allocates storage for a new string. The returned
// string and byte slice both refer to the same storage.
// The storage is not zeroed.
func rawstring(size int) (s string, b []byte) {
	p := mallocgc(uintptr(size), nil, false) // allocate size bytes: no type information, not zeroed

	stringStructOf(&s).str = p    // store the address of the new memory in the string's data pointer
	stringStructOf(&s).len = size // the string s is size bytes long

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, size} // build a slice over the same memory and assign it to b

	return
}

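rawstring allocates storage for a new string and returns a string and a byte slice that both point to the same memory. The memory is not zeroed, because mallocgc is called with needzero set to false.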
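The same "fill a buffer, then expose it as a string" idea can be approximated in user code. The function newString below is a hypothetical analogue, not the runtime implementation: it assumes Go 1.20 or later for unsafe.String, and unlike rawstring it gets zeroed memory from make.

```go
package main

import (
	"fmt"
	"unsafe"
)

// newString is a hypothetical user-level analogue of rawstring: it allocates
// size bytes once, lets fill write into them, and returns a string that
// shares the same storage, so no second copy is made.
func newString(size int, fill func(b []byte)) string {
	if size == 0 {
		return ""
	}
	b := make([]byte, size) // unlike mallocgc(..., false), make zeroes the memory
	fill(b)
	// b must not be modified after this point, or the "immutable" string would change.
	return unsafe.String(&b[0], size)
}

func main() {
	s := newString(5, func(b []byte) { copy(b, "hello") })
	fmt.Println(s, len(s))
}
```

The caller must stop writing to the slice once the string has been handed out. The runtime can rely on this because rawstring is only called by code that fills the buffer before returning the string.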

stringDataOnStack(s string): reports whether the data of s lives on the current goroutine's stack

func stringDataOnStack(s string) bool {
	ptr := uintptr(stringStructOf(&s).str) // address of the string's data
	stk := getg().stack                    // stack bounds of the current goroutine
	return stk.lo <= ptr && ptr < stk.hi   // is the data address between the low and high stack addresses?
}

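slicebytetostringtmp: converts a byte slice to a string without copying. It simply stores the slice's data pointer and the length n in the string header, so the result shares memory with the slice and is only safe while the slice is not modified.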
```
func slicebytetostringtmp(ptr *byte, n int) (str string) {
	stringStructOf(&str).str = unsafe.Pointer(ptr)
	stringStructOf(&str).len = n
	return
}
```
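User code cannot call slicebytetostringtmp directly, but the compiler inserts it for conversions whose result is only used temporarily, for example a map lookup of the form m[string(b)]. The sketch below measures that case. The function lookup and the sample key are made up for illustration, and the allocation count depends on the compiler version, so no particular number is claimed here.

```go
package main

import (
	"fmt"
	"testing"
)

// lookup finds key in m. The conversion string(key) is used only as a
// temporary map key, which is a case where the compiler can skip the copy.
// (lookup is a hypothetical helper for this example.)
func lookup(m map[string]int, key []byte) (int, bool) {
	v, ok := m[string(key)]
	return v, ok
}

func main() {
	const k = "a fairly long key that exceeds the 32-byte buffer"
	m := map[string]int{k: 1}
	key := []byte(k)

	// AllocsPerRun reports the average number of heap allocations per call.
	allocs := testing.AllocsPerRun(100, func() { lookup(m, key) })
	fmt.Println("allocations per lookup:", allocs)
}
```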

String concatenation: x + y + z

func concatstrings(buf *tmpBuf, a []string) string {
	idx := 0
	l := 0
	count := 0
	for i, x := range a {
		n := len(x)
		if n == 0 {
			continue
		}
		if l+n < l {
			throw("string concatenation too long")
		}
		l += n
		count++
		idx = i
	}
	if count == 0 {
		return ""
	}
	if count == 1 && (buf != nil || !stringDataOnStack(a[idx])) {
		return a[idx]
	}
	s, b := rawstringtmp(buf, l)
	for _, x := range a {
		copy(b, x)
		b = b[len(x):]
	}
	return s
}

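Concatenating several strings works as follows: first iterate over all the pieces to compute the total length l, then allocate l bytes once, then copy each piece into the new memory in order. If exactly one piece is non-empty, it is returned directly without copying, as long as the result does not escape (buf != nil) or the piece's data is not on the current goroutine's stack. Time complexity is O(n), where n is the total length.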
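The measure-allocate-copy strategy can be sketched in ordinary Go. The function concat below is a simplified illustration, not the runtime code: it omits the overflow check and the stack-buffer optimization, and its final string(b) conversion may copy once more, which the runtime avoids by using rawstring.

```go
package main

import (
	"fmt"
	"strings"
)

// concat is a simplified sketch of the concatstrings strategy:
// measure first, allocate once, then copy each piece.
func concat(parts ...string) string {
	total, count, last := 0, 0, 0
	for i, p := range parts {
		if len(p) == 0 {
			continue // empty pieces contribute nothing
		}
		total += len(p)
		count++
		last = i
	}
	if count == 0 {
		return ""
	}
	if count == 1 {
		return parts[last] // only one non-empty piece: reuse it, no copy
	}
	b := make([]byte, 0, total) // single allocation of the final size
	for _, p := range parts {
		b = append(b, p...)
	}
	return string(b)
}

func main() {
	got := concat("x", "", "y", "z")
	fmt.Println(got, got == strings.Join([]string{"x", "", "y", "z"}, ""))
}
```

This is also why building a string with repeated += in a loop is slower than a single concatenation expression or strings.Builder: each += measures, allocates, and copies everything again.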

string to slice conversion

// the default temporary buffer size is 32 bytes
const tmpStringBufSize = 32
type tmpBuf [tmpStringBufSize]byte

// string => Slice
func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	// if buf is available and the string fits in it (at most 32 bytes), use the temporary buffer
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}
		b = buf[:len(s)]
	} else {
		// otherwise (longer than 32 bytes, or no buffer), allocate a new block of memory
		b = rawbyteslice(len(s))
	}
	// either way, the data is copied exactly once
	copy(b, s)
	return b
}

// allocate a new block of memory and build a new byte slice over it
func rawbyteslice(size int) (b []byte) {
	cap := roundupsize(uintptr(size))
	p := mallocgc(cap, nil, false)
	if cap != uintptr(size) {
		memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
	}

	*(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
	return
}
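Because stringtoslicebyte always copies, the resulting slice never aliases the string. A minimal sketch of the observable behavior follows; toBytes is a made-up wrapper around the ordinary conversion.

```go
package main

import (
	"fmt"
	"strings"
)

// toBytes converts s to a fresh byte slice; the result never aliases s.
// (toBytes is a hypothetical wrapper around the built-in conversion.)
func toBytes(s string) []byte {
	return []byte(s)
}

func main() {
	s := "hello"
	b := toBytes(s)
	b[0] = 'H' // modifies only the copy
	fmt.Println(s, string(b))

	// For longer strings the capacity may be rounded up to an allocator
	// size class, as rawbyteslice does with roundupsize.
	long := toBytes(strings.Repeat("x", 33))
	fmt.Println(len(long), cap(long))
}
```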

slice to string conversion

func slicebytetostring(buf *tmpBuf, ptr *byte, n int) (str string) {
	...
	var p unsafe.Pointer
	// if buf is available and n is at most 32 bytes, use the temporary buffer
	if buf != nil && n <= len(buf) {
		p = unsafe.Pointer(buf)
	} else {
		// otherwise, allocate new memory
		p = mallocgc(uintptr(n), nil, false)
	}
	// build the string: the data pointer is the buffer or the newly allocated memory, the length is n
	stringStructOf(&str).str = p
	stringStructOf(&str).len = n
	// copy n bytes from ptr to p, so this direction also costs one copy
	memmove(p, unsafe.Pointer(ptr), uintptr(n))
	return
}
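The copy in slicebytetostring is what keeps the resulting string immutable even if the slice changes afterwards. A short sketch, where snapshot is a made-up name for the ordinary conversion:

```go
package main

import "fmt"

// snapshot converts b to a string; the string owns its own copy of the bytes.
// (snapshot is a hypothetical wrapper around the built-in conversion.)
func snapshot(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("hello")
	s := snapshot(b)
	b[0] = 'H' // later writes to the slice are not visible through s
	fmt.Println(s, string(b))
}
```

In both directions the conversion costs one O(n) copy, so code on a hot path usually tries to stay in one representation rather than converting back and forth.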
