The strange behavior of strings.Trim() in Golang

北雪浪子

Last modified: 2022-09-30 12:00:04

Tags: golang, programming languages, backend

First published: 2022-09-30 11:53:30

Article link: https://blog.csdn.net/prodigalhero/article/details/127121180

Go's strings package has a function with the following signature:

func Trim(s, cutset string) string

The official documentation describes it as follows:

Trim returns a slice of the string s with all leading and trailing Unicode code points contained in cutset removed.

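As I read it, the function returns a slice of s with every leading and trailing occurrence of cutset removed. In practice, though, I ran into results that did not match this reading. Here they are: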

package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.Trim("-----Hello, Gophers-----", "-"))
	fmt.Println(strings.Trim("!-!-!-Hello, Gophers-!-!-!-!", "!-"))
	fmt.Println(strings.Trim("!-!-!-Hello,-!-Go-!-!", "!-"))
}

The output is:

Hello, Gophers
Hello, Gophers
Hello,-!-Go

Going by my intuitive reading, however, I expected:

Hello, Gophers
Hello, Gophers-!
Hello,-!-Go-!

So where does the difference come from?

Let's look at the corresponding source code:

// Trim returns a slice of the string s with all leading and
// trailing Unicode code points contained in cutset removed.
func Trim(s, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	if len(cutset) == 1 && cutset[0] < utf8.RuneSelf {
		return trimLeftByte(trimRightByte(s, cutset[0]), cutset[0])
	}
	if as, ok := makeASCIISet(cutset); ok {
		return trimLeftASCII(trimRightASCII(s, &as), &as)
	}
	return trimLeftUnicode(trimRightUnicode(s, cutset), cutset)
}

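The first example has the one-byte cutset "-" and goes through the single-byte branch. The second and third examples use the cutset "!-", which is longer than one byte and entirely ASCII, so Trim takes the makeASCIISet branch. That is the case worth digging into: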

	if as, ok := makeASCIISet(cutset); ok {
		return trimLeftASCII(trimRightASCII(s, &as), &as)
	}

And here is the function that builds as:

type asciiSet [8]uint32

// makeASCIISet creates a set of ASCII characters and reports whether all
// characters in chars are ASCII.
func makeASCIISet(chars string) (as asciiSet, ok bool) {
	for i := 0; i < len(chars); i++ {
		c := chars[i]
		if c >= utf8.RuneSelf {
			return as, false
		}
		as[c/32] |= 1 << (c % 32)
	}
	return as, true
}

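At this point things start to click. To cut straight to the conclusion, this function does two things:

It checks whether every byte in chars is ASCII, and returns false as soon as one is not.
It uses a contiguous bitmap of 8 * 32 = 256 bits to record every character that appears in chars.

The part that may look confusing is as[c/32] |= 1 << (c%32). It is actually simple: c/32 selects which uint32 word holds c, and c%32 selects which bit inside that word is set to 1.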
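As a worked example of that arithmetic, take the cutset "!-": '!' is byte 33 and '-' is byte 45, so both land in word 1 (33/32 = 45/32 = 1), at bits 1 and 13 respectively, which gives as[1] = 2 + 8192 = 8194. buildSet below is a hypothetical helper that reuses the same indexing as makeASCIISet but skips the ASCII check, purely for illustration.

```go
package main

import "fmt"

// buildSet is a hypothetical helper using the same bit layout as the
// standard library's asciiSet: 8 uint32 words of 32 bits each.
// Unlike makeASCIISet it does not reject non-ASCII bytes.
func buildSet(chars string) [8]uint32 {
	var as [8]uint32
	for i := 0; i < len(chars); i++ {
		c := chars[i]
		as[c/32] |= 1 << (c % 32) // word c/32, bit c%32
	}
	return as
}

func main() {
	for _, c := range []byte("!-") {
		fmt.Printf("%q = %d -> word %d, bit %d\n", c, c, c/32, c%32)
	}
	fmt.Println(buildSet("!-")) // [0 8194 0 0 0 0 0 0]
}
```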

Based on this, we can write a program like the following:

package main

import (
	"fmt"
	"unicode/utf8"
)

type asciiSet [8]uint32

func main() {
	// Same characters, different order and repetition.
	makeASCIISet("ababab")
	makeASCIISet("ab")
	makeASCIISet("baba")
}

// makeASCIISet creates a set of ASCII characters and reports whether all
// characters in chars are ASCII.
func makeASCIISet(chars string) (as asciiSet, ok bool) {
	for i := 0; i < len(chars); i++ {
		c := chars[i]
		if c >= utf8.RuneSelf {
			return as, false
		}
		as[c/32] |= 1 << (c % 32)
	}
	fmt.Println(as)

	return as, true
}

It prints:

[0 0 0 6 0 0 0 0]
[0 0 0 6 0 0 0 0]
[0 0 0 6 0 0 0 0]

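Because the set records only which characters occur, not their order or how many times they occur, "ababab", "ab" and "baba" all produce the same set. That already explains the strange behavior of strings.Trim(): the cutset "!-" does not mean the substring "!-" but "any '!' or '-' character", so Trim strips every such character from both ends until it reaches one outside the set. Still, let's finish walking through the function: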

func trimLeftASCII(s string, as *asciiSet) string {
	for len(s) > 0 {
		if !as.contains(s[0]) {
			break
		}
		s = s[1:]
	}
	return s
}

// contains reports whether c is inside the set.
func (as *asciiSet) contains(c byte) bool {
	return (as[c/32] & (1 << (c % 32))) != 0
}

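trimLeftASCII examines the string from the left one byte at a time: if the byte is in as, it is dropped; as soon as it reaches a byte that is not in the set, the loop breaks and the rest of the string is returned.

We can write a program to verify this: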
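Putting the pieces together, the following sketch rebuilds the ASCII branch of Trim outside the standard library so it can be run and modified. makeASCIISet, contains and trimLeftASCII are copied from the snippets above. trimRightASCII was not shown, so here it is written as the mirror image of trimLeftASCII (an assumption about its shape), and trimASCII is a hypothetical wrapper standing in for the makeASCIISet branch of Trim.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type asciiSet [8]uint32

// makeASCIISet, contains and trimLeftASCII are copied from the snippets above.
func makeASCIISet(chars string) (as asciiSet, ok bool) {
	for i := 0; i < len(chars); i++ {
		c := chars[i]
		if c >= utf8.RuneSelf {
			return as, false
		}
		as[c/32] |= 1 << (c % 32)
	}
	return as, true
}

func (as *asciiSet) contains(c byte) bool {
	return (as[c/32] & (1 << (c % 32))) != 0
}

func trimLeftASCII(s string, as *asciiSet) string {
	for len(s) > 0 {
		if !as.contains(s[0]) {
			break
		}
		s = s[1:]
	}
	return s
}

// trimRightASCII is assumed to be the mirror image of trimLeftASCII:
// it drops bytes from the end while they are in the set.
func trimRightASCII(s string, as *asciiSet) string {
	for len(s) > 0 {
		if !as.contains(s[len(s)-1]) {
			break
		}
		s = s[:len(s)-1]
	}
	return s
}

// trimASCII is a hypothetical stand-in for the ASCII branch of strings.Trim.
func trimASCII(s, cutset string) string {
	as, ok := makeASCIISet(cutset)
	if !ok {
		return s // this sketch only handles all-ASCII cutsets
	}
	return trimLeftASCII(trimRightASCII(s, &as), &as)
}

func main() {
	fmt.Println(trimASCII("!-!-!-Hello,-!-Go-!-!", "!-")) // Hello,-!-Go
}
```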

package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.Trim("---a--Hello, Gophers-----", "-a"))
}

Output:

Hello, Gophers

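To wrap up: strings.Trim treats cutset as a set of characters rather than as a substring, so in practice it is best suited to stripping characters such as spaces or padding from both ends of a string. If you intend to rely on this set-based behavior for anything else, think it through carefully first.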
