Go character encoding, strings.Builder, and strings.Reader


红鲤鱼与绿鲤鱼与驴__

Category: Go    Tags: Unicode and character encoding, Builder, Reader

Article link: https://blog.csdn.net/zy13270867781/article/details/90905144

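Unicode and character encoding

1. The encoding Go uses

The character encoding Go uses follows the Unicode standard. More precisely, Go code is made up of Unicode characters, and all Go source code must be encoded in UTF-8, one of the encoding formats defined by Unicode.

Go source files must be stored as UTF-8. If a source file contains bytes that are not valid UTF-8, the go command reports the error "illegal UTF-8 encoding" when building, installing, or running the code.

Go not only has a type, rune, that can represent a single Unicode character on its own, it also has a for statement (for ... range) that can split a string value into Unicode characters.

So: string literals written in Go source are UTF-8 encoded Unicode text. (Strictly speaking, a string value can hold arbitrary bytes; UTF-8 is what source literals and the standard library's text handling assume.)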

example1:

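package main

import "fmt"

func main() {
    str := "Go爱好者"
    fmt.Printf("The string: %q\n", str)
    // []rune(str) splits the string into Unicode code points.
    fmt.Printf("  => runes(char): %q\n", []rune(str))
    fmt.Printf("  => runes(hex): %x\n", []rune(str))
    fmt.Printf("  => runes(d): %d\n", []rune(str))
    // []byte(str) exposes the underlying UTF-8 bytes.
    fmt.Printf("  => bytes(hex): [% x]\n", []byte(str))
}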

output:

The string: "Go爱好者"
  => runes(char): ['G' 'o' '爱' '好' '者']
  => runes(hex): [47 6f 7231 597d 8005]
  => runes(d): [71 111 29233 22909 32773]
  => bytes(hex): [47 6f e7 88 b1 e5 a5 bd e8 80 85]

example2:

package main

import "fmt"

func main() {
    str := "Go爱好者"
    // i is the byte index at which the rune c starts, not a rune counter.
    for i, c := range str {
        fmt.Printf("%d: %q [% x]\n", i, c, []byte(string(c)))
    }
}

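A string value is made up of Unicode characters, each of which can be held by a value of type rune.
Under the hood, a string value is a byte sequence that encodes a series of UTF-8 code values.

The for statement first treats the string being iterated over as a byte sequence, and then tries to find each UTF-8 encoded value, that is, each Unicode character, within that sequence. The index values of adjacent Unicode characters are therefore not necessarily consecutive; that depends on whether the preceding character is a single-byte character.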

output:

0: 'G' [47]
1: 'o' [6f]
2: '爱' [e7 88 b1]
5: '好' [e5 a5 bd]
8: '者' [e8 80 85]
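
The decoding that the range loop performs can also be written out by hand. The following is a minimal sketch, not how the compiler implements range: it walks the byte sequence with utf8.DecodeRuneInString and records where each rune starts. The helper name runeOffsets is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets (hypothetical helper) returns the byte index at which each
// rune of s starts, decoding one UTF-8 sequence at a time.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// size is the number of bytes the rune at s[i:] occupies.
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	str := "Go爱好者"
	fmt.Println(len(str))                    // 11 (bytes)
	fmt.Println(utf8.RuneCountInString(str)) // 5 (runes)
	fmt.Println(runeOffsets(str))            // [0 1 2 5 8]
}
```

The offsets match the indexes printed by example2: len counts bytes, while the rune count and the rune start positions come from decoding UTF-8.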

The strings package and string operations

strings.Builder and strings.Reader

strings.Builder

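What advantages does a strings.Builder value have over a string value?

Compared with a string value, the advantages of a Builder value show up mainly in string concatenation:
existing content is immutable, but more content can be appended;
fewer memory allocations and content copies are needed;
the content can be reset, so the value can be reused.

A Builder value holds a container for its content: a slice whose element type is byte.
The Builder struct: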
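
To make the concatenation point concrete, here is a small sketch that builds the same string two ways. The function names joinWithBuilder and joinWithPlus are made up for this illustration, and the sketch only shows that the results are equal; it does not measure allocations.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWithBuilder (hypothetical helper) appends every part to one Builder,
// which grows a single underlying byte slice as needed.
func joinWithBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// joinWithPlus (hypothetical helper) uses +=; because strings are immutable,
// each step may allocate a new string and copy the old content into it.
func joinWithPlus(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

func main() {
	parts := []string{"Go", "爱", "好", "者"}
	fmt.Println(joinWithBuilder(parts))                        // Go爱好者
	fmt.Println(joinWithBuilder(parts) == joinWithPlus(parts)) // true
}
```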

type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}
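
The addr field exists so that a Builder can notice when it has been copied by value after use. The sketch below illustrates that behavior: writing to a value copy of a Builder that has already been written to panics, while sharing a pointer is fine. The function names are made up for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// writeToCopy (hypothetical helper) copies a used Builder by value, writes
// to the copy, and reports whether that write panicked.
func writeToCopy() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var b1 strings.Builder
	b1.WriteString("Go")
	b2 := b1 // value copy of a Builder that already holds content
	b2.WriteString("!")
	return false
}

// writeViaPointer (hypothetical helper) shares the Builder through a
// pointer, which does not copy it.
func writeViaPointer() string {
	var b strings.Builder
	b.WriteString("Go")
	p := &b
	p.WriteString("!")
	return b.String()
}

func main() {
	fmt.Println(writeToCopy())     // true
	fmt.Println(writeViaPointer()) // Go!
}
```

In practice this means a Builder that has been used should be passed around as a *strings.Builder rather than by value.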

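A Builder value does not allow arbitrary modification of its internal elements, so content that has already been written is immutable. You can append more content with the methods it provides without worrying that they will affect what is already there.
Content is appended with Write, WriteByte, WriteRune, and WriteString.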

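A Builder value grows its content container automatically, following the same growth strategy as slices (it appends to the underlying byte slice).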
example:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Example 1: append content with the Write methods.
    var builder1 strings.Builder
    builder1.WriteString("A Builder is used to efficiently build a string using Write methods.")
    
    fmt.Printf("The first output(%d):\n%q\n", builder1.Len(), builder1.String())
    fmt.Println()
    builder1.WriteByte(' ')
    builder1.WriteString("It minimizes memory copying. The zero value is ready to use.")
    builder1.Write([]byte{'\n', '\n'})
    builder1.WriteString("Do not copy a non-zero Builder.")
    fmt.Printf("The second output(%d):\n\"%s\"\n", builder1.Len(), builder1.String())
    fmt.Println()
    
    // Example 2: grow the capacity explicitly.
    fmt.Println("Grow the builder ...")
    builder1.Grow(10) // ensure room for at least 10 more bytes
    fmt.Printf("The length of contents in the builder is %d.\n", builder1.Len())
    fmt.Println(builder1.Cap()) // Cap requires Go 1.12 or later
    fmt.Println()
    
    // Example 3: reset the builder.
    fmt.Println("Reset the builder ...")
    builder1.Reset() // reset to empty
    fmt.Printf("The third output(%d):\n%q\n", builder1.Len(), builder1.String())
}
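
The Grow and Reset calls in the example can be checked more directly. This is a small sketch with made-up helper names: after Grow(n) the Builder has room for at least n more bytes, and after Reset it is empty and can be written to again.

```go
package main

import (
	"fmt"
	"strings"
)

// growThenReset (hypothetical helper) writes "Go", asks for room for n more
// bytes, then resets. It returns the spare capacity after Grow and the
// length after Reset.
func growThenReset(n int) (spare, lenAfterReset int) {
	var b strings.Builder
	b.WriteString("Go")
	b.Grow(n) // room for at least n more bytes without another allocation
	spare = b.Cap() - b.Len()
	b.Reset() // the Builder is empty again
	return spare, b.Len()
}

// reuse (hypothetical helper) shows that a Builder accepts writes after Reset.
func reuse() string {
	var b strings.Builder
	b.WriteString("first")
	b.Reset()
	b.WriteString("second")
	return b.String()
}

func main() {
	spare, n := growThenReset(10)
	fmt.Println(spare >= 10, n) // true 0
	fmt.Println(reuse())        // second
}
```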

strings.Reader

The Reader struct:

type Reader struct {
    s        string
    i        int64 // current reading index
    prevRune int   // index of previous rune; or < 0
}
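
The prevRune field is what lets a Reader step back by one rune. As a small sketch (the helper name readUnread is made up for this illustration), ReadRune followed by UnreadRune returns the Reader to where it was, so the next ReadRune yields the same rune again:

```go
package main

import (
	"fmt"
	"strings"
)

// readUnread (hypothetical helper) reads one rune, unreads it, and reads again.
func readUnread(s string) (first, again rune, err error) {
	r := strings.NewReader(s)
	if first, _, err = r.ReadRune(); err != nil {
		return
	}
	// UnreadRune only succeeds directly after a ReadRune.
	if err = r.UnreadRune(); err != nil {
		return
	}
	again, _, err = r.ReadRune()
	return
}

func main() {
	first, again, err := readUnread("爱好者")
	fmt.Printf("%q %q %v\n", first, again, err) // '爱' '爱' <nil>
}
```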

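reader.Read() advances the Reader's read count (its current reading index) as it reads.
reader.ReadAt() neither uses nor modifies the read count.
reader.Seek() sets the reading index; the next Read continues from that position.

The second argument of Seek, whence, takes one of three values, meaning the offset is applied relative to the start, the current position, or the end of the Reader. Seek returns the resulting index.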

SeekStart = 0 // seek relative to the origin of the file
SeekCurrent = 1 // seek relative to the current offset
SeekEnd = 2 // seek relative to the end

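The key to a Reader value's efficient reading is its internal read count. The count is the starting index of the next read, and it is easy to compute: Size() minus Len().
The Seek method of a Reader value sets this read count directly.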
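
The three whence values can be compared side by side. The sketch below uses a made-up helper, seekPositions, that seeks from the start, then from the current position, then from the end, and returns the index reported by each call.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// seekPositions (hypothetical helper) returns the reading index after each
// of three Seek calls on a Reader over s.
func seekPositions(s string) (fromStart, fromCurrent, fromEnd int64, err error) {
	r := strings.NewReader(s)
	// Offset 3 from the beginning: index 3.
	if fromStart, err = r.Seek(3, io.SeekStart); err != nil {
		return
	}
	// Offset 2 from the current index (3): index 5.
	if fromCurrent, err = r.Seek(2, io.SeekCurrent); err != nil {
		return
	}
	// Offset -1 from the end: index len(s) - 1.
	fromEnd, err = r.Seek(-1, io.SeekEnd)
	return
}

func main() {
	a, b, c, err := seekPositions("0123456789")
	fmt.Println(a, b, c, err) // 3 5 9 <nil>
}
```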

example:

package main

import (
    "fmt"
    "io"
    "strings"
)

func main() {
    // Example 1: Read advances the reading index.
    reader1 := strings.NewReader(
        "NewReader returns a new Reader reading from s. " +
            "It is similar to bytes.NewBufferString but more efficient and read-only.")
    fmt.Printf("The size of reader: %d\n", reader1.Size())
    fmt.Printf("The len of reader: %d\n", reader1.Len())
    fmt.Printf("The reading index in reader: %d\n",
        reader1.Size()-int64(reader1.Len()))
    
    buf1 := make([]byte, 47)
    n, _ := reader1.Read(buf1) // read up to len(buf1) bytes; n is the number of bytes read
    fmt.Printf("%d bytes were read. (call Read)\n", n)
    fmt.Printf("The reading index in reader: %d\n",
        reader1.Size()-int64(reader1.Len()))
    fmt.Printf("buf1:%s\n",buf1)
    fmt.Println(reader1)
    fmt.Println()
    
    // Example 2: ReadAt leaves the reading index unchanged.
    buf2 := make([]byte, 21)
    offset1 := int64(64)
    n, _ = reader1.ReadAt(buf2, offset1)
    fmt.Printf("%d bytes were read. (call ReadAt, offset: %d)\n", n, offset1)
    fmt.Printf("The reading index in reader: %d\n",
        reader1.Size()-int64(reader1.Len()))
    fmt.Printf("buf2:%s\n",buf2)
    fmt.Println(reader1)
    fmt.Println()
    n, _ = reader1.Read(buf2)
    fmt.Println(reader1)
    
    fmt.Println()
    // Example 3: Seek sets the reading index.
    offset2 := int64(17)
    expectedIndex := reader1.Size() - int64(reader1.Len()) + offset2
    fmt.Printf("Seek with offset %d and whence %d ...\n", offset2, io.SeekCurrent)
    readingIndex, _ := reader1.Seek(offset2, io.SeekCurrent)
    fmt.Printf("The reading index in reader: %d (returned by Seek)\n", readingIndex)
    fmt.Printf("The reading index in reader: %d (computed by me)\n", expectedIndex)
    
    fmt.Println(reader1)
    n, _ = reader1.Read(buf2)
    fmt.Printf("%d bytes were read. (call Read)\n", n)
    fmt.Printf("The reading index in reader: %d\n",
        reader1.Size()-int64(reader1.Len()))
    fmt.Println(reader1)
}

output:

The size of reader: 119
The len of reader: 119
The reading index in reader: 0
47 bytes were read. (call Read)
The reading index in reader: 47
buf1:NewReader returns a new Reader reading from s. 
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 47 -1}

21 bytes were read. (call ReadAt, offset: 64)
The reading index in reader: 47
buf2:bytes.NewBufferString
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 47 -1}

&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 68 -1}

Seek with offset 17 and whence 1 ...
The reading index in reader: 85 (returned by Seek)
The reading index in reader: 85 (computed by me)
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 85 -1}
21 bytes were read. (call Read)
The reading index in reader: 106
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 106 -1}

Other functions in the strings package: to be added.
