A simple short string optimization in Rust

noooob_snake

Tags: rust, programming languages

Article link: https://blog.csdn.net/qq_39048200/article/details/140619147

Contents

Preface
1. Data structure
2. Implementing the basic operations
3. Benchmarks

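Preface

While comparing strings in C++ and Rust, I noticed that C++ stores a string inline in the string object itself (so on the stack, for a local variable) when it is short enough. This is usually called the short string optimization (SSO). Rust's String has no such optimization, so I wrote a simple version myself.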

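1. Data structure

The struct needs to store the length, the capacity, and the string data. While the string is short, the bytes are stored inline; once it grows past that, the data area is reinterpreted as a pointer and the bytes live on the heap.

Because an enum carries a tag that takes up extra space, I use a union directly.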

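use std::mem::size_of;
use std::ptr::NonNull;

union Store {
    data: [u8; 8],
    // A fat pointer such as *mut [u8] would take 16 bytes, so store a thin pointer
    ptr: NonNull<u8>,
}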

enum StoreEnum {
    Stack([u8; 8]),
    Heap(NonNull<u8>),
}

#[test]
fn t_size() {
    assert_eq!(size_of::<[u8; 8]>(), size_of::<NonNull<u8>>());
    assert_eq!(size_of::<Store>(), 8);
    assert_eq!(size_of::<StoreEnum>(), 16);
}

The final struct is shown below. It is kept the same size as the standard library's String.

pub struct MyString {
    len: usize, // 8 bytes
    cap: usize, // 8 bytes
    /// Inline bytes while cap == 8, heap pointer once cap > 8
    store: Store, // 8 bytes
}

#[test]
fn t_size() {
    assert_eq!(size_of::<String>(), 24);
    assert_eq!(size_of::<MyString>(), 24);
}
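
Since the union carries no tag of its own, something else has to say which field is currently valid; in this design the `cap` field plays that role. The following is a minimal sketch of that idea, not part of the original code: the helpers `is_inline` and `as_bytes` and the constant `INLINE_CAP` are hypothetical names introduced only for illustration.

```rust
use std::ptr::NonNull;
use std::slice;

// Hypothetical constant: number of bytes that fit inline.
const INLINE_CAP: usize = 8;

#[allow(dead_code)]
union Store {
    data: [u8; INLINE_CAP],
    ptr: NonNull<u8>,
}

pub struct MyString {
    len: usize,
    cap: usize,
    store: Store,
}

impl MyString {
    pub fn new() -> Self {
        MyString {
            len: 0,
            cap: INLINE_CAP,
            store: Store { data: [0; INLINE_CAP] },
        }
    }

    /// Hypothetical helper: `cap == INLINE_CAP` means `store.data` is the valid field.
    pub fn is_inline(&self) -> bool {
        self.cap == INLINE_CAP
    }

    /// Hypothetical helper: view the stored bytes, wherever they live.
    pub fn as_bytes(&self) -> &[u8] {
        // Reading a union field is unsafe; `cap` tells us which field is valid.
        unsafe {
            if self.is_inline() {
                &self.store.data[..self.len]
            } else {
                slice::from_raw_parts(self.store.ptr.as_ptr(), self.len)
            }
        }
    }
}
```

Every read of `store` has to go through a check like this, which is the price paid for dropping the enum tag.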

2. Implementing the basic operations

A newly created string has a capacity of 8 and keeps its data inline.

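use std::alloc::{alloc, realloc, Layout, LayoutError};
use std::ptr;

impl MyString {
    pub fn new() -> Self {
        MyString {
            len: 0,
            cap: 8,
            store: Store { data: [0; 8] },
        }
    }

    // Double the capacity; the first call moves the inline bytes to the heap
    fn grow(&mut self) -> Result<(), LayoutError> {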
        let new_cap = self.cap * 2;

        let new_layout = Layout::array::<u8>(new_cap)?;

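        // Still inline: allocate a heap buffer for the first time
        let new_p = if self.cap == 8 {
            unsafe {
                let new_p = alloc(new_layout);

                // Copy the inline bytes to the heap (skipped if the allocation failed)
                if !new_p.is_null() {
                    ptr::copy_nonoverlapping(self.store.data.as_ptr(), new_p, 8);
                }

                new_p
            }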
        } else {
            let old_layout = Layout::array::<u8>(self.cap)?;

            unsafe {
                let old_p = self.store.ptr.as_ptr();

                realloc(old_p, old_layout, new_layout.size())
            }
        };

        self.store.ptr = NonNull::new(new_p).unwrap();
        self.cap = new_cap;

        Ok(())
    }

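    pub fn push(&mut self, ch: char) {
        // Note: this cast keeps only the low 8 bits, so only ASCII chars are stored correctly
        let ch = ch as u8;

        // Inline case (u8 has no Drop impl, so the union's array field can be written by index)
        if self.len < 8 && self.cap == 8 {
            unsafe {
                self.store.data[self.len] = ch;
            }
        } else { // heap case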
            if self.len == self.cap {
                self.grow().unwrap();
            }

            unsafe {
                *self.store.ptr.as_ptr().add(self.len) = ch;
            }
        }

        self.len += 1;
    }

    pub fn pop(&mut self) -> Option<char> {
        if self.len == 0 {
            None
        } else {
            self.len -= 1;

            if self.cap == 8 {
                unsafe {
                    Some(self.store.data[self.len] as char)
                }
            } else {
                unsafe {
                    Some(*self.store.ptr.as_ptr().add(self.len) as char)
                }
            }
        }
    }
}
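
Two gaps in the code above are worth pointing out: `push` casts the `char` to `u8`, so anything outside ASCII is truncated, and no `Drop` impl is shown, so the heap buffer is never freed. Below is a minimal self-contained sketch of one way to close both gaps. It is an illustration under assumptions, not the author's implementation: the names `SsoString`, `INLINE_CAP`, `push_byte`, and `as_str` are hypothetical, and allocation failure is routed to `handle_alloc_error` instead of returning a `Result`.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr::{self, NonNull};
use std::{slice, str};

const INLINE_CAP: usize = 8; // hypothetical name for the inline capacity

union Store {
    data: [u8; INLINE_CAP],
    ptr: NonNull<u8>,
}

pub struct SsoString {
    len: usize,
    cap: usize, // == INLINE_CAP: bytes are inline; > INLINE_CAP: bytes are on the heap
    store: Store,
}

impl SsoString {
    pub fn new() -> Self {
        SsoString { len: 0, cap: INLINE_CAP, store: Store { data: [0; INLINE_CAP] } }
    }

    fn grow(&mut self) {
        let new_cap = self.cap * 2;
        let new_layout = Layout::array::<u8>(new_cap).expect("capacity overflow");
        let new_p = unsafe {
            if self.cap == INLINE_CAP {
                let p = alloc(new_layout);
                if !p.is_null() {
                    // Move the inline bytes into the fresh heap buffer.
                    ptr::copy_nonoverlapping(self.store.data.as_ptr(), p, self.len);
                }
                p
            } else {
                let old_layout = Layout::array::<u8>(self.cap).expect("capacity overflow");
                realloc(self.store.ptr.as_ptr(), old_layout, new_layout.size())
            }
        };
        // Report a failed allocation through the global handler.
        self.store.ptr = match NonNull::new(new_p) {
            Some(p) => p,
            None => handle_alloc_error(new_layout),
        };
        self.cap = new_cap;
    }

    fn push_byte(&mut self, b: u8) {
        if self.len == self.cap {
            self.grow();
        }
        unsafe {
            if self.cap == INLINE_CAP {
                self.store.data[self.len] = b;
            } else {
                *self.store.ptr.as_ptr().add(self.len) = b;
            }
        }
        self.len += 1;
    }

    /// Push a whole `char` as UTF-8 instead of truncating it to one byte.
    pub fn push(&mut self, ch: char) {
        let mut buf = [0u8; 4];
        for &b in ch.encode_utf8(&mut buf).as_bytes() {
            self.push_byte(b);
        }
    }

    pub fn as_str(&self) -> &str {
        unsafe {
            let bytes = if self.cap == INLINE_CAP {
                &self.store.data[..self.len]
            } else {
                slice::from_raw_parts(self.store.ptr.as_ptr(), self.len)
            };
            // Only complete UTF-8 sequences are ever pushed.
            str::from_utf8_unchecked(bytes)
        }
    }
}

impl Drop for SsoString {
    fn drop(&mut self) {
        if self.cap > INLINE_CAP {
            // Same layout that the last alloc/realloc call used.
            let layout = Layout::array::<u8>(self.cap).expect("capacity overflow");
            unsafe { dealloc(self.store.ptr.as_ptr(), layout) };
        }
    }
}
```

A `pop` for this variant would have to step back to the previous `char` boundary rather than remove a single byte; it is left out to keep the sketch short.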

3. Benchmarks

I use the criterion crate to run the benchmarks and simply compare push and pop speed.

use criterion::{criterion_group, criterion_main, Criterion};

use structure::string::MyString;

fn cmp_short_str(c: &mut Criterion) {
    let mut group = c.benchmark_group("string<8");

    group.bench_function("std", |b| {
        b.iter(|| {
            let mut s = String::new();

            for _ in 0..8 {
                s.push('a');
            }

            for _ in 0..8 {
                s.pop();
            }
        })
    });

    group.bench_function("sso", |b| {
        b.iter(|| {
            let mut s = MyString::new();

            for _ in 0..8 {
                s.push('a');
            }

            for _ in 0..8 {
                s.pop();
            }
        })
    });
}

fn cmp_long_str(c: &mut Criterion) {
    let mut group = c.benchmark_group("string>8");

    group.bench_function("std", |b| {
        b.iter(|| {
            let mut s = String::new();

            for _ in 0..1000 {
                s.push('a');
            }

            for _ in 0..1000 {
                s.pop();
            }
        })
    });

    group.bench_function("sso", |b| {
        b.iter(|| {
            let mut s = MyString::new();

            for _ in 0..1000 {
                s.push('a');
            }

            for _ in 0..1000 {
                s.pop();
            }
        })
    });
}

criterion_group!(short_string, cmp_short_str);
criterion_group!(long_string, cmp_long_str);

criterion_main! {
    short_string,
    long_string,
}

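In the end, MyString is faster on short strings, though only by about 31 ns per iteration (roughly 35 ns versus 66 ns), and on long strings it cannot keep up with the standard library (roughly 6.5 µs versus 2.7 µs). Some people online have debated whether the short string optimization is worth having at all; for me this was just a small exercise.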

Benchmarking string<8/std: Collecting 100 samples in estimated 5.0000 s
string<8/std            time:   [66.206 ns 66.338 ns 66.463 ns]

Benchmarking string<8/sso: Collecting 100 samples in estimated 5.0000 s
string<8/sso            time:   [35.382 ns 35.453 ns 35.536 ns]

Benchmarking string>8/std: Collecting 100 samples in estimated 5.0004 s
string>8/std            time:   [2.6485 µs 2.6585 µs 2.6675 µs]

Benchmarking string>8/sso: Collecting 100 samples in estimated 5.0023 s
string>8/sso            time:   [6.1796 µs 6.4600 µs 6.7006 µs]