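Preface
While comparing strings in C++ and Rust, I noticed that C++ stores short strings inline in the string object itself (on the stack for a local variable) instead of on the heap. This is known as short string optimization (SSO). Rust's standard String has no such optimization, so I wrote a simple implementation of my own.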
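1. Data structure
The struct needs to store the length, the capacity, and the string data. While the string is short, the bytes are stored inline (on the stack); once it grows past that, the data area is reinterpreted as a pointer and the bytes live on the heap.
An enum carries a tag that takes up extra space, so I use a union directly.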
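use std::mem::size_of;
use std::ptr::NonNull;

union Store {
    data: [u8; 8],
    // A `*mut [u8]` would be a fat pointer (pointer + length), which takes
    // twice the space, so a thin pointer is used instead.
    ptr: NonNull<u8>,
}

enum StoreEnum {
    Stack([u8; 8]),
    Heap(NonNull<u8>),
}

#[test]
fn t_size() {
    // The sizes below assume a 64-bit target.
    assert_eq!(size_of::<[u8; 8]>(), size_of::<NonNull<u8>>());
    assert_eq!(size_of::<Store>(), 8);
    assert_eq!(size_of::<StoreEnum>(), 16);
}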
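The comment about fat pointers can be checked directly. The following is a minimal sketch; `pointer_sizes` is a hypothetical helper that is not part of the original code, and it only reports sizes using standard library types.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Returns the sizes of (thin pointer, fat slice pointer, Option of a thin pointer).
/// `pointer_sizes` is a hypothetical helper used only for this illustration.
pub fn pointer_sizes() -> (usize, usize, usize) {
    (
        // Thin pointer: just an address.
        size_of::<NonNull<u8>>(),
        // Fat pointer to a slice: address plus length.
        size_of::<*mut [u8]>(),
        // NonNull has a niche (null), so Option adds no extra space.
        size_of::<Option<NonNull<u8>>>(),
    )
}
```

The fat pointer is twice the size of the thin one, which is why the union stores `NonNull<u8>` and keeps the length and capacity in separate fields.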
The final struct is shown below. It takes the same amount of space as the standard library's String.
pub struct MyString {
    len: usize, // 8 bytes
    cap: usize, // 8 bytes
    /// cap == 8: inline bytes (stack)
    /// cap  > 8: heap pointer
    store: Store, // 8 bytes
}

#[test]
fn t_string_size() {
    // Renamed from `t_size` so it does not clash with the earlier test.
    assert_eq!(size_of::<String>(), 24);
    assert_eq!(size_of::<MyString>(), 24);
}
2. Basic operations
A newly created string has a capacity of 8 and keeps its data on the stack.
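use std::alloc::{alloc, handle_alloc_error, realloc, Layout, LayoutError};
use std::ptr;

impl MyString {
    pub fn new() -> Self {
        MyString {
            len: 0,
            cap: 8,
            store: Store { data: [0; 8] },
        }
    }

    /// Doubles the capacity. The first call moves the inline bytes to the heap.
    fn grow(&mut self) -> Result<(), LayoutError> {
        let new_cap = self.cap * 2;
        let new_layout = Layout::array::<u8>(new_cap)?;
        let new_p = if self.cap == 8 {
            // Currently on the stack.
            unsafe {
                let new_p = alloc(new_layout);
                // Check for allocation failure before writing through the pointer.
                if new_p.is_null() {
                    handle_alloc_error(new_layout);
                }
                // Copy the stack data to the heap.
                ptr::copy_nonoverlapping(self.store.data.as_ptr(), new_p, 8);
                new_p
            }
        } else {
            // Already on the heap: resize the existing allocation.
            let old_layout = Layout::array::<u8>(self.cap)?;
            unsafe {
                let old_p = self.store.ptr.as_ptr();
                realloc(old_p, old_layout, new_layout.size())
            }
        };
        self.store.ptr = NonNull::new(new_p).unwrap();
        self.cap = new_cap;
        Ok(())
    }

    /// Appends `ch`. Only ASCII is handled: `ch as u8` keeps just the low byte
    /// of any other character.
    pub fn push(&mut self, ch: char) {
        // u8 is a primitive type without a Drop impl, so the union's array
        // field can be indexed directly.
        let ch = ch as u8;
        if self.len < 8 && self.cap == 8 {
            // Stack
            unsafe {
                self.store.data[self.len] = ch;
            }
        } else {
            // Heap
            if self.len == self.cap {
                self.grow().unwrap();
            }
            unsafe {
                *self.store.ptr.as_ptr().add(self.len) = ch;
            }
        }
        self.len += 1;
    }

    pub fn pop(&mut self) -> Option<char> {
        if self.len == 0 {
            None
        } else {
            self.len -= 1;
            if self.cap == 8 {
                unsafe { Some(self.store.data[self.len] as char) }
            } else {
                unsafe { Some(*self.store.ptr.as_ptr().add(self.len) as char) }
            }
        }
    }
}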
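Two things the code above leaves out are freeing the heap buffer and handling non-ASCII characters. The following is a minimal, self-contained sketch of how both could be added, not the original implementation. The names `SsoString`, `INLINE_CAP`, `is_inline`, `as_ptr`, `as_mut_ptr`, and `as_str` are assumptions introduced for this illustration. It keeps the same layout and the same rule that `cap == 8` means the bytes are inline.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr::{self, NonNull};

// Hypothetical name for the inline capacity used in the article (8 bytes).
const INLINE_CAP: usize = 8;

union Store {
    data: [u8; INLINE_CAP],
    ptr: NonNull<u8>,
}

pub struct SsoString {
    len: usize,
    cap: usize,
    store: Store,
}

impl SsoString {
    pub fn new() -> Self {
        SsoString { len: 0, cap: INLINE_CAP, store: Store { data: [0; INLINE_CAP] } }
    }

    // `cap` doubles as the discriminant: it only equals INLINE_CAP while inline.
    fn is_inline(&self) -> bool {
        self.cap == INLINE_CAP
    }

    fn as_ptr(&self) -> *const u8 {
        // SAFETY: `cap` tells us which union field is currently active.
        unsafe {
            if self.is_inline() { self.store.data.as_ptr() } else { self.store.ptr.as_ptr() }
        }
    }

    fn as_mut_ptr(&mut self) -> *mut u8 {
        // SAFETY: same reasoning as `as_ptr`.
        unsafe {
            if self.is_inline() { self.store.data.as_mut_ptr() } else { self.store.ptr.as_ptr() }
        }
    }

    fn grow(&mut self) {
        let new_cap = self.cap * 2;
        let new_layout = Layout::array::<u8>(new_cap).expect("capacity overflow");
        let new_p = unsafe {
            if self.is_inline() {
                let p = alloc(new_layout);
                if p.is_null() {
                    handle_alloc_error(new_layout);
                }
                // Move the inline bytes to the new heap buffer.
                ptr::copy_nonoverlapping(self.store.data.as_ptr(), p, self.len);
                p
            } else {
                let old_layout = Layout::array::<u8>(self.cap).expect("capacity overflow");
                let p = realloc(self.store.ptr.as_ptr(), old_layout, new_cap);
                if p.is_null() {
                    handle_alloc_error(new_layout);
                }
                p
            }
        };
        // SAFETY: both branches above checked for null.
        self.store.ptr = unsafe { NonNull::new_unchecked(new_p) };
        self.cap = new_cap;
    }

    /// Appends `ch` as UTF-8 (1 to 4 bytes) instead of truncating it to one byte.
    pub fn push(&mut self, ch: char) {
        let mut buf = [0u8; 4];
        for &b in ch.encode_utf8(&mut buf).as_bytes() {
            if self.len == self.cap {
                self.grow();
            }
            // SAFETY: `len < cap` holds here, so the write is in bounds.
            unsafe { *self.as_mut_ptr().add(self.len) = b };
            self.len += 1;
        }
    }

    pub fn as_str(&self) -> &str {
        // SAFETY: only complete UTF-8 encoded chars are ever pushed, and the
        // first `len` bytes are initialized.
        unsafe {
            let bytes = std::slice::from_raw_parts(self.as_ptr(), self.len);
            std::str::from_utf8_unchecked(bytes)
        }
    }
}

impl Drop for SsoString {
    fn drop(&mut self) {
        // Inline strings own no heap memory; only heap buffers need freeing.
        if !self.is_inline() {
            let layout = Layout::array::<u8>(self.cap).expect("layout was valid at allocation");
            // SAFETY: the buffer was allocated with exactly this layout.
            unsafe { dealloc(self.store.ptr.as_ptr(), layout) };
        }
    }
}
```

Without a `Drop` impl, every string that has grown past 8 bytes leaks its buffer. This sketch has no `pop`: once characters can span several bytes, `pop` would have to remove a whole character rather than a single byte.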
3. Benchmark
I used the criterion crate to run the benchmarks and do a simple comparison of push and pop speed.
use criterion::{criterion_group, criterion_main, Criterion};
use structure::string::MyString;

fn cmp_short_str(c: &mut Criterion) {
    let mut group = c.benchmark_group("string<8");
    group.bench_function("std", |b| {
        b.iter(|| {
            let mut s = String::new();
            for _ in 0..8 {
                s.push('a');
            }
            for _ in 0..8 {
                s.pop();
            }
        })
    });
    group.bench_function("sso", |b| {
        b.iter(|| {
            let mut s = MyString::new();
            for _ in 0..8 {
                s.push('a');
            }
            for _ in 0..8 {
                s.pop();
            }
        })
    });
    group.finish();
}

fn cmp_long_str(c: &mut Criterion) {
    let mut group = c.benchmark_group("string>8");
    group.bench_function("std", |b| {
        b.iter(|| {
            let mut s = String::new();
            for _ in 0..1000 {
                s.push('a');
            }
            for _ in 0..1000 {
                s.pop();
            }
        })
    });
    group.bench_function("sso", |b| {
        b.iter(|| {
            let mut s = MyString::new();
            for _ in 0..1000 {
                s.push('a');
            }
            for _ in 0..1000 {
                s.pop();
            }
        })
    });
    group.finish();
}

criterion_group!(short_string, cmp_short_str);
criterion_group!(long_string, cmp_long_str);
criterion_main! {
    short_string,
    long_string,
}
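In the end, the SSO version is about 1.9x faster for short strings (roughly 35 ns versus 66 ns), but for long strings it cannot keep up with the standard library and is about 2.4x slower (roughly 6.5 µs versus 2.7 µs). Some people online have also debated whether short string optimization is worth having at all. For me, this was simply a small practice exercise.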
Benchmarking string<8/std: Collecting 100 samples in estimated 5.0000 s
string<8/std            time:   [66.206 ns 66.338 ns 66.463 ns]
Benchmarking string<8/sso: Collecting 100 samples in estimated 5.0000 s
string<8/sso            time:   [35.382 ns 35.453 ns 35.536 ns]
Benchmarking string>8/std: Collecting 100 samples in estimated 5.0004 s
string>8/std            time:   [2.6485 µs 2.6585 µs 2.6675 µs]
Benchmarking string>8/sso: Collecting 100 samples in estimated 5.0023 s
string>8/sso            time:   [6.1796 µs 6.4600 µs 6.7006 µs]
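When reading numbers like these, one thing worth ruling out is the optimizer removing part of a loop whose result is never used. The following is a std-only sketch of the same push/pop workload with `std::hint::black_box` around the inputs and results. `time_push_pop` is a hypothetical helper, not part of the original benchmark, and a plain `Instant` timer is much cruder than criterion's statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `iters` rounds of pushing and popping `n` chars on a std String and
/// returns the average time per round. Hypothetical helper for illustration.
pub fn time_push_pop(n: usize, iters: u32) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    let start = Instant::now();
    for _ in 0..iters {
        let mut s = String::new();
        // black_box hides the values from the optimizer so the loops are kept.
        for _ in 0..black_box(n) {
            s.push(black_box('a'));
        }
        for _ in 0..n {
            black_box(s.pop());
        }
        black_box(&s);
    }
    start.elapsed() / iters
}
```

The same wrapping can be applied inside the closures passed to `b.iter` for both the `std` and `sso` cases, so that the two measure the same amount of work.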