SGI STL (4) :: String Implementation Issue

Issue in String Draft Standard

The problem is that if two strings share a common representation, one of them can be modified through a pre-existing reference or iterator into the other.

#include <string>
#include <cstdio>

int main()
{
   std::string s("abc");
   std::string t;
   char & c(s[1]);

   // In a reference-counted implementation, s and t typically share data here.
   t = s;
   // How many strings does this modify?
   c = 'z';
   if (t[1] == 'z')
   {
       std::printf("wrong\n");
   } else
   {
       std::printf("right\n");
   }
}

Updating a reference to one of s’s elements should modify only s, not t as well. Given the design of basic_string, however, it is very difficult for a reference-counted implementation to satisfy that requirement.

The only known way for a reference-counted implementation to avoid this problem is: whenever a program obtains a reference or an iterator to a string (e.g. by using operator[] or begin()), that particular string will no longer use reference counting; assignment and copy construction will copy the string’s elements instead of just copying a pointer.

The alternative is to abandon the reference-counted implementation altogether.
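The first option can be sketched with a toy class. This is a minimal illustration only, not SGI's or any library's implementation: RcString, Rep, leaked, at and shares_with are hypothetical names, and std::shared_ptr stands in for a hand-written reference count.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy class: a reference-counted string that stops sharing
// once a reference to one of its elements has been handed out.
class RcString {
    struct Rep {
        std::string chars;
        bool leaked = false;   // true once operator[] has exposed a reference
        explicit Rep(std::string s) : chars(std::move(s)) {}
    };
    std::shared_ptr<Rep> rep_;

    // Share the buffer unless a reference into it may still be alive.
    static std::shared_ptr<Rep> share_or_copy(const std::shared_ptr<Rep>& r) {
        if (r->leaked) return std::make_shared<Rep>(r->chars);  // deep copy
        return r;                                               // pointer copy
    }

public:
    explicit RcString(const char* text)
        : rep_(std::make_shared<Rep>(std::string(text))) {}
    RcString(const RcString& other) : rep_(share_or_copy(other.rep_)) {}
    RcString& operator=(RcString other) {   // copy-and-swap
        rep_.swap(other.rep_);
        return *this;
    }

    // Handing out a reference: unshare first, then mark as unshareable.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1) rep_ = std::make_shared<Rep>(rep_->chars);
        rep_->leaked = true;
        return rep_->chars[i];
    }

    char at(std::size_t i) const { return rep_->chars[i]; }
    bool shares_with(const RcString& o) const { return rep_ == o.rep_; }
};

// Replays the example above: the write through c must not reach t.
bool reference_write_stays_local() {
    RcString s("abc");
    RcString t("");
    char& c = s[1];   // s is now marked unshareable
    t = s;            // therefore a deep copy, not a pointer copy
    c = 'z';
    return s.at(1) == 'z' && t.at(1) == 'b';
}
```

In this sketch a string that has handed out a reference stays unshareable for good, so every later copy of it pays for a full character copy. That cost is the price of the fix.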

So what should I use to represent strings?

SGI ropes
Ropes perform reasonably well for all applications that do not require very frequent small updates to strings.

They are the only alternative that scales well to very long strings, i.e. that could easily be used to represent a mail message or a text file as a single string.

The disadvantages are:

  • Single character replacements are slow.
  • Portability and compilation time may be an issue in the short term.
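To show where this trade-off comes from, here is a toy rope: a tree whose leaves hold pieces of text. It is only a sketch under simplifying assumptions (no balancing, no substring nodes). RopeNode, make_leaf, concat, char_at, set_char and flatten are hypothetical names, not the SGI rope interface.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy rope node: either a leaf holding text,
// or an inner node joining two immutable sub-ropes.
struct RopeNode {
    std::string leaf;                            // used only when left is null
    std::shared_ptr<const RopeNode> left, right;
    std::size_t length = 0;
};
using Rope = std::shared_ptr<const RopeNode>;

Rope make_leaf(const std::string& text) {
    auto n = std::make_shared<RopeNode>();
    n->leaf = text;
    n->length = text.size();
    return n;
}

// Concatenation allocates one node and copies no characters.
Rope concat(const Rope& a, const Rope& b) {
    auto n = std::make_shared<RopeNode>();
    n->left = a;
    n->right = b;
    n->length = a->length + b->length;
    return n;
}

// Indexing walks down the tree instead of doing pointer arithmetic.
char char_at(const Rope& r, std::size_t i) {
    const RopeNode* n = r.get();
    while (n->left) {
        if (i < n->left->length) {
            n = n->left.get();
        } else {
            i -= n->left->length;
            n = n->right.get();
        }
    }
    return n->leaf[i];
}

// Replacing one character copies a leaf and rebuilds the path to the root.
Rope set_char(const Rope& r, std::size_t i, char c) {
    if (!r->left) {
        std::string s = r->leaf;
        s[i] = c;
        return make_leaf(s);
    }
    if (i < r->left->length) return concat(set_char(r->left, i, c), r->right);
    return concat(r->left, set_char(r->right, i - r->left->length, c));
}

std::string flatten(const Rope& r) {
    if (!r->left) return r->leaf;
    return flatten(r->left) + flatten(r->right);
}
```

Joining two long texts costs one small allocation here, while changing a single character costs a leaf copy plus one new node per tree level. That matches the pattern described above: cheap whole-string operations, slow single-character updates.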

C strings
This is likely to be the most efficient way to represent a large collection of very short strings. The primary disadvantages are:

  • Operations such as concatenation and substring are much more expensive than for ropes if the strings are long. A C string is not a good representation for a text file in an editor.
  • The user needs to be aware of sharing between string representations. If strings are assigned by copying pointers, an update to one string may affect another.
  • C strings provide no help in storage management. This may be a major issue, although a garbage collector can help alleviate it.
  • Most operations on entire strings (e.g. assignment, concatenation) do not scale well to long strings.
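The sharing hazard in the second point is easy to reproduce. The sketch below uses only the standard C library through `<cstring>`; the two function names are made up for illustration.

```cpp
#include <cstring>

// Assigning a C string by copying the pointer makes both names refer
// to the same characters, so a write through one is visible through the other.
bool pointer_assignment_shares() {
    char buffer[] = "abc";
    char* s = buffer;
    char* t = s;          // copies the pointer, not the characters
    t[1] = 'z';
    return std::strcmp(s, "azc") == 0;
}

// An independent copy needs explicit storage and an explicit copy,
// both of which are the caller's responsibility.
bool explicit_copy_is_independent() {
    char s[] = "abc";
    char t[sizeof s];     // destination must be sized by hand
    std::strcpy(t, s);
    t[1] = 'z';
    return std::strcmp(s, "abc") == 0 && std::strcmp(t, "azc") == 0;
}
```

Both functions return true: the first because s and t alias one buffer, the second because the copy was made by hand.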

vector<char>
If a string is treated primarily as an array of characters, with frequent in-place updates, it is reasonable to represent it as vector<char> or vector<wchar_t>. The same is true if it will be modified by STL container algorithms.

Unlike C strings, vectors handle internal storage management automatically, and operations that modify the length of a string are generally more convenient.

Disadvantages are:

  • Vector assignments are much more expensive than C string pointer assignments; the only way to share string representations is to pass pointers or references to vectors.
  • Most operations on entire strings (e.g. assignment, concatenation) do not scale well to long strings.
  • A number of standard string operations (e.g. concatenation and substring) are not provided with the usual syntax, and must be expressed using generic STL algorithms. This is usually not hard.
  • Conversion to C strings is currently slow, even for short strings. That may change in future implementations.
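As a rough illustration of the third and fourth points, concatenation, substring and conversion to a C string can be written with ordinary vector members. The helper names below (from_literal, concat, substring, c_string_copy) are hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Build a vector<char> from a C string literal (no terminating '\0' stored).
std::vector<char> from_literal(const char* text) {
    return std::vector<char>(text, text + std::strlen(text));
}

// Concatenation: copy a, then append b's characters.
std::vector<char> concat(const std::vector<char>& a, const std::vector<char>& b) {
    std::vector<char> result(a);
    result.insert(result.end(), b.begin(), b.end());
    return result;
}

// Substring: construct a new vector from an iterator range.
// The caller must ensure pos + len <= v.size().
std::vector<char> substring(const std::vector<char>& v,
                            std::size_t pos, std::size_t len) {
    return std::vector<char>(v.begin() + pos, v.begin() + pos + len);
}

// Conversion to a C string: copy everything and append a terminator.
// The result's data() is then usable as a const char*.
std::vector<char> c_string_copy(const std::vector<char>& v) {
    std::vector<char> out(v);
    out.push_back('\0');
    return out;
}
```

Every helper copies all of the characters involved. That is why whole-string operations on vectors do not scale to long strings, and why conversion to a C string is not free even for short ones.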