Python Source Code Study (7): The String Object

http://www.androiddev.net/python7/

By 李 志豪 | 2014/07/05

Having already studied Python's integer object, the string object should be easier to understand. Let's start with PyStringObject and PyString_Type. PyStringObject is defined in stringobject.h as follows:

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
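
To make the invariants in the comment above concrete, here is a minimal sketch of allocating a variable-length string in a single block. MiniString and mini_string_new are hypothetical names made up for illustration, not CPython's; the sketch also assumes a C99 flexible array member (char val[]) in place of the older ob_sval[1] idiom shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyStringObject; not CPython's real layout. */
typedef struct {
    size_t size;    /* plays the role of ob_size: length without the '\0' */
    long hash;      /* plays the role of ob_shash: -1 means "not computed" */
    int interned;   /* plays the role of ob_sstate */
    char val[];     /* character data stored right after the header */
} MiniString;

/* Allocate the header and the character data in one block. */
static MiniString *mini_string_new(const char *src, size_t len)
{
    /* sizeof covers the header; len + 1 bytes hold the data plus '\0'. */
    MiniString *s = malloc(sizeof(MiniString) + len + 1);
    if (s == NULL)
        return NULL;
    s->size = len;
    s->hash = -1;               /* hash is computed lazily, on first use */
    s->interned = 0;
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    assert(s->val[s->size] == 0);   /* mirrors ob_sval[ob_size] == 0 */
    return s;
}
```

Calling mini_string_new("abc", 3) would give an object whose size is 3 while the buffer actually holds 4 bytes; the caller releases it with free().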

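PyObject_VAR_HEAD contains an ob_size field. The length of the string is given by ob_size, which determines the actual size (in bytes) of the character data; this is the mechanism behind every variable-length object in Python. Every Python string is terminated by '\0' (ob_sval[ob_size] == 0), so the buffer really holds ob_size + 1 bytes. ob_shash starts out as -1, meaning the hash has not been computed yet; this cached hash plays a very important role in Python's dict. The hash is computed as follows: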

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

#ifdef Py_DEBUG
    assert(_Py_HashSecret_Initialized);
#endif
    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    /*
      We make the hash of the empty string be 0, rather than using
      (prefix ^ suffix), since this slightly obfuscates the hash secret
    */
    if (len == 0) {
        a->ob_shash = 0;
        return 0;
    }
    p = (unsigned char *) a->ob_sval;
    x = _Py_HashSecret.prefix;
    x ^= *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    x ^= _Py_HashSecret.suffix;
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
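
The caching behaviour of string_hash can be tried out on its own with a rough standalone sketch like the one below. HashedString and hashed_string_hash are hypothetical names, and the sketch assumes the prefix and suffix secrets are both 0. It also does the arithmetic in unsigned long, because signed overflow is undefined in standard C, so the values it produces should not be expected to match a real interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified string with a cached hash. */
typedef struct {
    size_t size;        /* number of bytes in val, excluding '\0' */
    long hash;          /* -1 means "not computed yet" */
    const char *val;
} HashedString;

static long hashed_string_hash(HashedString *a)
{
    const unsigned char *p = (const unsigned char *)a->val;
    unsigned long x;
    long result;
    size_t i;

    assert(a != NULL && a->val != NULL);
    if (a->hash != -1)          /* already computed: return the cached value */
        return a->hash;
    if (a->size == 0) {         /* the empty string always hashes to 0 */
        a->hash = 0;
        return 0;
    }
    x = (unsigned long)p[0] << 7;
    for (i = 0; i < a->size; i++)
        x = (1000003UL * x) ^ p[i];
    x ^= (unsigned long)a->size;
    /* Out-of-range conversion is implementation-defined; it wraps on common platforms. */
    result = (long)x;
    if (result == -1)           /* -1 is reserved for "not computed" */
        result = -2;
    a->hash = result;
    return result;
}
```

The first call walks the whole buffer; every later call on the same object is a single comparison and a return, which is why a string used repeatedly as a dict key is cheap to look up.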

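Caching the hash value on the string object, together with the intern mechanism, improves Python's execution efficiency by roughly 20%. Now let's look at PyString_Type: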

PyTypeObject PyString_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "str",
    PyStringObject_SIZE,
    sizeof(char),
    string_dealloc,                             /* tp_dealloc */
    (printfunc)string_print,                    /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_compare */
    string_repr,                                /* tp_repr */
    &string_as_number,                          /* tp_as_number */
    &string_as_sequence,                        /* tp_as_sequence */
    &string_as_mapping,                         /* tp_as_mapping */
    (hashfunc)string_hash,                      /* tp_hash */
    0,                                          /* tp_call */
    string_str,                                 /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    &string_as_buffer,                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_CHECKTYPES |
        Py_TPFLAGS_BASETYPE | Py_TPFLAGS_STRING_SUBCLASS |
        Py_TPFLAGS_HAVE_NEWBUFFER,              /* tp_flags */
    string_doc,                                 /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    (richcmpfunc)string_richcompare,            /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    string_methods,                             /* tp_methods */
    0,                                          /* tp_members */
    0,                                          /* tp_getset */
    &PyBaseString_Type,                         /* tp_base */
    0,                                          /* tp_dict */
    0,                                          /* tp_descr_get */
    0,                                          /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    0,                                          /* tp_init */
    0,                                          /* tp_alloc */
    string_new,                                 /* tp_new */
    PyObject_Del,                               /* tp_free */
};

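From the code above it is easy to see that the tp_as_number, tp_as_sequence, and tp_as_mapping slots of the string type are all set, which means PyStringObject supports numeric operations, sequence operations, and mapping operations. Next, let's look at two figures: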
[Figure: Screen Shot 2014-07-05 at 3.55.57 PM]

[Figure: Screen Shot 2014-07-05 at 5.58.32 PM]
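With the two figures above we get a fairly clear picture of the runtime state of a Python string object and the memory it occupies. A later post will cover a very important mechanism: string interning.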
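
To illustrate what it means for slots such as tp_as_sequence and tp_as_mapping to be "set", here is a small sketch of a type object whose protocol tables are either a pointer to a struct of function pointers or NULL. All names (MiniType, mini_length, and so on) are hypothetical, and the tables are reduced to a single slot each; the real method tables carry many more entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*mini_lenfunc)(const char *);

/* Hypothetical, heavily reduced protocol tables. */
typedef struct { mini_lenfunc sq_length; } MiniSequenceMethods;
typedef struct { mini_lenfunc mp_length; } MiniMappingMethods;

typedef struct {
    const char *tp_name;
    const MiniSequenceMethods *tp_as_sequence;  /* NULL: protocol unsupported */
    const MiniMappingMethods *tp_as_mapping;    /* NULL: protocol unsupported */
} MiniType;

static size_t str_length(const char *s) { return strlen(s); }

static const MiniSequenceMethods str_as_sequence = { str_length };
static const MiniMappingMethods str_as_mapping = { str_length };

/* A string-like type fills in both tables; a None-like type fills in neither. */
static const MiniType mini_str_type = { "str", &str_as_sequence, &str_as_mapping };
static const MiniType mini_none_type = { "NoneType", NULL, NULL };

/* Store the length in *out and return 1 if the type supports it, else return 0. */
static int mini_length(const MiniType *tp, const char *data, size_t *out)
{
    assert(tp != NULL && out != NULL);
    if (tp->tp_as_sequence != NULL && tp->tp_as_sequence->sq_length != NULL) {
        *out = tp->tp_as_sequence->sq_length(data);
        return 1;
    }
    if (tp->tp_as_mapping != NULL && tp->tp_as_mapping->mp_length != NULL) {
        *out = tp->tp_as_mapping->mp_length(data);
        return 1;
    }
    return 0;   /* neither table provides a length slot */
}
```

The generic code never needs to know the concrete type: it only checks whether the table pointer and the slot inside it are non-NULL, so setting a slot on the type object is all it takes to opt into a protocol.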

Reposted from: https://my.oschina.net/tplinuxhyh/blog/798309
