A Brief Look at Python Strings

dengji8846

Tags: python

Original link: http://www.cnblogs.com/qianggezhishen/p/7460309.html

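Whenever strings come up, in any language, they tend to cause headaches: there are several kinds with different uses, and mixing them gets confusing fast. (The console sessions in this article are from Python 2.)
Let's first list the kinds of strings. Ordinary strings come in two forms:

Single-quoted strings, e.g. str='123'
Double-quoted strings, e.g. str="123"

There are also non-ordinary strings:

Raw strings, e.g. str=r'123'
Unicode strings, e.g. str=u'123'

Long strings, e.g.: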

>>> print '''this is a paragraph 
... haha
... dead
... end
... '''
this is a paragraph
haha
dead
end
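
As a small sketch of what a long string actually holds, the snippet below stores the same text in a variable and counts its line breaks. It is written in Python 3 syntax (print as a function), which is an assumption on my part since the transcript above is Python 2; the variable names are illustrative only.

```python
# Python 3 syntax; the transcript above uses the Python 2 print statement.
# A triple-quoted (long) string keeps the line breaks typed inside it.
paragraph = '''this is a paragraph
haha
dead
end
'''

# splitlines() breaks the text at each newline character.
lines = paragraph.splitlines()
print(lines)

# Every line, including the last one, ends with a newline here.
newline_count = paragraph.count("\n")
print(newline_count)
```

The closing quotes sit on their own line, so the string ends with a newline; print then adds one more, which is why the transcript shows a blank line after "end".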

Ordinary strings

Let's start with ordinary strings and cover the other kinds later. What does the console show when we enter them?

>>> 'hello, world!'
'hello, world!'
>>> "hello, world!"
'hello, world!'

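Notice anything odd? The double-quoted string is echoed in single quotes (' '). So the two forms do not really differ; why keep both?
Look at what happens when they are combined: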

>>> "Let's happy!"
"Let's happy!"
>>> 'Let\'s happy!'
"Let's happy!"

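This time the console echoes both strings in double quotes (" ")! The string contains a single quote (here it is really an apostrophe), so the console wraps it in double quotes to display it without an escape.
Now another example, where we want to output double quotes: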

>>> '"hello, world!" he said'
'"hello, world!" he said'

Here the string contains double quotes, so the Python console wraps it in single quotes.

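In short, the console quotes with single quotes by default. If the string contains a single quote (and no double quote), it switches to double quotes; a string that contains double quotes is shown in single quotes. It helps to think of this output as quotation, like quoting a passage in ordinary writing: the outer marks are chosen so they do not clash with the ones inside.
Here is another example: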

>>> 'Let's happy!' 
  File "<stdin>", line 1
    'Let's happy!'
         ^
SyntaxError: invalid syntax
>>> 'Let\'s happy!'
"Let's happy!"
>>> "Let's happy!"
"Let's happy!"

This further shows the benefit of having both forms: inside double quotes there is no need for an escape character to get a single quote.
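
The quoting behaviour described in this section can be checked directly with repr, which produces the same text the console echoes. This is a minimal sketch in Python 3 syntax (an assumption, since the transcripts are Python 2); the list name samples and the last sample string are made up for illustration.

```python
# Python 3 syntax. repr() returns the text the console would echo.
samples = [
    "hello, world!",            # no quotes inside
    "Let's happy!",             # contains a single quote only
    '"hello, world!" he said',  # contains double quotes only
    'both \' and "',            # contains both kinds of quote
]

quoted = [repr(s) for s in samples]
for text in quoted:
    print(text)
```

The last sample covers the case the examples above leave out: when a string contains both kinds of quote, the echo goes back to single quotes and escapes the single quote inside.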

Raw strings

A raw string is marked with the r prefix, e.g. str=r'123', and does not process backslash escapes. For example:

>>> print 'hello.\n world!'
hello.
 world!
>>> print r'hello.\n world!'
hello.\n world!
>>> print 'c:\\nowhere'
c:\nowhere
>>> print r'c:\\nowhere'
c:\\nowhere

If we want to output a directory path full of backslashes, such as c:\a\b\c\d, a raw string makes it easy:

>>> print 'c:\\a\\b\\c\\d' 
c:\a\b\c\d
>>> print r'c:\a\b\c\d'    
c:\a\b\c\d

That shows the benefit of raw strings.
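
To confirm that the two spellings above produce the same string, and that the r prefix only changes how the literal is read, here is a short sketch in Python 3 syntax (an assumption; the transcripts are Python 2). The variable names are illustrative.

```python
# Python 3 syntax. Both literals describe the same ten characters.
escaped = 'c:\\a\\b\\c\\d'   # each \\ is one backslash after escaping
raw = r'c:\a\b\c\d'          # backslashes are taken literally

print(escaped == raw)
print(len(raw))

# With \n the two forms differ: one newline character versus
# a backslash followed by the letter n.
normal_len = len('hello.\n world!')
raw_len = len(r'hello.\n world!')
print(normal_len, raw_len)
```

A raw string is not a separate type: once the literal has been read, the result is an ordinary string object.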

Unicode strings

A Unicode string is marked with the u prefix, e.g. str=u'hello, world!':

>>> u'hello, world!'
u'hello, world!'

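ASCII uses 7-bit or 8-bit binary combinations to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7 bits (the remaining bit is 0) to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English.
The upper 128 codes are called extended ASCII. Many x86-based systems support extended (or "high") ASCII, which uses the eighth bit of each character to define an additional 128 special symbols, foreign-language letters, and graphic symbols.
Unicode was created to overcome the limits of traditional character encoding schemes. It assigns every character in every language a single, unique code point, to meet the needs of cross-language, cross-platform text conversion and processing. Code points are stored as bytes using encodings such as UTF-8, UTF-16, and UTF-32.
The details are not covered here.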
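
To make the difference between a character's code point and its encoded bytes concrete, the sketch below encodes one Chinese character three ways. It uses Python 3 syntax, where every str is a Unicode string (an assumption on my part; in the Python 2 sessions above this would be a u'' literal). The character and the names ch and sizes are chosen only for illustration.

```python
# Python 3 syntax. "\u4e2d" is the character U+4E2D (Chinese "middle").
ch = "\u4e2d"

# The code point is one number, independent of any encoding.
print(hex(ord(ch)))

# The same code point takes a different number of bytes per encoding.
# The -le variants are used so that no byte order mark is added.
sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = ch.encode(codec)
    sizes[codec] = len(data)
    print(codec, len(data), data)

# A plain ASCII letter is a single byte in both ASCII and UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))
```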

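A quick exercise

Now consider a question: "How do you print a backslash (\) at the end of a string?"

For an ordinary string, trying to put a backslash (\) at the end produces the following error: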

>>> print "this is \ illegal\"
  File "<stdin>", line 1
    print "this is \ illegal\"
                             ^
SyntaxError: EOL while scanning string literal

In this case, add one more backslash to escape it:

>>> print "this is \ legal\\"
this is \ legal\

A raw string is marked with the 'r' prefix. Trying to end one with a backslash (\) also produces an error:

>>> print r"this is illegal\"
  File "<stdin>", line 1
    print r"this is illegal\"
                            ^
SyntaxError: EOL while scanning string literal

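We might then try escaping it with another backslash, as below. That raises no error, but it is not the result we want either: two backslashes are printed.

A raw string never processes backslash escapes, so both backslashes are kept exactly as written. The one special case is a backslash right before a quote: it stops the quote from ending the string, yet it still stays in the string, which is why a raw string cannot end with a single backslash.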

>>> print r"this is legal\\"
this is legal\\

So what if we need a raw string and also need a single backslash at the end? We can rely on string literal concatenation:

>>> print r'this is illegal' '\\'
this is illegal\
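
The three cases from this exercise can be reproduced without typing a broken literal into the console. The sketch below uses Python 3 syntax (an assumption; the transcripts are Python 2) and the built-in compile to show that the bad literal is rejected; the variable names are illustrative.

```python
# Python 3 syntax.
# Adjacent string literals are joined while the source is parsed, so a raw
# literal can be followed by an ordinary one that supplies the backslash.
joined = r'this is illegal' '\\'
print(joined)

# Doubling the backslash inside a raw literal keeps both backslashes.
doubled = r"this is legal\\"
print(doubled)

# The source text r"this is illegal\" is not a complete literal.
# compile() lets us observe the SyntaxError as an ordinary exception.
bad_source = 'r"this is illegal\\"'
rejected = False
try:
    compile(bad_source, "<example>", "eval")
except SyntaxError:
    rejected = True
print(rejected)
```

When the two parts are not both literals (for example, when the path comes from a variable), the + operator gives the same result as the adjacent-literal form.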

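With so many kinds of strings, how do we find out how many bytes each one occupies?
From Python 2.6 onward, use sys.getsizeof(obj); calling obj.__sizeof__() directly also works.
For example (the exact numbers depend on the Python build and platform):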

>>> import sys
>>> a='1'
>>> b=r'1'
>>> c=u'1'
>>> sys.getsizeof(a)
38
>>> sys.getsizeof(b)
38
>>> sys.getsizeof(c)  
56
>>> a.__sizeof__()
38
>>> b.__sizeof__()
38
>>> c.__sizeof__()
56
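
The same measurement can be repeated on a current interpreter. This is a rough sketch in Python 3 syntax, assuming CPython; the byte counts are implementation details and will not match the Python 2 numbers above, so the code compares sizes instead of printing expected values.

```python
import sys

# Python 3 syntax, assuming CPython. Sizes vary by version and platform.
a = '1'
b = r'1'   # a raw literal yields an ordinary str, so it has the same size

size_a = sys.getsizeof(a)
size_b = sys.getsizeof(b)
print(size_a, size_b)

# For str objects, sys.getsizeof reports the same value as __sizeof__().
print(a.__sizeof__())

# A longer string takes more memory than a shorter one.
growth = sys.getsizeof('1' * 100) - size_a
print(growth)
```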

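Finally, how does the Python console's echo differ from the output of the print statement?
The console shows values in the expression format Python itself uses, while print shows a format that is easier for people to read: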

>>> "hello, world!"
'hello, world!'
>>> print "hello, world!"
hello, world!
>>> 10000L
10000L
>>> print 10000L
10000
>>>

What if I am using print but still want Python's own expression form? Use repr:

>>> print repr("hello, world!")
'hello, world!'
>>> print repr(10000L)
10000L
>>> print str("hello, world!")
hello, world!
>>> print str(10000L)
10000
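
The difference between the two forms is easiest to see with a string that contains a newline. The sketch below is in Python 3 syntax (an assumption; Python 3 has no 10000L long literal, so only the string case is shown) and uses ast.literal_eval from the standard library to show that the repr form can be read back as a Python literal.

```python
import ast

# Python 3 syntax.
text = "hello,\nworld!"

# str() gives the readable form: the newline is an actual line break.
print(str(text))

# repr() gives the expression form: quoted, with the newline shown as \n.
as_expression = repr(text)
print(as_expression)

# The expression form is valid Python source for the same value.
restored = ast.literal_eval(as_expression)
print(restored == text)
```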

What if we need to print a string together with a number? We can use str, or backticks (in Python 2, `x` is shorthand for repr(x)):

>>> temp=1
>>> print "the first number is: " + temp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> print "the first number is: " + str(temp)
the first number is: 1
>>> print "the first number is: " + `temp`
the first number is: 1
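
For comparison, here is a sketch of the same task in Python 3 syntax (an assumption; the transcript above is Python 2). Backticks were removed in Python 3, so repr is called by name; the str.format variant is an extra option not used in the article.

```python
# Python 3 syntax.
temp = 1

# Joining a str and an int directly still fails with a TypeError.
failed = False
try:
    message = "the first number is: " + temp
except TypeError:
    failed = True

# Convert the number explicitly before concatenating.
by_str = "the first number is: " + str(temp)
by_repr = "the first number is: " + repr(temp)   # replaces `temp`

# str.format converts the value as part of formatting.
by_format = "the first number is: {}".format(temp)

print(by_str)
print(by_repr)
print(by_format)
```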

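Strings can thus be shown in two formats: the Python expression form used in code, usually with '' single quotes, such as 'hello, world', and the human-readable form, hello, world. The same distinction applies when reading input, and it is the difference between input and raw_input. input assumes the user typed a valid Python expression (for a string, that means including the '' or "" quotes) and evaluates it, while raw_input returns whatever the user typed as a string, without evaluating it. In general we use raw_input instead of input. For example: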

>>> input("what is your name: ")
what is your name: chenqiang
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'chenqiang' is not defined
>>> raw_input("what is your name: ") 
what is your name: chenqiang
'chenqiang'

>>> input("what is your name: ")
what is your name: 'chenqiang'
'chenqiang'
>>> input("what is your name: ")
what is your name: "chenqiang"
'chenqiang'
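
The Python 2 documentation describes input(prompt) as equivalent to eval(raw_input(prompt)), which explains the NameError above. The sketch below imitates that behaviour in Python 3 syntax without reading from the keyboard; the function name py2_style_input is hypothetical and exists only for this illustration.

```python
# Python 3 syntax. py2_style_input is a hypothetical helper, not a built-in:
# it evaluates typed text the way Python 2's input() did.
def py2_style_input(typed):
    # eval() treats the text as a Python expression.
    return eval(typed)

# Quoted text is a string literal, so it evaluates to a string.
print(repr(py2_style_input("'chenqiang'")))

# Any expression is evaluated, not just strings.
print(py2_style_input("1 + 2"))

# Unquoted text is read as a variable name, which does not exist.
name_error = False
try:
    py2_style_input("chenqiang")
except NameError:
    name_error = True
print(name_error)
```

Evaluating whatever a user types is also a safety risk, which is another reason to prefer raw_input. In Python 3, raw_input was renamed to input and the evaluating version was dropped.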

Since raw_input always returns a string, the return value has to be converted with int before it can be used as an integer:

>>> month_number=raw_input("please input the month number (1~12): ")
please input the month number (1~12): 10
>>> month_number
'10'
>>> int(month_number)
10
>>>
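
In a real program the typed text may not be a valid month, so the conversion is usually wrapped in a small check. The following is a minimal sketch in Python 3 syntax; the function name parse_month and its error message are hypothetical and not part of the original article.

```python
# Python 3 syntax. parse_month is a hypothetical helper for illustration.
def parse_month(text):
    """Convert typed text such as '10' into an int between 1 and 12."""
    # int() raises ValueError when the text is not a whole number.
    month = int(text.strip())
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12, got %d" % month)
    return month

print(parse_month('10'))

# Invalid input is reported as ValueError instead of passing through.
for bad in ('13', 'abc'):
    try:
        parse_month(bad)
    except ValueError as exc:
        print('rejected:', bad, '-', exc)
```

With Python 2 the text would come from raw_input(...), and with Python 3 from input(...); in both cases the function receives a string.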

Reposted from: https://www.cnblogs.com/qianggezhishen/p/7460309.html