[Automate the Boring Stuff with Python] Chapter 6: Manipulating Strings

Article link: https://blog.csdn.net/HPP_CSDN/article/details/103539919

Automate the Boring Stuff with Python: Practical Programming for Total Beginners (2nd Edition)
Written by Al Sweigart.
The second edition is available on 2019.10.29

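6.1 Working with Strings

Let's look at some of the ways Python lets you write, print, and access strings.

String Literals

Typing string values in Python code is fairly straightforward: they begin and end with a single quote. But how can you use a quote inside a string? Typing 'That is Alice's cat.' won't work, because Python thinks the string ends after Alice, and the rest (s cat.') is invalid Python code. Fortunately, there are multiple ways to type strings.

Double Quotes

Strings can also begin and end with double quotes. One benefit of using double quotes is that the string can contain a single quote character.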

spam = "That is Alice's cat."

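Since the string begins with a double quote, Python knows that the single quote is part of the string and does not mark the end of the string.

Escape Characters

An escape character lets you use characters that are otherwise impossible to put into a string. An escape character consists of a backslash (\) followed by the character you want to add to the string. (Although it consists of two characters, it is commonly referred to as a single escape character.) For example, the escape character for a single quote is \'. You can use it inside a string that begins and ends with single quotes.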

spam = 'Say hi to Bob\'s mother.'

Table 6-1: Escape Characters

Escape character	Prints as
\'	Single quote
\"	Double quote
\t	Tab
\n	Newline (line break)
\\	Backslash

>>> print("Hello there!\nHow are you?\nI\'m doing fine.")
Hello there!
How are you?
I'm doing fine.

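Raw Strings

You can place an r before the beginning quotation mark of a string to make it a raw string. A raw string completely ignores all escape characters and prints any backslash that appears in the string.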

>>> print(r'That is Carol\'s cat.')
That is Carol\'s cat.

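If an even number of consecutive backslashes comes right before a single quote inside a raw string delimited by single quotes, that quote is not escaped and ends the string, so the rest of the line causes an error.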

>>> print(r'That is Carol\\'s cat.')
      
SyntaxError: invalid syntax
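
To make the difference between a normal string and a raw string concrete, here is a minimal sketch that compares their lengths. The variable names normal, raw, and carol are chosen only for this illustration.

```python
# Compare a normal string literal with a raw string literal.
normal = 'a\tb'    # \t is interpreted as one tab character
raw = r'a\tb'      # the backslash and the t stay as two separate characters

print(len(normal))  # 3
print(len(raw))     # 4

# In a raw string, the backslash before the quote remains part of the value.
carol = r'That is Carol\'s cat.'
print(carol)
```

The raw string is one character longer because the backslash is kept instead of being combined with the t into a tab.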

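Multiline Strings with Triple Quotes

While you can use the \n escape character to put a newline into a string, it is often easier to use multiline strings. A multiline string in Python begins and ends with either three single quotes or three double quotes. Any quotes, tabs, or newlines between the "triple quotes" are considered part of the string. Python's indentation rules for blocks do not apply to lines inside a multiline string.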

print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')

Running the code above produces the following output:

Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob

The following print() call prints identical text but does not use a multiline string:

print('Dear Alice,\n\nEve\'s cat has been arrested for catnapping, cat burglary, and extortion.\n\nSincerely,\nBob')
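
As a quick check that the two forms really produce the same value, the following sketch compares a shortened version of the letter written both ways. The variable names multiline and escaped are only illustrative.

```python
# The same text written as a triple-quoted string and with \n escapes.
multiline = '''Dear Alice,

Sincerely,
Bob'''
escaped = 'Dear Alice,\n\nSincerely,\nBob'

print(multiline == escaped)  # True
```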

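Multiline Comments

While the hash character (#) marks the beginning of a comment that runs to the end of the line, a multiline string is often used for comments that span multiple lines.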

"""This is a test Python program.
Written by Al Sweigart al@inventwithpython.com

This program was designed for Python 3, not Python 2.
"""

def spam():
    """This is a multiline comment to help
    explain what the spam() function does."""
    print('Hello!')

Indexing and Slicing Strings

Strings use indexes and slices the same way lists do.

>>> spam = 'Hello world!'
>>> spam[0]
'H'
>>> spam[4]
'o'
>>> spam[-1]
'!'
>>> spam[0:5]
'Hello'
>>> spam[:5]
'Hello'
>>> spam[6:]
'world!'
>>> fizz = spam[0:5] # store the resulting substring in the variable fizz
>>> fizz
'Hello'

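Note that slicing a string does not modify the original string.

The in and not in Operators with Strings

The in and not in operators can be used with strings just as with list values.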

>>> 'Hello' in 'Hello World'
True
>>> 'HELLO' in 'Hello World'
False
>>> '' in 'spam'
True
>>> 'cats' not in 'cats and dogs'
False

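6.2 Useful String Methods

The upper(), lower(), isupper(), and islower() String Methods

The upper() and lower() string methods return a new string in which all the letters of the original string have been converted to uppercase or lowercase, respectively. Nonletter characters in the string remain unchanged.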

>>> spam = 'Hello world!'
>>> spam = spam.upper()
>>> spam
'HELLO WORLD!'
>>> spam = spam.lower()
>>> spam
'hello world!'

The upper() and lower() methods are helpful when you need to make a case-insensitive comparison.

print('How are you?')
feeling = input()
if feeling.lower() == 'great':
    print('I feel great too.')
else:
    print('I hope the rest of your day is good.')

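The isupper() and islower() methods return a Boolean True value if the string has at least one letter and all the letters are uppercase or lowercase, respectively. Otherwise, the method returns False.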

>>> spam = 'Hello world!'
>>> spam.islower()
False
>>> spam.isupper()
False
>>> 'HELLO'.isupper()
True
>>> 'abc12345'.islower()
True
>>> '12345'.islower()
False
>>> '12345'.isupper()
False

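Since the upper() and lower() string methods themselves return strings, you can call string methods on those returned string values as well. Expressions that do this look like a chain of method calls.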

>>> 'Hello'.upper()
'HELLO'
>>> 'Hello'.upper().lower()
'hello'
>>> 'Hello'.upper().lower().upper()
'HELLO'
>>> 'HELLO'.lower()
'hello'
>>> 'HELLO'.lower().islower()
True

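The isX String Methods

Along with islower() and isupper(), there are several other string methods whose names begin with the word is. These methods return a Boolean value that describes the nature of the string. Here are some common isX string methods:
① isalpha() returns True if the string consists only of letters and is not blank.
② isalnum() returns True if the string consists only of letters and numbers and is not blank.
③ isdecimal() returns True if the string consists only of numeric characters and is not blank.
④ isspace() returns True if the string consists only of spaces, tabs, and newlines and is not blank.
⑤ istitle() returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.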

>>> 'hello'.isalpha()
True
>>> 'hello123'.isalpha()
False
>>> 'hello123'.isalnum()
True
>>> 'hello'.isalnum()
True
>>> '123'.isdecimal()
True
>>> '    '.isspace()
True
>>> 'This Is Title Case'.istitle()
True
>>> 'This Is Title Case 123'.istitle()
True
>>> 'This Is not Title Case'.istitle()
False
>>> 'This Is NOT Title Case Either'.istitle()
False
>>> '123'.istitle() # istitle() returns False for a string with only digit characters.
False
>>> '123 Title'.istitle()
True

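The isX string methods are helpful when you need to validate user input.
For example, the following program repeatedly asks users for their age and a password until they provide valid input.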

while True:
    print('Enter your age:')
    age = input()
    if age.isdecimal():
        break
    print('Please enter a number for your age.')

while True:
    print('Select a new password (letters and numbers only):')
    password = input()
    if password.isalnum():
        break
    print('Passwords can only have letters and numbers.')
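
One consequence of validating the age with isdecimal() is that only digit characters are accepted, so a negative number, a number with a decimal point, or a value with a leading space is rejected. The sketch below illustrates this without calling input(). The helper name isValidAge() is hypothetical and not part of the original program.

```python
def isValidAge(text):
    """Return True if text consists only of decimal digit characters."""
    return text.isdecimal()

# Try a few candidate inputs that a user might type.
for candidate in ['42', '', '-5', '3.5', ' 42']:
    print(repr(candidate), isValidAge(candidate))
```

Only '42' passes. If such inputs should be accepted, the program would need to clean them up first, for example by calling strip() before the check.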

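The startswith() and endswith() String Methods

The startswith() and endswith() methods return True if the string value they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return False.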

>>> 'Hello world!'.startswith('Hello')
True
>>> 'Hello world!'.endswith('world!')
True
>>> 'abc123'.startswith('abcdef')
False
>>> 'abc123'.endswith('12')
False
>>> 'Hello world!'.startswith('Hello world!')
True
>>> 'Hello world!'.endswith('Hello world!')
True

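The join() and split() String Methods

The join() method is useful when you have a list of strings that need to be joined together into a single string value. The join() method is called on a string, gets passed a list of strings, and returns a string. The returned string is the concatenation of each string in the passed-in list, with the string that join() was called on inserted between them.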

>>> ', '.join(['cats', 'rats', 'bats'])
'cats, rats, bats'
>>> ' '.join(['My', 'name', 'is', 'Simon'])
'My name is Simon'
>>> 'ABC'.join(['My', 'name', 'is', 'Simon'])
'MyABCnameABCisABCSimon'

The split() method does the opposite of join(): it is called on a string value and returns a list of strings.

>>> 'My name is Simon'.split()
['My', 'name', 'is', 'Simon']

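By default, the string 'My name is Simon' is split wherever whitespace characters such as the space, tab, or newline characters are found.

You can pass a delimiter string to the split() method to specify a different string to split on.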

>>> 'MyABCnameABCisABCSimon'.split('ABC')
['My', 'name', 'is', 'Simon']
>>> 'My name is Simon'.split('m')
['My na', 'e is Si', 'on']
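
The default behavior and an explicit delimiter can give different results even when the delimiter is a space. The following sketch, with an illustrative variable named text, shows that split() with no argument treats any run of whitespace as one separator, while split(' ') splits at every single space and leaves tabs and newlines alone.

```python
text = 'My  name\tis\nSimon'   # two spaces, then a tab, then a newline

print(text.split())      # ['My', 'name', 'is', 'Simon']
print(text.split(' '))   # ['My', '', 'name\tis\nSimon']
```

The empty string in the second result comes from the two adjacent spaces.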

A common use of split() is to split a multiline string along the newline characters.

>>> spam = '''Dear Alice,
How have you been? I am fine.
There is a container in the fridge
that is labeled "Milk Experiment".

Please do not drink it.
Sincerely,
Bob'''
>>> spam.split('\n')
['Dear Alice,', 'How have you been? I am fine.', 'There is a container in the fridge', 
'that is labeled "Milk Experiment".', '', 'Please do not drink it.', 'Sincerely,', 'Bob']

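Justifying Text with rjust(), ljust(), and center()

The rjust() and ljust() string methods return a padded version of the string they are called on, with spaces inserted to justify the text. The first argument to both methods is an integer length for the justified string.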

>>> 'Hello'.rjust(10)
'     Hello'
>>> 'Hello'.rjust(20)
'               Hello'
>>> 'Hello World'.rjust(20)
'         Hello World'
>>> 'Hello'.ljust(10)
'Hello     '

An optional second argument to rjust() and ljust() specifies a fill character to use instead of a space character.

>>> 'Hello'.rjust(20, '*')
'***************Hello'
>>> 'Hello'.ljust(20, '-')
'Hello---------------'

The center() string method works like ljust() and rjust() but centers the text rather than justifying it to the left or right.

>>> 'Hello'.center(20)
'       Hello        '
>>> 'Hello'.center(20, '=')
'=======Hello========'

These methods are especially useful when you need to print tabular data with correct spacing.

def printPicnic(itemsDict, leftWidth, rightWidth):
    print('PICNIC ITEMS'.center(leftWidth + rightWidth, '-'))
    for k, v in itemsDict.items():
        print(k.ljust(leftWidth, '.') + str(v).rjust(rightWidth))
picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
printPicnic(picnicItems, 12, 5)
printPicnic(picnicItems, 20, 6)
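
To inspect the layout that these calls produce, one option is a variant that builds the lines as a list of strings instead of printing them directly. This is only a sketch: the helper name formatPicnic() is hypothetical, and it reuses the same center(), ljust(), and rjust() calls as printPicnic().

```python
def formatPicnic(itemsDict, leftWidth, rightWidth):
    """Build the table lines as a list of strings instead of printing them."""
    lines = ['PICNIC ITEMS'.center(leftWidth + rightWidth, '-')]
    for k, v in itemsDict.items():
        # Item name padded with dots on the right, count padded with spaces on the left.
        lines.append(k.ljust(leftWidth, '.') + str(v).rjust(rightWidth))
    return lines

picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'cookies': 8000}
for line in formatPicnic(picnicItems, 12, 5):
    print(line)
```

Every line has the same total length, leftWidth + rightWidth, which is what keeps the columns aligned.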

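Removing Whitespace with strip(), rstrip(), and lstrip()

The strip() string method returns a new string without any whitespace characters (spaces, tabs, and newlines) at the beginning or end. The lstrip() and rstrip() methods remove whitespace characters from the left and right ends, respectively.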

>>> spam = '    Hello World     '
>>> spam.strip()
'Hello World'
>>> spam.lstrip()
'Hello World     '
>>> spam.rstrip()
'    Hello World'

Optionally, a string argument specifies which characters on the ends should be stripped.

>>> spam = 'SpamSpamBaconSpamEggsSpamSpam'
>>> spam.strip('ampS') # the order of the characters in the string passed to strip() does not matter
'BaconSpamEggs'
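
The argument is treated as a set of characters to remove rather than as a prefix or suffix to match. A short sketch of this, which also shows that lstrip() and rstrip() take the same optional argument:

```python
spam = 'SpamSpamBaconSpamEggsSpamSpam'

# Any ordering of the same characters gives the same result.
print(spam.strip('ampS'))    # BaconSpamEggs
print(spam.strip('Spam'))    # BaconSpamEggs

# lstrip() and rstrip() accept the same optional argument.
print(spam.lstrip('ampS'))   # BaconSpamEggsSpamSpam
print(spam.rstrip('ampS'))   # SpamSpamBaconSpamEggs
```

Stripping stops at the first character that is not in the set, which is why the Spam in the middle of the string is left untouched.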

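Copying and Pasting Strings with the pyperclip Module

The pyperclip module has copy() and paste() functions that can send text to and receive text from your computer's clipboard. Sending the output of your program to the clipboard makes it easy to paste it into an email, a word processor, or some other software.
The pyperclip module does not come with Python. To install it, follow the directions for installing third-party modules in Appendix A.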
———————————————————————————————————————————————
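When I tried to install the pyperclip module with the command pip install pyperclip, the installation failed with an error.
My workaround was as follows:
I searched for pyperclip on https://pypi.org/, opened the pyperclip 1.7.0 page, downloaded the file pyperclip-1.7.0.tar.gz, and put the file on my computer's E: drive.
Then I ran the command pip install E:\pyperclip-1.7.0.tar.gz in cmd, and the installation succeeded.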
———————————————————————————————————————————————

>>> import pyperclip
>>> pyperclip.copy('Hello world!')
>>> pyperclip.paste()
'Hello world!'

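If something outside of your program changes the clipboard contents, the paste() function will return it. For example, if I copied this sentence to the clipboard and then called paste(), it would look like this:

>>> pyperclip.paste()
'For example, if I copied this sentence to the clipboard and then called paste(), it would look like this:'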

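Running Python Scripts Outside of IDLE
You can set up shortcuts to make running Python scripts easier. The steps are slightly different for Windows, OS X, and Linux, but each is described in Appendix B. Turn to Appendix B to learn how to run your Python scripts conveniently and how to pass command line arguments to them. (You cannot pass command line arguments to your programs using IDLE.)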

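6.3 Project: Password Locker

You probably have accounts on many different websites. It is a bad habit to use the same password for each of them, because if any one of those sites has a security breach, the hackers will learn the password to all of your other accounts. It is best to use password manager software on your computer that uses one master password to unlock the password manager. Then you can copy any account password to the clipboard and paste it into the website's password field.

Step 1: Program Design and Data Structures

You want to be able to run this program with a command line argument that is the account's name. The account's password will be copied to the clipboard so that the user can paste it into a password field. This way, the user can have long, complicated passwords without having to memorize them.
Open a new file editor window and save the program as pw.py. The program needs to start with a #! line (see Appendix B), and you should also write a comment that briefly describes the program. Since you want to associate each account's name with its password, you can store these strings in a dictionary. The dictionary is the data structure that organizes your account and password data.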

#! python3
# pw.py - An insecure password locker program.

PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}

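Step 2: Handle Command Line Arguments

The command line arguments are stored in the variable sys.argv. The first item in the sys.argv list is always a string containing the program's filename, and the second item should be the first command line argument. For this program, this argument is the name of the account whose password you want. Since the command line argument is mandatory, display a usage message to the user if they forget to add it (that is, if the sys.argv list has fewer than two values in it).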

import sys
if len(sys.argv) < 2:
    print('Usage: python pw.py [account] - copy account password')
    sys.exit()

account = sys.argv[1]      # first command line arg is the account name

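Step 3: Copy the Right Password

Now that the account name is stored as a string in the variable account, you need to check whether it exists as a key in the PASSWORDS dictionary. If so, copy the key's value to the clipboard using pyperclip.copy(). Note that you don't actually need the account variable; you could use sys.argv[1] everywhere account is used in this program. But a variable named account is much more readable than sys.argv[1].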

import pyperclip
if account in PASSWORDS:
    pyperclip.copy(PASSWORDS[account])
    print('Password for ' + account + ' copied to clipboard.')
else:
    print('There is no account named ' + account)

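Step 4: Run the Program

The complete script is shown below. If you want to update a password, you have to change the values of the PASSWORDS dictionary in the source code.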

#! python3
# pw.py - An insecure password locker program.
PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}

import sys, pyperclip
if len(sys.argv) < 2:
    print('Usage: py pw.py [account] - copy account password')
    sys.exit()

account = sys.argv[1]   # first command line arg is the account name

if account in PASSWORDS:
    pyperclip.copy(PASSWORDS[account])
    print('Password for ' + account + ' copied to clipboard.')
else:
    print('There is no account named ' + account)
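
The lookup logic can also be tried without pyperclip or a real command line by moving it into a function that takes the argument list as a parameter. This is only a sketch under that assumption: the helper name findPassword() is hypothetical, and the lists in the loop stand in for what sys.argv would contain.

```python
PASSWORDS = {'email': 'F7minlBDDuvMJuxESSKHFhTxFtjVB6',
             'blog': 'VmALvQyKAxiVH5G8v01if1MLZF3sdt',
             'luggage': '12345'}

def findPassword(argv, passwords):
    """Return the password for the account named in argv[1], or None.

    argv is a list shaped like sys.argv: [scriptName, accountName, ...].
    """
    if len(argv) < 2:
        return None                  # no account name was given
    return passwords.get(argv[1])    # None if the account is unknown

# Simulated argument lists, standing in for sys.argv in three runs.
for argv in (['pw.py'], ['pw.py', 'luggage'], ['pw.py', 'bank']):
    print(argv, '->', findPassword(argv, PASSWORDS))
```

In the real script, a non-None result would then be passed to pyperclip.copy().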

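Of course, it is not secure to keep all your passwords in one place where anyone could easily copy them. But you can modify this program and use it to quickly copy regular text to the clipboard.
On Windows, you can create a batch file to run this program. Type the following into the file editor, save it as pw.bat, and place it in the C:\Windows folder: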

@py.exe C:\Python34\pw.py %*
@pause

With this batch file created, running the password locker program on Windows is just a matter of pressing WIN+R and typing pw <account name>.
(For Python 2, use python instead of py.)

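6.4 Project: Adding Bullets to Wiki Markup

When editing a Wikipedia article, you can create a bulleted list by putting each list item on its own line and placing a star in front of it. But say you have a really large list that you want to add bullet points to. You could type those stars at the beginning of each line, one by one. Or you could automate this task with a short Python script.
The bulletPointAdder.py script gets the text from the clipboard, adds a star and a space to the beginning of each line, and then pastes this new text to the clipboard. For example, suppose you copy the following text (for the Wikipedia article "List of Lists of Lists") to the clipboard: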

Lists of animals
Lists of aquarium life
Lists of biologists by author abbreviation
Lists of cultivars

After you run the bulletPointAdder.py program, the clipboard contains the following:

* Lists of animals
* Lists of aquarium life
* Lists of biologists by author abbreviation
* Lists of cultivars

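This star-prefixed text is ready to be pasted into a Wikipedia article as a bulleted list.

Step 1: Copy and Paste from the Clipboard

You want the bulletPointAdder.py program to do the following:
① Paste text from the clipboard
② Do something to it
③ Copy the new text to the clipboard
For now, write the parts of the program that cover items ① and ③. Enter the following, saving the program as bulletPointAdder.py: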

#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# TODO: Separate lines and add stars.

pyperclip.copy(text)

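Step 2: Separate the Lines of Text and Add the Star

The call to pyperclip.paste() returns all the text on the clipboard as one big string. With the "List of Lists of Lists" example, the string stored in text would look like this: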

'Lists of animals\nLists of aquarium life\nLists of biologists by author abbreviation\nLists of cultivars'

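You could write code that searches for each \n newline character in the string and adds the star right after it. But it is easier to use the split() method to return a list of strings, one for each line in the original string, and then add the star to the front of each string in the list.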

# Separate lines and add stars.
lines = text.split('\n')
for i in range(len(lines)):    # loop through all indexes in the "lines" list
    lines[i] = '* ' + lines[i] # add star to each string in "lines" list

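Step 3: Join the Modified Lines

The lines list now contains the modified lines that start with stars. To get a single string value, pass lines to the join() method, which joins the strings in the list.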

text = '\n'.join(lines)

The complete bulletPointAdder.py script looks like this:

#! python3
# bulletPointAdder.py - Adds Wikipedia bullet points to the start
# of each line of text on the clipboard.

import pyperclip
text = pyperclip.paste()

# Separate lines and add stars.
lines = text.split('\n')
for i in range(len(lines)):    # loop through all indexes for "lines" list
    lines[i] = '* ' + lines[i] # add star to each string in "lines" list
text = '\n'.join(lines)
pyperclip.copy(text)
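
The text transformation itself does not depend on the clipboard, so it can be tried on an ordinary string. Here is a minimal sketch that uses a list comprehension in place of the index-based loop. The function name addBullets() and the variable sample are hypothetical.

```python
def addBullets(text):
    """Return text with '* ' added to the start of every line."""
    return '\n'.join(['* ' + line for line in text.split('\n')])

sample = 'Lists of animals\nLists of aquarium life'
print(addBullets(sample))
```

This produces the same result as the loop above for the same input; which form to use is mostly a matter of readability.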

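Even if you don't need to automate this specific task, you might want to automate some other kind of text manipulation, such as removing trailing spaces from the end of lines or converting text to uppercase or lowercase. Whatever your needs, you can use the clipboard for input and output.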
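
As an illustration of such a variation, the sketch below strips trailing whitespace from each line and converts the text to uppercase, following the same split, modify, and join pattern. The function name cleanLines() is hypothetical, and the clipboard calls are left out so that the example runs without pyperclip.

```python
def cleanLines(text):
    """Strip trailing whitespace from each line and convert the text to uppercase."""
    lines = text.split('\n')
    for i in range(len(lines)):
        lines[i] = lines[i].rstrip()   # remove spaces and tabs at the end of the line
    return '\n'.join(lines).upper()

print(repr(cleanLines('spam   \neggs\t')))
```

To use it with the clipboard, the result of pyperclip.paste() would be passed in and the return value handed to pyperclip.copy().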

6.5 Practice Questions

8. What do the following expressions evaluate to?

>>> 'Remember, remember, the fifth of November.'.split()
['Remember,', 'remember,', 'the', 'fifth', 'of', 'November.']
>>> '-'.join('There can be only one.'.split())
'There-can-be-only-one.'

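6.6 Practice Projects

Table Printer

Write a function named printTable() that takes a list of lists of strings and displays it in a well-organized table with each column right-justified. Assume that all the inner lists contain the same number of strings. For example, the value could look like this: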

tableData = [['apples', 'oranges', 'cherries', 'banana'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose']]

The printTable() function would print the following:

  apples Alice  dogs
 oranges   Bob  cats
cherries Carol moose
  banana David goose

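Hint: Your code first has to find the longest string in each of the inner lists so that the whole column is wide enough to fit all the strings. You can store the maximum width of each column as a list of integers. The printTable() function can begin with colWidths = [0] * len(tableData), which creates a list containing the same number of 0 values as the number of inner lists in tableData. That way, colWidths[0] can store the width of the longest string in tableData[0], colWidths[1] can store the width of the longest string in tableData[1], and so on. You can then find the largest value in the colWidths list to determine what integer width to pass to the rjust() string method.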

#! python3
# tablePrinter.py - Takes a list of lists of strings and displays it 
# in a well-organized table with each column right-justified.

def printTable(tableData):
	colWidths = [0] * len(tableData)
	for i in range(len(tableData)):
		for s in tableData[i]:
			if len(s) > colWidths[i]:
				colWidths[i] = len(s)
	for j in range(len(tableData[0])):
		row = []
		for i in range(len(tableData)):
			row.append(tableData[i][j].rjust(colWidths[i]))
		print(' '.join(row)) # one space between columns, no leading space
	
tableData = [['apples', 'oranges', 'cherries', 'banana'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose']]
printTable(tableData)
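
To check the result against the expected table, the rows can be built as strings and compared instead of printed. This is a sketch of one possible variant, not the book's solution: the function name formatTable() is hypothetical, and it uses max() to compute each column width instead of the explicit loop.

```python
def formatTable(tableData):
    """Return the table rows as right-justified strings, one string per row."""
    # The width of each column is the length of its longest string.
    colWidths = [max(len(s) for s in column) for column in tableData]
    rows = []
    for j in range(len(tableData[0])):
        cells = [tableData[i][j].rjust(colWidths[i]) for i in range(len(tableData))]
        rows.append(' '.join(cells))   # one space between columns
    return rows

tableData = [['apples', 'oranges', 'cherries', 'banana'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose']]
for row in formatTable(tableData):
    print(row)
```

Each inner list of tableData is treated as one column, which is why the loops index tableData[i][j] with the column index first.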