Learn Python The Smart Way: Strings

星星子Yocio

Last modified: 2024-06-29 16:05:47

Tags: python

First published: 2024-06-29 00:41:31

Article link: https://blog.csdn.net/Serena_yocio/article/details/140053619

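Four kinds of quotes

Quotes wrap text and tell Python "this is a string!"

Single quotes ' and double quotes " are the two most common string delimiters, and for strings they are equivalent.

print('single quotes')
print("double quotes")

single quotes
double quotes

Triple quotes are less common, but they have specific uses in some situations (such as function docstrings).

print('''three single quotes''')

print("""three double quotes""")

three single quotes

three double quotes

Why do we need two different kinds of quotes?

# So that we can write sentences like this:

print("The short name of Learn Python The Smart Way V2 is 'P2S'")

The short name of Learn Python The Smart Way V2 is 'P2S'

What if we use only one kind of quote?

# This causes a syntax error: Python cannot tell where the string ends
print("The short name of Learn Python The Smart Way V2 is "P2S"")

  Cell In [4], line 2
    print("The short name of Learn Python The Smart Way V2 is "P2S"")
                                                               ^
SyntaxError: invalid syntax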
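
Swapping the outer quotes is one way around this error; escaping the inner quotes with a backslash is another. The snippet below is a small sketch showing that both spellings produce the same string. The variable names and the sentence are made up for illustration.

```python
# Option 1: single quotes outside, so double quotes are ordinary characters inside
swapped = 'The short name of the course is "P2S"'

# Option 2: keep double quotes outside and escape the inner ones with a backslash
escaped = "The short name of the course is \"P2S\""

print(swapped)
print(swapped == escaped)  # both spellings build the same string
```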

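Newlines in strings

A character preceded by a backslash \ is called an escape sequence.

For example, \n stands for a newline. Although it looks like two characters, Python treats it as a single special character.

# These two print() calls do the same thing

print("Data\nwhale")  # \n is a single newline character

Data
whale

print("""Data
whale""")

Data
whale

print("""You can put a backslash `\\` at the end of a line in a string to suppress the newline that follows. \
For example, this is the second line of text, but you will see it follow directly after the previous sentence. \
This is common in a CLI (keeping multiple flags neatly aligned), \
but it is used less often in programming.\
""")

You can put a backslash `\` at the end of a line in a string to suppress the newline that follows. For example, this is the second line of text, but you will see it follow directly after the previous sentence. This is common in a CLI (keeping multiple flags neatly aligned), but it is used less often in programming.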

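Other escape sequences

print("Double quote: \"")

Double quote: "

print("Backslash: \\")

Backslash: \

print("new\nline")

new
line

print("This is\ta\ttab\tcharacter\nalso\tcalled\tthe\tTab key")

This is	a	tab	character
also	called	the	Tab key

An escape sequence counts as a single character

s = "D\\a\"t\ta"
print("s =", s)
print("\nLength of s:", len(s))

s = D\a"t	a

Length of s: 7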

repr() vs. print()

We now have two strings

s1 = "Data\tWhale"

s2 = "Data        Whale"

They appear to be the same

print("s1:", s1)
print("s2:", s2)

s1: Data	Whale
s2: Data        Whale

But are they really the same?

s1 == s2

False

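The Buddha pressed his palms together and said: "Guanyin, look at those two Pilgrims. Which one is real and which is fake?"

"Diting, with your divine powers you can tell which is the true one. Tell me."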

print(repr(s1))
print(repr(s2))

'Data\tWhale'
'Data        Whale'

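hack_text = "Passwords must be longer than 8 characters and shorter than 16 characters, and must contain uppercase letters, lowercase letters, digits, and special symbols\t\t\t\t\t\t\t\t\t\t\t\t\t"

print(hack_text)

Passwords must be longer than 8 characters and shorter than 16 characters, and must contain uppercase letters, lowercase letters, digits, and special symbols

print(repr(hack_text))

'Passwords must be longer than 8 characters and shorter than 16 characters, and must contain uppercase letters, lowercase letters, digits, and special symbols\t\t\t\t\t\t\t\t\t\t\t\t\t'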
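
One practical use of repr() is checking text for characters that print() hides. The sketch below is only an illustration: the helper name `has_hidden_whitespace` is hypothetical and not part of the original course, and it only looks at whitespace at the two ends of the string.

```python
def has_hidden_whitespace(text):
    """Return True if text starts or ends with whitespace that print() would hide."""
    return text != text.strip()

clean = "Data Whale"
padded = "Data Whale\t\t"

for candidate in (clean, padded):
    # repr() spells out the escape sequences that print() renders invisibly
    print(repr(candidate), has_hidden_whitespace(candidate))
```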

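Multi-line strings as comments

"""

Python itself has no multi-line comments,
but you can get the same effect with a multi-line string.
Remember the "expressions" we learned about earlier?
The idea is that Python evaluates the string
and then throws it away immediately (garbage collection).
"""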
print("Amazing!")

Amazing!

Some string constants

import string
print(string.ascii_letters)

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.ascii_lowercase)

abcdefghijklmnopqrstuvwxyz

print(string.ascii_uppercase)

ABCDEFGHIJKLMNOPQRSTUVWXYZ

print(string.digits)

0123456789

print(string.punctuation) # < = >

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

print(string.printable)

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

print(string.whitespace)

print(repr(string.whitespace))

' \t\n\r\x0b\x0c'
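
These constants are ordinary strings, so they can be used to sort characters into categories. The following is a minimal sketch under that assumption; the function name `classify` and the sample text are made up for illustration.

```python
import string

def classify(text):
    """Count how many characters of text fall into each ASCII category."""
    counts = {"lower": 0, "upper": 0, "digit": 0, "punct": 0, "other": 0}
    for ch in text:
        if ch in string.ascii_lowercase:
            counts["lower"] += 1
        elif ch in string.ascii_uppercase:
            counts["upper"] += 1
        elif ch in string.digits:
            counts["digit"] += 1
        elif ch in string.punctuation:
            counts["punct"] += 1
        else:
            counts["other"] += 1   # spaces, non-ASCII characters, and so on
    return counts

print(classify("P2S: Smart Way!"))
```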

Some string operations

String addition and multiplication

print("abc" + "def")
print("abc" * 3)

abcdef
abcabcabc

print("abc" + 3)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [31], line 1
----> 1 print("abc" + 3)

TypeError: can only concatenate str (not "int") to str

The in operator (super useful!)

print("ring" in "strings") # True
print("wow" in "amazing!") # False
print("Yes" in "yes!") # False
print("" in "No way!") # True
print("Smart" in "Learn Python The Smart Way") # True

True
False
False
True
True
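
As the "Yes" example shows, in is case-sensitive. If case should not matter, one common approach is to lowercase both sides first. This is a sketch only; `contains_ignore_case` is a hypothetical helper name.

```python
def contains_ignore_case(needle, haystack):
    """Return True if needle occurs in haystack, ignoring letter case."""
    return needle.lower() in haystack.lower()

print("Yes" in "yes!")                      # plain `in` compares case exactly
print(contains_ignore_case("Yes", "yes!"))  # both sides are lowercased first
```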

String indexing and slicing

Indexing a single character

An index lets us find the character at a specific position

s = "Datawhale"
print(s)
print(s[0])
print(s[1])
print(s[2])
print(s[3])

Datawhale
D
a
t
a

len(s)

9

print(s[len(s)-1])

e

print(s[len(s)])

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [36], line 1
----> 1 print(s[len(s)])

IndexError: string index out of range

Negative indices

print(s)
print(s[-5])
print(s[-4])
print(s[-3])
print(s[-2])
print(s[-1])

Datawhale
w
h
a
l
e

Using slices to get part of a string

print(s[0:4])
print(s[4:9])

Data
whale

print(s[0:2])
print(s[2:4])
print(s[5:7])
print(s[7:9])

Da
ta
ha
le

Default slice parameters

print(s[:4])
print(s[4:])
print(s[:])

Data
whale
Datawhale

The third slice parameter: step

print(s[:9:3])
print(s[1:4:2])

Daa
aa
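
For non-negative indices and a positive step, s[start:stop:step] picks the characters at the same positions that range(start, stop, step) generates. The sketch below rebuilds two slices by hand to show this; `slice_by_range` is a hypothetical helper written only for this comparison.

```python
s = "Datawhale"

def slice_by_range(text, start, stop, step):
    """Rebuild text[start:stop:step] one index at a time (non-negative, in-range indices only)."""
    return "".join(text[i] for i in range(start, stop, step))

print(s[:9:3], slice_by_range(s, 0, 9, 3))   # indices 0, 3, 6
print(s[1:4:2], slice_by_range(s, 1, 4, 2))  # indices 1, 3
```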

Reversing a string

# This works, but it is not elegant
print(s[::-1])

elahwataD

# This also works, but it is still not elegant enough
print("".join(reversed(s)))

elahwataD

# Now this is truly elegant
def reverseString(s):
    return s[::-1]

print(reverseString(s))

elahwataD

Looping over strings

A for loop with indices

for i in range(len(s)):
    print(i, s[i])

0 D
1 a
2 t
3 a
4 w
5 h
6 a
7 l
8 e

You can also skip the index (the super useful in)

for c in s:
    print(c)

D
a
t
a
w
h
a
l
e

You can also use enumerate() to get each element's index

for idx, c in enumerate(s):
    print(idx, c)

0 D
1 a
2 t
3 a
4 w
5 h
6 a
7 l
8 e

zip(a, b) takes one element from a and one element from b on each pass through the loop

for a, b in zip(s, reverseString(s)):
    print(a, b)

D e
a l
t a
a h
w w
h a
a t
l a
e D
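
Because zip() pairs up characters position by position, it is handy for comparing two strings. Here is a small sketch that counts the positions where two strings differ; `count_differences` is a hypothetical name, and note that zip() stops at the end of the shorter string.

```python
def count_differences(a, b):
    """Count positions where a and b hold different characters.

    zip() stops at the shorter string, so any extra characters are ignored.
    """
    return sum(1 for x, y in zip(a, b) if x != y)

print(count_differences("Datawhale", "Datewhale"))  # differs only at index 3
print(count_differences("abc", "abcdef"))           # the extra "def" is never compared
```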

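Looping with split()

# class_name.split() creates a new object called a "list" that holds the words; here it is not assigned to a variable, so it is only used by the loop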

class_name = "learn python the smart way 2nd edition"
for word in class_name.split():
    print(word)

learn
python
the
smart
way
2nd
edition
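
If the words are needed again after the loop, the list returned by split() can be given a name. A minimal sketch, with variable names chosen only for illustration:

```python
class_name = "learn python the smart way 2nd edition"

words = class_name.split()                      # split on whitespace and keep the list
longest = max(words, key=len)                   # on ties, the first longest word wins
initials = "".join(word[0] for word in words)   # first character of every word

print(len(words), longest, initials)
```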

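Looping with splitlines()

# As above, class_info.splitlines() also creates a list; it is not assigned to a variable, so it is only used by the loop

class_info = """\
Learn Python The Smart Way V2 is a major update by Datawhale based on the first edition of the tutorial. We try to bring more computer science and artificial intelligence content into the tutorial, creating a "Python tutorial oriented toward artificial intelligence".
Our course is abbreviated as P2S, which has two meanings:
Learn Python The Smart Way V2, the full name of the course.
Prepare To Be Smart: we hope that after this tutorial students will have learned smart ways of working and can move on confidently to further study of artificial intelligence.
"""
for line in class_info.splitlines():
    if (line.startswith("Prepare To Be Smart")):
        print(line)

Prepare To Be Smart: we hope that after this tutorial students will have learned smart ways of working and can move on confidently to further study of artificial intelligence.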

Example: palindrome check

If a sentence reads the same forwards and backwards, it is called a "palindrome"

def isPalindrome1(s):
    return (s == reverseString(s))

def isPalindrome2(s):
    for i in range(len(s)):
        if (s[i] != s[len(s)-1-i]):
            return False
    return True

def isPalindrome3(s):
    for i in range(len(s)):
        if (s[i] != s[-1-i]):
            return False
    return True

def isPalindrome4(s):
    while (len(s) > 1):
        if (s[0] != s[-1]):
            return False
        s = s[1:-1]
    return True

print(isPalindrome1("abcba"), isPalindrome1("abca"))
print(isPalindrome2("abcba"), isPalindrome2("abca"))
print(isPalindrome3("abcba"), isPalindrome3("abca"))
print(isPalindrome4("abcba"), isPalindrome4("abca"))

True False
True False
True False
True False

The first approach is the recommended one
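
The functions above compare characters exactly, so capital letters, spaces and punctuation all count. For whole sentences, a looser check may be wanted. The following sketch is one possible variant, not part of the original course; the name `is_palindrome_loose` is hypothetical.

```python
def is_palindrome_loose(text):
    """Palindrome check that ignores case and any non-alphanumeric characters."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(is_palindrome_loose("A man, a plan, a canal: Panama"))
print(is_palindrome_loose("Datawhale"))
```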

Some built-in functions related to strings

str() and len()

name = input("Enter your name: ")
print("Hi, " + name + ", your name has " + str(len(name)) + " characters!")

Enter your name: Datawhale
Hi, Datawhale, your name has 9 characters!

chr() 和 ord()

print(ord("A"))

65

print(chr(65))

A

print(
    chr(
        ord("A") + 1
    )
)

B

print(chr(ord("A") + ord(" ")))

a
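
Since ord() and chr() convert between characters and numbers, they make it possible to do arithmetic on letters. Below is a small sketch that shifts an uppercase ASCII letter and wraps around after "Z"; `shift_upper` is a hypothetical helper, and it assumes the input is a single letter from "A" to "Z".

```python
def shift_upper(letter, offset):
    """Shift an uppercase ASCII letter by offset positions, wrapping past 'Z'."""
    position = ord(letter) - ord("A")              # 0 for "A", 25 for "Z"
    return chr((position + offset) % 26 + ord("A"))

print(shift_upper("A", 1))
print(shift_upper("Z", 1))    # wraps around to the start of the alphabet
print(shift_upper("N", 13))
```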

# This runs fine, but we do not recommend using this approach
s = "(3**2 + 4**2)**0.5"
print(eval(s))

5.0


def 电脑当场爆炸():  # the name means "the computer explodes on the spot"
    # rich is a third-party package and must be installed separately
    from rich.progress import (
        Progress,
        TextColumn,
        BarColumn,
        TimeRemainingColumn)
    import time
    from rich.markdown import Markdown
    from rich import print as rprint
    from rich.panel import Panel

    with Progress(TextColumn("[progress.description]{task.description}"),
                  BarColumn(),
                  TimeRemainingColumn()) as progress:
        epoch_tqdm = progress.add_task(description="Countdown to explosion!", total=100)
        for ep in range(100):
            time.sleep(0.1)
            progress.advance(epoch_tqdm, advance=1)

    rprint(Panel.fit("[red]Boom! R.I.P"))

s = "电脑当场爆炸()"
eval(s) # What would happen if this were malicious code that blows up your computer?

Countdown to explosion! ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 0:00:01

╭─────────────╮
│ Boom! R.I.P │
╰─────────────╯

# ast.literal_eval() is recommended instead

import ast
s_safe = "['p', 2, 's']"
s_safe_result = ast.literal_eval(s_safe)
print(s_safe_result)
print(type(s_safe_result))

['p', 2, 's']
<class 'list'>
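
What makes ast.literal_eval() safer is that it only accepts Python literals (strings, numbers, lists, dicts and the like) and raises an error for anything else, such as a function call. The sketch below wraps it in a hypothetical helper, `parse_literal`, that returns None instead of raising.

```python
import ast

def parse_literal(text):
    """Return the Python literal described by text, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval refuses function calls, names, and other non-literal code
        return None

print(parse_literal("['p', 2, 's']"))
print(parse_literal("__import__('os').getcwd()"))  # a call is rejected, nothing is executed
```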

Some string methods

def p(test):
    print("True     " if test else "False    ", end="")
def printRow(s):
    print(" " + s + "  ", end="")
    p(s.isalnum())
    p(s.isalpha())
    p(s.isdigit())
    p(s.islower())
    p(s.isspace())
    p(s.isupper())
    print()
def printTable():
    print("  s   isalnum  isalpha  isdigit  islower  isspace  isupper")
    for s in "ABCD,ABcd,abcd,ab12,1234,    ,AB?!".split(","):
        printRow(s)
printTable()

  s   isalnum  isalpha  isdigit  islower  isspace  isupper
 ABCD  True     True     False    False    False    True     
 ABcd  True     True     False    False    False    False    
 abcd  True     True     False    True     False    False    
 ab12  True     False    False    True     False    False    
 1234  True     False    True     False    False    False    
       False    False    False    False    True     False    
 AB?!  False    False    False    False    False    True

print("YYDS YYSY XSWL DDDD".lower())
print("fbi! open the door!!!".upper())

yyds yysy xswl dddd
FBI! OPEN THE DOOR!!!

print("   strip() removes whitespace from both ends of a string    ".strip())

strip() removes whitespace from both ends of a string

print("Learn Python The Smart Way".replace("Python", "C"))
print("Hugging LLM, Hugging Future".replace("LLM", "SD", 1)) # count = 1

Learn C The Smart Way
Hugging SD, Hugging Future

s = "Learn Python the smart way, with Datawhale"
t = s.replace(" the smart way", "")
print(t)

Learn Python, with Datawhale

print("This is a history test".count("is"))
print("This IS a history test".count("is"))

3
2

print("Dogs and cats!".startswith("Do"))
print("Dogs and cats!".startswith("Don't"))

True
False

print("Dogs and cats!".endswith("!"))
print("Dogs and cats!".endswith("rats!"))

True
False

print("Dogs and cats!".find("and"))
print("Dogs and cats!".find("or"))

5
-1

print("Dogs and cats!".index("and"))
print("Dogs and cats!".index("or"))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Coding\Datawhale\Python_Tutorial\learn-python-the-smart-way-v2\slides\chapter_6-Strings.ipynb Cell 112 line 2
      1 print("Dogs and cats!".index("and"))
----> 2 print("Dogs and cats!".index("or"))

ValueError: substring not found
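
So find() signals "not found" with -1, while index() raises ValueError. When using index(), the error can be caught with try/except. A minimal sketch, with `position_or_none` as a hypothetical helper name:

```python
def position_or_none(text, sub):
    """Return the index of sub in text, or None when sub is absent."""
    try:
        return text.index(sub)
    except ValueError:
        return None

print(position_or_none("Dogs and cats!", "and"))
print(position_or_none("Dogs and cats!", "or"))
```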

Formatting strings with `f-string`

x = 42
y = 99

print(f'Did you know that {x} + {y} is {x+y}?')

Did you know that 42 + 99 is 141?

Other ways to format strings

f-strings are a great way to format strings, and Python has other ways to format strings as well:

The % operator
The format() method
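
For comparison, here is the same sentence built with all three approaches. This is a small sketch; the variable names are only for illustration.

```python
x = 42
y = 99

with_percent = "Did you know that %d + %d is %d?" % (x, y, x + y)        # % operator
with_format = "Did you know that {} + {} is {}?".format(x, y, x + y)    # format() method
with_fstring = f"Did you know that {x} + {y} is {x + y}?"               # f-string

print(with_percent)
print(with_percent == with_format == with_fstring)
```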

Strings are immutable

s = "Datawhale"
s[3] = "e"  # Datewhale

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\Coding\Datawhale\Python_Tutorial\learn-python-the-smart-way-v2\slides\chapter_6-Strings.ipynb Cell 118 line 2
      1 s = "Datawhale"
----> 2 s[3] = "e"

TypeError: 'str' object does not support item assignment

You have to create a new string

s = s[:3] + "e" + s[4:]
print(s)

Datewhale
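
The slicing trick above can be wrapped in a function. The sketch below uses a hypothetical helper, `replace_at`, and shows that the original string is left untouched.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced by new_char."""
    return text[:index] + new_char + text[index + 1:]

s = "Datawhale"
t = replace_at(s, 3, "e")
print(s, t)  # s is unchanged; t is a new string
```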

Strings and aliases

Strings are immutable, so their aliases are immutable too

s = 'Data'  # s refers to the string "Data"
t = s      # t is just a read-only alias for "Data"
s += 'whale'
print(s)
print(t)

Datawhale
Data

t[3] = "e"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\Coding\Datawhale\Python_Tutorial\learn-python-the-smart-way-v2\slides\chapter_6-Strings.ipynb Cell 124 line 1
----> 1 t[3] = "e"

TypeError: 'str' object does not support item assignment

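Basic file operations

The `open()` function

Python's open() function opens a file and returns a file object. Any processing of a file goes through this function.

The open(file, mode) function has two main parameters, file and mode. file is the path of the file to read or write, and mode is the mode in which the file is opened. The commonly used modes are:

r: read the file as text (strings).
rb: read the file as binary data.
w: write to the file.
a: append to the file.

The file object returned supports different operations depending on the mode.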

file = open("chap6_demo.txt", "w")
dw_text = "Datawhale"
file.write(dw_text)
file.close()

file = open('chap6_demo.txt', 'r')
print(type(file))

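The file object

The open function returns a file object. Before working with files, we first need to know which common methods a file object provides:

close(): close the file
In r and rb mode:
- read(): read the entire file
- readline(): read one line of the file
- readlines(): read all lines of the file
In w and a mode:
- write(): write a string to the file
- writelines(): write a list of strings to the file

Let's learn these methods through examples: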

## Read the entire file with the read method
content = file.read()
print(content)

Datawhale

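## Read one line of the file with readline()
content = file.readline()
print(content)

The code printed nothing. Why? The earlier read() call already consumed the whole file and left the read position at the end, so readline() has nothing left to return.

## Close the chap6_demo.txt file opened earlier
file.close()
## Reopen it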
file = open('chap6_demo.txt', 'r')
content = file.readline()
print(content)

Datawhale

Remember to close the file with the close() method as soon as each operation is finished
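
Reopening the file is one way to get back to the beginning. File objects also have a seek() method that moves the read position directly. The sketch below shows this in a temporary directory so that no real file is touched; the file name position_demo.txt is made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "position_demo.txt")  # hypothetical file name

    file = open(path, "w")
    file.write("Datawhale")
    file.close()

    file = open(path, "r")
    first_read = file.read()        # consumes everything; the position is now at the end
    second_read = file.readline()   # nothing left to read, so this is an empty string
    file.seek(0)                    # move the position back to the start of the file
    third_read = file.readline()    # the line is available again
    file.close()

print(repr(first_read), repr(second_read), repr(third_read))
```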

## Open the file chap6_demo.txt in w mode
file = open('chap6_demo.txt', 'w')
## Create the string to write; in a string, \n means a newline (that is, Enter)
content = 'Data\nwhale\n'
## Write it to the chap6_demo.txt file
file.write(content)
## Close the file object
file.close()

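w mode overwrites the previous file. If you want to append content to the end of the file, use a mode.

## Open the file chap6_demo.txt in a mode
file = open('chap6_demo.txt', 'a')
## Create the string to append
content = 'Hello smart way!!!'
## Write it to the chap6_demo.txt file
file.write(content)
## Close the file object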
file.close()
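
To see the difference between the modes, and to try the writelines() and readlines() methods listed earlier, here is a self-contained sketch. It writes to a temporary directory, and the file name append_demo.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "append_demo.txt")  # hypothetical file name

    file = open(path, "w")
    file.writelines(["Data\n", "whale\n"])  # writelines() adds no newlines of its own
    file.close()

    file = open(path, "a")                  # "a" keeps the existing content
    file.write("Hello smart way!!!")
    file.close()

    file = open(path, "r")
    lines = file.readlines()                # one list item per line
    file.close()

print(lines)
```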

The with statement

I don't want to write close() anymore

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Caesar_cipher = """s = \"\"\"Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!\"\"\"


d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print("".join([d.get(c, c) for c in s]))
"""

with open("ZenOfPy.py", "w", encoding="utf-8") as file:
    file.write(Caesar_cipher)
    print(len(Caesar_cipher))

import ZenOfPy
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
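
The with open(...) as file block used above closes the file automatically when the block ends, even if an error occurs inside it. The following sketch checks this through the file object's closed attribute; the file name with_demo.txt and the simulated error are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_demo.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        file.write("Datawhale")
    closed_after_write = file.closed   # the with block has already closed the file

    try:
        with open(path, "r", encoding="utf-8") as file:
            content = file.read()
            raise RuntimeError("simulated failure")  # leave the block through an exception
    except RuntimeError:
        closed_after_error = file.closed             # closed despite the exception

print(content, closed_after_write, closed_after_error)
```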

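Summary

Use single and double quotes where each fits, and use triple quotes for multi-line text.
Strings can contain escape sequences.
repr() reveals more information.
Strings come with many built-in methods, and in is an especially handy operator.
Strings are immutable.
For file operations, with open("xxx") as yyy is recommended, so you don't have to write f.close().

Looking back through history, you will see Apple founder Steve Jobs's pursuit of "Less is More", MUJI's core design philosophy of cutting away the complicated and the showy, Hideko Yamashita's view in "Danshari" of simplifying life by subtraction, and even Su Dongpo's attitude to life in the line "With a bamboo staff and straw sandals I travel lighter than on horseback; in a straw cloak, through mist and rain, I take life as it comes." You will find that minimalism does not exist only in Python programming; it is one of the elegant rules by which the world runs.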