Python: Odds and Ends 1

Latest recommended article published on 2023-08-15 22:09:12

米谷

Category: python

Article link: https://blog.csdn.net/Migu4423/article/details/103141436

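Python核心技术与实战 (Python Core Technology and Practice), Part 1

Lecture 6 of the course Python核心技术与实战, "The Python Black Box: Input and Output", includes a small example, which I used for practice.

1. Keyboard input and output

I did not expect such a small example to run into so many little problems. Here is the code: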

#-*- coding:utf-8 -*-
#name = raw_input('your name:')
#gender = raw_input('you are a boy?(y/n)')
name = input('your name:')
gender = input('you are a boy?(y/n)')

welcome_str = 'Welcome to the matrix {prefix}{name}.'
welcome_dic = {
    'prefix':'Mr.' if gender == 'y' else 'Mrs.',
    'name':name
}

print('authorizing....')
print(welcome_str.format(**welcome_dic))

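The first run failed with SyntaxError: invalid syntax. This error means the interpreter found a statement that is not valid Python, so check for typos or omitted characters.
In my case a comma was missing at the end of line 7. After adding it, the code ran.

However, when I typed the name xiaoxi at the prompt, a new error appeared: name 'XXX' is not defined.
Cause:
The PyCharm interpreter was set to Python 2.x, where input() evaluates whatever is typed as a Python expression. A bare word such as xiaoxi is therefore looked up as a variable name, and no such variable exists.
Solutions:
Option 1: switch the PyCharm interpreter to Python 3.x.
Option 2: on Python 2.x, use raw_input() instead of input() for interactive input.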

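Author: Calyn0709
Link: https://www.jianshu.com/p/86d6fc0a48d9
Source: 简书 (Jianshu)
Trying option 1: go to File -> Settings -> Project: -> Project Interpreter. I first set the project named list to Python 3.x and reran, but the problem remained.
I then set the project named try to Python 3.x and reran. It still had no effect until I restarted PyCharm, after which everything worked. (Note: list and try are two separate projects. The code lives in the try project, so the interpreter has to be changed for try, and PyCharm has to be restarted for the change to take effect.)
Option 2, replacing input with raw_input directly, also fixes the error.
After the fix, the whole example runs from the two prompts through to the welcome message.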

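Notes on the points above:

> Official documentation

1. input() versus raw_input()
1) Version differences
raw_input(): Python 2 only
input(): exists in both versions, but Python 3's input() behaves like Python 2's raw_input()
2) Input handling
raw_input() always returns the typed text as a string, whereas Python 2's input() requires the typed text to be a valid Python expression and evaluates it.
3) Which to use
On Python 2, both are available; see 2) for the difference.
On Python 3, only input() exists.
raw_input() was removed in Python 3, so new code should target Python 3 and use input().
The Python 2 documentation describes input() as being built on raw_input(). It is equivalent to the following: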

def input(prompt):
    return (eval(raw_input(prompt)))

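In Python 3, input() pauses the program and waits for keyboard input until Enter is pressed. Its argument is the prompt text, and the value it returns is always a string (str). print(), by contrast, accepts strings, numbers, dictionaries, lists, and even instances of user-defined classes.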
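To make the difference concrete, the sketch below imitates the two behaviours on Python 3 without reading from the keyboard. The helper names py2_style_input and py3_style_input are made up for illustration, and the typed text is passed in as an ordinary argument instead of coming from a prompt.

```python
def py2_style_input(typed_text):
    """Imitate Python 2's input(): evaluate the typed text as an expression."""
    return eval(typed_text)


def py3_style_input(typed_text):
    """Imitate Python 3's input(): hand the typed text back unchanged."""
    return typed_text


# A bare word is looked up as a variable name, so it fails.
try:
    py2_style_input("xiaoxi")
except NameError as exc:
    error_message = str(exc)

quoted = py2_style_input("'xiaoxi'")  # quotes make it a string literal
number = py2_style_input("1 + 2")     # an expression is evaluated
plain = py3_style_input("1 + 2")      # stays a string

print(error_message)
print(quoted, number, repr(plain))
```

This also shows why evaluating user input is risky: whatever is typed gets executed as code, which is one reason Python 3 dropped that behaviour.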
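2. Using **

print(welcome_str.format(**welcome_dic))

I was curious about the **, so I removed it and reran the project, which raised an error.
What ** does:
In a function definition, ** in front of a parameter collects the keyword arguments passed by the caller into a dictionary. In a call, as here, ** does the reverse: it unpacks a dictionary into keyword arguments, so format() receives prefix=... and name=... as separate arguments.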
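A small sketch of what the ** changes in the format() call. It uses a fixed dictionary in place of keyboard input; the sample values 'Mr.' and 'xiaoxi' are assumed for illustration.

```python
welcome_str = 'Welcome to the matrix {prefix}{name}.'
welcome_dic = {'prefix': 'Mr.', 'name': 'xiaoxi'}

# With **, the dictionary is unpacked into keyword arguments ...
with_unpacking = welcome_str.format(**welcome_dic)
# ... which is the same as writing them out by hand.
explicit = welcome_str.format(prefix='Mr.', name='xiaoxi')

# Without **, the dictionary arrives as one positional argument,
# so the named field {prefix} cannot be found.
try:
    welcome_str.format(welcome_dic)
except KeyError as exc:
    missing_field = exc.args[0]

print(with_unpacking)
print(explicit == with_unpacking, missing_field)
```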
3. Using str.format()
print(welcome_str.format(**welcome_dic))
Reference: Python format 格式化函数 (the Python format function)
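For orientation, here are a few common ways to fill in the placeholders of str.format(). This is a minimal sketch with made-up sample values, not a full tour of the format mini-language.

```python
# Positional fields are filled in order.
positional = '{} {}'.format('hello', 'world')

# Numbered fields pick arguments by index.
indexed = '{1} {0}'.format('world', 'hello')

# Named fields are filled from keyword arguments.
named = '{name} is {age}'.format(name='xiaoxi', age=18)

# A format spec after the colon controls the presentation.
rounded = '{:.2f}'.format(3.14159)

print(positional, indexed, named, rounded, sep=' | ')
```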

2. File input and output

#!/usr/bin/python
#-*- coding:utf-8 -*-

# Open a file for writing
#fo = open('int1.txt','w') # creates int1.txt in the current directory if it does not exist
#fo.write("hello file\nyou are so good！\n")
#fo.close()
#print ('File name:',fo.name)
#print ('Closed:',fo.closed)
#print ('Access mode:',fo.mode)
#print ('Softspace flag:',fo.softspace) # no longer available in Python 3.x

#fo = open(r'C:\Users\EL113\Desktop\int.txt','r')
#fo = open('C:\\Users\\EL113\\Desktop\\int.txt','r')
fo = open('C:/Users/EL113/Desktop/int.txt','r')
#for foo in  fo.readline():
#    foo = foo.strip('\n')

foo = fo.read()
fo.close()
foo1 = ''.join(ch for ch in foo if ch.isalnum())    # keeps only letters and digits (this line puzzled me at first)
#foo2 = foo1.upper() # lowercase to uppercase
foo2 = foo1.lower() # uppercase to lowercase

print(foo2)

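When opening a file by absolute path, watch how the path is written in Python.

Error message:
    fo = open('C:\Users\EL113\Desktop\int.txt','r')
    ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Process finished with exit code 1

Cause: Windows file paths use \ as the separator, but inside a Python string \ starts an escape sequence: \t is a tab, \n is a newline, and \U begins a Unicode escape, which is why \Users fails. The backslash therefore has to be kept from being read as an escape. There are three ways to do this.
Copyright notice: these solutions are taken from an original article by the CSDN blogger 「可乐饲养员」.

1. Prefix the path with r, which makes it a raw string whose characters keep their literal values.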

fo = open(r'C:\Users\EL113\Desktop\int.txt','r')

2. Use double backslashes.

fo = open('C:\\Users\\EL113\\Desktop\\int.txt','r')

3. Use forward slashes.

fo = open('C:/Users/EL113/Desktop/int.txt','r')
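The three spellings can be compared without touching the file system. The sketch below uses pathlib.PureWindowsPath, which is not part of the original post and is brought in here only as a convenient way to compare Windows paths on any operating system.

```python
from pathlib import PureWindowsPath

raw_path = r'C:\Users\EL113\Desktop\int.txt'         # option 1: raw string
escaped_path = 'C:\\Users\\EL113\\Desktop\\int.txt'  # option 2: doubled backslashes
slash_path = 'C:/Users/EL113/Desktop/int.txt'        # option 3: forward slashes

# Options 1 and 2 produce exactly the same string.
same_string = raw_path == escaped_path

# Option 3 is a different string, but it names the same location on Windows.
same_location = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)

print(same_string, same_location)
```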

Case conversion

foo = fo.read()
foo1 = ''.join(ch for ch in foo if ch.isalnum())    # keeps only letters and digits
#foo2 = foo1.upper() # lowercase to uppercase
foo2 = foo1.lower() # uppercase to lowercase

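Notes:
1. str.join()
join() concatenates a sequence of strings. It takes the elements of a string, tuple, list, or other iterable and joins them into one new string, using the string it is called on as the separator.

# Joining a sequence (with ' ' and ':' as separators)

>>> seq1 = ['hello','good','boy','doiido']
>>> print(' '.join(seq1))
hello good boy doiido
>>> print(':'.join(seq1))
hello:good:boy:doiido

2. str.isalnum()
isalnum() checks whether a string is non-empty and consists only of letters and digits.
It returns a Boolean: True or False.
3. .upper() and .lower()
.upper() returns a copy of the string with lowercase letters converted to uppercase.
.lower() returns a copy of the string with uppercase letters converted to lowercase.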
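Putting the three methods together explains the filtering line in the file example. The sketch below runs it on an in-memory string (the sample text is assumed) so that no file is needed.

```python
text = "Hello, File!\nYou are so GOOD 2 me.\n"

# Iterating over a string yields one character at a time; the generator
# expression keeps only the characters for which isalnum() is True.
kept = ''.join(ch for ch in text if ch.isalnum())

# lower() then normalises the case of what is left.
lowered = kept.lower()

print(kept)
print(lowered)
```

Note that spaces and newlines are not alphanumeric, so they are removed along with the punctuation and the words run together.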

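A simple NLP (natural language processing) task

The basic steps are:
1. Read the file.
2. Remove all punctuation and newlines, and convert uppercase to lowercase.
3. Merge identical words, count how often each word occurs, and sort by frequency from high to low.
4. Write the result to out.txt, one entry per line.

My approach was to work out for myself how to achieve each step, and then compare with the author's code. Some of the author's implementations turned out to be more concise and practical than mine.
However, when I typed the author's code into my editor and ran it, the result did not match the author's output. Something was off, and after some digging I traced it to how re.sub() was being used.
Opening the output file produced by the faulty code showed the following: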

ihaveadreamthatmyfourlittlechildrenwillonedayliveinanationwheretheywillnotbejudgedbythecoloroftheirskinbutbythecontentoftheircharacterihaveadreamtodayihaveadreamthatonedaydowninalabamawithitsviciousracistsonedayrightthereinalabamalittleblackboysandblackgirlswillbeabletojoinhandswithlittlewhiteboysandwhitegirlsassistersandbrothersihaveadreamtodayihaveadreamthatonedayeveryvalleyshallbeexaltedeveryhillandmountainshallbemadelowtheroughplaceswillbemadeplainandthecrookedplaceswillbemadestraightandthegloryofthelordshallberevealedandallfleshshallseeittogetherthisisourhopewiththisfaithwewillbeabletohewoutofthemountainofdespairastoneofhopewiththisfaithwewillbeabletotransformthejanglingdiscordsofournationintoabeautifulsymphonyofbrotherhoodwiththisfaithwewillbeabletoworktogethertopraytogethertostruggletogethertogotojailtogethertostandupforfreedomtogetherknowingthatwewillbefreeonedayandwhenthishappensandwhenweallowfreedomringwhenweletitringfromeveryvillageandeveryhamletfromeverystateandeverycitywewillbeabletospeedupthatdaywhenallofgodschildrenblackmenandwhitemenjewsandgentilesprotestantsandcatholicswillbeabletojoinhandsandsinginthewordsoftheoldnegrospiritualfreeatlastfreeatlastthankgodalmightywearefreeatlast 1

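The text was not split into words and no frequencies were counted. With no separators left between the words, the whole text is treated as a single word that occurs once.
Fix:
In the parse() function, the following line is the problem. The pattern [^\w] also matches spaces, so replacing every match with an empty string removes the word boundaries too.

# Use a regular expression to remove punctuation and newlines
    text = re.sub(r'[^\w]','',text)

Change it to:

# Use a regular expression to replace punctuation and newlines with spaces
    text = re.sub(r'[^\w]',' ',text)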
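The effect of the replacement string is easy to see on a short sample. The sketch below applies both versions of the call to an assumed two-line snippet.

```python
import re

sample = "I have a dream.\nI have a dream today!"

# Replacing with '' deletes spaces as well, because a space is not a \w character.
joined = re.sub(r'[^\w]', '', sample)

# Replacing with ' ' keeps a separator wherever something was removed.
spaced = re.sub(r'[^\w]', ' ', sample)

print(joined)
print(spaced.split())
```

With the empty replacement there is nothing left to split on, so the later word count sees one long "word".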

Reopening the output file now shows:

and 15
be 13
will 11
to 11
…
…
sing 1
exalted 1
for 1
as 1

Full source code:

#-*- coding:utf-8 -*-
import re

def parse(text):
    # Replace punctuation and newlines with spaces using a regular expression
    text = re.sub(r'[^\w]',' ',text)

    # Convert to lowercase
    text = text.lower()

    # Build the list of all words
    word_list = text.split(' ')

    # Drop empty strings
    word_list = filter(None,word_list)

    # Build a dictionary mapping each word to its frequency
    word_cnt = {}
    for word in word_list:
        if word not in word_cnt:
            word_cnt[word] = 0
        word_cnt[word] += 1

    # Sort by frequency, highest first
    sorted_word_cnt = sorted(word_cnt.items(),key = lambda kv: kv[1],reverse=True)

    return sorted_word_cnt

with open('int.txt','r') as  fin:
    text = fin.read()

word_and_freq = parse(text)

with open('outt.txt','w') as fout:
    for word,freq in word_and_freq:
        fout.write('{} {}\n'.format(word,freq))
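As a point of comparison, the counting and sorting steps can also be written with collections.Counter from the standard library. This is an alternative sketch, not the course author's code; the function name count_words and the sample sentence are assumptions, and it works on an in-memory string instead of int.txt.

```python
import re
from collections import Counter


def count_words(text):
    """Return (word, frequency) pairs, most frequent first."""
    # Same cleaning as parse(): non-word characters become spaces.
    cleaned = re.sub(r'[^\w]', ' ', text).lower()
    # split() with no argument drops empty strings, so no filter() is needed.
    words = cleaned.split()
    # most_common() sorts by count, highest first.
    return Counter(words).most_common()


sample = "Free at last! Free at last! Thank God Almighty, we are free at last!"
result = count_words(sample)

for word, freq in result:
    print('{} {}'.format(word, freq))
```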

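Further notes:

1. re.sub(pattern, repl, string, count=0, flags=0) replaces matches of a pattern in a string.
Parameters:
pattern: the regular expression pattern
repl: the replacement string, or a function
string: the original string to search and replace in
count: the maximum number of replacements to make; the default 0 replaces all matches
flags: optional regex flags that modify matching, such as re.IGNORECASE or re.MULTILINE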
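The two optional parameters are easiest to understand by trying them. A minimal sketch with assumed sample strings:

```python
import re

# count=2 stops after the first two replacements.
first_two = re.sub(r'\d', '#', 'a1b2c3', count=2)

# flags=re.IGNORECASE makes the pattern match regardless of case.
any_case = re.sub(r'python', 'Python', 'PYTHON python', flags=re.IGNORECASE)

print(first_two)
print(any_case)
```

Passing count and flags by keyword avoids mixing the two up, since both are integers.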

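Example: repl is a string

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import re
phone = "2004-959-559 # this is a foreign phone number"
# Remove the Python-style comment: #.*$ matches the # and everything after it up to the end of the string
num = re.sub(r'#.*$', "", phone)
print("Phone number: ", num)
# Remove everything that is not a digit, including the hyphens: \D matches any non-digit character, like [^0-9] for ASCII text
num = re.sub(r'\D', "", phone)
print("Phone number: ", num)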

Example: repl is a function (I have not fully understood this example yet)

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import re
# Multiply each matched number by 2
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)
s = 'A23G4HFD567'
print(re.sub(r'(?P<value>\d+)', double, s))
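One way to see what happens when repl is a function is to record each call. In the sketch below, re.sub() calls double once per match and passes it a match object; whatever string the function returns replaces that match. The list named seen is added here purely for tracing.

```python
import re

seen = []  # records the text of each match, for illustration only


def double(matched):
    # (?P<value>...) names the group, so group('value') returns the digits.
    digits = matched.group('value')
    seen.append(digits)
    return str(int(digits) * 2)


result = re.sub(r'(?P<value>\d+)', double, 'A23G4HFD567')

print(seen)
print(result)
```

The three runs of digits are 23, 4 and 567, so the function is called three times and the letters in between are left untouched.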

2. The * symbol
1) Multiplication
2) Repetition of a sequence

def T(msg, time=1):
    print((msg + ' ') * time)
T('hi', 3)

Output:

hi hi hi

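3) A single asterisk
a. In a function definition, *parameter accepts any number of positional arguments and collects them into a tuple.

b. In a function call, putting * in front of a list, tuple, set, dictionary, or other iterable unpacks its elements into separate positional arguments.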

def T2(a,b,c):
    print(a,b,c)
T2(1,2,3)
a = [1,2,3]
b = [2,3,4,5]
c = [3,4,5]
T2(a,b,c)
T2(*a)

Output:

1 2 3
[1, 2, 3] [2, 3, 4, 5] [3, 4, 5]
1 2 3

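For example, given (1,2,3) the interpreter unpacks it automatically and passes the elements to the individual parameters. The number of elements must equal the number of parameters; otherwise the call raises an error.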
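Both uses of the single asterisk, and the error raised on a count mismatch, can be sketched as follows. The function collect is a made-up helper; T2 mirrors the example above but returns its arguments so they can be inspected.

```python
def collect(*parameter):
    # In a definition, * gathers positional arguments into a tuple.
    return parameter


def T2(a, b, c):
    return (a, b, c)


collected = collect(1, 2, 3)
unpacked = T2(*[1, 2, 3])  # in a call, * spreads the list over a, b, c

# Four elements for three parameters cannot be matched up.
try:
    T2(*[2, 3, 4, 5])
except TypeError as exc:
    mismatch_error = str(exc)

print(collected, unpacked)
print(mismatch_error)
```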
4) Two asterisks
In a function definition, **parameter accepts any number of keyword arguments and collects them into a dictionary.

def T3(**p):
    for i in  p.items():
        print(i)

T3(x=1,y=2)

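Output:
('x', 1)
('y', 2)
Running the code again, the order of the output changed:
('y', 2)
('x', 1)
Running it once more, it changed back:
('x', 1)
('y', 2)
This puzzled me at first. It is consistent with running on a Python version before 3.6, where dictionaries were unordered and string hashing is randomized per process, so the iteration order can differ between runs. From Python 3.6 on, **kwargs preserves the order in which the keyword arguments were passed.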
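On a current interpreter the ordering is stable. The sketch below assumes Python 3.7 or later, where dictionaries keep insertion order; T3 is changed to return the pairs so they can be checked, and the settings dictionary is made up for illustration.

```python
def T3(**p):
    # p is an ordinary dict built from the keyword arguments.
    return list(p.items())


# Keyword arguments arrive in the order they were written.
pairs = T3(x=1, y=2)

# ** in a call unpacks a dict, and its order carries over as well.
settings = {'y': 2, 'x': 1}
unpacked = T3(**settings)

print(pairs)
print(unpacked)
```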
When defining a function, remember the colon (:) at the end of the def line; without it Python raises a SyntaxError.
3. .items()
The dictionary method items() returns an iterable view of the dictionary's (key, value) tuples. In Python 2 it returned a list.

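info ={'hello':'python','url':'baidu.com','happy':'time'}
# %s formats an object as a string
print ('Dictionary items: %s'% info.items())
# Iterate over the dictionary's items
for key,values in info.items():
    print (key,values)

Output:

Dictionary items: dict_items([('hello', 'python'), ('url', 'baidu.com'), ('happy', 'time')])
hello python
url baidu.com
happy time

Dropping %s and passing the items as a second argument to print gives the same output, because print() converts each argument with str(), just as %s does, and separates its arguments with a space:

info ={'hello':'python','url':'baidu.com','happy':'time'}
print ('Dictionary items:', info.items())

# Iterate over the dictionary's items
for key,values in info.items():
    print (key,values)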
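The equivalence can be checked directly. The sketch below builds the line three ways and captures what print() writes by redirecting standard output into a string buffer; the variable names and the shortened dictionary are assumptions for illustration.

```python
import io
from contextlib import redirect_stdout

info = {'hello': 'python', 'url': 'baidu.com'}

# %s calls str() on the object ...
via_percent = 'items: %s' % info.items()
# ... which is the same as converting it by hand.
via_str = 'items: ' + str(info.items())

# print() also calls str() on each argument and joins them with a space.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print('items:', info.items())
via_print = buffer.getvalue().rstrip('\n')

print(via_percent)
print(via_percent == via_str == via_print)
```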

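4. lambda
Only the one use of lambda that appears in the code is covered here. For more detail, see: 测试不将就
A lambda function can be passed as an argument to another function, such as sorted.
In that case the lambda specifies the key by which the elements of the list are ordered.
For example, sorted([1, 2, 3, 4, 5, 6, 7, 8, 9], key=lambda x: abs(5-x)) sorts the list [1, 2, 3, 4, 5, 6, 7, 8, 9] by each element's distance from 5, smallest first, giving [5, 4, 6, 3, 7, 2, 8, 1, 9].
5. The sorted function
See the usage in the source code above, or the original author's article.
The source code uses it on the items of a dictionary.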
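Both uses of sorted with a lambda key can be tried on small inputs. The word counts in the sketch are a hand-picked subset assumed for illustration, not the full result of the word-frequency script.

```python
# Sort numbers by their distance from 5; ties keep their original order
# because sorted() is stable.
by_distance = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9], key=lambda x: abs(5 - x))

# Sort (word, count) pairs by the count, highest first, as parse() does.
word_cnt = {'and': 15, 'sing': 1, 'be': 13, 'will': 11}
by_freq = sorted(word_cnt.items(), key=lambda kv: kv[1], reverse=True)

print(by_distance)
print(by_freq)
```

In the second call each element is a (word, count) tuple, so kv[1] picks the count as the sort key.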