Chapter 6: Using Python Modules (II): Anonymous Functions and Built-in Functions

weixin_34015336

Tags: python, awk, java

Original link: http://blog.51cto.com/fenghaining/1955381

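Introduction

A regular expression is a pattern that matches fragments of text. The simplest regular expression is an ordinary string, which matches itself. For example, the regular expression 'hello' matches the string 'hello'.

Note that a regular expression is not a program but a pattern for processing strings. To apply one, you need a tool that supports regular expressions, such as awk, sed, or grep on Linux, or a programming language such as Perl, Python, or Java.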

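Regular expressions come in several flavors. The metacharacters used in this chapter are those supported by programming languages such as Python and Perl.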

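The re module

In Python, regular expressions are available through the re module of the standard library.

One point deserves special attention: regular expressions use \ to escape special characters. To match the string 'python.org', for example, we need the regular expression 'python\.org'. Python string literals also use \ for escaping, so in Python that regular expression has to be written as 'python\\.org', and it is easy to get lost in the backslashes. We therefore recommend Python raw strings: just add an r prefix, and the regular expression above can be written as: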

r'python\.org'
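As a minimal sketch of the point about backslashes, the snippet below compares the two spellings and shows what happens when the dot is left unescaped. The sample text and variable names are made up for illustration, and Python 3 is assumed.

```python
import re

escaped = 'python\\.org'   # ordinary string: the backslash must be doubled
raw = r'python\.org'       # raw string: written exactly as the regex reads

text = 'python.org pythonXorg'   # hypothetical sample text

print(escaped == raw)                  # both spellings are the same string
print(re.findall(raw, text))           # only the literal dot matches
print(re.findall('python.org', text))  # an unescaped dot matches any character
```

The last call returns both words, which is why the dot has to be escaped when a literal '.' is meant.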

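The re module provides a number of useful functions for matching strings, for example:

the compile function

the match function

the search function

the findall function

the finditer function

the split function

the sub function

the subn function

The re module is typically used in the following steps:

Use the compile function to compile the string form of a regular expression into a Pattern object.

Use the methods of the Pattern object to match or search the text and obtain a result (a Match object).

Finally, use the attributes and methods of the Match object to extract information and carry out any further processing.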
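The three steps above can be sketched as follows. The pattern and the sample text are hypothetical, chosen only to show one group-capturing match, and Python 3 is assumed.

```python
import re

# Step 1: compile the pattern string into a Pattern object.
pattern = re.compile(r'(\d+)-(\d+)')

# Step 2: use a Pattern method to obtain a Match object (or None).
match = pattern.search('order 2024-0915 shipped')

# Step 3: read information from the Match object.
if match is not None:
    print(match.group())    # the whole matched text
    print(match.group(1))   # text captured by the first group
    print(match.group(2))   # text captured by the second group
    print(match.span())     # (start, end) indexes of the match
```

Checking for None before touching the Match object matters, because a failed search returns None and calling group() on it raises AttributeError.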

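The compile function

The compile function compiles a regular expression and produces a Pattern object. Its general form is:

re.compile(pattern[, flag])

Here pattern is a regular expression in string form, and flag is an optional argument that sets the matching mode, such as ignoring case or multi-line mode.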
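A short sketch of the optional flag argument, using the two modes mentioned above (re.I for ignoring case, re.M for multi-line mode). The patterns and sample strings are illustrative assumptions.

```python
import re

# re.I ignores case; re.M lets ^ and $ match at every line boundary.
case_insensitive = re.compile(r'python', re.I)
line_starts = re.compile(r'^\w+', re.M)

print(case_insensitive.findall('Python python PYTHON'))
print(line_starts.findall('first line\nsecond line'))

# Flags can be combined with the | operator.
both = re.compile(r'^python', re.I | re.M)
print(both.findall('Python\nPYTHON rocks\nnot python'))
```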

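The match method

The match method looks for a match at the beginning of the string (or at a specified start position). It performs a single match: it returns as soon as one match is found rather than looking for all matches. Its general form is:

match(string[, pos[, endpos]])

Here string is the string to be matched, and pos and endpos are optional arguments giving the start and end positions within the string; they default to 0 and len(string) respectively. So when pos and endpos are omitted, match tests the beginning of the string.

On success it returns a Match object; if nothing matches, it returns None.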
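A minimal sketch of match with and without pos and endpos; the sample string is made up for illustration.

```python
import re

pattern = re.compile(r'\d+')
text = 'one12twothree34four'   # hypothetical sample text

# Without pos, matching starts at index 0, where 'o' is not a digit.
m1 = pattern.match(text)
print(m1)

# With pos=3 and endpos=10, matching starts at the character '1'.
m2 = pattern.match(text, 3, 10)
print(m2.group())
print(m2.span())
```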

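The search method

The search method looks for a match anywhere in the string. It is also a single match: it returns as soon as one match is found rather than looking for all matches. Its general form is:

search(string[, pos[, endpos]])

Here string is the string to be matched, and pos and endpos are optional arguments giving the start and end positions within the string; they default to 0 and len(string) respectively.

On success it returns a Match object; if nothing matches, it returns None.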
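For comparison with match, here is a small sketch of search on the same kind of input; the sample string is again an assumption.

```python
import re

pattern = re.compile(r'\d+')
text = 'one12twothree34four'   # hypothetical sample text

# search scans forward from index 0 and stops at the first match.
first = pattern.search(text)
print(first.group(), first.span())

# Starting the scan at index 10 skips '12' and finds '34' instead.
later = pattern.search(text, 10, 30)
print(later.group(), later.span())

# match on the same text fails, because the string does not start with a digit.
print(pattern.match(text))
```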

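The findall method

The match and search methods above both perform a single match and return as soon as one result is found. Most of the time, however, we need to search the whole string and collect every match.

The findall method has the following form:

findall(string[, pos[, endpos]])

Here string is the string to be matched, and pos and endpos are optional arguments giving the start and end positions within the string; they default to 0 and len(string) respectively.

findall returns all matching substrings as a list; if there is no match, it returns an empty list.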
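A brief sketch of findall, including the pos/endpos form and the empty-list case. The sample strings are illustrative only.

```python
import re

pattern = re.compile(r'\d+')

all_numbers = pattern.findall('123456 789')
print(all_numbers)

# Only the slice [0, 10) of the string is searched here.
limited = pattern.findall('one1two2three3four4', 0, 10)
print(limited)

# No digits at all: the result is an empty list, not None.
nothing = pattern.findall('no digits here')
print(nothing)
```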

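The finditer method

The finditer method behaves much like findall: it also searches the whole string and finds every match. However, it returns an iterator that yields each result (a Match object) in order.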
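A minimal sketch of iterating over the Match objects that finditer yields; the sample text and the results list are assumptions for illustration.

```python
import re

pattern = re.compile(r'\d+')
text = 'hello 123456 789'   # hypothetical sample text

results = []
for m in pattern.finditer(text):
    # Each m is a Match object, so positions are available as well as text.
    print(m.group(), m.span())
    results.append((m.group(), m.span()))
```

Unlike findall, which returns only strings, this gives access to the position of every match.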

Examples:

import re

# Predefined character classes: \w \W \s \S \d \D
print(re.findall(r'\w', 'hello_ | egon 123'))
print(re.findall(r'\W', 'hello_ | egon 123'))
print(re.findall(r'\s', 'hello_ | egon 123 \n \t'))
print(re.findall(r'\S', 'hello_ | egon 123 \n \t'))
print(re.findall(r'\d', 'hello_ | egon 123 \n \t'))  # ['1', '2', '3']
print(re.findall(r'\D', 'hello_ | egon 123 \n \t'))
print(re.findall(r'h', 'hello_ | hello h egon 123 \n \t'))  # ['h', 'h', 'h']
# Anchors: \A and ^ match at the start, \Z and $ at the end
print(re.findall(r'\Ahe', 'hello_ | hello h egon 123 \n \t'))  # ['he']
print(re.findall(r'^he', 'hello_ | hello h egon 123 \n \t'))  # ['he']
print(re.findall(r'123\Z', 'hello_ | hello h egon 123 \n \t123'))  # ['123']
print(re.findall(r'123$', 'hello_ | hello h egon 123 \n \t123'))  # ['123']
print(re.findall(r'\n', 'hello_ | hello h egon 123 \n \t123'))  # ['\n']
print(re.findall(r'\t', 'hello_ | hello h egon 123 \n \t123'))  # ['\t']

# . [] [^]
# . stands for any single character (except a newline, unless re.DOTALL is set)
print(re.findall(r'a.c', 'a a1c a*c a2c abc a c aaaaaac aacc'))
print(re.findall(r'a.c', 'a a1c a*c a2c abc a\nc', re.DOTALL))
print(re.findall(r'a.c', 'a a1c a*c a2c abc a\nc', re.S))  # re.S is an alias of re.DOTALL
# [] may list several characters, but matches exactly one of them
print(re.findall(r'a[0-9][0-9]c', 'a a12c a1c a*c a2c a c a\nc', re.S))  # ['a12c']
print(re.findall(r'a[a-zA-Z]c', 'aac abc aAc a12c a1c a*c a2c a c a\nc', re.S))  # ['aac', 'abc', 'aAc']
print(re.findall(r'a[^a-zA-Z]c', 'aac abc aAc a12c a1c a*c a2c a c a\nc', re.S))
print(re.findall(r'a[\+\/\*\-]c', 'a-c a+c a/c aac abc aAc a12c a1c a*c a2c a c a\nc', re.S))  # ['a-c', 'a+c', 'a/c', 'a*c']
# \ escapes the next character; raw strings keep the backslashes readable
print(re.findall(r'a\\c', r'a\c abc'))  # ['a\\c']

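# ? * + {}: how many times the character on the left may repeat; matching is greedy
# ?: the character on the left occurs 0 or 1 times
print(re.findall(r'ab?', 'aab a ab aaaa'))
# *: the character on the left occurs 0 or more times
print(re.findall(r'ab*', 'a ab abb abbb abbbb bbbbbb'))  # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(re.findall(r'ab{0,}', 'a ab abb abbb abbbb bbbbbb'))  # same as ab*
# +: the character on the left occurs 1 or more times
print(re.findall(r'ab+', 'a ab abb abbb abbbb bbbbbb'))  # ['ab', 'abb', 'abbb', 'abbbb']
print(re.findall(r'ab{1,}', 'a ab abb abbb abbbb bbbbbb'))  # same as ab+
# {n,m}: the character on the left occurs between n and m times
print(re.findall(r'ab{3}', 'a ab abb abbb abbbb bbbbbb'))  # ['abbb', 'abbb']
print(re.findall(r'ab{2,3}', 'a ab abb abbb abbbb bbbbbb'))  # ['abb', 'abbb', 'abbb']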

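# .* versus .*?
# .* is greedy: it matches as much as possible
print(re.findall(r'a.*c', 'a123c456c'))  # ['a123c456c']
# .*? is non-greedy: it matches as little as possible
print(re.findall(r'a.*?c', 'a123c456c'))  # ['a123c']
# |: alternation; the alternatives are tried from left to right
print(re.findall(r'company|companies', 'Too many companies have gone bankrupt, and the next one is my company'))  # ['companies', 'company']
print(re.findall(r'compan|companies', 'Too many companies have gone bankrupt, and the next one is my company'))  # ['compan', 'compan']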

# (): grouping; when the pattern has groups, findall returns the groups, not the whole match
print(re.findall(r'ab+', 'abababab123'))  # ['ab', 'ab', 'ab', 'ab']
print(re.findall(r'ab+123', 'abababab123'))  # ['ab123']
print(re.findall(r'ab', 'abababab123'))
print(re.findall(r'(ab)', 'abababab123'))
print(re.findall(r'(a)b', 'abababab123'))  # ['a', 'a', 'a', 'a']
print(re.findall(r'a(b)', 'abababab123'))  # ['b', 'b', 'b', 'b']
print(re.findall(r'(ab)+', 'abababab123'))  # ['ab']
print(re.findall(r'(?:ab)+', 'abababab123'))  # ['abababab']; (?:...) does not capture
print(re.findall(r'(ab)+123', 'abababab123'))  # ['ab']
print(re.findall(r'(?:ab)+123', 'abababab123'))  # ['abababab123']
print(re.findall(r'(ab)+(123)', 'abababab123'))  # [('ab', '123')]
print(re.findall(r'compan(y|ies)', 'Too many companies have gone bankrupt, and the next one is my company'))  # ['ies', 'y']
print(re.findall(r'compan(?:y|ies)', 'Too many companies have gone bankrupt, and the next one is my company'))  # ['companies', 'company']

# Other functions in re
print(re.findall(r'ab', 'abababab123'))
print(re.search(r'ab', 'abababab123').group())  # ab
print(re.search(r'ab', '12aasssdddssssssss3'))  # None
print(re.search(r'ab', '12aasssdddsssssssab3sssssss').group())  # ab
print(re.search(r'ab', '123ab456'))
print(re.match(r'ab', '123ab456'))  # None; equivalent to re.search(r'^ab', '123ab456')
print(re.split(r'b', 'abcde'))  # ['a', 'cde']
print(re.split(r'[ab]', 'abcde'))  # ['', '', 'cde']
print(re.sub(r'alex', 'SB', 'alex make love alex alex', 1))  # SB make love alex alex
print(re.subn(r'alex', 'SB', 'alex make love alex alex', 1))  # ('SB make love alex alex', 1)
print(re.sub(r'(\w+)(\W+)(\w+)(\W+)(\w+)', r'\5\2\3\4\1', 'alex make love'))  # love make alex
print(re.sub(r'(\w+)( .* )(\w+)', r'\3\2\1', 'alex make love'))  # love make alex
obj = re.compile(r'\d{2}')
print(obj.search('abc123eeee').group())  # 12
print(obj.findall('abc123eeee'))  # ['12']

# Extracting numbers from an arithmetic expression
print(re.findall(r'\-?\d+\.?\d+', "1-12*(60+(-40.35/5)-(-4*3))"))  # ['-12', '60', '-40.35']; single digits are missed
print(re.findall(r'\-?\d+\.?\d*', "1-12*(60+(-40.35/5)-(-4*3))"))  # ['1', '-12', '60', '-40.35', '5', '-4', '3']
print(re.findall(r'\-?\d+\.\d+', "1-12*(60+(-40.35/5)-(-4*3))"))  # ['-40.35']
print(re.findall(r'\-?\d+', "1-12*(60+(-40.35/5)-(-4*3))"))
print(re.findall(r'\-?\d+\.\d+|(\-?\d+)', "1-12*(60+(-40.35/5)-(-4*3))"))
print(re.findall(r'\-?\d+\.\d+|\-?\d+', "1-12*(60+(-40.35/5)-(-4*3))"))
print(re.findall(r'\-?\d+|\-?\d+\.\d+', "1-12*(60+(-40.35/5)-(-4*3))"))

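Built-in functions

Commonly used Python built-in functions (the examples in this section use Python 2 syntax)

cmp(x, y)

The cmp() function (Python 2 only; it was removed in Python 3) compares two objects x and y and returns an integer according to the result: -1 if x < y, 1 if x > y, and 0 if x == y.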
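Python 3 has no cmp(), so the following is only a stand-in sketch that reproduces the behaviour described above; the function body is a commonly used idiom, not part of the original article.

```python
def cmp(x, y):
    """Python 3 stand-in for the Python 2 built-in cmp()."""
    # True and False behave as 1 and 0, so the difference is -1, 0 or 1.
    return (x > y) - (x < y)


print(cmp(1, 2))
print(cmp(2, 1))
print(cmp('a', 'a'))
```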

len(object) -> integer

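len() returns the length of a string or other sequence.

range([lower,]stop[,step])

range() generates an ordered list of consecutive integers from its arguments.

xrange([lower,]stop[,step])

xrange() is similar to range(), but instead of creating a list it returns an xrange object. The object behaves much like a list but computes each value only when it is needed, which saves memory when the sequence is very large.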
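The description above is for Python 2. As a sketch under the assumption of Python 3, where xrange() no longer exists and range() itself returns a lazy object, the same ideas look like this:

```python
import sys

# lower=1, stop=10, step=2; list() materialises the values.
odd_numbers = list(range(1, 10, 2))
print(odd_numbers)

# A huge range costs almost no memory, because values are computed on demand.
big = range(10**9)
print(len(big))
print(big[123456])
print(sys.getsizeof(big) < 1000)   # the object stays small regardless of length
```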

float(x)

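float() converts a number or a string to a floating-point number.

hex(x)

hex() converts an integer to its hexadecimal string representation.

list(x)

list() converts a sequence object to a list. For example:

>>> list("hello world")
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
>>> list((1,2,3,4))
[1, 2, 3, 4]

int(x[,base])

int() converts a number or a string to an integer; base is the optional radix.

>>> int(3.3)
3
>>> int(3L)
3
>>> int("13")
13
>>> int("14",15)
19

min(x[,y,z...])

min() returns the smallest of the given arguments; the arguments may be sequences.

>>> min(1,2,3,4)
1
>>> min((1,2,3),(2,3,4))
(1, 2, 3)

max(x[,y,z...])

max() returns the largest of the given arguments; the arguments may be sequences.

>>> max(1,2,3,4)
4
>>> max((1,2,3),(2,3,4))
(2, 3, 4)

str(obj)

str() converts an object to a printable string.

>>> str("4")
'4'
>>> str(4)
'4'
>>> str(3+2j)
'(3+2j)'

tuple(x)

tuple() converts a sequence object to a tuple.

zip(seq[,seq,...])

zip() combines the corresponding items of two or more sequences and returns them as tuples, stopping once every item of the shortest sequence has been processed.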
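A small sketch of zip() stopping at the shortest input. The sample lists are made up, and Python 3 is assumed, where zip() returns an iterator that list() turns into a list.

```python
names = ['alice', 'bob', 'carol']   # hypothetical sample data
scores = [90, 85]                   # one item shorter than names

pairs = list(zip(names, scores))
print(pairs)   # 'carol' has no partner, so it is dropped

# zip also pairs naturally with dict() to build a mapping.
score_by_name = dict(zip(names, scores))
print(score_by_name)
```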

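replace(string,old,new[,maxreplace])

A string replacement function from the Python 2 string module that replaces occurrences of old in string with new. By default every occurrence of old is replaced; if maxreplace is given, it limits the number of replacements, so with maxreplace set to 1 only the first old is replaced.

>>> import string
>>> a="11223344"
>>> print string.replace(a,"1","one")
oneone223344
>>> print string.replace(a,"1","one",1)
one1223344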
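The string.replace() function shown above exists only in Python 2. As a sketch for Python 3, the equivalent is the str.replace() method, where the optional third argument is the maximum number of replacements:

```python
a = "11223344"

replaced_all = a.replace("1", "one")       # every "1" is replaced
replaced_once = a.replace("1", "one", 1)   # only the first "1" is replaced

print(replaced_all)
print(replaced_once)
print(a)   # strings are immutable, so the original is unchanged
```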

split(string,sep=None,maxsplit=-1)

Returns a list of the substrings of string, using the value of sep as the delimiter.
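In Python 3 the same operation is available as the str.split() method. The sketch below, with made-up sample strings, shows the effect of sep and maxsplit:

```python
line = "a,b,c,d"   # hypothetical sample text

parts = line.split(",")        # split at every comma
limited = line.split(",", 2)   # at most 2 splits; the rest stays together
words = "  two   words ".split()   # sep=None splits on runs of whitespace

print(parts)
print(limited)
print(words)
```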

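Anonymous functions

The name lambda comes from LISP, which in turn took it from the lambda calculus (a form of symbolic logic). In Python, lambda is a keyword that introduces an expression. Compared with a function defined with def, a lambda is a single expression rather than a block of statements. You can only wrap a limited amount of logic in a lambda, and this is by design: lambda is meant purely for writing simple functions, while def handles larger tasks.

When to use functions in a program:

1. A block of code is repeated: a function should be used to reduce redundancy in the program.

2. A block of code is complex: a function can be used to improve the readability of the program.

Python has two kinds of functions: those defined with def, and lambda functions (an expression form that produces a function object; the name lambda was borrowed from LISP, which has a very similar construct).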

For example, to add two numbers, an ordinary function and an anonymous function look like this:

1. def func(x, y): return x + y

2. lambda x, y: x + y
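The two forms above can be tried side by side. This is a minimal sketch: the names add_def, add_lambda and words are made up for illustration, and the sort example is an added use case rather than something from the original article.

```python
def add_def(x, y):
    return x + y


# The lambda expression produces an equivalent function object.
add_lambda = lambda x, y: x + y

print(add_def(2, 3), add_lambda(2, 3))

# A lambda is most useful inline, for example as a sort key.
words = ['pear', 'fig', 'banana']
by_length = sorted(words, key=lambda w: len(w))
print(by_length)
```

Binding a lambda to a name, as add_lambda does, is shown only for comparison; when a function needs a name, def is usually the clearer choice.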

Reposted from: https://blog.51cto.com/fenghaining/1955381
