Python tips and snippets on various topics

Latest recommended article published 2025-08-07 15:08:44

fengye515


License: CC 4.0 BY-SA


Permalink: https://blog.csdn.net/fengye515/article/details/3890740

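This document describes how to use regular expressions in Python, including the meaning of the special characters and the common methods and functions. It also provides several practical code examples, such as extracting hyperlinks from a web page and formatting floating-point output.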


Reposted from: http://wiki.woodpecker.org.cn/moin/PyTips

1. Assorted practical code snippets

1.1. Using regular expressions

I'm working with regular expressions, so I translated part of the Python documentation along the way

::-- ZoomQuiet [2005-04-28 04:15:10]

Date: 2005-4-28 11:08 AM
Subject: [python-chinese] I'm working with regular expressions, so I translated part of the Python documentation along the way
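Most of the rules are the same as in other languages, but there are some differences. I had a task that required regular expressions, so I translated the Python help documentation along the way. It isn't organized very formally, but it is easy enough to understand.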

###########################################################
Special characters:
###########################################################
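   "."      Matches any single character except "\n". With the DOTALL flag it matches any character, including "\n".
   "^"      Matches the start of the string.
   "$"      Matches the end of the string.
   "*"      Matches the preceding subexpression zero or more times (greedy). For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
   "+"      Matches the preceding subexpression one or more times (greedy). For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
   "?"      Matches the preceding subexpression zero or one time (greedy).
   *?,+?,?? Non-greedy versions of the previous three special characters.
   {m,n}    Matches at least m and at most n repetitions (m and n are non-negative integers with m <= n).
   {m,n}?   Non-greedy version of the above.
   "\"      Either escapes special characters or signals a special sequence.
   []       Indicates a set of characters; matches any one character in the set.
            A "^" as the first character indicates a complementing set.
   "|"      A|B matches either A or B.
   (...)    Matches the RE inside the parentheses and captures the match.
            The contents can be retrieved or matched later in the string.
   (?iLmsux) Sets the I, L, M, S, U, or X flag for the RE (see below).
   (?:...)  Non-grouping version of regular parentheses.
   (?P<name>...) The substring matched by the group is accessible by name.
   (?P=name) Matches the text matched earlier by the group named name.
   (?#...)  A comment; ignored.
   (?=...)  Matches if ... matches next, but doesn't consume the string.
   (?!...)  Matches if ... doesn't match next.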

The special sequences consist of "\" and a character from the list
below.  If the ordinary character is not on the list, then the
resulting RE will match the second character.
   \number  Matches the contents of the group of the same number.
   \A       Matches only at the start of the string.
   \Z       Matches only at the end of the string.
   \b       Matches the empty string, but only at the start or end of a word (a word boundary).
   \B       Matches the empty string, but not at the start or end of a word (a non-word boundary).
   \d       Matches a digit character; equivalent to [0-9].
   \D       Matches a non-digit character; equivalent to [^0-9].
   \s       Matches any whitespace character (space, tab, form feed, and so on); equivalent to [ \t\n\r\f\v].
   \S       Matches any non-whitespace character; equivalent to [^ \t\n\r\f\v].
   \w       Matches any word character, including underscore; equivalent to [A-Za-z0-9_].
            With LOCALE, it will match the set [0-9_] plus characters defined
            as letters for the current locale.
   \W       Matches the complement of \w (any non-word character; equivalent to [^A-Za-z0-9_]).
   \\       Matches a literal backslash "\".
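A minimal Python 3 sketch that exercises several of the special characters and sequences listed above with the standard re module. The sample strings are made up for illustration.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy "*" runs to the last possible ">", non-greedy "*?" stops at the first one
greedy = re.match(r'<.*>', html).group()
lazy = re.match(r'<.*?>', html).group()

# (?P<name>...) names a group; (?P=name) must match the same text again later
m = re.search(r'<(?P<tag>\w+)>.*?</(?P=tag)>', html)

# \number refers back to a numbered group: collapse a doubled word
dedup = re.sub(r'\b(\w+) \1\b', r'\1', 'it is is fine')

# (?=...) looks ahead without consuming: numbers followed by "kg"
weights = re.findall(r'\d+(?=kg)', '3kg 5 12kg')

# \b is a zero-width word boundary: the "cat" inside "concat" is skipped
words = re.findall(r'\bcat\b', 'cat concat cat.')

print(greedy, lazy, m.group('tag'), dedup, weights, words, sep='\n')
```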

##########################################################
The module provides the following functions:
##########################################################
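   match    Match a regular expression pattern at the beginning of a string.
   search   Search a string for the presence of a pattern.
   sub      Substitute occurrences of a pattern found in a string.
   subn     Same as sub, but also return the number of substitutions made.
   split    Split a string by the occurrences of a pattern.
   findall  Find all occurrences of a pattern in a string.
   compile  Compile a pattern into a RegexObject.
   purge    Clear the regular expression cache.
   escape   Backslash all non-alphanumerics in a string.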

Some of the functions in this module take flags as optional parameters:
   I  IGNORECASE  Perform case-insensitive matching.
   L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
   M  MULTILINE   "^" matches the beginning of lines as well as the string.
                  "$" matches the end of lines as well as the string.
   S  DOTALL      "." matches any character at all, including the newline.
   X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
   U  UNICODE     Make \w, \W, \b, \B, dependent on the Unicode locale.
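A short Python 3 sketch of the flags in action, using made-up sample text. In Python 3, str patterns are Unicode-aware by default, and re.LOCALE may only be used with bytes patterns.

```python
import re

text = "First line\nsecond LINE\nthird line"

# IGNORECASE: "line" also matches "LINE"
ci = re.findall(r'line', text, re.IGNORECASE)

# MULTILINE: "^" matches at the start of every line, not only of the string
starts = re.findall(r'^\w+', text, re.MULTILINE)

# DOTALL: "." also matches "\n", so one match can span several lines
span = re.search(r'First.*third', text, re.DOTALL)
no_span = re.search(r'First.*third', text)

# Flags combine with "|", or can be set inline with (?im) at the pattern start
both = re.findall(r'^\w+ line$', text, re.I | re.M)
inline = re.findall(r'(?im)^\w+ line$', text)
```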

This module also defines an exception 'error'.

compile(pattern, flags=0)
Compile a regular expression pattern, returning a pattern object.

escape(pattern)
Escape all non-alphanumeric characters in pattern.

findall(pattern, string)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.

finditer(pattern, string)
Return an iterator over all non-overlapping matches in the
string.  For each match, the iterator returns a match object.
Empty matches are included in the result.
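A Python 3 sketch showing how the shape of findall's result depends on the number of groups, and what finditer adds. The sample data is made up.

```python
import re

log = "alice=3, bob=15, carol=7"

# No groups: findall returns the whole matches
whole = re.findall(r'\w+=\d+', log)
# One group: a list of that group's text
names = re.findall(r'(\w+)=\d+', log)
# Two or more groups: a list of tuples
pairs = re.findall(r'(\w+)=(\d+)', log)

# finditer yields match objects, which also carry positions
starts = [(m.group(1), m.start()) for m in re.finditer(r'(\w+)=\d+', log)]

# Empty matches are included: "x*" matches the empty string at every position
empties = re.findall(r'x*', 'ab')
```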

match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning
a match object, or None if no match was found.

purge()
Clear the regular expression cache

search(pattern, string, flags=0)
Scan through string looking for a match to the pattern, returning
a match object, or None if no match was found.
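The difference between match and search in practice, as a small Python 3 sketch. first_number is a hypothetical helper that shows the usual check for None.

```python
import re

s = "order id: 12345"

m1 = re.match(r'\d+', s)    # match() only tries position 0, so this is None
m2 = re.search(r'\d+', s)   # search() scans forward and finds the digits

def first_number(text):
    """Return the first run of digits as an int, or None if there is none."""
    m = re.search(r'\d+', text)
    return int(m.group()) if m else None   # always test for None before .group()
```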

split(pattern, string, maxsplit=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings.

sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.

subn(pattern, repl, string, count=0)
Return a 2-tuple containing (new_string, number).
new_string is the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in the source
string by the replacement repl.  number is the number of
substitutions that were made.

template(pattern, flags=0)
Compile a template pattern, returning a pattern object.
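A Python 3 sketch of sub, subn, split and escape, using made-up sample strings. repl can be a string with group references such as \2, or a function that receives the match object. Two version notes: since Python 3.7, escape() only escapes characters that are special in a regex, and template() was deprecated in Python 3.11.

```python
import re

text = "2005-04-28 and 2005-4-8"

# Group references in repl reorder the captured parts
us_style = re.sub(r'(\d{4})-(\d{1,2})-(\d{1,2})', r'\2/\3/\1', text)

# subn returns (new_string, number_of_substitutions); count limits replacements
first_only = re.subn(r'\d{4}', 'YYYY', text, count=1)

# repl may also be a function that receives the match object
padded = re.sub(r'\b\d\b', lambda m: m.group().zfill(2), text)

# split on a pattern; a capturing group keeps the separators in the result
parts = re.split(r'\s*,\s*', 'a , b,c')
with_seps = re.split(r'(,)', 'a,b')

# escape() makes arbitrary text safe to embed in a pattern
literal = re.compile(re.escape('1+1=2?'))
```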

_______________________________________________
python-chinese list
python-chinese@lists.python.cn
http://python.cn/mailman/listinfo/python-chinese

{ PyRe}

1.2. Automatically checking md5sums

From: steve <lau.siyuan@gmail.com>


    #! /usr/local/bin/python

    import commands
    filename = raw_input("Enter the filename: ")
    expected = raw_input("Enter the md5sum: ").strip()
    md = "md5sum " + filename
    print md
    check = commands.getoutput(md)
    # md5sum prints "<hash>  <filename>" (two spaces between the fields)
    checksum = expected + "  " + filename
    #print checksum
    print check
    if check == checksum: print "Sums OK"
    else: print "Sums are not the same!"
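The script above relies on the external md5sum command. An alternative Python 3 sketch, which is not the original author's approach, computes the digest in-process with the standard hashlib module. md5_of_file and check_md5 are hypothetical helper names.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until f.read() returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def check_md5(path, expected):
    """Compare against a digest such as the first field of md5sum's output."""
    return md5_of_file(path) == expected.strip().lower()
```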

1.3. Extracting hyperlinks from a web page


    import re

    # a: the HTML source of the page, as a string
    r = r'<a(?:(?:\s*.*?\s)|(?:\s+))href=(?P<url>\S*?)(?:(?:\s.*>)|(?:>)).*?</a>'
    links = re.compile(r).findall(a)

hoxide and Tiancheng worked out this method together in a discussion. It extracts the hyperlinks from a web page.
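Regular expressions break easily on real HTML. For example, "." does not cross newlines, and the URL captured above keeps its quotes. As an alternative Python 3 sketch that swaps in a different technique, the standard html.parser module can collect href values. LinkCollector and extract_links are hypothetical names.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; attrs is a list of (name, value)
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Unlike the regex, the parser strips the surrounding quotes, accepts upper-case tags and tolerates attributes spread over several lines.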

1.4. Logging in to a website from Python

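I just read xyb's code and got some ideas from it. I wrote a small snippet to try them out, and it can log in now. :)

import httplib
import urllib

user = "your_name"        # placeholder: your forum login name
pwd = "your_password"     # placeholder: your password
# "option" carries the forum's own submit-button text ("Log in to forum")
params = urllib.urlencode({"Loginname": user, "Loginpass": pwd,
                           "firstlogin": 1, "option": "登入论坛"})
headers = {"Accept": "text/html", "User-Agent": "IE",
           "Content-Type": "application/x-www-form-urlencoded"}
website = "www.linuxforum.net"
path = "/forum/start_page.php"
conn = httplib.HTTPConnection(website)
conn.request("POST", path, params, headers)
r = conn.getresponse()
print r.status, r.reason
data = r.read()
print data
conn.close()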

I wonder what difference there is between submitting the data through a form and sending the request directly.

China Linux Forum
Summarized by xyb: PythonClientCookie
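xyb's summary refers to ClientCookie, a third-party cookie library whose functionality entered the standard library as cookielib in Python 2.4 (http.cookiejar in Python 3). Below is a minimal Python 3 sketch of keeping login cookies across requests. make_session and post_form are hypothetical helper names, and the login URL and form fields are whatever the target site expects.

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_session():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def post_form(opener, url, fields):
    """POST url-encoded form fields; any cookies the server sets land in the jar."""
    data = urllib.parse.urlencode(fields).encode('ascii')
    with opener.open(url, data=data) as resp:
        return resp.status, resp.read()
```

After post_form(opener, login_url, {...}) succeeds, later opener.open(...) calls send the session cookies back automatically. That is what a browser does after a form-based login.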

1.5. Output formatting for floating-point numbers

>>> a=6200-6199.997841
>>> a
0.0021589999996649567
>>> print "%f"%a
0.002159
>>> import fpformat
>>> fpformat.fix(a, 6)
'0.002159'
>>> print fpformat.fix(a, 6)
0.002159
>>> print "%.6f"%a
0.002159
>>> print "%.7f"%a
0.0021590
>>> print "%.10f"%a
0.0021590000
>>> print "%.5f"%a
0.00216
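The session above is Python 2. In Python 3, print is a function and the fpformat module no longer exists. A sketch of the same formatting with format specifications, which yields the same strings as the session:

```python
a = 6200 - 6199.997841

six = f"{a:.6f}"            # replaces fpformat.fix(a, 6)
seven = format(a, ".7f")    # format() takes the same specification
ten = f"{a:.10f}"
five = "%.5f" % a           # the %-operator still works in Python 3

# round() returns a float, not a string
r = round(a, 6)
```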

1.6. How to download an image from the web to a local file

> I know the URL of an image, > e.g. http://www.yahoo.com/images/logo.gif > How can I download it and save it locally?


    import urllib
    urllib.urlretrieve(url, filename)

---Limodou
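In Python 3 the function above lives at urllib.request.urlretrieve and is documented as a legacy interface. A sketch of the same download using urlopen and shutil.copyfileobj; download is a hypothetical helper name. It works for any URL scheme urlopen supports, including file: URLs.

```python
import shutil
import urllib.request

def download(url, filename):
    """Stream the resource at url into filename and return the number of bytes."""
    with urllib.request.urlopen(url) as resp, open(filename, 'wb') as out:
        shutil.copyfileobj(resp, out)   # copies in chunks, not all at once
        return out.tell()
```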

1.7. Using locale to determine the local language and encoding

from:: limodou's study notes

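In Unicode-aware software you often need to convert between various encodings and Unicode.

So when processing a local file, you first read its contents and convert them to Unicode, process them in the program, and then save them back in the original encoding.

If we don't know the file's exact encoding, we can fall back to the default encoding, and the locale module can tell us what that default is.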

>>>import locale
>>>print locale.getdefaultlocale()
('zh_CN', 'cp936')

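As you can see, the default language on my machine is Simplified Chinese and the encoding is cp936, Microsoft's code page for GBK.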
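In Python 3, locale.getdefaultlocale() has been deprecated since 3.11, and text files are decoded when they are opened. Below is a sketch of the read, convert and write-back round trip described above. It assumes the file uses the locale's preferred encoding, as reported by locale.getpreferredencoding(False). read_local_text and write_local_text are hypothetical names.

```python
import locale

def read_local_text(path):
    """Decode a file with the locale's preferred encoding; return (text, encoding)."""
    encoding = locale.getpreferredencoding(False)
    with open(path, encoding=encoding) as f:
        return f.read(), encoding

def write_local_text(path, text, encoding):
    """Save text back in the encoding it was read with."""
    with open(path, 'w', encoding=encoding) as f:
        f.write(text)
```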

1.8. Using __new__

from: China Linux Forum - rings

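__new__

__new__ is a method of object in Python. To override __new__, your class must inherit from object. __new__ does not take a self parameter; its first argument is the class (cls). Strictly speaking, it is a static method that Python special-cases, so you don't need to declare it as one.

__new__ and __init__ are different. __init__ takes self, so it is called after the object has already been constructed. If you want to do something while the object is being constructed, you need __new__. __new__ normally returns an instance of the class; if it returns an object that is not an instance of the class, __init__ is not called.

__new__ is very useful in certain design patterns. Let's look at an example, a Singleton from Thinking in Python: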

class OnlyOne(object):
    class __OnlyOne:
        def __init__(self):
            self.val = None
        def __str__(self):
            return repr(self) + self.val

    instance = None

    def __new__(cls):  # receives the class (cls), not an instance
        if not OnlyOne.instance:
            OnlyOne.instance = OnlyOne.__OnlyOne()
        return OnlyOne.instance

    def __getattr__(self, name):
        return getattr(self.instance, name)

    def __setattr__(self, name, value):
        return setattr(self.instance, name, value)

x = OnlyOne()
x.val = 'sausage'
print x
y = OnlyOne()
y.val = 'eggs'
print y
z = OnlyOne()
z.val = 'spam'
print z
# x, y and z are the same object, so these two also print 'spam'
print x
print y

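We can see that OnlyOne inherits from object. If you don't inherit from object, your __new__ is not called during construction. This applies to Python 2 classic classes; in Python 3 every class inherits from object implicitly.

Executing x = OnlyOne() actually calls __new__(OnlyOne), and this happens every time OnlyOne is instantiated, before any instance exists. The code relies on exactly this to implement the Singleton: however many objects you construct, __new__ is always called.

OnlyOne therefore keeps a class attribute, instance, which holds the instance of the nested __OnlyOne class. That instance is constructed only once, and every later construction simply returns it. So here x, y and z are all the same instance.

This is the same idea as the typical Singleton implementation in C++.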
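In the example above, __new__ returns an object of the nested class rather than an OnlyOne, so OnlyOne's own __getattr__ and __setattr__ are never used. The more common variant below is a Python 3 sketch. It assumes you want one shared instance of the class itself, and it stores the result of super().__new__(cls) on the class.

```python
class Singleton:
    """Every call to Singleton() returns the same instance."""
    _instance = None

    def __new__(cls):
        # __new__ runs on every instantiation, before __init__
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ also runs on every call, so only set defaults the first time
        if not hasattr(self, 'val'):
            self.val = None
```

Because __new__ returns an instance of the class here, Python calls __init__ each time, which is why the guard is needed.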

1.9. Handling tracebacks

from:: Limodou's study notes

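The traceback is very useful in Python: it shows the state of the execution stack when an exception occurs. But when we catch an exception ourselves, usually to do our own error handling, that stack information is no longer shown. If we still want to display it, we need the traceback module.

This is only a brief introduction to the traceback module, not a complete description, and it covers only what I needed; see the documentation for details.

Printing the full traceback

Let's first look at how a traceback is displayed: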

>>> 1/0

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    1/0
ZeroDivisionError: integer division or modulo by zero

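As you can see, the traceback Python displays by default has three parts: a header (the first line), the detailed location of the error (the second and third lines), and the exception message (the fourth line). The traceback module can process each of these parts separately, but I am more interested in the complete display.

The traceback module provides print_exc([limit[, file]]), which prints the same output as above. limit restricts the number of stack entries printed, and file sends the traceback to a file object; by default it goes to standard error. For example: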

>>> import traceback
>>> try:
    1/0
except:
    traceback.print_exc()
 
Traceback (most recent call last):
  File "", line 2, in ?
ZeroDivisionError: integer division or modulo by zero

While an exception is being handled, sys.exc_info() returns information about it. For example:

>>> import sys
>>> try:
    1/0
except:
    sys.exc_info()

(<class exceptions.ZeroDivisionError at 0x00BF4CC0>, 
<exceptions.ZeroDivisionError instance at 0x00E29DC8>, 
<traceback object at 0x00E29DF0>)

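sys.exc_info() returns a tuple of three items: the exception class, the exception instance, and the traceback object.

print_exc() prints the traceback directly. What if we want to get its text instead? Use format_exception(type, value, tb[, limit]), where type, value and tb are the three values returned by sys.exc_info(). For example: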

>>> try:
    1/0
except:
    type, value, tb = sys.exc_info()
    print traceback.format_exception(type, value, tb)

['Traceback (most recent call last):\n', '  File "", line 2, in ?\n', 
'ZeroDivisionError: integer division or modulo by zero\n']

So format_exception returns a list of strings, which we can then use in our own program.
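The sessions above are Python 2. A Python 3 sketch of the same idea wrapped in a small helper (capture_traceback is a hypothetical name). Inside an except block, traceback.format_exc() is a shorter way to get the same joined text.

```python
import sys
import traceback

def capture_traceback(func, *args):
    """Call func(*args); if it raises, return the formatted traceback text."""
    try:
        func(*args)
    except Exception:
        # sys.exc_info() is (type, value, tb) for the exception being handled
        lines = traceback.format_exception(*sys.exc_info())
        return ''.join(lines)
    return None
```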

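1.10. Using os.walk() to fix cvsroot

After I reinstalled the system, the Windows drive letters were reshuffled: what used to be 'e:/cvsroot' is now 'g:/cvsroot', and many CVS-managed directories stopped working. A Python script to the rescue: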


    import os
    import sys
    from os.path import join

    print sys.argv[1]
    for root, dirs, files in os.walk(sys.argv[1]):
        if 'CVS' in dirs:
            fn = join(root, 'CVS', 'ROOT')
            print root + ' :', fn
            #dirs.remove('CVS')  # don't visit CVS directories
            f = open(fn, 'r')
            r = f.read()
            print r
            f.close()
            if r.startswith('e:/cvsroot'):
                # replace only the prefix, keeping the rest (e.g. the newline)
                f = open(fn, 'w')
                f.write('g:/cvsroot' + r[len('e:/cvsroot'):])
                f.close()
                f = open(fn, 'r')
                r = f.read()
                print r
                f.close()
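The commented-out dirs.remove('CVS') line relies on documented os.walk behavior: when walking top-down, modifying dirs in place prunes the walk. A reusable Python 3 sketch with that pruning enabled; fix_cvs_roots is a hypothetical name, and the old and new prefixes are parameters.

```python
import os

def fix_cvs_roots(top, old, new):
    """Rewrite every CVS/ROOT under top that starts with old; return changed paths."""
    changed = []
    for root, dirs, files in os.walk(top):
        if 'CVS' in dirs:
            fn = os.path.join(root, 'CVS', 'ROOT')
            # editing dirs in place stops os.walk from descending into CVS/
            dirs.remove('CVS')
            if not os.path.isfile(fn):
                continue
            with open(fn) as f:
                content = f.read()
            if content.startswith(old):
                with open(fn, 'w') as f:
                    f.write(new + content[len(old):])
                changed.append(fn)
    return changed
```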

2. A comprehensive reference on multi-process programming in Python

* PyCourse --from

http://blog.huangdong.com (HD's personal blog, soon to be history; a moment of silence, everyone)

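3. Turning your Python script into a Windows exe program

from:: http://blog.huangdong.com (HD's personal blog, soon to be history; a moment of silence, everyone)

Turning a Python script into an executable Windows exe has many possible benefits; my favorite is that it makes what you wrote feel more like a real "program". But every advantage has a cost, and this inevitably gives up some of Python's benefits.

You can find information about py2exe, and download the py2exe-0.4.2.win32-py2.3.exe installer, from its website. Using it is still somewhat cumbersome, though: you have to write a small script by hand, like this: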


    # setup.py
    from distutils.core import setup
    import py2exe

    setup(name="myscript",
          scripts=["myscript.py"],
          )

Then build the exe by running:

python setup.py py2exe

For more information, see its website.

4. Examples of using the WinAPI

/PyWinApi -- simple examples

5. Determining a function's caller from inside the function!
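The original entry gives only the heading. One possible Python 3 sketch walks back through frame objects with the inspect module. It assumes CPython, where inspect.currentframe() returns a frame rather than None. get_caller_name, greet and main are hypothetical names.

```python
import inspect

def get_caller_name(depth=1):
    """Name of the function `depth` levels above the one calling this helper."""
    frame = inspect.currentframe()   # may be None outside CPython
    try:
        # step out of get_caller_name itself, then `depth` more levels
        for _ in range(depth + 1):
            frame = frame.f_back
        return frame.f_code.co_name
    finally:
        del frame   # avoid keeping a frame reference cycle alive

def greet():
    return "greet() was called by " + get_caller_name()

def main():
    return greet()
```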

6. Python philosophy: the power of introspection

AlbertLee
Prompted by Xie Yanbo
Remember, Python comes with batteries included!
PyBatteriesIncluded -- using introspection to obtain rich information
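A small Python 3 sketch of the kind of information introspection gives you for free, using only built-ins and the inspect module. describe and area are hypothetical names.

```python
import inspect

def describe(obj):
    """Summarize an object using only built-in introspection."""
    return {
        'type': type(obj).__name__,
        'callable': callable(obj),
        # dir() lists attribute names; skip the _private and __dunder__ ones
        'public': [name for name in dir(obj) if not name.startswith('_')],
        # first line of the cleaned-up docstring, if any
        'doc': (inspect.getdoc(obj) or '').split('\n')[0],
    }

def area(width, height=1):
    """Return width * height."""
    return width * height
```

inspect.signature(area) adds the parameter list, which prints as (width, height=1).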

7. A pitfall when embedding comments in regular expressions

Consider the following code:

>>> import re
>>> s = 'create table testtable'
>>> p = r"""
^create\ table   # create table
\s*              # whitespace
([a-zA-Z]*)      # table name
$                # end
"""
>>> re.compile(p, re.VERBOSE).match(s).groups()
('testtable',)
>>>

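Without the escaped space "\ " between create and table, re.VERBOSE ignores that space. The pattern then effectively becomes "createtable", so it no longer matches: match() returns None, and calling .groups() on it fails.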
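A Python 3 sketch that compares the broken pattern with three ways of keeping a literal space under re.VERBOSE: an escaped space, a space inside a character class (whitespace inside a class is not ignored in verbose mode), or \s+.

```python
import re

s = 'create table testtable'

# Unescaped space: VERBOSE drops it, so this really means "^createtable..."
broken = re.compile(r"""
    ^create table   # this space is silently ignored
    \s*             # whitespace
    ([a-zA-Z]*)     # table name
    $               # end
""", re.VERBOSE)

# Three ways to keep a literal space in verbose mode
fixed = [
    re.compile(r"^create\ table \s* ([a-zA-Z]*) $", re.VERBOSE),  # escaped space
    re.compile(r"^create[ ]table \s* ([a-zA-Z]*) $", re.VERBOSE),  # space in a class
    re.compile(r"^create\s+table \s* ([a-zA-Z]*) $", re.VERBOSE),  # \s+
]

results = [p.match(s).groups() for p in fixed]
```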

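8. A Python program that converts numbers to Chinese numerals

This came from a question asked on QQ by Jaina (16009966), and I spent an evening implementing it. The basic idea is to treat every 4 digits as a segment, convert each segment with conv4, and then combine the segments with conv. The program was tested on Windows 2003 with Python 2.4. Watch out for encoding issues.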

    # coding:utf-8

    UUNIT = [u'', u'十', u'百', u'千']               # units within a 4-digit segment
    BUINT = [u'', u'万', u'亿', u'万亿', u'亿亿']     # unit for each 4-digit segment
    NUM = [u'零', u'一', u'二', u'三', u'四', u'五', u'六', u'七', u'八', u'九']

    def conv4(num, flag=False):
        """Convert a number of at most 4 digits. With flag set (any segment but
        the first), a segment shorter than 4 digits gets a leading 零."""
        ret = u''
        s = str(num)
        l = len(s)
        assert(len(s) <= 4)
        if flag and len(s) < 4:
            ret = ret + NUM[0]
        for i in xrange(l):
            if s[i] != '0':
                ret = ret + NUM[int(s[i])] + UUNIT[l-i-1]
            elif i > 0 and s[i-1] != '0':
                # a run of zeros is read as a single 零
                ret = ret + NUM[0]
        return ret

    def conv(num):
        if num == 0:
            return NUM[0]
        ss = str(num)
        l = len(ss)
        j = l // 4
        jj = l % 4
        # split into 4-digit segments from the right; the first may be shorter
        lss = ([ss[:jj]] if jj else []) + [ss[i*4+jj:(i+1)*4+jj] for i in xrange(j)]
        #print lss  # debug
        ul = len(lss)
        ret = u''
        zflag = False  # set when the previous segment was all zeros
        for i in xrange(ul):
            bu = BUINT[ul-i-1]
            tret = conv4(int(lss[i]), flag=i > 0)
            if tret[-1:] == NUM[0]:
                tret = tret[:-1]
            if tret:
                #print zflag, (tret+bu).encode('mbcs')  # debug
                if zflag and tret[0] != NUM[0]:
                    ret = ret + NUM[0] + tret + bu
                else:
                    ret = ret + tret + bu
                zflag = False
            else:
                zflag = True
        return ret

    if __name__ == '__main__':
        #print conv(11111)
        print conv(103056).encode('mbcs')
        print conv(101000).encode('mbcs')
        print conv(1200999100000000010).encode('mbcs')