Python Tips


Author: bound2020


Article link: https://blog.csdn.net/bound2020/article/details/80079190


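Below are useful, Pythonic tips I have collected from the web and from books; I already use most of them in my own code. Most of them target Python 2.7. I will put together a separate collection for Python 3 later.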

Use a bitwise AND to test whether a number is even:

def is_even(x):
    # the lowest bit is 1 for odd numbers and 0 for even numbers
    return (x & 1) == 0
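A quick sanity check, written for Python 3: the bitwise test agrees with the usual `x % 2 == 0`, including for negative integers, because Python's `&` behaves as if integers had unlimited two's-complement width.

```python
def is_even(x):
    # x & 1 keeps only the lowest bit: 0 for even numbers, 1 for odd ones
    return (x & 1) == 0

# Compare against the modulo-based test over a range that includes negatives
for n in range(-5, 6):
    assert is_even(n) == (n % 2 == 0)

print([n for n in range(-4, 5) if is_even(n)])  # [-4, -2, 0, 2, 4]
```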

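In loops, `while 1` is faster than `while True` in Python 2: there `True` is an ordinary built-in name that is looked up on every iteration, while the constant `1` lets the compiler drop the test entirely. In Python 3, `True` is a keyword constant and both forms compile to the same bytecode.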

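File and path operations:

os.chdir(path)  # change the current working directory
os.path.dirname(path)  # return the directory part of a path
os.path.abspath(path)  # return the absolute path
os.path.isdir(path)  # check whether the path is a directory
os.path.isfile(path)  # check whether the path is a regular file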
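The calls above can be tried out safely on a throwaway directory. This Python 3 sketch creates a temporary tree with `tempfile` (the names `data` and `notes.txt` are made up for the example) and restores the original working directory at the end.

```python
import os
import tempfile

# Build a throwaway directory tree so the example does not touch real files
base = tempfile.mkdtemp()
file_path = os.path.join(base, "data", "notes.txt")
os.makedirs(os.path.dirname(file_path))
with open(file_path, "w") as f:
    f.write("hello\n")

print(os.path.dirname(file_path) == os.path.join(base, "data"))  # True
print(os.path.isdir(os.path.dirname(file_path)))  # True
print(os.path.isfile(file_path))  # True
print(os.path.isfile(os.path.dirname(file_path)))  # False: it is a directory

# abspath resolves a relative path against the current working directory
old_cwd = os.getcwd()
os.chdir(base)
print(os.path.abspath("data") == os.path.join(os.getcwd(), "data"))  # True
os.chdir(old_cwd)  # put the working directory back
```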

Parsing command-line options:

from optparse import OptionParser

option_parser = OptionParser()
option_parser.add_option("-d", "--date", dest="date", help="input date")
option_parser.add_option("-r", "--rb", dest="rb_num", help="rb number")
(options, args) = option_parser.parse_args()
print options.date, options.rb_num
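In Python 3 the same two options can be declared with the standard-library `argparse` module; this is a swapped-in alternative, not the optparse code above. The argument list is passed explicitly (with made-up values) instead of being read from `sys.argv`, so the sketch can run on its own.

```python
import argparse

# Same two options as the optparse version; option names are taken from it
parser = argparse.ArgumentParser(description="example CLI")
parser.add_argument("-d", "--date", dest="date", help="input date")
parser.add_argument("-r", "--rb", dest="rb_num", help="rb number")

# An explicit list instead of sys.argv makes the example self-contained
args = parser.parse_args(["-d", "2018-04-25", "--rb", "42"])
print(args.date, args.rb_num)  # 2018-04-25 42
```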

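Reading a configuration file:

import ConfigParser

parser = ConfigParser.ConfigParser()
parser.read('./conf.ini')
# dbname is the section name; this relies on the options being listed in the order host, port, user, passwd
host, port, user, passwd = [x[1] for x in parser.items(dbname)]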
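A Python 3 sketch with the renamed `configparser` module that reads each option by name, so it does not depend on the order of the lines in the file. The section name `mydb` and the INI contents are hypothetical; `read_string` stands in for `read("./conf.ini")` so the example is self-contained.

```python
import configparser

# Hypothetical contents of conf.ini; section and option names are illustrative
INI_TEXT = """
[mydb]
host = 127.0.0.1
port = 3306
user = reader
passwd = secret
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)  # use parser.read("./conf.ini") for a real file

section = parser["mydb"]
host = section["host"]
port = section.getint("port")  # converts the string "3306" to an int
user = section["user"]
passwd = section["passwd"]
print(host, port, user)  # 127.0.0.1 3306 reader
```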

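Calling a subprocess:

import subprocess

proc = subprocess.Popen(find_cmd, shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE)  # don't forget the two PIPEs, otherwise the output is not captured
cout, cerr = proc.communicate()  # returns only after the process has finished
return_code = proc.poll()  # get the process's return code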
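Python 3.5+ also offers `subprocess.run`, which waits for the process and returns its output and return code in a single object. In this sketch a small Python child process (via `sys.executable`) replaces the original `find_cmd`, so no shell command is assumed.

```python
import subprocess
import sys

# Child process that writes to stdout and stderr, then exits with code 3
child_code = "import sys; print('out'); sys.stderr.write('err\\n'); sys.exit(3)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode the captured bytes to str
)
print(result.returncode)      # 3
print(result.stdout.strip())  # out
print(result.stderr.strip())  # err
```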

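sys.modules is a global dictionary that is created when Python starts and stays in memory. Every module the program imports is recorded in it, so it acts as a cache for module loading: the first import of a module executes it and stores it in sys.modules, and any later import of the same module finds it in the dictionary instead of loading it again, which makes repeated imports cheap.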
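The caching can be observed directly (Python 3): after the first import, the module object is stored in `sys.modules`, and a second import hands back that very same object.

```python
import sys
import json  # first import: the module is loaded and cached

print("json" in sys.modules)  # True

import json as json_again  # second import: served from sys.modules
print(json_again is sys.modules["json"])  # True
```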

Using logging:

import logging

logger = logging.getLogger(__name__)
handler = logging.StreamHandler()  # any Handler works here, e.g. logging.FileHandler('mylog.log')
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# logging.basicConfig changes the output format and destination of the root logger, for example:
logging.basicConfig(level=logging.DEBUG,
                    format='',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='mylog.log')

console = logging.StreamHandler()
# set the level of this handler
console.setLevel(logging.INFO)
formatter = logging.Formatter()
console.setFormatter(formatter)
logging.getLogger().addHandler(console)

logging.debug('This is a debug message')
logging.info('This is an info message')

Using urllib2:

import urllib2

url = ''
user_agent = 'Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.23 Mobile Safari/537.36'
req = urllib2.Request(url)
req.add_header('User-Agent', user_agent)  # add a request header
response = urllib2.urlopen(req)
page = response.read()
print response.geturl()  # the final URL, if there were redirects
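In Python 3, urllib2 was merged into `urllib.request`. This sketch only builds the request and checks its header, so it runs without network access; the URL is a placeholder, and the actual fetch is left commented out.

```python
import urllib.request

# Placeholder URL; nothing is fetched until urlopen() is called
url = "http://example.com/"
user_agent = ("Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/48.0.2564.23 Mobile Safari/537.36")

req = urllib.request.Request(url, headers={"User-Agent": user_agent})
# Request stores header names capitalized, i.e. as "User-agent"
print(req.get_header("User-agent") == user_agent)  # True

# To actually fetch the page (requires network access):
# with urllib.request.urlopen(req, timeout=10) as response:
#     page = response.read()
#     print(response.geturl())  # final URL after redirects
```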

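Writing a crawler that goes through a proxy:

1. Create a ProxyHandler; its argument is a dictionary of the form {'protocol': 'proxy_ip:port'}:
proxy_support = urllib2.ProxyHandler({})
2. Build an opener from it: opener = urllib2.build_opener(proxy_support)
3. Open the URL through the opener: opener.open(url)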

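HTTP status codes: 301 is a permanent redirect, 302 a temporary redirect, 404 means the page does not exist, and 403 means access to the page is forbidden.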
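Python 3.5+ ships these codes as the `http.HTTPStatus` enum, which is handy for turning a numeric status into its standard reason phrase:

```python
from http import HTTPStatus

for code in (301, 302, 403, 404):
    print(code, HTTPStatus(code).phrase)
# 301 Moved Permanently
# 302 Found
# 403 Forbidden
# 404 Not Found
```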

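Keeping cookies across requests with cookielib:

import cookielib
import urllib2

cookie = cookielib.CookieJar()  # stores the cookies the server sets
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open(url)  # later requests through this opener send the stored cookies back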

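Some sites have so-called anti-hotlinking protection, which simply checks whether the Referer header of your request points to the site itself. To get past it, we only need to set the Referer header to that site.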

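The magic of with

The with statement requires an object that supports the context management protocol, which consists of the two methods __enter__ and __exit__. The with statement calls these two methods to enter and exit the runtime context it sets up.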
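A minimal context manager showing both methods in action. `Tracker` is a hypothetical class for illustration: it records when `__enter__` and `__exit__` run, and the example shows that `__exit__` still runs when the body raises.

```python
class Tracker(object):
    """Hypothetical context manager that records enter/exit events."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what gets bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # a false value lets the exception propagate


tracker = Tracker()
try:
    with tracker as t:
        t.events.append("body")
        raise ValueError("boom")
except ValueError:
    pass

print(tracker.events)  # ['enter', 'body', 'exit']
```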

Combine property with decorators to make attributes private (a simpler and safer way to implement get and set methods):

class Student(object):  # inherit from object so property setters work in Python 2
    def __init__(self):
        self._score = 0

    @property
    def score(self):  # getter
        return self._score

    @score.setter
    def score(self, value):  # setter
        # some check
        self._score = value
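The `# some check` placeholder is where validation goes. Here is one possible version; the rule that a score must be an integer between 0 and 100 is an assumption for illustration only.

```python
class Student(object):
    def __init__(self):
        self._score = 0

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        # Hypothetical check: accept only integers between 0 and 100
        if not isinstance(value, int):
            raise TypeError("score must be an integer")
        if not 0 <= value <= 100:
            raise ValueError("score must be between 0 and 100")
        self._score = value


s = Student()
s.score = 95       # assignment goes through the setter
print(s.score)     # 95
try:
    s.score = 120  # rejected by the check
except ValueError as e:
    print(e)       # score must be between 0 and 100
```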

The magic of partial

from functools import partial

def add(a, b):
    return a + b

def test():
    func = partial(add, 2)  # bind the first argument of add to 2
    print func(3)  # 3 becomes the second argument of add
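`partial` can also freeze keyword arguments. A Python 3 sketch using the built-in `int`, whose `base` keyword is fixed to 2 to get a binary-string parser:

```python
from functools import partial

# int() accepts a base keyword; partial freezes it to build a binary parser
parse_binary = partial(int, base=2)
print(parse_binary("1010"))  # 10

# The wrapped function and bound keywords are exposed as attributes
print(parse_binary.func is int, parse_binary.keywords)  # True {'base': 2}
```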

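Converting between unicode and Chinese text in Python is easy to get wrong.

In Python 2, unicode -> str (bytes) uses encode, and str (bytes) -> unicode uses decode.

A Python 2 str is really a byte string, so decode it first to display the actual text.

At the top of the source file, declare its encoding: # -*- coding: utf-8 -*- (needed in Python 2 when the file contains non-ASCII characters; Python 3 assumes UTF-8 by default).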
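In Python 3 the roles are taken by `str` (text) and `bytes`. A round trip with UTF-8 shows that each of these two Chinese characters takes three bytes:

```python
text = "中文"                        # str: a sequence of Unicode code points
data = text.encode("utf-8")          # bytes
print(len(text), len(data))          # 2 6
print(data)                          # b'\xe4\xb8\xad\xe6\x96\x87'
print(data.decode("utf-8") == text)  # True
```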

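Import tricks:

Use __import__ to load a module dynamically, for example when the module name is not known in advance, or when the name comes as a string from a configuration file.

__import__(name[, globals[, locals[, fromlist[, level]]]])

name (required): the name of the module to load

globals (optional): a dictionary of global variables; rarely used, defaults to globals()

locals (optional): a dictionary of local variables; the standard implementation does not use it, defaults to locals()

fromlist (optional): the names of the submodules to import

level (optional): import path option; in Python 2 it defaults to -1, meaning both absolute and relative imports are tried (in Python 3 the default is 0, absolute imports only)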
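One detail that often trips people up: for a dotted name, `__import__` returns the top-level package unless `fromlist` is non-empty. `importlib.import_module` (Python 2.7 and 3) returns the named module directly and is usually the simpler choice. The module name below stands in for a string read from a config file.

```python
import importlib

# Module name given as a string, e.g. read from a configuration file
name = "xml.etree.ElementTree"

# __import__ returns the top-level package unless fromlist is non-empty
top = __import__(name)
leaf = __import__(name, fromlist=["ElementTree"])
print(top.__name__)   # xml
print(leaf.__name__)  # xml.etree.ElementTree

# importlib.import_module returns the named submodule directly
mod = importlib.import_module(name)
print(mod is leaf)    # True
```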

Computing the cosine similarity of two vectors:

import numpy as np

module1 = np.linalg.norm(a)  # length of a
module2 = np.linalg.norm(b)  # length of b
inner_prod = np.dot(a, b)
cos_sim = inner_prod / (module1 * module2)
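The snippet computes cos θ = (a · b) / (|a| |b|). A pure-Python equivalent (no NumPy needed) makes the formula explicit; `cosine_similarity` is a name chosen for this sketch.

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|), computed without NumPy."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [0, 1]))        # 0.0 (orthogonal vectors)
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # about 1.0 (same direction)
```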

Using defaultdict

# The benefit: when adding a new key you don't have to check whether it already exists; just call d[key].append(val)
import collections

d = collections.defaultdict(list)
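A typical use is grouping values by key in one pass. The sample pairs below are made up for illustration (Python 3.7+ for the printed key order).

```python
import collections

pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

d = collections.defaultdict(list)  # a missing key starts as an empty list
for key, val in pairs:
    d[key].append(val)

print(dict(d))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```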

namedtuple

Card = collections.namedtuple('Card', ['color', 'suit'])  # a lightweight class with named fields
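A short Python 3 illustration: a namedtuple instance can be accessed by field name or by index, unpacks like a plain tuple, and `_replace` returns a modified copy.

```python
import collections

Card = collections.namedtuple("Card", ["color", "suit"])

c = Card(color="red", suit="hearts")
print(c.color, c[1])                # red hearts
print(c._replace(suit="diamonds"))  # Card(color='red', suit='diamonds')
color, suit = c                     # unpacks like a plain tuple
```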

Clever uses of zip

# Group every three items into a tuple
l = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
zip(*[iter(l)] * 3)
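The trick works because `[iter(l)] * 3` is a list holding the same iterator three times, so zip pulls three consecutive items for each tuple; leftover items that cannot fill a group are dropped. In Python 3 zip is lazy, so wrap it in `list()`:

```python
l = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]

it = iter(l)
groups = list(zip(*[it] * 3))  # the same iterator passed three times
print(groups)  # [(3, 1, 4), (1, 5, 9), (2, 6, 5), (3, 5, 8)]

# An incomplete trailing group is silently dropped
print(list(zip(*[iter([1, 2, 3, 4])] * 3)))  # [(1, 2, 3)]
```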

pow(x, y, z)  # computes (x ** y) % z, much faster than computing x ** y first for large numbers
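A quick check that three-argument `pow` matches the naive formula. Note that in Python `x ^ y` is bitwise XOR, not exponentiation; the power operator is `**`.

```python
print(pow(3, 4, 5))  # 1, since 3 ** 4 = 81 and 81 % 5 == 1

# Same result as the naive version, without building the huge 7 ** 12345
print(pow(7, 12345, 1009) == 7 ** 12345 % 1009)  # True
```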

Use itertools.chain() to flatten a two-dimensional list into a one-dimensional one:

>>> from itertools import chain
>>> l = [[1, 2], [3, 4]]
>>> list(chain(*l))
[1, 2, 3, 4]
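`chain.from_iterable` does the same without unpacking the outer list with `*`, which also lets it consume the outer sequence lazily:

```python
from itertools import chain

nested = [[1, 2], [3, 4], [5]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```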

Finding the most frequent element in a list, x:

max(set(x), key=x.count)
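`max(set(x), key=x.count)` rescans the list once per distinct element, which is quadratic in the worst case. `collections.Counter` counts everything in a single pass and gives the same answer:

```python
from collections import Counter

x = [1, 3, 2, 3, 1, 3]
print(max(set(x), key=x.count))          # 3
print(Counter(x).most_common(1))         # [(3, 3)]: value 3 appears 3 times
most_frequent = Counter(x).most_common(1)[0][0]
print(most_frequent)                     # 3
```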

Generate n-grams in one line

ngrams = zip(*[words[i:] for i in range(n)])
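Each `words[i:]` is the word list shifted by `i` positions, and zip stops at the shortest shifted copy, so every tuple is a window of `n` consecutive words. A Python 3 run with a made-up sentence:

```python
words = "the quick brown fox jumps".split()
n = 3

ngrams = list(zip(*[words[i:] for i in range(n)]))
print(ngrams)
# [('the', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps')]
```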

A fast way to compute large Fibonacci numbers using a generator:

def fibonacci_generator():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

f = fibonacci_generator()
fib_nums = [f.next() for i in xrange(300)]
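In Python 3, `f.next()` became `next(f)` and `xrange` became `range`. `itertools.islice` is a tidy way to take the first 300 values from the infinite generator:

```python
from itertools import islice

def fibonacci_generator():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib_nums = list(islice(fibonacci_generator(), 300))
print(fib_nums[:10])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(len(fib_nums))  # 300
```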

from operator import itemgetter

metro_data = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

# Sort the records (a lambda would work too)
sorted(metro_data, key=itemgetter(0))  # sort by the first field, i.e. the city name

For example, to sort a dictionary d by value:

sorted(d.items(), key=itemgetter(1), reverse=True)

cc_name = itemgetter(1, 0)  # returns a tuple (country code, city name)
for city in metro_data:
    print cc_name(city)
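The same three uses of `itemgetter` in Python 3, on a trimmed copy of the data above; the `population` dictionary is built from the same figures for the sort-by-value case.

```python
from operator import itemgetter

metro_data = [
    ("Tokyo", "JP", 36.933, (35.689722, 139.691667)),
    ("Delhi NCR", "IN", 21.935, (28.613889, 77.208889)),
    ("Mexico City", "MX", 20.142, (19.433333, -99.133333)),
]

by_name = sorted(metro_data, key=itemgetter(0))
print([c[0] for c in by_name])  # ['Delhi NCR', 'Mexico City', 'Tokyo']

cc_name = itemgetter(1, 0)  # several indexes -> returns a tuple
print([cc_name(c) for c in metro_data])
# [('JP', 'Tokyo'), ('IN', 'Delhi NCR'), ('MX', 'Mexico City')]

population = {"Tokyo": 36.933, "Delhi NCR": 21.935, "Mexico City": 20.142}
ranked = sorted(population.items(), key=itemgetter(1), reverse=True)
print(ranked[0])  # ('Tokyo', 36.933)
```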

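Computing a factorial:

def fact(n):
    # the initial value 1 makes fact(0) return 1 instead of raising on an empty range
    return reduce(lambda a, b: a * b, range(1, n + 1), 1)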
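In Python 3, `reduce` lives in `functools`, and `operator.mul` can replace the lambda. The standard library's `math.factorial` gives the same results and serves as a cross-check:

```python
import math
from functools import reduce  # no longer a builtin in Python 3
from operator import mul

def fact(n):
    # start from 1 so that fact(0) == 1
    return reduce(mul, range(1, n + 1), 1)

print(fact(5))                         # 120
print(fact(0))                         # 1
print(fact(20) == math.factorial(20))  # True
```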

import numpy as np

y = np.empty((10, 1))
y.fill(value)  # allocate a 10x1 array and fill every element with value

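Getting the current time as a formatted string in Python:

time.strftime(ISOTIMEFORMAT, time.localtime())  # ISOTIMEFORMAT is a format string such as '%Y-%m-%d %H:%M:%S'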

Computing a time difference, where the inputs are timestamps in seconds (here c[3] and a[6]):

import datetime

date1 = datetime.datetime.fromtimestamp(float(c[3]))
date2 = datetime.datetime.fromtimestamp(float(a[6]))
print (date2 - date1).total_seconds()  # .seconds alone would drop whole days from the difference
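Why `total_seconds()` rather than `.seconds`: a `timedelta` stores days and seconds separately, so `.seconds` ignores whole days. The two timestamps below are made up; UTC is used so daylight-saving changes cannot affect the result.

```python
import datetime

# Two hypothetical Unix timestamps, 1 day and 30 seconds apart
t1 = 1500000000.0
t2 = t1 + 86400 + 30

utc = datetime.timezone.utc
d1 = datetime.datetime.fromtimestamp(t1, tz=utc)
d2 = datetime.datetime.fromtimestamp(t2, tz=utc)
delta = d2 - d1

print(delta.seconds)          # 30: only the seconds part, the day is dropped
print(delta.total_seconds())  # 86430.0
```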

Check whether a directory exists and create it if it does not:

if not os.path.exists(directory):
    os.makedirs(directory)
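Since Python 3.2, `os.makedirs` accepts `exist_ok=True`, which does the check and the creation in one call and avoids the small race where another process creates the directory between `exists()` and `makedirs()`. The nested path here is a throwaway example under a temporary directory.

```python
import os
import tempfile

directory = os.path.join(tempfile.mkdtemp(), "a", "b", "c")

os.makedirs(directory, exist_ok=True)  # creates a/b/c, including parents
os.makedirs(directory, exist_ok=True)  # calling it again does not raise
print(os.path.isdir(directory))        # True
```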

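from subprocess import check_output  # runs a command, waits for it to finish, and returns its output

np.array_equal(a, b)  # check whether two arrays have the same shape and the same elements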

import seaborn as sns  # seaborn is another data visualization package; it accepts data as a DataFrame
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")

The main approach for visualizing data on this grid is the FacetGrid.map() method: pass it a plotting function and the name(s) of the variable(s) in the DataFrame to plot.

g.map(plt.hist, "tip")

docopt generates an argument parser automatically from the usage text you write, which makes it very easy to build a command-line interface (CLI) for a Python program.

The operator == compares the values of objects, while is compares their identities.
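A small illustration: two lists with equal contents are `==` but not `is`, while two names bound to the same list are both. `id()` exposes the identity that `is` compares.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)          # True: same values
print(a is b)          # False: two distinct list objects
print(a is c)          # True: both names refer to the same object
print(id(a) == id(c))  # True
```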

The presence of references is what keeps an object alive in memory. When the reference count of an object reaches zero, the garbage collector disposes of it.

repr()

Return a string representing the object as the developer wants to see it.

str()

Return a string representing the object as the user wants to see it.
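The difference is visible with `datetime.date`, and a custom class can define both methods. `Point` is a hypothetical example class; its `__repr__` follows the common convention of looking like a constructor call.

```python
import datetime

d = datetime.date(2018, 4, 25)
print(repr(d))  # datetime.date(2018, 4, 25)
print(str(d))   # 2018-04-25


class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):  # for developers: unambiguous, ideally valid Python
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):   # for users: short and readable
        return "(%s, %s)" % (self.x, self.y)


p = Point(1, 2)
print(repr(p))  # Point(1, 2)
print(p)        # (1, 2)
```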