I. Random numbers
1. Generate several unique random numbers within a given range
python - Generate 'n' unique random numbers within a range - Stack Overflow
range(start, stop[, step]) --> values in the half-open interval [start, stop)
>>> import random
>>> random.sample(range(1, 100), 3)
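A minimal sketch of the two properties this snippet relies on: random.sample draws without replacement, and range(1, 100) excludes 100. The seed and the variable name picks are only for illustration.

```python
import random

random.seed(0)  # fixed seed only so the run is repeatable; not required
picks = random.sample(range(1, 100), 3)

# sample() draws without replacement, so the three values are distinct,
# and range(1, 100) covers 1..99 (the stop value 100 is excluded).
print(picks)
print(len(set(picks)) == 3, all(1 <= p <= 99 for p in picks))
```

Asking for more values than the range holds, for example random.sample(range(1, 3), 5), raises ValueError.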
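II. list
1. Difference of two lists via sets
Python list difference, intersection, union and related problems (CGpipelineTD, CSDN blog)
list(set(a) - set(b))  # elements that are in a but not in b; order and duplicates are lost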
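The set-based one-liner does not keep the original order or repeated items. If those matter, a list comprehension is one possible alternative. This is a sketch; the sample lists are made up for illustration.

```python
a = ["x", "y", "x", "z", "w"]
b = ["y", "w"]

# set difference: fast, but order and duplicates in a are not preserved
unordered = list(set(a) - set(b))

# order-preserving variant: keep a's order (and duplicates), use a set for lookups
b_set = set(b)
ordered = [item for item in a if item not in b_set]
print(ordered)  # ['x', 'x', 'z']
```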
2. Get the index of an item in a list
["foo", "bar", "baz"].index("bar")
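list.index raises ValueError when the item is missing, so a lookup on untrusted input usually needs a guard. A small sketch follows; index_or_default is a hypothetical helper name, not a built-in.

```python
items = ["foo", "bar", "baz"]

def index_or_default(seq, value, default=-1):
    """Return the first index of value in seq, or default if it is absent."""
    try:
        return seq.index(value)
    except ValueError:  # list.index raises ValueError for a missing item
        return default

print(index_or_default(items, "bar"))  # 1
print(index_or_default(items, "qux"))  # -1
```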
III. model
1. Save a scikit-learn model
python - Save classifier to disk in scikit-learn - Stack Overflow
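import pickle  # cPickle exists only in Python 2; Python 3 uses pickle
# gnb is assumed to be an already trained classifier
# save the classifier
with open('my_dumped_classifier.pkl', 'wb') as fid:
    pickle.dump(gnb, fid)
# load it again
with open('my_dumped_classifier.pkl', 'rb') as fid:
    gnb_loaded = pickle.load(fid)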
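The same dump/load round trip can be tried without scikit-learn installed. In this sketch a plain dict of made-up fitted parameters stands in for the classifier, and a temporary directory stands in for the real output path; both are assumptions for illustration only.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: any picklable object works the same way
model_state = {"classes": [0, 1], "threshold": 0.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_dumped_classifier.pkl")
    # save: the file must be opened in binary write mode
    with open(path, "wb") as fid:
        pickle.dump(model_state, fid)
    # load: binary read mode
    with open(path, "rb") as fid:
        model_state_loaded = pickle.load(fid)

print(model_state_loaded == model_state)  # True
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.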
IV. dataframe
1. Get the column header names
https://stackoverflow.com/questions/19482970/get-list-from-pandas-dataframe-column-headers
list(data.head(n=0).columns)
2. Read different sheets from an xlsx file
writer = pd.ExcelFile(file)
s2_pd = pd.read_excel(writer,sheet_name="s2")
p3_pd = pd.read_excel(writer,sheet_name="p3_h")
p2_pd = pd.read_excel(writer,sheet_name="p2_h")
writer.close()
3. Add a column to an existing DataFrame
a = pd.DataFrame({"a":[1,2,3],"b":[1,2,3]})
a['c'] = [1,2,3]
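Besides item assignment, pandas offers two other ways to add a column, sketched here under the assumption that pandas is installed. The column names id and c and their values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

# assign() returns a new DataFrame and leaves df untouched
with_c = df.assign(c=[1, 2, 3])

# insert() modifies df in place and lets you choose the column position
df.insert(0, "id", [10, 20, 30])

print(list(with_c.columns))  # ['a', 'b', 'c']
print(list(df.columns))      # ['id', 'a', 'b']
```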
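V. Common Linux commands
1. Disable and re-enable a user
sudo usermod -L -e 1 <username>  # disable the user: lock the password and expire the account
sudo chage -E -1 <username>  # re-enable: clear the account expiry (a password locked with -L still needs usermod -U)
or
sudo usermod -L <username>  # lock the password
sudo usermod -U <username>  # unlock the password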
VI. Polynomial operations
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd
import numpy as np
data = pd.DataFrame.from_dict({
    'x': np.random.randint(low=1, high=10, size=5),
    'y': np.random.randint(low=-1, high=1, size=5),
})
p = PolynomialFeatures(degree=2).fit(data)
# get_feature_names_out replaces get_feature_names, which newer scikit-learn releases removed
print(p.get_feature_names_out(data.columns))
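To see which terms a degree-2 expansion contains, the monomials can be enumerated by hand. This is a plain-Python sketch, not scikit-learn's implementation. poly_feature_names and poly_features are hypothetical helpers, and the naming style ('x^2', 'x y') is assumed to mirror what scikit-learn prints.

```python
from itertools import combinations_with_replacement

def poly_feature_names(columns, degree=2):
    """List monomial names up to `degree`, e.g. '1', 'x', 'x^2', 'x y'."""
    names = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(columns, d):
            if not combo:
                names.append("1")  # bias term
                continue
            parts = []
            for col in dict.fromkeys(combo):  # unique columns, order kept
                power = combo.count(col)
                parts.append(col if power == 1 else f"{col}^{power}")
            names.append(" ".join(parts))
    return names

def poly_features(row, degree=2):
    """Evaluate the same monomials for one row of numbers."""
    values = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            product = 1
            for i in combo:
                product *= row[i]
            values.append(product)
    return values

print(poly_feature_names(["x", "y"]))  # ['1', 'x', 'y', 'x^2', 'x y', 'y^2']
print(poly_features([2, 3]))           # [1, 2, 3, 4, 6, 9]
```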
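VII. float operations
1. Check whether two floats are equal
math.fabs(price - p) < 1e-5  # needs import math; math.isclose(price, p, abs_tol=1e-5) is a similar built-in check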
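A short illustration of why the tolerance is needed: 0.1 + 0.2 is not exactly 0.3 in binary floating point. The values of price and p are made up for the example.

```python
import math

price = 0.1 + 0.2
p = 0.3

print(price == p)                            # False: binary rounding error
print(math.fabs(price - p) < 1e-5)           # True
print(math.isclose(price, p, abs_tol=1e-5))  # True
```

The tolerance 1e-5 comes from the note above; a suitable value depends on the magnitude of the numbers being compared.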
2. Get the indices of the 3 largest values in an array
import heapq
import numpy as np
a = np.array([1,3,2,4,5])
heapq.nlargest(3,range(len(a)), a.take)
Out[5]: [4, 3, 1]
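A numpy-only alternative, as a sketch: sort the indices with argsort, reverse, and keep the first three. The variable name top3 is illustrative.

```python
import numpy as np

a = np.array([1, 3, 2, 4, 5])

# argsort gives ascending order; reverse it and keep the first three positions
top3 = np.argsort(a)[::-1][:3]
print(top3.tolist())  # [4, 3, 1]
```

With tied values the two approaches may return the tied indices in a different order.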
VIII. Common conda virtual environment commands
conda create -n your_env_name python=X.X
IX. Working with images
Get the size of an image in bytes
python - How to get image size (bytes) using PIL - Stack Overflow
Image stored on disk
import os
os.path.getsize('path_to_file.jpg')
Image held in memory
from io import BytesIO
img_file = BytesIO()
image.save(img_file, 'png')
image_file_size = img_file.tell()
X. Inspect system configuration
1. Show the number of logical CPUs
Ubuntu: check the number of CPUs and cores (万俟淋曦的进击手记, CSDN blog)
cat /proc/cpuinfo| grep "processor"| wc -l
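The same number is available from Python through os.cpu_count(). The sketch below also counts the processor entries in /proc/cpuinfo where that file exists, which is a Linux-only assumption; the two values usually agree.

```python
import os

logical_cpus = os.cpu_count()  # may be None if the count cannot be determined
print(logical_cpus)

# On Linux, cross-check against the "processor" entries in /proc/cpuinfo
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        processor_lines = sum(1 for line in f if line.startswith("processor"))
    print(processor_lines)
```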
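XI. numpy operations
1. Get the index of the smallest element
import numpy as np
a = np.array([1,2,3,None])
b = np.where(a==None,np.nan,a).astype(float)  # float dtype keeps the NaN handling explicit
print(np.nanargmin(b))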
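For a plain Python list the same result can be obtained without numpy. This is a sketch; the names values, valid and min_index are illustrative.

```python
values = [1, 2, 3, None]

# indices of the entries that actually hold a number
valid = [i for i, v in enumerate(values) if v is not None]

# pick the valid index whose value is smallest
min_index = min(valid, key=lambda i: values[i])
print(min_index)  # 0
```

If every entry is None, valid is empty and min() raises ValueError, so that case needs its own check.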
XII. pandas performance comparison
#================= Approach 1 =====================
if len(conf_list):
    conf_list = np.array(conf_list)
    thr = np.floor(1000.0 / conf_list[:,2] * conf_list[:,1])
    ins = np.ceil(rate/thr)
    conf_pd = pd.DataFrame({"mps":conf_list[:,0],"batch":conf_list[:,1],"latency":conf_list[:,2],
                            "thr":thr,"ins":ins,"total_mps":ins * conf_list[:,0] })
    conf_pd = conf_pd[conf_pd['latency']>0]
    # 1. select the conf that consumes the minimum amount of resource
    tmp = conf_pd[conf_pd["ins"] <= self.MAX_INS]
    if len(tmp)>0:
        idx = tmp["total_mps"].idxmin()
        data = tmp.loc[[idx]]  # idxmin returns an index label, so use .loc rather than .iloc after filtering
        result = {"total_mps": data["total_mps"], "mps": data["mps"], "ins_num": data["ins"], "batch": data["batch"], "latency": data["latency"], "thr": data["thr"]}
#================== Approach 2 ======================
if len(conf_list) > 0:
    target_mps,target_ins_num,target_batch,target_latency,target_thr,target_total_mps = None,None,None,None,None,None
    for conf in conf_list:
        mps, batch, latency = conf[0], conf[1], conf[2]
        if latency <= 0:
            continue
        thr = math.floor(1000.0 / latency * batch)
        ins = math.ceil(rate / thr)
        if self.MPS_list[0] == 1 and ins > self.MAX_INS:
            continue
        total_mps = ins * mps
        ##print(stage_index,conf,total_mps,mps,ins,thr)
        if target_total_mps is None or total_mps <= target_total_mps:
            target_total_mps = total_mps
            target_batch, target_ins_num, target_latency, target_mps, target_thr = batch, ins, latency, mps, thr
    result = {"total_mps": target_total_mps, "mps": target_mps, "ins_num": target_ins_num, "batch": target_batch, "latency": target_latency, "thr": target_thr}
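When the data set is small, the second approach (the plain loop) is faster than the first (pandas). Note: when using pandas, try to assign all columns at the moment the DataFrame is first constructed.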
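The loop-based second approach can be isolated into a standalone function for experimenting or timing. This is a sketch under assumptions, not the original method. select_config and its parameters are hypothetical names, max_ins stands in for self.MAX_INS, and latency is assumed to be in milliseconds. The self.MPS_list condition is dropped, and a guard against zero throughput is added.

```python
import math

def select_config(conf_list, rate, max_ins):
    """Pick the (mps, batch, latency) entry with the smallest total_mps.

    conf_list: iterable of (mps, batch, latency) tuples.
    rate: required request rate, in the same time unit as the throughput.
    max_ins: upper bound on the instance count (stand-in for self.MAX_INS).
    """
    best = None
    for mps, batch, latency in conf_list:
        if latency <= 0:
            continue
        thr = math.floor(1000.0 / latency * batch)  # throughput of one instance
        if thr <= 0:  # added guard: avoids dividing by zero below
            continue
        ins = math.ceil(rate / thr)  # instances needed to reach rate
        if ins > max_ins:
            continue
        total_mps = ins * mps
        # "<=" keeps the last of several tied entries, as in the loop above
        if best is None or total_mps <= best["total_mps"]:
            best = {"total_mps": total_mps, "mps": mps, "ins_num": ins,
                    "batch": batch, "latency": latency, "thr": thr}
    return best

# Made-up sample configurations for illustration
sample = [(10, 1, 20.0), (20, 4, 25.0), (50, 8, 40.0)]
print(select_config(sample, rate=300, max_ins=4))
```

With the sample input, the first entry needs 6 instances and is rejected, and the second entry wins with a total_mps of 40.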