Utility Code Snippets

武小胖儿

Last modified 2022-05-13 11:08:52

Category: Code. Tags: python

First published 2021-04-09 15:00:17

Article link: https://blog.csdn.net/cleanarea/article/details/100551338

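I. Random numbers

1. Generate several unique random numbers within a range

Reference: python - Generate 'n' unique random numbers within a range - Stack Overflow (https://stackoverflow.com/questions/22842289/generate-n-unique-random-numbers-within-a-range)

range(start, stop[, step]) covers the half-open interval [start, stop), so range(1, 100) yields the integers 1 through 99.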

>>> import random
>>> random.sample(range(1, 100), 3)
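A minimal sketch of the same call as a script, assuming a fixed seed is acceptable for repeatability (the seed value 42 and the names rng, picks, and too_many are illustrative, not from the original). It also shows that random.sample raises ValueError when asked for more values than the range contains.

```python
import random

rng = random.Random(42)                 # illustrative seed so reruns give the same picks
picks = rng.sample(range(1, 100), 3)    # 3 distinct integers drawn from 1..99
print(picks)

too_many = False
try:
    rng.sample(range(1, 3), 5)          # only 2 candidates, 5 requested
except ValueError:
    too_many = True                     # sample() refuses to repeat values
print(too_many)
```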

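II. list

1. Set difference of two lists

Reference: "Difference, intersection, and union of Python lists" (CGpipelineTD, CSDN blog)

list(set(a) - set(b))  # elements that are in a but not in b; duplicates are dropped and order is not preserved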
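When the original order or duplicates matter, a list comprehension over a is one alternative. The sketch below uses made-up sample lists to contrast the two approaches.

```python
a = [3, 1, 2, 3, 5, 1]   # sample data for illustration
b = [1, 4]

unordered = list(set(a) - set(b))            # duplicates removed, order not guaranteed
b_set = set(b)                               # set membership tests are fast
ordered = [x for x in a if x not in b_set]   # keeps a's order and its duplicates

print(sorted(unordered))
print(ordered)
```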

2. Get the index of an item in a list

["foo", "bar", "baz"].index("bar")  # returns 1; raises ValueError if the item is absent
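Because list.index raises ValueError for a missing item, a small wrapper can be convenient. This is only a sketch; index_or_default is a hypothetical helper name, not a built-in.

```python
items = ["foo", "bar", "baz"]

def index_or_default(seq, value, default=-1):
    """Return the first index of value in seq, or default if it is absent."""
    try:
        return seq.index(value)
    except ValueError:
        return default

found = index_or_default(items, "bar")
missing = index_or_default(items, "qux")
print(found, missing)
```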

III. model

1. Save a scikit-learn model

python - Save classifier to disk in scikit-learn - Stack Overflow

import pickle  # cPickle exists only in Python 2; Python 3 uses pickle

# gnb is an already-fitted classifier
# save the classifier
with open('my_dumped_classifier.pkl', 'wb') as fid:
    pickle.dump(gnb, fid)

# load it again
with open('my_dumped_classifier.pkl', 'rb') as fid:
    gnb_loaded = pickle.load(fid)
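The round trip can be tried without scikit-learn installed. In the sketch below a plain dict stands in for the fitted classifier (an assumption for illustration only), and the file goes to a temporary directory. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted classifier; any picklable object works the same way.
model = {"classes": ["spam", "ham"], "weights": [0.25, 0.75]}

path = os.path.join(tempfile.mkdtemp(), "my_dumped_classifier.pkl")

with open(path, "wb") as fid:            # binary mode is required for pickle
    pickle.dump(model, fid, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as fid:
    model_loaded = pickle.load(fid)

print(model_loaded == model)
```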

IV. DataFrame

1. Get the column header names

https://stackoverflow.com/questions/19482970/get-list-from-pandas-dataframe-column-headers
list(data.columns)  # data is an existing DataFrame; same result as list(data.head(n=0).columns)

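2. Read different sheets from an xlsx file

import pandas as pd

# file is the path to the .xlsx workbook; the with block closes it when done
with pd.ExcelFile(file) as xlsx:
    s2_pd = pd.read_excel(xlsx, sheet_name="s2")
    p3_pd = pd.read_excel(xlsx, sheet_name="p3_h")
    p2_pd = pd.read_excel(xlsx, sheet_name="p2_h")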

3. Add a column to an existing DataFrame

a = pd.DataFrame({"a":[1,2,3],"b":[1,2,3]})
a['c'] = [1,2,3]
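A slightly longer sketch of the same idea, assuming pandas is installed. The column names d and e and the variable with_e are illustrative. Direct assignment modifies the DataFrame in place, while assign returns a new one.

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

a["c"] = [1, 2, 3]           # in place; the list length must match the row count
a["d"] = a["a"] + a["b"]     # a column derived from existing columns
with_e = a.assign(e=0)       # returns a new DataFrame; a itself is unchanged

print(list(a.columns))
print(list(with_e.columns))
```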

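V. Common Linux commands

1. Disable and re-enable a user

sudo usermod -L -e 1 <username> # disable the user: lock the password and expire the account

sudo chage -E -1 <username> # remove the account expiry date (run usermod -U as well to undo the password lock)

Or, to lock and unlock only the password:
sudo usermod -L <username>
sudo usermod -U <username>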

VI. Polynomial features

from sklearn.preprocessing import PolynomialFeatures
import pandas as pd
import numpy as np

data = pd.DataFrame.from_dict({
    'x': np.random.randint(low=1, high=10, size=5),
    'y': np.random.randint(low=-1, high=1, size=5),
})

p = PolynomialFeatures(degree=2).fit(data)
# get_feature_names_out replaces get_feature_names, which was removed in scikit-learn 1.2
print(p.get_feature_names_out(data.columns))

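VII. float operations

1. Check whether two floats are equal

from math import fabs
fabs(price - p) < 1e-5  # price and p are the two floats being compared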
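The standard library also provides math.isclose, which supports both relative and absolute tolerances. The sketch below uses sample values and tolerance settings chosen only for illustration.

```python
import math

price = 0.1 + 0.2    # sample values; this sum is not exactly 0.3 in binary floating point
p = 0.3

exact = (price == p)                                           # direct comparison fails here
abs_ok = math.fabs(price - p) < 1e-5                           # fixed absolute tolerance
rel_ok = math.isclose(price, p, rel_tol=1e-9, abs_tol=1e-12)   # relative plus absolute tolerance

print(exact, abs_ok, rel_ok)
```

A fixed absolute tolerance such as 1e-5 suits values of similar magnitude, like prices; a relative tolerance scales better when magnitudes vary widely.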

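2. Get the indices of the three largest values in an array

Reference: "Getting the indices of the n largest elements with numpy" (Neko, CSDN blog)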

import heapq
import numpy as np
a = np.array([1,3,2,4,5])
heapq.nlargest(3,range(len(a)), a.take)
Out[5]: [4, 3, 1]
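The same result can be obtained with numpy alone, assuming numpy is installed. This sketch shows np.argsort, which returns the indices sorted by value, and np.argpartition, which avoids a full sort but does not order the selected indices.

```python
import numpy as np

a = np.array([1, 3, 2, 4, 5])

# Sort indices by value, reverse to descending, keep the first three.
top3_sorted = np.argsort(a)[::-1][:3]

# Partition so the three largest land in the last three slots (in arbitrary order).
top3_any_order = np.argpartition(a, -3)[-3:]

print(top3_sorted.tolist())
print(sorted(top3_any_order.tolist()))
```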

VIII. Common conda virtual environment commands

conda create -n your_env_name python=X.X

IX. Working with images

Get the file size of an image in bytes

python - How to get image size (bytes) using PIL - Stack Overflow

Image stored on disk:

import os
os.path.getsize('path_to_file.jpg')

Image held in memory:

from io import BytesIO
img_file = BytesIO()
image.save(img_file, 'png')
image_file_size = img_file.tell()

X. Check system configuration

1. Count the logical CPUs

Reference: "Checking the number of CPUs and cores on Ubuntu" (万俟淋曦的进击手记, CSDN blog)

cat /proc/cpuinfo | grep "processor" | wc -l
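From Python, the logical CPU count is available through os.cpu_count without reading /proc/cpuinfo. A minimal sketch (the variable name logical_cpus is illustrative):

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it cannot be determined.
logical_cpus = os.cpu_count()
print(logical_cpus)
```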

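XI. numpy operations

1. Get the index of the smallest element, ignoring None

import numpy as np

a = np.array([1, 2, 3, None])
# replace None with nan, then cast the object array to float
b = np.where(a == None, np.nan, a).astype(float)
print(np.nanargmin(b))  # prints 0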

XII. pandas efficiency comparison

#================= Option 1: pandas =====================
# Fragment from a class method; assumes numpy as np, pandas as pd, and that
# conf_list (rows of [mps, batch, latency]) and rate are already defined.
if len(conf_list):
    conf_list = np.array(conf_list)
    thr = np.floor(1000.0 / conf_list[:, 2] * conf_list[:, 1])
    ins = np.ceil(rate / thr)
    conf_pd = pd.DataFrame({"mps": conf_list[:, 0], "batch": conf_list[:, 1], "latency": conf_list[:, 2],
                            "thr": thr, "ins": ins, "total_mps": ins * conf_list[:, 0]})
    conf_pd = conf_pd[conf_pd['latency'] > 0]
    # 1. select the conf that consumes the minimum amount of resource
    tmp = conf_pd[conf_pd["ins"] <= self.MAX_INS]
    if len(tmp) > 0:
        idx = tmp["total_mps"].idxmin()
        # idxmin returns an index label, so look it up with .loc rather than .iloc
        data = tmp.loc[[idx]]
        result = {"total_mps": data["total_mps"], "mps": data["mps"], "ins_num": data["ins"], "batch": data["batch"], "latency": data["latency"], "thr": data["thr"]}
#================== Option 2: plain loop ======================
# Same assumptions as above, plus import math.
if len(conf_list) > 0:
    target_mps, target_ins_num, target_batch, target_latency, target_thr, target_total_mps = None, None, None, None, None, None
    for conf in conf_list:
        mps, batch, latency = conf[0], conf[1], conf[2]
        if latency <= 0:
            continue
        thr = math.floor(1000.0 / latency * batch)
        ins = math.ceil(rate / thr)
        if self.MPS_list[0] == 1 and ins > self.MAX_INS:
            continue
        total_mps = ins * mps
        ##print(stage_index,conf,total_mps,mps,ins,thr)
        if target_total_mps is None or total_mps <= target_total_mps:
            target_total_mps = total_mps
            target_batch, target_ins_num, target_latency, target_mps, target_thr = batch, ins, latency, mps, thr
            result = {"total_mps": target_total_mps, "mps": target_mps, "ins_num": target_ins_num, "batch": target_batch, "latency": target_latency, "thr": target_thr}

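When the amount of data is small, the second approach (the plain loop) is faster than the first (pandas). If you do use pandas, try to assign all columns when the DataFrame is first created rather than adding them afterward.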
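For reference, the loop-based selection can be written as a standalone function. This is a sketch under stated assumptions, not the author's exact method: the function name pick_config, its parameters, and the sample configurations are hypothetical; latency is assumed to be in milliseconds; the MPS_list check from the original is omitted; and ties keep the first candidate rather than the last.

```python
import math

def pick_config(conf_list, rate, max_ins):
    """Return the config with the smallest total_mps, or None if none qualifies.

    conf_list holds (mps, batch, latency) tuples and rate is the required
    throughput. All names here are hypothetical.
    """
    best = None
    for mps, batch, latency in conf_list:
        if latency <= 0:
            continue                                 # skip invalid measurements
        thr = math.floor(1000.0 / latency * batch)   # throughput of one instance
        if thr <= 0:
            continue                                 # avoid dividing by zero below
        ins = math.ceil(rate / thr)                  # instances needed to reach rate
        if ins > max_ins:
            continue
        total_mps = ins * mps
        if best is None or total_mps < best["total_mps"]:
            best = {"total_mps": total_mps, "mps": mps, "ins_num": ins,
                    "batch": batch, "latency": latency, "thr": thr}
    return best

# Made-up sample configurations: (mps, batch, latency)
confs = [(50, 4, 20.0), (100, 8, 25.0), (25, 2, 0.0)]
best = pick_config(confs, rate=500, max_ins=4)
print(best)
```

To check the speed claim on your own data, timing both variants with the timeit module is a reasonable next step.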
