Python
winner8881
Python visualization
1. Parameter control
   1. Figure size: plt.figure(figsize=(20,20))
   2. Line label name: plt.plot(tmp,label=col)
   3. Legend font size and position (no legend is shown by default): plt.legend(loc='center',fontsize=25)
2. An example
import matplotlib.pyplot as plt
arr_plot=[]
plt.figure(figsize=(20...
Original · 2021-03-24 11:50:35 · 86 views · 0 comments
Data mining - mode
# Mode
def get_mode(arr):
    mode = []
    arr_appear = dict((a, arr.count(a)) for a in arr)  # count how often each element occurs
    if max(arr_appear.values()) == 1:  # if the highest count is 1
        return  # there is no mode
    else:
        ...
Original · 2019-08-10 00:10:10 · 371 views · 0 comments
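The snippet above is cut off after the `else:` branch. As a minimal sketch of one way to finish the idea (not necessarily the author's code), the version below uses `collections.Counter` instead of calling `arr.count(a)` once per element, and returns every value tied for the highest count. Returning `None` when all values are unique follows the visible part of the snippet; the empty-input check is an added assumption.

```python
from collections import Counter


def get_mode(arr):
    """Return a list of the most frequent values, or None if there is no mode."""
    if not arr:  # assumption: an empty input has no mode
        return None
    counts = Counter(arr)  # one pass instead of arr.count(a) per element
    top = max(counts.values())
    if top == 1:  # every value appears exactly once, so there is no mode
        return None
    return [value for value, n in counts.items() if n == top]


print(get_mode([1, 2, 2, 3, 3]))  # [2, 3]
print(get_mode([1, 2, 3]))        # None
```

Counting with `Counter` is linear in the input length, whereas `arr.count(a)` inside a loop is quadratic, which matters on large columns.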
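Python - common idioms (updating)
1. map
map() is a built-in higher-order Python function. It takes a function f and an iterable such as a list, and applies f to each element in turn. In Python 2 it returns a new list; in Python 3 it returns a lazy iterator, so wrap the call in list() to get a list.
Original · 2019-08-14 09:51:01 · 150 views · 0 comments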
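A short illustration of `map()` under Python 3 follows. The function `square` and the sample list are made up for the example.

```python
def square(x):
    return x * x


numbers = [1, 2, 3, 4]

lazy = map(square, numbers)  # Python 3: a lazy map object, not a list
squares = list(lazy)         # materialize the results
print(squares)               # [1, 4, 9, 16]

# A map object can be consumed only once; a second pass yields nothing.
print(list(lazy))            # []

# The equivalent list comprehension is often preferred for readability.
squares_again = [square(x) for x in numbers]
```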
Data mining - feature difference encoding
Quick recipe for difference encoding:
1. Take set()
2. Build a pd.DataFrame
3. merge()
arrs = ['adidmd5', 'imeimd5', 'macmd5', 'openudidmd5', 'ip']
val = []
for i in range(len(arrs)):
    val.append(list(set(train[arrs[i]].unique()) & s...
Original · 2019-08-13 12:27:39 · 356 views · 0 comments
Data mining - count features
def cnt_fea(data,feature,train_num):
    data['flag'] = '-'
    for fea in feature:
        print(fea)
        data[fea] = data[fea].map(data[fea].value_counts())
    for i in range(len(feature)-1):...
Original · 2019-08-22 13:19:32 · 515 views · 0 comments
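The visible core of `cnt_fea` replaces each categorical value with the number of times it occurs in its column (`data[fea].map(data[fea].value_counts())`). Below is a pandas-free sketch of that single step using `collections.Counter` on a plain list. The helper name `count_encode` and the sample values are hypothetical, and the rest of the truncated function is not reproduced.

```python
from collections import Counter


def count_encode(values):
    """Replace each value with its frequency in the list."""
    counts = Counter(values)
    return [counts[v] for v in values]


cities = ["bj", "sh", "bj", "gz", "bj", "sh"]  # illustrative data only
print(count_encode(cities))  # [3, 2, 3, 1, 3, 2]
```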
Data mining - CTR features
def ctr_fea(train,test,feature):
    for fea in feature:
        print(fea)
        temp = train[['label',fea]].groupby(fea)['label'].agg({fea+'_sum':sum, ...
Original · 2019-08-22 13:19:46 · 762 views · 0 comments
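The visible part of `ctr_fea` groups the training rows by a feature and sums `label` per group; the remaining aggregations are cut off. As a rough sketch of what a click-through-rate feature usually involves, the pure-Python version below computes clicks, impressions, and their ratio per category. The count and ratio columns are assumptions rather than something the snippet shows, and `ctr_by_category` is a hypothetical helper name.

```python
from collections import defaultdict


def ctr_by_category(categories, labels):
    """Return {category: (clicks, impressions, ctr)} from two parallel lists."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for cat, label in zip(categories, labels):
        clicks[cat] += label      # label is 1 for a click, 0 otherwise
        impressions[cat] += 1
    return {
        cat: (clicks[cat], impressions[cat], clicks[cat] / impressions[cat])
        for cat in impressions
    }


stats = ctr_by_category(["a", "b", "a", "a"], [1, 0, 0, 1])  # made-up data
print(stats["b"])  # (0, 1, 0.0)
```

When running the pandas version on pandas 1.0 or later, note that passing a renaming dict to `agg` on a grouped column raises `SpecificationError`; named aggregation such as `.agg(label_sum='sum')` is the supported spelling.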
IP processing
import numpy as np
import pandas as pd
a=np.load('ip_explain_by_geoip2_china.npy',allow_pickle=True)
ip_exp=a.item()
temp = pd.DataFrame(list(ip_exp.items()), columns=['ip', 'ip_exp'])
temp[['country','province_exp','c...
Original · 2019-08-13 12:26:43 · 151 views · 0 comments
Data mining - the geoip2 tool
import geoip2.database
import sys
# ip = input()
ip = '210.32.149.0'
reader = geoip2.database.Reader('./GeoLite2-City.mmdb')
data = reader.city(ip)
def ip_explain(ip):
    data = reader.city(ip)
    ...
Original · 2019-08-13 12:26:09 · 183 views · 0 comments
Data mining - plotting and saving the training and test sets
# train = data[data.label!=-1]
# test = data[data.label==-1]
# train = train.dropna()
# test = test.dropna()
# # for i in data.columns:
# for i in ['city','lan', 'os', 'osv', 'ver', 'orientation', 'ca...
Original · 2019-08-02 16:56:37 · 478 views · 0 comments
Data mining - plotting and saving positive and negative samples
# train_pos = data[data['label']==1]
# train_neg = data[data['label']==0]
# train = train.dropna()
# test = test.dropna()
# for i in ['city','lan', 'os', 'osv', 'ver', 'orientation', 'carrier', 'ntt',...
Original · 2019-08-02 16:55:28 · 372 views · 0 comments
Data mining - feature_importance
# Feature importance
import matplotlib.pyplot as plt
import seaborn as sns
cols = (feature_importance_data[["feature", "importance"]]
        .groupby("feature")
        .mean()
        .sort_values(by="importance...
Original · 2019-08-02 16:51:06 · 273 views · 0 comments
Building a fraud blacklist
import numpy as np
# a=np.load('ip_dict.npy',allow_pickle=True)
# data=a.item()
temp = train[['ip','label']].groupby('ip')['label'].agg({'mean_label':'mean','count_label':'count','sum_label':'sum'}...
Original · 2019-08-12 14:05:55 · 148 views · 0 comments
Data mining - common idioms (continuously updat...)
1. Sorting
train_new.sort_values(by='imeimd5')
train_new.sort_values(by='imeimd5')['imeimd5'].max()
train_ime = train_new['imeimd5'].unique()
2. Progress bar for iterators: tqdm
tqdm
cnt = 0
for i in tqdm.tqdm_notebook(test_ne...
Original · 2019-08-13 12:25:46 · 198 views · 0 comments
Data mining - trimming the long tail
# def cut_col(data, col_name, cut_list):
#     print('cutting', col_name)
#     def _trans(array):
#         count = array['box_counts']
#         for box in cut_list:
#             if count <= bo...
Original · 2019-08-02 16:45:09 · 470 views · 0 comments
Looking up information by IP
import requests
import IPy
def get_location(ip):
    url = 'https://sp0.baidu.com/8aQDcjqpAAV3otqbppnN2DJv/api.php?co=&resource_id=6006&t=1529895387942&ie=utf8&oe=gbk&c...
Original · 2019-08-02 16:43:01 · 439 views · 0 comments
Matrix transpose in Python 3
Code
a = [
    [1,3,1],
    [1,5,1],
    [4,2,1]
]
print(a)
print(list(map(list,zip(*a))))
print(zip(*a))
print(list(zip(*a)))
print(tuple(zip(*a)))
Output:
Original · 2019-03-20 19:00:50 · 1972 views · 0 comments
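The output of the snippet is cut off in this listing. As a reading aid, the sketch below shows what the main expressions evaluate to: `zip(*a)` unpacks the rows as separate arguments, so `zip` pairs up the elements column by column.

```python
a = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]

# zip(*a) is zip([1, 3, 1], [1, 5, 1], [4, 2, 1]): it yields one tuple per column.
as_lists = list(map(list, zip(*a)))  # transposed rows as lists
as_tuples = list(zip(*a))            # transposed rows as tuples

print(as_lists)   # [[1, 1, 4], [3, 5, 2], [1, 1, 1]]
print(as_tuples)  # [(1, 1, 4), (3, 5, 2), (1, 1, 1)]

# print(zip(*a)) shows only the repr of a lazy zip object (with an address that
# varies per run), because zip returns an iterator in Python 3.
```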
Pythonic idioms
Sort a list of strings lexicographically
d = ["ale","apple","monkey","plea"]
d = sorted(d, key = lambda x :str(x))
Sort a list of strings by length
d = ["ale","apple","monkey","plea"]
d = sorted(d, key = lambda x :len(x))
...
Original · 2019-03-19 15:44:01 · 131 views · 0 comments
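Both sorts above can be written a little more directly. The sketch below uses the same list, plus one extra made-up list to show a combined key.

```python
d = ["ale", "apple", "monkey", "plea"]

# Strings already compare lexicographically, so no key function is needed.
by_alpha = sorted(d)

# The built-in len can be passed directly as the key.
by_length = sorted(d, key=len)

print(by_alpha)   # ['ale', 'apple', 'monkey', 'plea']
print(by_length)  # ['ale', 'plea', 'apple', 'monkey']

# A tuple key sorts by length first and breaks ties alphabetically.
by_both = sorted(["pear", "fig", "plea", "ale"], key=lambda s: (len(s), s))
print(by_both)    # ['ale', 'fig', 'pear', 'plea']
```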