1. Understanding the Task and Baseline
Source
Datawhale Session 23, Data Mining: Heartbeat Detection:
https://github.com/datawhalechina/team-learning-data-mining/tree/master/HeartbeatClassification
Authors: 鱼佬, 杜晓东, 张晋, 王皓月, 牧小熊, 姚昱君, 杨梦迪
Forum: http://datawhale.club/t/topic/1574
import pandas as pd
import numpy as np
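1.1 Understanding the Task
- Task
ECG sensor data → predict the class → a multi-class classification problem
- Dataset
100,000 training samples and 20,000 test samples. Fields and descriptions: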
Field | Description |
---|---|
id | Unique identifier assigned to each heartbeat signal |
heartbeat_signals | Heartbeat signal sequence (values separated by ",") |
label | Heartbeat signal class (0, 1, 2, 3) |
- Evaluation metric
Let:
the predicted probabilities for the 4 heartbeat signal classes be $[a_1, a_2, a_3, a_4]$
the encoded true values for the 4 heartbeat signal classes be $[y_1, y_2, y_3, y_4]$
The metric is abs-sum (smaller is better):
$$\text{abs-sum} = \sum_{j=1}^{n} \sum_{i=1}^{4} \left| y_i - a_i \right|$$
Here the outer sum runs over the $n$ samples being scored, and $y_i$, $a_i$ are the values for the $j$-th sample.
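As a quick worked example (the numbers are made up for illustration), take a single sample whose true label is 2, so its encoding is $[0, 0, 1, 0]$, and suppose the model predicts $[0.1, 0.1, 0.7, 0.1]$. Its contribution to abs-sum would be:

```latex
\sum_{i=1}^{4} \left| y_i - a_i \right|
  = |0 - 0.1| + |0 - 0.1| + |1 - 0.7| + |0 - 0.1|
  = 0.6
```

A perfect prediction contributes 0, and putting all probability on a wrong class contributes 2, so each sample's share of the score lies between 0 and 2.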
1.2 Baseline
(Only part of the code is shown here, kept as notes.)
# Simple preprocessing: split each comma-separated signal string into one float column per time step
train_list = []
for items in train.values:
    # items = [id, heartbeat_signals, label]
    train_list.append([items[0]] + [float(i) for i in items[1].split(',')] + [items[2]])
train = pd.DataFrame(np.array(train_list))
train.columns = ['id'] + ['heart_' + str(i) for i in range(len(train_list[0]) - 2)] + ['label']
# reduce_mem_usage shrinks the DataFrame's memory footprint by adjusting column dtypes
train = reduce_mem_usage(train)
id heart_0 heart_1 heart_2 heart_3 heart_4 heart_5 heart_6 heart_7 heart_8 heart_9 heart_10 heart_11 heart_12 heart_13 heart_14 heart_15 heart_16 heart_17 heart_18 heart_19 heart_20 heart_21 heart_22 heart_23 heart_24 heart_25 heart_26 heart_27 heart_28 heart_29 heart_30 heart_31 heart_32 heart_33 heart_34 heart_35 heart_36 heart_37 heart_38 heart_39 heart_40 heart_41 heart_42 heart_43 heart_44 heart_45 heart_46 heart_47 heart_48 heart_49 heart_50 heart_51 heart_52 heart_53 heart_54 heart_55 heart_56 heart_57 heart_58 heart_59 heart_60 heart_61 heart_62 heart_63 heart_64 heart_65 heart_66 heart_67 heart_68 heart_69 heart_70 heart_71 heart_72 heart_73 heart_74 heart_75 heart_76 heart_77 heart_78 heart_79 heart_80 heart_81 heart_82 heart_83 heart_84 heart_85 heart_86 heart_87 heart_88 heart_89 heart_90 heart_91 heart_92 heart_93 heart_94 heart_95 heart_96 heart_97 heart_98 heart_99 heart_100 heart_101 heart_102 heart_103 heart_104 heart_105 heart_106 heart_107 heart_108 heart_109 heart_110 heart_111 heart_112 heart_113 heart_114 heart_115 heart_116 heart_117 heart_118 heart_119 heart_120 heart_121 heart_122 heart_123 heart_124 heart_125 heart_126 heart_127 heart_128 heart_129 heart_130 heart_131 heart_132 heart_133 heart_134 heart_135 heart_136 heart_137 heart_138 heart_139 heart_140 heart_141 heart_142 heart_143 heart_144 heart_145 heart_146 heart_147 heart_148 heart_149 heart_150 heart_151 heart_152 heart_153 heart_154 heart_155 heart_156 heart_157 heart_158 heart_159 heart_160 heart_161 heart_162 heart_163 heart_164 heart_165 heart_166 heart_167 heart_168 heart_169 heart_170 heart_171 heart_172 heart_173 heart_174 heart_175 heart_176 heart_177 heart_178 heart_179 heart_180 heart_181 heart_182 heart_183 heart_184 heart_185 heart_186 heart_187 heart_188 heart_189 heart_190 heart_191 heart_192 heart_193 heart_194 heart_195 heart_196 heart_197 heart_198 heart_199 heart_200 heart_201 heart_202 heart_203 heart_204 label
0 0.0 0.991211 0.943359 0.764648 0.618652 0.379639 0.190796 0.040222 0.026001 0.031708 0.065552 0.125488 0.146729 0.167603 0.193359 0.226074 0.221191 0.236084 0.221191 0.221191 0.211060 0.208618 0.193359 0.195923 0.198486 0.185669 0.195923 0.183105 0.193359 0.190796 0.208618 0.221191 0.250732 0.260498 0.277588 0.294189 0.303711 0.336426 0.347900 0.384033 0.386230 0.408447 0.410645 0.426025 0.426025 0.429199 0.432373 0.427979 0.427979 0.412842 0.415039 0.397461 0.401855 0.388672 0.404053 0.395264 0.398438 0.401855 0.417236 0.434570 0.430176 0.432373 0.415039 0.434570 0.426025 0.397461 0.356934 0.345703 0.334229 0.317871 0.320068 0.317871 0.334229 0.324951 0.334229 0.324951 0.336426 0.327148 0.331787 0.336426 0.327148 0.334229 0.334229 0.345703 0.343262 0.354736 0.334229 0.334229 0.406250 0.495850 0.580078 0.692383 0.828613 0.946777 1.000000 0.978027 0.853027 0.692383 0.518555 0.265381 0.180542 0.090393 0.000000 0.011612 0.014496 0.076599 0.117493 0.154663 0.183105 0.216187 0.208618 0.212402 0.216187 0.208618 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 1.0 0.971680 0.929199 0.572754 0.178467 0.122986 0.132324 0.094421 0.089600 0.030487 0.040497 0.020386 0.027969 0.035492 0.015320 0.045471 0.035492 0.035492 0.025452 0.030487 0.030487 0.030487 0.030487 0.025452 0.020386 0.015320 0.040497 0.025452 0.035492 0.065247 0.089600 0.099182 0.141724 0.141724 0.144043 0.146362 0.178467 0.192017 0.192017 0.205444 0.223145 0.227539 0.218750 0.231934 0.209839 0.203247 0.196533 0.164795 0.155640 0.151001 0.146362 0.164795 0.146362 0.137085 0.141724 0.127686 0.130005 0.132324 0.113525 0.137085 0.127686 0.146362 0.141724 0.137085 0.141724 0.155640 0.104004 0.118225 0.132324 0.127686 0.141724 0.122986 0.118225 0.113525 0.127686 0.089600 0.118225 0.099182 0.101562 0.104004 0.118225 0.122986 0.104004 0.127686 0.122986 0.127686 0.113525 0.137085 0.132324 0.137085 0.141724 0.137085 0.164795 0.169312 0.178467 0.214355 0.333252 0.534668 0.796387 1.000000 0.938477 0.874023 0.431641 0.146362 0.182983 0.164795 0.108765 0.104004 0.070129 0.050446 0.030487 0.025452 0.020386 0.000000 0.025452 0.005123 0.015320 0.020386 0.045471 0.035492 0.050446 0.050446 0.025452 0.000000 0.000000 0.000000 0.0 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 2.0 1.000000 0.958984 0.701172 0.231812 0.000000 0.080688 0.128418 0.187500 0.280762 0.328369 0.320557 0.322510 0.324463 0.322510 0.332031 0.324463 0.316650 0.320557 0.316650 0.322510 0.318604 0.310547 0.314697 0.318604 0.328369 0.332031 0.333984 0.345703 0.372314 0.392822 0.400391 0.413086 0.426025 0.428711 0.431396 0.416748 0.413086 0.405762 0.405762 0.391113 0.394775 0.387207 0.383545 0.381592 0.378906 0.375977 0.385498 0.385498 0.385498 0.389160 0.389160 0.385498 0.379883 0.383545 0.396729 0.395752 0.394775 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0
3 3.0 0.975586 0.934082 0.659668 0.249878 0.237061 0.281494 0.249878 0.249878 0.241455 0.230713 0.224243 0.228516 0.232788 0.234985 0.226318 0.215576 0.215576 0.217651 0.208984 0.195801 0.195801 0.191406 0.184692 0.178101 0.169067 0.164551 0.162354 0.137329 0.144165 0.137329 0.114136 0.090576 0.069092 0.064270 0.059387 0.034943 0.005043 0.000000 0.017578 0.042328 0.085815 0.128052 0.169067 0.206787 0.216553 0.226318 0.245605 0.266846 0.273193 0.268799 0.270996 0.289795 0.298096 0.293945 0.287598 0.290771 0.293945 0.285645 0.287598 0.298096 0.310303 0.314209 0.308105 0.312256 0.322510 0.320312 0.317383 0.314209 0.304199 0.312256 0.310303 0.300049 0.306152 0.308105 0.310303 0.302002 0.298096 0.296875 0.295898 0.295898 0.289795 0.298096 0.318359 0.330566 0.334473 0.328369 0.336426 0.334473 0.327393 0.320312 0.322510 0.326416 0.328369 0.316406 0.310303 0.314209 0.312256 0.300049 0.295898 0.300049 0.304199 0.298096 0.291748 0.291748 0.308105 0.314209 0.298096 0.310303 0.352539 0.360352 0.354492 0.348389 0.344482 0.342529 0.338623 0.324463 0.324463 0.332520 0.336426 0.318359 0.281494 0.321533 0.360352 0.612305 0.884277 1.0 0.951172 0.61084 0.208984 0.247803 0.306152 0.285645 0.283447 0.281494 0.281494 0.283447 0.279297 0.264648 0.258301 0.260498 0.251953 0.241455 0.239258 0.233887 0.228516 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 4.0 0.000000 0.055817 0.261230 0.359863 0.433105 0.453613 0.499023 0.542969 0.616699 0.676758 0.737793 0.755371 0.772949 0.774414 0.786133 0.791016 0.810059 0.821289 0.841797 0.862793 0.874023 0.889160 0.899414 0.909180 0.913574 0.919434 0.919434 0.912598 0.897461 0.892578 0.866211 0.840332 0.828613 0.815430 0.801758 0.772949 0.773926 0.744629 0.736328 0.728027 0.732422 0.705078 0.709473 0.707520 0.710938 0.713867 0.708984 0.709961 0.703125 0.705078 0.700195 0.698242 0.695312 0.696777 0.701172 0.700195 0.699219 0.694824 0.687012 0.675781 0.662598 0.662598 0.669922 0.656250 0.668945 0.666504 0.666016 0.666016 0.658203 0.649902 0.644531 0.658691 0.643555 0.658203 0.648926 0.663574 0.669434 0.665039 0.660645 0.685547 0.765625 0.841309 0.911621 0.990723 1.000000 0.858398 0.592773 0.533691 0.564941 0.595215 0.606934 0.626465 0.623535 0.625000 0.627441 0.645020 0.633301 0.645020 0.644531 0.357910 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0
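The notes call reduce_mem_usage without showing it. The following is only a minimal sketch of the idea (downcasting each numeric column to the smallest dtype that covers its value range), not the baseline's actual function; the name reduce_mem_usage_sketch and the demo values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def reduce_mem_usage_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to smaller dtypes (hypothetical, simplified helper)."""
    df = df.copy()
    for col in df.columns:
        series = df[col]
        if pd.api.types.is_integer_dtype(series):
            # Let pandas pick the smallest integer dtype that fits the values.
            df[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            c_min, c_max = series.min(), series.max()
            # Try float16 first, then float32; keep float64 if neither fits.
            for dtype in (np.float16, np.float32):
                info = np.finfo(dtype)
                if c_min > info.min and c_max < info.max:
                    df[col] = series.astype(dtype)
                    break
    return df


# Tiny made-up frame with the same column naming as the notes.
demo = pd.DataFrame({
    "id": np.arange(4, dtype=np.int64),
    "heart_0": np.array([0.991211, 0.971680, 1.0, 0.975586], dtype=np.float64),
})
small = reduce_mem_usage_sketch(demo)
print(small.dtypes)
print(demo.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Downcasting to float16 trades precision for memory, which is usually acceptable for signals already scaled to the range 0 to 1 but is worth checking before applying it to other features.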
# Prepare training and test data
x_train = train.drop(['id', 'label'], axis=1)
y_train = train['label']
x_test = test.drop(['id'], axis=1)
- Evaluation metric:
Implementation of $\text{abs-sum} = \sum_{j=1}^{n} \sum_{i=1}^{4} \left| y_i - a_i \right|$:
def abs_sum(y_pre, y_tru):
    # y_pre: predicted probabilities; y_tru: encoded true values; both 2-D with 4 columns
    y_pre = np.array(y_pre)
    y_tru = np.array(y_tru)
    loss = sum(sum(abs(y_pre - y_tru)))
    return loss
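To see the metric in action, here is a small usage sketch. It assumes the true labels are one-hot encoded (done here with np.eye, which is my choice rather than something the notes specify) and uses made-up probabilities; abs_sum is restated with NumPy reductions so the block runs on its own.

```python
import numpy as np


def abs_sum(y_pre, y_tru):
    # Same metric as in the notes, written with NumPy reductions.
    return np.abs(np.asarray(y_pre) - np.asarray(y_tru)).sum()


# Made-up labels and predicted probabilities for three samples.
labels = np.array([2, 0, 3])
y_true = np.eye(4)[labels]  # one-hot rows, shape (3, 4)
y_prob = np.array([
    [0.1, 0.1, 0.7, 0.1],   # mostly right: contributes about 0.6
    [1.0, 0.0, 0.0, 0.0],   # exactly right: contributes 0
    [0.0, 1.0, 0.0, 0.0],   # confidently wrong: contributes 2
])
score = abs_sum(y_prob, y_true)
print(score)
```

The three samples together score roughly 2.6, which shows how heavily a single confident mistake weighs compared with a slightly uncertain correct prediction.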
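5-fold cross-validation split
from sklearn.model_selection import KFold

def cv_model(clf, train_x, train_y, test_x, clf_name):
    folds = 5
    seed = 2021
    kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
    # Holds the predicted probabilities of the 4 classes for every test sample
    test = np.zeros((test_x.shape[0], 4))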
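Model parameters, training, and prediction
# Note: this signature matches lightgbm.train. If clf is lightgbm, versions 4.0 and later
# no longer accept verbose_eval and early_stopping_rounds here; the log_evaluation and
# early_stopping callbacks are used instead.
model = clf.train(params,
                  train_set=train_matrix,
                  valid_sets=valid_matrix,
                  num_boost_round=2000,
                  verbose_eval=100,
                  early_stopping_rounds=200)
# Predict with the best iteration found by early stopping
val_pred = model.predict(val_x, num_iteration=model.best_iteration)
test_pred = model.predict(test_x, num_iteration=model.best_iteration)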
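The excerpts above skip the fold loop that connects the KFold split to the training and prediction calls. The sketch below shows one plausible shape for that loop: out-of-fold predictions for the training set and fold-averaged predictions for the test set. It is an assumption about the structure, not the baseline's code; LogisticRegression is swapped in as a stand-in classifier so the block runs without a boosting library, and the function name and synthetic data are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def cv_predict_sketch(train_x, train_y, test_x, n_classes=4, folds=5, seed=2021):
    """Return out-of-fold train probabilities and fold-averaged test probabilities."""
    kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
    oof = np.zeros((train_x.shape[0], n_classes))
    test = np.zeros((test_x.shape[0], n_classes))
    for trn_idx, val_idx in kf.split(train_x):
        # Stand-in model; the notes call clf.train with boosting parameters here.
        model = LogisticRegression(max_iter=1000)
        model.fit(train_x[trn_idx], train_y[trn_idx])
        # Each training row is predicted by the one model that never saw it.
        oof[val_idx] = model.predict_proba(train_x[val_idx])
        # Average the test predictions over the folds.
        test += model.predict_proba(test_x) / folds
    return oof, test


# Synthetic data: 4 balanced classes, features shifted by the class index.
rng = np.random.default_rng(0)
y = np.arange(200) % 4
X = rng.normal(size=(200, 8)) + y[:, None]
X_test = rng.normal(size=(50, 8))
oof, test = cv_predict_sketch(X, y, X_test)
print(oof.shape, test.shape)
```

The out-of-fold array can be scored against the encoded labels with abs_sum to estimate the leaderboard metric locally, while the averaged test array is what would go into the submission file.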
Saving the output
result=pd.read_csv(path+'sample_submit.csv')
result['label_0']=temp[0]
result['label_1']=temp[1]
result['label_2']=temp[2]
result['label_3']=temp[3]
result.to_csv('submit.csv',index=False)
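The snippet above uses temp without defining it. One reading, offered only as an assumption, is that temp is a DataFrame wrapping the (n, 4) array of test-set probabilities, so that temp[0] through temp[3] are the per-class columns. A self-contained sketch with made-up values and hypothetical ids:

```python
import numpy as np
import pandas as pd

# Made-up class probabilities for three test samples, shape (n, 4).
test_prob = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.0, 0.9, 0.1, 0.0],
    [0.2, 0.2, 0.2, 0.4],
])
# Hypothetical ids; in the notes they come from sample_submit.csv.
result = pd.DataFrame({"id": [0, 1, 2]})
# Wrapping the array gives integer column labels 0..3 (assumed meaning of temp).
temp = pd.DataFrame(test_prob)
for k in range(4):
    result[f"label_{k}"] = temp[k]
print(result)
```

Calling result.to_csv('submit.csv', index=False) on such a frame would then write one row per test sample with the id followed by the four class probabilities.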
1.3 Submission Result
Score without any optimization: 559.61