task3 HeartbeatClassification: tsfresh feature extraction

The main work here is preprocessing the data with tsfresh; the model is still the LightGBM model from the baseline.

Data preprocessing

import pandas as pd
import numpy as np
import tsfresh as tsf
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute
import lightgbm as lgb

from sklearn.model_selection import StratifiedKFold, KFold
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
data_train = pd.read_csv("train.csv")
data_test_A = pd.read_csv("testA.csv")

print(data_train.shape)
print(data_test_A.shape)
(100000, 3)
(20000, 2)
data_train.head()
   id                                  heartbeat_signals  label
0   0  0.9912297987616655,0.9435330436439665,0.764677...    0.0
1   1  0.9714822034884503,0.9289687459588268,0.572932...    0.0
2   2  1.0,0.9591487564065292,0.7013782792997189,0.23...    2.0
3   3  0.9757952826275774,0.9340884687738161,0.659636...    0.0
4   4  0.0,0.055816398940721094,0.26129357194994196,0...    2.0
data_test_A.head()
       id                                  heartbeat_signals
0  100000  0.9915713654170097,1.0,0.6318163407681274,0.13...
1  100001  0.6075533139615096,0.5417083883163654,0.340694...
2  100002  0.9752726292239277,0.6710965234906665,0.686758...
3  100003  0.9956348033996116,0.9170249621481004,0.521096...
4  100004  1.0,0.8879490481178918,0.745564725322326,0.531...

Reshaping the heartbeat signal sequences for tsfresh

# Unpivot the heartbeat signal string: one row per (sample, time step)
train_heartbeat_df = data_train["heartbeat_signals"].str.split(",", expand=True).stack()
train_heartbeat_df = train_heartbeat_df.reset_index()  # reset the index
# Set level_0 (the sample id) as the index; level_1 is the time step
train_heartbeat_df = train_heartbeat_df.set_index("level_0")
train_heartbeat_df.index.name = None
train_heartbeat_df = train_heartbeat_df.rename(columns={"level_1": "time", 0: "heartbeat_signals"})
train_heartbeat_df["heartbeat_signals"] = train_heartbeat_df["heartbeat_signals"].astype(float)
train_heartbeat_df
       time  heartbeat_signals
0         0           0.991230
0         1           0.943533
0         2           0.764677
0         3           0.618571
0         4           0.379632
...     ...                ...
99999   200           0.000000
99999   201           0.000000
99999   202           0.000000
99999   203           0.000000
99999   204           0.000000

20500000 rows × 2 columns
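As a side note, the same long-format frame can be built with Series.explode; a minimal equivalent sketch (assuming pandas >= 0.25, where explode was introduced):

sig = data_train["heartbeat_signals"].str.split(",").explode().astype(float)
alt = sig.to_frame("heartbeat_signals")
# position of each value within its signal; .values avoids alignment
# problems with the duplicated index
alt["time"] = alt.groupby(level=0).cumcount().values
alt = alt[["time", "heartbeat_signals"]]  # same 20,500,000 x 2 layout as above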

data_train_label = data_train["label"]
data_train = data_train.drop("label", axis=1)
data_train = data_train.drop("heartbeat_signals", axis=1)  # drop the original heartbeat column
data_train = data_train.join(train_heartbeat_df)

data_train
          id  time  heartbeat_signals
0          0     0           0.991230
0          0     1           0.943533
0          0     2           0.764677
0          0     3           0.618571
0          0     4           0.379632
...      ...   ...                ...
99999  99999   200           0.000000
99999  99999   201           0.000000
99999  99999   202           0.000000
99999  99999   203           0.000000
99999  99999   204           0.000000

20500000 rows × 3 columns

# Apply the same reshaping to the test set:
# one row per (sample, time step)
test_heartbeat_df = data_test_A["heartbeat_signals"].str.split(",", expand=True).stack()
test_heartbeat_df = test_heartbeat_df.reset_index()  # reset the index
# Set level_0 (the sample id) as the index; level_1 is the time step
test_heartbeat_df = test_heartbeat_df.set_index("level_0")
test_heartbeat_df.index.name = None
test_heartbeat_df = test_heartbeat_df.rename(columns={"level_1": "time", 0: "heartbeat_signals"})
test_heartbeat_df["heartbeat_signals"] = test_heartbeat_df["heartbeat_signals"].astype(float)
test_heartbeat_df
       time  heartbeat_signals
0         0           0.991571
0         1           1.000000
0         2           0.631816
0         3           0.136230
0         4           0.041420
...     ...                ...
19999   200           0.000000
19999   201           0.000000
19999   202           0.000000
19999   203           0.000000
19999   204           0.000000

4100000 rows × 2 columns

data_test_A = data_test_A.drop("heartbeat_signals", axis=1)  # drop the original heartbeat column
data_test_A = data_test_A.join(test_heartbeat_df)

data_test_A
           id  time  heartbeat_signals
0      100000     0           0.991571
0      100000     1           1.000000
0      100000     2           0.631816
0      100000     3           0.136230
0      100000     4           0.041420
...       ...   ...                ...
19999  119999   200           0.000000
19999  119999   201           0.000000
19999  119999   202           0.000000
19999  119999   203           0.000000
19999  119999   204           0.000000

4100000 rows × 3 columns

Feature extraction

Extracting features for all the data at once was not really feasible with my 16 GB of RAM, so I split the training data into 10 chunks and ran them separately; extraction on the training data took about an hour and a half.
This yields 787 features in total.

# Feature extraction, in 10 chunks to limit peak memory usage
train_features = pd.DataFrame([], dtype=float)
chunk = len(data_train) / 10  # 2,050,000 rows, i.e. 10,000 signals of 205 steps each
for i in range(10):
    m = int(i * chunk)
    n = int((i + 1) * chunk)
    temp = extract_features(data_train.iloc[m:n, :], column_id='id', column_sort='time')
    print(temp.shape)
    train_features = pd.concat([train_features, temp])  # DataFrame.append() is removed in pandas 2.x
train_features
Feature Extraction: 100%|██████████| 30/30 [07:55<00:00, 15.86s/it]
(10000, 787)
Feature Extraction: 100%|██████████| 30/30 [08:08<00:00, 16.28s/it]
(10000, 787)
...
Feature Extraction: 100%|██████████| 30/30 [08:05<00:00, 16.17s/it]
(10000, 787)
       heartbeat_signals__variance_larger_than_standard_deviation  ...  heartbeat_signals__matrix_profile__feature_"75"__threshold_0.98
0                                                              0.0  ...                                                         11.484910
1                                                              0.0  ...                                                         12.094899
2                                                              0.0  ...                                                          8.246211
3                                                              0.0  ...                                                          7.091706
4                                                              0.0  ...                                                         10.270925
...                                                            ...  ...                                                               ...
99995                                                          0.0  ...                                                          7.033638
99996                                                          0.0  ...                                                          3.334109
99997                                                          0.0  ...                                                         10.081756
99998                                                          0.0  ...                                                          5.615282
99999                                                          0.0  ...                                                          5.916164

(columns such as heartbeat_signals__query_similarity_count__query_None__threshold_0.0 are all NaN at this stage; they are handled by impute below)

100000 rows × 787 columns
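If time or memory is tight, tsfresh also ships cheaper presets. A sketch using EfficientFCParameters, which skips the calculators tsfresh flags as high-cost (the n_jobs value here is illustrative):

from tsfresh.feature_extraction import EfficientFCParameters

# Cheaper extraction: drop the features tsfresh marks as expensive to compute
train_features_cheap = extract_features(
    data_train, column_id='id', column_sort='time',
    default_fc_parameters=EfficientFCParameters(),
    n_jobs=4,  # number of parallel workers; tune to available cores and RAM
)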

The test data gets the same treatment

test_features = extract_features(data_test_A, column_id='id', column_sort='time')
Feature Extraction: 100%|██████████████████████████████████████████████████████████████| 30/30 [16:03<00:00, 32.12s/it]

Removing NaN values

# Replace NaN/inf in the extracted features (tsfresh's impute works in place,
# column-wise: NaN -> median, +inf -> max, -inf -> min)
impute(train_features)
       heartbeat_signals__variance_larger_than_standard_deviation  ...  heartbeat_signals__matrix_profile__feature_"75"__threshold_0.98
0                                                              0.0  ...                                                         11.484910
1                                                              0.0  ...                                                         12.094899
2                                                              0.0  ...                                                          8.246211
3                                                              0.0  ...                                                          7.091706
4                                                              0.0  ...                                                         10.270925
...                                                            ...  ...                                                               ...
99995                                                          0.0  ...                                                          7.033638
99996                                                          0.0  ...                                                          3.334109
99997                                                          0.0  ...                                                         10.081756
99998                                                          0.0  ...                                                          5.615282
99999                                                          0.0  ...                                                          5.916164

(the previously all-NaN query_similarity_count column has been imputed to 0.0)

100000 rows × 787 columns

impute(test_features)
        heartbeat_signals__variance_larger_than_standard_deviation  ...  heartbeat_signals__matrix_profile__feature_"75"__threshold_0.98
100000                                                          0.0  ...                                                         11.076665
100001                                                          0.0  ...                                                          3.698142
100002                                                          0.0  ...                                                         11.135462
100003                                                          0.0  ...                                                          4.898979
100004                                                          0.0  ...                                                         12.597792
...                                                             ...  ...                                                               ...
119995                                                          0.0  ...                                                          7.191577
119996                                                          0.0  ...                                                         11.762948
119997                                                          0.0  ...                                                         10.409977
119998                                                          0.0  ...                                                         13.558463
119999                                                          0.0  ...                                                          5.656854

20000 rows × 787 columns

Feature selection by relevance to the label

from tsfresh import select_features

# Select features by their statistical relevance to the label: tsfresh runs a
# univariate hypothesis test per feature and keeps the features that pass a
# Benjamini-Yekutieli FDR check (707 of the 787 features survive here)
train_features_filtered = select_features(train_features, data_train_label)

train_features_filtered
       heartbeat_signals__sum_values  ...  heartbeat_signals__fft_coefficient__attr_"real"__coeff_83
0                          38.927945  ...                                                   0.473568
1                          19.445634  ...                                                   0.297325
2                          21.192974  ...                                                   0.383754
3                          42.113066  ...                                                   0.494024
4                          69.756786  ...                                                   0.056867
...                              ...  ...                                                        ...
99995                      63.323449  ...                                                   0.133969
99996                      69.657534  ...                                                   0.539236
99997                      40.897057  ...                                                   0.773985
99998                      42.333303  ...                                                   0.340727
99999                      53.290117  ...                                                  -0.053993

100000 rows × 707 columns
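If you want to see why a feature was kept or dropped, the relevance table behind select_features can be computed directly; a small sketch using tsfresh's calculate_relevance_table:

from tsfresh.feature_selection.relevance import calculate_relevance_table

# p-values and the relevant/irrelevant decision for every extracted feature
relevance = calculate_relevance_table(train_features, data_train_label)
print(relevance[["feature", "p_value", "relevant"]].sort_values("p_value").head())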

Select the corresponding features from the test set

test_features = test_features.loc[:, list(train_features_filtered.columns)]
test_features
        heartbeat_signals__sum_values  ...  heartbeat_signals__fft_coefficient__attr_"real"__coeff_83
100000                      19.229863  ...                                                   0.355992
100001                      84.298932  ...                                                   0.077530
100002                      47.789921  ...                                                   0.454957
100003                      47.069011  ...                                                   0.662320
100004                      24.899397  ...                                                   0.511133
...                               ...  ...                                                        ...
119995                      43.175130  ...                                                   0.268471
119996                      31.030782  ...                                                   0.536087
119997                      31.648623  ...                                                   0.370047
119998                      19.305442  ...                                                   0.258394
119999                      35.204569  ...                                                   0.540855

20000 rows × 707 columns

The column names need to be renamed here, otherwise training will fail: LightGBM rejects feature names containing special JSON characters, such as the double quotes in the tsfresh names.

test_features.columns = range(test_features.shape[1])
train_features_filtered.columns = range(train_features_filtered.shape[1])
train_features_filtered
               0         1         2  ...       704       705       706
0      38.927945  0.660949  1.090709  ...  0.504124  0.528450  0.473568
1      19.445634  1.718217  1.280923  ...  0.645082  0.635135  0.297325
2      21.192974  1.814281  1.619051  ...  0.722742  0.680590  0.383754
3      42.113066  2.109550  0.619634  ...  0.550097  0.466904  0.494024
4      69.756786  0.194549  0.348882  ... -0.089611  0.091841  0.056867
...          ...       ...       ...  ...       ...       ...       ...
99995  63.323449  0.840651  1.186210  ...  0.474844  0.564266  0.133969
99996  69.657534  1.557787  1.393960  ...  0.462312  0.269719  0.539236
99997  40.897057  0.469758  1.000355  ...  0.178410  0.500813  0.773985
99998  42.333303  0.992948  1.354894  ...  0.406154  0.324771  0.340727
99999  53.290117  1.624625  1.739088  ...  0.269519  0.681719 -0.053993

100000 rows × 707 columns
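Integer column names work but are opaque. An alternative (a sketch, to be run instead of the integer renaming above, not what this post used) is to strip only the offending characters so the names stay readable:

import re

# Replace any character LightGBM objects to (quotes, commas, brackets, ...)
# while keeping the descriptive tsfresh feature names
safe = {c: re.sub(r'[^0-9a-zA-Z_]+', '_', c) for c in train_features_filtered.columns}
train_features_filtered = train_features_filtered.rename(columns=safe)
test_features = test_features.rename(columns=safe)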

Model training

We use the LightGBM model from the baseline.
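The baseline training code itself is not repeated in this post. For context, a minimal sketch of that kind of stratified 5-fold LightGBM loop (fold count and parameter values are illustrative, not the exact baseline settings); it produces the lgb_test probability matrix that the post-processing below consumes:

# Minimal sketch of a baseline-style training loop (parameters illustrative).
# Yields lgb_test, the (20000, 4) matrix of averaged class probabilities.
x_train = train_features_filtered.values
y_train = data_train_label.values
x_test = test_features.values

lgb_test = np.zeros((x_test.shape[0], 4))
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=2021)
params = {
    "objective": "multiclass",
    "num_class": 4,
    "learning_rate": 0.1,
    "metric": "multi_logloss",
    "seed": 2021,
    "verbose": -1,
}

for fold, (trn_idx, val_idx) in enumerate(kf.split(x_train, y_train)):
    trn_data = lgb.Dataset(x_train[trn_idx], label=y_train[trn_idx])
    val_data = lgb.Dataset(x_train[val_idx], label=y_train[val_idx])
    model = lgb.train(params, trn_data, num_boost_round=1000,
                      valid_sets=[val_data],
                      callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)])
    val_pred = model.predict(x_train[val_idx], num_iteration=model.best_iteration)
    print(f"fold {fold} log loss: {log_loss(y_train[val_idx], val_pred):.5f}")
    # average the test-set probabilities over the folds
    lgb_test += model.predict(x_test, num_iteration=model.best_iteration) / kf.n_splits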

Post-processing the results

Probabilities above a high threshold are set straight to 1, which squeezes a little more out of the competition score.

## Post-processing: set probabilities above the threshold to 1 and the
## others in that row to 0; uncertain rows are left untouched
def prob_opt(lgb_test, prob):
    # lgb_test: (n_samples, 4) array of class probabilities from the model above
    for index, row in enumerate(lgb_test):
        row_max = max(row)
        if row_max > prob:
            for i in range(4):
                if row[i] > prob:
                    lgb_test[index, i] = 1
                else:
                    lgb_test[index, i] = 0
    return lgb_test
test = prob_opt(lgb_test, 0.9)
temp = pd.DataFrame(test)
temp
         0    1    2    3
0      1.0  0.0  0.0  0.0
1      0.0  0.0  1.0  0.0
2      0.0  0.0  0.0  1.0
3      1.0  0.0  0.0  0.0
4      1.0  0.0  0.0  0.0
...    ...  ...  ...  ...
19995  1.0  0.0  0.0  0.0
19996  1.0  0.0  0.0  0.0
19997  0.0  0.0  1.0  0.0
19998  1.0  0.0  0.0  0.0
19999  1.0  0.0  0.0  0.0

20000 rows × 4 columns
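The loop above is easy to follow but slow on large arrays; a vectorized equivalent in plain NumPy (a sketch implementing the same thresholding rule) would be:

def prob_opt_vec(pred, prob):
    # Rows whose max probability exceeds the threshold become hard 0/1
    # assignments; all other rows keep their raw probabilities.
    pred = pred.copy()
    confident = pred.max(axis=1) > prob
    pred[confident] = (pred[confident] > prob).astype(float)
    return pred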
