Original [P/M/K] Measure error between 'y' and 'pred'

def rmse(y, y_pred): return np.sqrt(mean_squared_error(y, y_pred))
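The one-liner above assumes that `np` (NumPy) and `mean_squared_error` (presumably from `sklearn.metrics`) are already imported. A minimal self-contained sketch of the same quantity, computed with NumPy alone; the sample values are made up for illustration:

```python
import numpy as np

def rmse(y, y_pred):
    """Root mean squared error between targets and predictions."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Hypothetical example values
y_true = [3.0, 5.0, 7.0]
y_hat = [3.0, 5.0, 10.0]
score = rmse(y_true, y_hat)  # sqrt((0 + 0 + 9) / 3) = sqrt(3)
```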

2019-07-13 15:55:59 141

Original [P/M/K] sns.countplot & df.value_counts().plot(kind='bar')

Difference between sns and matplotlib.pyplot:
sns.countplot(df_train.Census_OSInstallLanguageIdentifier, ax=axis1)
df_train['Census_OSInstallLanguageIdentifier'].value_counts().plot(kind='bar')

2018-12-18 10:46:47 2693

Original [P/M/K] Simple way to downcast types to reduce memory

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
Season_Year 5 non-null int64
GameKey 5 non-null int64
PlayID 5 non-null int...
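The truncated `info()` output shows columns stored as `int64`. One possible way to shrink such columns, which may or may not be what the post itself does, is `pd.to_numeric` with its `downcast` argument. The values below are invented; only the column names come from the output above.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the listing above
df = pd.DataFrame({
    "Season_Year": pd.Series([2016, 2016, 2017, 2017, 2017], dtype="int64"),
    "GameKey": pd.Series([5, 21, 29, 45, 54], dtype="int64"),
})

before = df.memory_usage(deep=True).sum()

# Downcast each int64 column to the smallest integer dtype that fits
for col in df.select_dtypes(include=["int64"]).columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")

after = df.memory_usage(deep=True).sum()
```

Comparing `before` and `after` gives a quick check that the downcast actually saved memory.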

2018-12-10 21:31:49 150

Original [P/M/K] Select specific types from a dataframe

float_cols = df_temp.select_dtypes(include=['float'])
int_cols = df_temp.select_dtypes(include=['int'])
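A small sketch of the same idea on a toy frame. The column names and values are hypothetical, and explicit dtype names (`float64`, `int64`) are used here so the selection does not depend on how a bare `'float'` or `'int'` is interpreted on a given platform.

```python
import pandas as pd

# Hypothetical frame with one float, one integer and one string column
df_temp = pd.DataFrame({
    "price": pd.Series([1.5, 2.0, 3.25], dtype="float64"),
    "count": pd.Series([1, 2, 3], dtype="int64"),
    "name": ["a", "b", "c"],
})

float_cols = df_temp.select_dtypes(include=["float64"])
int_cols = df_temp.select_dtypes(include=["int64"])
numeric_cols = df_temp.select_dtypes(include="number")  # all numeric dtypes
```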

2018-12-10 21:29:12 162

Original [P/M/K] Random rows in pandas.DataFrame: sample

Simple way:
print(train['id'].sample(1))
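A slightly fuller sketch of `sample`, assuming a toy `train` frame (the values are made up). `random_state` makes a draw reproducible, and `frac=1` is a common way to shuffle all rows.

```python
import pandas as pd

# Hypothetical training frame
train = pd.DataFrame({"id": [10, 11, 12, 13, 14], "y": [0, 1, 0, 1, 1]})

one_id = train["id"].sample(1)                   # one random value
three_rows = train.sample(n=3, random_state=0)   # reproducible draw of 3 rows
shuffled = train.sample(frac=1, random_state=0)  # all rows in random order
```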

2018-11-05 15:39:33 177

Original [P/M] One-hot encoding is BAD for Boosting

One-hot encoding is not required for tree models such as random forests and boosting. I would go further and say that one-hot encoding categorical variables does not benefit boosting but does the opposite. The main idea is that decision-tree-based models have wa...

2018-10-24 14:06:45 104

Original [P/M/D] How to change the column order of a dataframe

Best way to do it:
order = ['date', 'time', 'open', 'high', 'low', 'close', 'volumefrom', 'volumeto']
df = df[order]

2018-10-23 11:05:33 115

Original [P/M/K] Merge different dataframes

It is really confusing when one dataset comes as several related dataframes. Now I know how to merge them together.
train = train.set_in...
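The excerpt cuts off at `train.set_in...`, so the author's exact approach is not recoverable here. As a generic sketch of combining related dataframes, `DataFrame.merge` joins on a shared key; the frames and the `item_id` key below are hypothetical.

```python
import pandas as pd

# Hypothetical related tables sharing the key column "item_id"
train = pd.DataFrame({"item_id": [1, 2, 3], "item_cnt": [5, 0, 2]})
items = pd.DataFrame({"item_id": [1, 2, 4], "category": ["a", "b", "c"]})

# Left join: keep every row of train, attach the category where a match exists
merged = train.merge(items, on="item_id", how="left")
```

With `how="left"`, rows of `train` without a match in `items` are kept and get NaN in the new columns.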

2018-10-16 18:52:41 99

Original [P/M/K] Groupby

This comes up so often that I have to write it down.
Dataframe before:
   date        date_block_num  shop_id  item_id  item_price  item_cnt_day
0  2013-01-02  0               59       22154    999.00      1.0
1  2013-01-03  0               25       2552     8...
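The excerpt stops before the actual groupby call, so the following is only a sketch of one common pattern on data of this shape: summing daily counts per month, shop and item. The rows, the choice of `sum`, and the `item_cnt_month` name are assumptions, not taken from the post.

```python
import pandas as pd

# Hypothetical rows using the column names from the table above
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id": [59, 59, 25, 25],
    "item_id": [22154, 22154, 2552, 2552],
    "item_cnt_day": [1.0, 2.0, 1.0, 3.0],
})

# Aggregate daily counts into one row per (month, shop, item)
monthly = (
    sales.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})
)
```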

2018-10-16 13:47:47 185

Original [P/M/T] Datedelta to int

This is the only way that worked for me.
Y = (Y / np.timedelta64(1, 'D')).astype(int)
[1]: https://blog.csdn.net/xu200yang/article/details/70460592
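A runnable sketch of the conversion above on made-up dates. It also shows the `.dt.days` accessor, which should give the same whole-day counts for a timedelta Series; that alternative is an addition here, not something the post mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical start and end dates
start = pd.to_datetime(pd.Series(["2018-10-01", "2018-10-10"]))
end = pd.to_datetime(pd.Series(["2018-10-14", "2018-10-11"]))

Y = end - start                                  # Series of timedeltas
days = (Y / np.timedelta64(1, "D")).astype(int)  # approach from the post
days_alt = Y.dt.days                             # accessor-based alternative
```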

2018-10-14 12:03:50 143

Original [P/M/T] Select dataframe by multiple conditions

It’s easy to select part of a dataframe by one condition, as below.
pos = df_train[df_train['Date']>0]
But when you try to add conditions like this
pos = df_train[df_train['Date']>0 and...
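The excerpt is cut off, but combining two boolean Series with Python's `and` raises a `ValueError` in pandas, because a Series has no single truth value. The usual fix is the element-wise operators `&` and `|`, with each condition in parentheses. The `Value` column and the data below are hypothetical.

```python
import pandas as pd

# Hypothetical frame; "Value" is an invented second column
df_train = pd.DataFrame({"Date": [-1, 2, 5, 7], "Value": [10, 20, 30, 40]})

# Element-wise AND; the parentheses are required because & binds tighter than >
pos = df_train[(df_train["Date"] > 0) & (df_train["Value"] < 40)]

# Equivalent selection written as a query string
same = df_train.query("Date > 0 and Value < 40")
```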

2018-10-10 17:00:59 268

Original [P/M/K] How to see missing data percentage

See it in text:
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)
percent[:10]
trafficSource.adContent ...
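The same computation on a toy frame, so the numbers can be checked by hand. The column names and values are invented. Since `isnull()` yields booleans, taking the mean is an equivalent shortcut for count divided by row total.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "a" is 75% missing, "b" 25%, "c" 0%
train_df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)
percent_alt = (100 * train_df.isnull().mean()).sort_values(ascending=False)
```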

2018-09-25 11:57:44 115

Original [M/K] Scaling affects regression and decision trees differently

Scaling is a necessary preprocessing step; it helps eliminate the bias caused by variables with different scales. It works in SVM o...

2018-09-24 18:55:57 109

Original [P/M/K] sklearn.preprocessing.LabelEncoder() & pandas.factorize

I usually use
data.loc[:, "MSZoning"] = pd.factorize(data.MSZoning)[0]
Actually, it does essentially the same job as LabelEncoder. The ...
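One difference worth keeping in mind: by default `pd.factorize` numbers categories in order of first appearance, whereas `LabelEncoder` numbers them in sorted order of the class labels, so the integer codes can differ even though both produce a valid encoding. Passing `sort=True` to `pd.factorize` gives sorted-order codes. The sample values below are illustrative only.

```python
import pandas as pd

# Hypothetical sample of the MSZoning column
data = pd.DataFrame({"MSZoning": ["RL", "RM", "FV", "RL"]})

# Default: codes follow order of first appearance (RL=0, RM=1, FV=2)
codes, uniques = pd.factorize(data["MSZoning"])

# sort=True: codes follow sorted order of the labels (FV=0, RL=1, RM=2)
codes_sorted, uniques_sorted = pd.factorize(data["MSZoning"], sort=True)
```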

2018-09-23 11:54:06 452

Original [P/M/K] How to see correlation when there are few or many variables

Sometimes, when making predictions, we need to see the correlation between the target and the other variables. When there are not too many variables,...
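The excerpt ends before showing any code. One way to handle the many-variable case, offered here as an assumption rather than the author's method, is to rank features by absolute correlation with the target instead of inspecting the full matrix. The synthetic data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data: "target" depends strongly on x1 and not on x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": rng.normal(size=200),
    "target": 3 * x1 + rng.normal(scale=0.1, size=200),
})

# Absolute correlation of every feature with the target, strongest first
corr_with_target = (
    df.corr()["target"].drop("target").abs().sort_values(ascending=False)
)
top_features = corr_with_target.head(1)
```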

2018-09-05 16:43:31 100

Original [P/M/K] How to check NaN in df swiftly?

To check for NaN in a dataframe:
train.isnull().sum()
Id 0
v2a1 6860
hacdor 0
rooms 0
hacapo ...

2018-09-05 11:55:24 119

Original [P/M/K] copy() & deepcopy()

When we copy one column of a dataframe to work on, we usually mean .deepcopy(): taking the copy as a new, independent object. As for .copy(), it stays synchronized with...
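A point worth checking against the pandas version in use: `Series.copy()` and `DataFrame.copy()` take `deep=True` by default, so the result does not share data with the original. It is plain assignment, or `copy(deep=False)`, that may stay linked. pandas objects have no `.deepcopy()` method of their own; `copy.deepcopy` from the standard library is the closest equivalent. A minimal sketch on invented data:

```python
import copy
import pandas as pd

# Hypothetical one-column frame
df = pd.DataFrame({"a": [1, 2, 3]})

col_copy = df["a"].copy()          # deep=True is the default
col_deep = copy.deepcopy(df["a"])  # standard-library deep copy

# Modify both copies; the original column should be unaffected
col_copy.iloc[0] = 100
col_deep.iloc[0] = 200
```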

2018-09-05 11:12:37 96

Original [P/M/K] 2 ways to transform between types: .loc or mapping

When we have to transform a column from one type to another, here are two ways:
1. all_data.loc[all_data["edjefe"]=="yes","edjefe"...
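The excerpt is truncated before the assigned values appear, so the replacements below (`yes` to 1, `no` to 0) are assumptions chosen only to illustrate the two techniques named in the title. The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical column holding string flags
all_data = pd.DataFrame({"edjefe": ["yes", "no", "yes"]})

# Way 1: overwrite matching rows with .loc (replacement values are assumed)
by_loc = all_data.copy()
by_loc["edjefe"] = by_loc["edjefe"].astype(object)  # allow mixed str/int values
by_loc.loc[by_loc["edjefe"] == "yes", "edjefe"] = 1
by_loc.loc[by_loc["edjefe"] == "no", "edjefe"] = 0
by_loc["edjefe"] = by_loc["edjefe"].astype(int)

# Way 2: map every value through a dict in one step
by_map = all_data["edjefe"].map({"yes": 1, "no": 0})
```

With `map`, any value missing from the dict becomes NaN, which makes unexpected categories easy to spot.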

2018-09-05 10:37:14 120

Original [P/M/K] How to select specific columns

When a dataframe has columns of many types, df.select_dtypes() is helpful.
train.select_dtypes(np.int64)
for i, col in enumerate(train.select_dtypes('float')...

2018-09-05 10:30:41 124



