1. Train/test split
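# sklearn.cross_validation was removed in scikit-learn 0.20; import from sklearn.model_selection instead.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2, random_state=45)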
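To check the resulting split sizes, here is a minimal sketch on synthetic data. The arrays below and the optional stratify argument are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for x_data / y_data (made-up shapes, for illustration only).
x_data = np.arange(100).reshape(50, 2)   # 50 samples, 2 features
y_data = np.array([0] * 25 + [1] * 25)   # balanced binary labels

# stratify=y_data keeps the class ratio the same in both splits (optional).
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.2, random_state=45, stratify=y_data
)

print(x_train.shape, x_test.shape)
```

With test_size=0.2, 10 of the 50 rows go to the test set and 40 to the training set.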
2. Cross tabulation (pd.crosstab)
pd.crosstab(temp,y_result)
# reshape(-1) flattens y_train to 1-D without hard-coding the row count (1048377 in the original run)
pd.crosstab(np.asarray(y_train).reshape(-1), clf.predict(x_train))
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.crosstab.html
A similar operation with groupby:
x_test_loan[['overdue_flag_30', 'result']].groupby(by=['overdue_flag_30', 'result']).agg(len).unstack()
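The two approaches can be compared on a toy frame. This is a sketch under assumptions: the column names follow the notes, but the values are made up, and size() with fill_value=0 is swapped in for agg(len) so that missing combinations show 0 rather than NaN.

```python
import pandas as pd

# Toy data; column names follow the notes, values are invented.
df = pd.DataFrame({
    "overdue_flag_30": [0, 0, 1, 1, 0, 1],
    "result":          [0, 1, 1, 1, 0, 0],
})

# Contingency table: rows are the actual flag, columns are the predicted result.
ct = pd.crosstab(df["overdue_flag_30"], df["result"])

# Same counts via groupby: count rows per pair, then pivot 'result' into columns.
wide = df.groupby(["overdue_flag_30", "result"]).size().unstack(fill_value=0)

print(ct)
print(wide)
```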
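3. Using map
Sometimes you have a list (for example, a column of values) and want to transform every value in it; the map function does this.
map(func, seq1[, seq2...]) applies func to each element of the given sequence(s). In Python 2 it returns a list of the results; in Python 3 it returns a lazy iterator, so wrap it in list() to see the values. In Python 2 only, if func is None it acts as the identity function and map returns a list of n-tuples, each holding the corresponding elements of the sequences; Python 3 no longer supports this (use zip instead).
For example: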
>>> list(map(lambda x: None, [1, 2, 3, 4]))
[None, None, None, None]
>>> list(map(lambda x: x * 2, [1, 2, 3, 4]))
[2, 4, 6, 8]
>>> list(map(lambda x: x * 2, [1, 2, 3, 4, [5, 6, 7]]))  # the nested list is repeated, not doubled element-wise
[2, 4, 6, 8, [5, 6, 7, 5, 6, 7]]
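The multi-sequence form and the lazy behaviour in Python 3 can be sketched as follows. The input values are made up for illustration.

```python
# map with two sequences: func receives one element from each sequence.
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30]))

# In Python 3, map is lazy: nothing is computed until the iterator is consumed.
lazy = map(str.upper, ["a", "b"])
first = next(lazy)   # consumes the first element
rest = list(lazy)    # only the remaining elements are left

# A list comprehension is often the more readable equivalent.
doubled = [x * 2 for x in [1, 2, 3, 4]]

print(sums, first, rest, doubled)
```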
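A pure-Python equivalent of the built-in map (single-sequence case, returning a list as in Python 2):
>>> def my_map(func, seq):  # renamed so it does not shadow the built-in map
...     mapped_seq = []
...     for eachItem in seq:
...         mapped_seq.append(func(eachItem))
...     return mapped_seq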
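A version closer to Python 3's behaviour (lazy, any number of sequences) could look like the sketch below. The name lazy_map is hypothetical, and this is an approximation rather than the actual CPython implementation.

```python
def lazy_map(func, *seqs):
    """Yield func applied to the i-th elements of each sequence, stopping at the shortest."""
    for args in zip(*seqs):
        yield func(*args)

single = list(lazy_map(lambda x: x * 2, [1, 2, 3, 4]))
paired = list(lazy_map(lambda x, y: x + y, [1, 2, 3], [10, 20]))

print(single, paired)
```

Because it is a generator, no value is computed until the result is iterated, which matches how the Python 3 built-in behaves.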
Ref:
http://blog.csdn.net/prince2270/article/details/4681299
4. Another kind of map: mapping values
city_type_mapping = {'一线': 1, '二线': 2, '三线': 3, '四线': 4, '五线': 5}
sample_all['address_city_type'] = sample_all['address_city_type'].map(city_type_mapping)
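One thing to watch with Series.map and a dict: values missing from the dict become NaN, which also turns the column into floats. A small sketch, where the '未知' ("unknown") entry and the -1 sentinel are made up for illustration:

```python
import pandas as pd

city_type_mapping = {'一线': 1, '二线': 2, '三线': 3, '四线': 4, '五线': 5}

# '未知' is an invented value that is not a key in the mapping.
s = pd.Series(['一线', '三线', '未知'])

mapped = s.map(city_type_mapping)        # the unmapped value becomes NaN
filled = mapped.fillna(-1).astype(int)   # -1 is an arbitrary sentinel chosen here

print(filled.tolist())
```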
5. Long to wide: unzipping pairs with zip
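import time
# Each element is a (Sample, weight) pair; the weight shrinks with the record's age in days (doc['timestamp'] is in milliseconds).
sample = [(Sample(**doc), 1 / (round((int(time.time()) - doc['timestamp'] / 1000) / 60 / 60 / 24) + 5)) for doc in records]
population, weights = list(zip(*sample))  # zip(*sample) transposes the pairs into two parallel tuples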
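The same pattern can be run end to end on made-up records. In this sketch recency_weight, the record contents, and the fixed now_s are all hypothetical; feeding the result to random.choices is only a guess at the intended use, suggested by the names population and weights.

```python
import random

def recency_weight(timestamp_ms, now_s):
    """Weight that shrinks with age in days: 1 / (age_days + 5)."""
    age_days = round((now_s - timestamp_ms / 1000) / 60 / 60 / 24)
    return 1 / (age_days + 5)

now_s = 1_700_000_000   # fixed "current time" in seconds (made up)
records = [             # hypothetical records, timestamps in milliseconds
    {"name": "a", "timestamp": (now_s - 0 * 86400) * 1000},
    {"name": "b", "timestamp": (now_s - 10 * 86400) * 1000},
]

# Long form: one (item, weight) pair per record.
sample = [(doc["name"], recency_weight(doc["timestamp"], now_s)) for doc in records]

# Wide form: two parallel tuples.
population, weights = zip(*sample)

random.seed(0)
picked = random.choices(population, weights=weights, k=3)
print(population, weights, picked)
```

The constant 5 in the denominator keeps the weight finite for brand-new records and flattens the difference between very recent ones.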