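# Round up with ceil to produce discrete income categories, then merge every category above 5 into category 5
housing['income_cat'] = np.ceil(housing['median_income'] / 1.5)
# Assign the result back instead of calling where(..., inplace=True) on a selected column
housing['income_cat'] = housing['income_cat'].where(housing['income_cat'] < 5, 5.0)
housing['income_cat'].value_counts()
housing['income_cat'].hist()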
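To make the bucketing step concrete, here is a minimal sketch on a handful of made-up income values (the numbers in `incomes` are hypothetical and are not taken from the housing data). It shows how dividing by 1.5 and rounding up yields whole-number categories, and how `where` keeps values below 5 and replaces the rest with 5.0.

```python
import numpy as np
import pandas as pd

# Hypothetical median-income values, for illustration only.
incomes = pd.Series([0.5, 1.5, 2.9, 4.6, 7.4, 15.0])

# Dividing by 1.5 compresses the range; ceil rounds up to a whole-number category.
income_cat = np.ceil(incomes / 1.5)

# where() keeps entries for which the condition holds and replaces the others with 5.0.
income_cat_capped = income_cat.where(income_cat < 5, 5.0)

print(income_cat.tolist())         # [1.0, 1.0, 2.0, 4.0, 5.0, 10.0]
print(income_cat_capped.tolist())  # [1.0, 1.0, 2.0, 4.0, 5.0, 5.0]
```

Only the last value changes in the capping step: category 10 is folded into category 5, so the top bucket collects all high incomes.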
# Stratified sampling can also be done with sklearn
from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(housing, housing['income_cat']):
    strat_train_set = housing.loc[train_index]
    strat_test_set = housing.loc[test_index]
# Income category proportions in the full dataset
housing['income_cat'].value_counts() / len(housing)
# Income category proportions in the stratified test set
strat_test_set['income_cat'].value_counts() / len(strat_test_set)
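To see why a stratified split preserves category proportions, the idea can be sketched by hand in plain Python. This is a simplified stand-in and not how `StratifiedShuffleSplit` is implemented internally; the function name `stratified_split` and the label counts below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), sampling the same fraction from each label group."""
    rng = random.Random(seed)
    # Group row positions by their category label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        # Take the same share of every category for the test set.
        n_test = round(len(members) * test_size)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 50 rows in category 1, 30 in category 2, 20 in category 3.
labels = [1] * 50 + [2] * 30 + [3] * 20
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]
print({c: test_labels.count(c) for c in (1, 2, 3)})  # {1: 10, 2: 6, 3: 4}
```

Because each category contributes exactly 20% of its rows, the test set keeps the 50/30/20 ratio of the full list, which is the property the sklearn splitter is used for above.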
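Using similar code to compare random sampling with stratified sampling shows that the income category proportions in the stratified test set are almost identical to those of the full dataset, while the randomly sampled test set is considerably skewed.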
from sklearn.model_selection import train_test_split

def income_cat_proportions(data):
    # Fraction of rows that fall into each income category
    return data['income_cat'].value_counts() / len(data)

# Purely random split, for comparison with the stratified split
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)

compare_props = pd.DataFrame({
    'Overall': income_cat_proportions(housing),
    'Stratified': income_cat_proportions(strat_test_set),
    'Random': income_cat_proportions(test_set),
}).sort_index()
# Relative error (in percent) of each test set's proportions versus the full dataset
compare_props['Rand.%error'] = 100 * compare_props['Random'] / compare_props['Overall'] - 100
compare_props['Strat.%error'] = 100 * compare_props['Stratified'] / compare_props['Overall'] - 100
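The percent-error columns are easier to read with a small worked example. The proportions below are hypothetical and are not results from the housing data; the sketch only shows how the formula `100 * sample / overall - 100` turns two proportions into a relative error.

```python
import pandas as pd

# Hypothetical category proportions, for illustration only.
props = pd.DataFrame(
    {
        "Overall": [0.04, 0.32, 0.35, 0.18, 0.11],
        "Random": [0.05, 0.32, 0.34, 0.17, 0.12],
    },
    index=[1.0, 2.0, 3.0, 4.0, 5.0],
)

# Positive values mean the category is over-represented in the sample,
# negative values mean it is under-represented.
props["Rand.%error"] = 100 * props["Random"] / props["Overall"] - 100
print(props.round(2))
```

In this made-up table, category 1 goes from 4% to 5% of the rows, which is a relative error of about +25%, while category 2 is unchanged and has an error of about 0%.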
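# Drop the income_cat attribute so the data returns to its original state:
for set_ in (strat_train_set, strat_test_set):  # set_ avoids shadowing the built-in set
    set_.drop(['income_cat'], axis=1, inplace=True)
# Work on a copy of the training set so the original is not damaged
housing = strat_train_set.copy()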
# Scatter plot
housing.plot(kind='scatter', x='longitude', y='latitude')
# Scatter plot with transparency
housing.plot(kind='scatter', x='longitude', y='latitude', alpha=0.1)
# Scatter plot with color: marker size shows population, color shows median house value
housing.plot(kind='scatter', x='longitude', y='latitude', alpha=0.4,
             s=housing['population'] / 100, label='population', figsize=(10, 7),
             c='median_house_value', cmap=plt.get_cmap('jet'), colorbar=True,
             sharex=False)
plt.legend()
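For readers who do not have the housing data at hand, the colored scatter plot can be tried on synthetic data. Everything in the `demo` DataFrame below is randomly generated and made up; only the column names mirror the ones used above. The sketch assumes matplotlib, numpy and pandas are installed and uses the non-interactive `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data; every value here is made up.
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "longitude": rng.uniform(-124.0, -114.0, n),
    "latitude": rng.uniform(32.0, 42.0, n),
    "population": rng.integers(100, 5000, n),
    "median_house_value": rng.uniform(50_000, 500_000, n),
})

# Marker size encodes population, marker color encodes median house value.
ax = demo.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
               s=demo["population"] / 100, label="population", figsize=(10, 7),
               c="median_house_value", cmap="jet", colorbar=True, sharex=False)
ax.legend()
```

Calling `plt.savefig(...)` with a file name of your choice afterwards writes the figure to disk; in a notebook the plot is shown inline instead.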
# Load the map image that the scatter plot will be drawn on
import matplotlib.image as mpimg
california_img = mpimg.imread(r'F:\python36_data\01–Sklearn 与 TensorFlow 机器学习实用指南中文版 2018.6.20\handson-ml-master\images\end_to_end_project\california.png')