Hands-On Machine Learning - Chapter 1 - ML Landscape - Code Annotation

HHVic

Category: Book: Hands-On Machine Learning, 2nd Edition. Tags: deep learning, machine learning, tensorflow

Article link: https://blog.csdn.net/landian0531/article/details/118342283

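This article uses Python to load and process the OECD life satisfaction data and the IMF GDP data, preprocesses them, and removes outliers. It then visualizes the relationship between the two and builds a model to predict life satisfaction. Finally, it evaluates the model and discusses the possible correlation.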

Contents

Instance
1. Preparation
2. Load and Prepare the Life satisfaction and GDP per capita data
3. Review Dataset Architecture
4. Data Processing
5. Visualize the data
6. Outliers Processing
7. Model Processing

Instance

Suppose you want to know whether money makes people happy, so you download the Better Life Index data from the OECD's website as well as statistics about GDP per capita from the IMF's website. Then you join the tables and sort by GDP per capita.

Reference: https://www.bilibili.com/video/BV1iJ411k7Gg

1. Preparation

Code:

assert: checks a condition. If the condition is true, execution continues; otherwise an AssertionError is raised.

# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)
# Scikit-Learn ≥0.20 is required
# Note: this compares version strings character by character, not numerically
import sklearn
assert sklearn.__version__ >= "0.20"
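
Comparing version strings directly can give surprising results, because strings are compared character by character. The following is a minimal sketch of a numeric comparison. The helper `version_tuple` is hypothetical and is not part of the original notebook.

```python
import re

def version_tuple(version):
    """Hypothetical helper: turn a string such as '1.4.2' into (1, 4, 2)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric piece, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)

# Numeric comparison behaves as expected
newer_ok = version_tuple("1.4.2") >= (0, 20)
older_ok = version_tuple("0.3") >= (0, 20)

# String comparison wrongly treats "0.3" as newer than "0.20"
string_check = "0.3" >= "0.20"

print(newer_ok, older_ok, string_check)
```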

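os.path.join: builds the path to the datasets directory.
The trailing empty string "" makes the resulting path end with a path separator (/ on Linux and macOS, \ on Windows).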

import os
datapath = os.path.join("datasets", "lifesat", "")
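
As a small check of the trailing-separator behavior, the sketch below builds the same path and then appends a file name by plain string concatenation. It only inspects strings and does not require the files to exist.

```python
import os

# The empty last argument makes the joined path end with a separator
datapath = os.path.join("datasets", "lifesat", "")
ends_with_separator = datapath.endswith(os.sep)

# Because of the trailing separator, a file name can be appended directly
csv_path = datapath + "oecd_bli_2015.csv"

print(datapath)
print(csv_path)
```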

%matplotlib inline: Show the figures directly within Jupyter

# To plot pretty figures directly within Jupyter
%matplotlib inline
import matplotlib as mpl
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

2. Load and Prepare the Life satisfaction and GDP per capita data

delimiter='\t': the fields are separated by the tab character.
na_values='n/a': treat the string 'n/a' in the file as a missing value (NaN).

import pandas as pd

# thousands=',' parses numbers such as "1,234" as 1234
oecd_bli = pd.read_csv(datapath+'oecd_bli_2015.csv', thousands=',')
gdp_per_capita = pd.read_csv(datapath+'gdp_per_capita.csv', thousands=',',delimiter='\t', encoding='latin1',na_values='n/a')
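
To see what these read_csv options do without the real files, here is a minimal sketch that parses a tiny in-memory table. The country names and numbers are made up for illustration.

```python
import io
import pandas as pd

# Made-up tab-separated text: one value uses a thousands separator, one is 'n/a'
sample = "Country\t2015\nAlpha\t1,234.5\nBeta\tn/a\n"

df = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",   # fields are separated by tabs
    thousands=",",    # "1,234.5" is parsed as the number 1234.5
    na_values="n/a",  # 'n/a' becomes NaN
)
print(df)
```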

3. Review Dataset Architecture
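
A quick way to review the structure of a DataFrame is to look at its shape, its first rows, and its column types. The sketch below uses a few made-up rows; the column names are assumptions modeled on the long format of the OECD file.

```python
import pandas as pd

# Made-up rows in a long format (illustrative values only)
oecd_bli = pd.DataFrame({
    "Country": ["Alpha", "Alpha", "Beta"],
    "Indicator": ["Life satisfaction", "Life satisfaction", "Life satisfaction"],
    "INEQUALITY": ["TOT", "MN", "TOT"],
    "Value": [7.0, 6.9, 5.5],
})

print(oecd_bli.shape)    # (number of rows, number of columns)
print(oecd_bli.head(2))  # first two rows
oecd_bli.info()          # column names, dtypes and non-null counts
columns = list(oecd_bli.columns)
```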

4. Data Processing

Keep only the rows where Inequality == TOT, that is, the totals rather than the per-group breakdowns.
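
The filter can be sketched with a boolean mask. The column name "INEQUALITY" and the rows below are assumptions for illustration; the original code was only available as an image.

```python
import pandas as pd

# Made-up rows (illustrative values only)
oecd_bli = pd.DataFrame({
    "Country": ["Alpha", "Alpha", "Beta", "Beta"],
    "Indicator": ["Life satisfaction"] * 4,
    "INEQUALITY": ["TOT", "MN", "TOT", "WMN"],
    "Value": [7.0, 6.9, 5.5, 5.6],
})

# Boolean mask: True for the rows to keep
oecd_total = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
print(oecd_total)
```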

Use pivot to reshape the table so that each country becomes a row (the index) and each indicator becomes a column.
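
A minimal sketch of this reshaping, assuming the columns are named "Country", "Indicator" and "Value" (made-up data):

```python
import pandas as pd

# Made-up long-format rows: one row per (country, indicator) pair
long_table = pd.DataFrame({
    "Country": ["Alpha", "Alpha", "Beta", "Beta"],
    "Indicator": ["Life satisfaction", "Employment rate"] * 2,
    "Value": [7.0, 72.0, 5.5, 60.0],
})

# One row per country, one column per indicator
wide_table = long_table.pivot(index="Country", columns="Indicator", values="Value")
print(wide_table)
```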

Rename the column and set the index.
inplace=True: modify the original DataFrame instead of returning a new one.
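
The two steps can be sketched as follows. The column names "2015" and "GDP per capita" and the values are assumptions for illustration.

```python
import pandas as pd

# Made-up GDP table (illustrative values only)
gdp_per_capita = pd.DataFrame({
    "Country": ["Alpha", "Beta"],
    "2015": [50000.0, 12000.0],
})

# With inplace=True both calls change gdp_per_capita itself and return None
gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
print(gdp_per_capita)
```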

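Merge the two tables into one
left_index: if True, use the index (row labels) of the left DataFrame as its join key. For a DataFrame with a MultiIndex (hierarchical index), the number of levels must match the number of join keys in the right DataFrame.
right_index: same as left_index, but for the right DataFrame.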
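
Here is a minimal sketch of an index-on-index merge with made-up data. With the default inner join, only countries present in both tables are kept.

```python
import pandas as pd

# Made-up tables, both indexed by country (illustrative values only)
life = pd.DataFrame(
    {"Life satisfaction": [7.0, 5.5, 6.1]},
    index=pd.Index(["Alpha", "Beta", "Gamma"], name="Country"),
)
gdp = pd.DataFrame(
    {"GDP per capita": [50000.0, 12000.0]},
    index=pd.Index(["Alpha", "Beta"], name="Country"),
)

# Join on the row labels of both tables; "Gamma" has no GDP row and is dropped
full_stats = pd.merge(left=life, right=gdp, left_index=True, right_index=True)
print(full_stats)
```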

Sort the rows by GDP per capita
The default order is ascending.
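
A short sketch of the sort on made-up data, assuming the column is named "GDP per capita":

```python
import pandas as pd

# Made-up merged table (illustrative values only)
full_stats = pd.DataFrame(
    {"Life satisfaction": [7.0, 5.5, 6.1],
     "GDP per capita": [50000.0, 12000.0, 30000.0]},
    index=pd.Index(["Alpha", "Beta", "Gamma"], name="Country"),
)

# Ascending by default; pass ascending=False for the reverse order
full_stats.sort_values(by="GDP per capita", inplace=True)
print(full_stats)
```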

Merge the final data

5. Visualize the data

kind: the type of plot to draw (for example 'scatter'); see https://blog.csdn.net/h_hxx/article/details/90635650
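
A minimal sketch of a scatter plot drawn with DataFrame.plot on made-up data. The Agg backend is selected here only so the example also runs outside Jupyter; in a notebook, %matplotlib inline takes its place.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for running as a script
import pandas as pd

# Made-up data (illustrative values only)
country_stats = pd.DataFrame(
    {"GDP per capita": [12000.0, 30000.0, 50000.0],
     "Life satisfaction": [5.5, 6.1, 7.0]},
    index=pd.Index(["Beta", "Gamma", "Alpha"], name="Country"),
)

# kind selects the plot type; x and y name the columns to plot
ax = country_stats.plot(kind="scatter", x="GDP per capita", y="Life satisfaction")
```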

Save data to a file

country_stats.to_csv('country_stats.csv')

6. Outliers Processing

Remove the outliers
iloc: selects rows and columns by integer position.
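
Position-based removal can be sketched like this. The positions in `remove_indices` and the data are made up; which rows count as outliers in the real dataset is not recoverable from the text.

```python
import pandas as pd

# Made-up table, already sorted by GDP per capita (illustrative values only)
full_stats = pd.DataFrame(
    {"GDP per capita": [9000.0, 12000.0, 30000.0, 50000.0, 100000.0],
     "Life satisfaction": [6.9, 5.5, 6.1, 7.0, 6.8]},
    index=pd.Index(["Delta", "Beta", "Gamma", "Alpha", "Epsilon"], name="Country"),
)

# Hypothetical positions of the rows to set aside
remove_indices = [0, 4]
keep_indices = sorted(set(range(len(full_stats))) - set(remove_indices))

sample_data = full_stats.iloc[keep_indices]     # rows kept for training
missing_data = full_stats.iloc[remove_indices]  # rows set aside
print(sample_data)
```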

Visualize the outlier data

7. Model Processing

Show valuable data

Model conjecture (hypothesis)

Train the Model

Get the best-fit intercept and coefficient of the linear model
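
Training and reading back the parameters can be sketched with scikit-learn's LinearRegression. The data below is made up and lies exactly on a line, so the fitted values are easy to check; the names theta0 and theta1 are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data on the line y = 4.5 + 0.00005 * x (illustrative values only)
X = np.array([[10000.0], [20000.0], [30000.0], [40000.0]])  # GDP per capita
y = np.array([[5.0], [5.5], [6.0], [6.5]])                  # life satisfaction

model = LinearRegression()
model.fit(X, y)

# With a 2D target, intercept_ has shape (1,) and coef_ has shape (1, 1)
theta0 = model.intercept_[0]
theta1 = model.coef_[0][0]
print(theta0, theta1)
```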

Input Cyprus's GDP per capita and predict its life satisfaction
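
A prediction for a new country can be sketched as below. The training data and the new GDP value are made up; the real figure for Cyprus is not given in the text, so a hypothetical value stands in for it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data on the line y = 4.5 + 0.00005 * x
X = np.array([[10000.0], [20000.0], [30000.0], [40000.0]])
y = np.array([[5.0], [5.5], [6.0], [6.5]])
model = LinearRegression().fit(X, y)

# Hypothetical GDP per capita of the new country; predict expects a 2D array
X_new = [[25000.0]]
prediction = model.predict(X_new)
predicted_satisfaction = prediction[0][0]
print(predicted_satisfaction)
```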

Visualize all data
