Logistic Regression Training


追忆无义


Category: machine learning. Tags: logistic regression, pandas, python, machine learning, sklearn

Article link: https://blog.csdn.net/m0_66235114/article/details/126218519


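This article walks through training a logistic regression model with Python's Scikit-Learn library to predict the weather in Australia, specifically whether it will rain tomorrow. Working with the Rain dataset and its meteorological variables, it covers data preprocessing, model building, and performance evaluation, and finally tunes the model with GridSearchCV.

Summary generated automatically by CSDN.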

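Background and Task

This project implements logistic regression with Python and Scikit-Learn and builds a classifier that predicts whether it will rain tomorrow in Australia.
The model is a binary classifier trained with logistic regression.
The project uses the Australian Rain dataset.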
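
As a rough illustration of the task described above, the following minimal sketch trains a binary logistic regression classifier with Scikit-Learn. It does not use the real Rain dataset: the two features, the rule that generates the labels, and every variable name are assumptions made up for the example, and the scaling step is one reasonable choice rather than the article's confirmed preprocessing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather data (NOT the real Rain dataset).
rng = np.random.default_rng(0)
n_samples = 1000
humidity = rng.uniform(0, 100, n_samples)   # hypothetical feature, percent
pressure = rng.normal(1015, 7, n_samples)   # hypothetical feature, hPa

# Assumed toy rule: high humidity and low pressure make rain more likely.
score = 0.08 * (humidity - 60) - 0.3 * (pressure - 1015) + rng.normal(0, 1, n_samples)
y = (score > 0).astype(int)                 # 1 = rain tomorrow, 0 = no rain
X = np.column_stack([humidity, pressure])

# Hold out 20% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardise the features, then fit the logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                    # hard 0/1 predictions
rain_proba = model.predict_proba(X_test)[:, 1]    # estimated P(rain tomorrow)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy on the toy data: {test_accuracy:.3f}")
```

The same fit/predict pattern would apply to the real data once its missing values and categorical columns have been handled.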

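Dataset Fields

Date: Date of observation
Location: Common name of the weather station location
MinTemp: Minimum temperature (degrees Celsius)
MaxTemp: Maximum temperature (degrees Celsius)
Rainfall: Rainfall recorded for the day (mm)
Evaporation: Class A pan evaporation (mm) in the 24 hours to 9am
Sunshine: Number of hours of bright sunshine in the day
WindGustDir: Direction of the strongest wind gust in the 24 hours to midnight
WindGustSpeed: Speed (km/h) of the strongest wind gust in the 24 hours to midnight
WindDir9am: Wind direction at 9am
WindDir3pm: Wind direction at 3pm
WindSpeed9am: Wind speed (km/h) averaged over the 10 minutes before 9am
WindSpeed3pm: Wind speed (km/h) averaged over the 10 minutes before 3pm
Humidity9am: Humidity at 9am (percent)
Humidity3pm: Humidity at 3pm (percent)
Pressure9am: Atmospheric pressure (hPa) reduced to mean sea level at 9am
Pressure3pm: Atmospheric pressure (hPa) reduced to mean sea level at 3pm
Cloud9am: Fraction of the sky obscured by cloud at 9am, measured in oktas (eighths): it records how many eighths of the sky are covered by cloud
Cloud3pm: Fraction of the sky obscured by cloud at 3pm, measured in oktas (eighths)
Temp9am: Temperature at 9am (degrees Celsius)
Temp3pm: Temperature at 3pm (degrees Celsius)
RainToday: Boolean: 1 if precipitation (mm) in the 24 hours to 9am exceeds 1 mm, otherwise 0 (stored as Yes/No in the CSV)
RainTomorrow: Whether it rains the next day (the value to predict)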
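
The field list describes RainToday as a 1/0 flag, while the rows printed from the CSV further down show the values No and Yes. One way to bridge the two is a simple mapping, sketched here on a tiny hand-made frame. The frame `sample` and the dictionary `yes_no` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hand-made frame mimicking how the two rain columns look in the CSV,
# including a missing value in each column.
sample = pd.DataFrame({
    "RainToday": ["No", "Yes", None, "No"],
    "RainTomorrow": ["Yes", "No", "No", None],
})

yes_no = {"No": 0, "Yes": 1}   # assumed encoding: Yes -> 1, No -> 0

encoded = sample.copy()
for col in ["RainToday", "RainTomorrow"]:
    # Values absent from the mapping (here: missing entries) become NaN.
    encoded[col] = encoded[col].map(yes_no)

print(encoded)
```

Missing entries stay missing after the mapping, so they still need to be dropped or imputed before training.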

Loading the Data

import pandas as pd
import numpy as np

raw_df = pd.read_csv('./weather-dataset-rattle-package/weatherAUS.csv')

raw_df.head()

	Date	Location	MinTemp	MaxTemp	Rainfall	Evaporation	Sunshine	WindGustDir	WindGustSpeed	WindDir9am	...	Humidity9am	Humidity3pm	Pressure9am	Pressure3pm	Cloud9am	Cloud3pm	Temp9am	Temp3pm	RainToday	RainTomorrow
0	2008-12-01	Albury	13.4	22.9	0.6	NaN	NaN	W	44.0	W	...	71.0	22.0	1007.7	1007.1	8.0	NaN	16.9	21.8	No	No
1	2008-12-02	Albury	7.4	25.1	0.0	NaN	NaN	WNW	44.0	NNW	...	44.0	25.0	1010.6	1007.8	NaN	NaN	17.2	24.3	No	No
2	2008-12-03	Albury	12.9	25.7	0.0	NaN	NaN	WSW	46.0	W	...	38.0	30.0	1007.6	1008.7	NaN	2.0	21.0	23.2	No	No
3	2008-12-04	Albury	9.2	28.0	0.0	NaN	NaN	NE	24.0	SE	...	45.0	16.0	1017.6	1012.8	NaN	NaN	18.1	26.5	No	No
4	2008-12-05	Albury	17.5	32.3	1.0	NaN	NaN	W	41.0	ENE	...	82.0	33.0	1010.8	1006.0	7.0	8.0	17.8	29.7	No	No

5 rows × 23 columns

raw_df.shape

(145460, 23)

raw_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           145460 non-null  object 
 1   Location       145460 non-null  object 
 2   MinTemp        143975 non-null  float64
 3   MaxTemp        144199 non-null  float64
 4   Rainfall       142199 non-null  float64
 5   Evaporation    82670 non-null   float64
 6   Sunshine       75625 non-null   float64
 7   WindGustDir    135134 non-null  object 
 8   WindGustSpeed  135197 non-null  float64
 9   WindDir9am     134894 non-null  object 
 10  WindDir3pm     141232 non-null  object 
 11  WindSpeed9am   143693 non-null  float64
 12  WindSpeed3pm   142398 non-null  float64
 13  Humidity9am    142806 non-null  float64
 14  Humidity3pm    140953 non-null  float64
 15  Pressure9am    130395 non-null  float64
 16  Pressure3pm    130432 non-null  float64
 17  Cloud9am       89572 non-null   float64
 18  Cloud3pm       86102 non-null   float64
 19  Temp9am        143693 non-null  float64
 20  Temp3pm        141851 non-null  float64
 21  RainToday      142199 non-null  object 
 22  RainTomorrow   142193 non-null  object 
dtypes: float64(16), object(7)
memory usage: 25.5+ MB
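
The non-null counts above show that several columns are far from complete: Sunshine has 75625 values out of 145460 rows (about 48% missing), and the target RainTomorrow is missing in 3267 rows. A minimal sketch of how the missing share per column could be computed, and how unlabelled rows could be dropped, is shown below on a toy frame. The frame `toy` is made up for illustration, and dropping rows without a target is an assumption about the preprocessing, not a step confirmed by the article.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "Sunshine": [np.nan, 7.5, np.nan, 10.1],
    "MinTemp": [13.4, 7.4, 12.9, np.nan],
    "RainTomorrow": ["No", None, "Yes", "No"],
})

# Share of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Rows without a target cannot be used for supervised training,
# so one common choice is to drop them.
labelled = toy.dropna(subset=["RainTomorrow"])
print(len(labelled), "rows keep a label")
```

On the real data the same two lines would be applied to `raw_df` instead of `toy`.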

raw_df.describe()

	MinTemp	MaxTemp	Rainfall	Evaporation	Sunshine	WindGustSpeed	WindSpeed9am	WindSpeed3pm	Humidity9am	Humidity3pm	Pressure9am	Pressure3pm	Cloud9am	Cloud3pm	Temp9am	Temp3pm
count	143975.000000	144199.000000	142199.000000	82670.000000	75625.000000	135197.000000	143693.000000	142398.000000	142806.000000	140953.000000	130395.00000	130432.000000	89572.000000	86102.000000	143693.000000	141851.00000
mean	12.194034	23.221348	2.360918	5.468232	7.611178	40.035230	14.043426	18.662657	68.880831	51.539116	1017.64994	1015.255889	4.447461	4.509930	16.990631	21.68339
std	6.398495	7.119049	8.478060	4.193704	3.785483	13.607062	8.915375	8.809800	19.029164	20.795902	7.10653	7.037414	2.887159	2.720357	6.488753	6.93665
min	-8.500000	-4.800000	0.000000	0.000000	0.000000	6.000000	0.000000	0.000000	0.000000	0.000000	980.50000	977.100000	0.000000	0.000000	-7.200000	-5.40000
25%	7.600000	17.900000	0.000000	2.600000	4.800000	31.000000	7.000000	13.000000	57.000000	37.000000	1012.90000	1010.400000	1.000000	2.000000	12.300000	16.60000
50%	12.000000	22.600000	0.000000	4.800000	8.400000	39.000000	13.000000	19.000000	70.000000	52.000000	1017.60000	1015.200000	5.000000	5.000000	16.700000	21.10000
75%	16.900000	28.200000	0.800000	7.400000	10.600000	48.000000	19.000000	24.000000	83.000000	66.000000	1022.40000	1020.000000	7.000000	7.000000	21.600000	26.40000
max	33.900000	48.100000	371.000000	145.000000	14.500000	135.000000	130.000000	87.000000	100.000000	100.000000	1041.00000	1039.600000	9.000000	9.000000	40.200000	46.70000

Exploratory Data Analysis

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

sns.set_style('darkgrid')
# matplotlib.rcParams['font.size'] = 14
# matplotlib.rcParams['figure.figsize'] = (14, 6)
# matplotlib.rcParams['figure.facecolor'] = '#00000000'

sns.boxplot(y = raw_df.Rainfall);

[Figure: box plot of the Rainfall column]
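
The summary statistics already hint at what the box plot shows: the 75th percentile of Rainfall is 0.8 mm while the maximum is 371 mm, so the distribution has a long right tail. By default a seaborn box plot draws its whiskers at 1.5 times the interquartile range (IQR) and marks points beyond them individually. The sketch below reproduces that rule by hand on a small made-up series; the values in `rainfall` are illustrative, not taken from the dataset.

```python
import pandas as pd

# Made-up daily rainfall values (mm) with a long right tail.
rainfall = pd.Series([0.0, 0.0, 0.0, 0.2, 0.6, 1.0, 3.4, 12.0, 85.0])

q1 = rainfall.quantile(0.25)   # first quartile
q3 = rainfall.quantile(0.75)   # third quartile
iqr = q3 - q1                  # interquartile range

# Points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = rainfall[(rainfall < lower_fence) | (rainfall > upper_fence)]

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print(outliers.tolist())
```

Whether such points should be capped, transformed, or kept depends on the model; heavy rain days are real observations, not measurement errors.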

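# Note: distplot is deprecated since seaborn 0.11; histplot or displot is the suggested replacement.
sns.distplot(a = raw_df.Rainfall, label = 'Rainfall')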
