Datawhale AI Summer Camp: Task 01 Code Walkthrough

Article link: https://blog.csdn.net/shf10/article/details/140425426


Competition Overview
Baseline Code Walkthrough

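Competition Overview

Task Description

This is a data and algorithm challenge on the topic of "electricity demand forecasting". Participants build a model from historical data that accurately predicts future electricity demand.
Given N days of historical electricity consumption sequences and related information for multiple houses, predict the electricity consumption of each house.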

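Data Description

The competition data consists of a training set and a test set. To keep the competition fair, the daily dates are anonymized and labeled 1 to N, where 1 is the most recent day in the dataset and days 1-10 form the test set. The dataset has the fields id (house id), dt (day index), type (house type), and target (actual electricity consumption).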

id	dt	type	target
house id	day index	house type	actual electricity consumption / prediction target

Apart from the house id, the only actual features are dt and type.
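To make the feature/target split concrete, here is a minimal sketch on a toy table in the same schema. The frame `toy_train`, its values, and the names `feature_cols`, `X`, and `y` are assumptions made for illustration; they are not taken from the competition data or the baseline.

```python
import pandas as pd

# Toy rows in the competition's schema; all values are made up
toy_train = pd.DataFrame({
    "id": ["h1", "h1", "h2", "h2"],
    "dt": [11, 12, 11, 12],
    "type": [2, 2, 4, 4],
    "target": [44.0, 50.5, 10.0, 12.5],
})

# dt and type are the only feature columns; target is what we predict
feature_cols = ["dt", "type"]
X = toy_train[feature_cols]
y = toy_train["target"]
```

The id column is kept out of `X` here because it only identifies the house; whether to encode it as a feature is a separate modeling choice.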

Evaluation Metric

$\frac{1}{n} \sum_{i=1}^n( y_i - \hat y_i)^2$

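Here $y_i$ is the true value, $\hat y_i$ is the predicted value, and $n$ is the number of predictions. The smaller the loss, the closer the predictions are to the true values and the more accurate the model.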
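The formula above is the mean squared error, and it can be sketched in a few lines of numpy. The function name `mean_squared_error` and the toy values are assumptions for illustration; the competition's official scoring script is not shown in this post.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Return the mean of the squared differences between truth and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))

# Toy values, made up for illustration only
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
score = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
```

Because the errors are squared, one large miss costs more than several small ones, so a model that is occasionally far off is penalized heavily under this metric.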

Baseline Code Walkthrough

Import the common data science packages: pandas and numpy

# 1. Import the required libraries
# pandas: data processing and analysis
import pandas as pd
# numpy: scientific computing and multi-dimensional arrays
import numpy as np

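Exploratory Data Analysis

The train set has four columns: id, dt, type, and target. The test set has only three (id, dt, and type), so target has to be predicted.
In the train set, dt starts at 11 and increases; in the test set, dt runs from 1 to 10.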

# 2. Read the training set and the test set
# Use read_csv() to load the training data from 'train.csv'
train = pd.read_csv('./train.csv')
# Use read_csv() to load the test data from 'test.csv'
test = pd.read_csv('./test.csv')

train[:12]

	id	dt	type	target
0	00037f39cf	11	2	44.050
1	00037f39cf	12	2	50.672
2	00037f39cf	13	2	39.042
3	00037f39cf	14	2	35.900
4	00037f39cf	15	2	53.888
5	00037f39cf	16	2	35.534
6	00037f39cf	17	2	41.280
7	00037f39cf	18	2	26.114
8	00037f39cf	19	2	25.612
9	00037f39cf	20	2	44.451
10	00037f39cf	21	2	50.395
11	00037f39cf	22	2	34.662

test[:12]

	id	dt	type
0	00037f39cf	1	2
1	00037f39cf	2	2
2	00037f39cf	3	2
3	00037f39cf	4	2
4	00037f39cf	5	2
5	00037f39cf	6	2
6	00037f39cf	7	2
7	00037f39cf	8	2
8	00037f39cf	9	2
9	00037f39cf	10	2
10	00039a1517	1	4
11	00039a1517	2	4
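The observations from the two previews can also be checked programmatically. The sketch below uses small stand-in frames (`toy_train`, `toy_test`, with made-up values) so that it runs without the competition files; with the real data, the same checks would be applied to `train` and `test`.

```python
import pandas as pd

# Stand-in frames with made-up values; swap in the real train/test frames
toy_train = pd.DataFrame({
    "id": ["a"] * 3 + ["b"] * 3,
    "dt": [11, 12, 13, 11, 12, 13],
    "type": [2, 2, 2, 4, 4, 4],
    "target": [44.0, 50.5, 39.0, 10.0, 11.0, 12.0],
})
toy_test = pd.DataFrame({
    "id": ["a"] * 10 + ["b"] * 10,
    "dt": list(range(1, 11)) * 2,
    "type": [2] * 10 + [4] * 10,
})

# Which columns exist in train but not in test? Expect only the target.
missing_in_test = set(toy_train.columns) - set(toy_test.columns)

# Range of the day index in each split
train_dt_range = (int(toy_train["dt"].min()), int(toy_train["dt"].max()))
test_dt_range = (int(toy_test["dt"].min()), int(toy_test["dt"].max()))

# Number of distinct days to predict for each house
days_per_house = toy_test.groupby("id")["dt"].nunique()

# The splits should not share any day index
no_overlap = toy_train["dt"].min() > toy_test["dt"].max()
```

Since dt = 1 is the most recent day, the test days (1 to 10) come after every training day in calendar order, even though their dt values are smaller.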