Kaggle Data Cleaning Challenge Day 3 - Parsing Date Data Quickly


风控大鱼


Article link: https://blog.csdn.net/cyan_soul/article/details/79751632


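Today is day three of the Kaggle data cleaning challenge, and the task is parsing date data. Most of us have run into this: a dataset contains dates we want to analyze, but they are stored as strings, which makes them awkward to plot and unsuitable as a factor (feature) for prediction. Or you may receive full timestamps (e.g. 2005-10-30T10:45 UTC) when all you need is the year and month. In both cases, we can use Python to parse the values.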
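As a small illustration of the timestamp case, here is a minimal sketch using pandas' `to_datetime` and the `.dt` accessor. The two timestamp values are made up, and they are written in ISO 8601 form with a trailing `Z` (meaning UTC) rather than the `UTC` suffix above, since that form parses reliably:

```python
import pandas as pd

# Hypothetical example values: timestamps stored as plain strings
raw = pd.Series(["2005-10-30T10:45:00Z", "2012-03-05T08:00:00Z"])

# Parse the strings into timezone-aware datetimes in UTC
ts = pd.to_datetime(raw, utc=True)

# Keep only the parts we actually need: year and month
year_month = pd.DataFrame({"year": ts.dt.year, "month": ts.dt.month})
print(year_month)
```

Once the column is a real datetime type, the other components (`.dt.day`, `.dt.hour`, and so on) are available the same way.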

The walkthrough has 5 parts:

Get our environment set up
Check the data type of our date column
Convert our date columns to datetime
Select just the day of the month from our column
Plot the day of the month to check the date parsing
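The steps listed above can be sketched end to end on a tiny hand-made column. This is a minimal sketch, not the challenge's exact code: the DataFrame and its values are hypothetical, and it assumes the dates are stored as month/day/two-digit-year strings such as `3/2/07`, hence `format="%m/%d/%y"`. Passing an explicit `format` is faster and safer than letting pandas guess.

```python
import pandas as pd

# Hypothetical stand-in for a date column read from a CSV: text, plus one missing value
df = pd.DataFrame({"date": ["3/2/07", "3/22/07", "4/6/07", None]})

# Step 2: check the data type; text columns usually show up as "object"
# (newer pandas versions may report a dedicated string dtype instead)
print(df["date"].dtype)

# Step 3: convert to datetime with an explicit format (assumed: month/day/2-digit year);
# the missing value becomes NaT
df["date_parsed"] = pd.to_datetime(df["date"], format="%m/%d/%y")

# Step 4: select just the day of the month; NaT turns into NaN here
day_of_month = df["date_parsed"].dt.day

# Step 5: sanity-check the parsed days; every value should fall within 1..31
counts = day_of_month.dropna().astype(int).value_counts().sort_index()
print(counts)
```

For the plotting step, a histogram of `day_of_month.dropna()` with 31 bins (for example `sns.histplot(..., bins=31)` in recent seaborn versions) should look roughly flat if the days and months were not swapped during parsing.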

1. Set up the environment

First, as before, import the libraries we need and load the dataset. Today's data is a catalog of landslide events:

# modules we'll use
import pandas as pd
import numpy as np
import seaborn as sns
import datetime

# read in our data
landslides = pd.read_csv("../input/landslide-events/catalog.csv")

# set seed for reproducibility
np.random.seed(0)
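To see what the setup code does without the Kaggle input directory, here is a sketch that swaps the CSV file for an inline string. The column names and rows are hypothetical, not taken from `catalog.csv`. It shows that `read_csv` leaves a date column as text unless told otherwise, and that `np.random.seed(0)` makes pandas' default random sampling repeatable:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline stand-in for ../input/landslide-events/catalog.csv
csv_text = """id,date,country_name
34,3/2/07,United States
42,3/22/07,United States
56,4/6/07,Canada
"""
landslides = pd.read_csv(io.StringIO(csv_text))

# Same seed as above; DataFrame.sample() draws from NumPy's global generator
# when no random_state is passed, so the seed makes the draw repeatable
np.random.seed(0)

# "id" is read as an integer column, while "date" stays as plain text
print(landslides.dtypes)

# Look at a couple of random rows; re-running with the same seed gives the same rows
sample = landslides.sample(2)
print(sample)
```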