1001 Series, Case 0002: Visual Data Mining of the Stockholm Temperature Dataset

DataMiningSharer

Categories: Python Basics, Feature Engineering, Data Visualization. Tags: mysql, hadoop, python, data mining, machine learning

Article link: https://blog.csdn.net/lqw844597536/article/details/117262110

This case focuses on hands-on practice with the basic operations of Matplotlib visualization.

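import os                   # import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

os.chdir(r"D:\Datalu\File")  # set the working directory (raw string keeps the backslashes literal)

# required plotting setup; the comment is kept off the magic line because IPython may treat trailing text as arguments
%matplotlib inline
plt.rcParams["font.sans-serif"] = ["KAITI"]   # font able to render Chinese labels
plt.rcParams["axes.unicode_minus"] = False    # render minus signs correctly with that font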

I. Problem Statement

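1. View the average temperature over one year
2. View the temperatures in January
3. View the average temperature for each month (bar chart and box plot)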
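The three questions above could be approached along the following lines. This is only a sketch on synthetic data, not the article's dataset: the `outdoor` values come from a made-up seasonal curve plus noise, the name `demo` is hypothetical, and the 10-minute sampling interval is an assumption based on the spacing of the timestamps shown later in the article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the outdoor series (NOT the real Stockholm data).
idx = pd.date_range("2014-01-01", "2014-12-31 23:50", freq="10min")
rng = np.random.default_rng(0)
day = idx.dayofyear.to_numpy()
seasonal = 8 - 12 * np.cos(2 * np.pi * (day - 15) / 365)
demo = pd.DataFrame({"outdoor": seasonal + rng.normal(0, 2, len(idx))}, index=idx)

# 1. Average temperature over the whole year
yearly_mean = demo["outdoor"].mean()

# 2. Temperatures in January only
january = demo[demo.index.month == 1]

# 3. Average temperature for each month
by_month = demo["outdoor"].groupby(demo.index.month)
monthly_mean = by_month.mean()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
monthly_mean.plot.bar(ax=axes[0])                  # bar chart of monthly means
axes[0].set_xlabel("month")
axes[0].set_ylabel("mean outdoor temperature")
axes[1].boxplot([g.values for _, g in by_month])   # one box per month, positions 1..12
axes[1].set_xlabel("month")
axes[1].set_ylabel("outdoor temperature")
```

Grouping on `index.month` only works once the time column has been converted to datetimes and set as the index.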

1.1 Import the two temperature datasets

# import the outdoor temperature data
df1 = pd.read_csv("temperature_outdoor_2014.tsv",delimiter="\t", names=["time", "outdoor"])
df1.head(2)

	time	outdoor
0	1388530986	4.38
1	1388531586	4.25

# import the indoor temperature data
df2 = pd.read_csv("temperature_indoor_2014.tsv",delimiter="\t", names=["time2", "indoor"])
df2.head(2)

	time2	indoor
0	1388530986	21.94
1	1388531586	22.00

# combine the two datasets side by side (rows are paired by index position, not by timestamp)
df = pd.concat([df1,df2],join="inner",axis=1)
df

	time	outdoor	time2	indoor
0	1388530986	4.38	1388530986	21.94
1	1388531586	4.25	1388531586	22.00
2	1388532187	4.19	1388532187	22.00
3	1388532787	4.06	1388532787	22.00
4	1388533388	4.06	1388533388	22.00
...	...	...	...	...
49540	1419975991	1.44	1419977793	11.75
49541	1419976592	1.50	1419978393	11.75
49542	1419977192	1.50	1419978994	11.75
49543	1419977793	1.56	1419979595	11.75
49544	1419978393	1.62	1419980195	11.81

49545 rows × 4 columns
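Note that `pd.concat(..., axis=1)` pairs rows by index position, so the two files only line up if both sensors were sampled at exactly the same moments. In the output above the last rows show `time` and `time2` drifting apart (for example 1419975991 versus 1419977793), so the `indoor` values there belong to a different instant than the `outdoor` values. One possible alternative is to match on the timestamp itself with `pd.merge_asof`. The sketch below uses a few made-up rows, and the 300-second tolerance is an arbitrary assumption.

```python
import pandas as pd

# Tiny made-up samples; one indoor reading is missing on purpose.
outdoor = pd.DataFrame({"time": [1000, 1600, 2200, 2800],
                        "outdoor": [4.38, 4.25, 4.19, 4.06]})
indoor = pd.DataFrame({"time": [1000, 2201, 2800],
                       "indoor": [21.94, 22.00, 22.06]})

# Positional pairing (what concat with axis=1 does): row 1 pairs 1600 with 2201.
by_position = pd.concat(
    [outdoor, indoor.rename(columns={"time": "time2"})], axis=1, join="inner"
)

# Timestamp-based pairing: for each outdoor row take the nearest indoor row
# within 300 seconds. Both frames must already be sorted by the key column.
by_time = pd.merge_asof(outdoor, indoor, on="time",
                        direction="nearest", tolerance=300)
print(by_time)
```

With this approach an outdoor reading that has no indoor reading close enough gets `NaN` instead of a value from a different moment.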

df.columns

Index(['time', 'outdoor', 'time2', 'indoor'], dtype='object')

df.drop('time2',axis=1,inplace=True)

df.head(2)

	time	outdoor	indoor
0	1388530986	4.38	21.94
1	1388531586	4.25	22.00

dt1 = df.copy()

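The dataset has three columns: one holds timestamps and the other two hold temperature readings.
There are two ways to convert the timestamps into datetimes: the first is to convert them while importing the file, the second is to use the to_datetime method.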
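Both routes can be sketched as follows. The example reads two rows copied from the output above out of an in-memory string rather than the real `.tsv` file. Using `converters` for the import-time route is one option among several and is an assumption here, not necessarily what the author had in mind. The resulting datetimes are expressed in UTC.

```python
import io
import pandas as pd

raw = "1388530986\t4.38\n1388531586\t4.25\n"  # two rows from the outdoor file

# Route 1: convert while importing. A converter receives each cell as a string.
at_import = pd.read_csv(
    io.StringIO(raw), delimiter="\t", names=["time", "outdoor"],
    converters={"time": lambda s: pd.to_datetime(int(s), unit="s")},
)

# Route 2: import the integers as they are, then convert the whole column at once.
after_import = pd.read_csv(io.StringIO(raw), delimiter="\t", names=["time", "outdoor"])
after_import["time"] = pd.to_datetime(after_import["time"], unit="s")

print(after_import)
```

Route 2 is vectorized and is generally the simpler choice.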

II. Inspecting Basic Information About the Data

dt1.info(memory_usage="deep")    # the time column was not automatically recognized as a datetime

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49545 entries, 0 to 49544
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   time     49545 non-null  int64  
 1   outdoor  49545 non-null  float64
 2   indoor   49545 non-null  float64
dtypes: float64(2), int64(1)
memory usage: 1.1 MB

dt1.values   # view the values of the dataset

array([[1.38853099e+09, 4.38000000e+00, 2.19400000e+01],
       [1.38853159e+09, 4.25000000e+00, 2.20000000e+01],
       [1.38853219e+09, 4.19000000e+00, 2.20000000e+01],
       ...,
       [1.41997719e+09, 1.50000000e+00, 1.17500000e+01],
       [1.41997779e+09, 1.56000000e+00, 1.17500000e+01],
       [1.41997839e+09, 1.62000000e+00, 1.18100000e+01]])

dt1.values[:,0]   # view the values of a single column

array([1.38853099e+09, 1.38853159e+09, 1.38853219e+09, ...,
       1.41997719e+09, 1.41997779e+09, 1.41997839e+09])

dt1.time.values  # the values can also be accessed through the column name

array([1388530986, 1388531586, 1388532187, ..., 1419977192, 1419977793,
       1419978393], dtype=int64)
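The difference between the outputs above comes from dtype handling: `DataFrame.values` has to return one homogeneous array, so the `int64` time column is upcast to `float64` alongside the float temperature columns, whereas a single column keeps its own dtype. A small sketch with two rows from the table above (the name `demo` is hypothetical):

```python
import pandas as pd

demo = pd.DataFrame({"time": [1388530986, 1388531586],
                     "outdoor": [4.38, 4.25],
                     "indoor": [21.94, 22.00]})

whole = demo.values            # mixed int/float columns -> a single float64 array
one_col = demo["time"].values  # a single column keeps its int64 dtype

print(whole.dtype, one_col.dtype)
```

The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for new code; it applies the same upcasting.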

2.1 Convert the timestamps to datetime format

dt1["time"] = pd.Timestamp(dt1["time"],unit="s") # converting the timestamps this way is wrong
dt1

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-140-255e05936dae> in <module>
----> 1 dt1["time"] = pd.Timestamp(dt1["time"],unit="s") # converting the timestamps this way is wrong
      2 dt1


pandas\_libs\tslibs\timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()


pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()


TypeError: Cannot convert input [0        1388530986
1        1388531586
2        1388532187
3        1388532787
4        1388533388
            ...    
49540    1419975991
49541    1419976592
49542    1419977192
49543    1419977793
49544    1419978393
Name: time, Length: 49545, dtype: int64] of type <class 'pandas.core.series.Series'> to Timestamp
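The error above arises because `pd.Timestamp` builds a single scalar value and cannot take a whole Series. The vectorized counterpart is `pd.to_datetime`, which accepts the same `unit="s"` argument. A minimal sketch using the first two timestamps from the dataset follows; the conversion to the `Europe/Stockholm` time zone is an added assumption about how the readings should be interpreted, not something the article states.

```python
import pandas as pd

times = pd.Series([1388530986, 1388531586], name="time")

# A single value works with pd.Timestamp ...
first = pd.Timestamp(int(times.iloc[0]), unit="s")

# ... while a whole column needs pd.to_datetime.
converted = pd.to_datetime(times, unit="s")

# Optional (assumption): treat the values as UTC and express them in Stockholm local time.
local = converted.dt.tz_localize("UTC").dt.tz_convert("Europe/Stockholm")
print(local)
```

Under that assumption the first reading falls just after midnight on 1 January 2014 local time, which fits a file covering the year 2014.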

df["time"] = df["time"].apply(lambda x:pd.TimeStamp(x))  # this conversion is also wrong
df
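Two things go wrong in the line above. The class is spelled `pd.Timestamp`, so `pd.TimeStamp` raises an `AttributeError`. Even with the correct spelling, a bare integer is interpreted as nanoseconds since the epoch, not seconds. The sketch below illustrates the second point and an element-wise version that does work, although the vectorized `pd.to_datetime(..., unit="s")` is usually preferable.

```python
import pandas as pd

times = pd.Series([1388530986, 1388531586], name="time")

# Without a unit, the integer is read as nanoseconds after 1970-01-01.
wrong = pd.Timestamp(1388530986)
print(wrong)  # 1970-01-01 00:00:01.388530986

# Element-wise conversion that works: correct class name plus unit="s".
fixed = times.apply(lambda x: pd.Timestamp(int(x), unit="s"))
print(fixed)
```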
