This case study is a hands-on exercise in the basic operations of Matplotlib visualization.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
os.chdir(r"D:\Datalu\File")  # raw string, so the backslashes are taken literally
%matplotlib inline
plt.rcParams["font.sans-serif"] = ["KAITI"]  # KaiTi, a font that can render Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # keep minus signs readable with that font
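I. Problem statement
1. View the average temperature over the year
2. View the temperatures in January
3. Average temperature for each month (bar chart and box plot)
1.1 Import the two temperature datasets
df1 = pd.read_csv("temperature_outdoor_2014.tsv", delimiter="\t", names=["time", "outdoor"])
df1.head(2)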
         time  outdoor
0  1388530986     4.38
1  1388531586     4.25
df2 = pd.read_csv("temperature_indoor_2014.tsv", delimiter="\t", names=["time2", "indoor"])
df2.head(2)
        time2  indoor
0  1388530986   21.94
1  1388531586   22.00
df = pd.concat([df1, df2], join="inner", axis=1)
df
             time  outdoor       time2  indoor
0      1388530986     4.38  1388530986   21.94
1      1388531586     4.25  1388531586   22.00
2      1388532187     4.19  1388532187   22.00
3      1388532787     4.06  1388532787   22.00
4      1388533388     4.06  1388533388   22.00
...           ...      ...         ...     ...
49540  1419975991     1.44  1419977793   11.75
49541  1419976592     1.50  1419978393   11.75
49542  1419977192     1.50  1419978994   11.75
49543  1419977793     1.56  1419979595   11.75
49544  1419978393     1.62  1419980195   11.81
49545 rows × 4 columns
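In the output above, the two timestamp columns agree in the first rows but differ by row 49540 (1419975991 against 1419977793), which suggests the two files do not line up row for row. The following is a minimal sketch, on made-up data, of how positional pairing with pd.concat(axis=1) compares with matching on the timestamp via pd.merge. The frame names, the values, and the use of merge are illustrative assumptions, not part of the original workflow.

```python
import pandas as pd

# Tiny made-up frames; the second one is missing the reading at time 20.
outdoor = pd.DataFrame({"time": [10, 20, 30, 40], "outdoor": [4.4, 4.3, 4.2, 4.1]})
indoor = pd.DataFrame({"time": [10, 30, 40], "indoor": [21.9, 22.0, 22.0]})

# Positional pairing, as with concat(axis=1): rows are matched by index label.
by_position = pd.concat(
    [outdoor, indoor.rename(columns={"time": "time2"})], join="inner", axis=1
)

# Key-based pairing: rows are kept only where the timestamps are equal.
by_time = pd.merge(outdoor, indoor, on="time", how="inner")

# Number of rows whose two timestamps disagree after positional pairing.
mismatched = int((by_position["time"] != by_position["time2"]).sum())
print(by_position)
print(by_time)
print(mismatched)
```

Whether an exact match on the timestamp is appropriate for the real files depends on how closely the two sensors were sampled; if the readings are only approximately simultaneous, a nearest-time match may be a better fit.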
df.columns
Index(['time', 'outdoor', 'time2', 'indoor'], dtype='object')
df.drop("time2", axis=1, inplace=True)
df.head(2)
         time  outdoor  indoor
0  1388530986     4.38   21.94
1  1388531586     4.25   22.00
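dt1 = df.copy()
The dataset has three columns: one column of timestamps and two columns of temperature readings.
There are two ways to convert the timestamps to datetimes: the first is to convert them while importing the file, and the second is to use the to_datetime method.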
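Both routes can be sketched on two rows in the same tab-separated layout as the files above. This is only an illustration under assumptions: the in-memory string stands in for the real file, and the converters argument of read_csv is one possible way to convert at import time, not necessarily the one the original author had in mind.

```python
import io
import pandas as pd

# Two rows in the same layout as the .tsv files (Unix seconds, temperature).
raw = "1388530986\t4.38\n1388531586\t4.25\n"

# Route 1: convert while reading, with a per-column converter.
# The converter receives each cell as a string; depending on the pandas
# version the resulting column may be stored with object dtype.
at_import = pd.read_csv(
    io.StringIO(raw),
    delimiter="\t",
    names=["time", "outdoor"],
    converters={"time": lambda s: pd.to_datetime(int(s), unit="s")},
)

# Route 2: read the integers first, then convert the whole column at once.
after_import = pd.read_csv(io.StringIO(raw), delimiter="\t", names=["time", "outdoor"])
after_import["time"] = pd.to_datetime(after_import["time"], unit="s")

print(at_import)
print(after_import.dtypes)
```

Both routes interpret the integers as seconds since the Unix epoch in UTC; no time zone is attached to the result.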
II. Inspect basic information about the data
dt1.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49545 entries, 0 to 49544
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   time     49545 non-null  int64
 1   outdoor  49545 non-null  float64
 2   indoor   49545 non-null  float64
dtypes: float64(2), int64(1)
memory usage: 1.1 MB
dt1.values
array([[1.38853099e+09, 4.38000000e+00, 2.19400000e+01],
[1.38853159e+09, 4.25000000e+00, 2.20000000e+01],
[1.38853219e+09, 4.19000000e+00, 2.20000000e+01],
...,
[1.41997719e+09, 1.50000000e+00, 1.17500000e+01],
[1.41997779e+09, 1.56000000e+00, 1.17500000e+01],
[1.41997839e+09, 1.62000000e+00, 1.18100000e+01]])
dt1.values[:, 0]
array([1.38853099e+09, 1.38853159e+09, 1.38853219e+09, ...,
1.41997719e+09, 1.41997779e+09, 1.41997839e+09])
dt1.time.values
array([1388530986, 1388531586, 1388532187, ..., 1419977192, 1419977793,
1419978393], dtype=int64)
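The outputs above differ in dtype: the array taken from the whole frame shows the timestamps as floats, while the array taken from the time column alone keeps int64. A small sketch of why, using a two-row stand-in frame (the name sample is hypothetical): a 2-D array has a single dtype, so mixed integer and float columns are stored as float64.

```python
import pandas as pd

# Two-row stand-in for dt1 with one integer and one float column.
sample = pd.DataFrame({"time": [1388530986, 1388531586], "outdoor": [4.38, 4.25]})

whole = sample.values           # one 2-D array, so one common dtype for all columns
column = sample["time"].values  # a single column keeps its own dtype

print(whole.dtype)
print(column.dtype)
```

The timestamps here are small enough for float64 to hold them exactly, so the values themselves are unchanged; only their display differs.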
2.1 Convert the timestamps to datetime format
dt1["time"] = pd.Timestamp(dt1["time"], unit="s")  # converting the timestamps this way is wrong
dt1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-140-255e05936dae> in <module>
----> 1 dt1["time"] = pd.Timestamp(dt1["time"],unit="s") # converting the timestamps this way is wrong
2 dt1
pandas\_libs\tslibs\timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
TypeError: Cannot convert input [0 1388530986
1 1388531586
2 1388532187
3 1388532787
4 1388533388
...
49540 1419975991
49541 1419976592
49542 1419977192
49543 1419977793
49544 1419978393
Name: time, Length: 49545, dtype: int64] of type <class 'pandas.core.series.Series'> to Timestamp
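The traceback shows that pd.Timestamp builds a single scalar and rejects a whole Series. The sketch below contrasts the scalar constructor with pd.to_datetime, which accepts a whole column. It uses the first three timestamps from the data; the variable names are illustrative.

```python
import pandas as pd

# First three timestamps from the dataset, in Unix seconds.
seconds = pd.Series([1388530986, 1388531586, 1388532187], name="time")

# pd.Timestamp accepts one integer at a time.
first = pd.Timestamp(int(seconds.iloc[0]), unit="s")

# Passing the whole Series fails, as in the traceback above.
try:
    pd.Timestamp(seconds, unit="s")
    series_accepted = True
except (TypeError, ValueError):
    series_accepted = False

# pd.to_datetime converts the whole column in one call.
converted = pd.to_datetime(seconds, unit="s")

print(first)
print(series_accepted)
print(converted)
```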
# pandas has no attribute named TimeStamp (the class is pd.Timestamp), so this call also fails
df["time"] = df["time"].apply(lambda x: pd.TimeStamp(x))
df
------------------