Pandas Tutorial Series (3)
For anyone looking to get started in data science, Python is a great choice: besides its simple syntax, the Python ecosystem provides many excellent libraries for numerical computing. Pandas cannot go unmentioned among them — it is a very powerful tool for working with data sets and is often used together with numpy, matplotlib, and other libraries. I have just started learning Pandas myself, and along the way I translated the official Pandas tutorial. The notebooks were written in Jupyter; since this blog does not support editing HTML directly, they had to be converted to markdown. If you would like the HTML version, click the link under each lesson. This post is for learning and discussion only — feedback and corrections are welcome!
Summary
- Create the data: this data set describes customer counts for four (US) markets from 2009 to 2012. We use to_excel() to write an Excel file and date_range() to build a list of dates at a given frequency (both are commonly used).
- Get the data: read the data set back from the Excel file with pandas.read_excel().
- Prepare the data (make it clean): make sure the data set is correct, mainly by checking types; here we keep only rows whose Status is 1 and also filter the markets.
- Analyze the data: this lesson makes heavy use of df.groupby(), sum(), transform(), and lambda expressions to group, sort, sum, merge, add, and delete data, and uses percentiles to drop outliers so the plots come out smoother.
- Visualize: use matplotlib to display the data intuitively.
Click here for the HTML version
import pandas as pd
import matplotlib.pyplot as plt
import numpy.random as np  # note: the tutorial aliases numpy.random (not numpy) as np
import sys
import matplotlib
%matplotlib inline

print('Python version ' + sys.version)
print('Pandas version: ' + pd.__version__)
print('Matplotlib version ' + matplotlib.__version__)
Python version 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0]
Pandas version: 0.23.0
Matplotlib version 2.2.2
np.seed(111)

def CreateDataSet(Number=1):
    Output = []
    for i in range(Number):
        # weekly dates (every Monday) from 2009 through 2012
        rng = pd.date_range(start='1/1/2009', end='12/31/2012', freq='W-MON')
        # random customer counts between 25 and 999
        data = np.randint(low=25, high=1000, size=len(rng))
        # a random status for each date
        status = [1, 2, 3]
        random_status = [status[np.randint(0, len(status))] for i in range(len(rng))]
        # a random state for each date ('fl' is a deliberate lowercase entry)
        states = ['GA', 'FL', 'fl', 'NY', 'NJ', 'TX']
        random_states = [states[np.randint(0, len(states))] for i in range(len(rng))]
        Output.extend(zip(random_states, random_status, data, rng))
    return Output

dataset = CreateDataSet()
df = pd.DataFrame(data=dataset, columns=['State', 'Status', 'CustomerCount', 'StatusDate'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209 entries, 0 to 208
Data columns (total 4 columns):
State 209 non-null object
Status 209 non-null int64
CustomerCount 209 non-null int64
StatusDate 209 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 6.6+ KB
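The 209 rows come straight from date_range(): with freq='W-MON' it emits every Monday between the endpoints. A quick standalone check:

```python
import pandas as pd

# every Monday from 2009-01-05 (the first Monday of 2009) through 2012-12-31
rng = pd.date_range(start='1/1/2009', end='12/31/2012', freq='W-MON')
print(len(rng))       # 209 weekly dates, matching the 209 rows above
print(rng[0].date())  # 2009-01-05
```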
dataset = CreateDataSet(4)
df = pd.DataFrame(data=dataset, columns=['State', 'Status', 'CustomerCount', 'StatusDate'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 836 entries, 0 to 835
Data columns (total 4 columns):
State 836 non-null object
Status 836 non-null int64
CustomerCount 836 non-null int64
StatusDate 836 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 26.2+ KB
df.head()
  State  Status  CustomerCount StatusDate
0    fl       3            122 2009-01-05
1    fl       3            955 2009-01-12
2    NY       1            522 2009-01-19
3    fl       3            436 2009-01-26
4    TX       2            398 2009-02-02
df.to_excel('Lesson3.xlsx', index=False)
print('Done!')
Done!
Location = 'Lesson3.xlsx'
df = pd.read_excel(Location, sheet_name=0, index_col='StatusDate')
df.dtypes
State object
Status int64
CustomerCount int64
dtype: object
df.index
DatetimeIndex(['2009-01-05', '2009-01-12', '2009-01-19', '2009-01-26',
'2009-02-02', '2009-02-09', '2009-02-16', '2009-02-23',
'2009-03-02', '2009-03-09',
...
'2012-10-29', '2012-11-05', '2012-11-12', '2012-11-19',
'2012-11-26', '2012-12-03', '2012-12-10', '2012-12-17',
'2012-12-24', '2012-12-31'],
dtype='datetime64[ns]', name='StatusDate', length=836, freq=None)
df.head()
           State  Status  CustomerCount
StatusDate
2009-01-05    fl       3            122
2009-01-12    fl       3            955
2009-01-19    NY       1            522
2009-01-26    fl       3            436
2009-02-02    TX       2            398
"""
检查数据的正确性
- state列的字母都要大写
- 只选择status列为1的数据
- 将state列的NJ与NY合并为NY
- 删除一些很大或者很小的特殊值(customerCount列)
"""
df['State'].unique()
array(['fl', 'NY', 'TX', 'GA', 'NJ', 'FL'], dtype=object)
df['State'] = df.State.apply(lambda x: x.upper())
df['State'].unique()
array(['FL', 'NY', 'TX', 'GA', 'NJ'], dtype=object)
mask = df['Status'] == 1
df = df[mask]
df.head()
           State  Status  CustomerCount
StatusDate
2009-01-19    NY       1            522
2009-03-16    NY       1            355
2009-03-23    GA       1            972
2009-04-20    NY       1            820
2009-04-27    NY       1            447
mask = df.State == 'NJ'
df.loc[mask, 'State'] = 'NY'  # .loc avoids the chained-assignment (SettingWithCopy) warning
df.head(10)
           State  Status  CustomerCount
StatusDate
2009-01-19    NY       1            522
2009-03-16    NY       1            355
2009-03-23    GA       1            972
2009-04-20    NY       1            820
2009-04-27    NY       1            447
2009-05-11    NY       1            987
2009-05-18    TX       1            377
2009-05-25    NY       1            378
2009-06-08    TX       1             45
2009-06-22    TX       1            896
df['State'].unique()
array(['NY', 'GA', 'TX', 'FL'], dtype=object)
df['CustomerCount'].plot(figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f33f95be160>
sortdf = df[df['State'] == 'NY'].sort_index(axis=0)
sortdf.head(10)
           State  Status  CustomerCount
StatusDate
2009-01-19    NY       1            522
2009-02-23    NY       1            710
2009-03-16    NY       1            355
2009-03-23    NY       1            728
2009-03-30    NY       1            863
2009-04-13    NY       1            520
2009-04-20    NY       1            820
2009-04-20    NY       1            937
2009-04-27    NY       1            447
2009-05-04    NY       1            497
"""
接下来我们将用groupby(),sum()等函数得到顾客数/天/状态
因为我们需要将statusDate与state作为分组依据传入groupby函数,而groupby接受的
参数类型是columns,所以我们需要先用reset_index()把statusDate从index转回column
"""
Daily = df.reset_index().groupby(['State', 'StatusDate']).sum()
Daily.head()
                  Status  CustomerCount
State StatusDate
FL    2009-01-12       1            319
      2009-02-02       1            653
      2009-03-23       1            752
      2009-04-06       1            795
      2009-04-20       1            788
"""
从上面的结果中我们可以看到State和StatusDate列自动变成了
index列,我们可以把index理解为数据库里的表的主键,只是数据帧里的
index列不要求唯一,我们可以通过这个index来进行选择,绘图,计算操作
"""
del Daily['Status']
Daily.head()
                  CustomerCount
State StatusDate
FL    2009-01-12            319
      2009-02-02            653
      2009-03-23            752
      2009-04-06            795
      2009-04-20            788
Daily.index
MultiIndex(levels=[['FL', 'GA', 'NY', 'TX'], [2009-01-12 00:00:00, 2009-01-19 00:00:00, 2009-02-02 00:00:00, 2009-02-23 00:00:00, 2009-03-16 00:00:00, 2009-03-23 00:00:00, 2009-03-30 00:00:00, 2009-04-06 00:00:00, 2009-04-13 00:00:00, 2009-04-20 00:00:00, 2009-04-27 00:00:00, 2009-05-04 00:00:00, 2009-05-11 00:00:00, 2009-05-18 00:00:00, 2009-05-25 00:00:00, 2009-06-08 00:00:00, 2009-06-22 00:00:00, 2009-07-06 00:00:00, 2009-07-13 00:00:00, 2009-07-20 00:00:00, 2009-07-27 00:00:00, 2009-08-10 00:00:00, 2009-08-17 00:00:00, 2009-08-24 00:00:00, 2009-08-31 00:00:00, 2009-09-07 00:00:00, 2009-09-14 00:00:00, 2009-09-21 00:00:00, 2009-09-28 00:00:00, 2009-10-05 00:00:00, 2009-10-12 00:00:00, 2009-10-19 00:00:00, 2009-10-26 00:00:00, 2009-11-02 00:00:00, 2009-11-23 00:00:00, 2009-11-30 00:00:00, 2009-12-07 00:00:00, 2009-12-14 00:00:00, 2009-12-21 00:00:00, 2010-01-04 00:00:00, 2010-01-11 00:00:00, 2010-01-18 00:00:00, 2010-01-25 00:00:00, 2010-02-08 00:00:00, 2010-02-15 00:00:00, 2010-02-22 00:00:00, 2010-03-01 00:00:00, 2010-03-08 00:00:00, 2010-03-15 00:00:00, 2010-03-29 00:00:00, 2010-04-05 00:00:00, 2010-04-12 00:00:00, 2010-04-19 00:00:00, 2010-04-26 00:00:00, 2010-05-03 00:00:00, 2010-05-10 00:00:00, 2010-05-17 00:00:00, 2010-05-24 00:00:00, 2010-05-31 00:00:00, 2010-06-14 00:00:00, 2010-06-21 00:00:00, 2010-06-28 00:00:00, 2010-07-05 00:00:00, 2010-07-19 00:00:00, 2010-07-26 00:00:00, 2010-08-02 00:00:00, 2010-08-16 00:00:00, 2010-08-30 00:00:00, 2010-09-06 00:00:00, 2010-09-13 00:00:00, 2010-09-20 00:00:00, 2010-09-27 00:00:00, 2010-10-04 00:00:00, 2010-10-11 00:00:00, 2010-10-18 00:00:00, 2010-10-25 00:00:00, 2010-11-01 00:00:00, 2010-11-08 00:00:00, 2010-11-15 00:00:00, 2010-11-29 00:00:00, 2010-12-20 00:00:00, 2011-01-03 00:00:00, 2011-01-10 00:00:00, 2011-01-17 00:00:00, 2011-01-31 00:00:00, 2011-02-07 00:00:00, 2011-02-14 00:00:00, 2011-02-21 00:00:00, 2011-02-28 00:00:00, 2011-03-07 00:00:00, 2011-03-14 00:00:00, 2011-04-04 00:00:00, 2011-04-18 00:00:00, 
2011-04-25 00:00:00, 2011-05-02 00:00:00, 2011-05-09 00:00:00, 2011-05-16 00:00:00, 2011-05-23 00:00:00, 2011-05-30 00:00:00, 2011-06-13 00:00:00, 2011-06-20 00:00:00, 2011-06-27 00:00:00, 2011-07-04 00:00:00, 2011-07-11 00:00:00, 2011-07-25 00:00:00, 2011-08-01 00:00:00, 2011-08-08 00:00:00, 2011-08-15 00:00:00, 2011-08-29 00:00:00, 2011-09-05 00:00:00, 2011-09-12 00:00:00, 2011-09-19 00:00:00, 2011-09-26 00:00:00, 2011-10-03 00:00:00, 2011-10-24 00:00:00, 2011-10-31 00:00:00, 2011-11-07 00:00:00, 2011-11-14 00:00:00, 2011-11-28 00:00:00, 2011-12-05 00:00:00, 2011-12-19 00:00:00, 2011-12-26 00:00:00, 2012-01-02 00:00:00, 2012-01-16 00:00:00, 2012-01-23 00:00:00, 2012-01-30 00:00:00, 2012-02-06 00:00:00, 2012-02-13 00:00:00, 2012-02-20 00:00:00, 2012-02-27 00:00:00, 2012-03-05 00:00:00, 2012-03-12 00:00:00, 2012-03-19 00:00:00, 2012-03-26 00:00:00, 2012-04-02 00:00:00, 2012-04-09 00:00:00, 2012-04-23 00:00:00, 2012-04-30 00:00:00, 2012-05-07 00:00:00, 2012-05-14 00:00:00, 2012-05-28 00:00:00, 2012-06-04 00:00:00, 2012-06-18 00:00:00, 2012-07-09 00:00:00, 2012-07-16 00:00:00, 2012-07-30 00:00:00, 2012-08-06 00:00:00, 2012-08-20 00:00:00, 2012-08-27 00:00:00, 2012-09-03 00:00:00, 2012-09-10 00:00:00, 2012-09-17 00:00:00, 2012-09-24 00:00:00, 2012-10-01 00:00:00, 2012-10-08 00:00:00, 2012-10-15 00:00:00, 2012-10-22 00:00:00, 2012-10-29 00:00:00, 2012-11-05 00:00:00, 2012-11-12 00:00:00, 2012-11-19 00:00:00, 2012-11-26 00:00:00, 2012-12-10 00:00:00, 2012-12-17 00:00:00]],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], [0, 2, 5, 7, 9, 15, 16, 17, 18, 21, 23, 25, 26, 28, 29, 33, 36, 40, 42, 43, 44, 45, 47, 48, 50, 53, 55, 57, 58, 61, 62, 64, 67, 69, 70, 71, 72, 73, 77, 78, 79, 80, 87, 89, 90, 92, 93, 96, 97, 100, 103, 104, 105, 108, 109, 110, 113, 115, 118, 119, 124, 126, 127, 128, 129, 132, 137, 138, 139, 142, 147, 151, 152, 153, 154, 159, 163, 5, 6, 20, 21, 22, 25, 26, 32, 33, 35, 41, 46, 49, 50, 56, 59, 67, 68, 70, 72, 74, 75, 76, 80, 83, 85, 87, 91, 93, 95, 106, 109, 110, 123, 128, 133, 135, 141, 160, 161, 162, 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 17, 19, 22, 24, 26, 27, 29, 30, 31, 34, 37, 38, 39, 40, 41, 51, 52, 55, 56, 63, 65, 71, 73, 74, 76, 79, 81, 82, 83, 84, 88, 91, 92, 94, 98, 101, 102, 103, 105, 107, 108, 111, 112, 113, 114, 117, 122, 123, 125, 126, 130, 131, 132, 134, 140, 141, 143, 144, 145, 147, 149, 150, 156, 158, 160, 13, 15, 16, 46, 50, 54, 57, 58, 60, 66, 69, 77, 86, 90, 92, 98, 99, 107, 113, 116, 120, 121, 131, 136, 137, 138, 142, 144, 146, 147, 148, 155, 157]],
names=['State', 'StatusDate'])
Daily.index.levels[0]
Index(['FL', 'GA', 'NY', 'TX'], dtype='object', name='State')
Daily.index.levels[1]
DatetimeIndex(['2009-01-12', '2009-01-19', '2009-02-02', '2009-02-23',
'2009-03-16', '2009-03-23', '2009-03-30', '2009-04-06',
'2009-04-13', '2009-04-20',
...
'2012-10-08', '2012-10-15', '2012-10-22', '2012-10-29',
'2012-11-05', '2012-11-12', '2012-11-19', '2012-11-26',
'2012-12-10', '2012-12-17'],
dtype='datetime64[ns]', name='StatusDate', length=164, freq=None)
Daily.loc['FL'].plot()
Daily.loc['GA'].plot()
Daily.loc['NY'].plot()
Daily.loc['TX'].plot();

Daily.loc['FL']['2012':].plot()
Daily.loc['GA']['2012':].plot()
Daily.loc['NY']['2012':].plot()
Daily.loc['TX']['2012':].plot();
"""
通过上面的图像,我们可以寻找异常值(超过特定范围)然后删除异常值,确保我们最终
得到平滑的图像。
StateYearMonth -这里我们将依照state,year,month(statusDate的属性)来分组
Daily['Outlier'] - 异常值
注意我们这里用的是transfrom而不是apply函数,因为transform 能保证我们数据帧的行
数与列数不变。此外,从上面的图像中可看到,数据不是高斯分布的,所以我们在指定正常
范围的时候不能用stddev与mean等标准,这里我们用的是百分数
"""
StateYearMonth = Daily.groupby([Daily.index.get_level_values(0),
                                Daily.index.get_level_values(1).year,
                                Daily.index.get_level_values(1).month])
# percentile-based fences per (state, year, month) group; note the parentheses:
# this is the tutorial's formula, not the textbook 1.5 * (Q3 - Q1) Tukey fence
Daily['Lower'] = StateYearMonth['CustomerCount'].transform(lambda x: x.quantile(q=.25) - (1.5 * x.quantile(q=.75) - x.quantile(q=.25)))
Daily['Upper'] = StateYearMonth['CustomerCount'].transform(lambda x: x.quantile(q=.75) + (1.5 * x.quantile(q=.75) - x.quantile(q=.25)))
Daily['Outlier'] = (Daily['CustomerCount'] < Daily['Lower']) | (Daily['CustomerCount'] > Daily['Upper'])

# keep only the non-outlier rows
Daily = Daily[Daily['Outlier'] == False]
Daily.head()
                  CustomerCount    Lower     Upper  Outlier
State StatusDate
FL    2009-01-12            319  159.500   478.500    False
      2009-02-02            653  326.500   979.500    False
      2009-03-23            752  376.000  1128.000    False
      2009-04-06            795  389.625  1193.375    False
      2009-04-20            788  389.625  1193.375    False
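As a sanity check on the bounds: the FL / January 2009 group contains the single value 319, so both quantiles equal 319 and the formula as written (1.5 applied only to the .75 quantile) reproduces the Lower/Upper values 159.5 and 478.5 shown above:

```python
# a one-element group: both the .25 and .75 quantiles equal the value itself
q25 = q75 = 319.0
lower = q25 - (1.5 * q75 - q25)
upper = q75 + (1.5 * q75 - q25)
print(lower, upper)  # 159.5 478.5 — matching the first row of the output
```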
# total customer count per day, summed across all states
ALL = pd.DataFrame(Daily['CustomerCount'].groupby(Daily.index.get_level_values(1)).sum())
ALL.columns = ['CustomerCount']

# group by year and month and record each month's maximum
YearMonth = ALL.groupby([lambda x: x.year, lambda x: x.month])
ALL['Max'] = YearMonth['CustomerCount'].transform(lambda x: x.max())
ALL.head()
            CustomerCount   Max
StatusDate
2009-01-12            319   522
2009-01-19            522   522
2009-02-02            653   710
2009-02-23            710   710
2009-03-16            355  2452
"""
我们给客流量设上目标
- 12/31/2011 - 1,000
- 12/31/2012 - 2,000
- 12/31/2013 - 3,000
"""
data = [1000, 2000, 3000]
idx = pd.date_range(start='12/31/2011', end='12/31/2013', freq='A')  # 'A' = year-end frequency
BHAG = pd.DataFrame(data, index=idx, columns=['BHAG'])
BHAG
            BHAG
2011-12-31  1000
2012-12-31  2000
2013-12-31  3000
# sort=True keeps the sorted column order and silences the pandas 0.23
# FutureWarning about sorting along the non-concatenation axis
combined = pd.concat([ALL, BHAG], axis=0, sort=True)
combined = combined.sort_index(axis=0)
combined.tail()
              BHAG  CustomerCount     Max
2012-11-26     NaN          367.0  1042.0
2012-12-10     NaN          769.0   817.0
2012-12-17     NaN          817.0   817.0
2012-12-31  2000.0            NaN     NaN
2013-12-31  3000.0            NaN     NaN
fig, axes = plt.subplots(figsize=(12, 7))
combined['BHAG'].fillna(method='pad').plot(color='green', label='BHAG')
combined['Max'].plot(color='blue', label='All Markets')
plt.legend(loc='best');
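fillna(method='pad') forward-fills each gap with the last seen value, which is what stretches the BHAG step line across the plot; newer pandas spells the same operation .ffill(). A minimal sketch:

```python
import pandas as pd

# the two missing values take on the last non-missing value before them
s = pd.Series([1000.0, None, None, 2000.0])
print(s.ffill().tolist())  # [1000.0, 1000.0, 1000.0, 2000.0]
```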
Year = combined.groupby(lambda x: x.year).max()
Year
        BHAG  CustomerCount     Max
2009     NaN         2545.0  2545.0
2010     NaN         1819.0  1819.0
2011  1000.0         1957.0  1957.0
2012  2000.0         1689.0  1689.0
2013  3000.0            NaN     NaN
Year['YR_PCT_CHANGE'] = Year['Max'].pct_change(periods=1)
Year
        BHAG  CustomerCount     Max  YR_PCT_CHANGE
2009     NaN         2545.0  2545.0            NaN
2010     NaN         1819.0  1819.0      -0.285265
2011  1000.0         1957.0  1957.0       0.075866
2012  2000.0         1689.0  1689.0      -0.136944
2013  3000.0            NaN     NaN       0.000000
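pct_change(periods=1) computes (current - previous) / previous down the column; checking the 2010 figure against the yearly Max values from the table (a standalone sketch):

```python
import pandas as pd

yearly_max = pd.Series([2545.0, 1819.0, 1957.0, 1689.0], index=[2009, 2010, 2011, 2012])
change = yearly_max.pct_change(periods=1)
# (1819 - 2545) / 2545 = -0.285265..., as shown for 2010 above
print(round(change.loc[2010], 6))  # -0.285265
```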
# estimate next year's customer count: this year's max times (1 + % change)
# (.loc replaces the deprecated .ix indexer)
(1 + Year.loc[2012, 'YR_PCT_CHANGE']) * Year.loc[2012, 'Max']
1457.7010730710272
ALL['Max'].plot(figsize=(10, 5)); plt.title('ALL Markets')
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(20, 10))
fig.subplots_adjust(hspace=1.0)
Daily.loc['FL']['CustomerCount']['2012':].fillna(method='pad').plot(ax=axes[0, 0])
Daily.loc['GA']['CustomerCount']['2012':].fillna(method='pad').plot(ax=axes[0, 1])
Daily.loc['TX']['CustomerCount']['2012':].fillna(method='pad').plot(ax=axes[1, 0])
Daily.loc['NY']['CustomerCount']['2012':].fillna(method='pad').plot(ax=axes[1, 1])
axes[0, 0].set_title('Florida')
axes[0, 1].set_title('Georgia')
axes[1, 0].set_title('Texas')
axes[1, 1].set_title('North East');