pandas data structures: DataFrame
Documentation:
http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
import pandas as pd
user1 = pd.Series(["jack","男",22], index=["name","sex","age"])
user2 = pd.Series(["lily","女",21], index=["name","sex","age"])
users = pd.DataFrame({101:user1, 102:user2})
print(users)
"""
101 102
name jack lily
sex 男 女
age 22 21
"""
print(users.index) # Index(['name', 'sex', 'age'], dtype='object')
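print(users.columns) # Int64Index([101, 102], dtype='int64'); pandas >= 2.0 prints Index([101, 102], dtype='int64')
# A DataFrame can also be built from a list of tuples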
user_list = [("jack","男",22), ("lily","女",21)]
users = pd.DataFrame(user_list)
print(users)
"""
0 1 2
0 jack 男 22
1 lily 女 21
"""
Setting column names:
user_list = [("jack","男",22), ("lily","女",21)]
users = pd.DataFrame(user_list,columns=["name","sex","age"]) # set the column names
print(users)
"""
name sex age
0 jack 男 22
1 lily 女 21
"""
Checking the type:
print(type(users)) # <class 'pandas.core.frame.DataFrame'>
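The two construction styles above differ in orientation: a dict of Series turns each key into a column, while a list of tuples turns each tuple into a row. The following is a minimal sketch, reusing the sample users from above, of how one layout can be converted into the other. Passing `index=[101, 102]` to the second constructor is an illustrative choice, not something the original example does.

```python
import pandas as pd

# Dict of Series: each key becomes a column, the Series index becomes the row index.
user1 = pd.Series(["jack", "男", 22], index=["name", "sex", "age"])
user2 = pd.Series(["lily", "女", 21], index=["name", "sex", "age"])
by_column = pd.DataFrame({101: user1, 102: user2})

# Transposing yields one row per user.
by_row = by_column.T

# List of tuples: each tuple becomes a row; columns= supplies the column labels
# and index= (an illustrative addition) supplies the row labels.
from_tuples = pd.DataFrame(
    [("jack", "男", 22), ("lily", "女", 21)],
    columns=["name", "sex", "age"],
    index=[101, 102],
)

print(by_row)
print(from_tuples)
```

Note that the transposed frame holds mixed types per column only as generic objects, so the list-of-tuples form is usually the more convenient one when each record is a row.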
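Example: year-over-year comparison of fund data for a given month
Using the fund data we scraped earlier, take two slices of it.
For example:
Data for June 2016: print(fund.loc['2016-06'])
Data for June 2017: print(fund.loc['2017-06'])
In practice we only need to compare NAV (net asset value per unit), so the code becomes: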
fund.loc["2016-06",["NAV"]]
fund.loc["2017-06",["NAV"]]
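The selection above relies on partial string indexing: when the row index is a DatetimeIndex, a string such as "2016-06" selects every row in that month. Since the fund CSV is not reproduced here, the following is a small sketch on made-up data; the frame `demo`, its constant values, and the extra column `ACC` are all hypothetical.

```python
import pandas as pd

# Made-up daily records from late May to early July 2016 (values are placeholders).
dates = pd.date_range("2016-05-30", "2016-07-02")
demo = pd.DataFrame(
    {"NAV": [1.0] * len(dates), "ACC": [1.1] * len(dates)},
    index=dates,
)
demo.index.name = "fdate"

# "2016-06" matches every row whose date falls in June 2016;
# the list ["NAV"] keeps the result a one-column DataFrame rather than a Series.
june = demo.loc["2016-06", ["NAV"]]
print(june.head())
```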
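The next step is to join these two DataFrames together (a union), because joined data is much easier to compare. However, the dates in the two slices do not line up, and neither slice is continuous: some calendar days have no record.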
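One way to deal with the gaps is to build a frame containing every calendar day and left-merge the sparse records onto it, so that missing days show up as NaN. The sketch below shows the idea on a few made-up rows; the names `sparse`, `calendar`, and `filled` and the NAV values are illustrative only.

```python
import pandas as pd

# Hypothetical NAV records; 2016-06-04 and 2016-06-05 are deliberately absent.
sparse = pd.DataFrame(
    {
        "fdate": pd.to_datetime(
            ["2016-06-01", "2016-06-02", "2016-06-03", "2016-06-06"]
        ),
        "NAV": [0.991, 0.991, 0.991, 0.990],
    }
)

# Every calendar day in the window, with no gaps.
calendar = pd.DataFrame({"fdate": pd.date_range("2016-06-01", "2016-06-06")})

# A left merge keeps every calendar day; days without a record get NaN.
filled = pd.merge(calendar, sparse, how="left", on="fdate")
print(filled)
```

Because both months then have one row per calendar day, the two frames end up with the same number of rows in the same order.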
Complete code:
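# coding: utf-8
import pandas as pd
# Read fcode as a string (pd.np is no longer available in current pandas, so use the built-in str)
fund = pd.read_csv("./csv/519961.csv", dtype={"fcode": str}, index_col="fdate", parse_dates=["fdate"])
#print(fund)
f2016 = fund.loc["2016-06",["NAV"]].reset_index()
f2017 = fund.loc["2017-06",["NAV"]].reset_index()
# Build every calendar day in June 2016
# (inclusive="left" replaces closed="left" from pandas 1.4 onward)
allDates201606 = pd.DataFrame(pd.date_range("2016-06","2016-07",inclusive="left"),columns=["fdate"])
# Left-merge each day of June with that day's data; days without data become NaN
f2016_data = pd.merge(allDates201606,f2016,how="left",on=["fdate"])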
print(f2016_data)
"""
fdate NAV
0 2016-06-01 0.991
1 2016-06-02 0.991
2 2016-06-03 0.991
3 2016-06-04 NaN
4 2016-06-05 NaN
5 2016-06-06 0.991
6 2016-06-07 0.987
7 2016-06-08 0.987
8 2016-06-09 NaN
9 2016-06-10 NaN
10 2016-06-11 NaN
11 2016-06-12 NaN
12 2016-06-13 0.986
13 2016-06-14 0.987
14 2016-06-15 0.988
15 2016-06-16 0.988
16 2016-06-17 0.988
17 2016-06-18 NaN
18 2016-06-19 NaN
19 2016-06-20 0.988
20 2016-06-21 0.989
21 2016-06-22 0.989
22 2016-06-23 0.989
23 2016-06-24 0.989
24 2016-06-25 NaN
25 2016-06-26 NaN
26 2016-06-27 0.990
27 2016-06-28 0.991
28 2016-06-29 0.992
29 2016-06-30 0.992
"""
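# Build every calendar day in June 2017 (inclusive="left" replaces closed="left" from pandas 1.4 onward)
allDates201706 = pd.DataFrame(pd.date_range("2017-06","2017-07",inclusive="left"),columns=["fdate"])
# Left-merge each day of June with that day's data; days without data become NaN
f2017_data = pd.merge(allDates201706,f2017,how="left",on=["fdate"])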
print(f2017_data)
"""
fdate NAV
0 2017-06-01 0.991
1 2017-06-02 0.990
2 2017-06-03 NaN
3 2017-06-04 NaN
4 2017-06-05 0.989
5 2017-06-06 0.990
6 2017-06-07 0.994
7 2017-06-08 0.995
8 2017-06-09 0.995
9 2017-06-10 NaN
10 2017-06-11 NaN
11 2017-06-12 0.995
12 2017-06-13 0.996
13 2017-06-14 0.994
14 2017-06-15 0.994
15 2017-06-16 0.993
16 2017-06-17 NaN
17 2017-06-18 NaN
18 2017-06-19 0.994
19 2017-06-20 0.995
20 2017-06-21 0.996
21 2017-06-22 0.995
22 2017-06-23 0.994
23 2017-06-24 NaN
24 2017-06-25 NaN
25 2017-06-26 0.995
26 2017-06-27 0.993
27 2017-06-28 0.998
28 2017-06-29 0.998
29 2017-06-30 0.999
"""
# Finally, combine the two months
# Concatenate the June 2016 and June 2017 data side by side
result = pd.concat([f2016_data,f2017_data], axis=1)
print(result)
"""
fdate NAV fdate NAV
0 2016-06-01 0.991 2017-06-01 0.991
1 2016-06-02 0.991 2017-06-02 0.990
2 2016-06-03 0.991 2017-06-03 NaN
3 2016-06-04 NaN 2017-06-04 NaN
4 2016-06-05 NaN 2017-06-05 0.989
5 2016-06-06 0.991 2017-06-06 0.990
6 2016-06-07 0.987 2017-06-07 0.994
7 2016-06-08 0.987 2017-06-08 0.995
8 2016-06-09 NaN 2017-06-09 0.995
9 2016-06-10 NaN 2017-06-10 NaN
10 2016-06-11 NaN 2017-06-11 NaN
11 2016-06-12 NaN 2017-06-12 0.995
12 2016-06-13 0.986 2017-06-13 0.996
13 2016-06-14 0.987 2017-06-14 0.994
14 2016-06-15 0.988 2017-06-15 0.994
15 2016-06-16 0.988 2017-06-16 0.993
16 2016-06-17 0.988 2017-06-17 NaN
17 2016-06-18 NaN 2017-06-18 NaN
18 2016-06-19 NaN 2017-06-19 0.994
19 2016-06-20 0.988 2017-06-20 0.995
20 2016-06-21 0.989 2017-06-21 0.996
21 2016-06-22 0.989 2017-06-22 0.995
22 2016-06-23 0.989 2017-06-23 0.994
23 2016-06-24 0.989 2017-06-24 NaN
24 2016-06-25 NaN 2017-06-25 NaN
25 2016-06-26 NaN 2017-06-26 0.995
26 2016-06-27 0.990 2017-06-27 0.993
27 2016-06-28 0.991 2017-06-28 0.998
28 2016-06-29 0.992 2017-06-29 0.998
29 2016-06-30 0.992 2017-06-30 0.999
"""
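The concatenated result above has two columns named fdate and two named NAV, which makes selecting one year's values awkward. A possible variation, sketched here on made-up stand-ins for f2016_data and f2017_data, is to index each month by day of month and give each NAV column a distinct name before concatenating. The helper `month_by_day` and the column names `NAV_2016`, `NAV_2017`, and `diff` are hypothetical and not part of the original code.

```python
import pandas as pd

def month_by_day(frame, label):
    """Return the NAV column indexed by day of month and renamed to `label`."""
    by_day = frame.set_index(frame["fdate"].dt.day)["NAV"]
    return by_day.rename(label).rename_axis("day")

# Made-up three-day stand-ins for f2016_data and f2017_data.
f2016_demo = pd.DataFrame(
    {"fdate": pd.date_range("2016-06-01", periods=3),
     "NAV": [0.991, 0.991, float("nan")]}
)
f2017_demo = pd.DataFrame(
    {"fdate": pd.date_range("2017-06-01", periods=3),
     "NAV": [0.991, 0.990, float("nan")]}
)

# Align the two years on day of month, then compute the year-over-year change.
compare = pd.concat(
    [month_by_day(f2016_demo, "NAV_2016"), month_by_day(f2017_demo, "NAV_2017")],
    axis=1,
)
compare["diff"] = compare["NAV_2017"] - compare["NAV_2016"]
print(compare)
```

With distinct column names, a single year can be selected unambiguously, and days where either year has no data simply yield NaN in the difference.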