import numpy as np
import pandas as pd
from pandas import Series,DataFrame
Creating a DataFrame from a dictionary of dictionaries
dict_city={
'shanghai':{
2019:54000,2020:56000
},
'beijing':{
2018:66666,2019:77777
}
}
df_city=DataFrame(dict_city)
df_city
shanghai beijing
2019 54000.0 77777.0
2020 56000.0 NaN
2018 NaN 66666.0
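The output above suggests how the nested dictionary is mapped: outer keys become columns, inner keys become the row index, and any city/year pair that is missing shows up as NaN. A minimal sketch to check this follows; the helper variable names are illustrative, and the `sort_index()` call is an addition that is not part of the original example.

```python
import pandas as pd

dict_city = {
    'shanghai': {2019: 54000, 2020: 56000},
    'beijing': {2018: 66666, 2019: 77777},
}
df_city = pd.DataFrame(dict_city)

# Outer keys -> columns, inner keys -> row index (union of all inner keys).
columns = list(df_city.columns)
years = sorted(df_city.index)

# A year that is missing for a city is filled with NaN,
# which is why the integer inputs are displayed as floats.
missing_beijing_2020 = pd.isna(df_city.loc[2020, 'beijing'])

# Optional (not in the original): sort the rows by year for readability.
df_sorted = df_city.sort_index()
```

Sorting is only cosmetic here; the row order of the unsorted frame depends on how pandas combines the inner keys.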
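Adding data: df.insert()
df.insert(loc,column,value,allow_duplicates=False)-> None
loc: integer position at which the new column is inserted (0 <= loc <= number of columns)
column: label of the new column
value: values for the new column (a scalar, a Series, or an array-like)
allow_duplicates: whether a column whose label already exists may be inserted
None: the method returns nothing; the DataFrame is modified in place
# Add Shenzhen's data to df_city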
df_city.insert(2,'shenzhen',[60000,70000,50000])
df_city.T.drop_duplicates()
2019 2020 2018
shanghai 54000.0 56000.0 NaN
beijing 77777.0 NaN 66666.0
shenzhen 60000.0 70000.0 50000.0
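Two details of `insert()` are easy to miss: it changes the DataFrame in place and returns `None`, and it refuses an existing column label unless `allow_duplicates=True`. The sketch below illustrates both on a small stand-in frame; the frame and the variable names are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'shanghai': [54000.0, 56000.0], 'beijing': [77777.0, None]},
                  index=[2019, 2020])

# insert() modifies df in place and returns None.
result = df.insert(1, 'shenzhen', [60000, 70000])
column_order = list(df.columns)

# Inserting an existing column label raises ValueError by default.
try:
    df.insert(0, 'shenzhen', [1, 2])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# With allow_duplicates=True the duplicate label is accepted.
df.insert(0, 'shenzhen', [1, 2], allow_duplicates=True)
columns_after = list(df.columns)
```

Because the return value is `None`, writing `df = df.insert(...)` would discard the DataFrame, so the call is normally used as a statement on its own.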
# Rebuild df_city from the original dictionary (without the shenzhen column)
dict_city={
'shanghai':{
2019:54000,2020:56000
},
'beijing':{
2018:66666,2019:77777
}
}
df_city=DataFrame(dict_city)
df_city
shanghai beijing
2019 54000.0 77777.0
2020 56000.0 NaN
2018 NaN 66666.0
Naming the row index and the columns
df_city.index.name='年份'      # '年份' means 'year'
df_city.columns.name='城市'    # '城市' means 'city'
df_city
城市 shanghai beijing
年份
2019 54000.0 77777.0
2020 56000.0 NaN
2018 NaN 66666.0
df_city.index
Int64Index([2019, 2020, 2018], dtype='int64', name='年份')
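Assigning to `.name` changes the existing object. As an alternative, `rename_axis` returns a renamed copy and leaves the original untouched. The sketch below is one possible way to use it; the English axis names `'year'` and `'city'` are chosen here for illustration and are not from the original.

```python
import pandas as pd

df = pd.DataFrame({'shanghai': {2019: 54000, 2020: 56000},
                   'beijing': {2018: 66666, 2019: 77777}})

# rename_axis returns a new object; df itself keeps its unnamed axes.
named = df.rename_axis(index='year', columns='city')

index_name = named.index.name
columns_name = named.columns.name
original_index_name = df.index.name

# reset_index turns the named index into an ordinary column called 'year'.
flat = named.reset_index()
```

A named index is useful at this point: `reset_index()` uses the name as the new column label instead of the generic `index`.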
We can also create an index object on its own and then use it when constructing data
pd.Index
s=Series([1,2,3,4],index=['a','b','c','d'])
s
a 1
b 2
c 3
d 4
dtype: int64
s2=Series([1,2,3,4],index=pd.Index(['a','b','c','d'],name='字母'))  # '字母' means 'letter'
s2
字母
a 1
b 2
c 3
d 4
dtype: int64
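One reason to build a `pd.Index` separately is that the same object can label several Series, and because an Index is immutable it can be shared safely. A minimal sketch follows; the name `'letter'` and the second Series are invented for the example.

```python
import pandas as pd

letters = pd.Index(['a', 'b', 'c', 'd'], name='letter')

# The same Index object can label several Series.
s1 = pd.Series([1, 2, 3, 4], index=letters)
s2 = pd.Series([10, 20, 30, 40], index=letters)
total = s1 + s2  # values are aligned on the shared labels

# Index objects are immutable: item assignment raises TypeError.
try:
    letters[0] = 'z'
    is_immutable = False
except TypeError:
    is_immutable = True
```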
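Creating data indexed by dates
pd.date_range(start,end,periods)
start: first date of the range
end: last date of the range
periods: number of timestamps to generate (the spacing between them is set by freq, which defaults to 'D', one day)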
pd.date_range('2020-08-01','2020-08-10')
DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
'2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
'2020-08-09', '2020-08-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range(start='2020/08/01',periods=10)
DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
'2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
'2020-08-09', '2020-08-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range(end='20200810',periods=10)
DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
'2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
'2020-08-09', '2020-08-10'],
dtype='datetime64[ns]', freq='D')
pd.date_range(start='2020.08.01',periods=10)
DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
'2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
'2020-08-09', '2020-08-10'],
dtype='datetime64[ns]', freq='D')
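All of the calls above rely on the default daily spacing. The sketch below shows how `periods` and the `freq` argument of `pd.date_range` interact; the particular dates and frequencies are chosen only for illustration.

```python
import pandas as pd

# periods is the number of timestamps; freq (default 'D') is the spacing.
every_other_day = pd.date_range(start='2020-08-01', periods=5, freq='2D')

# 'MS' stands for month start: the first day of each month in the range.
month_starts = pd.date_range(start='2020-01-01', end='2020-06-30', freq='MS')

# start + end + periods (no freq): evenly spaced points between the two ends.
evenly_spaced = pd.date_range(start='2020-08-01', end='2020-08-10', periods=4)
```

Of `start`, `end`, `periods`, and `freq`, three are enough to determine the range; when `freq` is omitted together with one of the others, it defaults to `'D'`.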
data=DataFrame(np.random.randn(250,4),
index=pd.date_range('2019-01-01',periods=250),
columns='天.地.玄.黄'.split('.')
)
data
天 地 玄 黄
2019-01-01 0.650884 -0.230704 0.539396 0.443425
2019-01-02 -1.035249 -0.325115 0.486289 0.363106
2019-01-03 -0.199688 0.410041 -0.288964 -1.569419
2019-01-04 -0.149213 -1.962677 0.277294 0.096411
2019-01-05 -1.058891 -0.405065 -0.400450 -1.120895
... ... ... ... ...
2019-09-03 -1.110814 -0.949773 -0.186267 0.609731
2019-09-04 -1.342854 1.604551 -0.116002 1.434391
2019-09-05 0.743069 0.164166 -1.031541 -0.059534
2019-09-06 -0.648797 -0.376949 -0.505039 0.108930
2019-09-07 -0.211451 -0.385478 1.843292 0.221996
250 rows × 4 columns
data.loc['2019.08.8']
天 -1.724266
地 1.533940
玄 -0.374794
黄 1.383316
Name: 2019-08-08 00:00:00, dtype: float64
data.loc['2019.08.8':'20190809']
天 地 玄 黄
2019-08-08 -1.724266 1.533940 -0.374794 1.383316
2019-08-09 -0.345077 0.380281 -0.962117 0.856140
# Partial-string indexing: a year-month string selects every row in that month.
# The ISO form '2019-08' is used here because the dotted form '2019.08' may not be parsed as a month.
data.loc['2019-08']
(returns the 31 rows for August 2019, from 2019-08-01 through 2019-08-31)
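Because the frame above holds random numbers, the selections are hard to verify by eye. The sketch below repeats the three kinds of lookup (a single day, a slice of days, a whole month) on a deterministic stand-in; the `value` column and the use of `np.arange` are assumptions made for the illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: values 0..249, one per day.
index = pd.date_range('2019-01-01', periods=250)
data = pd.DataFrame({'value': np.arange(250)}, index=index)

# Month-level partial string: all rows in August 2019.
august = data.loc['2019-08']

# Label slices on a DatetimeIndex include both endpoints.
two_days = data.loc['2019-08-08':'2019-08-09']

# A full date on a daily index selects a single row, returned as a Series.
one_day = data.loc['2019-08-08']
```

Unlike positional slicing, the date slice includes its right endpoint, which is why the two-day selection contains both 2019-08-08 and 2019-08-09.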