02 - pandas basic data types

Latest recommended article published 2024-05-06 10:39:48

ge小琦

Article link: https://blog.csdn.net/weixin_48622025/article/details/108042332

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

Creating a DataFrame from a dictionary of dictionaries

dict_city={
    'shanghai':{
        2019:54000,2020:56000
    },
    'beijing':{
        2018:66666,2019:77777
    }
}
df_city=DataFrame(dict_city)
df_city

shanghai	beijing
2019	54000.0	77777.0
2020	56000.0	NaN
2018	NaN	66666.0
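
The constructor treats the outer keys as column labels and the inner keys as row labels. Any (row, column) pair missing from the input becomes NaN, which is also why the integer values are displayed as floats. A minimal sketch of this behavior, reusing the data above (the names `df_by_city` and `df_by_year` are illustrative and not from the original post):

```python
import pandas as pd

dict_city = {
    'shanghai': {2019: 54000, 2020: 56000},
    'beijing': {2018: 66666, 2019: 77777},
}

# Outer keys -> columns, inner keys -> row index.
df_by_city = pd.DataFrame(dict_city)

# Missing (year, city) pairs are filled with NaN, so the columns are float64.
print(df_by_city.isna().sum())
print(df_by_city.dtypes)

# orient='index' makes the outer keys the row labels instead.
df_by_year = pd.DataFrame.from_dict(dict_city, orient='index')
print(df_by_year)
```

If the cities should be rows rather than columns, `from_dict(..., orient='index')` avoids a separate transpose step.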

Adding data: df.insert()

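df.insert(loc, column, value, allow_duplicates=False) -> None
loc: integer position at which the new column is inserted
column: label of the new column
value: values for the new column
allow_duplicates: whether a column whose label already exists may be inserted
None: nothing is returned; the DataFrame is modified in place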

# Add the Shenzhen data to df_city
df_city.insert(2,'shenzhen',[60000,70000,50000])

df_city.T.drop_duplicates()

2019	2020	2018
shanghai	54000.0	56000.0	NaN
beijing	77777.0	NaN	66666.0
shenzhen	60000.0	70000.0	50000.0
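
Two details of `insert()` are easy to miss: it returns `None` because it changes the DataFrame in place, and it refuses an existing column label unless `allow_duplicates=True`. The sketch below illustrates both, using a small made-up frame (the values and the name `result` are illustrative only):

```python
import pandas as pd

# Illustrative data, not the exact frame from the text above.
df = pd.DataFrame({'shanghai': [54000, 56000], 'beijing': [77777, 66666]},
                  index=[2019, 2020])

# insert() modifies df in place and returns None.
result = df.insert(1, 'shenzhen', [60000, 70000])
print(result)
print(list(df.columns))

# Inserting an existing column label raises ValueError by default.
try:
    df.insert(0, 'shenzhen', [1, 2])
except ValueError as exc:
    print('rejected:', exc)

# allow_duplicates=True permits a second column with the same label.
df.insert(0, 'shenzhen', [1, 2], allow_duplicates=True)
print(list(df.columns))
```

Because the call works in place, writing `df = df.insert(...)` would replace the DataFrame with `None`.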

# Recreate df_city from the original dictionary (this drops the shenzhen column)
dict_city={
    'shanghai':{
        2019:54000,2020:56000
    },
    'beijing':{
        2018:66666,2019:77777
    }
}
df_city=DataFrame(dict_city)
df_city

shanghai	beijing
2019	54000.0	77777.0
2020	56000.0	NaN
2018	NaN	66666.0

Naming the row and column indexes

df_city.index.name='年份'    # name of the row index ("year")
df_city.columns.name='城市'  # name of the column index ("city")
df_city

城市	shanghai	beijing
年份		
2019	54000.0	77777.0
2020	56000.0	NaN
2018	NaN	66666.0

df_city.index

Int64Index([2019, 2020, 2018], dtype='int64', name='年份')
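
Assigning to `.name` changes the existing object. As a hedged alternative, `rename_axis` sets both names in one call and returns a new DataFrame. In this sketch the English names `'year'` and `'city'` and the variables `named` and `flat` are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'shanghai': {2019: 54000, 2020: 56000},
                   'beijing': {2018: 66666, 2019: 77777}})

# rename_axis returns a new DataFrame with the axis names set; df is unchanged.
named = df.rename_axis(index='year', columns='city')
print(named.index.name, named.columns.name)

# The index name becomes the column label after reset_index().
flat = named.reset_index()
print(list(flat.columns))
```

One practical benefit of naming the index is visible in the last step: the former index arrives as a properly labeled column instead of a generic `index`.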

We can also create an index object on its own and then use it to build data

pd.Index

s=Series([1,2,3,4],index=['a','b','c','d'])
s

a    1
b    2
c    3
d    4
dtype: int64

s2=Series([1,2,3,4],index=pd.Index(['a','b','c','d'],name='字母'))  # '字母' means "letter"
s2

字母
a    1
b    2
c    3
d    4
dtype: int64
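
A standalone index is useful because one object can label several Series, and because it cannot be modified by accident. A small sketch of both properties (the names `letters`, `s1`, `s2`, and `total` are illustrative):

```python
import pandas as pd

letters = pd.Index(['a', 'b', 'c', 'd'], name='letter')

# The same Index object can label several Series.
s1 = pd.Series([1, 2, 3, 4], index=letters)
s2 = pd.Series([10, 20, 30, 40], index=letters)
total = s1 + s2  # arithmetic aligns on the shared labels
print(total)

# Index objects are immutable: item assignment raises TypeError.
try:
    letters[0] = 'z'
except TypeError as exc:
    print('immutable:', exc)
```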

Creating data indexed by dates

pd.date_range(start,end,periods)

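start: first date
end: last date
periods: number of dates to generate (the spacing between them is set by freq, which defaults to 'D', one day)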

pd.date_range('2020-08-01','2020-08-10')

DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
               '2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
               '2020-08-09', '2020-08-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start='2020/08/01',periods=10)

DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
               '2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
               '2020-08-09', '2020-08-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(end='20200810',periods=10)

DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
               '2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
               '2020-08-09', '2020-08-10'],
              dtype='datetime64[ns]', freq='D')

pd.date_range(start='2020.08.01',periods=10)

DatetimeIndex(['2020-08-01', '2020-08-02', '2020-08-03', '2020-08-04',
               '2020-08-05', '2020-08-06', '2020-08-07', '2020-08-08',
               '2020-08-09', '2020-08-10'],
              dtype='datetime64[ns]', freq='D')
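
Since `periods` counts the generated timestamps, the spacing comes from `freq`. The following sketch shows the difference; the variable names and the `'2D'` frequency are chosen only for illustration:

```python
import pandas as pd

# periods is the number of timestamps; freq controls the spacing between them.
every_other_day = pd.date_range(start='2020-08-01', periods=5, freq='2D')
print(every_other_day)

# With start, end and periods all given (and no freq), the points are evenly spaced.
evenly_spaced = pd.date_range(start='2020-08-01', end='2020-08-10', periods=4)
print(evenly_spaced)
```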

data=DataFrame(np.random.randn(250,4),
               index=pd.date_range('2019-01-01',periods=250),
               columns='天.地.玄.黄'.split('.')
              )

data
天	地	玄	黄
2019-01-01	0.650884	-0.230704	0.539396	0.443425
2019-01-02	-1.035249	-0.325115	0.486289	0.363106
2019-01-03	-0.199688	0.410041	-0.288964	-1.569419
2019-01-04	-0.149213	-1.962677	0.277294	0.096411
2019-01-05	-1.058891	-0.405065	-0.400450	-1.120895
...	...	...	...	...
2019-09-03	-1.110814	-0.949773	-0.186267	0.609731
2019-09-04	-1.342854	1.604551	-0.116002	1.434391
2019-09-05	0.743069	0.164166	-1.031541	-0.059534
2019-09-06	-0.648797	-0.376949	-0.505039	0.108930
2019-09-07	-0.211451	-0.385478	1.843292	0.221996
250 rows × 4 columns

data.loc['2019.08.8']

天   -1.724266
地    1.533940
玄   -0.374794
黄    1.383316
Name: 2019-08-08 00:00:00, dtype: float64

data.loc['2019.08.8':'20190809']

天	地	玄	黄
2019-08-08	-1.724266	1.533940	-0.374794	1.383316
2019-08-09	-0.345077	0.380281	-0.962117	0.856140

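# Caution: the output recorded below spans all 250 rows rather than August alone,
# so '2019.08' was apparently not read as a month here.
# data.loc['2019-08'] is the unambiguous way to select August 2019.
data.loc['2019.08']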

天	地	玄	黄
2019-01-01	0.650884	-0.230704	0.539396	0.443425
2019-01-02	-1.035249	-0.325115	0.486289	0.363106
2019-01-03	-0.199688	0.410041	-0.288964	-1.569419
2019-01-04	-0.149213	-1.962677	0.277294	0.096411
2019-01-05	-1.058891	-0.405065	-0.400450	-1.120895
...	...	...	...	...
2019-09-03	-1.110814	-0.949773	-0.186267	0.609731
2019-09-04	-1.342854	1.604551	-0.116002	1.434391
2019-09-05	0.743069	0.164166	-1.031541	-0.059534
2019-09-06	-0.648797	-0.376949	-0.505039	0.108930
2019-09-07	-0.211451	-0.385478	1.843292	0.221996
250 rows × 4 columns
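
The selections above rely on partial string indexing of a DatetimeIndex. A reproducible sketch of the three common forms, using ISO-style date strings, a seeded random generator, and placeholder column labels `A` to `D` (all assumptions for illustration, not the original data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
data = pd.DataFrame(rng.standard_normal((250, 4)),
                    index=pd.date_range('2019-01-01', periods=250),
                    columns=list('ABCD'))

one_day = data.loc['2019-08-08']                # a single row, returned as a Series
two_days = data.loc['2019-08-08':'2019-08-09']  # label slices include both endpoints
august = data.loc['2019-08']                    # every row in August 2019

print(one_day.shape, two_days.shape, august.shape)
```

Unlike positional slicing, a label slice on dates includes the end label, so the two-day slice returns two rows.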
