Python 3 pandas: DataFrame basics

Three ways to create a DataFrame

1. From a dict whose values are lists

import pandas as pd

# Every list in the dict must have the same length, otherwise pandas raises a ValueError
population = {"city":["beijing","shanghai","guangzhou","shenzhen","hangzhou","chongqing"],
              "year":[2016,2017,2016,2017,2017,2016],
              "population":[2100,2300,1000,700,500,500]}
population = pd.DataFrame(population)
print(population)
        city  population  year
0    beijing        2100  2016
1   shanghai        2300  2017
2  guangzhou        1000  2016
3   shenzhen         700  2017
4   hangzhou         500  2017
5  chongqing         500  2016
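
The equal-length requirement mentioned in the comment above can be checked with a small sketch. The dict `bad` is made-up data used only for illustration; the exact wording of the error message differs between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Hypothetical data: "year" has one entry fewer than "city"
bad = {"city": ["beijing", "shanghai", "guangzhou"],
       "year": [2016, 2017]}

try:
    pd.DataFrame(bad)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a frame from columns of unequal length
    error_message = str(exc)

print(error_message)
```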
pdc = pd.DataFrame(population,columns=["year","city","population"])  # use the columns parameter to set the column order
print(pdc)
   year       city  population
0  2016    beijing        2100
1  2017   shanghai        2300
2  2016  guangzhou        1000
3  2017   shenzhen         700
4  2017   hangzhou         500
5  2016  chongqing         500
temp = {"city":["beijing","shanghai","guangzhou","shenzhen","hangzhou","chongqing"],
              "year":[2016,2017,2016,2017,2017,2016],
              "population":[2100,2300,1000,700,500,500]}
pdci = pd.DataFrame(temp,columns=["year","city","population"],index = ['one','two','three','four','five','six'])
# set the column order and replace the default integer index with custom labels
print(pdci)
      year       city  population
one    2016    beijing        2100
two    2017   shanghai        2300
three  2016  guangzhou        1000
four   2017   shenzhen         700
five   2017   hangzhou         500
six    2016  chongqing         500
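
The `columns` parameter selects as well as orders: a key that is not listed is dropped, and a listed name that is not a key of the dict becomes a column of NaN. A minimal sketch, where `area` is a hypothetical column name chosen for illustration:

```python
import pandas as pd

temp = {"city": ["beijing", "shanghai"],
        "year": [2016, 2017],
        "population": [2100, 2300]}

# "population" is left out of columns, so it is dropped;
# "area" is not a key of temp, so the column is filled with NaN
frame = pd.DataFrame(temp, columns=["year", "city", "area"], index=["one", "two"])
print(frame)
```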

2. From Series

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000

apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
#print(apts2)

apts=apts+apts2
apts[apts.isnull()]=apts.mean()  # fill missing values with the mean (not the median) of the non-missing values
#print(apts)
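# Combine the two Series into one DataFrame: rows are the union of both indexes,
# and a label missing from one Series shows NaN in that column
df=pd.DataFrame({'apts':apts,'apts2':apts2})
print(df)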
              apts    apts2
Beijing    65000.0  10000.0
Chongqing  64000.0  30000.0
Guangzhou  47000.0   7000.0
Hangzhou   64000.0      NaN
Shanghai   68000.0   8000.0
Suzhou     64000.0      NaN
Tianjin    64000.0  40000.0
shenzhen   76000.0   6000.0
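
The NaN values come from index alignment: adding two Series matches values by label, and a label present in only one of them yields NaN. The sketch below uses two small made-up Series and shows `Series.add` with `fill_value=0` as one alternative to filling with the mean afterwards. Which of the two is appropriate depends on the data.

```python
import pandas as pd

a = pd.Series({"Beijing": 55000, "Hangzhou": 40000})
b = pd.Series({"Beijing": 10000, "Tianjin": 40000})

plain = a + b                             # Hangzhou and Tianjin exist in only one Series, so they become NaN
filled = a.add(b, fill_value=0)           # treat a missing label as 0 before adding
mean_filled = plain.fillna(plain.mean())  # same effect as plain[plain.isnull()] = plain.mean()

print(plain)
print(filled)
print(mean_filled)
```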

3. From a list of dicts

data = [{'lucy':9999,'linus':8888,'curry':100000},{'lucy':9998,'linus':8887,'curry':1000000}]
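# Why does lucy end up last? Older pandas versions sorted the columns alphabetically when
# given a list of dicts; since pandas 0.25 the columns follow the insertion order of the keys.
pd2 = pd.DataFrame(data,index=['salary1','salary2'])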
print(pd2)
           curry  linus  lucy
salary1   100000   8888  9999
salary2  1000000   8887  9998
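
With a list of dicts, each dict becomes one row and the columns are the union of all keys, so a key missing from a row shows up as NaN. A short sketch with made-up rows, which also selects the columns explicitly so the order does not depend on the pandas version:

```python
import pandas as pd

# Hypothetical rows: the first has no "curry", the second has no "linus"
rows = [{"lucy": 9999, "linus": 8888},
        {"lucy": 9998, "curry": 1000000}]

frame = pd.DataFrame(rows, index=["salary1", "salary2"])
print(frame)

# Selecting with a list of names fixes the column order explicitly
ordered = frame[["lucy", "linus", "curry"]]
print(ordered)
```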

Broadcasting

from pandas import pan
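
As a rough illustration of what broadcasting means in pandas, here is a minimal sketch with made-up numbers. It is an assumption about the kind of operation this section covers, not the original example: an operation between a DataFrame and a scalar or a Series is applied across every row or column.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

doubled = frame * 2                     # the scalar is applied to every cell
centered = frame - frame.mean()         # the Series of column means is matched on column labels and applied to each row
scaled = frame.div(frame["a"], axis=0)  # the Series is matched on the row index and applied to each column

print(doubled)
print(centered)
print(scaled)
```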