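1. pandas data structures: DataFrame
A DataFrame can be created in several ways: 1. from another DataFrame; 2. from a two-dimensional NumPy array or a composite structure of arrays; 3. from Series objects; 4. from a file such as a CSV. The rest of this section walks through basic DataFrame usage.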
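The four creation routes listed above can be sketched as follows. This is a minimal illustration, not part of the original walkthrough: the variable names and the tiny inline data are assumptions, and an in-memory `io.StringIO` buffer stands in for a CSV file on disk.

```python
import io

import numpy as np
import pandas as pd

# 1. From another DataFrame (copy() yields an independent object).
base = pd.DataFrame({"Country": ["Afghanistan", "Albania"], "CountryID": [1, 2]})
from_df = base.copy()

# 2. From a two-dimensional NumPy array, with column names supplied by hand.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# 3. From Series: each Series becomes one column, aligned on the index.
from_series = pd.DataFrame({
    "Country": pd.Series(["Afghanistan", "Albania"]),
    "Continent": pd.Series([1, 2]),
})

# 4. From CSV text; the StringIO buffer is a stand-in for a real file.
csv_text = "Country,CountryID\nAfghanistan,1\nAlbania,2\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_df.shape, from_array.shape, from_series.shape, from_csv.shape)
```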
a) Reading a file
Code:
import pandas as pd

# A raw string keeps the backslashes in the Windows path literal.
df = pd.read_csv(r"H:\Python\data\WHO.csv")
print("DataFrame:", df)
Output (excerpt):
DataFrame: Country CountryID Continent \
0 Afghanistan 1 1
1 Albania 2 2
2 Algeria 3 3
3 Andorra 4 2
4 Angola 5 3
Code:
print("Shape:", df.shape)  # dimensions as (rows, columns)
print("Length:", len(df))  # number of rows
Output:
Shape: (202, 358)
Length: 202
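To make the relationship between `shape` and `len()` concrete without needing WHO.csv, here is a small sketch on a hypothetical three-row frame (the name `sample` is an assumption; its values are the first three rows shown earlier). `len()` counts rows only, while `shape` reports rows and columns.

```python
import pandas as pd

# Hypothetical three-row sample standing in for WHO.csv.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "CountryID": [1, 2, 3],
    "Continent": [1, 2, 3],
})

n_rows, n_cols = sample.shape   # shape is a (rows, columns) tuple
print("Shape:", sample.shape)
print("Length:", len(sample))   # len() counts rows only
print("Cells:", sample.size)    # size is rows * columns
```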
Code:
print("Column Headers", df.columns)  # header of every column
print("Data type", df.dtypes)  # data type of every column
Output (excerpt):
Column Headers Index([u'Country', u'CountryID', u'Continent',
u'Adolescent fertility rate (%)', u'Adult literacy rate (%)',
u'Gross national income per capita (PPP international $)',
u'Net primary school enrolment ratio female (%)',
u'Net primary school enrolment ratio male (%)',
u'Population (in thousands) total',
u'Population annual growth rate (%)',
...
u'Total_CO2_emissions', u'Total_income', u'Total_reserves',
u'Trade_balance_goods_and_services', u'Under_five_mortality_from_CME',
u'Under_five_mortality_from_IHME', u'Under_five_mortality_rate',
u'Urban_population', u'Urban_population_growth',
u'Urban_population_pct_of_total'],
dtype='object', length=358)
Data type Country object
CountryID int64
Continent int64
Adolescent fertility rate (%) float64
Adult literacy rate (%) float64
Gross national income per capita (PPP international $) float64
Net primary school enrolment ratio female (%) float64
Net primary school enrolment ratio male (%) float64
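The `columns` and `dtypes` attributes can also be used programmatically, for example to pick out the numeric columns. The sketch below assumes a hypothetical frame named `sample`; the `Score` column and its values are placeholders invented for illustration, not data from WHO.csv.

```python
import pandas as pd

# Hypothetical sample; "Score" and its values are illustrative placeholders.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "CountryID": [1, 2],
    "Score": [0.5, 1.5],
})

column_names = list(sample.columns)   # columns is an Index; list() gives plain names
dtype_of = sample.dtypes              # a Series mapping column name to dtype
numeric_only = sample.select_dtypes(include="number")  # keep numeric columns only

print(column_names)
print(dtype_of["Score"])
print(list(numeric_only.columns))
```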
Code:
print("Index:", df.index)
Output:
Index: RangeIndex(start=0, stop=202, step=1)
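The `RangeIndex` shown above is the default positional index. As a brief sketch (the frame `sample` and the choice of `Country` as the new index are assumptions for illustration), a column can be promoted to the index with `set_index`, after which rows can be looked up by label:

```python
import pandas as pd

# Hypothetical sample standing in for WHO.csv.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "CountryID": [1, 2, 3],
})

print(sample.index)                        # default positional labels
by_country = sample.set_index("Country")   # returns a new frame indexed by country
print(by_country.index.tolist())
print(by_country.loc["Albania", "CountryID"])
```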
Code:
print("Values:", df.values)
Output:
Values: [['Afghanistan' 1L 1L ..., 5740436.0 5.44 22.9]
['Albania' 2L 2L ..., 1431793.9 2.21 45.4]
['Algeria' 3L 3L ..., 20800000.0 2.61 63.3]
...,
['Yemen' 200L 1L ..., 5759120.5 4.37 27.3]
['Zambia' 201L 3L ..., 4017411.0 1.95 35.0]
['Zimbabwe' 202L 3L ..., 4709965.0 1.9 35.9]]
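The output above mixes strings and numbers in one array, which suggests that `values` returns a single NumPy array whose dtype must accommodate every column. A small sketch under that reading, using a hypothetical two-row frame named `sample`:

```python
import pandas as pd

# Hypothetical sample standing in for WHO.csv.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "CountryID": [1, 2],
})

mixed = sample.values                    # strings + integers -> one object array
numeric = sample[["CountryID"]].values   # a purely numeric selection keeps its dtype

print(type(mixed).__name__, mixed.dtype, mixed.shape)
print(numeric.dtype.kind, numeric.shape)
```

When only numeric columns are selected, the resulting array keeps a numeric dtype, which is usually preferable for further NumPy computation.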
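2. pandas data structures: Series
A pandas Series is a one-dimensional labeled array whose elements may be of different types. It can be created from a Python dictionary, from a NumPy array, or from a single scalar value.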
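The three creation routes just mentioned can be sketched as follows. The variable names and values are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

from_dict = pd.Series({"Afghanistan": 1, "Albania": 2})   # dict keys become labels
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))          # default labels 0, 1, 2
from_scalar = pd.Series(7, index=["a", "b", "c"])          # scalar repeated per label

print(from_dict["Albania"])
print(from_array.index.tolist())
print(from_scalar.tolist())
```

Note that a scalar needs an explicit `index` to determine how many times it is repeated.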
a) Type. Selecting a single column of a DataFrame returns a Series.
Code:
country_df = df["Country"]
print("Type df:", type(df))
print("Type country_df:", type(country_df))
Output:
Type df: <class 'pandas.core.frame.DataFrame'>
Type country_df: <class 'pandas.core.series.Series'>
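A related point worth a short sketch: indexing with a single label gives a Series, whereas indexing with a list of labels gives a DataFrame, even when the list holds only one name. The frame `sample` below is a hypothetical stand-in for the WHO data.

```python
import pandas as pd

# Hypothetical sample standing in for WHO.csv.
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "CountryID": [1, 2],
})

as_series = sample["Country"]     # single label -> Series
as_frame = sample[["Country"]]    # list of labels -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```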
Code:
print("Series Shape:", country_df.shape)  # shape of the column
print("Series index:", country_df.index)  # index labels
print("Series values:", country_df.values)  # all values in the column
print("Series name:", country_df.name)  # column name (header)
Output:
Series Shape: (202L,)
Series index: RangeIndex(start=0, stop=202, step=1)
Series values: ['Afghanistan' 'Albania' 'Algeria' 'Andorra' 'Angola' 'Antigua and Barbuda'
'Argentina' 'Armenia' 'Australia' 'Austria' 'Azerbaijan' 'Bahamas'
'Bahrain' 'Bangladesh' 'Barbados' 'Belarus' 'Belgium' 'Belize' 'Benin'
'Bermuda' 'Bhutan' 'Bolivia' 'Bosnia and Herzegovina' 'Botswana' 'Brazil'
'Brunei Darussalam' 'Bulgaria' 'Burkina Faso' 'Burundi' 'Cambodia'
'Cameroon' 'Canada' 'Cape Verde' 'Central African Republic' 'Chad' 'Chile'
'China' 'Colombia' 'Comoros'