Series and DataFrame are the two most commonly used basic data types in pandas:
1. Creating a Series
import pandas as pd
s1 = pd.Series([1,2,3,4,5])
print(s1)
0 1
1 2
2 3
3 4
4 5
dtype: int64
s2 = pd.Series([1,2,3.0,4,5])
print(s2)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
s3 = pd.Series([False, 1,2.0,'hello'])
print(s3)
0 False
1 1
2 2.0
3 hello
dtype: object
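The three Series above differ only in the dtype that pandas infers from the values. The following is a minimal sketch that checks this directly through the standard `dtype` attribute; the variable names are chosen here for illustration and do not come from the original examples.

```python
import pandas as pd

# All integers: pandas infers an integer dtype.
ints = pd.Series([1, 2, 3, 4, 5])
# One float among the integers: every value is upcast to float.
mixed_numeric = pd.Series([1, 2, 3.0, 4, 5])
# Mixed Python types: pandas falls back to the generic object dtype,
# and each element keeps its original Python type.
mixed = pd.Series([False, 1, 2.0, 'hello'])

print(ints.dtype)
print(mixed_numeric.dtype)
print(mixed.dtype)
print(type(mixed.iloc[2]))
```

Because an object Series stores ordinary Python objects, numeric operations on it are usually slower than on an int64 or float64 Series.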
Setting a custom index (the labels shown in the first column)
ss = pd.Series(['Bill','MicroSoft'],index=['person','company'])
print(ss)
person Bill
company MicroSoft
dtype: object
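Once a Series has a custom index, values can be looked up by label instead of by position. A short sketch using the same `ss` Series, assuming only the standard `.loc`, `.iloc`, and `.index` accessors:

```python
import pandas as pd

ss = pd.Series(['Bill', 'MicroSoft'], index=['person', 'company'])

# Look up a value by its label rather than by its position.
print(ss['person'])
# .loc is the explicit label-based accessor; .iloc is position-based.
print(ss.loc['company'])
print(ss.iloc[0])
# The labels themselves are stored in the index attribute.
print(list(ss.index))
```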
2. Creating a DataFrame
import pandas as pd
persons = pd.DataFrame({
'Name':['Rosaline Franklin','William Gosset'],
'Occupation':['Chemist','Statistician'],
'Born':['1920-07-25','1876-06-13'],
'Died':['1958-04-16','1937-10-16'],
'age':[37,61]})
print(persons)
Name Occupation Born Died age
0 Rosaline Franklin Chemist 1920-07-25 1958-04-16 37
1 William Gosset Statistician 1876-06-13 1937-10-16 61
persons.head()
| | Name | Occupation | Born | Died | age |
|---|---|---|---|---|---|
| 0 | Rosaline Franklin | Chemist | 1920-07-25 | 1958-04-16 | 37 |
| 1 | William Gosset | Statistician | 1876-06-13 | 1937-10-16 | 61 |
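`head()` returns the first five rows by default, so on this two-row frame it shows everything. A small sketch, assuming the same `persons` data as above, of limiting the row count with the `n` argument and checking the frame's size with `shape`:

```python
import pandas as pd

persons = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Occupation': ['Chemist', 'Statistician'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'age': [37, 61]})

# Keep only the first row.
first = persons.head(1)
print(first)

# shape is a (rows, columns) tuple.
print(persons.shape)
```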
persons = pd.DataFrame({
'Name':['Rosaline Franklin','William Gosset'],
'Occupation':['Chemist','Statistician'],
'Born':['1920-07-25','1876-06-13'],
'Died':['1958-04-16','1937-10-16'],
'Age':[37,61]},columns=['Occupation','Born','Died','Age'],index=['Rosaline Franklin','William Gosset'])
print(persons)
Occupation Born Died Age
Rosaline Franklin Chemist 1920-07-25 1958-04-16 37
William Gosset Statistician 1876-06-13 1937-10-16 61
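In the example above, `columns` selects and orders the dictionary keys that become columns (which is why `Name` is dropped), and `index` supplies the row labels. With names as row labels, rows and cells can be selected through `.loc`. A minimal sketch, assuming an equivalent frame built without the unused `Name` key:

```python
import pandas as pd

persons = pd.DataFrame(
    {
        'Occupation': ['Chemist', 'Statistician'],
        'Born': ['1920-07-25', '1876-06-13'],
        'Died': ['1958-04-16', '1937-10-16'],
        'Age': [37, 61],
    },
    index=['Rosaline Franklin', 'William Gosset'],
)

# Select one row by its label; the result is a Series keyed by column name.
row = persons.loc['William Gosset']
print(row['Occupation'])

# Select a single cell by row label and column name.
print(persons.loc['Rosaline Franklin', 'Age'])
```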
Using an ordered dictionary (OrderedDict) to guarantee the column order
from collections import OrderedDict
persons = pd.DataFrame(OrderedDict([
('Name',['Rosaline Franklin','William Gosset']),
('Occupation',['Chemist','Statistician']),
('Born',['1920-07-25','1876-06-13']),
('Died',['1958-04-16','1937-10-16']),
('Age',[37,61])
]))
print(persons)
Name Occupation Born Died Age
0 Rosaline Franklin Chemist 1920-07-25 1958-04-16 37
1 William Gosset Statistician 1876-06-13 1937-10-16 61
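`OrderedDict` mattered on older Python versions, where a plain dict did not guarantee key order. The sketch below assumes Python 3.7 or later and a reasonably recent pandas, where a plain dict keeps insertion order and the columns come out in the order written:

```python
import pandas as pd

# Assumption: Python 3.7+, where dict preserves insertion order.
persons = pd.DataFrame({
    'Name': ['Rosaline Franklin', 'William Gosset'],
    'Occupation': ['Chemist', 'Statistician'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
    'Age': [37, 61],
})

# The column order follows the order of the dictionary keys.
print(list(persons.columns))
```

If the code also has to run on older interpreters, keeping `OrderedDict` or passing `columns=[...]` explicitly is the safer choice.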