1. Creating a Series
class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
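Parameter | Meaning |
---|---|
data | array-like, dict, or scalar value. The data stored in the Series. |
index | array-like or Index. Labels for the values; defaults to RangeIndex (0, 1, 2, ...) when omitted. |
dtype | numpy.dtype or None. If None, the dtype is inferred from data. |
name | Hashable, default None. The name given to the Series. |
copy | boolean, default False. Whether to copy the input data. |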
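As a minimal sketch of how these parameters interact, the snippet below sets `dtype` and `name` explicitly and uses `copy=True` so the Series does not share memory with its source array. The variable names (`arr`, `s_copy`) and the label `'demo'` are illustrative assumptions, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Explicit dtype and name instead of relying on inference.
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64', name='demo')

# copy=True gives the Series its own buffer, so a later change to the
# source array does not show up in the Series.
arr = np.array([10, 20, 30])
s_copy = pd.Series(arr, copy=True)
arr[0] = 99

print(s.dtype, s.name)
print(s_copy.iloc[0])
```

With `copy=False` the Series may or may not share memory with `arr`, depending on the pandas version, so only the `copy=True` case is shown here.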
1.1 From a list
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1,3,5,np.nan,6,8])
>>> s
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
>>> s = pd.Series(np.random.randint(0,7,size=5), index=list('ABCDD'), name='Hello')
>>> s
A 3
B 1
C 4
D 4
D 2
Name: Hello, dtype: int32
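The index above contains the label `D` twice, which pandas allows. A small sketch of what that means for lookups, using fixed values in place of the random ones so the result is reproducible (the values themselves are an assumption for illustration):

```python
import pandas as pd

s = pd.Series([3, 1, 4, 4, 2], index=list('ABCDD'), name='Hello')

unique_hit = s['A']   # a single scalar, because 'A' appears once
dup_hit = s['D']      # a Series with two rows, because 'D' appears twice

print(s.index.is_unique)
print(dup_hit)
```

If code downstream expects a scalar from every label lookup, checking `s.index.is_unique` first is one way to catch this early.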
1.2 From a dict
>>> s = pd.Series({'a':2,'b':4,'d':5}, index=list('abcd'))
>>> s
a 2.0
b 4.0
c NaN
d 5.0
dtype: float64
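When an `index` is passed together with a dict, the dict keys are matched against the index labels: labels with no key become NaN (which forces a float dtype), and keys that are not in the index are dropped. A short sketch of both effects, with variable names chosen only for illustration:

```python
import pandas as pd

data = {'a': 2, 'b': 4, 'd': 5}

# Without an index, the dict keys become the labels and the dtype stays integer.
no_index = pd.Series(data)

# With an index, 'c' has no matching key (NaN) and 'd' is left out entirely.
with_index = pd.Series(data, index=list('abc'))

print(no_index.dtype)
print(with_index)
```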
1.3 From a scalar value
>>> pd.Series(5, index=list('abcd'))
a 5
b 5
c 5
d 5
dtype: int64
2. Creating a DataFrame
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
2.1 From a dict
>>> df = pd.DataFrame({'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...                    'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])})
>>> df
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d NaN 4.0
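You can control the row and column order by passing index and columns, provided those labels already exist in the data; any label that does not exist is filled entirely with NaN.
2.2 From lists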
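A minimal sketch of that behavior, reusing the same dict of Series. The label `'three'` is a hypothetical column that is deliberately absent from the data:

```python
import pandas as pd

data = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
        'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

# Reorder rows and columns by naming labels that already exist.
reordered = pd.DataFrame(data, index=['d', 'b', 'a'], columns=['two', 'one'])

# 'three' is not a key in data, so that column is entirely NaN.
with_missing = pd.DataFrame(data, index=['d', 'b', 'a'], columns=['two', 'three'])

print(reordered)
print(with_missing)
```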
>>> d = {'one' : [1., 2., 3., 4.],'two' : [4., 3., 2., 1.]}
>>> pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
one two
a 1.0 4.0
b 2.0 3.0
c 3.0 2.0
d 4.0 1.0
2.3 From multiple dicts
>>> data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
>>> pd.DataFrame(data2)
a b c
0 1 2 NaN
1 5 10 20.0
2.4 Importing from external sources
Example: read_table
>>> users = pd.read_table('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user',sep='|', index_col='user_id')
>>> users
age gender occupation zip_code
user_id
1 24 M technician 85711
2 53 F other 94043
3 23 M writer 32067
...
941 20 M student 97229
942 48 F librarian 78209
943 22 M student 77841
[943 rows x 4 columns]
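To try the same `sep` and `index_col` arguments without network access, here is a small sketch that reads from an in-memory string. The sample text is an assumption built from the first two rows shown above, with a header line added; it also checks that `read_csv` with the same separator gives the same result.

```python
import io

import pandas as pd

# Hypothetical in-memory sample shaped like the u.user output above.
raw = (
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
)

# read_table defaults to a tab separator, so sep='|' must be given explicitly.
users = pd.read_table(io.StringIO(raw), sep='|', index_col='user_id')

# read_csv with the same sep parses the text identically.
same = pd.read_csv(io.StringIO(raw), sep='|', index_col='user_id')

print(users)
```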
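Reading other file types:
Errors you may run into when reading files include paths containing Chinese characters, backslashes in paths, and encoding mismatches. A recommended pattern is: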
file_path = '***.csv'
# Open the file yourself with an explicit encoding; the with-block closes it automatically.
with open(file_path, encoding='utf-8') as f:
    df = pd.read_csv(f)
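The three pitfalls can be exercised in one self-contained sketch. It assumes a temporary directory and a hypothetical file named `数据.csv`, both created only for this demonstration; `pathlib` is used here as one way to avoid hand-written backslashes, not as the only option.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file with a non-ASCII name, created just for this demo.
    file_path = Path(tmp) / '数据.csv'
    file_path.write_text('name,score\n张三,90\n李四,85\n', encoding='utf-8')

    # Path objects join parts without backslash escapes; on Windows a raw
    # string such as r'C:\data\file.csv' avoids the same escaping problem.
    with open(file_path, encoding='utf-8') as f:
        df = pd.read_csv(f)

print(df)
```

If the file was saved in another encoding (for example `gbk`), passing that name as `encoding` in place of `'utf-8'` is the usual adjustment.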