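DataFrame concept (loosely, think of it as a block of data in an Excel sheet)
A tabular data structure: a labeled two-dimensional array with row labels (index) and column labels (columns). Its values can be numbers, strings, booleans, and so on.
1. .index: the row labels
2. .columns: the column labels
3. .values: the values
4. .dtypes: returns the dtypes in the DataFrame.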
# -*- coding: utf-8 -*-
import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
print(df)
print(df.index)
print(df.columns)
print(df.values)
print(df.dtypes)
Output:
   col1  col2
0     1     3
1     2     4
RangeIndex(start=0, stop=2, step=1)
Index(['col1', 'col2'], dtype='object')
[[1 3]
 [2 4]]
col1    int64
col2    int64
dtype: object
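Because the values of a DataFrame can be numbers, strings, or booleans, each column carries its own dtype. The following is a minimal sketch of that point. The column names n, name, and flag are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical columns holding three different kinds of values
df = pd.DataFrame({'n': [1, 2], 'name': ['a', 'b'], 'flag': [True, False]})

# Each column keeps its own dtype
print(df.dtypes)

# .values has to fit every column into a single NumPy array,
# so mixed columns fall back to the generic object dtype
print(df.values.dtype)

# (number of rows, number of columns)
print(df.shape)
```

When all columns share one numeric type, as in the col1/col2 example, .values keeps that numeric dtype instead of falling back to object.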
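I: Creating a DataFrame from a dictionary of arrays/lists
1. A DataFrame can be created from a dictionary of arrays/lists (all values must have the same length). The dictionary keys become the columns, and the index defaults to integer labels.
2. The columns parameter is a list that sets the column order. A listed column that is not in the data (for example 'python66' in the code below) is filled with NaN. If fewer columns are listed than the data contains, only the listed columns are kept.
3. The index parameter is a list that sets the row labels.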
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
# create from a dictionary
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
print(df)
print('----------------')
# force a specific dtype
df = pd.DataFrame(data=d, dtype=np.int8)
print(df.dtypes)
print('----------------')
# create from a numpy ndarray
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),columns=['a', 'b', 'c'])
print(df2)
print('----------------')
# set columns and index explicitly
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d,columns=['python66','col2'],index=['python66',2])
print(df)
Output:
   col1  col2
0     1     3
1     2     4
----------------
col1    int8
col2    int8
dtype: object
----------------
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
----------------
          python66  col2
python66       NaN     3
2              NaN     4
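The requirement that all dictionary values have the same length can be checked directly. The sketch below deliberately passes lists of different lengths and catches the resulting ValueError. Wrapping each list in a Series is shown only as one possible workaround, not as the recommended approach.

```python
import pandas as pd

# Lists of different lengths, on purpose
d = {'col1': [1, 2], 'col2': [3, 4, 5]}

try:
    pd.DataFrame(data=d)
    error_message = None
except ValueError as exc:
    # pandas refuses to build the frame from unequal-length lists
    error_message = str(exc)

print(error_message)

# Possible workaround: wrap each list in a Series so the rows are
# aligned by label and the shorter column is padded with NaN
padded = pd.DataFrame({key: pd.Series(values) for key, values in d.items()})
print(padded)
```

The padded column becomes a float column, because NaN cannot be stored in an integer column.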
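II: Creating a DataFrame from a dictionary of Series
1. When a DataFrame is created from a dictionary of Series, the dictionary keys become the columns and the index is taken from the Series labels (integer labels by default).
2. The Series may have different lengths (unlike the first method, where all dictionary values must have the same length). Labels missing from one Series produce NaN values.
3. Avoid duplicate labels in the index of a Series.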
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
# create from a dictionary of Series
series1 = pd.Series(np.random.rand(3))
df1 = pd.DataFrame({'col1':series1,'col2':series1})
print(df1)
print('---------------------')
series1 = pd.Series(['a','b','c'],index=['python66','py7','py8'])
series2 = pd.Series(['a','b','c','d'],index=['python66','py7','py8','py9'])
df1 = pd.DataFrame({'col1':series1,'col2':series2})
print(df1)
Output:
       col1      col2
0  0.954448  0.954448
1  0.465568  0.465568
2  0.286233  0.286233
---------------------
          col1  col2
py7          b     b
py8          c     c
py9        NaN     d
python66     a     a
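Since the rows are aligned by label, repeated labels in a Series index make that alignment ambiguous. A minimal sketch of how one might check for duplicates before building the DataFrame is shown below. The Series dup and ok are made up for illustration.

```python
import pandas as pd

# Hypothetical Series: 'py7' appears twice in the first index
dup = pd.Series(['a', 'b', 'c'], index=['py7', 'py7', 'py8'])
ok = pd.Series(['a', 'b', 'c'], index=['py7', 'py8', 'py9'])

# Index.is_unique is False as soon as any label repeats
for name, s in {'dup': dup, 'ok': ok}.items():
    print(name, s.index.is_unique)

# List the labels that occur more than once
repeated = dup.index[dup.index.duplicated()].tolist()
print(repeated)
```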
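III: Creating a DataFrame directly from a two-dimensional array
1. Creating a DataFrame from a two-dimensional array gives two-dimensional data of the same shape. If the index and columns parameters are not given, both default to integer labels.
2. The lengths of index and columns must match the number of rows and columns of the array.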
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
arr = np.random.rand(12).reshape(3,4)
df1 = pd.DataFrame(arr)
print(df1)
print('---------------')
arr = np.random.rand(12).reshape(3,4)
df2 = pd.DataFrame(arr,index=['a','b','c'],columns=['col1','col2','col3','col4'])
print(df2)
Output:
          0         1         2         3
0  0.374332  0.139147  0.413843  0.866404
1  0.223835  0.454294  0.741501  0.583207
2  0.373477  0.602602  0.322788  0.016215
---------------
       col1      col2      col3      col4
a  0.409282  0.808191  0.833259  0.972950
b  0.347496  0.673279  0.502993  0.889290
c  0.427744  0.129409  0.894754  0.373354
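The length rule for index and columns can be seen by passing labels that do not match the array shape. The sketch below uses np.arange instead of random numbers so the data is reproducible. Generating the labels from arr.shape is just one way to keep them in sync, and the label names are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns

try:
    # Only 2 row labels for 3 rows: pandas raises ValueError
    pd.DataFrame(arr, index=['a', 'b'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Derive the label lists from the array shape so the lengths always match
rows, cols = arr.shape
df = pd.DataFrame(arr,
                  index=list('abc')[:rows],
                  columns=['col%d' % (i + 1) for i in range(cols)])
print(df)
```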
IV: Creating a DataFrame from a list of dictionaries
# -*- coding: utf-8 -*-
import pandas as pd
# each element is a dict: its keys become the column labels
lis = [{'course': 'python'}, {'domain': 'www.python66.com'}]
df = pd.DataFrame(lis)
print(df)
print('----------------')
df = pd.DataFrame(lis,index=['a','b'])
print(df)
Output:
   course            domain
0  python               NaN
1     NaN  www.python66.com
----------------
   course            domain
a  python               NaN
b     NaN  www.python66.com
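In a list of dictionaries, each dictionary is one row and its keys become the column labels, so a key that a row lacks shows up as NaN. The sketch below illustrates this with made-up records. The second record intentionally has no 'domain' key.

```python
import pandas as pd

# Hypothetical records: the second one has no 'domain' key
records = [
    {'course': 'python', 'domain': 'www.python66.com'},
    {'course': 'shell'},
]

# One row per dictionary; index supplies the row labels
df = pd.DataFrame(records, index=['a', 'b'])
print(df)

# True marks the rows where the key was missing
missing = df['domain'].isna().tolist()
print(missing)
```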
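V: Creating a DataFrame from a dictionary of dictionaries (nested dictionary)
1. The keys of the outer dictionary become the column labels (columns), and the keys of the inner dictionaries become the index.
2. The columns parameter can change the column labels and the index parameter can change the row labels, but the existing labels are not renamed. Both parameters select by label, so a label that is not in the data adds a new row or column filled with empty values (NaN).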
# -*- coding: utf-8 -*-
import pandas as pd
dic = {'col1':{'index1':'python','index2':'shell'},'col2':{'index1':'python66','index2':'php'}}
df = pd.DataFrame(dic)
print(df)
print('----------------')
df = pd.DataFrame(dic,index=['a','b'])
print(df)
Output:
          col1      col2
index1  python  python66
index2   shell       php
----------------
  col1 col2
a  NaN  NaN
b  NaN  NaN
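The all-NaN result comes from the fact that neither 'a' nor 'b' is a key of the inner dictionaries. A minimal sketch mixing an existing label with a new one makes the selection behavior easier to see. The labels 'a' and 'col3' are assumed here to be absent from the data.

```python
import pandas as pd

dic = {'col1': {'index1': 'python', 'index2': 'shell'},
       'col2': {'index1': 'python66', 'index2': 'php'}}

# 'index2' exists in the data and keeps its values; 'a' does not, so it is all NaN
df_rows = pd.DataFrame(dic, index=['index2', 'a'])
print(df_rows)

# 'col2' exists and is kept; 'col3' does not, so that column is all NaN
df_cols = pd.DataFrame(dic, columns=['col2', 'col3'])
print(df_cols)
```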