Pandas Basic Operations

qq_27931977

Category: Machine Learning. Tags: Pandas

Article link: https://blog.csdn.net/qq_27931977/article/details/90717855


class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Parameters:

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects

Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.

index : Index or array-like

Index to use for resulting frame. Will default to RangeIndex if no indexing information part of input data and no index provided

columns : Index or array-like

Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided

dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer

copy : boolean, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input

In short:

data: can be an array, a dict, or a DataFrame

index: row labels

columns: column labels
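As a minimal sketch of how the index, columns, and dtype parameters interact, the example below builds one frame with explicit labels and one that falls back to the defaults. The array values and the label names ('r1', 'x', and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 2x3 array of made-up values, used only for illustration.
values = np.array([[1, 2, 3], [4, 5, 6]])

# index sets the row labels, columns sets the column labels,
# and dtype forces one data type on every column.
frame = pd.DataFrame(values, index=['r1', 'r2'],
                     columns=['x', 'y', 'z'], dtype='float64')

# With neither index nor columns given, both default to a RangeIndex.
default_frame = pd.DataFrame(values)

print(frame)
print(default_frame.index)
print(default_frame.columns)
```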

Creating a DataFrame

From a dictionary

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}

df = pd.DataFrame(data=d)

From an array

import numpy as np

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])
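The two construction routes can be compared side by side in a self-contained sketch. The third frame is an extra illustration (not from the original notes) of the documented point that a dict may mix Series, arrays, and constants; its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# From a dict: keys become column labels, values become column data.
from_dict = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# From a 2-D array: column labels must be supplied separately.
from_array = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                          columns=['a', 'b', 'c'])

# A dict can mix a Series, an array, and a constant;
# the constant is repeated for every row.
mixed = pd.DataFrame({
    's': pd.Series([10, 20]),
    'arr': np.array([1.5, 2.5]),
    'const': 0,
})

print(from_dict.shape)
print(from_array.shape)
print(mixed)
```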

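Reading an entire column

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])

By column label: df2['a']

By attribute access (only works when the column name is a valid Python identifier): df2.a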
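A short sketch of column selection: both forms return the same Series, and passing a list of labels returns a DataFrame instead. The variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])

col_by_key = df2['a']      # bracket lookup by column label
col_by_attr = df2.a        # attribute access to the same column

# A list of labels selects several columns and returns a DataFrame.
subset = df2[['a', 'c']]

print(col_by_key.tolist())
print(type(subset).__name__, subset.shape)
```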

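Reading an entire row

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]], index=['cobra', 'viper', 'sidewinder'], columns=['max_speed', 'shield'])

Attribute access and plain brackets look up columns, not rows, so neither works with a row label: df.cobra raises AttributeError and df['cobra'] raises KeyError.

By row label with loc: df.loc['cobra']

By row position with iloc: df.iloc[0]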
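The sketch below checks the row-selection behaviour on the same frame: loc selects by label, iloc selects by integer position, and a plain bracket lookup with a row label fails because brackets search the column labels.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

row_by_label = df.loc['cobra']    # select by row label
row_by_position = df.iloc[0]      # select by integer position

# A list of labels returns several rows as a DataFrame.
several_rows = df.loc[['cobra', 'sidewinder']]

# Plain brackets look up columns, so a row label raises KeyError.
try:
    df['cobra']
    bracket_lookup_failed = False
except KeyError:
    bracket_lookup_failed = True

print(row_by_label.tolist())
print(bracket_lookup_failed)
```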

Dropping rows or columns

df = pd.DataFrame(np.arange(12).reshape(3,4),columns=['A', 'B', 'C', 'D'])

   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Dropping rows

By index label (here the default integer labels 0 and 1): df.drop([0, 1])

Dropping columns

df.drop(['B', 'C'], axis=1)

df.drop(columns=['B', 'C'])
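A runnable sketch of the drop calls above. It also shows that drop returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])

without_rows = df.drop([0, 1])                   # drop rows labelled 0 and 1
without_cols_axis = df.drop(['B', 'C'], axis=1)  # drop columns via axis=1
without_cols_kw = df.drop(columns=['B', 'C'])    # same result via columns=

print(without_rows)
print(without_cols_kw)
print(df.shape)  # the original frame is unchanged
```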

Getting row and column names

Row labels: df.index

Column labels: df.columns
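As a small sketch, the labels can be converted to plain lists and counted. The frame is the same 3x4 example used in the drop section.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])

row_labels = df.index.tolist()       # labels of the default RangeIndex
column_labels = df.columns.tolist()  # column names as a plain list

# shape gives (number of rows, number of columns) in one call.
n_rows, n_cols = df.shape

print(row_labels, column_labels)
print(n_rows, n_cols)
```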

Counting missing values (per column)

print(data.isnull().sum())
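A self-contained sketch of the count: isnull() yields a boolean frame, and summing it counts the True entries. The data frame here is made up for illustration; the column names 'sex' and 'phone' are borrowed from the fillna example in these notes.

```python
import numpy as np
import pandas as pd

# Made-up data with missing entries in both columns.
data = pd.DataFrame({
    'sex': ['m', None, 'f'],
    'phone': [np.nan, np.nan, 123.0],
})

per_column = data.isnull().sum()         # missing values in each column
per_row = data.isnull().sum(axis=1)      # missing values in each row
total = int(data.isnull().sum().sum())   # missing values in the whole frame

print(per_column)
print(per_row.tolist())
print(total)
```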

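Handling missing values

Drop rows in which every value is NaN: data.dropna(axis=0, how='all')

Drop rows that contain any NaN: data.dropna(axis=0, how='any')

Drop columns in which every value is NaN: data.dropna(axis=1, how='all')

Fill every NaN with 0: data.fillna(0)

Fill different columns with different values: data.fillna({'sex':233,'phone':666})

Forward-fill down each column, replacing each missing value with the value above it: data.fillna(axis=0, method='ffill') (recent pandas releases deprecate the method argument; data.ffill() is the equivalent call)

Fill missing values in one specific column: data['a'].fillna(0)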
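The calls in this section can be tried on a small made-up frame, as in the sketch below. Two assumptions to note: the data values are invented, and the forward fill uses data.ffill() rather than fillna(method='ffill'), since newer pandas versions warn about the latter. All of these methods return new objects and leave data unchanged.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 is entirely NaN, and column 'b' is entirely NaN.
data = pd.DataFrame({
    'a': [1.0, np.nan, np.nan, 4.0],
    'b': [np.nan, np.nan, np.nan, np.nan],
    'c': [7.0, np.nan, 9.0, 10.0],
})

rows_all_nan_dropped = data.dropna(axis=0, how='all')  # drops row 1 only
rows_any_nan_dropped = data.dropna(axis=0, how='any')  # every row has a NaN in 'b'
cols_all_nan_dropped = data.dropna(axis=1, how='all')  # drops column 'b'

filled_zero = data.fillna(0)                     # one value everywhere
filled_per_col = data.fillna({'a': -1, 'c': -2}) # a different value per column
forward_filled = data.ffill()                    # copy the value above downward
one_column = data['a'].fillna(0)                 # fill a single column

print(rows_all_nan_dropped)
print(forward_filled)
```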

Assignment

Change the value at a given row label (row) in the 'fontSize' column: data.loc[row, ['fontSize']] = 1
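A minimal sketch of assignment through loc. The frame contents, the 'color' column, and the value of row are hypothetical; only the 'fontSize' column name comes from the line above. The second assignment shows that loc also accepts a boolean mask in place of a single row label.

```python
import pandas as pd

# Hypothetical data; only the 'fontSize' column name comes from the notes.
data = pd.DataFrame({
    'fontSize': [10, 12, 14],
    'color': ['red', 'green', 'blue'],
})

row = 1  # hypothetical row label to modify

# Set one cell, addressed by row label and column label.
data.loc[row, 'fontSize'] = 1

# Set a column's value on every row that matches a condition.
data.loc[data['fontSize'] > 10, 'color'] = 'black'

print(data)
```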

Concatenating datasets vertically (stacking rows)

pd.concat([df1, df2])
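A short sketch of vertical concatenation with two made-up frames that share the same columns. By default the original row labels are kept, so they can repeat; ignore_index=True renumbers the rows instead.

```python
import pandas as pd

# Two made-up frames with identical columns.
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

stacked = pd.concat([df1, df2])                        # keeps original labels
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 labels

print(stacked)
print(renumbered)
```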
