append
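DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=False) → 'DataFrame'
Appends other as new rows to the end of the DataFrame and returns a new DataFrame.
- other: the rows to append; can be a DataFrame, a Series, a dict, or a list of these
- ignore_index: if True, the existing index labels are not used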
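import pandas as pd

# DataFrame.append requires pandas < 2.0 (the method was removed in 2.0).
df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
print(df)

df_app = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
print(df.append(df_app, ignore_index=False))  # keeps the original index labels
print(df.append(df_app, ignore_index=True))   # renumbers the rows from 0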
Output:
   A  B
0  1  2
1  3  4
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
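DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the example above only runs on older versions. A minimal sketch of the same two calls using pandas.concat, which works on both old and new versions (the variable names kept and renumbered are illustrative only):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df_app = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# Keep the original index labels, like append(..., ignore_index=False).
kept = pd.concat([df, df_app], ignore_index=False)

# Renumber the rows from 0, like append(..., ignore_index=True).
renumbered = pd.concat([df, df_app], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

As with append, concat returns a new object and leaves df unchanged.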
drop
DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Remove rows or columns by specifying label names and the corresponding axis, or by specifying index or column names directly. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide for more information about the now unused levels.
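- labels: the index or column labels to drop. A tuple is treated as a single label, not as a list-like.
- axis ({0 or 'index', 1 or 'columns'}, default 0): 0 drops whole rows, 1 drops whole columns.
- index (single label or list-like): an alternative to specifying axis; labels, axis=0 is equivalent to index=labels.
- columns (single label or list-like): an alternative to specifying axis; labels, axis=1 is equivalent to columns=labels.
- inplace (bool, default False): if False, a copy is returned; otherwise the operation is performed in place and None is returned.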
Returns:
DataFrame or None. With inplace=False, returns the DataFrame without the specified index or column labels; with inplace=True, returns None.
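The equivalences between labels/axis and index/columns, and the effect of inplace, can be checked with a small sketch. The frame and variable names here are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Dropping a column: labels + axis=1 is equivalent to columns=...
by_axis = df.drop("B", axis=1)
by_name = df.drop(columns="B")

# Dropping rows by index label: labels + axis=0 is equivalent to index=...
rows_dropped = df.drop(index=[0, 2])

# inplace=True modifies the object itself and returns None.
df_copy = df.copy()
result = df_copy.drop(columns=["A"], inplace=True)

print(by_axis.equals(by_name))  # True
print(result)                   # None
```

The original df still has all three columns afterwards, since only df_copy was modified in place.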
dropna
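DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
Removes missing values.
- axis: default 0, meaning rows: any row that contains a missing value is dropped as a whole. With 1, any column that contains a missing value is dropped as a whole.
- how: {'any', 'all'}, default 'any'. 'any' drops a row (or column) as soon as it contains at least one missing value; 'all' drops a row (or column) only when all of its values are missing.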
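A short sketch of how axis and how interact, using a made-up frame in which row 1 is partly missing and row 2 is entirely missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 5.0, np.nan],
})

# how="any": drop every row that has at least one missing value.
rows_any = df.dropna(axis=0, how="any")

# how="all": drop only the rows in which every value is missing.
rows_all = df.dropna(axis=0, how="all")

# axis=1: apply the same rule to columns instead of rows.
cols_any = df.dropna(axis=1, how="any")

print(rows_any.index.tolist())  # [0]
print(rows_all.index.tolist())  # [0, 1]
print(list(cols_any.columns))   # []
```

Both columns contain at least one missing value, so axis=1 with how="any" leaves no columns at all.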
isnull isna
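The two functions behave identically: both detect missing values.
They return a boolean object with the same shape as the input, indicating whether each value is missing. NA values, such as None or numpy.nan, are mapped to True; all other values are mapped to False. Empty strings and numpy.inf are not considered NA values (unless pandas.options.mode.use_inf_as_na = True is set).
Returns:
DataFrame: mask of bool values for each element in the DataFrame that indicates whether the element is an NA value.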
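The mapping described above can be seen on a small made-up frame that mixes NaN, None, an empty string, and infinity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num": [1.0, np.nan, np.inf],
    "text": ["a", "", None],
})

mask = df.isna()
same = df.isnull()  # same result as isna()

print(mask["num"].tolist())   # [False, True, False]
print(mask["text"].tolist())  # [False, False, True]
print(mask.equals(same))      # True
```

Only NaN and None are flagged; the empty string and numpy.inf are reported as not missing under the default options.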
replace
Signature:
DataFrame.replace(
to_replace=None,
value=NoDefault.no_default,
inplace=False,
limit=None,
regex=False,
method=NoDefault.no_default
)
Replaces the values given in to_replace with value. This differs from updating with .loc or .iloc, which require you to specify a location to update.
Parameters:
to_replace: str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
- numeric, str or regex
numeric: numeric values equal to to_replace will be replaced with value
str: strings exactly matching to_replace will be replaced with value
regex: regexes matching to_replace will be replaced with value
- list of str, regex, or numeric
First, if to_replace and value are both lists, they must be the same length.
Second, if regex=True, all of the strings in both lists will be interpreted as regexes; otherwise they will match directly. This doesn't matter much for value since there are only a few possible substitution regexes you can use.
str, regex and numeric rules apply as above.
- dict
To be added.
value: scalar, dict, list, str, regex, default None
Value to replace any values matching to_replace with. For a DataFrame, a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings, and lists or dicts of such objects are also allowed.
inplace: bool, default False
If True, performs the operation in place and returns None.
limit: int, default None
Maximum size gap to forward or backward fill.
regex: bool or same types as to_replace, default False
Whether to interpret to_replace and/or value as regular expressions. If this is True, then to_replace must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html
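A minimal sketch of the scalar, list, and regex forms of to_replace described above. The frame and its values are made up for illustration, and the dict form is left out because it is not covered in these notes:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2], "B": ["foo", "bar", "baz"]})

# numeric: every value equal to 0 is replaced with 100.
scalar = df.replace(0, 100)

# lists: to_replace and value must be the same length (1 -> 10, 2 -> 20).
lists = df.replace([1, 2], [10, 20])

# regex=True: to_replace is a string interpreted as a regular expression.
regexed = df.replace(to_replace=r"^ba.$", value="new", regex=True)

print(scalar["A"].tolist())   # [100, 1, 2]
print(lists["A"].tolist())    # [0, 10, 20]
print(regexed["B"].tolist())  # ['foo', 'new', 'new']
```

Each call returns a new DataFrame because inplace defaults to False, so df itself is unchanged.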
sum
Signature:
DataFrame.sum(
axis=None,
skipna=True,
level=None,
numeric_only=None,
min_count=0,
**kwargs
)
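Returns the sum of the values over the requested axis. This is equivalent to the method numpy.sum.
Parameters:
- axis ({index (0), columns (1)}): the axis the function is applied on
- skipna: bool, default True. Whether to exclude NA/null values when computing the result
Returns:
Series or DataFrame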
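A small sketch of axis and skipna on a made-up frame with one missing value. The axis is passed explicitly because its default differs between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [4.0, 5.0, 6.0]})

# axis=0: one sum per column; NaN is skipped by default.
col_sums = df.sum(axis=0)

# axis=1: one sum per row.
row_sums = df.sum(axis=1)

# skipna=False: a NaN anywhere in a column makes that column's sum NaN.
strict = df.sum(axis=0, skipna=False)

print(col_sums.tolist())  # [3.0, 15.0]
print(row_sums.tolist())  # [5.0, 7.0, 6.0]
print(strict["B"])        # 15.0
```

Here strict["A"] is NaN, because column A contains a missing value and skipna=False keeps it in the calculation.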