apply map
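map() applies a function to each element of a Series.
apply() applies a function along the rows or columns of a DataFrame, i.e. to one-dimensional vectors. Use it for operations such as computing the mean of each column.
applymap() applies a function to each element of a DataFrame. It has been deprecated since pandas 2.1, where it was renamed DataFrame.map().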
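A minimal sketch contrasting the three, on a small made-up DataFrame. The element-wise call picks DataFrame.map when it exists (pandas 2.1 or later) and otherwise falls back to applymap.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.map: element-wise on a single column
doubled = df["a"].map(lambda x: x * 2)

# DataFrame.apply: by default (axis=0) the function receives one whole column at a time
col_means = df.apply(lambda col: col.mean())

# axis=1 passes each row instead
row_sums = df.apply(lambda row: row.sum(), axis=1)

# Element-wise over the whole DataFrame:
# DataFrame.map in pandas >= 2.1, DataFrame.applymap in older versions
elementwise = df.map(str) if hasattr(pd.DataFrame, "map") else df.applymap(str)
```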
Count the distinct values in a column (NaN is excluded by default): df[df.columns[1]].nunique()
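A quick check of how missing values affect the count, using a made-up DataFrame. nunique() has a dropna parameter (default True), and passing dropna=False counts NaN as one more distinct value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "city": ["A", "B", "A", np.nan, "B"]})

# Same positional lookup as above: the second column
n_distinct = df[df.columns[1]].nunique()                        # NaN ignored
n_distinct_with_nan = df[df.columns[1]].nunique(dropna=False)   # NaN counted once
```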
sort
df.sort_values(by=['col1'])
df.sort_values(by=['col1', 'col2'])
df.sort_values(by='col1', ascending=False)
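When sorting by several columns, ascending can also be a list with one flag per column. A small sketch on made-up data: col1 ascending, with ties broken by col2 descending. sort_values returns a new DataFrame, and the original index labels stay attached to their rows.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": ["b", "a", "a", "b"]})

# col1 ascending; within equal col1 values, col2 descending
out = df.sort_values(by=["col1", "col2"], ascending=[True, False])

# Drop the old labels if a fresh 0..n-1 index is wanted
out_reset = out.reset_index(drop=True)
```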
df.to_excel
If passing an existing ExcelWriter object, then the sheet will be added to the existing workbook. This can be used to save different DataFrames to one workbook:
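>>> with pd.ExcelWriter('output.xlsx') as writer:
...     df1.to_excel(writer, sheet_name='Sheet1')
...     df2.to_excel(writer, sheet_name='Sheet2')
ExcelWriter.save() was removed in pandas 2.0. Leaving the with block closes the writer and writes the file.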
iterrows
Use iterrows() to iterate over the rows of a DataFrame in Python:
for index, row in df.iterrows():
    pass
row is the corresponding row, as a pandas Series.
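A small sketch on a made-up DataFrame. Because each row is returned as a Series, iterrows() does not preserve dtypes across a row: a row that mixes strings and integers becomes dtype object. itertuples() yields namedtuples instead and is generally faster.

```python
import pandas as pd

df = pd.DataFrame({"label": ["x", "y"], "score": [1, 2]})

rows = []
for index, row in df.iterrows():
    # row is a Series indexed by the column names
    rows.append((index, row["label"], row["score"]))

# The row Series of a mixed-type frame is upcast to object
first_row_dtype = next(df.iterrows())[1].dtype

# itertuples: fields are accessed as attributes named after the columns
labels = [t.label for t in df.itertuples(index=False)]
```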
How do I get the row count of a Pandas dataframe?
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(np.arange(12).reshape(4,3))
In [4]: df
Out[4]:
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
In [5]: df.shape
Out[5]: (4, 3)
In [6]: %timeit df.shape
2.77 µs ± 644 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: %timeit df[0].count()
348 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [8]: len(df.index)
Out[8]: 4
In [9]: %timeit len(df.index)
990 ns ± 4.97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
https://stackoverflow.com/questions/15943769/how-do-i-get-the-row-count-of-a-pandas-dataframe
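Apart from speed, the approaches above differ in meaning. len(df), len(df.index) and df.shape[0] all return the number of rows, but Series.count() returns the number of non-NA values in that column. A sketch using the same 4x3 frame with one value set to NaN:

```python
import numpy as np
import pandas as pd

# Cast to float so the column can hold NaN without changing dtype
df = pd.DataFrame(np.arange(12).reshape(4, 3)).astype(float)
df.iloc[1, 0] = np.nan  # one missing value in column 0

n_rows_index = len(df.index)
n_rows_shape = df.shape[0]
n_rows_len = len(df)
non_na_col0 = df[0].count()  # skips NaN, so this is not a row count
```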
sample
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
n : int, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.
frac : float, optional
Fraction of axis items to return. Cannot be used with n.
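A minimal sketch of both parameters on a made-up DataFrame. frac=1 returns every row in random order, which is a common way to shuffle a DataFrame with pandas alone. random_state makes the draw reproducible, and reset_index(drop=True) discards the scrambled index labels.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

# n: a fixed number of random rows
three = df.sample(n=3, random_state=0)

# frac=1: all rows in random order, i.e. a shuffle
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
```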
Alternatively, shuffle the rows with scikit-learn:
from sklearn.utils import shuffle
df = shuffle(df)
to numpy
landmarks = landmarks_frame.iloc[65, 1:].values
print(type(landmarks)) # <class 'numpy.ndarray'>
landmarks = landmarks.astype('float').reshape(-1, 2)
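The same conversion as a self-contained sketch. The frame below is a hypothetical stand-in for landmarks_frame: an image name column followed by x/y coordinate pairs (the column names are made up). pandas recommends to_numpy() over .values, and to_numpy(dtype=float) folds the astype step into the conversion. A row taken from a frame with mixed column types may come back as dtype object, which is why the float cast is needed.

```python
import pandas as pd

# Hypothetical stand-in for landmarks_frame
landmarks_frame = pd.DataFrame(
    [["a.jpg", 1, 2, 3, 4], ["b.jpg", 5, 6, 7, 8]],
    columns=["image_name", "p0_x", "p0_y", "p1_x", "p1_y"],
)

# Row 1, every column after the image name
row = landmarks_frame.iloc[1, 1:]

# Convert to a float ndarray and reshape into one (x, y) pair per row
landmarks = row.to_numpy(dtype=float).reshape(-1, 2)
```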