pandas is built on top of NumPy.
1. In pandas, string values are stored with the dtype object.
a = pandas.read_csv("")  # read_csv uses the first row as the column names by default
print(type(a))
print(a.dtypes)
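help(pandas.read_csv)  # help() prints the documentation itself, so no print() is needed
a.head()  # shows the first 5 rows by default
a.head(3)  # shows the first 3 rows
a.tail()  # shows the last 5 rows by default
a.columns  # the column names (taken from the first row of the file)
a.shape  # (rows, columns): the size of the data
a.loc[0]  # the row labeled 0, i.e. the first row under the default index
a.loc[6]  # the row labeled 6, i.e. the seventh row under the default index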
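A minimal sketch of the calls above, assuming a small made-up CSV held in memory instead of a file on disk (the column names and values are hypothetical):

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a real file path.
csv_text = """name,calories,protein_(g)
apple,52,0.3
bread,265,9.0
cheese,402,25.0
egg,155,13.0
milk,42,3.4
rice,130,2.7
tofu,76,8.0
"""

df = pd.read_csv(io.StringIO(csv_text))  # the first row becomes the column names

print(type(df))        # the DataFrame class
print(df.dtypes)       # one dtype per column
print(df.head())       # first 5 rows
print(df.head(3))      # first 3 rows
print(df.tail())       # last 5 rows
print(df.columns)      # column names
print(df.shape)        # (7, 3): 7 rows, 3 columns
print(df.loc[0])       # row labeled 0 (first row)
print(df.loc[6])       # row labeled 6 (seventh row)
```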
2. Common dtypes
object - for string values;
int - for integer values;
float - for float values;
datetime - for time values;
bool - for Boolean values;
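To see these dtypes side by side, here is a small sketch with a hand-built DataFrame (all column names and values are made up for illustration). The exact dtype names printed can vary with the pandas version and platform.

```python
import pandas as pd

# Hypothetical data: one column per dtype family listed above.
df = pd.DataFrame({
    "name": ["pen", "ink", "pad"],                      # string values
    "count": [1, 2, 3],                                 # integer values
    "price": [1.5, 2.0, 3.25],                          # float values
    "added": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),  # time values
    "in_stock": [True, False, True],                    # Boolean values
})

print(df.dtypes)
```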
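3. a.loc[3:6]  # rows labeled 3 through 6; unlike a Python slice, both ends are included
b = a[""]  # selecting a single column by its name returns a Series
names = a.columns.tolist()  # turns the current column names into a list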
print(names)
gram_columns=[]
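for c in names:
    if c.endswith("(g)"):  # keep only the columns measured in grams
        gram_columns.append(c)
gram_df = food_info[gram_columns]  # food_info is assumed to be the DataFrame called a above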
print(gram_df.head(3))
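The same two ideas (label slicing with loc and picking columns by name suffix) can be sketched on a made-up table. The DataFrame food and its columns are hypothetical, and the loop is written as a list comprehension, which is only one possible style.

```python
import pandas as pd

# Hypothetical nutrition table with a default 0..7 index.
food = pd.DataFrame({
    "name": list("abcdefgh"),
    "protein_(g)": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "fat_(g)": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "iron_(mg)": [10, 20, 30, 40, 50, 60, 70, 80],
})

rows_3_to_6 = food.loc[3:6]  # labels 3, 4, 5 and 6: both ends included

# Same filter as the for loop, written as a list comprehension.
gram_columns = [c for c in food.columns.tolist() if c.endswith("(g)")]
gram_df = food[gram_columns]  # passing a list of names returns a DataFrame

print(rows_3_to_6)
print(gram_df.head(3))
```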
4. Creating a new column in a DataFrame: assign to a column name that does not exist yet;
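The note gives no code for this step, so the following is only a sketch under assumed names: a hypothetical iron_(mg) column is converted to grams and stored as a new column.

```python
import pandas as pd

# Hypothetical data; the column names are made up for illustration.
food = pd.DataFrame({
    "name": ["apple", "bread"],
    "iron_(mg)": [0.12, 3.6],
})

# Arithmetic on a column is applied element by element;
# assigning the result to a new name adds the column.
food["iron_(g)"] = food["iron_(mg)"] / 1000

print(food)
```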
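5. a.sort_values("ab", inplace=True)  # sort_values sorts by the given column, ascending by default
inplace=True sorts the existing DataFrame in place and returns None; without it, sort_values returns a new sorted DataFrame and leaves a unchanged.
a.sort_values("ab", inplace=True, ascending=False)  # descending order
NaN is what pandas treats as a missing value; sort_values places NaN rows last by default.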
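A short sketch of the difference between sorting with and without inplace, using a made-up single-column DataFrame (the column name "ab" is taken from the note, the values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ab": [3.0, np.nan, 1.0, 2.0]})

# Without inplace: a new sorted DataFrame comes back, df keeps its order.
sorted_copy = df.sort_values("ab")

# With inplace=True: df itself is reordered and the call returns None.
result = df.sort_values("ab", ascending=False, inplace=True)

print(sorted_copy)  # 1.0, 2.0, 3.0, then NaN
print(df)           # 3.0, 2.0, 1.0, then NaN
print(result)       # None
```

In both directions the NaN row ends up last, because the original row labels travel with the values and missing values are placed at the end by default.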
6. Data preprocessing
age = titanic_survival["age"]
print(age.loc[0:10])
age_is_null=pd.isnull(age)
print(age_is_null)
age_null_true=age[age_is_null]
print(age_null_true)
age_null_count=len(age_null_true)
print(age_null_count)
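Handling missing values: mean = sum() / len(), computed over the non-null values only.
good_ages = titanic_survival["age"][~age_is_null]  # ~ inverts the boolean mask, keeping the non-null ages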
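As a sketch of the mean calculation, the manual route can be compared with the built-in Series.mean(), which skips NaN by default. The ages below are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries.
age = pd.Series([22.0, np.nan, 30.0, np.nan, 38.0])

age_is_null = pd.isnull(age)     # boolean mask, True where the value is missing
good_ages = age[~age_is_null]    # keep only the non-null ages

manual_mean = sum(good_ages) / len(good_ages)
builtin_mean = age.mean()        # NaN values are skipped by default

print(manual_mean, builtin_mean)
```

Dividing by len(age) instead of len(good_ages) would count the missing entries and pull the mean down, which is why the filtering step comes first.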
Quick computation:
passenger_survival = titanic_survival.pivot_table(index="Pclass", values="Survived", aggfunc=np.mean)  # needs import numpy as np; the mean is also the default aggfunc
#index tells the method which column to group by
#values is the column that we want to apply the calculation to
#aggfunc specifies the calculation we want to perform
ports_stats = titanic_survival.pivot_table(index="Embarked", values=["fares", "survived"], aggfunc=np.sum)  # computes the totals
print(ports_stats)
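A small pivot_table sketch on made-up passenger data (the column names here are hypothetical and may not match the real file). It passes the string names "mean" and "sum" instead of np.mean and np.sum, since recent pandas releases may warn about NumPy callables; that swap is a choice of this sketch, not part of the note.

```python
import pandas as pd

# Hypothetical passenger records.
titanic = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3],
    "survived": [1, 0, 1, 1, 0, 0],
    "fare": [80.0, 60.0, 20.0, 30.0, 8.0, 7.0],
})

# Group by pclass and average the survived column within each group.
survival_rate = titanic.pivot_table(index="pclass", values="survived", aggfunc="mean")

# Several value columns at once, summed per group.
totals = titanic.pivot_table(index="pclass", values=["fare", "survived"], aggfunc="sum")

print(survival_rate)
print(totals)
```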
#dropna discards missing values; specifying axis=1 or axis='columns' drops any column that has null values
drop_na_columns=titanic_survival.dropna(axis=1)
new_survival=titanic_survival.dropna(axis=0,subset=["age","sex"])
row_index_83_age=titanic_survival.loc[83,"Age"]
new_survival = new_survival.reset_index(drop=True)  # renumbers the index from 0; drop=True discards the old index
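The dropna and reset_index steps can be sketched together on a tiny made-up table (all names and values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [22.0, np.nan, 30.0, 41.0],
    "sex": ["male", "female", None, "female"],
    "cabin": [np.nan, "C85", np.nan, "E46"],
    "pclass": [3, 1, 2, 1],
})

# axis=1: drop every column that contains at least one missing value.
no_null_columns = df.dropna(axis=1)

# axis=0 with subset: drop rows that are missing age or sex.
complete_rows = df.dropna(axis=0, subset=["age", "sex"])

# The surviving rows keep their old labels (0 and 3) until the index is reset.
renumbered = complete_rows.reset_index(drop=True)

print(no_null_columns.columns.tolist())
print(complete_rows.index.tolist())
print(renumbered.index.tolist())
```

After the reset, loc[1] refers to the second remaining row rather than raising a KeyError for a label that was dropped.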
7. This function returns the hundredth item from a series
def hundredth_row(column):
    #extract the hundredth item
    hundredth_item = column.loc[99]
    return hundredth_item
hundredth_items = a.apply(hundredth_row)  # apply calls the function once per column; a new name keeps the function from being overwritten
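A runnable sketch of apply with this function, assuming a made-up DataFrame with at least 100 rows so that label 99 exists:

```python
import pandas as pd

# Hypothetical data: two columns of 100 rows each, default index 0..99.
df = pd.DataFrame({
    "x": list(range(100, 200)),
    "y": list(range(200, 300)),
})

def hundredth_row(column):
    # Each column arrives as a Series; label 99 is its hundredth item.
    return column.loc[99]

# apply runs the function on every column and collects the results in a Series.
hundredth_items = df.apply(hundredth_row)

print(hundredth_items)
```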
8. Define a function that counts the missing values in a column;
def null_count(column):
    column_null = pd.isnull(column)
    null = column[column_null]
    return len(null)
#apply it to every column
column_null_count = titanic_survival.apply(null_count)
print(column_null_count)
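As a sketch on made-up data, the per-column count from apply can be checked against the built-in chain isnull().sum(), which gives the same numbers in one line:

```python
import numpy as np
import pandas as pd

# Hypothetical table with a few missing entries.
df = pd.DataFrame({
    "age": [22.0, np.nan, np.nan],
    "fare": [7.25, 71.3, np.nan],
    "pclass": [3, 1, 2],
})

def null_count(column):
    # Keep only the missing entries of the column and count them.
    return len(column[pd.isnull(column)])

via_apply = df.apply(null_count)
via_builtin = df.isnull().sum()  # True counts as 1 when summed

print(via_apply)
print(via_builtin)
```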
9. #Series (collection of values)
#DataFrame (collection of Series objects)
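A brief sketch of how the two types relate, on a made-up two-row table: a column or a row taken out of a DataFrame is a Series, and a DataFrame can be put back together from Series.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "age": [22, 38]})

age = df["age"]        # a single column is a Series
first_row = df.loc[0]  # a single row is also a Series

# Building a DataFrame from Series objects, one per column.
rebuilt = pd.DataFrame({"name": df["name"], "age": df["age"]})

print(type(age))
print(type(first_row))
print(rebuilt.equals(df))
```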