Reading a CSV file with pandas
import pandas as pd
Read the CSV file
Load the data from train.csv:
train_df = pd.read_csv(r'C:\Users\86177\Desktop\大二资料\大数据导\experiment\train.csv')
View the data
print(train_df)
View the first five rows
print(train_df.head())
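The loading and inspection steps above can be tried without the original file. The following is a minimal sketch that assumes a small, made-up CSV held in memory; the sample rows and the name sample_df are hypothetical stand-ins for train.csv and train_df.

```python
import io

import pandas as pd

# Hypothetical sample standing in for train.csv; the real file has more rows and columns.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
4,1,1,female,35
5,0,3,male,35
6,0,3,male,
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)    # (number of rows, number of columns)
print(sample_df.head())   # first five rows by default
print(sample_df.head(3))  # pass n to change how many rows are shown
```

Empty fields in the CSV text are read as NaN, which is what the null checks later in these notes detect.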
Descriptive statistics
train_df.describe()
Check for null values
train_df.isnull().any()
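To make the result of the null check concrete, here is a small sketch on a hypothetical three-column DataFrame (the data and the name df are made up for illustration). isnull().any() returns one boolean per column, and that boolean Series can be used to list the affected columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Age and Cabin contain missing values, Fare does not.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 8.05],
    "Cabin": [None, "C85", None],
})

# One boolean per column: True if the column has at least one null.
has_null = df.isnull().any()
print(has_null)

# Names of the columns that contain nulls.
cols_with_null = has_null[has_null].index.tolist()
print(cols_with_null)

# A single boolean for the whole DataFrame.
any_null_at_all = bool(has_null.any())
print(any_null_at_all)
```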
Count null values and reset the index
Count the null values in each column, then reset the index so that missing_values has two columns, column_name and missing_count:
missing_values = train_df.isnull().sum()
missing_values = missing_values.reset_index()
missing_values.columns = ['column_name', 'missing_count']
missing_values
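As a possible follow-up to the counting step, the sketch below builds the same two-column table on hypothetical data and then keeps only the columns that actually have missing values, sorted from most to fewest. The filtering and sorting are additions for illustration, not part of the original steps.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Cabin has 2 nulls, Age has 1, Fare has none.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 8.05],
    "Cabin": [None, "C85", None],
})

# isnull().sum() gives a Series indexed by column name;
# reset_index() turns that index into an ordinary column.
missing_values = df.isnull().sum().reset_index()
missing_values.columns = ["column_name", "missing_count"]

# Keep only columns with at least one null, largest count first.
missing_only = (
    missing_values[missing_values["missing_count"] > 0]
    .sort_values("missing_count", ascending=False)
    .reset_index(drop=True)
)
print(missing_only)
```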
Selecting attributes
Method 1:
Select the attributes Survived, Pclass, Sex, SibSp, Parch, Embarked, Age and Fare, and collect their names in datalist:
datalist = ["Survived", "Pclass", "Sex", "SibSp", "Parch", "Embarked", "Age", "Fare"]
train_df2 = train_df[datalist]
train_df2.head()
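Indexing a DataFrame with a list of names raises a KeyError if any name is absent from the columns. The sketch below shows one way to check the list first; the DataFrame contents and the names wanted, present and absent are hypothetical.

```python
import pandas as pd

# Hypothetical data with only some of the Titanic columns.
df = pd.DataFrame({
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Name": ["A", "B"],
})

wanted = ["Survived", "Pclass", "Sex", "Fare"]

# Split the requested names into those that exist and those that do not.
present = [c for c in wanted if c in df.columns]
absent = [c for c in wanted if c not in df.columns]

if absent:
    print("Columns not found:", absent)

# Select only the columns that are actually available.
subset = df[present]
print(subset.head())
```

Whether to skip absent columns or stop with an error depends on the analysis; failing loudly is often the safer choice when every listed attribute is required.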
Likewise, extract the annual income (Annual Income (k$)) and spending score (Spending Score (1-100)) columns and assign them to X (dataset is the DataFrame holding the loaded data):
b = ['Annual Income (k$)', 'Spending Score (1-100)']
X = dataset[b]
print(X.head())
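Method 2:
Extract the annual income (Annual Income (k$)) and spending score (Spending Score (1-100)) columns with loc and assign them to X (dataset is the DataFrame holding the loaded data):
X = dataset.loc[:, ['Annual Income (k$)', 'Spending Score (1-100)']]
print(X.head())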
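The two selection methods should give the same result when all rows are kept. The sketch below checks this on a hypothetical three-row dataset (the values and the CustomerID column are made up), and also shows that loc can filter rows in the same call.

```python
import pandas as pd

# Hypothetical stand-in for the customer dataset.
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Annual Income (k$)": [15, 16, 17],
    "Spending Score (1-100)": [39, 81, 6],
})

cols = ["Annual Income (k$)", "Spending Score (1-100)"]

X_brackets = dataset[cols]     # Method 1: index with a list of column names
X_loc = dataset.loc[:, cols]   # Method 2: all rows (:), listed columns

# Both selections contain the same values, index and column labels.
print(X_brackets.equals(X_loc))

# loc also accepts a row condition in the first position.
high_spenders = dataset.loc[dataset["Spending Score (1-100)"] > 50, cols]
print(high_spenders)
```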