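Pandas supports a wide range of useful analyses and operations on a dataset. Let's start with the simplest one: viewing the data.
We will use the IMDB movie dataset for the demonstration. Dataset file download: IMDB-Movie-Data.csv
First, load the CSV dataset and set the movie title column, Title, as the index.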
import pandas as pd
movies_df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
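To see what index_col does without needing the downloaded file, here is a minimal sketch that reads a small in-memory CSV instead. The three rows and the reduced column set are a stand-in assumed for illustration (the real file has more columns), and the names csv_text and sample_df are hypothetical.

```python
import io

import pandas as pd

# Stand-in for the CSV file: three rows and only a few columns (assumed layout).
csv_text = """Rank,Title,Genre,Metascore
1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",76.0
2,Prometheus,"Adventure,Mystery,Sci-Fi",65.0
3,Split,"Horror,Thriller",62.0
"""

# index_col="Title" moves the Title column out of the data and into the row index.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col="Title")

print(sample_df.index.name)     # name of the index
print(list(sample_df.columns))  # Title is no longer a regular column

# With titles as the index, a row can be looked up by its label.
print(sample_df.loc["Prometheus", "Rank"])
```

Using the title as the index makes label-based lookups with .loc read naturally, at the cost of Title no longer being available as an ordinary column.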
head
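When you open a new dataset, the first thing to do is usually to print a few rows and take a look. Use the .head() method for this; it accepts the number of rows to display as an argument.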
movies_df.head(10)
Output
                         Rank                       Genre  ...  Revenue (Millions)  Metascore
Title                                                      ...
Guardians of the Galaxy     1     Action,Adventure,Sci-Fi  ...              333.13       76.0
Prometheus                  2    Adventure,Mystery,Sci-Fi  ...              126.46       65.0
Split                       3             Horror,Thriller  ...              138.12       62.0
Sing                        4     Animation,Comedy,Family  ...              270.32       59.0
Suicide Squad               5    Action,Adventure,Fantasy  ...              325.02       40.0
The Great Wall              6    Action,Adventure,Fantasy  ...               45.13       42.0
La La Land                  7          Comedy,Drama,Music  ...              151.06       93.0
Mindhorn                    8                      Comedy  ...                 NaN       71.0
The Lost City of Z          9  Action,Adventure,Biography  ...                8.01       78.0
Passengers