Kaggle原文链接:https://www.kaggle.com/learn/pandas
数据集链接:https://pan.baidu.com/s/1QTrLgMewrebwXeD3QwnJiA
提取码:wd5b
1.Creating,Reading & Writing
1.1 Creating data
Pandas中有两个核心对象,DataFrame和Series
1.1.1 DataFrame
DataFrame是二维表格,竖列称为column,横列称为index,例如创建一个名为fruits的DataFrame对象。
import pandas as pd
# 方法一:先输入表格数据,然后分别对columns和index作补充说明
fruits = pd.DataFrame([[10,20],[30,40]],columns=["Apples","Bananas"],index=["Price","Amount"])
print('方法一:')
print(fruits)
print('\n')
# 方法二:按列输入每个column的数据,然后对index作单独补充说明
fruits2 = pd.DataFrame({"Apples":[10,30],"Bananas":[20,40]},index=["Price","Amount"])
print('方法二:')
print(fruits2)
方法一:
Apples Bananas
Price 10 20
Amount 30 40
方法二:
Apples Bananas
Price 10 20
Amount 30 40
1.1.2 Series
Series是一维表格,即只有单列column,可以把一个DataFrame看作是多个Series组合起来的合体,它们总是相互关联。不对column单独命名,只有一个表的表格名,例如创建一个名为things的Series对象:
things = pd.Series([1,2,3],index=['Milk','Eggs','Spam'],name='Dinner')
things
Milk 1
Eggs 2
Spam 3
Name: Dinner, dtype: int64
note:如果index不做特殊说明,那么就会是从0开始的连续自然数
1.2 Reading file
数据可以存储在多种文件格式中,目前最基础的是CSV格式的文件,我们使用read_csv()函数把数据读取到一个DataFrame对象中:
# 此时pandas会自动加入从0开始的index
wine_reviews = pd.read_csv('wine.csv')
# 如果原数据有自己的index,比如在第一列,则可以index_col=0来描述,这样第一列就会作为index
win_reviews = pd.read_csv('wine.csv',index_col=0)
另一种常见的数据格式是SQL,它的存储能力相当惊人,SQL有很多不同种类,每一种都需要各自的connector,读取没有kaggle方便,目前在kaggle唯一支持的种类是SQLite
import sqlite3
conn = sqlite3.connect('FPA_FOD_20170508.sqlite')
fires = pd.read_sql_query("SELECT * FROM fires", conn)
1.3 Writing file
使用to_csv()函数将数据写入CSV格式的文件
wine_reviews.to_csv("wine_reviews.csv")
2.Indexing,Selecting,Assigning
2.1 Naive accessors
简单的访问数据:可以直接显示整个DataFrame,若数据量较大最好设置展示的最大行数;也可用head()来展示前几行数据。
reviews = pd.read_csv('wine.csv',index_col=0)
pd.set_option('display.max_rows',5)
reviews
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
129971 rows × 13 columns
# 前5行数据
reviews.head(5)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
# 后5行数据
reviews.tail(5)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
129966 | Germany | Notes of honeysuckle and cantaloupe sweeten th... | Brauneberger Juffer-Sonnenuhr Spätlese | 90 | 28.0 | Mosel | NaN | NaN | Anna Lee C. Iijima | NaN | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef) |
129967 | US | Citation is given as much as a decade of bottl... | NaN | 90 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation |
129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 90 | 30.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
选取特定的某一列数据,可以用DataFrame.column或者DataFrame[“column”]
reviews.country
0 Italy
1 Portugal
...
129969 France
129970 France
Name: country, Length: 129971, dtype: object
reviews['country']
0 Italy
1 Portugal
...
129969 France
129970 France
Name: country, Length: 129971, dtype: object
选取特定的某一行某一列的元素,可以用DataFrame[‘column’][‘index’]
reviews['country'][0]
'Italy'
2.2 DataFrame.iloc
DataFrame.iloc基于数据的数字索引位置来检索数据,也可以用布尔值来进行检索
# 选取第一行
reviews.iloc[0]
country Italy
description Aromas include tropical fruit, broom, brimston...
...
variety White Blend
winery Nicosia
Name: 0, Length: 13, dtype: object
# 选取第一列
reviews.iloc[:,0]
0 Italy
1 Portugal
...
129969 France
129970 France
Name: country, Length: 129971, dtype: object
# 选取第一列的前三个数据
reviews.iloc[:3,0]
0 Italy
1 Portugal
2 US
Name: country, dtype: object
reviews.iloc[[0,3,5],0]
0 Italy
3 US
5 Spain
Name: country, dtype: object
# 选取最后一行
reviews.iloc[-1]
country France
description Big, rich and off-dry, this is powered by inte...
...
variety Gewürztraminer
winery Domaine Schoffit
Name: 129970, Length: 13, dtype: object
# 基于布尔值进行检索选取数据,注意布尔的数量要与index数量一致
reviews.head().iloc[[True,False,False,True,False]]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
2.3 DataFrame.loc
DataFrame.loc可以基于标签或布尔值对行列进行检索,还可以使用逻辑符号进行条件检索
reviews.loc[0,'country']
'Italy'
reviews.loc[:,['taster_name','points']]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
taster_name | points | |
---|---|---|
0 | Kerin O’Keefe | 87 |
1 | Roger Voss | 87 |
... | ... | ... |
129969 | Roger Voss | 90 |
129970 | Roger Voss | 90 |
129971 rows × 2 columns
reviews.head().loc[[True,False,True,False,True]]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
iloc和loc总结:
- 相同点:都是先行后列的检索顺序,都支持使用布尔值检索
- iloc适用于数字索引进行检索;loc可以使用表格的标签进行检索,还可以进行条件检索
- 对于一个range(比如1:10),iloc是前闭后开(即1到9),loc是前后都闭(即1到10)
2.3条件检索
reviews.loc[reviews.country=='Italy']
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129961 | Italy | Intense aromas of wild cherry, baking spice, t... | NaN | 90 | 30.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS |
129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano |
19540 rows × 13 columns
reviews.loc[(reviews.country=='Italy') & (reviews.points>=90)]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
120 | Italy | Slightly backward, particularly given the vint... | Bricco Rocche Prapó | 92 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Ceretto 2003 Bricco Rocche Prapó (Barolo) | Nebbiolo | Ceretto |
130 | Italy | At the first it was quite muted and subdued, b... | Bricco Rocche Brunate | 91 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Ceretto 2003 Bricco Rocche Brunate (Barolo) | Nebbiolo | Ceretto |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129961 | Italy | Intense aromas of wild cherry, baking spice, t... | NaN | 90 | 30.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS |
129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano |
6648 rows × 13 columns
reviews.loc[(reviews.country=='Italy') | (reviews.points>=90)]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
61937 rows × 13 columns
# isin()用来选取存在于列表中的数据
reviews.loc[reviews.country.isin(['Italy','France'])]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
41633 rows × 13 columns
print(reviews.country.isin(['Italy','France']))
0 True
1 False
2 False
3 False
4 False
...
129966 False
129967 False
129968 True
129969 True
129970 True
Name: country, Length: 129971, dtype: bool
# isnull()与notnull()用来确认数据是否为空
reviews.loc[reviews.price.notnull()]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
5 | Spain | Blackberry and raspberry aromas show a typical... | Ars In Vitro | 87 | 15.0 | Northern Spain | Navarra | NaN | Michael Schachner | @wineschach | Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... | Tempranillo-Merlot | Tandem |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129966 | Germany | Notes of honeysuckle and cantaloupe sweeten th... | Brauneberger Juffer-Sonnenuhr Spätlese | 90 | 28.0 | Mosel | NaN | NaN | Anna Lee C. Iijima | NaN | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef) |
129967 | US | Citation is given as much as a decade of bottl... | NaN | 90 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation |
129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 90 | 30.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
120975 rows × 13 columns
## 2.5 Assigning data 赋值
reviews['critic'] = 'everyone'
reviews['critic']
0 everyone
1 everyone
2 everyone
3 everyone
4 everyone
...
129966 everyone
129967 everyone
129968 everyone
129969 everyone
129970 everyone
Name: critic, Length: 129971, dtype: object
3.Summary functions & maps
# 描述points的一些情况
reviews.points.describe()
count 129971.000000
mean 88.447138
std 3.039730
min 80.000000
25% 86.000000
50% 88.000000
75% 91.000000
max 100.000000
Name: points, dtype: float64
# 分数的不同数值
reviews.points.unique()
array([ 87, 86, 85, 88, 92, 91, 90, 89, 83, 82, 81, 80, 100,
98, 97, 96, 95, 93, 94, 84, 99], dtype=int64)
# 分数的中值
reviews.points.median()
88.0
# 分数的平均值
reviews.points.mean()
88.44713820775404
pd.set_option('display.max_row',5)
reviews.taster_name.value_counts()
Roger Voss 25514
Michael Schachner 15134
...
Fiona Adams 27
Christina Pickard 6
Name: taster_name, Length: 19, dtype: int64
4.Grouping & Sorting
reviews.groupby('points').points.value_counts()
points points
80 80 397
81 81 692
...
99 99 33
100 100 19
Name: points, Length: 21, dtype: int64
reviews.groupby('price').points.max()
price
4.0 86
5.0 87
..
2500.0 96
3300.0 88
Name: points, Length: 390, dtype: int64
# agg可以运行多个函数
reviews.groupby('country').price.agg([min,max])
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
min | max | |
---|---|---|
country | ||
Argentina | 4.0 | 230.0 |
Armenia | 14.0 | 15.0 |
... | ... | ... |
Ukraine | 6.0 | 13.0 |
Uruguay | 10.0 | 130.0 |
43 rows × 2 columns
pd.set_option('display.max_row',20)
countries_reviewed = reviews.groupby(['country','province']).points.agg([len])
countries_reviewed
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
len | ||
---|---|---|
country | province | |
Argentina | Mendoza Province | 3264 |
Other | 536 | |
Armenia | Armenia | 2 |
Australia | Australia Other | 245 |
New South Wales | 85 | |
South Australia | 1349 | |
Tasmania | 42 | |
Victoria | 322 | |
Western Australia | 286 | |
Austria | Austria | 26 |
... | ... | ... |
US | Washington | 8639 |
Washington-Oregon | 7 | |
Ukraine | Ukraine | 14 |
Uruguay | Atlantida | 5 |
Canelones | 43 | |
Juanico | 12 | |
Montevideo | 11 | |
Progreso | 11 | |
San Jose | 3 | |
Uruguay | 24 |
425 rows × 1 columns
# reset_index()转换为常规的dataframe类型
countries_reviewed.reset_index()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | province | len | |
---|---|---|---|
0 | Argentina | Mendoza Province | 3264 |
1 | Argentina | Other | 536 |
2 | Armenia | Armenia | 2 |
3 | Australia | Australia Other | 245 |
4 | Australia | New South Wales | 85 |
5 | Australia | South Australia | 1349 |
6 | Australia | Tasmania | 42 |
7 | Australia | Victoria | 322 |
8 | Australia | Western Australia | 286 |
9 | Austria | Austria | 26 |
... | ... | ... | ... |
415 | US | Washington | 8639 |
416 | US | Washington-Oregon | 7 |
417 | Ukraine | Ukraine | 14 |
418 | Uruguay | Atlantida | 5 |
419 | Uruguay | Canelones | 43 |
420 | Uruguay | Juanico | 12 |
421 | Uruguay | Montevideo | 11 |
422 | Uruguay | Progreso | 11 |
423 | Uruguay | San Jose | 3 |
424 | Uruguay | Uruguay | 24 |
425 rows × 3 columns
# 默认升序
countries_reviewed.reset_index().sort_values(by='len')
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | province | len | |
---|---|---|---|
179 | Greece | Muscat of Kefallonian | 1 |
192 | Greece | Sterea Ellada | 1 |
194 | Greece | Thraki | 1 |
354 | South Africa | Paardeberg | 1 |
40 | Brazil | Serra do Sudeste | 1 |
114 | Egypt | Egypt | 1 |
316 | Serbia | Pocerina | 1 |
112 | Cyprus | Pitsilia Mountains | 1 |
110 | Cyprus | Lemesos | 1 |
301 | Portugal | Vinho da Mesa | 1 |
... | ... | ... | ... |
228 | Italy | Veneto | 2716 |
0 | Argentina | Mendoza Province | 3264 |
224 | Italy | Piedmont | 3729 |
375 | Spain | Northern Spain | 3851 |
119 | France | Burgundy | 3980 |
409 | US | Oregon | 5373 |
227 | Italy | Tuscany | 5897 |
118 | France | Bordeaux | 5941 |
415 | US | Washington | 8639 |
392 | US | California | 36247 |
425 rows × 3 columns
countries_reviewed.reset_index().sort_values(by='len',ascending=False)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | province | len | |
---|---|---|---|
392 | US | California | 36247 |
415 | US | Washington | 8639 |
118 | France | Bordeaux | 5941 |
227 | Italy | Tuscany | 5897 |
409 | US | Oregon | 5373 |
119 | France | Burgundy | 3980 |
375 | Spain | Northern Spain | 3851 |
224 | Italy | Piedmont | 3729 |
0 | Argentina | Mendoza Province | 3264 |
228 | Italy | Veneto | 2716 |
... | ... | ... | ... |
110 | Cyprus | Lemesos | 1 |
366 | South Africa | Vlootenburg | 1 |
354 | South Africa | Paardeberg | 1 |
58 | Chile | Casablanca-Curicó Valley | 1 |
103 | Croatia | Middle and South Dalmatia | 1 |
101 | Croatia | Krk | 1 |
247 | New Zealand | Gladstone | 1 |
357 | South Africa | Piekenierskloof | 1 |
63 | Chile | Coelemu | 1 |
149 | Greece | Beotia | 1 |
425 rows × 3 columns
5.Data types & dealing with missing data
5.1 Data types
reviews.dtypes
country object
description object
designation object
points int64
price float64
province object
region_1 object
region_2 object
taster_name object
taster_twitter_handle object
title object
variety object
winery object
critic object
dtype: object
5.2 Missing data handing
reviews.country.isnull().sum()
63
pd.set_option('display.max_row',5)
reviews[reviews.region_2.isnull()]
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | critic | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia | everyone |
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos | everyone |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss | everyone |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit | everyone |
79460 rows × 14 columns
# fillna用于将缺失的NaN值替换成别的
reviews.region_2.fillna("Unknown")
0 Unknown
1 Unknown
...
129969 Unknown
129970 Unknown
Name: region_2, Length: 129971, dtype: object
6.Renaming & Combining
reviews.rename(columns={'points':'score'})
reviews.rename(index={0:'FirstEntry',1:'SecondEntry'})
reviews
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | critic | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia | everyone |
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos | everyone |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss | everyone |
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit | everyone |
129971 rows × 14 columns