Learning Pandas (Kaggle)

Original Kaggle course: https://www.kaggle.com/learn/pandas
Dataset link: https://pan.baidu.com/s/1QTrLgMewrebwXeD3QwnJiA
Extraction code: wd5b

1.Creating, Reading & Writing

1.1 Creating data

Pandas has two core objects: DataFrame and Series.

1.1.1 DataFrame

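A DataFrame is a two-dimensional table. Its vertical columns are called columns, and its rows are labeled by the index. For example, create a DataFrame object named fruits: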

import pandas as pd

# Method 1: pass the table data first, then describe columns and index separately
fruits = pd.DataFrame([[10,20],[30,40]],columns=["Apples","Bananas"],index=["Price","Amount"])
print('Method 1:')
print(fruits)
print('\n')
# Method 2: pass the data column by column as a dict, then describe the index separately
fruits2 = pd.DataFrame({"Apples":[10,30],"Bananas":[20,40]},index=["Price","Amount"])
print('Method 2:')
print(fruits2)
Method 1:
        Apples  Bananas
Price       10       20
Amount      30       40

Method 2:
        Apples  Bananas
Price       10       20
Amount      30       40

1.1.2 Series

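A Series is a one-dimensional table, i.e. a single column. A DataFrame can be thought of as several Series glued together, so the two objects are closely related. A Series has no column names, only one overall name. For example, create a Series object named things: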

things = pd.Series([1,2,3],index=['Milk','Eggs','Spam'],name='Dinner')
things
Milk    1
Eggs    2
Spam    3
Name: Dinner, dtype: int64
Note: if no index is specified, pandas uses consecutive integers starting from 0.
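The default index and the DataFrame/Series relationship described above can be checked on toy data. This is a minimal sketch; the names default_things and apples are made up for illustration.

```python
import pandas as pd

# No index given: pandas falls back to a default integer index (0, 1, 2, ...).
default_things = pd.Series([1, 2, 3], name="Dinner")
print(list(default_things.index))  # [0, 1, 2]

# Selecting one column of a DataFrame returns a Series that keeps the row index.
fruits = pd.DataFrame({"Apples": [10, 30], "Bananas": [20, 40]},
                      index=["Price", "Amount"])
apples = fruits["Apples"]
print(type(apples).__name__)  # Series
print(apples.name)            # Apples
```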

1.2 Reading file

Data can be stored in many file formats. The most basic one is the CSV file, and read_csv() reads its contents into a DataFrame object:

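# Here pandas automatically adds an index starting from 0
wine_reviews = pd.read_csv('wine.csv')

# If the data already has its own index, for example in the first column, pass index_col=0 so that column becomes the index
wine_reviews = pd.read_csv('wine.csv',index_col=0)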

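Another common data source is a SQL database, which can hold very large amounts of data. SQL comes in many flavors, and each one needs its own connector, which makes reading them outside Kaggle less convenient. On Kaggle, the only flavor currently supported is SQLite.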

import sqlite3

conn = sqlite3.connect('FPA_FOD_20170508.sqlite')
fires = pd.read_sql_query("SELECT * FROM fires", conn)
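The same call can be tried without the FPA_FOD file. The sketch below assumes a hypothetical in-memory SQLite database with a tiny made-up fires table; only sqlite3.connect and pd.read_sql_query come from the notes above.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database, so the example needs no file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fires (name TEXT, size REAL)")
conn.executemany("INSERT INTO fires VALUES (?, ?)",
                 [("Alpha", 1.5), ("Bravo", 20.0)])
conn.commit()

# The query result comes back as a regular DataFrame.
fires_demo = pd.read_sql_query("SELECT * FROM fires", conn)
conn.close()
print(fires_demo)
```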

1.3 Writing file

Use to_csv() to write the data to a CSV file:

wine_reviews.to_csv("wine_reviews.csv")
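Writing and reading are two halves of a round trip: to_csv() stores the index as the first column, which is why index_col=0 is useful when reading the file back. A small sketch, assuming an in-memory io.StringIO buffer in place of a real file:

```python
import io
import pandas as pd

fruits = pd.DataFrame({"Apples": [10, 30], "Bananas": [20, 40]},
                      index=["Price", "Amount"])

buffer = io.StringIO()
fruits.to_csv(buffer)  # the index is written as the first column
csv_text = buffer.getvalue()

# Without index_col, the saved index comes back as an ordinary column.
plain = pd.read_csv(io.StringIO(csv_text))
# With index_col=0, the first column is restored as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())   # ['Unnamed: 0', 'Apples', 'Bananas']
print(restored.index.tolist())  # ['Price', 'Amount']
```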

2.Indexing, Selecting, Assigning

2.1 Naive accessors

Simple data access: you can display the whole DataFrame directly, but for large data it is best to cap the number of rows shown. head() shows the first few rows.

reviews = pd.read_csv('wine.csv',index_col=0)
pd.set_option('display.max_rows',5)
reviews
index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos
...
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit

129971 rows × 13 columns

# First 5 rows
reviews.head(5)
index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm
3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian
4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks
# Last 5 rows
reviews.tail(5)

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
129966 | Germany | Notes of honeysuckle and cantaloupe sweeten th... | Brauneberger Juffer-Sonnenuhr Spätlese | 90 | 28.0 | Mosel | NaN | NaN | Anna Lee C. Iijima | NaN | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef)
129967 | US | Citation is given as much as a decade of bottl... | NaN | 90 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation
129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 90 | 30.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit

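To select a specific column, use DataFrame.column or DataFrame["column"]. The attribute form only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.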

reviews.country
0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object
reviews['country']
0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object

To select a single element at a given column and row, use DataFrame['column'][index_label]:

reviews['country'][0]
'Italy'

2.2 DataFrame.iloc

DataFrame.iloc selects data by integer position; it also accepts a list of booleans.

# Select the first row
reviews.iloc[0]
country                                                    Italy
description    Aromas include tropical fruit, broom, brimston...
                                     ...                        
variety                                              White Blend
winery                                                   Nicosia
Name: 0, Length: 13, dtype: object
# Select the first column
reviews.iloc[:,0]
0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object
# Select the first three values of the first column
reviews.iloc[:3,0]
0       Italy
1    Portugal
2          US
Name: country, dtype: object
reviews.iloc[[0,3,5],0]
0    Italy
3       US
5    Spain
Name: country, dtype: object
# Select the last row
reviews.iloc[-1]
country                                                   France
description    Big, rich and off-dry, this is powered by inte...
                                     ...                        
variety                                           Gewürztraminer
winery                                          Domaine Schoffit
Name: 129970, Length: 13, dtype: object
# Select rows with a list of booleans; its length must match the number of rows
reviews.head().iloc[[True,False,False,True,False]]
index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian

2.3 DataFrame.loc

DataFrame.loc selects rows and columns by label or by boolean values, and it also supports conditional selection built with logical operators.

reviews.loc[0,'country']

'Italy'

reviews.loc[:,['taster_name','points']]

          taster_name  points
0       Kerin O’Keefe      87
1          Roger Voss      87
...               ...     ...
129969     Roger Voss      90
129970     Roger Voss      90

129971 rows × 2 columns

reviews.head().loc[[True,False,True,False,True]]

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm
4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks

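Summary of iloc and loc:

  • In common: both index rows first and columns second, and both accept boolean values
  • iloc selects by integer position; loc selects by the table's labels and also supports conditional selection
  • For a range such as 1:10, iloc is half-open (1 through 9) while loc includes both ends (1 through 10)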
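The difference in slice endpoints is easy to verify on a small frame. A minimal sketch, assuming a hypothetical DataFrame named letters with the default integer index, so that positions and labels happen to coincide:

```python
import pandas as pd

letters = pd.DataFrame({"letter": list("abcdefghijkl")})  # default index 0..11

by_position = letters.iloc[1:10]  # positions 1..9, end excluded
by_label = letters.loc[1:10]      # labels 1..10, end included

print(len(by_position))  # 9
print(len(by_label))     # 10
```

The inclusive end in loc matters most with non-numeric labels, where "one past the last label" has no natural meaning.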

2.4 Conditional selection

reviews.loc[reviews.country=='Italy']

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo
...
129961 | Italy | Intense aromas of wild cherry, baking spice, t... | NaN | 90 | 30.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS
129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano

19540 rows × 13 columns

reviews.loc[(reviews.country=='Italy') & (reviews.points>=90)]

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
120 | Italy | Slightly backward, particularly given the vint... | Bricco Rocche Prapó | 92 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Ceretto 2003 Bricco Rocche Prapó (Barolo) | Nebbiolo | Ceretto
130 | Italy | At the first it was quite muted and subdued, b... | Bricco Rocche Brunate | 91 | 70.0 | Piedmont | Barolo | NaN | NaN | NaN | Ceretto 2003 Bricco Rocche Brunate (Barolo) | Nebbiolo | Ceretto
...
129961 | Italy | Intense aromas of wild cherry, baking spice, t... | NaN | 90 | 30.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | COS 2013 Frappato (Sicilia) | Frappato | COS
129962 | Italy | Blackberry, cassis, grilled herb and toasted a... | Sàgana Tenuta San Giacomo | 90 | 40.0 | Sicily & Sardinia | Sicilia | NaN | Kerin O’Keefe | @kerinokeefe | Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... | Nero d'Avola | Cusumano

6648 rows × 13 columns

reviews.loc[(reviews.country=='Italy') | (reviews.points>=90)]

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo
...
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit

61937 rows × 13 columns

# isin() selects rows whose value appears in the given list
reviews.loc[reviews.country.isin(['Italy','France'])]

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia
6 | Italy | Here's a bright, informal red that opens with ... | Belsito | 87 | 16.0 | Sicily & Sardinia | Vittoria | NaN | Kerin O’Keefe | @kerinokeefe | Terre di Giurfo 2013 Belsito Frappato (Vittoria) | Frappato | Terre di Giurfo
...
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit

41633 rows × 13 columns

print(reviews.country.isin(['Italy','France']))

0          True
1         False
2         False
3         False
4         False
          ...  
129966    False
129967    False
129968     True
129969     True
129970     True
Name: country, Length: 129971, dtype: bool

# isnull() and notnull() test whether values are missing
reviews.loc[reviews.price.notnull()]

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos
2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm
3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian
4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks
5 | Spain | Blackberry and raspberry aromas show a typical... | Ars In Vitro | 87 | 15.0 | Northern Spain | Navarra | NaN | Michael Schachner | @wineschach | Tandem 2011 Ars In Vitro Tempranillo-Merlot (N... | Tempranillo-Merlot | Tandem
...
129966 | Germany | Notes of honeysuckle and cantaloupe sweeten th... | Brauneberger Juffer-Sonnenuhr Spätlese | 90 | 28.0 | Mosel | NaN | NaN | Anna Lee C. Iijima | NaN | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Riesling | Dr. H. Thanisch (Erben Müller-Burggraef)
129967 | US | Citation is given as much as a decade of bottl... | NaN | 90 | 75.0 | Oregon | Oregon | Oregon Other | Paul Gregutt | @paulgwine | Citation 2004 Pinot Noir (Oregon) | Pinot Noir | Citation
129968 | France | Well-drained gravel soil gives this wine its c... | Kritt | 90 | 30.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Gewürztraminer | Domaine Gresser
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit

120975 rows × 13 columns
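The conditional selections above can be reproduced on a toy table where the results are easy to check by eye. This is a sketch on made-up data; the wines frame and its values are hypothetical, not rows from wine.csv. Each comparison is wrapped in parentheses because & and | bind more tightly than == and >=.

```python
import pandas as pd

# Hypothetical miniature of the wine table, with one missing country and one missing price.
wines = pd.DataFrame({
    "country": ["Italy", "France", "US", "Italy", None],
    "points": [87, 92, 90, 91, 85],
    "price": [None, 30.0, 14.0, 70.0, 12.0],
})

italian = wines.loc[wines.country == "Italy"]
italian_and_good = wines.loc[(wines.country == "Italy") & (wines.points >= 90)]
italian_or_good = wines.loc[(wines.country == "Italy") | (wines.points >= 90)]
europe = wines.loc[wines.country.isin(["Italy", "France"])]
priced = wines.loc[wines.price.notnull()]

print(italian.index.tolist())           # [0, 3]
print(italian_and_good.index.tolist())  # [3]
print(italian_or_good.index.tolist())   # [0, 1, 2, 3]
print(europe.index.tolist())            # [0, 1, 3]
print(priced.index.tolist())            # [1, 2, 3, 4]
```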

2.5 Assigning data

reviews['critic'] = 'everyone'
reviews['critic'] 

0         everyone
1         everyone
2         everyone
3         everyone
4         everyone
            ...   
129966    everyone
129967    everyone
129968    everyone
129969    everyone
129970    everyone
Name: critic, Length: 129971, dtype: object
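Assignment accepts either a single constant, which is repeated for every row as shown above, or an iterable with one value per row. A short sketch on a hypothetical three-row frame named demo; the index_backwards column name is only an illustration:

```python
import pandas as pd

demo = pd.DataFrame({"country": ["Italy", "Portugal", "US"]})

demo["critic"] = "everyone"                         # constant repeated for every row
demo["index_backwards"] = range(len(demo), 0, -1)   # one value per row

print(demo["critic"].tolist())           # ['everyone', 'everyone', 'everyone']
print(demo["index_backwards"].tolist())  # [3, 2, 1]
```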

3.Summary functions & maps

# Summary statistics for points
reviews.points.describe()

count    129971.000000
mean         88.447138
std           3.039730
min          80.000000
25%          86.000000
50%          88.000000
75%          91.000000
max         100.000000
Name: points, dtype: float64

# Distinct values of points
reviews.points.unique()

array([ 87,  86,  85,  88,  92,  91,  90,  89,  83,  82,  81,  80, 100,
        98,  97,  96,  95,  93,  94,  84,  99], dtype=int64)

# Median of points
reviews.points.median()

88.0

# Mean of points
reviews.points.mean()

88.44713820775404

pd.set_option('display.max_rows',5)

reviews.taster_name.value_counts()

Roger Voss           25514
Michael Schachner    15134
                     ...  
Fiona Adams             27
Christina Pickard        6
Name: taster_name, Length: 19, dtype: int64
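The section title also mentions maps, which the calls above do not show. As a hedged sketch of the idea, Series.map applies a function to every value and returns a new Series; the toy points Series and the centering example below are assumptions made for illustration.

```python
import pandas as pd

points = pd.Series([87, 87, 90, 92], name="points")
points_mean = points.mean()  # 89.0

# map() calls the function once per value and returns a new Series.
centered_map = points.map(lambda p: p - points_mean)
# The same result using plain vectorized arithmetic.
centered_vec = points - points_mean

print(centered_map.tolist())  # [-2.0, -2.0, 1.0, 3.0]
```

For simple arithmetic like this, the vectorized form is usually the shorter and faster choice; map is more useful when the per-value logic cannot be written as a column operation.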

4.Grouping & Sorting

reviews.groupby('points').points.value_counts()

points  points
80      80        397
81      81        692
                 ... 
99      99         33
100     100        19
Name: points, Length: 21, dtype: int64

reviews.groupby('price').points.max()

price
4.0       86
5.0       87
          ..
2500.0    96
3300.0    88
Name: points, Length: 390, dtype: int64

# agg() runs several functions at once; string names avoid passing Python builtins
reviews.groupby('country').price.agg(['min','max'])

            min    max
country
Argentina   4.0  230.0
Armenia    14.0   15.0
...         ...    ...
Ukraine     6.0   13.0
Uruguay    10.0  130.0

43 rows × 2 columns

pd.set_option('display.max_rows',20)
countries_reviewed = reviews.groupby(['country','province']).points.agg([len])
countries_reviewed

                              len
country   province
Argentina Mendoza Province   3264
          Other               536
Armenia   Armenia               2
Australia Australia Other     245
          New South Wales      85
          South Australia    1349
          Tasmania             42
          Victoria            322
          Western Australia   286
Austria   Austria              26
...                           ...
US        Washington         8639
          Washington-Oregon     7
Ukraine   Ukraine              14
Uruguay   Atlantida             5
          Canelones            43
          Juanico              12
          Montevideo           11
          Progreso             11
          San Jose              3
          Uruguay              24

425 rows × 1 columns

# reset_index() converts the grouped result back into a regular DataFrame
countries_reviewed.reset_index()

       country           province   len
0    Argentina   Mendoza Province  3264
1    Argentina              Other   536
2      Armenia            Armenia     2
3    Australia    Australia Other   245
4    Australia    New South Wales    85
5    Australia    South Australia  1349
6    Australia           Tasmania    42
7    Australia           Victoria   322
8    Australia  Western Australia   286
9      Austria            Austria    26
...        ...                ...   ...
415         US         Washington  8639
416         US  Washington-Oregon     7
417    Ukraine            Ukraine    14
418    Uruguay          Atlantida     5
419    Uruguay          Canelones    43
420    Uruguay            Juanico    12
421    Uruguay         Montevideo    11
422    Uruguay           Progreso    11
423    Uruguay           San Jose     3
424    Uruguay            Uruguay    24

425 rows × 3 columns

# Ascending order is the default
countries_reviewed.reset_index().sort_values(by='len')

          country               province    len
179        Greece  Muscat of Kefallonian      1
192        Greece          Sterea Ellada      1
194        Greece                 Thraki      1
354  South Africa             Paardeberg      1
40         Brazil       Serra do Sudeste      1
114         Egypt                  Egypt      1
316        Serbia               Pocerina      1
112        Cyprus     Pitsilia Mountains      1
110        Cyprus                Lemesos      1
301      Portugal          Vinho da Mesa      1
...           ...                    ...    ...
228         Italy                 Veneto   2716
0       Argentina       Mendoza Province   3264
224         Italy               Piedmont   3729
375         Spain         Northern Spain   3851
119        France               Burgundy   3980
409            US                 Oregon   5373
227         Italy                Tuscany   5897
118        France               Bordeaux   5941
415            US             Washington   8639
392            US             California  36247

425 rows × 3 columns

countries_reviewed.reset_index().sort_values(by='len',ascending=False)

          country                   province    len
392            US                 California  36247
415            US                 Washington   8639
118        France                   Bordeaux   5941
227         Italy                    Tuscany   5897
409            US                     Oregon   5373
119        France                   Burgundy   3980
375         Spain             Northern Spain   3851
224         Italy                   Piedmont   3729
0       Argentina           Mendoza Province   3264
228         Italy                     Veneto   2716
...           ...                        ...    ...
110        Cyprus                    Lemesos      1
366  South Africa                Vlootenburg      1
354  South Africa                 Paardeberg      1
58          Chile   Casablanca-Curicó Valley      1
103       Croatia  Middle and South Dalmatia      1
101       Croatia                        Krk      1
247   New Zealand                  Gladstone      1
357  South Africa            Piekenierskloof      1
63          Chile                    Coelemu      1
149        Greece                     Beotia      1

425 rows × 3 columns
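The grouping, flattening, and sorting steps in this section chain together naturally. The sketch below walks through them on a made-up six-row frame; the demo data is hypothetical, and only the method calls mirror the ones used above.

```python
import pandas as pd

demo = pd.DataFrame({
    "country": ["Italy", "Italy", "Italy", "France", "France", "US"],
    "province": ["Tuscany", "Tuscany", "Veneto", "Bordeaux", "Bordeaux", "Oregon"],
    "price": [20.0, 35.0, 12.0, 50.0, 40.0, 14.0],
})

# Several aggregations at once, one output column per function.
price_range = demo.groupby("country").price.agg(["min", "max"])

# Grouping by two columns yields a two-level index; reset_index() flattens it.
counts = demo.groupby(["country", "province"]).price.agg([len]).reset_index()

# sort_values() is ascending by default; ascending=False reverses the order.
ranked = counts.sort_values(by="len", ascending=False)

print(price_range.loc["Italy"].tolist())  # [12.0, 35.0]
print(counts.columns.tolist())            # ['country', 'province', 'len']
print(ranked["len"].tolist())             # [2, 2, 1, 1]
```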

5.Data types & dealing with missing data

5.1 Data types

reviews.dtypes

country                   object
description               object
designation               object
points                     int64
price                    float64
province                  object
region_1                  object
region_2                  object
taster_name               object
taster_twitter_handle     object
title                     object
variety                   object
winery                    object
critic                    object
dtype: object
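A column's dtype can be inspected on its own and converted with astype(), which returns a new Series rather than changing the original. A minimal sketch on a hypothetical two-row frame; the Kaggle course covers astype, but this particular example is an assumption.

```python
import pandas as pd

demo = pd.DataFrame({"points": [87, 90], "price": [15.0, None]})

print(demo.points.dtype)  # int64
print(demo.price.dtype)   # float64 (a missing value forces a float column)

# astype() returns a converted copy; demo.points keeps its integer dtype.
points_as_float = demo.points.astype("float64")
print(points_as_float.dtype)  # float64
```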

5.2 Missing data handling

reviews.country.isnull().sum()

63

pd.set_option('display.max_rows',5)
reviews[reviews.region_2.isnull()]

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | critic
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia | everyone
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos | everyone
...
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss | everyone
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit | everyone

79460 rows × 14 columns

# fillna() replaces missing NaN values with something else
reviews.region_2.fillna("Unknown")

0         Unknown
1         Unknown
           ...   
129969    Unknown
129970    Unknown
Name: region_2, Length: 129971, dtype: object
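Note that fillna() returns a new Series and leaves the original untouched, so the result has to be assigned somewhere to be kept. A small sketch on a made-up region Series:

```python
import pandas as pd

region = pd.Series(["Etna", None, "Napa", None], name="region_2")

print(region.isnull().sum())  # 2

filled = region.fillna("Unknown")  # new Series; region itself is unchanged
print(filled.tolist())             # ['Etna', 'Unknown', 'Napa', 'Unknown']
print(region.isnull().sum())       # 2
```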

6.Renaming & Combining

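# rename() returns a new DataFrame; reviews itself stays unchanged unless the result is assigned back
reviews.rename(columns={'points':'score'})
reviews.rename(index={0:'FirstEntry',1:'SecondEntry'})
reviews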

index | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery | critic
0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia | everyone
1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos | everyone
...
129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss | everyone
129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit | everyone

129971 rows × 14 columns
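The output above still shows the points column and the integer index because rename() does not modify the DataFrame it is called on. A minimal sketch on a hypothetical two-row frame, keeping the result in a new variable:

```python
import pandas as pd

demo = pd.DataFrame({"points": [87, 90]})

# rename() builds and returns a new DataFrame with the new labels.
renamed = demo.rename(columns={"points": "score"},
                      index={0: "FirstEntry", 1: "SecondEntry"})

print(demo.columns.tolist())     # ['points']  (original untouched)
print(renamed.columns.tolist())  # ['score']
print(renamed.index.tolist())    # ['FirstEntry', 'SecondEntry']
```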
