pandas (2): Indexing, Selecting and Assigning

License: CC 4.0 BY-SA

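This article uses Python's pandas library to run a series of data-exploration steps on a wine-review dataset: reading a CSV file, viewing the first rows, selecting a column, getting a single row, selecting the first few descriptions, selecting specific rows by index, creating sub-DataFrames that contain particular columns, and filtering rows by condition.

First, load the data and take a look at it: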

import pandas as pd
# index_col=0: use the CSV's first (unnamed) column as the row index
reviews = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)
reviews.head()  # show the first 5 rows

	country	description	designation	points	price	province	region_1	region_2	taster_name	taster_twitter_handle	title	variety	winery
0	Italy	Aromas include tropical fruit, broom, brimston...	Vulkà Bianco	87	NaN	Sicily & Sardinia	Etna	NaN	Kerin O’Keefe	@kerinokeefe	Nicosia 2013 Vulkà Bianco (Etna)	White Blend	Nicosia
1	Portugal	This is ripe and fruity, a wine that is smooth...	Avidagos	87	15.0	Douro	NaN	NaN	Roger Voss	@vossroger	Quinta dos Avidagos 2011 Avidagos Red (Douro)	Portuguese Red	Quinta dos Avidagos
2	US	Tart and snappy, the flavors of lime flesh and...	NaN	87	14.0	Oregon	Willamette Valley	Willamette Valley	Paul Gregutt	@paulgwine	Rainstorm 2013 Pinot Gris (Willamette Valley)	Pinot Gris	Rainstorm
3	US	Pineapple rind, lemon pith and orange blossom ...	Reserve Late Harvest	87	13.0	Michigan	Lake Michigan Shore	NaN	Alexander Peartree	NaN	St. Julian 2013 Reserve Late Harvest Riesling ...	Riesling	St. Julian
4	US	Much like the regular bottling from 2012, this...	Vintner's Reserve Wild Child Block	87	65.0	Oregon	Willamette Valley	Willamette Valley	Paul Gregutt	@paulgwine	Sweet Cheeks 2012 Vintner's Reserve Wild Child...	Pinot Noir	Sweet Cheeks
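The `index_col=0` argument matters because the first column of this CSV has an empty header and only stores row numbers. A minimal sketch, using a tiny hypothetical two-row CSV string read through `io.StringIO` (not the real file), shows what happens with and without it:

```python
import io
import pandas as pd

# Hypothetical two-row sample laid out like winemag-data-130k-v2.csv:
# the first header field is empty because that column holds row numbers.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Without index_col, the row numbers become an extra "Unnamed: 0" column.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))        # ['Unnamed: 0', 'country', 'points']

# With index_col=0, that first column is used as the row index instead.
reviews = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(reviews.columns))    # ['country', 'points']
print(list(reviews.index))      # [0, 1]
```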

1. Select a single column from the result and assign it to a variable

desc = reviews.description
# or
desc = reviews["description"]  # both forms work
desc.head()
"""
0    Aromas include tropical fruit, broom, brimston...
1    This is ripe and fruity, a wine that is smooth...
2    Tart and snappy, the flavors of lime flesh and...
3    Pineapple rind, lemon pith and orange blossom ...
4    Much like the regular bottling from 2012, this...
Name: description, dtype: object
"""

2. Take the first value of the description column of reviews

first_description = reviews.description.iloc[0]
first_description
'''
"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity."
'''
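The same scalar can be reached in several ways. A minimal sketch on a hypothetical two-row stand-in with the default 0-based index (the `loc` form only works because a row with label 0 exists):

```python
import pandas as pd

# Hypothetical two-row stand-in for reviews, default 0-based index.
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal"],
    "description": ["Aromas include tropical fruit...", "This is ripe and fruity..."],
})

# Three equivalent ways to get the first description:
a = reviews.description.iloc[0]                              # column, then row position
b = reviews.loc[0, "description"]                            # row label + column label
c = reviews.iloc[0, reviews.columns.get_loc("description")]  # row position + column position
print(a == b == c)  # True
```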

3. Get the first row, i.e. the first record

first_row = reviews.iloc[0]
first_row
'''
country                                                              Italy
description              Aromas include tropical fruit, broom, brimston...
designation                                                   Vulkà Bianco
points                                                                  87
price                                                                  NaN
province                                                 Sicily & Sardinia
region_1                                                              Etna
region_2                                                               NaN
taster_name                                                  Kerin O’Keefe
taster_twitter_handle                                         @kerinokeefe
title                                    Nicosia 2013 Vulkà Bianco  (Etna)
variety                                                        White Blend
winery                                                             Nicosia
Name: 0, dtype: object
'''
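Passing a single integer to `iloc` returns the row as a Series (whose `name` is the row's index label, hence `Name: 0` above), while passing a list returns a DataFrame. A minimal sketch on a hypothetical two-row stand-in:

```python
import pandas as pd

# Hypothetical stand-in with mixed column types, like reviews.
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal"],
    "points": [87, 87],
    "price": [float("nan"), 15.0],
})

first_row = reviews.iloc[0]       # a single position gives a Series
first_row_df = reviews.iloc[[0]]  # a list of positions gives a one-row DataFrame

print(type(first_row).__name__)     # Series
print(type(first_row_df).__name__)  # DataFrame
print(first_row.name)               # 0 (the row's index label)
```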

4. Select the first 10 values of the description column of reviews.

first_descriptions = reviews.description.iloc[:10]
first_descriptions
'''
0    Aromas include tropical fruit, broom, brimston...
1    This is ripe and fruity, a wine that is smooth...
2    Tart and snappy, the flavors of lime flesh and...
3    Pineapple rind, lemon pith and orange blossom ...
4    Much like the regular bottling from 2012, this...
5    Blackberry and raspberry aromas show a typical...
6    Here's a bright, informal red that opens with ...
7    This dry and restrained wine offers spice in p...
8    Savory dried thyme notes accent sunnier flavor...
9    This has great depth of flavor with its fresh ...
Name: description, dtype: object
'''
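The slice end in `iloc` is exclusive, so `[:10]` covers positions 0 to 9, which is exactly what `head(10)` returns. A minimal sketch with 15 hypothetical placeholder descriptions:

```python
import pandas as pd

# Hypothetical stand-in: 15 placeholder descriptions.
reviews = pd.DataFrame({"description": [f"review {i}" for i in range(15)]})

# iloc excludes the end position, so [:10] gives positions 0..9.
first_descriptions = reviews.description.iloc[:10]

# Series.head(n) returns the same first n values.
print(first_descriptions.equals(reviews.description.head(10)))  # True
print(len(first_descriptions))                                  # 10
```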

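5. Select the records with index 1, 2, 3, 5 and 8

index = [1, 2, 3, 5, 8]
sample_reviews = reviews.iloc[index]
# sample_reviews = reviews.loc[index]
'''
The difference between loc and iloc here:
iloc:
      selects by position, i.e. the rows at positions 1, 2, 3, 5 and 8 (counting from 0)
loc:
      selects by index label. If your index started at 1500 and increased from there,
      the labels 1, 2, 3, 5, 8 would not exist and loc would raise a KeyError;
      you would have to use [1501, 1502, 1503, 1505, 1508] instead.
In this dataset the index runs 0, 1, 2, ..., so both calls return the same rows.
'''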
# result
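The loc/iloc difference described above can be checked directly. A minimal sketch on a hypothetical 10-row frame whose index starts at 1500:

```python
import pandas as pd

# Hypothetical frame whose index starts at 1500.
reviews = pd.DataFrame({"points": list(range(87, 97))}, index=list(range(1500, 1510)))

by_position = reviews.iloc[[1, 2, 3, 5, 8]]             # positions
by_label = reviews.loc[[1501, 1502, 1503, 1505, 1508]]  # labels
print(by_position.equals(by_label))  # True

raised = False
try:
    reviews.loc[[1, 2, 3, 5, 8]]  # these labels do not exist in the index
except KeyError:
    raised = True
print(raised)  # True
```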

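6. Create a variable df containing the `country`, `province`, `region_1` and `region_2` columns of reviews, keeping only the records with index labels 0, 1, 10 and 100; in other words, produce the following DataFrame: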

cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]
df
# output:

	country	province	region_1	region_2
0	Italy	Sicily & Sardinia	Etna	NaN
1	Portugal	Douro	NaN	NaN
10	US	California	Napa Valley	Napa
100	US	New York	Finger Lakes	Finger Lakes
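`loc` takes labels for both rows and columns; to do the same selection with `iloc`, the labels can first be translated to positions with `Index.get_indexer`. A minimal sketch on a hypothetical 101-row stand-in (so that label 100 exists) with placeholder values:

```python
import pandas as pd

# Hypothetical stand-in: 101 rows so that label 100 exists.
n = 101
reviews = pd.DataFrame({
    "country": ["US"] * n,
    "description": ["..."] * n,
    "province": ["Oregon"] * n,
    "region_1": ["Willamette Valley"] * n,
    "region_2": [None] * n,
})

cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]

# Label-based selection.
df = reviews.loc[indices, cols]

# Position-based equivalent: translate labels to positions first.
row_pos = reviews.index.get_indexer(indices)
col_pos = reviews.columns.get_indexer(cols)
df_pos = reviews.iloc[row_pos, col_pos]

print(df.equals(df_pos))  # True
```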

7. Create a variable df containing the country and variety columns for the first 100 rows:

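cols = ['country', 'variety']
df = reviews.loc[:99, cols]        # loc includes the end label: 0..99 is 100 rows
# or

cols_idx = [0, 11]                 # positions of the country and variety columns
df = reviews.iloc[:100, cols_idx]  # iloc excludes the end position: 0..99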

	country	variety
0	Italy	White Blend
1	Portugal	Portuguese Red
2	US	Pinot Gris
3	US	Riesling
4	US	Pinot Noir
5	Spain	Tempranillo-Merlot

... (remaining rows omitted)
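The two versions need different end values because `loc` slices include the end label while `iloc` slices exclude the end position. A minimal sketch on a hypothetical 130-row stand-in, which also looks up column positions by name with `Index.get_loc` instead of hard-coding them:

```python
import pandas as pd

# Hypothetical stand-in with 130 rows and the default 0-based index.
n = 130
reviews = pd.DataFrame({
    "country": ["Italy"] * n,
    "description": ["..."] * n,
    "variety": ["White Blend"] * n,
})

# loc slices by label and includes the end: labels 0..99 -> 100 rows.
print(len(reviews.loc[:99, ["country", "variety"]]))  # 100
# iloc slices by position and excludes the end: positions 0..98 -> 99 rows.
print(len(reviews.iloc[:99, [0, 2]]))                 # 99

# Looking up column positions by name avoids hard-coding numbers.
cols_idx = [reviews.columns.get_loc(c) for c in ["country", "variety"]]
df = reviews.iloc[:100, cols_idx]
print(df.shape)  # (100, 2)
```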
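8. Create a DataFrame named italian_wines that contains the reviews of wines made in Italy (the country column holds each wine's country of origin). Hint: `reviews.country`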

italian_wines = reviews[reviews.country == 'Italy']
italian_wines.head()
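The comparison `reviews.country == 'Italy'` produces a boolean Series aligned on the index, and indexing with it keeps only the rows marked True. A minimal sketch on a hypothetical four-row stand-in:

```python
import pandas as pd

# Hypothetical four-row stand-in for reviews.
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "Italy"],
    "points": [87, 87, 87, 90],
})

mask = reviews.country == 'Italy'  # boolean Series aligned on the index
print(mask.tolist())               # [True, False, False, True]
print(int(mask.sum()))             # 2 (True counts as 1)

italian_wines = reviews[mask]
# .loc accepts the same mask and can also pick columns at the same time.
print(italian_wines.equals(reviews.loc[mask]))  # True
print(reviews.loc[mask, "points"].tolist())     # [87, 90]
```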

9. Create a DataFrame named top_oceania_wines containing all reviews with at least 95 points for wines from Australia or New Zealand.

top_oceania_wines = reviews.loc[
    reviews.country.isin(['Australia', 'New Zealand']) & (reviews.points >= 95)
]
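Two details matter in this kind of filter: `isin` matches strings exactly, so the spelling must match the data (this dataset writes 'New Zealand'), and each comparison combined with `&` or `|` needs its own parentheses because those operators bind more tightly than `>=`. A minimal sketch on a hypothetical four-row stand-in:

```python
import pandas as pd

# Hypothetical four-row stand-in for reviews.
reviews = pd.DataFrame({
    "country": ["Australia", "New Zealand", "New Zealand", "US"],
    "points": [95, 96, 90, 98],
})

# isin is case-sensitive, so the lower-case spelling matches nothing.
print(reviews.country.isin(['new zealand']).any())  # False

# Parenthesize the comparison: & binds more tightly than >=.
top_oceania_wines = reviews.loc[
    reviews.country.isin(['Australia', 'New Zealand']) & (reviews.points >= 95)
]
print(top_oceania_wines.index.tolist())  # [0, 1]

# The same filter written with | instead of isin.
alt = reviews[((reviews.country == 'Australia') | (reviews.country == 'New Zealand'))
              & (reviews.points >= 95)]
print(alt.equals(top_oceania_wines))  # True
```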