Pandas Study Notes (5): Data Types and Missing Values

小帅吖

Article link: https://blog.csdn.net/qq_47997583/article/details/121980182

1. Theory

1. Use the dtype attribute to check the data type of a column (and dtypes for every column of a DataFrame)

reviews.price.dtype

dtype('float64')

reviews.dtypes

country        object
description    object
                ...  
variety        object
winery         object
Length: 13, dtype: object
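
As a minimal sketch of the difference between the two attributes, the snippet below builds a tiny stand-in for reviews (the three rows are made up for illustration and are not taken from the real dataset) and reads both dtype and dtypes.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame; values are illustrative only.
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
    "price": [float("nan"), 15.0, 14.0],
})

# dtype is an attribute of a single column (Series), so it takes no parentheses.
price_dtype = reviews.price.dtype
points_dtype = reviews.points.dtype

# dtypes (plural) is the DataFrame-level attribute: one entry per column.
all_dtypes = reviews.dtypes

print(price_dtype)
print(points_dtype)
print(all_dtypes)
```

Note that a numeric column containing missing values is stored as a float, because NaN is itself a floating-point value.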

2. Use the astype method to convert a column to another data type

reviews.points.astype('float64')

0         87.0
1         87.0
          ... 
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64
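
One point worth seeing in isolation is that astype returns a new Series and leaves the original untouched. The sketch below uses a small made-up points Series rather than the real column.

```python
import pandas as pd

# Hypothetical sample of the points column.
points = pd.Series([87, 87, 90], name="points")

# astype returns a converted copy; it does not modify points in place.
points_float = points.astype("float64")

print(points.dtype)        # dtype of the original Series
print(points_float.dtype)  # dtype of the converted copy
```

To keep the converted values, assign the result back, for example to a new variable or to the column itself.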

3. Use the isnull function to select rows with missing values

reviews[pd.isnull(reviews.country)]
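
The expression above works in two steps: pd.isnull builds a boolean mask, and indexing the DataFrame with that mask keeps only the rows where the mask is True. A minimal sketch, assuming a made-up four-row frame:

```python
import pandas as pd

# Hypothetical stand-in for reviews; two rows have no country.
reviews = pd.DataFrame({
    "country": ["Italy", None, "US", None],
    "points": [87, 88, 90, 91],
})

# Step 1: boolean mask, True where country is missing.
mask = pd.isnull(reviews.country)

# Step 2: boolean indexing keeps only the rows where the mask is True.
missing_country = reviews[mask]

# pd.notnull is the companion function and selects the complementary rows.
has_country = reviews[pd.notnull(reviews.country)]

print(missing_country)
```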

4. Use the fillna method to fill in missing values

reviews.region_2.fillna("Unknown")

0         Unknown
1         Unknown
           ...   
129969    Unknown
129970    Unknown
Name: region_2, Length: 129971, dtype: object
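
A small sketch of the same call on made-up data shows that only the missing entries are replaced and that the original Series is left as it was:

```python
import pandas as pd

# Hypothetical sample of the region_2 column.
region_2 = pd.Series(["Napa", None, None], name="region_2")

# fillna returns a new Series with every missing entry replaced.
filled = region_2.fillna("Unknown")

print(filled.tolist())
print(region_2.isnull().sum())  # the original still contains its missing values
```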

5. Use the replace method to substitute one value for another

reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")

0            @kerino
1         @vossroger
             ...    
129969    @vossroger
129970    @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: object
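
With two plain string arguments, replace matches whole values rather than substrings. The sketch below illustrates this on a made-up Series; the handle "@kerinokeefe_fan" is hypothetical and is there only to show that a partial match is not touched.

```python
import pandas as pd

# Hypothetical sample of the taster_twitter_handle column.
handles = pd.Series(
    ["@kerinokeefe", "@vossroger", "@kerinokeefe_fan"],
    name="taster_twitter_handle",
)

# Only entries exactly equal to "@kerinokeefe" are replaced.
renamed = handles.replace("@kerinokeefe", "@kerino")

print(renamed.tolist())
```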

2. Practice

1. What is the data type of the points column in the dataset?

dtype = reviews.points.dtype

2. Create a Series from entries in the points column, but convert the entries to strings. Hint: strings are str in native Python.

point_strings = reviews.points.astype('str')

3. Sometimes the price column is null. How many reviews in the dataset are missing a price?

n_missing_prices = pd.isnull(reviews.price).sum()
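
The count works because summing a boolean mask treats True as 1 and False as 0. A minimal sketch on a made-up price Series:

```python
import pandas as pd

# Hypothetical sample of the price column with two missing entries.
price = pd.Series([float("nan"), 15.0, 14.0, float("nan")], name="price")

# pd.isnull gives a boolean mask; sum() counts the True entries.
n_missing = pd.isnull(price).sum()

# The method form on the Series is equivalent.
n_missing_method = price.isnull().sum()

print(n_missing)
```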

4. What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the region_1 field. This field is often missing data, so replace missing values with Unknown. Sort in descending order. Your output should look something like this:

Unknown                    21247
Napa Valley                 4480
                           ...  
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64

reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
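
The order of the calls matters here: value_counts drops missing values by default, so the fill has to happen first for the missing entries to be counted at all. A sketch on a made-up region_1 Series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical sample of the region_1 column.
region_1 = pd.Series(
    ["Napa Valley", None, "Etna", None, "Napa Valley", None],
    name="region_1",
)

# Without fillna, the missing entries are silently left out of the counts.
counts_without_fill = region_1.value_counts()

# Filling first makes the missing entries show up under their own label.
counts = region_1.fillna("Unknown").value_counts()

print(counts)
```

value_counts already returns the counts in descending order, so the extra sort_values(ascending=False) in the answer is harmless but not strictly required.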
