Pandas Study Notes (5): Data Types and Missing Values

1. Theory

1. Check data types with the dtype attribute (one column) or dtypes (every column of a DataFrame)

reviews.price.dtype
dtype('float64')
reviews.dtypes
country        object
description    object
                ...  
variety        object
winery         object
Length: 13, dtype: object
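The same two attributes can be tried without the full wine dataset. The sketch below uses a small hypothetical DataFrame that mimics three of its columns. The values are made up and are not taken from the real reviews data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the wine `reviews` dataset
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", None],
    "points": [87, 87, 90],
    "price": [None, 15.0, 14.0],   # None becomes NaN, so the column is float64
})

# Series.dtype: the dtype of a single column
print(reviews.price.dtype)    # float64

# DataFrame.dtypes: a Series mapping each column name to its dtype
print(reviews.dtypes)
```

The `price` column is float64 even though the first entry is missing. NaN is a floating-point value, so an integer-looking column that contains missing values is stored as float.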

2. Convert a column to another data type with astype (it returns a new Series)

reviews.points.astype('float64')
0         87.0
1         87.0
          ... 
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64
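Here is a minimal sketch on hypothetical sample values. It shows two things: astype leaves the original column untouched, and a column containing NaN cannot be cast to a plain integer dtype.

```python
import pandas as pd

# Hypothetical sample values standing in for reviews.points
points = pd.Series([87, 87, 90], name="points")

# astype returns a NEW Series; `points` keeps its original dtype
as_float = points.astype("float64")
print(as_float.dtype)   # float64
print(points.dtype)     # int64

# A float column holding NaN cannot become plain int64,
# because numpy integers have no way to represent a missing value.
price = pd.Series([None, 15.0, 14.0], name="price")
cast_error = None
try:
    price.astype("int64")
except ValueError as exc:   # pandas raises a ValueError subclass here
    cast_error = exc
print(cast_error is not None)   # True
```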

3. Find rows with missing values using pd.isnull

reviews[pd.isnull(reviews.country)]

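pd.isnull returns a boolean Series, and indexing the DataFrame with it keeps only the rows marked True. The sketch below uses a small hypothetical frame rather than the real reviews data, so the row output is easy to follow.

```python
import pandas as pd

# Hypothetical toy frame; two rows have no country
reviews = pd.DataFrame({
    "country": ["Italy", None, "US", None],
    "points": [87, 88, 90, 85],
})

mask = pd.isnull(reviews.country)    # True where country is missing
print(mask.tolist())                 # [False, True, False, True]

missing_country = reviews[mask]      # boolean indexing keeps the True rows
print(missing_country)

# The method form gives the same mask; notnull() is its complement
assert reviews.country.isnull().equals(mask)
present = reviews[reviews.country.notnull()]
```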

4. Fill missing values with fillna

reviews.region_2.fillna("Unknown")
0         Unknown
1         Unknown
           ...   
129969    Unknown
129970    Unknown
Name: region_2, Length: 129971, dtype: object
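fillna also returns a new Series, so the original still contains its missing values unless you assign the result back. The sketch uses hypothetical values. The forward fill at the end is an alternative strategy that the notes above do not use: it copies the last valid value downward, which can make sense for ordered data.

```python
import pandas as pd

# Hypothetical sample values standing in for reviews.region_2
region_2 = pd.Series([None, None, "Napa", None], name="region_2")

filled = region_2.fillna("Unknown")   # new Series; region_2 is unchanged
print(filled.tolist())                # ['Unknown', 'Unknown', 'Napa', 'Unknown']
print(region_2.isnull().sum())        # 3

# Alternative: forward-fill, propagating the last valid value
prices = pd.Series([10.0, None, None, 12.0])
ffilled = prices.ffill().tolist()
print(ffilled)                        # [10.0, 10.0, 10.0, 12.0]
```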

5. Replace values with replace

reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")
0            @kerino
1         @vossroger
             ...    
129969    @vossroger
129970    @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: object
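Without regex=True, Series.replace matches whole cell values and not substrings. It also accepts a dict, so several values can be replaced in one call. The handles below are hypothetical sample data. The mapping "@vossroger" to "@roger" is an illustrative value only.

```python
import pandas as pd

# Hypothetical sample values standing in for reviews.taster_twitter_handle
handles = pd.Series(["@kerinokeefe", "@vossroger", "@kerinokeefe"],
                    name="taster_twitter_handle")

renamed = handles.replace("@kerinokeefe", "@kerino")
print(renamed.tolist())   # ['@kerino', '@vossroger', '@kerino']

# A dict maps several old values to new ones at once
mapped = handles.replace({"@kerinokeefe": "@kerino", "@vossroger": "@roger"})

# No change here: "kerino" is only part of a value, not a whole value
partial = handles.replace("kerino", "X")
```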

2. Practice

1. What is the data type of the points column in the dataset?

dtype = reviews.points.dtype
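Once you have a dtype object, you can also check it in code. A numpy dtype compares equal to its string name, and pandas offers helper predicates in pandas.api.types. The Series below is a hypothetical stand-in for reviews.points.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Hypothetical stand-in for reviews.points
points = pd.Series([87, 90, 85], name="points")

dtype = points.dtype
print(dtype)                     # int64
print(dtype == "int64")          # True
print(is_integer_dtype(dtype))   # True
print(is_numeric_dtype(dtype))   # True
```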

2. Create a Series from entries in the points column, but convert the entries to strings. Hint: strings are str in native Python.

point_strings = reviews.points.astype('str')
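Passing the built-in type str to astype works the same as passing the string 'str'. After the conversion, each element is a Python string, so string operations such as concatenation work. The values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for reviews.points
points = pd.Series([87, 90, 85], name="points")

point_strings = points.astype(str)
print(point_strings.tolist())          # ['87', '90', '85']
print(point_strings.iloc[0] + " pts")  # 87 pts
```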

3. Sometimes the price column is null. How many reviews in the dataset are missing a price?

n_missing_prices = pd.isnull(reviews.price).sum()
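The answer above works because summing a boolean Series counts True as 1 and False as 0. The sketch checks this on hypothetical prices and compares it with an equivalent count: total length minus Series.count(), which counts only non-missing entries.

```python
import pandas as pd

# Hypothetical stand-in for reviews.price; two entries are missing
price = pd.Series([None, 15.0, 14.0, None, 65.0], name="price")

n_missing_prices = pd.isnull(price).sum()
print(n_missing_prices)   # 2

# Equivalent: total rows minus the number of non-null values
print(len(price) - price.count())   # 2
```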

4. What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the region_1 field. This field is often missing data, so replace missing values with Unknown. Sort in descending order. Your output should look something like this:

Unknown                    21247
Napa Valley                 4480
                           ...  
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64

# Fill missing regions with 'Unknown' (not 'Unknow'), then count each value
reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
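The fillna step matters because value_counts drops missing values by default, so without it the missing regions would simply not be counted. value_counts also already sorts in descending order of count, which makes the trailing sort_values optional. The region values below are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for reviews.region_1
region_1 = pd.Series(
    ["Napa Valley", None, "Napa Valley", "Bardolino Superiore", None, None],
    name="region_1",
)

counts = region_1.fillna("Unknown").value_counts()
print(counts)   # Unknown 3, Napa Valley 2, Bardolino Superiore 1

# Without fillna, the three missing entries are silently dropped
print(region_1.value_counts().sum())   # 3
```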