Data Analysis of Google Play Store Apps

Google Play Store app data source (extraction code: 38jk)

1. Loading the data

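  • Import the libraries used for the analysis
  • Before loading the data, skim through it in a text editor
  • Once the data is loaded, start by inspecting it with shape, head, count, describe and info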
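    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # Load the file.
    # This analysis only uses 'App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type'
    df = pd.read_csv('./googleplaystore.csv', usecols=(0, 1, 2, 3, 4, 5, 6))
    
    # Take a quick look at the data
    print(df.head())
    # Check the number of rows and columns
    print(df.shape)
    # Check the non-null count of each column
    print(df.count())
    
    # Use describe and info to get a rough picture of the distribution
    print(df.describe())
    # info() prints its report itself and returns None, hence the trailing "None" in the output
    print(df.info())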
                                               App        Category  Rating  \
    0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
    1                                Coloring book moana  ART_AND_DESIGN     3.9   
    2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
    3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
    4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   
    
      Reviews  Size     Installs  Type  
    0     159   19M      10,000+  Free  
    1     967   14M     500,000+  Free  
    2   87510  8.7M   5,000,000+  Free  
    3  215644   25M  50,000,000+  Free  
    4     967  2.8M     100,000+  Free  
    (10841, 7)
    App         10841
    Category    10841
    Rating       9367
    Reviews     10841
    Size        10841
    Installs    10841
    Type        10840
    dtype: int64
                Rating
    count  9367.000000
    mean      4.193338
    std       0.537431
    min       1.000000
    25%       4.000000
    50%       4.300000
    75%       4.500000
    max      19.000000
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 10841 entries, 0 to 10840
    Data columns (total 7 columns):
    App         10841 non-null object
    Category    10841 non-null object
    Rating      9367 non-null float64
    Reviews     10841 non-null object
    Size        10841 non-null object
    Installs    10841 non-null object
    Type        10840 non-null object
    dtypes: float64(1), object(6)
    memory usage: 592.9+ KB
    None
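  • From the output above we can conclude:
  • The data has 10841 rows in total
  • Rating and Type have missing values
  • Rating contains an outlier of 19
  • The 'M' and 'k' in Size and the '+' in Installs need to be handled so the columns can be used in later calculations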
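The Size and Installs clean-up mentioned in the last point could look roughly like the sketch below. It is only an illustration on a small hand-made frame: the helper names `parse_size` and `parse_installs` are hypothetical, the choice of megabytes as the target unit and of 1024 k per M is an assumption, and so is the placeholder string 'Varies with device' used to show how non-numeric sizes become NaN.

```python
import numpy as np
import pandas as pd

def parse_size(value):
    """Convert a Size string such as '19M' or '512k' to megabytes (float).

    Anything that does not end in 'M' or 'k' (assumed here to include
    placeholders like 'Varies with device') becomes NaN.
    """
    text = str(value).strip()
    if text.endswith('M'):
        return float(text[:-1])
    if text.endswith('k'):
        return float(text[:-1]) / 1024  # assumption: 1 M = 1024 k
    return np.nan

def parse_installs(value):
    """Convert an Installs string such as '10,000+' to an integer."""
    return int(str(value).replace(',', '').replace('+', ''))

# Hand-made sample rows, not taken from the real file
sample = pd.DataFrame({
    'Size': ['19M', '8.7M', '512k', 'Varies with device'],
    'Installs': ['10,000+', '5,000,000+', '100,000+', '0'],
})
sample['Size_MB'] = sample['Size'].map(parse_size)
sample['Installs_num'] = sample['Installs'].map(parse_installs)
print(sample)
```

Keeping the parsed values in new columns leaves the raw strings available for checking, which is handy while the cleaning rules are still being worked out.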

2. Data cleaning # App

  • Check whether there are duplicate values
    print(df['App'].unique().size)
    9660
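  • There are duplicates (9660 distinct names across 10841 rows). Don't delete them yet: to avoid keeping a copy that carries abnormal values in other columns, clean the columns with abnormal values first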
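To look at the duplicated rows before deciding which copy to keep, something like the following sketch may help. The frame here is a tiny hand-made stand-in for `df`, and the final `drop_duplicates` call with `keep='first'` is just one possible policy, not necessarily the one the analysis ends up using.

```python
import pandas as pd

# Hand-made stand-in for df with repeated app names
sample = pd.DataFrame({
    'App': ['A', 'B', 'A', 'C', 'B'],
    'Reviews': ['10', '20', '12', '30', '20'],
})

# Number of distinct app names; for a column without missing values
# this matches df['App'].unique().size
n_unique = sample['App'].nunique()

# keep=False marks every row whose App name occurs more than once
dupes = sample[sample.duplicated(subset='App', keep=False)]
print(n_unique)
print(dupes.sort_values('App'))

# Later, once the other columns are clean, one row per app can be kept
deduped = sample.drop_duplicates(subset='App', keep='first')
print(deduped)
```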

3. Data cleaning # Category

    print(df['Category'].value_counts(dropna=False))
    print(df[df['Category'] == '1.9'])
    FAMILY                 1972
    GAME                   1144
    TOOLS                   843
    MEDICAL                 463
    BUSINESS                460
    PRODUCTIVITY            424
    PERSONALIZATION         392
    COMMUNICATION           387
    SPORTS                  384
    LIFESTYLE               382
    FINANCE                 366
    HEALTH_AND_FITNESS      341
    PHOTOGRAPHY             335
    SOCIAL                  295
    NEWS_AND_MAGAZINES      283
    SHOPPING                260
    TRAVEL_AND_LOCAL        258
    DATING                  234
    BOOKS_AND_REFERENCE     231
    VIDEO_PLAYERS           175
    EDUCATION               156
    ENTERTAINMENT           149
    MAPS_AND_NAVIGATION     137
    FOOD_AND_DRINK          127
    HOUSE_AND_HOME           88
    AUTO_AND_VEHICLES        85
    LIBRARIES_AND_DEMO       85
    WEATHER                  82
    ART_AND_DESIGN           65
    EVENTS                   64
    COMICS                   60
    PARENTING                60
    BEAUTY                   53
    1.9                       1
    Name: Category, dtype: int64
                                               App Category  Rating Reviews  \
    10472  Life Made WI-Fi Touchscreen Photo Frame      1.9    19.0    3.0M   
    
             Size Installs Type  
    10472  1,000+     Free    0  
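  • There is one abnormal record. On inspection, its Category value appears to be missing, which shifted every later field one column to the left, so this row is deleted here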
    df.drop(index=10472, inplace=True)
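Instead of hard-coding the index, a shifted row like this one can also be found by checking which values are implausible for their column. The sketch below runs on a two-row hand-made frame that mimics the bad record; the 1 to 5 rating range is the Google Play star scale, and treating a non-numeric Reviews value as a sign of a shifted row is an assumption made for this illustration.

```python
import pandas as pd

# Hand-made stand-in: one normal row and one row shifted like index 10472
sample = pd.DataFrame({
    'App': ['Good app', 'Shifted app'],
    'Category': ['TOOLS', '1.9'],
    'Rating': [4.1, 19.0],
    'Reviews': ['159', '3.0M'],
})

# A present rating outside 1..5 is suspect; missing ratings are not flagged here
bad_rating = sample['Rating'].notna() & ~sample['Rating'].between(1, 5)
# Reviews should be a whole number; errors='coerce' turns other strings into NaN
bad_reviews = pd.to_numeric(sample['Reviews'], errors='coerce').isna()

suspect = sample[bad_rating | bad_reviews]
print(suspect)

# Drop whatever was flagged, without naming the index explicitly
cleaned = sample.drop(index=suspect.index)
print(cleaned)
```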

4. Data cleaning # Rating

    print(df['Rating'].value_counts(dropna=False))
    NaN     1474
    4.4     1109
    4.3     1076
    4.5     1038
    4.2      952
    4.6      823
    4.1      708
    4.0      568
    4.7      499
    3.9      386
    3.8      303
    5.0      274
    3.7      239
    4.8      234
    3.6      174
    3.5      163
    3.4      128
    3.3      102
    4.9       87
    3.0       83
    3.1       69
    3.2       64
    2.9       45
    2.8       42
    2.6       25
    2.7       25
    2.5       21
    2.3       20
    2.4       19
    1.0       16
    2.2       14
    1.9       13
    2.0       12
    1.8        8
    1.7        8
    2.1        8
    1.6        4
    1.5        3
    1.4        3
    1.2        1
    Name: Rating, dtype: int64
  • There are 1474 NaN values in total; fill them with the mean
    # Assign the result back: fillna(inplace=True) on a selected column may not update df in recent pandas
    df['Rating'] = df['Rating'].fillna(df['Rating'].mean())
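The effect of the mean fill is easy to see on a small example. The sketch below uses a hand-made frame and also shows, purely for comparison, a per-category mean fill; that second variant is an alternative technique and not what the analysis above does.

```python
import numpy as np
import pandas as pd

# Hand-made sample with two missing ratings
sample = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'TOOLS', 'TOOLS', 'TOOLS'],
    'Rating': [5.0, np.nan, 3.0, 4.0, np.nan],
})

# Global mean fill, as in the text: mean() skips NaN, so the fill value
# is the average of the three known ratings
global_fill = sample['Rating'].fillna(sample['Rating'].mean())

# Alternative (not the author's method): fill with the mean of the same Category
group_mean = sample.groupby('Category')['Rating'].transform('mean')
group_fill = sample['Rating'].fillna(group_mean)

print(global_fill.tolist())
print(group_fill.tolist())
```

A global mean keeps the overall average unchanged but pulls every filled app toward the same value, which is worth remembering when ratings are later compared across categories.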

5. Data cleaning # Reviews

    print(df['Rating'].value_counts(dropna=False))
    print(df['Reviews'].str.isnumeric().sum())
    4.193338     1474
    4.400000     1109
    4.300000     1076
    4.500000     1038
    4.200000      952
    4.600000      823
    4.100000      708
    4.000000      568
    4.700000      499
    3.900000      386
    3.800000      303
    5.000000      274
    3.700000      239
    4.800000      234
    3.600000      174
    3.500000      163
    3.400000      128
    3.300000      102
    4.900000       87
    3.000000       83
    3.100000       69
    3.200000       64
    2.900000       45
    2.800000       42
    2.700000       25
    2.600000       25
    2.500000       21
    2.300000       20
    2.400000       19
    1.000000       16
    2.200000       14
    1.900000       13
    2.000000       12
    2.100000        8
    1.800000        8
    1.700000        8
    1.600000        4
    1.400000   