Cohort Analysis with Python and Pandas

Backstory

I stumbled upon an interesting task while doing a data exercise for a company. It was about cohort analysis based on user activity data, and I found it interesting enough to write this post.

This article explains what cohort analysis is and how to prepare data for plotting cohorts. There are various ways to do this; I discuss a specific approach that uses pandas and Python to track user retention. I also provide some analysis to identify the best traffic sources (organic/inorganic) for an organization.

Cohort Analysis

Let’s start by introducing the concept of cohorts. The dictionary definition of a cohort is a group of people who share some common characteristic. Examples include birth cohorts (people born during the same period, like '90s kids) and academic cohorts (people who started the same curriculum together to complete a degree).

Cohort analysis is especially useful for analyzing user growth patterns for products. For a product, a cohort can be a group of people with the same sign-up date, the same usage start month or date, or the same traffic source.
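As a small illustration of these cohort keys, the sketch below groups a made-up usage log by each user's first-use month and by the traffic source on their earliest row. The toy values, the 'organic'/'ads' labels, and the year 2020 are assumptions for illustration only.

```python
import pandas as pd

# Toy usage log (made-up values): one row per use, with the traffic source recorded.
events = pd.DataFrame({
    "uid": ["a", "a", "b", "b", "c", "d"],
    "date": pd.to_datetime([
        "2020-01-15", "2020-02-03", "2020-02-01",
        "2020-02-10", "2020-02-20", "2020-01-30",
    ]),
    "utmSource": ["organic", "organic", "ads", "ads", "organic", "ads"],
})

# Cohort key 1: the month in which each user was first seen.
start_month = events.groupby("uid")["date"].min().dt.to_period("M")

# Cohort key 2: the traffic source recorded on each user's earliest row.
first_source = events.sort_values("date").groupby("uid")["utmSource"].first()

# Both Series are indexed by uid, so they align into one table.
cohorts = pd.DataFrame({"start_month": start_month, "source": first_source})
print(cohorts)
print(cohorts.groupby("start_month").size())  # cohort sizes per start month
```

Here users a and d fall into the January cohort and b and c into the February cohort.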

Cohort analysis is an analytics method that tracks these groups over time to find key insights. It can also be used for customer segmentation and for tracking metrics such as retention, churn, and lifetime value. There are two types of cohorts: acquisitional and behavioral.
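To make retention tracking concrete, here is a minimal sketch, assuming a made-up activity log with uid and date columns: each user's acquisition cohort is their first active day, and retention on day N is the share of that cohort active N days later. This is one common way to build a cohort retention table, not necessarily the exact approach the author uses.

```python
import pandas as pd

# Toy activity log (made-up values): each row is one use by one user.
log = pd.DataFrame({
    "uid": ["a", "a", "a", "b", "b", "c", "d", "d"],
    "date": pd.to_datetime([
        "2020-02-01", "2020-02-02", "2020-02-03",
        "2020-02-01", "2020-02-03",
        "2020-02-02",
        "2020-02-02", "2020-02-03",
    ]),
})

# Acquisition cohort: the first day each user was active.
log["cohort"] = log.groupby("uid")["date"].transform("min")
# Days elapsed since the user's cohort day (0 = acquisition day).
log["day"] = (log["date"] - log["cohort"]).dt.days

# Distinct active users per (cohort, day offset); missing combinations become 0.
active = log.groupby(["cohort", "day"])["uid"].nunique().unstack(fill_value=0)

# Retention = active users on day N divided by the cohort size on day 0.
retention = active.div(active[0], axis=0)
print(retention.round(2))
```

With these toy rows, the 2020-02-01 cohort shows retention of 1.0, 0.5, and 1.0 on days 0 to 2, while the 2020-02-02 cohort shows 1.0, 0.5, and 0.0.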

  • Acquisitional cohorts: groups of users based on their sign-up date, first-use date, etc.

  • Behavioral cohorts: groups of users based on their activities in a given period of time, for example when they install, uninstall, or delete the app.

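The difference between the two cohort types can be sketched in pandas as follows. The event log, its hypothetical 'event' column, and the chosen window are assumptions for illustration; the article's own dataset records uses rather than install/uninstall events.

```python
import pandas as pd

# Hypothetical event log; the "event" column is an assumption for illustration,
# since the article's dataset records uses rather than event types.
events = pd.DataFrame({
    "uid": ["a", "a", "b", "b", "c"],
    "date": pd.to_datetime(
        ["2020-02-01", "2020-02-09", "2020-02-03", "2020-02-04", "2020-02-10"]
    ),
    "event": ["install", "uninstall", "install", "open", "install"],
})

# Acquisitional cohort: ISO week of each user's first recorded event.
first_seen = events.groupby("uid")["date"].min()
first_week = first_seen.dt.isocalendar()["week"].rename("first_week")

# Behavioral cohort: users who uninstalled within a chosen window (first half of February).
in_window = events["date"].between(pd.Timestamp("2020-02-01"), pd.Timestamp("2020-02-14"))
uninstalled = events.loc[in_window & (events["event"] == "uninstall"), "uid"].unique()
behavioral = pd.Series(
    first_seen.index.isin(uninstalled), index=first_seen.index, name="uninstalled"
)

table = pd.concat([first_week, behavioral], axis=1)
print(table)
```

The acquisitional key depends only on when a user first appeared, whereas the behavioral key depends on what the user did inside the window.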
In this article, I will demonstrate how to create and analyze acquisition cohorts using a dataset. Let’s dive into it:

Setup

I use pandas, NumPy, seaborn, and matplotlib for this analysis, so let’s start by importing the required libraries.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

The Data

This dataset consists of customers’ usage data for the month of February. Some users in the dataset started using the app in February (identified by a row where 'isFirst' is true), while others were already users before February.

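Because 'isFirst' marks a user’s first-ever use, one way to separate February’s new users from pre-February users is to check whether any of a user’s rows has 'isFirst' set. A sketch on made-up rows shaped like the article’s data:

```python
import pandas as pd

# Made-up rows shaped like the article's data (uid, date, isFirst).
sample = pd.DataFrame({
    "uid": ["u1", "u1", "u2", "u3", "u3"],
    "date": pd.to_datetime(
        ["2020-02-01", "2020-02-05", "2020-02-02", "2020-02-03", "2020-02-04"]
    ),
    "isFirst": [True, False, False, True, False],
})

# A user is new in February if any of their rows is a first-ever use.
is_new = sample.groupby("uid")["isFirst"].any()
new_users = is_new[is_new].index.tolist()
pre_feb_users = is_new[~is_new].index.tolist()
print(new_users, pre_feb_users)  # ['u1', 'u3'] ['u2']
```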
df = pd.read_json("data.json")
df.head()

The data has 5 columns:

  • date: date of the use (within February)
  • timestamp: usage timestamp
  • uid: unique ID assigned to each user
  • isFirst: true if this is the user’s first use ever
  • utmSource: traffic source from which the user came
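A quick sketch of how uid, isFirst, and utmSource can be combined to count users and first-time users per traffic source. The rows and source names are made up; the real values in utmSource are not shown here.

```python
import pandas as pd

# Made-up rows; the source names are placeholders, not values from the dataset.
sample = pd.DataFrame({
    "uid": ["u1", "u1", "u2", "u3", "u4"],
    "isFirst": [True, False, False, True, True],
    "utmSource": ["google", "google", "direct", "facebook", "google"],
})

# Distinct users seen per source.
users = sample.groupby("utmSource")["uid"].nunique()
# Distinct users whose first-ever use is recorded under each source.
new_users = sample[sample["isFirst"]].groupby("utmSource")["uid"].nunique()

# Sources with no first-use rows get NaN after alignment, so fill with 0.
summary = pd.DataFrame({"users": users, "new_users": new_users}).fillna(0).astype(int)
print(summary)
```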

We can check the shape and column info of the DataFrame as follows:

df.shape
(4823567, 5)

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4823567 entries, 0 to 4823566
Data columns (total 5 columns):
date datetime64[ns]
isFirst bool
timestamp datetime64[ns]
uid object
utmSource object
dtypes: bool(1), datetime64[ns](2), object(2)
memory usage: 151.8+ MB
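The "+" in "memory usage: 151.8+ MB" indicates that the true usage may be higher: for object columns such as uid and utmSource, pandas counts only the pointers, not the strings they refer to. A small sketch on made-up data comparing the shallow and deep figures:

```python
import pandas as pd

# Made-up string columns stored as object dtype, like uid and utmSource in df.info().
sample = pd.DataFrame({
    "uid": [f"user_{i}" for i in range(1000)],
    "utmSource": ["organic", "paid"] * 500,
}).astype(object)

# Shallow: counts only the object pointers per cell (plus the index).
shallow = sample.memory_usage().sum()
# Deep: also measures the Python string objects the pointers refer to.
deep = sample.memory_usage(deep=True).sum()
print(shallow, deep)

# info() can report the deep figure directly, without the "+" marker.
sample.info(memory_usage="deep")
```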

Below is a table of contents for the data analysis. I will first show the data cleaning I did for this dataset, followed by the questions this exercise will answer. The most important part of any analysis-based project is deciding which questions it should answer by the end. I will answer three questions (listed below), followed by some
