Google Capstone Project: eCommerce Events History in Cosmetics Shop

This is an optional capstone project from the Google Data Analytics Course no. 8: Capstone Project.

The analysis follows the 6 steps of Data Analysis taught in the Google course: Ask, Prepare, Process, Analyse, Share and Act.

Step 1: Ask

In this step, we define the problem and objectives of our case study and its desired outcome.

1.0 Background:

This file contains behavior data for 5 months (Oct 2019 – Feb 2020) from a medium cosmetics online store. Here the data in Jan 2020 was selected. Data was collected by Open CDP (https://rees46.com/en/open-cdp) on large eCommerce store (1.2M visitors per month). All sensitive and personal data was removed. Broken data (like NULL price or products without categories) was removed also. 

1.1 Business Task:

Analyse the cosmetics online store sales data in Jan 2020 to gain insights into consumer behaviours and discover trends for the cosmetics online store marketing strategy.

1.2 Business Objectives:

  • What are the trends identified?
  • How could these trends help improve sales revenue?

1.3 Deliverables:

  • A clear summary of the business task
  • A description of all data sources used
  • A summary of analysis
  • Supporting visualizations and key findings
  • High-level content recommendations based on the analysis

STEP 2: PREPARE

In the Prepare phase, we identify the data being used and its limitations.

2.1 Information on Data Source:

  1. Data is publicly available on Kaggle: eCommerce Events History in Cosmetics Shop.
  2. Data collected by the Open CDP project.
  3. Each row in the file represents an event. All events are related to products and users. Each event is like a many-to-many relation between products and users.

2.2 Is Data ROCCC?

A good data source is ROCCC which stands for Reliable, Original, Comprehensive, Current, and Cited.

  • Reliable — HIGH — It has 4,264,752 observations in total
  • Original — LOW — Third party provider 
  • Comprehensive — MED — Some values in columns like brands and category_id are missing
  • Current — MED — Data is 2 years old 
  • Cited — LOW — Data collected from third party, hence unknown

Overall, the dataset is considered good quality data and it is recommended to produce analytical reports based on this data.

2.3 Data Selection

The following file is selected and copied for analysis.

2020-Jan.csv

2.4 Tool

We are using MySQL for data analysis.

STEP 3: PROCESS

Here, we will process the data by cleaning and ensuring that it is correct, relevant, complete and free of error and outlier by performing:

  • Explore and observe data
  • Check for and treat missing or null values
  • Transform data — format data type
  • Perform preliminary statistical analysis

3.1 Data cleaning and manipulation

  1. Observe and familiarize with data
  2. Check for null or missing values
  3. Perform sanity check of data

3.1.1 How to read the data

There are different types of events. See below.

Semantics (or how to read it):

User userid during session usersession added to shopping cart (property eventtype is equal cart) product productid of brand brand of category categorycode (categorycode) with price price at event_time

File structure

Property Description
event_time Time when event happened at (in UTC).
event_type Only one kind of event: purchase.
product_id ID of a product
category_id Product's category ID
category_code Product's category taxonomy (code name) if it was possible to make it. Usually present for meaningful categories and skipped for different kinds of accessories.
brand Downcased string of brand name. Can be missed.
price Float price of a product. Present.
user_id Permanent user ID.
** user_session** Temporary user's session ID. Same for each user's session. Is changed every time user come back to online store from a long pause.

Event types

Events can be:

  • view - a user viewed a product
  • cart - a user added a product to shopping cart
  • remove_from_cart - a user removed a product from shopping cart
  • purchase - a user purchased a product

Multiple purchases per session

A session can have multiple purchase events. It's ok, because it's a single order.

3.1.2. Find out the outliers

SELECT *
FROM Cosmetics_Shop
WHERE price < 0

39 rows are returned. It is impossible to have products whose price is less than 0 so those 39 rows will be removed. There are no more outliers then. Thus, we can start to analyze this this dataset though there are some missing value in category_code and brand columns.

STEP 4: ANALYSE

4.1 Key Performance Indicators (KPIs) for Ecommerce

4.1.1 Users behaviour KPI

-- numbers of each event type
SELECT SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS view_num ,
SUM(CASE WHEN event_type = 'cart' THEN 1 ELSE 0 END) AS cart_num ,
SUM(CASE WHEN event_type = 'remove_from_cart' THEN 1 ELSE 0 END) AS remove_from_cart_num ,
SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_num ,
COUNT(DISTINCT user_id) AS uv ,
ROUND(SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) / COUNT(DISTINCT user_id),0) AS 'uv/pv'
FROM Cosmetics_Shop

view_num

cart_num

remove_from_cart_num

purchase_num

uv

uv/pv

2037608

1148323

815024

263797

410073

5

Page View(PV): 2,037,608

Unique Visitors(UV): 410,073

Page View Per Visitor(UV/PV): 5

4.1.2 Users repeat purchasing rate

-- the purchasing frequency of each customer
CREATE VIEW purchase_times AS
SELECT user_id , 
event_time ,
DENSE_RANK()OVER(PARTITION BY user_id ORDER BY DAY(event_time)) AS purchase_times
FROM Cosmetics_Shop
WHERE event_type = 'purchase'
ORDER BY user_id , event_time ;

-- the rate of repeat purchasing
SELECT COUNT(user_id) AS purchase , 
(SELECT COUNT(user_id)
FROM purchase_times
WHERE purchase_times > 1) AS repeat_purchase ,
(SELECT COUNT(user_id)
FROM purchase_times
WHERE purchase_times > 1) / COUNT(user_id) AS repeat_purchase_rate
FROM purchase_times

purchase

repeat_purchase

repeat_purchase_rate

263797

34057

0.1291

The repear purchasing rate is 12.91% , which is not high enough. Thus, we need to take actions to keep customers.

4.2 Analyzing conversion funnel

Funnel: view--cart--purchase

-- numbers of each event type
SELECT event_type , COUNT(event_type) AS event_num
FROM Cosmetics_Shop
GROUP BY event_type ;

--- customers only view
SELECT COUNT(user_id)
FROM Cosmetics_Shop
WHERE user_id NOT IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'cart' OR event_type = 'remove_from_cart' OR event_type = 'purchase')

The number of the behaviour only viewing without adding to cart or purchasing is 707,542 ,  which means the bouncing rate of view-to-cart part is 707,542 / 2,037,608 = 34.7%

--- customers only add products into cart
SELECT COUNT(user_id)
FROM Cosmetics_Shop
WHERE user_id NOT IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'remove_from_cart' OR event_type = 'purchase') AND
event_type = 'cart'

The number of the behaviour only adding products into cart without purchasing is 37,960 ,  which means the bouncing rate of cart-to-purchase of adding products into cart without purchasing part is 156,163 / 1,148,323 = 13.6%

--- customers remove products from cart after adding in
SELECT COUNT(user_id)
FROM Cosmetics_Shop
WHERE user_id NOT IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'purchase') AND
event_type = 'remove_from_cart' AND 
user_id IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'cart')

The number of the behaviour removing products from cart after adding in is 357,594 ,  which means the bouncing rate of cart-to-purchase of removing products from cart after adding in part is 357,594 / 1,148,323 = 31.1%

In conclusion , we can see many customers bounced at the stage of view-to-cart and they tended to remove products from cart after adding in. Though the purchase rate was decent(13%) , we need to try to lower

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值