Google Capstone Project: eCommerce Events History in Cosmetics Shop

This is an optional capstone project from the Google Data Analytics Course no. 8: Capstone Project.

The analysis follows the 6 steps of Data Analysis taught in the Google course: Ask, Prepare, Process, Analyse, Share and Act.

Step 1: Ask

In this step, we define the problem and objectives of our case study and its desired outcome.

1.0 Background:

This file contains behavior data for 5 months (Oct 2019 – Feb 2020) from a medium cosmetics online store. Here the data in Jan 2020 was selected. Data was collected by Open CDP (https://rees46.com/en/open-cdp) on large eCommerce store (1.2M visitors per month). All sensitive and personal data was removed. Broken data (like NULL price or products without categories) was removed also. 

1.1 Business Task:

Analyse the cosmetics online store sales data in Jan 2020 to gain insights into consumer behaviours and discover trends for the cosmetics online store marketing strategy.

1.2 Business Objectives:

  • What are the trends identified?
  • How could these trends help improve sales revenue?

1.3 Deliverables:

  • A clear summary of the business task
  • A description of all data sources used
  • A summary of analysis
  • Supporting visualizations and key findings
  • High-level content recommendations based on the analysis

STEP 2: PREPARE

In the Prepare phase, we identify the data being used and its limitations.

2.1 Information on Data Source:

  1. Data is publicly available on Kaggle: eCommerce Events History in Cosmetics Shop.
  2. Data collected by the Open CDP project.
  3. Each row in the file represents an event. All events are related to products and users. Each event is like a many-to-many relation between products and users.

2.2 Is Data ROCCC?

A good data source is ROCCC which stands for Reliable, Original, Comprehensive, Current, and Cited.

  • Reliable — HIGH — It has 4,264,752 observations in total
  • Original — LOW — Third party provider 
  • Comprehensive — MED — Some values in columns like brands and category_id are missing
  • Current — MED — Data is 2 years old 
  • Cited — LOW — Data collected from third party, hence unknown

Overall, the dataset is considered good quality data and it is recommended to produce analytical reports based on this data.

2.3 Data Selection

The following file is selected and copied for analysis.

2020-Jan.csv

2.4 Tool

We are using MySQL for data analysis.

STEP 3: PROCESS

Here, we will process the data by cleaning and ensuring that it is correct, relevant, complete and free of error and outlier by performing:

  • Explore and observe data
  • Check for and treat missing or null values
  • Transform data — format data type
  • Perform preliminary statistical analysis

3.1 Data cleaning and manipulation

  1. Observe and familiarize with data
  2. Check for null or missing values
  3. Perform sanity check of data

3.1.1 How to read the data

There are different types of events. See below.

Semantics (or how to read it):

User userid during session usersession added to shopping cart (property eventtype is equal cart) product productid of brand brand of category categorycode (categorycode) with price price at event_time

File structure

PropertyDescription
event_timeTime when event happened at (in UTC).
event_typeOnly one kind of event: purchase.
product_idID of a product
category_idProduct's category ID
category_codeProduct's category taxonomy (code name) if it was possible to make it. Usually present for meaningful categories and skipped for different kinds of accessories.
brandDowncased string of brand name. Can be missed.
priceFloat price of a product. Present.
user_idPermanent user ID.
** user_session**Temporary user's session ID. Same for each user's session. Is changed every time user come back to online store from a long pause.

Event types

Events can be:

  • view - a user viewed a product
  • cart - a user added a product to shopping cart
  • remove_from_cart - a user removed a product from shopping cart
  • purchase - a user purchased a product

Multiple purchases per session

A session can have multiple purchase events. It's ok, because it's a single order.

3.1.2. Find out the outliers

SELECT *
FROM Cosmetics_Shop
WHERE price < 0

39 rows are returned. It is impossible to have products whose price is less than 0 so those 39 rows will be removed. There are no more outliers then. Thus, we can start to analyze this this dataset though there are some missing value in category_code and brand columns.

STEP 4: ANALYSE

4.1 Key Performance Indicators (KPIs) for Ecommerce

4.1.1 Users behaviour KPI

-- numbers of each event type
SELECT SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS view_num ,
SUM(CASE WHEN event_type = 'cart' THEN 1 ELSE 0 END) AS cart_num ,
SUM(CASE WHEN event_type = 'remove_from_cart' THEN 1 ELSE 0 END) AS remove_from_cart_num ,
SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_num ,
COUNT(DISTINCT user_id) AS uv ,
ROUND(SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) / COUNT(DISTINCT user_id),0) AS 'uv/pv'
FROM Cosmetics_Shop

view_num

cart_num

remove_from_cart_num

purchase_num

uv

uv/pv

2037608

1148323

815024

263797

410073

5

Page View(PV): 2,037,608

Unique Visitors(UV): 410,073

Page View Per Visitor(UV/PV): 5

4.1.2 Users repeat purchasing rate

-- the purchasing frequency of each customer
CREATE VIEW purchase_times AS
SELECT user_id , 
event_time ,
DENSE_RANK()OVER(PARTITION BY user_id ORDER BY DAY(event_time)) AS purchase_times
FROM Cosmetics_Shop
WHERE event_type = 'purchase'
ORDER BY user_id , event_time ;

-- the rate of repeat purchasing
SELECT COUNT(user_id) AS purchase , 
(SELECT COUNT(user_id)
FROM purchase_times
WHERE purchase_times > 1) AS repeat_purchase ,
(SELECT COUNT(user_id)
FROM purchase_times
WHERE purchase_times > 1) / COUNT(user_id) AS repeat_purchase_rate
FROM purchase_times

purchase

repeat_purchase

repeat_purchase_rate

263797

34057

0.1291

The repear purchasing rate is 12.91% , which is not high enough. Thus, we need to take actions to keep customers.

4.2 Analyzing conversion funnel

Funnel: view--cart--purchase

-- numbers of each event type
SELECT event_type , COUNT(event_type) AS event_num
FROM Cosmetics_Shop
GROUP BY event_type ;

--- customers only view
SELECT COUNT(user_id)
FROM Cosmetics_Shop
WHERE user_id NOT IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'cart' OR event_type = 'remove_from_cart' OR event_type = 'purchase')

The number of the behaviour only viewing without adding to cart or purchasing is 707,542 ,  which means the bouncing rate of view-to-cart part is 707,542 / 2,037,608 = 34.7%

--- customers only add products into cart
SELECT COUNT(user_id)
FROM Cosmetics_Shop
WHERE user_id NOT IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'remove_from_cart' OR event_type = 'purchase') AND
event_type = 'cart'

The number of the behaviour only adding products into cart without purchasing is 37,960 ,  which means the bouncing rate of cart-to-purchase of adding products into cart without purchasing part is 156,163 / 1,148,323 = 13.6%

--- customers remove products from cart after adding in
SELECT COUNT(user_id)
FROM Cosmetics_Shop
WHERE user_id NOT IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'purchase') AND
event_type = 'remove_from_cart' AND 
user_id IN
(SELECT user_id
FROM Cosmetics_Shop
WHERE event_type = 'cart')

The number of the behaviour removing products from cart after adding in is 357,594 ,  which means the bouncing rate of cart-to-purchase of removing products from cart after adding in part is 357,594 / 1,148,323 = 31.1%

In conclusion , we can see many customers bounced at the stage of view-to-cart and they tended to remove products from cart after adding in. Though the purchase rate was decent(13%) , we need to try to lower the boucing rates of each part as much as we can.

4.3 Analyzing the reason behind high bouncing rate

4.3.1 The reason why customers only view without adding to cart

I pick Top 100 sales products and Top 100 views products to see how many of them are matched.

CREATE VIEW Top_100_sales AS
SELECT product_id , COUNT(product_id)
FROM Cosmetics_Shop
WHERE event_type = 'purchase'
GROUP BY product_id
ORDER BY COUNT(product_id) DESC
LIMIT 100 ;
 
CREATE VIEW Top_100_view AS
SELECT product_id , COUNT(product_id)
FROM Cosmetics_Shop
WHERE event_type = 'view'
GROUP BY product_id
ORDER BY COUNT(product_id) DESC
LIMIT 100 ;
 
SELECT *
FROM Top_100_view
JOIN Top_100_sales
USING (product_id)

 Only 37 of 100 are matched. It can be concluded those products viewed most are not those that are purchased under most circumstances. Thus , it is important to improve our algorithm and recommendation system so that customers can see products they are interested in more often.

4.4 Analyzing average hourly and daily view activity

SELECT hr ,
AVG(view) AS avg_view,
AVG(CASE WHEN wkday = 0 THEN view END) AS mon ,
AVG(CASE WHEN wkday = 1 THEN view END) AS tue ,
AVG(CASE WHEN wkday = 2 THEN view END) AS wed ,
AVG(CASE WHEN wkday = 3 THEN view END) AS thu ,
AVG(CASE WHEN wkday = 4 THEN view END) AS fri ,
AVG(CASE WHEN wkday = 5 THEN view END) AS sat ,
AVG(CASE WHEN wkday = 6 THEN view END) AS sun
FROM (
SELECT HOUR(event_time) AS hr,
WEEKDAY(event_time) AS wkday,
COUNT(event_type) AS view
FROM Cosmetics_Shop
WHERE event_type = 'view'
GROUP BY 1 , 2 ) AS hourly_view
GROUP BY hr

hr

avg_view

mon

tue

wed

thu

fri

sat

sun

0

2991.7143

2791

3058

3153

3449

3197

2683

2611

1

2421

2023

2255

2810

3121

2649

2000

2089

2

2546.7143

2336

2494

2508

3186

3099

2039

2165

3

3543.5714

3404

3500

3624

4697

4506

2538

2536

4

5000.5714

4972

5070

5340

6115

6536

3542

3429

5

7397

7197

8217

8142

8759

9109

5113

5242

6

10128.1429

9844

11238

11112

12172

12539

6950

7042

7

12603

13159

13378

13617

14237

14839

9388

9603

8

14031.1429

13325

14881

15690

16322

16451

10544

11005

9

15467.2857

14950

17299

16810

18075

16753

11818

12566

10

16124.4286

15129

16534

18036

18580

18109

12672

13811

11

16764.7143

15851

17562

18012

19819

18822

12762

14525

12

17288.7143

18667

17660

19612

19431

19070

11941

14640

13

16970.2857

19481

16311

17521

19687

18836

12145

14811

14

15801.8571

16412

15554

16585

17856

17054

12719

14433

15

15346.4286

14902

14973

16372

16336

16524

13149

15169

16

15940

15985

14962

17792

16705

16747

13797

15592

17

17522.7143

17531

16967

18522

18736

18479

15705

16719

18

19101.8571

18877

18321

20372

21281

19525

16552

18785

19

20014.1429

20689

18582

21281

22189

20193

17395

19770

20

18586.8571

18476

17363

20530

20258

19185

16094

18202

21

12676.7143

11757

11522

13760

13699

13265

12287

12447

22

7997.2857

7388

7199

9428

7676

8220

8057

8013

23

4820.7143

4612

4395

5338

5029

4592

4867

4912

As we can see, more customers view our products on weekdays than they did on weekends. The view behaviour mainly happened during 11AM to 1PM and 6PM to 8PM and the reason is that it’s the time period people take a rest during the working hours and finishing their work. To improve the conversion rate , it’s better to push notifications and send them emails or messages during this time period. 

4.5 Analyzing Top 10 spending customers

SELECT user_id , ROUND(SUM(price),0) AS total_spend
FROM Cosmetics_Shop
WHERE event_type = 'purchase'
GROUP BY user_id
ORDER BY ROUND(SUM(price),0) DESC
LIMIT 10

-- Then we analyse what did the 3rd customer No.469299888 purchasing record
SELECT product_id , COUNT(product_id)
FROM Cosmetics_Shop
WHERE event_type = 'purchase' AND user_id = 469299888
GROUP BY product_id
ORDER BY COUNT(product_id) DESC

As we can see , the customer No.469299888 bought 55 different kinds of products. We need to spend more time on finding out what products these users are really interested in so that they will ‘fall in love’ with our store.

4.6 Analyzing customers values with RFM model

4.6.1 Calculate recency and frequency of each customer

-- calculate recency and frequency of each customer
CREATE VIEW recency_and_frequency AS
SELECT DISTINCT user_id,
DATEDIFF('2020-01-31',MAX(event_time)) AS recency,
COUNT(*) AS frequency,
DATE(MAX(event_time)) AS latest_purchase_time
FROM Cosmetics_Shop
WHERE event_type = 'purchase'
GROUP BY user_id 

 4.6.2 Define the value of each customer

ScoreRecencyFrequency
10-7>6
28-154-6
316-232-3
424-300-1
-- define the value of each customer
CREATE VIEW rf_score AS
SELECT user_id,
(CASE WHEN recency BETWEEN 24 AND 30 THEN 4
WHEN recency BETWEEN 16 AND 23 THEN 3
WHEN recency BETWEEN 8 AND 15 THEN 2
WHEN recency BETWEEN 0 AND 7 THEN 1 END) AS r_score ,
(CASE WHEN frequency BETWEEN 0 AND 1 THEN 4
WHEN frequency BETWEEN 2 AND 3 THEN 3
WHEN frequency BETWEEN 4 AND 6 THEN 2
WHEN frequency > 6 THEN 1 END) AS f_score
FROM recency_and_frequency ;
 
-- calculate the average of recency and frequency scores
SELECT AVG(r_score) , AVG(f_score)
FROM rf_score ;

Average R_score : 2.2188

Average F_score : 1.9153

-- separate customers into different categories
CREATE VIEW ranking AS
SELECT user_id ,
(CASE WHEN r_score >= 2.2188 AND f_score >= 1.9153 THEN 'A'
WHEN r_score >= 2.2188 AND f_score < 1.9153 THEN 'B'
WHEN r_score < 2.2188 AND f_score >= 1.9153 THEN 'C'
WHEN r_score < 2.2188 AND f_score < 1.9153 THEN 'D' END) AS ranking
FROM rf_score ; 
 
-- number of each ranking
SELECT ranking , COUNT(ranking)
FROM ranking
GROUP BY ranking
ORDER BY ranking 

ranking

COUNT(ranking)

A

6402

B

4997

C

8482

D

8339

The proportion of category A : 22.69% 

The proportion of category B : 17.71%

The proportion of category C : 30.06%

The proportion of category D : 29.55%

As we can see , category C and D accounts for the majority of customers. Thus, we should find out products they are interested in and put them on their homepages more often so that they are more likely to buy our products. We can also set up questionnaires to understand the customers' needs and where we can improve on.

4.6 Product analysis

SELECT product_id ,
SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS view_num ,
SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchase_num ,
SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) / SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS CVR ,
ROUND(SUM(CASE WHEN event_type = 'purchase' THEN price END),2) AS Total_Revenue ,
ROUND((SUM(CASE WHEN event_type = 'purchase' THEN price END) / SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END)),2) AS average_order_value ,
ROUND((SUM(CASE WHEN event_type = 'purchase' THEN price END) / SUM(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END)),2) AS revenue_per_view
FROM Cosmetics_Shop
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 20

product_id

View_Num

Purchase_Num

CVR

Total_Revenue

Average_Order_Value

revenue_per_view

5560754

3092

85

0.0275

16527.4

194.44

5.35

5809910

24044

2029

0.0844

10631.96

5.24

0.44

5751422

4703

798

0.1697

8738.1

10.95

1.86

5849033

4320

676

0.1565

6976.32

10.32

1.61

5560756

1545

33

0.0214

6862.02

207.94

4.44

5751383

5234

648

0.1238

6687.36

10.32

1.28

5792800

2937

586

0.1995

6047.52

10.32

2.06

5809912

7678

953

0.1241

4993.72

5.24

0.65

5917178

10955

293

0.0267

4790.55

16.35

0.44

5751742

1502

318

0.2117

4519.6

14.21

3.01

5846437

2774

99

0.0357

4478.76

45.24

1.61

5810480

485

194

0.4

4372.76

22.54

9.02

5855507

1435

53

0.0369

4198.13

79.21

2.93

5823970

224

156

0.6964

4185.48

26.83

18.69

5861760

1611

53

0.0329

3785.79

71.43

2.35

5820270

180

141

0.7833

3783.03

26.83

21.02

5528035

2782

394

0.1416

3750.88

9.52

1.35

5822650

468

184

0.3932

3505.2

19.05

7.49

5877453

2198

64

0.0291

3474.56

54.29

1.58

5692527

1467

75

0.0511

3451.5

46.02

2.35

There are different categories in the top 20 revenue products.

First, for products like 5560754 and 5560756, their prices are quite high, the conversion rates and revenue per views are low. Thus, we need to improve our marketing  in promoting campaigns like Black Friday so that we can target people who are willing to buy these products. As a result, we can witness a rapid growth in sales revenue.

Secondly, for products like 5810480 , 5823970 , 5820270 , their view numbers are low, however , their purchase numbers and conversion rates are very impressive. We can invest more or increase our marketing budgets in these products so that we can optimize our marketing investments. 

STEP 5: Share

STEP 6: ACT

In the final step, we will deliver our insights and providing recommendations based on our analysis.

Here, we revisit our business questions and share with you our high-level business recommendations.

1. What are the trends identified?

  • Majority of customers were bouncing after viewing the products (34.7%) and removing products from the cart (31.1%).
  • Customers prefer to view products during weekdays as compared to weekends - perhaps because they are willing to spend their time fragmented when they are taking a break or on their way home.

2. How could these trends help improve sales revenue?

  • Marketing team can build a more effective recommendation system by improving the accuracy so that people can find products fitting their requirements more , which is beneficial for us to keep our customers loyal to our store.
  • On weekends, we can also prompt notification to encourage users to view and recommend our products or store to their friends when they are spending time together.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值