Author: Zhang Felix
Introduction
In the internet age, online shopping has created a revolution for consumers, growing rapidly year by year and offering a golden age of shopping. Account sharing is a common phenomenon in online shopping, and people share accounts for many purposes: sellers share their accounts with employees to sell more efficiently, and buyers share their accounts with family, friends, and roommates for a more convenient user experience. In this paper, we present a logic-based study that defines account-sharing behavior in the eBay marketplace. Using data from a large eBay behavioral data platform, we show that there are discernible and noteworthy patterns of account sharing. We also build an account-sharing classification model that captures shared accounts using user demographic, pre-switch, post-switch, switch-transition, and meta-session-level features. Our findings show that the model attains strong account-sharing detection accuracy, can improve the personalization experience on eBay, and enables analytics for device switching.
RELATED WORK
In this paper our focus is on identifying account-sharing users. Related work falls into two areas: (1) connecting sessions across device switches, and (2) classifying account-sharing behavior.
Multi-screen usage is key to the account-sharing model. The concept of multi-screen behavior was introduced by Google several years ago, and device switching has since been studied intensively; behavioral data from server logs have proven extremely valuable for studying how people switch from one device to another. In the eBay marketplace there are three device categories: desktop, cellphone, and tablet. We connect sessions that switch from one device category to another, and in this project we call such connected sessions a "meta session." There are three types of meta session.
- Sequential: session 2 starts within 30 minutes after session 1 ends.
- Overlapping: session 2 starts before session 1 ends and ends after it.
- Subsuming: session 1 fully contains session 2.
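The three meta-session types above can be sketched as a small classification function over session intervals. This is an illustrative sketch, not eBay's implementation; session times and the function name are assumptions, while the 30-minute sequential window comes from the text.

```python
# Classify the relationship between a pre-switch session (session 1)
# and a post-switch session (session 2) into the three meta-session
# types: sequential, overlapping, or subsuming.

SEQUENTIAL_GAP_SECS = 30 * 60  # 30-minute window for a sequential switch

def meta_session_type(s1_start, s1_end, s2_start, s2_end):
    """Return 'sequential', 'overlapping', 'subsuming', or None.

    Times are in seconds; session 1 is the pre-switch session and
    session 2 is the post-switch session.
    """
    if s1_start <= s2_start and s2_end <= s1_end:
        return "subsuming"      # session 1 fully contains session 2
    if s2_start < s1_end < s2_end:
        return "overlapping"    # session 2 begins before session 1 ends
    if 0 <= s2_start - s1_end <= SEQUENTIAL_GAP_SECS:
        return "sequential"     # session 2 starts within 30 minutes
    return None                 # too far apart to form a meta session
```

Sessions whose gap exceeds the 30-minute threshold are not linked into a meta session at all.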
Modeling
In order to build the account-sharing classification model, we created five groups of features.
Feature Group 1 - User Demographic Features
Specification: This group of features captures basic attributes of an eBay user.
- Buyer/Seller Indicator
- Gender
- Age Group
- Number of Children
- Single/Family Indicator
- Seller Level (CSS)
Feature Group 2 - Pre-Switch Features
Specification: This group of features describes the pre-switch session.
- Device category
- Avg leaf male fraction
- Avg leaf female fraction
- Session Duration
Feature Group 3 - Post-Switch Features
Specification: This group of features describes the post-switch session.
- Device category
- Avg leaf male fraction
- Avg leaf female fraction
- Session duration
Feature Group 4 - Switch Transition Features
Specification: This group of features describes transition attributes between the pre-switch session and the post-switch session.
- Distance
- Moving Speed
- Sequential gap duration
- Overlap gap duration
- Event Gap Count for 1 & 2 second threshold
- Switch hour bucket
- Device pair frequent count
- Leaf gender variance score
- Meta Category similarity Score
- Notification pair indicator
Feature Group 5 - Meta Session Features
Specification: This group of features describes the account's overall daily activity within the meta session.
- # unique devices
- # unique cellphones
- # unique desktops
- # unique tablets
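The meta-session counts above are per-account, per-day distinct-device tallies. A minimal sketch of how they could be derived from event logs, assuming a simple tabular schema (the column names and sample rows here are illustrative, not eBay's actual schema):

```python
import pandas as pd

# Illustrative event log: one row per device event, with the account,
# day, device identifier, and device category.
events = pd.DataFrame({
    "account_id": ["a1", "a1", "a1", "a2"],
    "date": ["2015-06-01", "2015-06-01", "2015-06-01", "2015-06-01"],
    "device_id": ["d1", "d2", "d3", "d9"],
    "device_category": ["cellphone", "desktop", "cellphone", "tablet"],
})

# Total unique devices per account per day.
totals = events.groupby(["account_id", "date"])["device_id"].nunique()

# Unique devices per account per day, broken out by device category
# (cellphone / desktop / tablet); missing categories become 0.
per_cat = (events.groupby(["account_id", "date", "device_category"])["device_id"]
                 .nunique()
                 .unstack(fill_value=0))
```

Each row of `per_cat` then supplies the per-category counts, and `totals` the overall unique-device count, for one account-day.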
We added the feature groups step by step to train our GBM model and compared the resulting models on specificity versus sensitivity.
- M1: user demographic features + pre-switch session features
- M2: M1 + post-switch features
- M3: M2 + between-switch transition features
- M4: M3 + meta-session-level features
- M5: ensemble model based on 8 M4 models
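The incremental design of M1 through M4 can be sketched by training the same gradient-boosting classifier on a growing set of feature columns and comparing ROC AUC. This sketch uses scikit-learn on synthetic data; the column counts per group and all data are illustrative assumptions standing in for the real feature groups.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 24-predictor dataset.
X, y = make_classification(n_samples=1000, n_features=24,
                           n_informative=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hypothetical cumulative column counts for M1..M4:
# demographic + pre-switch, + post-switch, + transition, + meta-session.
cumulative_cols = [8, 12, 20, 24]

aucs = []
for n_cols in cumulative_cols:
    gbm = GradientBoostingClassifier(n_estimators=50, random_state=0)
    gbm.fit(X_tr[:, :n_cols], y_tr)
    proba = gbm.predict_proba(X_te[:, :n_cols])[:, 1]
    aucs.append(roc_auc_score(y_te, proba))   # one AUC per model M1..M4
```

Comparing `aucs` across the four models mirrors the ROC comparison reported below for M1 through M4.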
GBM Parameter Tuning
We tuned the model parameters by grid search and obtained the best values: trees = 350, depth = 3, and shrinkage = 0.15. For all further model tuning, we applied these three parameters to all models.
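The original tuning output below comes from R's caret package; an equivalent grid search can be sketched in scikit-learn, where `n_estimators`, `max_depth`, and `learning_rate` correspond to the number of trees, interaction depth, and shrinkage. The data and the reduced parameter grid here are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 24-predictor training data.
X, y = make_classification(n_samples=1000, n_features=24, random_state=0)

# 3-fold CV with ROC AUC scoring, matching the resampling setup below.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "n_estimators": [100, 350],      # number of trees
        "max_depth": [1, 3],             # interaction depth
        "learning_rate": [0.05, 0.15],   # shrinkage
    },
    scoring="roc_auc",
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_   # best combination found on this data
```

The full search in the paper sweeps wider ranges (trees 100-500, depth 1-7, shrinkage 0.05-0.2), as the tables below show.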
Number of Trees = 350:
Stochastic gradient boosting; 67,500 samples; 24 predictors; 2 classes ('No', 'Yes'); no pre-processing; resampling: 3-fold cross-validation (sample sizes 44999, 45001, 45000).
Resampling results across tuning parameters:
n.trees | ROC | Sens | Spec | ROC SD | Sens SD | Spec SD |
100 | 0.958 | 0.973 | 0.472 | 0.00064 | 0.000983 | 0.00284 |
150 | 0.961 | 0.971 | 0.512 | 0.000583 | 0.00121 | 0.0046 |
200 | 0.962 | 0.969 | 0.55 | 0.000406 | 0.000684 | 0.00861 |
250 | 0.963 | 0.968 | 0.568 | 0.000547 | 5.69e-05 | 0.0103 |
300 | 0.963 | 0.967 | 0.582 | 0.000506 | 0.000591 | 0.00743 |
350 | 0.964 | 0.966 | 0.595 | 0.000618 | 0.000402 | 0.00395 |
400 | 0.964 | 0.966 | 0.599 | 0.00063 | 0.000421 | 0.00409 |
450 | 0.964 | 0.966 | 0.599 | 0.000767 | 0.000427 | 0.00686 |
500 | 0.964 | 0.966 | 0.605 | 0.000733 | 0.000733 | 0.0112 |
Interaction Depth = 3:
67,500 samples; 24 predictors; 2 classes ('No', 'Yes'); no pre-processing; resampling: 3-fold cross-validation (sample sizes 45000, 44999, 45001).
Resampling results across tuning parameters:
interaction.depth | ROC | Sens | Spec | ROC SD | Sens SD | Spec SD |
1 | 0.943 | 0.978 | 0.338 | 0.00238 | 0.00233 | 0.0177 |
3 | 0.963 | 0.965 | 0.597 | 0.00136 | 0.000994 | 0.00711 |
5 | 0.964 | 0.964 | 0.628 | 0.00159 | 0.00108 | 0.00653 |
7 | 0.964 | 0.963 | 0.629 | 0.00155 | 0.00158 | 0.00479 |
Shrinkage = 0.15:
Stochastic gradient boosting; 67,500 samples; 24 predictors; 2 classes ('No', 'Yes'); no pre-processing; resampling: 3-fold cross-validation (sample sizes 45000, 44999, 45001).
Resampling results across tuning parameters:
shrinkage | ROC | Sens | Spec | ROC SD | Sens SD | Spec SD |
0.05 | 0.961 | 0.97 | 0.531 | 0.000171 | 0.00197 | 0.0166 |
0.1 | 0.963 | 0.967 | 0.588 | 0.000417 | 0.00118 | 0.0107 |
0.15 | 0.964 | 0.965 | 0.597 | 0.0004 | 0.00115 | 0.00869 |
0.2 | 0.964 | 0.964 | 0.607 | 0.000871 | 0.00155 | 0.00758 |
Ensemble Model
Under-Sampling
Because the positive and negative classes are imbalanced (the ratio is nearly 1:8), we used under-sampling to partition the data into 8 training sets and built 8 GBM models.
Mixture Model
We blend the 8 models together to generate a predicted positive-label probability, and set the cutoff at 90% to improve precision.
Mixing with the Rule Engine
To compensate for the sacrificed recall, we combined the rule engine's output with the model predictions to produce the final labels.
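The under-sampling and blending steps can be sketched as follows: split the negatives into 8 disjoint folds, pair each fold with all positives to train 8 balanced GBMs, then average their predicted positive probabilities and apply the 0.9 cutoff. This is an illustrative scikit-learn sketch on synthetic data; the fold construction and model sizes are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic imbalanced data, roughly 1 positive to 8 negatives.
X, y = make_classification(n_samples=1800, weights=[8 / 9, 1 / 9],
                           random_state=0)
pos = np.where(y == 1)[0]
neg = np.where(y == 0)[0]

# Under-sampling: partition the negatives into 8 disjoint folds.
rng = np.random.default_rng(0)
neg_folds = np.array_split(rng.permutation(neg), 8)

# Train one GBM per fold, each on all positives + one negative fold.
models = []
for fold in neg_folds:
    idx = np.concatenate([pos, fold])
    m = GradientBoostingClassifier(n_estimators=50, random_state=0)
    m.fit(X[idx], y[idx])
    models.append(m)

# Blend: mean positive-class probability across the 8 models,
# then apply a 0.9 cutoff to favor precision.
blended = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
labels = (blended >= 0.9).astype(int)
```

The high cutoff trades recall for precision, which is why the rule engine is mixed back in to recover recall in the final labels.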
ROC comparison:
Model | Model Description | ROC |
M1 | user demographics + pre_switch features | 0.6273 |
M2 | M1 + post_switch features | 0.6558 |
M3 | M2 + between_switch transition features | 0.9381 |
M4 | M3 + meta session level features | 0.938 |
M5 | 8 M4 Ensemble Model | 0.9696 |
Variable Importance Chart:
Feature | Overall |
mob_dev_cnt | 100 |
pc_dev_cnt | 71.4528 |
seq_gap_dur | 24.3231 |
notif_pair_as_label | 20.7925 |
tab_dev_cnt | 15.6279 |
device_pair_cnt | 9.9352 |
overlap_gap_dur | 7.6 |
meta_categ_similarity_score | 6.8111 |
ttl_dev_cnt | 5.7397 |
sec_gap_pct | 3.5903 |
sec_gap_cnt | 2.308 |
to_sess_dur | 1.5664 |
from_sess_dur | 1.1491 |
leaf_gender_diff_score | 1.0783 |
to_avg_female_pct | 1.0188 |
is_buyer | 0.5936 |
to_avg_male_pct | 0.424 |
user_age | 0.3801 |
from_avg_female_pct | 0.3113 |
switch_hour_bucket | 0.2983 |