Shelf Classification

To standardize shelf management, the shelves are classified by their sales (GMV) and stock loss, so that shelves of different quality tiers can receive different levels of service.

To keep the number of classes small (roughly 3-6 is expected), K-Means is used for the clustering, since it lets us specify the number of clusters directly; the best cluster count is then chosen from the results.

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn import metrics
import matplotlib.pyplot as plt
# Read in the data
datas = pd.read_csv('C:/Users/acer/Desktop/sf-data.CSV')
datas
 
          ID   GMV  LOSE        LR
0    A275000  4481   578  0.128989
1    A132634  4383   399  0.091034
2    A165561  4300   348  0.080930
3    A524005  4285   592  0.138156
4    A450039  4273   564  0.131992
5    A402586  4229   680  0.160795
6    A389480  4207   552  0.131210
7    A346645  4176   523  0.125239
8    A773916  4124   361  0.087536
9    A323355  4101   693  0.168983
10   A506280  4092   612  0.149560
11   A271269  3765   408  0.108367
12   A716208  3712   537  0.144666
13   A520253  3690   680  0.184282
14   A372283  3553   414  0.116521
15   A271421  3550   413  0.116338
16   A397239  3538   359  0.101470
17   A358485  3528   462  0.130952
18   A526219  3498   534  0.152659
19   A508231  3436   344  0.100116
20   A510649  3341   537  0.160730
21   A538729  3302   432  0.130830
22   A668628  3279   527  0.160720
23   A782193  3264   316  0.096814
24   A561473  3259   366  0.112304
25   A424867  3253   341  0.104826
26   A470397  3197   554  0.173287
27   A630001  3171   570  0.179754
28   A433220  3158   558  0.176694
29   A365840  3091   604  0.195406
..       ...   ...   ...       ...
584  A341195   104    11  0.105769
585  A799917   100    13  0.130000
586  A538103    96    34  0.354167
587  A217965    94    41  0.436170
588  A493991    93    21  0.225806
589  A228910    88    28  0.318182
590  A726803    71    37  0.521127
591  A738820    67    68  1.014925
592  A643449    66    67  1.015152
593  A508898    62    35  0.564516
594  A286848    58    44  0.758621
595  A407985    54    21  0.388889
596  A464971    51    52  1.019608
597  A201682    47    21  0.446809
598  A516307    47    11  0.234043
599  A725856    39    11  0.282051
600  A612603    34    13  0.382353
601  A663156    27    28  1.037037
602  A167316    23     4  0.173913
603  A161877    22    39  1.772727
604  A706346     0    29  0.000000
605  A444550     0     4  0.000000
606  A559078     0    26  0.000000
607  A574825     0    61  0.000000
608  A203674     0    57  0.000000
609  A351185     0     8  0.000000
610  A614387     0    24  0.000000
611  A707170     0    20  0.000000
612  A681535     0     9  0.000000
613  A727140     0    70  0.000000

614 rows × 4 columns

 

Per the requirements, shelves with very low sales or very high loss are to be removed, so rows with a loss rate LR above 25% combined with sales below 500, as well as rows with sales below 100, are dropped.

# Keep only shelves with GMV above 100
datas = datas[datas.GMV>100]
# Drop low-sales (GMV < 500), high-loss (LR > 0.25) shelves
datas = datas.drop(datas[(datas.GMV<500) & (datas.LR>0.25)].index)
# Keep only the three model features
data = datas[['GMV','LOSE','LR']]
data
 
      GMV  LOSE        LR
0    4481   578  0.128989
1    4383   399  0.091034
2    4300   348  0.080930
3    4285   592  0.138156
4    4273   564  0.131992
5    4229   680  0.160795
6    4207   552  0.131210
7    4176   523  0.125239
8    4124   361  0.087536
9    4101   693  0.168983
10   4092   612  0.149560
11   3765   408  0.108367
12   3712   537  0.144666
13   3690   680  0.184282
14   3553   414  0.116521
15   3550   413  0.116338
16   3538   359  0.101470
17   3528   462  0.130952
18   3498   534  0.152659
19   3436   344  0.100116
20   3341   537  0.160730
21   3302   432  0.130830
22   3279   527  0.160720
23   3264   316  0.096814
24   3259   366  0.112304
25   3253   341  0.104826
26   3197   554  0.173287
27   3171   570  0.179754
28   3158   558  0.176694
29   3091   604  0.195406
..    ...   ...       ...
531   177    13  0.073446
532   176    40  0.227273
533   176    36  0.204545
535   174     8  0.045977
538   164    41  0.250000
540   163    34  0.208589
541   158    21  0.132911
542   158     2  0.012658
543   156    34  0.217949
545   156     7  0.044872
549   151    33  0.218543
550   151    17  0.112583
551   149    31  0.208054
552   149    37  0.248322
553   146    17  0.116438
556   139    21  0.151079
557   135    24  0.177778
559   133    21  0.157895
562   131    20  0.152672
564   130    11  0.084615
565   127    21  0.165354
566   123    25  0.203252
569   118    13  0.110169
571   117    23  0.196581
572   115     1  0.008696
573   111    10  0.090090
576   110    23  0.209091
580   106     7  0.066038
582   105     8  0.076190
584   104    11  0.105769

541 rows × 3 columns
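As a sanity check of the two-stage filter above, the same logic can be run on a tiny hand-made frame (the column names match sf-data; the values are invented for illustration):

```python
import pandas as pd

# Toy frame mimicking the sf-data columns; values are made up
toy = pd.DataFrame({
    'ID':   ['A000001', 'A000002', 'A000003', 'A000004'],
    'GMV':  [4481, 400, 50, 300],
    'LOSE': [578, 120, 10, 30],
})
toy['LR'] = toy['LOSE'] / toy['GMV']

# Keep shelves selling more than 100, then drop low-sales/high-loss shelves
toy = toy[toy.GMV > 100]
toy = toy.drop(toy[(toy.GMV < 500) & (toy.LR > 0.25)].index)
print(list(toy.ID))  # ['A000001', 'A000004']
```

A000003 falls to the GMV > 100 filter, and A000002 (GMV 400, LR 0.30) falls to the combined low-sales/high-loss rule.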


#GMV and LOSE are on very different scales, so standardize the data before fitting
scaler = StandardScaler().fit(data.astype(float))
data = scaler.transform(data.astype(float))
data = pd.DataFrame({'GMV': data[:, 0], 'LOSE': data[:, 1], 'LR': data[:, 2]})
data
 
          GMV      LOSE        LR
0    4.236391  3.468713  3.468713
1    4.124695  2.175132  2.175132
2    4.030095  1.806570  1.806570
3    4.012999  3.569887  3.569887
4    3.999322  3.367539  3.367539
5    3.949173  4.205837  4.205837
6    3.924098  3.280819  3.280819
7    3.888766  3.071244  3.071244
8    3.829499  1.900517  1.900517
9    3.803284  4.299784  4.299784
10   3.793027  3.714421  3.714421
11   3.420327  2.240173  2.240173
12   3.359920  3.172418  3.172418
13   3.334846  4.205837  4.205837
14   3.178700  2.283533  2.283533
15   3.175280  2.276306  2.276306
16   3.161603  1.886064  1.886064
17   3.150206  2.630415  2.630415
18   3.116013  3.150738  3.150738
19   3.045349  1.777663  1.777663
20   2.937072  3.172418  3.172418
21   2.892622  2.413614  2.413614
22   2.866407  3.100151  3.100151
23   2.849311  1.575316  1.575316
24   2.843612  1.936651  1.936651
25   2.836774  1.755983  1.755983
26   2.772948  3.295272  3.295272
27   2.743314  3.410899  3.410899
28   2.728497  3.324179  3.324179
29   2.652134  3.656607  3.656607
..        ...       ...       ...
511 -0.669107 -0.614377 -0.614377
512 -0.670246 -0.419256 -0.419256
513 -0.670246 -0.448163 -0.448163
514 -0.672526 -0.650511 -0.650511
515 -0.683923 -0.412029 -0.412029
516 -0.685063 -0.462616 -0.462616
517 -0.690762 -0.556563 -0.556563
518 -0.690762 -0.693871 -0.693871
519 -0.693041 -0.462616 -0.462616
520 -0.693041 -0.657737 -0.657737
521 -0.698740 -0.469843 -0.469843
522 -0.698740 -0.585470 -0.585470
523 -0.701020 -0.484296 -0.484296
524 -0.701020 -0.440936 -0.440936
525 -0.704439 -0.585470 -0.585470
526 -0.712417 -0.556563 -0.556563
527 -0.716976 -0.534883 -0.534883
528 -0.719256 -0.556563 -0.556563
529 -0.721535 -0.563790 -0.563790
530 -0.722675 -0.628830 -0.628830
531 -0.726094 -0.556563 -0.556563
532 -0.730653 -0.527657 -0.527657
533 -0.736352 -0.614377 -0.614377
534 -0.737492 -0.542110 -0.542110
535 -0.739771 -0.701098 -0.701098
536 -0.744330 -0.636057 -0.636057
537 -0.745470 -0.542110 -0.542110
538 -0.750029 -0.657737 -0.657737
539 -0.751169 -0.650511 -0.650511
540 -0.752309 -0.628830 -0.628830

541 rows × 3 columns
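One caveat with clustering on standardized columns: the cluster centres come out in z-score units. If the centres need to be reported back in the original GMV/LOSE/LR units, `StandardScaler.inverse_transform` undoes the scaling. A minimal sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows in the original GMV / LOSE / LR units
X = np.array([[4481., 578., 0.13],
              [ 100.,  13., 0.13],
              [2000., 300., 0.15]])

scaler = StandardScaler().fit(X)
Xz = scaler.transform(X)             # z-scores used for clustering
back = scaler.inverse_transform(Xz)  # back to original units

print(np.allclose(back, X))  # True
```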


#Scatter plot of the standardized GMV against LOSE
plt.scatter(data['GMV'],data['LOSE'])
plt.show()
 

Cluster the shelves into 3 to 12 classes:

n = 0
for k in range(3,13):
    n += 1
    plt.subplot(5,2,n)
    models = KMeans(n_clusters = k).fit(data)
    sorts = models.predict(data)
    # calinski_harabaz_score was renamed calinski_harabasz_score in newer scikit-learn
    scores = metrics.calinski_harabasz_score(data, sorts)
    print(k,'clusters, score:',scores)
    plt.scatter(data['GMV'], data['LOSE'], c=sorts)
    plt.text(.99, .01, ('k=%d, scores: %.2f' % (k,scores)),
                 transform=plt.gca().transAxes, size=10,
                 horizontalalignment='right')
plt.show()
3 clusters, score: 2410.732723770415
4 clusters, score: 2386.4614419681943
5 clusters, score: 2522.589657882834
6 clusters, score: 2667.9083791621647
7 clusters, score: 2525.5496643415527
8 clusters, score: 2631.022208672163
9 clusters, score: 2640.4237671093597
10 clusters, score: 2784.043529129476
11 clusters, score: 2885.534381287436
12 clusters, score: 2887.893470979444

Across the 3-to-12-class runs, the score generally rises as the partition gets finer, i.e. more classes separate the data better; beyond about 9 classes the score roughly levels off.

However, operations grow more complex with every extra shelf class, so the aim is the smallest number of classes that still simplifies the business, which narrows the choice to 3, 5 or 6 classes.

Five classes are recommended. Six classes add a category without a meaningful score advantage over 3 or 5 classes, and while 3 and 5 classes score almost the same, three classes are too coarse to differentiate the service.

Five classes allow finer-grained service without making the business overly complex.
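Since the Calinski–Harabasz score here keeps creeping upward as k grows, the choice of k can be cross-checked with a second criterion such as the silhouette score, which is bounded in [-1, 1] and tends to peak at a moderate k. A sketch on synthetic blobs (not the shelf data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with 5 underlying groups
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)

scores = {}
for k in range(3, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

A k whose silhouette score is close to the maximum while staying small is a reasonable compromise, which matches the reasoning above.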

 

Assigning each shelf to a class

model = KMeans(n_clusters = 5)
K_Means = model.fit(data)
K_Means.labels_
array([3, 4, 4, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 4, 4, 4, 3, 4, 3, 4,
       3, 4, 4, 4, 3, 3, 3, 3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 0, 2,
       2, 0, 2, 2, 0, 0, 0, 2, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0, 0, 2, 0, 2,
       0, 0, 2, 2, 0, 0, 2, 0, 2, 2, 0, 0, 2, 2, 0, 2, 2, 0, 2, 0, 0, 0,
       2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 0,
       2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 0, 0, 0, 0, 2, 0, 0, 2, 0,
       2, 0, 2, 2, 2, 0, 0, 2, 0, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0, 2, 0, 0,
       2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 0, 0, 2, 2, 2, 2, 2, 0,
       2, 0, 0, 2, 2, 2, 2, 0, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 2, 0, 2, 0,
       0, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 2, 0, 2, 0, 0, 0, 2, 2, 0, 0, 2,
       2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
data.insert(3,column = 'grade', value = K_Means.labels_)
data.head()
        GMV      LOSE        LR  grade
0  4.236391  3.468713  3.468713      3
1  4.124695  2.175132  2.175132      4
2  4.030095  1.806570  1.806570      4
3  4.012999  3.569887  3.569887      3
4  3.999322  3.367539  3.367539      3
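With the grade column attached, each cluster can be profiled by its mean indicators and its size, which makes the class descriptions easy to verify against the data. A sketch on a small made-up frame with the same column names:

```python
import pandas as pd

# Small made-up frame with the same columns as the clustered data
df = pd.DataFrame({
    'GMV':   [4.2, 3.9, -0.6, -0.7, 1.0, 1.1],
    'LOSE':  [3.4, 3.3, -0.5, -0.6, 0.2, 0.3],
    'LR':    [3.4, 3.3, -0.5, -0.6, 0.2, 0.3],
    'grade': [3, 3, 2, 2, 0, 0],
})

# Mean indicator per cluster, and the cluster sizes
profile = df.groupby('grade')[['GMV', 'LOSE', 'LR']].mean()
sizes = df['grade'].value_counts().sort_index()
print(profile)   # e.g. grade 3 averages GMV 4.05
print(sizes)
```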

Each shelf class plotted against its sales/loss indicators

plt.figure(figsize=(14,7), facecolor='w')
plt.ylim(-1,5)
plt.plot(data['LR'], data['grade'], 'bo', markersize = 8, zorder=2, label='LR')
plt.plot(data['GMV'], data['grade'], 'go', markersize = 16, zorder=1, label='GMV')
plt.legend(loc = 'upper left')
plt.xlabel('GMV or LR', fontsize=18)
plt.ylabel('grade', fontsize=18)
plt.title('classification and indicators', fontsize=20)
plt.show()
#Share of total GMV and total loss contributed by each shelf class
GMV_sum = sum(abs(data['GMV']))
LR_sum = sum(abs(data['LR']))
print('Class 1 GMV share:',(sum(abs(data[data.grade == 0].GMV))/GMV_sum)*100)
print('Class 1 loss share:',(sum(abs(data[data.grade == 0].LR))/LR_sum)*100)
print('Class 2 GMV share:',(sum(abs(data[data.grade == 1].GMV))/GMV_sum)*100)
print('Class 2 loss share:',(sum(abs(data[data.grade == 1].LR))/LR_sum)*100)
print('Class 3 GMV share:',(sum(abs(data[data.grade == 2].GMV))/GMV_sum)*100)
print('Class 3 loss share:',(sum(abs(data[data.grade == 2].LR))/LR_sum)*100)
print('Class 4 GMV share:',(sum(abs(data[data.grade == 3].GMV))/GMV_sum)*100)
print('Class 4 loss share:',(sum(abs(data[data.grade == 3].LR))/LR_sum)*100)
print('Class 5 GMV share:',(sum(abs(data[data.grade == 4].GMV))/GMV_sum)*100)
print('Class 5 loss share:',(sum(abs(data[data.grade == 4].LR))/LR_sum)*100)
Class 1 GMV share: 6.164401861697537
Class 1 loss share: 6.652831058827551
Class 2 GMV share: 9.089795550942922
Class 2 loss share: 15.101246088993388
Class 3 GMV share: 45.97416334065695
Class 3 loss share: 46.421436906323414
Class 4 GMV share: 19.726729310594852
Class 4 loss share: 19.542840412045173
Class 5 GMV share: 19.04490993610771
Class 5 loss share: 12.281645533810488

Grouping the shelves by sales and loss thus yields five classes:

Class 1: low sales, low-to-mid loss. Some of these shelves lose far more than they earn; those should be considered for removal.

Class 2: mid sales, mid-to-high loss, with losses exceeding earnings. Again, some of these shelves lose far more than they earn and should be considered for removal.

Class 3: low sales, low loss, earning more than they lose; removal can be weighed against maintenance cost.

Class 4: high sales, high loss. These shelves contribute large GMV, but the loss nearly cancels out the earnings; loss reduction is the priority.

Class 5: high sales, low loss. These are the premium shelves.
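KMeans label numbers are arbitrary (they can change between runs), so when turning the five classes into operational rules it is safer to key decisions on each cluster's profile rather than on the raw label. A hypothetical sketch, with invented profile numbers and thresholds:

```python
import pandas as pd

# Hypothetical per-cluster mean standardized GMV and LR (invented numbers)
profile = pd.DataFrame({
    'GMV': [-0.6, 0.5, -0.3, 3.5, 3.2],
    'LR':  [ 0.8, 1.2, -0.4, 3.3, -0.2],
}, index=[0, 1, 2, 3, 4])

def suggest_action(row):
    # Illustrative thresholds only, not the article's actual policy
    if row.GMV < 0 and row.LR > 0:
        return 'consider removal'
    if row.GMV > 2 and row.LR > 2:
        return 'reduce loss'
    if row.GMV > 2:
        return 'premium shelf'
    return 'monitor'

actions = profile.apply(suggest_action, axis=1)
print(actions.to_dict())
# {0: 'consider removal', 1: 'monitor', 2: 'monitor', 3: 'reduce loss', 4: 'premium shelf'}
```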

 

From Class 1, extract the shelves whose share of total loss exceeds their share of total sales and whose individual loss exceeds 15% of their sales; these are the removal candidates.

 

# Positions (in the standardized frame) of shelves whose standardized loss
# exceeds the largest standardized GMV within Class 1 (grade 0)
index2 = list(data[data.LR > max(data[data.grade==0].GMV)].index)
datas2 = datas.iloc[index2, :]
# Of those, keep shelves losing more than 15% of sales, highest loss rate first
datas2[datas2.LR>0.15].sort_values(by=['LR'],ascending=False)

 

         ID   GMV  LOSE        LR
67  A131502  1303   420  0.322333
66  A515389  1310   422  0.322137
55  A505492  1731   444  0.256499
63  A549268  1542   391  0.253567
64  A492069  1437   363  0.252610
53  A275823  1766   431  0.244054
65  A726433  1386   338  0.243867
90  A559101   800   193  0.241250
61  A326788  1551   369  0.237911
49  A266437  1945   449  0.230848
54  A570130  1764   402  0.227891
58  A453264  1632   366  0.224265
88  A625828   831   186  0.223827
57  A431484  1691   375  0.221762
82  A228580   911   199  0.218441
62  A526263  1545   335  0.216828
56  A122083  1704   366  0.214789
32  A301564  3045   643  0.211166
51  A407693  1911   403  0.210884
60  A555168  1559   326  0.209108
45  A405486  2102   435  0.206946
48  A720532  1972   403  0.204361
87  A283051   852   174  0.204225
59  A196705  1569   317  0.202040
50  A446950  1934   379  0.195967
29  A365840  3091   604  0.195406
33  A621554  2990   570  0.190635
44  A298717  2370   444  0.187342
13  A520253  3690   680  0.184282
27  A630001  3171   570  0.179754
43  A398851  2385   425  0.178197
52  A378337  1842   326  0.176982
28  A433220  3158   558  0.176694
42  A162793  2512   442  0.175955
26  A470397  3197   554  0.173287
76  A502892  1132   195  0.172261
79  A338628  1073   183  0.170550
9   A323355  4101   693  0.168983
5   A402586  4229   680  0.160795
20  A510649  3341   537  0.160730
22  A668628  3279   527  0.160720
47  A557736  2005   322  0.160599
74  A620952  1159   186  0.160483
69  A608779  1203   193  0.160432
31  A550157  3076   493  0.160273
70  A371447  1186   188  0.158516
36  A398789  2884   446  0.154646
18  A526219  3498   534  0.152659
38  A222345  2689   409  0.152101
From Class 2, similarly extract the shelves whose share of total loss exceeds their share of total sales and whose individual loss exceeds 15% of their sales; these too are removal candidates.

# Same selection for Class 2 (grade 1)
index4 = list(data[data.LR > max(data[data.grade==1].GMV)].index)
datas4 = datas.iloc[index4, :]
datas4[datas4.LR>0.15].sort_values(by=['LR'],ascending=False)
 
         ID   GMV  LOSE        LR
67  A131502  1303   420  0.322333
66  A515389  1310   422  0.322137
55  A505492  1731   444  0.256499
63  A549268  1542   391  0.253567
53  A275823  1766   431  0.244054
49  A266437  1945   449  0.230848
54  A570130  1764   402  0.227891
57  A431484  1691   375  0.221762
32  A301564  3045   643  0.211166
51  A407693  1911   403  0.210884
45  A405486  2102   435  0.206946
48  A720532  1972   403  0.204361
50  A446950  1934   379  0.195967
29  A365840  3091   604  0.195406
33  A621554  2990   570  0.190635
44  A298717  2370   444  0.187342
13  A520253  3690   680  0.184282
27  A630001  3171   570  0.179754
43  A398851  2385   425  0.178197
28  A433220  3158   558  0.176694
42  A162793  2512   442  0.175955
26  A470397  3197   554  0.173287
9   A323355  4101   693  0.168983
5   A402586  4229   680  0.160795
20  A510649  3341   537  0.160730
22  A668628  3279   527  0.160720
31  A550157  3076   493  0.160273
36  A398789  2884   446  0.154646
18  A526219  3498   534  0.152659
38  A222345  2689   409  0.152101
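The two candidate lists overlap heavily, so a final removal list is naturally the union of their shelf IDs, deduplicated. A sketch using a few IDs from the tables above:

```python
import pandas as pd

# A few removal candidates from each class (IDs taken from the tables above)
class1 = pd.DataFrame({'ID': ['A131502', 'A515389', 'A505492', 'A559101']})
class2 = pd.DataFrame({'ID': ['A131502', 'A515389', 'A505492', 'A549268']})

final_candidates = sorted(set(class1.ID) | set(class2.ID))
print(final_candidates)
# ['A131502', 'A505492', 'A515389', 'A549268', 'A559101']
```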

Reposted from: https://www.cnblogs.com/aioverg/p/11157272.html
