UNIQLO Sales Data Analysis

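# 1. Load libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the UNIQLO data file
UNIQLO=pd.read_csv('L2W1.csv')
# Clean the data: use descriptive summaries to check for missing values, then drop rows whose revenue is not positive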
UNIQLO.head()

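   store_id  city  channel  gender_group  age_group  wkd_ind  product   customer  revenue  order  quant  unit_cost
0       658  深圳  线下     Female        25-29      Weekday  当季新品         4    796.0      4      4         59
1       146  杭州  线下     Female        25-29      Weekday  运动             1    149.0      1      1         49
2        70  深圳  线下     Male          >=60       Weekday  T恤              2    178.0      2      2         49
3       658  深圳  线下     Female        25-29      Weekday  T恤              1     59.0      1      1         49
4       229  深圳  线下     Male          20-24      Weekend  袜子             2     65.0      2      3          9

Labels in the data: 深圳 = Shenzhen, 杭州 = Hangzhou; 线下 = offline (in-store); 当季新品 = new seasonal items, 运动 = sportswear, T恤 = T-shirts, 袜子 = socks.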
UNIQLO.info()  # no missing values
UNIQLO.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22293 entries, 0 to 22292
Data columns (total 12 columns):
store_id        22293 non-null int64
city            22293 non-null object
channel         22293 non-null object
gender_group    22293 non-null object
age_group       22293 non-null object
wkd_ind         22293 non-null object
product         22293 non-null object
customer        22293 non-null int64
revenue         22293 non-null float64
order           22293 non-null int64
quant           22293 non-null int64
unit_cost       22293 non-null int64
dtypes: float64(1), int64(5), object(6)
memory usage: 2.0+ MB
           store_id      customer       revenue         order         quant     unit_cost
count  22293.000000  22293.000000  22293.000000  22293.000000  22293.000000  22293.000000
mean     335.391558      1.629480    159.531371      1.651998      1.858072     46.124658
std      230.236167      1.785605    276.254066      1.861480      2.347301     19.124347
min       19.000000      1.000000     -0.660000      1.000000      1.000000      9.000000
25%      142.000000      1.000000     64.000000      1.000000      1.000000     49.000000
50%      315.000000      1.000000     99.000000      1.000000      1.000000     49.000000
75%      480.000000      2.000000    175.000000      2.000000      2.000000     49.000000
max      831.000000     58.000000  12538.000000     65.000000     84.000000     99.000000
UNIQLO1 = UNIQLO[UNIQLO['revenue']>0]
UNIQLO1.describe()
           store_id      customer       revenue         order         quant     unit_cost
count  22262.000000  22262.000000  22262.000000  22262.000000  22262.000000  22262.000000
mean     335.486614      1.630357    159.753549      1.652906      1.859222     46.127841
std      230.371454      1.786694    276.382135      1.862617      2.348723     19.120825
min       19.000000      1.000000     10.000000      1.000000      1.000000      9.000000
25%      142.000000      1.000000     66.000000      1.000000      1.000000     49.000000
50%      315.000000      1.000000     99.000000      1.000000      1.000000     49.000000
75%      480.000000      2.000000    175.000000      2.000000      2.000000     49.000000
max      831.000000     58.000000  12538.000000     65.000000     84.000000     99.000000
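
As a quick check on the filter above, the number of rows it removes can be counted directly. This is a minimal sketch on a small made-up frame (the `toy` values are hypothetical, not read from L2W1.csv). On the real data, the two `describe()` counts above imply 22293 - 22262 = 31 rows dropped.

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the revenue column matters here.
toy = pd.DataFrame({"revenue": [796.0, -0.66, 149.0, 0.0, 65.0]})

# Same condition as in the analysis: keep strictly positive revenue.
# .copy() returns an independent frame, so adding columns later is unambiguous.
cleaned = toy[toy["revenue"] > 0].copy()

n_dropped = len(toy) - len(cleaned)
print(f"dropped {n_dropped} of {len(toy)} rows")
```

Note that the condition `> 0` also removes rows with a revenue of exactly zero, not only negative ones.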

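Question 1: How do overall sales change over time?

Breaking down the question:
The only time-related field in the data is the categorical variable wkd_ind, which takes the values Weekday and Weekend, i.e. whether the purchase happened during the week or at the weekend. The question therefore amounts to comparing sales-related measures between weekdays and weekends: units sold (quant), revenue (revenue), and number of customers (customer). Bar plots are a natural way to visualize the comparison.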

# sns.barplot draws the mean of y per category (with a confidence interval), not the total
sns.barplot(x='wkd_ind',y='quant',data =UNIQLO1)

[Figure output_5_2.png: bar plot of quant by wkd_ind]

sns.barplot(x='wkd_ind',y='revenue',data =UNIQLO1)

[Figure output_6_2.png: bar plot of revenue by wkd_ind]

sns.barplot(x='wkd_ind',y='customer',data =UNIQLO1)

[Figure output_7_2.png: bar plot of customer by wkd_ind]
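
The three bar plots show the mean per record in each group (the default estimator of `sns.barplot`), so they say nothing about total volume. A grouped aggregation gives both views side by side. The sketch below uses a small hypothetical frame, `toy`, with the same column names as the data; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical records with the same column names as the UNIQLO data.
toy = pd.DataFrame({
    "wkd_ind":  ["Weekday", "Weekday", "Weekday", "Weekend", "Weekend"],
    "quant":    [4, 1, 2, 3, 5],
    "revenue":  [796.0, 149.0, 178.0, 65.0, 300.0],
    "customer": [4, 1, 2, 2, 3],
})

# Total and per-record mean of each measure, split by weekday/weekend.
summary = (
    toy.groupby("wkd_ind")[["quant", "revenue", "customer"]]
       .agg(["sum", "mean"])
)
print(summary)
```

When reading totals, keep in mind that a week has five weekdays and two weekend days, so a per-day comparison would need the totals divided accordingly, assuming the data covers whole weeks.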

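Question 2: How do different products sell? Which purchase channel do customers prefer?

Breaking down the question:

  • "Different products" refers to the categories in the product field, and sales performance is measured by revenue; bar plots can be used to visualize it.
  • The only purchase-channel indicator is channel (online or offline). Customers can be segmented along three dimensions: gender (gender_group), age band (age_group), and city (city). The question is therefore how channel preference differs by gender, age band, and city, which can also be shown with bar plots.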
UNIQLO1.groupby(['product'])['revenue'].describe()
            count        mean         std    min    25%    50%    75%       max
product
T恤       10598.0  145.192002  154.288788  13.50   79.0   99.0  158.0   6636.00
当季新品   2534.0  233.095848  597.852857  19.00   76.0  111.0  197.0  12538.00
毛衣        806.0  304.752854  290.715612  13.00  149.0  199.0  396.0   4975.00
牛仔裤     1410.0  174.558496  238.760626  13.33   59.0   79.0  199.0   2087.00
短裤       1691.0   63.563501   55.631526  10.00   37.0   40.0   77.0    676.00
袜子       2048.0   62.368828   51.153136  10.00   27.0   52.0   79.0    595.36
裙子        629.0  218.287409  172.449212  10.00   99.0  197.0  237.0   1442.00
运动        975.0  121.087528  142.760425  18.00   39.0   78.0  149.0   1257.00
配件       1571.0  283.058657  398.768068  29.00   99.0  149.0  298.0   4187.00

Product labels: T恤 = T-shirts, 当季新品 = new seasonal items, 毛衣 = sweaters, 牛仔裤 = jeans, 短裤 = shorts, 袜子 = socks, 裙子 = skirts, 运动 = sportswear, 配件 = accessories.
#u_plot = UNIQLO1.groupby(['product'])['revenue']
# Make Chinese characters and minus signs render correctly in matplotlib
plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
sns.barplot(x = 'product',y = 'revenue',data = UNIQLO1)
# Open question: how should the bars be sorted here?

[Figure output_10_2.png: bar plot of revenue by product]
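
On the sorting question left in the code comment: `sns.barplot` accepts an `order` argument that lists the categories in the desired sequence, so one option is to compute the mean revenue per product and pass the sorted index. A minimal sketch on a hypothetical `toy` frame (values made up for illustration):

```python
import pandas as pd

# Hypothetical records; only product and revenue are needed.
toy = pd.DataFrame({
    "product": ["T恤", "袜子", "毛衣", "T恤", "袜子", "毛衣"],
    "revenue": [59.0, 65.0, 199.0, 178.0, 27.0, 396.0],
})

# Mean revenue per product, largest first; the resulting index is the bar order.
order = (
    toy.groupby("product")["revenue"]
       .mean()
       .sort_values(ascending=False)
       .index
       .tolist()
)
print(order)
```

The list can then be passed as `sns.barplot(x='product', y='revenue', data=UNIQLO1, order=order)`, with `order` computed from `UNIQLO1` rather than `toy`.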

1. By gender (gender_group)

plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
sns.countplot(y='gender_group',hue='channel',data=UNIQLO1,order=UNIQLO1['gender_group'].value_counts().index)
plt.tick_params(labelsize=20)

[Figure output_12_0.png: count plot of gender_group, split by channel]

2. By age group (age_group)

plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
sns.countplot(y='age_group',hue='channel',data=UNIQLO1,order=UNIQLO1['age_group'].value_counts().index)
plt.tick_params(labelsize=20)

[Figure output_13_0.png: count plot of age_group, split by channel]

3. By city (city)

plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
sns.countplot(y='city',hue='channel',data=UNIQLO1,order=UNIQLO1['city'].value_counts().index)
plt.tick_params(labelsize=20)

[Figure output_14_0.png: count plot of city, split by channel]
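
The count plots show raw record counts, which are dominated by how large each group is. To compare channel preference between groups, the share of each channel within a group is often easier to read. A sketch using `pd.crosstab` with `normalize='index'` on a hypothetical `toy` frame (the rows are made up; 线下 = offline, 线上 = online):

```python
import pandas as pd

# Hypothetical records: one row per purchase record.
toy = pd.DataFrame({
    "gender_group": ["Female", "Female", "Female", "Female", "Male", "Male"],
    "channel":      ["线下", "线下", "线下", "线上", "线下", "线上"],
})

# normalize="index" divides each row by its total, so every row sums to 1:
# the share of records per channel within that gender group.
share = pd.crosstab(toy["gender_group"], toy["channel"], normalize="index")
print(share)
```

Replacing `gender_group` with `age_group` or `city` gives the same kind of table for the other two dimensions.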

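Question 3: What is the relationship between revenue and product cost?

Breaking down the question:

  • revenue is the total revenue of each record; dividing it by quant gives the revenue per unit. The unit cost (unit_cost) and the product category (product) are also known.
  • Approach 1: define margin as unit revenue minus unit cost. How is margin distributed? Are any products sold at a loss?
  • Approach 2: examine the correlation between actual unit revenue and unit cost. A positive correlation means costlier items sell for more, which may point to premium products; a negative correlation means cheaper items bring in more, i.e. a low-margin, high-volume model.
  • The analysis can be broken down further to look at the cost-revenue correlation within each city and store.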
UNIQLO1.head()
   store_id  city  channel  gender_group  age_group  wkd_ind  product   customer  revenue  order  quant  unit_cost
0       658  深圳  线下     Female        25-29      Weekday  当季新品         4    796.0      4      4         59
1       146  杭州  线下     Female        25-29      Weekday  运动             1    149.0      1      1         49
2        70  深圳  线下     Male          >=60       Weekday  T恤              2    178.0      2      2         49
3       658  深圳  线下     Female        25-29      Weekday  T恤              1     59.0      1      1         49
4       229  深圳  线下     Male          20-24      Weekend  袜子             2     65.0      2      3          9
# UNIQLO1 was created by filtering UNIQLO; take an explicit copy so the
# assignments below do not raise SettingWithCopyWarning
UNIQLO1 = UNIQLO1.copy()
UNIQLO1['unit_revenue'] = UNIQLO1['revenue']/UNIQLO1['quant']
UNIQLO1['margin'] = UNIQLO1['unit_revenue']-UNIQLO1['unit_cost']
UNIQLO1.head()
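   store_id  city  channel  gender_group  age_group  wkd_ind  product   customer  revenue  order  quant  unit_cost  unit_revenue      profit      margin
0       658  深圳  线下     Female        25-29      Weekday  当季新品         4    796.0      4      4         59    199.000000  140.000000  140.000000
1       146  杭州  线下     Female        25-29      Weekday  运动             1    149.0      1      1         49    149.000000  100.000000  100.000000
2        70  深圳  线下     Male          >=60       Weekday  T恤              2    178.0      2      2         49     89.000000   40.000000   40.000000
3       658  深圳  线下     Female        25-29      Weekday  T恤              1     59.0      1      1         49     59.000000   10.000000   10.000000
4       229  深圳  线下     Male          20-24      Weekend  袜子             2     65.0      2      3          9     21.666667   12.666667   12.666667

The profit column is not created by the code shown here; it appears to be left over from an earlier run of the notebook, and in these rows it is identical to margin.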
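# Distribution of per-unit margin; some records have a negative margin
# (sns.distplot is deprecated since seaborn 0.11; histplot with kde=True is the replacement)
sns.histplot(UNIQLO1['margin'], kde=True)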

[Figure output_19_2.png: distribution of margin]

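Margin distribution by product category
  • Jeans (牛仔裤) are the category most likely to be sold at a loss; some sweaters (毛衣) and T-shirts (T恤) may also be sold at a loss.
  • T-shirt margins vary widely, from about -50 to 200.
  • Skirts (裙子) and accessories (配件) are the two categories with relatively high margins.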

sns.boxplot(x='margin',y = 'product',data = UNIQLO1)

[Figure output_21_1.png: box plot of margin by product]
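
The box plot suggests which categories dip below zero, but not how often. One way to quantify this is the fraction of records with a negative margin per product. The sketch below recomputes `margin` on a hypothetical `toy` frame; the numbers are made up and do not reflect the real data.

```python
import pandas as pd

# Hypothetical records with the columns needed to derive the per-unit margin.
toy = pd.DataFrame({
    "product":   ["牛仔裤", "牛仔裤", "T恤", "T恤", "裙子"],
    "revenue":   [59.0, 398.0, 59.0, 178.0, 197.0],
    "quant":     [1, 2, 1, 2, 1],
    "unit_cost": [69, 69, 49, 49, 99],
})

# Per-unit margin: unit revenue minus unit cost, as defined in the analysis.
toy["margin"] = toy["revenue"] / toy["quant"] - toy["unit_cost"]

# The mean of a boolean column is the fraction of True values in each group.
loss_share = (toy["margin"] < 0).groupby(toy["product"]).mean()
print(loss_share.sort_values(ascending=False))
```

Applied to `UNIQLO1`, the same two lines would show whether jeans really have the highest share of loss-making records.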

Margin distribution by city
  • The cities look similar.
sns.boxplot(x='margin',y = 'city',data = UNIQLO1)

[Figure output_23_1.png: box plot of margin by city]

sns.boxplot(x='margin',y = 'channel',data = UNIQLO1)

[Figure output_24_1.png: box plot of margin by channel]

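# Relationship between actual unit revenue and unit cost: r is about 0.5, so costlier items tend to sell for more; perhaps premium products dominate?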
UNIQLO1[['unit_revenue','unit_cost']].corr()
              unit_revenue  unit_cost
unit_revenue      1.000000   0.503499
unit_cost         0.503499   1.000000
sns.heatmap(UNIQLO1[['unit_revenue','unit_cost']].corr())

[Figure output_26_1.png: heatmap of the correlation matrix of unit_revenue and unit_cost]

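The analysis can be broken down further by looking at the cost-revenue correlation across cities and stores; only the city split is computed here.
# Select the columns with a list; recent pandas versions reject a bare tuple on a groupby
UNIQLO1.groupby(['city'])[['unit_revenue','unit_cost']].corr(min_periods=1)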
                     unit_revenue  unit_cost
city
上海  unit_revenue      1.000000   0.475711
      unit_cost         0.475711   1.000000
北京  unit_revenue      1.000000   0.474852
      unit_cost         0.474852   1.000000
南京  unit_revenue      1.000000   0.550571
      unit_cost         0.550571   1.000000
广州  unit_revenue      1.000000   0.446615
      unit_cost         0.446615   1.000000
成都  unit_revenue      1.000000   0.514083
      unit_cost         0.514083   1.000000
杭州  unit_revenue      1.000000   0.529296
      unit_cost         0.529296   1.000000
武汉  unit_revenue      1.000000   0.536779
      unit_cost         0.536779   1.000000
深圳  unit_revenue      1.000000   0.487396
      unit_cost         0.487396   1.000000
西安  unit_revenue      1.000000   0.523069
      unit_cost         0.523069   1.000000
重庆  unit_revenue      1.000000   0.504232
      unit_cost         0.504232   1.000000

City labels: 上海 = Shanghai, 北京 = Beijing, 南京 = Nanjing, 广州 = Guangzhou, 成都 = Chengdu, 杭州 = Hangzhou, 武汉 = Wuhan, 深圳 = Shenzhen, 西安 = Xi'an, 重庆 = Chongqing.
sns.heatmap(UNIQLO1.groupby(['city'])[['unit_revenue','unit_cost']].corr())

[Figure output_29_1.png: heatmap of the per-city correlation matrices of unit_revenue and unit_cost]

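The cost-revenue correlation is fairly consistent across cities (roughly 0.45 to 0.55), with Nanjing (南京) slightly the strongest at 0.55.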
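
Only the off-diagonal entry of each 2x2 block carries information, so the per-city result can be reduced to one number per city, which is easier to sort and compare than the stacked matrix. A sketch on a hypothetical `toy` frame, constructed so that the two made-up cities "A" and "B" have correlations of 1 and -1:

```python
import pandas as pd

# Hypothetical data: in city A revenue rises linearly with cost,
# in city B it falls linearly with cost.
toy = pd.DataFrame({
    "city":         ["A", "A", "A", "B", "B", "B"],
    "unit_cost":    [9.0, 49.0, 99.0, 9.0, 49.0, 99.0],
    "unit_revenue": [18.0, 98.0, 198.0, 382.0, 302.0, 202.0],
})

# One Pearson correlation per city, sorted from strongest positive downwards.
per_city = (
    toy.groupby("city")[["unit_revenue", "unit_cost"]]
       .apply(lambda g: g["unit_revenue"].corr(g["unit_cost"]))
       .sort_values(ascending=False)
)
print(per_city)
```

Grouping by `store_id` instead of `city` would give the per-store split mentioned earlier, though stores with very few records will produce unstable or undefined correlations.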