"Black Friday" Data Analysis in Practice

Latest recommended article published on 2023-05-19 13:25:24

L _W

Article link: https://blog.csdn.net/weixin_43655375/article/details/106415827

Project Background

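The Christmas shopping season in the United States traditionally begins right after Thanksgiving, which falls on the fourth Thursday of November. The following day, known as Black Friday, is therefore the first day of the shopping rush. On that day, American retailers roll out large discounts and promotions as their last big sales push of the year, somewhat like Taobao's Double Eleven shopping festival. The data used in this project consists of about 540,000 Black Friday transaction records from a retail store that wants to understand its customers' purchasing behavior better. This project therefore summarizes the store's Black Friday sales and analyzes how different customers buy different products, to provide a reference for the next Black Friday.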

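Field Descriptions

Field	Meaning
User_ID	User ID
Product_ID	Product ID
Gender	Gender (M = male, F = female)
Age	Age group (7 groups: 0-17, 18-25, 26-35, 36-45, 46-50, 51-55, 55+)
Occupation	Occupation (each occupation is represented by a number; 20 occupations in total)
City_Category	City category (three categories: A, B, C)
Stay_In_Current_City_Years	Years lived in the current city (5 values: 0, 1, 2, 3, 4+)
Marital_Status	Marital status (0 = unmarried, 1 = married)
Product_Category_1	Product category 1 (cannot be empty)
Product_Category_2	Product category 2 (may be empty)
Product_Category_3	Product category 3 (may be empty)
Purchase	Purchase amount (in US dollars)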
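
Age and Stay_In_Current_City_Years are stored as text labels, so a plain string sort or comparison will not necessarily respect their natural order. A minimal sketch of one way to handle this is to convert them to ordered categoricals. The sample rows and the names `sample`, `AGE_ORDER`, and `STAY_ORDER` below are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

# Hypothetical sample rows for illustration; not taken from BlackFriday.csv.
sample = pd.DataFrame({
    "Age": ["26-35", "0-17", "55+", "18-25"],
    "Stay_In_Current_City_Years": ["2", "4+", "0", "1"],
})

# Category orders follow the field descriptions above.
AGE_ORDER = ["0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+"]
STAY_ORDER = ["0", "1", "2", "3", "4+"]

# Ordered categoricals make sorting and min/max follow the declared order.
sample["Age"] = pd.Categorical(sample["Age"], categories=AGE_ORDER, ordered=True)
sample["Stay_In_Current_City_Years"] = pd.Categorical(
    sample["Stay_In_Current_City_Years"], categories=STAY_ORDER, ordered=True
)

sorted_ages = sample.sort_values("Age")["Age"].tolist()
print(sorted_ages)
```

With the columns encoded this way, later group-by summaries and plots would list the age groups from youngest to oldest without extra sorting logic.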

Data Overview

import pandas as pd

data = pd.read_csv('BlackFriday.csv')
data.drop_duplicates(inplace=True)  # remove duplicate rows
data.info()  # info() prints the summary itself and returns None, so print() is not needed

Data columns (total 12 columns):
User_ID                       537577 non-null int64
Product_ID                    537577 non-null object
Gender                        537577 non-null object
Age                           537577 non-null object
Occupation                    537577 non-null int64
City_Category                 537577 non-null object
Stay_In_Current_City_Years    537577 non-null object
Marital_Status                537577 non-null int64
Product_Category_1            537577 non-null int64
Product_Category_2            370591 non-null float64
Product_Category_3            164278 non-null float64
Purchase                      537577 non-null int64
dtypes: float64(2), int64(5), object(5)

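The dataset contains about 540,000 rows (537,577 after de-duplication). Product_Category_1 cannot be empty, while Product_Category_2 and Product_Category_3 may be empty. The nulls in those two columns (370,591 and 164,278 non-null values respectively) are therefore expected, and none of the required fields has missing values.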
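
The missing-value check described above can be sketched as follows. The miniature frame `df` is invented for illustration and only mimics the column layout. Filling the optional categories with 0 is just one possible convention (assuming 0 is not a real category code), not something the original analysis prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame mimicking the column layout; values are made up.
df = pd.DataFrame({
    "Product_Category_1": [3, 1, 12, 8],
    "Product_Category_2": [np.nan, 6.0, np.nan, 14.0],
    "Product_Category_3": [np.nan, 14.0, np.nan, np.nan],
    "Purchase": [8370, 15200, 1422, 1057],
})

# Columns that are allowed to be empty according to the field descriptions.
OPTIONAL = ["Product_Category_2", "Product_Category_3"]

# Count nulls per column, then look only at the required columns.
missing = df.isnull().sum()
required_missing = missing.drop(OPTIONAL)
print(missing)
print("Nulls in required columns:", int(required_missing.sum()))

# One possible convention (an assumption): mark "no secondary category" with 0.
filled = df.copy()
filled[OPTIONAL] = filled[OPTIONAL].fillna(0).astype(int)
```

Running the same two lines that build `missing` and `required_missing` on the real `data` frame would confirm whether the nulls are confined to the two optional columns.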