shopee刷销量_shopee如何帮助品牌增加销量

最新推荐文章于 2025-05-19 00:09:57 发布

weixin_26729763

最新推荐文章于 2025-05-19 00:09:57 发布

阅读量1.6k

点赞数

原文链接：https://medium.com/@rizkyramadhana26/how-does-shopee-help-brands-to-increase-their-sales-1f70a5d07887

版权

本文介绍了如何使用Python编程语言和Pandas库分析Shopee品牌销售数据，找出每个品牌在2019年5月10日至5月31日期间收入最高的三个商品。通过清理和组织数据，确定官方店铺，并计算每个商品的总收入，从而制定广告策略以增加销售额。这种方法在数据科学竞赛中获得0.88888的高分，证明了Python和Pandas在数据分析中的高效性。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

shopee刷销量

Disclaimer : This article is based on my personal experience and I do not represent anyone or any brand. This article will mostly talk about Python programming language and its Pandas library.

免责声明：本文基于我的个人经验，我不代表任何人或任何品牌。本文将主要讨论Python编程语言及其Pandas库。

There are hundreds brand at Shopee. Every brand surely want to increase their sales so that they can gain more profit. In order to do that, Shopee Data Analyst would identify three products with highest revenue from each brand. Then, campaign and ads for those three products could be increase in such a way that overall sales for each brand have a high chance to increase.

Shopee有数百个品牌。每个品牌肯定希望增加销量，以便获得更多利润。为此，Shopee Data Analyst将从每个品牌中找出收入最高的三种产品。然后，可以增加这三个产品的广告系列和广告，以使每个品牌的整体销售额都有很大的增长机会。

In this article we will walk through the code one may use to do the task mentioned before. This task is adapted from “I’m The Best Coder ! Challenge 2019”, a data science competition held by Shopee (go to this page https://www.kaggle.com/c/ptr-rd1 to see the original problem).

在本文中，我们将逐步介绍一个人可能用来执行前面提到的任务的代码。此任务改编自“我是最好的编码器！ Shopee举办的数据科学竞赛“挑战2019”(转到此页面https://www.kaggle.com/c/ptr-rd1以查看原始问题)。

Task:

任务：

Find the top three itemids (in a list) from the ‘Official Shop’ of that particular brand that generated the highest Gross Sales Revenue (amount*item_price_usd) from 10th May to 31st May 2019.

从该品牌的'官方商店'中找到前三个最重要的id(在列表中)，该ID在2019年5月10日至5月31日产生了最高的总销售收入(amount * item_price_usd)。

Without further ado, let’s jump right into the code. We use Pandas to analyze and modify our data.

事不宜迟，让我们直接进入代码。我们使用熊猫来分析和修改我们的数据。

In[1]:import pandas as pd brands = pd.read_csv("d:/Extra_material_2.csv")
brands.head()

We store a dataframe in a variable named ‘brands’. Each shop_id belong to certain brand. From this dataframe, we know whether that shop_id is an official shop or not. Below is brands’ first five rows.

我们将数据框存储在名为“ brands”的变量中。每个shop_id都属于某个品牌。从这个数据框中，我们知道shop_id是否是官方商店。以下是品牌的前五行。

Out[1]:    shop_id       brand         shop_type
0 92567513.0       JBL        Official Shop
1 1657263.0      Sandisk      Official Shop
2 83401047.0     Revlon       Official Shop
3 4259661.0      Philips      Official Shop
4 4350106.0  Kimberly Clark   Official Shop

Before analyzing the data, we should “clean” it first. Three steps that we used:

在分析数据之前，我们应该先“清理”数据。我们使用了三个步骤：

Remove duplicated rows
删除重复的行
Change ‘shop_id’ from float to integer (so that ‘.0’ will be removed)
将'shop_id'从float更改为整数(这样将删除'.0')
Sort the data based on ‘brand’ alphabetically
按字母顺序对数据进行排序

In[2]:
brands.drop_duplicates(keep=’first’, inplace=True)
brands['shop_id'] = dataa.apply(lambda x: int(x['shop_id']),axis=1)
brands.sort_values(‘brand’, inplace=True)
brands.reset_index(inplace=True)
brands

Below is our dataframe ‘brands’ after being cleaned and organized.

以下是经过整理和整理后的数据框“品牌”。

Out[2]:
    index    shop_id    brand    shop_type
0    1553    63512637    3M         NaN
1    1554    15424243    3M         NaN
2     40     13735592    3M    Official Shop
3     41     15607060    3M    Official Shop
4     42     63512637    3M    Official Shop
…     …         …         …          …
1388 2237    46944726    Za         NaN
1389 2435    78346322   eMart       NaN
1390 613     78346970   eMart   Shopee Store
1391 2436    78346970   eMart       NaN
1392 612     78346322   eMart   Shopee Store

Next, we import our second data and saved it to a dataframe named ‘transactions’

接下来，我们导入第二个数据并将其保存到名为“ transactions”的数据框中

In[3]:
transactions = pd.read_csv(”d:/Extra_material_3.csv”)
transactions.drop_duplicates(keep='first', inplace=True)
transactions.reset_index(inplace=True)
transactions.head()

This is how our dataframe ‘transactions’ looks like.

这就是我们的数据框“交易”的样子。

Out[3]:
 index    orderid     itemid   date_id amount item_price_usd shopid
0    0 1316773501 1355135789 16/5/2019      1   5.483204    65948755
1    1 1316774684 1257902741 16/5/2019      1   0.387050    61364537
2    2 1316803330 2126770455 16/5/2019      1   0.344044    4980129
3    3 1317196434 2112999088 16/5/2019      1   6.704561    29089750
4    4 1318105348 1832131969 16/5/2019      1   5.354187    94372493

Unfortunately, transactions’ column named ‘date_id’ is a string.

不幸的是，名为“ date_id”的“交易”列是一个字符串。

In[4]:
transactions[‘date_id’].describe()
Out[4]:
count      1013725
unique          12
top       5/5/2019
freq        317810
Name: date_id, dtype: object

We have to convert it to datetime first. Then, remove orders before May 10th 2019 and orders after May 31st 2019. For further interests, we also convert column ‘itemid’ to string.

我们必须先将其转换为日期时间。然后，在2019年5月10日之前删除订单，在2019年5月31日之后删除订单。为了进一步引起兴趣，我们还将列“ itemid”转换为字符串。

In[5]: transactions[‘date_id’]=pd.to_datetime(transactions[‘date_id’])
transactions = transactions[(transactions[‘date_id’]>=’2019–05–10')&
                            (transactions[‘date_id’]<=’2019–05–31')]transactions['itemid']=transactions.apply(lambda x:str
                                          (int(x['itemid'])),axis=1)

We created new column named ‘subtotal’. This column contains number came from formula ‘amount’ x ‘item_price_usd’.

我们创建了一个名为“小计”的新列。此列包含来自公式“金额” x“ item_price_usd”的数字。

In[6]:
transactions[‘subtotal’]=transactions.apply(lambda x: x.amount*
                                            x.item_price_usd,axis=1)
transactions

This is how our dataframe ‘transactions’ looks like.

这就是我们的数据框“交易”的样子。

Out[6]:
  ...  itemid       date_id amount item_price_usd  shopid  subtotal
0      1355135789 2019–05–16  1       5.483204    65948755 5.483204
1      1257902741 2019–05–16  1       0.387050    61364537 0.387050
2      2126770455 2019–05–16  1       0.344044     4980129 0.344044
3      2112999088 2019–05–16  1       6.704561    29089750 6.704561
4      1832131969 2019–05–16  1       5.354187    94372493 5.354187
           …          …       …           …           …       …
658070 1286506704 2019–05–29  1       2.924375    73789223 2.924375
658071 1726943954 2019–05–29  2      18.062319    52323543 36.124638
658072 423611298  2019–05–29  1       4.300552    20187300 4.300552
658073 1420094546 2019–05–29  1      13.976795    46595353 13.976795
658074 237867635  2019–05–29  1      23.179976    21180592 23.179976

Next, we create a new dataframe named ‘products’. This dataframe will tell us how much revenue generated by each itemid. (Each product have unique itemid)

接下来，我们创建一个名为“产品”的新数据框。该数据框将告诉我们每个itemid产生了多少收入。 (每个产品都有唯一的itemid)

In[7]:
products = pd.DataFrame({‘itemid’:transactions[‘itemid’].unique()},
                        columns=[‘itemid’])
gross_revenue = []for x in products[‘itemid’]:     
    gross_revenue.append(transactions[transactions[‘itemid’]==x]
                         [‘subtotal’].sum(axis=0))products[‘gross_revenue’]=gross_revenue

Next, we labelled each itemid with its corresponding shopid.

接下来，我们用其对应的shopid标记每个itemid。

In[8]:shopid = []for x in products[‘itemid’]:
    shopid.append(transactions[transactions[‘itemid’]==x]
                  [‘shopid’].unique()[0])products[‘shopid’]=shopidproducts.head()

This is how five first row of our dataframe ‘products’ looks like.

这就是我们数据框“产品”的五个第一行的样子。

Out[8]:
      itemid gross_revenue     shopid
0 1355135789      5.483204   65948755
1 1257902741      1.161149   61364537
2 2126770455      1.720221    4980129
3 2112999088      6.704561   29089750
4 1832131969     67.088614   94372493

Next, we want to find three products with highest revenue. Here are the steps:

接下来，我们想找到三种收入最高的产品。步骤如下：

Create dataframe ‘list’ which is a list of ‘Official Shop’ shopid from certain brand
创建数据框“列表”，该列表是特定品牌的“官方商店”购物清单
Create dataframe ‘product_list’ which is a list of product that is sold from Official Shops listed in ‘list’
创建数据框“ product_list”，这是从“ list”中列出的官方商店出售的产品的列表
Find three products from ‘product_list’ with highest revenue
从“ product_list”中找到收入最高的三个产品
Repeat to other brands
重复其他品牌

Store the answer in a dataframe named ‘Answers’

将答案存储在名为“答案”的数据框中

In[9]:
Answers = []for x in dataa[‘brand’].unique():
    list = brands[(brands[‘brand’]==x) & 
                 (brands[‘shop_type’]==’Official Shop’)][‘shop_id’]
    product_list = datac[datac[‘shopid’].isin(list)]
    best3 = product_list.nlargest(3, ‘gross_revenue’).reset_index()
    if len(best3) == 3:
        Answers.append(x +’, ‘+best3.loc[0,’itemid’]+’, ‘  
                       +best3.loc[1,’itemid’]+’, ‘+
                       best3.loc[2,’itemid’])
    elif len(best3) == 2:
        Answers.append(x +’, ‘+best3.loc[0,’itemid’]+’, ‘ 
                       +best3.loc[1,’itemid’])
    elif len(best3) == 1:
        Answers.append(x +’, ‘+best3.loc[0,’itemid’])
    else :
        Answers.append(x +’, N.A’)

Export ‘Answers’ to csv file and submit it to Kaggle.

将“答案”导出到csv文件并将其提交给Kaggle。

In[10]:Answers = pd.DataFrame(Answers, columns=[‘Answers’])
Index = range(1, 271)
Answers.insert(0, ‘Index’, Index)
Answers.to_csv(”d:/Answers3.csv”, index = False)
Answers

Our ‘Answers’ should look like this

我们的“答案”应如下所示

Out[10]:     Index                                          Answers
0        1                                          3M, N.A
1        2                                 3M Littmann, N.A
2        3          AHC, 1617870234, 1617870245, 1617870178
3        4                                        ASUS, N.A
4        5       Abbott, 1437756029, 1414181351, 1414181383
..     ...                                              ...
265    266                            Wyeth Nutrition , N.A
266    267        Xiaomi, 1373245117, 967538060, 1913571916
267    268  Yves Rocher, 1719453583, 1719453786, 2109554227
268    269            Za, 365981058, 1074619349, 1867715485
269    270                                       eMart, N.A

Some brands have no product sold, so we put ‘N.A’ after the brand’s name.

有些品牌没有出售产品，因此我们在品牌名称后加上“ NA”。

This method yielded score 0.88888 out of 1. This method also took a very long time to proceed. Any comments and suggestions for better and faster method will be much appreciated ! :D

该方法得出的满分为0.88888。该方法也花费了很长时间。任何更好和更快的方法的意见和建议，将不胜感激！：D

In summary, one can boost its profit by increasing campaign and advertisement for products that generated highest revenue. In fact, this simple task can be done by hand. But with thousands of data, doing it by hand is hard and painful. A little knowledge about Python and Pandas will help us a lot.

总之，可以通过增加产生最高收入的产品的广告系列和广告来增加利润。实际上，这个简单的任务可以手动完成。但是，要处理成千上万的数据，手工完成将非常困难。对Python和Pandas的一点了解将对我们有很大帮助。

Thank you for reading this article ! I hope you gained some new knowledge after reading this article. Any comments and suggestions will be much appreciated. Thank you !

感谢您阅读本文！我希望您阅读本文后能获得一些新知识。任何意见和建议将不胜感激。谢谢！