python数据框一行变多行,python将数据框列分成多行

I have a dataframe like this:

--------------------------------------------------------------------

Product ProductType SKU Size

--------------------------------------------------------------------

T-shirt Top [111,222,333,444] ['XS','S','M','L']

Pant(Flared) Bottoms [555,666] ['M','L']

Sweater Top None None

I want the below output:

Product ProductType SKU Size

T-shirt Top 111 XS

T-shirt Top 222 S

T-shirt Top 333 M

T-shirt Top 444 L

Pant(Flared) Bottoms 555 M

Pant(Flared) Bottoms 666 L

Sweater Top None None

I tried the following code:

s = df['SKU'].apply(Series,1).stack()

s.index = s.index.droplevel(-1)

s.name = 'SKU'

del df['SKU']

df = df.join(s)

r = df['Size'].apply(Series,1).stack()

r.index = r.index.droplevel(-1)

r.name = 'Size'

del df['Size']

df = df.join(r)

But this explodes into the following:

Product ProductType SKU Size

T-shirt Top 111 XS

T-shirt Top 111 S

T-shirt Top 111 M

T-shirt Top 111 L

T-shirt Top 222 XS

T-shirt Top 222 S

T-shirt Top 222 M

T-shirt Top 222 L

T-shirt Top 333 XS

T-shirt Top 333 S

T-shirt Top 333 M

T-shirt Top 333 L

T-shirt Top 444 XS

T-shirt Top 444 S

T-shirt Top 444 M

T-shirt Top 444 L

Pant(Flared) Bottoms 555 M

Pant(Flared) Bottoms 555 L

Pant(Flared) Bottoms 666 M

Pant(Flared) Bottoms 666 L

Note that for simplicity sake, I have added two columns that will be repeated (Product, ProductType) but I have 5 such columns that contain strings.

I basically want to associate the SKU with the size for each product.

Can anyone help here ?

解决方案

This is open to bugs so use with caution:

Convert Product column to a collection of lists whose sizes are the same with the lists in other columns (say, column SKU. This will not work if the lists in SKU and Size are of different lengths)

df["Product"] = df["Product"].map(list) * df["SKU"].map(len)

Out[184]:

SKU Size Product

0 [111, 222, 333, 444] [XS, S, M, L] [a, a, a, a]

1 [555, 666] [M, L] [b, b]

Take the sum of the columns (it will extend the lists) and pass that to the dataframe constructor with to_dict():

pd.DataFrame(df.sum().to_dict())

Out[185]:

Product SKU Size

0 a 111 XS

1 a 222 S

2 a 333 M

3 a 444 L

4 b 555 M

5 b 666 L

Edit:

For several columns, you can define the columns to be repeated:

cols_to_be_repeated = ["Product", "ProductType"]

Save the rows that has None values in another dataframe:

na_df = df[pd.isnull(df["SKU"])].copy()

Drop None's from the original dataframe:

df.dropna(inplace = True)

Iterate over those columns:

for col in cols_to_be_repeated:

df[col] = df[col].map(lambda x: [x]) * df["SKU"].map(len)

And use the same approach:

pd.concat([pd.DataFrame(df.sum().to_dict()), na_df])

Product ProductType SKU Size

0 T-shirt Top 111.0 XS

1 T-shirt Top 222.0 S

2 T-shirt Top 333.0 M

3 T-shirt Top 444.0 L

4 Pant(Flared) Bottoms 555.0 M

5 Pant(Flared) Bottoms 666.0 L

2 Sweater Top NaN None

It might be better to work on a copy of the original dataframe.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值