I have a dataframe like this:
--------------------------------------------------------------------
Product ProductType SKU Size
--------------------------------------------------------------------
T-shirt Top [111,222,333,444] ['XS','S','M','L']
Pant(Flared) Bottoms [555,666] ['M','L']
Sweater Top None None
I want the below output:
Product ProductType SKU Size
T-shirt Top 111 XS
T-shirt Top 222 S
T-shirt Top 333 M
T-shirt Top 444 L
Pant(Flared) Bottoms 555 M
Pant(Flared) Bottoms 666 L
Sweater Top None None
I tried the following code:
s = df['SKU'].apply(Series,1).stack()
s.index = s.index.droplevel(-1)
s.name = 'SKU'
del df['SKU']
df = df.join(s)
r = df['Size'].apply(Series,1).stack()
r.index = r.index.droplevel(-1)
r.name = 'Size'
del df['Size']
df = df.join(r)
But this explodes into the following:
Product ProductType SKU Size
T-shirt Top 111 XS
T-shirt Top 111 S
T-shirt Top 111 M
T-shirt Top 111 L
T-shirt Top 222 XS
T-shirt Top 222 S
T-shirt Top 222 M
T-shirt Top 222 L
T-shirt Top 333 XS
T-shirt Top 333 S
T-shirt Top 333 M
T-shirt Top 333 L
T-shirt Top 444 XS
T-shirt Top 444 S
T-shirt Top 444 M
T-shirt Top 444 L
Pant(Flared) Bottoms 555 M
Pant(Flared) Bottoms 555 L
Pant(Flared) Bottoms 666 M
Pant(Flared) Bottoms 666 L
Note that for simplicity sake, I have added two columns that will be repeated (Product, ProductType) but I have 5 such columns that contain strings.
I basically want to associate the SKU with the size for each product.
Can anyone help here ?
解决方案
This is open to bugs so use with caution:
Convert Product column to a collection of lists whose sizes are the same with the lists in other columns (say, column SKU. This will not work if the lists in SKU and Size are of different lengths)
df["Product"] = df["Product"].map(list) * df["SKU"].map(len)
Out[184]:
SKU Size Product
0 [111, 222, 333, 444] [XS, S, M, L] [a, a, a, a]
1 [555, 666] [M, L] [b, b]
Take the sum of the columns (it will extend the lists) and pass that to the dataframe constructor with to_dict():
pd.DataFrame(df.sum().to_dict())
Out[185]:
Product SKU Size
0 a 111 XS
1 a 222 S
2 a 333 M
3 a 444 L
4 b 555 M
5 b 666 L
Edit:
For several columns, you can define the columns to be repeated:
cols_to_be_repeated = ["Product", "ProductType"]
Save the rows that has None values in another dataframe:
na_df = df[pd.isnull(df["SKU"])].copy()
Drop None's from the original dataframe:
df.dropna(inplace = True)
Iterate over those columns:
for col in cols_to_be_repeated:
df[col] = df[col].map(lambda x: [x]) * df["SKU"].map(len)
And use the same approach:
pd.concat([pd.DataFrame(df.sum().to_dict()), na_df])
Product ProductType SKU Size
0 T-shirt Top 111.0 XS
1 T-shirt Top 222.0 S
2 T-shirt Top 333.0 M
3 T-shirt Top 444.0 L
4 Pant(Flared) Bottoms 555.0 M
5 Pant(Flared) Bottoms 666.0 L
2 Sweater Top NaN None
It might be better to work on a copy of the original dataframe.