>>> df
   Duration      End station                         Start station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW
Give the two station columns the same name:
>>> df.columns = df.columns.str.replace('.*?station', 'station', regex=True)
>>> df
   Duration          station                               station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW
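For reference, here is a self-contained sketch of the setup and renaming step. The frame is rebuilt from the values printed above. Passing regex=True explicitly matters: since pandas 2.0, str.replace treats the pattern as a literal string by default, so without it nothing would be renamed.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the values printed above
df = pd.DataFrame({
    'Duration': [1407, 509, 638, 1532, 759],
    'End station': [np.nan, np.nan, '15th & P St NW.', np.nan, np.nan],
    'Start station': ['14th & V St NW', '21st & I St NW', np.nan,
                      'Massachusetts Ave & Dupont Circle NW',
                      'Adams Mill & Columbia Rd NW'],
})

# Collapse 'End station' and 'Start station' to the same label.
# regex=True is required: without it, pandas >= 2.0 searches for the
# literal text '.*?station' and leaves the column names unchanged.
df.columns = df.columns.str.replace('.*?station', 'station', regex=True)

print(list(df.columns))  # ['Duration', 'station', 'station']
```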
Then stack and unstack:
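>>> s = df.stack()  # drops the NaN cells here; newer pandas versions may keep them or reject the duplicate column names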
>>> s
0  Duration                                    1407
   station                           14th & V St NW
1  Duration                                     509
   station                           21st & I St NW
2  Duration                                     638
   station                          15th & P St NW.
3  Duration                                    1532
   station     Massachusetts Ave & Dupont Circle NW
4  Duration                                     759
   station              Adams Mill & Columbia Rd NW
dtype: object
>>> df = s.unstack()
>>> df
   Duration                               station
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW
>>>
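The stack/unstack round trip depends on each row having exactly one non-null station. A quick check on the original column names (before renaming) might look like this; the helper name one_station_per_row is hypothetical, chosen just for illustration.

```python
import numpy as np
import pandas as pd

def one_station_per_row(frame, cols=('End station', 'Start station')):
    """Hypothetical helper: True if each row has exactly one non-null value in cols."""
    return bool(frame[list(cols)].notna().sum(axis=1).eq(1).all())

df = pd.DataFrame({
    'Duration': [1407, 509, 638],
    'End station': [np.nan, np.nan, '15th & P St NW.'],
    'Start station': ['14th & V St NW', '21st & I St NW', np.nan],
})
print(one_station_per_row(df))  # True

# A row with both stations filled breaks the assumption
df.loc[0, 'End station'] = '21st & I St NW'
print(one_station_per_row(df))  # False
```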
Here's why I think this works:
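.stack creates a Series with a MultiIndex and drops the null values for you (at least in the pandas version used here). The second level of that MultiIndex holds the column names. Because both station columns now share the name station, that level has only two labels (Duration and station), so unstacking produces a single station column. This relies on every row having exactly one non-null station: if a row had both, the pair (row, 'station') would occur twice and unstack would raise a ValueError about duplicate entries.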
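This is only a guess, based on the difference between the two indexes with and without renaming the columns. (The output below comes from an older pandas version; newer versions print codes= instead of labels= in the MultiIndex repr.)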
>>> # without changing column names
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'End station', 'Start station']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 2, 0, 2, 0, 1, 0, 2, 0, 2]])
>>> # column names the same
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'station']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])
It seems a bit tricky; perhaps someone else can comment on it.
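One way to see the mechanism: unstack pivots the second index level back into columns, one column per distinct label, and it can only do that when every (row, label) pair is unique. The sketch below builds a stacked Series by hand instead of calling stack, because how stack treats NaN and duplicate column names may differ between pandas versions. It then shows that a repeated pair makes unstack fail.

```python
import pandas as pd

# Hand-built version of the stacked Series s (first two rows only)
idx = pd.MultiIndex.from_tuples(
    [(0, 'Duration'), (0, 'station'), (1, 'Duration'), (1, 'station')]
)
s = pd.Series([1407, '14th & V St NW', 509, '21st & I St NW'], index=idx)

# Only two distinct labels on the second level -> two columns after unstack
print(list(s.index.get_level_values(1).unique()))  # ['Duration', 'station']
wide = s.unstack()
print(list(wide.columns))  # ['Duration', 'station']

# If a row had both an end and a start station, the pair (0, 'station')
# would occur twice and unstack cannot decide which value to keep.
dup = pd.Series(
    ['21st & I St NW', '14th & V St NW'],
    index=pd.MultiIndex.from_tuples([(0, 'station'), (0, 'station')]),
)
try:
    dup.unstack()
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print('unstack failed:', err)
```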
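Alternative: use pd.concat and .dropna, starting again from the original df (the one with separate End station and Start station columns):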
>>> stations = pd.concat([df.iloc[:,1],df.iloc[:,2]]).dropna()
>>> stations.name = 'stations'
>>> stations
2                         15th & P St NW.
0                          14th & V St NW
1                          21st & I St NW
3    Massachusetts Ave & Dupont Circle NW
4             Adams Mill & Columbia Rd NW
Name: stations, dtype: object
>>> df2 = pd.concat([df['Duration'], stations], axis=1)
>>> df2
   Duration                              stations
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW
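The same alternative can be wrapped in a small function that works on the original column names. The function name merge_station_columns and the output column name station are illustrative choices, not part of pandas. Because the two station columns are concatenated end to end, a row with both stations filled would appear twice in stations, so this sketch also assumes one station per row, as in the data above.

```python
import numpy as np
import pandas as pd

def merge_station_columns(frame, cols=('End station', 'Start station')):
    """Hypothetical helper: put the station columns end to end, drop the
    NaNs, and realign the result with Duration on the row index."""
    stations = pd.concat([frame[c] for c in cols]).dropna()
    stations.name = 'station'
    # axis=1 aligns on the index, so the out-of-order labels in
    # stations (row 2 comes first) land back on the right rows
    return pd.concat([frame['Duration'], stations], axis=1)

df = pd.DataFrame({
    'Duration': [1407, 509, 638, 1532, 759],
    'End station': [np.nan, np.nan, '15th & P St NW.', np.nan, np.nan],
    'Start station': ['14th & V St NW', '21st & I St NW', np.nan,
                      'Massachusetts Ave & Dupont Circle NW',
                      'Adams Mill & Columbia Rd NW'],
})
df2 = merge_station_columns(df)
print(df2)
```

Unlike the stack/unstack route, this keeps Duration as an integer column, because the durations never pass through a mixed-type Series.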