Combining two columns into one within the same DataFrame in pandas / Python

Latest recommended article published on 2024-02-29 20:03:30

weixin_39562606


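I am trying to merge two columns into one within the same DataFrame (start_end), dropping the null values along the way. I want to combine "Start station" and "End station" into a single "station" column and keep "Duration" aligned with the new column. I have tried pd.merge, pd.concat, and pd.append, but could not get any of them to work.

The start_end DataFrame: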

    Duration      End station                         Start station
14      1407              NaN                        14th & V St NW
19       509              NaN                        21st & I St NW
20       638  15th & P St NW.                                   NaN
27      1532              NaN  Massachusetts Ave & Dupont Circle NW
28       759              NaN           Adams Mill & Columbia Rd NW

Expected output:

    Duration                              stations
14      1407                        14th & V St NW
19       509                        21st & I St NW
20       638                        15th & P St NW
27      1532  Massachusetts Ave & Dupont Circle NW
28       759           Adams Mill & Columbia Rd NW

My code so far:

#start_end is the dataframe, 'start station', 'end station', 'duration'

start_end = pd.concat([df_start, df_end])

This is what I tried:

station = pd.merge([start_end['Start station'],start_end['End station']])

Solution:

>>> df
   Duration      End station                         Start station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW

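Give both columns the same name:

>>> # regex=True is needed on pandas 2.0+, where str.replace matches literally by default
>>> df.columns = df.columns.str.replace('.*?station', 'station', regex=True)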

>>> df
   Duration          station                               station
0      1407              NaN                        14th & V St NW
1       509              NaN                        21st & I St NW
2       638  15th & P St NW.                                   NaN
3      1532              NaN  Massachusetts Ave & Dupont Circle NW
4       759              NaN           Adams Mill & Columbia Rd NW

Then stack and unstack:

>>> s = df.stack()
>>> s
0  Duration                                    1407
   station                           14th & V St NW
1  Duration                                     509
   station                           21st & I St NW
2  Duration                                     638
   station                          15th & P St NW.
3  Duration                                    1532
   station     Massachusetts Ave & Dupont Circle NW
4  Duration                                     759
   station              Adams Mill & Columbia Rd NW
dtype: object
>>> df = s.unstack()
>>> df
   Duration                               station
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW

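Here is why I think this works:

.stack creates a Series with a MultiIndex and drops the null values for you. The second level of that index holds the column names; because both station columns now share one name, that level contains only a single 'station' label, so unstacking produces just one station column.

This is only a guess, based on how the index differs when the column names are not changed.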

>>> # without changing column names
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'End station', 'Start station']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 2, 0, 2, 0, 1, 0, 2, 0, 2]])
>>> # column names the same
>>> s.index
MultiIndex(levels=[[0, 1, 2, 3, 4], ['Duration', 'station']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]])

It feels a bit hacky; maybe someone will comment on it.
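The stack step can also be tried without giving the two columns the same name. The following is a minimal sketch, not the answer's exact method: it rebuilds the sample frame from the question, stacks only the two station columns, and then drops the column-name level. It assumes every row has exactly one non-null station, and the explicit dropna() is an added safeguard because newer pandas releases may keep NaN values when stacking.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Duration": [1407, 509, 638, 1532, 759],
    "End station": [np.nan, np.nan, "15th & P St NW.", np.nan, np.nan],
    "Start station": [
        "14th & V St NW",
        "21st & I St NW",
        np.nan,
        "Massachusetts Ave & Dupont Circle NW",
        "Adams Mill & Columbia Rd NW",
    ],
})

# Stack only the station columns: the index becomes (row label, column name).
stacked = df[["End station", "Start station"]].stack()

# Older pandas drops NaN inside stack(); dropna() covers versions that keep them.
stacked = stacked.dropna()

# Drop the column-name level so each value is indexed by its original row label.
# Assumption: exactly one station per row, so the row labels stay unique.
station = stacked.droplevel(1).rename("station")

# Align the merged column with Duration on the row index.
result = pd.concat([df["Duration"], station], axis=1)
print(result)
```

Stacking only the station columns also keeps Duration out of the stacked Series, so it is not mixed into an object-typed column along the way.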

Alternative: use pd.concat and .dropna

>>> # df here is the three-column frame, before the stack/unstack step
>>> stations = pd.concat([df.iloc[:, 1], df.iloc[:, 2]]).dropna()
>>> stations.name = 'stations'
>>> stations
2                         15th & P St NW.
0                          14th & V St NW
1                          21st & I St NW
3    Massachusetts Ave & Dupont Circle NW
4             Adams Mill & Columbia Rd NW
Name: stations, dtype: object
>>> df2 = pd.concat([df['Duration'], stations], axis=1)
>>> df2
   Duration                              stations
0      1407                        14th & V St NW
1       509                        21st & I St NW
2       638                       15th & P St NW.
3      1532  Massachusetts Ave & Dupont Circle NW
4       759           Adams Mill & Columbia Rd NW
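Since each row has a value in only one of the two station columns, the same result can probably be reached more directly with Series.combine_first. This is a swapped-in technique that does not appear in the answer above, shown only as a hedged sketch on the sample data from the question; the variable names station and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Duration": [1407, 509, 638, 1532, 759],
    "End station": [np.nan, np.nan, "15th & P St NW.", np.nan, np.nan],
    "Start station": [
        "14th & V St NW",
        "21st & I St NW",
        np.nan,
        "Massachusetts Ave & Dupont Circle NW",
        "Adams Mill & Columbia Rd NW",
    ],
})

# Take "Start station" where it is present, otherwise fall back to "End station".
station = df["Start station"].combine_first(df["End station"]).rename("station")

# Keep Duration and attach the merged column; both share the same row index.
result = df[["Duration"]].assign(station=station)
print(result)
```

If a row could contain both a start and an end station, this sketch would silently keep only the start value, so it fits the data shown here rather than the general case.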

Tags: python, pandas, dataframe, merge, append
