Python pandas: removing duplicate columns

weixin_39846191


What is the easiest way to remove duplicate columns from a dataframe?

I am reading a text file that has duplicate columns via:

import pandas as pd

df=pd.read_table(fname)

The column names are:

Time, Time Relative, N2, Time, Time Relative, H2, etc...

All the Time and Time Relative columns contain the same data. I want:

Time, Time Relative, N2, H2

All my attempts at dropping, deleting, etc., such as:

df=df.T.drop_duplicates().T

Result in uniquely valued index errors:

Reindexing only valid with uniquely valued index objects

Sorry for being a Pandas noob. Any suggestions would be appreciated.

Additional Details

Pandas version: 0.9.0

Python Version: 2.7.3

Windows 7

(installed via Pythonxy 2.7.3.0)

data file (note: in the real file, columns are separated by tabs, here they are separated by 4 spaces):

Time    Time Relative [s]    N2[%]    Time    Time Relative [s]    H2[ppm]
2/12/2013 9:20:55 AM    6.177    9.99268e+001    2/12/2013 9:20:55 AM    6.177    3.216293e-005
2/12/2013 9:21:06 AM    17.689    9.99296e+001    2/12/2013 9:21:06 AM    17.689    3.841667e-005
2/12/2013 9:21:18 AM    29.186    9.992954e+001    2/12/2013 9:21:18 AM    29.186    3.880365e-005
... etc ...
2/12/2013 2:12:44 PM    17515.269    9.991756+001    2/12/2013 2:12:44 PM    17515.269    2.800279e-005
2/12/2013 2:12:55 PM    17526.769    9.991754e+001    2/12/2013 2:12:55 PM    17526.769    2.880386e-005
2/12/2013 2:13:07 PM    17538.273    9.991797e+001    2/12/2013 2:13:07 PM    17538.273    3.131447e-005

Solution

There is a one-line solution. It applies when some column names are duplicated and you want to drop the repeats:

df = df.loc[:,~df.columns.duplicated()]

How it works:

Suppose the columns of the data frame are ['alpha', 'beta', 'alpha'].

df.columns.duplicated() returns a boolean array with one entry per column: False if the column name has not appeared before, True if it repeats an earlier name. For the example above, the result is [False, False, True].

Pandas supports boolean indexing, which selects only the positions marked True. Since we want to keep the non-duplicated columns, the array has to be inverted with ~ (i.e. ~[False, False, True] gives [True, True, False]).

Finally, df.loc[:, [True, True, False]] selects only the non-duplicated columns using this boolean indexing.
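The steps above can be traced on a small frame. This is only a sketch: the cell values are made up for illustration, and only the column names ['alpha', 'beta', 'alpha'] come from the explanation.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the example in the text.
df = pd.DataFrame([[1, 2, 1], [3, 4, 3]], columns=["alpha", "beta", "alpha"])

# Step 1: flag every column whose name already appeared further left.
mask = df.columns.duplicated()
print(mask.tolist())  # [False, False, True]

# Step 2: invert the mask so True marks the columns to keep.
keep = ~mask

# Step 3: boolean-index the columns with .loc.
deduped = df.loc[:, keep]
print(list(deduped.columns))  # ['alpha', 'beta']
```

The first occurrence of each name is the one that survives, because duplicated() marks only the later repeats by default.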

Note: this only checks column names, not column values.
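If the values matter too, one possible approach is to drop a column only when both its name and its contents repeat an earlier column. The helper below is a hypothetical sketch, not part of pandas and not from the original answer. It compares columns by position, so it does not depend on the labels being unique.

```python
import pandas as pd


def drop_repeated_columns(df):
    """Hypothetical helper: keep the first of any columns that share
    both a name and identical values with an earlier kept column."""
    keep = []  # positions of the columns to retain
    for i in range(df.shape[1]):
        col = df.iloc[:, i]
        is_repeat = any(
            df.columns[j] == df.columns[i] and df.iloc[:, j].equals(col)
            for j in keep
        )
        if not is_repeat:
            keep.append(i)
    return df.iloc[:, keep]


# Made-up data: the third "Time" column has the same name but different values.
df = pd.DataFrame(
    [[1, 5, 1, 9], [2, 6, 2, 9]],
    columns=["Time", "N2", "Time", "Time"],
)
result = drop_repeated_columns(df)
print(list(result.columns))  # ['Time', 'N2', 'Time']
```

Here the second "Time" column is dropped because it matches the first exactly, while the last one is kept because its values differ. The pairwise comparison is quadratic in the number of columns, which is fine for a handful of columns but may be slow for very wide frames.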
