Python Pandas: dropping multiple columns by passing a list

I tried searching for the answer to this question but was not able to find it... so here it goes.

I have a dataset with 23987 columns. I only want the information in 35 of those columns, which are spread out across the dataset. I have put the names of these 35 columns in a list. Is there a quick way to drop all the columns except those by passing the list?

I tried this:

df1.drop(df1.columns.difference([ALTJ_genes]), axis=1, inplace=True)

ALTJ_genes is the list with the 35 items. The error I get is:

TypeError: unhashable type: 'list'

Is there a way to do this? I know I can reach my goal by passing the individual columns, but I want to know whether it is possible with the list. This would make the code much clearer.

In any case, thanks!

EDIT: I am adding some screenshots in case they are useful.

[screenshot: zD9CL.png]

[screenshot: ttkTB.png]

Now, this is the complete error I get when passing the list with all the genes.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in
----> 1 df1[ALTJ_genes]

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2984             if is_iterator(key):
   2985                 key = list(key)
-> 2986             indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
   2987
   2988         # take() does not accept boolean indexers

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _convert_to_indexer(self, obj, axis, is_setter, raise_missing)
   1283                 # When setting, missing keys are not allowed, even with .loc:
   1284                 kwargs = {"raise_missing": True if is_setter else raise_missing}
-> 1285                 return self._get_listlike_indexer(obj, axis, **kwargs)[1]
   1286         else:
   1287             try:

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1090
   1091         self._validate_read_indexer(
-> 1092             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1093         )
   1094         return keyarr, indexer

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1175                 raise KeyError(
   1176                     "None of [{key}] are in the [{axis}]".format(
-> 1177                         key=key, axis=self.obj._get_axis_name(axis)
   1178                     )
   1179                 )

KeyError: "None of [Index([ ('APEX1',), ('ASF1A',), ('CDKN2D',), ('CIB1',), ('DNA2',),\n ('FAAP24',), ('FANCM',), ('GEN1',), ('HRAS',), ('LIG1',),\n ('LIG3',), ('MEN1',), ('MRE11',), ('MSH3',), ('MSH6',),\n ('NUDT1',), ('MTOR',), ('NABP2',), ('NTHL1',), ('PALB2',),\n ('PARP1',), ('PARP3',), ('POLA1',), ('POLM',), ('POLQ',),\n ('PRPF19',), ('RAD51D',), ('RBBP8',), ('RRM2',), ('RUVBL2',),\n ('SOD1',), ('KAT5',), ('UNG',), ('WRN',), ('XRCC1',)],\n dtype='object', name='Gene_Name')] are in the [columns]"

Solution

I think you need to remove the [], because ALTJ_genes is already a list, so [ALTJ_genes] is a nested list:

df1.drop(df1.columns.difference(ALTJ_genes), axis=1, inplace=True)

But it is simpler to select the columns by the list:

df1 = df1[ALTJ_genes]
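To compare the two approaches side by side, here is a minimal sketch on a toy frame. The column names and the `keep` list are made-up stand-ins for the real data and for ALTJ_genes, and it assumes the columns are a plain flat Index:

```python
import pandas as pd

# Toy frame; the real one has 23987 columns.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["APEX1", "ASF1A", "CDKN2D", "AAA", "BBB"],
)

# Hypothetical stand-in for ALTJ_genes: a flat list of column names.
keep = ["APEX1", "CDKN2D"]

# Approach 1: drop every column that is not in the list.
dropped = df.drop(columns=df.columns.difference(keep))

# Approach 2: select the wanted columns directly.
selected = df[keep]

print(dropped.columns.tolist())
print(selected.columns.tolist())
```

One difference worth knowing: the drop version keeps the original column order and silently ignores names in the list that are not in the frame, while df[keep] follows the order of the list and raises a KeyError if a name is missing.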

EDIT:

I think the problem is that the columns were defined with a nested list, so you get a non-standard one-level MultiIndex:

import pandas as pd

df1 = pd.DataFrame([[1,2,3,4]])
# nested list
df1.columns = [['APEX1', 'ASF1A', 'CDKN2D', 'AAA']]
print (df1)
  APEX1 ASF1A CDKN2D AAA
0     1     2      3   4

print (df1.columns)
MultiIndex([( 'APEX1',),
            ( 'ASF1A',),
            ('CDKN2D',),
            (   'AAA',)],
           )

If you pass a non-nested list instead:

df1 = pd.DataFrame([[1,2,3,4]])
# not a nested list
df1.columns = ['APEX1', 'ASF1A', 'CDKN2D', 'AAA']
print (df1)
   APEX1  ASF1A  CDKN2D  AAA
0      1      2       3    4

print (df1.columns)
Index(['APEX1', 'ASF1A', 'CDKN2D', 'AAA'], dtype='object')
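If the frame already has the one-level MultiIndex and you cannot change how the columns were assigned, one possible repair is to flatten the columns before selecting. This is only a sketch on the toy frame from above; `wanted` is a hypothetical stand-in for ALTJ_genes, and it assumes the MultiIndex really has a single level:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4]])
# Nested list: produces a one-level MultiIndex, as shown above.
df1.columns = [["APEX1", "ASF1A", "CDKN2D", "AAA"]]

print(type(df1.columns).__name__, df1.columns.nlevels)

# Flatten: take the only level as a plain Index of strings.
flat = df1.copy()
flat.columns = df1.columns.get_level_values(0)

# Hypothetical stand-in for ALTJ_genes.
wanted = ["APEX1", "CDKN2D"]

# Selecting by a flat list of names now works.
subset = flat[wanted]
print(subset.columns.tolist())
```

Checking type(df1.columns) first is a quick way to confirm that this is the cause of the KeyError before changing anything.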
