python的dropna subset,Dropna没有删除,fillna没有填充,我的列表理解不能理解如何摆脱nans(python)...

I have a case where I am adding data from one dataframe to another, but I can't rid of the nan values.

Example data

df1 = pd.DataFrame(

{

'Journal' : ['US Drug standards.','Acta veterinariae.','Bulletin of big toe science.','The UK journal of dermatology.'],

'ISSN_1': ['0096-0225', '0567-8315','0007-4977','0007-0963'],

'ISSN_2': ['0096-0225','nan','0007-4977','0007-0963'],

'ISSN_3': ['nan','1820-7448','nan','0366-077X'],

'ISSN_4': ['nan','0567-8315','nan','1365-2133']

}

)

df1 = df1[['Journal'] + df1.columns[:-1].tolist()]

df2 = pd.DataFrame(

{

'Full Journal Title': ['Drug standards.','Acta veterinaria.','Bulletin of marine science.','The British journal of dermatology.'],

'Abbreviated Title': ['DStan','Avet','Marsci','BritSkin'],

'Total Cites': ['223','444','324','166'],

'ISSN': ['0096-0225','0567-8315','0007-4977','0007-0963']

})

#this makes list of ISSNs from df1 to combine into a column to add to df2

xx=df1.set_index('Journal').values.tolist()

df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )

df2=df2[df2.New.apply(len)>0]

df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

I have tried a replace:

df2 = df2.replace(np.nan, '', regex=True)

I have tried dropna:

print(df2.dropna(subset=['New']))

I have tried fillna:

print(df2.fillna(''))

I have tried a replace list comprehension:

xx = [value for value in xx if str(value) != 'nan']

No matter what I try, the "New" column is still full of nans.

0 0096-0225,0096-0225,nan,nan

1 0567-8315,nan,1820-7448,0567-8315

2 0007-4977,0007-4977,nan,nan

3 0007-0963,0007-0963,0366-077X,1365-2133

I want them skipped or dropped. I only want the valid ISSNs.

Thanks in advance for the help.

解决方案

There are a few things going on here. The first is that the question shows that 'nan' is in the dataframe, however the comment suggests that this should actually be nan (string versus null).

The second is that you are storing lists, and then strings of those lists in a dataframe which is typically discouraged - for precisely the reason you are running into - there is often unexpected behavior.

I will address the question as it was posed although you should be able to adapt this to nans as well

The code that is causing the issue is:

xx=df1.set_index('Journal').values.tolist()

df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )

df2=df2[df2.New.apply(len)>0]

df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

The second line here is adding all of the values in xx to df2['New'] which contains 'nan' then subsequent lines turn these into a list and then a string. Once those values exist in a string or list you are not going to be able access them with normal pandas methods.

My suggestion would be to remove them from xx and then they won't show up in df2 at all:

xx=df1.set_index('Journal').values.tolist()

#get rid of nans here

xx=[[y for y in x if y != 'nan'] for x in xx]

df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )

df2=df2[df2.New.apply(len)>0]

df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

Note the second line here is removing the 'nan's at time when they are easily accessible.

This should get you what you need, though once again I would caution against storing lists in dataframes if possible and be sure to use nan and not 'nan'. Hope this helps!

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值