python的dropna subset,Dropna没有删除，fillna没有填充，我的列表理解不能理解如何摆脱nans（python）...

最新推荐文章于 2024-04-21 08:30:00 发布

z3精品咖啡

最新推荐文章于 2024-04-21 08:30:00 发布

阅读量323

点赞数

文章标签： python的dropna subset

I have a case where I am adding data from one dataframe to another, but I can't rid of the nan values.

Example data

df1 = pd.DataFrame(

{

'Journal' : ['US Drug standards.','Acta veterinariae.','Bulletin of big toe science.','The UK journal of dermatology.'],

'ISSN_1': ['0096-0225', '0567-8315','0007-4977','0007-0963'],

'ISSN_2': ['0096-0225','nan','0007-4977','0007-0963'],

'ISSN_3': ['nan','1820-7448','nan','0366-077X'],

'ISSN_4': ['nan','0567-8315','nan','1365-2133']

}

)

df1 = df1[['Journal'] + df1.columns[:-1].tolist()]

df2 = pd.DataFrame(

{

'Full Journal Title': ['Drug standards.','Acta veterinaria.','Bulletin of marine science.','The British journal of dermatology.'],

'Abbreviated Title': ['DStan','Avet','Marsci','BritSkin'],

'Total Cites': ['223','444','324','166'],

'ISSN': ['0096-0225','0567-8315','0007-4977','0007-0963']

})

#this makes list of ISSNs from df1 to combine into a column to add to df2

xx=df1.set_index('Journal').values.tolist()

df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )

df2=df2[df2.New.apply(len)>0]

df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

I have tried a replace:

df2 = df2.replace(np.nan, '', regex=True)

I have tried dropna:

print(df2.dropna(subset=['New']))

I have tried fillna:

print(df2.fillna(''))

I have tried a replace list comprehension:

xx = [value for value in xx if str(value) != 'nan']

No matter what I try, the "New" column is still full of nans.

0 0096-0225,0096-0225,nan,nan

1 0567-8315,nan,1820-7448,0567-8315

2 0007-4977,0007-4977,nan,nan

3 0007-0963,0007-0963,0366-077X,1365-2133

I want them skipped or dropped. I only want the valid ISSNs.

Thanks in advance for the help.

解决方案

There are a few things going on here. The first is that the question shows that 'nan' is in the dataframe, however the comment suggests that this should actually be nan (string versus null).

The second is that you are storing lists, and then strings of those lists in a dataframe which is typically discouraged - for precisely the reason you are running into - there is often unexpected behavior.

I will address the question as it was posed although you should be able to adapt this to nans as well

The code that is causing the issue is:

xx=df1.set_index('Journal').values.tolist()

df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )

df2=df2[df2.New.apply(len)>0]

df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

The second line here is adding all of the values in xx to df2['New'] which contains 'nan' then subsequent lines turn these into a list and then a string. Once those values exist in a string or list you are not going to be able access them with normal pandas methods.

My suggestion would be to remove them from xx and then they won't show up in df2 at all:

xx=df1.set_index('Journal').values.tolist()

#get rid of nans here

xx=[[y for y in x if y != 'nan'] for x in xx]

df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )

df2=df2[df2.New.apply(len)>0]

df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

Note the second line here is removing the 'nan's at time when they are easily accessible.

This should get you what you need, though once again I would caution against storing lists in dataframes if possible and be sure to use nan and not 'nan'. Hope this helps!

z3精品咖啡

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python的dropna subset,Dropna没有删除，fillna没有填充，我的列表理解不能理解如何摆脱nans（python）...

I have a case where I am adding data from one dataframe to another, but I can't rid of the nan values.Example datadf1 = pd.DataFrame({'Journal' : ['US Drug standards.','Acta veterinariae.','Bulletin o...
复制链接

扫一扫