Filling missing values by interpolation in Python: interpolating missing values in a 2D array

I have a 2D array (or matrix, if you prefer) with some missing values represented as NaN. The missing values are typically in a strip along one axis, e.g.:

1 2 3   NaN 5
2 3 4   NaN 6
3 4 NaN NaN 7
4 5 NaN NaN 8
5 6 7   8   9

where I would like to replace the NaNs with reasonably sensible numbers.

I looked into Delaunay triangulation, but found very little documentation.

I tried using astropy's convolve, as it supports 2D arrays and is quite straightforward.

The problem with this is that convolution is not interpolation: it moves all values towards the average (which could be mitigated by using a narrow kernel).
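
To make the smoothing effect concrete, here is a minimal sketch of a NaN-aware box filter. It is not astropy's convolve: it uses scipy.ndimage.uniform_filter as a stand-in, and the name nan_box_smooth and the small test array are made up for illustration. It shows a known value being pulled towards the local average, and one possible workaround, which is to take the smoothed value only at the NaN cells.

```python
import numpy as np
from scipy import ndimage


def nan_box_smooth(arr, size=3):
    """Average of the finite values in each size x size window (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    valid = np.isfinite(arr)
    data = np.where(valid, arr, 0.0)
    # Window mean of the data (NaN counted as 0) and of the validity mask.
    num = ndimage.uniform_filter(data, size=size, mode='constant', cval=0.0)
    den = ndimage.uniform_filter(valid.astype(float), size=size, mode='constant', cval=0.0)
    with np.errstate(invalid='ignore', divide='ignore'):
        out = num / den
    out[den < 1e-12] = np.nan  # windows that contain no finite value stay NaN
    return out


arr = np.array([[1.0, 1.0, 1.0],
                [1.0, 10.0, 1.0],
                [1.0, 1.0, np.nan]])

smoothed = nan_box_smooth(arr)
# Smoothing changes every cell, including the known centre value of 10.
print(arr[1, 1], '->', smoothed[1, 1])

# Keep the known values and use the smoothed value only where data was missing.
patched = np.where(np.isfinite(arr), arr, smoothed)
print(patched)
```

The patched array leaves the known values untouched, but the filled cells are still local averages rather than interpolated values, so the choice of window size matters.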

This question should be the natural 2-dimensional extension to this post. Is there a way to interpolate over NaN/missing values in a 2d-array?

Solution

Yes, you can use scipy.interpolate.griddata together with a masked array. The method argument selects the type of interpolation; 'cubic' usually does an excellent job:

import numpy as np
from scipy import interpolate

# Let's create some random data: integers from 0 to 10 inclusive
array = np.random.randint(0, 11, (10, 10)).astype(float)

# Values greater than 7 become np.nan
array[array > 7] = np.nan

Displayed with plt.imshow(array, interpolation='nearest'), the array looks something like this:

x = np.arange(0, array.shape[1])
y = np.arange(0, array.shape[0])

# Mask invalid values
array = np.ma.masked_invalid(array)
xx, yy = np.meshgrid(x, y)

# Get only the valid values
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]

GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                           (xx, yy),
                           method='cubic')

This is the final result:

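Note that NaN values at the edges that are themselves surrounded by NaN values can't be interpolated and are kept as NaN, because they fall outside the convex hull of the valid points. You can change this using the fill_value argument, which applies to the 'linear' and 'cubic' methods.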
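
As a rough sketch of the whole procedure, the function below interpolates only at the NaN cells and then, as one possible alternative to fill_value, fills whatever is still NaN with the nearest valid value. The name fill_nan and its fallback parameter are assumptions made for this example, not part of SciPy, and the test data is the matrix from the question.

```python
import numpy as np
from scipy import interpolate


def fill_nan(arr, method='linear', fallback='nearest'):
    """Return a copy of arr with NaN cells replaced by interpolated values (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    yy, xx = np.indices(arr.shape)  # row and column index of every cell
    known = np.isfinite(arr)
    filled = arr.copy()
    if known.all():
        return filled
    points = (xx[known], yy[known])
    # First pass: interpolate only at the missing cells, so known values stay untouched.
    filled[~known] = interpolate.griddata(points, arr[known],
                                          (xx[~known], yy[~known]),
                                          method=method)
    # Second pass: cells outside the convex hull of the known points are still NaN.
    still_missing = np.isnan(filled)
    if fallback is not None and still_missing.any():
        filled[still_missing] = interpolate.griddata(points, arr[known],
                                                     (xx[still_missing], yy[still_missing]),
                                                     method=fallback)
    return filled


nan = np.nan
grid = np.array([[1, 2, 3, nan, 5],
                 [2, 3, 4, nan, 6],
                 [3, 4, nan, nan, 7],
                 [4, 5, nan, nan, 8],
                 [5, 6, 7, 8, 9]])

filled = fill_nan(grid)
print(filled)
```

Evaluating griddata only at the missing cells keeps the known values exactly as they were. Passing fallback=None leaves the cells outside the convex hull as NaN.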

How would this work if there is a 3x3 region of NaN values? Would you get sensible data for the middle point?

It depends on your kind of data, so you have to run some tests. For instance, you could deliberately mask some good data, try different kinds of interpolation (cubic, linear, etc.) on the array with the masked values, and calculate the difference between the interpolated values and the original values you masked. Then see which method gives the smallest difference.

You can use something like this:

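# Keep a copy of a block of known values, then hide the block
reference = array[3:6, 3:6].copy()
array[3:6, 3:6] = np.ma.masked

# Rebuild the valid points so that the hidden block is excluded
x1 = xx[~array.mask]
y1 = yy[~array.mask]
newarr = array[~array.mask]

methods = ['linear', 'nearest', 'cubic']
for method in methods:
    GD1 = interpolate.griddata((x1, y1), newarr.ravel(),
                               (xx, yy),
                               method=method)
    meandifference = np.mean(np.abs(reference - GD1[3:6, 3:6]))
    print('%s interpolation difference: %s' % (method, meandifference))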

That gives something like this:

linear interpolation difference: 4.88888888889

nearest interpolation difference: 4.11111111111

cubic interpolation difference: 5.99400137377

Of course, this is for random numbers, so it's normal that the results vary a lot. The best thing to do is to test on a deliberately masked piece of your own dataset and see what happens.
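
The same test can be wrapped in a small reusable function. This is only a sketch: holdout_error is a made-up name, and the smooth synthetic surface stands in for real data, so the numbers it prints say nothing about how the methods rank on your dataset.

```python
import numpy as np
from scipy import interpolate


def holdout_error(arr, rows, cols, method):
    """Mean absolute error when arr[rows, cols] is hidden and re-interpolated (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    yy, xx = np.indices(arr.shape)
    hidden = np.zeros(arr.shape, dtype=bool)
    hidden[rows, cols] = True
    hidden &= np.isfinite(arr)  # only score cells whose true value is known
    known = np.isfinite(arr) & ~hidden
    estimate = interpolate.griddata((xx[known], yy[known]), arr[known],
                                    (xx[hidden], yy[hidden]),
                                    method=method)
    # nanmean skips hidden cells that the method could not fill.
    return np.nanmean(np.abs(estimate - arr[hidden]))


# Smooth synthetic surface (an assumption for this demo; substitute your own array).
y, x = np.indices((10, 10))
surface = 2.0 * x + 3.0 * y

block = (slice(3, 6), slice(3, 6))
errors = {m: holdout_error(surface, *block, method=m)
          for m in ('linear', 'nearest', 'cubic')}
for name, err in errors.items():
    print('%s interpolation difference: %.4f' % (name, err))
```

Because the demo surface is a plane, linear interpolation recovers it almost exactly while nearest-neighbour cannot. Repeating the test with several hidden blocks taken from real data gives a more useful comparison.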
