Removing the Background from Scanned Document Images

Latest recommended article published 2023-09-13 20:05:02

wi162yyxq


Article link: https://blog.csdn.net/wi162yyxq/article/details/113241749


Translated from: https://mzucker.github.io/2016/09/20/noteshrink.html

input/output comparison

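Left: input scan @ 300 DPI, 7.2 MB PNG / 790 KB JPG. Right: output @ same resolution, 121 KB PNG.

The result of the algorithm is shown above: foreground content such as text is emphasized, the page background is suppressed, and the file size shrinks as well.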

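1. Identifying the background

Here we only discuss non-uniform backgrounds; a uniform background can be handled with a simple binarization.

Scanned images are usually large. The original scan here is 2,081 x 2,531, a total of 5,267,011 pixels, and processing every one of them would be needlessly expensive. Instead, we randomly select a fraction of the pixels, say 2%, as a representative sample for identifying the background. Let's look at an even smaller subset of 10,000 pixels chosen at random from the original scan: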

random pixels

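import numpy as np
from scipy.cluster.vq import kmeans, vq  # used by get_palette and apply_palette below

def sample_pixels(img, options):

    '''Pick a fixed percentage of pixels in the image, returned in random
order.'''

    # Flatten the image into a list of RGB triples.
    pixels = img.reshape((-1, 3))
    num_pixels = pixels.shape[0]
    num_samples = int(num_pixels*options.sample_fraction)

    # Shuffle all pixel indices and keep the first num_samples of them.
    idx = np.arange(num_pixels)
    np.random.shuffle(idx)

    return pixels[idx[:num_samples]]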
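As a brief usage sketch of the sampling step, the snippet below runs a condensed version of `sample_pixels` on a synthetic image and compares how many dark pixels appear in the full image and in the sample. The synthetic image, the `SimpleNamespace` stand-in for the options object, the `dark_fraction` helper, and its brightness cutoff of 384 are all assumptions made for illustration; only the `sample_fraction` attribute comes from the code above.

```python
from types import SimpleNamespace

import numpy as np


def sample_pixels(img, options):
    """Same idea as the article's sample_pixels, condensed for this sketch."""
    pixels = img.reshape((-1, 3))
    num_samples = int(pixels.shape[0] * options.sample_fraction)
    idx = np.random.permutation(pixels.shape[0])
    return pixels[idx[:num_samples]]


def dark_fraction(pixels):
    """Fraction of pixels whose R+G+B sum is below 384 (an arbitrary cutoff)."""
    sums = pixels.reshape((-1, 3)).astype(int).sum(axis=1)
    return float((sums < 384).mean())


# Hypothetical stand-in for a scan: light paper with one dark horizontal stroke.
img = np.full((200, 300, 3), 240, dtype=np.uint8)
img[50:60, 20:280] = (71, 73, 71)

# Stand-in for the parsed options; only sample_fraction is needed here.
options = SimpleNamespace(sample_fraction=0.02)
samples = sample_pixels(img, options)

print("sampled pixels:", samples.shape[0])
print("dark fraction, full image:", round(dark_fraction(img), 4))
print("dark fraction, sample:    ", round(dark_fraction(samples), 4))
```

The two fractions will usually be close but not identical, since the sample is random.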

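Although the sampled image bears little resemblance to the actual scanned page, the color distribution of the two images is nearly identical. Both are mostly grayish white, with a small number of red, blue, and dark gray pixels. Here are the same 10,000 pixels sorted by brightness (the sum of their R, G, and B intensities):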

random pixels, sorted

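Seen from a distance, the bottom 80-90% of the image looks like a single color. In fact, the most frequent color in the image above, RGB (240, 240, 242), accounts for only 226 of the 10,000 samples, less than 3% of the total.

Because the most frequent exact color covers such a small share of the samples, it is clearly a poor descriptor of the color distribution in the image. To get a more reliable answer, we reduce the bit depth from 8 bits per channel to 4, so that similar pixels fall into the same, larger bin: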

random pixels, sorted, 4 bits per channel

######################################################################

def quantize(image, bits_per_channel=None):

    '''Reduces the number of bits per channel in the given image.'''

    if bits_per_channel is None:
        bits_per_channel = 6

    assert image.dtype == np.uint8

    shift = 8-bits_per_channel
    halfbin = (1 << shift) >> 1

    return ((image.astype(int) >> shift) << shift) + halfbin

######################################################################

def pack_rgb(rgb):

    '''Packs 24-bit RGB triples into single integers,
works on both arrays and tuples.'''

    orig_shape = None

    if isinstance(rgb, np.ndarray):
        assert rgb.shape[-1] == 3
        orig_shape = rgb.shape[:-1]
    else:
        assert len(rgb) == 3
        rgb = np.array(rgb)

    rgb = rgb.astype(int).reshape((-1, 3))

    packed = (rgb[:, 0] << 16 |
              rgb[:, 1] << 8 |
              rgb[:, 2])

    if orig_shape is None:
        return packed
    else:
        return packed.reshape(orig_shape)

######################################################################

def unpack_rgb(packed):

    '''Unpacks a single integer or array of integers into one or more
24-bit RGB values.

    '''

    orig_shape = None

    if isinstance(packed, np.ndarray):
        assert packed.dtype == int
        orig_shape = packed.shape
        packed = packed.reshape((-1, 1))

    rgb = ((packed >> 16) & 0xff,
           (packed >> 8) & 0xff,
           (packed) & 0xff)

    if orig_shape is None:
        return rgb
    else:
        return np.hstack(rgb).reshape(orig_shape + (3,))

######################################################################

def get_bg_color(image, bits_per_channel=None):

    '''Obtains the background color from an image or array of RGB colors
by grouping similar colors into bins and finding the most frequent
one.

    '''

    assert image.shape[-1] == 3

    quantized = quantize(image, bits_per_channel).astype(int)
    packed = pack_rgb(quantized)

    unique, counts = np.unique(packed, return_counts=True)

    packed_mode = unique[counts.argmax()]

    return unpack_rgb(packed_mode)

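Now the most common color has RGB value (224, 224, 224) and accounts for 3,623 (36%) of the sampled pixels. In essence, by reducing the bit depth we group similar pixels into larger "bins", which makes it easier to find a strong peak in the data.

There is a trade-off between reliability and precision: small bins distinguish colors more finely, but larger bins make the peak much easier to find. In the end, I went with 6 bits per channel to identify the background color, which seemed like a reasonable balance between the two extremes.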
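To make the trade-off concrete, here is a minimal sketch that measures how large the most populated bin is at 8, 6, and 4 bits per channel. It runs on synthetic data (noisy off-white "paper" plus some dark "ink"), which is an assumption made for illustration, so the percentages will not match the figures quoted for the real scan. The `mode_fraction` helper is hypothetical and, unlike `quantize` above, does not add the half-bin offset.

```python
import numpy as np


def mode_fraction(pixels, bits_per_channel):
    """Return (most common quantized color, fraction of pixels in its bin)."""
    shift = 8 - bits_per_channel
    # Drop the low-order bits so that nearby colors share a bin.
    q = (pixels.astype(np.int64) >> shift) << shift
    # Pack each RGB triple into one integer so np.unique can count colors.
    packed = (q[:, 0] << 16) | (q[:, 1] << 8) | q[:, 2]
    unique, counts = np.unique(packed, return_counts=True)
    best = int(unique[counts.argmax()])
    color = ((best >> 16) & 0xff, (best >> 8) & 0xff, best & 0xff)
    return color, float(counts.max()) / len(pixels)


# Synthetic stand-in for sampled scan pixels (not the article's data).
rng = np.random.default_rng(0)
paper = rng.normal(loc=(238, 238, 242), scale=4.0, size=(9000, 3))
ink = rng.normal(loc=(71, 73, 71), scale=10.0, size=(1000, 3))
pixels = np.clip(np.vstack((paper, ink)), 0, 255).astype(np.uint8)

fractions = {}
for bits in (8, 6, 4):
    color, frac = mode_fraction(pixels, bits)
    fractions[bits] = frac
    print(f"{bits} bits/channel: mode {color} holds {frac:.1%} of samples")
```

Because the coarser bins are unions of the finer ones, the peak can only grow as the bit depth drops; the cost is that the reported background color becomes less precise.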

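2. Separating the foreground

Once the background color is known, each pixel in the image can be classified as foreground or background according to how similar it is to that color. A natural way to measure the similarity of two colors is the Euclidean distance between their coordinates in RGB space. However, this simple approach fails to segment the colors shown below correctly.

The corresponding Euclidean distances: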

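Color	Where found	R	G	B	Dist. from BG
White	Background	238	238	242	-
Gray	Background	160	168	166	129.4
Black	Foreground	71	73	71	290.4
Red	Foreground	219	83	86	220.7
Pink	Vertical line at left margin	243	179	182	84.3

The pink line, which should be kept, is closer to the background color (84.3) than the gray that should be discarded (129.4), so no single distance threshold can separate the two. We therefore move from RGB space to HSV space.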
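The distances in the table can be reproduced with a few lines of standard-library Python. This is only a worked check of the arithmetic; the RGB values are taken from the table, while the dictionary layout and the `rgb_distance` helper name are illustrative choices.

```python
import math

background = (238, 238, 242)  # "white" row of the table
colors = {
    "gray": (160, 168, 166),
    "black": (71, 73, 71),
    "red": (219, 83, 86),
    "pink": (243, 179, 182),
}


def rgb_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distances = {name: rgb_distance(rgb, background) for name, rgb in colors.items()}
for name, dist in distances.items():
    print(f"{name:5s} {dist:6.1f}")

# Any threshold low enough to keep pink as foreground also keeps gray.
print("pink closer to background than gray:", distances["pink"] < distances["gray"])
```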

diagram of HSV space

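Color	Value	Saturation	Value diff. from BG	Sat. diff. from BG
White	0.949	0.017	-	-
Gray	0.659	0.048	0.290	0.031
Black	0.286	0.027	0.663	0.011
Red	0.859	0.621	0.090	0.604
Pink	0.953	0.263	0.004	0.247

White, black, and gray differ widely in value but share similarly low saturation, far below that of red or pink. With the extra information HSV provides, a pixel can be labeled as foreground if either of the following holds:

its value differs from the background's by more than 0.3, or
its saturation differs from the background's by more than 0.2

The first criterion picks up the black pen strokes, the second the red ink and the pink line. Both criteria correctly keep the gray out of the foreground. Different images may need different saturation and value thresholds.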
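As a quick check of the rule, the sketch below applies it to the five colors from the tables using the standard-library `colorsys` module. The thresholds 0.3 and 0.2 come from the text; the helper names `sat_val` and `is_foreground` are hypothetical, and the comparison uses `>=` to mirror the article's `get_fg_mask`.

```python
import colorsys


def sat_val(rgb):
    """Return (saturation, value) of an 8-bit RGB triple, each in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    return s, v


def is_foreground(rgb, bg, value_threshold=0.3, sat_threshold=0.2):
    """Foreground if value or saturation differs enough from the background."""
    s_bg, v_bg = sat_val(bg)
    s, v = sat_val(rgb)
    return abs(v - v_bg) >= value_threshold or abs(s - s_bg) >= sat_threshold


background = (238, 238, 242)
colors = {
    "white": (238, 238, 242),
    "gray": (160, 168, 166),
    "black": (71, 73, 71),
    "red": (219, 83, 86),
    "pink": (243, 179, 182),
}

labels = {name: is_foreground(rgb, background) for name, rgb in colors.items()}
for name, fg in labels.items():
    s, v = sat_val(colors[name])
    print(f"{name:5s} S={s:.3f} V={v:.3f} -> {'foreground' if fg else 'background'}")
```

Note that gray sits just under the value threshold (a difference of about 0.29), which is one reason the thresholds may need tuning for other scans.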

def get_fg_mask(bg_color, samples, options):

    '''Determine whether each pixel in a set of samples is foreground by
comparing it to the background color. A pixel is classified as a
foreground pixel if either its value or saturation differs from the
background by a threshold.'''

    s_bg, v_bg = rgb_to_sv(bg_color)
    s_samples, v_samples = rgb_to_sv(samples)

    s_diff = np.abs(s_bg - s_samples)
    v_diff = np.abs(v_bg - v_samples)

    return ((v_diff >= options.value_threshold) |
            (s_diff >= options.sat_threshold))

######################################################################

def rgb_to_sv(rgb):

    '''Convert an RGB image or array of RGB colors to saturation and
value, returning each one as a separate 32-bit floating point array or
value.

    '''

    if not isinstance(rgb, np.ndarray):
        rgb = np.array(rgb)

    axis = len(rgb.shape)-1
    cmax = rgb.max(axis=axis).astype(np.float32)
    cmin = rgb.min(axis=axis).astype(np.float32)
    delta = cmax - cmin

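    # Pure-black pixels have cmax == 0; suppress the divide-by-zero warning
    # and define their saturation as 0.
    with np.errstate(divide='ignore', invalid='ignore'):
        saturation = delta / cmax
    saturation = np.where(cmax == 0, 0, saturation)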

    value = cmax/255.0

    return saturation, value

######################################################################

def get_palette(samples, options, return_mask=False, kmeans_iter=40):

    '''Extract the palette for the set of sampled RGB values. The first
palette entry is always the background color; the rest are determined
from foreground pixels by running K-means clustering. Returns the
palette, as well as a mask corresponding to the foreground pixels.

    '''

    if not options.quiet:
        print('  getting palette...')

    bg_color = get_bg_color(samples, 6)

    fg_mask = get_fg_mask(bg_color, samples, options)

    centers, _ = kmeans(samples[fg_mask].astype(np.float32),
                        options.num_colors-1,
                        iter=kmeans_iter)

    palette = np.vstack((bg_color, centers)).astype(np.uint8)

    if not return_mask:
        return palette
    else:
        return palette, fg_mask

Finally, the palette is applied to the source image to extract the foreground pixels:

######################################################################

def apply_palette(img, palette, options):

    '''Apply the palette to the given image. The first step is to set all
background pixels to the background color; then, nearest-neighbor
matching is used to map each foreground color to the closest one in
the palette.

    '''

    if not options.quiet:
        print('  applying palette...')

    bg_color = palette[0]

    fg_mask = get_fg_mask(bg_color, img, options)

    orig_shape = img.shape

    pixels = img.reshape((-1, 3))
    fg_mask = fg_mask.flatten()

    num_pixels = pixels.shape[0]

    labels = np.zeros(num_pixels, dtype=np.uint8)

    labels[fg_mask], _ = vq(pixels[fg_mask], palette)

    return labels.reshape(orig_shape[:-1])
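The nearest-neighbor matching step can also be written out directly. The sketch below is a NumPy-only stand-in for what the `vq` call does for the index lookup, not the article's implementation; the function name `nearest_palette_index`, the three-color palette, and the test pixels are made up for illustration.

```python
import numpy as np


def nearest_palette_index(pixels, palette):
    """For each pixel, return the index of the closest palette color."""
    # Broadcast to shape (num_pixels, num_colors, 3), then reduce over RGB.
    diff = (pixels[:, None, :].astype(np.float64)
            - palette[None, :, :].astype(np.float64))
    sq_dist = (diff ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


# Hypothetical palette: entry 0 is the background, the rest are ink colors.
palette = np.array([[238, 238, 242],
                    [71, 73, 71],
                    [219, 83, 86]], dtype=np.uint8)

pixels = np.array([[70, 70, 75],      # near black
                   [210, 90, 90],     # near red
                   [240, 240, 240]],  # near the background
                  dtype=np.uint8)

indices = nearest_palette_index(pixels, palette)
print(indices.tolist())
```

Casting to float before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.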

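In short, the approach is to reduce the bit depth of each channel and count the pixels in each color bin to pick the background color, then compare each pixel's saturation and value in HSV with those of the background, and finally extract the foreground.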