CBIR: Texture Features

QBE Using Texture Features

"Texture - ...(in extended use) the constitution, structure, or substance of anything with regard to its constituents or formative elements." (The Oxford Dictionary, 1971; 1989).

"Texture - ...a basic scheme or structure; the overall structure of something incorporating all of most of parts." (Webster's Dictionary, 1959; 1986).

Texture is a very general notion that can be attributed to almost everything in nature. For a human, texture relates mostly to a specific, spatially repetitive (micro)structure of surfaces, formed by repeating one particular element or several elements in different relative spatial positions. Generally, the repetition involves local variations of scale, orientation, or other geometric and optical features of the elements.

Image textures are defined as images of natural textured surfaces and artificially created visual patterns, which approach, within certain limits, these natural objects. Image sensors yield additional geometric and optical transformations of the perceived surfaces, and these transformations should not affect the particular class of textures to which the surface belongs.

[Figure: examples of natural image textures - Flowers0001, Flowers0002, Leaves0011, Metal0002 - from the VisTex database (MIT Media Lab., USA), together with textures synthesised using a generic Gibbs random field (GGRF) model with multiple pairwise pixel interactions.]

It is almost impossible to describe textures in words, although each human definition involves various informal qualitative structural features, such as fineness - coarseness, smoothness, granularity, lineation, directionality, roughness, regularity - randomness, and so on. These features, which define a spatial arrangement of texture constituents, help to single out the desired texture types, e.g. fine or coarse, close or loose, plain or twilled or ribbed textile fabrics. It is difficult to use human classifications as a basis for formal definitions of image textures, because there is no obvious way of associating these features, easily perceived by human vision, with the computational models that aim to describe the textures. Nonetheless, after several decades of research and development in texture analysis and synthesis, a variety of computational characteristics and properties for indexing and retrieving textures have been found. The textural features describe local arrangements of image signals in the spatial domain or in the domain of the Fourier or other spectral transforms. In many cases, the textural features follow from a particular random field model of textured images (Castelli & Bergman, 2002).

Texture Features and Co-occurrence Matrices

Many statistical texture features are based on co-occurrence matrices representing second-order statistics of grey levels in pairs of pixels in an image. The matrices are sufficient statistics of a Markov/Gibbs random field with multiple pairwise pixel interactions.

A co-occurrence matrix shows how frequently each particular pair of grey levels occurs in pixel pairs separated by a given distance d along a given direction α.

Let g = (gx,y : x = 1, ..., M; y = 1, ..., N) be a digital image, and let Q = {0, ..., qmax} be the set of grey levels. The co-occurrence matrix for a given inter-pixel distance d and directional angle α is defined as

COOCm,n(g) = [ COOCm,n(q, s | g) : q, s = 0, ..., qmax ],   where m = d·cos α and n = d·sin α

and COOCm,n(q, s | g) is the cardinality of the set Cm,n of pixel pairs [(x, y), (x + m, y + n)] such that gx,y = q and gx+m,y+n = s.
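As an illustrative sketch (not from the source), the counting just defined can be implemented directly; the tiny image, offset, and grey-level range below are arbitrary examples:

```python
import numpy as np

def cooccurrence_matrix(g, d, angle, q_max):
    """Count grey-level pairs (q, s) at the offset (m, n) = (d cos a, d sin a)."""
    m = int(round(d * np.cos(angle)))
    n = int(round(d * np.sin(angle)))
    cooc = np.zeros((q_max + 1, q_max + 1), dtype=np.int64)
    X, Y = g.shape
    for x in range(X):
        for y in range(Y):
            x2, y2 = x + m, y + n
            if 0 <= x2 < X and 0 <= y2 < Y:  # count only pairs inside the image
                cooc[g[x, y], g[x2, y2]] += 1
    return cooc

# A tiny 4x4 image with grey levels 0..3; pairs at offset (1, 0), i.e. d = 1, a = 0.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
C = cooccurrence_matrix(img, d=1, angle=0.0, q_max=3)
```

With 4 columns and 3 valid row transitions, the matrix accumulates 12 pair counts in total.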


Various statistical and information-theoretic properties of the co-occurrence matrices can serve as textural features (e.g., the homogeneity, coarseness, and periodicity features introduced by Haralick). However, these features are expensive to compute and have not proved very effective for image classification and retrieval (Castelli & Bergman, 2002).

Tamura's Texture Features

Today's CBIR systems use in most cases the set of six visual features, namely,

  • coarseness,
  • contrast,
  • directionality,
  • linelikeness,
  • regularity,
  • roughness
selected by Tamura, Mori, and Yamawaki (Tamura et al., 1978; Castelli & Bergman, 2002) on the basis of psychological experiments.

Coarseness relates to distances of notable spatial variations of grey levels, that is, implicitly, to the size of the primitive elements (texels) forming the texture. The proposed computational procedure accounts for differences between the average signals for the non-overlapping windows of different size:

  1. At each pixel (x,y), compute six averages for the windows of size 2^k × 2^k, k = 0, 1, ..., 5, around the pixel.
  2. At each pixel, compute absolute differences Ek(x,y) between the pairs of non-overlapping averages in the horizontal and vertical directions.
  3. At each pixel, find the value of k that maximises the difference Ek(x,y) in either direction and set the best size Sbest(x,y) = 2^k.
  4. Compute the coarseness feature Fcrs by averaging Sbest(x,y) over the entire image.
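The four steps above can be sketched directly (a slow, illustrative implementation; the window clipping at the image border and the small k_max are choices made here, not prescribed by the source):

```python
import numpy as np

def tamura_coarseness(g, k_max=3):
    """Direct (slow) sketch of the four-step Tamura coarseness procedure."""
    g = np.asarray(g, dtype=float)
    X, Y = g.shape

    def avg(x, y, k):
        # Step 1: mean over the 2^k x 2^k window around (x, y), clipped at the border.
        h = 2 ** (k - 1)
        x0, x1 = max(0, x - h), min(X, x + h)
        y0, y1 = max(0, y - h), min(Y, y + h)
        return g[x0:x1, y0:y1].mean()

    s_best = np.empty((X, Y))
    for x in range(X):
        for y in range(Y):
            best_e, best_k = -1.0, 1
            for k in range(1, k_max + 1):
                h = 2 ** (k - 1)
                # Step 2: differences between non-overlapping windows on opposite sides.
                e_h = abs(avg(min(X - 1, x + h), y, k) - avg(max(0, x - h), y, k))
                e_v = abs(avg(x, min(Y - 1, y + h), k) - avg(x, max(0, y - h), k))
                e = max(e_h, e_v)
                if e > best_e:              # Step 3: the size maximising the difference
                    best_e, best_k = e, k
            s_best[x, y] = 2 ** best_k
    return s_best.mean()                    # Step 4: average S_best over the image
```

On a fine checkerboard the small windows win everywhere, while on a blocky pattern larger windows dominate, so the measure ranks the blocky texture as coarser.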

Instead of the average of Sbest(x,y), an improved coarseness feature, able to deal with textures that have multiple coarseness properties, is a histogram characterising the whole distribution of the best sizes over the image (Castelli & Bergman, 2002).

Contrast measures how the grey levels q, q = 0, 1, ..., qmax, vary in the image g and to what extent their distribution is biased towards black or white. The contrast is defined through the second-order and normalised fourth-order central moments of the grey level histogram (empirical probability distribution), that is, the variance σ² and the kurtosis α4:

Fcon = σ / (α4)^n,   where α4 = μ4 / σ⁴

Here, μ4 is the fourth-order central moment about the mean grey level m, and m is the first-order moment of the grey level probability distribution. The value n = 0.25 is recommended as the best for discriminating the textures.
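A minimal sketch of this contrast measure, computing the variance and kurtosis of the grey level distribution directly from the image:

```python
import numpy as np

def tamura_contrast(g, n=0.25):
    """F_con = sigma / alpha4^n, with alpha4 = mu4 / sigma^4 (kurtosis)."""
    g = np.asarray(g, dtype=float)
    m = g.mean()                      # first-order moment (mean grey level)
    sigma2 = ((g - m) ** 2).mean()    # variance (second central moment)
    mu4 = ((g - m) ** 4).mean()       # fourth central moment
    alpha4 = mu4 / sigma2 ** 2        # kurtosis
    return np.sqrt(sigma2) / alpha4 ** n
```

For a balanced binary image with levels {0, 255} the kurtosis is 1, so the contrast equals the standard deviation, 127.5; a low-amplitude pattern scores far lower.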

Degree of directionality is measured using the frequency distribution of oriented local edges against their directional angles. The edge strength e(x,y) and the directional angle a(x,y) are computed using Prewitt-type 3 × 3 edge operators approximating the pixel-wise x- and y-derivatives of the image:

e(x,y) = 0.5(|Δx(x,y)| + |Δy(x,y)| )
a(x,y) = tan-1(Δy(x,y) / Δx(x,y))

where Δx(x,y) and Δy(x,y) are the horizontal and vertical grey level differences between the neighbouring pixels, respectively. The differences are measured using the following 3 × 3 moving window operators:

| −1  0  1 |      |  1  1  1 |
| −1  0  1 |      |  0  0  0 |
| −1  0  1 |      | −1 −1 −1 |

A histogram Hdir(a) of quantised direction values a is constructed by counting the numbers of edge pixels with the corresponding directional angles and an edge strength greater than a predefined threshold. The histogram is relatively uniform for images without strong orientation and exhibits peaks for highly directional images. The degree of directionality relates to the sharpness of the peaks:

Fdir = 1 − r·np·Σp=1..np Σa∈wp (a − ap)²·Hdir(a)

where np is the number of peaks, ap is the position of the p-th peak, wp is the range of angles attributed to the p-th peak (that is, the range between the valleys around the peak), r denotes a normalising factor related to the quantisation levels of the angles a, and a is the quantised directional angle (cyclically, modulo 180°).
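The edge-strength and direction computation, together with the direction histogram it feeds, can be sketched as follows (the number of bins and the strength threshold are illustrative choices; the peak analysis for Fdir itself is omitted):

```python
import numpy as np

def direction_histogram(g, n_bins=16, threshold=1.0):
    """Histogram H_dir of quantised edge directions for edges above a strength threshold."""
    g = np.asarray(g, dtype=float)
    # The two 3x3 difference operators from the text, applied on the valid interior:
    dh = (g[:-2, 2:] + g[1:-1, 2:] + g[2:, 2:]) \
       - (g[:-2, :-2] + g[1:-1, :-2] + g[2:, :-2])
    dv = (g[:-2, :-2] + g[:-2, 1:-1] + g[:-2, 2:]) \
       - (g[2:, :-2] + g[2:, 1:-1] + g[2:, 2:])
    e = 0.5 * (np.abs(dh) + np.abs(dv))    # edge strength e(x, y)
    a = np.arctan2(dv, dh) % np.pi         # direction, cyclic modulo 180 degrees
    mask = e > threshold                   # discard trivial "weak" edges
    hist, _ = np.histogram(a[mask], bins=n_bins, range=(0.0, np.pi))
    return hist / max(1, mask.sum())
```

A stripe pattern produces edges of a single orientation, so the histogram concentrates in one bin.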

Three other features are highly correlated with the above three and do not add much to the effectiveness of the texture description. The linelikeness feature Flin is defined as the average coincidence of the edge directions (more precisely, coded directional angles) that co-occur in pairs of pixels separated by a distance d along the edge direction at every pixel. The edge strength is expected to be greater than a given threshold, eliminating trivial "weak" edges. The coincidence is measured by the cosine of the difference between the angles, so that co-occurrences in the same direction are measured by +1 and those in perpendicular directions by −1.

The regularity feature is defined as Freg = 1 − r(scrs + scon + sdir + slin), where r is a normalising factor and each s... is the standard deviation of the corresponding feature F... over the subimages the texture is partitioned into. The roughness feature is given by simply summing the coarseness and contrast measures: Frgh = Fcrs + Fcon.

In most cases, only the first three of Tamura's features are used in CBIR. These features capture the high-level perceptual attributes of a texture well and are useful for image browsing. However, they are not very effective for finer texture discrimination (Castelli & Bergman, 2002).

Markov Random Field Texture Models

Random field models consider an image as a 2D array of random scalars (grey values) or vectors (colours). In other words, the signal at each pixel location is a random variable. Each type of texture is characterised by a joint probability distribution of signals that accounts for spatial inter-dependence, or interaction, among the signals. The interacting pixel pairs are usually called neighbours, and a random field texture model is characterised by the geometric structure and the quantitative strength of the interactions among the neighbours.

If pixel interactions are assumed translation invariant, the interaction structure is given by a set N of characteristic neighbours of each pixel. This results in the Markov random field model where the conditional probability of signals in each pixel (x,y) depends only on the signals in the neighbourhood {(x+m,y+n): (m,n) from the set N}. 


In the special case of the simultaneous autoregressive (SAR) Gauss-Markov model, the texture is represented by a set of parameters of the autoregression:

g(x,y) = Σ(m,n)∈N a(m,n)·g(x+m, y+n) + s·w(x,y)

Here, w is independent (white) noise with zero mean and unit variance, and the parameters a(m,n) and s specify the SAR model. The basic problem is how to find an adequate neighbourhood, and this nontrivial problem has no general solution.
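A minimal causal sketch of such an autoregression (the two-pixel neighbourhood and the coefficient values are illustrative assumptions, not from the source; a full SAR model is simultaneous rather than causal):

```python
import numpy as np

def ar_synthesise(X, Y, a, s=1.0, seed=0):
    """Generate a texture by causal autoregression: each signal is predicted
    from already-generated neighbours plus scaled white noise w."""
    rng = np.random.default_rng(seed)
    g = np.zeros((X, Y))
    for x in range(X):
        for y in range(Y):
            pred = sum(c * g[x + m, y + n] for (m, n), c in a.items()
                       if 0 <= x + m < X and 0 <= y + n < Y)
            g[x, y] = pred + s * rng.standard_normal()
    return g

# Illustrative neighbourhood: the pixels above and to the left.
tex = ar_synthesise(32, 32, a={(-1, 0): 0.45, (0, -1): 0.45})
```

Positive coefficients induce positive correlation between neighbouring signals, which is the kind of spatial inter-dependence the model is meant to capture.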

More general generic Gibbs random field models with multiple pairwise pixel interactions make it possible to relate the desired neighbourhood to a set of the most "energetic" pairs of neighbours. Then the interaction structure itself and the relative frequency distributions of signal co-occurrences in the chosen pixel pairs can serve as the texture features (Gimel'farb & Jain, 1996).

MIT "Fabrics0008"Interaction structure
 with 35 neighbours

Similarity Measures for Texture Features

Texture features are usually compared on the basis of dissimilarity between two feature vectors. The dissimilarity is given by the Euclidean, Mahalanobis, or city-block distance. In some cases, weighted distances are used, where the weight of each vector component is inversely proportional to the standard deviation of this feature over the database.
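A sketch of the weighted city-block variant (the feature values and standard deviations below are arbitrary examples):

```python
import numpy as np

def weighted_cityblock(f1, f2, stds):
    """City-block distance with each component weighted by 1/std of that
    feature over the database (stds: per-feature standard deviations)."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    w = 1.0 / np.asarray(stds, dtype=float)
    return float(np.sum(w * np.abs(f1 - f2)))
```

With unit standard deviations this reduces to the plain city-block distance.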

If the feature vector represents a relative frequency distribution (e.g., a normalised grey level co-occurrence histogram), the dissimilarity can also be measured by the relative entropy, or Kullback-Leibler (K-L) divergence. Let D(g,q) denote the divergence between two distributions, fg = (fg,t : t = 1, ..., T) and fq = (fq,t : t = 1, ..., T). Then

D(g,q) = Σt=1..T fg,t·log( fg,t / fq,t )
This dissimilarity measure is asymmetric and does not represent a distance because the triangle inequality is not satisfied. A symmetric measure is obtained by averaging D(g,q) and D(q,g). It should be noted that no single similarity measure achieves the best overall performance for retrieval of different textures (Castelli & Bergman, 2002).
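The divergence and its symmetrised version can be sketched as follows (the small eps guard against empty histogram bins is an implementation choice, not part of the definition):

```python
import numpy as np

def kl_divergence(f_g, f_q, eps=1e-12):
    """D(g,q) = sum_t f_g[t] * log(f_g[t] / f_q[t]); eps guards empty bins."""
    f_g = np.asarray(f_g, dtype=float) + eps
    f_q = np.asarray(f_q, dtype=float) + eps
    f_g, f_q = f_g / f_g.sum(), f_q / f_q.sum()
    return float(np.sum(f_g * np.log(f_g / f_q)))

def symmetric_kl(f_g, f_q):
    """Symmetrised measure: the average of D(g,q) and D(q,g)."""
    return 0.5 * (kl_divergence(f_g, f_q) + kl_divergence(f_q, f_g))
```

The asymmetry is easy to observe: D(g,q) and D(q,g) generally differ, while the averaged measure is symmetric by construction.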

Wold Decomposition Based and Gabor Texture Features

If a texture is modelled as a sample of a 2D stationary random field, the Wold decomposition can also be used for similarity-based retrieval (Liu & Picard, 1996; Castelli & Bergman, 2002). In the Wold model, a spatially homogeneous random field is decomposed into three mutually orthogonal components, which approximately represent the periodicity, the directionality, and a purely random part of the field.

The deterministic periodicity of the image is analysed using the autocorrelation function. The corresponding Wold feature set consists of the frequencies and the magnitudes of the harmonic spectral peaks (e.g., the K largest peaks). The indeterministic (random) components of the image are modelled with the multiresolution simultaneous autoregressive (MR-SAR) process. The retrieval uses matching of the harmonic peaks and the distances between the MR-SAR parameters. The similarity measure involves a weighted ordering based on the confidence in the query pattern regularity. Experiments with some natural texture databases have shown that the Wold model provides perceptually better retrieval quality than the MR-SAR model or Tamura's features (Castelli & Bergman, 2002).

An alternative to the spatial domain for computing the texture features is to use the domains of specific transforms, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or the discrete wavelet transforms (DWT). Global power spectra computed from the DFT have not been effective in texture classification and retrieval, compared with local features in small windows. At present, the most promising features for texture retrieval are multiresolution features obtained with orthogonal wavelet transforms or with Gabor filtering. The features describe spatial distributions of oriented edges in the image at multiple scales.

A 2D Gabor function γ(x,y) and its Fourier transform Γ(u,v) are as follows (Manjunath & Ma, 1996):

γ(x,y) = (1/(2π·σx·σy))·exp[ −(1/2)(x²/σx² + y²/σy²) + 2πj·W·x ]
Γ(u,v) = exp{ −(1/2)[ (u − W)²/σu² + v²/σv² ] }

where σu = 1/(2πσx), σv = 1/(2πσy), and W is the modulation frequency. The Gabor function is the product of an elliptical Gaussian and a complex plane wave, and it minimises the joint 2D uncertainty in the spatial and frequency domains. Appropriate dilations and rotations of this function yield a class of self-similar Gabor filters for orientation- and scale-tunable edge and line detection. The filters form a complete but non-orthogonal basis set for expanding an image and obtaining a localised spatial frequency description. The total number of Gabor filters equals the product of the numbers of scales and orientations.

A class of self-similar Gabor wavelets is produced from the "mother" wavelet γ(x,y) by the dilations and rotations specified with a generating function that depends on the integer scale and orientation parameters m and k and the scale factor a > 1 as follows: γmk(x,y) = a^−m·γ(x′, y′), where x′ = a^−m(x cos θk + y sin θk); y′ = a^−m(−x sin θk + y cos θk); θk = kπ/K, and K is the total number of orientations. The scale factor a^−m for x′ and y′ makes the filter energy independent of m. To exclude sensitivity of the filters to absolute intensity values, the real (even) components of the 2D Gabor filters are usually biased to make them zero-mean. An ensemble of grey-coded, differently oriented odd (a) and even (b) 2D Gabor filters is exemplified below (see Lee, 1996 for more detail):

These generalised Gabor functions have the following 2D form:

γ(x,y) = exp{ −(1/2)[ (x − x0)²/σx² + (y − y0)²/σy² ] }·exp{ 2πj[ u0(x − x0) + v0(y − y0) ] }

where (x0, y0) is the spatial location of the filter centre in the image, and (u0, v0) is the spatial frequency of the filter in the frequency domain.

The Gabor wavelets are not orthogonal. Therefore, to reduce the informational redundancy of the filter outputs, the typical filter design strategy makes the half-peak magnitude supports of the filter responses in the spatial frequency domain touch each other. For example (Manjunath & Ma, 1996), the contours below correspond to the half-peak magnitude of the filter responses in the set of Gabor filters with the upper centre frequency of interest, uh = 0.4, the lower centre frequency of interest, ul = 0.05, six orientations (K = 6), and four scales (S = 4):

In such a design, the scale factor a and the filter parameters σu and σv (and thus σx and σy) are specified in terms of uh (= W), ul, K, and S; in particular, a = (uh/ul)^(1/(S−1)) (Manjunath & Ma, 1996; Vajihollahi & Farahbod, 2002).

The Gabor texture features include the mean and the standard deviation of the magnitude of the Gabor wavelet transform coefficients. Given an image g = (g(x,y) : x = 0, 1, ..., X−1; y = 0, 1, ..., Y−1), the Gabor wavelet transform is defined as

Wmk(x,y) = Σx1 Σy1 g(x1,y1)·γ*mk(x − x1, y − y1)

where the asterisk (*) indicates the complex conjugate. The mean μmk and the standard deviation σmk of the magnitude of the transformed image describe each local texture region under an assumption of its spatial homogeneity:

μmk = (1/(XY))·Σx,y |Wmk(x,y)|;   σmk = [ (1/(XY))·Σx,y ( |Wmk(x,y)| − μmk )² ]^(1/2)

A feature vector contains these pairs for all the scales and orientations of the wavelets, e.g. for the six orientations (K = 6), and four scales (S = 4) the feature vector contains 24 pairs: f=(μ00, σ00, ..., μ35, σ35).
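A self-contained sketch of these features (the kernel size, Gaussian width, modulation frequency u0, and the small numbers of scales and orientations are illustrative choices, not the tuned parameters of Manjunath & Ma):

```python
import numpy as np

def gabor_kernel(size, a, m, k, K, sigma=2.0, u0=0.3):
    """Dilated and rotated complex Gabor wavelet gamma_mk (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = k * np.pi / K
    xr = a ** (-m) * (x * np.cos(theta) + y * np.sin(theta))
    yr = a ** (-m) * (-x * np.sin(theta) + y * np.cos(theta))
    env = np.exp(-0.5 * (xr ** 2 + yr ** 2) / sigma ** 2)   # Gaussian envelope
    return a ** (-m) * env * np.exp(2j * np.pi * u0 * xr)   # times the plane wave

def filter_same(g, ker):
    """Direct 'same'-size filtering without external libraries."""
    X, Y = g.shape
    s = ker.shape[0] // 2
    gp = np.pad(g, s)
    out = np.zeros((X, Y), dtype=complex)
    for i in range(ker.shape[0]):
        for j in range(ker.shape[1]):
            out += ker[i, j] * gp[i:i + X, j:j + Y]
    return out

def gabor_features(g, S=2, K=4, a=2.0, size=9):
    """Mean and std of |W_mk| for every scale m and orientation k."""
    feats = []
    for m in range(S):
        for k in range(K):
            mag = np.abs(filter_same(np.asarray(g, dtype=float),
                                     gabor_kernel(size, a, m, k, K)))
            feats += [mag.mean(), mag.std()]
    return np.array(feats)
```

With S = 2 scales and K = 4 orientations the feature vector contains 2·2·4 = 16 numbers, one (mean, std) pair per filter.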

MPEG-7 Texture Descriptors

The MPEG-7 multimedia content description interface involves three texture descriptors for representing texture regions in images (Manjunath et al., 2001; Sikora, 2001), namely,

  • the texture browsing descriptor to characterise perceptual directionality, regularity, and coarseness of a texture,
  • the homogeneous texture descriptor (HTD) to quantitatively characterise homogeneous texture regions for similarity retrieval using local spatial statistics of the texture obtained by scale- and orientation-selective Gabor filtering, and
  • the local edge histogram descriptor to characterise non-homogeneous texture regions.

    Texture Browsing Descriptor

    This 12-bit descriptor relates to the regularity, directionality, and coarseness (scale) of visual texture perception and can be used both for browsing and for coarse classification of textures. First, the image is filtered with a bank of orientation- and scale-tuned Gabor filters in order to select and code two dominant texture orientations (3 bits per orientation). Then an analysis of the filtered projections of the image along the dominant orientations specifies the regularity (2 bits) and the coarseness (2 bits per scale). The second dominant orientation and second-scale features are optional.

    The regularity of a texture has four levels, from 0 (irregular, or random texture) to 3 (a periodic pattern). There is an ambiguity in the intermediate two values: a well-defined directionality with no perceivable micro-pattern is considered more regular than a pattern that lacks directionality and periodicity, even if the individual micro-patterns are clearly identified:

    Texture regularity: 00 (irregular) | 01 | 10 | 11 (periodic)
    Texture example from the Brodatz digitised set: D005 | D066 | D068 | D001

    The directionality of a texture is quantised to six values, from 0° to 150° in steps of 30°. In particular, the above texture D001 has strong vertical and horizontal directionalities. The descriptor specifies up to two dominant directions, each encoded with 3 bits: 0 means the texture has no dominant directionality, and the remaining values from 1 to 6 encode the six directions.

    The coarseness associated with each dominant direction relates to image scale or resolution and is quantised to four levels: 0 indicates a fine grain texture and 3 indicates a coarse texture. These values are also related to the frequency space partitioning used in computing the HTD (see the next section).

    To compute the browsing descriptor, an image is filtered using a bank of scale- and orientation-selective band-pass filters similar to those for the HTD. The filtered outputs are then used to compute the components of the texture browsing descriptor. Because the descriptor semantics relates to human perception of the texture, the descriptor can also be specified manually. In browsing, any combination of the three main components - regularity, directionality, and coarseness - can be used to browse the database. In similarity retrieval, the texture browsing descriptor can be used to select a set of candidates; then the HTD provides a precise similarity matching among the candidate images.

    Homogeneous Texture Descriptor

    Homogeneous texture is an important visual primitive for searching and browsing through large collections of similar-looking patterns. If an image can be partitioned into a set of homogeneous texture regions, then the texture features associated with the regions can index the image data. Examples of homogeneous textured patterns are parking lots with cars parked at regular intervals, viewed from a distance, or agricultural areas and vegetation patches in aerial and satellite imagery. Examples of queries that could be supported in this context include "Retrieve all satellite images with less than 20% cloud cover" or "Find a vegetation patch that looks like this region" (Martinez, 2004).

    This descriptor uses 62 8-bit numbers per image or image region in order to allow for accurate search and retrieval. The image is filtered with a bank of orientation- and scale-sensitive Gabor filters, and the means and the standard deviations of the filtered outputs in the spatial frequency domain (5 scales × 6 orientations per scale) are used as the descriptor components. The frequency space is partitioned into 30 channels with equal angular divisions at 30° intervals and a five-octave division in the radial direction:

    In the normalised frequency space 0 ≤ ω ≤ 1, the centre frequencies of the feature channels are spaced at 30° intervals in the angular direction, so that the polar angle θk = 30°·k, where k = 0, 1, ..., 5 is the angular index. In the radial direction, the centre frequencies of neighbouring feature channels are spaced one octave apart: ωm = 2^−m·ω0, where m = 0, 1, ..., 4 is the radial index and ω0 = 0.75 is the highest centre frequency, i.e. ω1 = 0.375; ω2 = 0.1875; ω3 = 0.09375, and ω4 = 0.046875. The octave bandwidths Bm are, respectively, B0 = 0.5; B1 = 0.25; B2 = 0.125; B3 = 0.0625, and B4 = 0.03125. The channel index is i = 6m + k + 1.
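    The channel parameters above can be tabulated in a few lines (the values follow directly from the octave spacing just described):

```python
def htd_channel(m, k):
    """Centre frequency, octave bandwidth, angular centre (degrees) and index
    of the HTD feature channel with radial index m (0..4), angular index k (0..5)."""
    omega_m = 0.75 * 2.0 ** (-m)   # one octave apart; omega_0 = 0.75 is the highest
    b_m = 0.5 * 2.0 ** (-m)        # octave bandwidth B_m
    theta_k = 30 * k               # angular centre in degrees
    i = 6 * m + k + 1              # channel index 1..30
    return omega_m, b_m, theta_k, i
```

    The first channel is (0.75, 0.5, 0°, 1) and the last is (0.046875, 0.03125, 150°, 30), matching the listed values.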

    The Gabor filters for the feature channels are represented in polar coordinates; the Fourier transform of such a 2D Gabor function is

    G(ω, θ) = exp[ −(ω − ωm)² / (2σω,m²) ]·exp[ −(θ − θk)² / (2σθ,k²) ]

    The filter parameters ensure that the half-peak contours of the 2D Gaussians of adjacent filters in the radial and angular directions touch each other: σθ,k = 15°/(2 ln 2)^(1/2) and σω,m = 0.5·Bm/(2 ln 2)^(1/2).

    The mean and the standard deviation of each filter output are logarithmically scaled to obtain two numerical features (ei and di, respectively) for each channel i. In addition to these 2×30 = 60 features, the HTD also includes the mean intensity fDC and the standard deviation fSD of the image: HTD = [fDC, fSD, e1, ..., e30, d1, ..., d30].

    Edge Histogram Descriptor

    The edge histogram descriptor resembles the colour layout descriptor (CLD) in its principle of capturing the spatial distribution of edges, which is useful in image matching even if the texture itself is not homogeneous. An image is partitioned into 4×4 = 16 sub-images, and 5-bin local edge histograms are computed for these sub-images, each histogram representing five broad categories of vertical, horizontal, 45°-diagonal, 135°-diagonal, and isotropic (non-orientation-specific) edges. The resulting scale-invariant descriptor has a size of 240 bits, i.e. 16×5 = 80 bins, and supports both rotation-sensitive and rotation-invariant matching (Manjunath et al., 2001; Sikora, 2001).

    The edge histograms are computed by subdividing each of the 16 sub-images into a fixed number of blocks. The size of these blocks depends on the image size and is assumed to be a power of 2. To keep the number of blocks per sub-image constant, their sizes are scaled in accordance with the original image dimensions. Each block is then treated as a 2×2-pixel image (by averaging each of its 2×2 partitions), and a simple edge detector is applied to these average values. The detector consists of four directional filters and one isotropic filter:

    Five edge strengths, one for each of the five filters, are computed for each image block. If the maximum of these strengths exceeds a certain preset threshold, the corresponding image block is an edge block contributing to the edge histogram bins. The bin values are normalised to the range [0.0, 1.0] and non-linearly quantised into 3 bits per bin.
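    A sketch of the per-block classification (the 2×2 filter coefficients below are those commonly cited for the MPEG-7 edge histogram descriptor, and the threshold value is an illustrative assumption):

```python
import numpy as np

# 2x2 filter coefficients applied to the four sub-block means of one image block
# (coefficients as commonly cited for the MPEG-7 EHD; treat as an assumption here).
FILTERS = {
    'vertical':   np.array([[1.0, -1.0], [1.0, -1.0]]),
    'horizontal': np.array([[1.0, 1.0], [-1.0, -1.0]]),
    'diag45':     np.array([[np.sqrt(2), 0.0], [0.0, -np.sqrt(2)]]),
    'diag135':    np.array([[0.0, np.sqrt(2)], [-np.sqrt(2), 0.0]]),
    'isotropic':  np.array([[2.0, -2.0], [-2.0, 2.0]]),
}

def block_edge_type(block_means, threshold=11.0):
    """Classify one image block from its 2x2 sub-block means; None = no edge block."""
    strengths = {name: abs(float((f * block_means).sum()))
                 for name, f in FILTERS.items()}
    name = max(strengths, key=strengths.get)
    return name if strengths[name] >= threshold else None
```

    Blocks whose maximal strength stays below the threshold contribute to no bin, exactly as described above; the others increment the bin of their winning edge category.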

    References

    • V. Castelli and L. D. Bergman (Eds.). Image Databases: Search and Retrieval of Digital Imagery. Wiley: New York, 2002.
    • G.L.Gimel'farb and A.K.Jain. On retrieving textured images from an image data base. Pattern Recognition, vol. 29, no. 9, 1996, 1441 - 1483.
    • A.Hanjalic, G. C. Langelaar, P. M. B. van Roosmalen, J. Biemond, and R. Lagendijk. Image and Video Data Bases: Restoration, Watermarking and Retrieval. Elsevier Science: Amsterdam, 2000.
    • T. S. Lee. Image representation using 2D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, 1996, 959 - 971.
    • F. Liu and R. W. Picard. Periodicity, directionality, and randomness: Wold features for image modeling and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, 1996, 722 - 733.
    • B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, and A. Yamada. Color and texture descriptors. IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, 2001, 703 - 715.
    • B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, 1996, 837 - 842.
    • J. M. Martinez, Ed. MPEG-7 Overview. ISO/IEC JTC1/SC29/WG11 No. 6828 (2004). On-line. http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm#E12E27
    • S. M. Rahman (Ed.). Interactive Multimedia Systems. IRM Press: Hershey, 2002.
    • T. K. Shih. Distributed Multimedia Databases: Techniques & Applications. Idea Group Publishing: Hershey, 2002.
    • T. Sikora. The MPEG-7 visual standard for content description - an overview. IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, 2001, 696 - 702.
    • A. W. M. Smeulders and R. Jain (Eds.). Image Databases and Multimedia Search. World Scientific: Singapore, 1997.
    • H. Tamura, S. Mori, and T. Yamawaki. Texture features corresponding to visual perception. IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-8, no. 6, 1978, 460 - 473.
    • M. Vajihollahi and R. Farahbod. The MPEG-7: Visual Standard for Content Description, 2002. On-line: http://www.cs.sfu.ca/CC/820/li/material/presentations/paper8.ppt

from: https://www.cs.auckland.ac.nz/courses/compsci708s1c/lectures/Glect-html/topic4c708FSC.htm
