Python + OpenCV: K-Means Clustering

Goal

  • Learn to use the cv.kmeans() function in OpenCV for data clustering.

Understanding Parameters

Input parameters:

  1. samples : The input data. It should be of np.float32 data type, and each feature should be placed in a single column.
  2. nclusters(K) : The number of clusters required at the end.
  3. criteria : The iteration termination criteria. When these criteria are satisfied, the algorithm stops iterating. It should be a tuple of 3 parameters, `( type, max_iter, epsilon )`:
    1. type - The type of termination criteria. It has 3 flags:
      • cv.TERM_CRITERIA_EPS - stop the algorithm iteration if the specified accuracy, epsilon, is reached.
      • cv.TERM_CRITERIA_MAX_ITER - stop the algorithm after the specified number of iterations, max_iter.
      • cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER - stop the iteration when either of the above conditions is met.
    2. max_iter - An integer specifying the maximum number of iterations.
    3. epsilon - The required accuracy.
  4. attempts : The number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness. This compactness is returned as output.
  5. flags : Specifies how the initial centers are chosen. Normally two flags are used for this:
    cv.KMEANS_PP_CENTERS (use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007]) and cv.KMEANS_RANDOM_CENTERS (select random initial centers in each attempt).
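
To make the roles of criteria and attempts more concrete, here is a minimal NumPy sketch of a k-means loop. It is not OpenCV's implementation: the function names kmeans_once and kmeans_best_of, the choice of random samples as initial centers, and the fixed seed are all assumptions made for illustration. The loop stops when max_iter is reached or when no center moves by more than epsilon, and the run with the lowest compactness out of all attempts is kept.

```python
import numpy as np


def kmeans_once(samples, k, max_iter, epsilon, rng):
    """One illustrative k-means run starting from k randomly chosen samples."""
    centers = samples[rng.choice(len(samples), size=k, replace=False)].copy()
    for _ in range(max_iter):  # plays the role of TERM_CRITERIA_MAX_ITER
        # Assign every sample to its nearest center.
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (an empty cluster keeps its center).
        new_centers = centers.copy()
        for j in range(k):
            members = samples[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= epsilon:  # plays the role of TERM_CRITERIA_EPS
            break
    # Final assignment and compactness (sum of squared distances to the assigned centers).
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    compactness = float((dists[np.arange(len(samples)), labels] ** 2).sum())
    return compactness, labels.reshape(-1, 1), centers


def kmeans_best_of(samples, k, max_iter, epsilon, attempts, seed=0):
    """Run kmeans_once several times and keep the run with the lowest compactness."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(samples, k, max_iter, epsilon, rng) for _ in range(attempts)]
    return min(runs, key=lambda run: run[0])


# Six made-up one-feature samples forming two well-separated groups.
samples = np.float32([[1], [2], [3], [101], [102], [103]])
compactness, labels, centers = kmeans_best_of(
    samples, k=2, max_iter=10, epsilon=1.0, attempts=10)
print(compactness, centers.ravel())
```

The return order mirrors the (compactness, labels, centers) triple that cv.kmeans() returns, so the sketch can be read side by side with the real calls later in this article.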

Output parameters:

  1. compactness : The sum of squared distances from each point to its corresponding center.
  2. labels : The label array (same as 'code' in the previous article), where each element is marked '0', '1', and so on.
  3. centers : The array of cluster centers.
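
The definition of compactness can be checked by hand. The sketch below uses hand-written labels and centers as stand-ins for real kmeans output (they are assumptions, not computed results) and sums the squared distance from each sample to its assigned center.

```python
import numpy as np

# Hypothetical kmeans outputs for six 1-D samples and K=2 (hand-written, not computed).
samples = np.float32([[1], [2], [3], [101], [102], [103]])
labels = np.int32([[0], [0], [0], [1], [1], [1]])  # shape (N, 1): one cluster index per sample
centers = np.float32([[2], [102]])                 # shape (K, n_features)

# Look up the center assigned to each sample, then sum the squared distances.
assigned = centers[labels.ravel()]
compactness = float(((samples - assigned) ** 2).sum())
print(compactness)  # 4.0
```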

Now we will see how to apply the K-Means algorithm with three examples.

Data with Only One Feature

Consider a set of data with only one feature, i.e. one-dimensional data.

For example, take our t-shirt problem, where only the height of people is used to decide the t-shirt size.

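####################################################################################################
# K-Means Clustering
# lmc_cv is the alias used for OpenCV (cv2) throughout these listings.
import cv2 as lmc_cv
import numpy as np
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """
        Function behavior, selected by method:
        0: Data with Only One Feature with K-Means Clustering in OpenCV.
    """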

    # 0: Data with Only One Feature with K-Means Clustering in OpenCV.
    if 0 == method:
        x = np.random.randint(25, 100, 25)
        y = np.random.randint(175, 255, 25)
        z = np.hstack((x, y))
        z = z.reshape((50, 1))
        z = np.float32(z)
        pyplot.figure('Data Histogram', figsize=(16, 9))
        pyplot.hist(z, 256, [0, 256])
        pyplot.show()

        # Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        # Set flags (Just to avoid line break in the code)
        flags = lmc_cv.KMEANS_RANDOM_CENTERS
        # Apply KMeans
        compactness, labels, centers = lmc_cv.kmeans(z, 2, None, criteria, 10, flags)

        # split the data to different clusters depending on their labels.
        cluster_a = z[labels == 0]
        cluster_b = z[labels == 1]

        # plot 'A' in red, 'B' in blue, 'centers' in yellow
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.hist(cluster_a, 256, [0, 256], color='r')
        pyplot.hist(cluster_b, 256, [0, 256], color='b')
        pyplot.hist(centers, 32, [0, 256], color='y')
        pyplot.show()

Data with Multiple Features

In the previous example, we took only height for the t-shirt problem. Here, we will take both height and weight, i.e. two features.

Remember that in the previous case we arranged the data as a single column vector. In general, each feature is arranged in a column, while each row corresponds to an input test sample.

For example, in this case, we build test data of size 50x2, which holds the heights and weights of 50 people.

The first column corresponds to the heights of all 50 people and the second column corresponds to their weights.

The first row contains two elements: the height of the first person and that person's weight.

Similarly, the remaining rows correspond to the heights and weights of the other people.
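
The layout described above can be sketched with a few made-up measurements. The heights and weights here are illustrative values, not data from this article; the point is only the shape: one row per person, one column per feature, stored as np.float32.

```python
import numpy as np

# Made-up heights (cm) and weights (kg) for four people; values are illustrative only.
heights = [160, 172, 181, 168]
weights = [55, 70, 82, 63]

# One row per person, one column per feature, converted to np.float32 as cv.kmeans expects.
samples = np.column_stack((heights, weights)).astype(np.float32)

print(samples.shape)  # (4, 2)
print(samples[0])     # height and weight of the first person
```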

The code below generates such a 50x2 data set and clusters it:

####################################################################################################
# K-Means Clustering
def lmc_cv_k_means_demo(method):
    """
        Function behavior, selected by method:
        1: Data with Multiple Features with K-Means Clustering in OpenCV.
    """


    # 1: Data with Multiple Features with K-Means Clustering in OpenCV.
    if 1 == method:
        x = np.random.randint(25, 50, (25, 2))
        y = np.random.randint(60, 85, (25, 2))
        z = np.vstack((x, y))
        # convert to np.float32
        z = np.float32(z)

        # define criteria and apply kmeans()
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        ret, label, center = lmc_cv.kmeans(z, 2, None, criteria, 10, lmc_cv.KMEANS_RANDOM_CENTERS)

        # Now separate the data. Note the ravel(): the (N, 1) label array must be 1-D to select rows
        cluster_a = z[label.ravel() == 0]
        cluster_b = z[label.ravel() == 1]

        # Plot the data
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.scatter(cluster_a[:, 0], cluster_a[:, 1])
        pyplot.scatter(cluster_b[:, 0], cluster_b[:, 1], c='r')
        pyplot.scatter(center[:, 0], center[:, 1], s=80, c='y', marker='s')
        pyplot.xlabel('Height')
        pyplot.ylabel('Weight')
        pyplot.show()
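
The ravel() call in the listing above matters once the data has more than one column. The sketch below uses hand-written labels (an assumption standing in for real kmeans output) to show that a 1-D mask selects whole rows, while the raw (N, 1) label array no longer matches an (N, 2) data array.

```python
import numpy as np

z = np.float32([[30, 40], [35, 45], [70, 80], [75, 65]])  # 4 samples, 2 features
label = np.int32([[0], [0], [1], [1]])                     # hand-written (N, 1) labels

cluster_a = z[label.ravel() == 0]  # 1-D mask of length N selects whole rows
print(cluster_a.shape)             # (2, 2)

try:
    z[label == 0]                  # (N, 1) mask does not match the (N, 2) array
    mask_error = None
except IndexError as exc:
    mask_error = exc
print(type(mask_error).__name__)   # IndexError
```

In the one-feature example the data itself has shape (N, 1), so the unflattened mask happens to match and no ravel() is needed there.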

Color Quantization

Color Quantization is the process of reducing the number of colors in an image.

One reason to do so is to reduce memory usage. Some devices also have limitations such that they can produce only a limited number of colors.

In those cases, too, color quantization is performed. Here we use k-means clustering for color quantization.

There is nothing new to explain here. There are 3 features, namely R, G, B, so we need to reshape the image to an array of size Mx3 (M is the number of pixels in the image).

After the clustering, we apply the centroid values (which are also R, G, B) to all pixels, so that the resulting image has the specified number of colors.

Finally, we need to reshape it back to the shape of the original image.
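
The reshape, lookup, and reshape-back steps can be sketched on a tiny synthetic image. Everything here is an assumption made for illustration: the 2x2 image is made up, the two centers are hand-picked rather than produced by cv.kmeans(), and the labels come from a plain nearest-center search.

```python
import numpy as np

# A tiny 2x2 RGB "image" (uint8); the pixel values are made up for illustration.
image = np.array([[[250, 10, 10], [240, 20, 0]],
                  [[5, 5, 245], [0, 15, 255]]], dtype=np.uint8)

# Reshape to M x 3 (M = number of pixels) and convert to float32.
pixels = image.reshape((-1, 3)).astype(np.float32)

# Stand-ins for the kmeans outputs with K=2: hand-picked centers, nearest-center labels.
centers = np.float32([[245, 15, 5], [2, 10, 250]])
dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1).reshape(-1, 1)

# Replace every pixel by its center, then restore the original image shape.
quantized = np.uint8(centers)[labels.flatten()].reshape(image.shape)
print(quantized.shape)                                   # (2, 2, 3)
print(len(np.unique(quantized.reshape(-1, 3), axis=0)))  # 2
```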

Below is the code:

####################################################################################################
# K-Means Clustering
def lmc_cv_k_means_demo(method):
    """
        Function behavior, selected by method:
        2: Color Quantization with K-Means Clustering in OpenCV.
    """


    # 2: Color Quantization with K-Means Clustering in OpenCV.
    if 2 == method:
        stacking_images = []
        image_file_name = ['D:/99-Research/TestData/image/Castle01.jpg',
                           'D:/99-Research/TestData/image/Castle02.jpg',
                           'D:/99-Research/TestData/image/Castle03.jpg',
                           'D:/99-Research/TestData/image/Castle04.jpg']
        for i in range(len(image_file_name)):
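            image = lmc_cv.imread(image_file_name[i])
            if image is None:
                # imread() returns None instead of raising when a file cannot be read
                raise FileNotFoundError(image_file_name[i])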
            image = lmc_cv.cvtColor(image, lmc_cv.COLOR_BGR2RGB)
            stacking_image = image.copy()
            result_image = image.copy()
            z = image.reshape((-1, 3))
            # convert to np.float32
            z = np.float32(z)
            # define criteria, number of clusters and apply kmeans()
            criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
            for clusters_number in range(1, 4):
                ret, label, center = lmc_cv.kmeans(z, 2 ** clusters_number, None, criteria, 10,
                                                   lmc_cv.KMEANS_RANDOM_CENTERS)
                # Now convert back into uint8, and make original image
                center = np.uint8(center)
                res = center[label.flatten()]
                result_image = res.reshape(image.shape)
                # stacking images side-by-side
                stacking_image = np.hstack((stacking_image, result_image))

            # stacking images side-by-side
            stacking_images.append(stacking_image)

        # Display the images
        for i in range(len(stacking_images)):
            pyplot.figure('Color Quantization with K-Means Clustering %d' % (i + 1))
            pyplot.subplot(1, 1, 1)
            # the stacked image is RGB, so no colormap argument is needed
            pyplot.imshow(stacking_images[i])
            pyplot.title('Color Quantization with K-Means Clustering: original k=2 k=4 k=8')
            pyplot.xticks([])
            pyplot.yticks([])
            pyplot.savefig('%02d.png' % (i + 1))
        pyplot.show()

        # Close all figures once the user has dismissed them.
        # (lmc_cv.waitKey() only receives key presses from OpenCV windows, and none is open here.)
        pyplot.close('all')
        return

