mosse

最新推荐文章于 2024-03-01 10:48:34 发布

eilot_c

最新推荐文章于 2024-03-01 10:48:34 发布

阅读量310

点赞数

本文链接：https://blog.csdn.net/eilot_c/article/details/106858084

版权

MOSSE

MOSSE(Minimum Output Sum of Squared Error) 是2010年的CVPR，它的全名叫做Visual Object Tracking using Adaptive Correlation Filters。 MOSSE 是第一篇将correlation filter(CF) 引入object tracking 的论文，它也是CSK和KCF/DCF等算法的基础。

CF(相关滤波)

相关一般分为自相关和互相关，这里我们一般指的是互相关，假设我们有两个信号f和g

f∗表示f的共轭，互相关的直接解释就是衡量两个信号在某个时刻τ时的相似程度。
假设f和g的形状一样，那么一定是f和g对齐的时候二者的相似程度最大，此时达到最大的输出响应，如下图所示：

卷积计算和相关计算的关系

Two-dimensional correlation is equivalent to two-dimensional convolution with the filter matrix rotated 180 degrees.

论文解读

将CF应用在tracking方面最基本的思想就是，设计一个滤波模板，使得该模板与跟踪目标的ROI做卷积运算，得到最大的输出响应。

g表示输出响应
f表示输入原始图片的灰度图像
h表示滤波模板
为了简化计算，将时域的卷积转化为频域的点乘积。
时域公式表示：频域公式表示：所以目标H的计算为：在跟踪的光照等其他因素的影响下，为了提高滤波模板的鲁棒性，在文章中作者对GroundTruth进行随机仿射变换得到一系列的训练样本fi，gi是由高斯函数产生的并且其峰值位置是在fi的中心,我们同时考虑m帧作为参考，这就是MOSSE模型的思想，最终该模型的目标函数表示为：将目标函数最小化，对上式在频域进行求导（复数域不同于实数域），得到：在跟踪过程中，我们只需要将以上模板与当前帧与滤波模板做相关操作，在输出响应中找到最大值的位置，该位置就是目标在当前帧中的位置。本文的参数更新的策略为：其中，η是一个超参数，为经验值。

缺点：

输入的特征为单通道灰度图像，特征表达能力有限
没有尺度更新，对于尺度变化的跟踪目标不敏感

代码解析

这里面主要做的就是，初始帧的输入与输出来求出Ai与Bi，从而求出初始的模板Hi，下面将初始的Hi与当前帧所在的上个位置进行卷积，频域也就是进行相乘。然后找到最大值的位置也就是当前目标的中心，由于宽高不变，所以在此基础上更新宽高就可以了，实现目标跟踪。

import numpy as np
import cv2
import os
from utils import linear_mapping, pre_process, random_warp

"""
This module implements the basic correlation filter based tracking algorithm -- MOSSE

Date: 2018-05-28

"""

class mosse:
    def __init__(self, args, img_path):
        # get arguments..
        self.args = args
        self.img_path = img_path
        # get the img lists...
        self.frame_lists = self._get_img_lists(self.img_path)
        self.frame_lists.sort()
    
    # start to do the object tracking...
    def start_tracking(self):
        # get the image of the first frame... (read as gray scale image...)
        init_img = cv2.imread(self.frame_lists[0])
        init_frame = cv2.cvtColor(init_img, cv2.COLOR_BGR2GRAY)
        init_frame = init_frame.astype(np.float32)
        # get the init ground truth.. [x, y, width, height]
        init_gt = cv2.selectROI('demo', init_img, False, False)
        init_gt = np.array(init_gt).astype(np.int64)
        # start to draw the gaussian response...
        response_map = self._get_gauss_response(init_frame, init_gt)
        # start to create the training set ...
        # get the goal..
        print(init_gt)
        g = response_map[init_gt[1]:init_gt[1]+init_gt[3], init_gt[0]:init_gt[0]+init_gt[2]]
        print(g)
        fi = init_frame[init_gt[1]:init_gt[1]+init_gt[3], init_gt[0]:init_gt[0]+init_gt[2]]
        G = np.fft.fft2(g)
        # start to do the pre-training...
        Ai, Bi = self._pre_training(fi, G)

        # start the tracking...
        for idx in range(len(self.frame_lists)):
            current_frame = cv2.imread(self.frame_lists[idx])
            frame_gray = cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY)
            frame_gray = frame_gray.astype(np.float32)
            if idx == 0:
                Ai = self.args.lr * Ai
                Bi = self.args.lr * Bi
                pos = init_gt.copy()
                clip_pos = np.array([pos[0], pos[1], pos[0]+pos[2], pos[1]+pos[3]]).astype(np.int64)
            else:
                Hi = Ai / Bi
                fi = frame_gray[clip_pos[1]:clip_pos[3], clip_pos[0]:clip_pos[2]]
                fi = pre_process(cv2.resize(fi, (init_gt[2], init_gt[3])))
                Gi = Hi * np.fft.fft2(fi)
                gi = linear_mapping(np.fft.ifft2(Gi))
                # find the max pos...
                max_value = np.max(gi)
                max_pos = np.where(gi == max_value)
                dy = int(np.mean(max_pos[0]) - gi.shape[0] / 2)
                dx = int(np.mean(max_pos[1]) - gi.shape[1] / 2)
                
                # update the position...
                pos[0] = pos[0] + dx
                pos[1] = pos[1] + dy

                # trying to get the clipped position [xmin, ymin, xmax, ymax]
                clip_pos[0] = np.clip(pos[0], 0, current_frame.shape[1])
                clip_pos[1] = np.clip(pos[1], 0, current_frame.shape[0])
                clip_pos[2] = np.clip(pos[0]+pos[2], 0, current_frame.shape[1])
                clip_pos[3] = np.clip(pos[1]+pos[3], 0, current_frame.shape[0])
                clip_pos = clip_pos.astype(np.int64)

                # get the current fi..
                fi = frame_gray[clip_pos[1]:clip_pos[3], clip_pos[0]:clip_pos[2]]
                fi = pre_process(cv2.resize(fi, (init_gt[2], init_gt[3])))
                # online update...
                Ai = self.args.lr * (G * np.conjugate(np.fft.fft2(fi))) + (1 - self.args.lr) * Ai
                Bi = self.args.lr * (np.fft.fft2(fi) * np.conjugate(np.fft.fft2(fi))) + (1 - self.args.lr) * Bi
            
            # visualize the tracking process...
            cv2.rectangle(current_frame, (pos[0], pos[1]), (pos[0]+pos[2], pos[1]+pos[3]), (255, 0, 0), 2)
            cv2.imshow('demo', current_frame)
            cv2.waitKey(100)
            # if record... save the frames..
            if self.args.record:
                frame_path = 'record_frames/' + self.img_path.split('/')[1] + '/'
                if not os.path.exists(frame_path):
                    os.mkdir(frame_path)
                cv2.imwrite(frame_path + str(idx).zfill(5) + '.png', current_frame)


    # pre train the filter on the first frame...
    def _pre_training(self, init_frame, G):
        height, width = G.shape
        fi = cv2.resize(init_frame, (width, height))
        # pre-process img..
        fi = pre_process(fi)
        Ai = G * np.conjugate(np.fft.fft2(fi))
        Bi = np.fft.fft2(init_frame) * np.conjugate(np.fft.fft2(init_frame))
        for _ in range(self.args.num_pretrain):
            if self.args.rotate:
                fi = pre_process(random_warp(init_frame))
            else:
                fi = pre_process(init_frame)
            Ai = Ai + G * np.conjugate(np.fft.fft2(fi))
            Bi = Bi + np.fft.fft2(fi) * np.conjugate(np.fft.fft2(fi))
        
        return Ai, Bi

    # get the ground-truth gaussian reponse...
    def _get_gauss_response(self, img, gt):
        # get the shape of the image..
        height, width = img.shape
        # get the mesh grid...
        xx, yy = np.meshgrid(np.arange(width), np.arange(height))
        # get the center of the object...
        center_x = gt[0] + 0.5 * gt[2]
        center_y = gt[1] + 0.5 * gt[3]
        # cal the distance...
        dist = (np.square(xx - center_x) + np.square(yy - center_y)) / (2 * self.args.sigma)
        # get the response map...
        response = np.exp(-dist)
        # normalize...
        response = linear_mapping(response)
        return response

    # it will extract the image list 
    def _get_img_lists(self, img_path):
        frame_list = []
        for frame in os.listdir(img_path):
            if os.path.splitext(frame)[1] == '.jpg':
                frame_list.append(os.path.join(img_path, frame)) 
        return frame_list
    
    # it will get the first ground truth of the video..
    def _get_init_ground_truth(self, img_path):
        gt_path = os.path.join(img_path, 'groundtruth.txt')
        with open(gt_path, 'r') as f:
            # just read the first frame...
            line = f.readline()
            gt_pos = line.split(',')

        return [float(element) for element in gt_pos]

参考链接：
http://simtalk.cn/2017/07/03/Object-Tracking/
https://github.com/TianhongDai/mosse-object-tracking

eilot_c

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
mosse

MOSSEMOSSE(Minimum Output Sum of Squared Error) 是2010年的CVPR，它的全名叫做Visual Object Tracking using Adaptive Correlation Filters。 MOSSE 是第一篇将correlation filter(CF) 引入object...
复制链接

扫一扫