这个是我前段时间完成的基于turtlebot实现跟随特定颜色的物体实现自主移动的项目,主要目的是为了学习熟悉一下ROS和turtlebot,然后也是基于图像的一个处理方便后续的vslam的应用。运行成功后,turtlebot会跟随相机中的目标实现追踪(自主的转向和移动)。下面来讲一下内部的主要算法:
主要技术和算法
算法主要是追踪模块和控制模块,都是基于ROS所以可以依靠topic进行通讯,非常的方便。
颜色识别和追踪模块
颜色识别和追踪最主要就是反向投影算法和camshift算法。
反向投影算法
OpenCV中直方图反向投影算法实现来自一篇论文《Indexing Via Color Histograms》其作者有两位、是Michael.J.Swain与Dana H. Ballard。论文分为两个部分,前面一部分详细描述了颜色直方图以及通过颜色直方图交叉来实现对象鉴别。可以实现对象背景区分、复杂场景中查找对象、不同光照条件影响等。假设M表示模型直方图数据、I表示图像直方图数据、直方图交叉匹配可以被描述为如下:
其中J表示直方图的范围,即bin的个数。最终得到结果是表示多少个模型颜色像素与图像中的像素相同或者相似,值越大,表示越相似。归一化表示如下:
这种方法对背景像素变换可以保持稳定性、同时对尺度变换也有一定抗干扰作用,但是无法做到尺度不变性特征。基于上述理论,两位作者发现通过该方法可以定位图像中已知物体的位置,它们把这个方法叫做直方图反向投影(Back Projection)。
举个例子:
设有原灰度图像矩阵:
Image=
1 2 3 4
5 6 7 7
9 8 0 1
5 6 7 6
将灰度值划分为如下四个区间:
[0,2] [3,5] [6,7] [8,10]
很容易得到这个图像矩阵的直方图 hist=4 4 6 2
接下来,计算反向投影矩阵(反向投影矩阵的大小和原灰度图像矩阵的大小相同):
原图相中坐标为(0,0)的灰度值为1,1位于区间[0,2]中,区间[0,2]对应的直方图值为4,所以反向投影矩阵中坐标为(0,0)的值记为4。按照这个方法,依次映射,即可得到image的直方图方向投影矩阵为:
back_Projection=
4 4 4 4
4 6 6 6
2 2 4 4
4 6 6 6
正是因为直方图反向投影有这样能力,所以在经典的MeanShift与CAMShift跟踪算法中一直是通过直方图反向投影来实现已知对象物体的定位
这里在放一个别人的实例讲解
https://blog.csdn.net/qq_33221533/article/details/81148673.
camshift
想要理解camshift算法,需要先理解meanshift。opencv中CamShift函数的程序实际上只是比MeanShift程序多出一个调整搜索框大小的步骤。
meanshfit算法又称均值漂移算法,是一个迭代的步骤,即先算出当前点的偏移均值,将该点移动到此偏移均值,然后以此为新的起始点,继续移动,直到满足最终的条件。可以看下图,比较直观:
1.在ROI内计算偏移均值
2.移动ROI圆心点到偏移均值点处(重心)
3.重复上述的过程(计算新的偏移均值,移动)
经过几次迭代,可以使圆的中心和小球密度最大的点(重心)重合,不再移动。
meanshift应用在实际目标追踪上的话和上图过程有一些区别,但是原理是类似的。图像的像素点是横平竖直排列的,像素点之间的距离相同,上图的小球是疏密相间彼此间的距离不一样;图像的像素点各自的值也不一样,上图的小球各自的“重量”默认相等。原理上都是搜索圆心向“重心”移动,在上图中搜索圆的圆心是向小球最多最密的地方移动,在图像中,将图像看成二维的重量分布不均匀的薄木板,搜索圆心向薄木板的“重心”移动。
通过上面已经知道是求得反向投影之后再进行camshift追踪,反向投影是以直方图为参照产生的概率分布图,以人脸追踪为例,越可能是人脸的地方,像素的值就越大越“重”,此时对反向投影使用meanshift,搜索框会向图像的“重心”移动,可以预见搜索框最终会停留在人脸可能性最大的地方。输入下一帧图像,重复上述过程,如果人脸位置改变,则反向投影也会改变,搜索框会移动至新的“重心”,搜索框显示在连续图像上就达成了“人脸追踪”效果。
可以看出追踪效果的质量依赖于反向投影的准确性,如果追踪目标和背景的色调(Hue)相近,则会影响到反向投影,造成追踪效果不佳。
这里在放一个别人的讲解
https://blog.csdn.net/qq_37775990/article/details/88246184.
控制板块(ROS)
控制主要就是用的ROS里面的功能包,只要发送特定的topic就行了(这就是ROS的强大之处)
代码
common.py
'''
Created on Jan 12, 2015
@author: Administrator
'''
#!/usr/bin/env python
'''
This module contais some common routines used by other samples.
'''
import numpy as np
import cv2
import os
from contextlib import contextmanager
import itertools as it
image_extensions = ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.tiff', '.pbm', '.pgm', '.ppm']
class Bunch(object):
def __init__(self, **kw):
self.__dict__.update(kw)
def __str__(self):
return str(self.__dict__)
def splitfn(fn):
path, fn = os.path.split(fn)
name, ext = os.path.splitext(fn)
return path, name, ext
def anorm2(a):
return (a*a).sum(-1)
def anorm(a):
return np.sqrt( anorm2(a) )
def homotrans(H, x, y):
xs = H[0, 0]*x + H[0, 1]*y + H[0, 2]
ys = H[1, 0]*x + H[1, 1]*y + H[1, 2]
s = H[2, 0]*x + H[2, 1]*y + H[2, 2]
return xs/s, ys/s
def to_rect(a):
a = np.ravel(a)
if len(a) == 2:
a = (0, 0, a[0], a[1])
return np.array(a, np.float64).reshape(2, 2)
def rect2rect_mtx(src, dst):
src, dst = to_rect(src), to_rect(dst)
cx, cy = (dst[1] - dst[0]) / (src[1] - src[0])
tx, ty = dst[0] - src[0] * (cx, cy)
M = np.float64([[ cx, 0, tx],
[ 0, cy, ty],
[ 0, 0, 1]])
return M
def lookat(eye, target, up = (0, 0, 1)):
fwd = np.asarray(target, np.float64) - eye
fwd /= anorm(fwd)
right = np.cross(fwd, up)
right /= anorm(right)
down = np.cross(fwd, right)
R = np.float64([right, down, fwd])
tvec = -np.dot(R, eye)
return R, tvec
def mtx2rvec(R):
w, u, vt = cv2.SVDecomp(R - np.eye(3))
p = vt[0] + u[:,0]*w[0] # same as np.dot(R, vt[0])
c = np.dot(vt[0], p)
s = np.dot(vt[1], p)
axis = np.cross(vt[0], vt[1])
return axis * np.arctan2(s, c)
def draw_str(dst, (x, y), s):
cv2.putText(dst, s, (x+1, y+1), cv2.FONT_HERSHEY_PLAIN, 1.0, (0, 0, 0), thickness = 2, lineType=cv2.CV_AA)
cv2.putText(dst, s, (x, y), cv2.FONT_HERSHEY_PLAIN, 1.0, (255, 255, 255), lineType=cv2.CV_AA)
class Sketcher:
def __init__(self, windowname, dests, colors_func):
self.prev_pt = None
self.windowname = windowname
self.dests = dests
self.colors_func = colors_func
self.dirty = False
self.show()
cv2.setMouseCallback(self.windowname, self.on_mouse)
def show(self):
cv2.imshow(self.windowname, self.dests[0])
def on_mouse(self, event, x, y, flags, param):
pt = (x, y)
if event == cv2.EVENT_LBUTTONDOWN:
self.prev_pt = pt
if self.prev_pt and flags & cv2.EVENT_FLAG_LBUTTON:
for dst, color in zip(self.dests, self.colors_func()):
cv2.line(dst, self.prev_pt, pt, color, 5)
self.dirty = True
self.prev_pt = pt
self.show()
else:
self.prev_pt = None
# palette data from matplotlib/_cm.py
_jet_data = {'red': ((0., 0, 0), (0.35, 0, 0), (0.66, 1, 1), (0.89,1, 1),
(1, 0.5, 0.5)),
'green': ((0., 0, 0), (0.125,0, 0), (0.375,1, 1), (0.64,1, 1),
(0.91,0,0), (1, 0, 0)),
'blue': ((0., 0.5, 0.5), (0.11, 1, 1), (0.34, 1, 1), (0.65,0, 0),
(1, 0, 0))}
cmap_data = { 'jet' : _jet_data }
def make_cmap(name, n=256):
data = cmap_data[name]
xs = np.linspace(0.0, 1.0, n)
channels = []
eps = 1e-6
for ch_name in ['blue', 'green', 'red']:
ch_data = data[ch_name]
xp, yp = [], []
for x, y1, y2 in ch_data:
xp += [x, x+eps]
yp += [y1, y2]
ch = np.interp(xs, xp, yp)
channels.append(ch)
return np.uint8(np.array(channels).T*255)
def nothing(*arg, **kw):
pass
def clock():
return cv2.getTickCount() / cv2.getTickFrequency()
@contextmanager
def Timer(msg):
print msg, '...',
start = clock()
try:
yield
finally:
print "%.2f ms" % ((clock()-start)*1000)
class StatValue:
def __init__(self, smooth_coef = 0.5):
self.value = None
self.smooth_coef = smooth_coef
def update(self, v):
if self.value is None:
self.value = v
else:
c = self.smooth_coef
self.value = c * self.value + (1.0-c) * v
class RectSelector:
def __init__(self, win, callback):
self.win = win
self.callback = callback
cv2.setMouseCallback(win, self.onmouse)
self.drag_start = None
self.drag_rect = None
def onmouse(self, event, x, y, flags, param):
x, y = np.int16([x, y]) # BUG
if event == cv2.EVENT_LBUTTONDOWN:
self.drag_start = (x, y)
if self.drag_start:
if flags & cv2.EVENT_FLAG_LBUTTON:
xo, yo = self.drag_start
x0, y0 = np.minimum([xo, yo], [x, y])
x1, y1 = np.maximum([xo, yo], [x, y])
self.drag_rect = None
if x1-x0 > 0 and y1-y0 > 0:
self.drag_rect = (x0, y0, x1, y1)
else:
rect = self.drag_rect
self.drag_start = None
self.drag_rect = None
if rect:
self.callback(rect)
def draw(self, vis):
if not self.drag_rect:
return False
x0, y0, x1, y1 = self.drag_rect
cv2.rectangle(vis, (x0, y0), (x1, y1), (0, 255, 0), 2)
return True
@property
def dragging(self):
return self.drag_rect is not None
def grouper(n, iterable, fillvalue=None):
'''grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx'''
args = [iter(iterable)] * n
return it.izip_longest(fillvalue=fillvalue, *args)
def mosaic(w, imgs):
'''Make a grid from images.
w -- number of grid columns
imgs -- images (must have same size and format)
'''
imgs = iter(imgs)
img0 = imgs.next()
pad = np.zeros_like(img0)
imgs = it.chain([img0], imgs)
rows = grouper(w, imgs, pad)
return np.vstack(map(np.hstack, rows))
def getsize(img):
h, w = img.shape[:2]
return w, h
def mdot(*args):
return reduce(np.dot, args)
def draw_keypoints(vis, keypoints, color = (0, 255, 255)):
for kp in keypoints:
x, y = kp.pt
cv2.circle(vis, (int(x), int(y)), 2, color)
video.py
'''
Created on Jan 12, 2015
@author: Administrator
'''
#!/usr/bin/env python
'''
Video capture sample.
Sample shows how VideoCapture class can be used to acquire video
frames from a camera of a movie file. Also the sample provides
an example of procedural video generation by an object, mimicking
the VideoCapture interface (see Chess class).
'create_capture' is a convinience function for capture creation,
falling back to procedural video in case of error.
Usage:
video.py [--shotdir <shot path>] [source0] [source1] ...'
sourceN is an
- integer number for camera capture
- name of video file
- synth:<params> for procedural video
Synth examples:
synth:bg=../cpp/lena.jpg:noise=0.1
synth:class=chess:bg=../cpp/lena.jpg:noise=0.1:size=640x480
Keys:
ESC - exit
SPACE - save current frame to <shot path> director
'''
import numpy as np
import cv2
from time import clock
from numpy import pi, sin, cos
import common
class VideoSynthBase(object):
def __init__(self, size=None, noise=0.0, bg = None, **params):
self.bg = None
self.frame_size = (640, 480)
# self.frame_size = (640, 480)
if bg is not None:
self.bg = cv2.imread(bg, 1)
h, w = self.bg.shape[:2]
self.frame_size = (w, h)
if size is not None:
w, h = map(int, size.split('x'))
self.frame_size = (w, h)
self.bg = cv2.resize(self.bg, self.frame_size)
self.noise = float(noise)
def render(self, dst):
pass
def read(self, dst=None):
w, h = self.frame_size
if self.bg is None:
buf = np.zeros((h, w, 3), np.uint8)
else:
buf = self.bg.copy()
self.render(buf)
if self.noise > 0.0:
noise = np.zeros((h, w, 3), np.int8)
cv2.randn(noise, np.zeros(3), np.ones(3)*255*self.noise)
buf = cv2.add(buf, noise, dtype=cv2.CV_8UC3)
return True, buf
def isOpened(self):
return True
class Chess(VideoSynthBase):
def __init__(self, **kw):
super(Chess, self).__init__(**kw)
w, h = self.frame_size
self.grid_size = sx, sy = 10, 7
white_quads = []
black_quads = []
for i, j in np.ndindex(sy, sx):
q = [[j, i, 0], [j+1, i, 0], [j+1, i+1, 0], [j, i+1, 0]]
[white_quads, black_quads][(i + j) % 2].append(q)
self.white_quads = np.float32(white_quads)
self.black_quads = np.float32(black_quads)
fx = 0.9
self.K = np.float64([[fx*w, 0, 0.5*(w-1)],
[0, fx*w, 0.5*(h-1)],
[0.0,0.0, 1.0]])
self.dist_coef = np.float64([-0.2, 0.1, 0, 0])
self.t = 0
def draw_quads(self, img, quads, color = (0, 255, 0)):
img_quads = cv2.projectPoints(quads.reshape(-1, 3), self.rvec, self.tvec, self.K, self.dist_coef) [0]
img_quads.shape = quads.shape[:2] + (2,)
for q in img_quads:
cv2.fillConvexPoly(img, np.int32(q*4), color, cv2.CV_AA, shift=2)
def render(self, dst):
t = self.t
self.t += 1.0/30.0
sx, sy = self.grid_size
center = np.array([0.5*sx, 0.5*sy, 0.0])
phi = pi/3 + sin(t*3)*pi/8
c, s = cos(phi), sin(phi)
ofs = np.array([sin(1.2*t), cos(1.8*t), 0]) * sx * 0.2
eye_pos = center + np.array([cos(t)*c, sin(t)*c, s]) * 15.0 + ofs
target_pos = center + ofs
R, self.tvec = common.lookat(eye_pos, target_pos)
self.rvec = common.mtx2rvec(R)
self.draw_quads(dst, self.white_quads, (245, 245, 245))
self.draw_quads(dst, self.black_quads, (10, 10, 10))
classes = dict(chess=Chess)
presets = dict(
empty = 'synth:',
lena = 'synth:bg=../cpp/lena.jpg:noise=0.1',
chess = 'synth:class=chess:bg=../cpp/lena.jpg:noise=0.1:size=640x480'
)
def create_capture(source = 0, fallback = presets['chess']):
'''source: <int> or '<int>|<filename>|synth [:<param_name>=<value> [:...]]'
'''
source = str(source).strip()
chunks = source.split(':')
# hanlde drive letter ('c:', ...)
if len(chunks) > 1 and len(chunks[0]) == 1 and chunks[0].isalpha():
chunks[1] = chunks[0] + ':' + chunks[1]
del chunks[0]
source = chunks[0]
try: source = int(source)
except ValueError: pass
params = dict( s.split('=') for s in chunks[1:] )
cap = None
if source == 'synth':
Class = classes.get(params.get('class', None), VideoSynthBase)
try: cap = Class(**params)
except: pass
else:
cap = cv2.VideoCapture(source)
params["size"] = "320x240"
if 'size' in params:
w, h = map(int, params['size'].split('x'))
cap.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, w)
cap.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, h)
if cap is None or not cap.isOpened():
print 'Warning: unable to open video source: ', source
if fallback is not None:
return create_capture(fallback, None)
return cap
if __name__ == '__main__':
import sys
import getopt
print __doc__
args, sources = getopt.getopt(sys.argv[1:], '', 'shotdir=')
args = dict(args)
shotdir = args.get('--shotdir', '.')
if len(sources) == 0:
sources = [ 0 ]
caps = map(create_capture, sources)
shot_idx = 0
while True:
imgs = []
for i, cap in enumerate(caps):
ret, img = cap.read()
imgs.append(img)
cv2.imshow('capture %d' % i, img)
ch = 0xFF & cv2.waitKey(1)
if ch == 27:
break
if ch == ord(' '):
for i, img in enumerate(imgs):
fn = '%s/shot_%d_%03d.bmp' % (shotdir, i, shot_idx)
cv2.imwrite(fn, img)
print fn, 'saved'
shot_idx += 1
cv2.destroyAllWindows()
上面都是Opencv自带的调用摄像头的demo
下面是算法模块和控制模块:
算法模块
test.py
#!/usr/bin/env python
import numpy as np
import cv2
import serial
import rospy
from std_msgs.msg import String
# from picamera.array import PiRGBArray
# from picamera import PiCamera
# import time
# import copy
import video
from matplotlib import pyplot as plt
class App(object):
def __init__(self, color):
self.cam = video.create_capture(0)
# self.cam = PiCamera()
# self.cam.resolution = (640,480)
# self.cam.framerate = 32
# self.rCa = PiRGBArray(self.cam, size=(320,240))
# time.sleep(0.1)
# self.cam.capture(self.rCa, format='bgr')
# self.frame = self.rCa.array
# ret, self.frame = self.cam.read()
cv2.namedWindow('camshift')
if color == 0:
self.roi = cv2.imread( 'lan.jpg' )
self.flag = "Lan"
else :
self.flag = "Hong"
self.roi = cv2.imread('hong.jpg')
self.selection = None
self.tracking_state = 0
self.show_backproj = False
# self.ser = serial.Serial('/dev/ttyAMA0',115200,timeout=0.5)
def start(self):
self.tracking_state = 0
#x, y = np.int16([220, 110]) # BUG
if self.flag == 'Lan':
self.selection = (4, 6, 407, 304)
else:
self.selection = (40, 54, 296, 230)
self.tracking_state = 1
# print "start"
def show_hist(self):
bin_count = self.hist.shape[0]
bin_w = 24
img = np.zeros((256, bin_count*bin_w, 3), np.uint8)
for i in xrange(bin_count):
h = int(self.hist[i])
cv2.rectangle(img, (i*bin_w+2, 255), ((i+1)*bin_w-2, 255-h), (int(180.0*i/bin_count), 255, 255), -1)
img = cv2.cvtColor(img, cv2.COLOR_HSV2BGR)
cv2.imshow('example', img)
def run(self):
roi = self.roi
self.start()
#state=0
while True:
# for frame in self.cam.capture_continuous(self.rCa, format='bgr', use_video_port=True):
ret, self.frame = self.cam.read()
# self.frame = frame.array
vis = self.frame.copy()
# vis = copy.deepcopy(self.frame)
hsv = cv2.cvtColor(self.frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
# self.selection = 1
if self.selection:
# x0, y0, x1, y1 = 220, 110, 358, 245
x0, y0, x1, y1 = self.selection
self.track_window = (x0, y0, x1-x0, y1-y0)
# hsv_roi = hsv[y0:y1, x0:x1]
# mask_roi = mask[y0:y1, x0:x1]
hsv_roi = cv2. cvtColor(roi,cv2. COLOR_BGR2HSV)
mask_roi = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
hist = cv2.calcHist( [hsv_roi], [0], mask_roi, [16], [0, 180] )
# hist = cv2.calcHist( [hsv_roi], [0,2],None, [180,256], [0, 180,0 , 255] )
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX);
self.hist = hist.reshape(-1)
#plt.imshow(hist,interpolation = 'nearest')
#plt.show()
self.show_hist()
vis_roi = vis[y0:y1, x0:x1]
cv2.bitwise_not(vis_roi, vis_roi)
vis[mask == 0] = 0
if self.tracking_state == 1:
self.selection = None
prob = cv2.calcBackProject([hsv], [0], self.hist, [0, 180], 1)
prob &= mask
term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
track_box, self.track_window = cv2.CamShift(prob, self.track_window, term_crit)
# if track_box[0][1] <= 240:
# self.ser.write(str(int(track_box[0][0])-320) + " " + str(int(track_box[0][1])-240))
# print str(int(track_box[0][0])-320) + " " + str(int(track_box[0][1])-240)
if track_box[1][1] <= 1:
self.tracking_state = 0
self.start()
else:
if self.show_backproj:
vis[:] = prob[...,np.newaxis]
try:
cv2.ellipse(vis, track_box, (0, 0, 255), 2)
# print track_box
a = str(track_box[0][0])+" "+str(track_box[0][1])+" "+str(round(track_box[1][0],2))\
+" "+str(round(track_box[1][1],2))+" "+str(round(track_box[2],2))+"\r\n"
#print a
if float(track_box[0][0])-160 >-20 and float(track_box[0][0])-160<20:
print "straight"
talker("straight")
elif float(track_box[0][0])-160>20:
print "right"
talker("right")
elif float(track_box[0][0])-160 <-20:
print "left"
talker("left")
#print state,track_box[0][0]
#state=float(track_box[0][0])
#print str(track_box) + "!"
# self.ser.write(a)
except: print track_box
cv2.imshow('camshift', vis)
ch = 0xFF & cv2.waitKey(5)
if ch == 27:
break
if ch == ord('b'):
self.show_backproj = not self.show_backproj
if ch == ord('r'):
self.tracking_state = 0
self.start()
cv2.destroyAllWindows()
def talker(msg):
pub = rospy.Publisher('chatter', String, queue_size=10)
rospy.init_node('talker', anonymous=True)
rate = rospy.Rate(10) # 10hz
# while not rospy.is_shutdown():
hello_str = msg
rospy.loginfo(hello_str)
pub.publish(hello_str)
rate.sleep()
if __name__ == '__main__':
import sys
try: color = sys.argv[1]
except: color = 1
print __doc__
a = App(color)
a.run()
控制模块
topic_listerner.py
#!/usr/bin/env python
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import Twist
from math import pi
def callback(data):
rospy.loginfo(rospy.get_caller_id() + 'I heard %s', data.data)
#rospy.init_node('color_track', anonymous=False)
rospy.on_shutdown(shutdown)
cmd_vel = rospy.Publisher('cmd_vel_mux/input/navi', Twist, queue_size=1)
rate = 10
r = rospy.Rate(rate)
linear_speed = 0.1
goal_distance = 0.5
linear_duration = goal_distance / linear_speed
if data.data == "left":
angular_speed = 0.75
elif data.data == "right":
angular_speed = -0.75
else:
angular_speed = 0
goal_angle = pi
#angular_duration = goal_angle / angular_speed
move_cmd = Twist()
move_cmd.linear.x = linear_speed
move_cmd.angular.z = angular_speed
cmd_vel.publish(move_cmd)
r.sleep()
# cmd_vel.publish(Twist())
def shutdown(self):
rospy.loginfo("Stopping the robot...")
self.cmd_vel.publish(Twist())
rospy.sleep(1)
def listener():
rospy.init_node('listener', anonymous=True)
rospy.Subscriber('chatter', String, callback)
rospy.spin()
return
if __name__ == '__main__':
listener()