Data-Driven 3D Voxel Patterns for Object Category Recognition (Matlab Implementation)

Author: 长安程序猿

Published: 2024-12-16 07:04:46


Tags: 3d, matlab, programming languages

Article link: https://blog.csdn.net/Yan_she_He/article/details/144496435


Contents

⛳️To the Reader

💥1 Overview

📚2 Results

🎉3 References

🌈4 Matlab Code, Data, and Article

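⛳️To the Reader

👨‍💻Research rests on a deep system of thought. It demands rigorous logic and a steady, conscientious attitude, yet hard work alone is not enough: often, drawing on others' strengths matters more than effort, and you also need the innovative spark and inspiration that come from gazing at the stars. When a philosophy teacher asks you what science is, or what electricity is, do not find the question silly. Philosophy is the mother of science; it pursues ultimate questions, the seemingly self-evident ones that only children ask and that you nevertheless cannot answer. Readers are advised to go through the sections in the order of the table of contents, so as not to stumble into a dark maze and lose the way back. This post cannot reveal the answers to every question, but if it raises a few clouds of doubt in your mind, they may yet turn into a brilliant sunset of a different kind; and should it bring a bitter rain upon your inner world, take the chance to wash the dust off the "lying flat" attitude that has settled there.

Perhaps, once the rain has passed and the clouds have cleared, the world your mind roams will be brighter.......🔎🔎🔎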

💥1 Overview


Abstract:

Despite the great progress achieved in recognizing objects as 2D bounding boxes in images, it is still very challenging to detect occluded objects and estimate the 3D properties of multiple objects from a single image. In this paper, we propose a novel object representation, 3D Voxel Pattern (3DVP), that jointly encodes the key properties of objects including appearance, 3D shape, viewpoint, occlusion and truncation. We discover 3DVPs in a data-driven way, and train a bank of specialized detectors for a dictionary of 3DVPs. The 3DVP detectors are capable of detecting objects with specific visibility patterns and transferring the meta-data from the 3DVPs to the detected objects, such as 2D segmentation mask, 3D pose as well as occlusion or truncation boundaries. The transferred meta-data allows us to infer the occlusion relationship among objects, which in turn provides improved object recognition results. Experiments are conducted on the KITTI detection benchmark [17] and the outdoor-scene dataset [41]. We improve state-of-the-art results on car detection and pose estimation with notable margins (6% in difficult data of KITTI). We also verify the ability of our method in accurately segmenting objects from the background and localizing them in 3D.
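To make the idea of a 3D voxel pattern more concrete, here is a minimal sketch that encodes three toy objects as labeled voxel grids and compares them. It assumes a five-label scheme (0 empty, 1 visible, 2 self-occluded, 3 occluded by another object, 4 truncated), which is our reading of the 3DVP paper rather than code from this post. The grids are hand-made instead of being derived from a 3D model seen from a given viewpoint, and `voxelSim` is a hypothetical, simplified matching score, not the paper's similarity measure.

```matlab
% Hypothetical 3DVP-style encoding (assumed label set, not the authors' code):
% 0 = empty, 1 = visible, 2 = self-occluded, 3 = occluded, 4 = truncated
A = zeros(4, 4, 4);
A(2:3, 2:3, 2:3) = 1;        % 2x2x2 object block, fully visible
B = A;
B(2:3, 2:3, 2) = 3;          % same shape, front slice occluded by another object
C = A;
C(2:3, 3, 2:3) = 4;          % same shape, one side truncated by the image border

% Simplified similarity: among voxels that are non-empty in either pattern,
% the fraction that carry the same label in both (an illustrative choice).
voxelSim = @(P, Q) sum(P(:) == Q(:) & (P(:) ~= 0 | Q(:) ~= 0)) / ...
                   max(sum(P(:) ~= 0 | Q(:) ~= 0), 1);

S = [voxelSim(A, A), voxelSim(A, B), voxelSim(B, C)];
fprintf('sim(A,A)=%.2f  sim(A,B)=%.2f  sim(B,C)=%.2f\n', S);
```

The three patterns share the same 3D shape, yet their scores drop as the occlusion and truncation labels diverge (1.00, 0.50, 0.25 here). That is the kind of distinction a clustering step over many such patterns could use to separate visibility patterns into a 3DVP dictionary.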

📚2 Results

Code excerpt:

function idx = cluster_2d_occlusion_patterns(cls, data, algorithm, K, pscale)
% Cluster the 2D appearance of the objects of class cls with k-means
% ('kmeans') or affinity propagation ('ap'). idx(i) is the data index of the
% exemplar of sample i's cluster, or -1 if sample i was not clustered.

opt = globals;   % project settings (helper not shown in this excerpt)
pascal_init;     % sets up VOCopts (not shown)

% select the clustering data: non-difficult PASCAL examples of class cls
cls_ind = find(strcmp(cls, data.classes) == 1);
flag = data.cls_ind == cls_ind & data.difficult == 0 & data.is_pascal == 1;
fprintf('%d %s examples in clustering\n', sum(flag), cls);

% determine the canonical size [h w] of the bounding boxes
modelDs = compute_model_size(data.bbox(:,flag));

% compute features (features() is presumably the HOG extractor from the
% DPM code; thin bottles use a smaller cell size)
if strcmp(cls, 'bottle') == 1
    sbin = 4;
else
    sbin = 8;
end
index = find(flag == 1);
X = [];
fprintf('computing features...\n');
for i = 1:numel(index)
    ind = index(i);
    % read the image
    id = data.id{ind};
    if data.is_pascal(ind) == 1
        filename = sprintf(VOCopts.imgpath, id);
    else
        % path_img_imagenet is not defined in this excerpt; since flag keeps
        % only PASCAL examples, this branch is not reached here
        filename = [sprintf(path_img_imagenet, cls) '/' id '.JPEG'];
    end
    I = imread(filename);
    if data.is_flip(ind) == 1
        I = I(:, end:-1:1, :);   % horizontally flipped example
    end
    % crop the box given as [x y w h] and resize it to the canonical [w h]
    bbox = data.bbox(:,ind);
    gt = [bbox(1) bbox(2) bbox(3)-bbox(1) bbox(4)-bbox(2)];
    Is = bbApply('crop', I, gt, 'replicate', modelDs([2 1]));
    C = features(double(Is{1}), sbin);
    X(:,i) = C(:);   % one feature column per example
end
fprintf('done\n');

switch algorithm
    case 'kmeans'
        % k-means clustering
        fprintf('%s 2d kmeans %d\n', cls, K);
        opts = struct('maxiters', 1000, 'mindelta', eps, 'verbose', 1);
        [center, sse] = vgg_kmeans(X, K, opts);
        [idx_kmeans, d] = vgg_nearest_neighbour(X, center);

        % construct idx: every member of a cluster is labeled with the data
        % index of the member closest to the cluster centre
        num = numel(data.imgname);
        idx = zeros(num, 1);
        idx(flag == 0) = -1;
        index_all = find(flag == 1);
        for i = 1:K
            index = find(idx_kmeans == i);
            [~, ind] = min(d(index));
            cid = index_all(index(ind));
            idx(index_all(index)) = cid;
        end
    case 'ap'
        % affinity propagation, preference scaled by pscale
        fprintf('2d AP %f\n', pscale);
        fprintf('computing similarity scores...\n');
        scores = compute_similarity_2d(X);
        fprintf('done\n');

        % list all N^2-N off-diagonal similarities as [i k s(i,k)] triplets
        N = size(scores, 1);
        M = N*N-N;
        s = zeros(M,3);
        j = 1;
        for i = 1:N
            for k = [1:i-1,i+1:N]
                s(j,1) = i;
                s(j,2) = k;
                s(j,3) = scores(i,k);
                j = j+1;
            end
        end

        % shared preference: pscale times the smallest similarity
        p = min(s(:,3)) * pscale;

        % clustering
        fprintf('Start AP clustering\n');
        [idx_ap, netsim, dpsim, expref] = apclustermex(s, p);

        fprintf('Number of clusters: %d\n', length(unique(idx_ap)));
        fprintf('Fitness (net similarity): %f\n', netsim);

        % construct idx: idx_ap holds each sample's exemplar (1..N),
        % which is mapped back to an index into data
        num = numel(data.imgname);
        idx = zeros(num, 1);
        idx(flag == 0) = -1;
        index_all = find(flag == 1);

        cids = unique(idx_ap);
        K = numel(cids);
        for i = 1:K
            index = idx_ap == cids(i);
            cid = index_all(cids(i));
            idx(index_all(index)) = cid;
        end
end
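The double loop in the `ap` branch lists every ordered pair (i, k) with i ≠ k as an `[i k s(i,k)]` row, N^2 - N rows in all, which is the triplet format the excerpt passes to `apclustermex`. As a sketch, the same list can be built without loops. Here `magic(4)` is only a stand-in for the output of `compute_similarity_2d`, and `pscale = 1` is an arbitrary, hypothetical value; the script builds the list both ways and checks that they agree.

```matlab
N = 4;
scores = magic(N);                 % stand-in similarity matrix (hypothetical)

% loop version, as in the excerpt
sLoop = zeros(N*N - N, 3);
j = 1;
for i = 1:N
    for k = [1:i-1, i+1:N]
        sLoop(j, :) = [i, k, scores(i, k)];
        j = j + 1;
    end
end

% vectorized version: with ndgrid, ii varies slowest and kk fastest in
% column-major order, matching the loop order (i outer, k inner)
[kk, ii] = ndgrid(1:N, 1:N);
off = kk ~= ii;                    % drop self-similarities
sVec = [ii(off), kk(off), scores(sub2ind([N N], ii(off), kk(off)))];

pscale = 1;                        % hypothetical value
p = min(sVec(:, 3)) * pscale;      % shared preference, as in the excerpt
fprintf('rows: %d, identical: %d, preference: %g\n', ...
        size(sVec, 1), isequal(sLoop, sVec), p);
```

`ndgrid` is used rather than `meshgrid` because its first output varies fastest down the columns, so logical masking reproduces the loop's row order exactly. The vectorized form removes the per-element loop overhead, but it still stores all N^2 - N rows.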

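function modelDs = compute_model_size(bbox)
% Canonical template size [h w] for boxes stored as columns [x1; y1; x2; y2].

% pick the mode of the aspect ratios (Gaussian-smoothed histogram of log(h/w))
h = bbox(4,:) - bbox(2,:) + 1;
w = bbox(3,:) - bbox(1,:) + 1;
xx = -2:.02:2;                       % bin centres in log(h/w)
kernel = exp(-(-100:100).^2/400);    % Gaussian smoothing kernel
aspects = hist(log(h./w), xx);
aspects = convn(aspects, kernel, 'same');
[~, I] = max(aspects);
aspect = exp(xx(I));

% pick the 20th-percentile area, clamped to [500, 10000] pixels
areas = sort(h.*w);
area = areas(max(floor(length(areas) * 0.2), 1));
area = max(min(area, 10000), 500);

% pick dimensions so that h/w = aspect and h*w = area
w = sqrt(area/aspect);
h = w*aspect;
modelDs = round([h w]);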
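A quick way to see what `compute_model_size` picks is to run its steps on synthetic boxes. The sketch below inlines the helper's steps as a script, so it runs without the rest of the code, on 50 hypothetical boxes whose height is exactly half their width. The mode of the smoothed log-aspect histogram then lands on the bin centre -0.70 (an aspect ratio of about 0.497), and the 20th-percentile area is 1682 pixels.

```matlab
% 50 hypothetical ground-truth boxes [x1; y1; x2; y2], all with h/w = 0.5
w = 40:2:138;
h = w / 2;
bbox = [ones(1, 50); ones(1, 50); w; h];

% same steps as compute_model_size
hh = bbox(4,:) - bbox(2,:) + 1;
ww = bbox(3,:) - bbox(1,:) + 1;
xx = -2:.02:2;                          % log-aspect bin centres
kernel = exp(-(-100:100).^2 / 400);     % Gaussian smoothing kernel
counts = hist(log(hh ./ ww), xx);
counts = convn(counts, kernel, 'same');
[~, I] = max(counts);
aspect = exp(xx(I));                    % mode of the aspect ratio

areas = sort(hh .* ww);
area = areas(max(floor(numel(areas) * 0.2), 1));   % 20th percentile
area = max(min(area, 10000), 500);

mw = sqrt(area / aspect);
modelDs = round([mw * aspect, mw]);     % [h w]
fprintf('aspect %.4f, area %d, modelDs [%d %d]\n', aspect, area, modelDs);
```

The result, `[29 58]`, matches the 10th-smallest box (w = 58, h = 29), as expected when all boxes share one aspect ratio. Choosing a low percentile rather than the mean keeps the template small enough that most training boxes are downscaled, not upscaled, when cropped to it.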