Source code: https://github.com/sjy234sjy234/KinectFusion-ios
Original paper: "KinectFusion: Real-time dense surface mapping and tracking."
This post covers how to invoke the interfaces of KinectFusion-ios, as well as the algorithm framework of KinectFusion.
1. Algorithm Invocation Example
As introduced in the previous post, the sample invocation code lives mainly in ViewController.mm. The part that reads the depth-frame data stream from depth.bin is iOS development plumbing and is not covered here. This section explains the KinectFusion invocation itself, chiefly the initialization and use of the FusionProcessor class.
First, the FusionProcessor initialization code in the viewDidLoad method:
self.fusionProcessor = [FusionProcessor shareFusionProcessorWithContext: _metalContext];
[self.fusionProcessor setRenderBackColor: {24.0 / 255, 31.0 / 255, 50.0 / 255, 1}];
simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
[self.fusionProcessor setupTsdfParameterWithCube: cube];
Here, line 1 initializes the class; line 2 sets the background color used when rendering; lines 3-4 set up a cube (x, y, z, w). To explain: the KinectFusion algorithm must be initialized with a cubic bounding box, and the cube in this code is a precomputed bounding box around a human face, which can be obtained with computer-vision methods that are beyond the scope of this post. In it, (x, y, z) is one vertex of the cube and w is its edge length, which together determine a cubic bounding box in 3D space.
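To make the convention concrete, here is a minimal sketch (not code from the repo; treating (x, y, z) as the minimum corner is my assumption, since the convention above only fixes it as a vertex) of how the four components map to the cube's bounds:

// minimal sketch: mapping the (x, y, z, w) cube parameter to axis-aligned bounds
// assumption: (x, y, z) is the minimum corner of the cube
simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
simd::float3 minCorner = {cube.x, cube.y, cube.z};   // one vertex of the cube
simd::float3 maxCorner = minCorner + cube.w;         // the opposite vertex, offset by the edge length w
// the TSDF volume is then laid out over [minCorner, maxCorner] along all three axes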
Next is the callback executed as the depth data stream is read:
- (void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)eventCode {
    switch(eventCode) {
        case NSStreamEventHasBytesAvailable:
        {
            //read every frame from depth.bin, which contains one single disparity frame of 640 x 480 x float16,
            //we can easily derive depth from disparity: depth = 1.0 / disparity;
            int frameLen = PORTRAIT_WIDTH * PORTRAIT_HEIGHT * 2;
            uint8_t* buf = new uint8_t[frameLen];
            NSInteger len = 0;
            len = [(NSInputStream *)stream read:buf maxLength:frameLen];
            if(len == frameLen)
            {
                BOOL isFusionOK = [self.fusionProcessor processDisparityData:buf withIndex:m_fusionFrameIndex withTsdfUpdate: YES];
                if(isFusionOK)
                {
                    id<MTLTexture> textureAfterFusion = [self.fusionProcessor getColorTexture];
                    [self.scanningRenderer render: textureAfterFusion];
                    m_fusionFrameIndex++;
                }
                else
                {
                    NSLog(@"Fusion Failed");
                }
            }
            delete[] buf;
            break;
        }
        default:
            if(m_fusionFrameIndex > 0)
            {
                m_isFusionComplete = YES;
            }
    }
}
Here, lines 7-10 read one frame of depth data from the stream; a single frame is 480 x 640 x 2 bytes. Line 13 performs one single-frame KinectFusion processing call; lines 16-17 render the currently reconstructed model; line 18 advances the frame index m_fusionFrameIndex.
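Since the comment in the code notes depth = 1.0 / disparity, here is a minimal CPU-side sketch of that conversion; in the repo the conversion actually runs on the GPU inside the pre-processing module shown later, so disparityToDepth here is only a hypothetical helper for illustration:

// hypothetical CPU-side helper illustrating depth = 1.0 / disparity
// __fp16 is clang's half-precision type, matching the float16 frames in depth.bin
static void disparityToDepth(const uint8_t *frame, float *depthOut)
{
    const __fp16 *disparity = (const __fp16 *)frame;
    for(int i = 0; i < PORTRAIT_WIDTH * PORTRAIT_HEIGHT; ++i)
    {
        float d = (float)disparity[i];
        depthOut[i] = (d > 0.0f) ? (1.0f / d) : 0.0f;   // guard against invalid (zero) disparity
    }
}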
Below is the operation that resets the FusionProcessor. It is also the only button callback in the app, and it starts a brand-new scan:
- (IBAction)onResetScan:(id)sender {
    if(m_isFusionComplete)
    {
        m_fusionFrameIndex = 0;
        m_isFusionComplete = NO;
        simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
        [self.fusionProcessor setupTsdfParameterWithCube: cube];
        [self setUpStreamForFile: self.streamPath];
    }
}
On line 4, the frame index m_fusionFrameIndex must be reset to 0: FusionProcessor checks whether m_fusionFrameIndex is 0 and completes its initialization automatically, so there is no explicit initialization method to call. Lines 6-7 then set up a new cube, after which a brand-new scan can begin.
2. Algorithm Framework Overview
Here, from the perspective of the implementation, is a flowchart of the KinectFusion algorithm framework:
In it, circles denote data and rectangles denote processing modules. As the figure shows, the algorithm consists of four processing modules: data preparation, TSDF, ICP, and MarchingCube.
As covered in an earlier post, the project's FusionProcessor folder contains the KinectFusion source code. All four processing modules live in its FusionComputer subfolder, whose four subfolders correspond one-to-one to the module implementations: FuPreProcess, FuTsdfFusioner, FuICPMatrix, and FuMarchingCube. Finally, FusionProcessor.mm, the file for the FusionProcessor class, sits directly in the FusionProcessor folder; it implements the entire process in the flowchart above, assembling the individual modules into the complete KinectFusion algorithm.
Note that the flowchart is best read alongside the source code; it is not necessarily complete in every detail, so do not read too much into it.
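Since the flowchart is an image, the per-frame data flow is also worth summarizing in text (simplified, as reconstructed from the source):
1. Data preparation: convert the incoming disparity frame into a depth-map pyramid, render the model extracted last frame into a second depth-map pyramid, and back-project both into vertex and normal maps.
2. ICP: align the current maps against the predicted (previous) maps to solve for the new camera pose.
3. TSDF: fuse the current depth map into the volume using that pose.
4. MarchingCube: extract a point/normal model from the volume, which becomes the predicted model for the next frame.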
First, the FusionProcessor class exposes two main external interfaces:
- (BOOL) processDisparityData: (uint8_t *)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate;
- (BOOL) processDisparityPixelBuffer: (CVPixelBufferRef)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate;
The first is the interface already seen in the invocation example above: each new depth frame is passed in as a byte stream to update the reconstructed 3D scene. The second is the interface for real-time invocation with CVPixelBufferRef, the depth data type obtained directly from the iPhone X. The two perform exactly the same operation and differ only slightly in the format of the incoming data stream. Their implementations are shown below:
- (BOOL) processDisparityData: (uint8_t *)disparityData withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    id<MTLBuffer> inDisparityMapBuffer = [_metalContext.device newBufferWithBytes:disparityData
                                                                           length:PORTRAIT_WIDTH*PORTRAIT_HEIGHT*2
                                                                          options:MTLResourceOptionCPUCacheModeDefault];
    return [self processFrame: inDisparityMapBuffer withIndex: fusionFrameIndex withTsdfUpdate: isTsdfUpdate];
}

- (BOOL) processDisparityPixelBuffer: (CVPixelBufferRef)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    id<MTLBuffer> inDisparityMapBuffer = [_metalContext bufferWithF16PixelBuffer: disparityPixelBuffer];
    return [self processFrame: inDisparityMapBuffer withIndex: fusionFrameIndex withTsdfUpdate: isTsdfUpdate];
}
As you can see, the two methods merely marshal the data: each normalizes the incoming depth data into an MTLBuffer and then calls the private processFrame method, which carries out the single-frame KinectFusion processing.
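For reference, here is a sketch of how the second interface might be driven by a live iPhone X depth stream. The AVCaptureDepthDataOutput delegate below is hypothetical glue code of my own, not something in the repo:

// hypothetical AVCaptureDepthDataOutputDelegate callback (not in the repo)
- (void)depthDataOutput:(AVCaptureDepthDataOutput *)output
     didOutputDepthData:(AVDepthData *)depthData
              timestamp:(CMTime)timestamp
             connection:(AVCaptureConnection *)connection
{
    //make sure we hand over the float16 disparity format the processor expects
    AVDepthData *disparity16 = [depthData depthDataByConvertingToDepthDataType:kCVPixelFormatType_DisparityFloat16];
    BOOL isFusionOK = [self.fusionProcessor processDisparityPixelBuffer:disparity16.depthDataMap
                                                              withIndex:m_fusionFrameIndex
                                                         withTsdfUpdate:YES];
    if(isFusionOK)
    {
        [self.scanningRenderer render: [self.fusionProcessor getColorTexture]];
        m_fusionFrameIndex++;
    }
}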
processFrame is the core processing function of the KinectFusion framework. Its contents are as follows:
- (BOOL) processFrame: (id<MTLBuffer>) inDisparityMapBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    //pre-process
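    //render the model extracted in the previous frame into color/depth textures from the last camera pose;
    //this predicted surface takes the place of the TSDF raycasting step in the original KinectFusion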
    [_fuMeshToTexture drawPoints: m_mCubeExtractPointBuffer normals: m_mCubeExtractNormalBuffer intoColorTexture: m_colorTexture andDepthTexture: m_depthTexture withTransform: m_projectionTransform * m_globalToFrameTransform];
    [_fuTextureToDepth compute: m_depthTexture intoTexture: m_preDepthMapPyramid[0] with: m_cameraNDC2Depth];
    [_fuDisparityToDepth compute: inDisparityMapBuffer intoDepthMapBuffer: m_currentDepthMapPyramid[0]];
    for(int level=1;level<PYRAMID_LEVEL;++level)
    {
        [_fuPyramidDepthMap compute: m_currentDepthMapPyramid[level - 1] intoDepthMapBuffer: m_currentDepthMapPyramid[level] withLevel: level];
        [_fuPyramidDepthMap compute: m_preDepthMapPyramid[level - 1] intoDepthMapBuffer: m_preDepthMapPyramid[level] withLevel: level];
    }
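    //back-project each pyramid level into 3D vertex maps and estimate per-pixel normals,
    //for both the current frame and the predicted (previous) frame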
    for(int level=0;level<PYRAMID_LEVEL;++level)
    {
        [_fuDepthToVertex compute: m_currentDepthMapPyramid[level] intoVertexMapBuffer: m_currentVertexMapPyramid[level] withLevel: level andIntrinsicUVD2XYZ: m_intrinsicUVD2XYZ[level]];
        [_fuVertexToNormal compute: m_currentVertexMapPyramid[level] intoNormalMapBuffer: m_currentNormalMapPyramid[level] withLevel: level];
        [_fuDepthToVertex compute: m_preDepthMapPyramid[level] intoVertexMapBuffer: m_preVertexMapPyramid[level] withLevel: level andIntrinsicUVD2XYZ: m_intrinsicUVD2XYZ[level]];
        [_fuVertexToNormal compute: m_preVertexMapPyramid[level] intoNormalMapBuffer: m_preNormalMapPyramid[level] withLevel: level];
    }
    //icp
    if(fusionFrameIndex<=0)
    {
        //first frame, no icp
        NSLog(@"first frame, fusion reset");
        [self reset];
    }
    else
    {
        //icp iteration
        BOOL isSolvable=YES;
        simd::float3x3 currentF2gRotate;
        simd::float3 currentF2gTranslate;
        simd::float3x3 preF2gRotate;
        simd::float3 preF2gTranslate;
        simd::float3x3 preG2fRotate;
        simd::float3 preG2fTranslate;
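        //split the previous frame's pose matrices into rotation and translation parts;
        //the previous pose also serves as the initial guess for the current pose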
        matrix_transform_extract(m_frameToGlobalTransform,currentF2gRotate,currentF2gTranslate);
        matrix_transform_extract(m_frameToGlobalTransform, preF2gRotate, preF2gTranslate);
        matrix_transform_extract(m_globalToFrameTransform, preG2fRotate, preG2fTranslate);
        for(int level=PYRAMID_LEVEL-1;level>=0;--level)
        {
            uint iteratorNumber=ICPIteratorNumber[level];
            for(int it=0;it<iteratorNumber;++it)
            {
                uint occupiedPixelNumber = [_fuICPPrepareMatrix computeCurrentVMap:m_currentVertexMapPyramid[level] andCurrentNMap:m_currentNormalMapPyramid[level] andPreVMap:m_preVertexMapPyramid[level] andPreNMap:m_preNormalMapPyramid[level] intoLMatrix:m_icpLeftMatrixPyramid[level] andRMatrix:m_icpRightMatrixPyramid[level] withCurrentR:currentF2gRotate andCurrentT:currentF2gTranslate andPreF2gR:preF2gRotate andPreF2gT:preF2gTranslate andPreG2fR:preG2fRotate andPreG2fT:preG2fTranslate andThreshold:m_icpThreshold andIntrinsicXYZ2UVD:m_intrinsicXYZ2UVD[level] withLevel:level];
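                //no valid correspondences at this pyramid level: tracking cannot proceed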
                if(occupiedPixelNumber==0)
                {
                    isSolvable=NO;
                }
                if(isSolvable)
                {
                    [_fuICPReduceMatrix computeLeftMatrix:m_icpLeftMatrixPyramid[level] andRightmatrix:m_icpRightMatrixPyramid[level] intoLeftReduce:m_icpLeftReduceBuffer andRightReduce:m_icpRightReduceBuffer withLevel:level andOccupiedNumber:occupiedPixelNumber];
                    float result[6];
                    isSolvable=matrix_float6x6_solve((float*)m_icpLeftReduceBuffer.contents, (float*)m_icpRightReduceBuffer.contents, result);
                    if(isSolvable)
                    {
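                        //result holds a small-angle pose increment (alpha, beta, gamma, tx, ty, tz);
                        //compose it on the left of the running frame-to-global pose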
                        simd::float3x3 rotateIncrement=matrix_float3x3_rotation(result[0], result[1], result[2]);
                        simd::float3 translateIncrement={result[3], result[4], result[5]};
                        currentF2gRotate=rotateIncrement*currentF2gRotate;
                        currentF2gTranslate=rotateIncrement*currentF2gTranslate+translateIncrement;
                    }
                }
            }
            if(!isSolvable)
            {
                break;
            }
        }
        if(isSolvable)
        {
            matrix_transform_compose(m_frameToGlobalTransform, currentF2gRotate, currentF2gTranslate);
            m_globalToFrameTransform=simd::inverse(m_frameToGlobalTransform);
        }
        else
        {
            NSLog(@"lost frame");
            return NO;
        }
    }
    if(isTsdfUpdate||fusionFrameIndex<=0)
    {
        //tsdf fusion updater
        [_fuTsdfFusioner compute:m_currentDepthMapPyramid[0] intoTsdfVertexBuffer:m_tsdfVertexBuffer withIntrinsicXYZ2UVD:m_intrinsicXYZ2UVD[0] andTsdfParameter:m_tsdfParameter andTransform:m_globalToFrameTransform];
        //marching cube
        int activeVoxelNumber = [_fuMCubeTraverse compute:m_tsdfVertexBuffer intoActiveVoxelInfo:m_mCubeActiveVoxelInfoBuffer withMCubeParameter:m_mCubeParameter];
        if(activeVoxelNumber==0)
        {
            NSLog(@"alert: no active voxel");
            m_mCubeExtractPointBuffer = nil;
            m_mCubeExtractNormalBuffer = nil;
            return NO;
        }
        else if(activeVoxelNumber>=m_mCubeParameter.maxActiveNumber)
        {
            NSLog(@"alert: too many active voxels");
            m_mCubeExtractPointBuffer = nil;
            m_mCubeExtractNormalBuffer = nil;
            return NO;
        }
        else
        {
            void *baseAddress=m_mCubeActiveVoxelInfoBuffer.contents;
            ActiveVoxelInfo *activeVoxelInfo=(ActiveVoxelInfo*)baseAddress;
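            //in-place inclusive prefix sum over per-voxel vertex counts,
            //turning counts into cumulative output offsets for the compacted buffers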
            for(int i=1;i<activeVoxelNumber;++i)
            {
                activeVoxelInfo[i].vertexNumber=activeVoxelInfo[i-1].vertexNumber+activeVoxelInfo[i].vertexNumber;
            }
            uint totalVertexNumber=activeVoxelInfo[activeVoxelNumber-1].vertexNumber;
            m_mCubeExtractPointBuffer = [_metalContext.device newBufferWithLength: 3 * totalVertexNumber * sizeof(float) options:MTLResourceOptionCPUCacheModeDefault];
            m_mCubeExtractNormalBuffer = [_metalContext.device newBufferWithLength: 3 * totalVertexNumber * sizeof(float) options:MTLResourceOptionCPUCacheModeDefault];
            [_fuMCubeExtract compute: m_mCubeActiveVoxelInfoBuffer andTsdfVertexBuffer: m_tsdfVertexBuffer withActiveVoxelNumber: activeVoxelNumber andTsdfParameter: m_tsdfParameter andMCubeParameter: m_mCubeParameter andOutMCubeExtractPointBufferT: m_mCubeExtractPointBuffer andOutMCubeExtractNormalBuffer: m_mCubeExtractNormalBuffer];
        }
    }
    return YES;
}
I have marked the call sites of the four processing modules with comments in the code above: the pre-process section is the data-preparation module, the icp section is the ICP module, the tsdf fusion updater is the TSDF module, and the marching cube section is the MarchingCube module.
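One pointer ahead of the ICP post in particular: the module implements the point-to-plane ICP of the original KinectFusion paper. Each iteration reduces per-pixel constraints into a 6x6 linear system (the LMatrix/RMatrix buffers), solves it with matrix_float6x6_solve, and left-composes the resulting small-angle increment onto the running frame-to-global pose, which is exactly the currentF2gRotate/currentF2gTranslate update in the code. In the standard formulation, the increment $\xi = (\alpha, \beta, \gamma, t_x, t_y, t_z)$ minimizes the linearized point-to-plane error

$$E(\xi) = \sum_i \Big[ \big( R_{inc}(\alpha,\beta,\gamma)\,(R\,p_i + t) + t_{inc} - q_i \big) \cdot n_i \Big]^2$$

where $p_i$ are the current frame's vertices, $q_i$ and $n_i$ are the matched vertices and normals of the predicted surface, and $(R, t)$ is the pose estimate from the previous iteration.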
There is a lot of code in this method, so I will not unpack all of it here. In follow-up posts, I will cover the four modules one by one, explaining the principles and the code of each in more detail.