Real-time Arm Gesture Recognition in Smart Home Scenarios via Millimeter Wave Sensing: Reading Notes

Original post: https://blog.csdn.net/Maybemust/article/details/115308735

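This post introduces mHomeGes, a real-time arm gesture recognition system based on mmWave radar. To address the restriction to large-scale movements, interference, and offline processing in home environments, it proposes the concentrated position-Doppler profile, the mGesNet neural network, and the HMM-VM voting mechanism. By removing interference, separating users, and segmenting gestures in real time, mHomeGes surpasses prior work in accuracy, scene adaptability, and robustness to multi-user interference. Experiments validate the system's practicality, and a large-scale mmWave gesture dataset has been released publicly.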


Real-time Arm Gesture Recognition in Smart Home Scenarios via Millimeter Wave Sensing
HAIPENG LIU, YUHENG WANG, ANFU ZHOU, HANYUE HE, WEI WANG, KUNPENG WANG, PEILIN PAN, YIXUAN LU, LIANG LIU, and HUADONG MA Beijing University of Posts and Telecommunications, China

Challenge

Usage in a home environment raises four issues: range, interference, scale, and segmentation.

(i) Limited to large-scale whole-body movements: It is inconvenient to control devices by such movements, especially for the elderly. We find that RadHAR and mmGait are deficient in minor arm gestures, i.e., with accuracies of 84.24% and 35.68%, respectively, though they can well recognize large whole-body activities (with accuracy over 90%).

Prior work recognizes large-scale movements, so this paper targets arm gestures. Would even finer-grained control be better?

(ii) Vulnerable to interference in practical home scenarios. While mmGait utilizes a DBscan [9] method to cluster a user from noises, it cannot separate a targeted gesture when performing in a practical home scenario with multi-path reflection, or when other people moving around.

Signals are easily corrupted in multipath environments, which deserves attention.

(iii) Off-line solution: Both RadHAR and mmGait operate in an off-line way, i.e., assuming gesture samples are segmented and prepared in advance. However, in practice, users perform gestures both intermittently and continuously, and thus it is a non-trivial challenge to distill and segment them out accurately and timely.

The question is how to segment gestures in real time instead of relying on samples prepared in advance.

Overview

In this paper, we propose mHomeGes, a real-time mmWave arm gesture recognition system for practical smart home-usage.

Application: real-time, continuous arm gesture recognition with a mmWave radar, covering ten gestures in total.

In summary, our key contributions can be summarized as follows,
(i) We propose a concentrated position-doppler profile (CPDP) and also a novel shallow neural network mGesNet to distill the inherent features underlying different arm gestures, which together enable accurate recognition for fine-grained multi-joint arm gestures.
(ii) We design a novel user discovery algorithm (UDAN) to distill any potential gesturing user while eliminating multipath effect caused by the reflectors in different home scenes, even when the user is close to a wall with strong reflection.
(iii) We design a hidden-Markov model-based voting mechanism (HMM-VM) to segment gestures for continuous user movement, so as to identify user’s gestures at runtime. Besides, HMM-VM is also robust to the influence of non-gestural human activities.
(iv) We prototype mHomeGes on the commodity TI-IWR1443 mmWave sensor. Extensive experiments demonstrate that mHomeGes outperforms state-of-the-art solutions in terms of recognition accuracy, home scenes compatibility, multi-user robustness, and real-time performance. Integrating mHomeGes into a controller of three family appliances, we perform a comparative user study to demonstrate the feasibility of its practical daily usage.
(v) We have collected and publicly archived an extensive mmWave gesture dataset [11] consisting of 22,000 samples from 25 persons, which shall facilitate future studies in mmWave gesture sensing community.

We implement mHomeGes on a commodity mmWave radar and also perform a user study.
[...] regardless of the impact of surrounding movements, concurrent gestures, human physiological conditions, and outer packing materials.

Method

overview

Signal preprocessing

At first, mHomeGes transfers collected raw mmWave signals reflected from the human skin into reflection point (ref-point) cloud, which depicts how the gesture is performed.

Interference removal

This module separates the ref-points of any potential user from noise via our method (UDAN), and feeds each candidate user in a PDP form to the next module.

There are two kinds of interference: concurrent movements and the multipath effect. mmGait already handles the former; this paper focuses on the latter.

synchrony value
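1) Run DBSCAN to obtain several clusters; only a cluster with more than 20 points is treated as a real user. When the arm is not close to a wall, the mirror reflection is easy to exclude (based on the z-axis coordinate).

Does the figure mean that the more times a signal is reflected, the farther away the radar places it, and hence the less likely it is to be the user? The calculation is as follows:

Motion of non-human objects such as pets is simply excluded by height.
When the user is close to a wall, the reflection points of the user and the ghost are density-reachable but not density-connected in their respective regions.
The conclusion is that the ghost points are not removed completely, but this does not affect recognition.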
UDAN
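
The cluster-then-filter idea in the notes above can be sketched roughly as follows. This is not the paper's UDAN algorithm: it only illustrates DBSCAN clustering followed by the "more than 20 points" rule and a height-based rejection. The function names, the DBSCAN parameters `eps` and `min_pts`, and the height threshold `min_top_z` are all hypothetical values chosen for illustration; only the 20-point threshold comes from the notes.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point (-1 means noise)."""
    def region(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(j)
            if len(nb) >= min_pts:   # j is a core point, keep expanding
                queue.extend(nb)
    return labels


def candidate_users(points, eps=0.25, min_pts=4, min_cluster_size=20, min_top_z=0.8):
    """Keep clusters with more than `min_cluster_size` points whose highest
    point reaches `min_top_z` metres (hypothetical height rule for pets)."""
    clusters = {}
    for p, label in zip(points, dbscan(points, eps, min_pts)):
        if label != -1:
            clusters.setdefault(label, []).append(p)
    return [c for c in clusters.values()
            if len(c) > min_cluster_size and max(p[2] for p in c) >= min_top_z]
```

Points are assumed to be `(x, y, z)` tuples in metres. The close-to-wall case discussed above, where user and ghost points are density-reachable, is exactly where such a plain filter would fall short.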

Feature extraction

This module first judges whether the captured motion is an arm gesture by Hidden Markov Models (HMMs), and extracts its feature (CPDP) from PDP for further recognition.

Concentrated Position-Doppler Profile

cpdp

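The data collected by the radar form a point cloud, with each point represented by a vector p;
Three fast Fourier transforms (range, Doppler, angle) and noise removal are applied to the collected data to obtain the vector p above. The processing is as follows:
The point set PC of each frame (a short time span) contains n reflection points, and each reflection point has its own vector p
The point sets PC of all frames inside a fixed-length sliding window form the position-doppler profile (PDP) feature Γ; the window length here is 30 (but the earlier text never says how long one frame is?)
Compute the CPDP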
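
The sliding-window step in the list above can be sketched as a fixed-length buffer of per-frame point sets. The class name `PDPWindow` and its methods are hypothetical; only the window length of 30 frames comes from the notes, and the per-point layout is left open because the notes do not spell out the fields of p.

```python
from collections import deque

WINDOW_FRAMES = 30  # window length stated in the notes


class PDPWindow:
    """Keeps the point sets of the most recent frames; together they form one PDP."""

    def __init__(self, length=WINDOW_FRAMES):
        self.frames = deque(maxlen=length)  # the oldest frame is dropped automatically

    def push(self, frame_points):
        """Add one frame's point set; return True once the window is full."""
        self.frames.append(list(frame_points))
        return len(self.frames) == self.frames.maxlen

    def pdp(self):
        """Return the current window as a list of frames, oldest first."""
        return list(self.frames)
```

Each call to `push` slides the window forward by one frame, so a new PDP is available per frame once the first 30 frames have arrived.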

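My understanding, to be confirmed: the $\epsilon$ values of points that share the same range r and Doppler d are summed, and the result is a new 2-D matrix used as the input.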
CPDP can be regarded as the sum in the time dimension of the already “denoised” RD images (denoised by the CFAR method [36] before deriving PDPs [17]).
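
Under the reading above (sum $\epsilon$ over time for points that fall in the same range and Doppler cell), the accumulation could look like the sketch below. It is one interpretation of the notes, not the paper's exact definition. The point layout `(r, d, eps)`, the bin sizes, the grid dimensions, and the Doppler offset `d_min` are all assumptions made for illustration.

```python
def cpdp(pdp, r_bin=0.05, d_bin=0.1, n_r=64, n_d=32, d_min=-1.6):
    """Accumulate per-point eps values into a range-Doppler grid.

    pdp   : list of frames, each frame a list of (r, d, eps) tuples (assumed layout)
    r_bin : range cell size in metres (assumed)
    d_bin : Doppler cell size in m/s (assumed)
    d_min : Doppler value mapped to column 0 (assumed)
    """
    grid = [[0.0] * n_d for _ in range(n_r)]
    for frame in pdp:                       # summing over frames = over time
        for r, d, eps in frame:
            i = int(r // r_bin)             # range cell index
            j = int((d - d_min) // d_bin)   # Doppler cell index
            if 0 <= i < n_r and 0 <= j < n_d:
                grid[i][j] += eps           # points outside the grid are ignored
    return grid
```

The output has a fixed size regardless of how many reflection points each frame contains, which is what makes it convenient as a network input.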

This passage is well thought out.

This may also be why different people can be told apart: with angle information, one can tell which person a signal comes from.

Gesture recognition

This module utilizes a voting mechanism (VM) to recognize and respond to user’s gestures timely by accumulatively associating the recognition likelihoods from both mGesNet and HMMs, where the former consumes CPDP, and the latter consumes PDP. In particular, users may perform gestures anywhere so we set several anchor points with equal intervals, i.e., 15cm, to recognize their adjacent occurred gestures.
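
A minimal sketch of mapping a user position to the closest anchor, assuming the anchors lie along the range axis from the radar (the quoted passage only gives the 15 cm spacing). The function name, the position of the first anchor, and the number of anchors are hypothetical.

```python
ANCHOR_SPACING_M = 0.15  # 15 cm interval from the quoted passage


def nearest_anchor(distance_m, first_anchor_m=0.6, n_anchors=20,
                   spacing=ANCHOR_SPACING_M):
    """Return (index, position) of the anchor closest to `distance_m`.

    first_anchor_m and n_anchors are illustrative assumptions.
    """
    idx = round((distance_m - first_anchor_m) / spacing)
    idx = max(0, min(n_anchors - 1, idx))  # clamp to the available anchors
    return idx, first_anchor_m + idx * spacing
```

The returned index could then select which per-anchor model to run, for example `models[idx]` in a hypothetical list of trained networks.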

mGesNet

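Input: CPDP
Output: classification class score scr (scr is then fed into the voting mechanism for the final decision)
Structure:
[Figure: mGesNet architecture]
Note: a separate model is trained for each anchor position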

HMM-VM

Input: PDP
Pipeline:
[Figure: HMM-VM pipeline]

Decide whether an arm gesture is occurring
Frame f -> three observable symbols (i.e., spatial entropy, overall spatial density, and kinetic energy)
Utilize three HMMs to transfer the data stream with its result sequence into the ‘status’ possibility vector;
Compare Pr with the threshold Tr (set to 0.9); if Pr is larger, proceed to the next step
VM voting decision

In addition, we handle the issue of unequal gesture duration by collecting a lot of data to train the HMMs. From these samples, mHomeGes can learn to tolerate the duration change.

Would this still work for complex gestures whose durations differ greatly?
HMM

In the HMM, the state, namely the sort of a user’s gesture, is not directly visible, but the symbol depends on the state and is observable. Therefore, each state has a probability distribution across the possible symbols.

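I am not sure whether this assumption is reasonable.
The approach is to build three hidden Markov models as the representation of a gesture:
[Figure: HMM structure]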
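
For reference, the likelihood of an observed symbol sequence under a discrete HMM is usually computed with the forward algorithm. The sketch below is the textbook version, not the paper's implementation; the states, symbols, and probabilities used by mHomeGes are not reproduced in these notes.

```python
def forward_likelihood(obs, start, trans, emit):
    """Textbook forward algorithm for a discrete HMM.

    obs   : list of symbol indices
    start : start[s]    = P(first state is s)
    trans : trans[p][s] = P(next state is s | current state is p)
    emit  : emit[s][o]  = P(symbol o | state s)
    Returns P(obs | model).
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)
```

For long sequences the raw probabilities underflow, so practical implementations rescale `alpha` at each step or work in log space.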
VM
Take the dot product of the results of the two methods.
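
One way to read this, sketched below, is to multiply the mGesNet score and the HMM likelihood per gesture class, accumulate the products over successive windows, and pick the class with the largest total. This per-class reading is an assumption on my part; the function name and the input layout are hypothetical, and the paper's exact voting rule is not reproduced here.

```python
def vote(net_scores_per_window, hmm_scores_per_window):
    """Accumulate per-class products of the two score vectors over windows.

    Both arguments are lists of equal length; each element is a list with one
    score per gesture class (assumed layout).
    Returns (best_class_index, accumulated_scores).
    """
    n_classes = len(net_scores_per_window[0])
    total = [0.0] * n_classes
    for net, hmm in zip(net_scores_per_window, hmm_scores_per_window):
        for c in range(n_classes):
            total[c] += net[c] * hmm[c]
    best = max(range(n_classes), key=total.__getitem__)
    return best, total
```

Summing `total` over the classes for a single window gives the ordinary dot product of the two vectors, which is how this sketch connects to the note above.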