2021-09-22

This section describes the core components of our system: the data preprocessing module and the transformer-based predictor.

Data Preprocessing Module

The Fast Fourier Transform (FFT) is a standard tool for analyzing time-series signals. However, we want to capture the relationship between frequency and time in the motion signal, enriching the dimensionality of our data so that the model can learn more information. A signal obtained by the Fourier transform contains only frequency-domain content, with no corresponding time-domain information. Our solution is to replace the FFT with the Hilbert transform.
This, however, introduces a new problem: the Hilbert transform presupposes a stationary signal, but the motion signal obtained from the IMU sensor does not satisfy this condition, so the computed instantaneous frequency is likely to have no physical meaning. To obtain an ideal decomposition, we choose the Hilbert-Huang Transform (HHT), whose key idea is to use Ensemble Empirical Mode Decomposition (EEMD) to convert the non-stationary signal into stationary components before applying the Hilbert transform. We then concatenate the transformed data with the original signal into a 12-dimensional signal. Finally, we use a sliding window with a size of 3 seconds and a step of 1 second to split the signal into segments of the appropriate size for training and testing our predictor.
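The final windowing step can be sketched as follows. This is a minimal illustration, not our actual pipeline; the sampling rate `fs = 50` is a hypothetical value not specified in the text:

```python
import numpy as np

def sliding_windows(signal, fs, win_sec=3, step_sec=1):
    """Split a (T, C) multichannel signal into overlapping windows
    of win_sec seconds, advancing by step_sec seconds each time."""
    win, step = win_sec * fs, step_sec * fs
    n = (signal.shape[0] - win) // step + 1
    return np.stack([signal[i * step : i * step + win] for i in range(n)])

fs = 50  # hypothetical IMU sampling rate (Hz)
x = np.random.randn(10 * fs, 12)  # 10 s of the 12-dimensional signal
windows = sliding_windows(x, fs)
print(windows.shape)  # (8, 150, 12): 8 windows of 3 s each
```

With a 3-second window and a 1-second step, consecutive windows overlap by 2 seconds, so each sample contributes to up to three training segments.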

Transformer-Based Predictor

Although the RNN is the classic module for time-series problems, it is too simple for our 12-dimensional data. To learn more from complex signals and achieve high reconstruction precision, we introduce the transformer architecture. Nevertheless, two problems remain. First, when we walk, the mobile phone moves in three-dimensional space, so the data contains much richer spatial-temporal information; our model should learn useful information from these associations. Second, our motion dataset is smaller than conventional datasets, which tests the generalization ability of the model. Making the model friendly to new users (that is, making effective predictions while collecting little or no data from them) becomes a challenge.

Solution with Divided Space-Time Attention. To address these problems, we adopt “Divided Space-Time” attention in place of standard self-attention. A visualization of the module is given in Fig. X. The TimeSformer encoder consists of 3 encoder blocks. Within each block, we first compute temporal attention within each channel. The result of the temporal attention is then fed into the spatial attention computation instead of being passed directly to the MLP. According to [cite], this space-time factorization is not only more efficient but also leads to lower error.
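The factorization can be sketched in a few lines. This is a simplified illustration of the attention ordering only, assuming tokens of shape (time, channel, embedding); it omits the query/key/value projections, residual connections, and MLP of a real encoder block:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over the second-to-last axis
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def divided_space_time(x):
    """x: (T, C, D) tokens. Temporal attention within each channel,
    then spatial attention within each time step."""
    xt = np.transpose(x, (1, 0, 2))   # (C, T, D): per channel, attend over T
    xt = attention(xt, xt, xt)
    x = np.transpose(xt, (1, 0, 2))   # back to (T, C, D)
    return attention(x, x, x)         # per time step, attend over C

x = np.random.randn(5, 12, 8)  # 5 time steps, 12 channels, embedding dim 8
y = divided_space_time(x)
print(y.shape)  # (5, 12, 8)
```

The efficiency gain comes from attending over T and C separately (cost proportional to T² + C² per token) instead of over all T·C tokens jointly.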
Model generalization. We design our predictor by referring to the state-of-the-art transformer architecture [cite]. To avoid overfitting and reduce the number of parameters, we use a backbone network to extract features from the acceleration and gyroscope readings of each axis and their corresponding time-frequency signals. It is followed by the TimeSformer encoder and a convolutional feature extractor. Finally, the output layer is a linear regression.
To make our system perform well for different users, we adopt a meta-learning technique to train our model. Model-agnostic meta-learning (MAML) [cite] is one of the state-of-the-art meta-learning methods: a conceptually simple and general algorithm that has been shown to perform well on few-shot classification and regression problems. Given model parameters $\theta$, MAML adapts to a new task $\tau_t$ with SGD:

$$\theta_{t}^{\prime}=\theta-\alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{\text{train}(t)}}\left(f_{\theta}\right)$$

where $t$ is the task number and $\alpha$ is the learning rate. $\mathcal{T}_{\text{train}(t)}$ and $\mathcal{T}_{\text{test}(t)}$ denote the training and test sets within task $t$. The tasks are sampled from a defined distribution $p(\tau_t)$. The meta-objective is:
$$\min_{\theta} \sum_{\tau_{t} \sim p(\tau)} \mathcal{L}_{\mathcal{T}_{\text{test}(t)}}\left(f_{\theta_{t}^{\prime}}\right)$$

The model aims to optimize the parameters $\theta$ such that, with just one SGD step, it can adapt to the new task. For the optimization in Eq. 2, this looks as follows:

$$\theta_{t}=\theta-\beta \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{\text{test}(t)}}\left(f_{\theta_{t}^{\prime}}\right)$$

where $\beta$ is the meta step size. This yields an algorithm that learns an initialization of $\theta$ that can be adapted to new tasks efficiently in a small number of iterations.
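The two update rules above can be instantiated on a toy linear-regression task family. This is an illustrative sketch, not our training code: it uses the common first-order approximation of the meta-gradient (the exact version would differentiate through the inner update), and the task construction, step sizes, and dimensions are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(theta, X, y):
    # mean-squared-error loss and its gradient for f(x) = X @ theta
    err = X @ theta - y
    return float((err ** 2).mean()), 2 * X.T @ err / len(y)

def maml_step(theta, tasks, alpha=0.05, beta=0.05):
    """One meta-update over a batch of ((X_tr, y_tr), (X_te, y_te)) tasks."""
    meta_grad = np.zeros_like(theta)
    for (Xtr, ytr), (Xte, yte) in tasks:
        _, g = loss_and_grad(theta, Xtr, ytr)
        theta_t = theta - alpha * g                # inner SGD step (Eq. 1)
        _, g_meta = loss_and_grad(theta_t, Xte, yte)
        meta_grad += g_meta                        # first-order meta-gradient
    return theta - beta * meta_grad / len(tasks)   # meta-update (Eq. 3)

def make_task(slope, n=20, d=3):
    # each task is linear regression with a task-specific weight vector
    X = rng.standard_normal((n, d))
    w = np.full(d, slope)
    return (X[:10], X[:10] @ w), (X[10:], X[10:] @ w)

theta = np.zeros(3)
tasks = [make_task(s) for s in (0.5, 1.0, 1.5)]
for _ in range(100):
    theta = maml_step(theta, tasks)
```

After meta-training, `theta` is an initialization from which one inner step of SGD already fits each task's test split well, which is exactly the behavior Eq. 2 optimizes for.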

The problem with MAML is that the initial model can be trained biased towards some tasks, particularly those sampled in the meta-training phase. Such a biased initial model may not generalize well to an unseen task that deviates strongly from the meta-training tasks, especially when very few examples are available for the new task. We therefore introduce an extension of MAML in our solution, called Inequality-Minimization TAML. The algorithm directly minimizes the inequality of the losses of the initial model across a variety of tasks, forcing the meta-learner to learn an unbiased initial model without over-performing on particular tasks.
The idea is that the loss of the initial model on each task $\mathcal{T}_i$ is viewed as an income for that task. The TAML model then minimizes its loss inequality over multiple tasks to make the meta-learner task-agnostic. Formally, given a batch of sampled tasks $\{\mathcal{T}_i\}$ and their losses $\{\mathcal{L}_{\mathcal{T}_i}(f_\theta)\}$ under the initial model $f_\theta$, one can compute an inequality measure $\mathcal{I}_{\mathcal{E}}(\{\mathcal{L}_{\mathcal{T}_i}(f_\theta)\})$, as discussed below. The initial model parameter $\theta$ is then meta-learned by minimizing the following objective:

$$\mathbb{E}_{\mathcal{T}_{i} \sim p(\mathcal{T})}\left[\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta_{i}}\right)\right]+\lambda \mathcal{I}_{\mathcal{E}}\left(\left\{\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)\right\}\right)$$

The first term is the expected loss of the model $f_{\theta_i}$ after the update, while the second is the inequality of the losses of the initial model $f_\theta$ before the update. Both terms are functions of the initial model parameter $\theta$, since $\theta_i$ is updated from $\theta$. For the inequality measure $\mathcal{I}_{\mathcal{E}}$ we choose the Theil index, which is derived from redundancy in information theory. Suppose we have $M$ losses $\{\ell_i \mid i=1,\cdots,M\}$; then the Theil index is defined as

$$T_{T}=\frac{1}{M} \sum_{i=1}^{M} \frac{\ell_{i}}{\bar{\ell}} \ln \frac{\ell_{i}}{\bar{\ell}}$$
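The Theil index and the TAML objective above are straightforward to compute. A minimal sketch follows; the regularization weight `lam = 0.1` is a hypothetical setting, not a value from our experiments:

```python
import numpy as np

def theil_index(losses):
    """Theil index of a batch of per-task losses (assumed positive).
    Zero when all losses are equal; grows with inequality."""
    l = np.asarray(losses, dtype=float)
    r = l / l.mean()
    return float(np.mean(r * np.log(r)))

def taml_objective(adapted_losses, initial_losses, lam=0.1):
    # expected post-update loss plus lambda-weighted inequality
    # of the pre-update losses (Eq. above)
    return float(np.mean(adapted_losses)) + lam * theil_index(initial_losses)

print(theil_index([1.0, 1.0, 1.0]))       # 0.0: equal losses, no penalty
print(theil_index([0.1, 1.0, 10.0]) > 0)  # True: unequal losses are penalized
```

Because the Theil term vanishes only when every task's initial loss is equal, the regularizer pushes the meta-learned initialization away from favoring any particular subset of tasks.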

Implementation of our system. The training and deployment process of the model is shown in the Fig. First, we collect the volunteers' data to build the dataset for meta-learning. To improve the generalization of the final model to new users' data, we treat the data of each volunteer in each month as a separate task. Second, we use the TAML algorithm introduced above to train the transformer-based model, obtaining initialization parameters that suit every task. Finally, when faced with a new user, we only need a small amount of that user's data to update our model, adapting it to the new user and noticeably improving prediction accuracy.
