Test Condition
MCU: STM32G431RB @170MHz
IDE: IAR V9.40
Optimization: -o3
Benchmark
Execution time
| | Float32 | Q31 | Q15 |
|---|---|---|---|
| Duration without magnitude calculation (us) | 4912 | 1659 | 1001 |
| Duration with magnitude calculation (us) | 5948 | 1943 | 1243 |
Result representation
| | Float32 | Q31 | Q15 |
|---|---|---|---|
| Input DC | 64 | 1024 | 128 |
| Output DC | ~128 | ~1024 | ~128 |
| Input AC | 1024 | 1024 | 1024 |
| Output AC | ~1024 | ~512 | ~512 |
Key points
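- The output buffer of the fixed-point RFFT must be twice the length of the input buffer (the floating-point version has no such requirement). This is not stated anywhere in the source code; the only mention is a single line on the official web page: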
Official page
If the input buffer is of length N (fftLenReal), the output buffer must have length 2N since it is containing the conjugate part (except for MVE version where N+2 is enough). The input buffer is modified by this function.
For the RIFFT, the source buffer must have length N+2 since the Nyquist frequency value is needed but conjugate part is ignored. It is not using the packing trick of the float version.
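The sizing rule quoted above can be captured at declaration time. The following is only a minimal sketch: the macro names and `FFT_LEN` are hypothetical, `int16_t` stands in for the CMSIS-DSP `q15_t` type so that the snippet compiles without the library, and the MVE case (N+2) is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define FFT_LEN            2048u          /* fftLenReal (N)                      */
#define RFFT_Q_OUT_LEN(n)  (2u * (n))     /* fixed-point RFFT output: 2N values  */
#define RIFFT_Q_IN_LEN(n)  ((n) + 2u)     /* fixed-point RIFFT input: N+2 values */

/* int16_t is used here as a stand-in for q15_t. */
static int16_t rfft_in[FFT_LEN];                  /* modified by the forward transform    */
static int16_t rfft_out[RFFT_Q_OUT_LEN(FFT_LEN)]; /* room for the conjugate part as well  */
static int16_t rifft_in[RIFFT_Q_IN_LEN(FFT_LEN)]; /* includes the Nyquist frequency value */
```

Sizing the output array from a macro rather than a literal keeps the 2N requirement visible next to the buffer, which helps because the requirement is easy to miss in the library sources.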
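- On data scaling: if the input data is in Q15 format, the FFT result is no longer Q15; it has been scaled up (see the table on the Official page). For example, with a 2048-point Q15 input, the output becomes Q4. Personally, I find this scaling convention of little use. It would be better to state the relationship between the result and the original data directly, because the factor is introduced purely by the computation, has no physical meaning, and changes with the number of points. It may even be that the input and output amplitudes have a fixed relationship regardless of the number of points, in which case the scaling convention is even less meaningful. According to the measured table above, for the fixed-point formats the output DC value is close to the input DC value, while the output AC amplitude is about half of the input amplitude. For the floating-point format, the output DC value is about twice the input, while the output AC amplitude is close to the input AC amplitude.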
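- The official magnitude function is a major pitfall. It computes the sum of the squares of the real and imaginary parts, normalizes that sum back to Q14 (a right shift by 17 bits), and only then takes the square root. As a result, any input whose sum of squares occupies fewer than 17 bits is shifted to 0. For the RFFT the imaginary part is close to zero, so data narrower than about 9 bits is essentially all shifted to 0. Even for larger values, the result after the square root becomes very small. From a preliminary comparison, the computation appears to be: right-shift the sum of squares by 17 bits, take the square root, then left-shift by 6 bits; in other words, right-shift the sum of squares by 5 bits and then take the square root. A puzzling choice.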
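The truncation described in the last point is easy to reproduce with plain integer arithmetic. The sketch below is not the library code: `shifted_power_q15` is a hypothetical helper that models only the sum of squares followed by the 17-bit right shift, and `mag_full_precision` is one possible alternative (an assumption, not the method used in the test code) that takes an integer square root of the unshifted sum so that small bins survive.

```c
#include <stdint.h>

/* Hypothetical model of the step described above:
 * sum of squares, then an arithmetic right shift by 17 bits. */
static int32_t shifted_power_q15(int16_t re, int16_t im)
{
    int64_t acc = (int64_t)re * re + (int64_t)im * im;
    return (int32_t)(acc >> 17);
}

/* Bitwise integer square root, returns floor(sqrt(x)). */
static uint32_t isqrt_u32(uint32_t x)
{
    uint32_t r = 0u;
    uint32_t bit = 1u << 30;          /* highest power of four in 32 bits */

    while (bit > x) {
        bit >>= 2;
    }
    while (bit != 0u) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* Assumed alternative: magnitude in the same integer units as the input,
 * with no pre-shift, so small values are not flushed to zero. */
static uint32_t mag_full_precision(int16_t re, int16_t im)
{
    /* Each square is at most 2^30, so the sum fits in 32 unsigned bits. */
    uint32_t sum = (uint32_t)((int32_t)re * re) + (uint32_t)((int32_t)im * im);
    return isqrt_u32(sum);
}
```

With a zero imaginary part, `shifted_power_q15` returns 0 for every real part whose absolute value is at most 362, because 362 squared is 131044, which is below 2^17 = 131072. This matches the "about 9 bits" threshold noted above, whereas `mag_full_precision(362, 0)` still returns 362.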
Test Code
/* USER CODE BEGIN Header */
/**
******************************************************************************
* @file : main.c
* @brief : Main program body
******************************************************************************
* @attention
*
* Copyright (c) 2024 STMicroelectronics.
* All rights reserved.
*
* This software is licensed under terms that can be found in the LICENSE file
* in the root directory of this software component.
* If no LICENSE file comes with this software, it is provided AS-IS.
*
******************************************************************************
*/
/* USER CODE END Header */
/* Includes ------------------------------------------------------------------*/
#include "main.h"
#define GENERATE_INPUT_ONLINE
// #define Q31_TEST
// #define USER_MAG_CALC
/* Private includes ----------------------------------------------------------*/
/* USER CODE BEGIN Includes */
#include <stdbool.h>
#include "arm_math.h"
#ifndef GENERATE_INPUT_ONLINE
#include "iInputData.h"
#endif
#include "arm_const_structs.h"
/* USER CODE END Includes */
/* Private typedef -----------------------------------------------------------*/
/* USER CODE BEGIN PTD */
/* USER CODE END PTD */
/* Private define ------------------------------------------------------------*/
/* USER CODE BEGIN PD */
/* USER CODE END PD */
/* Private macro -------------------------------------------------------------*/
/* USER CODE BEGIN PM */
/* USER CODE END PM */
/* Private variables ---------------------------------------------------------*/
/* USER CODE BEGIN PV */
/* USER CODE END PV */
/* Private function prototypes -----------------------------------------------*/
void SystemClock_Config(void);
static void MX_CORDIC_Init(void);
/* USER CODE BEGIN PFP */
#define N 2048
#define SAMPLE_FREQUENCY 1000
#ifdef FLOAT_TEST
float32_t fInputData[N];
float32_t fOutputData[N+2];
float32_t fMag[N+1];
#elif defined(Q31_TEST)
// arm_rfft_instance_q15 rfftInstance;
#ifdef GENERATE_INPUT_ONLINE
#pragma data_alignment=16
static q31_t iInputData[N];
#endif
#pragma data_alignment=16
q31_t iOutputData[N*2];
q31_t iFFT_Mag[(N>>1)+1];