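I have successfully implemented a function that copies an arbitrary number of values, starting at an arbitrary point in a ring buffer, into a contiguous array, but I would like to make it more efficient. Here is a minimal example of my code: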
```cpp
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <thread>
using namespace std;

/*Foo: a function*/
void Foo(int * print_array, int print_amount){
    /*Simulate overhead*/
    this_thread::sleep_for(chrono::microseconds(1000));
    int sum = 0;
    for (int i = 0; i < print_amount; i++){
        sum += print_array[i]; //Linear operation
        // cout << print_array[i] << " "; //Uncomment to check for correct functionality
    }
    (void) sum; //Silence the unused-variable warning
}

/*Example function*/
int main(){
    /*Initialize ring buffer*/
    int ring_buffer_elements = 32; //A largeish size
    size_t ring_buffer_size = ring_buffer_elements * sizeof(int);
    int * ring_buffer = (int *) malloc(ring_buffer_size);
    for (int i = 0; i < ring_buffer_elements; i++)
        ring_buffer[i] = i; //Fill buffer with ordered numbers
    /*Initialize array*/
    int array_elements = 16; //A smaller largeish size
    size_t array_size = array_elements * sizeof(int);
    int * array = (int *) malloc(array_size);
    /*Set reference pointers*/
    int * start_pointer = ring_buffer;
    int * end_pointer = ring_buffer + ring_buffer_elements; //One past the last element
    /*Set moving copy pointer*/
    int * copy_pointer = start_pointer;
    /*Set "random" amount to be copied at each iteration (at most array_elements)*/
    int copy_amount = 11;
    /*Set loop amount to check functionality or run time*/
    int loop_amount = 1000; //Set lower if checking functionality

    /***WORKING METHOD***/
    /*Start timer*/
    auto start_time = chrono::high_resolution_clock::now();
    /*"Continuous" loop*/
    for (int i = 0; i < loop_amount; i++){
        /*Copy loop*/
        for (int j = 0; j < copy_amount; j++){
            array[j] = *copy_pointer; //Copy value from ring buffer
            copy_pointer++; //Move pointer
            if (copy_pointer >= end_pointer)
                copy_pointer = start_pointer; //Reset pointer if reached end of ring buffer
        }
        Foo(array, copy_amount); //Call a function
    }
    /*Check run time (in microseconds)*/
    chrono::duration<double, micro> run_time_duration = chrono::high_resolution_clock::now() - start_time;
    double run_time = run_time_duration.count();
    /*Print result*/
    cout << endl << run_time << endl;

    /***NAIVE METHOD***/
    /*Reset moving pointer*/
    copy_pointer = start_pointer;
    /*Start timer*/
    start_time = chrono::high_resolution_clock::now();
    /*"Continuous" loop*/
    for (int i = 0; i < loop_amount; i++){
        /*Compute how many elements must be copied after reaching end of ring buffer*/
        int copy_remainder = copy_amount - (int)(end_pointer - copy_pointer); //Measured from the end, so no pointer past end_pointer is formed
        /*Check if we need to loop back or not*/
        if (copy_remainder <= 0){
            Foo(copy_pointer, copy_amount); //Call function
            copy_pointer += copy_amount; //Move pointer
            if (copy_pointer == end_pointer)
                copy_pointer = start_pointer; //Block ended exactly at the end: wrap now to avoid an empty Foo call next time
        } else {
            Foo(copy_pointer, copy_amount - copy_remainder); //Call function with part of values from copy pointer
            Foo(start_pointer, copy_remainder); //Call function with remainder of values from start of ring buffer
            copy_pointer = start_pointer + copy_remainder; //Move pointer
        }
    }
    /*Check run time*/
    run_time_duration = chrono::high_resolution_clock::now() - start_time;
    run_time = run_time_duration.count();
    /*Print result*/
    cout << endl << run_time << endl;

    /***memcpy METHOD***/
    /*Reset moving pointer*/
    copy_pointer = start_pointer;
    /*Start timer*/
    start_time = chrono::high_resolution_clock::now();
    /*"Continuous" loop*/
    for (int i = 0; i < loop_amount; i++){
        /*Compute how many elements must be copied after reaching end of ring buffer*/
        int copy_remainder = copy_amount - (int)(end_pointer - copy_pointer); //Measured from the end, so no pointer past end_pointer is formed
        /*Check if we need to loop back or not*/
        if (copy_remainder <= 0){
            memcpy(array, copy_pointer, copy_amount * sizeof(int)); //Use memcpy
            copy_pointer += copy_amount; //Move pointer
            if (copy_pointer == end_pointer)
                copy_pointer = start_pointer; //Wrap if the block ended exactly at the end
        } else {
            memcpy(array, copy_pointer, (copy_amount - copy_remainder) * sizeof(int)); //Use memcpy with part of values from copy pointer
            memcpy(array + (copy_amount - copy_remainder), start_pointer, copy_remainder * sizeof(int)); //Use memcpy with remainder of values from start of ring buffer
            copy_pointer = start_pointer + copy_remainder; //Move pointer
        }
        /*Call a function*/
        Foo(array, copy_amount);
    }
    /*Check run time*/
    run_time_duration = chrono::high_resolution_clock::now() - start_time;
    run_time = run_time_duration.count();
    /*Print result*/
    cout << endl << run_time << endl;

    /*Release buffers*/
    free(array);
    free(ring_buffer);
    return 0;
}
```
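The ring buffer holds a continuously updated stream of audio data, so the latency it introduces must be kept to a minimum; that is why I want to improve it.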
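I figured that copying the values in the WORKING METHOD was redundant and that I should be able to pass the raw ring buffer data directly. My naive way of doing this is to call Foo on the raw data and, whenever the data wraps around, call it a second time on the part at the start of the ring buffer (see the NAIVE METHOD).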
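In this minimal example, that improvement was indeed orders of magnitude faster. In my actual application, however, Foo is replaced by a function that writes to a hardware buffer and has considerable overhead, so the end result is slower than the WORKING METHOD. This means I should never call it (or Foo, in this example) more than once per write of audio data. (EDIT: Added simulated overhead to Foo to depict this problem accurately.)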
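So my question is: is there a faster way to copy data from a ring buffer into a single contiguous array?
(Also, the ring buffer never needs to wrap around more than once per write: copy_amount is always less than ring_buffer_elements.)
Thanks!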
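EDIT: Replaced the original code snippet with a minimal example, as suggested by Passer By.
EDIT 2: Added simulated overhead and memcpy, as suggested by duong_dajgja. In the example, the memcpy method and the working method have essentially the same performance (with a slight edge for the latter). In my application, with the smallest possible buffer, memcpy is about 3 to 4% faster than the working method. So it is faster, but unfortunately not by nearly enough.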