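Round-robin is one of the most common and most basic arbitration policies. It is used in mux structures with multiple inputs and one output. The basic idea is to keep a last gnt id that records the result of the previous arbitration. The next arbitration starts checking at last gnt id + 1. If that input is requesting arbitration, it wins. If it is not, the arbiter moves on to the next input, and so on.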
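Round-robin is a very fair arbitration policy: when all inputs keep requesting, each input gets the same share of grants to the output. Sometimes, however, certain requests are urgent and have a higher priority, and we want them to win arbitration sooner. This calls for a round-robin policy based on QoS (Quality of Service).

In other cases requests are long. After winning arbitration, a request stalls the arbitration pipeline for several cycles so that its full contents can be transferred to the downstream module. If request lengths vary and the arbiter still shares grants equally by request count, the result is unfair in terms of bandwidth: an input with long requests takes more bandwidth than an input with short ones. Weight cnt based round-robin addresses this.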
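QoS-based round-robin has two common implementations: round-robin based one level and round-robin per priority level. The difference is in how the last granted id is recorded:

- The former uses a single counter (cnt) for the last granted id.
- The latter keeps one such counter for each priority level.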
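For the former, a common approach is to merge two quantities into a single total priority value: the request's QoS value, and the distance from the input port id to last_grant_id. The input with the largest total priority value is granted.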
$$\text{total priority value} = \text{request priority value} \times 2^{n} + \big(\text{input\_port\_number} - (\text{port\_index} - \text{last\_grant\_id})\big) \bmod \text{input\_port\_number}$$
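Here $n \ge \lceil \log_2(\text{input\_port\_number}) \rceil$. For example, with input port number = 4, $n \ge 2$. The multiplication by $2^n$ is a left shift of the request priority value by n bits.

The round-robin term always lies in [0, input_port_number - 1], so it fits in the low n bits. As a result, a higher request priority always wins, and the round-robin term only breaks ties between requests of equal priority. Note that there is only one last grant id.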
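Round-robin per priority level needs an array of N last grant ids, last_grant_id[N], when the arbiter supports N priority levels. Each entry belongs to one priority level. The scheme can be viewed simply as N QoS-less round-robin arbiters working independently in parallel. In each cycle the highest priority level that has a request performs the grant, and only that level's last_grant_id is updated. The figure below illustrates a case with 4 input ports and N = 4 priority levels.

Figure: round-robin per priority level, 4 input ports, priority level N = 4.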
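The two QoS-based round-robin schemes look similar, but they differ in two ways:
- The former is simpler to implement in hardware. The latter amounts to N arbiters running in parallel and consumes more resources.
- The latter (round-robin per priority level) is fairer to the individual input ports. The former can even starve some input ports in certain scenarios.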
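The analysis below again uses a 4-to-1 arbiter. Assume all 4 inputs request in every cycle. In cycles 1-5 and 7-9 every request has priority 0, while in cycle 6 the request of input 3 has priority 1, higher than the others.

- Round-robin based one level yields the gnt id sequence 0 1 2 3 0 3 0 1 2.
- Round-robin per priority level yields 0 1 2 3 0 3 1 2 3.

The difference comes from cycle 6. In the one-level scheme, the urgent grant moves the single last_gnt_id to 3, so input 0 is granted again in cycle 7 even though it was already granted in cycle 5. In the per-level scheme only last_gnt_id[1] moves, and level 0 resumes from input 1.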
Round-robin based one level (the input columns show each request's priority value):

| Cycle | input 0 | input 1 | input 2 | input 3 | last_gnt_id before | last_gnt_id after |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 3 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 1 | 2 |
| 4 | 0 | 0 | 0 | 0 | 2 | 3 |
| 5 | 0 | 0 | 0 | 0 | 3 | 0 |
| 6 | 0 | 0 | 0 | 1 | 0 | 3 |
| 7 | 0 | 0 | 0 | 0 | 3 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 1 |
| 9 | 0 | 0 | 0 | 0 | 1 | 2 |
Round-robin per priority level (the input columns show each request's priority value; "-" means that level did not arbitrate in the cycle and its pointer is unchanged):

| Cycle | input 0 | input 1 | input 2 | input 3 | last_gnt_id[0] before | last_gnt_id[0] after | last_gnt_id[1] before | last_gnt_id[1] after |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 3 | 0 | - | - |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | - | - |
| 3 | 0 | 0 | 0 | 0 | 1 | 2 | - | - |
| 4 | 0 | 0 | 0 | 0 | 2 | 3 | - | - |
| 5 | 0 | 0 | 0 | 0 | 3 | 0 | - | - |
| 6 | 0 | 0 | 0 | 1 | - | - | 3 | 3 |
| 7 | 0 | 0 | 0 | 0 | 0 | 1 | - | - |
| 8 | 0 | 0 | 0 | 0 | 1 | 2 | - | - |
| 9 | 0 | 0 | 0 | 0 | 2 | 3 | - | - |
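The following scenario shows how round-robin based one level can starve input ports. As in the tables below, all 4 inputs again request in every cycle. The priority of input 1's request alternates between 1 and 0: 1 in odd cycles, 0 in even cycles.

The resulting gnt id sequence is 1 2 1 2 1 2 …. Whenever input 1 has priority 1 it wins and sets last_gnt_id back to 1, so the following priority-0 cycle always goes to input 2. Inputs 0 and 3 are therefore never granted and starve.

With round-robin per priority level, the same priority sequence yields the gnt id sequence 1 0 1 1 1 2 1 3.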
Round-robin based one level (the input columns show each request's priority value):

| Cycle | input 0 | input 1 | input 2 | input 3 | last_gnt_id before | last_gnt_id after |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 0 | 3 | 1 |
| 2 | 0 | 0 | 0 | 0 | 1 | 2 |
| 3 | 0 | 1 | 0 | 0 | 2 | 1 |
| 4 | 0 | 0 | 0 | 0 | 1 | 2 |
| 5 | 0 | 1 | 0 | 0 | 2 | 1 |
| 6 | 0 | 0 | 0 | 0 | 1 | 2 |
| 7 | 0 | 1 | 0 | 0 | 2 | 1 |
| 8 | 0 | 0 | 0 | 0 | 1 | 2 |
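Round-robin per priority level (the input columns show each request's priority value; "-" means that level did not arbitrate in the cycle and its pointer is unchanged):

| Cycle | input 0 | input 1 | input 2 | input 3 | last_gnt_id[0] before | last_gnt_id[0] after | last_gnt_id[1] before | last_gnt_id[1] after |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 0 | - | - | 3 | 1 |
| 2 | 0 | 0 | 0 | 0 | 3 | 0 | - | - |
| 3 | 0 | 1 | 0 | 0 | - | - | 1 | 1 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 | - | - |
| 5 | 0 | 1 | 0 | 0 | - | - | 1 | 1 |
| 6 | 0 | 0 | 0 | 0 | 1 | 2 | - | - |
| 7 | 0 | 1 | 0 | 0 | - | - | 1 | 1 |
| 8 | 0 | 0 | 0 | 0 | 2 | 3 | - | - |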
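Below is the SystemC model. It runs three arbiters, each on its own group of input queues:

- RRArbitration (plain round-robin) starts at 0 ns.
- QoSRRArbitration (round-robin based one level) starts at 10 ns.
- QoSRRPrioLevelArbitration (round-robin per priority level) starts at 20 ns.

Take RRArbitration as an example. Before the while(1) loop, the input request signals are gathered into an array of pointers so that each input can be accessed by index. An input is requesting when its deque is non-empty, which is checked with size(). Each loop iteration then does the following:

1. Check whether at least one input is requesting. If none is, wait; the code uses wait(m_push_evt).
2. Check whether the downstream module applies backpressure. The code only leaves a comment for this step.
3. Perform the round-robin arbitration to get the gnt id, and deassert the granted input's request; the code calls pop_front() on its deque.

Arbitration normally takes one cycle, hence the wait of 1 T (m_period). If a long request must transfer several beats and stall the arbitration pipeline, the number of cycles to wait can be set as needed.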
```cpp
#include <cassert>
#include <cmath>
#include <deque>
#include <iomanip>
#include <iostream>
#include <vector>

#include "systemc.h"

using namespace std;
using namespace sc_core;

const int kInputPortNum = 4;

class TestPlatform : public sc_module {
 public:
  SC_HAS_PROCESS(TestPlatform);
  TestPlatform(const sc_module_name &name)
      : sc_module(name), m_period(sc_time(1000, SC_PS)) {
    SC_THREAD(PushPeq);
    SC_THREAD(RRArbitration);
    SC_THREAD(QoSRRArbitration);
    SC_THREAD(QoSRRPrioLevelArbitration);
  }
  ~TestPlatform() = default;

 private:
  void PushPeq();
  void RRArbitration();              // normal round-robin
  void QoSRRArbitration();           // round-robin based one level
  void QoSRRPrioLevelArbitration();  // round-robin per priority level

 public:
  sc_time m_period;  // one arbitration cycle (1 ns)
  // Three groups of kInputPortNum request queues, one group per arbiter:
  // m_input[0..3] normal RR, [4..7] one level, [8..11] per priority level.
  // Each element is the QoS value of one pending request.
  std::deque<int> m_input[kInputPortNum * 3];
  sc_event m_push_evt;
};

void TestPlatform::PushPeq() {
  // Every input of every group gets one request; QoS values are 2, 1, 3, 0.
  const int kQos[kInputPortNum] = {2, 1, 3, 0};
  for (int i = 0; i < kInputPortNum * 3; ++i) {
    m_input[i].push_back(kQos[i % kInputPortNum]);
  }
  // Delta notification: an arbiter that ran before this thread at time 0 and
  // is already blocked in wait(m_push_evt) still wakes up, so the result does
  // not depend on the (implementation-defined) process start order.
  m_push_evt.notify(SC_ZERO_TIME);
}

void TestPlatform::RRArbitration() {
  const int t_arb_num = kInputPortNum;
  int t_last_grant_id = t_arb_num - 1;  // so input 0 is checked first
  int t_gnt_id = 0xFFFF;
  // unified input signal name: t_input[i] is the request queue of input i
  std::deque<int> *t_input[kInputPortNum];
  for (int i = 0; i < t_arb_num; ++i) {
    t_input[i] = &m_input[i];
  }
  while (1) {
    // check whether any input is requesting
    bool t_have_input = false;
    for (int i = 0; i < t_arb_num; ++i) {
      if (t_input[i]->size()) {
        t_have_input = true;
        break;
      }
    }
    if (!t_have_input) {
      wait(m_push_evt);
      continue;
    }
    // check whether there is backpressure downstream (not modeled here)
    // arb: search from last grant id + 1; on a hit the pointer stays on the
    // winner, on a full miss it wraps back to where it started
    t_gnt_id = 0xFFFF;  // invalid value, checked by the assert below
    for (int i = 0; i < t_arb_num; ++i) {
      t_last_grant_id = (t_last_grant_id + 1) % t_arb_num;
      if (t_input[t_last_grant_id]->size()) {
        t_gnt_id = t_last_grant_id;
        break;
      }
    }
    assert(t_gnt_id != 0xFFFF);
    cout << "[" << sc_time_stamp()
         << "] Normal RR arb finish, gnt input id = " << t_gnt_id << endl;
    t_input[t_gnt_id]->pop_front();  // deassert the granted request
    // If the granted request needs several beats to send its data,
    // the number of cycles to wait can be chosen per grant id here.
    wait(1 * m_period);
  }
}

void TestPlatform::QoSRRArbitration() {
  wait(10, SC_NS);
  const int t_arb_num = kInputPortNum;
  // Shift n of the QoS field: any n >= ceil(log2(t_arb_num)) works, and
  // floor(log2(N)) + 1 satisfies this (it gives 3 for N = 4).
  const int t_qos_low_bit = static_cast<int>(std::log2(t_arb_num)) + 1;
  int t_last_grant_id = t_arb_num - 1;
  int t_gnt_id = 0xFFFF;
  int t_max_priority = -1;
  // unified input signal name
  std::deque<int> *t_input[kInputPortNum];
  for (int i = 0; i < t_arb_num; ++i) {
    t_input[i] = &m_input[i + kInputPortNum];
  }
  while (1) {
    // check whether any input is requesting
    bool t_have_input = false;
    for (int i = 0; i < t_arb_num; ++i) {
      if (t_input[i]->size()) {
        t_have_input = true;
        break;
      }
    }
    if (!t_have_input) {
      wait(m_push_evt);
      continue;
    }
    // check whether there is backpressure downstream (not modeled here)
    // arb
    cout << "[" << sc_time_stamp()
         << "] input port id Qos Value input port prio input "
            "total prio: (Qos value << "
         << t_qos_low_bit << ") + port prio" << endl;
    t_gnt_id = 0xFFFF;     // invalid value, checked by the assert below
    t_max_priority = -1;   // clear
    for (int i = 0; i < t_arb_num; ++i) {
      if (t_input[i]->size()) {
        // round-robin term: the port right after the last grant scores highest
        int t_input_port_priority =
            (t_arb_num - (i - t_last_grant_id)) % t_arb_num;
        int t_qos_id = t_input[i]->front();
        // + binds tighter than <<, so the parentheses are required
        int t_total_priority =
            (t_qos_id << t_qos_low_bit) + t_input_port_priority;
        if (t_total_priority > t_max_priority) {
          t_max_priority = t_total_priority;
          t_gnt_id = i;
        }
        cout << "[" << sc_time_stamp() << "] " << setw(8) << i << setw(20)
             << t_qos_id << setw(20) << t_input_port_priority << setw(20)
             << t_total_priority << endl;
      }
    }
    assert(t_gnt_id != 0xFFFF);
    cout << "[" << sc_time_stamp()
         << "] QoS based RR arb finish, gnt input id = " << t_gnt_id << endl;
    t_input[t_gnt_id]->pop_front();
    t_last_grant_id = t_gnt_id;  // one pointer shared by all QoS levels
    // If the granted request needs several beats to send its data,
    // the number of cycles to wait can be chosen per grant id here.
    wait(1 * m_period);
  }
}

void TestPlatform::QoSRRPrioLevelArbitration() {
  wait(20, SC_NS);
  const int t_arb_num = kInputPortNum;
  const int t_priority_level = 4;  // QoS values must lie in [0, 4)
  // one last grant id per priority level
  std::vector<int> t_last_grant_id(t_priority_level, t_arb_num - 1);
  int t_gnt_id = 0xFFFF;
  // unified input signal name
  std::deque<int> *t_input[kInputPortNum];
  for (int i = 0; i < t_arb_num; ++i) {
    t_input[i] = &m_input[i + 2 * kInputPortNum];
  }
  while (1) {
    // check whether any input is requesting
    bool t_have_input = false;
    for (int i = 0; i < t_arb_num; ++i) {
      if (t_input[i]->size()) {
        t_have_input = true;
        break;
      }
    }
    if (!t_have_input) {
      wait(m_push_evt);
      continue;
    }
    // check whether there is backpressure downstream (not modeled here)
    // arb: scan levels from high to low; within a level, plain round-robin on
    // that level's pointer. A level without a match advances its pointer
    // t_arb_num times, so the pointer ends where it started.
    t_gnt_id = 0xFFFF;  // invalid value, checked by the assert below
    for (int level = t_priority_level - 1; level >= 0; --level) {
      for (int i = 0; i < t_arb_num; ++i) {
        t_last_grant_id[level] = (t_last_grant_id[level] + 1) % t_arb_num;
        if (t_input[t_last_grant_id[level]]->size()) {
          int t_qos_id = t_input[t_last_grant_id[level]]->front();
          if (t_qos_id == level) {
            t_gnt_id = t_last_grant_id[level];
            break;
          }
        }
      }
      if (t_gnt_id != 0xFFFF) {
        break;
      }
    }
    assert(t_gnt_id != 0xFFFF);
    cout << "[" << sc_time_stamp()
         << "] RR per priority level arb finish, gnt input id = " << t_gnt_id
         << " req Qos value= " << t_input[t_gnt_id]->front() << endl;
    t_input[t_gnt_id]->pop_front();
    // If the granted request needs several beats to send its data,
    // the number of cycles to wait can be chosen per grant id here.
    wait(1 * m_period);
  }
}

int sc_main(int argc, char **argv) {
  TestPlatform platform("TestPlatform");
  sc_start(30, SC_NS);  // RR at 0 ns, one level at 10 ns, per level at 20 ns
  return 0;
}
```
```text
[0 s] Normal RR arb finish, gnt input id = 0
[1 ns] Normal RR arb finish, gnt input id = 1
[2 ns] Normal RR arb finish, gnt input id = 2
[3 ns] Normal RR arb finish, gnt input id = 3
[10 ns] input port id Qos Value input port prio input total prio: (Qos value << 3) + port prio
[10 ns] 0 2 3 19
[10 ns] 1 1 2 10
[10 ns] 2 3 1 25
[10 ns] 3 0 0 0
[10 ns] QoS based RR arb finish, gnt input id = 2
[11 ns] input port id Qos Value input port prio input total prio: (Qos value << 3) + port prio
[11 ns] 0 2 2 18
[11 ns] 1 1 1 9
[11 ns] 3 0 3 3
[11 ns] QoS based RR arb finish, gnt input id = 0
[12 ns] input port id Qos Value input port prio input total prio: (Qos value << 3) + port prio
[12 ns] 1 1 3 11
[12 ns] 3 0 1 1
[12 ns] QoS based RR arb finish, gnt input id = 1
[13 ns] input port id Qos Value input port prio input total prio: (Qos value << 3) + port prio
[13 ns] 3 0 2 2
[13 ns] QoS based RR arb finish, gnt input id = 3
[20 ns] RR per priority level arb finish, gnt input id = 2 req Qos value= 3
[21 ns] RR per priority level arb finish, gnt input id = 0 req Qos value= 2
[22 ns] RR per priority level arb finish, gnt input id = 1 req Qos value= 1
[23 ns] RR per priority level arb finish, gnt input id = 3 req Qos value= 0
```