HDLBits study: CS450/gshare


启航起航


Article link: https://blog.csdn.net/weixin_66492206/article/details/141433962


Branch direction predictor

A branch direction predictor generates taken/not-taken predictions of the direction of conditional branch instructions. It sits near the front of the processor pipeline, and is responsible for directing instruction fetch down the (hopefully) correct program execution path. A branch direction predictor is usually used with a branch target buffer (BTB), where the BTB predicts the target addresses and the direction predictor chooses whether to branch to the target or keep fetching along the fall-through path.

Some time later in the pipeline (typically at branch execution or retire), the results of executed branch instructions are sent back to the branch predictor. This trains it to predict more accurately in the future by observing past branch behaviour. A mispredicted branch can also cause a pipeline flush.

Figure: Branch direction predictor located in the Fetch stage. The branch predictor makes a prediction using the current pc and history register, and the result of the prediction affects the next pc value. Training and misprediction requests come from later in the pipeline.

For this exercise, the branch direction predictor is assumed to sit in the fetch stage of a hypothetical processor pipeline shown in the diagram on the right. This exercise builds only the branch direction predictor, indicated by the blue dashed rectangle in the diagram.

The branch direction prediction is a combinational path: The pc register is used to compute the taken/not-taken prediction, which affects the next-pc multiplexer to determine the value of pc in the next cycle.

Conversely, updates to the pattern history table (PHT) and branch history register take effect at the next positive clock edge, as would be expected for state stored in flip-flops.
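The split between the combinational prediction path and the clocked state can be illustrated with a hypothetical next-pc multiplexer that sits outside the predictor. This sketch is not part of the exercise. The module name `next_pc_mux`, the BTB signals `btb_hit`/`btb_target`, the 32-bit pc and the fixed 4-byte fall-through increment are all assumptions. The prediction selects `next_pc` within the same cycle, while `pc` itself only changes at the clock edge.

```verilog
// Hypothetical next-pc logic (NOT part of the HDLBits exercise).
// Assumes a BTB supplying btb_hit/btb_target and fixed 4-byte instructions.
module next_pc_mux (
    input             clk,
    input             areset,
    input             btb_hit,       // assumed: BTB has an entry for pc
    input      [31:0] btb_target,    // assumed: predicted target address
    input             predict_taken, // direction prediction (combinational)
    output reg [31:0] pc
);
    // Combinational: the direction prediction picks the next pc in the same cycle.
    wire [31:0] next_pc = (btb_hit && predict_taken) ? btb_target : pc + 32'd4;

    // Sequential: pc only changes at the positive clock edge.
    always @(posedge clk or posedge areset) begin
        if (areset) pc <= 32'd0;
        else        pc <= next_pc;
    end
endmodule
```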

Gshare predictor

Branch direction predictors are often structured as tables of counters indexed by the program counter and branch history. The table index is a hash of the branch address and history, and tries to give each branch and history combination its own table entry (or at least, reduce the number of collisions). Each table entry contains a two-bit saturating counter to remember the branch direction when the same branch and history pattern executed in the past.

One example of this style of predictor is the gshare predictor[1]. In the gshare algorithm, the branch address (pc) and history bits "share" the table index bits. The basic gshare algorithm computes an N-bit PHT table index by xoring N branch address bits and N global branch history bits together.

The N-bit index is then used to access one entry of a 2^N-entry table of two-bit saturating counters. The value of this counter provides the prediction (0 or 1 = not taken, 2 or 3 = taken).
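The index computation and counter read-out can be sketched as a small parameterized module. The module name `gshare_lookup` and its port names are hypothetical. The table is left uninitialized here because reset and training are handled separately.

```verilog
// Sketch of the gshare lookup path (hypothetical module name).
// index = pc ^ history; prediction = MSB of the 2-bit counter.
module gshare_lookup #(parameter N = 7) (
    input  [N-1:0] pc,        // low N bits of the branch address
    input  [N-1:0] history,   // N-bit global branch history
    output [N-1:0] index,
    output [1:0]   counter,
    output         taken
);
    reg [1:0] pht [0:(1<<N)-1];     // 2^N two-bit saturating counters

    assign index   = pc ^ history;  // address and history "share" the index bits
    assign counter = pht[index];
    assign taken   = counter[1];    // 2 or 3 -> taken, 0 or 1 -> not taken
endmodule
```

With N = 7, pc = 7'b1010101 and history = 7'b1111000 give index 7'b0101101. Every pair where pc equals history maps to entry 0, which is one example of how different branch/history combinations can collide.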

Training indexes the table in a similar way. The training pc and history are used to compute the table index. Then, the two-bit counter at that index is incremented or decremented depending on the actual outcome of the branch.
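For a single entry, the training update is just the next-state logic of a two-bit saturating counter. The following minimal combinational sketch uses the hypothetical module name `counter_2bc_next`:

```verilog
// Next-state logic of one 2-bit saturating counter (cf. cs450/counter_2bc).
// Taken -> increment (saturate at 3); not taken -> decrement (saturate at 0).
module counter_2bc_next (
    input  [1:0] state,
    input        taken,   // actual branch outcome from training
    output [1:0] next
);
    assign next = taken ? ((state == 2'b11) ? 2'b11 : state + 2'b01)
                        : ((state == 2'b00) ? 2'b00 : state - 2'b01);
endmodule
```

In the predictor, `state` corresponds to `PHT[train_pc ^ train_history]`, and `next` would be written back at the clock edge when `train_valid` is high.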

References

[1] S. McFarling, "Combining Branch Predictors", WRL Technical Note TN-36, Jun. 1993.

Description

Build a gshare branch predictor with 7-bit pc and 7-bit global history, hashed (using xor) into a 7-bit index. This index accesses a 128-entry table of two-bit saturating counters (similar to cs450/counter_2bc). The branch predictor should contain a 7-bit global branch history register (similar to cs450/history_shift).

The branch predictor has two sets of interfaces: One for doing predictions and one for doing training. The prediction interface is used in the processor's Fetch stage to ask the branch predictor for branch direction predictions for the instructions being fetched. Once these branches proceed down the pipeline and are executed, the true outcomes of the branches become known. The branch predictor is then trained using the actual branch direction outcomes.

When a branch prediction is requested (predict_valid = 1) for a given pc, the branch predictor produces the predicted branch direction and state of the branch history register used to make the prediction. The branch history register is then updated (at the next positive clock edge) for the predicted branch.

When training for a branch is requested (train_valid = 1), the branch predictor is told the pc and branch history register value for the branch that is being trained, as well as the actual branch outcome and whether the branch was a misprediction (needing a pipeline flush). Update the pattern history table (PHT) to train the branch predictor to predict this branch more accurately next time. In addition, if the branch being trained is mispredicted, also recover the branch history register to the state immediately after the mispredicting branch completes execution.
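The history register behaviour from the two paragraphs above can be sketched as follows. The module name `ghr7` is hypothetical, and the port names follow the exercise interface. A prediction speculatively shifts `predict_taken` into bit 0. A misprediction instead reloads the register from `train_history` with the actual outcome appended. If both happen in the same cycle, recovery wins.

```verilog
// Hypothetical 7-bit global history register (cf. cs450/history_shift).
// Bit 0 holds the most recent branch outcome.
module ghr7 (
    input            clk,
    input            areset,
    input            predict_valid,
    input            predict_taken,
    input            train_valid,
    input            train_mispredicted,
    input            train_taken,
    input      [6:0] train_history,  // history used when the branch was predicted
    output reg [6:0] history
);
    always @(posedge clk or posedge areset) begin
        if (areset)
            history <= 7'd0;
        else if (train_valid && train_mispredicted)
            // Recovery: state right after the mispredicted branch. Overrides a
            // same-cycle prediction, since that younger branch will be flushed.
            history <= {train_history[5:0], train_taken};
        else if (predict_valid)
            // Speculative update with the predicted direction.
            history <= {history[5:0], predict_taken};
    end
endmodule
```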

If training for a misprediction and a prediction (for a different, younger instruction) occur in the same cycle, both operations will want to modify the branch history register. When this happens, training takes precedence, because the branch being predicted will be discarded anyway. If training and prediction of the same PHT entry happen at the same time, the prediction sees the PHT state before training, because training only modifies the PHT at the next positive clock edge. The following timing diagram shows what happens when PHT entry 0 is trained and used for a prediction in the same cycle. The training request in cycle 4 changes the PHT entry state in cycle 5, but the prediction request in cycle 4 outputs the PHT state at cycle 4, without considering the effect of the training request in cycle 4.
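The read-before-write ordering can be checked on a scaled-down table. The sketch below uses a hypothetical 4-entry `pht_rw_demo` module, not the required 128-entry design. The prediction port reads the table combinationally, so a same-cycle training request only becomes visible after the next positive edge.

```verilog
// Hypothetical 4-entry PHT demonstrating same-cycle read/write ordering:
// reads are combinational, training writes land at the next posedge.
module pht_rw_demo (
    input        clk,
    input        areset,
    input        train_valid,
    input        train_taken,
    input  [1:0] train_idx,
    input  [1:0] predict_idx,
    output [1:0] predict_counter
);
    reg [1:0] pht [0:3];
    integer i;

    // The prediction sees the current (pre-training) counter value.
    assign predict_counter = pht[predict_idx];

    always @(posedge clk or posedge areset) begin
        if (areset) begin
            for (i = 0; i < 4; i = i + 1)
                pht[i] <= 2'b01;               // weakly not taken
        end else if (train_valid) begin
            if (train_taken && pht[train_idx] != 2'b11)
                pht[train_idx] <= pht[train_idx] + 2'b01;
            else if (!train_taken && pht[train_idx] != 2'b00)
                pht[train_idx] <= pht[train_idx] - 2'b01;
        end
    end
endmodule
```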

Timing diagram: Training and predicting using PHT entry 0 at the same time (cycles 1 to 9; signals clk, train_valid, train_pc ^ train_history = 0, train_taken, pht[0], predict_valid, predict_pc ^ predict_history = 0, predict_taken).

areset is an asynchronous reset that clears the entire PHT to 2'b01 (weakly not-taken). It also clears the global history register to 0.


The code is as follows:

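module top_module(
    input clk,
    input areset,

    input  predict_valid,
    input  [6:0] predict_pc,
    output predict_taken,
    output reg [6:0] predict_history,   // also serves as the global history register

    input train_valid,
    input train_taken,
    input train_mispredicted,
    input [6:0] train_history,
    input [6:0] train_pc
);
    reg [1:0] PHT [127:0];              // 128 two-bit saturating counters
    integer i;

    // gshare index: xor of pc and global history
    wire [6:0] predict_idx = predict_pc ^ predict_history;
    wire [6:0] train_idx   = train_pc ^ train_history;

    always @(posedge clk or posedge areset) begin
        if (areset) begin
            predict_history <= 7'd0;
            for (i = 0; i < 128; i = i + 1)
                PHT[i] <= 2'b01;            // weakly not taken
        end
        else begin
            // Global history: misprediction recovery takes precedence over a new prediction
            if (train_valid && train_mispredicted)
                predict_history <= {train_history[5:0], train_taken};
            else if (predict_valid)
                predict_history <= {predict_history[5:0], predict_taken};

            // PHT training: saturating increment/decrement, visible from the next cycle
            if (train_valid) begin
                if (train_taken)
                    PHT[train_idx] <= (PHT[train_idx] == 2'b11) ? 2'b11 : PHT[train_idx] + 2'b01;
                else
                    PHT[train_idx] <= (PHT[train_idx] == 2'b00) ? 2'b00 : PHT[train_idx] - 2'b01;
            end
        end
    end

    // Combinational prediction: MSB of the selected counter (2 or 3 = taken)
    assign predict_taken = PHT[predict_idx][1];
endmodule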

Code reference: the CSDN blog post "Cs450/gshare_gshare算法" ("gshare algorithm").