rbg's answer: As sigma -> inf, the loss approaches L1 (absolute) loss. Setting sigma = 3 moves the transition point from quadratic to linear to |x| = 1 / 3**2 = 1/9 (closer to the origin), so the loss is quadratic only for |x| < 1/9. This is done because the RPN bbox regression targets are not normalized by their standard deviation (unlike in Fast R-CNN), since the statistics of the targets change constantly throughout learning. In a future update I may simply replace smooth L1 with (hard) L1, which I believe will likely work as well and be simpler (no sigma, etc.).
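A minimal Python sketch of the sigma-parameterized smooth L1 described above. It assumes the common form f(x) = 0.5 * sigma**2 * x**2 for |x| < 1/sigma**2 and f(x) = |x| - 0.5/sigma**2 otherwise. The function name `smooth_l1` is hypothetical and is not taken from the Caffe or Fast R-CNN code.

```python
import math

def smooth_l1(x: float, sigma: float = 1.0) -> float:
    """Smooth L1 of a scalar residual x, switching at |x| = 1 / sigma**2.

    Hypothetical helper; assumes the common sigma-parameterized form.
    """
    s2 = sigma * sigma
    if abs(x) < 1.0 / s2:
        # Quadratic region near the origin; it shrinks as sigma grows.
        return 0.5 * s2 * x * x
    # Linear region: |x| shifted down by 0.5 / sigma**2 so the pieces meet.
    return abs(x) - 0.5 / s2

# With sigma = 3 the switch from quadratic to linear happens at |x| = 1/9.
for x in (0.05, 1.0 / 9.0, 0.5):
    print(x, smooth_l1(x, sigma=3.0))
```

Both pieces equal 0.5 / sigma**2 at the threshold, so the loss is continuous. As sigma grows, that offset and the quadratic region both shrink toward zero, which is why the loss approaches plain |x|.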
rbg's answer: Smooth L1 loss can be used in cases where you do want to backprop to both inputs (e.g., in a “siamese” network). In the case of Fast R-CNN, we don't need derivatives with respect to the bbox regression labels, but the layer is more general than its use in Fast R-CNN.
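To illustrate what "backprop to both inputs" means, here is a small sketch under stated assumptions. It applies smooth L1 to the residual d = a - b and returns gradients for both a and b. The helper `smooth_l1_pair` is hypothetical and does not reproduce the actual layer's interface.

```python
def smooth_l1_pair(a: float, b: float, sigma: float = 1.0):
    """Hypothetical helper: smooth L1 on d = a - b, plus the gradients
    with respect to BOTH inputs a and b."""
    s2 = sigma * sigma
    d = a - b
    if abs(d) < 1.0 / s2:
        loss = 0.5 * s2 * d * d
        grad_d = s2 * d  # derivative of the quadratic piece
    else:
        loss = abs(d) - 0.5 / s2
        grad_d = 1.0 if d > 0 else -1.0  # derivative of the linear piece
    # Chain rule: dd/da = +1 and dd/db = -1, so b gets the negated gradient.
    return loss, grad_d, -grad_d

loss, grad_a, grad_b = smooth_l1_pair(2.0, 0.5)
print(loss, grad_a, grad_b)
```

In a siamese setup, both branches would consume `grad_a` and `grad_b`. In Fast R-CNN, b plays the role of the fixed regression target, so only `grad_a` is needed.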