If you look at the original ResNet paper (http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf), they use strided convolutions to downsample the feature maps. The main path is downsampled by these strided convolutions, as is done in your code. The shortcut path then has to match the new shape, and the paper uses either (a) an identity mapping applied with the same stride, with zero entries padded onto the channel dimension so that no parameters are added, or (b) a 1x1 convolution with the same stride parameter.
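# Option (b): 1x1 convolution with stride 2. In a real model, create this layer once in __init__; building it inline re-initializes its weights on every call.
shortcut = nn.Conv2d(in_channels=x.shape[1], out_channels=out.shape[1], kernel_size=1, stride=2)(x)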
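To make the two shortcut options concrete, here is a minimal PyTorch sketch of a downsampling residual block. The class name `DownsampleBlock` and the `projection` flag are hypothetical names chosen for illustration, and the layer layout (two 3x3 convolutions with batch norm) is an assumption, not taken from your code. It also assumes `out_channels >= in_channels`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DownsampleBlock(nn.Module):
    """Illustrative residual block that downsamples and widens its input."""

    def __init__(self, in_channels, out_channels, stride=2, projection=True):
        super().__init__()
        # Main path: the first 3x3 convolution carries the stride.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.stride = stride
        self.extra_channels = out_channels - in_channels
        # Option (b): 1x1 convolution with the same stride, created once here
        # so that its weights are registered and trained with the model.
        self.proj = (
            nn.Conv2d(in_channels, out_channels, kernel_size=1,
                      stride=stride, bias=False)
            if projection else None
        )

    def shortcut(self, x):
        if self.proj is not None:
            return self.proj(x)
        # Option (a): subsample spatially with the same stride, then pad the
        # channel dimension with zeros. This adds no parameters.
        x = x[:, :, ::self.stride, ::self.stride]
        # F.pad takes pairs starting from the last dimension: (W, W, H, H, C, C).
        return F.pad(x, (0, 0, 0, 0, 0, self.extra_channels))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))
```

With an input of shape `(N, 16, 32, 32)`, both `DownsampleBlock(16, 32, projection=True)` and `DownsampleBlock(16, 32, projection=False)` should return a tensor of shape `(N, 32, 16, 16)`, so the addition in `forward` lines up in either case.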