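the format names describe the storage scheme exactly: the rightmost letter is the dimension that varies fastest in memory. NCHW is contiguous within the image plane (HW) first and then across channels, while NHWC is contiguous across the point-expansion (C, the channel values of one pixel) first and then across the image.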
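To make the storage order concrete, here is a minimal sketch in plain Python. The helper names (`offset_nchw`, `offset_nhwc`, `storage_order`) and the tiny shape are made up for illustration; they are not part of TensorFlow or cuDNN. It computes the flat memory offset of a logical element under each layout and lists the coordinates in the order they sit in memory.

```python
# Tiny illustrative shape (assumption: chosen only to keep the output readable).
N, C, H, W = 1, 2, 2, 3

def offset_nchw(n, c, h, w):
    # NCHW: w varies fastest, then h, then c, then n.
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, c, h, w):
    # NHWC: c varies fastest, then w, then h, then n.
    return ((n * H + h) * W + w) * C + c

def storage_order(offset):
    # Logical (c, h, w) coordinates of image 0, sorted by memory offset.
    coords = [(c, h, w) for c in range(C) for h in range(H) for w in range(W)]
    return sorted(coords, key=lambda t: offset(0, *t))

nchw_order = storage_order(offset_nchw)
nhwc_order = storage_order(offset_nhwc)

print(nchw_order[:3])  # neighbours in memory differ in w (same channel plane)
print(nhwc_order[:3])  # neighbours in memory differ in c (same pixel)
```

In NCHW the next channel of the same pixel is a whole plane (H*W elements) away, whereas in NHWC it is the adjacent element.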
example:
“TensorFlow performance and advance topics”
explanation:
gpu - How much faster is NCHW compared to NHWC in TensorFlow/cuDNN? - Stack Overflow
==> outer-product multiplication is a better accumulation strategy in general
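One possible reading of this note, sketched in plain Python: a matrix product can be accumulated either as one inner product per output element or as a sum of rank-1 outer products. The function names `matmul_inner` and `matmul_outer` are hypothetical, and the sketch only shows that the two orderings give the same result; it does not measure speed or reproduce what cuDNN does internally.

```python
def matmul_inner(A, B):
    # Inner-product form: each output element is a dot product of a row and a column.
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def matmul_outer(A, B):
    # Outer-product form: for each p, add the rank-1 update
    # (column p of A) x (row p of B) into the whole output.
    m, k, n = len(A), len(B), len(B[0])
    out = [[0] * n for _ in range(m)]
    for p in range(k):
        for i in range(m):
            a = A[i][p]  # loaded once, reused for n multiply-adds
            for j in range(n):
                out[i][j] += a * B[p][j]
    return out

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8, 9], [10, 11, 12]]
result_inner = matmul_inner(A, B)
result_outer = matmul_outer(A, B)
print(result_outer)
```

The appeal of the outer-product ordering is data reuse: each step reads m + n values and performs m * n multiply-adds, while a single inner product reads 2k values for k multiply-adds.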