Steps:
$ git clone https://github.com/NVIDIA/cutlass.git
$ cd cutlass
$ mkdir build && cd build
$ cmake .. -DCUTLASS_NVCC_ARCHS=80 # compiles for NVIDIA Ampere GPU architecture
ERROR:
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
fix: use g++-10 as the CUDA host compiler (nvcc here fails on the GCC 11 libstdc++ headers)
$ sudo apt install gcc-10 g++-10
$ cmake .. -DCUTLASS_NVCC_ARCHS=80 -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-10
$ make cutlass_profiler -j12
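The host compiler choice above can be scripted so the same configure line works on machines with different default g++ versions. This is a minimal sketch, not part of CUTLASS: the helper names pick_host_compiler and gxx_major are made up for illustration, and the threshold of 11 is an assumption based on the error seen here. The right cutoff depends on the installed CUDA toolkit.

```shell
#!/usr/bin/env bash
# Sketch: choose a CUDA host compiler from the default g++ major version.
# Assumption: g++ 11 or newer triggers the std_function.h error with the
# installed nvcc, and g++-10 lives at /usr/bin/g++-10 (as installed via apt).

# Print the host compiler path for a given g++ major version.
pick_host_compiler() {
    local major="$1"
    if [ "$major" -ge 11 ]; then
        echo "/usr/bin/g++-10"
    else
        echo "/usr/bin/g++"
    fi
}

# Print the major version of the default g++ (e.g. "11" for 11.4.0).
gxx_major() {
    g++ -dumpversion | cut -d. -f1
}

# Usage, from the build directory:
#   cmake .. -DCUTLASS_NVCC_ARCHS=80 \
#       -DCMAKE_CUDA_HOST_COMPILER="$(pick_host_compiler "$(gxx_major)")"
```

Sourcing the script only defines the two functions; nothing is configured until the cmake line in the usage comment is run by hand.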