Seems to be as if what you are trying to do is somehow equivalent to clustering algorithms – i.e. to group elements that are close together, forming multiple clusters of elements. Once you have these clusters, the distance within each cluster should be small compared to the rest of the distances, and then you would just arrange blocks for each groups.
You may want to take a look at a nice comparison of clustering algorithms that are already implemented in Python here; whereas some of the clustering algorithms require you to specify in advance how many clusters you are expecting, some others have perhaps easier to control parameters.
With clustering, your algorithm should be:
Compute the clusters
Reorder the elements so that they are ordered by cluster blocks (i.e. [1,1,1,2,2,3,3,3,…])
The diagonal should now theoretically be very low compared to other values
How to order elements within each cluster, or how to order the clusters, is perhaps not well defined enough (as we need a proper definition of what’s “diagonal as possible”), but see if this works for you?