An Analysis of the STL sort Source Code

pengpeng02

Last modified 2024-07-08 14:24:12


Tags: sorting algorithms, data structures, quicksort

First published 2021-05-11 14:35:10

Original link: https://blog.csdn.net/qq_41529799/article/details/116603338


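This article analyzes how sort is implemented in the STL and shows that it combines ideas from quicksort, insertion sort and heapsort. It uses quicksort on large amounts of data, switches to insertion sort once the data is nearly sorted, and falls back to heapsort when it detects that sorting is becoming inefficient. This keeps sort fast while covering the performance needs of different scenarios. (This summary was generated automatically by CSDN.)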

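First, a rough overview of how sort is implemented. Under the hood, sort is a hybrid of two, or arguably three, sorting algorithms. It uses quicksort on large amounts of data and switches to insertion sort once the data is nearly sorted. When it detects that the running time is drifting toward O(n*n), it switches to heapsort. This hybrid is generally known as introsort (introspective sort), which is where the name __introsort_loop below comes from.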

All code below is taken from the libstdc++ headers shipped with Dev-C++ 5.11.

Let's start with the sort function itself. For ease of explanation, this article assumes __comp is "less than".

//stl_algo.h, line 4703
  template<typename _RandomAccessIterator, typename _Compare>
    inline void
    sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
	 _Compare __comp)
    {
      // concept requirements
      __glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept<
	    _RandomAccessIterator>)
      __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
	    typename iterator_traits<_RandomAccessIterator>::value_type,
	    typename iterator_traits<_RandomAccessIterator>::value_type>)
      __glibcxx_requires_valid_range(__first, __last);

      std::__sort(__first, __last, __gnu_cxx::__ops::__iter_comp_iter(__comp));
    }

Before calling __sort, it runs a chunk of code whose purpose is not obvious. Let's look at what __glibcxx_function_requires and __glibcxx_requires_valid_range actually do:

//concept_check.h , line 47    
#define __glibcxx_function_requires(...)
//debug.h, line 65 
# define __glibcxx_requires_valid_range(_First,_Last)

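You read that right: there is nothing after the macro name. In a normal build these macros expand to nothing, so those lines disappear during preprocessing and can be ignored. They only do real work when concept checking (_GLIBCXX_CONCEPT_CHECKS) or debug mode (_GLIBCXX_DEBUG) is enabled. The definitions shown above are the ones used when those options are off. That leaves only the final call to __sort.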
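Here is a minimal sketch of the same pattern. It uses a hypothetical flag MY_ENABLE_CHECKS and a hypothetical macro MY_REQUIRES; neither is part of libstdc++. When the flag is not defined, the macro expands to nothing and the check vanishes during preprocessing. That is what happens to the requirement lines in sort above in a normal build.

```cpp
#include <cstddef>

// Hypothetical flag and macro, modeled on how libstdc++ guards its
// concept-check and debug-mode macros behind configuration switches.
#ifdef MY_ENABLE_CHECKS
#  include <cassert>
#  define MY_REQUIRES(cond) assert(cond)
#else
#  define MY_REQUIRES(...)   // expands to nothing: the argument is discarded
#endif

// In a normal build only "return last - first;" survives preprocessing;
// the MY_REQUIRES line becomes an empty statement.
inline std::ptrdiff_t range_length(const int* first, const int* last) {
    MY_REQUIRES(first <= last);
    return last - first;
}
```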

//stl_algo.h, line 1960
template<typename _RandomAccessIterator, typename _Compare>
inline void
__sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
       _Compare __comp)
{
	if (__first != __last)
	{
		std::__introsort_loop(__first, __last,
		                      std::__lg(__last - __first) * 2,//2*floor(log2(n)); __lg is defined in stl_algobase.h. Caps the quicksort recursion depth to avoid O(n*n)
		                      __comp); 
		std::__final_insertion_sort(__first, __last, __comp);//finish with insertion sort
	}
}

The if statement checks that the range is non-empty. It then calls __introsort_loop and finishes with insertion sort.
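__lg returns floor(log2(n)). As far as I know, libstdc++ computes it with the __builtin_clz compiler builtin. The portable loop below is only an illustrative stand-in, and floor_log2 and introsort_depth_limit are hypothetical names. It shows the depth limit that __sort passes to __introsort_loop.

```cpp
#include <cstddef>

// Portable sketch of floor(log2(n)) for n >= 1 (hypothetical helper):
// halve n until it reaches 1, counting the halvings.
inline int floor_log2(std::size_t n) {
    int k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

// Depth limit that __sort hands to __introsort_loop: 2 * floor(log2(n)).
inline int introsort_depth_limit(std::size_t n) {
    return 2 * floor_log2(n);
}
```

For n = 1,000,000 this gives a depth limit of 2 * 19 = 38.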

Let's see what __introsort_loop does.

/**
   *  @doctodo
   *  This controls some aspect of the sort routines.
  */
//stl_algo.h, line 1874
enum { _S_threshold = 16 };
//stl_algo.h, line 1937
template<typename _RandomAccessIterator, typename _Size, typename _Compare>
void
__introsort_loop(_RandomAccessIterator __first,
                 _RandomAccessIterator __last,
                 _Size __depth_limit, _Compare __comp)
{
	while (__last - __first > int(_S_threshold))
	{
		if (__depth_limit == 0)
		{
			std::__partial_sort(__first, __last, __last, __comp);//running time is drifting toward O(n*n): switch to heapsort
			return;
		}
		--__depth_limit;
		_RandomAccessIterator __cut = std::__unguarded_partition_pivot(__first, __last, __comp);//pick a pivot; smaller elements go left, larger go right
		std::__introsort_loop(__cut, __last, __depth_limit, __comp);//recurse on the right part
		__last = __cut;//the next loop iteration handles the left part
	}
}

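This is clearly a recursive function. Ignoring the if statement, it is essentially the quicksort we usually write. It differs from the familiar version in two ways. It only recurses on the right half, and it wraps everything in a loop.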

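First, what the while condition is for. Partitioning stops as soon as a subrange has at most 16 elements. So when __introsort_loop returns, the sequence consists of consecutive blocks of at most 16 elements each. The elements inside a block are still unsorted. However, every element of a block is no greater than any element of a later block, so the whole sequence is already nearly sorted: each element is fewer than 16 positions from its final place. Continuing to recurse at this point is not worth it, because for such small ranges the recursion overhead is relatively large. Insertion sort, on the other hand, is ideal for nearly sorted data, where its running time approaches O(n). So the loop stops here and insertion sort takes over.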

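Next, the if statement, which monitors how well quicksort is doing. Quicksort degrades to O(n*n) mainly because of bad pivots, which make the recursion too deep. Once the recursion depth reaches the preset limit (2*floor(log2 N)), no further recursion is allowed. Instead, __partial_sort is called on the current range and the function returns. __partial_sort is in fact the heapsort implementation.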
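Here is a simplified sketch of the loop on std::vector<int> that shows the depth limit in action. The names are hypothetical. Unlike libstdc++'s median-of-three, it deliberately uses the first element as the pivot, so already-sorted input degrades quickly. The heap fallback uses the standard std::make_heap and std::sort_heap, and a plain insertion sort finishes the job.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified sketch of __introsort_loop's depth limit (hypothetical names).
// Assumption for the demo: the pivot is simply the first element, so plain
// quicksort degrades on sorted input; libstdc++ uses median-of-three instead.
void introsort_loop_demo(std::vector<int>& v, std::size_t lo, std::size_t hi,
                         int depth_limit, int& heap_fallbacks) {
    const std::size_t threshold = 16;            // same role as _S_threshold
    while (hi - lo > threshold) {
        if (depth_limit == 0) {
            // Same effect as __partial_sort(first, last, last): heapsort v[lo, hi).
            int* base = v.data();
            std::make_heap(base + lo, base + hi);
            std::sort_heap(base + lo, base + hi);
            ++heap_fallbacks;
            return;
        }
        --depth_limit;
        // Lomuto partition around v[lo] (deliberately naive pivot choice).
        const int pivot = v[lo];
        std::size_t store = lo;
        for (std::size_t i = lo + 1; i < hi; ++i)
            if (v[i] < pivot) std::swap(v[++store], v[i]);
        std::swap(v[lo], v[store]);              // pivot lands at index store
        introsort_loop_demo(v, store + 1, hi, depth_limit, heap_fallbacks);  // right part
        hi = store;                              // next iteration: left part
    }
}

// Driver mirroring __sort: depth limit 2 * floor(log2(n)), then insertion sort.
void introsort_demo(std::vector<int>& v, int& heap_fallbacks) {
    heap_fallbacks = 0;
    if (v.empty()) return;
    int lg = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1) ++lg;
    introsort_loop_demo(v, 0, v.size(), 2 * lg, heap_fallbacks);
    for (std::size_t i = 1; i < v.size(); ++i) {  // guarded insertion sort to finish
        int val = v[i];
        std::size_t j = i;
        for (; j > 0 && val < v[j - 1]; --j) v[j] = v[j - 1];
        v[j] = val;
    }
}
```

Take an already sorted vector of 1000 elements. Each partition peels off just one element, so after 2 * 9 = 18 levels the depth limit is reached. The remaining range is then heapsorted exactly once.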

//stl_algo.h, line 1925
template<typename _RandomAccessIterator, typename _Compare>
inline void
__partial_sort(_RandomAccessIterator __first,
               _RandomAccessIterator __middle,
               _RandomAccessIterator __last,
               _Compare __comp)
{
	std::__heap_select(__first, __middle, __last, __comp);
	std::__sort_heap(__first, __middle, __comp);
}

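I haven't shown the source of __heap_select and __sort_heap, but their names make it fairly clear that this is heapsort. Because the call is __partial_sort(__first, __last, __last), __heap_select just builds a heap over the whole range, and __sort_heap then sorts that heap. I won't go further here, because the heap code is fairly long and I haven't read that part yet.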
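The fallback can be reproduced with the public heap algorithms. In the minimal sketch below, heap_sort is a hypothetical name. With middle equal to last, the selection loop in __heap_select has nothing to do, so the effect is std::make_heap followed by std::sort_heap. That gives the same sorted result as std::partial_sort(first, last, last).

```cpp
#include <algorithm>
#include <vector>

// Heapsort from the standard heap algorithms, mirroring
// __partial_sort(first, last, last) in the fallback path.
template <typename It>
void heap_sort(It first, It last) {
    std::make_heap(first, last);  // O(n): arrange [first, last) as a max-heap
    std::sort_heap(first, last);  // O(n log n): repeatedly move the max to the end
}
```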

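This raises a question: why not use heapsort from the start? Both have O(NlogN) time complexity (for quicksort, on average), and heapsort even has lower space complexity. In practice, though, heapsort runs considerably slower than quicksort, largely because its memory accesses jump around the array and make poor use of the cache. See the Zhihu discussion "堆排序缺点何在" ("What are the drawbacks of heapsort?").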

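And why heapsort rather than some other algorithm? In my view, it comes down to space complexity. The three basic sorts (bubble, selection and insertion sort) are out, because their time complexity is too high. Radix sort and bucket sort are out too, because they only work on keys with special structure, not with an arbitrary comparison function. That leaves Shell sort, heapsort and merge sort.

First, why not merge sort? Like heapsort, it has a stable O(NlogN) time complexity. However, it needs an O(n) auxiliary buffer for merging, whereas heapsort sorts in place with O(1) extra space. And Shell sort? It also needs only O(1) extra space, but its worst case is worse than O(NlogN) (O(n*n) with Shell's original gap sequence). The whole point of switching algorithms here is to fix the running time, so it would be a bad deal to be unlucky and hit that worst case. So of all those sorting algorithms, heapsort is the only one that fits.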

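Back to __introsort_loop. It only makes a recursive call for the right half, not the left. This confused me at first, but on reflection it works. The left half isn't ignored; its processing is deferred to the next loop iteration, which is exactly why a loop is used here. After __unguarded_partition_pivot picks a pivot and splits the sequence in two, the recursive call handles the right part, and then __last = __cut is executed. __cut is the split point returned by __unguarded_partition_pivot, so [__first, __last) is now exactly the left part. The loop then processes that left part in its next iteration.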

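So why not simply make two recursive calls? The reason is efficiency. This approach cuts the number of function calls almost in half without any extra space, because each call recurses only once instead of twice. Function calls take time, so fewer calls means less time spent. This is essentially tail-call elimination done by hand.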
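The claim about halving the number of calls can be checked with a small experiment. The sketch below counts calls in a two-recursion quicksort and in the loop-plus-one-recursion form used by __introsort_loop. All names are hypothetical, and it uses a simple Lomuto partition rather than libstdc++'s. Both versions perform exactly the same partitions. If P is the number of partitions, the first makes 2P + 1 calls and the second makes P + 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition of v[lo, hi) (hi - lo >= 2) around the middle element.
// Returns the pivot's final index p: v[lo, p) < pivot <= v[p + 1, hi).
std::size_t partition_mid(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    std::swap(v[lo + (hi - lo) / 2], v[hi - 1]);   // move pivot to the end
    const int pivot = v[hi - 1];
    std::size_t store = lo;
    for (std::size_t i = lo; i + 1 < hi; ++i)
        if (v[i] < pivot) std::swap(v[store++], v[i]);
    std::swap(v[store], v[hi - 1]);                 // pivot to its final slot
    return store;
}

// Textbook form: both halves are handled by recursive calls.
void qs_two_calls(std::vector<int>& v, std::size_t lo, std::size_t hi, long& calls) {
    ++calls;
    if (hi - lo <= 1) return;
    std::size_t p = partition_mid(v, lo, hi);
    qs_two_calls(v, lo, p, calls);
    qs_two_calls(v, p + 1, hi, calls);
}

// __introsort_loop form: recurse on the right part, loop on the left part.
void qs_loop(std::vector<int>& v, std::size_t lo, std::size_t hi, long& calls) {
    ++calls;
    while (hi - lo > 1) {
        std::size_t p = partition_mid(v, lo, hi);
        qs_loop(v, p + 1, hi, calls);  // right part: recursive call
        hi = p;                        // left part: next loop iteration
    }
}
```

The saving is in call overhead only: the number of comparisons and swaps is identical, which is consistent with the small timing difference measured below.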

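That said, the effect of this optimization doesn't seem very noticeable. I tried changing it back to two recursive calls and found that for small arrays, the two-call version may even be slightly faster. Even at a length of 5e6 there was no obvious time difference. I can't rule out a problem with my testing method. Either way, the readability really suffers. Incidentally, trading readability for efficiency like this seems fairly common in the STL; I have run into it several times in sort alone.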

Next, let's look at how __unguarded_partition_pivot is written.

//stl_algo.h, line 1914
template<typename _RandomAccessIterator, typename _Compare>
inline _RandomAccessIterator
__unguarded_partition_pivot(_RandomAccessIterator __first,
                            _RandomAccessIterator __last, _Compare __comp)
{
	_RandomAccessIterator __mid = __first + (__last - __first) / 2;
	std::__move_median_to_first(__first, __first + 1, __mid, __last - 1, __comp);
	return std::__unguarded_partition(__first + 1, __last, __first, __comp);
}

This function is the usual quicksort step of choosing a pivot and partitioning around it. Let's look at __move_median_to_first first.

//stl_algo.h, line 76
/// Swaps the median value of *__a, *__b and *__c under __comp to *__result
template<typename _Iterator, typename _Compare>
void
__move_median_to_first(_Iterator __result,_Iterator __a, _Iterator __b,
                       _Iterator __c, _Compare __comp)
{
	if (__comp(__a, __b))// a<b
	{
		if (__comp(__b, __c))// a<b<c
			std::iter_swap(__result, __b);
		else if (__comp(__a, __c))//a<c, b>=c    a<c<=b
			std::iter_swap(__result, __c);
		else//c<=a<b
			std::iter_swap(__result, __a);
	}
	else if (__comp(__a, __c))//a>=b, a<c     b<=a<c
		std::iter_swap(__result, __a);
	else if (__comp(__b, __c))//b<c<=a
		std::iter_swap(__result, __c);
	else//c<=b<=a
		std::iter_swap(__result, __b);
}

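This coding style takes some getting used to, but it is easy enough to follow. The function chooses the pivot. To avoid a bad pivot that would make quicksort degrade, it takes the median of three elements: the second element (__first + 1), the middle element and the last element. It then swaps that median into *__first. In the vast majority of cases this beats the usual approach of simply picking the first or last element, especially on input that is already sorted or reverse-sorted.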
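Here is a value-based transcription of the decision tree, useful for checking the comment annotations above. The names are hypothetical. The real function takes iterators and swaps the median into *__result instead of returning it.

```cpp
#include <algorithm>
#include <array>

// Same decision tree as __move_median_to_first, on plain ints:
// at most three comparisons, using only operator<.
inline int median_of_three(int a, int b, int c) {
    if (a < b) {
        if (b < c) return b;       // a < b < c
        else if (a < c) return c;  // a < c <= b
        else return a;             // c <= a < b
    }
    else if (a < c) return a;      // b <= a < c
    else if (b < c) return c;      // b < c <= a
    else return b;                 // c <= b <= a
}

// Tries every ordering of {1, 2, 3}; the median must always be 2.
inline bool median_ok_on_all_permutations() {
    std::array<int, 3> p = {1, 2, 3};
    do {
        if (median_of_three(p[0], p[1], p[2]) != 2) return false;
    } while (std::next_permutation(p.begin(), p.end()));
    return true;
}
```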

Next, let's look at how __unguarded_partition is written.

//stl_algo.h, line 1893
template<typename _RandomAccessIterator, typename _Compare>
_RandomAccessIterator
__unguarded_partition(_RandomAccessIterator __first,
                      _RandomAccessIterator __last,
                      _RandomAccessIterator __pivot, _Compare __comp)
{
	while (true)
	{
		while (__comp(__first, __pivot))
			++__first;
		--__last;
		while (__comp(__pivot, __last))
			--__last;
		if (!(__first < __last))
			return __first;
		std::iter_swap(__first, __last);
		++__first;
	}
}

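This is also easy to understand and close to the usual quicksort partition. First, __first advances to the first element that is not less than the pivot. Then __last moves backward to the first element that is not greater than the pivot. If __first and __last have met or crossed, __first is returned as the cut point. Otherwise the two elements are swapped, __first moves forward by one, and the loop repeats.

One question: there are no bounds checks on __first and __last. Is that really safe? It is. The pivot is the median of three sampled values, so the range being partitioned contains at least one element not less than the pivot, which stops the forward scan. It also contains at least one element not greater than the pivot, which stops the backward scan. In addition, the pivot itself sits just before the range, so the backward scan stops there at the latest. After each swap, the swapped elements act as sentinels for the next round of scans in the same way, so neither pointer can leave the valid range. I suggest working through an example by hand to see why this is reliable.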
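Below is an index-based sketch of __unguarded_partition_pivot and __unguarded_partition on std::vector<int>. The names are hypothetical, and the inner loops stay unguarded. As in libstdc++, it assumes the range is longer than 16 elements, the only case in which __introsort_loop calls it. The helper partitioned_at checks the result: nothing left of the cut is greater than anything from the cut onward.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Moves the median of v[a], v[b], v[c] to v[result] (same tree as the source).
void move_median_to(std::vector<int>& v, std::size_t result,
                    std::size_t a, std::size_t b, std::size_t c) {
    std::size_t m;
    if (v[a] < v[b]) m = (v[b] < v[c]) ? b : (v[a] < v[c]) ? c : a;
    else             m = (v[a] < v[c]) ? a : (v[b] < v[c]) ? c : b;
    std::swap(v[result], v[m]);
}

// Partitions v[lo, hi), assuming hi - lo > 16 as in libstdc++'s caller.
// Returns cut with v[lo, cut) <= pivot <= v[cut, hi).
std::size_t unguarded_partition_pivot(std::vector<int>& v, std::size_t lo, std::size_t hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    move_median_to(v, lo, lo + 1, mid, hi - 1);
    const int pivot = v[lo];            // v[lo] is never touched by the scans
    std::size_t first = lo + 1, last = hi;
    while (true) {
        while (v[first] < pivot) ++first;   // no bounds check: a sentinel stops it
        --last;
        while (pivot < v[last]) --last;     // no bounds check: v[lo] stops it at worst
        if (!(first < last)) return first;
        std::swap(v[first], v[last]);
        ++first;
    }
}

// Checks the partition property for 0 < cut < v.size().
bool partitioned_at(const std::vector<int>& v, std::size_t cut) {
    int max_left = *std::max_element(v.begin(), v.begin() + cut);
    int min_right = *std::min_element(v.begin() + cut, v.end());
    return max_left <= min_right;
}
```

Elements equal to the pivot are swapped too. On an array of 20 equal values this splits the range exactly in the middle (cut = 10) instead of producing a lopsided partition.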

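With reliability settled, the next question is: why do it at all? Again, to save time. Bounds checks require comparisons, and comparisons cost time, so dropping unnecessary ones reduces the running time. The same trick of removing bounds checks for speed shows up again later in insertion sort.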

Finally, we reach __final_insertion_sort.

//stl_algo.h, line 1877 
template<typename _RandomAccessIterator, typename _Compare>
void
__final_insertion_sort(_RandomAccessIterator __first,
                       _RandomAccessIterator __last, _Compare __comp)
{
	if (__last - __first > int(_S_threshold))
	{
		std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
		std::__unguarded_insertion_sort(__first + int(_S_threshold), __last,
		                                __comp);
	}
	else
		std::__insertion_sort(__first, __last, __comp);
}

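The code splits into two cases. When the sequence is longer than _S_threshold, __insertion_sort handles the first _S_threshold elements and __unguarded_insertion_sort handles the rest. When the length is at most _S_threshold, __insertion_sort handles the whole sequence directly.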

Let's look at __insertion_sort first.

//stl_algo.h, line 1837
template<typename _RandomAccessIterator, typename _Compare>
void
__insertion_sort(_RandomAccessIterator __first,
                 _RandomAccessIterator __last, _Compare __comp)
{
	if (__first == __last) return;

	for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
	{
		if (__comp(__i, __first))
		{
			typename iterator_traits<_RandomAccessIterator>::value_type
			__val = _GLIBCXX_MOVE(*__i);// __val=*__i
			_GLIBCXX_MOVE_BACKWARD3(__first, __i, __i + 1);//shift [__first, __i) right by one
			*__first = _GLIBCXX_MOVE(__val);//*__first=__val
		}
		else
			std::__unguarded_linear_insert(__i,//inner loop of insertion sort: insert the element at its proper position
			                               __gnu_cxx::__ops::__val_comp_iter(__comp));
	}
}

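As you can probably tell, the for loop is the outer loop of insertion sort. But what do the statements inside it mean? Let's look at how the macros and functions used in the if branch are written.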

//move.h, line 145
#define _GLIBCXX_MOVE(__val) (__val)
//stl_algobase.h, line 683
#define _GLIBCXX_MOVE_BACKWARD3(_Tp, _Up, _Vp) std::copy_backward(_Tp, _Up, _Vp)
//stl_algobase.h, line 610
/**
 *  @brief Copies the range [first,last) into result.
 *  @ingroup mutating_algorithms
 *  @param  __first  A bidirectional iterator.
 *  @param  __last   A bidirectional iterator.
 *  @param  __result A bidirectional iterator.
 *  @return   result - (first - last)
 *
 *  The function has the same effect as copy, but starts at the end of the
 *  range and works its way to the start, returning the start of the result.
 *  This inline function will boil down to a call to @c memmove whenever
 *  possible.  Failing that, if random access iterators are passed, then the
 *  loop count will be known (and therefore a candidate for compiler
 *  optimizations such as unrolling).
 *
 *  Result may not be in the range (first,last].  Use copy instead.  Note
 *  that the start of the output range may overlap [first,last).
*/
template<typename _BI1, typename _BI2>
inline _BI2
copy_backward(_BI1 __first, _BI1 __last, _BI2 __result)
{
	// concept requirements
	__glibcxx_function_requires(_BidirectionalIteratorConcept<_BI1>)
	__glibcxx_function_requires(_Mutable_BidirectionalIteratorConcept<_BI2>)
	__glibcxx_function_requires(_ConvertibleConcept<
	                            typename iterator_traits<_BI1>::value_type,
	                            typename iterator_traits<_BI2>::value_type>)
	__glibcxx_requires_valid_range(__first, __last);

	return (std::__copy_move_backward_a2<__is_move_iterator<_BI1>::__value>
	        (std::__miter_base(__first), std::__miter_base(__last),
	         __result));
}

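So when *__i is less than *__first, the if branch shifts the whole range [__first, __i) one position to the right. It then stores the saved value of *__i at *__first. This guarantees that *__first is always the smallest value seen so far, which is what makes the else branch safe. The definitions above are the pre-C++11 ones. When compiling as C++11 or later, _GLIBCXX_MOVE becomes std::move and _GLIBCXX_MOVE_BACKWARD3 becomes std::move_backward, with the same effect.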
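Here is the if branch as a small sketch using the C++11 names; move_to_front is a hypothetical name. The operation is equivalent to rotating [first, i + 1) right by one position, which is what std::rotate(first, i, i + 1) does.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// The "new minimum" branch of __insertion_sort (random-access iterators):
// save *i, shift [first, i) one slot right, then put the saved value at *first.
template <typename It>
void move_to_front(It first, It i) {
    auto val = std::move(*i);               // save the new minimum
    std::move_backward(first, i, i + 1);    // [first, i) -> [first + 1, i + 1)
    *first = std::move(val);                // place it at the front
}
```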

Now the else branch, which simply calls __unguarded_linear_insert.

//stl_algo.h, line 1820
template<typename _RandomAccessIterator, typename _Compare>
void
__unguarded_linear_insert(_RandomAccessIterator __last,
                          _Compare __comp)
{
	typename iterator_traits<_RandomAccessIterator>::value_type
	__val = _GLIBCXX_MOVE(*__last);//__val = *__last 
	_RandomAccessIterator __next = __last;
	--__next;//__next = __last - 1
	//inner loop of insertion sort: move the element to its proper position
	while (__comp(__val, __next))//__val < *__next: still out of order
	{
		*__last = _GLIBCXX_MOVE(*__next);//move *__next one slot right
		__last = __next;
		--__next;//move __last and __next one slot left
	}
	*__last = _GLIBCXX_MOVE(__val);//found the right position: insert
}

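This is clearly the insertion step of insertion sort, but without any bounds check. Why is that allowed? __unguarded_linear_insert is only reached in the else branch, where *__i is not less than *__first. So *__first acts as a sentinel: the while loop stops at __first + 1 at the latest, and no out-of-bounds access can happen. Because the loop saves one comparison per step, it is also faster.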
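Putting the two branches together, here is an index-based sketch of __insertion_sort and __unguarded_linear_insert on std::vector<int>. The names are hypothetical, and it uses operator< instead of a comparator. The else branch only runs when v[0] <= v[i], so v[0] is the sentinel that stops the unguarded loop.

```cpp
#include <cstddef>
#include <vector>

// Sketch of __unguarded_linear_insert: inserts v[last] into the sorted run
// before it WITHOUT checking the left boundary.
// Precondition: some element to the left of index last is <= v[last].
void unguarded_linear_insert(std::vector<int>& v, std::size_t last) {
    int val = v[last];
    std::size_t next = last - 1;
    while (val < v[next]) {      // the sentinel guarantees this becomes false
        v[last] = v[next];       // shift the larger element one slot right
        last = next;
        --next;
    }
    v[last] = val;               // insert at the position found
}

// Sketch of __insertion_sort: a new minimum goes straight to the front,
// so v[0] is always a valid sentinel for the unguarded insert.
void insertion_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] < v[0]) {
            int val = v[i];
            for (std::size_t j = i; j > 0; --j) v[j] = v[j - 1];  // shift v[0, i) right
            v[0] = val;
        } else {
            unguarded_linear_insert(v, i);   // v[0] <= v[i] stops the loop
        }
    }
}
```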

That covers __insertion_sort. Only __unguarded_insertion_sort remains.

//stl_algo.h, line 1860
template<typename _RandomAccessIterator, typename _Compare>
inline void
__unguarded_insertion_sort(_RandomAccessIterator __first,
                           _RandomAccessIterator __last, _Compare __comp)
{
	for (_RandomAccessIterator __i = __first; __i != __last; ++__i)
		std::__unguarded_linear_insert(__i,
		                               __gnu_cxx::__ops::__val_comp_iter(__comp));
}

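Here the code goes straight into the inner loop of insertion sort. __unguarded_linear_insert does no bounds checking, and here it is called directly, which saves many comparisons; again, this is clearly for efficiency. The precondition is that somewhere to the left of the range, within the valid sequence, there is an element no greater than every element in the range. Otherwise an out-of-bounds access could occur. In __final_insertion_sort this holds. After __introsort_loop, the smallest element is guaranteed to lie within the first _S_threshold elements. So once __insertion_sort has sorted those elements, *__first is the minimum of the whole sequence.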
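Here is a sketch of the __final_insertion_sort strategy on std::vector<int>, with hypothetical names. For the first 16 elements it swaps in a textbook guarded insertion sort instead of libstdc++'s __insertion_sort; the remaining elements use unguarded inserts. The input in the usage check imitates what __introsort_loop leaves behind: blocks of 10 values, each block internally reversed but entirely smaller than the next block.

```cpp
#include <cstddef>
#include <vector>

const std::size_t kThreshold = 16;   // plays the role of _S_threshold

// Textbook guarded insertion sort on v[0, n): checks j > 0 on every step.
void guarded_insertion_sort(std::vector<int>& v, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        int val = v[i];
        std::size_t j = i;
        for (; j > 0 && val < v[j - 1]; --j) v[j] = v[j - 1];
        v[j] = val;
    }
}

// Unguarded insert of v[i]: relies on some element to its left being <= v[i].
void unguarded_insert(std::vector<int>& v, std::size_t i) {
    int val = v[i];
    for (; val < v[i - 1]; --i) v[i] = v[i - 1];
    v[i] = val;
}

// Precondition (what __introsort_loop establishes in libstdc++):
// the smallest element lies in v[0, kThreshold).
void final_insertion_sort(std::vector<int>& v) {
    if (v.size() > kThreshold) {
        guarded_insertion_sort(v, kThreshold);   // v[0] becomes the global minimum
        for (std::size_t i = kThreshold; i < v.size(); ++i)
            unguarded_insert(v, i);              // v[0] stops every inner loop
    } else {
        guarded_insertion_sort(v, v.size());
    }
}
```

Because every element starts fewer than 10 positions from its final slot, each unguarded insert shifts at most a handful of elements. This is the "nearly sorted, close to O(n)" situation described earlier.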

With that, the only part of sort left is heapsort. That code is fairly long and I haven't read it yet; I'll write about it later.