No. 05 - The Least k Numbers

No. 05 - The Least k Numbers


Question: Please find out the least  k numbers out of  n numbers. For example, if given the 8 numbers 4, 5, 1, 6, 2, 7, 3 and 8, please return the least 4 numbers 1, 2, 3 and 4.

Analysis: The naïve solution is sort the  n input numbers increasingly, and the least  k numbers should be the first  k numbers. Since it needs to sort, its time complexity is  O(nlog n). Interviewers will ask us explore more efficient solutions.

Solution 1: O(nlogk) time efficiency, be suitable for data with huge size

A data container with capacity  k is firstly created to store the least  k numbers, and then a number is read out of the  n input numbers at each time.   If the container has less than  k numbers, the number read at current round (denoted as  num) is inserted into container directly. If it contains  knumbers already,  num cannot be inserted directly any more. However, it may replace an existing number in the container.  We get the maximum number of the  k numbers in the container, and compare it with  num. If  num is less than the maximum number, we replace the maximum number with  num. Otherwise we discard  num, since we already have  k numbers in the container which are all less than  num and it cannot be one of the least  k numbers.

Three steps may be required when a number is read and the container is full: The first step is to find the maximum number, secondly we may delete the maximum number, and at last we may insert a new number. The second and third steps are optional, which depend on whether the number read at current round is greater than the maximum number in container or not. If we implement the data container as a binary tree, it costs O(log k)  time for these three steps. Therefore, the overall time complexity is O( nlog k)  for n input numbers.

We have different choices for the data container. Since we need to get the maximum number out of k numbers, it intuitively might a maximum heap. In a maximum heap, its root is always greater than its children, so it costs O(1) time to get the maximum number. However, it takes O(log k)  time to insert and delete a number.

We have to write a lot of code for a maximum heap, and it is too difficult in the dozens-of-minute interview. We can also implement it as a red-black tree. A red-black tree classifies its nodes into red and black categories, and assure that it is somewhat balanced based on a set of rules. Therefore, it costs O(log k) time to find, insert and delete a number. The classes set and multiset in STL are all based on red-black trees. We may use data containers in STL directly if our interviewers are not against it. The following sample code is based on the multiset in STL:

typedef multiset< int, greater< int> >            intSet;
typedef multiset< int, greater< int> >::iterator  setIterator;

void GetLeastNumbers( const vector< int>& data, intSet& leastNumbers,  int k)
{
    leastNumbers.clear();

     if(k < 1 || data.size() < k)
         return;

    vector< int>::const_iterator iter = data.begin();
     for(; iter != data.end(); ++ iter)
    {
         if((leastNumbers.size()) < k)
            leastNumbers.insert(*iter);

         else
        {
            setIterator iterGreatest = leastNumbers.begin();

             if(*iter < *(leastNumbers.begin()))
            {
                leastNumbers.erase(iterGreatest);
                leastNumbers.insert(*iter);
            }
        }
    }
}

Solution 2: O(n) time efficiency, be suitable only when we can reorder the input

We can also utilize the function Partition in quick sort to solve this problem with a hypothesis. It assumes that  n input numbers are contained in an array. If it takes the  k-th number as a pilot to partition the input array, all of numbers less than the  k-th number should be at the left side and other greater ones should be at the right side. The  k numbers at the left side are the least  knumbers after the partition. We can develop the following code according to this solution:

void GetLeastNumbers( int* input,  int n,  int* output,  int k)
{
     if(input == NULL || output == NULL || k > n || n <= 0 || k <= 0)
         return;

     int start = 0;
     int end = n - 1;
     int index = Partition(input, n, start, end);
     while(index != k - 1)
    {
         if(index > k - 1)
        {
            end = index - 1;
            index = Partition(input, n, start, end);
        }
         else
        {
            start = index + 1;
            index = Partition(input, n, start, end);
        }
    }

     for( int i = 0; i < k; ++i)
        output[i] = input[i];
}

Comparison between two solutions

The second solution based on the function Partition costs only O( n) time, so it is more efficient than the first one. However, it has two obvious limitations: One limitation is that it needs to load all input numbers into an array, and the other is that we have to reorder the input numbers.
Even though the first takes more time, the second solution does have the two limitations as the first one. It is not required to reorder the input numbers (data in the code above). We read a number from data at each round, and all write operations are taken in the containerleastNumbers. It does not require loading all input number into memory at one time, so it is suitable for huge-size data. Supposing our interview asks us get the least  k numbers from a huge-size input. Obviously we cannot load all data with huge size into limited memory at one time. We can read a number from auxiliary space (such as disk) at each round with the first solution, and determine whether we need to insert it into the container leastNumbers. It works once memory can accommodate leastNumbers, so it is especially works when  n is huge and  k is small.

The characters of these two solutions can be summarized in Table 1:


First Solution
Second Solution
Time complexity
O(n*logk)
O(n)
Reorder input numbers?
No
Yes
Suitable for huge-size data?
Yes
No
Table 1: Pros and cons of two solutions

Since each solution has its own pros and cons, candidates had better to ask interviews for more requirements and details to choose the most suitable solution, including the input data size and whether it is allowed to reorder the input numbers.

The author Harry He owns all the rights of this post. If you are going to use part of or the whole of this ariticle in your blog or webpages,  please add a reference to  http://codercareer.blogspot.com/. If you are going to use it in your books, please contact me (zhedahht@gmail.com) . Thanks.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值