The problem is:
Given an arbitrarily large file (e.g. 200 GB) containing one number per line, and a number N, write a program that outputs the largest N numbers, highest first. Analyse the run time/space complexity of your approach.
The intuitive method for finding the top N is to sort all the numbers first and then return the first or last N of them. However, sorting usually takes O(n log n) time, where n is the total count of numbers in the file, and when the data set is very large it is impossible to sort it directly in memory.
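To make the cost of the intuitive method concrete, here is a minimal sketch of the sort-everything approach. It assumes the input holds one integer per line and that blank lines can be skipped. The function name `top_n_by_sorting` and its parameter `count` (standing in for N) are hypothetical and chosen only for this illustration.

```python
from typing import Iterable, List


def top_n_by_sorting(lines: Iterable[str], count: int) -> List[int]:
    """Return the `count` largest integers found in `lines`, highest first.

    Assumption: each non-blank line holds exactly one integer.
    """
    # Every number is loaded into a list, so memory use grows linearly
    # with the input size. This is the step that fails for a huge file.
    numbers = [int(line) for line in lines if line.strip()]
    # Sorting in descending order costs O(n log n) comparisons.
    numbers.sort(reverse=True)
    # The first `count` entries are the largest, highest first.
    return numbers[:count]


# Small in-memory sample standing in for the lines of a file.
sample = ["7\n", "42\n", "3\n", "19\n", "\n", "8\n"]
print(top_n_by_sorting(sample, 3))  # prints [42, 19, 8]
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines. The sketch works for small inputs, but the list of all numbers must fit in memory, which is exactly the limitation described above.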