1. The ORDER BY clause is familiar from other SQL dialects. It performs a total ordering of
the query result set. This means that all the data is passed through a single reducer,
which may take an unacceptably long time to execute for larger data sets.
Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby
performing a local ordering, where each reducer’s output will be sorted. Better perfor-
mance is traded for total ordering.
the query result set. This means that all the data is passed through a single reducer,
which may take an unacceptably long time to execute for larger data sets.
Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby
performing a local ordering, where each reducer’s output will be sorted. Better perfor-
mance is traded for total ordering.