...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
The
algorithm used in this library is designed to provide best possible worst-case
performance, while still being cache-friendly. It provides the better of
𝑶(N*log(K/S + S)) and 𝑶(N*log(N))
worst-case time, where K is the log of the range.
The log of the range is normally the length in bits of the data type; 32
for a 32-bit integer.
spreadsort
flash_sort
(another hybrid
algorithm), by comparison is 𝑶(N) for evenly distributed
lists. The problem is, flash_sort
is merely an MSD radix
sort combined with 𝑶(N*N) insertion sort to
deal with small subsets where the MSD Radix Sort is inefficient, so it
is inefficient with chunks of data around the size at which it switches
to insertion_sort
, and
ends up operating as an enhanced MSD Radix Sort. For uneven distributions
this makes it especially inefficient.
and integer_sort
use introsort
instead, which provides 𝑶(N*log(N)) performance for
these medium-sized pieces. Also, float_sort
flash_sort
's
𝑶(N) performance for even distributions comes at the
cost of cache misses, which on modern architectures are extremely expensive,
and in testing on modern systems ends up being slower than cutting up the
data in multiple, cache-friendly steps. Also worth noting is that on most
modern computers, log2(available
RAM)/log2(L1 cache
size)
is around 3, where a cache miss takes more than 3 times as long as an in-cache
random-access, and the size of max_splits is tuned
to the size of the cache. On a computer where cache misses aren't this
expensive, max_splits could be increased to a large
value, or eliminated entirely, and integer_sort/float_sort
would have the same 𝑶(N) performance on even distributions.
Adaptive Left Radix (ALR) is similar to flash_sort
,
but more cache-friendly. It still uses insertion_sort. Because ALR uses
𝑶(N*N) insertion_sort
,
it isn't efficient to use the comparison-based fallback sort on large lists,
and if the data is clustered in small chunks just over the fallback size
with a few outliers, radix-based sorting iterates many times doing little
sorting with high overhead. Asymptotically, ALR is still 𝑶(N*log(K/S
+ S)), but with a very small S (about 2
in the worst case), which compares unfavorably with the 11 default value
of max_splits for Spreadsort.
ALR also does not have the 𝑶(N*log(N)) fallback, so
for small lists that are not evenly distributed it is extremely inefficient.
See the alrbreaker
and
binaryalrbreaker
testcases
for examples; either replace the call to sort with a call to ALR and update
the ALR_THRESHOLD at the top, or as a quick comparison make get_max_count return
ALR_THRESHOLD
(20 by default
based upon the paper). These small tests take 4-10 times as long with ALR
as std::sort
in the author's testing, depending on the test system, because they are
trying to sort a highly uneven distribution. Normal Spreadsort does much
better with them, because get_max_count
is designed around minimizing worst-case runtime.
burst_sort
is an efficient
hybrid algorithm for strings that uses substantial additional memory.
uses minimal additional memory by comparison. Speed comparisons between
the two haven't been made, but the better memory efficiency makes string_sort
more general.
string_sort
postal_sort
and
are similar. A direct performance comparison would be welcome, but an efficient
version of string_sort
postal_sort
was not found in a search for source.
is most similar to the American
flag sort algorithm. The main difference is that it doesn't bother
trying to optimize how empty buckets/piles are handled, instead just checking
to see if all characters at the current index are equal. Other differences
are using std::sort
as the fallback algorithm, and a larger fallback size (256 vs. 16), which
makes empty pile handling less important.
string_sort
Another difference is not applying the stack-size restriction. Because
of the equality check in
,
it would take m*m memory worth of strings to force
string_sort
to create a stack of depth m. This problem isn't a
realistic one on modern systems with multi-megabyte stacksize limits, where
main memory would be exhausted holding the long strings necessary to exceed
the stacksize limit. string_sort
can be thought of as modernizing American
flag sort to take advantage of introsort
as a fallback algorithm. In the author's testing, American
flag sort (on string_sort
std::strings
)
had comparable runtime to introsort,
but making a hybrid of the two allows reduced overhead and substantially
superior performance.