...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
is a hybrid radix-based/comparison-based algorithm that sorts strings of
characters (or arrays of binary data) in ascending order.
string_sort
The simplest version (no functors) sorts strings of items that can cast to an unsigned data type (such as an unsigned char), have a < operator, have a size function, and have a data() function that returns a pointer to an array of characters, such as a std::string. The functor version can sort any data type that has a strict weak ordering, via templating, but requires definitions of a get_char (acts like x[offset] on a string or a byte array), get_length (returns length of the string being sorted), and a comparison functor. Individual characters returned by get_char must support the != operator and have an unsigned value that defines their lexicographical order.
This algorithm is not efficient for character types larger than 2 bytes each, and is optimized for one-byte character strings. For this reason, std::sort will be called instead if the character type is of size > 2.
has a special optimization for identical substrings. This adds some overhead
on random data, but identical substrings are common in real strings.
string_sort
reverse_string_sort sorts strings in reverse (decending) order, but is otherwise
identical.
is sufficiently flexible that it should sort any data type that std::sort
can, assuming the user provides appropriate functors that index into a key.
string_sort
See stringfunctorsample.cpp for an example of how to sort structs using a string key and all functors:
struct lessthan { inline bool operator()(const DATA_TYPE &x, const DATA_TYPE &y) const { return x.a < y.a; } };
struct bracket { inline unsigned char operator()(const DATA_TYPE &x, size_t offset) const { return x.a[offset]; } };
struct getsize { inline size_t operator()(const DATA_TYPE &x) const{ return x.a.size(); } };
and these functors are used thus:
string_sort(array.begin(), array.end(), bracket(), getsize(), lessthan());
See generalizedstruct.cpp for a working example of a generalized approach to sort structs by a sequence of integer, float, and multiple string keys using string_sort:
struct DATA_TYPE { time_t birth; float net_worth; string first_name; string last_name; }; static const int birth_size = sizeof(time_t); static const int first_name_offset = birth_size + sizeof(float); static const boost::uint64_t base_mask = 0xff; struct lessthan { inline bool operator()(const DATA_TYPE &x, const DATA_TYPE &y) const { if (x.birth != y.birth) { return x.birth < y.birth; } if (x.net_worth != y.net_worth) { return x.net_worth < y.net_worth; } if (x.first_name != y.first_name) { return x.first_name < y.first_name; } return x.last_name < y.last_name; } }; struct bracket { inline unsigned char operator()(const DATA_TYPE &x, size_t offset) const { // Sort date as a signed int, returning the appropriate byte. if (offset < birth_size) { const int bit_shift = 8 * (birth_size - offset - 1); unsigned char result = (x.birth & (base_mask << bit_shift)) >> bit_shift; // Handling the sign bit. Unnecessary if the data is always positive. if (offset == 0) { return result ^ 128; } return result; } // Sort a signed float. This requires reversing the order of negatives // because of the way floats are represented in bits. if (offset < first_name_offset) { const int bit_shift = 8 * (first_name_offset - offset - 1); unsigned key = float_mem_cast<float, unsigned>(x.net_worth); unsigned char result = (key & (base_mask << bit_shift)) >> bit_shift; // Handling the sign. if (x.net_worth < 0) { return 255 - result; } // Increasing positives so they are higher than negatives. if (offset == birth_size) { return 128 + result; } return result; } // Sort a string that is before the end. This approach supports embedded // nulls. If embedded nulls are not required, then just delete the "* 2" // and the inside of the following if just becomes: // return x.first_name[offset - first_name_offset]; const unsigned first_name_end_offset = first_name_offset + x.first_name.size() * 2; if (offset < first_name_end_offset) { int char_offset = offset - first_name_offset; // This signals that the string continues. if (!(char_offset & 1)) { return 1; } return x.first_name[char_offset >> 1]; } // This signals that the string has ended, so that shorter strings come // before longer ones. if (offset == first_name_end_offset) { return 0; } // The final string needs no special consideration. return x.last_name[offset - first_name_end_offset - 1]; } }; struct getsize { inline size_t operator()(const DATA_TYPE &x) const { return first_name_offset + x.first_name.size() * 2 + 1 + x.last_name.size(); } };
string_sort(array.begin(), array.end(), bracket(), getsize(), lessthan());
Other examples:
Sort structs using a string key and indexing functors.
Sort structs using a string keynd all functors in reverse order.