There are several reasons for studying t
simple sorting algorithms in some detail. First, they provide a relatively
painless way to learn terminology and basic mechanisms for sorting algorithms
so that we get an adequate background for studying the more sophisticated
algorithms. Second, there are many applications of sorting where it’s better to
use these simple methods than the more powerful general-purpose methods.
Finally, some of the simple methods extend to better general purpose methods or
can be used to improve the efficiency of more powerful methods. The most
prominent example of this is seen in recursive sorts which “divide and conquer”
big files into many small ones. Obviously, it is advantageous to know the best
way to deal with small files in such situations.
As mentioned above, there are several sorting
applications in which a relatively simple algorithm may be the method of
choice. Sorting programs are often used only once (or only a few times). If the
number of items to be sorted is not too large (say, less than five hundred
elements), it may well be more efficient just to run a simple method than to
implement and debug a complicated method. Elementary methods are always suitable
for small files (say, less than fifty elements); it is unlikely that a
sophisticated algorithm would be justified for a small file, unless a very
large number of such files are to be sorted. Other types of files that are
relatively easy to sort are ones that are already almost sorted (or already
sorted) or ones that contain large numbers of equal keys. Simple methods can do
much better on such well-structured files than general-purpose methods.
Insertion sort is the method often used by
people to sort bridge hands: consider the elements one at a time, inserting
each in its proper place among those already considered (keeping them sorted).
The element being considered is inserted merely by moving larger elements one
position to the right, then inserting the element into the vacated position.
The code for this algorithm is straightforward.
Insertion Sort by Diminishing Increment A
refinement of the straight insertion sort was proposed by D. L. Shell in l959 (Shell 1959). The method is
explained and demonstrated on our standard example of eight items (see Table
1). First, all items that are four positions apart are grouped and sorted
separately. This process is called a 4-sort. In this example of eight items,
each group contains exactly two items. After this first pass, the items are
regrouped into groups with items two positions apart and then sorted anew. This
process is called a 2-sort. Finally, in a third pass, all items are sorted in
an ordinary sort or 1-sort. One may at first wonder if the necessity of several
sorting passes, each of which involves all items, does not introduce more work
than it saves. However, each sorting step over a chain either involves
relatively few items or the items are already quite well ordered and
comparatively few rearrangements are required. It is obvious that the method
results in an ordered array, and it is fairly obvious that each pass profits
from previous passes (since each i-sort combines two groups sorted in the
preceding 2i-sort).
It is also obvious that
any sequence of increments is acceptable, as long as the last one is unity,
because in the worst case the last pass does all the work. It is, however, much
less obvious that the method of diminishing increments yields even better
results with increments other than powers of 2.
The procedure is therefore developed without
relying on a specific sequence of increments. The T increments are denoted by h0,
h1, ... , hT-1 with the conditions ht-1 = 1, hi+1 < hi
Analysis of Shellsort.
The analysis of this algorithm poses some
very difficult mathematical problems, many of which have not yet been solved.
In particular, it is not known which choice of increments yields (gap function)
the best results. One surprising fact, however, is that they should not be multiples
of each other. This will avoid the phenomenon evident from the example given
above in which each sorting pass combines two chains that before had no
interaction whatsoever. It is indeed desirable that interaction between various
chains takes place as often as possible, and the following theorem holds: If a
k-sorted sequence is i-sorted, then it remains k-sorted.
1, 4, 13, 40, 121, ...where hk-1 = 3hk+1, ht
= 1, and t = k×log3(n) - 1.He also recommends the sequence 1, 3, 7, 15, 31, ...
where hk-1 = 2hk+1, ht = 1, and t = k×log2(n) - 1(Nievergelt and Hinrichs 1999)
.
For
the latter choice, mathematical analysis yields an effort proportional to n2
required for sorting n items with the Shellsort algorithm. Although this is a
significant improvement over n2 , we will not expound further on this method,
since even better algorithms are known.
Deciding
which gap sequence to use is difficult. Every gap sequence that contains 1
yields a correct sort; however, the properties of thus obtained versions of
Shellsort may be very different.
The
table below compares most proposed gap sequences published so far((Shell 1959), (Frank and Lazarus 1960), (Hibbard 1963), (Papernov and Stasevich 1965), (Incerpi and Sedgewick 1985) (Sedgewick 1986) (Gonnet, Baeza-Yates et al. 1991) (Tokuda 1992) (Ciura 2001)). Some of them have decreasing elements that depend on the size of the
sorted array (N). Others are increasing infinite sequences, whose elements less
than N should be used in reverse order.
R and example of code and benchmarking:
The
Rcpp package has become the most widely used language extension for R, the powerful environment and language
for computing with data. As of September 2016, 778 packages on CRAN and a further 76 on BioConductor deploy Rcpp to extend R,
to accelerate computations and to connect to other C++ projects. Rcpp
provides a powerful API on top of R, permitting direct interchange of rich R
objects (including S3, S4 or Reference Class objects) between R and C++. Rcpp
sugar gives syntactic sugar such as vectorised C++ expression; Rcpp modules
provide easy extensibility using declarations and Rcpp attributes greatly
facilitates code integration.
Shell
sort is used naturally in R using order and sort function in the basic package.
The source file of the Rcpp implementation is available at: M. E., Macherki
(2016): Shell Sort Algorithm:Rcpp. figshare.
References
Ciura,
M. (2001). Best increments for the average case of shellsort. International
Symposium on Fundamentals of Computation Theory, Springer.
Frank,
R. and R. Lazarus (1960). "A high-speed sorting procedure."
Communications of the ACM 3(1): 20-22.
Gonnet,
G. H., et al. (1991). Lexicographical indices for text: Inverted les vs pat
trees, Technical Report TR-OED-91-01, University of Waterloo.
Hibbard,
T. N. (1963). "A simple sorting algorithm." Journal of the ACM (JACM)
10(2): 142-150.
Incerpi,
J. and R. Sedgewick (1985). "Improved upper bounds on Shellsort."
Journal of Computer and System Sciences 31(2): 210-224.
Nievergelt, J. and K. H. Hinrichs (1999).
Algorithms & data structures, vdf Hochschulverl. an der ETH.
Papernov, A. and G. Stasevich (1965). "A
method of information sorting in computer memories." Problemy Peredachi
Informatsii 1(3): 81-98.
Sedgewick, R. (1986). "A new upper bound
for Shellsort." Journal of Algorithms 7(2): 159-173.
Shell, D. L. (1959). "A high-speed
sorting procedure." Communications of the ACM 2(7): 30-32.
Tokuda, N. (1992). An improved Shellsort.
Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software,
Architecture-Information Processing'92, Volume 1-Volume I, North-Holland
Publishing Co.