biology daily - the biology and biochemistry encyclopedia

Insertion sort

Insertion sort is a simple sorting algorithm in which the sorted array (or list) is built one entry at a time. On large inputs it is much less efficient than more advanced algorithms such as quicksort, heapsort, or merge sort, but it has several advantages:

  • Simple to implement
  • Efficient on (quite) small data sets
  • Efficient on data sets which are already substantially sorted
  • Stable (does not change the relative order of elements with equal keys)
  • In-place
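
The stability property can be illustrated with a short sketch. The function name `insertion_sort_by_key` and the (name, score) records are assumptions made for this example only; they do not come from the article.

```python
def insertion_sort_by_key(records, key):
    """Sort records in place by key(record); equal keys keep their input order."""
    for i in range(1, len(records)):
        current = records[i]
        j = i
        # The strict '>' never moves an element past one with an equal key,
        # which is what makes the sort stable.
        while j > 0 and key(records[j - 1]) > key(current):
            records[j] = records[j - 1]
            j -= 1
        records[j] = current
    return records


# Hypothetical data: (name, score) pairs with ties on the score.
people = [("ann", 2), ("bob", 1), ("cy", 2), ("dee", 1)]
insertion_sort_by_key(people, key=lambda r: r[1])
print(people)  # [('bob', 1), ('dee', 1), ('ann', 2), ('cy', 2)]
```

Within each score, the names appear in the same order as in the input ("bob" before "dee", "ann" before "cy").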

In abstract terms, each iteration of an insertion sort removes an element from the input data, inserting it at the correct position in the already sorted list, until no elements are left in the input. The choice of which element to remove from the input is arbitrary and can be made using almost any choice algorithm.

Sorting is typically done in-place. After k iterations, the result array contains the first k entries of the input array in sorted order. In each step, the first remaining entry of the input is removed and inserted into the result at the right position, thus extending the result:

 [Figure: the array right before insertion of x]

becomes:

 [Figure: the array right after insertion of x]

with each element > x copied to the right as it is compared against x.

The algorithm can be described as:

  1. Start with the result being the first element of the input.
  2. Loop over the input array until it is empty, "removing" the first remaining (leftmost) element.
  3. Compare the removed element against the current result, starting from the highest (rightmost) element and working left towards the lowest element.
  4. If the removed element is lower than the current result element, copy that result element one position to the right to make room for the new element, and repeat with the next lower result element.
  5. Otherwise, the new element is in the correct location; save it in the cell vacated by the last copied result element, and start again from (2) with the next input element.
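
To make the steps concrete, the following sketch records the array after each insertion. The helper name `insertion_sort_trace` is hypothetical and exists only for illustration; it works on a copy so the input is left unchanged.

```python
def insertion_sort_trace(values):
    """Return the array state before sorting and after each insertion."""
    a = list(values)          # work on a copy
    states = [list(a)]        # initial state: the result is just a[0]
    for i in range(1, len(a)):
        x = a[i]              # step 2: "remove" the leftmost remaining input element
        j = i
        # Steps 3 and 4: walk left through the result, copying larger
        # elements one position to the right.
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x              # step 5: store x in the vacated cell
        states.append(list(a))
    return states


for state in insertion_sort_trace([3, 1, 2]):
    print(state)
# [3, 1, 2]
# [1, 3, 2]
# [1, 2, 3]
```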


Implementations

Python

def insertsort(array):
    for removed_index in range(1, len(array)):
        removed_value = array[removed_index]
        insert_index = removed_index
        while insert_index > 0 and array[insert_index - 1] > removed_value:
            array[insert_index] = array[insert_index - 1]
            insert_index = insert_index - 1
        array[insert_index] = removed_value

C

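 #include <stddef.h>  /* size_t */

 void insertSort(int a[], size_t length) {
     size_t i, j;
 
     for (i = 1; i < length; i++) {
         int value = a[i];
         /* j is unsigned, so "j >= 0" would always be true and j-- would
            wrap around at zero; test j > 0 before reading a[j - 1]. */
         j = i;
         while (j > 0 && a[j - 1] > value) {
             a[j] = a[j - 1];
             j--;
         }
         a[j] = value;
     }
 }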

Haskell

 insert :: Ord a => a -> [a] -> [a]
 insert item []  = [item]
 insert item (h:t) | item <= h = item:h:t
                   | otherwise = h:(insert item t)

 insertsort :: Ord a => [a] -> [a]
 insertsort []    = []   
 insertsort (h:t) = insert h (insertsort t)

ML

fun insertsort [] = []
  | insertsort (x::xs) =
    let fun insert (x:real, []) = [x]
          | insert (x:real, y::ys) =
              if x<=y then x::y::ys
              else y::insert(x, ys)
    in insert(x, insertsort xs)
    end;

Perl

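 sub insert_sort {
     # Sorts the caller's array in place through the @_ aliases.
     for (my $i = 1; $i <= $#_; $i++) {
         my ($j, $val) = ($i - 1, $_[$i]);
         # Shift larger elements right; an explicit loop avoids relying on
         # the evaluation order of $j-- within a single assignment.
         while ($j >= 0 && $_[$j] > $val) {
             $_[$j + 1] = $_[$j];
             $j--;
         }
         $_[$j + 1] = $val;
     }
 }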

Java

   void insertion_sort (int[] A) {
       int i;
       for (int j = 1; j < A.length; j++) {
           int a = A[j];
           i = j - 1;
           while (i >= 0 && A[i] > a) {
               A[i + 1] = A[i];
               i--;
           }
           A[i + 1] = a;
       }
   }

Good and bad input cases

In the best case of an already sorted array, insertion sort as implemented above takes O(n) time: in each iteration, the first remaining element of the input is compared only with the last element of the result. It takes O(n²) time in the average and worst cases, which makes it impractical for sorting large numbers of elements. However, insertion sort's inner loop is very fast, which often makes it one of the fastest algorithms for sorting small numbers of elements, typically fewer than about 10.
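
The best and worst cases can be checked by counting comparisons. This is a minimal sketch; the function name `count_comparisons` and the input size of 8 are arbitrary choices for illustration.

```python
def count_comparisons(values):
    """Insertion-sort a copy of values and return the number of key comparisons."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0:
            comparisons += 1          # one comparison of x against a[j - 1]
            if a[j - 1] <= x:
                break                 # x is already in position
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return comparisons


n = 8
print(count_comparisons(range(n)))          # sorted input: n - 1 = 7
print(count_comparisons(range(n, 0, -1)))   # reversed input: n(n - 1)/2 = 28
```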

Variants

D. L. Shell made substantial improvements to the algorithm; the modified version is called Shell sort. It compares elements separated by a distance that decreases on each pass. Shell sort has distinctly better running times in practice and is often a good choice.
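
A minimal sketch of the idea follows. The gap sequence used here (halving the gap on each pass) is an assumption chosen for simplicity; other sequences exist and can perform better.

```python
def shell_sort(a):
    """Sort list a in place with a gapped insertion sort; returns a."""
    gap = len(a) // 2                 # assumed gap sequence: n/2, n/4, ..., 1
    while gap > 0:
        # An insertion sort over elements that are `gap` positions apart.
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
        gap //= 2
    return a


print(shell_sort([5, 2, 9, 1, 5, 6, 0]))  # [0, 1, 2, 5, 5, 6, 9]
```

The final pass with a gap of 1 is an ordinary insertion sort, run on data that the earlier passes have already brought close to sorted order.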

If comparisons are very costly compared to swaps, as is the case for example with string keys stored by reference, then using binary insertion sort can be a good strategy. Binary insertion sort employs binary search to find the right place to insert each new element, and therefore performs approximately log₂(n!) comparisons in the worst case, which is Θ(n log n). The algorithm as a whole still takes Θ(n²) time on average because of the series of swaps required for each insertion, and since it always uses binary search, the best case is no longer O(n) but O(n log n).
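
One way to sketch binary insertion sort is with the standard library's `bisect` module. Using `bisect_right` (rather than `bisect_left`) is a choice made here to keep the sort stable; the function name is hypothetical.

```python
import bisect


def binary_insertion_sort(a):
    """Sort list a in place, locating each insertion point by binary search."""
    for i in range(1, len(a)):
        x = a[i]
        # Search only the sorted prefix a[0:i]. bisect_right returns the
        # position after any elements equal to x, preserving their order.
        pos = bisect.bisect_right(a, x, 0, i)
        # Shift a[pos:i] one slot to the right, then place x in the gap.
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = x
    return a


print(binary_insertion_sort([4, 2, 7, 1, 3]))  # [1, 2, 3, 4, 7]
```

The search needs only O(log n) comparisons per element, but the shift is still linear in the worst case, which is why the overall running time stays quadratic.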

To avoid having to make a series of swaps for each insertion, we could instead store the input in a linked list, which allows us to insert and delete elements in constant time. Unfortunately, a linked list offers no random access, so binary search on it is not practical and we still spend Θ(n²) time searching on average. If we instead use a more sophisticated data structure such as a heap or binary tree, we can significantly decrease both search and insert time. This is the essence of heapsort and binary tree sort.
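
As a rough illustration of the heap idea, the sketch below uses Python's standard `heapq` module. It is not an in-place heapsort and is not stable; it only shows that a heap makes each removal of the next element logarithmic. The function name is an assumption for this example.

```python
import heapq


def heap_based_sort(values):
    """Return a new sorted list by repeatedly popping the minimum from a heap."""
    heap = list(values)
    heapq.heapify(heap)                       # build the heap in O(n)
    # Each heappop removes the smallest remaining element in O(log n).
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_based_sort([3, 1, 2]))  # [1, 2, 3]
```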

Comparisons to other sorts

Insertion sort is very similar to bubble sort. In bubble sort, after k passes through the array, the k largest elements have bubbled to the top. (Or the k smallest elements have bubbled to the bottom, depending on which way you do it.) In insertion sort, after k passes through the array, you have a run of k sorted elements at the bottom of the array. Each pass inserts another element into the sorted run. So with bubble sort, each pass takes less time than the previous one, but with insertion sort, each pass may take more time than the previous one.

In contrast, C. A. R. Hoare's quicksort works by recursively dividing the array to be sorted into smaller runs, each of which is sorted separately; highly optimized implementations of quicksort often use insertion sort to sort these runs once they get "small enough".
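
A hybrid of this kind might be sketched as follows. The cutoff of 10, the middle-element pivot, and all function names are assumptions made for this example; production implementations tune these choices.

```python
CUTOFF = 10  # assumed threshold for "small enough"; real libraries tune this


def insertion_sort_range(a, lo, hi):
    """Insertion-sort the slice a[lo..hi] (inclusive) in place."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i
        while j > lo and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x


def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort a[lo..hi] in place, switching to insertion sort on small runs."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort_range(a, lo, hi)
        return a
    pivot = a[(lo + hi) // 2]         # middle element as pivot (an assumption)
    i, j = lo, hi
    while i <= j:                     # Hoare-style partition around the pivot
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)        # left run: elements <= pivot
    hybrid_quicksort(a, i, hi)        # right run: elements >= pivot
    return a
```

Calling `hybrid_quicksort(data)` on a list sorts it in place and also returns it.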

07-14-2008 23:18:10
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.
BiologyDaily.com 2005.