1. Data Structures & Algorithm
Sorting
Session VII
Dr. V.Umadevi M.Sc(CS &IT). M.Tech (IT).,
M.Phil., PhD., D.Litt.,
Director, Department of Computer Science,
Jairams Arts and Science College, Karur.
Insertion Sort
• One of the simplest methods to sort an array is an insertion sort.
• An example of an insertion sort occurs in everyday life while playing cards.
– To sort the cards in your hand you extract a card, shift the remaining
cards, and then insert the extracted card in the correct place.
– This process is repeated until all the cards are in the correct sequence.
• Def: Sort by repeatedly taking the next item and inserting it into the final
data structure in its proper order with respect to items already
inserted. Run time is O(n^2) because of the moves.
• It starts by considering the first two elements of the array; if they are out of
order, they are interchanged. Then the third item is considered
and inserted into its proper place, and so on.
• An Insertion Sort is one that sorts a set of records by inserting records into
an existing sorted file.
• Take elements one by one.
• Insert the element in its proper position among those already taken and
sorted in a new collection.
• Repeat until all elements have been taken and sorted in proper order.
• The simplest implementation of this requires two list structures - the source
Source code for Insertion sort
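#include <limits.h>   /* INT_MIN, used as the sentinel value */

/* in C: sort a[1..N] in place. NB. indices run 1 to N; a[0] holds the sentinel */
void Insertion(int a[], int N)
{
    int i, j, ai;
    a[0] = INT_MIN;              /* a sentinel: stops the inner loop at j == 0 */
    for (i = 2; i <= N; i++)
    {                            /* invariant: a[1..i-1] sorted */
        ai = a[i];
        j = i - 1;
        while (a[j] > ai)
        {                        /* invariant: a[j+2..i] > ai */
            a[j+1] = a[j];       /* shift the larger element one place right */
            j--;
        }                        /* a[1..j] <= ai < a[j+2..i] */
        a[j+1] = ai;             /* a[1..i] is sorted */
    }
} /* Insertion */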
Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists.
Analysis of Insertion sort
• Insertion sort is an in-place sort.
• No extra memory is required beyond a constant number of variables.
• Insertion sort is also a stable sort.
• Assuming there are n elements in the array, we must index through
n - 1 entries.
• For each entry, we may need to examine and shift up to n - 1 other
entries, resulting in an O(n^2) algorithm.
• The number of comparisons of elements in the worst case is
(N-1) + (N-2) + ... + 1 = (N-1)*N/2, i.e. O(N^2).
• The average case needs about (N-1)*N/4 comparisons, i.e. it is also O(N^2).
• The best case occurs when the array is already sorted,
and is O(N).
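The comparison counts above can be checked with a small instrumented version. The following is only a sketch: the function name insertion_sort_count is hypothetical, and unlike the sentinel version it uses 0-based indexing with an explicit bounds test.

```c
#include <stddef.h>

/* Hypothetical helper (not from the slides): insertion-sorts a[0..n-1]
   in place and returns the number of key comparisons it performed. */
long insertion_sort_count(int a[], size_t n)
{
    long comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0) {
            comparisons++;            /* compare key with a[j-1] */
            if (a[j - 1] <= key)      /* <= keeps equal keys in order (stable) */
                break;
            a[j] = a[j - 1];          /* shift the larger element right */
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

For N = 5, an already sorted array costs N-1 = 4 comparisons and a reverse-sorted array costs (N-1)*N/2 = 10, matching the best and worst cases listed above.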
Shell Sort
• Shell sort, developed by Donald L. Shell, is a non-stable in-place sort.
• Shell sort improves on the efficiency of insertion sort by quickly shifting
values to their destination.
• With a well-chosen increment sequence, average sort time is about O(n^(7/6)),
while worst-case time is O(n^(4/3)).
• An improvement of Insertion Sort.
• Insertion Sort is slow because it compares and exchanges only adjacent
elements.
• Shell sort allows comparison and exchange of elements that are far apart to gain
speed.
• Take every hth element to form a new collection of elements and sort them
(using Insertion Sort); this is called an h-sort.
• Choose a new h with a smaller value.
– e.g., calculated by h_{i+1} = 3*h_i + 1, or h_i = (h_{i+1} - 1)/3, with h_0 = 1;
– thus we have the sequence (..., 1093, 364, 121, 40, 13, 4, 1).
• Repeat until h = 1; then the file will be sorted in proper order.
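The h-sort steps with the 3h+1 increments can be sketched as follows. This is an illustrative version, not the slides' own routine: the name shell_sort_3h1 and the rule for picking the starting increment (the first value of the sequence that reaches n/3) are assumptions.

```c
#include <stddef.h>

/* Hypothetical sketch: Shell sort of a[0..n-1] using the increments
   ..., 40, 13, 4, 1 generated by h = 3*h + 1. */
void shell_sort_3h1(int a[], size_t n)
{
    size_t h = 1;
    while (h < n / 3)
        h = 3 * h + 1;                    /* grow: 1, 4, 13, 40, 121, ... */

    for (; h >= 1; h = (h - 1) / 3) {     /* shrink: ..., 13, 4, 1, then 0 stops */
        /* h-sort: insertion sort on elements that are h apart */
        for (size_t i = h; i < n; i++) {
            int temp = a[i];
            size_t j = i;
            while (j >= h && a[j - h] > temp) {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = temp;
        }
    }
}
```

The final pass with h = 1 is an ordinary insertion sort, but by then the earlier passes have already moved most elements close to their final positions.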
void shellSort(int numbers[], int array_size)
{
    int i, j, increment, temp;
    /* The gap starts at 3 and is halved each pass, so the gaps used are 3, then 1. */
    for (increment = 3; increment > 0; increment /= 2)
    {
        /* Insertion sort on elements that are `increment` apart. */
        for (i = 0; i < array_size; i++)
        {
            j = i;
            temp = numbers[i];
            while ((j >= increment) && (numbers[j - increment] > temp))
            {
                numbers[j] = numbers[j - increment];
                j = j - increment;
            }
            numbers[j] = temp;
        }
    }
}
Address Calculation Sort
• This sorting method can be regarded as an application of a hashing function to sorting.
• Def: A sort algorithm which uses knowledge of the domain of the items to
calculate the position of each item in the sorted array.
• In this method a function f is applied to each key.
• The result of this function determines into which of several subfiles the
record is to be placed.
• The function should have the property that if x <= y, then f(x) <= f(y).
• Such a function is called order-preserving.
• Thus all of the records in one subfile will have keys that are less than or
equal to the keys of the records in the next subfile.
• An item is placed into its subfile in correct sequence by using any sorting
method; simple insertion is often used.
• After all the items of the original file have been placed into subfiles, the
subfiles may be concatenated to produce the sorted result.
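Address calculation sort
Figure: the subfiles after all elements have been inserted
f[0] = null
f[1] -> 12 -> null
f[2] -> 25 -> null
f[3] -> 33 -> 37 -> null
f[4] -> 48 -> null
f[5] -> 57 -> null
f[6] = null
f[7] = null
f[8] -> 86 -> null
f[9] -> 92 -> null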
• Let us create ten subfiles, one for each of the ten possible first digits.
• Initially, each of these subfiles is empty.
• An array of pointers f [10] is declared, where f[i] points to the first element in the file
whose first digit is i.
Original file: 25, 57, 48, 37, 12, 92, 86, 33
• After scanning the first element (25), it is placed into the file headed by f[2].
• Each of the subfiles is maintained as a sorted linked list of the original array
elements.
• After processing each of the elements in the original file, the subfiles appear as
in the above figure.
• The routine assumes an array of two-digit numbers and uses the first digit of
each number to assign that number to a subfile.
Analysis
• Assuming that the non-decreasing hashing function distributes the
records of the table uniformly among the linked lists, this sort runs in linear
time: the number of comparisons is O(n).
• The worst case occurs when all keys are mapped to the same subfile.
• In this case the performance of the sorting method degenerates to O(n^2).
Code for Address calculation sort
#define NUMELTS ...
/* node[] and avail are declared at file scope so that place() can reach them */
struct {
    int info;
    int next;
} node[NUMELTS];
int avail;
void place(int *head, int y); /* ordered list insertion; its body is not shown */

void addr(int x[], int n)
{
    int f[10], first, i, j, p, y;
    /* initialize available linked list */
    avail = 0;
    for (i = 0; i < n-1; i++)
        node[i].next = i+1;
    node[n-1].next = -1;
    /* initialize pointers */
    for (i = 0; i < 10; i++)
        f[i] = -1;
    for (i = 0; i < n; i++) {
        /* We successively insert each element into its */
        /* respective subfile using list insertion */
        y = x[i];
        first = y/10; /* find the first digit of a two-digit number */
        /* search the linked list */
        place(&f[first], y);
        /* place inserts y into its proper position */
        /* in the linked list pointed to by f[first] */
    } /* end for */
    /* copy numbers back into the array x */
    i = 0;
    for (j = 0; j < 10; j++)
    {
        p = f[j];
        while (p != -1)
        {
            x[i++] = node[p].info;
            p = node[p].next;
        } /* end while */
    } /* end for */
} /* end addr */
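The routine above calls place, whose body is not shown. The following self-contained sketch shows one way it might be written. The body of place, the wrapper name addr_sort and the capacity of 100 nodes are all assumptions made for illustration; they are not taken from the slides.

```c
#define NUMELTS 100   /* assumed capacity; the slides do not give a value */

static struct { int info; int next; } node[NUMELTS];
static int avail;     /* head of the list of unused nodes */

/* Hypothetical place(): take a node from the available list, store y in it,
   and link it into the sorted list headed by *head. */
static void place(int *head, int y)
{
    int q = avail;
    avail = node[q].next;
    node[q].info = y;

    int prev = -1, p = *head;
    while (p != -1 && node[p].info <= y) {   /* <= keeps equal keys in arrival order */
        prev = p;
        p = node[p].next;
    }
    node[q].next = p;
    if (prev == -1)
        *head = q;                 /* y becomes the new first element */
    else
        node[prev].next = q;
}

/* Sorts x[0..n-1], assuming 0 <= x[i] <= 99 and n <= NUMELTS. */
void addr_sort(int x[], int n)
{
    int f[10], i, j, p;
    if (n <= 0)
        return;
    avail = 0;                     /* chain all n nodes into the available list */
    for (i = 0; i < n - 1; i++)
        node[i].next = i + 1;
    node[n - 1].next = -1;
    for (i = 0; i < 10; i++)
        f[i] = -1;                 /* every subfile starts empty */
    for (i = 0; i < n; i++)
        place(&f[x[i] / 10], x[i]);   /* first digit selects the subfile */
    i = 0;
    for (j = 0; j < 10; j++)       /* concatenate the subfiles back into x */
        for (p = f[j]; p != -1; p = node[p].next)
            x[i++] = node[p].info;
}
```

Because the first digit is order-preserving, concatenating the ten sorted lists in index order yields the fully sorted array.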
Merge Sort
• Merge sort is the most common method for external sorting, that is, for problems
in which the data is stored on disk or magnetic tape.
• Merge sort is an excellent sorting method.
• Divide the file into two equal-sized subfiles, sort the subfiles
separately, then merge the sorted subfiles into one.
• Merging: combining two sorted files to make one larger sorted file.
• Steps for merge sort
– If the array to be sorted has more than one item in it, divide it into two parts.
– Recursively call MergeSort() to sort the first half-array.
– Recursively call MergeSort() to sort the second half-array.
– Merge the two half-arrays.
• Selection (in Quick Sort): partitioning a file into two independent files.
• Selection and merging are complementary operations.
• Good and predictable performance (N log N, even in the worst case), the same
as Heap Sort; Quick Sort matches this only on average, since its worst case is O(N^2).
• Drawback: linear extra space for the merge (can only sort half the memory).
• Pros: Marginally faster than the heap sort for larger sets.
Algorithm for Merge sort
Algorithm MergeSort(l,h)
// a[l:h] is a global array to be sorted.
// Small(P) is true if there is only one element to sort;
// in this case the list is already sorted.
{ if (l<h) // if there is more than one element
{ // divide P into subproblems
mid = floor((l+h)/2);
// solve the subproblems
MergeSort(l,mid);
MergeSort(mid+1,h);
// combine the solutions
merge(l,mid,h);
}}
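The algorithm above can be sketched in C as follows. This is a minimal illustration rather than the slides' own code: the array and a workspace are passed as parameters instead of being global, the function names are hypothetical, and the workspace is allocated once in the wrapper merge_sort.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted ranges a[l..mid] and a[mid+1..h] using tmp as workspace. */
static void merge_ranges(int a[], int tmp[], size_t l, size_t mid, size_t h)
{
    size_t i = l, j = mid + 1, k = l;
    while (i <= mid && j <= h)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];   /* ties take the left item: stable */
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= h)
        tmp[k++] = a[j++];
    memcpy(a + l, tmp + l, (h - l + 1) * sizeof a[0]);
}

static void merge_sort_range(int a[], int tmp[], size_t l, size_t h)
{
    if (l < h) {                               /* more than one element */
        size_t mid = l + (h - l) / 2;          /* floor((l+h)/2) without overflow */
        merge_sort_range(a, tmp, l, mid);      /* sort the first half */
        merge_sort_range(a, tmp, mid + 1, h);  /* sort the second half */
        merge_ranges(a, tmp, l, mid, h);       /* combine the solutions */
    }
}

/* Sorts a[0..n-1]; returns 0 on success, -1 if the workspace cannot be allocated. */
int merge_sort(int a[], size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);        /* the linear extra space */
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, tmp, 0, n - 1);
    free(tmp);
    return 0;
}
```

The single malloc of n elements is the linear extra space that the merge step requires.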
• The array is named a; l, h and mid are indices into it (low, high and middle).
• The array is split into two halves of equal size, one ranging from l to
mid and the other from mid+1 to h, and the halves are sorted separately.
• Finally the two sorted halves are combined by the merge step over l, mid and h.
• The merge subroutine is responsible for allocating the additional
workspace it needs.
Figure: merge strategy. The range First..Last is split at [First+Last]/2; each half is
sorted recursively by merge sort, and the two sorted halves are then merged.
• The values of the array A are 25, 57, 49, 36, 13, 98, 80, 30.
• In the first stage the values are divided into two groups of equal size,
a1 = 25, 49, 13, 80 and a2 = 57, 36, 98, 30.
• Compare the values at equal indices in the two groups; each comparison yields one sorted pair.
Original files   [25] [57] [49] [36] [13] [98] [80] [30]
Stage I          [25 57] [36 49] [13 98] [30 80]
Stage II         [25 36 49 57] [13 30 80 98]
Stage III        [13 25 30 36 49 57 80 98]
Successive stages of merge sort
• Divide the array into sublists of equal size and merge the adjacent (disjoint) pairs
of sublists.
• Repeat the process until there is only one list remaining, of size n.
• Each individual list is enclosed in brackets.
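The stage-by-stage merging shown above corresponds to a non-recursive (bottom-up) merge sort. The sketch below is an assumption about how such a routine could look; the name merge_sort_bottom_up is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: sorts a[0..n-1] by merging adjacent sublists of
   width 1, 2, 4, ... Returns 0 on success, -1 if allocation fails. */
int merge_sort_bottom_up(int a[], size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t width = 1; width < n; width *= 2) {        /* one stage per width */
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;         /* end of left list */
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;  /* end of right list */
            size_t i = lo, j = mid, k = lo;
            while (i < mid && j < hi)
                tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
            while (i < mid)
                tmp[k++] = a[i++];
            while (j < hi)
                tmp[k++] = a[j++];
        }
        memcpy(a, tmp, n * sizeof *tmp);                   /* result of this stage */
    }
    free(tmp);
    return 0;
}
```

With eight elements the outer loop runs three times (width 1, 2 and 4), one iteration for each of Stages I to III.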
Radix Sort
• Start by sorting on the least significant digit.
• With linked lists it needs very little extra space and very little
data movement, and it does no key comparisons.
• It is well suited to linked lists with integer keys.
• It sorts elements by looking at their KEY values one digit at a time.
• First it sorts them according to their least significant digit.
• It sorts the result of this according to the second least significant digit.
• It carries on like this until, at last, it has sorted according to the most
significant digit.
• Data holder: a filled array of data structures and an empty array of the same
size.
• As the sort progresses, the structures are moved back and forth between the two
arrays until they are completely sorted.
• Technique: start with the ones column of each key and sort all 0s, 1s, 2s, etc.
into separate groups.
• Arrange the groups in ascending order.
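• Each digit requires one pass over the n keys.
• For keys with a fixed number of digits d, the algorithm is O(d*n), i.e. O(n).
• The preceding lower bound analysis does not apply because radix sort does
not compare keys.
• Each pass must use a stable sort, so that the order established by the less
significant columns is preserved.
• Can be faster than quicksort when the keys are short.
• Not an in-place sort (the array version needs a second array of the same size).
• Stable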
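A least-significant-digit radix sort on an array might be sketched as below. The assumptions are decimal digits, unsigned integer keys, and a stable counting pass for each digit; the name radix_sort is hypothetical. The two arrays swap roles after every pass.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: LSD radix sort of a[0..n-1], base 10.
   Returns 0 on success, -1 if the second array cannot be allocated. */
int radix_sort(unsigned a[], size_t n)
{
    if (n < 2)
        return 0;
    unsigned *buf = malloc(n * sizeof *buf);     /* the empty array of the same size */
    if (buf == NULL)
        return -1;

    unsigned max = a[0];                         /* largest key fixes the number of passes */
    for (size_t i = 1; i < n; i++)
        if (a[i] > max)
            max = a[i];

    unsigned *src = a, *dst = buf;
    for (unsigned long long exp = 1; max / exp > 0; exp *= 10) {
        size_t start[10] = {0};
        for (size_t i = 0; i < n; i++)           /* count keys per digit value */
            start[(src[i] / exp) % 10]++;
        size_t pos = 0;
        for (int d = 0; d < 10; d++) {           /* counts -> first slot of each group */
            size_t c = start[d];
            start[d] = pos;
            pos += c;
        }
        for (size_t i = 0; i < n; i++)           /* stable: keys keep their relative order */
            dst[start[(src[i] / exp) % 10]++] = src[i];
        unsigned *t = src; src = dst; dst = t;   /* swap the roles of the two arrays */
    }
    if (src != a)
        memcpy(a, src, n * sizeof *a);           /* make sure the result ends up in a */
    free(buf);
    return 0;
}
```

No two keys are ever compared with each other; each pass only inspects one digit of each key.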
Bucket sort
• Definition: A distribution sort where input elements are initially distributed
to several buckets based on an interpolation of the element's key.
• Each bucket is sorted if necessary, and the buckets' contents are
concatenated.
• Bucket sort is possibly the simplest distribution sorting algorithm.
• The essential requirement is that the size of the universe from which
the elements to be sorted are drawn is a small, fixed constant, say m.
• For example, suppose the elements to be sorted are drawn from {0, 1, ..., m-1},
i.e., the set of integers in the interval [0, m-1].
• Bucket sort uses m counters.
• The ith counter keeps track of the number of occurrences of the ith
element of the universe.
Bucket Sorting
• For example, suppose the elements to sort are only positive integers smaller than M.
• One can keep an array of size M (initialized to 0), then read the input and record each
element in its proper place in the array.
• This algorithm takes O(M+N) time, and if M is O(N), then the running time is O(N).
• Note that this does not violate our result on the lower bound for sorting
algorithms, since this algorithm does not use comparisons between the elements.
• In the figure the universal set is assumed to be {0, 1, ..., 9}.
• Therefore, ten counters are required: one to keep track of the number of zeroes,
one to keep track of the number of ones, and so on.
• A single pass through the data suffices to count all of the elements. Once the
counts have been determined, the sorted sequence is easily obtained.
• E.g., the sorted sequence contains no zeroes, two ones, one two, and so on.
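The counter-based scheme described above can be sketched as follows. The function name bucket_sort_counts and the error convention (returning -1 for an out-of-range key or a failed allocation) are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical sketch: sorts a[0..n-1], whose values must lie in [0, m-1],
   using m counters. Returns 0 on success, -1 on error. */
int bucket_sort_counts(int a[], size_t n, size_t m)
{
    if (m == 0)
        return (n == 0) ? 0 : -1;
    size_t *count = calloc(m, sizeof *count);    /* m counters, all zero */
    if (count == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {             /* single pass: count each value */
        if (a[i] < 0 || (size_t)a[i] >= m) {
            free(count);
            return -1;                           /* key outside the universe */
        }
        count[a[i]]++;
    }

    size_t k = 0;
    for (size_t v = 0; v < m; v++)               /* write each value count[v] times */
        for (size_t c = count[v]; c > 0; c--)
            a[k++] = (int)v;

    free(count);
    return 0;
}
```

The first loop takes O(N) time and the second takes O(M+N), which gives the O(M+N) total stated above.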
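/* Assumed declarations: the listing uses these names without defining them */
#include <stddef.h>
#define M 10                        /* number of buckets; assumed value */
typedef int typekey;                /* assumed key type */
struct rec { typekey k; struct rec *next; };
typedef struct rec *list;
list Last;                          /* last node of the most recently sorted list */

list sort(list s, typekey min, typekey max)
{
    int i;
    typekey div, maxb[M], minb[M];
    list head[M], t;
    struct rec aux;

    if (s == NULL) return s;
    if (max == min)
    {   /* all keys are equal: already sorted, just locate the last node */
        for (Last = s; Last->next != NULL; Last = Last->next);
        return s;
    }
    div = (max - min) / M;          /* find dividing factor */
    if (div == 0) div = 1;
    for (i = 0; i < M; i++) head[i] = NULL;
    /* place records in buckets */
    while (s != NULL)
    {
        i = (s->k - min) / div;
        if (i < 0) i = 0; else if (i >= M) i = M - 1;
        t = s; s = s->next; t->next = head[i];
        if (head[i] == NULL) minb[i] = maxb[i] = t->k;
        head[i] = t;
        if (t->k > maxb[i]) maxb[i] = t->k;
        if (t->k < minb[i]) minb[i] = t->k;
    }
    /* sort each bucket recursively and concatenate the results */
    t = &aux;
    for (i = 0; i < M; i++)
        if (head[i] != NULL)
        {
            t->next = sort(head[i], minb[i], maxb[i]);
            t = Last;
        }
    return aux.next;
}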