Sorting Algorithms in Java. Complexity. Algorithms. Interfaces.
Topics:
Problem definition
Insertion Sort
Selection Sort
Counting Sort
Merge Sort
Collections.sort
Teaching material for the course of "Tecniche di Programmazione" at Politecnico di Torino in year 2012/2013. More information: http://bit.ly/tecn-progr
5. Formal problem definition: Sorting
Input:
A sequence of n numbers <a1, a2, …, an>
Output:
A permutation <a’1, a’2, …, a’n> of the original elements, such
that a’1 ≤ a’2 ≤ … ≤ a’n
5 Tecniche di programmazione A.A. 2012/2013
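The two output conditions of the problem definition can be checked mechanically. The following is a minimal sketch, assuming the sequence is stored in an int array; the class and method names (SortingSpec, isSorted, isPermutation) are illustrative and not part of the course material.

```java
import java.util.Arrays;

// Hypothetical helper (not from the slides): checks the two output conditions.
class SortingSpec {

    // True when a[0] <= a[1] <= ... <= a[n-1].
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                return false;
            }
        }
        return true;
    }

    // True when 'output' holds exactly the same elements as 'input',
    // with the same multiplicities (i.e. it is a permutation of it).
    static boolean isPermutation(int[] input, int[] output) {
        int[] x = input.clone();
        int[] y = output.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }
}
```

A candidate sorting routine is correct on a given input when both checks return true for its result.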
6. Types of sorting approaches
Internal sorting
Data to be sorted are all within the main computer memory
(RAM)
Direct access to all element values
External sorting
Data to be sorted may not all be loaded in memory at the
same time
We must work directly on data stored on file
Typically, sequential access to data
7. Sorting objects
Textbook algorithms always refer to sorting sequences of
numbers
In practice, we need to sort the elements of a collection,
of some class type
The objects to be sorted must implement the
Comparable interface
8. Comparable
public interface Comparable<T> (java.lang)
Must implement:
int compareTo(T other)
Returns a negative integer, zero, or a positive integer as this
object is less than, equal to, or greater than the specified other
object.
It is strongly recommended, but not strictly required
that (x.compareTo(y)==0) == (x.equals(y))
http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html
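The consistency between compareTo and equals is only a recommendation, and the compiler does not enforce it. A well-known class of the standard library where the two disagree is java.math.BigDecimal: 1.0 and 1.00 compare as equal, but equals also looks at the scale. The sketch below illustrates the effect on sets; the class name ConsistencyDemo and its methods are illustrative only.

```java
import java.math.BigDecimal;
import java.util.Set;

// Hypothetical demo class (not from the slides).
class ConsistencyDemo {
    static final BigDecimal A = new BigDecimal("1.0");
    static final BigDecimal B = new BigDecimal("1.00");

    // compareTo ignores the scale: the two values are "equal" for sorting.
    static boolean sameByCompareTo() {
        return A.compareTo(B) == 0;
    }

    // equals also compares the scale: the two objects are different.
    static boolean sameByEquals() {
        return A.equals(B);
    }

    // Adds both values to the given set and returns the resulting size.
    // A HashSet relies on equals/hashCode, a TreeSet relies on compareTo.
    static int sizeAfterAddingBoth(Set<BigDecimal> set) {
        set.add(A);
        set.add(B);
        return set.size();
    }
}
```

Passing a new HashSet keeps both values, while a new TreeSet keeps only one of them: this is the kind of surprise that the recommendation is meant to avoid.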
9. Sorting Comparable objects
Given a class, usually
A sub-set of the fields is used for sorting
The fields for sorting are called the «key» of the objects
.equals and .compareTo are defined according to the key
fields
Other fields are regarded as «additional data»
Different types of keys (and thus ordering criteria) may
be defined
The Comparable interface specifies the «natural»
ordering
Other orderings may be achieved with the Comparator helper
classes
10. Comparator
public interface Comparator<T> (java.util)
Must implement:
int compare(T obj1, T obj2)
Returns a negative integer, zero, or a positive integer as the
first argument is less than, equal to, or greater than the second.
It is generally the case, but not strictly required
that (compare(x, y)==0) == (x.equals(y))
Comparators can be passed to a sort method
http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html
12. Example
public class Studente implements Comparable<Studente> {
    private int matricola ;
    private String cognome ;
    private String nome ;
    private int voto ;
    // «Natural» ordering: by the matricola field
    @Override
    public int compareTo(Studente other) {
        // Integer.compare avoids the overflow that a subtraction may cause
        return Integer.compare(this.matricola, other.matricola) ;
    }
13. Example
    // Since we define compareTo, we should also redefine equals and
    // hashCode, based on the same «key» fields
    @Override
    public boolean equals(Object other) {
        if (this == other) return true ;
        // instanceof is false for null and for unrelated classes
        if (!(other instanceof Studente)) return false ;
        return this.matricola == ((Studente)other).matricola ;
    }
    @Override
    public int hashCode() {
        return Integer.valueOf(this.matricola).hashCode() ;
    }
    ... getters & setters ...
}
14. Comparator for sorting by name
public class StudenteByName implements Comparator<Studente> {
@Override
public int compare(Studente arg0, Studente arg1) {
int cmp = arg0.getCognome().compareTo(arg1.getCognome()) ;
if( cmp!=0 )
return cmp ;
else
return arg0.getNome().compareTo(arg1.getNome()) ;
}
}
Check names only if
surnames are equal.
15. Comparator for sorting by voto
public class StudenteByVoto implements Comparator<Studente> {
@Override
public int compare(Studente o1, Studente o2) {
return o1.getVoto()-o2.getVoto() ;
}
}
Note: repeated values for the
Voto field are possible
16. Stability
A sorting algorithm is said to be stable when, if multiple
elements share the same value of the key, in the sorted
sequence such elements appear in the same relative
order of the original sequence.
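Stability becomes visible when sorting by a key with repeated values, such as voto. The sketch below uses a simplified stand-in class (Exam, an assumption introduced here so that the example is self-contained; it is not the Studente class of the slides) and relies on Collections.sort, which is documented to be stable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class (not from the slides).
class StabilityDemo {

    // Simplified stand-in for Studente: only surname and grade.
    static final class Exam {
        final String cognome;
        final int voto;

        Exam(String cognome, int voto) {
            this.cognome = cognome;
            this.voto = voto;
        }
    }

    // Sorts a copy of the list by voto only and returns the surnames
    // in the resulting order.
    static List<String> surnamesSortedByVoto(List<Exam> exams) {
        List<Exam> copy = new ArrayList<>(exams);
        Collections.sort(copy, new Comparator<Exam>() {
            @Override
            public int compare(Exam a, Exam b) {
                return Integer.compare(a.voto, b.voto);
            }
        });
        List<String> names = new ArrayList<>();
        for (Exam e : copy) {
            names.add(e.cognome);
        }
        return names;
    }

    // Two students with voto 28 and two with voto 30, interleaved.
    static List<String> demo() {
        List<Exam> exams = new ArrayList<>();
        exams.add(new Exam("Rossi", 28));
        exams.add(new Exam("Bianchi", 30));
        exams.add(new Exam("Verdi", 28));
        exams.add(new Exam("Neri", 30));
        return surnamesSortedByVoto(exams);
    }
}
```

Because the sort is stable, Rossi stays before Verdi (both 28) and Bianchi stays before Neri (both 30), exactly as in the original sequence.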
17. Algorithms
Various sorting algorithms are known, with differing
complexity:
O(n²): simple, iterative
Insertion sort, Selection sort, Bubble sort, …
O(n): applicable in special cases, only
Counting sort, Radix sort, Bin (or Bucket) sort, …
O(n log n): more complex, recursive
Merge sort, Quicksort, Heapsort
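19. Insertion sort
[Figure: the vector is split into an «already ordered» part and a «not considered yet» part; v[j] is the first element not considered yet]
Before: 2 3 6 12 16 21 | 8 (v[j] = 8)
Move right by one cell all elements ‘i’ for which v[i]>v[j]
After: 2 3 6 8 12 16 21
Next step: 2 3 6 8 12 16 21 | 5 (v[j] = 5)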
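The shifting step described above can be sketched in Java as follows. This is a minimal version for int arrays with 0-based indices; the class name InsertionSortSketch is illustrative.

```java
// Hypothetical sketch (not from the slides): insertion sort on an int array.
class InsertionSortSketch {

    static void insertionSort(int[] v) {
        // v[0..j-1] is already ordered, v[j] is the next element to insert.
        for (int j = 1; j < v.length; j++) {
            int key = v[j];
            int i = j - 1;
            // Move right by one cell all elements greater than the key.
            while (i >= 0 && v[i] > key) {
                v[i + 1] = v[i];
                i--;
            }
            // Drop the key into the hole that was opened.
            v[i + 1] = key;
        }
    }
}
```

Since elements are shifted only when strictly greater than the key, equal elements keep their relative order, so this version is stable.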
24. Selection Sort
At every iteration, find the minimum of the yet-unsorted
part of the vector
Swap the minimum with the current position in the
vector
[Figure: the vector is split into an «already ordered» part and a «not ordered» part]
Before: 2 3 6 12 16 21 | 34 81 25 28 41 27 60 (current position v[j] = 34, minimum = 25)
After the swap: 2 3 6 12 16 21 25 | 81 34 28 41 27 60
25. Complexity
The loops don’t depend on the data stored in the array:
complexity is independent of the values to be sorted
Worst case performance: O(n²)
Best case performance: O(n²)
Average case performance: O(n²)
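A minimal Java sketch of selection sort makes the data-independent loops visible: both loop bounds depend only on the length of the array. The class name SelectionSortSketch is illustrative.

```java
// Hypothetical sketch (not from the slides): selection sort on an int array.
class SelectionSortSketch {

    static void selectionSort(int[] v) {
        // v[0..j-1] is already ordered; j is the current position.
        for (int j = 0; j < v.length - 1; j++) {
            // Find the index of the minimum in the unsorted part v[j..n-1].
            int min = j;
            for (int i = j + 1; i < v.length; i++) {
                if (v[i] < v[min]) {
                    min = i;
                }
            }
            // Swap the minimum with the current position.
            int tmp = v[j];
            v[j] = v[min];
            v[min] = tmp;
        }
    }
}
```

The inner loop always scans the whole unsorted part, whatever the values are, which is why best, worst and average case coincide.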
28. Counting sort
Not applicable in general
Precondition (hypothesis for applicability):
The n elements to be sorted are integer numbers ranging from
1 to k, for some positive integer k
With this hypothesis, if k = O(n), then the algorithm has
complexity O(n), only!
29. Basic idea
Find, for each element x to be sorted, how many
elements are less than x
This information allows us to directly deposit x into its
final destination position.
30. Data structures
We need 3 vectors:
Starting vector : A[1..n]
Result vector : B[1..n]
Support vector : C[1..k]
Vector C keeps track of the number of elements in A that
have a certain value:
C[i] = how many elements in A have value i
The sum of the first i elements in C equals the number of
elements in A with value <= i.
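31. Pseudo-code
COUNTING-SORT(A, B, k)
1 for i ← 1 to k
2     C[i] ← 0
3 for j ← 1 to n
4     C[A[j]] ← C[A[j]] + 1
5 ▷ C[i] now holds the number of elements equal to i
6 for i ← 2 to k
7     C[i] ← C[i] + C[i-1]
8 ▷ C[i] now holds the number of elements <= i
9 for j ← n downto 1
10     B[ C[A[j]] ] ← A[j]
11     C[A[j]] ← C[A[j]] – 1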
32. Analysis
For each j, C[A[j]] is the number of elements <=A[j], and
also represents the final position of A[j] in B:
B[ C[A[j]] ] = A[j]
The corrective term C[A[j]] ← C[A[j]] – 1 handles the
presence of duplicate items
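The three vectors A, B and C translate to Java as sketched below. This is a minimal version under the stated precondition (integers from 1 to k); Java arrays are 0-based, so C is allocated with k+1 cells (cell 0 unused) and the final position is decreased by one when writing into B. The class name CountingSortSketch and the range check are additions made for this example.

```java
// Hypothetical sketch (not from the slides): counting sort for values in 1..k.
class CountingSortSketch {

    // Returns a new sorted array (vector B); the input (vector A) is not modified.
    static int[] countingSort(int[] a, int k) {
        int n = a.length;
        int[] b = new int[n];
        int[] c = new int[k + 1]; // c[0] is unused, so that c[i] refers to value i

        // c[i] = how many elements of a have value i
        for (int j = 0; j < n; j++) {
            if (a[j] < 1 || a[j] > k) {
                throw new IllegalArgumentException("value out of range 1.." + k);
            }
            c[a[j]]++;
        }
        // Running sum: c[i] = how many elements of a have value <= i
        for (int i = 2; i <= k; i++) {
            c[i] += c[i - 1];
        }
        // Scan a backwards and place each element at its final position.
        for (int j = n - 1; j >= 0; j--) {
            b[c[a[j]] - 1] = a[j]; // -1 converts the 1-based position to a 0-based index
            c[a[j]]--;             // next duplicate goes one cell to the left
        }
        return b;
    }
}
```

Scanning A from the last element to the first keeps equal values in their original relative order, so this version is stable.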
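33. Example (n=8, k=6)
A = 3 6 4 1 3 4 1 4
C = 2 0 2 3 0 1 (number of occurrences of each value)
C = 2 2 4 7 7 8 (after the running sum)
j=8: B = _ _ _ _ _ _ 4 _   C = 2 2 4 6 7 8
j=7: B = _ 1 _ _ _ _ 4 _   C = 1 2 4 6 7 8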
34. Example
A = 3 6 4 1 3 4 1 4
j=8: B = _ _ _ _ _ _ 4 _   C = 2 2 4 6 7 8
j=7: B = _ 1 _ _ _ _ 4 _   C = 1 2 4 6 7 8
j=6: B = _ 1 _ _ _ 4 4 _   C = 1 2 4 5 7 8
j=5: B = _ 1 _ 3 _ 4 4 _   C = 1 2 3 5 7 8
j=4: B = 1 1 _ 3 _ 4 4 _   C = 0 2 3 5 7 8
j=3: B = 1 1 _ 3 4 4 4 _   C = 0 2 3 4 7 8
j=2: B = 1 1 _ 3 4 4 4 6   C = 0 2 3 4 7 7
j=1: B = 1 1 3 3 4 4 4 6   C = 0 2 2 4 7 7
35. Complexity
Lines 1-2: Initialization of C: O(k)
Lines 3-4: Computation of C: O(n)
Lines 6-7: Running sum in C: O(k)
Lines 9-11: Copy back to B: O(n)
Total complexity is therefore: O(n+k).
The algorithm is useful with k=O(n), only…
In such a case, the overall complexity is O(n).
38. Merge Sort
The Merge Sort algorithm is a direct application of the
Divide et Impera approach
Start: 6 12 4 5 2 9 5 12
Divide: 6 12 4 5 | 2 9 5 12
Solve: 4 5 6 12 | 2 5 9 12 (each half is solved separately)
Combine: 2 4 5 5 6 9 12 12
39. Merge Sort: Divide
The vector is simply partitioned into two sub-vectors,
according to a splitting point
The splitting point is usually chosen at the middle of the
vector
Before: 6 12 4 5 2 9 5 12 (indices p=1 … r=8)
Divide: 6 12 4 5 | 2 9 5 12 (indices p=1 … q=4 and q+1=5 … r=8)
40. Merge Sort: Termination
Recursion terminates when the sub-vector:
Has one element, only: p=r
Has no elements: p>r
41. Merge Sort: Combine
The combining step implies merging two sorted sub-vectors
Recursion guarantees that the sub-vectors are sorted
The merging approach compares the first element of each of
the two vectors, and copies the lowest one
The result of the merging is saved in a different vector
Such an algorithm may be realized in Θ(n).
Before: 4 5 6 12 | 2 5 9 12
Combine: 2 4 5 5 6 9 12 12
42. Pseudo-code
MERGE-SORT(A, p, r)
1 if p < r                          ▷ Termination
2     then q ← ⌊(p+r)/2⌋            ▷ Divide
3         MERGE-SORT(A, p, q)       ▷ Solve
4         MERGE-SORT(A, q+1, r)     ▷ Solve
5         MERGE(A, p, q, r)         ▷ Combine
43. Note
We often use the following symbols:
⌊x⌋ = integer part of x, i.e. largest integer not greater than x
(floor function)
⌈x⌉ = smallest integer not less than x (ceiling function)
Examples:
⌊3⌋ = ⌈3⌉ = 3
⌊3.1⌋ = 3; ⌈3.1⌉ = 4
45. The Merge procedure
MERGE(A, p, q, r)                    Complexity: Θ(n).
1 i ← p ; j ← q+1 ; k ← 1
2 while( i ≤ q and j ≤ r )
3     if( A[i] < A[j] ) B[k] ← A[i] ; i ← i+1
4     else B[k] ← A[j] ; j ← j+1
5     k ← k+1
6 while( i ≤ q ) B[k] ← A[i] ; i ← i+1 ; k ← k+1
7 while( j ≤ r ) B[k] ← A[j] ; j ← j+1 ; k ← k+1
8 A[p..r] ← B[1..k-1]
Lines 2-5: at each iteration, the smallest number between the heads of the two vectors is copied to B
Lines 6-7: the «tail» of one of the vectors is emptied
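MERGE-SORT and MERGE can be sketched in Java as follows, for int arrays with 0-based indices. Two choices in this sketch differ from the pseudo-code and are assumptions of the example: the support vector B is allocated once and passed down the recursion, and the comparison uses <= instead of <, so that on ties the element of the left half is copied first and the sort is stable. The class name MergeSortSketch is illustrative.

```java
// Hypothetical sketch (not from the slides): top-down merge sort on an int array.
class MergeSortSketch {

    static void mergeSort(int[] a) {
        mergeSort(a, 0, a.length - 1, new int[a.length]);
    }

    private static void mergeSort(int[] a, int p, int r, int[] b) {
        if (p < r) {                      // Termination: 0 or 1 elements
            int q = p + (r - p) / 2;      // Divide: floor of the mid-point
            mergeSort(a, p, q, b);        // Solve left half
            mergeSort(a, q + 1, r, b);    // Solve right half
            merge(a, p, q, r, b);         // Combine
        }
    }

    // Merges the sorted ranges a[p..q] and a[q+1..r] using b as support vector.
    private static void merge(int[] a, int p, int q, int r, int[] b) {
        int i = p;
        int j = q + 1;
        int k = 0;
        // Copy the smallest of the two heads; <= keeps the sort stable.
        while (i <= q && j <= r) {
            if (a[i] <= a[j]) {
                b[k++] = a[i++];
            } else {
                b[k++] = a[j++];
            }
        }
        // Empty the tail of whichever range still has elements.
        while (i <= q) {
            b[k++] = a[i++];
        }
        while (j <= r) {
            b[k++] = a[j++];
        }
        // Copy the merged result back into a[p..r].
        System.arraycopy(b, 0, a, p, k);
    }
}
```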
47. Complexity analysis
Termination: a simple test, Θ(1)
Divide (2): find the mid-point of the vector, D(n)=Θ(1)
Solve (3-4): solves 2 sub-problems of size n/2 each,
2T(n/2)
Combine (5): based on the Merge algorithm, C(n) = Θ(n).
One sub-problem has size
⌊n/2⌋, the other ⌈n/2⌉.
This detail does not change the
complexity result.
48. Complexity
T(n) = Θ(1)              for n ≤ 1
T(n) = 2T(n/2) + Θ(n)    for n > 1
The solution (proof omitted…) is:
T(n) = Θ(n log n)
49. Intuitive understanding (n=16)
Sub-problem sizes at each level         Operations at that level
16                                      1 x 16 = n
8 8                                     2 x 8 = n
4 4 4 4                                 4 x 4 = n
2 2 2 2 2 2 2 2                         8 x 2 = n
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1         16 x 1 = n
Recursion levels: log2 n
Operations per level: n
Total operations: n log2 n
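The count can be reproduced by evaluating the recurrence directly. The sketch below is a simplified cost model, not a measurement: it assumes that n is a power of two, that merging n elements costs exactly n operations, and that a single-element vector costs nothing. The class name MergeSortCost is illustrative.

```java
// Hypothetical cost model (not from the slides).
class MergeSortCost {

    // T(1) = 0, T(n) = 2 T(n/2) + n, for n a power of two.
    static long balanced(int n) {
        if (n <= 1) {
            return 0;
        }
        return 2 * balanced(n / 2) + n;
    }
}
```

Under these assumptions balanced(16) evaluates to 64, that is 16 x log2 16, in agreement with the total of n log2 n operations.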
50. Warning
Not all recursive implementations have Θ(n log n)
complexity.
For example, if merge sort is used with asymmetric
partitioning (q=p+1), it degrades to an insertion sort,
yielding Θ(n²).
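The quadratic growth can be seen with a simplified cost model, sketched below. It assumes that one side of every split holds a single element, that merging a vector of a given size costs exactly that many operations, and that a single-element vector costs nothing; this gives T(n) = T(n-1) + n. The class name UnbalancedSplitCost is illustrative.

```java
// Hypothetical cost model (not from the slides).
class UnbalancedSplitCost {

    // T(1) = 0, T(n) = T(n-1) + n, evaluated iteratively.
    static long unbalanced(int n) {
        long total = 0;
        for (int size = n; size > 1; size--) {
            total += size; // one merge of 'size' elements per recursion level
        }
        return total;
    }
}
```

For n=16 the model gives 16 + 15 + … + 2 = 135 operations, against 16 x log2 16 = 64 for the balanced split, and the gap widens quadratically as n grows.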
53. Sorting, in practice, in Java
A programmer’s motto says:
Use the system sort
i.e., the sorting algorithm already provided by your libraries
In other words, don’t re-implement your own sorting functions
The Collections framework provides:
public class Collections
This class consists exclusively of static methods that operate
on or return collections
public static <T extends Comparable<? super T>>
void sort(List<T> list)
public static <T> void sort(List<T> list,
Comparator<? super T> c)
54. Collections.sort(list)
Sorts the specified list into ascending order, according to
the natural ordering of its elements.
All elements in the list must implement
the Comparable interface.
Furthermore, all elements in the list must be mutually
comparable (that is, e1.compareTo(e2) must not throw
a ClassCastException for any elements e1 and e2 in the list).
This sort is guaranteed to be stable: equal elements will
not be reordered as a result of the sort.
The specified list must be modifiable, but need not be
resizable.
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List)
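Both overloads of Collections.sort can be tried on a list of String, which implements Comparable<String>. The sketch below sorts copies of the input, once by natural ordering and once with a Comparator on the string length; the class name CollectionsSortDemo and its method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class (not from the slides).
class CollectionsSortDemo {

    // Natural ordering: String.compareTo (lexicographic).
    static List<String> naturalOrder(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Alternative ordering supplied through a Comparator: shorter strings first.
    static List<String> byLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }
}
```

With the input pear, fig, apple, kiwi the natural ordering gives apple, fig, kiwi, pear, while sorting by length gives fig, pear, kiwi, apple: pear and kiwi have the same length and keep their original order because the sort is stable.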
55. Implementation of Collections.sort
This implementation is a stable, adaptive, iterative
mergesort that requires far fewer than n lg(n)
comparisons when the input array is partially sorted,
while offering the performance of a traditional mergesort
when the input array is randomly ordered.
If the input array is nearly sorted, the implementation
requires approximately n comparisons.
Temporary storage requirements vary from a small
constant for nearly sorted input arrays to n/2 object
references for randomly ordered input arrays.
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List)
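The adaptive behaviour can be observed by counting how many times the comparator is invoked. The sketch below wraps the comparison in a counter and sorts a copy of the input; the class name ComparisonCounter, its helper methods and the fixed random seed are assumptions of this example, and the exact counts depend on the library version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical measurement helper (not from the slides).
class ComparisonCounter {

    // Sorts a copy of the input and returns the number of comparisons performed.
    static long countComparisons(List<Integer> input) {
        final long[] count = {0};
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                count[0]++;
                return a.compareTo(b);
            }
        });
        return count[0];
    }

    // The values 0, 1, ..., n-1, already in ascending order.
    static List<Integer> ascending(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // The same values in a pseudo-random order, reproducible through the seed.
    static List<Integer> shuffled(int n, long seed) {
        List<Integer> list = ascending(n);
        Collections.shuffle(list, new Random(seed));
        return list;
    }
}
```

Calling countComparisons on ascending(1000) and on shuffled(1000, 42L) should show a count close to n for the sorted input and a much larger one, in the order of n lg(n), for the shuffled input.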
56. Resources
Algorithms in a Nutshell, By George T. Heineman, Gary
Pollice, Stanley Selkow, O'Reilly Media
http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html
http://www.sorting-algorithms.com/
57. License
These slides are distributed under a Creative Commons license
“Attribution - NonCommercial - ShareAlike (CC
BY-NC-SA)”
You are free:
to reproduce, distribute, communicate to the public, exhibit in public,
represent, perform and recite this work
to modify this work
Under the following conditions:
Attribution: You must attribute the work to the original authors,
in a way that does not suggest that they endorse you or your
use of the work.
Non-commercial: You may not use this work for commercial
purposes.
Share alike: If you alter or transform this work, or use it
to create another, you may distribute the resulting work only under a
license identical or equivalent to this one.
http://creativecommons.org/licenses/by-nc-sa/3.0/