Today, I am going to dive into the awesome world of sorting. It reminds me of entry-level computer science courses and algorithm design. The goal of this article is to summarize popular sorting algorithms in one place so that they are easy to remember and refer to. My intention is to include the most important information about sorting algorithms without the fluff you find on many websites. I will do my best to provide an easy-to-follow explanation. Without further ado, let us get started with the definition of sorting.
A sorting algorithm puts an array of elements into a non-decreasing (or non-increasing) order. Notice the wording here: we did not say strictly increasing or decreasing because an array can have duplicate values. This definition is not enough though; the sorted array must contain exactly the same elements as the input array. There is one more optional condition if we want to be strict: sorting should be stable, which means elements with equal values must keep the relative order they had in the input array. We will briefly discuss stability later in this post. Now, let us mention some sorting applications.
Applications of Sorting
There are many applications of sorting. Here are a few...
- It is human nature; people prefer sorted data.
- Sorting helps in understanding algorithm design techniques.
- Sorted data is faster to search.
- Find stats and patterns such as min, max, duplicates, similar values, frequencies, etc.
- Compare or find overlap between arrays.
Classification of Sorting Algorithms
Sorting algorithms can be classified based on various criteria. I will only mention four...
1. Internal vs External Sorting
Internal sorting assumes that all data is already stored in memory while sorting is in progress. On the other hand, external sorting deals with data stored outside memory (ex. disk). This type of sorting is used when data cannot fit entirely in memory. The challenge here is how to load or unload data in an optimal way.
2. Comparison based Sorting
In order to put an element in its correct position, we use comparison operators such as equal, less than or greater than. This is called comparison based sorting, which is easy to comprehend. In contrast, non-comparison based sorting is somewhat confusing. How can we put an element in its correct position without doing any comparison? It is done by making certain assumptions about the data. For example, the algorithm assumes that all elements are within a certain integer range. Later in the article, we are going to see how such an assumption will magically put elements in order without doing any comparisons.
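As a quick taste of this idea, here is a minimal counting sort sketch in Python. It assumes every element is an integer in the range [0, k], which is exactly the kind of assumption mentioned above; the function name and the parameter (k) are my own.

```python
def counting_sort(a, k):
    # Assumption: every element of a is an integer in the range [0, k]
    counts = [0] * (k + 1)
    for x in a:
        counts[x] += 1          # tally occurrences; elements are never compared
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result

print(counting_sort([3, 1, 4, 1, 5, 0], 5))  # → [0, 1, 1, 3, 4, 5]
```

Notice that the loop only counts values; the range assumption is what lets the output come out sorted without a single comparison between elements.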
3. Numeric vs Lexicographic Sorting
In comparison based sorting, numeric order is used to sort numbers (ex. integers, floats) while lexicographic order is used to sort text alphabetically. Numeric order is easy to understand, so let us only provide an example of lexicographic order. Lexicographically sorting the list [house, House, household] produces [House, house, household] because uppercase letters come before lowercase letters in ASCII, and a prefix comes before any longer string that starts with it. You may take a look at this article; it implements lexicographic sorting in Java.
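Python's built-in sorted() compares strings lexicographically by character code points, so the example above can be reproduced directly:

```python
words = ["house", "House", "household"]
# Uppercase letters have smaller code points than lowercase ones,
# so "House" sorts before "house". "house" is a prefix of "household",
# and a prefix sorts before any longer string that starts with it.
print(sorted(words))  # → ['House', 'house', 'household']
```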
4. In-place Sorting
In-place sorting algorithms do not require any additional temporary storage while sorting data. On the other hand, there are sorting algorithms that use extra memory space. Saving space while sorting data on modern computers is not a big deal; however, it can be a concern for embedded on-chip systems where memory might be a scarce resource.
Performance of Sorting Algorithms
1. Running Time
In algorithm design, performance is expressed in terms of running time or Big-O notation. Running time refers to the number of operations (ex. sorting comparisons) an algorithm takes to accomplish a particular task. Mathematically speaking, it is a function of input size (ex. number of elements). Note that number of operations and execution time are two different metrics. For example, if an algorithm performs 100 file access operations, the actual execution time is roughly 100 multiplied by the file access time, assuming that time is relatively constant. Since the time to perform a single operation depends on the computational power of the computer, we abstract it away and say the running time is O(n), where (n) is the input size (100 in this case). Running time (Big-O) tells us more about the performance of an algorithm when the input size is large. This is tricky: a simple sorting algorithm can be a better choice if the input size is small, even if its running time is not optimal. In other words, Big-O does not have the final word on performance, as input size needs to be taken into account. The overhead of a complex but asymptotically fast sorting algorithm may outweigh the benefit if the input is not large enough. One remark before we close this point: many authors use the number of comparisons to discuss sorting performance. Although this is convenient, not only comparisons but also move operations contribute to execution time. Finally, to give you a clue how different algorithms compare in terms of running time: O(log(n)) is faster than O(n), which is in turn faster than O(n log(n)), which is faster than O(n^2), which is faster than O(2^n). In other words, logarithmic running time is the fastest, then linear time, then polynomial time, and exponential is the slowest. Typically, efficient sorting algorithms such as merge sort have a running time of O(n log(n)), while poorly performing algorithms such as bubble sort have a running time of O(n^2).
If you want to learn more about algorithm complexity check out this article.
2. Average and Worst Case Scenarios
Consider the following scenario. Given an array of 100 numbers, we want to look up a specific number. How many comparisons are needed to accomplish this task? The answer is "it depends". If the number does not exist or is located in the last position, then 100 comparisons are needed. This is the worst case scenario. If the number is in the first position, then one comparison is needed, which is the best case scenario. On average, we will do a number of comparisons somewhere in between. Sorting is no different: depending on the data, a given sorting algorithm may behave differently. For example, quick sort in the worst case runs as badly as bubble sort, with a running time of O(n^2). The irony is that quick sort's worst case occurs when the input data is already or almost sorted (with a naive pivot choice such as the first or last element). Merge sort performs equally well in the average and worst cases, with a running time of O(n log(n)). An important remark here is that the worst case scenario of a given algorithm is critical for real-time systems, where response time needs to be guaranteed. Lastly, as we explore popular sorting algorithms in this post, we will indicate their best, average and worst case scenarios.
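The lookup scenario above can be sketched in a few lines of Python; the comparison counter makes the best and worst cases concrete (the helper name is my own):

```python
def linear_search(a, target):
    # Count how many comparisons are needed to find the target
    comparisons = 0
    for x in a:
        comparisons += 1
        if x == target:
            return comparisons
    return comparisons  # target absent: every element was compared

a = list(range(1, 101))       # an array of 100 numbers
print(linear_search(a, 1))    # best case → 1 comparison
print(linear_search(a, 100))  # worst case (last position) → 100 comparisons
print(linear_search(a, 999))  # worst case (absent) → 100 comparisons
```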
3. Space Usage
As we indicated earlier in the classification section, sorting algorithms may require additional memory space. For example, sorting an array of 100 elements using merge sort requires additional space for 100 elements. In other words, merge sort has O(n) space complexity. It is important to remember that saving some space by implementing an in-place sorting algorithm is not worth the effort in many cases. You end up with a complicated piece of code for no real value, as space is typically abundant in modern computers. A typical memory size of 16 GB on laptops (servers have more), coupled with OS virtual memory, tells us that spending a little extra space for a speedup is not a bad idea.
4. Sorting and Recursion
Some popular sorting algorithms such as merge sort follow a divide and conquer approach. This technique is directly related to recursion. You may refer to the following article for more information about recursion. Too many recursive calls may exhaust stack memory; when that happens, the program crashes with a stack overflow error. When designing sorting algorithms, recursion depth must be taken into consideration, especially with certain data sets. For example, a long list of already sorted elements may overflow the stack in a naive quick sort implementation. Just keep in mind that making too many nested function calls, as in deep recursion, is not a good thing.
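To see the depth limit in action, here is a small sketch. CPython caps recursion depth (commonly at 1000 frames by default) and raises RecursionError rather than crashing the process; the helper function is my own:

```python
import sys

def depth(n):
    # Each recursive call consumes one stack frame
    if n == 0:
        return 0
    return 1 + depth(n - 1)

print(sys.getrecursionlimit())  # commonly 1000 by default
print(depth(500))               # fine, well below the limit
# depth(100000) would raise RecursionError: maximum recursion depth exceeded
```

A recursive sort that always recurses on (n - 1) elements, as a naive quick sort does on sorted input, hits this same limit.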
5. Sorting and Caching
Efficient algorithms exploit the inner workings of the underlying OS and hardware architecture. More specifically, sorting algorithms may take advantage of the microprocessor caching system. The closer the data is to the CPU, the faster it can be accessed; accessing data in cache is much faster than in RAM. For that reason, algorithm designers take advantage of locality of reference, an important caching concept. Locality can be spatial or temporal. Spatial locality means that if a memory location is accessed, neighboring locations will probably be accessed as well; temporal locality means that recently accessed data is likely to be accessed again soon. When data is read from RAM, not only do we copy it to the cache but also some neighboring data. Quick sort is a good example of a sorting algorithm that may benefit from caching. It moves items to the left and right of the pivot element (we will explain what that means later in this post). Each time a swap is made, it is likely that the data for subsequent swaps is already loaded into the cache, which impacts sorting performance positively.
Sorting Stability
Stable sorting algorithms do not alter the relative order of elements that have the same value. To better explain that, consider the following example: array (A) below has two elements with the same value (5). For identification, assume the first (5) is black and the second is white, as shown below
A = [5 (black), 6, 1, 5 (white)]
//A stable sorting algorithm yields
B = [1, 5 (black), 5 (white), 6]
//However, an unstable sorting algorithm yields
B = [1, 5 (white), 5 (black), 6]
Stable sorting algorithms are desired when data can be sorted in different ways. For example, consider a database table containing people's names and ages. If two people have the same age, they can be ordered alphabetically. Examples of stable sorting algorithms include bubble sort and merge sort. Quick sort is an example of an unstable sorting algorithm. Finally, keep in mind that sorting stability may depend on algorithm implementation. In other words, some unstable algorithms can be implemented in a certain way to achieve stability.
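Python's sorted() and list.sort() are guaranteed stable, so the names-and-ages example above can be demonstrated directly; the sample data is made up:

```python
people = [("Bob", 30), ("Alice", 25), ("Dave", 30), ("Carol", 25)]

# First sort alphabetically by name, then sort by age.
# Because the second sort is stable, people with equal ages
# keep their alphabetical order from the first sort.
people.sort(key=lambda p: p[0])
people.sort(key=lambda p: p[1])
print(people)
# → [('Alice', 25), ('Carol', 25), ('Bob', 30), ('Dave', 30)]
```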
Best Sorting Algorithm
In theory, a perfect sorting algorithm is expected to be stable and in-place, with a worst case running time of O(n log(n)) and an O(n) running time when data is sorted or almost sorted. In reality, there is no such perfect sorting algorithm. Each algorithm has its own strengths and weaknesses. The performance of a sorting algorithm depends on the data and the application. We will emphasize this fact as we explore some of the most popular sorting algorithms below.
Popular Sorting Algorithms
We are done with the introduction; let us now explore some popular sorting algorithms.
1. Bubble Sort
Bubble Sort Description
Assume that we have an array of integer numbers. Our goal is to sort the array in ascending order (i.e. numbers do not decrease from left to right). The bubble sort algorithm operates in passes. In the first pass, the algorithm compares adjacent pairs from left to right, swapping whenever the left number is larger, so the largest number seen so far keeps moving right until a still larger number is encountered, which then moves right in its place. This process is repeated until the algorithm puts the maximum number in the last position of the array. The second pass also starts with the first element and puts the second largest number in the second position from the right. Each pass puts one number in its correct position, so the number of passes needed to sort the array equals (array size - 1) in the worst case. The logic of passing through the array is the outer loop of the algorithm. In each pass, the number of comparisons needed is at most (n-1); comparison and swapping go in the inner loop of the algorithm. Note that the inner loop can be optimized, as it is not necessary to perform (n-1) comparisons in every pass: we can stop comparing the moment we reach the sorted portion of the array on the right side. For instance, the 5th pass in the example below needs only 4 comparisons instead of 8. Finally, recall that the algorithm is called bubble sort because swapping makes large numbers bubble up the list, shifting from left to right. Let us now trace the logic using a trivial example.
Bubble Sort Example
A = [7, 1, 8, 3, 6, 4, 5, 9, 2]
//1st pass compares all adjacent pairs in the array and puts the max (9)
//in the last position.
//Note that (7) was shifted to the right until (8) was encountered.
//After that (8) was shifted to the right until (9) was encountered.
A = [1, 7, 3, 6, 4, 5, 8, 2, 9]
//2nd pass compares all adjacent pairs except last pair (2 9).
A = [1, 3, 6, 4, 5, 7, 2, 8, 9]
//3d pass compares all adjacent pairs except second pair from right (2 8)
A = [1, 3, 4, 5, 6, 2, 7, 8, 9]
//4th pass compares all adjacent pairs except 3d pair from right (2 7)
A = [1, 3, 4, 5, 2, 6, 7, 8, 9]
//5th pass compares all adjacent pairs except 4th pair from right (2 6)
A = [1, 3, 4, 2, 5, 6, 7, 8, 9]
//6th pass compares all adjacent pairs except 5th pair from right (2 5)
A = [1, 3, 2, 4, 5, 6, 7, 8, 9]
//7th pass compares all adjacent pairs except 6th pair from right (2 4)
A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Each pair of adjacent numbers is compared; if they are out of order, they are swapped. The example above clearly demonstrates the bubbling effect: number 2 bubbles all the way from right to left, one position per pass, while number 7 bubbles up from left to right.
Let us take the 1st pass and trace it in detail
A = [7, 1, 8, 3, 6, 4, 5, 9, 2]
//7 > 1 swap
A = [1, 7, 8, 3, 6, 4, 5, 9, 2]
//7 < 8 no swap
A = [1, 7, 8, 3, 6, 4, 5, 9, 2]
//8 > 3 swap
A = [1, 7, 3, 8, 6, 4, 5, 9, 2]
//8 > 6 swap
A = [1, 7, 3, 6, 8, 4, 5, 9, 2]
//8 > 4 swap
A = [1, 7, 3, 6, 4, 8, 5, 9, 2]
//8 > 5 swap
A = [1, 7, 3, 6, 4, 5, 8, 9, 2]
//8 < 9 no swap
A = [1, 7, 3, 6, 4, 5, 8, 9, 2]
//9 > 2 swap
A = [1, 7, 3, 6, 4, 5, 8, 2, 9]
The array after the last swap is the result of the first pass.
Bubble Sort Python Code
The following piece of code implements bubble sort in Python
# Define an array of 9 elements
A = [7, 1, 8, 3, 6, 4, 5, 9, 2]
# Outer loop represents the passes of the algorithm
# len(A)-1 = 9 - 1 = 8
# range(8) is 0 to 7
# reversed(0 to 7) gives i = 7 down to 0
for i in reversed(range(len(A) - 1)):
    # Inner loop performs the comparisons
    # For each pass we start with the first element
    # and keep pushing large numbers to the right
    # The sorted portion begins at index i + 1, so the last
    # comparison in this pass is between A[i] and A[i + 1]
    for j in range(0, i + 1):
        # If the two adjacent elements are out of order, swap them
        if A[j] > A[j + 1]:
            A[j], A[j + 1] = A[j + 1], A[j]
# By now A is sorted
Bubble Sort Running time
Bubble sort is frequently used for academic purposes to demonstrate the sorting concept, but it is not suitable for practical applications. As we have seen in the code above, there are two nested loops, each running (n) times where (n) is the size of the array. Even if the inner loop is optimized as we indicated earlier, the asymptotic running time is still O(n^2), because for large values of (n) reducing the number of inner loop iterations does not help. If you are curious, consider the following: two full nested loops each running (n) times means the comparison operation executes n x n times. On the other hand, if the inner loop shrinks by one comparison each pass, then the total number of comparisons is 1 + 2 + 3 + ... + n, which is n(n+1)/2. For small values of (n) there is a considerable difference between n(n+1)/2 and (n x n), but for large values there is no practical difference. O(n^2) is a polynomial running time, which is slow (recall that fast algorithms run in linear or logarithmic time). The running time of bubble sort in the best case scenario is O(n). This is the case when the input array is already sorted and the implementation detects it, as only one pass is needed. For an optimized implementation, you can refer to the following article.
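The optimized implementation mentioned above can be sketched with a swapped flag: if a full pass performs no swaps, the array is already sorted and we can stop early, which is what gives the O(n) best case on sorted input (the function name is my own):

```python
def bubble_sort_optimized(a):
    n = len(a)
    for i in reversed(range(n - 1)):
        swapped = False
        for j in range(i + 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            # A pass with no swaps means the array is sorted: stop early.
            # On already sorted input this happens after the first pass.
            break
    return a

print(bubble_sort_optimized([7, 1, 8, 3, 6, 4, 5, 9, 2]))
print(bubble_sort_optimized([1, 2, 3, 4, 5]))  # one pass, then stops
```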
Bubble Sort Space Complexity
Bubble sort is an in-place sorting algorithm, so it does not require additional space proportional to the input size. Only one extra variable is needed for swapping, so the space complexity is O(1), which is negligible.
Bubble Sort Stability
Bubble sort is a stable sorting algorithm; it does not change the relative positions of equal elements. The comparison uses the strictly-greater-than operator, which never triggers a swap when two elements are equal, so duplicates keep their original order.
Bubble Sort Loop Invariant
The concept of a loop invariant is used to prove the correctness of a given algorithm. For detailed information about loop invariants please refer to the following article. For more information about the bubble sort loop invariant please refer to the following article.
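As a rough illustration, bubble sort's invariant is that after each pass, the tail of the array holds the largest elements in their final sorted order. We can assert this inside the sort; this is a sketch for checking correctness, not production code, and the function name is my own:

```python
def bubble_sort_checked(a):
    n = len(a)
    for i in reversed(range(n - 1)):
        for j in range(i + 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
        # Loop invariant: positions i+1 .. n-1 hold the largest
        # elements of the array, already in final sorted order
        assert a[i + 1:] == sorted(a)[i + 1:]
    return a

print(bubble_sort_checked([7, 1, 8, 3, 6, 4, 5, 9, 2]))
```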
This is a work in progress article and not finished yet. I will be updating the post about other popular sorting algorithms so stay tuned. Please use the comments section below for questions, corrections or feedback. Thanks for reading.