Sorting Algorithms — Basics

Disclaimer: Most of the content below will be cited. If any sentence is not cited, I want to clarify that this article is only for knowledge purpose and I will not be earning any money from it.

Related Story : Sorting Algorithms- Properties/Pros/Cons/Comparisons

Loop Invariant

It is relationship between variables in your program which holds true just before the loop is ever run (referred as establishing the invariant) and is true again at the bottom of the loop (referred as maintaining the invariant)

Stable sorting algorithms:

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted.

  • Sorting “straw” and “spork”, w.r.t 1st character:

With Stable Sort, the Second character remain same (st will come before sp as result) i.e. when sorting with respect to 1st character ”straw” will come before “spork”.

  • Stability is mainly important when we have key value pairs with duplicate keys possible (like people names as keys and their details as values). And we wish to sort these objects by keys

Example

Insertion, Radix Sort, Counting Sort, Merge sort (is one of the most powerful sorting algorithms)

  • Insertion is Stable : If 2 elements in array are at duplicate, their position will not interchange. Only numbers larger will be considered

Running Time:

  • The running time of an algorithm on a particular input is the number of primitive operations or “steps” executed
  • The running time of the algorithm is the sum of running times for each statement executed
    linear function of n : an + b
    quadratic function of n : an² + bn + c

Different Sorting Algorithms:

Insertion Sort Algorithm

  • Bringing all sorted element to left | Unsorted/pending remains on right | e.g.: Sorting via Hand
  • 2 Subsets : LEFT -> Sorted ; RIGHT -> Unsorted subset. In the end, unsorted subset becomes empty
  • During i-th iteration, we insert A[i] into the sorted sequence A[1..i-1]
  • Insertion sort is an efficient algorithm for sorting a small number of elements
  • INSERTION-SORT can take different amounts of time to sort two input sequences of the same size depending on how nearly sorted they already are.
  • In INSERTION-SORT, the best case occurs if the array is already sorted.
  • If the array is in reverse sorted order i.e in decreasing order, INSERTION-SORT gives the worst case results.
  • Average Case: When half the elements are sorted while half not
  • The running time of insertion sort therefore belongs to both Ω(n) and O(n²)

Inversion

  • Inversion count is max when array is reverse sorted
  • It is least i.e. 0 when array is already sorted in increasing order

Divide and Conquer

  • Divide the problem into a number of sub-problems that are smaller instances of the same problem.
  • Conquer the sub-problems by solving them recursively. If the sub-problem sizes are small enough, however, just solve the sub-problems in a straightforward manner.
  • Combine the solutions to the sub-problems into the solution for the original problem.

Merge Sort

Divide: The divide step just computes the middle of the subarray, which takes constant time. Thus, D(n) = Θ(1).
Conquer: We recursively solve two subproblems, each of size n=2, which contributes 2T(n/2) to the running time.
Combine: We have already noted that the MERGE procedure on an n-element subarray takes time Θ(n), and so Combine(n)=Θ(n).

  • Merge Sort is a comparison based sort algorithm.
  • It is stable sorting algorithm
  • Merge sort is preferred for linked list over arrays.
  • MERGE Algorithm : Mergeing L[1 .. n1 + 1] and R[1 .. n2 + 1] via sorting => 0(n) Time Complexity
  • Select the smallest value from the front of each list (excluding values already in the sorted array)
  • When one list becomes empty, copy all values from the remaining array into the sorted array
  • Merge Sort’s running time is Ω(n log n) in the best-case, O(n log n) in the worst-case, and Θ(n log n) in the average-case (when all permutations are equally likely).
  • The space complexity of Merge sort is O(n). This means that this algorithm takes a lot of space and may slower down operations for the last data sets.

Insertion sort vs Merge Sort :

Click here

HeapSort:

It is the slowest of the O(nlogn) sorting algorithms but unlike merge and quick sort it does not require massive recursion or multiple arrays to work.

Quick Sort

Best explainaintion : https://medium.com/karuna-sehgal/a-quick-explanation-of-quick-sort-7d8e2563629b#238d

  • The quick sort is an in-place, divide-and-conquer, massively recursive sort algorithm.
  • It can be said as the faster version of the merge sort.
  • The efficiency of the algorithm is majorly impacted by which element is chosen as the pivot point.
  • The worst-case efficiency of the quick sort is o(n²) when the list is sorted and left most element is chosen as the pivot.
  • As long as the pivot point is chosen randomly, the quick sort has an algorithmic complexity of O(n log(n)).

Partitioning

  • If the partitioning is balanced, the algorithm runs asymptotically as fast as merge.
  • If the partitioning is unbalanced, however, it can run asymptotically as slowly as insertion sort.

Worst-case partitioning:

  • The worst-case behavior for quick-sort occurs when the partitioning routine produces one sub-problem with (n-1) elements and one with 1 element.
  • Assume that this unbalanced partitioning arises in each recursive call. which evaluates to 0(n²)

Best-case partitioning

  • PARTITION produces two subproblems, each of size no more than n/2, since one is of size ⌊n/2⌋ and one of size ⌊n/2⌋ -1.

Average-case Partitioning / Balanced Partitioning:
The running time is O(n lg n) whenever the split has constant proportionality in each (sub)partition like 1:9

Types of Quick Sort:

1. Deterministic Quick Sort:
2. Randomized Quick Sort:
3. Quick Sort with selection:

Quick Sort vs Merge Sort:

Click here

Randomized Quick Sort

  • Recall Randomized Quicksort. Let X be a random variable denoting the running time of it on input array A of size n which is already sorted. Let Y be a random variable denoting the running time of it on input array B of size n which is not sorted. Then, E[X] < E[Y ]. subarray). Hence, E[X] = E[Y ]

Quick Sort with Selection [“Randomized” Algorithm]

  • Using the Select algorithm (that finds k-th smallest element in linear time) as a sub-routine, we can design a comparison-based any comparison-based sorting algorithm would require Ω(n log2 n) comparisons (hence the runtime).

Full-Stack Data Engineer | System Design Enthusiast | ayusharora.me