Algorithms: Recurrence Relations, Solving Recurrences, Order of Growth, Hashing Notes
Disclaimer: Most of the content below is cited from other sources. Any uncited material is included for knowledge purposes only; I do not earn any money from this article.
Related Story: Sorting Algorithms- Properties/Pros/Cons/Comparisons
Growth Relation
[ <<< symbol means "grows much, much slower than" ]
Constant <<< Poly-logarithmic <<< Polynomial <<< Exponential (c^n) <<< Factorial (n!) <<< n^n
Example:
1 < log(n) < n^(1/2) < n (i.e., n¹) < n·log(n) ~ log(n!) < n² < n³ <<< 2^n < 3^n <<< n! <<< n^n
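A quick, non-rigorous sanity check of this ordering in Python (my own snippet, not from any cited source; it evaluates each function at a single point n = 20, and keep in mind that the ordering is only guaranteed asymptotically):

```python
import math

# Evaluate each growth function at n = 20 and print them sorted by value.
n = 20
growth = {
    "1":       1,
    "log n":   math.log2(n),
    "sqrt n":  math.sqrt(n),
    "n":       n,
    "log(n!)": math.log2(math.factorial(n)),
    "n log n": n * math.log2(n),
    "n^2":     n**2,
    "n^3":     n**3,
    "2^n":     2**n,
    "3^n":     3**n,
    "n!":      math.factorial(n),
    "n^n":     n**n,
}
for name, value in sorted(growth.items(), key=lambda kv: kv[1]):
    print(f"{name:>8} = {value:.3e}")
```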
Recurrence Equation/ Recurrence/ Recurrence Relation
- A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs.
- When an algorithm contains a recursive call to itself, we can often describe its running time by a recurrence equation, which describes the overall running time on a problem of size n in terms of the running time on smaller inputs.
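For example, merge sort splits a size-n problem into two size-n/2 subproblems and merges the results in linear time, which gives the classic recurrence
T(n) = 2·T(n/2) + Θ(n), with T(1) = Θ(1), whose solution is T(n) = Θ(n log n).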
Solving Recurrences:
Recursion Tree Method
- The recursion-tree method converts the recurrence into a tree whose nodes represent the costs incurred at various levels of the recursion.
- We use techniques for bounding summations to solve the recurrence.
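Example (recursion tree for T(n) = 2·T(n/2) + c·n): the root costs c·n; level 1 has 2 nodes each costing c·n/2, again c·n in total; in fact every level costs c·n, and there are about log₂(n) + 1 levels, so the total cost is c·n·log₂(n) + c·n = Θ(n log n).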
Master Method/Theorem
The master theorem for DIVIDE-AND-CONQUER RECURRENCES provides an asymptotic bound for any recurrence that falls under one of its three cases.
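In the standard (CLRS-style) statement, for T(n) = a·T(n/b) + f(n) with a ≥ 1 and b > 1, compare f(n) against n^(log_b a):
- Case 1: if f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = Θ(n^(log_b a))
- Case 2: if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · log n)
- Case 3: if f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some c < 1 and all sufficiently large n, then T(n) = Θ(f(n))
Merge sort, with a = 2, b = 2, f(n) = Θ(n), has log_b a = 1, so Case 2 gives Θ(n log n).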
Substitution Method
- It is used to establish either upper or lower bounds on a recurrence
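Worked example (the standard textbook one): to show T(n) = 2·T(⌊n/2⌋) + n is O(n log n), guess T(n) ≤ c·n·log₂(n) and substitute the guess into the recurrence:
T(n) ≤ 2·(c·(n/2)·log₂(n/2)) + n = c·n·log₂(n) − c·n + n ≤ c·n·log₂(n), which holds for any c ≥ 1, so induction confirms the guess.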
Substitution Method vs Master Method vs Recurrence equation
Similarity
- All 3 are used for analyzing divide-and-conquer algorithms
Differences
- Recurrence equation: It describes the overall running time on a problem of size n in terms of the running time on smaller inputs.
- In the substitution method, we guess a bound and then use mathematical induction to prove our guess correct.
- The master method provides bounds for recurrences of the form
T(n) = a·T(n/b) + f(n)
Why choose Iteration over Recursion?
• Recursive calls can cause a stack overflow, a problem iteration avoids
• Each recursive call consumes stack space for a new stack frame, unlike iteration
• Each recursive call incurs function-call overhead (setting up and tearing down the frame), which costs extra CPU time (see the sketch after this list)
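A minimal Python illustration of the stack-depth problem (function names are my own; CPython's default recursion limit is roughly 1000 frames):

```python
def sum_recursive(n):
    """Sum 1..n recursively: every call adds a new stack frame."""
    if n == 0:
        return 0
    return n + sum_recursive(n - 1)

def sum_iterative(n):
    """Sum 1..n iteratively: constant stack space."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

print(sum_iterative(100_000))      # fine: 5000050000
try:
    print(sum_recursive(100_000))  # exceeds the default recursion limit
except RecursionError:
    print("RecursionError: too many stack frames")
```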
Order of Growth / Rate of Growth
We usually consider one algorithm to be more efficient than another if its worst-case running time has a lower order of growth
Example
A Θ(n^1.8) algorithm [lower order of growth] is more efficient than a Θ(n²) algorithm
- The same amount of work takes only about n^1.8 steps for algorithm 1, whereas algorithm 2 requires n² steps
- For large enough inputs, the Θ(n^1.8) algorithm will run more quickly in the worst case than the Θ(n²) algorithm; at n = 1,000,000, for instance, n^1.8 ≈ 6.3 × 10¹⁰ versus n² = 10¹², roughly 16× more work
Lower order of growth means faster for large inputs
Analyzing algorithms
Asymptotic Notation:
Input sizes are large enough to make the order of growth of the running time relevant
Θ Notation (Asymptotically Tight Bound) [often associated with average-case analysis]
For a given function g(n), we denote by Θ(g(n)) the set of functions
Θ(g(n)) = { f(n) : there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀ }
The “average case” is often roughly as bad as the worst case.
f(n) ∈ Θ(g(n)) means that g(n) is an asymptotically tight bound for f(n)
An asymptotically positive function is one that is positive for all sufficiently large n (likewise, asymptotically nonnegative: nonnegative for all sufficiently large n)
Best Case / Lower Bound [ Ω (≥) (Asymptotic Lower Bound) ; ω (>) ]
f(n) lies on or above c·g(n) on the graph for all n ≥ n₀
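Mirroring the Θ definition above, the standard set definition is:
Ω(g(n)) = { f(n) : there exist positive constants c and n₀ such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n₀ }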
Worst Case [ O (≤) (Asymptotic Upper Bound) ; o (<) ]
- The worst-case running time of an algorithm gives us an upper bound on the running time for any input
f(n) lies on or below c·g(n) on the graph for all n ≥ n₀
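The corresponding set definition for the upper bound:
O(g(n)) = { f(n) : there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n₀ }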
Average-case vs Best-case vs Worst-case [these are properties of the input, not of the notation]
- Worst-case run time means that you are feeding the worst possible input (of that size) into your algorithm
- Best-case run time means that you are feeding the best possible input (of that size) into your algorithm
Big O vs Big Omega vs Big Theta
These refer to ways of bounding complicated functions by simpler ones
Properties of Asymptotic Notations
- Reflexivity: f(n) = O(f(n)); likewise f(n) = Ω(f(n)) and f(n) = Θ(f(n))
- Symmetry: f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n))
- Transitivity: f(n) = O(g(n)) and g(n) = O(h(n)) ⇒ f(n) = O(h(n))
- Transpose symmetry: f(n) = O(g(n)) if and only if g(n) = Ω(f(n))
General Recurrence Relation for Divide and Conquer based Problems
T(n) = a·T(n/b) + f(n) ⇒ used in the Master Method/Theorem (a = number of subproblems, n/b = subproblem size, f(n) = cost of dividing and combining)
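As a concrete divide-and-conquer instance of this recurrence (a sketch of textbook merge sort, with a = 2, b = 2, f(n) = Θ(n)):

```python
def merge_sort(a):
    """Divide and conquer: T(n) = 2*T(n/2) + Theta(n) => Theta(n log n)."""
    if len(a) <= 1:              # base case: already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])   # a = 2 recursive calls ...
    right = merge_sort(a[mid:])  # ... on subproblems of size n/b = n/2
    # f(n) = Theta(n): merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```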
Dynamic Programming
- A recursive implementation of a dynamic-programming algorithm without memoization recomputes the same subproblems over and over, so in real-world applications an iterative (bottom-up) implementation, or a recursive one with memoization, is generally the better choice; tail-call optimization can reduce recursion overhead, but many runtimes (e.g., CPython) do not perform it. See the Fibonacci sketch below.
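A minimal sketch contrasting the three styles on Fibonacci (a standard illustration of my own, not from the cited sources):

```python
from functools import lru_cache

def fib_naive(n):
    """Plain recursion, no memoization: recomputes subproblems, exponential time."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Top-down DP (memoized recursion): each subproblem solved once, O(n) time."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):
    """Bottom-up DP: O(n) time, O(1) space, no recursion at all."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_memo(90), fib_iter(90))  # both instant; fib_naive(90) would be infeasible
```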
Longest Common Sub-sequence
Sub-sequence
A sub-sequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements
For example, the length of the longest common subsequence (LCS) of “ADBCD” and “DAXBC” is 3 (e.g., “ABC” or “DBC”).
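The standard O(m·n) bottom-up dynamic-programming solution (a sketch; the function name is my own):

```python
def lcs_length(x, y):
    """Classic DP table: dp[i][j] = LCS length of prefixes x[:i] and y[:j]."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one character
    return dp[m][n]

print(lcs_length("ADBCD", "DAXBC"))  # 3
```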
Hashing
Load factor α = n / m, where n is the number of stored keys and m is the number of slots in the table
Hash Tables/ Functions/ Hashing
Hash Function
- Given two different hash functions h1 and h2, it would be better to use h1 (rather than h2) in practice if h1 leads to fewer collisions than h2 for every universe of keys, since searching is faster with fewer collisions
- Suppose we use some hash function h and chaining to implement a hash table, and we INSERT two items (x first, followed by y) into an empty table where h(x.key) = h(y.key). After we insert y, the table still holds a reference to x: with chaining, y is simply added to the linked list in that slot rather than overwriting it (see the sketch after this list)
- For any deterministic algorithm that takes a permutation as an input, let T (n) be its worst-case running time (on input size n) and S(n) be its average-case running time (assuming that each of the n! permutations is equally likely). Then, for large enough n, we always have S(n) = O(T (n)) but S(n) = o(T(n)) may or may not hold.
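A minimal chaining sketch (class and method names are my own, not a production design) showing that both colliding items remain reachable:

```python
class ChainingHashTable:
    """Collision resolution by chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def insert(self, key, value):
        chain = self.slots[hash(key) % len(self.slots)]
        chain.append((key, value))   # appended to the chain; nothing is overwritten

    def search(self, key):
        chain = self.slots[hash(key) % len(self.slots)]
        for k, v in chain:
            if k == key:
                return v
        return None

t = ChainingHashTable(m=1)   # m = 1 forces every key to collide
t.insert("x", 1)
t.insert("y", 2)
print(t.search("x"), t.search("y"))  # 1 2 -- x is still reachable after inserting y
```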
Good Hash Functions:
- Easy to compute: the hash function should not be a heavyweight algorithm in its own right, or its overhead will not be worth it
- Even distribution: keys should be spread uniformly across the table's slots
- Few collisions
Hashing by division and hashing by multiplication VS Universal Hashing
- Hashing by division and hashing by multiplication are heuristic in nature, whereas universal hashing uses randomization to provide provably good performance.
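Sketches of the two heuristic methods (my own snippet; the constant A = (√5 − 1)/2 for the multiplication method follows Knuth's common textbook suggestion):

```python
import math

def hash_division(k, m):
    """Division method: h(k) = k mod m.
    m is best chosen as a prime not too close to an exact power of 2."""
    return k % m

def hash_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication method: h(k) = floor(m * (k*A mod 1)),
    i.e., keep the fractional part of k*A and scale by m."""
    return int(m * ((k * A) % 1))

m = 13
for k in (123456, 123457, 123458):
    print(k, hash_division(k, m), hash_multiplication(k, m))
```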