Paul Lemelle

View Original

Binary Search & Sorting

Using linear search O(n) is an inefficient way to search for a specific item because its worse case is it must search through all of the elements in a collection. In additionally, the performance of using linear search will increase as the number of data in the collection increases. For example, if the collection has a billion elements, the worse case means linear search would need a billion operations to find the specific element (assuming the element is the last item in the collection). For any computer, it would probably take longer than the idle performance.

Binary Search is a more efficient algorithm than linear search as it uses a concept called ‘Divide and Conquer,’ which is basically just creating subset(s) of problems to solve. In the case of Binary Search, the subproblem would be to cut the N elements in the collection in half. It divides the N elements in half by first ordering the collection then using a high/low search pattern to filter the results. For example, searching for the number 24 in a collection of 100 ordered numbers would simply start at the middle index (50) then eliminate half of the elements in the collection because it’s over the search value. After eliminating half the N values, the Binary Search that will be shown in the example uses a while loop to repeat the ‘Divide & Conquer’ process.

In Big O notation terms, Binary Search is considered O(log n). The log in O(log n) is for logarithm, which is just the inverse of an exponent. For example, logarithm 2^3 (8) would be 3.

Binary Search requires a sorted collection, and thankfully most programming languages (including Swift) have a sort method built in, but there are times that you may want to choose a different sorting algorithm. Selection Sort & Quick Sort take two different approaches to the sort problem and how the performance can vary between the two.

Selection Sort:

Selection Sort compares elements of a collection from the smallest to the largest element. It looks at the first index of the collection then compares that index sequentially with all elements in the collection. For example, if it finds that index 0 is higher than index 20 then it has to move index 20 to position 0. Selection Sort repeats this process until the entire collection is sorted. Selection Sort is considered to be O(n x n) because 1) it must search through all elements of the collection, and 2) move the elements through the collection. So worse case, using a Deletion Sort would be twice as long as doing a linear search.

Quick Sort:

QuickSort is considered a ‘Divide & Conquer’ algorithm and is considered O(n log n) in Big-O notation on average. What does O(n log n) mean in relation to using QuickSort? QuickSort picks an element from a collection, O(n… ) as the collection grows, so does the number of operations required to the collection. The process of picking an element is called a pivot, and choosing the wrong pivot will impact the performance of sorting the collection. After selecting a pivot, QuickSort finds elements that are smaller than the pivot and elements that are larger than the pivot, this process allows for paritioning / dividing the problem into three parts (1: elements smaller than the pivot, 2: the pivot itself, and 3: elements larger than the pivot). As the algorithm creates sub-arrays, the problem that its solving decreases (i.e. …log n). QuickSort does this by recursively calling itself until its sub-arrays are all sorted.

The Project:

Picking up from the Set branch, I used Binary Search & Linear search algorithms to compare finding a random element in the dataset. I included a variable to print out the number of iterations each algorithm used to find the specific element, which should start to give you an idea of why algorithms are important.

But in case it does not…. I also compared Selection Sort & QuickSort algorihtm to sort the dataset, and this turned out be a great example of choosing the wrong algorithm can slow up a fast computer.

GitHub Link.