Arrays: — Paul Lemelle

With software you will use data whether it’s user generated data or data represented within your software. And depending on the need of your software, you will need to plan how the data will scale for the user. And choosing the correct or incorrect data structure will greatly impact the effectiveness of your software. Working with small amounts of data (i.e. 100 names) this is not a problem for modern computers / mobile devices, and you have to figure out how you will access the data within your software.

Arrays:

Arrays are probably one of the most common data structures of any programming language. In short, a data structure is a way for a computer to read & write data. A linear array (simplest form) is an ordered collection of values that are stored contiguously in memory. Reading an element from an array tends to be fast, but problematic when you need to store more elements than originally planned or conversely allocate more memory than required. To select an element in an array, you must know its specific index.

Big-O Notation:

The effectiveness / “speed” of an array depends on the operation that is being performed. For example, searching for a particular element of an unordered array using a linear search algorithm would take N operations based on the number of elements in the array. In other words, the time it takes to find an element would increase as the elements in the array increases. In Big-O Notation this is considered “Oh of N” and is written as O(n) to denote Big-O notation.

Searching an array is just one operation when using an array, Big-O Notation is also used to describe other common operations of an array. Because arrays store data contiguously in memory, it only takes one operation to read an element of an array. For Big-O operations that only take one step, this is considered ‘Constant Time’ and is written as O(1). Constant Time is the most efficient operation of a that is used with a data structure or algorithm.

Using Big-O Notation to measure the operation of ‘inserting’ elements of array depends on where in the array you choose to modify the array. Modifying an array by inserting elements at the end of an array just takes one operation or O(1) time. However, inserting elements at the beginnning or middle of an array takes longer because the array is required to shift all its other elements that is previously stored in the array. The worse case scenario is inserting elements at the beginning of an array requires all the other elements to be shifted by one and its effectiveness would be based on the number of elements in the array. So, one step to ‘insert an element’ and N steps to shift the rest of the data in the array. Likewise, Deleting elements of array also requires elements to be shifted because of how arrays are stored in memory.

The Dataset:

To help illustrate this point, I created a project with a large data set from Kaggle that look at unemployment rates across various cities in the United States over the last few decades. I felt that this is probably more respective of a real world scenario.

The first thing that I had to do was to clean up the data set, which just basically means that I had to ensure that the output was consistent (ie. removing extra spaces or excluding empty data for a particular year) enough to process it. Once the data is in an array, we will need to search for a particular element in the array, which is done by using Simple Search.

The Project:

The project is hosted on my GitHub account with instructions to run it in repositiory. Other data structures & algorithms are implemented in different branches that can be access directly on GitHub or links on this website.

Project link.