Data structure

A data structure is a format for organizing, managing, and storing data so that it can be efficiently accessed or modified. A data structure is a concrete representations of data, whereas abstract data types define the logical form of the data type. Understanding the performance a data structure provides relies upon analysis of the algorithm's computational complexity.

Overview

Data structures are classified into two categories:

  1. linear data structures
  2. non-linear data structures

Linear data structures

A structure where elements are arranged sequentially and attached to their next and previous adjacents are called linear data structures. This type of structure involves a single level and therefore all elements can be traversed in a single run.

Examples of linear data structures include: array, stack, queue, linked list.

Non-linear data structures

A structure where elements are not arranged sequentially are called non-linear data structures. This type of structure involves more than a single level and therefore all elements cannot be traversed in a single run.

Examples of non-linear data structures include: tree, graph.

Terminology

An interface represents the set of operations that a data structure supports. An interface only provides the list of supported operations, type of parameters they can accept, and return type of these operations.

Implementation provides the internal representation of a data structure. Implementation also provides the definition of the algorithms used in the operations of the data structure.

Static array

A static array is a fixed length container of contiguous chunks of memory containing n elements indexable—each position can be referenced with a number—from range [0, n-1]. The only way to access elements in an array is by their index. Array indexing is zero-based, meaning the first element is found at position zero.

Where are static arrays used?

  • Storing and accessing sequential data
  • Used by IO routines as buffers
  • Lookup tables and inverse lookup tables
  • Used to return multiple values from a function
  • Used in dynamic programming to cache answers to sub-problems

Complexity analysis of static arrays

Static array
Access O(1)O(1)
Search O(n)O(n)
Insertion N/A
Appending N/A
Deletion N/A

Dynamic array

A dynamic array differs from a static array in that it does not have a fixed length. Rather, a dynamic array can grow and shrink in size as needed.

Dynamic array implementation

One way of implementing a dynamic array is to use a static array.

  1. Create a static array with an initial capacity.
  2. Add elements to the underlying static array, keeping track of the number of elements.
  3. If adding another element will exceed the capacity, then create a new static array with twice the capacity and copy the original elements into it.

Complexity analysis of dynamic arrays

Dynamic array
Access O(1)O(1)
Search O(n)O(n)
Insertion O(n)O(n)
Appending O(1)O(1)
Deletion O(n)O(n)

Linked list

A linked list is a dynamic data structure comprised of nodes pointing to other nodes.

Where are linked lists used?

  • Used in many list, queue, and stack implementations
  • Great for creating circular lists
  • Can easily model real world objects such as trains
  • Used in separate chaining
  • Often used in the implementation of adjacency lists for graphs

Terminology

  • Head: the first node in a linked list
  • Tail: the last node in a linked list
  • Pointer: reference to another node
  • Node: an object containing data and pointer(s)

Singly vs doubly linked lists

Singly linked lists only hold a reference to the next node. Doubly linked lists hold a reference to the next and previous nodes and can therefore be traversed in both directions.

Pros Cons
Singly linked Uses less memory, simpler implementation Cannot easily access previous elements
Doubly linked Can be traversed backwards Takes 2x memory

Complexity analysis of linked lists

Singly linked Doubly Linked
Search O(n)O(n) O(n)O(n)
Insert at head O(1)O(1) O(1)O(1)
Insert at tail O(1)O(1) O(1)O(1)
Remove at head O(1)O(1) O(1)O(1)
Remove at tail O(n)O(n) O(1)O(1)
Remove in middle O(n)O(n) O(n)O(n)

Stack

A stack is a one-ended linear data structure which models a real world stack by having two primary operations: push and pop. A stack is a last in first out (LIFO) data structure.

Where are stacks used?

  • Used by undo mechanisms in text editors.
  • Used in compiler syntax checking for matching brackets and braces.
  • Used behind the scenes to support recursion by keeping track of previous function calls.
  • Can be used to do a Depth First Search on a graph.

Complexity analysis of stacks

Stack
Pushing O(1)O(1)
Popping O(1)O(1)
Peeking O(1)O(1)
Searching O(n)O(n)
Size O(1)O(1)

Stack implementation

One way of implementing a stack is by using a singly linked list:

  1. To initialize, point the head to a null node.
  2. To push, insert a new node before the head and adjust the head pointer to the new node.
  3. To pop, move the head pointer to the next node and deallocate the last node.

Queue

A queue is a linear data structure which models real world queues by having two primary operations: enqueue and dequeue. A queue is a first in first out (FIFO) data structure.

Where are queues used?

  • Any waiting line models a queue.
  • Used to keep track of the x most recently added elements.
  • First come first serve web server request management.
  • Breadth first search (BFS) graph traversal.

Terminology

  • Enqueue: add an element to the end of the queue
  • Dequeue: remove an element from the front of the queue

There is no consistent terminology for inserting and removing elements from a queue. Enqueue is sometimes called adding or offering; dequeue is sometimes called removing or polling.

Complexity analysis of queues

Queue
Enqueue O(1)O(1)
Dequeue O(1)O(1)
Peeking O(1)O(1)
Contains O(n)O(n)
Removal O(n)O(n)
is Empty O(1)O(1)

Prioirty queue

A priority queue is an abstract data type that operates similar to a normal queue except that each element has a certain priority. The priority of the elements determine the order in which elements are removed. Priority queues only support comparable data, meaning the data must be able to be ordered in some way from least to greatest or greatest to least so that it is possible to assign relative priorities to each element.

Where are priority queues used?

  • Anytime you need to dynamically fetch the 'next best' or 'next worse' element.
  • Used in certain implementations of Dijkstra's Shortest Path algorithm.
  • Used in Huffman coding which is often used for lossless data compression.
  • Best first search algorithms such as A*.
  • Used by minimum spanning tree algorithms.

Complexity analysis of priority queues with binary heap

Priority queue
Binary heap construction O(n)O(n)
Polling O(log(n))O(log(n))
Peeking O(1)O(1)
Adding O(log(n))O(log(n))
Naive removing O(n)O(n)
Removing (hash table) O(log(n))O(log(n))
Naive contains O(n)O(n)
Contains (hash table) O(1)O(1)

Inverting priority queues

Problem: Often the standard library of most programming languages only provide a min PQ which sorts by smallest elements first, but sometimes we need a max PQ. Since elements are comparable, they implement some sort of comparable interface which we can simply negate to achieve a max heap.

Map

Map implementations

Hash map

Tree map

A tree represents data as parent nodes linked to child nodes by edges.

Binary tree

A binary tree is a tree structure that adheres to the following properties:

  • The value of the left sub-tree is less than the value of its parent node
  • The value of the right sub-tree is greater than or equal to the value of its parent node

Graph

A graph represents data as nodes and edges.

Resources