HarvardX CS50 Introduction to Computer Science

Course notes for HarvardX CS50 Introduction to Computer Science 2021.

Week 0 - Scratch

Representing information

Binary numbers use a base-2 numeral system where each place represents one bit.

16 8 4 2 1
25 24 23 22 21
  • 1 bit = 0 or 1
  • 1 byte = 8 bits

Bytes represent more complex data, like characters, by referencing standardized encodings. 01000001 is a in ASCII which is an 8 bit encoding. Unicode is a subset of ASCII that uses one byte for the first 128 code points and up to four bytes for other characters.

Algorithms

Step by step process for solving a problem. Comprised of:

  • functions (verbs or actions)
  • conditions (forks in the road)
  • boolean expressions (questions that decide where to go)
  • loops (words that lead to cycles)

Other features

  • variables (stored values)
  • threads (do multiple things at once)
  • events (respond to changes)

What questions rise from executing simple tasks?

  • efficiency
  • design/quality

Week 1 - C

Writing good code: correctness, design, and style.

C (programming language)

Main article c (programming language)

Integer overflow

An incremented value exceeds the available bits. 111 in binary plus one should equal 8, but without adding another bit to hold 1000, the available bits become 000.

Week 2 - Arrays

Compilation

Convert source code to machine code.

  1. preprocessing - text substitution for preprocessor commands #
  2. compiling - convert to intermediate assembly code
  3. assembling - convert assembly to binary
  4. linking - combine separate files into one

Debugging

Tool number one is to utilize diagnostic print statements to display the state of the program.

Tool number two is to utilize a debugger. For example, the GDB: Gnu Debugger allows you to see what is going on ‘inside’ another program while it executes or what another program was doing at the moment it crashed.

Tool number three is to utilize a ‘rubber ducky’. Talking through a problem aloud, to a ‘rubber ducky’, can help make the issue apparent.

Data storage

See c programming language data types.

Random access memory (RAM) is fast but volatile. It is purely electronic—requires electricity to store data—unlike long term memory which does not.

Arrays

See c programming language arrays.

Week 3 - Algorithms

Computational complexity

See computational complexity

Searching

Linear search

for i from 0 to n-1
  if the i'th element is 50
    return true
return false

Binary search

if no items
  return false
if number is middle item
  return true
else if number < middle item
  search left half
else if number > middle item
  search right half

Sorting

Selection sort

for i from 0 to n-1
  find smallest item between i'th item and last item
  swap smallest item with i'th item

Bubble sort

repeat n-1 times
  for i from 0 to n-2
    if i'th and i+1 elements out of order
      swap them
    if no swaps
      quit

Merge sort

if only one item
  quit
else
  sort left half of items
  sort right half of items
  merge sorted halves

Recursion

Recursion is when a function calls itself. Useful when the structure of data is defined by referring to itself.

Algorithm resources

Week 4 - Memory

Hexadecimal

Hexadecimal is base-16; uses 16 digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F.

4096 256 16
163 162 161

Binary 11111111 (255) is equal to 0xFF in hex.

The ambiguous prefix 0x is added to hexadecimal numbers for clarity.

Pointers

See c programming language pointers.

Dynamic memory allocation

We can use pointers to get access to a block of dynamically-allocated memory at runtime. Allocated memory must be returned to the system.

malloc returns a pointer.

// Statically obtain an integer
int x;

// Dynamically obtain an integer
int *px = malloc(sizeof(int));

// Get an integer from the user
int x = GetInt();

// Array of floats on the stack
float stack_array[x];

// Array of floats on the heap
float* heap_array = malloc(x * sizeof(float));

Program memory

  • machine code
  • globals
  • heap
  • stack

Week 5 - Data structures

See data structure.

Array

Arrays occupy contiguous bytes of memory.

  • search: O(1)
  • insert: O(n)

Disadvantage is that arrays must be duplicated to grow in size, making them difficult to resize efficiently.

Linked list

Linked lists are dynamic data structures comprised of nodes pointing to other nodes.

// The name `node` here is similar to a
// function prototype. It is required to
// use `node` inside of this typedef.
typedef struct _node {
  int number;
  struct _node *next;
} node;
  • search: O(n)
  • insert: O(1)

Insertion running time is O(1). Disadvantage is the lack of random access and increased memory usage.

Binary search tree

Binary search trees are recursive data structures where the height of the tree is log n.

typedef struct _node {
  int number;
  struct _node *left;
  struct _node *right;
} node;

The running time of insert on a balanced binary search tree is equivalent to the number of steps it takes to find the location in which the new number belongs—O(log n).

  • search: O(log n)
  • insert: O(1)

Disadvantage is the rapid degeneration of balance.

Hash table

Hash tables are a data structure that can be implemented as an array of linked lists dependent on a hash function. Hashing means to look at some input and produce some numeric output based on some characteristic of the input. A hash function should be deterministic and uniformly distribute data. Collisions need to be handled.

  • search: O(1)
  • insert: O(1)

Disadvantage is increased memory usage.

Trie

Tries are a data structure that is a tree of nodes. Each node is an array of pointers to other nodes.

typedef struct _trie {
  char university[20];
  struct _trie* paths[10];
} trie;
  • search: O(1)
  • insert: O(1)

Disadvantage is increased memory usage.

Abstract data structures

Queue

A queue is a first in first out (FIFO) data structure.

Methods:

  • enqueue
  • dequeue

Stack

A stack is a last in first out (LIFO) data structure.

Methods:

  • push
  • pop

Dictionary

A dictionary is a set of associated key/value pairs.

Week 6 - Python

See python (programming language).

Week 7 - SQL

Databases

See database.

Flat-file database

A flat-file database is a database stored as an unstructured file. i.e. csv, tsv.

Relational database

A relational database stores and provides access to data points that are related to one another represented as tables. Each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data.

SQL

See sql.

  • Stores data in a binary file
  • Requires a user facing program to manage

A primary key uniquely identifies each row. A foreign key references the primary key of another table.

Week 8 - HTML, CSS, & JavaScript

Internet

See Internet Protocol Suite, IP address, and DNS.

TCP/IP

  • Internet Protocol (IP) standardizes how computers address each other and is responsible for relaying packets of data across network boundaries.
  • Transmission Control Protocol (TCP) provides reliable, ordered, and error-checked delivery of a stream of bytes between applications running on hosts communicating via an IP network.
  • Domain Name System (DNS) converts human readable domain names to numerical IP addresses.

HTTP

Hypertext Transfer Protocol (HTTP) is an application layer protocol in the internet protocol suite model for distributed, collaborative, hypermedia information systems.

Some supported commands:

  • GET - Request information from a server
  • POST - Send information to a server

Web languages

See HTML, CSS, and JavaScript.

Week 9 - Flask

Flask

Building web applications with Flask.

Cookies

A cookie is a packet of data stored in device memory that allows a website to recognise users when they return to a site often allowing access to different content or services.

Artificial intelligence

Search algorithms

  • Depth-first search
  • Breadth-first search
  • Greedy best-first search
  • A* search

Machine learning

Reinforcement learning

Algorithms manage the exploration vs. exploitation trade-off. One such method is epsilon-greedy, where 0<ϵ<10 < \epsilon < 1 is a parameter controlling the amount of exploration vs. exploitation.

Genetic algorithms

A genetic algorithm is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms.

Neural networks

Artificial neural networks are comprised of a node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.