Rule-based systems, grammars, and Markov chains

Lesson 1 - Rule-based systems

Both formal and informal rule-based systems (prediction systems, substitution systems, or re-writing systems) have been used to guide behavior and thought for millennia. Formal rule-based systems are ubiquitous in computer science.

Types of rule-based systems

  • Generative grammars
  • L-systems
  • Shape grammars
  • Automaton
    • Transition networks
    • Augmented transition networks
    • Petri nets
  • Markov chains
  • Hidden Markov chains

Lesson 2 - Generative grammars

Rule-based systems have been used for centuries to describe the syntax of natural languages—a formal grammar.

Generative grammar for syntax

Human languages have three dimensions: syntax, semantics, and pragmatics. We can formalize the syntax of a language using a grammar and, with a formal grammar, can test if a given sentence is part of the language or not (syntactically correct) and generate new sentences using the rules.

What is a rule?

Generative grammars are made of rewrite rules: rules in which the left side of the rule is rewritten by the right side and where the sides are expressions composed of terminal symbols (concrete elements such as nouns, verbs, adjectives, etc.) and nonterminal symbols (lexical categories such as sentence, verbal phrase, nominal phrase, noun, verb, etc.).

Simple English sentence grammar

G forms a grammar:

  • G = [V, VT, R, S]

V is a set of nonterminal symbols:

  • S - sentence
  • VP - verb phrase
  • NP - noun phrase
  • V - verb
  • PP - prepositional phrase
  • AP - adjective phrase
  • Adv - adverb
  • P - preposition
  • Det - determinant
  • N - noun

R is a set of rules where parenthesis indicate optional elements and an initial symbol is a nonterminal symbol:

  • S → NP VP
  • VP → V (NP) (PP)
  • AP → (Adv) A (PP)
  • PP → P NP
  • NP → (Det) (AP) N (PP)

VT is a set of terminal symbols:

  • VT = {“Sally”, “Bob”, “dog”, “cat”, “house”, “a”, “the”, “follows”, “likes”, “knows”, etc. }

S, some of the nonterminal symbols can be substituted by terminal symbols:

  • N → “Sally”, “Bob”, “dog”, “cat”, “house”
  • Det → “a”, “the”
  • V → “follows”, “likes”, “knows”
  • etc.

A syntax tree can be constructed using the rules of the grammar to test if a given sentence belongs to the language or to generate new ones. It should be noted that this method can result in sentences that are grammatically correct, but meaningless such as “the house knows a very big Bob.”

Lesson 3 - The Chomsky hierarchy

Typology of generative grammars for formal languages. In decreasing order of generative capacity:

Grammar Restrictions Formal language Automaton Generative capacity Complexity Expressivity
Type-0 grammar (unrestricted grammars) No restriction on either side of the production rules Recursively enumerable or partially decidable Non-deterministic Turing machine Very high Undecidable The most expressive but too complex to be practical
Type-1 grammar (context sensitive grammar) The number of symbols on the right must be greater than or equal to the number of symbols on the right Context sensitive Linear-bounded automaton High Exponential (All languages of type-1 and above are decidable) Expressive but too complex to be practical in most cases
Type-2 grammar (context-free grammar) The left hand side of the production rules must consist of one single nonterminal symbol Context-free Pushdown automaton Medium Polynomial
Type-3 grammar (regular grammar) The left hand side must consist of one single nonterminal symbol and right is a terminal followed by a nonterminal (right-linear) OR a terminal preceded by a nonterminal (left-linear) Regular Deterministic finite automaton or non-deterministic finite automaton (and Markov models) Low Linear

Determinism and indeterminism

Choice between several rules. Can be managed as a probabilistic grammar.

  • R1: S = NP VP NP - p(R1)=0.1, rare
  • R2: S = NP VP - p(R2)=0.3, occasional
  • R3: S = NP - p(R3)=0.6, common

Lesson 4 - Grammars in generative art and computational creativity

Grammars are powerful and expressive formalism that can elicit complex structures and recurrence and are good at capturing hierarchical structures. They are human readable and allow us to encode determinism and indeterminism. They are precise descriptors of the knowledge involved in the generative process. Generative grammars are limited in that their structures are often rigid and the way they encode syntax is difficult to extend to other domains with the same formalism.

Grammars are typically used alongside other modules, not alone, and the use of grammars for natural language generation exceeds mere creative or artistic purposes.

Generative grammars in music

Grammatical models have been used for the description and analysis of music for centuries but have only been utilized as generative tools since the advent of computers. With systems that learn and achieve generative tasks—like style imitation—we are shifting towards computer-assisted creativity tools.

Generative grammars in literature and poetry

Approaches to automatic text generation:

  • Combinatorial - putting together existing pieces of text
  • Procedural - creation of new text from a system’s encoded knowledge

Generated text can be linear or nonlinear.

Lesson 5 - Transition networks

Transition networks are a set of finite state automata. Graphically, nodes are states and edges are transitions. Each automation represents a nonterminal symbol and each automation produces a terminal or a nonterminal symbol.

Augmented transition networks

Extended to allow more instructions to be associated with the transitions via jumps, conditionals, and subprograms, augmented transition networks have the expressibility of type-0 grammars but are more practical to manage due to their typographical representation.

Style imitation

Given a set of artifacts representing a given style (S), a style imitation system will generate new artifacts that an unbiased observer would classify as belonging to style S.

Lesson 6 - L-systems

Lindenmayer systems (L-systems) were developed by Aristid Lindenmayer to formalize the growth process of plants. They are parallel rewriting systems—all of the production rules are applied simultaneously. There are no distinctions between terminal and nonterminal nodes; all symbols correspond to an action.

L-systems give the user power over the morphogenesis of the output’s development per iteration. They are a family of powerful rule-based rewriting systems, simple, and readable. They are high in expressivity and computationally cheap. Currently however, there are no good learning algorithms.

An L-system is a triplet (v, w, P) where:

  • v is the alphabet (a finite set of symbols)
  • w is the initiator (seed, axiom, etc.) where w is part of v+
  • P is a finite set of production rules where the predecessor belongs to v and the successor belongs to v*

Using only one rule, L-systems can model self-similar structures (fractals) which happen to be the natural pattern present in the growth of plants.


L-systems can be context-free or context-sensitive. In context-sensitive systems, different production rules apply in different contexts:

  • P1: a > b → aba
  • P2: a < b → bab
  • P3: a < b > a → baa

L is the number of symbols considered in the rules, therefore, P1 and P2 are 1L-system rules and P3 is a 2L-system rule.

Stochastic L-systems

While in deterministic L-systems, there is at most one rule per symbol, in a stochastic L-system, there can be many productions for a single symbol. Each rule is assigned a probability, introducing variety in the output.

Parametric L-systems

Parametric L-systems allow the attachment of parameters to symbols. This is useful for generating animated forms or add conditions to prediction rules. Parameters can also be randomized to create more diversity.

Lesson 7 - L-system based art, music, and architecture

  • Shape grammars (design and architecture)
  • L-systems are used less frequently in music than other types of grammars, but still have a history of being mapped to musical events and concepts.

Lesson 8 - Markov models

Markov models are used to model sequences of discrete events. Introduced by Andrei Markov in the early 20th century.


Using a frequency distribution, it is possible to calculate the probability of all notes in a composition (P of X):

  • Xt is a random variable representing the note at time t
  • P(Xt) is the probability distribution of the events represented by the random variable X at time steps 0..t

While sounding better than a uniform distribution, the above model can be improved by modeling transitions via conditional probability distributions that take the context into account.

  • P(Xt|Xt-1) represents the conditional probability distribution.

The Markov assumption is that the future depends only on the present or a limited part of the past (d).

  • d is the order of the Markov model
    • Order 0: probability distribution of the events, P(Xt)
    • Order 1: conditional probability distribution, P(Xt|Xt-1)
    • Order 2: conditional probability distribution, P(Xt|Xt-1,Xt-2)
    • Order d: conditional probability distribution, P(Xt|Xt-1,Xt-d)

Learning algorithm

Construct a transition count table that counts for each possible context the frequency distribution of continuations and divide each count by the number of possible continuations for the context so that each line adds up to 1.


  • Finite state automaton: Each state is a node and the transitions are edges weighted by the transition probabilities
  • Fixed-order Markov models are equivalent to type-2 stochastic grammars i.e. S → C (0.2), S → B (0.5), etc.

Variable order Markov models

Models that predict the conditional probability distribution at any level of depth. Often composed of three components:

  1. Counting - build up the transition table
  2. Smoothing - deal with unseen sequences (zero frequency problem) or dead ends (continuations of lower orders)
  3. Variable length modeling

Approaches to variable length modeling:

  • Transition matrix
  • Probabilistic suffix tree
  • Factor Oracle and Context Tree Weighting method
  • Lempel-Ziv 78 (LZ78) and LZ-MS
  • Prediction by partial match

Lesson 9 - Markov chains in generative art and computational creativity


  • Intuitive and easy to understand
  • Computationally inexpensive


  • “Short memory”
  • Randomness in the output
  • Lack of overall structure
  • Equivalence (transient or recurrent classes)
  • Limited to one-dimensional symbolic sequences
  • Limited to style imitation, parody, and pastiche

Lesson 10 - Hidden Markov models

The state of the system becomes hidden and only accessible through the emission probabilities. Useful in weather and sensor data, speech recognition, gesture recognition, melody accompaniment or harmonization, composition interpretation of a score, etc.


  • The Forward Algorithm - computes the probability that a sequence occurred (assumes known transition and observation probabilities as well as the initial state)
  • The Baum-Welch Algorithm - computes the most likely parameters on the basis of observed sequences
  • The Viterbi Algorithm - calculates the most likely sequence of hidden states (Viterbi path) on the basis of observable sequence


  • Efficient and popular to learn and generate discrete sequences
  • Allow modeling vertical dependencies. HMM is coupled stochastic process
  • Better at respecting structure than regular Markov models


  • Require a good understanding of the domain or task to fine tune the model architecture
  • Requires a large training set