Both formal and informal rule-based systems (production systems, substitution systems, or rewriting systems) have been used to guide behavior and thought for millennia. Formal rule-based systems are ubiquitous in computer science.
Types of rule-based systems
Rule-based systems have been used for centuries to describe the syntax of natural languages in the form of formal grammars.
Human languages have three dimensions: syntax, semantics, and pragmatics. We can formalize the syntax of a language using a grammar; with a formal grammar, we can test whether a given sentence belongs to the language (is syntactically correct) and generate new sentences using the rules.
Generative grammars are made of rewrite rules: rules in which the left side is rewritten as the right side, and where the sides are expressions composed of terminal symbols (concrete elements, i.e., actual words such as “house” or “knows”) and nonterminal symbols (syntactic categories such as sentence, verb phrase, noun phrase, noun, verb, etc.).
A grammar G is defined by a quadruple (V, VT, R, S) where:
- V is a set of nonterminal symbols
- VT is a set of terminal symbols
- R is a set of rewrite rules, in which parentheses indicate optional elements
- S is the initial symbol, a nonterminal; by applying the rules, nonterminal symbols are substituted until only terminal symbols remain
A syntax tree can be constructed using the rules of the grammar to test whether a given sentence belongs to the language, or to generate new sentences, as in the sketch below. Note that this method can produce sentences that are grammatically correct but meaningless, such as “the house knows a very big Bob.”
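As a sketch of the generation process, here is a minimal Python generator over a toy grammar; all rules and vocabulary are invented for illustration, and each nonterminal is expanded by picking one of its rewrite rules at random until only terminal symbols remain:

```python
import random

# Toy context-free grammar (hypothetical rules and vocabulary).
# Nonterminals are uppercase keys; anything without a rule is a terminal.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["DET", "N"], ["DET", "AP", "N"]],
    "VP":  [["V", "NP"]],
    "AP":  [["ADJ"], ["ADV", "AP"]],        # recursion allows "very very big"
    "DET": [["the"], ["a"]],
    "N":   [["house"], ["dog"], ["Bob"]],
    "V":   [["knows"], ["sees"]],
    "ADJ": [["big"], ["small"]],
    "ADV": [["very"]],
}

def generate(symbol="S"):
    """Expand a symbol by applying one of its rules, chosen at random."""
    if symbol not in GRAMMAR:                 # terminal: emit as-is
        return [symbol]
    rule = random.choice(GRAMMAR[symbol])
    return [word for part in rule for word in generate(part)]

print(" ".join(generate()))  # e.g. "the house knows a very big Bob"
```

Because the rules only constrain syntax, runs of this sketch produce exactly the kind of grammatical-but-meaningless output described above.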
Typology of generative grammars for formal languages. In decreasing order of generative capacity:
Grammar | Restrictions | Formal language | Automaton | Generative capacity | Complexity | Expressivity |
---|---|---|---|---|---|---|
Type-0 grammar (unrestricted grammar) | No restriction on either side of the production rules | Recursively enumerable (partially decidable) | Non-deterministic Turing machine | Very high | Undecidable | The most expressive, but too complex to be practical |
Type-1 grammar (context-sensitive grammar) | The number of symbols on the right-hand side must be greater than or equal to the number of symbols on the left-hand side | Context-sensitive | Linear-bounded automaton | High | Exponential (languages of types 1, 2, and 3 are all decidable) | Expressive, but too complex to be practical in most cases |
Type-2 grammar (context-free grammar) | The left-hand side of each production rule must consist of a single nonterminal symbol | Context-free | Pushdown automaton | Medium | Polynomial | |
Type-3 grammar (regular grammar) | The left-hand side must consist of a single nonterminal symbol, and the right-hand side of a terminal followed by at most one nonterminal (right-linear) or preceded by at most one nonterminal (left-linear) | Regular | Deterministic or non-deterministic finite automaton (and Markov models) | Low | Linear | |
When several rules can apply to the same symbol, the choice between them can be managed by attaching a probability to each rule (a probabilistic grammar), as in the sketch below.
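Continuing the sketch above, one hedged way to implement this is to attach a weight to each rule and replace the uniform random.choice with a weighted draw (the weights below are assumptions for illustration):

```python
import random

# Probabilistic rule choice: each symbol maps to (rules, probabilities).
WEIGHTED_RULES = {
    "AP": ([["ADJ"], ["ADV", "AP"]], [0.7, 0.3]),  # assumed weights
}

def choose_rule(symbol):
    """Pick a rule for the symbol according to the rule probabilities."""
    rules, probs = WEIGHTED_RULES[symbol]
    return random.choices(rules, weights=probs, k=1)[0]
```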
Grammars are a powerful and expressive formalism that can describe complex structures and recursion, and they are good at capturing hierarchical structure. They are human-readable and allow us to encode both determinism and indeterminism. They are precise descriptors of the knowledge involved in the generative process. Generative grammars are limited in that their structures are often rigid, and the way they encode syntax is difficult to extend to other domains within the same formalism.
Grammars are typically used alongside other modules rather than alone, and the use of grammars for natural language generation extends beyond creative or artistic purposes.
Grammatical models have been used for the description and analysis of music for centuries but have only been utilized as generative tools since the advent of computers. With systems that learn and achieve generative tasks—like style imitation—we are shifting towards computer-assisted creativity tools.
Approaches to automatic text generation:
Generated text can be linear or nonlinear.
Transition networks are sets of finite-state automata. Graphically, nodes are states and edges are transitions. Each automaton represents a nonterminal symbol, and each automaton produces terminal or nonterminal symbols.
Extended to allow more instructions to be associated with the transitions via jumps, conditionals, and subprograms, augmented transition networks have the expressive power of type-0 grammars but are more practical to manage thanks to their graphical representation.
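A minimal sketch of a plain (non-augmented) transition network; the networks, states, and vocabulary are hypothetical. Each nonterminal owns a small automaton, and following an edge either emits a terminal or recursively runs the automaton of another nonterminal:

```python
import random

# Each automaton: state -> list of (edge label, next state); None = accept.
NETWORKS = {
    "S":  {0: [("NP", 1)], 1: [("VP", None)]},
    "NP": {0: [("the", 1), ("a", 1)], 1: [("house", None), ("dog", None)]},
    "VP": {0: [("knows", 1), ("sees", 1)], 1: [("NP", None)]},
}

def run(name):
    """Traverse one automaton, expanding nonterminal edges recursively."""
    out, state = [], 0
    while state is not None:
        label, state = random.choice(NETWORKS[name][state])
        out += run(label) if label in NETWORKS else [label]
    return out

print(" ".join(run("S")))  # e.g. "the dog sees a house"
```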
Given a set of artifacts representing a given style (S), a style imitation system will generate new artifacts that an unbiased observer would classify as belonging to style S.
Lindenmayer systems (L-systems) were developed by Aristid Lindenmayer to formalize the growth process of plants. They are parallel rewriting systems: all of the production rules are applied simultaneously. There is no distinction between terminal and nonterminal symbols; every symbol corresponds to an action.
L-systems give the user control over the morphogenesis of the output at each iteration of its development. They are a family of powerful rule-based rewriting systems that are simple and readable, high in expressivity, and computationally cheap. Currently, however, there are no good algorithms for learning L-systems from examples.
An L-system is a triplet (V, ω, P) where V is the alphabet (the set of symbols), ω is the axiom (the initial string), and P is the set of production rules.
Using only one rule, L-systems can model self-similar structures (fractals), which happen to be the natural pattern present in the growth of plants.
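A minimal sketch of parallel rewriting with a single Koch-curve-style rule (here + and - are turtle-graphics turns, left untouched because they have no production rule):

```python
# Deterministic L-system: one rule is enough to grow a self-similar curve.
RULES = {"F": "F+F-F-F+F"}   # Koch-curve-style rule
AXIOM = "F"

def step(s):
    """Apply all production rules in parallel, in one pass over the string."""
    return "".join(RULES.get(c, c) for c in s)

s = AXIOM
for _ in range(3):
    s = step(s)
print(s)  # every F has been replaced by the same motif, at every scale
```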
L-systems can be context-free or context-sensitive. In context-sensitive systems, different production rules apply in different contexts:
Here L is the number of context symbols considered in the rules; therefore, P1 and P2 are 1L-system rules and P3 is a 2L-system rule.
While a deterministic L-system has at most one rule per symbol, a stochastic L-system can have several productions for a single symbol. Each rule is assigned a probability, introducing variety into the output.
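A hedged sketch of a stochastic L-system; the branching productions and their probabilities are assumptions for illustration:

```python
import random

# Each symbol maps to a list of (production, probability) pairs.
RULES = {"F": [("F[+F]F", 0.5), ("F[-F]F", 0.3), ("FF", 0.2)]}

def step(s):
    out = []
    for c in s:
        if c in RULES:
            prods, probs = zip(*RULES[c])
            out.append(random.choices(prods, weights=probs, k=1)[0])
        else:
            out.append(c)   # symbols without rules are copied unchanged
    return "".join(out)

s = "F"
for _ in range(3):
    s = step(s)
print(s)  # a different branching structure on every run
```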
Parametric L-systems allow the attachment of parameters to symbols. This is useful for generating animated forms or for adding conditions to production rules. Parameters can also be randomized to create more diversity.
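A hedged sketch of a parametric rule (the growth rule and its condition are hypothetical): each symbol carries a parameter, and the rule only fires while its condition on the parameter holds:

```python
# Symbols are (name, parameter) pairs; rules can test the parameter.
def step(symbols):
    out = []
    for name, x in symbols:
        if name == "F" and x < 4:                  # condition on the parameter
            out += [("F", x * 2), ("+", 0), ("F", x * 2)]
        else:
            out.append((name, x))                  # rule no longer applies
    return out

s = [("F", 1.0)]
for _ in range(3):
    s = step(s)
print(s)
```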
Markov models are used to model sequences of discrete events. They were introduced by Andrei Markov in the early 20th century.
Using a frequency distribution, it is possible to calculate the probability P(X) of each note in a composition: P(X = x) = count(x) / N, where N is the total number of notes.
While its output sounds better than that of a uniform distribution, this model can be improved by modeling transitions via conditional probability distributions that take the context into account.
The Markov assumption is that the future depends only on the present or on a limited part of the past (the order d of the model): P(x_t | x_1, …, x_{t-1}) = P(x_t | x_{t-d}, …, x_{t-1}).
Construct a transition count table that records, for each possible context, the frequency distribution of continuations, then divide each count by the total number of continuations observed for that context so that each row sums to 1, as in the sketch below.
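A minimal sketch of this procedure for an order-1 model; the note sequence is made up, and for order d the context would simply be a tuple of the last d notes:

```python
from collections import Counter, defaultdict
import random

notes = ["C", "D", "E", "C", "D", "G", "C", "D", "E", "E"]  # toy data

# 1) Count, for each context, the frequency of each continuation.
counts = defaultdict(Counter)
for context, continuation in zip(notes, notes[1:]):
    counts[context][continuation] += 1

# 2) Normalize each row so that it sums to 1.
table = {ctx: {n: c / sum(row.values()) for n, c in row.items()}
         for ctx, row in counts.items()}

# 3) Generate a new sequence by sampling from the transition table.
seq = ["C"]
for _ in range(8):
    row = table[seq[-1]]
    seq.append(random.choices(list(row), weights=row.values(), k=1)[0])
print(seq)
```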
Variable-order Markov models predict the conditional probability distribution at any level of depth. They are often composed of three components:
Approaches to variable length modeling:
Advantages:
Issues:
In hidden Markov models (HMMs), the state of the system becomes hidden and is only accessible through the emission probabilities. They are useful for weather and sensor data, speech recognition, gesture recognition, melody accompaniment or harmonization, composition, interpretation of a score, etc.
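A minimal sketch of an HMM together with the forward algorithm, which computes the likelihood of an observation sequence; the states, emissions, and probabilities below are toy assumptions:

```python
STATES = ["rainy", "sunny"]
START  = {"rainy": 0.5, "sunny": 0.5}
TRANS  = {"rainy": {"rainy": 0.7, "sunny": 0.3},      # hidden-state dynamics
          "sunny": {"rainy": 0.4, "sunny": 0.6}}
EMIT   = {"rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def forward(obs):
    """Probability of the observations, summed over all hidden state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: EMIT[s][o] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                 for s in STATES}
    return sum(alpha.values())

print(forward(["walk", "shop", "clean"]))  # likelihood of this sequence
```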
Algorithms:
Advantages:
Issues: