# Automata Notes

- Introduction
- Deterministic Finite Automata
- Chomsky Hierarchy
- Production Rules
- Non-Determinism
- Pushdown Automata
- Turing Machines
- Computable Numbers
- The Halting Problem
- Rice’s Theorem
- Busy Beaver Function
- Church-Turing Thesis
- Bibliography

## Introduction

- Automata are models of computation. Their input is a string of symbols, from a finite input alphabet, \(\Sigma\). Examples of alphabets include ASCII, Unicode and, most relevantly, the binary alphabet \(\{0, 1\}\). Formally, an alphabet is a set of symbols.
- \(\Sigma^*\), where \(\Sigma\) is a finite alphabet, denotes the set of all possible strings that can be made from \(\Sigma\). \(\epsilon\) denotes the empty string, which in most programming languages is written `""`. For instance, \(\{0, 1\}^*\) refers to \(\{\epsilon, 0, 1, 00, 01, 10, 11, 000, ...\}\).
- After receiving an input string, an automaton is said to either accept or reject it. Automata accept or reject purely on the basis of their input; they are functions of their input. In functional programming, they would be called pure functions. The set of strings that an automaton \(M\) accepts is denoted by \(L_M\), where \(L_M \subseteq \Sigma^*\).
- Because the strings of all other finite alphabets can be encoded using the binary alphabet, we shall mostly consider automata that take strings of 0s and 1s. Note that \(\Sigma\) is finite, but provided it is non-empty, the cardinality of \(\Sigma^*\) is infinite. However, in this case, it is countably infinite (since we can describe a generator function that will eventually list all strings in \(\Sigma^*\)). Hence the cardinality of \(L_M\) is at most countably infinite. However, note still that \(\mathcal{P}(\Sigma^*)\) is uncountably infinite.
- To repeat, every language over a non-empty finite alphabet contains at most a countably infinite number of strings; however, the set of all possible languages over a non-empty alphabet is uncountably infinite.
- The importance of automata lies in their language-accepting power. For instance, if one kind of automaton, \(\mathfrak{A}\), can be constructed to accept a proper superset of the languages that another kind, \(\mathfrak{B}\), can be constructed to accept, we can conclude that \(\mathfrak{A}\) is more powerful than \(\mathfrak{B}\). Accepting strings of 0s and 1s might not seem exciting, until you consider that everything stored on a computer can be encoded in binary. For instance, \(\mathfrak{A}\) might be able to accept the language of binary-encoded prime numbers, or the language of binary-encoded Avril Lavigne songs.
- Since the set of all possible languages over a non-empty alphabet is uncountably infinite, if the set of all automata of a given kind is countably infinite, then we can conclude immediately that automata of that kind cannot accept all possible languages. For instance, we might devise a way for each automaton to be encoded as a natural number. In fact, even the most powerful known model of computation and kind of automaton, the Turing machine, can be encoded as a natural number; therefore there is an uncountably infinite number of languages that no Turing machine, and no computer hitherto known, can accept.
- If you believe the Church-Turing hypothesis, which states that there is no model of computation more powerful than Turing machines, then this holds for every possible model of computation and physical computer, existing or yet to exist.
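The claim that \(\Sigma^*\) is countably infinite rests on being able to list its strings one after another. A minimal sketch of such a generator, assuming we list strings shortest first (the name `all_strings` is purely illustrative):

```python
from itertools import count, islice, product


def all_strings(alphabet):
    """Yield every string over `alphabet`, shortest first."""
    for length in count(0):
        # All strings of exactly `length` symbols, in alphabet order.
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


# The first eight strings of {0, 1}*, starting with the empty string.
first_eight = list(islice(all_strings("01"), 8))
print(first_eight)
```

Every string appears at some finite position in this listing, which is what pairs \(\Sigma^*\) with the natural numbers.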

## Deterministic Finite Automata

- Deterministic Finite Automata (DFAs) are automata that consist of a finite set of states \(Q\) and at any one time exist in exactly one state. A DFA starts in the start state, \(q_0\), and for every input symbol a transition function \(\delta\) of the current state and the current input symbol gives the next state of the automaton. Hence, the next state of the automaton depends solely on two things: the previous state and the current input symbol. Formally this can be written as: \[ \delta : Q \times \Sigma \to Q \]
- The above DFA accepts the pattern \(\mathsf{0}\mathsf{1}^n\mathsf{0}\), where \(n \ge 1\), i.e. \(\mathsf{0}\) followed by one or more \(\mathsf{1}\)s, followed by another \(\mathsf{0}\). States are indicated by circles, and transitions between states by lines marked with the symbol needed to make the transition. For instance, \(\delta(q_0, \mathsf{0}) \equiv q_1\). The final state is indicated with an additional line around it.
- The transition from \(q_2\) back to \(q_2\) demonstrates the ability of automata to form loops.
- If, after reading a string of input, a DFA is in one of a set of final states \(F\), where \(F \subseteq Q\), then the DFA is said to accept its input; otherwise it rejects its input.
- Formally, any DFA can be represented as a quintuple: \[ (\Sigma, Q, q_0, \delta, F)\]
- Note that \(\delta\) must be a total function; however, when describing DFAs it is conventional to omit certain transitions. It is implied that these transitions lead to a “dead” state, and that the DFA will stay in the “dead” state for the rest of the input string and won’t be able to accept.
- We may also refer to the extended transition function, \(\hat{\delta}\), which takes a state \(q\) and a string \(w\) and maps them to the state the DFA is in after processing \(w\) from \(q\): \[ \hat{\delta} : Q \times \Sigma^* \to Q \]
- The extended transition function can be defined recursively in terms of the transition function: \(\hat{\delta}(q, \epsilon) \equiv q\) and \(\hat{\delta}(q, wa) \equiv \delta(\hat{\delta}(q, w), a)\) for \(w \in \Sigma^*\) and \(a \in \Sigma\).
- Using the notion of an extended transition function, it is also possible to define the language of a DFA, \(M\) as follows: \[ L_M \equiv \{x | x \in \Sigma^* \wedge \hat{\delta}(q_0, x) \in F\} \]
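The definitions above can be sketched in code for the \(\mathsf{0}\mathsf{1}^n\mathsf{0}\) example. This is only an illustration: the notes give \(\delta(q_0, \mathsf{0}) \equiv q_1\) and the loop on \(q_2\), while the name `q3` for the final state and the explicit `dead` state are assumptions made here.

```python
# Transitions of a DFA for 0 1^n 0 (n >= 1).
# Entries that are missing lead to the implicit "dead" state.
DELTA = {
    ("q0", "0"): "q1",
    ("q1", "1"): "q2",
    ("q2", "1"): "q2",  # the loop on q2
    ("q2", "0"): "q3",
}
START, FINAL, DEAD = "q0", {"q3"}, "dead"


def delta(state, symbol):
    """Total transition function: undefined moves fall into the dead state."""
    return DELTA.get((state, symbol), DEAD)


def delta_hat(state, word):
    """Extended transition function, defined recursively on the word."""
    if word == "":
        return state
    return delta(delta_hat(state, word[:-1]), word[-1])


def accepts(word):
    """A word is in the language if delta_hat(q0, word) is a final state."""
    return delta_hat(START, word) in FINAL


print(accepts("01110"), accepts("00"))
```

The recursion peels off the last symbol, so `delta_hat` processes the word left to right exactly as the DFA would.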

## Chomsky Hierarchy

Type | Automaton | Language |
---|---|---|
0 | Turing Machines | Recursively Enumerable |
1 | Linear-bounded Turing Machines | Context-sensitive |
2 | Pushdown Automata | Context-free |
3 | Finite Automata | Regular |

- The class of languages accepted by DFAs is known as the regular languages. In the 1950s, the American linguist Noam Chomsky devised a hierarchy of formal languages and the grammars needed to express them. Regular languages are at the bottom, generated by type 3 grammars. A grammar is just a set of rules that can be used to generate every string of a given language.
- Regular languages are not very powerful, although they are used a great deal in programming. This is due to regular expressions, which are expressions that define regular languages over an alphabet, usually Unicode or ASCII in programming.
- Most syntaxes for defining regular languages come from the POSIX standard, which was needed to maintain compatibility between UNIX variants. Regular expressions in UNIX came about due to the work of Ken Thompson.
- Regular languages are not very expressive. For example, the language of valid email addresses (under RFC 5322) is not regular, and hence cannot be represented by a regular expression, unless you use one of the more sophisticated regular expression libraries such as Perl’s, which can in fact express more than the regular languages.
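As a small illustration of regular expressions defining regular languages, the \(\mathsf{0}\mathsf{1}^n\mathsf{0}\) language of the earlier DFA can be written as `01+0`. The sketch below uses Python's standard `re` module; the helper name `in_language` is hypothetical.

```python
import re

# 0, then one or more 1s, then 0: the same language as the example DFA.
PATTERN = re.compile(r"01+0")


def in_language(word):
    """True if the whole word (not just a substring) matches the pattern."""
    return PATTERN.fullmatch(word) is not None


print(in_language("0110"), in_language("00"))
```

`fullmatch` is used rather than `search` because membership in a language is a statement about the entire string.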

## Production Rules

- Another way of expressing grammars, not just type 3 grammars, is by production rules. Production rules are a set of rules that are able to generate every string in a language. Production rules consist of two kinds of symbols, terminal and non-terminal, which form a disjoint union.
- We shall represent terminals by Greek letters, and non-terminals by capital Roman letters. A grammar has a start symbol, which we shall call \(S\).
- Type 3 grammar rules are in the form \(X \to \alpha\) or \(X \to \alpha B\). A production rule states a way of going from a non-terminal symbol to a string consisting of non-terminal and/or terminal symbols. There might be multiple production rules for a given non-terminal symbol (if there weren’t, the grammar could generate at most one string).
- Similarly, production rules may be self-referential, like loops in DFAs. The left-hand side of a production rule must always be a non-terminal. Once production rules have been applied repeatedly to the start symbol, and the string contains no non-terminals, a string in the language of the grammar has been produced.
- Here is an example that produces the language of the above DFA:

\[ S \to \mathsf{0}A \] \[ A \to \mathsf{1}B \] \[ B \to \mathsf{0} \] \[ B \to \mathsf{1}B \]
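To see how repeatedly applying production rules yields strings, here is a rough sketch that derives every string of the example grammar up to a given length. It assumes the first rule is read as \(S \to \mathsf{0}A\), since \(A\) is the non-terminal the remaining rules define; the names `RULES` and `generate` are illustrative.

```python
from collections import deque

# Right-linear (type 3) rules; upper-case letters are non-terminals.
RULES = {
    "S": ["0A"],
    "A": ["1B"],
    "B": ["0", "1B"],
}


def generate(max_length):
    """Return all terminal strings of at most `max_length` symbols."""
    results, queue = [], deque(["S"])
    while queue:
        form = queue.popleft()
        # In a right-linear grammar only the last symbol can be a non-terminal.
        head, last = form[:-1], form[-1]
        if last not in RULES:
            results.append(form)  # no non-terminals left: a finished string
            continue
        for body in RULES[last]:
            candidate = head + body
            terminals = sum(1 for ch in candidate if ch not in RULES)
            if terminals <= max_length:
                queue.append(candidate)
    return results


print(generate(5))
```

The search is breadth-first, so shorter strings are produced before longer ones.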

## Non-Determinism

- For any state-symbol combination, a DFA defines exactly one possible transition. Another kind of automaton is the non-deterministic finite automaton (NFA).
- Unlike DFAs, NFAs can have zero, one or several transitions for each state-symbol combination. There are two ways of thinking about this: either the next state of an automaton is one of several possibilities, or the automaton can exist in multiple states at once. If you take the former explanation, then the automaton “knows” which transition, if any, will lead to a final state.
- The NFA’s transition functions are defined as: \[ \delta: Q \times \Sigma \to \mathcal{P}(Q) \] \[ \hat{\delta}: Q \times \Sigma^* \to \mathcal{P}(Q) \]
- Acceptance is defined as: \[ L_M \equiv \{x | x \in \Sigma^* \wedge (\exists y \in F\; (y \in \hat{\delta}(q_0, x)))\} \]
That is, an NFA accepts its input string if any chain of transitions leads to a final state.

- NFAs are no more powerful than DFAs. This is because an NFA has a finite number of states \(n\), so the number of distinct sets of states it could be in is at most \(2^n\). Therefore, a corresponding DFA can be constructed whose states are the sets of states the NFA could be in.

The transition function of the DFA \(\delta_D\) can be defined in terms of the transition function of the NFA \(\delta_N\): \[ \delta_D(\{q_{k_1}, \dots, q_{k_n}\}, a) \equiv \bigcup_{i \in \{k_1, \dots, k_n\}}\;\delta_N(q_i, a) \]
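The construction can be sketched as follows. The example NFA (accepting strings that end in `01`), its state names `p`, `q`, `r`, and the function names are all assumptions chosen for illustration; only the union formula comes from the notes.

```python
def nfa_to_dfa(delta_n, start, finals, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    delta_d, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for a in alphabet:
            # Union of delta_N(q, a) over every NFA state q in the current set.
            target = frozenset(
                r for q in current for r in delta_n.get((q, a), set())
            )
            delta_d[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A DFA state is final if it contains at least one final NFA state.
    finals_d = {s for s in seen if s & finals}
    return delta_d, start_set, finals_d


# Hypothetical NFA accepting strings over {0, 1} that end in "01".
NFA = {
    ("p", "0"): {"p", "q"},
    ("p", "1"): {"p"},
    ("q", "1"): {"r"},
}
delta_d, start_d, finals_d = nfa_to_dfa(NFA, "p", {"r"}, "01")


def dfa_accepts(word):
    state = start_d
    for symbol in word:
        state = delta_d[(state, symbol)]
    return state in finals_d


print(dfa_accepts("1101"), dfa_accepts("0110"))
```

Only sets reachable from the start set are built, so the DFA here has three states rather than the worst-case \(2^3\).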

## Pushdown Automata

- Pushdown automata (PDAs) are automata that accept exactly the context-free languages. Pushdown automata are non-deterministic; a deterministic variant does exist, but it is less powerful.
- A PDA is similar to an NFA, except that it has a stack. The transition function depends on the current state, the current symbol and the symbol at the top of the stack.
- Transitions also replace the top of the stack with zero or more items.
- A PDA is a septuple, where \(\Gamma\) refers to the stack alphabet and \(Z_0\) to the first symbol on the stack: \[ (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F) \]
- The transition function maps to a set of ordered pairs, each consisting of a next state and the string that replaces the top of the stack: \[ \delta : Q \times \Sigma \times \Gamma \to \mathcal{P}(Q \times \Gamma^*) \]
- There are two models of acceptance for a PDA. The first is the conventional model, where the PDA accepts if it ends in a final state. The alternative is to accept if the stack is empty once the input has been read.
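A minimal sketch of a PDA simulator with acceptance by final state, following the transition signature above. The example machine for \(\mathsf{0}^n\mathsf{1}^n\) (\(n \ge 1\)), its state names and its stack symbols `Z` and `X` are assumptions made for illustration; this particular machine happens to be deterministic, but the simulator tracks sets of configurations so that it also handles non-deterministic choices.

```python
def pda_accepts(word, delta, start, start_symbol, finals):
    """Track every reachable (state, stack) configuration, symbol by symbol."""
    # Stacks are strings whose first character is the top of the stack.
    configs = {(start, start_symbol)}
    for symbol in word:
        next_configs = set()
        for state, stack in configs:
            if not stack:
                continue  # no move is possible on an empty stack
            top, rest = stack[0], stack[1:]
            for new_state, pushed in delta.get((state, symbol, top), set()):
                # Replace the top of the stack with zero or more symbols.
                next_configs.add((new_state, pushed + rest))
        configs = next_configs
    return any(state in finals for state, _ in configs)


# Hypothetical PDA for 0^n 1^n, n >= 1. Z stands for the first 0 read,
# and each X on the stack stands for one further 0.
DELTA = {
    ("p", "0", "Z"): {("q0", "Z")},
    ("q0", "0", "Z"): {("q0", "XZ")},
    ("q0", "0", "X"): {("q0", "XX")},
    ("q0", "1", "X"): {("q1", "")},
    ("q1", "1", "X"): {("q1", "")},
    ("q0", "1", "Z"): {("qf", "Z")},
    ("q1", "1", "Z"): {("qf", "Z")},
}


def accepts_0n1n(word):
    return pda_accepts(word, DELTA, "p", "Z", {"qf"})


print(accepts_0n1n("000111"), accepts_0n1n("001"))
```

The language \(\mathsf{0}^n\mathsf{1}^n\) is not regular, so no DFA could play this role; the stack is what lets the machine count.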

## Turing Machines

- Turing machines are the most powerful kind of automata and model of computation known. They can be expressed as a septuple: \[(Q, \Sigma, B, \Gamma, \delta, q_0, F)\]
- \(B\) refers to the blank symbol and here \(\Gamma\) refers to the tape alphabet, where \(B \in \Gamma\) and \(\Sigma \subseteq \Gamma \setminus \{B\}\).
- The transition function, \(\delta\) is defined as: \[ \delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\} \]
- A non-deterministic variant also exists, which is as powerful as the deterministic Turing machine, where the transition function is: \[ \delta : Q \times \Gamma \to \mathcal{P}(Q \times \Gamma \times \{L, R\}) \]
- Turing machines are like DFAs, except that they have a tape and a tape head. The transition function depends on the current state and the symbol currently under the tape head. The Turing machine is initialised with the input string written from the symbol under the tape head onwards. At each transition, a Turing machine writes a symbol under the tape head and then moves the tape head leftwards or rightwards.
- Note that a two-stack PDA is equivalent to a Turing machine, as the first stack can simulate everything to the left of the tape head, and the second stack can simulate the tape from the current symbol under the tape head.
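The description above can be sketched as a small simulator for deterministic Turing machines. Several choices here are assumptions rather than part of the notes: \(\delta\) is treated as a partial function (the machine halts when no transition applies), the example machine recognises \(\mathsf{0}^n\mathsf{1}^n\) using extra tape symbols `X` and `Y`, and a step limit guards against machines that never halt.

```python
from collections import defaultdict

BLANK = "B"


def run_tm(delta, finals, word, start="q0", max_steps=10_000):
    """Run a deterministic Turing machine; halt when no transition applies."""
    # The tape is blank everywhere except for the input, which starts at cell 0.
    tape = defaultdict(lambda: BLANK, enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in delta:
            return state in finals  # halted: accept iff in a final state
        state, tape[head], move = delta[key]
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached without halting")


# Hypothetical machine for 0^n 1^n (n >= 1): cross off a 0 as X, find the
# matching 1 and cross it off as Y, then return and repeat.
DELTA = {
    ("q0", "0"): ("q1", "X", "R"),
    ("q0", "Y"): ("q3", "Y", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}

print(run_tm(DELTA, {"q4"}, "0011"), run_tm(DELTA, {"q4"}, "001"))
```

The step limit is purely a practical safeguard; as the halting problem shows, no general check can replace it.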

## Computable Numbers

- Alan Turing’s paper, in which he introduced Turing machines, was about computable numbers. He defined computable numbers as numbers “calculable by finite means”; in other words, there exists an algorithm with a finite number of instructions that calculates them.
- An alternative definition is that a number is computable if there exists a machine that can print its \(n\)th digit for any \(n\).
- It is easy to see all rational numbers are computable numbers, since their representation as digits can be calculated using long division.
- Irrational numbers such as \(\sqrt{2}\) are also computable e.g. by using interval bisection.
- Transcendental numbers such as \(e\) and \(\pi\) can also be calculated by finite means, e.g. using infinite series.
- Turing also showed that computable functions of computable variables are computable. This is because even if the computable variable is irrational, the computable function can calculate the digits of the computable variable as it needs them.
- Examples of computable functions include sine, as this can be calculated using an infinite series.
- Turing showed that his machines could be expressed as a set of quintuples representing transitions, and hence be converted to a positive integer, which he called a *description number*.
- Clearly not all real numbers are computable: every Turing machine can be reduced to a positive integer, so there are only countably many of them, and if all real numbers were computable, this would contradict the fact that the real numbers are uncountably infinite.
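The “print the \(n\)th digit” definition can be made concrete. The sketch below, with illustrative function names, computes decimal digits of a rational by long division and of \(\sqrt{2}\) by bisection over the integers; digits are counted from 1 after the decimal point, which is a convention chosen here.

```python
def rational_digit(p, q, n):
    """n-th decimal digit after the point of p/q, by long division."""
    remainder = p % q
    for _ in range(n - 1):
        remainder = (remainder * 10) % q  # bring down the next zero
    return (remainder * 10) // q


def sqrt2_digit(n):
    """n-th decimal digit after the point of sqrt(2), by interval bisection."""
    # Find the largest integer lo with lo^2 <= 2 * 10^(2n),
    # i.e. lo = floor(sqrt(2) * 10^n).
    target = 2 * 10 ** (2 * n)
    lo, hi = 0, target + 1  # invariant: lo^2 <= target < hi^2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= target:
            lo = mid
        else:
            hi = mid
    return lo % 10


print([rational_digit(1, 7, n) for n in range(1, 7)])
print([sqrt2_digit(n) for n in range(1, 9)])
```

Both functions use only exact integer arithmetic, so any digit can be obtained in a finite number of steps without rounding error.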

## The Halting Problem

Proposition: There exists no Turing machine which takes the description number of a Turing machine, \(M\) and its input \(I\) on its tape and can determine whether \(M\) will halt on \(I\).

Proof: Suppose there were such a Turing machine. Then it would be possible to construct another Turing machine, \(K\), that takes the description number of a Turing machine \(M\). If \(M\) halts on \(M\) (that is, on its own description number), then \(K\) doesn’t halt. If \(M\) doesn’t halt on \(M\), \(K\) does halt. Now execute \(K\) on \(K\). If \(K\) halts on \(K\), then \(K\) doesn’t halt on \(K\). If \(K\) doesn’t halt on \(K\), then \(K\) halts on \(K\). Thus, we have arrived at a contradiction. Hence, our original assumption must be incorrect.

- In other words, there exists no general procedure to see whether a Turing machine or a piece of code in a Turing-complete language will ever terminate.
- The languages of Turing machines are known as the recursively enumerable languages.
- Turing machines that halt on every input are known as total Turing machines, whereas Turing machines that don’t halt on every input are known as partial Turing machines (similar to total and partial functions).
- The languages of total Turing machines are known as recursive languages: languages for which there exists a Turing machine that halts on every input.
- The language consisting of encoded Turing machines that terminate is recursively enumerable, but not recursive. Creating a Turing machine that halts if its input halts is as simple as simulating that Turing machine.
- The halting problem means that a compiler cannot, in general, check definitively whether a function will terminate for all inputs.
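The diagonal argument in the proof can be mimicked in code, in a deliberately simplified form. Here Python functions stand in for description numbers, the programs take no input, and `halts` is a hypothetical candidate decider supplied by the caller; none of this is a real API.

```python
def make_k(halts):
    """Build the program K that does the opposite of what `halts` predicts."""
    def k():
        if halts(k):      # the decider claims K halts...
            while True:   # ...so K loops forever
                pass
        # Otherwise the decider claims K loops, so K halts immediately.
    return k


def claims_nothing_halts(program):
    """A (wrong) candidate decider that always answers 'does not halt'."""
    return False


k = make_k(claims_nothing_halts)
k()  # returns at once, so the decider's answer about K was wrong
print("K halted although the decider said it would not")
```

Whatever answer a candidate decider gives about `k`, the construction makes `k` behave the other way, which is the contradiction the proof relies on. A decider that answers `True` is refuted in the same way, though running `k` would then never return.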

## Rice’s Theorem

Proposition: All non-trivial semantic properties of Turing machines are undecidable.

Proof: Suppose there exists a non-trivial semantic property of Turing machines that is decidable; let us call that property \(P\) and let us call the Turing machine that can decide this property \(M\). We may assume that a Turing machine that never halts does not satisfy \(P\) (otherwise, consider the complement of \(P\), which is decidable whenever \(P\) is). Since \(P\) is non-trivial, there must exist some Turing machine that satisfies \(P\); let us call it \(K_1\). Clearly inputting \(K_1\) to \(M\) will result in acceptance. Let us create a new Turing machine \(K_2\) which computes \(K_1\), but before this simulates some arbitrary Turing machine \(h\) on some arbitrary input \(i\). Provided \(h\) halts on \(i\), \(K_2\) behaves identically to \(K_1\); otherwise \(K_2\) never halts. In other words, \(M\) will accept \(K_2\) if and only if \(h\) halts on \(i\). Therefore, to solve the halting problem, \(K_2\) can be constructed for the Turing machine \(h\) and its input \(i\) and fed into \(M\): if \(M\) accepts, then it is known that \(h\) halts on \(i\), and if \(M\) rejects, then it does not. Thus the halting problem is solved. However, we know that the halting problem cannot be solved, therefore we have reached a contradiction. Hence, our original assumption must be wrong.

- Rice’s theorem states that all non-trivial semantic properties of Turing machines are undecidable. Semantic properties are properties of the languages/partial functions that the Turing machines compute, rather than syntactic properties, e.g. does the Turing machine take up \(n\) units of space on its tape, or terminate within \(n\) steps?
- A non-trivial property is any property that is neither always true nor always false, i.e. a property that genuinely depends on the Turing machine.
- This is an example of a result proved by a reduction from the halting problem.

## Busy Beaver Function

- The busy beaver function, \(\Sigma\), is a function introduced by Tibor Radó. \(\Sigma(n)\) is the score of the \(n\)-state Turing machine that wins the busy beaver game: \[ \newcommand{ceil}[1]{\lceil #1 \rceil} \Sigma : \mathbb{N} \to \mathbb{N} \]
- An \(n\)-state “busy beaver” Turing machine must have \(n\) states in addition to a halting state, start with a blank tape, and eventually halt.
- The number of 1s a “busy beaver” Turing machine prints is called its score.
- The \(n\)-state “busy beaver” Turing machine that wins the busy beaver game is the one that prints the greatest number of 1s on its tape.
- The busy beaver function is an example of a non-computable function. Due to the halting problem, it is impossible to decide in general whether a Turing machine will halt, so there can be no algorithm to see if a Turing machine is a valid “busy beaver” Turing machine.
- Here is an alternate proof of the uncomputable nature of the busy beaver function:

Proposition: The busy beaver function \(\Sigma\) is uncomputable.

Proof: Suppose there were a Turing machine, called \(A\), to compute the busy beaver function for \(n\)-state Turing machines. We shall assume \(A\) begins by writing \(n\) on its tape, in binary. This means \(A\) has \(\lceil \log_2 n \rceil + k\) states, where \(k\) is a constant. However, in order to differentiate the binary pattern from the rest of the tape, since only the binary alphabet is permitted, we shall encode the binary number with 1s between each digit. So \(A\) will have \(2\lceil \log_2 n \rceil + k_1\) states, where \(k_1\) is a constant. We can also construct a Turing machine \(B\) that reads the output of \(A\) and prints that many 1s; suppose \(B\) requires \(k_2\) states, where \(k_2\) is a constant. We can combine these machines to create a machine \(AB\) that has \(2\lceil \log_2 n \rceil + k_1 + k_2\) states and prints \(\Sigma(n)\) 1s. There will eventually be some value of \(n\) for which \(n > 2\lceil \log_2 n \rceil + k_1 + k_2\). Therefore, for that value of \(n\), we will have created a machine with fewer than \(n\) states that prints \(\Sigma(n)\) 1s. This, of course, cannot be: spending the spare states on printing extra 1s would give a Turing machine with \(n\) states that prints more 1s than \(\Sigma(n)\). Hence we have reached a contradiction, and our original assumption must be wrong.

- This proof can be modified to show that \(\Sigma\) grows asymptotically faster than any computable function. If there existed some computable \(f\) and some \(n \in \mathbb{N}\) such that \(\forall x\;(x \ge n \to f(x) > \Sigma(x))\), then this upper bound could be used, as above, to construct a Turing machine that prints more 1s in the same number of states than the appropriate \(\Sigma(x)\).
- Another uncomputable function is \(S(n)\), the maximum number of steps made by any \(n\)-state busy beaver Turing machine. If such a function were computable, then a Turing machine to calculate \(\Sigma(n)\) would only have to simulate Turing machines up to \(S(n)\) steps. Moreover, \(S(n)\) must grow asymptotically faster than any computable function:

Proposition: For any computable function \(f\), there exists \(n \in \mathbb{N}\), such that \(\forall x\;(x \ge n \to S(x) > f(x))\).

Proof: Suppose this were not true. Then there would exist some computable function \(f\) and number \(n \in \mathbb{N}\) such that \(\forall x\;(x \ge n \to f(x) \ge S(x))\). However, this would mean \(\Sigma(x)\) could be computed for \(x \ge n\), since \(f\) provides an upper bound on the number of steps an \(x\)-state busy beaver Turing machine would have to perform. We know that \(\Sigma\) is uncomputable, therefore we have reached a contradiction. Hence, our original assumption must be wrong.
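The link between \(S\) and \(\Sigma\) can be illustrated for \(n = 2\), where the values \(S(2) = 6\) and \(\Sigma(2) = 4\) are known. The sketch below assumes the usual convention that the transition into the halting state also writes a symbol and moves the head; the function names are illustrative. Given the step bound, it finds \(\Sigma(2)\) by brute force over all 2-state machines.

```python
from itertools import product

STATES = "AB"
HALT = "H"


def run_bb(machine, max_steps):
    """Number of 1s on the tape if the machine halts in time, else None."""
    tape, head, state = {}, 0, "A"
    for _ in range(max_steps):
        write, move, state = machine[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        if state == HALT:
            return sum(tape.values())
    return None  # still running: by the choice of bound, it never halts


def busy_beaver_2(step_bound=6):
    """Best score over all 2-state machines, trusting step_bound = S(2)."""
    keys = [(s, b) for s in STATES for b in (0, 1)]
    # Each table entry: (symbol to write, head movement, next state).
    actions = list(product((0, 1), (-1, 1), STATES + HALT))
    best = 0
    for choice in product(actions, repeat=len(keys)):
        score = run_bb(dict(zip(keys, choice)), step_bound)
        if score is not None:
            best = max(best, score)
    return best


print(busy_beaver_2())
```

The search only works because the step bound is handed to it; for general \(n\) no computable function can supply that bound, which is the content of the proposition above.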

## Church-Turing Thesis

- The Church-Turing thesis is the hypothesis that no model of computation is more powerful than a Turing machine.
- Evidence for the Church-Turing thesis lies in the fact that all models of computation so far devised have been either less powerful than or equivalent to Turing machines.
- A programming language/model of computation is Turing-complete if it can compute every function computable by a Turing machine. Most programming languages are Turing-complete; a sub-Turing language might be one where all functions are required to halt.
- Untyped lambda calculus is Turing-complete, however simply-typed lambda calculus is not; one reason for this is that in simply-typed lambda calculus, every function is required to halt.

## Bibliography

- Stanford’s online Coursera course on Automata Theory. The course is no longer available, but the video lectures have been put on YouTube.
- *New Turing Omnibus* by A. K. Dewdney
- *The Annotated Turing* by Charles Petzold
- *Logic* by Wilfrid Hodges
- *Who Can Name the Bigger Number?*, by Scott Aaronson, on his blog.