Please refer to this page for information on how to work on your assignment.
A finite automaton is an algorithm with a finite memory which scans through an input string or text. The finite memory is given by states, one of them being a starting state and several accepting states. Furthermore, it has an update function which computes from the old state and the current datum the new state. For example, the following automaton checks whether in a string there is an even number of vowels.
State-Names: Even, Odd; Starting State: Even; Accepting States: Even; Rejecting States: Odd; Alphabet: Lower case letters; Update Function: Old State New State at "a","e","i","o","u"; at other letters Even Odd Even Odd Even Odd So New State ("a",Even) is Odd, New State ("b",Even) is Even, New State ("c",Odd) is Odd. Sample Runs: Input seen so far State of Automaton (1) - Even w Even wh Even who Odd whos Odd whose Even (2) - Even w Even wh Even wha Odd what Odd
Example (1) accepts and Example (2) rejects the input as after
processing the complete word whose
the automaton is in the accepting
state Even and after processing the complete word what
the automaton
is in the rejecting state Odd. The type of finite automata dealt with here
is called deterministic since the function to compute the new state permits
always only one unique value; otherwise the automaton would be called
non-deterministic. For more information on finite automata, see
Wikipedia
and
Panet Math.
The automaton of this exercise does the following: it scans through a text and checks whether this text has the following minimum qualities:
The task is to extend these rules as follows and to adapt the below syntax checker accordingly.
His brothers' cars are green and he's going to his sister's house.which should not be confused with quotes.
The automaton which does the syntax checking has the following states which are listed together with a short intuitive explanation of their meaning:
"e": error - this state is taken when an error is found; "f": automaton is going to read the first character of sentence; "c": automaton has seen a comma or similar sign ",", ":"; "d": automaton has seen a dot or similar sign ".", "!", "?"; "n": automaton is processing or has been processing a number; "z": automaton has seen a single zero; "s": automaton has seen a space; "h": automaton has seen a hyphen "-"; "p": automaton has seen a plus "+" or a minus "-"; "q": automaton is between two quotes of a quotation; "r": automaton has seen the second quote of a quotation; "w": automaton is processing or has been processing a word;
The newstate(oldstate, symbol)
function computes how to update
the state of the automaton when it is processing a symbol. The only thing
which needs to be adapted is this function. Here some explanation on an
example how this function works. Consider the following sample run:
My number is 007. wwswwwwwwswwszeee The sentence is syntactically incorrect.
In this example, the sentencecheck()
function is already doing
what it should as in the specification above, it is said that no number
except 0 itself should state with the symbol 0,
so James Bond would have to write his number as "0 0 7" in order to pass
the syntax checking. The sample run is a short representations for the
following calls of the newstate()
function which always takes
the old value of the state and computes from it and the symbol above the
new value of the state. The initial value "f" is dropped in above
representation, but the second line gives the states after having processed
the corresponding part of the word above it, so "wwsw" are the states taken
while processing "My n" with "w" being the state after these four symbols
have all been processed. A longer representation of this expression would
be the following one:
"w" = newstate("f","M"); "w" = newstate("w","y"); "s" = newstate("w"," "); "w" = newstate("s","n"); "w" = newstate("w","u"); "w" = newstate("w","m"); "w" = newstate("w","b"); "w" = newstate("w","e"); "w" = newstate("w","r"); "s" = newstate("w"," "); "w" = newstate("s","i"); "w" = newstate("w","s"); "s" = newstate("w"," "); "z" = newstate("s","0"); "e" = newstate("z","0"); "e" = newstate("e","7"); "e" = newstate("e",".");
This input is rejected, that is, classified as false by the finite
automaton. Indeed, it would reject any input where the machine is not in
the state "d" after having processed all the symbols of the input. The most
straightforward implementation of the newstate()
function
would be a large table which tells the function what value to assign for
which state and which input, this table would be finite as there are only
finitely many states and finitely many symbols. Somehow, the table would be
extremely large and difficult to typeset, therefore it is much more
convenient to write a JavaScript function instead as it is done in this
exercise.
The function has a variable rv
which contains the return value
of the function which is passed back as the new state after it finishes.
The initial value of rv
is "e", so that the error-state is
taken in the case that the function does not provide an explicit case for
what to do with the old state and symbol in the input. The switch command
in the function has entries for all states which need to be addressed, for
example in the case of the state "f", the entry looks like this:
case "f": if(("A" <= symbol) && ("Z" >= symbol))
{
rv = "w";
}
break;
So if the symbol is one of the upper case letters A,
B, … , Z, the
rv
variable which has the next state takes the value "w" for
indicating that a word is processed. Indeed, as a sentence should start
with a word having an upper case letter at the beginning, this is exactly
what the program should do. Other options should make the automaton to take
the error state "e" what indeed happens whenever it is not clearly
overwritten. The break
command indicates that the case dealing
with "f" is completed. Here another example, namely the one inside a word:
case "w": if(("a" <= symbol) && ("z" >= symbol))
{
rv = "w";
}
if(symbol == " ")
{
rv = "s";
}
if((symbol == ".") || (symbol == "?") || (symbol == "!"))
{
rv = "d";
}
if((symbol == ",") || (symbol == ":"))
{
rv = "c";
}
break;
Here the update rule is more complicated. There are four if statements, each dealing with another transfer of the state. If the symbol is one of the letters a, b, … , z then the next state is again "w". If the symbol is a space then the next state is "s". If the symbol is ., ? or ! then the next state is "d" which stands for dot-like sentence signs. If the symbol is , or : then the next state is "c" which stands for comma-like sentence signs. If none of these cases apply, the return value will have the default "e" and the finite automaton will go into the error state.
In order to do the requested changes, one has to introduce such type of environments for the states "h" and "p". Note that "p" is entered from "s" when there is a + or - after a space. From "p" there should follow one of the digits 1 to 9 but not the digit 0. So "-325" and "+7711" are permitted but neither "+007", "-0" nor "+ 88". The hyphen state "h" should only be reached from "w" if there is an hyphen instead of a letter and after the hyphen, there should again be one of the letters a, b, … , z. If not, the automaton should take the state "e". Note that it depends from the context whether the symbol - is a minus or a hyphen.
Furthermore, at several places, ; has to be
added into the list of permitted symbols for a state change. Also the quote
symbol ' should be handled adequately. Only
single quotes occur. If they are after a space, they indicate the beginning
of a quotation and the automaton should take the state "q". What follows is
already there. If they occur inside a word, they are not the beginning of a
quotation but just an omission symbol as in the sentence He's running
home fast to meet his sister's boyfriend.
Therefore, quote symbols
inside a word should be treated like letters. For the ease of programming,
it can be assumed that the first symbol of a sentence is not a quote symbol
and that the above two cases are all which occur. Quote symbol denoting
omissions or genitives can be treated like lower-case letters and do not
need an extra state of the automaton.