Postfix to NFA Conversion Algorithm - michellelally/regular-expression-matching GitHub Wiki
Postfix to NFA Conversion Algorithm
The postfix expression is passed into a function that processes it and builds a non-deterministic finite automaton (NFA) from it.
Thompson's Construction
The algorithm that makes this possible is known as Thompson's construction. The idea is to break an expression into a collection of small NFAs, one for each character in the expression, and combine them into one large NFA, which is then used to check whether a string matches the expression.
The procedure used is as follows:
- Expressions are parsed left to right.
- Each ordinary character that is read in has a small NFA built for it.
- Each NFA requires an initial state and an accept state.
- Each NFA is pushed onto a stack, which holds the NFAs built so far.
- When a special character (an operator) is read, NFAs are popped off the stack: 1 or 2, depending on the operator.
- The popped NFAs are joined to create a bigger NFA, which is then pushed back onto the stack.
- This repeats for every character, so for a valid expression the stack ends up holding a single NFA for the whole expression.
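The building blocks in the list above can be sketched in Python roughly as follows. The names `State`, `NFA`, `literal`, `edge1` and `edge2` are assumptions made for illustration and may not match the project's own code; a label of `None` is assumed to mark an arrow that consumes no input.

```python
class State:
    """One NFA state: an optional label and at most two outgoing arrows."""

    def __init__(self, label=None):
        self.label = label  # None is assumed to mean "no character consumed"
        self.edge1 = None   # first outgoing arrow (or None)
        self.edge2 = None   # second outgoing arrow (or None)


class NFA:
    """A small NFA, identified by its initial and accept states."""

    def __init__(self, initial, accept):
        self.initial = initial
        self.accept = accept


def literal(c):
    """Build the two-state NFA for a single ordinary character (hypothetical helper)."""
    accept = State()
    initial = State(label=c)
    initial.edge1 = accept  # reading c moves from initial to accept
    return NFA(initial, accept)


# Each ordinary character gets its own NFA, which is pushed onto the stack.
stack = []
for c in "ab":
    stack.append(literal(c))
```

A plain Python list is enough for the stack here, since `append` and `pop` both work on the end of the list.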
Thompson's Construction Pseudocode
An NFA class will contain an initial and an accept state. Each state has a label for the character it reads (left empty for an arrow that consumes no input) and at most 2 outgoing arrows.
Read in 1 character at a time; we'll call it 'c'
If c = '.'
    Pop 2 NFAs off the stack (the second operand comes off first)
    Take an arrow from the accept state of the first NFA
    Point it at the initial state of the second NFA
    This creates a new NFA, running from the first NFA's initial state to the second NFA's accept state
    Push the new NFA onto the stack
If c = '|'
    Pop 2 NFAs off the stack
    Create new initial and accept states
    Join the two arrows of the new initial state to the initial states of the first and second NFA
    Join the accept states of the first and second NFA to the new accept state
    This creates a new NFA
    Push it onto the stack
If c = '*'
    Pop a single NFA off the stack
    Create new initial and accept states
    Join one arrow of the new initial state to the NFA's initial state
    Join the other arrow of the new initial state to the new accept state
    Join the old accept state back to the NFA's initial state
    Join the old accept state to the new accept state
    This creates a new NFA
    Push it onto the stack
If c = '+'
    Pop a single NFA off the stack
    Create new initial and accept states
    Join an arrow of the new initial state to the NFA's initial state
    Join the old accept state back to the NFA's initial state
    Join the old accept state to the new accept state
    This creates a new NFA
    Push it onto the stack
Otherwise (c is an ordinary character)
    Create new initial and accept states
    Label the initial state with c and join it to the accept state
    Push the new NFA onto the stack
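As a rough illustration, the pseudocode could be turned into Python along these lines. This is a sketch under assumptions rather than the project's actual code: the names `State`, `NFA`, `compile_postfix`, `epsilon_closure` and `match_postfix` are made up here, a `None` label is assumed to mean an arrow that consumes no input, and the '*' and '+' branches include the arrow from the old accept state back to the NFA's initial state that provides the repetition. The small matcher at the end is not part of the pseudocode; it is only there so the result can be exercised.

```python
class State:
    def __init__(self, label=None):
        self.label = label  # None: arrows from this state consume no input
        self.edge1 = None
        self.edge2 = None


class NFA:
    def __init__(self, initial, accept):
        self.initial, self.accept = initial, accept


def compile_postfix(postfix):
    """Build one NFA from a postfix regular expression."""
    stack = []
    for c in postfix:
        if c == '.':
            # The second operand was pushed last, so it is popped first.
            nfa2, nfa1 = stack.pop(), stack.pop()
            nfa1.accept.edge1 = nfa2.initial
            stack.append(NFA(nfa1.initial, nfa2.accept))
        elif c == '|':
            nfa2, nfa1 = stack.pop(), stack.pop()
            initial, accept = State(), State()
            initial.edge1, initial.edge2 = nfa1.initial, nfa2.initial
            nfa1.accept.edge1 = accept
            nfa2.accept.edge1 = accept
            stack.append(NFA(initial, accept))
        elif c == '*':
            nfa1 = stack.pop()
            initial, accept = State(), State()
            # Either enter the NFA or skip it entirely.
            initial.edge1, initial.edge2 = nfa1.initial, accept
            # Either loop back for another pass or finish.
            nfa1.accept.edge1, nfa1.accept.edge2 = nfa1.initial, accept
            stack.append(NFA(initial, accept))
        elif c == '+':
            nfa1 = stack.pop()
            initial, accept = State(), State()
            # No skip arrow: the NFA must be traversed at least once.
            initial.edge1 = nfa1.initial
            nfa1.accept.edge1, nfa1.accept.edge2 = nfa1.initial, accept
            stack.append(NFA(initial, accept))
        else:
            # Ordinary character: two states joined by one labelled arrow.
            accept = State()
            initial = State(label=c)
            initial.edge1 = accept
            stack.append(NFA(initial, accept))
    return stack.pop()


def epsilon_closure(state):
    """All states reachable from `state` without consuming input."""
    seen, todo = set(), [state]
    while todo:
        s = todo.pop()
        if s in seen:
            continue
        seen.add(s)
        if s.label is None:
            todo.extend(e for e in (s.edge1, s.edge2) if e is not None)
    return seen


def match_postfix(postfix, string):
    """Return True if the whole string is accepted by the compiled NFA."""
    nfa = compile_postfix(postfix)
    current = epsilon_closure(nfa.initial)
    for ch in string:
        following = set()
        for s in current:
            if s.label == ch:
                following |= epsilon_closure(s.edge1)
        current = following
    return nfa.accept in current
```

For example, `match_postfix("ab.c*.", "abccc")` checks the string against the postfix form of `a.b.c*`. The closure is computed with an explicit visited set so that nested repetitions such as `a**` cannot cause endless looping.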
This test data is provisional: not much time was spent on making proper data that really tests the internals of the program.
Test Data:
infixes = ["abc*", "a?(b|d)c*", "(a(b|d))", "d(bb)c?", "b+aa?", "d+(ba)", "b*(b?d?)c"]
strings = ["", "abc", "abccc", "abbc", "bbcc", "abad", "daab", "abbbc", "dbdbc", "abbabcc"]
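One possible way to get expected results for this data is to use Python's built-in `re` module as a reference. The sketch below assumes that the project's operators (`*`, `+`, `?`, `|` and brackets) mean the same as in Python's `re`, and that a match must cover the whole string; both are assumptions, not something confirmed by the project.

```python
import re

infixes = ["abc*", "a?(b|d)c*", "(a(b|d))", "d(bb)c?", "b+aa?", "d+(ba)", "b*(b?d?)c"]
strings = ["", "abc", "abccc", "abbc", "bbcc", "abad", "daab", "abbbc", "dbdbc", "abbabcc"]

# Map every (infix, string) pair to the result Python's own engine gives
# when the whole string has to match.
expected = {
    (infix, s): re.fullmatch(infix, s) is not None
    for infix in infixes
    for s in strings
}

# Print, for each expression, the strings it is expected to accept.
for infix in infixes:
    hits = [s for s in strings if expected[(infix, s)]]
    print(f"{infix!r} should match: {hits}")
```

The `expected` table can then be compared against whatever the program returns for each pair, which also shows quickly which expressions are not matched by any of the strings.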