Postfix to NFA Conversion Algorithm - michellelally/regular-expression-matching GitHub Wiki

Postfix to NFA Conversion Algorithm

The postfix expression is passed into a function which will process it and hopefully build a non-deterministic finite automata from it, as long as it doesn't run into any bugs!!

Thompson's Construction

The algorithm that makes this possible, is known as Thompsons Construction. It gives the idea of breaking an expression into a collection of small NFA's, one for each character in the expression, which will be combined to provide a diagram of one large NFA which will then be used to check if a string matches the expression.

The procedure used is as follows:

Expressions are parsed left to right.
Each character that is read in, has an NFA built for it.
The NFA will require an initial state and an accept state.
Each NFA is added to a stack where all the letters' NFA's will exist
When a special character is read, the NFA's on the stack will then be removed. It can be 1 or 2, dependent on the type of special character
The NFA's will then be joined to create a bigger NFA which is then added to the stack
Adding each individual NFA to the stack everytime

Thompsons Construction Psuedocode...

An NFA class will contain an inital and an accept state which will both have a label for the character and at least one arrow and at most 2 arrows

Read in 1 character at a time, we'll call it 'c'

If c = '.'

Pop 2 NFA's off the stack

Take an arrow from the accept state of the first NFA 

Set it to equal to the initial state of the second NFA

This creates a new NFA 

Append the new NFA to the stack

If c = '|'

Pop 2 NFA's off the stack

A new initial and accept state needs to be created

Take the arrows of the new initial state and join it to the inital state of the first and second NFA 

Take the arrows of the new accept state and join accept states of the first and second NFA to the new accept state

If c = '*'

Pop single nfa from the stack 

Create new initial and accept states

Take an arrow from the new initial state and join it to the NFA's initial state

    Take an arow from the initial state and let it equal an accept state

Join the old accept state to the new accept state

This creates a new NFA 

Append it to the stack

if c = '+'

Pop single nfa from the stack 

Create new initial and accept states

Take an arrow from the new initial state and join it to the NFA's initial state

Join the old accept state to the NFA's initial state

Take an edge from the NFA's initial state and join it to the new accept state

This creates a new NFA 

Append it to the stack

Not sure about this test data, didn't really spend too much time on making proper data that really tests the internals of the program

Test Data: inifixes = ["abc*", "a?(b|d)c*", "(a(b|d))", "d(bb)c?", "b+aa?", "d+(ba)", "b*(b?d?)c"] strings = ["", "abc", "abccc", "abbc", "bbcc", "abad", "daab", "abbbc", "dbdbc", "abbabcc"]