Natural Language Processing - OxRAMSociety/RobotArm GitHub Wiki

Natural language processing

The natural language processing (NLP) component of this project is based on Rasa:

Rasa is an open source machine learning framework for automated text and voice-based conversations. Understand messages, hold conversations, and connect to messaging channels and APIs.

RASA documentation

The NLP code is divided into three folders:

Rasa

This folder contains the natural language model.

Rasa works by prescribing a sequence of components (a pipeline), as specified in the config file. The output of each component is fed into the next one, so together the components progressively transform the raw text into the desired structured output.
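The idea of a pipeline can be illustrated with a toy Python sketch. The `Message` dict, the three component functions and the keyword rule are all hypothetical stand-ins, not Rasa's API. In Rasa the sequence is declared in the config file and the classifier is a trained model rather than a hand-written rule.

```python
from typing import Any, Callable, Dict, List

# A "message" is a plain dict that each component enriches and passes on.
Message = Dict[str, Any]
Component = Callable[[Message], Message]


def tokenizer(msg: Message) -> Message:
    # Lower-case and split on whitespace (a stand-in for a real tokenizer).
    msg["tokens"] = msg["text"].lower().split()
    return msg


def featurizer(msg: Message) -> Message:
    # Toy features: the set of tokens (real featurizers produce numeric vectors).
    msg["features"] = set(msg["tokens"])
    return msg


def intent_classifier(msg: Message) -> Message:
    # Hypothetical keyword rule standing in for a trained intent classifier.
    msg["intent"] = "move_piece" if "move" in msg["features"] else "unknown"
    return msg


def run_pipeline(text: str, components: List[Component]) -> Message:
    msg: Message = {"text": text}
    for component in components:
        msg = component(msg)  # the output of one component is the input of the next
    return msg


result = run_pipeline("Move the knight to C6", [tokenizer, featurizer, intent_classifier])
print(result["tokens"])  # ['move', 'the', 'knight', 'to', 'c6']
print(result["intent"])  # move_piece
```

The order matters: the classifier can only run once the earlier components have added the fields it reads.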

The NLP model has to perform two tasks: recognising the intent (i.e. which action needs to be performed?) and picking out the keywords that relate to that action (words such as "knight" or "C6"). Intent recognition is handled by an intent classifier, and keyword extraction by an entity extractor.

A general pretrained Rasa model does not "know" which information to extract or which intents we want to recognise, so we must fine-tune it on our own data. This training step, which covers the full pipeline, requires training data: a list of the intents we want to recognise, each with example messages in which the tokens we want to extract are annotated as entities.
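Rasa training data marks entities inline in each example using the `[entity text](entity_name)` syntax. The Python sketch below strips those annotations and recovers the plain text plus character spans, which is roughly the information a training example carries. The entity names `piece` and `square` are hypothetical; the project's actual intent and entity names are not documented here.

```python
import re
from typing import Dict, List, Tuple

# Matches inline entity annotations such as "[knight](piece)".
ANNOTATION = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_example(annotated: str) -> Tuple[str, List[Dict[str, object]]]:
    """Return (plain_text, entities) for one annotated training example."""
    plain_parts: List[str] = []
    entities: List[Dict[str, object]] = []
    cursor = 0  # current position in the annotated string
    length = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        length += len(before)
        value = match.group("value")
        # Character offsets refer to the plain (un-annotated) text.
        entities.append({"entity": match.group("entity"), "value": value,
                         "start": length, "end": length + len(value)})
        plain_parts.append(value)
        length += len(value)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


text, ents = parse_example("move the [knight](piece) to [C6](square)")
print(text)  # move the knight to C6
print(ents)
# [{'entity': 'piece', 'value': 'knight', 'start': 9, 'end': 15},
#  {'entity': 'square', 'value': 'C6', 'start': 19, 'end': 21}]
```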

Training and testing are done from the command line; no script is needed.

Note that Rasa handles the text-to-intent and entity-extraction steps, but not speech-to-text.

Rosbridge

In the final product, we want ROS to run on the robot's Raspberry Pi, while all the computation is done on a separate computer that may not have ROS installed. To achieve this, we use rosbridge, which connects the non-ROS computer to the ROS computer over the network by exchanging JSON messages.
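rosbridge speaks a JSON protocol in which each message carries an `op` field; `advertise` and `publish` are two of its operations. The Python sketch below only builds those JSON strings. The topic `/robot_arm/command` and the `std_msgs/String` type (ROS 1 naming) are hypothetical choices, not taken from the project. In practice a client library such as roslibpy would open the WebSocket connection and send these frames for you.

```python
import json
from typing import Any, Dict

# Hypothetical topic and message type; the project's real names may differ.
TOPIC = "/robot_arm/command"
MSG_TYPE = "std_msgs/String"


def advertise(topic: str, msg_type: str) -> str:
    # Tell rosbridge we intend to publish on this topic with this message type.
    return json.dumps({"op": "advertise", "topic": topic, "type": msg_type})


def publish(topic: str, msg: Dict[str, Any]) -> str:
    # The "msg" payload must match the ROS message fields (std_msgs/String has "data").
    return json.dumps({"op": "publish", "topic": topic, "msg": msg})


frames = [advertise(TOPIC, MSG_TYPE), publish(TOPIC, {"data": "move knight C6"})]
for frame in frames:
    # Each string would be sent as one text frame to the rosbridge WebSocket server
    # (rosbridge_websocket listens on port 9090 by default).
    print(frame)
```

Because the frames are plain JSON, the non-ROS computer needs no ROS installation, only a WebSocket client.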

Webapp

The webapp is the main interface to the natural language component. It exposes two main endpoints, each triggered by an HTTP request:

  • The "/record" endpoint starts recording from a microphone, performs speech-to-text conversion and returns the transcribed text.

  • The "/predict" endpoint runs the Rasa model on the message passed to it and returns the model's predictions.
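The webapp code is not shown on this page, so the following is only a Python sketch of what a `/predict` handler might do. It assumes the model is served through Rasa's HTTP API (a server started with `rasa run --enable-api`, queried via `POST /model/parse` on the default port 5005) rather than loaded in-process. `build_parse_request`, `summarise` and `predict` are hypothetical helper names, and the `example` response below is made-up illustrative data, not a project result.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: a Rasa server started with `rasa run --enable-api` on its default port.
RASA_PARSE_URL = "http://localhost:5005/model/parse"


def build_parse_request(message: str) -> urllib.request.Request:
    # Rasa's /model/parse endpoint expects a JSON body with a "text" field.
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        RASA_PARSE_URL, data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )


def summarise(parse_result: Dict[str, Any]) -> Dict[str, Any]:
    # Keep the top intent and a {entity_name: value} map (a repeated entity name keeps its last value).
    return {
        "intent": parse_result["intent"]["name"],
        "confidence": parse_result["intent"]["confidence"],
        "entities": {e["entity"]: e["value"] for e in parse_result.get("entities", [])},
    }


def predict(message: str) -> Dict[str, Any]:
    # Sends the request; this only works while a Rasa server is actually running.
    with urllib.request.urlopen(build_parse_request(message)) as resp:
        return summarise(json.load(resp))


# Illustrative response shape with made-up values.
example = {
    "text": "move the knight to C6",
    "intent": {"name": "move_piece", "confidence": 0.97},
    "entities": [
        {"entity": "piece", "value": "knight", "start": 9, "end": 15},
        {"entity": "square", "value": "C6", "start": 19, "end": 21},
    ],
}
print(summarise(example))
# {'intent': 'move_piece', 'confidence': 0.97, 'entities': {'piece': 'knight', 'square': 'C6'}}
```

Reducing the parse result to an intent plus an entity map keeps the message that eventually travels to the robot small and easy to act on.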

Results

The most promising model so far uses the DIETClassifier for both intent classification and entity extraction.

Overview of results
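The project's actual config file is not reproduced here. As a hedged illustration, the Python sketch below writes a minimal Rasa NLU config in which DIETClassifier is the only classifier, so it handles both intents and entities (it does both by default). The tokenizer and featurizer choices and `epochs: 100` are assumptions for illustration, not the project's settings.

```python
from pathlib import Path

# Hypothetical minimal NLU config; the project's actual pipeline may differ.
CONFIG_YML = """\
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
"""


def write_config(path: Path) -> Path:
    # Write the config so it can be used for training, e.g. `rasa train nlu --config config.yml`.
    path.write_text(CONFIG_YML, encoding="utf-8")
    return path


if __name__ == "__main__":
    out = write_config(Path("config.yml"))
    print(f"wrote {out}")
```

Using a single joint model means intent and entity predictions come from one network, instead of maintaining a separate intent classifier and entity extractor.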