M2 ICP 2 - TondiToday/CSEE5590-IOT-Robotics GitHub Wiki

Teammate 1: Tonderai Kambarami

Teammate 2: Luis Guillermo Usseglio-Carbajal

Teammate 3: Reed Bader

Teammate 4: Tarik Salay


Introduction

For this project, we created a simple Flask app to house our audio recognition project. The audio recognition runs on a TensorFlow backend, where a pre-trained VGG-19 model performs audio classification on 10-second snippets of audio. The training dataset was the Respiratory Sound Database provided on Kaggle.

Objectives

The objective is to use a pre-trained model to identify the health state of users from heart or lung audio captured through the Stethoscope 1, and to package all of this in an application that automatically carries a recording from uploaded audio file to TensorFlow output.

Approaches/Methods

Flask Application

We used Flask as our application framework because it is Python-based, which allows for easier integration with TensorFlow/PyTorch. The Flask application opens to a main page where the user can upload audio segments. The application breaks each uploaded audio file into smaller segments, turns them into waveform plots, and passes those plots to the machine learning backend of the application.
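The original code listing is not reproduced on this page, so the following is only a minimal sketch of the upload-and-segment flow, assuming WAV input; the route layout and the helper name `split_wav` are our own illustrative choices, and the waveform-plotting step (e.g. with matplotlib) is omitted.

```python
import io
import wave

from flask import Flask, request

app = Flask(__name__)

def split_wav(wav_bytes, segment_seconds=10):
    """Split a WAV file (as bytes) into chunks of at most
    segment_seconds, each returned as a complete standalone
    WAV byte string; the final chunk may be shorter."""
    segments = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)  # header is patched on close
            segments.append(out.getvalue())
    return segments

@app.route("/", methods=["GET", "POST"])
def index():
    # GET serves the upload form; POST receives the recording,
    # splits it into 10-second segments, and would then hand the
    # segments to the machine learning backend.
    if request.method == "POST":
        wav_bytes = request.files["audio"].read()
        segments = split_wav(wav_bytes)
        return {"segments": len(segments)}
    return ("<form method=post enctype=multipart/form-data>"
            "<input type=file name=audio><input type=submit></form>")
```

Splitting server-side keeps the upload a single file while still giving the model fixed-length inputs.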

Machine Learning

We took the pre-trained VGG-19 model and fine-tuned it on the Respiratory Sound Database provided on Kaggle. We pass the waveform plots from the Flask app into this model and run classification to get a predicted class.
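The model code itself is not shown here, so the sketch below covers only the classification step, assuming a Keras-style model object with a `predict(batch)` method (such as a fine-tuned `VGG19`). The function name `classify_segments` and the ordering of the diagnosis labels are our own assumptions, not the ones used in training.

```python
import numpy as np

# Diagnosis classes in the Respiratory Sound Database; this
# ordering is illustrative only.
DIAGNOSIS_LABELS = [
    "COPD", "Healthy", "URTI", "Bronchiectasis",
    "Pneumonia", "Bronchiolitis", "Asthma", "LRTI",
]

def classify_segments(plot_arrays, model, labels=DIAGNOSIS_LABELS):
    """Classify a recording from its per-segment waveform plots.

    plot_arrays: list of image arrays (e.g. 224x224x3, the VGG-19
    input shape), one per 10-second segment.
    model: any object with a Keras-style predict(batch) returning
    per-class probabilities.

    Per-segment probabilities are averaged so the whole recording
    receives a single label plus a confidence score.
    """
    batch = np.stack(plot_arrays)   # (n_segments, H, W, 3)
    probs = model.predict(batch)    # (n_segments, n_classes)
    mean_probs = probs.mean(axis=0)
    best = int(mean_probs.argmax())
    return labels[best], float(mean_probs[best])
```

Averaging over segments is one simple way to turn several per-snippet predictions into one verdict for the full recording.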

Evaluation

While we successfully created an application that could do audio classification, we ran into some limitations. First, retraining the VGG-19 model in real time was unsuccessful and would crash the Flask application; we believe this is due to the sheer size of the Respiratory Sound Database. The classification accuracy, especially when considering multiple classes, was also extremely poor, maxing out at 20%.

Had time permitted, we would have improved the audio streaming technology to streamline input from the Stethoscope 01; we had working code for this but were unable to integrate it into the application. Adding CSS to make the application look nicer was also on our to-do list, but we never got around to it.

Below you will see the output of our code: the audio segment is uploaded and analyzed by the machine learning algorithm, which then outputs an accuracy score and prints a full classification report.
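The accuracy score and per-class report described above are typically produced with scikit-learn's `classification_report`; as a self-contained illustration of what those numbers mean, here is a small stand-in that computes overall accuracy and per-class recall from lists of true and predicted labels (the function name is ours).

```python
from collections import Counter

def accuracy_and_recall(y_true, y_pred):
    """Return (overall accuracy, {class: recall}) computed from
    parallel lists of true and predicted labels -- a simplified
    version of the per-class report the app prints."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    support = Counter(y_true)                               # samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recall = {cls: hits[cls] / n for cls, n in support.items()}
    return accuracy, recall
```

This makes the 20% ceiling concrete: with eight classes, 20% accuracy is only slightly better than chance, which the per-class recall values expose immediately.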

Team Video