M2 ICP_2 - akkipant/IoT-Fall-2019 GitHub Wiki
Introduction:
In this ICP we had to record different heartbeat sounds using a digital stethoscope and use machine learning to train a model that detects the type of disease from the heartbeat pattern. Since that is a lengthy task, we first performed audio classification using deep learning in Python. The code must be able to classify different environmental sounds such as a dog barking, a rooster crowing, rain, etc.
Objectives:
- Classify audio by converting .wav files (sound) to .png files (spectrogram images).
- Train an audio model on the ESC-10 dataset, a subset of the Environmental Sound Classification dataset.
- Achieve accuracy above 60% in classifying the sounds.
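The wiki does not include the conversion script, but the .wav-to-.png step can be sketched as follows. This is a minimal, assumed approach using SciPy and Matplotlib (the actual project may have used a different library or spectrogram settings):

```python
# Sketch (assumption): convert a .wav file into a spectrogram .png.
# The wiki describes this step but does not show the original code.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib
matplotlib.use("Agg")          # render off-screen, e.g. on Google Colab
import matplotlib.pyplot as plt

def wav_to_spectrogram_png(wav_path, png_path):
    """Read a .wav file and save its log-magnitude spectrogram as a .png."""
    rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)
    freqs, times, spec = spectrogram(samples, fs=rate, nperseg=512)
    plt.figure(figsize=(3, 3))
    plt.axis("off")                            # the CNN needs pixels, not axes
    plt.pcolormesh(times, freqs, np.log(spec + 1e-10), shading="auto")
    plt.savefig(png_path, bbox_inches="tight", pad_inches=0)
    plt.close()
```

Running this over every .wav file, with one output folder per class, produces the 10 folders of spectrogram images described below.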
Approaches/Methods:
Our initial approach was to build and train our own model, but it achieved low accuracy. Hence, we decided to train the VGG19 model instead.
Workflow:
- We used Google Colab to develop the Python code.
- We imported the PyTorch library.
- We converted the .wav files into .png files to obtain their spectrograms, and saved the spectrograms of the 10 classes in 10 separate folders.
- We used VGG19 as our model because the model we built from scratch gave very poor accuracy.
- We used the PyTorch DataLoader to load the images and create the datasets.
- We trained the model on the datasets by calculating the loss and performing back-propagation.
- We tested the output on a validation dataset.
Algorithm:
- Read the .wav files and save their spectrograms as images.
- Load the image datasets with ImageFolder.
- Define the dataloaders using the image datasets and the transforms.
- Set up the GPU.
- Send the model to the GPU.
- Calculate the loss and back-propagate.
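The training steps above (send the model to the GPU, calculate the loss, back-propagate) can be sketched as a single epoch function. The optimizer choice and loop structure are assumptions; the wiki does not show the original training code:

```python
# Sketch (assumption): one training epoch with loss calculation and
# back-propagation, as outlined in the algorithm above.
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device):
    """Run one pass over the training set; return the mean loss."""
    criterion = nn.CrossEntropyLoss()
    model.to(device)                 # send the model to the GPU (or CPU)
    model.train()
    total = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # calculate loss
        loss.backward()                          # back-propagation
        optimizer.step()
        total += loss.item()
    return total / len(loader)
```

To adapt VGG19 from torchvision to the 10 ESC-10 classes, one would typically replace its final classifier layer, e.g. `model.classifier[6] = nn.Linear(4096, 10)` (again an assumption; the wiki does not show the exact modification).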
Output:
Parameters:
- Maximum accuracy achieved: 87%
- Minimum accuracy achieved: 25%
Video:
Evaluation and Discussion:
Since machine learning and deep learning were new areas for us, most of the time was spent learning the concepts and the process of training models. Once that was achieved, we struggled with the validation accuracy when the number of classes was increased to 10. Hence, we switched to VGG19 and trained that model.
Conclusion:
In this ICP, we learned audio classification using deep learning. Audio classification was challenging because we had to convert the audio into spectrograms and then apply deep learning algorithms to the spectrogram images. We achieved the best accuracy (87%) for certain classes, but one or two classes gave only about 50% accuracy.