Fatihah overfitting experiment - TarteelAI/tarteel-ml GitHub Wiki

As a sanity check for our architecture, we trained a model on all of the valid recordings of Surah Al Fatihah in our v1 dataset. The goal was to confirm that we are able to:

  • Overfit the training set
  • Avoid mode collapse, i.e. output each of the seven ayat for at least one input
  • Reach reasonable accuracy on the test set
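
The mode collapse goal can be checked mechanically. The following is a minimal sketch, assuming the predictions are available as one decoded string per input and that the seven reference transcripts are known. `missing_outputs` is a hypothetical helper, not part of this project or of OpenNMT, and the labels are placeholders.

```python
from collections import Counter


def missing_outputs(predictions, expected):
    """Return the expected outputs that the model never produced.

    An empty result means every expected output appears at least once,
    i.e. the model has not collapsed onto a subset of the outputs.
    """
    seen = Counter(p.strip() for p in predictions)
    return [e for e in expected if seen[e.strip()] == 0]


# Placeholder labels stand in for the seven ayah transcripts.
expected = [f"ayah_{i}" for i in range(1, 8)]
predictions = ["ayah_1", "ayah_2", "ayah_2", "ayah_4", "ayah_7"]
never_predicted = missing_outputs(predictions, expected)
```

In this toy run `never_predicted` lists the outputs the model never emitted. An empty list is the condition the second goal asks for.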

Dataset stats

A total of 185 ayat were labeled as correct in the v1 dataset. We randomly split them into train/validation/test sets with an 80/10/10 split. The actual file lists are included in the appendix. In total, we have the following number of files:

  • Train: 149 files
  • Validation: 18 files
  • Test: 18 files
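
The page does not say how the split was produced. As a rough sketch, rounding 10% of 185 down gives 18 files each for validation and test, and giving the remainder to train yields 149, which matches the counts above. The helper name, the seed, and the placeholder file names are assumptions for illustration only.

```python
import random


def split_files(files, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and split into (train, val, test); train takes the remainder."""
    files = list(files)
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_frac)
    n_test = int(len(files) * test_frac)
    val = files[:n_val]
    test = files[n_val:n_val + n_test]
    train = files[n_val + n_test:]
    return train, val, test


# 185 placeholder names reproduce the split sizes reported above.
files = [f"wav/file_{i}.wav" for i in range(185)]
train, val, test = split_files(files)
sizes = (len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible, which matters when the file lists are published alongside the results.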

Results

  • Model trained for 21 hours on CPU over the training set, roughly 100 epochs
  • Train accuracy: 100%
  • Validation accuracy: 83% (15/18 correct)
  • Test accuracy: 50% (9/18 correct)
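
The epoch estimate and the percentages can be cross-checked against other figures on this page. The sketch below assumes that `-batch_size 8` counts recordings, that there is no gradient accumulation, and that every batch is full, so the result is only approximate.

```python
train_files = 149      # from the dataset stats
train_steps = 2000     # -train_steps in the training command
batch_size = 8         # -batch_size in the training command

# Each step consumes one batch, so this is the number of passes over the data.
epochs = train_steps * batch_size / train_files

val_accuracy = 15 / 18
test_accuracy = 9 / 18

print(f"epochs ~ {epochs:.1f}")
print(f"validation accuracy = {val_accuracy:.1%}")
print(f"test accuracy = {test_accuracy:.1%}")
```

Under those assumptions the run covers about 107 epochs, consistent with "roughly 100", and 15/18 rounds to the reported 83%.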

[Loss curve] [Perplexity curve]

Preprocessing settings

  • Sampling rate: 16000 Hz
    • The ASR community rarely sees better performance from higher rates
  • Channels: 1
    • Stereo would take roughly twice as long to train; any performance gain is uncertain and unlikely to be significant
  • Output segmentation: character level
  • -src_seq_length = 150 and -tgt_seq_length = 150
  • TensorBoard is used for progress monitoring
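
Before preprocessing, it can help to confirm that each recording already matches the sampling rate and channel settings above. This is a minimal sketch using Python's standard `wave` module. It only inspects the WAV header and does not resample, and `wav_matches` is a hypothetical helper rather than project code.

```python
import os
import tempfile
import wave


def wav_matches(path, rate=16000, channels=1):
    """Return True if the WAV header reports the expected rate and channel count."""
    with wave.open(path, "rb") as w:
        return w.getframerate() == rate and w.getnchannels() == channels


# Demo: write one second of 16-bit silence at 16 kHz mono, then check it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

demo_ok = wav_matches(demo_path)
```

Files that fail the check would need to be converted with an audio tool before they are listed in the source files.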

Training scripts

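Convert word-tokenized text into character-tokenized text:

#!/bin/bash

# Replace spaces with a placeholder, put a space after every character,
# then turn the placeholder into the <space> token.
sed -e 's/ /_/g' -e 's/\(.\)/\1 /g' -e 's/_/<space>/g' word_tokenized_text.txt > char_tokenized_text.txt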
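
For readers who prefer to see the transformation spelled out, here is a rough Python equivalent of the sed pipeline. It is a sketch rather than the script that was actually used, and it differs in two small ways: it leaves no trailing space at the end of the line, and it does not turn underscores already present in the text into `<space>`.

```python
def char_tokenize(line):
    """Space-separate every character, writing literal spaces as <space>."""
    return " ".join("<space>" if ch == " " else ch for ch in line)


example = char_tokenize("ab cd")   # "a b <space> c d"
```

Python iterates over Unicode code points, so Arabic diacritics that are stored as separate combining characters become separate tokens.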

OpenNMT preprocessing

python preprocess.py \
    -data_type audio \
    -src_dir ${base_dir}/dataset \
    -train_src ${base_dir}/dataset/processed/train_src.txt \
    -train_tgt ${base_dir}/dataset/processed/train_tgt.txt \
    -valid_src ${base_dir}/dataset/processed/val_src.txt \
    -valid_tgt ${base_dir}/dataset/processed/val_tgt.txt \
    -shard_size 300 \
    -src_seq_length 150 \
    -tgt_seq_length 150 \
    -save_data ${base_dir}/exp0/data/processed

OpenNMT train

python train.py \
    -model_type audio \
    -enc_rnn_size 512 \
    -dec_rnn_size 512 \
    -audio_enc_pooling 1,2 \
    -dropout 0 \
    -enc_layers 2 \
    -dec_layers 1 \
    -rnn_type LSTM \
    -data ${base_dir}/exp0/data/processed \
    -save_model ${base_dir}/exp0/models/model \
    -global_attention mlp \
    -batch_size 8 \
    -optim adam \
    -max_grad_norm 100 \
    -learning_rate 0.0003 \
    -learning_rate_decay 0.8 \
    -train_steps 2000 \
    -valid_steps 150 \
    -save_checkpoint_steps 150 \
    -tensorboard \
    -tensorboard_log_dir ${base_dir}/exp0/tensorboard_run

OpenNMT inference

python translate.py \
    -data_type audio \
    -model ../exp0/models/model_step_2000.pt \
    -src_dir ../dataset/ \
    -src ../dataset/processed/train_src.txt \
    -output pred_train.txt \
    -verbose
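
The page does not say how the accuracy figures were computed. Counts such as 15/18 suggest one score per recording, so a plausible reading is exact match between each predicted line and its reference. The sketch below works on that assumption and further assumes that the output file has one prediction per line in the same order as the target file. Both helpers are hypothetical.

```python
def normalize(text):
    """Collapse runs of whitespace so trailing spaces do not count as errors."""
    return " ".join(text.split())


def exact_match_accuracy(predictions, references):
    """Fraction of lines where prediction and reference match exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


def read_lines(path):
    """Read a UTF-8 text file into a list of lines without newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


# Toy character-tokenized lines standing in for pred_train.txt and train_tgt.txt.
refs = ["a b <space> c", "d e", "f"]
preds = ["a b <space> c", "d x", "f "]
accuracy = exact_match_accuracy(preds, refs)
```

With real files, the two lists would come from `read_lines("pred_train.txt")` and `read_lines` on the matching target file.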

Appendix

Train file list

wav/1_5_1425074844.wav
wav/1_4_2399580457.wav
wav/1_6_912313373.wav
wav/1_4_1616077583.wav
wav/1_4_2787081723.wav
wav/1_5_3044071082.wav
wav/1_2_615013790.wav
wav/1_6_1691939011.wav
wav/1_4_1953431019.wav
wav/1_6_2808985807.wav
wav/1_6_4145987231.wav
wav/1_7_2529863598.wav
wav/1_4_2468710739.wav
wav/1_1_3780033083.wav
wav/1_5_1326790959.wav
wav/1_4_4280440585.wav
wav/1_6_3217733272.wav
wav/1_5_1812842962.wav
wav/1_7_873726059.wav
wav/1_3_4292150794.wav
wav/1_5_3116160521.wav
wav/1_6_147824879_4cfPiIj.wav
wav/1_7_3721766143.wav
wav/1_1_3578475207.wav
wav/1_2_4160724792.wav
wav/1_3_2763432460.wav
wav/1_5_1883279349.wav
wav/1_6_3560987809.wav
wav/1_2_3829087781.wav
wav/1_2_576225772.wav
wav/1_2_3823105993.wav
wav/1_6_1220528519.wav
wav/1_4_805219025.wav
wav/1_4_1468388529.wav
wav/1_5_1199950785.wav
wav/1_7_3771589469.wav
wav/1_5_1213070565.wav
wav/1_5_2679844491.wav
wav/1_3_2416307587.wav
wav/1_1_1358470096.wav
wav/1_2_1730556095.wav
wav/1_4_3089997106.wav
wav/1_3_41491049.wav
wav/1_2_3582471.wav
wav/1_6_4090222456.wav
wav/1_7_3901341232.wav
wav/1_1_3506158158.wav
wav/1_2_2744196375.wav
wav/1_6_2661688511.wav
wav/1_2_516378307.wav
wav/1_3_2741711775.wav
wav/1_6_3474201716.wav
wav/1_1_3738241632.wav
wav/1_7_4128084961.wav
wav/1_6_3641512560.wav
wav/1_7_1217178866.wav
wav/1_2_143136275.wav
wav/1_6_3629675681.wav
wav/1_4_1886635465.wav
wav/1_7_2860872163.wav
wav/1_3_1315646419.wav
wav/1_1_3082757238.wav
wav/1_4_735014477.wav
wav/1_3_1508971909.wav
wav/1_6_2150950087.wav
wav/1_2_1851488969.wav
wav/1_5_1946142856.wav
wav/1_4_2873489627.wav
wav/1_1_526190118.wav
wav/1_1_3427586811.wav
wav/1_2_3671905585_th9Sf05.wav
wav/1_4_4162098808.wav
wav/1_3_3100440406.wav
wav/1_6_3641512560_a7hC3Xe.wav
wav/1_3_3830232711.wav
wav/1_4_3734729224.wav
wav/1_1_2998129123.wav
wav/1_6_3419846128.wav
wav/1_1_2677219793.wav
wav/1_7_1030177697.wav
wav/1_6_3635830474.wav
wav/1_6_1540864270.wav
wav/1_3_1180620269.wav
wav/1_5_1359210028.wav
wav/1_3_712557111.wav
wav/1_7_92302695.wav
wav/1_2_1073266932.wav
wav/1_3_2791353353.wav
wav/1_2_515905884.wav
wav/1_3_3828791224.wav
wav/1_2_1232757445.wav
wav/1_1_3371917115.wav
wav/1_1_3752224010.wav
wav/1_2_2868134480.wav
wav/1_7_1069479178.wav
wav/1_2_3964756442.wav
wav/1_6_1311251597.wav
wav/1_4_3406460270.wav
wav/1_2_1589922825.wav
wav/1_6_3964846100.wav
wav/1_5_2425583359.wav
wav/1_5_1500429571.wav
wav/1_6_4081852003.wav
wav/1_7_2674702755.wav
wav/1_2_3099254338.wav
wav/1_7_3041007525.wav
wav/1_1_692579729.wav
wav/1_2_2615458200.wav
wav/1_6_1959363339.wav
wav/1_2_2198883921.wav
wav/1_6_439389781.wav
wav/1_4_632387383.wav
wav/1_2_4115400297.wav
wav/1_1_2425861049.wav
wav/1_5_3465885399.wav
wav/1_1_376413349.wav
wav/1_1_3637224605_BvJ4Mad.wav
wav/1_3_409467092.wav
wav/1_6_3694918494.wav
wav/1_3_885940819.wav
wav/1_2_1989340117.wav
wav/1_4_1539465450.wav
wav/1_1_3129375594.wav
wav/1_5_2475998766.wav
wav/1_3_1486618514.wav
wav/1_6_1413065775.wav
wav/1_1_3402647477.wav
wav/1_5_2130286189.wav
wav/1_7_890336865.wav
wav/1_6_3184206461.wav
wav/1_5_3047031822.wav
wav/1_7_456658554_Aa2NLrB.wav
wav/1_1_3901909884.wav
wav/1_3_1543704090.wav
wav/1_6_2463426825.wav
wav/1_4_302218851.wav
wav/1_5_2540785300.wav
wav/1_5_3047031822_Ar3djLc.wav
wav/1_2_1397555154.wav
wav/1_2_1227982512.wav
wav/1_2_2651969464.wav
wav/1_2_656533845.wav
wav/1_5_2917054842.wav
wav/1_3_1610046637.wav
wav/1_5_1793560208.wav
wav/1_6_4035742518.wav
wav/1_7_2963788463.wav
wav/1_1_314605978.wav
wav/1_4_578219235.wav

Validation file list

wav/1_3_1542266059.wav
wav/1_1_125500082.wav
wav/1_4_3605765818.wav
wav/1_5_1071233062.wav
wav/1_7_2375198015.wav
wav/1_3_725768211.wav
wav/1_4_2617021714.wav
wav/1_5_4159695804.wav
wav/1_5_3543268651.wav
wav/1_4_1369217603.wav
wav/1_2_3128088926.wav
wav/1_4_441453214.wav
wav/1_6_502620076.wav
wav/1_1_2794059534.wav
wav/1_1_2167320070.wav
wav/1_5_1359210028_jHSqAVh.wav
wav/1_3_505703049.wav
wav/1_2_1497707070.wav

Test file list

wav/1_5_1718922210.wav
wav/1_5_3116160521_xImWhIW.wav
wav/1_2_277993764.wav
wav/1_2_1503202022.wav
wav/1_4_123558308.wav
wav/1_4_1464974484.wav
wav/1_4_2949633585.wav
wav/1_2_3039629829.wav
wav/1_6_914096626.wav
wav/1_7_2653915203.wav
wav/1_5_3734843448.wav
wav/1_3_619438520.wav
wav/1_4_1011589459.wav
wav/1_2_3917663890.wav
wav/1_7_2238976240.wav
wav/1_5_3101382828.wav
wav/1_3_191544354.wav
wav/1_3_2973889121.wav
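
The file names in these lists appear to follow a `<surah>_<ayah>_<id>` pattern, sometimes with an extra suffix. That pattern is inferred from the names and is not stated anywhere on this page. Under that assumption, a short sketch can count how many recordings of each ayah a split contains.

```python
from collections import Counter
from pathlib import PurePosixPath


def surah_ayah(path):
    """Parse 'wav/1_5_1718922210.wav' into (surah, ayah) = (1, 5)."""
    surah, ayah = PurePosixPath(path).stem.split("_")[:2]
    return int(surah), int(ayah)


# A few names taken from the test list above.
sample = [
    "wav/1_5_1718922210.wav",
    "wav/1_5_3116160521_xImWhIW.wav",
    "wav/1_2_277993764.wav",
    "wav/1_7_2653915203.wav",
]
ayah_counts = Counter(surah_ayah(p)[1] for p in sample)
```

Running the same count over a full list shows whether every ayah is represented in that split.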