The Tarteel v1 dataset is a snapshot of 25,000 recordings from January 2019. The total duration across these 25,000 recordings amounts to 242,618 seconds (67.39 hours), with an average audio length of 9.70 seconds. The recordings are in their original form and are unevaluated, but we have made an automatically evaluated dataset available. A manually annotated dataset of over 20,000 ayat will be made available in the near future.

Some characteristics of the dataset:

  • This is truly an "ayah-recitation-in-the-wild" kind of dataset, so you'll find a good mix of recitation styles, proficiency levels and speeds.
  • Repetitions and hesitations are frequent. This is very good from the perspective of real-world data, but it is also a challenge that needs to be considered when building an ML system, since the corresponding text has neither.
  • A (currently unknown) portion of the dataset has the beginning or end of the ayah missing. This usually happens in the "continuous" recitation mode, where the reciter reads ayat in sequence and the action of clicking next is not perfectly aligned with the reciter starting the next ayah.
  • About 33% of the dataset is not actually encoded as wav even though the extension is wav. This is because the encoding of a recording currently depends on the device on which it was recorded, and no transcoding is done on the server side. See the Technical Notes for commands to detect the audio type and convert it into another format.
  • 281 files (~35 minutes) are absolute (or near-absolute) silence, probably because the client was not able to access microphone data. The files are listed in the appendix.
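
The mismatch between the wav extension and the actual encoding noted in the list above can be spotted without decoding any audio, because a real WAV file starts with a RIFF/WAVE container signature. The following is a minimal sketch in Python; the function names are illustrative and not part of any Tarteel tooling. A file that fails this check is only known not to be a WAV container; identifying the actual codec still needs a tool such as ffprobe.

```python
def looks_like_wav(header: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE container signature.

    A WAV file begins with b"RIFF", a 4-byte chunk size, then b"WAVE".
    """
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def is_real_wav(path: str) -> bool:
    """Read only the first 12 bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as f:
        return looks_like_wav(f.read(12))
```

Calling is_real_wav on every file with a wav extension gives a quick list of candidates that need transcoding.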

Audio encoding heterogeneity

Count -  Type, Sample Rate, Channels  , Encoding
   3  -  wav , 8000 Hz    , 2 channels, s16
   4  -  wav , 192000 Hz  , 2 channels, s16
 108  -  wav , 16000 Hz   , 2 channels, s16
 441  -  aac , 44100 Hz   , mono      , fltp
7883  -  aac , 44100 Hz   , stereo    , fltp
8226  -  wav , 48000 Hz   , 2 channels, s16
8335  -  wav , 44100 Hz   , 2 channels, s16
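
As a quick consistency check, the counts in the table above can be summed to confirm the dataset size and the share of files that are not wav. This is only arithmetic on the numbers listed in the table:

```python
# (count, type) pairs copied from the table above
counts = [
    (3, "wav"), (4, "wav"), (108, "wav"),
    (441, "aac"), (7883, "aac"),
    (8226, "wav"), (8335, "wav"),
]

total = sum(c for c, _ in counts)
aac = sum(c for c, t in counts if t == "aac")
aac_share = aac / total

print(total, aac, f"{aac_share:.1%}")  # 25000 8324 33.3%
```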

Manual Evaluation

About 150 ayat were manually annotated along with specific notes for each of them. The following table gives an overview of the diversity of these recitations.

Surah Number Ayah Number URL to Recording Correct Recitation? Notes
29 51 29_51_3340224152.wav FALSE
76 1 76_1_3020326476_eYuCSBG.wav TRUE
39 41 39_41_1554241100.wav TRUE
17 62 17_62_3574186471.wav TRUE Contains hesitation
36 37 36_37_1754041755.wav TRUE
6 86 6_86_880309420.wav TRUE
6 1 6_1_179366417.wav TRUE Late start
15 42 15_42_4030057700.wav TRUE Missing characters at the end
73 8 73_8_1202722251.wav FALSE Recitation is of a different ayah
67 3 67_3_2931720214.wav TRUE
17 108 17_108_2289803411.wav FALSE Empty
2 212 2_212_1333381006.wav TRUE
2 70 2_70_3046857028.wav FALSE Incomplete ayah
90 2 90_2_3497493728_8UXhCjH.wav TRUE
113 1 113_1_3192978335.wav TRUE
2 9 2_9_2493201828.wav FALSE Empty
46 9 46_9_4035555337.wav TRUE
34 29 34_29_2472434834.wav TRUE Contains hesitation
18 82 18_82_936377773.wav TRUE Contains hesitation
1 6 1_6_3419846128.wav TRUE
16 19 16_19_3469972445.wav FALSE Empty
56 15 56_15_560698210.wav TRUE
60 13 60_13_3223691956.wav TRUE
78 17 78_17_3573919523.wav TRUE
73 19 73_19_2936868360.wav FALSE Recitation is of a different ayah
57 6 57_6_4294733945.wav TRUE
24 12 24_12_279043107.wav TRUE
8 73 8_73_2581299400.wav TRUE False start and missing characters at the end
7 113 7_113_786353004.wav TRUE
26 66 26_66_1426103961.wav TRUE
54 12 54_12_3772293134.wav FALSE Empty
38 85 38_85_3935227980.wav TRUE
5 4 5_4_859410680.wav TRUE Includes characters from next ayah
3 177 3_177_2658793224.wav TRUE Missing characters at the start, page ruffling at the end
3 133 3_133_1551876359.wav TRUE
11 20 11_20_4169923669.wav FALSE Empty
57 8 57_8_2564476253.wav TRUE
19 63 19_63_1205676658.wav TRUE Missing characters at the start
79 43 79_43_637994452.wav TRUE
28 28 28_28_1310678751.wav TRUE
37 67 37_67_1300620030.wav TRUE Missing characters at the end
9 55 9_55_1002940119.wav TRUE
90 12 90_12_2612117498.wav TRUE
27 48 27_48_2658412579.wav TRUE
18 86 18_86_1990251764.wav TRUE
2 258 2_258_3034376487.wav TRUE
75 29 75_29_2044192076.wav TRUE Missing characters at the start
56 45 56_45_317928467.wav TRUE
67 11 67_11_587631042.wav TRUE Includes characters from next ayah
77 12 77_12_3177618951.wav TRUE
18 79 18_79_1913846909.wav TRUE
9 47 9_47_1001329568.wav TRUE Missing characters at the end
6 22 6_22_2259216929.wav FALSE Almost empty
55 63 55_63_2838529228.wav TRUE
5 101 5_101_1026578053.wav TRUE
88 22 88_22_204844392.wav TRUE
20 120 20_120_2687472128.wav FALSE Incomplete recitation
70 28 70_28_4018276970.wav TRUE
85 17 85_17_3749108940.wav TRUE
37 160 37_160_2251393673.wav TRUE Missing characters at the end
4 113 4_113_1037152294.wav TRUE
14 10 14_10_4242971217_1PGDQeo.wav TRUE False start
9 103 9_103_687047901.wav FALSE Empty
26 202 26_202_2142350463.wav TRUE
44 19 44_19_1015587995.wav TRUE
99 1 99_1_2915792010.wav TRUE
109 4 109_4_806780268.wav TRUE
31 28 31_28_2052098992.wav TRUE
67 1 67_1_3481815969.wav TRUE
3 183 3_183_3638863501.wav TRUE Missing characters at the end
92 11 recording_n74iGRl.wav FALSE Almost empty
43 46 43_46_1416875347.wav TRUE
76 16 76_16_1347475632.wav TRUE
5 25 5_25_834330814.wav TRUE
88 10 88_10_1248334570.wav TRUE
11 73 11_73_973544851.wav TRUE
36 2 36_2_290659745.wav FALSE Empty
56 17 56_17_2600116285.wav TRUE
43 84 43_84_2333856581.wav TRUE
39 59 39_59_3794840617.wav TRUE
4 90 4_90_940938099.wav TRUE Contains hesitation
56 8 56_8_396325185.wav TRUE
80 34 80_34_2070149842.wav TRUE
31 30 31_30_1837859811.wav TRUE Contains repetition
2 22 2_22_614831518.wav TRUE
31 27 31_27_4208421270.wav TRUE
5 46 5_46_1451316755.wav TRUE Missing characters at the end
16 47 16_47_4284316601.wav TRUE Contains hesitation
56 32 56_32_2576214909.wav TRUE
7 157 7_157_2885906502.wav TRUE Contains repetition
40 19 40_19_3673324834.wav TRUE Late start
17 74 17_74_4093306271.wav FALSE Empty
21 87 21_87_2253646439.wav FALSE Empty
33 30 33_30_1047930702.wav TRUE
2 101 2_101_1722311922.wav TRUE
18 107 18_107_3605461623.wav TRUE
56 16 56_16_211365629.wav TRUE Missing characters at the end
3 132 3_132_1234621432.wav FALSE False start, but most of ayah is correct
73 4 73_4_2389030636.wav FALSE Recitation is of a different ayah
2 207 2_207_1738853420.wav FALSE Recitation is of a different ayah
2 61 2_61_178625651.wav TRUE Long with repetitions
46 35 46_35_1003010349.wav TRUE Starts with A’udhu billahi; long, with lots of repetitions and minor mistakes with corrections
2 282 2_282_2162859841_pNdIVFG.wav TRUE
24 61 24_61_19471702_AdFj7CM.wav TRUE Long with repetitions
33 53 33_53_2637130870.wav TRUE Minor corrections and a few missing words
13 31 13_31_2917018758.wav FALSE Wrong ayah
74 1 74_1_3853833136.wav FALSE No recitation with only background audio
24 31 24_31_821140661_ZpLNB1t.wav TRUE
3 21 3_21_4195370451.wav TRUE Slow recitation from new reciter
4 12 4_12_3397812632.wav FALSE Wrong ayah and bad audio quality
18 86 18_86_2892001514.wav FALSE No recitation with only background audio
2 187 2_187_2567604468.wav TRUE
57 20 57_20_1444517750.wav TRUE
22 5 22_5_1319386874.wav TRUE Loud audio with many clipped samples
2 43 2_43_1839768713.wav TRUE
5 6 5_6_3971870288.wav TRUE
2 164 2_164_146627317.wav TRUE Minor corrections and pauses
5 12 5_12_62466597.wav TRUE
2 275 2_275_1194007719.wav TRUE Starts with Bismillah, includes lots of repetitions and corrections
39 71 39_71_2058796630.wav TRUE
24 31 24_31_3077665806.wav TRUE Minor mistakes, pauses and corrections
46 19 46_19_1077663174.wav FALSE Empty
58 11 58_11_143685028.wav FALSE Ayah is repeated 2.5 times
41 12 41_12_1530114944.wav TRUE Contains repetitions
48 29 48_29_4136437142.wav TRUE Contains long range repetitions
24 61 24_61_19471702_Hmf33gP.wav TRUE
24 58 24_58_3828369781.wav TRUE Contains corrections
2 196 2_196_1510345966.wav TRUE
2 177 2_177_242597694.wav TRUE
4 12 4_12_687196028.wav TRUE Contains long range repetitions
2 282 2_282_3601780668.wav TRUE
11 57 11_57_1451809866.wav TRUE Starts with Bismillah, new reciter and long pauses
38 23 38_23_1115556105.wav TRUE New reciter and long pauses
39 38 39_38_280896818.wav TRUE Contains long range repetitions
5 64 5_64_2727374906.wav TRUE
22 5 22_5_1472297438_KeJeDoL.wav TRUE
24 31 24_31_821140661_9wmRcHt.wav TRUE
2 177 2_177_1480427117.wav TRUE
2 61 2_61_3298157126.wav TRUE
4 12 4_12_2321146328.wav TRUE
24 31 24_31_26647198.wav TRUE
2 282 2_282_1220445354.wav TRUE Contains minor corrections and pauses
58 22 58_22_3262135905.wav TRUE
18 17 18_17_1053309820.wav TRUE New reciter and long pauses
33 50 33_50_1871251261.wav TRUE
7 143 7_143_2751962472.wav TRUE
33 50 33_50_3069713586.wav TRUE
24 31 24_31_821140661_EoyQa04.wav TRUE
2 85 2_85_4269060186.wav TRUE
11 63 11_63_4051377768.wav TRUE New reciter and long pauses
48 25 48_25_4019202576.wav TRUE Contains repetition
13 7 13_7_3722306701.wav FALSE Empty
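
Most file names in the table above appear to follow the pattern <surah>_<ayah>_<id>.wav, sometimes with an extra suffix before the extension. Assuming that pattern (it is inferred from the table, not from a documented naming scheme), the surah and ayah numbers can be recovered from a file name as sketched below. Names that do not fit, such as recording_n74iGRl.wav, yield None.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the table: <surah>_<ayah>_<id>[_<suffix>].wav
_NAME_RE = re.compile(r"^(\d+)_(\d+)_(\d+)(?:_\w+)?\.wav$")


def parse_recording_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (surah, ayah) parsed from a file name, or None if it does not match."""
    m = _NAME_RE.match(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```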

Automatic Evaluation

The entire dataset of 25,000 recitations has also been automatically evaluated by an algorithm. The algorithm first uses Google Speech-to-Text to transcribe each recording, and then uses Iqra (a Quran search engine) to search for the ayah using the transcription. If the result returned by Iqra matches the ayah the recording is labeled with, we mark it as correct. A manual evaluation over 100 ayat revealed an accuracy of about 97%, i.e. 3% of the evaluations were actually wrong (either the algorithm claimed a correct recitation when the recitation was wrong, or vice versa). The code is available in the tarteel-dataset-labeler repository.
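
The flow described above (transcribe, search, compare) can be sketched as follows. This is not the code from the tarteel-dataset-labeler repository: transcribe and search are hypothetical callables standing in for Google Speech-to-Text and Iqra, whose real APIs are not shown here.

```python
from typing import Callable, Optional, Tuple

AyahRef = Tuple[int, int]  # (surah number, ayah number)


def auto_evaluate(
    audio_path: str,
    expected: AyahRef,
    transcribe: Callable[[str], str],
    search: Callable[[str], Optional[AyahRef]],
) -> bool:
    """Mark a recording as correct if the searched ayah matches its label.

    transcribe: hypothetical speech-to-text wrapper, returns "" on failure.
    search: hypothetical search wrapper, returns the best-matching ayah or None.
    """
    text = transcribe(audio_path)
    if not text:
        # Nothing was recognized (e.g. an empty recording).
        return False
    return search(text) == expected
```

Passing the two services in as parameters keeps the comparison logic testable with simple stand-in functions.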

The automatic evaluation marked 20,565 recordings as correct (82.26% of the full dataset). This amounts to 226,985 seconds (63.05 hours) in audio length. The evaluated dataset is available to download here, with the last column containing the automatic evaluations.

A second, extended version of the auto-evaluated dataset is also available, and contains an extra column with the transcriptions from Google's Speech-to-Text system. The last column contains the transcriptions, separated by ||| when multiple transcriptions were available for the same recording.
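
When reading the extended file, the transcription column needs to be split on the ||| separator. A minimal sketch, assuming the column has already been read as a string (the function name and the placeholder strings are illustrative):

```python
from typing import List


def split_transcriptions(field: str) -> List[str]:
    """Split a '|||'-separated transcription field into a list of candidates."""
    return [t.strip() for t in field.split("|||") if t.strip()]


print(split_transcriptions("first candidate|||second candidate"))
# ['first candidate', 'second candidate']
```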

Useful Subsets

Some useful subsets and their statistics are listed below:

Surah Al-Fatihah

  • Surah #1
  • 185 recordings (auto-evaluated as correct)
  • Duration: 18 minutes and 13 seconds (1092.8 seconds)

Juz 'Amma (30th Chapter)

  • Surahs #78 - #114
  • 2605 recordings (auto-evaluated as correct)
  • Duration: 3 hours and 47 minutes (13628.3 seconds)
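
Subsets like the two above can be derived by filtering on the surah number and keeping only recordings that were auto-evaluated as correct. The sketch below assumes each recording has been loaded into a small record with the fields shown; the Recording type and its field names are hypothetical and do not reflect the actual column names of the dataset files.

```python
from typing import Iterable, NamedTuple, Tuple


class Recording(NamedTuple):
    surah: int
    ayah: int
    duration_s: float
    auto_correct: bool


def subset_stats(
    recordings: Iterable[Recording], first_surah: int, last_surah: int
) -> Tuple[int, float]:
    """Count and total duration of auto-correct recordings in a surah range."""
    selected = [
        r for r in recordings
        if first_surah <= r.surah <= last_surah and r.auto_correct
    ]
    return len(selected), sum(r.duration_s for r in selected)


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes and seconds."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m}m {s}s"


print(format_duration(1092.8))   # 0h 18m 13s
print(format_duration(13628.3))  # 3h 47m 8s
```

For example, subset_stats(recordings, 78, 114) would select Juz 'Amma and subset_stats(recordings, 1, 1) would select Surah Al-Fatihah.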

Technical Notes

  • Get length of an audio file in seconds:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 <path-to-audio-file>
  • Convert any audio format into a wav file:
ffmpeg -i <path-to-input-file> <path-to-output.wav>
  • Resample a wav file to a different sample rate, bit depth, number of channels, etc.:
sox <path-to-input.wav> -b <bits-per-sample> -r <sample-rate> -c <number-of-channels> -e <encoding> <path-to-output.wav>
  • Detect files that are near-silent the entire time (most likely microphone data was never recorded and zeroed bytes were sent). This inspects the first 10 seconds of each file and keeps those whose mean volume is -90 dB or lower:
for i in *.wav; do echo "$i" "$(ffmpeg -t 10 -i "$i" -af volumedetect -f null /dev/null 2>&1 | grep mean_volume)"; done | grep -E "mean_volume: -9[0-9]"
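
For files that are already 16-bit PCM wav, a similar near-silence check can be done without ffmpeg using only the Python standard library. The sketch below computes the RMS level in dB relative to full scale, which is comparable in spirit to the mean_volume figure used above but not guaranteed to match ffmpeg's value exactly. The -90 dB threshold and the function names are assumptions for illustration.

```python
import array
import io
import math
import sys
import wave


def mean_volume_dbfs(wav_file) -> float:
    """RMS level of a 16-bit PCM WAV (path or file object) in dBFS."""
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / (32768.0 ** 2))


def is_near_silent(wav_file, threshold_db: float = -90.0) -> bool:
    """True if the RMS level is at or below the (assumed) threshold."""
    return mean_volume_dbfs(wav_file) <= threshold_db


def demo_wav(samples) -> io.BytesIO:
    """Build a small mono 16 kHz WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(array.array("h", samples).tobytes())
    buf.seek(0)
    return buf


print(is_near_silent(demo_wav([0] * 1600)))            # True
print(is_near_silent(demo_wav([10000, -10000] * 800)))  # False
```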


