Tarteel v1 Dataset - TarteelAI/tarteel-ml GitHub Wiki
The Tarteel v1 dataset is a snapshot of 25,000 recordings from January 2019. The total duration across these 25,000 recordings amounts to 242,618 seconds (67.39 hours), with an average audio length of 9.70 seconds. The recordings themselves are in their original form and are unevaluated, but we have made an automatically evaluated dataset available. A manually annotated dataset of over 20,000 ayat will be made available in the near future.
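As a quick sanity check, the hour and average figures above follow directly from the total number of seconds and the number of recordings. The variable names below are only illustrative.

```python
# Figures quoted in the paragraph above.
TOTAL_SECONDS = 242_618
NUM_RECORDINGS = 25_000

total_hours = TOTAL_SECONDS / 3600              # seconds -> hours
mean_seconds = TOTAL_SECONDS / NUM_RECORDINGS   # average length per recording

print(f"{total_hours:.2f} hours, {mean_seconds:.2f} s per recording")
# -> 67.39 hours, 9.70 s per recording
```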
Some characteristics of the dataset:
- This is truly an "ayah-recitation-in-the-wild" kind of dataset, and hence you'll find a good mix of recitation styles, proficiency and speed.
- Repetitions and hesitations are frequent. This is very good from the perspective of real-world data, but it is also a challenge that needs to be considered when building an ML system (since the corresponding text has neither of these).
- A (currently unknown) portion of the dataset has the beginning or end of the ayah missing. This usually happens in the "continuous" recitation mode, where the reciter is reading ayat in a sequence and the action of clicking next is not perfectly aligned with the reciter starting the next ayah.
- About 33% of the dataset is not actually encoded as WAV even though the file extension is .wav. This is because the encoding of a recording currently depends on the device on which it was recorded, and no transcoding is done on the server side. See the technical notes for commands to detect the audio type and convert it into another format.
- 281 files (~35 minutes) are absolute (or near-absolute) silence, probably because the client was not able to access microphone data. The files are listed in the appendix.
Count | Type | Sample Rate | Channels | Encoding |
3 | wav | 8000 Hz | 2 channels | s16 |
4 | wav | 192000 Hz | 2 channels | s16 |
108 | wav | 16000 Hz | 2 channels | s16 |
441 | aac | 44100 Hz | mono | fltp |
7883 | aac | 44100 Hz | stereo | fltp |
8226 | wav | 48000 Hz | 2 channels | s16 |
8335 | wav | 44100 Hz | 2 channels | s16 |
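Since the file extension cannot be trusted, one cheap way to tell real WAV files from the rest is to look at the first bytes of each file. The following is a minimal sketch that assumes the genuine WAV files use the standard RIFF/WAVE header; the function name `looks_like_wav` is made up for this example, and the check says nothing about which format the other files actually are (use ffprobe for that).

```python
def looks_like_wav(path: str) -> bool:
    """Return True if the file starts with a RIFF/WAVE header.

    A standard WAV file begins with the ASCII bytes "RIFF", a 4-byte
    chunk size, and then the ASCII bytes "WAVE".
    """
    with open(path, "rb") as f:
        header = f.read(12)
    return (
        len(header) == 12
        and header[:4] == b"RIFF"
        and header[8:12] == b"WAVE"
    )
```

Files for which this returns False are candidates for conversion before they are fed to tools that expect PCM WAV input.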
About 150 ayat were manually annotated along with specific notes for each of them. The following table gives an overview of the diversity of these recitations.
Surah Number | Ayah Number | URL to Recording | Correct Recitation? | Notes |
29 | 51 | 29_51_3340224152.wav | FALSE | |
76 | 1 | 76_1_3020326476_eYuCSBG.wav | TRUE | |
39 | 41 | 39_41_1554241100.wav | TRUE | |
17 | 62 | 17_62_3574186471.wav | TRUE | Contains hesitation |
36 | 37 | 36_37_1754041755.wav | TRUE | |
6 | 86 | 6_86_880309420.wav | TRUE | |
6 | 1 | 6_1_179366417.wav | TRUE | Late start |
15 | 42 | 15_42_4030057700.wav | TRUE | Missing characters at the end |
73 | 8 | 73_8_1202722251.wav | FALSE | Recitation is of a different ayah |
67 | 3 | 67_3_2931720214.wav | TRUE | |
17 | 108 | 17_108_2289803411.wav | FALSE | Empty |
2 | 212 | 2_212_1333381006.wav | TRUE | |
2 | 70 | 2_70_3046857028.wav | FALSE | Incomplete ayah |
90 | 2 | 90_2_3497493728_8UXhCjH.wav | TRUE | |
113 | 1 | 113_1_3192978335.wav | TRUE | |
2 | 9 | 2_9_2493201828.wav | FALSE | Empty |
46 | 9 | 46_9_4035555337.wav | TRUE | |
34 | 29 | 34_29_2472434834.wav | TRUE | Contains hesitation |
18 | 82 | 18_82_936377773.wav | TRUE | Contains hesitation |
1 | 6 | 1_6_3419846128.wav | TRUE | |
16 | 19 | 16_19_3469972445.wav | FALSE | Empty |
56 | 15 | 56_15_560698210.wav | TRUE | |
60 | 13 | 60_13_3223691956.wav | TRUE | |
78 | 17 | 78_17_3573919523.wav | TRUE | |
73 | 19 | 73_19_2936868360.wav | FALSE | Recitation is of a different ayah |
57 | 6 | 57_6_4294733945.wav | TRUE | |
24 | 12 | 24_12_279043107.wav | TRUE | |
8 | 73 | 8_73_2581299400.wav | TRUE | False start and missing characters at the end |
7 | 113 | 7_113_786353004.wav | TRUE | |
26 | 66 | 26_66_1426103961.wav | TRUE | |
54 | 12 | 54_12_3772293134.wav | FALSE | Empty |
38 | 85 | 38_85_3935227980.wav | TRUE | |
5 | 4 | 5_4_859410680.wav | TRUE | Includes characters from next ayah |
3 | 177 | 3_177_2658793224.wav | TRUE | Missing characters at the start, page ruffling at the end |
3 | 133 | 3_133_1551876359.wav | TRUE | |
11 | 20 | 11_20_4169923669.wav | FALSE | Empty |
57 | 8 | 57_8_2564476253.wav | TRUE | |
19 | 63 | 19_63_1205676658.wav | TRUE | Missing characters at the start |
79 | 43 | 79_43_637994452.wav | TRUE | |
28 | 28 | 28_28_1310678751.wav | TRUE | |
37 | 67 | 37_67_1300620030.wav | TRUE | Missing characters at the end |
9 | 55 | 9_55_1002940119.wav | TRUE | |
90 | 12 | 90_12_2612117498.wav | TRUE | |
27 | 48 | 27_48_2658412579.wav | TRUE | |
18 | 86 | 18_86_1990251764.wav | TRUE | |
2 | 258 | 2_258_3034376487.wav | TRUE | |
75 | 29 | 75_29_2044192076.wav | TRUE | Missing characters at the start |
56 | 45 | 56_45_317928467.wav | TRUE | |
67 | 11 | 67_11_587631042.wav | TRUE | Includes characters from next ayah |
77 | 12 | 77_12_3177618951.wav | TRUE | |
18 | 79 | 18_79_1913846909.wav | TRUE | |
9 | 47 | 9_47_1001329568.wav | TRUE | Missing characters at the end |
6 | 22 | 6_22_2259216929.wav | FALSE | Almost empty |
55 | 63 | 55_63_2838529228.wav | TRUE | |
5 | 101 | 5_101_1026578053.wav | TRUE | |
88 | 22 | 88_22_204844392.wav | TRUE | |
20 | 120 | 20_120_2687472128.wav | FALSE | Incomplete recitation |
70 | 28 | 70_28_4018276970.wav | TRUE | |
85 | 17 | 85_17_3749108940.wav | TRUE | |
37 | 160 | 37_160_2251393673.wav | TRUE | Missing characters at the end |
4 | 113 | 4_113_1037152294.wav | TRUE | |
14 | 10 | 14_10_4242971217_1PGDQeo.wav | TRUE | False start |
9 | 103 | 9_103_687047901.wav | FALSE | Empty |
26 | 202 | 26_202_2142350463.wav | TRUE | |
44 | 19 | 44_19_1015587995.wav | TRUE | |
99 | 1 | 99_1_2915792010.wav | TRUE | |
109 | 4 | 109_4_806780268.wav | TRUE | |
31 | 28 | 31_28_2052098992.wav | TRUE | |
67 | 1 | 67_1_3481815969.wav | TRUE | |
3 | 183 | 3_183_3638863501.wav | TRUE | Missing characters at the end |
92 | 11 | recording_n74iGRl.wav | FALSE | Almost empty |
43 | 46 | 43_46_1416875347.wav | TRUE | |
76 | 16 | 76_16_1347475632.wav | TRUE | |
5 | 25 | 5_25_834330814.wav | TRUE | |
88 | 10 | 88_10_1248334570.wav | TRUE | |
11 | 73 | 11_73_973544851.wav | TRUE | |
36 | 2 | 36_2_290659745.wav | FALSE | Empty |
56 | 17 | 56_17_2600116285.wav | TRUE | |
43 | 84 | 43_84_2333856581.wav | TRUE | |
39 | 59 | 39_59_3794840617.wav | TRUE | |
4 | 90 | 4_90_940938099.wav | TRUE | Contains hesitation |
56 | 8 | 56_8_396325185.wav | TRUE | |
80 | 34 | 80_34_2070149842.wav | TRUE | |
31 | 30 | 31_30_1837859811.wav | TRUE | Contains repetition |
2 | 22 | 2_22_614831518.wav | TRUE | |
31 | 27 | 31_27_4208421270.wav | TRUE | |
5 | 46 | 5_46_1451316755.wav | TRUE | Missing characters at the end |
16 | 47 | 16_47_4284316601.wav | TRUE | Contains hesitation |
56 | 32 | 56_32_2576214909.wav | TRUE | |
7 | 157 | 7_157_2885906502.wav | TRUE | Contains repetition |
40 | 19 | 40_19_3673324834.wav | TRUE | Late start |
17 | 74 | 17_74_4093306271.wav | FALSE | Empty |
21 | 87 | 21_87_2253646439.wav | FALSE | Empty |
33 | 30 | 33_30_1047930702.wav | TRUE | |
2 | 101 | 2_101_1722311922.wav | TRUE | |
18 | 107 | 18_107_3605461623.wav | TRUE | |
56 | 16 | 56_16_211365629.wav | TRUE | Missing characters at the end |
3 | 132 | 3_132_1234621432.wav | FALSE | False start, but most of ayah is correct |
73 | 4 | 73_4_2389030636.wav | FALSE | Recitation is of a different ayah |
2 | 207 | 2_207_1738853420.wav | FALSE | Recitation is of a different ayah |
2 | 61 | 2_61_178625651.wav | TRUE | Long with repetitions |
46 | 35 | 46_35_1003010349.wav | TRUE | Starts with A’udhu billahi; long, with lots of repetitions and minor mistakes with corrections |
2 | 282 | 2_282_2162859841_pNdIVFG.wav | TRUE | |
24 | 61 | 24_61_19471702_AdFj7CM.wav | TRUE | Long with repetitions |
33 | 53 | 33_53_2637130870.wav | TRUE | Minor corrections and a few missing words |
13 | 31 | 13_31_2917018758.wav | FALSE | Wrong ayah |
74 | 1 | 74_1_3853833136.wav | FALSE | No recitation with only background audio |
24 | 31 | 24_31_821140661_ZpLNB1t.wav | TRUE | |
3 | 21 | 3_21_4195370451.wav | TRUE | Slow recitation from new reciter |
4 | 12 | 4_12_3397812632.wav | FALSE | Wrong ayah and bad audio quality |
18 | 86 | 18_86_2892001514.wav | FALSE | No recitation with only background audio |
2 | 187 | 2_187_2567604468.wav | TRUE | |
57 | 20 | 57_20_1444517750.wav | TRUE | |
22 | 5 | 22_5_1319386874.wav | TRUE | Loud audio with many clipped samples |
2 | 43 | 2_43_1839768713.wav | TRUE | |
5 | 6 | 5_6_3971870288.wav | TRUE | |
2 | 164 | 2_164_146627317.wav | TRUE | Minor corrections and pauses |
5 | 12 | 5_12_62466597.wav | TRUE | |
2 | 275 | 2_275_1194007719.wav | TRUE | Starts with Bismillah, includes lots of repetitions and corrections |
39 | 71 | 39_71_2058796630.wav | TRUE | |
24 | 31 | 24_31_3077665806.wav | TRUE | Minor mistakes, pauses and corrections |
46 | 19 | 46_19_1077663174.wav | FALSE | Empty |
58 | 11 | 58_11_143685028.wav | FALSE | Ayah is repeated 2.5 times |
41 | 12 | 41_12_1530114944.wav | TRUE | Contains repetitions |
48 | 29 | 48_29_4136437142.wav | TRUE | Contains long range repetitions |
24 | 61 | 24_61_19471702_Hmf33gP.wav | TRUE | |
24 | 58 | 24_58_3828369781.wav | TRUE | Contains corrections |
2 | 196 | 2_196_1510345966.wav | TRUE | |
2 | 177 | 2_177_242597694.wav | TRUE | |
4 | 12 | 4_12_687196028.wav | TRUE | Contains long range repetitions |
2 | 282 | 2_282_3601780668.wav | TRUE | |
11 | 57 | 11_57_1451809866.wav | TRUE | Starts with Bismillah, new reciter and long pauses |
38 | 23 | 38_23_1115556105.wav | TRUE | New reciter and long pauses |
39 | 38 | 39_38_280896818.wav | TRUE | Contains long range repetitions |
5 | 64 | 5_64_2727374906.wav | TRUE | |
22 | 5 | 22_5_1472297438_KeJeDoL.wav | TRUE | |
24 | 31 | 24_31_821140661_9wmRcHt.wav | TRUE | |
2 | 177 | 2_177_1480427117.wav | TRUE | |
2 | 61 | 2_61_3298157126.wav | TRUE | |
4 | 12 | 4_12_2321146328.wav | TRUE | |
24 | 31 | 24_31_26647198.wav | TRUE | |
2 | 282 | 2_282_1220445354.wav | TRUE | Contains minor corrections and pauses |
58 | 22 | 58_22_3262135905.wav | TRUE | |
18 | 17 | 18_17_1053309820.wav | TRUE | New reciter and long pauses |
33 | 50 | 33_50_1871251261.wav | TRUE | |
7 | 143 | 7_143_2751962472.wav | TRUE | |
33 | 50 | 33_50_3069713586.wav | TRUE | |
24 | 31 | 24_31_821140661_EoyQa04.wav | TRUE | |
2 | 85 | 2_85_4269060186.wav | TRUE | |
11 | 63 | 11_63_4051377768.wav | TRUE | New reciter and long pauses |
48 | 25 | 48_25_4019202576.wav | TRUE | Contains repetition |
13 | 7 | 13_7_3722306701.wav | FALSE | Empty |
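Most file names in the table above appear to follow the pattern surah_ayah_number.wav, sometimes with an extra alphanumeric suffix, so the surah and ayah can usually be recovered from the name alone. The sketch below is based only on the names visible in the table; the meaning of the third field and of the suffix is an assumption, and names that do not fit (such as recording_n74iGRl.wav) are returned as None rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class RecordingName(NamedTuple):
    surah: int
    ayah: int
    recording_id: str        # third numeric field; its meaning is assumed
    suffix: Optional[str]    # optional trailing token, e.g. "eYuCSBG"


# <surah>_<ayah>_<digits>[_<alphanumeric suffix>].wav
_NAME_PATTERN = re.compile(r"^(\d+)_(\d+)_(\d+)(?:_([A-Za-z0-9]+))?\.wav$")


def parse_recording_name(filename: str) -> Optional[RecordingName]:
    """Split a recording file name into its parts, or return None."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        return None
    surah, ayah, recording_id, suffix = match.groups()
    return RecordingName(int(surah), int(ayah), recording_id, suffix)


print(parse_recording_name("76_1_3020326476_eYuCSBG.wav"))
print(parse_recording_name("recording_n74iGRl.wav"))  # does not fit the pattern
```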
The entire dataset of 25,000 recitations has also been automatically evaluated by an algorithm. The algorithm first uses Google Speech-to-Text to transcribe each recording, and then uses Iqra (a Quran search engine) to search for the ayah using the transcription. If the result returned by Iqra matches the recorded ayah, we mark it as correct. A manual evaluation over 100 ayat revealed an accuracy of about 97%, i.e. 3% of the evaluations were actually wrong (either the algorithm claiming it's a correct recitation when the recitation is wrong, or vice versa). The code is available in the tarteel-dataset-labeler repository.
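The flow described above (transcribe, search, compare) can be sketched roughly as follows. This is not the code from the tarteel-dataset-labeler repository: `transcribe` and `search_ayah` are hypothetical callables standing in for the speech-to-text and Iqra calls, and treating a recording as correct when any of its transcriptions matches is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

AyahRef = Tuple[int, int]  # (surah number, ayah number)


def auto_evaluate(
    audio_path: str,
    expected: AyahRef,
    transcribe: Callable[[str], Iterable[str]],       # hypothetical speech-to-text wrapper
    search_ayah: Callable[[str], Optional[AyahRef]],  # hypothetical search wrapper
) -> bool:
    """Return True if any transcription of the recording leads back to `expected`."""
    for transcription in transcribe(audio_path):
        if search_ayah(transcription) == expected:
            return True
    return False


# Usage with stand-in functions instead of real services:
def fake_transcribe(path: str) -> Iterable[str]:
    return ["placeholder transcription"]


def fake_search(text: str) -> Optional[AyahRef]:
    return (1, 6) if text == "placeholder transcription" else None


print(auto_evaluate("1_6_3419846128.wav", (1, 6), fake_transcribe, fake_search))
```

Passing the two services in as arguments keeps the comparison logic testable without network access.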
The automatic evaluation marked 20,565 recordings as correct (82.26% of the full dataset). This amounts to 226,985 seconds (63.05 hours) of audio. The evaluated dataset is available to download here, with the last column containing the automatic evaluations.
A second, extended version of the auto-evaluated dataset is also available; it contains an extra column with the transcriptions from Google's Speech-to-Text system. This last column contains the transcriptions, separated by ||| in case multiple transcriptions were available for the same recording.
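Reading the transcription column then comes down to splitting the last field on the ||| separator. The sketch below assumes the file is a plain CSV; the sample rows, file names and column layout are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import List

SEPARATOR = "|||"


def split_transcriptions(cell: str) -> List[str]:
    """Split the last-column value into individual transcriptions."""
    return [part.strip() for part in cell.split(SEPARATOR) if part.strip()]


# Made-up rows: file name, evaluation, transcriptions (layout is an assumption).
sample = io.StringIO(
    "example_a.wav,TRUE,first guess|||second guess\n"
    "example_b.wav,FALSE,\n"
)
for row in csv.reader(sample):
    print(row[0], split_transcriptions(row[-1]))
```

An empty cell yields an empty list, which keeps recordings without any transcription easy to spot.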
Some useful subsets and their statistics are listed below:
- Surah #1
  - 185 recordings (auto-evaluated as correct)
  - Duration: 18 minutes and 13 seconds (1092.8 seconds)
- Surahs #78 - #114
  - 2605 recordings (auto-evaluated as correct)
  - Duration: 3 hours and 47 minutes (13628.3 seconds)
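Subset statistics like the ones above can be computed by filtering on the surah number and summing the durations. The following sketch assumes each recording is available as a dict with the hypothetical keys `surah`, `duration` (in seconds) and `correct`; the toy rows are invented for the example.

```python
from typing import Dict, Iterable, Tuple


def subset_stats(rows: Iterable[Dict], first_surah: int, last_surah: int) -> Tuple[int, float]:
    """Count correct recordings in a surah range and sum their durations."""
    selected = [
        r for r in rows
        if r["correct"] and first_surah <= r["surah"] <= last_surah
    ]
    return len(selected), sum(r["duration"] for r in selected)


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Toy rows, not real dataset entries.
rows = [
    {"surah": 1, "duration": 5.5, "correct": True},
    {"surah": 78, "duration": 7.0, "correct": True},
    {"surah": 114, "duration": 4.0, "correct": False},
]
print(subset_stats(rows, 78, 114))   # -> (1, 7.0)
print(format_duration(1092.8))       # -> 0h 18m 13s
```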
Technical notes:
- Get the length of an audio file in seconds:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 <path-to-audio-file>
- Convert any audio format into a wav file:
ffmpeg -i <path-to-input-file> <path-to-output.wav>
- Resample a wav file to have a different sampling rate, bit depth, number of channels, etc.:
sox <path-to-input.wav> -b <bits-per-sample> -r <sample-rate> -c <number-of-channels> -e <encoding> <path-to-output.wav>
- Detect files that are near-silent the entire time (basically, microphone data was never recorded and zeroed bytes were sent). Only the first 10 seconds of each file are analyzed:
for f in *.wav; do echo "$f" "$(ffmpeg -t 10 -i "$f" -af volumedetect -f null /dev/null 2>&1 | grep mean_volume)"; done | grep -E "mean_volume: -9[0-9]"
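For files that really are 16-bit PCM WAV, a similar near-silence check can be done without ffmpeg. The sketch below is a stand-in technique, not a reimplementation of ffmpeg's volumedetect filter: it computes the RMS level in dB relative to full scale using only the Python standard library. The function names and the -90 dB threshold are assumptions chosen to mirror the command above, and the mis-labelled non-WAV files need to be converted first (`wave.open` raises an error on them).

```python
import array
import math
import sys
import wave


def mean_level_dbfs(path: str, max_seconds: float = 10.0) -> float:
    """RMS level of the first `max_seconds` of a 16-bit PCM WAV file, in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = min(wav.getnframes(), int(max_seconds * wav.getframerate()))
        raw = wav.readframes(n_frames)

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV sample data is little-endian

    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")    # digital silence: every sample is zero
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def is_near_silent(path: str, threshold_db: float = -90.0) -> bool:
    """True if the mean level is at or below the (assumed) threshold."""
    return mean_level_dbfs(path) <= threshold_db
```

All channels are pooled into one level, which is enough for spotting recordings where no microphone data arrived at all.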