models openai whisper large - Azure/azureml-assets GitHub Wiki
Whisper is an OpenAI pre-trained speech recognition model with potential applications for ASR solutions for developers. However, due to weak supervision and large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data representing 98 different languages, leading to improved robustness and accuracy compared to existing ASR systems. However, there are disparities in performance across languages and the model is prone to generating repetitive texts, which may increase in low-resource languages. There are dual-use concerns and real economic implications with such performance disparities, and the model may also have the capacity to recognize specific individuals. The affordable cost of automatic transcription and translation of large volumes of audio communication is a potential benefit, but the cost of transcription may limit the expansion of surveillance projects.
The above summary was generated using ChatGPT. Review the original model card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | asr-online-endpoint.ipynb | asr-online-endpoint.sh |
Batch | asr-batch-endpoint.ipynb | coming soon |
{
"input_data": {
"audio": ["https://www2.cs.uic.edu/~i101/SoundFiles/gettysburg.wav", "https://www2.cs.uic.edu/~i101/SoundFiles/preamble.wav"],
"language": ["en", "en"]
}
}
[
{
"text": " Four score and seven years ago, our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation or any nation so conceived and so dedicated can long endure."
},
{
"text": " We, the people of the United States, in order to form a more perfect union, establish justice, ensure domestic tranquility, provide for the common defense, promote the general welfare, and secure the blessings of liberty to ourselves and our posterity, do ordain and establish this Constitution for the United States of America."
}
]
Version: 17
huggingface_model_id : openai/whisper-large
author : OpenAI
inference_compute_allow_list : ['Standard_DS5_v2', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
Featured
license : mit
SharedComputeCapacityEnabled
hiddenlayerscanned
task : automatic-speech-recognition
View in Studio: https://ml.azure.com/registries/azureml/models/openai-whisper-large/version/17
License: mit
SharedComputeCapacityEnabled: True
SHA:
inference-min-sku-spec: 6|0|56|112
inference-recommended-sku: Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, bg, lt, la, mi, ml, cy, sk, te, fa, lv, bn, sr, az, sl, kn, et, mk, br, eu, is, hy, ne, mn, bs, kk, sq, sw, gl, mr, pa, si, km, sn, yo, so, af, oc, ka, be, tg, sd, gu, am, yi, lo, uz, fo, ht, ps, tk, nn, mt, sa, lb, my, bo, tl, mg, as, tt, haw, ln, ha, ba, jw, su