Identify the language of text - tuarua/Firebase-ANE GitHub Wiki
The contents of this page are based on the original Firebase Documentation
You can use ML Kit to identify the language of a string of text. You can get the string's most likely language or get confidence scores for all of the string's possible languages.
ML Kit recognizes text in 103 different languages in their native scripts. In addition, romanized text can be recognized for Arabic, Bulgarian, Chinese, Greek, Hindi, Japanese, and Russian.
Identify the language of a string
To identify the language of a string, get an instance of LanguageIdentification, and then pass the string to the identifyLanguage() method.
var languageId:LanguageIdentification = NaturalLanguageANE.naturalLanguage.languageIdentification();
languageId.identifyLanguage("My hovercraft is full of eels.",
    function (language:String, error:LanguageIdentificationError):void {
        // Release the identifier once the result has arrived.
        languageId.close();
        if (error) {
            trace("Natural Language error: " + error.errorID + " : " + error.message);
            return;
        }
        // language is a BCP-47 code, or "und" if nothing was confidently detected.
        trace("Language detected: " + language);
    });
If the call succeeds, a BCP-47 language code is passed to the completion handler, indicating the language of the text. See the complete list of supported languages. If no language could be confidently detected, the code und (undetermined) is passed.
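Because an undetermined result arrives as an ordinary string rather than as an error, the completion handler may want to check for it explicitly. A minimal sketch, assuming a hypothetical helper named isLanguageDetermined (it is not part of the ANE) declared in the same class as the callback:

```actionscript
// Hypothetical helper, not part of the ANE.
// "und" is the BCP-47 code passed when no language was confidently detected.
function isLanguageDetermined(code:String):Boolean {
    return code != null && code != "und";
}

// Possible use inside the completion handler shown above:
// if (isLanguageDetermined(language)) {
//     trace("Language detected: " + language);
// } else {
//     trace("No language could be confidently detected.");
// }
```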
By default, ML Kit returns a non-und value only when it identifies the language with a confidence value of at least 0.5. You can change this threshold by passing a LanguageIdentificationOptions object to languageIdentification():
var options:LanguageIdentificationOptions = new LanguageIdentificationOptions();
var languageId:LanguageIdentification = NaturalLanguageANE.naturalLanguage.languageIdentification(options);
Get the possible languages of a string
To get the confidence values of a string's most likely languages, get an instance of LanguageIdentification, and then pass the string to the identifyPossibleLanguages() method.
languageId.identifyPossibleLanguages("an amicable coup d'etat",
    function (languages:Vector.<IdentifiedLanguage>, error:LanguageIdentificationError):void {
        // Release the identifier once the result has arrived.
        languageId.close();
        if (error) {
            trace("Natural Language error: " + error.errorID + " : " + error.message);
            return;
        }
        // Print each candidate with its confidence as a whole-number percentage.
        for each (var language:IdentifiedLanguage in languages) {
            trace(language.languageCode + " : " + Math.floor(language.confidence * 100));
        }
    });
If the call succeeds, a list of IdentifiedLanguage objects is passed to the completion handler. From each object, you can get the language's BCP-47 code and the confidence that the string is in that language. See the complete list of supported languages. Note that these values indicate the confidence that the entire string is in the given language; ML Kit doesn't identify multiple languages in a single string.
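When only the strongest candidate matters, the list can be reduced to a single entry in the handler. The following is a sketch rather than part of the ANE: mostLikelyLanguage is a hypothetical helper, and it reads the languageCode and confidence properties dynamically so that it does not depend on an import path this page does not show.

```actionscript
// Hypothetical helper, not part of the ANE.
// Returns the entry with the highest confidence, or null for an empty list.
// Accepts any Vector or Array whose items expose a numeric `confidence`.
function mostLikelyLanguage(languages:*):Object {
    var best:Object = null;
    for each (var candidate:Object in languages) {
        if (best == null || candidate.confidence > best.confidence) {
            best = candidate;
        }
    }
    return best;
}

// Possible use inside the completion handler shown above:
// var best:Object = mostLikelyLanguage(languages);
// if (best != null) {
//     trace("Most likely: " + best.languageCode + " (" + best.confidence + ")");
// }
```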
By default, ML Kit returns only languages with confidence values of at least 0.01. You can change this threshold by passing a LanguageIdentificationOptions object to languageIdentification().
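The threshold set through the options object is applied by ML Kit itself. As a complementary sketch, results can also be filtered in the completion handler, which may be useful when different parts of an app want different cut-offs. filterByConfidence is a hypothetical helper and is not part of the ANE.

```actionscript
// Hypothetical helper, not part of the ANE.
// Keeps only the entries whose confidence is at least minConfidence.
// Accepts any Vector or Array whose items expose a numeric `confidence`.
function filterByConfidence(languages:*, minConfidence:Number):Array {
    var kept:Array = [];
    for each (var candidate:Object in languages) {
        if (candidate.confidence >= minConfidence) {
            kept.push(candidate);
        }
    }
    return kept;
}

// Possible use inside the completion handler:
// var likely:Array = filterByConfidence(languages, 0.25);
```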