Frequently Asked Questions (FAQ) - NC0DER/LMRank GitHub Wiki
Language Support Questions
How many languages does LMRank support?
LMRank currently supports 14 languages, listed in the table below:
| Language | Code |
|---|---|
| English 🇬🇧 | en |
| Greek 🇬🇷 | el |
| Danish 🇩🇰 | da |
| Catalan | ca |
| Dutch 🇳🇱 | nl |
| Finnish 🇫🇮 | fi |
| French 🇫🇷 | fr |
| German 🇩🇪 | de |
| Italian 🇮🇹 | it |
| Japanese 🇯🇵 | ja |
| Norwegian 🇳🇴 (Bokmål) | nb |
| Portuguese 🇵🇹 | pt |
| Spanish 🇪🇸 | es |
| Swedish 🇸🇪 | sv |
Will other languages be supported in the future?
LMRank forms candidate keyphrases through dependency parsing, relying on spaCy's noun chunks.
When spaCy adds a small model (sm) with noun-chunk support for a language, support for it can be easily added.
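As a rough illustration of why a new language mostly depends on spaCy, the sketch below maps a language code to a spaCy small pipeline name and extracts noun chunks with it. It assumes spaCy's usual naming scheme (`en_core_web_sm` for English, `xx_core_news_sm` for the other languages in the table). The helper names `small_model_name` and `noun_chunks` are hypothetical and are not part of LMRank's API.

```python
from typing import List

# Language codes copied from the table above.
SUPPORTED_CODES = ["en", "el", "da", "ca", "nl", "fi", "fr",
                   "de", "it", "ja", "nb", "pt", "es", "sv"]


def small_model_name(code: str) -> str:
    """Return the spaCy small pipeline name for a language code.

    Assumption: English uses the "web" genre, every other listed
    language uses "news", following spaCy's published model names.
    """
    if code not in SUPPORTED_CODES:
        raise ValueError(f"Unsupported language code: {code!r}")
    genre = "web" if code == "en" else "news"
    return f"{code}_core_{genre}_sm"


def noun_chunks(text: str, code: str) -> List[str]:
    """Extract noun chunks, the raw material for candidate keyphrases.

    Requires spaCy and the matching small model to be installed.
    """
    import spacy  # imported lazily so small_model_name works without spaCy

    nlp = spacy.load(small_model_name(code))
    return [chunk.text for chunk in nlp(text).noun_chunks]
```

With spaCy installed, a model can be fetched with `python -m spacy download en_core_web_sm`, after which `noun_chunks("The quick brown fox jumps over the lazy dog.", "en")` returns the noun phrases spaCy detects in that sentence.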
Practical questions
Are there any examples that show me how to use LMRank?
Examples are available on Google Colab and on GitHub.
Can I use a different transformer model from HuggingFace?
Yes, see the relevant section in the examples linked above.
Research Questions
Where can I find the datasets used in the experiments of the publication?
The datasets are available at this link.
How can I extract the keyphrases for a specific dataset using LMRank?
Set base_path in main.py to the dataset directory and run main().
How can I benchmark the LMRank approach?
Set output_path in main.py to the location where lmrank_timings.csv should be written and run benchmark().