Frequently Asked Questions (FAQ) - NC0DER/LMRank GitHub Wiki

Language Support Questions

How many languages does LMRank support?

LMRank currently supports 14 languages in total, as listed in the table below:

Language Code
English 🇬🇧 en
Greek 🇬🇷 el
Danish 🇩🇰 da
Catalan ca
Dutch 🇳🇱 nl
Finnish 🇫🇮 fi
French 🇫🇷 fr
German 🇩🇪 de
Italian 🇮🇹 it
Japanese 🇯🇵 ja
Norwegian (Bokmål) 🇳🇴 nb
Portuguese 🇵🇹 pt
Spanish 🇪🇸 es
Swedish 🇸🇪 sv
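For scripting, the table can be stored as a plain dictionary. The sketch below is hypothetical: `SUPPORTED_LANGUAGES` and `spacy_small_model` are not part of LMRank. It also assumes that each code maps to spaCy's standard small (sm) pipeline name, and LMRank may load its models differently.

```python
# Language codes from the table above, mapped to language names.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "el": "Greek",
    "da": "Danish",
    "ca": "Catalan",
    "nl": "Dutch",
    "fi": "Finnish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "ja": "Japanese",
    "nb": "Norwegian (Bokmål)",
    "pt": "Portuguese",
    "es": "Spanish",
    "sv": "Swedish",
}


def spacy_small_model(code: str) -> str:
    """Return spaCy's standard small pipeline name for a supported code.

    Assumption: English uses spaCy's ``en_core_web_sm`` naming, while the
    other languages in the table use ``<code>_core_news_sm``.
    """
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    genre = "web" if code == "en" else "news"
    return f"{code}_core_{genre}_sm"


print(spacy_small_model("nb"))  # nb_core_news_sm
```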

Will other languages be supported in the future?

LMRank forms candidate keyphrases through dependency parsing, using spaCy's noun chunks.
Once spaCy provides a small (sm) model with noun-chunk support for a language, LMRank can easily be extended to support that language.
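The following is a simplified, standard-library-only sketch of how noun chunks emerge from a dependency parse. It is loosely modeled on spaCy's English noun-chunk iterator, though it skips pronouns and coordinated nouns. The parse is annotated by hand rather than produced by a model. `Token`, `left_edge` and `noun_chunks` are illustrative names and are not LMRank or spaCy code. In practice, a spaCy pipeline produces the parse and the chunks come from iterating over `doc.noun_chunks`.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "NOUN"
    dep: str   # dependency label, e.g. "nsubj"
    head: int  # index of the syntactic head; the root points to itself


# Dependency labels that let a noun head a chunk (a subset of the labels
# checked by spaCy's English noun-chunk iterator).
NP_DEPS = {"nsubj", "nsubjpass", "dobj", "pobj", "attr", "appos", "ROOT"}


def left_edge(tokens: list[Token], i: int) -> int:
    """Index of the leftmost token in the subtree rooted at token i."""
    edge = i
    for j, tok in enumerate(tokens):
        if j != i and tok.head == i:  # token j is a direct child of token i
            edge = min(edge, left_edge(tokens, j))
    return edge


def noun_chunks(tokens: list[Token]) -> list[str]:
    """Collect non-overlapping noun chunks as candidate keyphrases."""
    chunks = []
    prev_end = -1
    for i, tok in enumerate(tokens):
        if tok.pos not in {"NOUN", "PROPN"}:
            continue
        start = left_edge(tokens, i)
        if start <= prev_end:  # would overlap the previous chunk
            continue
        if tok.dep in NP_DEPS:
            # The chunk spans from the subtree's left edge up to the noun head.
            chunks.append(" ".join(t.text for t in tokens[start : i + 1]))
            prev_end = i
    return chunks


# Hand-annotated parse of
# "Deep learning models extract keyphrases from scientific documents."
sentence = [
    Token("Deep", "ADJ", "amod", 1),
    Token("learning", "NOUN", "compound", 2),
    Token("models", "NOUN", "nsubj", 3),
    Token("extract", "VERB", "ROOT", 3),
    Token("keyphrases", "NOUN", "dobj", 3),
    Token("from", "ADP", "prep", 3),
    Token("scientific", "ADJ", "amod", 7),
    Token("documents", "NOUN", "pobj", 5),
    Token(".", "PUNCT", "punct", 3),
]

print(noun_chunks(sentence))
# ['Deep learning models', 'keyphrases', 'scientific documents']
```

Each noun chunk ends at its head noun, so modifiers such as "Deep learning" attach to "models" instead of forming separate candidates.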

Practical Questions

Are there any examples that show me how to use LMRank?

Examples are available on Google Colab and GitHub.

Can I use a different transformer model from HuggingFace?

Yes, see the relevant section in the examples linked above.

Research Questions

Where can I find the datasets used in the experiments of the publication?

The datasets are available at this link.

How can I extract the keyphrases for a specific dataset using LMRank?

Set base_path in main.py to the dataset directory, then run main().

How can I benchmark the LMRank approach?

Set output_path in main.py to the location of lmrank_timings.csv, then run benchmark().
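To time extraction outside main.py, you can use a generic harness like the one below. It records per-document wall-clock time with `time.perf_counter` and writes the results to a CSV file. This is a hypothetical sketch, not LMRank's `benchmark()`. The names `time_extraction` and `dummy_extract` and the CSV columns are illustrative. Replace the stand-in extractor with your own extraction callable.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Iterable


def time_extraction(
    extract: Callable[[str], list],
    documents: Iterable[str],
    output_path: str,
) -> list[float]:
    """Time extract() on each document and write the timings to a CSV file.

    Hypothetical helper: the column names below are illustrative and are not
    taken from LMRank's lmrank_timings.csv.
    """
    timings = []
    for doc in documents:
        start = time.perf_counter()
        extract(doc)
        timings.append(time.perf_counter() - start)

    with Path(output_path).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["document", "seconds"])
        for i, seconds in enumerate(timings):
            writer.writerow([i, f"{seconds:.6f}"])
    return timings


# Stand-in extractor so the sketch runs without LMRank installed.
def dummy_extract(text: str) -> list:
    return text.split()[:3]


timings = time_extraction(
    dummy_extract, ["a short text", "another short text"], "timings_demo.csv"
)
print(len(timings))  # 2
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it a better choice than `time.time` for measuring short intervals.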