NBT v0.01 - SITE5039/nlp_benchmark_tasks GitHub Wiki

This page outlines the requirements that need to be met to roll out NBT version 0.01.

Because most contributors work on distinct research directions, NBT will grow breadth-wise, by adding new downstream NLP tasks, rather than depth-wise.

For v0.01, after an initial discussion, ZiqiaoWangGeothe, rzTian, and yottabytt agreed on the following approach and requirements to take the project forward.

Stick to Python 3.6 and PyTorch 1.1 (the stable version at the time of writing).
Bucket the downstream tasks by distinct, widely accepted performance measures, for example F1, BLEU, and perplexity.
Pick the tasks breadth-wise and integrate them into NBT.
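The version requirement above could be enforced with a small start-up check. The following is only a minimal sketch: the names `PINNED_PYTHON`, `PINNED_TORCH`, `major_minor`, and `check_environment` are hypothetical and not part of NBT, and it assumes that matching on major and minor version is strict enough.

```python
# Pins taken from the requirement above; all names here are hypothetical.
PINNED_PYTHON = (3, 6)
PINNED_TORCH = (1, 1)


def major_minor(version):
    """Return (major, minor) from a version string such as '1.1.0' or '1.1.0+cu100'."""
    # Drop any local build suffix after '+', then read the first two fields.
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def check_environment(python_version, torch_version):
    """Return a list of mismatch messages; an empty list means both pins are met.

    python_version is a tuple like sys.version_info;
    torch_version is a string like torch.__version__.
    """
    problems = []
    if tuple(python_version[:2]) != PINNED_PYTHON:
        problems.append(
            "Python %d.%d required, found %d.%d"
            % (PINNED_PYTHON + tuple(python_version[:2]))
        )
    if major_minor(torch_version) != PINNED_TORCH:
        problems.append(
            "PyTorch %d.%d required, found %s" % (PINNED_TORCH + (torch_version,))
        )
    return problems
```

In an environment where PyTorch is installed, the check could be called as `check_environment(sys.version_info, torch.__version__)` after importing `sys` and `torch`.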

| Distance metric | F1 | BLEU/ROUGE | Perplexity |
| --- | --- | --- | --- |
| Word Similarity | Word Sense Disambiguation | Summarization | Language Modeling |
| Word Analogy | Question Answering | Machine Translation | |

The table above gives a brief idea of how they plan to choose and integrate applications into NBT. Tasks were chosen because their current performance figures are relatively low compared with those of the tasks that were not chosen. By breadth-wise, they mean picking applications row-wise. The table is subject to change (they expect the changes to be mostly new additions).
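The row-wise picking order can be illustrated with a short sketch. The buckets below are copied from the table above; the names `BUCKETS` and `breadth_wise_order` are hypothetical and do not come from the NBT code base. A list of pairs is used instead of a dict so that the column order does not depend on dict ordering under Python 3.6.

```python
from itertools import zip_longest

# Metric buckets copied from the table, each listing its tasks top to bottom.
BUCKETS = [
    ("Distance metric", ["Word Similarity", "Word Analogy"]),
    ("F1", ["Word Sense Disambiguation", "Question Answering"]),
    ("BLEU/ROUGE", ["Summarization", "Machine Translation"]),
    ("Perplexity", ["Language Modeling"]),
]


def breadth_wise_order(buckets):
    """Yield (metric, task) pairs row by row, skipping empty table cells."""
    metrics = [metric for metric, _ in buckets]
    columns = [tasks for _, tasks in buckets]
    # zip_longest pads shorter columns with None, which marks an empty cell.
    for row in zip_longest(*columns):
        for metric, task in zip(metrics, row):
            if task is not None:
                yield metric, task


if __name__ == "__main__":
    for metric, task in breadth_wise_order(BUCKETS):
        print("%s: %s" % (metric, task))
```

Under this ordering, one task from every metric bucket is integrated before any bucket receives its second task, which matches the breadth-wise intent described above.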

To identify tasks, datasets, and current state-of-the-art (SOTA) code, they plan to consult NLP-progress.