Point Assignment Mechanism for Created Quizzes

Motivation

To improve our application's quality of service, we decided to implement an automatic point assignment mechanism for created quizzes. The way a user solves a quiz should also affect the points that user gains. However, this document covers only the point assignment mechanism for quiz creation.

The implemented mechanism is explained below.

Methodology

We decided to define a mathematical formula that calculates the score of a single question inside any quiz. The scores of all questions inside the created quiz are then summed up, and this sum constitutes the points of the quiz.
The features listed below can be used to build the point assignment mechanism for a single question.

Rarity

There are APIs that capture the rarity of a word.
Some of them provide lexi-scoring, frequency scores, etc.
Such a score can be integrated into the point calculation of a single question with its own weight.
Datamuse API - returns a rough estimate of rarity.
Wordnik API - frequency data.
Oxford Dictionaries API - lexistats.
VocabKitchen - estimates the CEFR level of a sentence.

Word Complexity

Longer character sequences and more syllables generally indicate harder words.
This can be computed easily on the backend and included with a lower weight.

Word Part of Speech

Determine the part of speech of the word.
Nouns & verbs (easy), adjectives & adverbs (medium), conjunctions & prepositions (hard)
These difficulty levels can imply a scoring weight for this aspect.

Abstract or Idiomatic Words

Abstract words are generally easy to understand.
Again, the Oxford Dictionaries API or calls to its website can be utilized.

Length of definition

Words that require longer explanations or more complex examples to define are generally harder to learn.
babel can help with this problem through a binary classification of a description as long or short.

Score Calculation of Question

The features above help in selecting which attributes of a question should contribute to the score calculation of a created quiz.
Using a combination of some of them, instead of all of them, may be a more efficient way of producing scores.
All scoring features should be normalized to the same range between two natural numbers, such as 0-3 or 1-5.
Each scoring feature can be assigned a weight.
The total score of a single question can then be calculated as the weighted combination of its feature scores.
Lastly, points can be rounded to multiples of 10 for simplicity on the user side.
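
As a minimal sketch of the steps above, the snippet below rescales raw feature values to a common range, combines them with weights, and rounds a result to a multiple of 10. The function names, the raw ranges, and the weights are illustrative assumptions, not the values used in the implementation.

```python
def normalize(value, lo, hi, out_min=0.0, out_max=5.0):
    """Min-max rescale `value` from [lo, hi] to [out_min, out_max], clamping outliers."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(value, lo), hi)
    return out_min + (clamped - lo) / (hi - lo) * (out_max - out_min)


def weighted_score(features, weights):
    """Weighted sum of feature scores that are already on the same scale."""
    return sum(weights[name] * features[name] for name in weights)


def round_to_ten(points):
    """Round non-negative points to the nearest multiple of 10 (halves round up)."""
    return int((points + 5) // 10) * 10


# Hypothetical raw values and ranges, for illustration only.
features = {
    "length": normalize(9, lo=1, hi=15),    # word length in characters
    "syllables": normalize(4, lo=1, hi=5),  # syllable count
}
weights = {"length": 0.4, "syllables": 0.6}  # assumed weights that sum to 1
score = weighted_score(features, weights)    # stays within the 0-5 range
```

Because the weights sum to 1, the combined score stays in the same range as the normalized features, which keeps the later rounding step predictable.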

Implemented Structure

According to our research, no CEFR profiler API is freely available on the internet. Thus, we create the formula mentioned above using 4 different features of the given keywords:

Frequency
Part of Speech (noun, verb, adjective, etc.)
Number of Syllables
Number of Synonyms

The API used to get these features for a given keyword is the Datamuse API. It was chosen for this task for several reasons:

Free Use: With this API, we can send up to 100,000 requests per day for free.
Performance: This API responds to requests faster than the other APIs we considered.
Python Library: A Python package for this API is available and can be imported and integrated easily.
No Scraping: Other available tools require scraping, since they do not offer a configured API for their service.
Variety of Flags: The API provides different kinds of flags, so that different features can be retrieved easily.
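
To illustrate how the four features might be fetched, here is a minimal sketch that calls the public Datamuse `/words` endpoint with the Python standard library rather than the Python package. The `md=fps` metadata flags (frequency, parts of speech, syllable count) and the `rel_syn` parameter are taken from the Datamuse documentation; the helper names and the sample entry are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

DATAMUSE_URL = "https://api.datamuse.com/words"


def fetch_json(params):
    """Send a GET request to Datamuse and decode the JSON list it returns."""
    url = DATAMUSE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def parse_entry(entry):
    """Extract frequency, parts of speech and syllable count from one result."""
    frequency = None
    parts_of_speech = []
    for tag in entry.get("tags", []):
        if tag.startswith("f:"):
            frequency = float(tag[2:])   # occurrences per million words
        else:
            parts_of_speech.append(tag)  # e.g. "n", "v", "adj", "adv"
    return {
        "word": entry.get("word"),
        "frequency": frequency,
        "parts_of_speech": parts_of_speech,
        "syllables": entry.get("numSyllables"),
    }


def word_features(word):
    """Collect the four features for `word` (performs two network requests)."""
    results = fetch_json({"sp": word, "md": "fps", "max": 1})
    if not results:
        return None
    features = parse_entry(results[0])
    features["synonyms"] = len(fetch_json({"rel_syn": word}))
    return features


# Offline example with a hand-written entry in the shape Datamuse returns
# (the numbers are made up for illustration).
sample = {"word": "ubiquitous", "score": 1, "tags": ["adj", "f:1.5"], "numSyllables": 4}
parsed = parse_entry(sample)
```

Keeping the parsing separate from the network call lets the scoring logic be tested without sending requests to the API.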

Algorithm

An algorithm using these 4 features of a given keyword calculates the difficulty of the word. It simply multiplies each feature's points by its weight, then sums the results. The reason behind this structure is to be able to scale and fine-tune each feature's contribution to the point assignment of a quiz. A feature can behave differently than expected for some words, so giving it a lower weight helps to limit its effect. Moreover, a linear approach makes it easier to find the balance between features that yields the most accurate points for the words.

Scaling

Each feature's points are distributed on a 10-point scale, since the point mechanism should be easily understood by users when it is shown in the leaderboard. This also gives the flexibility to reduce a question's rewarded points to a discrete natural number, such as 5, when a hint is used to solve the question.

The features mentioned above each have their own value range, as provided by the API. We inspected the value ranges of the four features, such as the frequency values returned by the API, for words with higher CEFR levels (C1, C2) and lower ones (A1, A2). Each feature's values were then mapped onto 10 points.

All feature points use discrete ranges, since a continuous mathematical expression does not describe the behavior of this kind of feature appropriately.

Frequency: The API uses Google Ngram and WordNet for this feature. The general logic is: the lower the frequency, the higher the points. However, instead of a linear approach, we used something closer to a logarithmic scale for the points, since a change of 5 in the frequency value means little when the value is 100 but a lot when the value is 8. The values returned by the API range from 0 to 200 for most words.
Number of Syllables: 1-syllable words are generally easier than the others, whilst 4- and 5-syllable words are generally rare. The feature has a discrete range from 1 to 5. 10 points are rewarded if the word contains 4 or 5 syllables, and the points rise from lower to higher values as close to linearly as possible.
Number of Synonyms: This feature does not increase or decrease monotonically. It rewards the highest points in the middle of the scale, the lowest at the beginning, and medium points at the end. The reason is that the most commonly used concrete (material) words have fewer synonyms than abstract ones. However, if a word has many synonyms, we can assume that it is commonly used, which lowers the points.
Part of Speech: This feature is based on the general idea that adjectives and adverbs are usually higher-profiled words than nouns and verbs.
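
The discrete scaling described above can be sketched as follows. Only the overall shapes come from this page (rarer words score higher on a roughly logarithmic scale, 4 or 5 syllables earn 10 points, synonyms peak in the middle, adjectives and adverbs rank above nouns and verbs). Every cut-off and intermediate value in the code is an assumption chosen for illustration, not the value used in the implementation.

```python
from bisect import bisect_right

# Assumed cut-offs that double at each step, so buckets are logarithmically spaced.
FREQUENCY_CUTOFFS = [1, 2, 4, 8, 16, 32, 64, 128]


def frequency_points(frequency):
    """Lower frequency gives more points; each doubling costs one point."""
    bucket = bisect_right(FREQUENCY_CUTOFFS, frequency)  # 0..8
    return 10 - bucket


def syllable_points(syllables):
    """Near-linear steps; 4 or more syllables earn the full 10 points."""
    if syllables >= 4:
        return 10
    return {1: 2, 2: 5, 3: 8}.get(syllables, 0)  # assumed intermediate values


def synonym_points(count):
    """Lowest for few synonyms, highest in the middle, medium for many."""
    if count < 5:    # assumed cut-off
        return 3
    if count < 20:   # assumed cut-off
        return 10
    return 6


def part_of_speech_points(tags):
    """Adjectives and adverbs score higher than nouns and verbs."""
    if "adj" in tags or "adv" in tags:
        return 10
    return 5  # assumed value for nouns, verbs and unknown tags
```

With these assumed cut-offs, frequencies of 100 and 105 fall into the same bucket, while frequencies of 3 and 8 land two points apart, which mirrors the reasoning given for the logarithmic scale.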

Weights

Different weights for the features were tested to see how the point values change. Linear regression would be far more suitable for this task; however, it was a challenging problem since no dataset has been prepared for this purpose. Moreover, each word requires requests to be sent to an API and answered, so preparing such a dataset would take more time than is convenient for milestone development.

Weights are:

Frequency: 0.55
Number of Syllables: 0.15
Number of Synonyms: 0.15
Part of Speech: 0.15

Then the formula becomes

Question Point = $0.55\times \text{frequency} + 0.15\times \text{syllable} + 0.15\times \text{synonym} + 0.15\times \text{partofspeech}$

in the range of 10 points per question. This value is multiplied by 3 and mapped to difficulty levels on a 30-point scale, where:

Beginner: 10 points
Intermediate: 20 points
Advanced: 30 points

for the reasons explained above.
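
The formula and levels above can be sketched in a few lines. The weights are the ones listed on this page; the rule that snaps the scaled value up to 10, 20 or 30 is an assumption, since the exact cut-offs between levels are not stated here, and the example feature points are made up.

```python
import math

WEIGHTS = {
    "frequency": 0.55,
    "syllable": 0.15,
    "synonym": 0.15,
    "partofspeech": 0.15,
}

LEVELS = {10: "Beginner", 20: "Intermediate", 30: "Advanced"}


def question_point(feature_points):
    """Weighted sum of the four feature points, each on a 0-10 scale."""
    return sum(WEIGHTS[name] * feature_points[name] for name in WEIGHTS)


def difficulty_points(point):
    """Scale to 30 points and snap up to 10, 20 or 30 (assumed rounding rule)."""
    scaled = round(point * 3, 6)  # rounding guards against float noise at boundaries
    return min(30, max(10, math.ceil(scaled / 10) * 10))


# Made-up feature points for a rare, four-syllable adjective.
example = {"frequency": 8, "syllable": 10, "synonym": 10, "partofspeech": 10}
point = question_point(example)           # about 8.9 on the 10-point scale
level = LEVELS[difficulty_points(point)]
```

Since the weights sum to 1, the question point never leaves the 0-10 range as long as each feature does not.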

Quiz Points

The points of a quiz are the sum of the points of all the questions it includes. Quiz levels are also uniformly distributed among the 3 levels specified above. Note that this structure allows quizzes within the same difficulty level to have different points, depending on their questions.
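
A short sketch of this aggregation is given below. The summation follows the description above. The way a quiz level is derived (snapping the average question point to the nearest level) is a hypothetical rule added only to show how two quizzes at the same level can carry different totals.

```python
LEVELS = {10: "Beginner", 20: "Intermediate", 30: "Advanced"}


def quiz_points(question_points):
    """Total quiz points: the sum of the points of its questions."""
    return sum(question_points)


def quiz_level(question_points):
    """Assumed rule: snap the average question point to the nearest level."""
    if not question_points:
        raise ValueError("a quiz needs at least one question")
    average = sum(question_points) / len(question_points)
    nearest = min(LEVELS, key=lambda level_points: abs(level_points - average))
    return LEVELS[nearest]


# Two hypothetical quizzes with different totals but the same level.
quiz_a = [10, 20, 20, 30]
quiz_b = [20, 20, 30]
```

Here `quiz_a` totals 80 points and `quiz_b` totals 70 points, yet both resolve to the Intermediate level under the assumed rule.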