Recommendation Engine Phase 1 Overview & Design Document - VSAResearchGroup/MathEngine GitHub Wiki
Recommendation Engine Design Document
Problem Statement:
The Virtual Student Advisor (VSA) is a project aims to help students to choose amongst the best possible academic routes to transition from community college to the desired degree program and reduce the amount of human efforts being put by academicians towards helping students chart the right academic routes.
There can be various combinations of courses for which credits can be transferred by a student to fulfill the requirements of degree program. Let us call these course as CT. Each CT is preceded by its prerequisite course and that course may be preceded by its prerequisite courses. Hence, for a student to qualify a CT, a student may have to study a chain of courses. Two or more course chains may qualify as permissible for one CT.
Besides having (i) various combinations of CTs, and (ii) an unknown number of permissible academic routes for a CT; (iii) different academic routes can have common precincts within them, and hence, solving such a problem requires involvement of complex human understanding. We try to solve this problem by dividing the whole project in different phases out of which we are going to discuss only the phase #1 here.
Phase #1 : In this phase, we try to enumerate all the possible combinations of academic paths which can be transferred towards a degree program.
Phase #2 : In this phase, we try to generate viable academic plans by feeding all the possible combinations of academic paths to an artificial engine which forms the crux of this project.
Phase #1 : In this phase, we try to fine tune and secure the whole process being automated by VSA.
Expected Outcome:
To take an input from the end-user containing a list of all the possible courses which can be transferred for qualifying a degree program and determine all the possible academic routes for all the courses fed as input with the following features:
- The algorithm should determine at-least one academic route for each input course.
- The academic routes so generated should be delivered in ascending order, i.e. best first.
- The run-time complexity for the average and best case of the algorithm should not exceed Ξ(n), i.e. over a wide search space, the algorithm should be making exactly n passes where n is the number of all possible routes.
Approach:
In the proposed approach, we create a directed graph of the CTs and their prerequisites. The algorithm uses custom backtracking technique to construct various academic routes within this graph.
Creation of Graph:
-
We first create a graph of all courses which form the prerequisite for a set of CTs. The prerequisite courses come at the top of the graph and CTs lie at the bottom of the graph. Each course forms a node of the graph and the βis a prerequisite toβ relation forms the edge between this course and its successive course.
-
Each node is augmented with a vector of integer type with length, say n. The length βnβ for a node denotes the number of ways the node can be reached from an (any) starting-point (base-courses), i.e. opposite to the direction of the edges pointing towards the node.
- If we create two child branches from a parent node, both the children nodes inherit a copy of this vector, in the same sequence as its parent.
- Otherwise, if two graph nodes direct towards the child node, the child node inherits vector of augmented data from both the parents, one by one. Each parentβs vector is appended to the child node, from left to right in sequence.
- The values for each column of the vector denotes the distance of the node from the starting-point courses corresponding to that column, i.e. number of intermediate courses required to be taken as prerequisites.
- With each inheritance, the corresponding values within the augmented vector are updated by the distance represented by the respective child-parent edge of the graph.
For example, in fig. 1, node D4 spawns two children nodes, D5 and X1, both of which inherit its vector of augmented data. In a sequence of left to right, D4 is the third parent node which connects with X1, after B4; hence D4 vector is appended after C1 vector is appended to B4 vector; before being inherited by X1. Values of this vector are then updated with respect to the distance between each parent and child node pair, correspondingly. Since, all the nodes in this example are far away from each other by 1 unit, hence, all the values are added with 1 unit distance.
Calculating various Academic Routes:
Based upon the maximum number of intermediary child nodes, various unique academic routes can be calculated between each starting and AIM courses, by using the vector of augmented data. The total number of columns in augmented vectors of each AIM course node and their corresponding values represent the total number of unique routes and their corresponding distance to the starting courses. Backtracking can be done from the AIM courses to the starting course by following these steps:
- Select the right most column of the augmented vector attached with this node, which has a non-negative value and changes its value to 0.
- Select the edge of the graph which corresponds to the augmented distance selected in the previous step.
- Move in the opposite direction of the directed edge and reach the parent node.
- Repeat the process till all the columns within the edge vector change their distances into a 0 value and create corresponding chains of course paths.
For example, AIM2 course would have 4 unique academic routes:
AIM2 β X2 β X1 β B4 β B3 β B2 β A2
.
AIM2 β X2 β X1 β C1 β B1 β A1
.
AIM2 β X2 β X1 β D4 β D3 β C3 β C2 β A2
.
AIM2 β X2 β X6 β D5 β D4 β D3 β C3 β C2 β A2
.
Hence, all the unique academic paths can be computed using this algorithm, which can be further sorted in ascending order to their total distance, easily.