3. Analysis of course network - Sam1511/Social_Graph_Project_Final_2016 GitHub Wiki
Network of courses by recommended prerequisites
The first approach to handling the data was to build a single network of all the courses found in the database. Two courses are linked when one lists the other among the recommended prerequisites on its course page. Because recommended prerequisites refer to other courses by course number, generating the network was straightforward, especially with the networkx library in Python.
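A minimal sketch of how such a network could be generated is shown here. The input dictionary, the five-digit course-number pattern, and the edge direction (prerequisite pointing to the course that recommends it) are assumptions made for illustration, not details taken from the notebook.

```python
import re
import networkx as nx

# Hypothetical input: course number -> free text of its recommended prerequisites.
prerequisites = {
    "02402": "01005",
    "02450": "02402 and 01005, or equivalent",
    "01005": "",
}

# Assumption: course numbers are exactly five digits.
COURSE_NUMBER = re.compile(r"\b\d{5}\b")


def build_course_network(prereq_text_by_course):
    """Return a directed graph with one edge prerequisite -> course."""
    graph = nx.DiGraph()
    for course, text in prereq_text_by_course.items():
        for prereq in COURSE_NUMBER.findall(text):
            if prereq != course:  # ignore self-references
                graph.add_edge(prereq, course)
    return graph


G = build_course_network(prerequisites)
print(sorted(G.edges()))
```

With this direction, a high out-degree means that many courses recommend the course as a prerequisite. Courses that neither have nor are prerequisites never enter the graph in this sketch.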
That network looks like this:
After generating the network, we used networkx for network analysis that reveals how the courses relate to one another and leads to interesting insights about them. To get an overall interpretation of the network, the degree distribution is a useful starting point.
The degree distribution
The plot shows that a majority of the nodes have degree 1, and that the number of nodes decreases as the degree increases. This can be interpreted as a network in which many single nodes are attached to one big cluster. There is also not a single node with degree 0, which means that every course is linked to at least one other course. Note that this alone does not guarantee a path between every pair of courses; that would additionally require the network to be connected.
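The degree distribution can be tabulated in a few lines. The sketch below uses a small toy graph (not the course data) and also counts weakly connected components, because the absence of degree-0 nodes and the existence of a path between all nodes are separate properties.

```python
from collections import Counter
import networkx as nx

# Toy directed graph: one small star plus a separate pair of nodes.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")])

# Total degree (in + out) per node, then how many nodes have each degree.
degrees = dict(G.degree())
degree_counts = Counter(degrees.values())

# Nodes without any edge, and the number of weakly connected components.
isolated = [node for node, degree in degrees.items() if degree == 0]
components = list(nx.weakly_connected_components(G))

print(sorted(degree_counts.items()))
print(len(isolated), len(components))
```

In the toy graph no node has degree 0, yet there are two components, so no path exists between the star and the pair.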
We then calculated the in-degree and out-degree of each course, plotted them against each other, and isolated the outliers in that plot. The outliers are courses with a large disparity between in-degree and out-degree, with the out-degree being the higher value. From an analytical standpoint, these are the courses that are most influential on the courses around them, so students should strongly consider taking them.
First, the bar plot of the in-degrees and out-degrees:
And the subsequent scatterplot:
_From the notebook:_ The top outliers in the scatterplot were the following courses:
- Course number: 11080 has ingoing degree 0 and outgoing degree 15
- Course number: 42086 has ingoing degree 0 and outgoing degree 11
- Course number: 62193 has ingoing degree 0 and outgoing degree 11
- Course number: 62233 has ingoing degree 0 and outgoing degree 22
- Course number: 02526 has ingoing degree 0 and outgoing degree 14
- Course number: 30740 has ingoing degree 0 and outgoing degree 11
- Course number: 41663 has ingoing degree 0 and outgoing degree 13
- Course number: 30742 has ingoing degree 0 and outgoing degree 11
- Course number: 11375 has ingoing degree 0 and outgoing degree 12
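One way to isolate such outliers is to rank nodes by the gap between out-degree and in-degree. The source does not state the exact criterion used in the notebook, so the gap-based ranking, the helper name `degree_outliers`, and the toy graph below are assumptions.

```python
import networkx as nx

# Toy graph: "hub" is recommended by five courses and recommends nothing itself.
G = nx.DiGraph()
G.add_edges_from(("hub", f"c{i}") for i in range(5))
G.add_edges_from([("c0", "c1"), ("c1", "c2")])


def degree_outliers(graph, top=10):
    """Rank nodes by (out-degree - in-degree), largest gap first."""
    rows = [(n, graph.in_degree(n), graph.out_degree(n)) for n in graph.nodes()]
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows[:top]


for course, indeg, outdeg in degree_outliers(G, top=3):
    print(f"Course number: {course} has ingoing degree {indeg} "
          f"and outgoing degree {outdeg}")
```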
We also looked at the in-degree and out-degree centralities using the same procedure. For each node, these values give the fraction of the other nodes that its incoming and outgoing edges connect it to, respectively. The higher the value, the more connected the node is and the more important it is to its neighbours. Whether the value is an in-degree or an out-degree centrality indicates the direction of the dependency, i.e. which course depends on which. The top outliers are listed after the plot.
The scatter plot of degree centralities:
_The outliers, i.e. the most influential courses:_
- Course number: 11080 has in-degree centrality 0.0 and out-degree centrality 0.00619322873658
- Course number: 62233 has in-degree centrality 0.0 and out-degree centrality 0.00908340214699
- Course number: 02526 has in-degree centrality 0.0 and out-degree centrality 0.00578034682081
- Course number: 41663 has in-degree centrality 0.0 and out-degree centrality 0.00536746490504
- Course number: 02343 has in-degree centrality 0.000825763831544 and out-degree centrality 0.00536746490504
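In networkx, degree centrality is the degree divided by n - 1, where n is the number of nodes. The listed values appear consistent with this: an out-degree of 15 and a centrality of 0.00619322873658 would correspond to n - 1 = 2422, since 15 / 2422 ≈ 0.006193. The sketch below checks the normalization on a toy graph; the graph itself is an assumption for illustration.

```python
import networkx as nx

# Toy directed graph with five nodes, so the normalizer is n - 1 = 4.
G = nx.DiGraph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("d", "e")])

in_centrality = nx.in_degree_centrality(G)
out_centrality = nx.out_degree_centrality(G)

n = G.number_of_nodes()
for node in G.nodes():
    # Centrality is simply the raw degree rescaled by 1 / (n - 1).
    assert abs(out_centrality[node] - G.out_degree(node) / (n - 1)) < 1e-12
    assert abs(in_centrality[node] - G.in_degree(node) / (n - 1)) < 1e-12

print(out_centrality["a"], in_centrality["d"])
```

Because the rescaling is the same for every node, the ranking by degree centrality is the same as the ranking by raw degree.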
Likewise, eigenvector centrality was calculated to see which courses are most connected to other high-degree courses, i.e. the most important courses among important courses. The outliers of the eigenvector centrality, i.e. the most influential courses among important courses:
- Course number: 62583 has ingoing centrality 0.0 and outgoing centrality 0.356957549843
- Course number: 02368 has ingoing centrality 0.0 and outgoing centrality 0.410986298529
- Course number: 62584 has ingoing centrality 0.0 and outgoing centrality 0.356957549843
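For directed graphs, `nx.eigenvector_centrality` is based on incoming edges; an outgoing variant can be obtained by running it on the reversed graph. Whether the notebook computed the two columns this way is an assumption. The toy graph below contains cycles on purpose, since the power iteration may be slow to converge, or fail to converge, on a graph without cycles.

```python
import networkx as nx

# Toy directed graph with a 3-cycle (a -> b -> c -> a) and a 2-cycle (a -> c -> a).
G = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])

# Centrality driven by incoming edges (networkx default for DiGraph).
incoming = nx.eigenvector_centrality(G, max_iter=1000)

# Centrality driven by outgoing edges: same computation on the reversed graph.
outgoing = nx.eigenvector_centrality(G.reverse(), max_iter=1000)

print(max(incoming, key=incoming.get), max(outgoing, key=outgoing.get))
```

In the toy graph, `c` scores highest on incoming edges because both other nodes point to it, while `a` scores highest on outgoing edges.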
We also looked at the courses with the highest betweenness centrality. A high value indicates a large influence on the transfer of items (here, students) through the network of courses, under the assumption that item transfer follows the shortest paths. These are therefore the overall most important (or in this case necessary) courses.
The most important courses, ranked in ascending order of betweenness centrality:
- '02131', 9.635616786915753e-06
- '41511', 9.692464083593721e-06
- '41560', 1.210847419240741e-05
- '02402', 1.2148267300081986e-05
- '41704', 1.7054189003390714e-05
- '41401', 1.790689845356025e-05
- '41502', 2.4629091285730088e-05
- '41612', 2.489911594495044e-05
- '01035', 3.510320569864589e-05
- '41501', 6.733562291505434e-05
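A ranking like the one above can be produced with `nx.betweenness_centrality` and an ascending sort. The toy graph is an assumption for illustration; the normalized value of a node in a directed graph is the number of shortest paths through it divided by (n - 1)(n - 2).

```python
import networkx as nx

# Toy directed graph: every path from {a, e} to {c, d} passes through b.
G = nx.DiGraph([("a", "b"), ("e", "b"), ("b", "c"), ("b", "d")])

# Normalized betweenness centrality (networkx default: normalized=True).
betweenness = nx.betweenness_centrality(G)

# Ascending order, so the most central course is printed last.
ranking = sorted(betweenness.items(), key=lambda item: item[1])
for course, value in ranking[-10:]:
    print(repr(course), value)
```

Here `b` lies on four shortest paths (a to c, a to d, e to c, e to d), giving 4 / (4 * 3) ≈ 0.333, while all other nodes score 0.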
In regards to shortest paths, we also tested the hypothesis that DTU contains more base courses, denoted by courses (nodes) with short shortest-path lengths, than advanced courses, denoted by courses with longer shortest-path lengths.
A graph showing the frequency of shortest path lengths:
The last part of our analysis of the network was identifying the communities within it. The interesting aspect is that we supposedly already know the communities (i.e. the departments), but network analysis tools such as the Louvain algorithm might produce a different outcome. This could tell us how well the courses are actually distributed between departments.
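The frequency of shortest path lengths can be counted by running a breadth-first search from every node. This is a sketch on a toy chain of courses, and it assumes that paths follow the edge direction and that zero-length paths (a node to itself) are left out.

```python
from collections import Counter
import networkx as nx

# Toy prerequisite chain: a -> b -> c -> d.
G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "d")])

path_length_counts = Counter()
for source in G.nodes():
    # dict(...) keeps this working whether networkx returns a dict or pairs.
    lengths = dict(nx.single_source_shortest_path_length(G, source))
    for target, length in lengths.items():
        if target != source:  # skip the trivial path of length 0
            path_length_counts[length] += 1

print(sorted(path_length_counts.items()))
```

In the chain, short paths are the most frequent (three of length 1, two of length 2, one of length 3), which is the kind of shape the hypothesis predicts.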
The network with communities illustrated by color:
First, we calculated the assortativity of the undirected graph, which is negative (-0.0667934917309). The negative value indicates that high-degree (important) courses do not show a tendency to link to other high-degree courses. Lastly, we calculated the modularity of the detected communities, 0.8053529955, so that it can be compared to the modularity of the actual communities later.
Another network was built from the same prerequisites, but in this version the communities were set to the actual communities, i.e. the departments.
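These three quantities can be sketched with networkx alone. The snippet assumes a recent networkx release that ships `louvain_communities` (the notebook may well have used a separate Louvain package instead) and runs on a toy graph rather than the course network.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Toy undirected graph: two triangles joined by a single edge.
U = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])

# Degree assortativity: a negative value means high-degree nodes tend not to
# link to other high-degree nodes.
assortativity = nx.degree_assortativity_coefficient(U)

# Louvain community detection; the seed only makes the run repeatable.
communities = louvain_communities(U, seed=0)

# Modularity of the detected partition.
q = modularity(U, communities)

print(assortativity, [sorted(c) for c in communities], q)
```

For a directed course network, `G.to_undirected()` would give the undirected version to pass in.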
The course network colored by departments:
Furthermore, a network was created for each department in order to calculate each department's (community's) modularity, and these were then collected into a single modularity value for the whole network. The overall modularity for the actual communities was 0.4492397651481369. The disparity between the two values (the modularity of the detected communities is higher) indicates that the departments are not the best way of defining the communities. The modularity of the communities of the undirected version of the department-based network was also higher (0.6853537981269512).
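As an alternative to building one network per department, the modularity of a department partition can be computed directly by grouping the nodes by department. The mapping `department_of`, the helper `partition_by_label`, and the toy graph are hypothetical; this is a sketch of the comparison, not the notebook's procedure.

```python
from collections import defaultdict
import networkx as nx
from networkx.algorithms.community import modularity

# Toy undirected graph: two triangles joined by a single edge.
U = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])

# Hypothetical department labels that cut across the two triangles.
department_of = {0: "A", 1: "A", 3: "A", 2: "B", 4: "B", 5: "B"}


def partition_by_label(graph, label_of):
    """Group the nodes of the graph into one set per label."""
    groups = defaultdict(set)
    for node in graph.nodes():
        groups[label_of[node]].add(node)
    return list(groups.values())


department_q = modularity(U, partition_by_label(U, department_of))
structural_q = modularity(U, [{0, 1, 2}, {3, 4, 5}])

print(department_q, structural_q)
```

In the toy case the labels that ignore the link structure score -3/14, while the partition into the two triangles scores 5/14, mirroring the kind of gap reported between departments and detected communities.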
The simplified colored network for courses by departments:
Overall, departments cannot be considered good communities, so students have to think beyond their own department when choosing courses.