2. Data Collection - Sam1511/Social_Graph_Project_Final_2016 GitHub Wiki

Scraping the Data

The first and most crucial part of this project was scraping (i.e. retrieving) the information for each course from the course database. The content of the course pages was retrieved and segmented by taking advantage of the course base's filter by department. Thereafter, it was a matter of structuring the retrieved data so that it became easily readable.
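The retrieval and structuring steps can be illustrated with a minimal sketch. It is not the project's actual scraper: the two-column table layout, the field names, the course numbers, and the helpers `parse_course_page` and `scrape_by_department` are all assumptions made for illustration, and the `fetch` callable stands in for whatever HTTP request retrieves a course page.

```python
from html.parser import HTMLParser


class CourseTableParser(HTMLParser):
    """Collects (label, value) pairs from two-cell table rows.

    Assumption: each piece of course information sits in a table row
    with exactly two cells. The real course pages may be laid out differently.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None      # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th") and self._cells is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Keep only rows that look like "label | value".
            if len(self._cells) == 2:
                label, value = (cell.strip() for cell in self._cells)
                self.rows.append((label, value))
            self._cells = None

    def handle_data(self, data):
        if self._in_cell and self._cells:
            self._cells[-1] += data


def parse_course_page(html):
    """Turn one course page into a {category: value} dictionary."""
    parser = CourseTableParser()
    parser.feed(html)
    parser.close()
    return dict(parser.rows)


def scrape_by_department(course_ids_by_department, fetch):
    """Fetch and parse every course, department by department.

    `fetch` is any callable that returns the HTML of a course page
    for a given course number (hypothetical interface).
    """
    records = []
    for department, course_ids in course_ids_by_department.items():
        for course_id in course_ids:
            record = {"course_id": course_id, "department": department}
            record.update(parse_course_page(fetch(course_id)))
            records.append(record)
    return records


# Made-up pages standing in for real HTTP responses.
SAMPLE_PAGES = {
    "00001": "<table><tr><td>Title</td><td>Example course A</td></tr>"
             "<tr><td>ECTS</td><td>5</td></tr></table>",
    "00002": "<table><tr><td>Title</td><td>Example course B</td></tr>"
             "<tr><td>ECTS</td><td>10</td></tr></table>",
}

records = scrape_by_department(
    {"Dept 1": ["00001"], "Dept 2": ["00002"]},
    SAMPLE_PAGES.__getitem__,
)
```

Passing the fetch function in as an argument keeps the parsing logic testable without network access; in a real run it would be replaced by a function that downloads the page.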

Due to some irregularities in the data formatting on the webpages in the DTU course base, some minor manual clean-ups were necessary to make the subsequent computations run more smoothly.

Transposed image of what the complete dataframe looked like. In this transposed view, each category of information is a row and each course is a column.
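As a rough illustration of the transposed layout, the sketch below turns a list of per-course records into a mapping with one entry per category and one value per course. The records, field names, and the `transpose_records` helper are hypothetical; with pandas, the same view would typically be obtained through the `DataFrame.T` attribute.

```python
def transpose_records(records, key="course_id"):
    """Return {category: {course: value}} from a list of course records.

    Categories keep the order in which they first appear; a course that
    lacks a category gets None for it.
    """
    categories = []
    for record in records:
        for name in record:
            if name != key and name not in categories:
                categories.append(name)
    return {
        category: {record[key]: record.get(category) for record in records}
        for category in categories
    }


# Made-up example records: one dictionary per course.
courses = [
    {"course_id": "00001", "Title": "Example course A", "ECTS": "5"},
    {"course_id": "00002", "Title": "Example course B"},
]

table = transpose_records(courses)
for category, values in table.items():
    print(category, values)
```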