2. Data Collection - Sam1511/Social_Graph_Project_Final_2016 GitHub Wiki
Scraping the Data
The first and most crucial part of this project was scraping (i.e. retrieving) the information about each course from the course database. The content of the course pages was retrieved and segmented by taking advantage of the course base's filter by department. Thereafter, it was a matter of structuring the retrieved data so that it could be read easily.
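As an illustration of the structuring step, here is a minimal sketch, not the project's actual scraper. It assumes that each retrieved course page stores its information as two-column label/value table rows, which is a hypothetical layout. The names `CourseTableParser`, `parse_course_page` and `group_by_department` are invented for this example, and the HTML is expected to have been downloaded already, one page per course, together with the department it was filtered under.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CourseTableParser(HTMLParser):
    """Collect the text of every <td> cell, row by row (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_course_page(html):
    """Turn label/value table rows into a dict describing one course."""
    parser = CourseTableParser()
    parser.feed(html)
    parser.close()
    return {row[0].rstrip(":"): row[1] for row in parser.rows if len(row) == 2}


def group_by_department(pages):
    """pages: iterable of (department, html) pairs, one per retrieved page."""
    courses = defaultdict(list)
    for department, html in pages:
        courses[department].append(parse_course_page(html))
    return dict(courses)
```

Keeping the records grouped by department mirrors the way the pages were retrieved, so a problem in one department's pages can be traced and fixed without re-parsing everything else.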
Due to some irregularities in the data formatting on the webpages in the DTU course base, some minor manual clean-ups were necessary to make the further computations run more smoothly.
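The clean-ups in this project were done by hand. Purely as an illustration, the most mechanical kind of irregularity (stray whitespace, non-breaking spaces, trailing colons on labels) could also be handled by a small script. The sketch below assumes each course is a dict of strings, and the helper names `clean_field` and `clean_record` are hypothetical.

```python
import re
import unicodedata


def clean_field(value):
    """Normalise one scraped string."""
    # NFKC maps compatibility characters such as the non-breaking space
    # (U+00A0) to their plain equivalents.
    text = unicodedata.normalize("NFKC", value)
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


def clean_record(record):
    """Clean every key and value of one course record (a dict of strings)."""
    return {
        clean_field(key).rstrip(":"): clean_field(value)
        for key, value in record.items()
    }
```

Irregularities that depend on the meaning of a field, rather than on its characters, would still need the kind of manual inspection described above.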