12_Weekly Team Update and Planning - ulquyorra-11/Cinemalytics GitHub Wiki

📅 Entry Date: 2024-02-25

📌 Meeting Topic: Working on Data Sets (Separation, Merging and Cleaning)

✍️ Author: Uzair

🙋 Attendance

Samer
Uzair

✅ Highlights & Achievements

Separated the Netflix, Prime, and Disney+ datasets into movies and series datasets
Combined the movies datasets from all platforms
Combined the series datasets from all platforms
Removed the unnecessary columns: director, date_added, show_id, cast, type
Renamed column listed_in to genre
Renamed 'duration' to 'duration_min' (movies) and 'duration_seasons' (series)
Replaced empty values with NULL
Created cleaned dataset files for further use
Created and assigned new tickets for next tasks
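
The separation, cleaning, and merging steps listed above can be sketched in pandas roughly as follows. This is a minimal illustration, not the scripts actually used: the sample rows, the `clean` helper, and the `type` values "Movie" and "TV Show" are assumptions made for the example.

```python
import pandas as pd

# Made-up sample rows standing in for two platforms' raw files (hypothetical data).
columns = ["show_id", "type", "title", "director", "cast",
           "date_added", "listed_in", "duration"]
netflix = pd.DataFrame([
    ["s1", "Movie", "Example Film", "A. Director", "", "2021-01-01", "", "90 min"],
    ["s2", "TV Show", "Example Show", "", "B. Actor", "2021-02-01", "Comedies", "2 Seasons"],
], columns=columns)
prime = pd.DataFrame([
    ["s1", "Movie", "Another Film", "", "", "2021-03-01", "Dramas", "100 min"],
], columns=columns)


def clean(df: pd.DataFrame, kind: str, duration_col: str) -> pd.DataFrame:
    """Keep one content type, drop unused columns, rename, and null out blanks."""
    out = df[df["type"] == kind]
    out = out.drop(columns=["director", "date_added", "show_id", "cast", "type"])
    out = out.rename(columns={"listed_in": "genre", "duration": duration_col})
    # Replace empty strings with a missing-value marker (NULL).
    return out.replace("", pd.NA)


# Separate each platform by content type, then stack the results per type.
platforms = [netflix, prime]
movies = pd.concat([clean(df, "Movie", "duration_min") for df in platforms],
                   ignore_index=True)
series = pd.concat([clean(df, "TV Show", "duration_seasons") for df in platforms],
                   ignore_index=True)

print(movies)
print(series)
```

Doing the column drops and renames in one helper keeps every platform's output in the same shape, which is what makes the final `pd.concat` line up column by column.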

❗ Challenges

DataFrames saved with Python gain an extra column at the beginning of the dataset (most likely the DataFrame index, which is written by default on export). Some datasets were prepared in SQL and did not have this additional column. We discovered the mismatch while combining the datasets.
As a solution, we processed all the datasets in SQL to avoid mixing two platforms. As a suggestion, only one platform should be used for the whole process to keep the data uniform.
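
If the extra column is the DataFrame index (an assumption, since the notes do not say how the files were written), the following sketch shows how it appears with pandas `to_csv` and how `index=False` avoids it. The sample frame is made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"title": ["Example Film"], "genre": ["Dramas"]})

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False leaves the index out, matching a file exported from SQL.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(header_with)     # ,title,genre
print(header_without)  # title,genre

# Reading the first file back shows the extra column under a generated name.
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(list(reloaded.columns))  # ['Unnamed: 0', 'title', 'genre']
```

Files that already contain the extra column can be read with `pd.read_csv(path, index_col=0)` so that the first column is treated as the index again instead of as data.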

📝 Notes

Cleaned and combined datasets are now available in the Data -> Clean folder
Data separation, merging, and cleaning have been done in both Python and SQL for learning purposes
The next step is visualization of the cleaned datasets
The next group meeting date has been set to Thursday (29.02.2024) at 7:00 PM (19:00)