PDAI - EMbeDS-education/ComputingDataAnalysisModeling20242025 GitHub Wiki
This is the home page of the courses Programming and Data Analytics and AI:
- Module 1
- Module 2-ML
- Module 2-PM
Module 1 is held every year, in the fall.
The two Module 2 variants alternate every year. They are held in the second semester (February-March):
- In 2023-2024: Module 2-ML
- In 2024-2025: Module 2-PM
- In 2025-2026: Module 2-ML
The right sidebar can be used to navigate pages related to the course, e.g., to consult the calendar, access example datasets, reserve a classroom seat and mark attendance, and retrieve slides, code and materials for our lectures.
Lecturer: Andrea Vandin ([email protected])
Co-lecturer: Daniele Giachini ([email protected], 1-hour digression on Agent-based models)
Teaching Assistant: Sima Sarv Ahrabi ([email protected])
Former co-lecturer: Daniele Licari, now at Bank of Italy.
- Daniele co-designed these courses and was co-lecturer until leaving Sant'Anna
Language: English
Duration: Module 1: 20h, Nov 2024; Module 2: 20h, Feb-Mar 2025.
This course is structured in three modules (PDAI1, PDAI2-ML, PDAI2-PM) that students can attend in different years. PDAI1 is offered each year, while the other two alternate. PDAI1 is preparatory to the other two, which can be taken independently of each other.
The course provides a well-structured introduction to the fundamentals of (object-oriented) programming (PDAI1), data processing and artificial intelligence (PDAI2-ML), and process-oriented data science (process mining, PDAI2-PM). It focuses on how to create good-quality software (PDAI1), on how to carry out good-quality data analysis and artificial intelligence projects (PDAI2-ML), and on research-oriented aspects of process-oriented data science, in particular process mining, where the aim is to analyse and optimise the data-generating process (PDAI2-PM). A student who has achieved the course objectives will understand the problems and tasks related to structured programming, data analysis and machine learning well enough to make informed decisions, and will be able to write Python programs of various kinds, with a focus on complex data analysis and AI tasks and on process mining.
- Module 1 introduces students to the fundamental principles of structured programming, with basic applications to data processing. It starts from basic notions of programming (variables, data types, collections, control & repetition structures, functions & modules), and progresses to basic data processing functionalities (loading, manipulation, and visualization of CSV data).
- Module 2-ML introduces students to the components of typical data analysis processes and machine learning pipelines. It first builds the necessary toolset by introducing popular Python libraries for data manipulation/visualization (NumPy, Pandas, Seaborn, scikit-learn) with simple applications. The toolset is then applied to a more complex case study on the classification of benign and malignant breast cancer, including aspects of data preprocessing, dimensionality reduction, clustering, and classification. The course will conclude with a research-driven topic such as process-oriented data science (Process Mining).
- Module 2-PM introduces students to recent data-driven techniques where the main component is the process that generated the data (the data-generating process). This is a particularly hot topic, with many companies and researchers involved (see, e.g., the list of industrial sponsors of the reference conference in 2023: https://icpmconference.org/2023/sponsor-and-exhibition/). We will consider techniques known as Process Mining, in which logs generated during the execution of a process (e.g., an industrial production process, business processes, social system 'processes') are used to infer the structure of the process. Questions of interest are, e.g.: What is the actual process being executed? Are there possibilities for improvement? Does the actual process conform to the intended reference process?
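To make the idea of inferring process structure from logs more concrete, here is a minimal sketch that is not taken from the course material. It counts which activities directly follow each other in a tiny event log and then checks each case against a reference process. The event log, the activity names, the reference process, and the helper names `traces_from_log`, `directly_follows` and `conforms` are all hypothetical; real process mining tools use considerably richer discovery and conformance techniques.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity), already ordered by time within each case.
EVENT_LOG = [
    ("c1", "register"), ("c1", "check"), ("c1", "approve"),
    ("c2", "register"), ("c2", "check"), ("c2", "reject"),
    ("c3", "register"), ("c3", "approve"),
]


def traces_from_log(log):
    """Group events by case id, preserving their order."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)
    return dict(traces)


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def conforms(trace, allowed_pairs):
    """Check that every consecutive step of a trace is an allowed transition."""
    return all(pair in allowed_pairs for pair in zip(trace, trace[1:]))


traces = traces_from_log(EVENT_LOG)
dfg = directly_follows(traces)

# Hypothetical reference process: every case must be checked before a decision.
REFERENCE = {("register", "check"), ("check", "approve"), ("check", "reject")}
deviating = [case for case, trace in traces.items() if not conforms(trace, REFERENCE)]

print(dfg)        # observed transitions with their frequencies
print(deviating)  # cases that do not follow the reference process
```

In this sketch, the transition counts speak to the first question (what process is actually being executed), while the list of deviating cases speaks to the conformance question.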
A student who has met the objectives of the course will understand the issues and tasks involved in structured computer programming, data analysis, machine learning and process mining, so as to be able to make informed decisions. The student will be able to write complex Python programs of various kinds, with a focus on complex data analysis tasks such as machine learning and process mining.
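As a rough illustration of the kind of basic data-processing program such a student would start from, the following sketch loads CSV data and summarizes it using only the Python standard library. The inline data, the column names, and the function name `mean_by_group` are hypothetical and not taken from the course material; the course itself may well use libraries such as Pandas for the same task.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV content; in practice this would be read from a file.
CSV_TEXT = """city,temperature
Pisa,18.5
Pisa,21.5
Rome,24.0
Rome,26.0
"""


def mean_by_group(csv_file, group_col, value_col):
    """Return the mean of value_col for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # CSV fields are read as strings, so convert the value to a number.
        groups[row[group_col]].append(float(row[value_col]))
    return {key: mean(values) for key, values in groups.items()}


result = mean_by_group(io.StringIO(CSV_TEXT), "city", "temperature")
print(result)
```

The example touches most of the basic notions listed for Module 1: variables, collections, a loop, a function, and loading CSV data.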
Prerequisites: None for Module 1. The Module 2 variants require the programming knowledge acquired by attending Module 1.
The course makes extensive use of online repositories and game-based e-learning platforms:
- GitHub Wiki: collect and distribute slides, coding examples, datasets, and further course material
- Colab: distribute and automatically provide feedback for coding assignments
- Kahoot: perform online quizzes to monitor the learning process
Where possible, we will also coordinate some content and Practicum activities with the courses Applied Statistics (run in parallel with M1) and Statistical Learning for Large Data Topics in Statistical Learning (whose M1 is run in parallel with our M2).
Suggested books are
- Learning Python, M. Lutz
- Python for Data Analysis, W. McKinney
- Statistics and Machine Learning in Python, E. Duchesnay, T. Löfstedt, F. Younes
- Process Mining: Data Science in Action, W. van der Aalst
We will employ Python as the programming language and statistical software of choice for the course.
- Please visit the "setup your machine" entry in the right sidebar
Students can attend single modules, therefore there will be an evaluation for each module.
- Module 1: Evaluation will be based on individual oral examinations on the topics covered in the course, starting from the students' solutions to the weekly assignments.
- Module 2-ML: Evaluation will be based on oral examinations, starting from group project work and written reports to be held/handed in at the end of the course. Each group will use a given dataset (or propose one of interest) and apply techniques described during the course to it. The project report consists of a JupyterLab notebook like those used by the lecturers.
- Module 2-PM: To be decided.
Attendance: The course will be given in blended mode:
- in person, in the rooms specified in the general calendar
- remotely on WebEx. The recurrent meeting link is: https://santannapisa.webex.com/meet/a.vandin
Allievi Ordinari of Scuola Superiore Sant'Anna must attend the classes in person unless explicitly justified (e.g., Allievi abroad participating in the ERASMUS project).
- All other attendees are allowed to attend in person only if enough seats are available (information on this is going to be provided via email). Alternatively, they will have to attend remotely on WebEx (this should not affect the learning process: some previous editions of the course were run mostly online).