Training Program, Academic Year 2023-2024
Pillar: Foundations, Computational Thinking, Logic and Statistics
Title: An introduction to score-based generative models
Lecturers: Prof. Giovanni Conforti (Università di Padova) and Prof. Alain Durmus (École Polytechnique, Paris)
Internal Contact Person: Dr. Michele Salvi (salvi@mat.uniroma2.it)
Abstract: In simple words, generative modeling consists of learning a map capable of generating new data instances that resemble a given set of observations, starting from a simple prior distribution, most often a standard Gaussian. This course aims to provide a mathematical introduction to generative models and, in particular, to Score-based Generative Models (SGMs). SGMs have gained prominence for their ability to generate realistic data across diverse domains, making them a popular tool for researchers and practitioners in machine learning. Participants will learn about the methodological and theoretical foundations of these models, as well as some of their practical applications. The first two lectures motivate the use of generative models, introduce their formalism and present two simple yet relevant examples: energy-based models and Generative Adversarial Networks. In the third and fourth lectures we present score-based diffusion models and explain how they provide an algorithmic framework for the basic idea that sampling from the time-reversal of a diffusion process converts noise into new data instances. We do so following two different approaches: a first, elementary one that relies only on discrete transition probabilities, and a second one based on stochastic calculus. After this introduction, we derive sharp theoretical convergence guarantees for score-based diffusion models, combining ideas from stochastic control, functional inequalities and regularity theory for Hamilton-Jacobi-Bellman equations. The course ends with an overview of some of the most recent and sophisticated algorithms, such as flow matching and diffusion Schrödinger bridges (DSB), which bring an (entropic) optimal transport perspective to generative modeling.
Background: Advanced courses in Mathematical Analysis and Probability & Statistics.
Test: To be decided
Number of hours: 12
Mode of Delivery: only in-class mode
Scheduling: February 2024: Monday 19, 14:00-17:00; Wednesday 21, 09:30-12:30; Thursday 22, 09:30-12:30
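To make the time-reversal idea concrete, here is a minimal one-dimensional sketch that is not taken from the course material. It assumes data distributed as N(m, s²) and a forward Ornstein-Uhlenbeck process dX = -X dt + √2 dW, for which the score ∇ log p_t is known in closed form; in an actual SGM the score is unknown and would be learned, typically by a neural network. The function names `ou_score` and `reverse_sample` are illustrative only.

```python
import math
import random


def ou_score(x, t, m, s):
    """Exact score d/dx log p_t(x) of the Ornstein-Uhlenbeck process
    dX = -X dt + sqrt(2) dW started from X_0 ~ N(m, s^2)."""
    mean = m * math.exp(-t)
    var = s * s * math.exp(-2.0 * t) + (1.0 - math.exp(-2.0 * t))
    return -(x - mean) / var


def reverse_sample(m, s, rng, T=5.0, n_steps=400):
    """Euler-Maruyama discretisation of the time-reversed SDE
    dY = [Y + 2 * score(Y, T - tau)] dtau + sqrt(2) dB,
    started from N(0, 1), which approximates p_T when T is large."""
    dt = T / n_steps
    y = rng.gauss(0.0, 1.0)  # start from pure noise
    for k in range(n_steps):
        t = T - k * dt  # forward time matching the current reverse step
        drift = y + 2.0 * ou_score(y, t, m, s)
        y += drift * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return y
```

Drawing, say, a thousand samples with `reverse_sample(2.0, 0.5, random.Random(0))` should give a sample mean close to 2.0 and a standard deviation close to 0.5, up to Monte Carlo and discretisation error: the reversed dynamics turn Gaussian noise into (approximate) data samples.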
Title: Indices of Centrality for Complex Networks and their Efficient Computation
Lecturer: Prof. D. Bertaccini (bertaccini@mat.uniroma2.it)
Abstract: We introduce the main centrality and role indices used to rank nodes and/or data in large complex networks, and then describe algorithmic methods to compute them efficiently.
Background: An undergraduate course in Mathematical Analysis
Test: Oral discussion
Number of hours: 15
Mode of Delivery: only in-class mode
Scheduling: April-May 2024
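PageRank is one widely used centrality index whose computation reduces to a power iteration; the abstract does not say which indices or methods the course covers, so the following is only an illustrative sketch. It assumes a directed graph stored as a Python dict mapping each node to the list of its out-neighbours (every node must appear as a key), a damping factor `alpha = 0.85`, and uniform redistribution of the rank held by dangling nodes (nodes without out-links).

```python
def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a directed graph.

    adj maps every node to the list of nodes it links to.
    Returns a dict of scores that sum to 1.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleportation: with probability 1 - alpha, jump to a random node.
        new = {v: (1.0 - alpha) / n for v in nodes}
        # Rank held by dangling nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        for v in nodes:
            out = adj[v]
            for w in out:
                new[w] += alpha * rank[v] / len(out)
        for v in nodes:
            new[v] += alpha * dangling / n
        # Stop when the L1 change between two iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank
```

On a directed 3-cycle every node receives score 1/3, whereas in a small star where all leaves link to a hub, the hub obtains the largest score. For very large networks the same iteration is usually written as a sparse matrix-vector product.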
Pillar: Computation, Learning and Inference
Title: Mining Massive Data
Lecturers: Prof. Clementi, Prof. Gualà, Prof. Pasquale (clementi@mat.uniroma2.it, guala@mat.uniroma2.it, pasquale@mat.uniroma2.it)
Abstract: The course is organized into three modules.
Background: Undergraduate courses in Algorithms & Data Structures and in Probability.
Test: To obtain official credits (CFU), graduate students will be required to give a talk on a topic, result or paper proposed by the lecturers. In-class attendance is mandatory.
Number of hours: 4 × 3 = 12
Mode of Delivery: only in-class mode
Scheduling: The course will be given during the week of 11-15 November 2024. Specific days, hours and location will be fixed by the second week of October 2024. The course will not be streamed and recordings are not guaranteed; however, notes and all teaching materials will be available online.
Title: Deep Learning and Structured Inference: Neural Models and Algorithms for Linguistic Recognition and Inference
Lecturers: Prof. Roberto Basili (basili@info.uniroma2.it), Prof. Fabio Massimo Zanzotto (fabio.massimo.zanzotto@uniroma2.it)
Abstract: Modern AI is increasingly faced with complex problems, characterized by heterogeneous forms of structured input evidence and complex decisions. In medicine, historical data, biological phenomena and images appear as streams of structured data, usually represented digitally as sequences, trees or graphs. Machine learning methods for structured learning have been widely studied, and mathematical paradigms such as dimensionality reduction, structured kernels and neural embeddings have been proposed as modeling tools. In Natural Language Processing, Machine Translation and other Natural Language Inference (NLI) tasks, such as Question Answering or Textual Entailment, have been approached via kernels or neural models of the input representation. These approaches achieved state-of-the-art classification and prediction accuracy by enabling the exploration of huge spaces of candidate solutions (e.g., target sequences or decisions). In this way, they serve both as enabling technologies and software tools and as models of investigation, able to systematically select hypotheses and validate controversial theories about linguistic phenomena. Applying these empirical methodologies to other areas such as biology, medicine and medical robotics is very promising, given the similar complexity of the domains targeted by AI and the Life Sciences. The course aims to promote this research perspective in Deep Learning among PhD students, with a specific (but not exclusive) focus on Life Science phenomena.
Duration: 12 hours
Prerequisites: Basic competencies in Algorithmics, Logic and Machine Learning.
Final Examination: Project
Approximate Timing: February 2024
Title: Web of Data
Lecturer: Dr. Manuel Fiorelli (fiorelli@info.uniroma2.it)
Internal Contact Person: Prof. Armando Stellato (stellato@uniroma2.it)
Abstract: The course introduces the Web of Data, as outlined by the Semantic Web and Linked Open Data: an extension of the Web into a global dataspace for the publication, reuse and integration of data. Best practices and (open) standards will be discussed, emphasizing machine actionability, a core value of the FAIR paradigm for data custody. Stressing the distributed and decentralized nature of the Web of Data, a prerequisite for autonomy and independence, we will discuss how to avoid data silos through a distributed, as-needed integration process. In this regard, ontology matching and entity linking techniques will be discussed. The dual role of the Web of Data, as a controlled environment for Big Data experimentation and as a source of background knowledge for information extraction and, more generally, content analytics, will also be mentioned. Various examples of big datasets will be shown throughout the course, including general resources such as DBpedia and Wikidata, and more domain-specific GLAM (Galleries, Libraries, Archives, and Museums) resources. Concrete examples of tabular data lifting and of modern standards for its semantic annotation will also be shown.
Background: Foundations of Logic and Databases; the Java or Python programming languages
Test: Written exam
Number of hours: 14
Mode of Delivery: only in-class mode
Scheduling: September-December 2024
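As a toy illustration of what "tabular data lifting" means, the sketch below turns each CSV row into RDF triples serialised as N-Triples lines. The namespace `http://example.org/` and the IRI patterns for resources and properties are hypothetical; W3C standards such as CSV on the Web (CSVW) define such annotations and mappings far more carefully, and this hand-rolled function does not implement them.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not a real vocabulary


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def lift_csv(text, key_column):
    """Return one N-Triples line per non-empty, non-key cell of a CSV table."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # Each row becomes a resource identified by its key column.
        subject = f"<{BASE}resource/{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue  # the key is encoded in the subject; skip empty cells
            predicate = f"<{BASE}property/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples
```

For example, `lift_csv("id,name,city\n1,Ada,London\n", "id")` returns two triples about the resource `<http://example.org/resource/1>`, one for `name` and one for `city`. A real lifting process would map columns to shared vocabularies and typed literals rather than to ad-hoc property IRIs.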
Title: Simulation-based Predictive Process Mining
Lecturer: Dr. Paolo Bocciarelli (paolo.bocciarelli@uniroma2.it)
Internal Contact Person: Prof. Andrea D’Ambrogio (dambro@uniroma2.it)
Abstract: The course introduces the essential elements of process mining (PM) and simulation. These approaches are first presented as tools for analyzing processes from different perspectives and with different objectives: while PM aims to extract knowledge by analyzing a log that records data on past process executions, simulation provides predictions of future or alternative behaviors of the same process. Then, an innovative point of view is proposed, in which PM and simulation are seen as complementary tools whose joint adoption leads to an effective analysis paradigm. The first part of the course introduces basic simulation concepts: simulation modeling, discrete event simulation, and local and distributed simulation. The implementation of a Java-based discrete event simulator is also discussed. The second part presents principles, methods and tools for PM. Finally, the course introduces “Predictive Process Mining” as an innovative paradigm based on the joint use of the two approaches. It outlines how the knowledge extracted from the log through PM techniques can guide the development of a simulation model, whose execution provides further insights into the system under study. In this context, the most relevant research challenges, opportunities and open issues are illustrated.
Background: Basic skills in software development and knowledge of at least one object-oriented programming language (Java recommended).
Test: Practical exercise assigned at the end of the course.
Number of hours: 12
Mode of Delivery: in-class and online
Scheduling: June or September 2024
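A deliberately simplified picture of how knowledge mined from a log can drive a simulation model: the sketch below discovers a directly-follows graph (how often activity a is immediately followed by activity b) from a toy event log, then generates synthetic traces by a random walk on that graph, choosing successors in proportion to observed frequencies. This is an assumed stand-in, not the course's method: it ignores timestamps, resources and concurrency, which a real discrete event simulation model would capture, and the activity names are invented.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # artificial markers added around every trace


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = defaultdict(Counter)
    for trace in log:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return counts


def simulate_trace(counts, rng):
    """Random walk on the directly-follows graph from START to END,
    picking each successor with probability proportional to its count."""
    trace, current = [], START
    while True:
        successors = counts[current]
        current = rng.choices(list(successors), weights=list(successors.values()))[0]
        if current == END:
            return trace
        trace.append(current)


log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "check", "approve"],
]
dfg = directly_follows(log)
print(dict(dfg["check"]))  # {'approve': 2, 'reject': 1}
print(simulate_trace(dfg, random.Random(1)))
```

Each simulated trace starts with `register`, continues with `check`, and ends with `approve` about two times out of three, mirroring the frequencies observed in the log.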
Pillar: Statistical Methods for Economics and Health Sciences
Title: Quantile Regression
Lecturer: Prof. Alessio Farcomeni (alessio.farcomeni@uniroma2.it)
Abstract: The main techniques of quantile regression, an alternative to classical linear regression, will be introduced. As an example, consider a regression model estimating the association between the Equivalised Disposable Income of a sample of households and various predictors, including an exogenous treatment. Using quantile regression, it is possible to estimate the effect of the treatment on the entire income distribution, with a potentially different estimated effect at each quantile. For instance, the treatment could have a positive effect on the income of rich households (high quantiles) and a negative effect on the income of poor households (low quantiles). Similarly, the association of the predictors with median income can be evaluated without assuming that the response is Gaussian (symmetric, homoscedastic) or free of outliers. If time permits, principles of robust statistics will also be discussed, including robust linear regression techniques and robust prediction.
Background: Use of the R software; undergraduate courses in Statistical Inference and Linear Models
Test: Projects
Number of hours: 12
Mode of Delivery: only in-class mode
Scheduling: March, April or September
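To illustrate how quantile regression can yield a different slope at each quantile, the sketch below fits a simple linear quantile regression by minimising the check (pinball) loss. It relies on the fact that, when the design matrix has full rank, some optimal solution of this linear program interpolates as many observations as there are parameters, so with an intercept and one predictor it suffices to try lines through pairs of data points. This brute-force search costs O(n³) and is meant only for tiny examples; with the R software listed in the background, one would normally use `rq()` from the `quantreg` package. The toy data are invented.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_line(xs, ys, tau):
    """Fit y = a + b*x at quantile level tau by minimising the total check loss.

    Tries every line through two observations with distinct x values, a set
    that contains an optimal solution when the x values are not all equal.
    Returns (intercept, slope).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Toy data whose spread grows with x (heteroscedastic).
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ys = [9, 10, 11, 8, 10, 12, 7, 10, 13]
for tau in (0.1, 0.5, 0.9):
    print(tau, quantile_line(xs, ys, tau))
# 0.1 (10.0, -1.0)
# 0.5 (10.0, 0.0)
# 0.9 (10.0, 1.0)
```

The median slope is zero, yet the predictor is negatively associated with the 10th percentile and positively with the 90th, the kind of effect that a single least-squares slope would hide.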
Pillar: Natural Sciences and Complex Systems
Title: Hands-on Machine Learning for Physics
Lecturer: Michele Buzzicotti (m.buzzicotti@gmail.com)
Abstract: The course aims to deepen the concepts, techniques and tools needed to build the Machine Learning algorithms mainly used in Physics. The target audience is PhD students who want to learn how to program Machine Learning (ML) codes for the data analysis of physics problems. Special emphasis will be given to how generative algorithms, such as Variational Autoencoders and Generative Adversarial Networks, work. The course starts with a brief theoretical review of the problem and then covers the implementation of an ML algorithm in all its phases, from the construction of the dataset to the validation of the results.
Test: Projects
Background: Undergraduate courses in Linear Algebra, Mathematical Analysis, and Programming in Python
Scheduling: April-May 2024
Mode of Delivery: only in-class mode
Number of hours: 14
Pillar: Culture, Art and Society
The pillar will offer seminars and specific events as part of the teaching program; these will be announced on this site as soon as possible.