Training Programme – Academic Year 2023-2024

Foundations, Computational Thinking, Logic and Statistics

Title: An introduction to score-based generative models

Lecturers: Prof. Giovanni Conforti (Università di Padova) and Prof. Alain Durmus (École Polytechnique, Paris)

Internal Contact Person: Dr. Michele Salvi (salvi@mat.uniroma2.it)

Abstract: In simple terms, generative modeling consists of learning a map capable of generating new data instances that resemble a given set of observations, starting from a simple prior distribution, most often a standard Gaussian. This course provides a mathematical introduction to generative models, and in particular to Score-based Generative Models (SGMs). SGMs have gained prominence for their ability to generate realistic data across diverse domains, making them a popular tool for researchers and practitioners in machine learning. Participants will learn about the methodological and theoretical foundations of these models, as well as some of their practical applications. The first two lectures motivate the use of generative models, introduce their formalism and present two simple yet relevant examples: energy-based models and Generative Adversarial Networks. In the third and fourth lectures we present score-based diffusion models and explain how they provide an algorithmic framework for the basic idea that sampling from the time reversal of a diffusion process converts noise into new data instances. We do so following two different approaches: a first, elementary one that relies only on discrete transition probabilities, and a second one based on stochastic calculus. After this introduction, we derive sharp theoretical convergence guarantees for score-based diffusion models by combining ideas from stochastic control, functional inequalities and regularity theory for Hamilton-Jacobi-Bellman equations. The course ends with an overview of some of the most recent and sophisticated algorithms, such as flow matching and diffusion Schrödinger bridges (DSB), which bring an (entropic) optimal transport insight into generative modeling.
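
As a sketch of the time-reversal idea mentioned in the abstract, the display below writes a forward noising diffusion and its reversal. The choice of an Ornstein-Uhlenbeck forward process, the horizon T and the symbols p_t, B and \bar{B} are assumptions made here for illustration; the abstract does not fix a particular diffusion.

```latex
% Forward (noising) process: data is driven towards a standard Gaussian.
% Assumption: Ornstein-Uhlenbeck dynamics, p_t denotes the law of X_t.
\begin{align*}
  dX_t &= -X_t\,dt + \sqrt{2}\,dB_t,
  & X_0 &\sim p_{\mathrm{data}}, \quad t \in [0,T], \\
% Time reversal Y_t = X_{T-t}: the drift involves the score \nabla \log p_{T-t}.
  dY_t &= \bigl( Y_t + 2\,\nabla \log p_{T-t}(Y_t) \bigr)\,dt + \sqrt{2}\,d\bar{B}_t,
  & Y_0 &\sim p_T .
\end{align*}
```

Under these assumptions Y_T is distributed as the data. A sampler would typically replace the unknown score with a learned approximation and draw Y_0 from a standard Gaussian instead of p_T.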

Background: Advanced Courses in Mathematical Analysis and Probability & Statistics.

Test: To be decided

Number of hours: 12  

Mode of Delivery: only in-class mode

Scheduling: February 2024: Mon 19, 14:00-17:00; Wed 21, 09:30-12:30; Thu 21, 09:30-12:30

Title: Indices of Centrality for Complex Networks and their Efficient Computation

Lecturer: Prof. D. Bertaccini (bertaccini@mat.uniroma2.it)

Abstract: We introduce the main centrality/role indices used to rank nodes and/or data in large complex networks, and then describe algorithmic methods to compute them efficiently.
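
The abstract does not list specific indices. Purely as an illustration, the sketch below computes PageRank, one widely used centrality index, by power iteration. The function name `pagerank`, the adjacency-dictionary input format and the default damping factor of 0.85 are assumptions of this example, not part of the course material.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration (illustrative sketch).

    out_links maps each node to the list of nodes it links to.
    Nodes that appear only as link targets are included automatically.
    """
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:  # stop once the ranks have stabilised
            break
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 4))
```

Each iteration costs time proportional to the number of edges, which is what makes this kind of index computable on large sparse networks.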

Background:  An undergraduate course in Mathematical Analysis

Test: oral discussion

Number of hours: 15

Mode of Delivery: only in-class mode

Time Scheduling: April-May 2024

Computation, Learning and Inference

Title: Mining Massive Data

Lecturers: Prof. Clementi, Prof. Gualà, Prof. Pasquale (clementi/guala/pasquale@mat.uniroma2.it)

Abstract: The course is organized in 3 modules.

  • Mining Huge Data Sets. One of the key problems in Data Mining is to quickly retrieve all items that are similar according to an effective notion of similarity, such as Jaccard similarity. We cover the Locality Sensitive Hashing technique, which can be used to break the n^2-time barrier required to solve the problem in the worst case. We introduce the framework of Streaming Algorithms: algorithms for problems in which the input is so large that it cannot even be stored in memory, and the algorithm can look at each element of the input only once, in an online fashion. We study the problems of counting distinct elements, finding the most frequent elements, and counting the elements in a given query window that meet a certain criterion.
  • Web Search Engines. The PageRank Algorithm: introduction to the key algorithmic ideas of the PageRank algorithm and how it computes an effective popularity score that modern Web search engines apply to rank Web sites and pages.
  • The Bitcoin Lightning Network. Introduction to the fully decentralized system designed to manage the massive data generated by the micropayments that take place over the Bitcoin network.
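
To make the first module more concrete, the sketch below estimates Jaccard similarity with MinHash signatures, the building block usually combined with banding in Locality Sensitive Hashing. It is a minimal illustration under stated assumptions: the helper names (`shingles`, `make_hash_functions`, `minhash_signature`, `estimate_jaccard`), the linear hash family modulo a Mersenne prime and the use of CRC32 to encode shingles are choices made here, not taken from the course.

```python
import random
import zlib

MERSENNE_PRIME = (1 << 61) - 1  # large prime modulus for the hash family


def shingles(text, k=3):
    """Set of integer-encoded character k-shingles of a string."""
    text = " ".join(text.lower().split())
    last = max(1, len(text) - k + 1)
    return {zlib.crc32(text[i:i + k].encode("utf-8")) for i in range(last)}


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_functions(count, seed=0):
    """Random linear hash functions x -> (a*x + b) mod p, stored as (a, b)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
            for _ in range(count)]


def minhash_signature(items, hash_functions):
    """For each hash function, keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min((a * x + b) % MERSENNE_PRIME for x in items)
            for a, b in hash_functions]


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions on which the two sets agree."""
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have the same length")
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


if __name__ == "__main__":
    hs = make_hash_functions(256, seed=1)
    s1 = shingles("mining massive data sets with hashing")
    s2 = shingles("mining massive datasets using hashing")
    print("exact   :", jaccard(s1, s2))
    print("estimate:", estimate_jaccard(minhash_signature(s1, hs),
                                        minhash_signature(s2, hs)))
```

Signatures have a fixed length regardless of set size, so similar pairs can be searched for by comparing or bucketing signatures rather than comparing all pairs of full sets.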

Background: Undergraduate Courses in Algorithms & Data Structures and in Probability.

Test: To get official credits (CFU), graduate students will be required to give a talk on some topic/result/paper proposed by the Lecturers. In-class presence is mandatory.

Number of hours: 4 x 3 = 12

Mode of Delivery: only in-class mode

Time Scheduling: The course will be given in the week of 11-15 November 2024. Specific days, hours, and location will be fixed by the second week of October 2024. The course will not be streamed, and no recording is guaranteed. However, notes and all teaching materials will be available online.

Title: Deep Learning and Structured Inference – Neural Models and Algorithms for Linguistic Recognition and Inference

Lecturers: Prof. Roberto Basili (basili@info.uniroma2.it), Prof. Fabio Massimo Zanzotto (fabio.massimo.zanzotto@uniroma2.it)

Abstract: Modern AI increasingly faces complex problems characterized by heterogeneous forms of structured input evidence and by complex decisions. In medicine, historical data, biological phenomena and images manifest themselves as streams of structured data, usually represented digitally as sequences, trees or graphs. Machine Learning methods for structured learning have been studied, and several mathematical paradigms (such as dimensionality reduction, structured kernels and neural embeddings) have been proposed as modeling tools. In Natural Language Processing, Machine Translation and other Natural Language Inference (NLI) tasks, such as Question Answering or Textual Entailment, have been approached via kernels or neural models of the input representation. These have achieved state-of-the-art accuracy in classification and prediction by enabling the exploration of huge spaces of possible solutions (e.g. target sequences or decisions). They thus serve both as enabling technologies and software tools and as models of investigation able to systematically select hypotheses and validate controversial theories about linguistic phenomena. The application of these empirical methodologies to other areas, such as biology, medicine and medical robotics, is more than promising, given the similar complexity of the domains targeted by AI and the Life Sciences. The course aims to promote this research perspective in Deep Learning among PhD students, with a specific focus on, but not limited to, Life Science phenomena.

Duration: 12 hours

Prerequisites: Basic competencies in Algorithmics, Logic and Machine Learning.

Final Examination: Project

Approximate Timing: February 2024

Title: Web of Data
 
Lecturer: Dr. Manuel Fiorelli (fiorelli@info.uniroma2.it)
 
Internal Contact Person: Prof. Armando Stellato (stellato@uniroma2.it)
 
Abstract: The course introduces the Web of Data, as outlined by the Semantic Web and Linked Open Data, as an extension of the Web into a global dataspace for the publication, reuse and integration of data. Best practices and (open) standards will be discussed as part of the course, emphasizing machine actionability, a core value of the FAIR paradigm for data custody. Stressing the distributed and decentralized nature of the Web of Data, which is a prerequisite for autonomy and independence, we will discuss how to avoid the data silos phenomenon through a distributed, as-needed integration process. In this regard, ontology matching and entity linking techniques will be discussed. The dual role of the Web of Data as a controlled environment for Big Data experimentation and as a source of background knowledge for information extraction and content analytics in general will also be mentioned. Various examples of big datasets will be shown throughout the course, including general resources, such as DBpedia and Wikidata, and more domain-specific GLAM (Galleries, Libraries, Archives, and Museums) resources. Concrete examples of tabular data lifting and modern standards for their semantic annotation will also be shown.

Background:  Foundations of Logic, Databases, and Java or Python Languages

Test:  Written Exam

Number of Hours: 14

Mode of Delivery: only in-class mode

Scheduling:   September-December 2024

Title: Simulation-based Predictive Process Mining

Lecturer: Dr. Paolo Bocciarelli (paolo.bocciarelli@uniroma2.it)

Internal Contact Person: Prof. Andrea D’Ambrogio (dambro@uniroma2.it)

Abstract: The course introduces the essential elements of process mining (PM) and simulation. These approaches are initially presented as tools for analyzing processes from different perspectives and with different objectives. While PM aims to extract knowledge by analyzing a log that records data on past process executions, simulation provides predictions of future or alternative behaviors of the same process. Then, an innovative point of view is proposed in which PM and simulation are seen as complementary tools whose joint adoption leads to an effective analysis paradigm. The first part of the course introduces the basic concepts of simulation: simulation modeling, discrete event simulation, local and distributed simulation. The implementation of a Java-based discrete event simulator is also discussed. In the second part, principles, methods, and tools for PM are presented. Finally, the course introduces “Predictive Process Mining” as an innovative paradigm based on the joint use of the two approaches. It outlines how the knowledge extracted from the log through PM techniques can be used to guide the development of a simulation model, whose execution provides further insights into the system under study. In this context, the most relevant research challenges, opportunities and open issues are illustrated.
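
As a minimal sketch of discrete event simulation, the example below simulates a single FIFO server with a constant service time, driven by a time-ordered event queue. It is written in Python for brevity and is not the Java-based simulator discussed in the course; the function name `simulate_fifo_server` and the single-server model are assumptions of this example.

```python
import heapq
from collections import deque
from itertools import count


def simulate_fifo_server(arrival_times, service_time):
    """Simulate one FIFO server; return {job index: departure time}.

    The simulation clock jumps from event to event. Events are stored in a
    heap ordered by time, with a sequence number to break ties.
    """
    seq = count()
    events = []
    for job, t in enumerate(arrival_times):
        heapq.heappush(events, (t, next(seq), "arrival", job))

    waiting = deque()  # jobs queued while the server is busy
    busy = False
    departures = {}

    while events:
        now, _, kind, job = heapq.heappop(events)
        if kind == "arrival":
            if busy:
                waiting.append(job)
            else:
                busy = True
                heapq.heappush(
                    events, (now + service_time, next(seq), "departure", job))
        else:  # departure: record it and start the next waiting job, if any
            departures[job] = now
            if waiting:
                nxt = waiting.popleft()
                heapq.heappush(
                    events, (now + service_time, next(seq), "departure", nxt))
            else:
                busy = False
    return departures


if __name__ == "__main__":
    print(simulate_fifo_server([0.0, 1.0, 2.0], service_time=1.5))
```

In a predictive process mining setting, parameters such as arrival times and service times would be estimated from an event log rather than fixed by hand as they are here.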

Background: Basic skills in software development and knowledge of at least one object-oriented programming language (Java recommended).

Test: Practical exercise assigned at the end of the course.

Number of Hours: 12

Mode of Delivery:   in-class mode and online

Scheduling:  June or September 2024

Statistical Methods for Economics and Health Sciences

Title: Quantile regression

Lecturer: Prof. Alessio Farcomeni (alessio.farcomeni@uniroma2.it)

Abstract: The main techniques of quantile regression, an alternative to classical linear regression, will be introduced. As an example, consider a regression model in which we estimate the association between the Equivalised Disposable Income of a sample of households and various predictors, including an exogenous treatment. Using quantile regression, it is possible to estimate the effect of the treatment on the entire distribution of household income, resulting in a potentially different estimated effect at each quantile. Indeed, the treatment effect could be positive for the income of rich households (high quantiles) and negative for the income of poor households (low quantiles). Similarly, the association of predictors with median income can be evaluated, avoiding the need to assume that the response is Gaussian (symmetric, homoscedastic) and that there are no outliers. If time permits, principles of robust statistics will also be discussed, including linear regression techniques and robust prediction.

Background: Use of the R software. Undergraduate Courses in Statistical Inference and Linear Models
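
Quantile regression replaces the squared-error loss of least squares with the check (pinball) loss. The sketch below illustrates this in the simplest case, a model with an intercept only, where the minimiser of the average check loss is an empirical quantile. It is written in Python with no external libraries; the function names and the toy data are assumptions of this example, and real analyses would use a dedicated package.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def empirical_risk(q, ys, tau):
    """Average check loss of predicting the constant q for every observation."""
    return sum(pinball_loss(y - q, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Intercept-only quantile regression.

    The risk is convex and piecewise linear in q, so a minimiser is always
    attained at one of the observations; it is enough to search over them.
    """
    return min(sorted(ys), key=lambda q: empirical_risk(q, ys, tau))


if __name__ == "__main__":
    incomes = [1, 2, 3, 4, 100]  # toy data with one outlier
    print("mean        :", sum(incomes) / len(incomes))
    print("tau = 0.5   :", best_constant(incomes, 0.5))
    print("tau = 0.9   :", best_constant(incomes, 0.9))
```

With tau = 0.5 the fit is the median, which is barely affected by the outlier, whereas the mean is pulled far towards it. Changing tau targets other parts of the distribution.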

Tests:  Projects

Number of Hours: 12

Mode of Delivery: only in-class mode 

Time Scheduling:    March, April, or September

Natural Sciences and Complex Systems

Title: Hands on Machine Learning for Physics

Lecturer: Michele Buzzicotti (m.buzzicotti@gmail.com)

Abstract: The course aims to deepen the concepts, techniques and tools needed to build the Machine Learning algorithms most commonly used in Physics. The target audience is PhD students who want to learn how to program Machine Learning (ML) codes for the data analysis of physics problems. During the course, special emphasis will be given to how generative algorithms work, such as Variational Auto-Encoders and Generative Adversarial Networks. The course starts with a brief theoretical reminder of the problem and then continues with the implementation of an ML algorithm in all its phases, from the construction of the dataset to the validation of the results.

Tests: Projects

Background: Undergraduate Courses in Linear Algebra, Mathematical Analysis, and Programming in Python

Time Scheduling: April – May 2024

Mode of Delivery: only in-class mode

Number of hours: 14

 

Culture, Art and Society

The pillar will propose seminars as well as specific events as part of the teaching program. They will be announced on this site as soon as possible.