 |
Risorsa bibliografica obbligatoria |
 |
Risorsa bibliografica facoltativa |
|
|
Anno Accademico
|
2025/2026
|
|
Scuola
|
Scuola di Ingegneria Industriale e dell'Informazione |
|
Insegnamento
|
052512 - BIOINFORMATICS ALGORITHMS
|
| Cfu |
5.00
|
Tipo insegnamento
|
Monodisciplinare
|
|
Docenti: Titolare (Co-titolari)
|
Piro Rosario Michael
|
| Corso di Studi |
Codice Piano di Studio preventivamente approvato |
Da (compreso) |
A (escluso) |
Insegnamento |
| Ing Ind - Inf (1 liv.)(ord. 270) - MI (358) INGEGNERIA INFORMATICA | * | A | ZZZZ | 052512 - BIOINFORMATICS ALGORITHMS | | Ing Ind - Inf (1 liv.)(ord. 270) - MI (365) INGEGNERIA MATEMATICA | * | A | ZZZZ | 052512 - BIOINFORMATICS ALGORITHMS | | Ing Ind - Inf (1 liv.)(ord. 96/23) - MI (531) INGEGNERIA INFORMATICA | * | A | ZZZZ | 052512 - BIOINFORMATICS ALGORITHMS | | Ing Ind - Inf (1 liv.)(ord. 96/23) - MI (532) INGEGNERIA MATEMATICA | * | A | ZZZZ | 052512 - BIOINFORMATICS ALGORITHMS |
| Obiettivi dell'insegnamento |
|
Bioinformatics represents an application domain of increasing interest and importance for computer science. This discipline arose from the increasing need to develop appropriate computational methods for different problems of molecular biology and genetics, including the analysis of biological sequences (DNA, RNA, proteins). More specifically, recent next-generation sequencing technologies allow to generate massive amounts of biological sequence data that pose complex problems and require specific algorithms and computational analyses for their efficient and effective processing.
The goal of the course is to introduce students to some of the most important algorithms (and their underlying concepts) used in computational biology/bioinformatics, especially regarding those used for aligning biological sequences and identifying their locations in the reference genome. Also the assembly of a genome from short sequences, the modeling of functional biological sequences and the identification of mutations in a sequenced genome will be discussed. Each topic is treated both theoretically and practically. By discussing also alternative approaches for several of the algorithms, including their advantages and diadvantages, the students can learn how effective and efficient algorithms can be designed and when approximate solutions are preferable.
|
| Risultati di apprendimento attesi |
|
Dublin Descriptors
|
Expected learning outcomes
|
|
Knowledge and understanding (DD1)
|
Students will learn how to:
- Align two or more sequences to each other (global and local sequence alignment, multiple sequence alignment)
- Align short sequences produced by modern sequencing technology to a reference genome (read mapping)
- Assemble a genome from short sequences produced by modern sequencing technology (genome assembly)
- Model functional biological sequences by means of Hidden Markov Models
- Identify differences in a sequenced genome with respect to a reference genome (variant/mutation calling)
|
|
Applying knowledge and understanding (DD2)
|
Given specific biological sequences (e.g., DNA sequences such as next-generation sequencing reads, reference genomes) students will be able to:
- Perform global and local sequences alignments
- Apply Burrows-Wheeler transformations on genome sequences and use the transforms to perform exact and inexact matching of short sequences
- Understand the complexity of sequence analysis algorithms and discard inefficient algorithms
- Understand how functional sequences can be modeled by statistical means using a Hidden Markov Model
|
|
Making judgements (DD3)
|
Given the discussion of multiple alternative approaches and underlying concepts, students will learn to:
- Evaluate the appropriateness of an algorithm based on its complexity as well as the complexity of the task
- Appreciate the importance of algorithms that approximate an optimal solution
|
|
Communication (DD4)
|
(not applicable)
|
|
Lifelong learning skills (DD5)
|
Given the discussion of multiple alternative approaches and underlying concepts, students can:
- Learn to approach the same algorithmic problem in different ways and from different (sometimes opposing) starting points
- Learn to what aspects pay attention when studying a specific approach (e.g., time and space complexity, inexact versus exact solutions)
|
|
|
Detailed course program:
1. (Very brief) Introduction to the biological background
-
Cells, biological molecules (DNA, RNA, proteins) and cellular processes
-
DNA sequencing techniques
2. Sequence alignment
-
Defining and measuring distances and similarities between sequences
-
Pairwise alignment of sequences based on similarity measures
-
Dynamic programming for constructing global and local sequence alignments (Needleman-Wunsch and Smith-Waterman algorithms)
-
Longest common substring problem
-
Multiple sequence alignment (more than two sequences)
-
Approximate versus optimal solution
-
Center star algorithm
-
Tree-based, progressive multiple sequence alignment
-
Weighted dynamic programming based on alignment/sequence profiles
-
Local multiple sequence alignment and motif scoring
-
Computational complexity of sequence alignment algorithms
3. Read alignment of next-generation sequencing reads
-
Suffix trie, suffix tree and suffix array
-
Burrows-Wheeler transform
-
FM index for searching substrings
-
Exact matching and inexact matching of sequencing reads
-
Computational complexity of the different approaches
4. De-novo genome assembly from next-generation sequencing reads
-
Modeling sequence data and computing expected results
-
Graph-based genome assembly (de Bruijn graphs, Eulerian path problem and Hierholzer’s algorithm)
5. Notions of variant calling
-
Single nucleotide variants/mutations
-
Copy number variants (deletions and duplications)
-
Germline variants in hereditary disorders versus somatic variants in cancer
6. Introduction to Hidden Markov Models for sequence modeling
-
Markov chains and Hidden Markov Model (HMMs)
-
Probability of a sequence being generated by an HMM
-
Viterbi decoding and posterior decoding
-
HMM training: learning model parameters
|
| Obiettivi di sviluppo sostenibile - SDGs |
Questo insegnamento contribuisce al raggiungimento dei seguenti Obiettivi di Sviluppo Sostenibile dell'Agenda ONU 2030:
- SDG2 - ZERO HUNGER
- SDG3 - GOOD HEALTH AND WELLBEING
|
|
The discussed algorithms are mainly used in basic biomedical research (SDG3 targets 3.3: fighting communicable diseases; SDG3 target 3.4: reducing mortality from non-communicable diseases) but can also be used for agricultural research, e.g., for the identification of the genetic basis of heat resistance of plants (SDG2 target 2.4: strengthening the capacity for adaptation to climate change, extreme weather, drought, flooding and other disasters).
|
|
Basic knowledge of algorithms and algorithmic complexity
|
|
The assessment will be based on a written exam to be taken in the exam sessions defined by the school and covering all aspects in the syllabus. The exam will assign up to 60 points (+ 6 possible bonus points). 30 cum laude will be assigned when the total score exceeds 64 points.
The following table provides a detailed overview of the elements that will be considered in the written exam.
|
Type of assessment
|
Description
|
Dublin descriptor
|
|
Written test
|
Solution of algorithmic and mathematical problems:
- Computation of sequence alignments (global and local)
- Computation of Burrows-Wheeler transforms and reconstruction of original string from a Burrows-Wheeler transform
- Computation of requirements for genome assembly
- Computation of alignment profiles
- Application of Hidden Markov Models
Exercises focusing on graph-based approaches
- Construction of suffix tries and suffix trees
- Construction of de Bruijn graphs
- Evaluation of Eulerian paths
Descriptive exercises focusing on conceptual aspects:
- Formulation or description of algorithms and/or their requirements
- Basic notions of the biological background on DNA sequences
- Adaptation of learned algorithms to particular problems
|
DD1, DD2
DD1, DD2, DD3
DD1, DD2, DD3, DD4, DD5
|
|
Manuela Helmer Citterich, Fabrizio Ferrè, Giulio Pavesi, Graziano Pesole, Chiara Romualdi, Fondamenti di Bioinformatica, Editore: Zanichelli, Anno edizione: 2018
Dan Gusfield, Algorithms on strings, trees, and sequences, Editore: Cambridge University Press, Anno edizione: 1997
|
| Nessun software richiesto |
| Forma Didattica |
Ore Didattica Assistita (hh:mm) |
% Didattica Assistita |
|
DIDATTICA TRASMISSIVA/FRONTALE
|
32:00
|
65.3 %
|
|
DIDATTICA INTERATTIVA/PARTECIPATIVA
|
17:00
|
34.7 %
|
|
DIDATTICA VALUTATIVA
|
0:00
|
0.0 %
|
|
DIDATTICA LABORATORIALE
|
0:00
|
0.0 %
|
|
DIDATTICA PROGETTUALE
|
0:00
|
0.0 %
|
|
Totale ore didattica assistita (hh:mm)
|
49:00 |
|
Totale ore di studio autonomo (hh:mm)
|
76:00 |
| Informazioni in lingua inglese a supporto dell'internazionalizzazione |
Insegnamento erogato in lingua

Inglese
|
|
Disponibilità di materiale didattico/slides in lingua inglese
|
|
Disponibilità di libri di testo/bibliografia in lingua inglese
|
|
Possibilità di sostenere l'esame in lingua inglese
|
|