*Training Material: Machine Learning and Statistical Modeling for Multi-Omics*

January 2022

Coordinated by the University of Turku and attended by members of the FindingPheno consortium and close research collaborators. The target audience included advanced students and applied researchers who wanted to develop their skills in multi-omics analysis.

Learning Objectives

The workshop offers an overview of analytical tools for multi-omics studies in R. A particular focus is on multi-omics tools and techniques required to process microbial community data in combination with other omics. After the workshop, participants should be able to (i) preprocess and manipulate data, (ii) perform simple visualizations and statistical analyses, (iii) apply unsupervised and supervised machine learning, and (iv) produce robust and reproducible results.

Day 1 - Lectures

1. Welcome and Introduction - Leo Lahti, Associate Professor (UTU)

2. Introduction to Metagenomics - Katariina Pärnänen, Postdoctoral Researcher (UTU)

3. Completing the picture of microbiome study through the lens of Metabolomics - Pande Putu Erawijantari, Postdoctoral Researcher (UTU)

4. Introduction to Multi-omics - Leo Lahti, Associate Professor (UTU)

Day 1 - Practical Exercises

Led by: Tuomas Borman and Chouaib Benchraka, Research Assistants (UTU)

Topics:
(a) Preparation: Instructions for installing R, RStudio and Rtools, and for installing and loading the required packages.

(b) Data import and structure: Data are stored in a MultiAssayExperiment (MAE) container, which provides an organized way to bind several different data structures together in a single object. This part includes instructions for importing a practice data set in this format.

(c) Microbiome data exploration: Investigate how the taxonomic profiling data is organized in R,
including aggregation and transformation.
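The aggregation and transformation steps can be illustrated outside R. The workshop itself works in R; the following is only a minimal, language-agnostic sketch in Python, not the API of any R package. It assumes a toy count table (taxa as rows, samples as columns) and a hypothetical feature-to-genus mapping, sums counts within each genus, converts them to relative abundances, and applies a centred log-ratio (CLR) transform with a pseudocount of 1, one common choice for compositional data.

```python
import math
from collections import defaultdict

# Hypothetical toy count table: each key is a taxon (feature),
# each list holds its counts in samples S1, S2, S3.
counts = {
    "ASV1": [10, 0, 5],
    "ASV2": [30, 20, 5],
    "ASV3": [60, 80, 90],
}
# Hypothetical mapping from each feature to a higher rank (genus).
genus_of = {"ASV1": "Bacteroides", "ASV2": "Bacteroides", "ASV3": "Faecalibacterium"}


def aggregate(table, mapping):
    """Sum feature counts within each higher-rank group (e.g. genus)."""
    n_samples = len(next(iter(table.values())))
    agg = defaultdict(lambda: [0] * n_samples)
    for feature, row in table.items():
        group = agg[mapping[feature]]
        for j, value in enumerate(row):
            group[j] += value
    return dict(agg)


def relative_abundance(table):
    """Divide each count by its sample (column) total; assumes no sample is all zeros."""
    totals = [sum(col) for col in zip(*table.values())]
    return {name: [v / t for v, t in zip(row, totals)] for name, row in table.items()}


def clr(table, pseudocount=1.0):
    """Centred log-ratio per sample: log(count + pseudocount) minus the sample's mean log."""
    names = list(table)
    out = {n: [] for n in names}
    for col in zip(*(table[n] for n in names)):
        logs = [math.log(v + pseudocount) for v in col]
        mean_log = sum(logs) / len(logs)
        for n, lv in zip(names, logs):
            out[n].append(lv - mean_log)
    return out


genus_counts = aggregate(counts, genus_of)
print(genus_counts)
print(relative_abundance(genus_counts))
print(clr(genus_counts))
```

Both transformations operate per sample: relative abundances within a sample sum to 1, and CLR values within a sample sum to 0.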

(d) Visualization: Data can be plotted with the miaViz package to give a visual overview of the information.

(e) Beta diversity: Beta diversity measures the dissimilarity between samples by quantifying differences in their overall taxonomic composition. This section covers methods for measuring and visualizing beta diversity in the data.
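Bray-Curtis dissimilarity is one widely used beta diversity measure for count data; the exercise may use it or other measures. As a minimal Python sketch (not the R code used in the workshop), assuming hypothetical genus-level counts with taxa in the same order for every sample, the pairwise dissimilarity matrix can be computed as follows.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 when no taxa are shared.
    Assumes the two samples are not both all zeros."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities, keyed by (sample, sample)."""
    names = list(samples)
    return {(p, q): bray_curtis(samples[p], samples[q]) for p in names for q in names}


# Hypothetical counts for two genera in three samples.
samples = {
    "S1": [40, 60],
    "S2": [20, 80],
    "S3": [10, 90],
}
dist = dissimilarity_matrix(samples)
print(round(dist[("S1", "S2")], 3), round(dist[("S1", "S3")], 3), round(dist[("S2", "S3")], 3))
# prints: 0.2 0.3 0.1
```

A matrix like this is typically visualized with an ordination method such as principal coordinates analysis (PCoA), which places samples with similar composition close together.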

Day 2 - Lectures

1. Unsupervised and Supervised Machine Learning - Matti Ruuskanen, Postdoctoral Researcher (UTU)

2. Introduction to Individual-based Modelling: Spatio-temporal models - Gergely Boza, Research Fellow (CER)

3. Data Integration Methods - Leo Lahti, Associate Professor (UTU)

Day 2 - Practical Exercises

Led by: Tuomas Borman, Matti Ruuskanen and Chouaib Benchraka (UTU)

Topics:
(a) Cross-correlation analysis: This analyzes associations between variables, for example whether a higher abundance of a specific taxon is associated with higher levels of a biomolecule.
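A cross-correlation analysis computes a correlation for every pair of variables from two data sets. The sketch below is a minimal Python illustration, not the workshop's R implementation. It assumes Spearman rank correlation (one common choice because it does not assume a linear relationship) and uses hypothetical taxon abundances and butyrate levels measured in the same four samples.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cross_correlation(taxa, metabolites):
    """Correlate every taxon with every metabolite across the same samples."""
    return {
        (t, m): spearman(tx, mv)
        for t, tx in taxa.items()
        for m, mv in metabolites.items()
    }


# Hypothetical relative abundances and metabolite levels in four samples.
taxa = {
    "Faecalibacterium": [0.6, 0.8, 0.9, 0.3],
    "Bacteroides": [0.4, 0.2, 0.1, 0.7],
}
metabolites = {"butyrate": [12.0, 15.5, 20.1, 8.2]}

for pair, rho in cross_correlation(taxa, metabolites).items():
    print(pair, round(rho, 3))
```

A correlation indicates an association, not a causal effect. With many taxon-metabolite pairs, a real analysis would also test each correlation and adjust the p-values for multiple testing.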

(b) Unsupervised machine learning: Unsupervised learning looks for structure in unlabelled data. Examples covered here are biclustering, a method that clusters rows and columns simultaneously, and MOFA (Multi-Omics Factor Analysis), a factor analysis model that provides a general framework for integrating multi-omic data sets in an unsupervised fashion.

(c) Supervised machine learning: These models learn a function to predict values of the dependent
variable based on labeled data. Examples given here use random forests and the caret package
to train regression and classification models to predict butyrate concentration based on
microbiome composition.
