Machine Learning and Statistical Modeling for Multi-Omics

January 2022

Coordinated by the University of Turku and attended by members of the FindingPheno project plus close collaborators. The target audience is advanced students and applied researchers who wish to develop their skills in multi-omics analysis.

This workshop provides an overview of analytical tools for multi-omics studies in R. It focuses in particular on the tools and techniques needed to process microbial community data together with other omics data. After the workshop, participants should be able to preprocess and manipulate data, perform simple visualizations and statistical analyses, apply unsupervised and supervised machine learning, and produce robust and reproducible results.

Learning Objectives:

Day 1 Lectures

Welcome and introduction - Leo Lahti, Associate professor (UTU)

   Download ppt

Metagenomics - Katariina Pärnänen, Postdoctoral researcher (UTU)

   Download ppt

Metabolomics - Pande Putu Erawijantari, Postdoctoral researcher (UTU)

   Download ppt

Multi-omics - Leo Lahti, Associate professor (UTU)

   Download ppt

Day 1 Practical Exercises

Led by: Tuomas Borman and Chouaib Benchraka, Research assistants (UTU)


  • Preparation: Instructions on how to install R, RStudio and R tools, and how to install and load the required packages.

  • Data import and structure: Data is stored in the MultiAssayExperiment (MAE) container, which binds several different data structures together in a single, organized object. This section explains how to import a practice dataset in this format.

  • Microbiome data exploration: Investigate how the taxonomic profiling data is organized in R, including aggregation and transformation.

  • Visualization: Data can be plotted with the miaViz package to give a visual overview of the information.

  • Beta diversity: This measures the dissimilarity between samples by quantifying differences in the overall taxonomic composition between them. This section gives methods for measuring and visualizing the beta diversity present within the data.
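The Day 1 steps of aggregation, transformation and beta diversity can be illustrated on a toy count table. The workshop itself works in R with MultiAssayExperiment and miaViz; the Python sketch below is not the workshop code. It assumes counts stored as plain dictionaries, aggregation by summing features that share a taxon at the chosen rank, conversion to relative abundances, and the Bray-Curtis dissimilarity, BC(a, b) = sum|a_i - b_i| / sum(a_i + b_i), as the beta diversity measure. The source does not say which dissimilarity the exercises use, and all function, feature and sample names here are hypothetical.

```python
"""Illustrative sketch of three Day 1 steps on a toy count table:
aggregation to a higher taxonomic rank, relative-abundance transformation,
and Bray-Curtis beta diversity. Not the workshop's R code; names are hypothetical.
"""
from collections import defaultdict


def aggregate_by_rank(counts, feature_to_taxon):
    """Sum counts of features that share a taxon at the chosen rank.

    counts: {sample: {feature: count}}
    feature_to_taxon: {feature: taxon label, e.g. genus}
    """
    aggregated = {}
    for sample, features in counts.items():
        totals = defaultdict(int)
        for feature, n in features.items():
            totals[feature_to_taxon[feature]] += n
        aggregated[sample] = dict(totals)
    return aggregated


def relative_abundance(counts):
    """Divide each count by its sample total (empty samples stay at zero)."""
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        result[sample] = {t: (n / total if total else 0.0) for t, n in taxa.items()}
    return result


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    Taxa missing from one profile are treated as zero abundance.
    Returns 0.0 when both profiles are empty.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(profiles):
    """Pairwise Bray-Curtis values keyed by (sample, sample) tuples."""
    samples = sorted(profiles)
    return {(s1, s2): bray_curtis(profiles[s1], profiles[s2])
            for s1 in samples for s2 in samples}


if __name__ == "__main__":
    # Hypothetical toy data: two samples, three features (e.g. ASVs).
    counts = {
        "S1": {"ASV1": 10, "ASV2": 30, "ASV3": 60},
        "S2": {"ASV1": 50, "ASV2": 50, "ASV3": 0},
    }
    taxonomy = {"ASV1": "Bacteroides", "ASV2": "Bacteroides",
                "ASV3": "Faecalibacterium"}

    genus_counts = aggregate_by_rank(counts, taxonomy)  # sum ASVs per genus
    profiles = relative_abundance(genus_counts)         # S1: 0.4/0.6, S2: 1.0/0.0
    matrix = dissimilarity_matrix(profiles)
    print(round(matrix[("S1", "S2")], 3))  # prints 0.6
```

Plain dictionaries keep the sketch free of third-party dependencies. A pairwise dissimilarity matrix like the one returned by `dissimilarity_matrix` is the kind of input an ordination such as principal coordinates analysis takes when beta diversity is visualized.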

Day 2 Lectures

Unsupervised and supervised machine learning - Matti Ruuskanen, Postdoctoral researcher (UTU)

   Download ppt

Individual-based modeling - Gergely Boza, Research fellow (CER)

   Download ppt

Data integration - Leo Lahti, Associate professor (UTU)

   Download ppt

Day 2 Practical Exercises

Led by: Tuomas Borman, Matti Ruuskanen and Chouaib Benchraka (UTU)



  • Unsupervised machine learning: Unsupervised learning looks for structure in unlabeled data. The examples given here are biclustering, a method that clusters both rows and columns simultaneously, and MOFA (Multi-Omics Factor Analysis), a factor analysis model that provides a general framework for integrating multi-omic data sets in an unsupervised fashion.


  • Supervised machine learning: These models learn, from labeled data, a function that predicts the value of a dependent variable. The examples given here use random forests and the caret package to train regression and classification models that predict butyrate concentration from microbiome composition.
