Machine Learning and Statistical Modeling for Multi-Omics
January 2022
Coordinated by the University of Turku and attended by members of the FindingPheno project and their close collaborators. The target audience is advanced students and applied researchers who wish to develop their skills in multi-omics analysis.
This workshop provides an overview of analytical tools for multi-omics studies in R, with a particular focus on the tools and techniques required to process microbial community data in combination with other omics.
Learning Objectives: After the workshop, participants should be able to preprocess and manipulate data, perform simple visualizations and statistical analyses, apply unsupervised and supervised machine learning, and produce robust and reproducible results.
Day 1 Lectures
Welcome and introduction - Leo Lahti, Associate professor (UTU)
Metagenomics - Katariina Pärnänen, Postdoctoral researcher (UTU)
Metabolomics - Pande Putu Erawijantari, Postdoctoral researcher (UTU)
Multi-omics - Leo Lahti, Associate professor (UTU)
Day 1 Practical Exercises
Led by: Tuomas Borman and Chouaib Benchraka, Research assistants (UTU)
Topics:
- Preparation: Instructions for installing R, RStudio and Rtools, and for installing and loading the required packages.
- Data import and structure: Data is stored in the MultiAssayExperiment (MAE) container, which provides an organized way to bind several different data structures together in a single object. This section shows how to import a practice data set in this format.
- Microbiome data exploration: Investigate how the taxonomic profiling data is organized in R, including aggregation and transformation.
- Visualization: Data can be plotted with miaViz to give a visual overview of the information.
- Beta diversity: Beta diversity measures the dissimilarity between samples by quantifying differences in their overall taxonomic composition. This section covers methods for calculating and visualizing the beta diversity present in the data.
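To make the beta diversity topic concrete, here is a minimal sketch of one widely used dissimilarity index, Bray-Curtis. The workshop exercises themselves are in R and may use a different index or package; this plain-Python version only illustrates the arithmetic. The sample names and counts are hypothetical, not workshop data.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    0 means identical composition, 1 means no shared taxa.
    """
    if len(a) != len(b):
        raise ValueError("both samples must cover the same taxa")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("dissimilarity is undefined for two empty samples")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def dissimilarity_matrix(samples):
    """Pairwise dissimilarities for a dict of sample name -> abundance vector."""
    names = sorted(samples)
    return {(i, j): bray_curtis(samples[i], samples[j])
            for i in names for j in names}


# Hypothetical counts for three samples over four taxa (illustration only).
samples = {
    "S1": [10, 0, 5, 5],
    "S2": [10, 0, 5, 5],
    "S3": [0, 20, 0, 0],
}

matrix = dissimilarity_matrix(samples)
for pair, value in sorted(matrix.items()):
    print(pair, round(value, 3))
```

A matrix like this is the usual input to an ordination method, which projects the samples into two dimensions for plotting.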
Day 2 Lectures
Unsupervised and supervised machine learning - Matti Ruuskanen, Postdoctoral researcher (UTU)
Individual-based modeling - Gergely Boza, Research fellow (CER)
Data integration - Leo Lahti, Associate professor (UTU)
Day 2 Practical Exercises
Led by: Tuomas Borman, Matti Ruuskanen and Chouaib Benchraka (UTU)
Topics:
- Cross-correlation analysis: This allows us to analyze associations between variables from different data sets, e.g. whether a higher abundance of a specific taxon is associated with higher levels of a biomolecule.
- Unsupervised machine learning: Unsupervised learning looks for structure in unlabelled data. The examples given here are biclustering, a method that clusters rows and columns simultaneously, and MOFA, a factor analysis model that provides a general framework for integrating multi-omic data sets in an unsupervised fashion.
- Supervised machine learning: Supervised models learn a function from labelled data to predict values of the dependent variable. The examples given here use random forests and the caret package to train regression and classification models that predict butyrate concentration from microbiome composition.
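The cross-correlation topic above can be illustrated with a short sketch that correlates every taxon with every metabolite. It assumes Spearman rank correlation, a common choice for skewed abundance data; the workshop's R material may use a different method. The code is plain Python for illustration only, and all variable names and values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measurements across five samples (illustration only).
taxa = {"taxon_A": [1, 4, 9, 16, 25], "taxon_B": [5, 4, 3, 2, 1]}
metabolites = {"butyrate": [2.0, 3.0, 5.0, 8.0, 13.0]}

# One coefficient per taxon-metabolite pair.
cross_cor = {(t, m): spearman(tv, mv)
             for t, tv in taxa.items()
             for m, mv in metabolites.items()}

for pair, rho in cross_cor.items():
    print(pair, round(rho, 3))
```

In a real analysis the resulting table is usually shown as a heatmap, and p-values are corrected for multiple testing because the number of taxon-metabolite pairs grows quickly.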