Introduction to the FindingPheno research project

Updated: Oct 26, 2022

What actually is FindingPheno?

FindingPheno is a Research and Innovation Action (RIA) bringing together eight academic and industrial partners from five countries. RIAs are a specific type of international research collaboration focused on establishing new knowledge, where the partners work together to explore new ideas, technologies, or methods that could eventually become a marketable product. Our funding came from the European Union via Horizon 2020 (H2020), part of the European Framework Programme for Research and Innovation. This framework comprises a succession of Work Programmes offering significant public funding for cutting-edge research and new product development, with the aim of promoting scientific excellence and industrial leadership while developing solutions to societal problems. H2020 was the eighth of nine such Work Programmes, awarding funding between 2014 and 2020, and all our project activities and outputs are regulated under this programme.

OK, but really, what is FindingPheno?

FindingPheno is a group of data scientists, theoretical biologists and industrial researchers from different organisations working together to find new ways of analysing biological data. We focus on omics data, i.e. high-throughput biochemical measurements of all of the molecules of one type (DNA, mRNA, proteins, metabolites, etc.) in a biological sample, using existing data sets sourced from public repositories. Because of this focus on reuse rather than new data generation, our project does not include any wet lab or fieldwork activities, allowing us to concentrate fully on the computational aspects of this task.

Graphs showing publication numbers going up over time
Fig 1: Articles in PubMed mentioning one omics data type vs those mentioning at least three types together ('multi-omics'). Taken from Noor et al., 2019 and Krassowski et al., 2020.

Why do we need FindingPheno?

Ever since the completion of the Human Genome Project in the early 2000s, the generation of biological data has exploded, with upward trends in new technology, data volume, and data complexity forming an overwhelming tsunami of biological information. Omics data, in particular, continues to accumulate, with increasing rates of publication (Fig 1) and data creation (Fig 2) in this area. Increasingly, these data sets contain information from more than one type of molecule within the same sample, known as multi-omics data, and may even include matching data for both a host organism (e.g. a plant or animal) and its associated microbiome, known as hologenomic data.

Genomics has nearly as many commits or repositories as astronomy (the largest data generating discipline) and is growing over time.
Fig 2: GitHub commits and repositories for different data science disciplines, with genomics shown in red. Taken from Navarro et al., 2019.