FindingPheno

Name (URL) | Acronym | Description | Reference

Kyoto Encyclopedia of Genes and Genomes

https://www.genome.jp/kegg/kegg2.html

KEGG2

A collection of database resources for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

KEGG Organisms

Collections of genes and proteins in complete genomes of cellular organisms, generated from publicly available resources (mostly NCBI RefSeq and GenBank) and annotated by KEGG in the form of KO (KEGG Orthology) assignments.

GO

The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research.

Ensembl

Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data.

MGnify

MGnify offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples. Users can submit their own data for analysis or freely browse all of the analysed public datasets held within the repository. In addition, users can request the assembly and analysis of any appropriate dataset within the European Nucleotide Archive (ENA).

The Genome Taxonomy Database

https://gtdb.ecogenomic.org/

GTDB

The Genome Taxonomy Database provides a curated, phylogenetically consistent and rank-normalized database of microbial genomes. It is sourced from the NCBI Assembly database and receives regular updates.

Genomic Evolutionary Rate Profiling

https://bio.tools/gerp

GERP

GERP++ is a tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements.
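The two stages described for GERP++ (score each alignment column, then group high-scoring columns into constrained elements) can be illustrated with a toy example. This is a simplified sketch, not GERP++ itself. It assumes each column's score is just the expected neutral substitution rate minus the observed rate, standing in for GERP++'s maximum-likelihood "rejected substitutions" score. It also uses the classic maximum-scoring-segment dynamic program (Kadane's algorithm) in place of GERP++'s own element-calling procedure. All rates below are hypothetical.

```python
from typing import List, Tuple


def rs_scores(expected: List[float], observed: List[float]) -> List[float]:
    """Simplified per-column score: expected neutral rate minus observed rate.
    Positive values mean fewer substitutions than expected (constraint)."""
    return [e - o for e, o in zip(expected, observed)]


def max_scoring_segment(scores: List[float]) -> Tuple[int, int, float]:
    """Kadane's dynamic program: the contiguous run of columns with the
    largest summed score. Returns (start, end_inclusive, total)."""
    if not scores:
        raise ValueError("need at least one column")
    best = (0, 0, scores[0])
    run_start, run_sum = 0, 0.0
    for i, s in enumerate(scores):
        # Restart the run whenever the running sum can no longer help.
        if run_sum <= 0:
            run_start, run_sum = i, s
        else:
            run_sum += s
        if run_sum > best[2]:
            best = (run_start, i, run_sum)
    return best


# Hypothetical 8-column alignment with a neutral expectation of 1.0 per column.
expected = [1.0] * 8
observed = [1.2, 0.9, 0.1, 0.0, 0.2, 1.5, 1.1, 0.8]
start, end, total = max_scoring_segment(rs_scores(expected, observed))
print(start, end, round(total, 2))  # 1 4 2.8
```

Columns 2 to 4, where almost no substitutions are observed, dominate the selected segment. A real element caller would also need a minimum score or length so that chance fluctuations are not reported as constrained elements.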

Online Mendelian Inheritance in Animals

https://omia.org/home/

OMIA

A catalogue/compendium of inherited disorders, other (single-locus) traits, and associated genes and variants in 346 animal species (other than humans, mice, rats and zebrafish, which have their own resources), co-authored by Professor Frank Nicholas and Associate Professor Imke Tammen of the University of Sydney, Australia, with help from many people over the years. OMIA information is stored in a database that contains textual information and references, as well as links to relevant PubMed and Gene records at the NCBI, and to OMIM and Ensembl.

The NHGRI-EBI GWAS Catalog

https://www.ebi.ac.uk/gwas/

GWAS Catalog

A high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies.

Functional Annotation of ANimal Genomes project

https://www.faang.org/

FAANG

FAANG is the Functional Annotation of ANimal Genomes project. We are working to understand the genotype to phenotype link in domesticated animals.

InterPro

InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium. We combine protein signatures from these member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool.

Genome Properties

Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome. Properties (which often describe pathways) are composed of steps, with each step defining a protein required for the function of the pathway/property. Genome properties use protein signatures as evidence to determine the presence of each step within a property.
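The step-based logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Genome Properties implementation. It assumes a property is called YES when every required step has at least one supporting signature, NO when none do, and PARTIAL otherwise. The real system has additional rules, such as thresholds and non-required steps. The pathway and the `SIG_*` accessions are hypothetical placeholders, not real InterPro identifiers.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Step:
    name: str
    evidence: FrozenSet[str]  # signature accessions that can satisfy this step
    required: bool = True


def assign_property(steps: List[Step], genome_signatures: Set[str]) -> str:
    """Simplified call: YES if every required step has evidence in the
    genome, NO if none do, PARTIAL otherwise. Non-required steps are ignored."""
    required = [s for s in steps if s.required]
    present = sum(1 for s in required if s.evidence & genome_signatures)
    if present == len(required):
        return "YES"
    if present == 0:
        return "NO"
    return "PARTIAL"


# Hypothetical three-step pathway; "SIG_*" are placeholder accessions.
pathway = [
    Step("step 1", frozenset({"SIG_A1", "SIG_A2"})),  # either signature suffices
    Step("step 2", frozenset({"SIG_B"})),
    Step("step 3", frozenset({"SIG_C"})),
    Step("optional step", frozenset({"SIG_X"}), required=False),
]
print(assign_property(pathway, {"SIG_A2", "SIG_B", "SIG_C"}))  # YES
print(assign_property(pathway, {"SIG_A1"}))                    # PARTIAL
print(assign_property(pathway, {"SIG_X"}))                     # NO
```

Representing each step as a set of alternative signatures captures the idea that different protein families can fill the same pathway role.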

Reactome

Reactome is a free, open-source, curated and peer-reviewed pathway database.

Interactive Pathways Explorer

https://pathways.embl.de/

iPath

A web-based tool for the visualization, analysis and customization of various pathway maps.

MetaCyc metabolic pathway database

https://metacyc.org/

MetaCyc

MetaCyc is a curated database of experimentally elucidated metabolic pathways from all domains of life. MetaCyc contains pathways involved in both primary and secondary metabolism, as well as associated metabolites, reactions, enzymes, and genes. The goal of MetaCyc is to catalog the universe of metabolism by storing a representative sample of each experimentally elucidated pathway.

MSigDB

A collection of annotated gene sets including genes grouped by their location in the human genome, canonical pathways and experimental signatures curated from publications, genes sharing cis-regulatory motifs up- or downstream of their coding sequences, clusters of genes co-expressed in microarray compendia, genes grouped according to gene ontology (GO) categories, signatures of oncogenic pathway activation, and a large collection of immunological conditions. All of the gene sets in MSigDB are manually reviewed, curated, and annotated.

STRING

STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases.

IntAct Molecular Interaction Database

https://www.ebi.ac.uk/intact/home

IntAct

IntAct provides a free, open source database system and analysis tools for molecular interaction data. All interactions are derived from literature curation or direct user submissions.

AlphaFold Protein Structure Database

https://alphafold.ebi.ac.uk/

AlphaFold DB

AlphaFold is an AI system developed by DeepMind that predicts a protein’s 3D structure from its amino acid sequence. AlphaFold DB provides open access to 992,316 protein structure predictions for the human proteome and other key proteins of interest, to accelerate scientific research.

Sorting Intolerant from Tolerant

https://sift.bii.a-star.edu.sg/

SIFT

SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations.
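The homology-based idea behind SIFT can be illustrated on a single alignment column. This is a deliberately simplified sketch, not SIFT's algorithm. It assumes the score for a substitution is the frequency of the new residue in the column of aligned homologues, scaled so that the most frequent residue scores 1.0, with no pseudocounts or sequence weighting. It then applies the 0.05 cutoff that SIFT uses to separate predicted deleterious from tolerated substitutions. The alignment column is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def scaled_frequencies(column: List[str]) -> Dict[str, float]:
    """Residue frequencies at one alignment column, scaled so the most
    frequent residue scores 1.0. Gaps ('-') are ignored."""
    counts = Counter(aa for aa in column if aa != "-")
    if not counts:
        raise ValueError("column contains only gaps")
    top = max(counts.values())
    return {aa: n / top for aa, n in counts.items()}


def predict_substitution(column: List[str], new_aa: str, cutoff: float = 0.05) -> str:
    """Residues never seen in the column score 0.0 (no pseudocounts here)."""
    score = scaled_frequencies(column).get(new_aa, 0.0)
    return "deleterious" if score < cutoff else "tolerated"


# Hypothetical column from 20 aligned homologues: 18 Leu, 1 Ile, 1 Val.
column = list("L" * 18 + "IV")
print(predict_substitution(column, "I"))  # tolerated (1/18, about 0.056)
print(predict_substitution(column, "D"))  # deleterious (never observed)
```

Without pseudocounts, any residue absent from the alignment is called deleterious. Real SIFT softens this using prior knowledge of amino acid substitution patterns.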

Gallus GBrowse

Online access to genomic and other information about the chicken, Gallus gallus. Includes predicted genes and Gene Ontology (GO) terms, links to Gallus In Situ Hybridization Analysis (GEISHA), Unigene and Reactome, the genomic positions of chicken genetic markers, SNPs and microarray probes, and mappings from turkey, condor and zebra finch DNA and EST sequences to the chicken genome. We also provide a BLAT server (http://birdbase.net/cgi-bin/webBlat) for matching user-provided sequences to the chicken genome.

GalBase

GalBase is a chicken multi-omics database that hosts reference genomes, annotations, high-quality genetic variants, transcriptomes, histone modifications, open chromatin regions, GWAS, and QTL. GalBase allows users to view genomic variation on geographical maps, gene expression in heatmaps, and epigenomic signal as peak patterns, and also provides modules for batch annotation of genes, regions, and loci based on multi-layered omics data. GalBase integrates the UCSC Genome Browser, the WashU Epigenome Browser, BLAT, BLAST, and LiftOver to facilitate searching and visualizing sequence features.

ChickenSD

Chicken SNP Database (ChickenSD) is a data container for variation information from the chicken (Gallus gallus) genome. The aim of this database is to construct an SNP detector and online visualization tool for the chicken research community working on population, evolution, phenotype and life-habit studies. ChickenSD currently contains ~33 million non-redundant whole-genome SNPs with well-annotated information, identified from 865 samples (167 wild, 697 domesticated and 1 hybrid).

Chicken Quantitative Trait Locus Database

https://www.animalgenome.org/cgi-bin/QTLdb/GG/index

Chicken QTLdb

Chicken QTL and association data curated from published studies. The database is designed to help users compare, confirm, and locate the most plausible positions of genes responsible for quantitative traits important to chicken production. We strive to curate all available data and continue to add tools to the QTLdb that let users carry out a range of data meta-analysis and comparison tasks.

SalmoBase

Salmobase is a tool for making molecular genomic resources for salmonid species publicly available in a framework of visualizations and analytic tools.

Maize Genetics and Genomics Database

https://www.maizegdb.org/

MaizeGDB

MaizeGDB is a community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays. It is a USDA/ARS funded project to integrate the data found in MaizeDB and ZmDB into a single schema, develop an effective interface to access this data, and develop additional tools to make data analysis easier.

FindingPheno’s major objective is to develop better computational solutions to the challenges posed by the vast amount of multi-omics data currently being produced. In particular, Work Package 5 develops a statistical inference framework that, when analysing multi-omics data, builds on what we already know about the genes, proteins and metabolic pathways active within both host and microbiome. The aim is to reduce the multiple testing burden by removing data points unlikely to contribute to the phenotype, improving our predictions of truly causal molecular interactions.

To accomplish these tasks, we rely on external public databases to provide evolutionary and biological knowledge that we can integrate into our models. The information in these resources must be trustworthy and remain accessible both during and after the project, so we focus on resources that are supported by high-quality publications and follow the FAIR data sharing principles. We have compiled the list of useful resources on this page according to these criteria.
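The multiple testing argument made for Work Package 5 can be made concrete with a little arithmetic. The text does not say which correction the framework uses, so this sketch assumes the simple Bonferroni correction, where each of m tests must reach p ≤ α/m. The p-values and the "prior" feature set are hypothetical. The prior set stands in for features that pathway or interaction knowledge links to the phenotype; it is not the project's actual filtering method.

```python
from typing import Dict, Set


def bonferroni_hits(p_values: Dict[str, float], tested: Set[str], alpha: float = 0.05) -> Set[str]:
    """Bonferroni: each of the m tested features must reach p <= alpha / m."""
    threshold = alpha / len(tested)
    return {f for f in tested if p_values[f] <= threshold}


# Hypothetical data: 1,000 features, one of which carries a modest real signal.
p_values = {f"feature_{i}": 0.5 for i in range(1000)}
p_values["feature_0"] = 2e-4

all_features = set(p_values)
# Hypothetical prior: only 50 features are linked to the phenotype by
# existing pathway or interaction knowledge.
prior_features = {f"feature_{i}" for i in range(50)}

print(bonferroni_hits(p_values, all_features))    # set()  (threshold 5e-05)
print(bonferroni_hits(p_values, prior_features))  # {'feature_0'}  (threshold 1e-03)
```

Testing 50 instead of 1,000 features relaxes the per-test threshold twentyfold, so `feature_0` becomes detectable. This only preserves error control if the filter is chosen without looking at the p-values being corrected, which is why the prior knowledge has to come from external resources such as those listed here.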
