Do you want to know how we use machine learning in biology?
FindingPheno is a bioinformatics project where we analyse omics data sets using advanced computational methods such as machine learning. But what do all those words mean? Our video will explain!
Examples of how we use machine learning to solve biological problems
Below is a list of case studies where different machine learning techniques are being applied to difficult biological problems, with a focus on cutting-edge applications in omics and big data.
Do computer scientists have a role to play in the Century of Biology?
Biology has become very data driven. Modern technology allows us to measure all kinds of things, very fast and in large amounts. But all of this data needs to be analysed so that we can make sense of it all, a process which uses computers. It is more important than ever to train people in computer science to help generate and analyse all this data, allowing us to work together to solve the biggest problems.
What is bioinformatics?
The field of data science that works with biological datasets.
Bioinformatics is the process of working with any large and complex data sets that come from biology. It relies on computers to process and analyse the data, looking within to understand what is going on at a molecular level and how this affects biological function. Visualising data with graphs or diagrams and writing reports to communicate the findings are also important tasks in bioinformatics.
Omics is the measurement of what kind of information?
All the genes, proteins or other molecules in a sample, all measured at the same time.
Omics refers to a collection of different data types, each one generated by measuring all of a specific type of molecule in a sample. For example, genomics finds the sequence of all of the DNA (or genome), while transcriptomics measures all the different mRNAs (or transcripts), proteomics all the proteins, metabolomics all the metabolites, and so on.
Who designs a machine learning algorithm?
The computer designs the algorithm itself.
Machine learning is a process where the computer evolves and adapts the algorithm with little human help, trying different ways to analyse the data, keeping what works and throwing away what doesn't until it narrows down to the best. This automatic training is what makes ML different, and it can create a more powerful and efficient system than a human could design by hand.
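The "try, keep what works, throw away what doesn't" idea can be shown with a very small sketch in Python. Everything here is an assumption made for illustration: the marker levels and labels are made up, the function names `accuracy` and `train` are hypothetical, and the rule being learned (a single cut-off value) is far simpler than anything used on real omics data. It is not the method used by FindingPheno, only a toy version of the same principle.

```python
# Toy data, made up for illustration: one measured marker level per sample,
# plus a label saying whether the sample came from a sick (1) or healthy (0) individual.
samples = [(0.2, 0), (0.4, 0), (0.5, 0), (0.9, 0),
           (1.1, 1), (1.3, 1), (1.6, 1), (2.0, 1)]


def accuracy(threshold, data):
    """Fraction of samples classified correctly by the rule
    'a level above the threshold means sick'."""
    correct = sum(1 for level, label in data if (level > threshold) == bool(label))
    return correct / len(data)


def train(data, candidates):
    """Try every candidate threshold and keep the one that scores best."""
    best_threshold, best_score = None, -1.0
    for threshold in candidates:
        score = accuracy(threshold, data)
        if score > best_score:  # keep what works, discard the rest
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Candidate cut-offs to try: 0.0, 0.1, ..., 2.0
candidates = [i / 10 for i in range(21)]
best_threshold, best_score = train(samples, candidates)
print(f"best threshold: {best_threshold}, accuracy: {best_score:.2f}")
```

Nobody told the program where the cut-off should be; it found one by scoring each attempt against the labelled examples. Real ML systems do the same kind of thing with thousands or millions of adjustable numbers instead of one.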
What kinds of problems can we solve with bioinformatics?
All of the examples given can benefit because they all include living systems as a key part of the problem.
So, among other things, we can: better understand disease to develop new treatments or even cures, change food production to be more sustainable and increase animal welfare, reduce or prevent biodiversity loss and ecosystem damage, and improve health and longevity across different populations.
Test your understanding with Kahoot!
The 21st century has been named The Century of Biology. This is where we now use life sciences to solve the world's biggest problems. Things like climate change, biodiversity loss, even human health. For a biologist like me this is super exciting. However, it uses data, lots and lots of data. So we biologists need people who are smart in maths and computers to help us out.
Our project uses bioinformatics to analyse omics data sets using machine learning. That’s a lot of complex technical words all put together, so I'm going to walk you through the concepts.
Bioinformatics is a type of data science where we work with biological datasets called omics. Omics is where you take all the genes, proteins or other molecules within a sample and you measure them all at once, then you look at the networks or pathways that are going on between them. By taking all of the omics types together we can start to get a really good understanding of what’s going on inside a living creature. However, these data sets are really large and complicated and we can’t analyse them by hand, we need to use computers.
The main method that we use is called machine learning. Machine learning is where an algorithm or calculation can take in a whole lot of data, look at that data and then predict an outcome. So, for example, you can take a whole lot of different patient samples and look at them with machine learning and better predict which ones have got cancer.
But the trick is, we don’t come up with the calculations, we let the computer do it, all on its own. So the computer tries a bunch of different things, it learns which ones work and which ones don’t, and it throws away the ones that don’t, and then it adapts and it evolves and it gets better, until it narrows down on the best. And then we come up with a much more powerful and efficient calculation than a human could ever think up.
So by using machine learning to analyse our omics data sets, we can get a better understanding of what’s going on inside a living system at a molecular level. And! When you can understand a system, you can design a better system. For example, if you understand everything that’s going on with a fish, you can come up with a better farming system for that fish, reduce your environmental impact and give the fish a better life.
What is Machine Learning?
Machine learning, or ML, is a type of artificial intelligence which is able to adapt and learn based on the data you feed into the system. This learning ability makes ML technology very flexible, able to work with a wide range of data types or different tasks, and can result in an analytical tool that is more efficient and powerful than a human can come up with alone. ML is particularly suitable for analysing large, complex, interacting data sets such as the omics data found in bioinformatics, where the ML system learns how to find patterns in the data that are not obvious or linear, untangling the interactions to make meaningful suggestions.
Find out more about how ML works over here.
Facial recognition in pigs for improved welfare monitoring
Farmers would like to keep track of the individual animals on their farm, keeping an eye on their health and welfare and noticing any problems early so they can be fixed. But modern farming has too many animals for the farmer to watch all of them all the time. So researchers are now adapting facial recognition methods first developed for tracking humans to instead recognise individual pigs, allowing a webcam and computer system to keep an eye on all the animals. By training the system with a type of ML called a deep neural network, it is hoped that it will not only recognise each animal, but also quickly identify when an animal is unhappy or not feeling well based on its posture and facial expressions. This will keep the farmer informed about their animals so that problems can be solved early and all the pigs given the best care.
Genome Wide Association Studies - finding the genes that cause disease
Genome-wide association studies (GWAS) compare the genomes of many individuals, some with a disease and some without, to see which parts of the DNA occur more often in people who are sick, with the aim of figuring out the genetic causes of that disease. But! DNA data is complicated and noisy, plus many other things can be involved (e.g. environment, lifestyle, etc). So these studies are very large, often including hundreds of thousands of people. Analysing all of this data to find which genes are truly involved is very difficult. Using ML for this analysis increases the speed and power, allowing us to develop, combine and test many different statistical models very fast to find the best. This is why ML is now becoming a standard part of the GWAS pipeline.
Read more about ML in GWAS analysis over here.
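To make the basic comparison behind a GWAS concrete, here is a minimal sketch in Python of a test on a single DNA variant: does one version of the variant turn up more often in sick people than in healthy people? The counts, the variant names and the function `allele_chi_square` are all hypothetical, chosen only for illustration. This is a classical chi-square test rather than machine learning, and it leaves out everything that makes real studies hard (noise, environment, and correcting for millions of variants being tested at once).

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases (sick) and controls (healthy); columns are the alternative
    and reference versions of the variant. Returns (chi-square statistic, p-value).
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect if the variant had nothing to do with the disease
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up allele counts: (case_alt, case_ref, control_alt, control_ref)
variants = {
    "variant_A": (300, 700, 200, 800),  # clearly more common in cases
    "variant_B": (250, 750, 248, 752),  # about equally common in both groups
}

results = {name: allele_chi_square(*counts) for name, counts in variants.items()}
for name, (chi2, p_value) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A small p-value suggests the variant and the disease are linked; a large one suggests the difference could easily be chance. A real GWAS repeats a test like this across the whole genome and uses a much stricter cut-off for the p-value, which is where fast, automated model building becomes useful.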
AI-guided systems for optimised brewing
Brewing is often described as a combination of art and science, where living organisms are used to transform basic ingredients into a new end product. This is because the fermentation process is complicated: it is made up of a large network of interacting, sequential decision points, each with many options, which cannot be easily modelled or understood. This makes it difficult for the brewer to predict the overall outcome at each step. ML systems, however, are becoming powerful enough to build those models, taking in data about how the beer is progressing and providing decision support to the brewer about what steps to take next. By making the correct decisions at the right times the process can be optimised, resulting in cheaper, more consistent and tastier beer at the end.
Improved diagnostics from tissue biopsies - cancer or not?
Cancer diagnosis is usually done by taking a small amount of tissue from the relevant part of the patient's body (called a biopsy), slicing and staining the tissue to visualise what is in there, then analysing it under a microscope. This is because cancer cells look different from normal cells, both in their shape and size and in how the protein markers are expressed. Traditionally, this analysis is conducted by a trained pathologist, who can recognise and interpret different aspects within the complicated images. This process works because the human brain is very good at distinguishing even very small changes in how something looks; however, it is time consuming and difficult to do. So instead, computer systems are being developed using a type of ML called neural networks which can learn how to analyse the images automatically. These systems can provide a high throughput workflow which is fast, accurate and efficient, reducing wait times for patients to get their diagnosis.
Read more about ML in digital pathology over here.