Good Leaver

Identification of patterns/functional modules in phosphoproteomics/transcriptomics datasets

Posted 2 years ago


The European Bioinformatics Institute is an international, innovative and interdisciplinary research organisation funded by European states. The EBI’s goal is to help scientists realise the potential of big data in biology, exploiting complex information to make discoveries that benefit humankind.  

Project Supervisor: Dr. Evangelia Petsalaki

Project location:  Remote


  • Experience applying Deep Learning to large data sets  
  • Solid foundational understanding of machine learning/statistics 
  • Bioinformatics knowledge useful but not essential 


Cell signalling describes the processes that occur in a cell in response to changes in its environment. These processes are controlled by proteins that interact with each other. Modification of these proteins by the addition of a phosphate group (phosphorylation) regulates these interactions and drives signal transduction. The end result of a cell signalling process is typically (but not always) a change in the activity of proteins called transcription factors. These then change the protein content of the cell, and therefore its functions and behaviour, by driving changes in the expression of the genes from which the proteins are produced.


These processes underlie all cell functions and most diseases. Understanding how proteins interact with each other during cell signalling to change cell behaviour is crucial for understanding cell functions and disease development, and for developing therapeutics.

Phosphoproteomics datasets are collected using mass spectrometry and aim to identify the phosphorylated proteins in a sample, i.e. they capture a snapshot of cell signalling. Transcriptomics datasets are collected using next-generation sequencing (current) or microarrays (older technology) and show which genes are expressed in a sample. They can also be used to infer transcription factor activities.

Phosphoproteomics data, as mentioned above, provide a snapshot of the cell signalling architecture at a given moment. When analysing such datasets, we commonly try to map them onto our prior knowledge of the subprocesses that are represented. These known processes, however, are biased and represent only a fraction of the processes that are relevant in a cell signalling network.

Can we combine multiple phosphoproteomics datasets and, using deep learning, discover functional units in a data-driven way? This could also be done at the level of transcriptomics to identify units of gene regulation, if the available phosphoproteomics datasets are not sufficient.

This project aims to uncover completely novel functional units and to describe the global signalling network architecture, including the understudied space.  

