Development of a scalable platform for the exploration and analysis of large collections of cancer genomes

Remote

Posted 3 years ago

Institute

The European Bioinformatics Institute (https://www.ebi.ac.uk/) is an international, innovative and interdisciplinary research organisation funded by European states. The EBI’s goal is to help scientists realise the potential of big data in biology, exploiting complex information to make discoveries that benefit humankind.

Project Supervisor: Dr. Isidro Cortés-Ciriano

Project location: Remote

Project Requirements

Bash, Python, JavaScript, Docker, web design
Experience in complex data visualization and analytics

Project Summary

Genome sequencing has become a routine diagnostic tool for rare diseases and cancer, and a fundamental technology in basic cancer research. Careful analysis and interpretation of cancer genome sequencing data is crucial for delivering a correct diagnosis and guiding therapeutic intervention. To this end, multi-omic and clinical data sets must be integrated across different scales and across large collections of previously analysed cancer genomes. Despite the importance of this process, scalable tools are still lacking for exploring cancer genomics data sets, in particular whole-genome sequencing data sets, across hundreds to thousands of patients with visualization functionalities for raw and processed data at multiple scales.

The purpose of this project is to develop an open-source visualization and analysis platform for the secure exploration of genome sequencing data sets that scales to large numbers of patients. The platform will provide both high-level and detailed visualization tools for exploring one or multiple cases simultaneously, as well as additional clinical and genomic annotations. The ultimate goal of such a tool is to facilitate research and data interpretation in cancer genomics in both academic and clinical settings.

First, we will build an interactive, multi-scale, and scalable platform for the dynamic visualization of collections of sequencing data sets, which will include (i) high-level genome representations using interactive Circos plots (see Figure on the right for an example), (ii) multi-track panels to visualize raw sequencing reads, and (iii) functionalities for federated access to various collections of sequencing data sets. Second, we will develop functionalities to integrate clinically relevant annotations that facilitate the interpretation of mutations detected in cancer genomes, thus enabling multi-modal, integrative analysis across patients and data types.
