Good Leaver

Large-Scale Genomic Copy Number Software Development

Remote
Posted 2 years ago

Institute  

The European Bioinformatics Institute (https://www.ebi.ac.uk/) is an international, innovative and interdisciplinary research organisation funded by European states. The EBI’s goal is to help scientists realise the potential of big data in biology, exploiting complex information to make discoveries that benefit humankind.

Project Supervisor: Ewan Birney / Tomas Fitzgerald

Project location: Remote 

Project Requirements  

  • Essential: Experience of statistical modelling using linear models 
  • Desirable: Experience of statistical modelling of complex traits 
  • Essential: C++, R, bash, LSF (or similar batch scheduler) 
  • Desirable: Knowledge of algorithm optimisation techniques and data format design for efficient storage / fast access  

Project Summary  

Over the past few years we have developed methods for copy number variation (CNV) detection and genome-wide association testing (GWAS) on extremely large whole-genome and whole-exome sequence datasets. These methods require considerable compute resources to run and generate very large amounts of CNV-specific information genome-wide across many hundreds of thousands of large datasets. The base methods and packages are already in place and consist of a mixture of C++ libraries, R packages and bash scripts. The software also uses its own CNV-specific binary file format, intended to allow very fast data slicing across genome positions and samples.

We have currently packaged these methods as Docker and Singularity containers, which allows us to transfer them to a number of cloud-based systems for large-scale data processing. The software also runs on a very large compute cluster based at the EBI.

This project is intended to take this development-grade software towards a production-grade package, ideally as a single C++ library with a streamlined command-line entry point. There are also further research areas in method optimisation, data structure improvements and modelling improvements for those interested in developing cutting-edge big-data processing solutions.

