Good Leaver

Oxford Nanopore Project

Posted 2 years ago


The European Bioinformatics Institute is an international, innovative and interdisciplinary research organisation funded by European states. The EBI’s goal is to help scientists realise the potential of big data in biology, exploiting complex information to make discoveries that benefit humankind.

Collaborators: Prof. Thomas Keane

Location: Remote

Project Requirements

  • Experience applying advanced statistical and/or ML models to very large data sets
  • Experience with lossless and/or lossy compression algorithms desired but not essential
  • Bioinformatics experience desirable but not essential as training will be provided

Project Summary

Oxford Nanopore is one of two third-generation sequencing platforms that can read very long strands of DNA, up to 100,000 base pairs, whereas previous technologies were limited to a few hundred bases. The platform is now in production at thousands of sites around the world, and both the volume of sequencing data and the number of experiments using nanopore technology are set to increase dramatically. Nanopore sequencing data is derived from changes in electrical current as a DNA strand moves through the pore. In addition to detecting the four canonical nucleotides (A, C, G and T), researchers are investigating how base modifications can be detected from the same signal.

The Global Alliance for Genomics and Health (GA4GH) is a non-profit organisation that maintains and develops many of the key community formats for storing and exchanging genomic data (e.g. reads, variants, annotations). SAM, BAM and CRAM are the most widely used file formats for read sequencing data and can be read by hundreds of community genomics tools and libraries. These formats can store sequencing data from any of the current platforms, including both short- and long-read technologies. The Oxford Nanopore platform currently produces an HDF5-based container format (Fast5), which holds all of the run metadata, raw signal and read sequences. However, this format is poorly documented and not interoperable with existing community tools. The raw electrical signal accounts for more than 99% of the space occupied by nanopore data, and it is required to identify base modifications such as methylation.

The goals of this project are:

  • Implement, benchmark, and evaluate algorithms for lossless compression of nanopore raw signal data;
  • Implement, benchmark, and evaluate algorithms for lossy compression of nanopore raw signal data;
  • Coordinate with the GA4GH file formats group on implementing the selected compression methods in CRAM;
  • Contribute to the existing ONT-to-CRAM conversion software package.
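The lossless and lossy goals above can be illustrated with a minimal Python sketch. It assumes the raw signal is a sequence of integer ADC samples (e.g. int16) and uses generic, widely used techniques chosen purely for illustration: delta encoding, zig-zag mapping and variable-length integer (varint) packing for the lossless path, and uniform quantisation with a hypothetical step size before the same pipeline for the lossy path. These are not the project's selected algorithms, nor the Fast5 or CRAM codecs, and all function names are hypothetical.

```python
"""Illustrative sketch only: generic coding of integer raw-signal samples.

Assumption: samples are integer ADC values. Lossless path is
delta -> zig-zag -> varint. Lossy path quantises with a hypothetical
`step` first, bounding the per-sample error by step // 2.
"""
from typing import List


def delta_encode(samples: List[int]) -> List[int]:
    # First value is stored as-is (difference from 0), then successive differences.
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def delta_decode(deltas: List[int]) -> List[int]:
    # Running sum restores the original samples.
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def varint_bytes(values: List[int]) -> bytes:
    # 7 payload bits per byte; the high bit is set when more bytes follow.
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)
            v >>= 7
        out.append(v)
    return bytes(out)


def varint_values(data: bytes) -> List[int]:
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def compress_lossless(samples: List[int]) -> bytes:
    return varint_bytes([zigzag(d) for d in delta_encode(samples)])


def decompress_lossless(data: bytes) -> List[int]:
    return delta_decode([unzigzag(z) for z in varint_values(data)])


def compress_lossy(samples: List[int], step: int) -> bytes:
    # Round each sample to the nearest multiple of `step` (integer arithmetic),
    # then reuse the lossless pipeline on the quantised indices.
    return compress_lossless([(s + step // 2) // step for s in samples])


def decompress_lossy(data: bytes, step: int) -> List[int]:
    return [i * step for i in decompress_lossless(data)]


if __name__ == "__main__":
    signal = [480, 482, 479, 475, 490, 510, 505, 503]  # made-up values, not real data
    packed = compress_lossless(signal)
    assert decompress_lossless(packed) == signal
    print(len(packed), "bytes vs", 2 * len(signal), "bytes as raw int16")
    # prints: 9 bytes vs 16 bytes as raw int16
```

Delta coding pays off only under the assumption that consecutive samples are usually close in value, so most differences fit in one varint byte. A real benchmark would add an entropy coder on top and measure ratios and error bounds on actual nanopore data.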

Progress will be reported and monitored through weekly meetings. Project collaborators include James Bonfield (Wellcome Sanger Institute), Adrien Leger (Postdoc, Birney Group), Thomas Keane (EGA/EVA Team Leader), Ewan Birney (EBI Director), Forrest Brennen (Oxford Nanopore) and the GA4GH Large Scale Genomics Workstream.
