Study Applied Bioinformatics at Cranfield

Over the past few years, bioinformatics has become the most exciting field in biology. This MSc course provides a unique hands-on learning experience in bioinformatics skills, combining the latest advances in analysing high-throughput genomic, transcriptomic and metabolomic data.

Cranfield’s Bioinformatics MSc is the first of its kind and the longest-running bioinformatics course in the UK. With more than 200 alumni over the past 10 years, it has become the most popular postgraduate course in bioinformatics in Europe. Because Cranfield is a solely postgraduate university, every taught module of the Applied Bioinformatics course is tailored specifically to master's level. That’s why it is the winner of the BBSRC’s Master’s Training Grant (MTG) award for the best course in life sciences.

Our taught modules cover in great depth a plethora of programming languages typically applied in bioinformatics, such as Python, Java, R and SQL, as well as modern web technologies, such as JavaEE, NoSQL and JavaScript. Furthermore, we have two dedicated taught modules focusing on established bioinformatics protocols for the latest next generation sequencing (NGS) and 3rd generation sequencing (3GS) technologies.


Overview

  • Start date: Full-time: October, part-time: October
  • Duration: One year full-time, two to three years part-time
  • Delivery: Taught modules 40%, group project 20%, individual research project 40%
  • Qualification: MSc
  • Study type: Full-time / Part-time
  • Campus: Cranfield campus

This course meets the requirements of the Level 7 Bioinformatics Apprenticeship Standard. Eligible organisations can use £18,000 of their Apprenticeship Levy to cover the course tuition fees. View fees and funding information or find out more about master's-level apprenticeships.



Your career

Industry, alumni and current students talk about Bioinformatics at Cranfield

Bioinformatics is a fast-growing field that offers progressive career opportunities for forward-thinking people who are ready to grasp the challenge; people who understand both the biological and computing aspects of this science.

Our MSc opens doors to careers in industry, public research establishments, and university research. The multidisciplinary nature of our course has allowed our students to follow diverse career paths in various medical-related sectors.

Successful graduates have been able to pursue or enhance careers in a variety of key areas such as:

Pharmaceutical and biotech companies, plant research institutes, food sector, public health sectors, bioinformatics & IT companies.

Cranfield Careers and Employability Service

Cranfield’s Career Service is dedicated to helping you meet your career aspirations. You will have access to career coaching and advice, CV development, interview practice, access to hundreds of available jobs via our Symplicity platform and opportunities to meet recruiting employers at our careers fairs. Our strong reputation and links with potential employers provide you with outstanding opportunities to secure interesting jobs and develop successful careers. Support continues after graduation and as a Cranfield alumnus, you have free life-long access to a range of career resources to help you continue your education and enhance your career.

Previous students have gone on to jobs within prestigious institutions including:
The Sanger Institute, Illumina, Oxford Nanopore, AstraZeneca, The European Bioinformatics Institute (EBI), GlaxoSmithKline, PubGene, Tessella, the Wellcome Trust, Inpharmatica, Invitrogen, Oxford Gene Technology, Cancer Research.

Others have chosen to continue their research training by undertaking a PhD either at Cranfield or elsewhere.

Cranfield graduates are very successful in achieving relevant work. For professionals already in the industry, Cranfield qualifications enhance their careers, benefiting both the candidate and their employer.

Since being at Cranfield, the highlight has been the group project. It feels less like “school” and reminds me of working on team projects at a company, with all the associated pros and cons. We are currently in week 6 of 10 and the project is really coming together. Each person has grown into their own unique role and has provided their own contribution, it’s been a lot of fun.
Cranfield University has a good reputation among recruiters. Also, the courses are applied and up to date with what the industry needs. And specifically for my course, among all the courses I checked in the UK, it had the most programming languages, which is what we need for my field.
The curriculum struck a perfect balance between theoretical knowledge and hands-on practical experience, providing me with a comprehensive understanding of different domains of bioinformatics. This approach not only enriched my academic journey but also equipped me with skills that are invaluable for my future career prospects.

Why this course?

1. The only Bioinformatics MSc in the UK offering a truly bespoke postgraduate experience

Cranfield University is the only solely postgraduate university in the UK, which means that every single lecture and practical session within the Applied Bioinformatics MSc is tailored to master's level. Unlike other MSc Bioinformatics courses you may come across, you will not share any modules or lectures with undergraduate students or other MSc courses. This gives our Applied Bioinformatics MSc a truly tailored postgraduate experience.

2. A variety of programming languages

Experience has taught us that there is no single preferred programming language in bioinformatics. Every language has its strengths, depending on the task in hand. For example, Java can be quite powerful if you are developing a visualisation tool or a standalone application, while R and Python are excellent choices for machine learning and statistical analysis. Perl, on the other hand, is a very easy language for biologists to learn and forms the foundation of most of the legacy tools and frameworks developed for the Human Genome Project, and it is still in use today. This is why the Applied Bioinformatics MSc is the only course in the UK that offers three dedicated programming modules as part of the taught component, covering R, Java, and Python. Furthermore, other languages such as Bash, SQL and JavaScript are also comprehensively covered. This means that upon completion of this course, you will not only have the skills and expertise to develop optimised bioinformatics tools for various tasks, but will also find it considerably easier to learn new programming languages not covered during the course, as you will have a foundation in interpreted, object-oriented, and statistically focused languages.

3. A truly NGS- and 3GS-focused course

Analysing sequencing data from the latest sequencing platforms, such as Illumina®, Pacific Biosciences® and Oxford Nanopore®, is now a standard skill required for most bioinformatics jobs (a quick search on LinkedIn for bioinformatics jobs should prove this!). This is why the course includes two dedicated modules focusing on analysing sequencing data. The first module, 'Next Generation Sequencing Informatics', focuses on pre-processing and analysing Illumina® short-read sequences for sequence alignment, gene expression profiling using RNA-Seq, and genotyping for variant discovery. The second module, 'Advanced Sequencing Informatics and Genome Assembly', provides hands-on experience in performing de novo sequence assembly using short- and long-read sequencing data, as well as computer practical sessions in developing and optimising your own assembler using the overlap-layout-consensus (OLC) and de Bruijn graph (DBG) algorithms.
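
To give a flavour of the de Bruijn graph idea mentioned above, here is a minimal sketch in plain Python. It is an illustration only, not course material: the function name `build_de_bruijn_graph`, the toy reads and the choice of k = 3 are all assumptions made for this example, and a real assembler would also need to handle sequencing errors, reverse complements and graph traversal.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from a list of reads.

    Each k-mer contributes one edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix. Repeated edges are kept, so the length of an
    edge list reflects how often that k-mer was seen.
    """
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (hypothetical data).
reads = ["ATGGC", "TGGCA"]
graph = build_de_bruijn_graph(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Overlapping reads share k-mers, so their paths through the graph merge, which is what lets an assembler reconstruct a longer sequence by walking the graph.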

4. Industrial and research applications

This MSc is supported by our team of professional thought leaders, including Professor Andrew Thompson who is influential in this field and an integral part of this MSc.

Informed by industry

Cranfield University benefits from the input of a group of world-renowned experts in a range of applied sciences including bioinformatics. We lead and collaborate in diverse research and consultancy projects, both nationally and internationally.

Our collaborators include:

  • AstraZeneca
  • Horizon Discovery
  • Illumina
  • UK Health Security Agency (formerly known as Public Health England)
  • GlaxoSmithKline
  • London School of Hygiene and Tropical Medicine
  • Queen Mary University of London
  • Unilever
  • Rothamsted Research
  • The European Bioinformatics Institute
  • University of Athens
  • Cambridge University

Course details

The taught programme is generally delivered from October until March and comprises eight compulsory taught modules, a group project and an individual thesis project. Students on the part-time programme will complete all of the compulsory modules based on a flexible schedule that will be agreed with the Course Director.

Course delivery

Taught modules 40%, group project 20%, individual research project 40%

Group project

Real-life experience

Working in project teams is part of everyday working life. It requires not only your individual expertise but also an appreciation of the skills of the other members of the team. This part of the course gives you the opportunity of working as part of a team on a group project. This is an invaluable experience that will help you to recognise and implement the differing contributions that colleagues bring to team work, and the different roles that we can choose to play within a team. 

Individual project

Industry-related projects

The four-month thesis project is carried out either at Cranfield or at an external research establishment or commercial organisation within the UK or Europe. It gives you the chance to concentrate on a subject area of particular interest to you, perhaps in collaboration with the type of organisation that you are hoping to find employment with.

Real-life problem-solving thesis projects

Our MSc students finalise their hands-on study practice with individual thesis projects that solve problems in multidisciplinary areas whilst working under academic supervision. Some recent projects include:

- Development of a web-based resource for tuberculosis genotyping and diagnosis from whole genome sequencing data: PhyTB

This project by Ernest Diez (2013-2014) is focused on creating PhyTB - an application for the interactive study of variation in M. tuberculosis using data from the PhyloTrack library.

- Applications of data science and machine learning in detection of meat adulteration

This project by MSc student Rafal Kural (2014-2015) is focused on the application of machine learning methods to unravel hidden patterns of meat samples using Fourier transform spectrometry, gas chromatography mass spectrometry, high performance liquid chromatography and VideometerLab. Over the course of this work, it has been proven that it is certainly possible to obtain very accurate detection of meat adulteration, reaching sample adulteration level prediction accuracy of 100% for GCMS and 90-97% for FTIR and VM data.

Modules

Keeping our courses up-to-date and current requires constant innovation and change. The modules we offer reflect the needs of business and industry and the research interests of our staff and, as a result, may change or be withdrawn due to research developments, legislation changes or for a variety of other reasons. Changes may also be designed to improve the student learning experience or to respond to feedback from students, external examiners, accreditation bodies and industrial advisory panels.

To give you a taster, we have listed the compulsory and elective (where applicable) modules which are currently affiliated with this course. All modules are indicative only, and may be subject to change for your year of entry.


Course modules

Compulsory modules
All the modules in the following list need to be taken as part of this course.

Exploratory Data Analysis and Essential Statistics using R

Module Leader
  • Dr Maria Anastasiadi
Aim
    This module provides you with an introduction to the theoretical and practical aspects required to undertake rigorous and valid data analysis of multivariate biological datasets using the R environment. You will learn best practices for experimental design and data collection; inspection, manipulation and visualisation of biological datasets; statistical analysis; and interpretation of the results. You will use R throughout the course to execute these tests using existing R libraries and will also learn to develop bespoke scripts for your individual needs. The course aims to bring you up to an advanced level in using R tools for data science, and you will have the opportunity to gain significant hands-on experience on several different topics.
Syllabus
    • An introduction to R,
    • Introductory statistics – averages, variance, and significance testing,
    • Data pre-processing techniques,
    • Introduction to Bayesian Statistics,
    • Exploratory data analysis using unsupervised methods (PCA, HCA, k-means).
Intended learning outcomes

On successful completion of this module you should be able to:

  • Generate R scripts to perform data analysis tasks,
  • Critically assess the basic principles of different statistical techniques, be able to implement them programmatically and effectively integrate and devise statistical methods into experimental protocol design,
  • Conduct exploratory data analysis and manipulate the data to meet required specifications using different data pre-processing techniques,
  • Evaluate the difference between univariate and multivariate analysis,
  • Apply exploratory data analysis using unsupervised multivariate analysis methods. 

Introduction to Bioinformatics using Python

Module Leader
  • Dr Alexey Larionov
Aim

    This module provides a general introduction to bioinformatics and the fundamentals of programming. It covers the programming basics you need in order to program in Python, now one of the most popular programming languages in the bioinformatics community, and its application in retrieving, parsing and visualising biological sequence data.

Syllabus

    • Fundamentals of Python programming,
    • Introduction to Object Oriented Programming (OOP),
    • Simple mathematical operations,
    • Modules in Python,
    • Various data types and Objects,
    • Control Statements,
    • Lists, Tuples and Dictionaries,
    • Functions,
    • Regular expressions,
    • Error handling,
    • File IO,
    • Programming for biology using BioPython:
        • DNA sequence manipulation,
        • Reading protein files,
        • Performing Multiple Sequence Alignment,
        • BLAST,
        • Data Visualisation.
    • Biological data formats.

Intended learning outcomes

On successful completion of this module you should be able to:

  • Identify the most important programming structures.
  • Retrieve nucleotide, protein sequences and their corresponding metadata from online public data resources.
  • Develop custom Python scripts for sequence manipulation.
  • Develop Python scripts to automate data handling and curation tasks.
  • Develop advanced stand-alone Python programs for the acquisition and consolidation of data from remote databases.
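
As a taste of what a custom script for sequence manipulation can look like, here is a minimal sketch using only the Python standard library. The helper names `reverse_complement` and `gc_content` are hypothetical, chosen for this illustration; in practice, a library such as BioPython offers equivalent functionality, and this sketch only handles the four unambiguous DNA bases.

```python
# Translation table mapping each DNA base to its complement
# (upper and lower case; ambiguous bases such as N are left unchanged).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


dna = "ATGC"
print(reverse_complement(dna))  # GCAT
print(gc_content(dna))          # 0.5
```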

Application of Bioinformatics in Epigenetics, Proteomics and Metagenomics

Module Leader
  • Dr Alexey Larionov
Aim

    To provide you with the knowledge of the current trends in analysing epigenomic, proteomic, and metagenomic data and to demonstrate its principles, challenges, and complexities in bioinformatics.

Syllabus
    • Introduction to general epigenetics concepts, and to analysis of DNA methylation, histone modification, chromatin structure, and transcription factor binding sites.
    • Quality control, pre-processing, and analysis of ATAC-seq data through a standard pipeline.
    • Application of bioinformatics to relate ATAC-seq to transcriptomics data and to assess phenotypic outcomes.
    • Introduction to practical proteomics (qualitative & quantitative).
    • Proteomics repositories and databases (PDB, UniProt, etc.).
    • Protein/peptide identification algorithms.
    • Protein structures and molecular modelling.
    • Soil metagenomics: quality control, filtering and assembly to taxonomic classification, clustering, and functional assignment.
    • Analysis of microbial community composition and comparative metagenomics using the QIIME2 pipeline and selected R packages.
Intended learning outcomes

On successful completion of this module you should be able to:

  • Synthesise information to discuss the key technological developments in the acquisition of epigenomic, proteomic and metagenomic data,
  • Explain the mode of operation of the most common analytical techniques and how these relate to the quality of the data acquired,
  • Critically assess current practices and identify the relative strengths and weaknesses of the techniques covered,
  • Discover information using bioinformatics tools and effectively apply the information to biological problems,
  • Participate in scientific discussions regarding the relevant omics technologies and evaluate scientific results.

Next Generation Sequencing Informatics

Module Leader
  • Professor Fady Mohareb
Aim

    To introduce you to the techniques that have given rise to the genomic data now available and develop skills and understanding in the bioinformatics approaches that facilitate evaluation and application of these data. Over the past decade, Next-generation DNA Sequencing (NGS) technology has been a huge stimulus for a lot of breakthrough discoveries in biology. This module provides an overview of many core types of NGS projects, including latest protocols in genomic and transcriptomic analyses, genotyping and variant calling as well as detailed hands-on practical sessions of our best practice data-analysis workflows.

Syllabus
    • Gene expression analysis using microarray,
    • Introduction to Next Generation Sequencing (NGS) Technology,
    • Overview of genome assembly and quality control,
    • Transcriptome informatics,
    • Sequence data analysis web platforms,
    • Genotyping and variant calling.

Intended learning outcomes

On successful completion of this module you should be able to:

  • Critically evaluate the operation of the most common analytical techniques used in the acquisition of genomic sequence and expression data,
  • Critically assess the quality of raw sequence reads and apply various techniques to overcome the challenges of dealing with poor quality data using appropriate software tools,
  • Perform gene expression profiling using both first and next generation sequencing data,
  • Critically assess current practices and evaluate the relative strengths and weaknesses of the techniques covered and how these relate to the quality of the biological findings,
  • Critically contrast a range of NGS tools and related sequence software tools for NGS applications and interpret the output from those tools.
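
To illustrate what assessing the quality of raw sequence reads involves, here is a minimal sketch in plain Python. It assumes quality strings use the Phred+33 encoding, and the helper names and the threshold of 20 are illustrative assumptions rather than a recommended protocol. Dedicated quality-control tools do considerably more than this.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Simple arithmetic mean of the Phred scores (0.0 for an empty string)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


def trim_low_quality_tail(sequence, quality_string, threshold=20, offset=33):
    """Drop trailing bases whose Phred score is below the threshold."""
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: 'I' encodes Phred 40, '#' encodes Phred 2.
seq, qual = "ACGT", "II##"
print(phred_scores(qual))                # [40, 40, 2, 2]
print(mean_quality(qual))                # 21.0
print(trim_low_quality_tail(seq, qual))  # ('AC', 'II')
```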

Machine Learning for Metabolomics

Module Leader
  • Dr Maria Anastasiadi
Aim
    During this module you will learn the main aspects of analysing the metabolic profile of living organisms and explore statistical and computational techniques that are central to the field of metabolomics, with particular emphasis on machine learning. Machine learning is a rapidly expanding form of artificial intelligence (AI) which has found many applications in the biosciences. Examples include explanatory analysis of complex biological systems, novel biomarker discovery and prediction modelling. You will have the opportunity to learn the fundamentals of machine learning from a theoretical perspective and will also acquire practical experience of a wide variety of machine learning techniques, such as generalised linear regression, penalised regression, decision trees and random forests, support vector machines, and deep neural networks, which you will apply to biological datasets to develop and validate prediction models for classification and numeric prediction purposes. All practical sessions will be completed using the R environment.
Syllabus
    • Metabolomics: overview and workflow,
    • Multivariate classification and biomarker discovery,
    • Introduction to machine learning,
    • Applications of machine learning in metabolomics,
    • Advanced topics in machine learning,
    • Applications of machine learning in food metabolomics,
    • Introduction to image analysis,
    • Advanced topics in R.


Intended learning outcomes

On successful completion of this module you should be able to:

  • Critically assess various metabolomics analytical and spectral platforms,
  • Apply state-of-the-art best practices in machine learning to fit the purpose of the analysis,
  • Develop classification and regression models based on multivariate metabolic data,
  • Provide examples of specific machine learning algorithms for classification and regression tasks or dual purpose,
  • Apply statistical and machine learning procedures covered during the module to derive biologically relevant information from metabolic datasets using R.

Programming Using Java

Module Leader
  • Dr Tomasz Kurowski
Aim

    To introduce you to concepts of object oriented programming using Java. Java is the pre-eminent programming language for serious application development on the Internet. The module covers Java data objects of primitive and reference data types and introduces you to the fundamentals of programming in Java, with hands-on practical sessions on implementing simple programs using calculations, variables, control statements and loops. 

Syllabus
    • Fundamental principles of programming in Java,
    • Object-oriented programming using Java,
    • Variables and calculations,
    • Strings,
    • Arrays, ArrayLists and HashMaps,
    • GUI programming.
Intended learning outcomes

On successful completion of this module you should be able to:

  • Identify and apply the most important programming structures,
  • Develop Java programs to meet given specifications,
  • Implement custom Java classes, interfaces, and packages,
  • Implement standalone application interfaces using Java Swing Components.

Advanced Sequencing Informatics and Genome Assembly

Module Leader
  • Professor Fady Mohareb
Aim

    This module covers the latest bioinformatics tools and algorithms involved in developing high-quality genome assemblies for orphan and previously unsequenced species, as well as improving the continuity and quality of existing draft genome assemblies. This is achieved through the integration of advanced NGS and 3GS sequencing data with functional annotation, using graph theory algorithms widely applied in various assemblers, such as de Bruijn graph and overlap-layout-consensus. The module also gives an insight into the details of omic-scale, big-data-driven life science making use of core platform technologies.

Syllabus
    • How research is conducted in genome bioinformatics and within the broader context of interdisciplinary life sciences. 
    • Raw sequencing reads quality control and pre-processing.
    • Short reads assembly.
    • Long reads assembly.
    • Hybrid assembly strategies.
    • Genome assembly super-scaffolding, quality assessment and functional annotation.
Intended learning outcomes

On successful completion of this module you should be able to:

  • Critically assess the technical limitations and the underlying biological and experimental assumptions that impact on data quality.
  • Apply and optimise various algorithms for short- and long-read sequence assembly, including the application of de Bruijn graph and overlap-layout-consensus approaches in genome assembly.
  • Successfully develop and optimise de-novo genome assemblies for various species.
  • Develop in-silico gene prediction models and functional annotation.

Data Integration and Interaction Networks

Module Leader
  • Dr Tomasz Kurowski
Aim

    Data integration represents a major challenge for bioinformatics research. This module covers the most popular data management, integration and visualisation tools within the bioinformatics community as well as the main concepts of database design and normalisation.

Syllabus
    • Database design and normalisation,
    • Development of database access interfaces,
    • Design and implementation of data repository Web front-ends,
    • Techniques to integrate, interpret, analyse and visualise biological data sets,
    • Introduction to interaction networks,
    • Data Integration and visualisation.
Intended learning outcomes

On successful completion of this module you should be able to:

  • Utilise systems software for the visualisation of systems and system interactions,
  • Critically apply available tools for data integration,
  • Design, normalise and implement databases for experimental datasets,
  • Critically assess the main data standards protocols for genomics, as well as the current approaches for modelling and warehousing of life science data,
  • Discover systems relationships between data using bioinformatics tools and approaches.
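
As a small illustration of database design and normalisation for an experimental dataset, here is a sketch using Python's built-in sqlite3 module. The table names, column names and values are all hypothetical, invented for this example. Gene details are stored once in their own table, and each measurement refers to a gene by its key rather than repeating the gene symbol.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Normalised design: one row per gene, one row per (gene, sample) measurement.
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE expression_level (
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
    sample  TEXT NOT NULL,
    value   REAL NOT NULL,
    PRIMARY KEY (gene_id, sample)
);
""")

# Hypothetical data for illustration only.
conn.executemany(
    "INSERT INTO gene (gene_id, symbol) VALUES (?, ?)",
    [(1, "BRCA1"), (2, "TP53")],
)
conn.executemany(
    "INSERT INTO expression_level (gene_id, sample, value) VALUES (?, ?, ?)",
    [(1, "s1", 5.0), (1, "s2", 7.0), (2, "s1", 3.0)],
)

# Join the two tables to report the mean value per gene.
rows = conn.execute("""
    SELECT g.symbol, AVG(e.value)
    FROM gene AS g
    JOIN expression_level AS e ON e.gene_id = g.gene_id
    GROUP BY g.symbol
    ORDER BY g.symbol
""").fetchall()

print(rows)  # [('BRCA1', 6.0), ('TP53', 3.0)]
conn.close()
```

Keeping gene details in one place means a corrected gene symbol needs updating in a single row, which is the practical benefit normalisation is meant to deliver.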

Teaching team

You will be taught by an expert multidisciplinary team both from Cranfield University and externally.

Cranfield lecturers include Professor Fady Mohareb, the Head of Bioinformatics and Dr Tomasz Kurowski, the Course Director for the Applied Bioinformatics MSc.

External lecturers include Professor Conrad Bessant - Professor of Bioinformatics, QMUL; Dr Enrico Ferrero - Scientific Leader at GSK; Dr Lee Larcombe - Applied Exomics Ltd Director; Dr Robert King - Bioinformatician at Rothamsted Research; and Dr Luca Bianco - Bioinformatics Research Scientist, Fondazione Edmund Mach.

Who is it for?

This course aims to equip graduate scientists with the computational skills and awareness needed to process, analyse and interpret the vast amounts of biological data now becoming available. It is equally suitable for candidates from life sciences disciplines who aim to gain programming and computational skills, and for graduates with an IT/computer science background who want to gain the molecular biology understanding needed to become bioinformaticians.

On completion of this course, you will be able to apply information technology and computational techniques to process genomic and genetic data, and to develop novel drug discovery and diagnostic tools.

Additionally, you will gain the skills to design and implement software tools and databases using the latest advances in standalone and web-based technologies to fulfil the needs of the research community.

How to apply

Click on the ‘Apply now’ button below to start your online application.

See our Application guide for information on our application process and entry requirements.