Image: sequencer, agar plates, samples and tablet. © Universität Bielefeld | Photo: A. Lauterbach

Research

Cloud Computing

The tremendous advances in high-throughput sequencing technologies in recent years have had a major impact on biological and medical research. They have also created major challenges for bioinformatics: the amount of sequence data is growing exponentially, and experiments are becoming more complex. New bioinformatics approaches are needed to process and interpret these ever-increasing amounts of data. The Computational Metagenomics research group focuses on metagenomics, while the Bielefeld Bioinformatics Server group is particularly concerned with providing analysis pipelines in public and private cloud computing environments.

Bioinformatics recognized the trend towards networking and the use of globally distributed resources at a very early stage. In parallel with the rapid development of the World Wide Web, tools and databases were put online, making them especially useful for biomedical research groups, even where local computing resources were not available. To offer the bioinformatics tools developed at Bielefeld University as sustainable services, the Bielefeld Bioinformatics Server (BiBiServ) was established in 1996. It currently offers more than 60 tools that have been developed in various working groups at Bielefeld University in recent years. To ensure long-term support for these tools, we have developed a framework that lets developers put their tools online with minimal effort, while also allowing us to keep the underlying web server infrastructure up to date. In the last 10 years alone, more than 1.5 million jobs have been processed on the BiBiServ, and users from more than 90 countries use BiBiServ tools.

Current developments in sequencing technologies call for far-reaching changes in bioinformatics infrastructures and analysis pipelines. Data volumes grow exponentially while sequencing costs keep dropping, which allows even small labs to generate huge data sets. Analyzing these data sets requires large compute and storage resources that are often not available locally. Furthermore, experiments can be analyzed better when different data sets are integrated, and these are usually distributed across multiple locations. Here, cloud computing opens up new opportunities. Analysis pipelines and data are "virtualized" in dynamically scalable resources. Providing reference data sets in cloud storage eliminates the need for long download times. Instead, tools and pipelines are moved to where the data sets reside.
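To give a rough sense of why long download times matter, the following minimal sketch estimates how long it would take to copy a data set over a network link. The function name, the data set sizes and the 1 Gbit/s bandwidth are illustrative assumptions, not measurements from our infrastructure, and the calculation ignores protocol overhead and shared links, so real transfers would take longer.

```python
def transfer_time_days(size_terabytes: float, bandwidth_gbit_per_s: float) -> float:
    """Idealized time in days to copy a data set over a network link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbit/s = 1e9 bits/s) and a
    fully saturated link with no protocol overhead.
    """
    if size_terabytes < 0 or bandwidth_gbit_per_s <= 0:
        raise ValueError("size must be non-negative and bandwidth positive")
    size_bits = size_terabytes * 1e12 * 8            # bytes -> bits
    seconds = size_bits / (bandwidth_gbit_per_s * 1e9)
    return seconds / 86_400                          # seconds per day


if __name__ == "__main__":
    # Hypothetical data set sizes, chosen only for illustration.
    for size_tb in (1, 50, 200):
        days = transfer_time_days(size_tb, bandwidth_gbit_per_s=1.0)
        print(f"{size_tb:>4} TB over 1 Gbit/s: {days:.1f} days")
```

Under these assumptions, a 200 TB data set would need more than two weeks of uninterrupted transfer, whereas a pipeline started next to the data avoids that copy entirely.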

We have developed a cloud computing framework that allows us to run virtually all BiBiServ tools in private and public cloud environments (AWS, Google and the de.NBI OpenStack cloud). Our BiBiGrid framework configures cloud computing resources as HPC clusters and provides easy access to cloud storage, where data from major research projects are increasingly being stored. For example, 1,700 genomes of the 1000 Genomes Project and the Human Microbiome Project data are publicly accessible in AWS S3.