Bioinformatics - a maturing discipline...

Levels of Bioinformatics

Bioinformatics has grown beyond the days of simply using bioinformatics tools.  Students in this major must synthesize knowledge from statistics, computer science and biology to produce robust research and results for biological questions.

Current graduate students in the BCB major are mentored by professors in nine different departments at Iowa State.  They bring knowledge of the specific tools used in their home departments to the table when interacting with other BCB students.  Using bioinformatics tools from computer science, statistics or biology is the starting point for BCB students; sharing knowledge of the tools used for particular research in their home departments then benefits all students in the major.

At a recent BCB symposium, Karin Dorman, BCB faculty member and former chair of BCB, presented the levels of bioinformatics:

  • Using web to analyze data: already part of modern biology
  • Install and run new programs: training in command-line usage
  • Writing scripts to analyze own data: Perl/Python for data processing tasks, R for statistical analysis, SQL for data storage.
  • High level coding to implement existing algorithms or modify existing ones to perform new jobs.
  • Thinking mathematically, developing own algorithms

As the BCB discipline matures, the BCB Graduate Program seeks to motivate students to reach for the highest rung of the ladder...

Moving up the ladder of Bioinformatics complexity

Depending on their home department and interests, a BCB student might have higher-level skills in statistics.  They might use that knowledge to create a statistical algorithm to analyze biological data in R, for instance.  The model or algorithm they select and develop will be based on their knowledge of the biological question being asked and the analysis needed to answer it.  Data from the analysis will need to be in an appropriate format, so the tools applied to it matter!

That same student, perhaps with help from another BCB student, would then take the R algorithm and optimize it using deeper knowledge of computer science, while still keeping in mind the biological question being asked and the kind of data being produced by the statistical model.  For instance, an analysis in R can be quite slow to run.  If the data could be put on the cloud and parallel processing applied to it, results would be produced more quickly, and with big data this matters.  However, parallelizing the analysis without knowledge of the biological question being asked can lead to inaccurate, incomplete or superfluous results.
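
The point about parallelizing with the biological question in mind can be sketched in a few lines.  The example below is a minimal illustration, not a method taken from the BCB program: it uses Python rather than R, made-up toy data, and hypothetical function names (group_by_gene, summarize, parallel_gene_means).  It assumes the question is "what is the mean expression of each gene?", so all measurements for a gene are grouped before the work is split.  Splitting the raw records arbitrarily would hand each worker only part of a gene's data and could give misleading per-gene results.  A thread pool keeps the sketch simple; for CPU-heavy work a process pool or a cluster scheduler would be the more likely choice.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy data: (gene, expression value) pairs in arbitrary order.
records = [
    ("geneA", 2.0), ("geneB", 10.0), ("geneA", 4.0),
    ("geneB", 14.0), ("geneC", 7.0), ("geneA", 6.0),
]


def group_by_gene(pairs):
    """Collect all values for each gene so a gene is never split across workers."""
    groups = defaultdict(list)
    for gene, value in pairs:
        groups[gene].append(value)
    return dict(groups)


def summarize(item):
    """Per-gene unit of work: return the gene name and its mean expression."""
    gene, values = item
    return gene, mean(values)


def parallel_gene_means(pairs, workers=2):
    """Compute per-gene means in parallel, submitting one task per gene."""
    groups = group_by_gene(pairs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(summarize, groups.items()))


gene_means = parallel_gene_means(records)
```

Here gene_means maps each gene to the mean of all of its values, and the result does not depend on how many workers are used, because the unit of parallel work matches the unit of the biological question.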

A BCB student with deeper biological knowledge might design an experiment so that its output can be analyzed with a particular bioinformatics tool.  That person might also be able to retrofit the tool so that it does a better job of handling the data and gives output that is even more useful for moving the research project ahead.

BCB Students Collaborate and Learn

The BCB interdepartmental graduate program brings together students who have expertise in computer science, statistics and biology.  The students gather regularly to talk about their research and to share the programs, tools and techniques they are using.  Their joint meetings and discussions, even in social contexts, allow them to share knowledge across the different emphases of their departmental research so that their bioinformatics solutions are as robust as possible.

The BCB graduate student must synthesize knowledge and information from all three areas for this degree program.  With BCB students collaborating with one another and with faculty mentors across departments, the BCB discipline has pushed much biological research forward.  Many exceptional publications have resulted, and these are the basis for continued success in grant awards.

BCB Coursework

Because students in the BCB discipline need a sound foundation in algorithms, statistics and biology, the coursework must be specific to bioinformatics to best convey the concepts they need to succeed in this discipline.  Here are the course descriptions for our BCB core courses:

BCB 567. Bioinformatics I (Fundamentals of Genome Informatics).

(Cross-listed with COM S, CPR E). (3-0) Cr. 3. F. Prereq: COM S 228; COM S 330; credit or enrollment in BIOL 315, STAT 430

Biology as an information science. A review of the algorithmic principles that are driving the advances in bioinformatics and computational biology. 

BCB 568. Bioinformatics II (Advanced Genome Informatics).

(Cross-listed with COM S, GDCB, STAT). (3-0) Cr. 3. S. Prereq: BCB 567, BIOL 315, STAT 430, credit or enrollment in GEN 409

Statistical models for sequence data, including applications in genome annotation, motif discovery, variant discovery, molecular phylogeny, gene expression analysis, and metagenomics.  Statistical topics include model building, inference, hypothesis testing, and simple experimental design, including for big data/complex models.

BCB 569. Bioinformatics III (Structural Genome Informatics).

(Cross-listed with BBMB, COM S, CPR E). (3-0) Cr. 3. F. Prereq: BBMB 316, BCB 567, GEN 409, STAT 430

Molecular structures including genes and gene products: protein, DNA and RNA structure. Structure determination methods, structural refinement, structure representation, comparison of structures, visualization, and modeling. Molecular and cellular structure from imaging. Analysis and prediction of protein secondary, tertiary, and higher order structure, disorder, protein-protein and protein-nucleic acid interactions, protein localization and function, bridging between molecular and cellular structures. Molecular evolution.

BCB 570. Bioinformatics IV (Computational Functional Genomics and Systems Biology).

(Cross-listed with COM S, CPR E, GDCB, STAT). (3-0) Cr. 3. S. Prereq: BCB 567 or COM S 311, COM S 228, GEN 409, STAT 430

Algorithmic and statistical approaches in computational functional genomics and systems biology. Analysis of high throughput biological data obtained using system-wide measurements. Topological analysis, module discovery, and comparative analysis of gene and protein networks. Modeling, analysis, and inference of transcriptional regulatory networks, protein-protein interaction networks, and metabolic networks. Dynamic systems and whole-cell models. Ontology-driven, network based, and probabilistic approaches to information integration.

Students are also required to take a graduate level course in molecular genetics, GDCB 511.  The BCB curriculum committee has adopted a more flexible policy with regard to this requirement if the student has equivalent background coursework.  A course can be transferred in or another course within the student’s area of emphasis can be substituted for it.

GDCB 511. Molecular Genetics. (Cross-listed with MCDB). (3-0) Cr. 3. S. Prereq: BIOL 313 and BBMB 405. The principles of molecular genetics: gene structure and function at the molecular level, including regulation of gene expression, genetic rearrangement, and the organization of genetic information in prokaryotes and eukaryotes.

Biggest Challenges ahead for BCB

At the BCB Symposium held in March 2015, Professor Dorman also presented the biggest challenges for BCB based on the following article:

J C Fuller, P Khoueiry, H Dinkel, K Forslund, A Stamatakis, J Barry, A Budd, T G Soldatos, K Linssen and A M Rajput (2013) Biggest challenges in bioinformatics. EMBO Reports. 14(4):302–304.

  • Data deluge. What can be discarded? What can’t?
  • Knowledge management. Standard formats, interfaces, greater visibility.
  • Predictive models. Can hypotheses exist before data? Gold standards, negative controls.
  • Personalized medicine.
  • What is a species? Conservation and global warming, population genetics + phylogenetics.
  • Tree of life. Orthology assignment, housekeeping genes vs. “orphan” genes, lateral gene transfer.

Dorman also presented on Beyond Bioinformatics and into the future:

  • Data integration. Multiple data types, conditions, sparse solutions.
  • Pipeline errors/uncertainty.
  • Microbiome/Metagenome. Dynamics, complexity, connection to other data.
  • Imaging Genetics. Tools to relate genes to complex datasets.
  • High-Dimensional Discrete Data.
  • Epigenetics. Methods for DNA methylation and 3C.
  • Evolutionary Dependence. Population genetics + phylogenetics, species/gene tree disparity, selection.
  • Multiple Testing Simultaneous Inference. The challenge of analyzing high throughput data in an exploratory fashion.

Welcome to the exciting world of Bioinformatics!