Iowa State University

Bioinformatics & Computational Biology

 

BCB and IGERT News


at Iowa State University

 

Past News

 

 

 

 

 

Pierre Baldi to speak at the BCB Seminar Series

 

Pierre Baldi, of the School of Information and Computer Science at the University of California, Irvine, will present on Wednesday, April 18, at 5:15 in 1414 Molecular Biology Building on the ISU campus. The seminar will be followed by a reception in the MBB Atrium. His presentation title and abstract follow:

 

Postgenomic Era and P4 Medicine: Integrative Systems Biology Approaches

Abstract: We will first provide a brief historical overview of genomic and P4 (Personalized, Predictive, Preventive, Participatory) medicine and some of the current computational challenges and opportunities. Then we will present some recent results derived in our group in three areas: (1) immunology; (2) drug discovery; and (3) gene regulation. In immunology, we will present a new high-throughput protein chip technology for mapping humoral immune responses and identifying antigens. In drug discovery, we will demonstrate how chemoinformatics and other computational methods can be developed and brought to bear on the problems of protein structure analysis and the identification of useful drug leads against tuberculosis. In gene regulation, we will show how Bayesian methods can be used to better assess evolutionary conservation and build the most accurate genome-wide maps of transcription factor binding sites. In turn, these maps can be used to infer gene regulatory circuits and build an expert system for molecular systems biology.


Paul Hohenlohe to speak at the BCB Seminar Series

 

Paul Hohenlohe, of the University of Idaho's Institute for Bioinformatics and Evolutionary Studies, will present on Thursday, March 22, at 5:15 in 1414 Molecular Biology Building on the ISU campus. The seminar will be followed by a reception in the MBB Atrium. His presentation title and abstract follow:

 

Population genomics for evolutionary and conservation biology in non-model organisms

 

Population genomic approaches, such as Restriction-site Associated DNA sequencing (RADseq), provide a fundamentally new perspective on a range of questions in evolutionary and conservation biology in non-model organisms.  RADseq uses Illumina sequencing to produce sequence data, in fragments up to several hundred bp, at tens of thousands of homologous loci across the genome of hundreds of individuals.  This allows rapid identification of large sets of SNP markers, estimation of phylogeographic relationships, association mapping of complex phenotypes, and genome scans for selection and local adaptation.  I will describe how we have used RAD sequencing in threespine stickleback to examine the population genomic architecture of rapid parallel adaptation, and in trout to identify large numbers of SNP markers for high-throughput genotyping.  These studies provide insight into experimental design and bioinformatic analysis of population genomic data in non-model organisms.  Considerations in experimental design for RADseq in non-model organisms include genome size, choice of restriction enzyme and sequence read length, trade-offs in number of markers and sequencing coverage, and availability of existing genomic resources.  Analysis of RADseq data follows two main paths, depending on the availability of a reference genome sequence.  In the absence of a reference genome, loci and alleles must be identified de novo; differentiation of duplicate, or paralogous, sequences is a critical challenge.  With a reference genome, analyses can take a truly genomic perspective by evaluating population genetic statistics as continuous distributions along the genome.  This reveals signatures of multiple evolutionary processes and the genomic architecture of adaptation at a remarkably fine scale in natural populations.

 

Martin Krzywinski to speak at the BCB Seminar Series

 

Martin Krzywinski will present a seminar for the BCB Seminar Series on Monday, February 27, at 5:15 in 1414 Molecular Biology Building, followed by a reception. Dr. Krzywinski is a scientist with the Bioinformatics Genome Sciences Centre in the BC Cancer Agency, Vancouver, Canada, and his research is in the area of visualization of biological data.

 

BCB Students organized the series around four major areas in Bioinformatics and Computational Biology: Computational Biomodeling and Systems Biology, Visualization of Biological Data, Comparative Genomics and Statistical Bioinformatics.

 

An overview seminar on the visualization of biological data was recently presented by Di Cook, a BCB faculty member from the Statistics Department. Biological datasets are increasing in size, complexity, and interconnectedness. New biological visualization systems are required for optimized comprehension of not only the finer details of the data, but also an overall picture of various trends and the topology of the data. Research in biological data visualization addresses this need through innovation in usability, data integration, standardization, and novel ways of organizing data.

 

Here is a title and abstract for the upcoming seminar by Dr. Krzywinski:

 

Behind Every Great Visualization is a Design Principle

 

Continual advancements in computational and laboratory methods in genomics continue to produce ever-growing data sets. The breakneck speed of progress is a common call-to-action in talk abstracts for development of new methods. It is easy to imagine the next best visualization just around the corner, but the real focus should be on whether we have maximized the potential of the last best visualization. After all, not all visualization is of data - we also need to communicate abstractions such as processes, concepts and relationships between entities.

 

What is a beleaguered scientist to do when faced with the task of corralling this information into a cohesive and interpretable graphical representation? What makes a successful visualization? How can we apply our knowledge about our brain's visual processing - of elements such as contrast, shapes and colors - to craft better figures? Can we inadvertently compromise our reader's ability to interpret our graphics and, if so, how can this be avoided?

 

When communicating visually, our goal should focus on delivering a core message legibly and clearly. Although our figures should be attractive, they must be effective. I will discuss the fundamental concepts of design, such as salience, consistency and representation, and demonstrate how they are used to guide the creation of effective visuals, drawing from examples in the genomics literature.

 

My purpose will be to help you think about visualization as a complex sentence built from the vocabulary of fundamental graphical shapes, with focus on simple and practical visual grammar.

 


 

Biographical Information: Martin Krzywinski started as a system administrator at Canada's Michael Smith Genome Sciences Centre [1] in 1999 and built its first computing and network infrastructure [2], applying his interests in computing to IT security [2], optimizing keyboard layouts [3] and visualization [4]. He later moved to research, using fingerprint mapping to identify rearrangements in cancer genomes. In an attempt to visualize structural variation seen in cancer, he created Circos [5], a common paradigm for displaying comparisons of genomes, and hive plots [6]. His information graphics have appeared in the New York Times, Wired and on the covers of books and scientific journals. Martin believes that successful visualizations must have both form and function, runs the Espresso Club at the GSC and applies his creative style to fashion and abstract photography [7], turning spam into poetry [8]. He is the former owner of the world's most popular rat [9].


1 www.bcgsc.ca
2 www.linuxjournal.com/article/6977
3 mkweb.bcgsc.ca/carpalx
4 mkweb.bcgsc.ca/schemaball
5 www.circos.ca (some information below)
6 www.hiveplot.com
7 www.lumondo.com
8 mkweb.bcgsc.ca/fun/eespammings
9 mkweb.bcgsc.ca/rat/images/raton3700/

 


 

Reinhard Laubenbacher spoke at the BCB Seminar Series

Reinhard Laubenbacher, professor of Mathematics at Virginia Tech and Director of Outreach and Education for the Virginia Bioinformatics Institute, presented Feb. 1 for the BCB Seminar Series. BCB Students organized the series around four major areas in Bioinformatics and Computational Biology: Computational Biomodeling and Systems Biology, Imaging and Visualization, Comparative Genomics and Statistical Bioinformatics.

 

An overview seminar on the area represented by each invited speaker will also take place. See the YouTube video of the recent overview seminar on Systems Biology presented by Jesse Walsh, a BCB student in the lab of Julie Dickerson.

 

Dr. Laubenbacher's presentation was entitled: Cancer systems biology

Abstract: Our understanding of cancer has been aided by a network centric view. The fundamental relevance of systems biology to the understanding and treatment of cancer is the insight that genes and proteins do not act in isolation, but rather as nodes in complex interactive networks that include multiple feedback mechanisms and redundancies.

The design of effective drugs to battle cancer will depend on the understanding of these networks and of the specific network alterations present in an individual tumor. And an understanding of characteristic changes in metabolic networks can lead to new prognostic and diagnostic methods. The complexity of these dynamic networks makes it difficult or impossible to study them without the aid of computer models based on mathematical analysis.

This talk will discuss systems biology and mathematical models as an approach to cancer biology by way of two case studies. One of these focuses on our research on intracellular iron metabolism and its relationship to breast cancer.

High Performance Computing Resources for BCB

Andrew Severin to serve in new Genome Informatics facility at ISU

 

With support from the NSF IGERT Training grant, BCB was recently able to purchase a node in a high performance computer on campus. The BCB lab has been working to make this and several other resources available to all BCB students: laptops (5 Macs, 9 Ubuntu, 8 PC, 2 Notebooks); a new BCB lab website with a forum for discussions on bioinformatics topics; and a cluster and server, with accounts available soon.

Also, with support from IGERT and in conjunction with the Office of Biotechnology, Andrew Severin has been hired as manager of Iowa State's new Genome Informatics Facility. Andrew will also be the mentor for the BCB lab. The new Genome Informatics Facility, administered by the Office of Biotechnology, will use high performance computing resources to help researchers analyze the vast amounts of data associated with high-throughput sequencing of living organisms.

A recent news release on ISU's site provides background on Andrew and describes the new facility: "Severin describes himself as an interdisciplinary scientist who works at the interface between the genetics and bioinformatics of animal, plant and microbial systems. He is looking forward to collaborating with researchers on experimental design, data analysis and the generation of text, tables and figures for publications.

As part of the Genome Informatics Facility, he also provides letters of support and reviews the data analysis pipelines in grant proposals prior to submission. "It's my job to make sense out of sequencing data. In my opinion, bioinformatics needs to be involved from the beginning," Severin said. "Understanding the biological assumptions and limitations of the experimental design, along with the assumptions made by the computer software, is critical for exploring and understanding the data generated from high through-put sequencing technology."

Severin has an extensive background in bioinformatics and data analysis. He earned a bachelor's degree in biotechnology (2002) with minors in microbiology and chemistry from North Dakota State University, Fargo. His doctoral degree in biophysics/biochemistry (2009) was awarded by Iowa State University. Before accepting the manager position, Severin was a postdoctoral research associate in agronomy at Iowa State where he worked with bioinformatics and data mining of next-generation sequencing data in soybeans."

As mentor for the BCB lab, Andrew has already provided excellent projects for both new and old students. The BCB Lab is a student-led group formed to help life science researchers at ISU create and apply computational and bioinformatics solutions to biological problems. In the process, students who take part in the BCB lab learn from the exchange of experience, knowledge, and resources with one another while making substantial contributions to on-going research efforts.

 


 

Li Xue takes top poster prize at recent ACM-BCB conference

Li Xue, a BCB student in Vasant Honavar's lab, took top honors for her poster based on the paper "Ranking Docked Models of Protein-Protein Complexes Using Predicted Partner-Specific Protein-Protein Interfaces: A Preliminary Study". Other authors on the paper included Rafael A. Jordan (3, 4); Yasser EL-Manzalawy (3, 5); Drena Dobbs (1, 2); and Vasant Honavar (1, 3), where the numbers refer to the affiliations listed below. The complete paper is here.

 

1 Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA

2 Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA

3 Department of Computer Science, Iowa State University, Ames, IA, 50011, USA

4 Department of Computer Science, Pontificia Universidad Javeriana, Cali, Colombia

5 Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt

 

ABSTRACT: Computational protein-protein docking is a valuable tool for determining the conformation of complexes formed by interacting proteins. Selecting near-native conformations from the large number of possible models generated by docking software presents a significant challenge in practice.

 

We introduce a novel method for ranking docked conformations based on the degree of overlap between the interface residues of a docked conformation formed by a pair of proteins with the set of predicted interface residues between them. Our approach relies on a method, called PS-HomPPI, for reliably predicting protein-protein interface residues by taking into account information derived from both interacting proteins. PS-HomPPI infers the residues of a query protein that are likely to interact with a partner protein based on known interface residues of the homo-interologs of the query-partner protein pair, i.e., pairs of interacting proteins that are homologous to the query protein and partner protein. Our results on Docking Benchmark 3.0 show that the quality of the ranking of docked conformations using our method is consistently superior to that produced using ClusPro cluster-size-based and energy-based criteria for 61 out of the 64 docking complexes for which PS-HomPPI produces interface predictions. An implementation of our method for ranking docked models is freely available at: http://einstein.cs.iastate.edu/DockRank/.
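The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes that overlap is measured as the Jaccard index between the interface residues of a docked model and the predicted interface residues; the abstract does not specify the exact overlap measure, and the function names, residue numbers and model names below are hypothetical. This is not the DockRank implementation.

```python
def interface_overlap(docked_residues, predicted_residues):
    """Jaccard overlap between two sets of interface residue indices.

    Assumption: the Jaccard index stands in for the overlap measure,
    which the abstract does not define.
    """
    docked = set(docked_residues)
    predicted = set(predicted_residues)
    union = docked | predicted
    if not union:
        return 0.0
    return len(docked & predicted) / len(union)


def rank_models(models, predicted_residues):
    """Return model names sorted from highest to lowest overlap."""
    return sorted(
        models,
        key=lambda name: interface_overlap(models[name], predicted_residues),
        reverse=True,
    )


# Hypothetical data: predicted interface residues for the query protein,
# and the interface residues observed in three docked conformations.
predicted_interface = {10, 11, 12, 45, 46}
docked_models = {
    "model_a": {10, 11, 12, 45},
    "model_b": {10, 80, 81},
    "model_c": {200, 201},
}

ranking = rank_models(docked_models, predicted_interface)
print(ranking)
```

In this toy example the conformation whose interface agrees most closely with the predicted interface is ranked first.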

 


Pan Du accepts new position with Genentech

 

Pan Du, a 2005 BCB graduate who worked with Julie Dickerson, new chair of BCB, has taken a position with Robert Gentleman's group at Genentech in San Francisco, CA. Pan previously held a position at Northwestern in Chicago.

 

Genentech Inc. is a biotechnology corporation, founded in 1976 by the late venture capitalist Robert A. Swanson and biochemist Dr. Herbert Boyer. It is considered to have founded the biotechnology industry.

 

Dr. Gentleman, a recipient of the 2008 Benjamin Franklin Award, is the Senior Director of Bioinformatics and Computational Biology at Genentech. His team is working on a number of different projects, mostly centered around the use of high throughput sequencing to advance the knowledge of many basic biological mechanisms.

 

Michael Lawrence, another BCB alum, has worked with Gentleman for a number of years. His mentor is Di Cook, BCB faculty member in the Statistics Department.

 

Of particular interest in the Gentleman lab is the development of methods for understanding transcriptional regulation through the use of careful experimentation and ChIP-seq data. His group is also interested in helping to develop a better understanding of the role that transposable elements play in human disease.

 

Genentech is a research-driven corporation and employs researchers, scientists and postdocs to cover a wide range of scientific activity, from molecular biology to protein chemistry to bioinformatics and physiology. Genentech scientists in these various areas of expertise currently focus their efforts on five disease categories: Oncology, Immunology, Tissue Growth and Repair, Neuroscience and Infectious Disease.

 

Genentech uses genetic engineering techniques and advanced technologies to develop medicines that address significant unmet needs and provide clinical benefits to millions of patients worldwide. In June 2011, Computerworld ranked Genentech #4 in the large company category on its 18th annual "100 Best Places to Work in IT" list.

 

BCB at ISU congratulates Pan Du on his new position.

 


 

Matthew Studham has joined the Stockholm Bioinformatics Center as a Postdoctoral Fellow

 

Matthew Studham, a 2010 BCB graduate who worked with Gustavo Macintosh, BBMB, has joined the Stockholm Bioinformatics Center in Sweden as a Postdoctoral Research Associate. He is working in the lab of Professor Erik Sonnhammer, director of the center.

 

Dr. Sonnhammer's lab is mainly involved in research and development of bioinformatics methods for protein function prediction, and applying them to genomics research. The group is focusing on new algorithms for ortholog identification, network inference, and domain architecture analysis, as well as graphical tools for sequence and network analysis. In his lab, systems biology is enabled by understanding the function of each gene and protein in an organism in terms of biochemical function and interaction partners.

 

Stockholm Bioinformatics Centre (SBC) is the largest bioinformatics centre in Sweden. SBC was started in January 2000 in response to an initiative from the Swedish Foundation for Strategic Research (SSF), as a collaboration between Stockholm University (SU), the Royal Institute of Technology (KTH) and Karolinska Institutet (KI).

 

 

It hosts five large research groups (Elofsson, Lagergren, Lindahl, Sonnhammer, and von Heijne), with approximately 30 bioinformatics researchers altogether. http://www.sbc.su.se/

 

SBC provides a "critical mass" of internationally competitive bioinformatics research and methods development, and a strong environment for the training of postdoctoral, Ph.D., and diploma students in bioinformatics. It is also an important partner in collaborative projects both within Sweden and internationally. SBC strives to improve bioinformatics research and training in Sweden and to maintain Sweden's position as a leading European country in the bioinformatics arena.

 

Current research at SBC focuses on Comparative genomics; Membrane protein modeling; Computational systems biology; Protein structure and function prediction; Phylogeny; and Orthology analysis.

 


 

Congratulations to our Graduates!

 

Xiao Yang - to Broad Institute

 

Xiao worked with Srinivas Aluru, a BCB faculty member in the Electrical and Computer Engineering Department. He graduated in Summer, 2011.

 

Xiao received a Research Excellence Award from Iowa State and a Cornette Fellowship Award from the BCB Program for his research efforts. His dissertation is entitled, "Error Correction and Clustering Algorithms for Next Generation Sequencing".

 

He accepted a position with the Eli and Edythe L. Broad Institute of Harvard and MIT (popularly known as The Broad Institute).

 

 

Dissertation Abstract: Next generation sequencing (NGS) has revolutionized genomic data generation by enabling high-throughput parallel sequencing. This makes it possible to sequence new genomes or re-sequence individual genomes at a fraction of the cost and in an order of magnitude less time than traditional Sanger sequencing.

Using NGS technologies, ambitious genomic sequencing projects target many organisms rather than a few, and large scale studies of sequence variation become feasible. Because of this revolution, data analysis methodologies are changing, as exemplified by different applications: de Bruijn or string graph based approaches are replacing the traditional overlap-layout-consensus paradigm in genome assembly, computational pipelines that locate and count short reads per gene location on the reference genome are replacing microarrays in gene expression analysis, and so on.

In this context, efficient analysis for large scale datasets is one of the most challenging problems. In this thesis work, we design efficient algorithms to improve the read quality for next generation sequencing and explore the emerging cloud computing techniques to cluster a large amount of metagenomic reads. First, we develop an efficient algorithm that uses a flexible read decomposition method to improve accuracy of error correction, and demonstrate its applicability using standard runs of Illumina sequencing. We further propose a statistical framework to differentiate infrequently observed subreads from sequencing errors when genomic repeats are prevalent. To differentiate between valid and invalid substrings based on their genomic frequency, we propose a statistical approach to estimate a frequency related threshold based on the dataset under study. Lastly, we formalize the task to quantify microbial organisms in environmental samples as a sequence clustering problem and develop a parallel solution integrating sketching, quasi-clique enumeration and MapReduce techniques. The implementation is carried out using Hadoop -- a MapReduce framework for cloud computing.
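As a rough illustration of frequency-based error detection, the sketch below counts fixed-length substrings (k-mers) across a set of reads and flags those seen fewer times than a cutoff. It assumes a fixed, hand-picked cutoff purely for illustration; the dissertation instead estimates a dataset-specific threshold statistically, which is not reproduced here. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rare_kmers(counts, threshold):
    """Return k-mers observed fewer than `threshold` times.

    Assumption: a fixed cutoff stands in for the statistically
    estimated threshold described in the abstract.
    """
    return {kmer for kmer, n in counts.items() if n < threshold}


# Toy reads: three identical reads and one with a single substitution.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
counts = count_kmers(reads, k=4)
suspect = rare_kmers(counts, threshold=2)
print(sorted(suspect))
```

With real data, a fixed cutoff confuses low-coverage genuine sequence with errors and is misled by genomic repeats, which is the problem the statistical framework in the abstract is meant to address.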

 


 

Wengang Zhou - to Pioneer Hi-Bred

 

Wengang Zhou worked in the lab of Julie Dickerson, a BCB faculty member in the Electrical and Computer Engineering Department.

 

His dissertation is entitled, "Machine learning methods for omics data integration".

 

Wengang has joined Pioneer Hi-Bred in Johnston, IA as a bioinformatics scientist.

 

Abstract: High-throughput technologies produce genome-scale transcriptomic and metabolomic (omics) datasets that allow for the system-level studies of complex biological processes. The limitation lies in the small number of samples versus the larger number of features represented in these datasets. Machine learning methods can help integrate these large-scale omics datasets and identify key features from each dataset.

 

A novel class dependent feature selection method integrates the F statistic, maximum relevance binary particle swarm optimization (MRBPSO), and class dependent multi-category classification (CDMC) system. A set of highly differentially expressed genes are pre-selected using the F statistic as a filter for each dataset. MRBPSO and CDMC function as a wrapper to select desirable feature subsets for each class and classify the samples using those chosen class-dependent feature subsets. The results indicate that the class-dependent approaches can effectively identify unique biomarkers for each cancer type and improve classification accuracy compared to class independent feature selection methods. The integration of transcriptomics and metabolomics data is based on a classification framework. Compared to principal component analysis and non-negative matrix factorization based integration approaches, our proposed method achieves 20-30% higher prediction accuracies on Arabidopsis tissue development data. Metabolite-predictive genes and gene-predictive metabolites are selected from transcriptomic and metabolomic data respectively. Gene-metabolite correlation network can infer the functions of unknown genes and metabolites. Tissue specific genes and metabolites are identified by the class dependent feature selection method. Evidence from subcellular locations, gene ontology, and biochemical pathways supports the involvement of these entities in different developmental stages and tissues in Arabidopsis.
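The filtering step described above can be sketched as follows. This is a minimal illustration of ranking genes by the one-way ANOVA F statistic and keeping the top few; it does not cover the MRBPSO or CDMC wrapper stages. The gene names, expression values and function names are hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_genes(expression, n_keep):
    """Keep the n_keep genes with the largest F statistic."""
    ranked = sorted(expression, key=lambda g: f_statistic(expression[g]), reverse=True)
    return ranked[:n_keep]


# Hypothetical expression values: one list per class for each gene.
expression = {
    "gene_1": [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]],
    "gene_2": [[1.0, 2.0, 3.0], [2.0, 3.0, 1.0]],
}

selected = top_genes(expression, n_keep=1)
print(selected)
```

A gene whose class means differ strongly relative to the within-class spread receives a large F value and survives the filter.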

 

 


 

Fadi Towfic - to Broad Institute

 

Fadi worked in the lab of Vasant Honavar, a BCB faculty member in the Computer Science Department at Iowa State, and graduated in Spring, 2011.

 

He received a Research Excellence Award from Iowa State. His dissertation is entitled, "Modular Algorithms for Biomolecular Network Alignment".

 

Upon his graduation, he joined the Broad Institute in Boston, MA.

 

Dissertation Abstract: Comparative analyses of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offer a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. The rapidly advancing field of systems biology aims to understand the structure, function, dynamics, and evolution of complex biological systems in terms of the underlying networks of interactions among the large number of molecular participants involved, including genes, proteins, and metabolites. In particular, the comparative analysis of network models representing biomolecular interactions in different species or tissues offers a powerful means of identifying conserved modules, predicting functions of specific genes or proteins and studying the evolution of biological processes, among other applications. The primary focus of this talk is on the biomolecular network alignment problem: given two or more networks, the problem is to optimally match the nodes and links in one network with the nodes and links of the other. We describe a suite of modular, extensible, and efficient algorithms for aligning biomolecular network models including (1) undirected graphs in their weighted and unweighted variations and (2) undirected graphs in their labeled and unlabeled variants. The resulting algorithms have been implemented as part of the Biomolecular Network Alignment (BiNA) Toolkit (http://www.cs.iastate.edu/~ftowfic/twiki/bin/view/Projects/BinaToolkit), an open source, user-friendly suite of software for comparative analysis of networks. Our experiments show that BiNA is (i) competitive with the state-of-the-art network alignment tools with respect to the quality of alignments (based on a variety of performance measures) and (ii) able to align large networks ranging in size from a few hundred nodes and a few thousand edges to tens of thousands of nodes with millions of edges.
We will describe several applications of BiNA including (1) construction of phylogenetic trees based on protein-protein interaction networks, and (2) identification of biochemical pathways involved in ligand recognition in B cells by aligning gene co-expression networks constructed from mRNA profiles of B cells exposed to different ligands.

 


 

Saras Saraswathi - to Ohio State University

 

Saras worked in the lab of Robert Jernigan, a BCB faculty member in the Biochemistry, Biophysics and Molecular Biology Department.

 

Her dissertation was entitled, "Predicting Protein Secondary Structures by Machine Learning Approaches".

 

She will be a postdoc with Dr. Andrzej Kloczkowski at the Nationwide Children's Hospital, Battelle Center for Mathematical Medicine, Ohio State University.

 

Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computed structures and identify the functions of these structures. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, for a better basic understanding of aberrant states of stress and disease, including drug discovery and discovery of biomarkers. Several aspects of secondary structure predictions and other protein structure-related predictions are investigated using different types of information such as data obtained from knowledge-based potentials derived from amino acids in protein structures, physicochemical properties of amino acids and propensities of amino acids to appear at the ends of secondary structures. Investigating the performance of these secondary structure predictions by type of amino acid highlights some interesting aspects relating to the influences of the individual amino acid types on formation of secondary structures and points toward ways to make further gains. Protein secondary structures and other features of proteins are predicted efficiently, reliably, less expensively and more accurately.

 

A novel method called Fast Learning Optimized Predictor (FLOPRED) is proposed for predicting protein secondary structures and other structural features, using knowledge-based potentials, a Neural Network based Extreme Learning Machine (ELM) and advanced Particle Swarm Optimization (PSO) techniques that yield better and faster convergence to produce more accurate results. These techniques yield superior classification of secondary structures, with a training accuracy of 93.3% and a testing accuracy of 92.2% with a standard deviation of 0.5% obtained for a small group of 84 proteins. We obtain Matthews correlation coefficients ranging between 80.6% and 84.3% for these secondary structures. Accuracies for individual amino acids range between 83% and 92% with an average standard deviation between 0.3% and 2.9% for the 20 amino acids. On a larger set of 415 proteins, we obtain a testing accuracy of 86.5% with a standard deviation of 1.4%. These results are significantly higher than those found in the literature.
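For readers unfamiliar with the Matthews correlation coefficient reported above, the sketch below computes it for one secondary structure class at a time, treating that class as positive and all others as negative. The one-vs-rest treatment is an assumption, since the abstract does not say how the per-class values were derived, and the label strings are made-up examples rather than FLOPRED output.

```python
import math


def matthews_corrcoef(true_labels, predicted_labels, positive):
    """Matthews correlation coefficient for one class versus the rest."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical labels: H = helix, E = strand, C = coil.
true_ss = list("HHHHEEEECCCC")
pred_ss = list("HHHCEEEECCCH")

for state in "HEC":
    print(state, matthews_corrcoef(true_ss, pred_ss, state))
```

Unlike plain accuracy, the coefficient accounts for all four cells of the confusion matrix, so it stays informative when the classes are unevenly represented.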

 

Prediction of protein secondary structure based on an amino acid sequence is a common start for predicting its 3-D structure. Additional information such as the biophysical properties of the amino acids can help improve the results of secondary structure prediction. A database of protein physicochemical properties is used as features to encode protein sequences and this data is used for secondary structure prediction using FLOPRED. Preliminary studies using a Genetic Algorithm (GA) for feature selection, Principal Component Analysis (PCA) for feature reduction and FLOPRED for classification give promising results. Some amino acids appear more often at the ends of secondary structures than others. A preliminary study has indicated that secondary structure accuracy can be improved as much as 6% by including these effects for those residues present at the ends of alpha-helices, beta-strands and coils.

 

In summary, an improved and efficient algorithm called FLOPRED, which is based on Neural Networks and Particle Swarm Optimization, is used for classifying and predicting secondary structures from protein sequences. Analyses of the results of these studies provide new and interesting insights into the influence of amino acids on secondary structures. FLOPRED yields higher classification accuracy and better generalization performance compared to previous methods.

 


 

Hong Lu - to Ambry Genetics, Aliso Viejo, CA

 

Hong worked in the lab of Dr. Volker Brendel. His co-major professor was Dr. Roger Wise.

 

His dissertation was entitled, "Comparative genomics and its application to genome-wide cis-regulatory element detection".

 

He has begun his work with Ambry Genetics in Aliso Viejo, CA as a bioinformatics scientist.

 

Abstract: In the wake of advanced DNA sequencing technology, a large number of bacterial, animal, and plant genomes have now been completely sequenced and deposited into public databases. The acceleration of genome (and transcriptome) sequence data accumulation remains unabated and poses considerable challenges for data storage, access, and transfer, with even greater challenges for comprehensive data mining to turn the genome information into knowledge.

 

In particular, detailed genome annotation with respect to the encoded genes and their regulation is still largely confined to a few model species. Thus, important current research problems revolve around automated genome annotation and the related question of how widely applicable insights from the model species are with respect to novel genomes. For example, what features of genome organization are conserved across species? What differences in gene repertoire correlate with clade-specific traits or related species? To what extent are elements of transcriptional regulation shared?

 

In this talk, I will discuss comparative genomics approaches to these questions. First, I will describe genome features of eight recently sequenced plant species, with standards for comparison provided by the well-established model species for dicots (Arabidopsis) and monocots (rice). Second, I will discuss software and statistical models for exploring possible cis-regulatory elements of co-regulated genes, with particular application to mapping gene expression data for a species with incomplete genome data to close model genomes.

 

 


 

BCB Faculty News

 

 

ISU plant pathologist, Adam Bogdanove, updates science community on groundbreaking research

 

Adam Bogdanove, a BCB faculty member in the Plant Pathology Department, was recently featured in this Iowa State news article by Dan Kuester (News Service, 515-294-0704, kuester@iastate.edu), with photo by Bob Elbert.

 

AMES, Iowa - In the two years since Iowa State University's Adam Bogdanove, along with student Matthew Moscou, published their groundbreaking gene research in the cover story of the journal Science, researchers around the world have built on those findings to explore further breakthroughs. Moscou is an alum of the Bioinformatics and Computational Biology Graduate Program at ISU.

 

Science has published another article by Bogdanove in the Sept. 30 issue that updates the scientific community on where the research has been since 2009 and where it is heading.


"In the past two years, an extraordinary number of things have happened in this field," said Bogdanove, a professor of plant pathology. "This is really pretty revolutionary."


Bogdanove's research published in 2009 uncovered how so-called TAL (Transcription Activator-like) effector proteins bind to different DNA locations, and how particular amino acids in each protein determine those locations -- called binding sites -- in a very straightforward way.


Knowing this, scientists are using the proteins to target and manipulate specific genes, something that was much more difficult to accomplish prior to this research.


That could lead to breakthroughs in understanding gene function and improving traits in livestock and plants, and even treating human genetic disorders, according to Bogdanove.


Bogdanove says in the two years since his and Moscou's work was published, nearly two dozen research papers have been published using this discovery.


"We are so excited about the potential of these proteins. Just in the past six months they have been used successfully in model organisms such as yeast, zebrafish, and C. elegans (a type of worm used to study development), and even in human stem cells. There is some really innovative stuff going on," he said. Model organisms are used to understand particular biological functions.


Bogdanove collaborated on this Science article with Dan Voytas, a former member of the Iowa State University faculty and now director of the Center for Genome Engineering at the University of Minnesota.


Bogdanove cautions in the article that the power of the technologies based on TAL effectors raises legal, sociological and ethical questions about how their use should be regulated, but says that it may be just a matter of a few years before these proteins see real application in areas such as crop improvement and human medicine.

 

Recent ISU Award recipients:

 

  • Stephen J. Willson, Professor of Mathematics, was named University Professor and to the Janson Professorship in Mathematics
  • Alicia L. Carriquiry was named Distinguished Professor, College of Liberal Arts and Sciences, and Professor of Statistics
  • James Reecy, professor of animal science, received the Mid-Career Achievement in Research Award in the College of Agriculture and Life Sciences. He began his career at Iowa State in 1999 and has an outstanding reputation, both nationally and internationally, as a molecular geneticist with an emphasis on beef cattle.
  • Christopher Tuggle, professor of animal science, is the recipient of the CALS Outstanding Achievement in Research Award. He began his career at Iowa State in 1991 and leads an award-winning research program in animal genomics and bioinformatics. His research centers on the molecular factors that influence important biological traits in pigs.

 

These CALS faculty received promotions in 2011:

 

  • Adam Bogdanove, Plant Pathology, promotion to Professor (Already Tenured) and
  • Lyric Bartholomay, Entomology, promotion to Associate Professor with Tenure.

 

Awards from beyond Iowa State University

 

 

Max Rothschild named Jefferson Science Fellow

 

Max Rothschild, professor of animal science and a BCB faculty member, has been named a Jefferson Science Fellow.

 

The role of the Jefferson Science Fellows is to advise and educate. They help increase understanding among policy officials of complex, cutting-edge scientific issues and their possible impacts on U.S. foreign policy and international relations.

 

They advise policymakers on the policy options available for addressing emerging international scientific issues.

 

 

The program, established in 2003, has been a model for engaging the American academic science, technology and engineering communities in formulating and implementing U.S. foreign policy. Rothschild has been an Iowa State faculty member for nearly 31 years. He is internationally recognized for his expertise in swine genetics and adapting food-animal production to feed a growing population and to enhance economic development on both local and global scales.

 

Vasant Honavar Named Fellow of International Society of Intelligent Biological Medicine

 

Dr. Vasant Honavar, Professor of Computer Science and Computational Biology and Bioinformatics, has been named a Fellow of the International Society of Intelligent Biological Medicine (ISIBM). In the fields of bioinformatics, computational biology, and bioengineering, scientists regularly approach important biological and medical problems by using a common set of intelligent methods, such as computational data mining, evolutionary computation, pattern recognition, knowledge representation, databases, combinatorics, stochastic modeling, linguistic methods, robotics, string and graph algorithms, constraint satisfaction, and parallel computation. With such a plethora of intelligent methods, it is essential to provide a platform to bring these scientists together to discuss scientific issues in this fast growing field. ISIBM aims to fill this need. The mission of ISIBM is to facilitate the multidisciplinary development of intelligent methods and to empower creative scientists in these fields to solve biological and medical problems.

Vasant Honavar Appointed to Board of Directors of ACM SIG on Bioinformatics & Computational Biology

Dr. Vasant Honavar, Professor of Computer Science and Computational Biology and Bioinformatics, has been appointed to the board of directors for the ACM SIG on Bioinformatics and Computational Biology.

The ACM Special Interest Group on Bioinformatics, Computational Biology, and Biomedical Informatics (SIGBioinformatics) was instituted in 2010 with the aim of focusing on research on bioinformatics data management topics, roughly covered by the so-called biological and biomedical data, knowledge, and information management. The focus of SIGBioinformatics is to bridge computer science, mathematics, and statistics, with biology and biomedicine, sharing research interests in the management of data related to life sciences. The mission of ACM SIGBioinformatics is to support advanced research, training, and outreach in Bioinformatics, Computational Biology, and Biomedical Informatics by stimulating interactions among researchers, educators and practitioners from related multi-disciplinary fields.

 

 


 

Science Article for Anne Bronikowski, BCB faculty member in the EEOB Department

 

Anne Bronikowski, associate professor in the Ecology, Evolution and Organismal Biology Department and a BCB faculty member, took part in a three-year study of early adult mortality and the speed of aging across many species. Current research highlights that both factors are important for human longevity.

Here is an overview of the research.

For complete details visit:

http://www.sciencemag.org/content/331/6022/1325.full

 

Another overview of this project was featured in a summer 2011 issue (Volume 3, No. 2) of the NESCent newsletter. NESCent, the National Evolutionary Synthesis Center, is an NSF-funded collaborative research center operated by Duke University, the University of North Carolina at Chapel Hill, and North Carolina State University. Find NESCent here: www.nescent.org.

 

Humans aren’t the only ones who grow old gracefully, says a new study of primate aging patterns. For a long time it was thought that humans, with our relatively long life spans and access to modern medicine, aged more slowly than other animals. But now, the first-ever multispecies comparison of human aging patterns with those in chimps, gorillas, and other primates suggests the pace of human aging may not be so unique after all.

 

"If we were like other mammals, we would start dying fairly rapidly after we reach midlife. But we don’t." --Anne Bronikowski

 

The findings appeared in the March 11 issue of Science. You don't need to read obituaries or sell life insurance to know that death and disease become more common as we transition from middle age to old age. But scientists studying creatures from mice to fruit flies long assumed the aging clock ticked more slowly for humans.

 

We had good reason to think human aging was unique, said co-author Anne Bronikowski of Iowa State University. For one, humans live longer than many other animals. "Humans live for many more years past their reproductive prime," Bronikowski said. "If we were like other mammals, we would start dying fairly rapidly after we reach midlife. But we don’t."

 

"Scientists have argued for a long time that human aging was unique, but we didn’t have data on aging in wild primates besides chimps until recently," said co-author Susan Alberts, associate director at NESCent and a biologist at Duke University. The researchers combined data from long-term studies of seven species of wild primates: capuchin monkeys from Costa Rica, muriqui monkeys from Brazil, baboons and blue monkeys from Kenya, chimpanzees from Tanzania, gorillas from Rwanda, and sifaka lemurs from Madagascar.

 

The study included data from several famous long-term studies of primates in the wild, including the mountain gorilla study started by Dian Fossey.

 

The team focused not on the inevitable decline in health or fertility that comes with advancing age, but rather on the risk of dying. When they compared human aging rates (measured as the rate at which mortality risk increases with age) to similar data for nearly 3,000 individual monkeys, apes and lemurs, the human data fell neatly within the primate continuum.

 

"Human patterns are not strikingly different, even though wild primates experience sources of mortality from which humans may be protected," the authors wrote in a letter to Science. The results also confirm a pattern observed in humans and elsewhere in the animal kingdom: as males age, they tend to die sooner than their female counterparts.

 

In primates, the mortality gap between males and females is narrowest for the species with the least amount of male-male aggression — a monkey called the muriqui, said co-author Karen Strier of the University of Wisconsin, who has studied muriquis since 1982. The results suggest the reason why males of other species die faster than females may be the stress and strain of competition, the authors said.

 

Do the findings have any practical implications for humans? Modern medicine is helping humans live longer than ever before, the researchers note. "Yet we still don’t know what governs maximum life span," Alberts said. "Some human studies suggest we might be able to live a lot longer than we do now. Looking to other primates to understand where we are and aren't flexible in our aging will help answer that question."

CITATION: Bronikowski, A., J. Altmann, et al. (2011). "Aging in the natural world: comparative data reveal similar mortality patterns across primates." Science 331(6022). Data available in the Dryad Digital Repository at http://dx.doi.org/10.5061/dryad.8682.

 


 

External Recognition for BCB Students

 

Cambridge, UK - Microsoft Research Internship - Olga Nikolova (Mentor: Srinivas Aluru) will be mentored by Christopher Bishop and John Winn on a project which combines medical data with previously done genetic / genomics / biostatistics / Bayesian networks analysis for the first time, in collaboration with the Sanger Institute, a children's hospital, and University of Cambridge.

 

Lindau Nobel Laureate Meeting, June 26 - July 1, 2011 - Scott Boyken (Mentor: Amy Andreotti) was selected to attend. About 20 Nobel Laureates in Physiology or Medicine and 550 young researchers from around the world will meet at Lindau (Germany) to exchange ideas, discuss projects and build international networks.

 

Three BCB students chosen for national meeting - Rasna Walia (Mentor: Vasant Honavar), Sweta Vangaveti (Mentor: Alex Travesset) and Priyanka Surana (Mentor: Roger Wise) were selected to participate in the Research Grad Cohort program, a workshop April 1-2 in Boston, sponsored by the Computing Research Association's Committee on the Status of Women in Computing. The Grad Cohort aims to increase the ranks of senior women in computing by building and mentoring nationwide communities of women through their graduate studies.

 


 

 

BCB Graduates in Fall 2010 ...

 

Xiaoyong Sun
Ph.D. Candidate
Bioinformatics and Computational Biology
Department of Statistics

 

Diagnostics for nonlinear models with application to population pharmacokinetic modeling

Major Professors: Dr. Di Cook and Dr. Basil Nikolau

Position: Xiaoyong has joined VGTI – Vaccine and Gene Therapy Inst., in Port St. Lucie, FL, as a Bioinformatician.

Abstract: Biological problems often involve fitting nonlinear models to data. In pharmacokinetics, analysts study a subject's response to drug doses, which will typically follow a quick increase in concentration as the drug circulates through the body, and a gradual nonlinear decrease as it is processed and eliminated. These models are diagnosed with the help of the experimental data.

 

Specialist software exists for pharmacokinetic modeling: NONMEM, Monolix. General modeling software, such as PROC NLMIXED in SAS and the package nlme in S/R, can also be used. A common problem is that these tools lack adequate diagnostics to assess the model fit. The Food and Drug Administration is encouraging the development of new approaches to model diagnosis.

 

This thesis addresses this gap, with the following contributions: 1) Interactive graphics are applied to model building, including the exploratory data analysis, goodness of fit, model validation and model comparison. This is a new addition to the practice of PopPK modeling. It provides a more systematic evaluation of these complicated models. 2) New visual methods have been developed to examine resampling statistics for PopPK modeling. Resampling statistics arise when multiple models are fit. The parameter estimates and fit diagnostics are extracted and results are visualized to diagnose PopPK models. Our new visual methods are developed from existing multivariate methods. 3) Preliminary work explores the effects of correlation between covariates on covariate selection in PopPK model building. Three algorithms for identifying the best covariates are compared. 4) To help users utilize the methods developed in this thesis for PopPK model diagnostics, I developed two R packages, PKgraph and PKreport. The PKgraph source code is distributed through http://cran.r-project.org/web/packages/PKgraph/index.html. PKreport is currently available at http://pkreport.sourceforge.net/.

 


 

Misha Rajaram
Ph.D. Candidate
Bioinformatics and Computational Biology
Department of Statistics

Detecting recombination and its mechanistic association with genomic features via statistical models

Major Professors: Dr. Karin Dorman and Dr. Dennis Lavrov

Position: Misha is a Postdoctoral Research Fellow at UCSF, in Prescott Woodruff’s lab.

Abstract: Recombination, in retroviruses like HIV, results in the production of chimeric genomic molecules or recombinants. Notable variation in virulence, cell tropism, sensitivity of detection assays and drug mutation profiles within known non-recombinant HIV genomes is further augmented by the spread of inter-specific recombinants formed from two or more distinct genetic variants. The current thesis discusses a novel application of machine learning algorithms to genotyping complex HIV recombinants, as well as a statistical model for detecting the presence of recombination and its association with sequence features via Gaussian Markov Random Field (GMRF) priors.

 

Machine learning algorithms, specifically Bayesian Additive Regression Trees (BART), were used to build an ensemble classifier to serve as a rapid genotyper for HIV sequences. A novel method for generating artificial training data when faced with a paucity of real data is also described. Supplemented with artificial data, the genotyper classifies HIV sequences with 99% accuracy, with especially high levels of success on complex recombinants.

 

GMRF priors were used in a hierarchical statistical model to efficiently combine information from many recombination events inferred from analyses of individual sequences. This provides valuable insights into the global spatial variation of recombination rate. This model was extended to be able to simultaneously infer covariates of interest. We report a hotspot in the pol gene as well as another in the nef gene inferred from our dataset. We also found genomic covariates promoting secondary structure formation to have a significant positive effect on recombination rates.

 


 

John Van Hemert
Ph.D. Candidate
Bioinformatics and Computational Biology
Electrical and Computer Engineering Department

 

Title: Methods for integrated biochemical pathway analysis

 

Major Professors: Dr. Julie Dickerson and Dr. Basil Nikolau

Abstract: A common goal of biological research is to develop models for the biological processes we seek to understand. Such models, in the form of biochemical pathway networks which describe the physical interactions between a living cell's genes, transcripts, proteins, and metabolites ("Omics"), accumulate in different repositories for several model organisms as well as non-model organisms.

 

This thesis presents a set of integrated statistical bioinformatics tools that address key problems in integrating large-scale Omics datasets with pathway network models. A hardware accelerated non-parametric Omics mining method (Monte Carlo on the GPU) allows faster screening of custom test statistics and functions. A software platform for mining pathway databases (PathwayAccess) confers knowledge integration and comparison. Omics and pathway mining are combined for a novel method for statistically discriminating functionally meaningful subnetworks for their interaction with lists of entities mined from Omics data, so that software can intelligently mine large and complex pathway databases to answer a wide variety of questions and generate hypotheses (Discriminating Omics Response Groups in Pathways).
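The idea of Monte Carlo screening with a custom test statistic can be sketched as a two-sample permutation test. This is a plain CPU illustration in Python, not the thesis's GPU implementation; the function names, the mean-difference statistic, and the toy expression values are all assumptions made for the example.

```python
import random


def mean_difference(a, b):
    """Example custom statistic: difference between group means."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_p_value(group_a, group_b, statistic, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for any statistic(a, b)."""
    rng = random.Random(seed)
    observed = abs(statistic(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        if abs(statistic(pooled[:n_a], pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical expression values for one gene under two conditions.
treated = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
print(permutation_p_value(treated, control, mean_difference))
```

Because every permutation is evaluated independently, the loop parallelizes naturally, which is what makes this kind of test a good fit for hardware acceleration.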

 

The method, called PathwayFlow, can discriminate pathways, reactions, metabolite classes, or any other biological entity grouping (Response Groups), and automatically accounts for connectivity-caused biases in the pathway network. It also differentiates between regulators (or inputs) and regulatees (or outputs) for a given Query List of Omics entities. It is applied to three real datasets: a simple E. coli gene expression dataset which validates the method, a more complex Vitis gene expression dataset which complements functional enrichment analysis (Grapevine's Response to Short Days), and an ultra-high throughput re-sequencing dataset for assessing genetic differences between two wine grape varieties (DNA Sequencing Appendix).