Iowa State University

Bioinformatics & Computational Biology

 

Poster Presentation for the

Bioinformatics and Computational Biology

Graduate Program

 

April 27, 2012

Molecular Biology Building Atrium

11 a.m. to 1 p.m.

 

A poster presentation for the Bioinformatics and Computational Biology Graduate Program will highlight the numerous research projects BCB students take part in. Posters will be presented by the following students. Abstracts appear below the table:

 

Poster # | Student | Mentors | Home Department | Poster
1 | Ali Berens | Toth/Liu | EEOB | Development of Next Generation Sequencing Resources to Study Behavior in Social Wasps: Applications to Caste Determination and Facial Recognition
2 | Hyejin Cho | Chou/Liu | GDCB | E. coli and Agrobacterium non-coding regulatory RNA discovery
3-4 | Sylvia Do | Yu/ | | Structure and Mechanism of the Heavy Metal Transporter CusA, and Crystal Structure of the CusBA Heavy-Metal Efflux Complex of Escherichia coli
5 | Yao Fu, Jesse Walsh, Erin Boggess | Dickerson/Jarboe | ECPE | Bioinformatics for E. coli Project of the Center of Biorenewable Chemicals
6 | Jon Hurst | Wurtele/Dickerson | GDCB | Biorenewables - identifying genes which affect fatty acid levels in S. cerevisiae
7 | Tieming Ji | Nettleton/Schnable | Statistics | Borrowing Information across Genes and Experiments for Improved Error Variance Estimation in Microarray Data Analysis
8 | Ataur Katebi | Jernigan/Dobbs | BBMB | FBA and TIM: Similarity in Folds and Motions of Domains and Functional Loops
9 | Ruolin Liu | Wurtele Rotation | GDCB | MetaBlast: A 3D game, aimed at cell biology visualization
10 | Divita Mathur | Henderson/Lutz | GDCB | Deconstructing DNA Origami: Eliminating the Scaffold
11 | Divya Mistry | Dickerson/Dobbs | ECPE | LeafQuant: Quantifying discoloration of plant leaves due to disease and pathogen infection
12 | Shreyartha Mukherjee | Beavis/ | | spliceR: Detecting and quantifying Allele-Specific-Expression from RNA-seq data
13 | Benjamin Mulaosmanovic | BCBio Undergrad w/Dorman | Statistics | Uncovering and Modeling DNA Methylation Dependence
14 | Usha Muppirala | Dobbs/Jernigan | GDCB | Predicting RNA-Protein Interactions Using Only Sequence Information
15 | Arun Sethuraman | Janzen/Dorman | EEOB | Cryptic Phylogeographic Patterns in Midwestern Populations of Blanding’s Turtles (Emydoidea blandingii)
16 | Priyanka Surana | Wise/Nettleton | Plant Path | Transcript based cloning of Rar3: a novel gene required for NB-LRR-mediated immunity in plants
17 | Sweta Vangaveti | Travesset/X. Song | Physics | The signaling processes and phospholipids - the role of ions
18 | Marie Vendettuoli | Hofmann/Cook | Statistics | Designing successful workflows for reproducible analysis in a regulatory statistics environment.
19 | Rasna Walia | Honavar/Dobbs | Computer Science | Predicting RNA-binding residues in proteins using a sequence homology-based method
20 | Gokul Wimalanathan | Vollbrecht/Lawrence | GDCB | Gene Expression Resources Available from MaizeGDB
21 | Michael Zimmermann | Postdoc w/Jernigan | BBMB | "Combining Statistical Potentials with Dynamics-Based Entropies Improves Selection from Protein Decoys and Docking Poses"
22 | Tao Zuo | Tom Peterson | GDCB | Global gene expression profiling of maize copy number variation (CNV) with phenotypic impact

Sponsored by the BCB Graduate Program and NSF-IGERT, Computational Molecular Biology Training Group.


 

Ali Berens; Toth, Amy L.

 

(AJB, ALT) Iowa State University, Program in Bioinformatics and Computational Biology, Ames, IA 50011; (AJB, ALT) Iowa State University, Department of Ecology, Evolution, and Organismal Biology, Ames, IA 50011; (ALT) Iowa State University, Department of Entomology, Ames, IA 50011

 

Genomic resources are being developed for a group of social insects, paper wasps in the genus Polistes, which are an important model for studying the evolution of social behavior. These wasps live in small societies consisting of a queen and her daughter workers. Although they cooperate to form a 'eusocial' colony, they are considered to be "primitively eusocial" because workers have the ability to become queens, and there is a substantial amount of conflict and aggression among females for opportunities to reproduce. This poster will describe two examples of the application of genomic tools to this emerging model system. 1) As in many other social insects, queen caste determination in Polistes is associated with higher nutritional state during larval stages. Using next-generation RNA-sequencing, we are characterizing transcriptomic signatures of queen and worker caste development and investigating whether a nutritional manipulation affects the expression of caste-related genes. 2) Some species of Polistes have been discovered to possess the remarkable ability to recognize faces of conspecifics. Species with facial recognition, such as Polistes fuscatus, can learn faces much more readily than other visual patterns, unlike species without facial recognition such as Polistes metricus. A candidate gene expression study using qRT-PCR indicates that these learning differences are associated with changes in the expression of genes related to eye development, mushroom body development, and memory function. Additional genomic resources for Polistes are now under development, including de novo whole genome sequencing of the 300 Mb P. dominulus genome, currently being assembled and annotated.

 

Hyejin Cho

Title: E. coli and Agrobacterium non-coding regulatory RNA discovery

Prokaryotic species express small non-coding RNAs (ncRNAs) to regulate gene expression.

Although many ncRNAs have been identified in well-studied bacteria such as E. coli, the extent and structure of ncRNAs remain largely unknown for most bacteria.

To develop a systematic method for identifying all ncRNAs in prokaryotes, we have chosen E. coli and Agrobacterium tumefaciens as test organisms for our new method.

Our method first conducts a whole-genome thermodynamic analysis to identify all possible imperfect base-pairings between annotated genes and other genomic sequences.

Next, we will use a whole-genome tiling microarray to check the actual expression of the predicted ncRNAs in bacteria grown under various physiological and nutritional conditions.

This presentation focuses on the design and construction of the tiling microarrays based on the NimbleGen platform.
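
The first step described above, scanning a genome for regions that could form imperfect base pairs with an annotated gene, can be illustrated with a deliberately simplified sketch. It replaces the thermodynamic scoring with a plain mismatch count and ignores wobble pairs; the function names, window length, mismatch threshold, and toy sequences are all assumptions made for illustration and are not the authors' method.

```python
# Simplified stand-in for a thermodynamic scan: count mismatches between
# the reverse complement of a gene window and every genome window.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def imperfect_pairings(target, genome, window=8, max_mismatches=1):
    """List (target_pos, genome_pos, mismatches) for window pairs that
    could base-pair with at most `max_mismatches` mismatches."""
    hits = []
    for i in range(len(target) - window + 1):
        probe = reverse_complement(target[i:i + window])
        for j in range(len(genome) - window + 1):
            candidate = genome[j:j + window]
            mismatches = sum(a != b for a, b in zip(probe, candidate))
            if mismatches <= max_mismatches:
                hits.append((i, j, mismatches))
    return hits


# Toy example (made-up sequences): the genome carries a near-perfect
# antisense match to the gene fragment, with one mismatch.
gene_fragment = "ATGGCTAA"
toy_genome = "GGGTTAGCGATCCC"
hits = imperfect_pairings(gene_fragment, toy_genome)
```

A real analysis would score candidate duplexes by predicted free energy rather than by mismatch count, and would need an indexed search to scale to a whole genome.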

 

Sylvia Do

Titles and abstracts: Structure and Mechanism of the Heavy Metal Transporter CusA

Heavy-metal ions such as silver and copper are often used as antibacterial agents. Bacteria harbor efflux systems of the resistance-nodulation-division (RND) family, which play major roles in the acquired tolerance to various compounds, including heavy metal ions. CusA is the only heavy-metal efflux RND (HME-RND) transporter in Escherichia coli. This transporter specifically recognizes and confers resistance to Ag(I) and Cu(I) ions. As an RND transporter, CusA works in conjunction with a membrane fusion protein (CusB) and an outer membrane channel (CusC) to form a functional tripartite protein complex. Presumably, the CusABC complex spans both the inner and outer membranes of E. coli to export Ag(I) and Cu(I) directly out of the cell. Proton import, catalyzed by CusA, is used to drive the efflux of heavy-metal ions.

 

Title: Crystal Structure of the CusBA Heavy-Metal Efflux Complex of Escherichia coli.

Silver and copper are well-known bactericides and have been used for centuries. Because of the widespread use of silver and copper as antimicrobial agents, the presence of silver- and copper-resistant bacterial strains appears to be on the rise. Gram-negative bacteria, such as Escherichia coli, frequently utilize tripartite efflux complexes belonging to the resistance-nodulation-division (RND) family to expel toxic compounds, including toxic metal ions, from the cell. E. coli contains one heavy-metal efflux RND transporter, CusA, which specifically recognizes and confers resistance to Ag(I) and Cu(I) ions.

 

The Cus efflux system consists of four members:

CusA: inner membrane transporter

CusB: periplasmic membrane fusion protein (MFP)

CusC: outer membrane channel

CusF: periplasmic metal binding protein

 

A crystallographic model of this tripartite complex has been unavailable because co-crystallization of the various components of the system has proven to be extremely difficult.

Yao Fu, Jesse Walsh and Erin Boggess

Title: Bioinformatics for E. coli Project of the Center of Biorenewable Chemicals

Yao Fu 1, Jesse Walsh 1, Erin Boggess 1, Yanfen Fu 2, Liam Royce 2, Laura Jarboe 2, Jacqueline Shanks 2, Julie Dickerson 1

1 Department of Electrical and Computer Engineering, 2 Department of Chemical and Biological Engineering, Iowa State University

Abstract:

The Center for Biorenewable Chemicals (CBiRC) is developing the tools, components and materials needed to transform carbohydrate feedstocks into bio-based chemicals. CBiRC is organized into five interdependent research areas: Biocatalysis (Thrust 1), Microbial Engineering (Thrust 2), Chemical Catalysis (Thrust 3), Life-Cycle Assessment, and Testbeds. The task of our project is to provide bioinformatics support for Thrust 2, Microbial Engineering, through the development of tools and models. To perform this task, we focus on three research areas in bioinformatics: analysis and modeling of gene regulation, metabolic pathway and flux modeling, and high-throughput genomics data analysis. Tools and models have been developed to integrate in-house omics data with existing databases to provide a systems-wide view of the production strains, and systems-based tools and techniques have been developed to provide insights and suggestions for further strain improvement.

 

Jon Hurst

Motivation: Little is known about the regulation of fatty acid biosynthesis (FAB) in Saccharomyces cerevisiae. Motivated by the need to increase fatty acid levels in yeast for the production of biorenewables, our goal was to identify genes affecting S. cerevisiae fatty acid levels.
Bioinformatics solution: We created co-expression clusters from publicly available transcriptomics data using a novel clustering method. This method is based on correlations, yet it often places genes that are related but not correlated in the same cluster. These clusters contain genes that share broad co-regulation associated with particular cellular processes, while individual genes may still exhibit separate and contradictory regulation.
Experimental validation: We hypothesized that several genes, found in a cluster with two known FAB regulators, would affect fatty acid levels. Employing single knockout strains of these genes, we observed pronounced changes in fatty acid types and levels, validating our predictions.
Conclusion: This clustering method may be useful to anyone seeking to functionally identify target genes using transcript data.

 

Tieming Ji

Title: Borrowing Information across Genes and Experiments for Improved Error Variance Estimation in Microarray Data Analysis

Abstract: Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.
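
The idea of borrowing information across both genes and experiments can be sketched on the log-variance scale. The example below is not the BAGE estimator: it fits additive gene and experiment effects by simple averaging and pulls each observed log variance toward that fit with a fixed, hypothetical weight, whereas an empirical Bayes estimator would estimate the amount of shrinkage from the data. The function name, the weight, and the toy values are assumptions for illustration.

```python
import numpy as np


def shrink_log_variances(s2, weight=0.5):
    """Shrink sample variances toward an additive gene + experiment fit.

    s2     : array of shape (genes, experiments) of sample variances
    weight : share kept from each observed log variance (hypothetical,
             fixed here rather than estimated from the data)
    """
    log_s2 = np.log(s2)
    grand = log_s2.mean()
    gene_effect = log_s2.mean(axis=1, keepdims=True) - grand
    experiment_effect = log_s2.mean(axis=0, keepdims=True) - grand
    additive_fit = grand + gene_effect + experiment_effect
    shrunk = weight * log_s2 + (1.0 - weight) * additive_fit
    return np.exp(shrunk)


# Toy sample variances (made up): 3 genes measured in 2 experiments.
toy_s2 = np.array([[0.8, 1.9],
                   [0.2, 0.3],
                   [1.5, 6.0]])
stabilized = shrink_log_variances(toy_s2, weight=0.5)
```

When the log variances are already exactly additive in gene and experiment, the function returns them unchanged; noisy entries are pulled toward the pattern shared across rows and columns.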

 

Ataur Katebi

Fructose Bisphosphate Aldolase (FBA) of Saccharomyces cerevisiae is a class II aldolase which requires a zinc ion in each subunit for its catalysis. In the glycolysis pathway, this dimeric enzyme catalyzes the reversible aldol cleavage of the 6-carbon fructose 1,6-bisphosphate into two trioses, dihydroxyacetone phosphate (DHAP) and glyceraldehyde 3-phosphate (GAP). The next protein in the pathway, Triosephosphate Isomerase (TIM), converts DHAP into GAP. Our analysis shows that FBA and TIM have high similarity in their core structures. Though the variations in their catalytic site architectures and functional loop dynamics bring high specificity in their reactions, the motions of the functional loops in FBA and TIM are highly correlated, indicating that these two protein machines could function in a coupled manner.

 

Ruolin Liu

Title: MetaBlast: A 3D game, aimed at cell biology visualization

MetaBlast is a real 3D action-adventure game aimed at teaching cell biology to high school students. The player's job is to save the last remaining plant cell and rescue the team lost inside it. Players shrink to microscopic size, enter a soybean plant cell, and explore its vivid, dynamic world. By presenting the cell as an interactive environment, MetaBlast offers a new way of teaching students, in contrast to traditional methods such as textbooks and diagrams. By playing MetaBlast, students will be introduced to key concepts and organelles in a manner complementary to textbooks.

Among the developers working on MetaBlast are highly skilled and creative artists, game designers, programmers and biologists. MetaBlast is implemented with the Unity 3.5.1 game engine (http://unity3d.com/). JavaScript was chosen as the scripting language because of its readability and compatibility. The vivid and dynamic game objects and most scene graphs are created in the 3D animation program Maya (http://usa.autodesk.com/maya/), preceded by 2D drawings.

 

Divita Mathur

Title: Deconstructing DNA Origami: Eliminating the Scaffold

Introduction of the DNA Origami method has created a new paradigm for designing and creating two- and three-dimensional nanostructures by folding a large single-stranded 'scaffold' DNA and 'stapling' it together with a library of complementary oligonucleotides in a raster format. However, despite its power and wide-ranging implementation, the DNA Origami technique suffers from some limitations. Two of them are based on the limited number of non-redundant single-stranded scaffolds and the biological origin of these scaffolds, precluding scaffold-design ab initio. We are testing an alternative method of creating DNA nanostructures that borrows the concepts of raster formatting and staples from DNA origami and introduces a new feature called "scaples" (scaffold + staples) to replace the large single-stranded scaffold. Scaples are designed by introducing nicks into the template scaffold keeping the robustness and integrity of the structure in mind. The scaples method opens up a bigger opportunity space and provides more flexibility in terms of sequence variability and size of the DNA nanostructures. Future research is focussed on improving the yield of scaples-based products.

 

Divya Mistry

Title: LeafQuant: Quantifying discoloration of plant leaves due to disease and pathogen infection

Abstract: Characterization of the plant phenotype resulting from disease or infection is an important step in understanding a plant's biology. It is used in downstream processes to engineer systems that can help the plant fight against a specific disease or infection. Discoloration of a leaf is a common phenotype on a pathogen-infected leaf. Current practices for comparison and measurement of discoloration involve "eye-balling" the amount of infection and estimating whether the discoloration in leaves is significantly higher or lower [1]. This results in highly subjective measurements, which restricts researchers from quantitatively comparing future results. We have developed an application, called LeafQuant, for objective quantification of the discoloration on leaves within the same experiment.

 

Shreyartha Mukherjee

spliceR: Detecting and quantifying Allele-Specific-Expression from RNA-seq data

Shreyartha Mukherjee, Paul Scott, William Beavis

Allele-specific expression (ASE) is a vital factor in phenotypic variability and for the development of complex traits. Some genes display allelic disparity in gene expression that is transmitted by Mendelian or non-Mendelian inheritance, and this discrepancy may be associated with effects like heterosis, variation in yield, uniformity in plants and complex traits and diseases in animals. It is of great interest to study how genetic and epigenetic modifications lead to transcriptional variation and how transcriptional variation affects the phenotype. Differential allele expression may be controlled by changes to the nucleotide sequence and regulatory elements, such as single nucleotide polymorphisms (SNPs), insertions and deletions, and studies indicate that these variations are rampant across the genomes and tissues. Such variants in the coding regions of genes may alter the structure and function of the gene product. Recent studies have shown that preferential expression of alleles is widespread in mammals. Non-imprinted autosomal genes exhibit allelic imbalance at the transcript level in mouse hybrids (Cowles et al., 2002) and humans (Yan et al., 2002), and such expression produces proteins associated with diseases. Hence a solid understanding of classification and functional annotation of allele-specifically expressed genes is vital to recognize the extent of functionally important regulatory variation. This will help us identify candidate haplotypes and the correlation between their genetic sequences and heterotic traits. The physiological vigor and variations in general health of an organism are strongly associated with the extent of variation of parental gametes. In our study we will develop a novel approach (spliceR) to study allele-specific expression and identify alleles that are preferentially expressed across genetic backgrounds and levels of inbreeding.
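
The abstract does not describe how spliceR detects allelic imbalance, so the sketch below shows only a generic baseline: an exact two-sided binomial test of RNA-seq read counts for the two alleles at one heterozygous SNP against an expected 50:50 ratio. The function name and the read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-allele count under the null ratio `p_null`.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
             for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small tolerance so ties are not lost to floating-point rounding.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical read counts at one heterozygous SNP.
balanced = allelic_imbalance_pvalue(5, 5)    # no evidence of imbalance
skewed = allelic_imbalance_pvalue(10, 0)     # all reads from one allele
```

In practice such tests are run across many SNPs, so the resulting p-values would need multiple-testing correction, and mapping bias toward the reference allele can shift the appropriate null ratio away from 0.5.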

 

Benjamin Mulaosmanovic

Title: Uncovering and Modeling DNA Methylation Dependence

Abstract: DNA methylation is the addition of a methyl group to the 5-carbon (C5 position) of cytosine. This modification of DNA is an important factor in biological processes such as gene expression, development, and genome structure integrity. However, there is a gap in our understanding of the exact mechanisms by which DNA methylation affects these underlying processes. Knowledge of the DNA methylome (i.e., which nucleotides are methylated) would provide researchers with more tools to uncover how methylation affects these processes. I focus on one model for detection of methylation by Dorman et al., which utilizes a simplifying assumption that methylation of nucleotides is independent. However, biological evidence suggests that methylation of neighboring nucleotides is highly dependent. I use a statistical model to show that methylation does not occur independently in human ES cell methylation bisulfite sequencing data. I then propose a modification to the Dorman et al. model that uses logistic regression to allow the methylation state of a nucleotide to depend on the states of its neighbors. I demonstrate that the dependent model fits data better when they are simulated under the dependence assumption. Ultimately, I hope to generalize these modifications of the model to real-world data. This will provide researchers with more tools to decipher the complexities of DNA methylation and gain a greater understanding of genetics as a whole.
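
Modeling dependence between neighboring sites with logistic regression can be illustrated on simulated data. The sketch below is not the Dorman et al. model or the modification proposed in the abstract: it simulates a chain of sites in which each site's methylation probability depends on the previous site's state, then fits a two-parameter logistic regression by Newton's method. All names and parameter values are assumptions for illustration.

```python
import numpy as np


def simulate_chain(n, b0, b1, rng):
    """Simulate n sites; P(site i methylated) = sigmoid(b0 + b1 * previous)."""
    states = np.zeros(n, dtype=int)
    for i in range(1, n):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * states[i - 1])))
        states[i] = rng.random() < p
    return states


def fit_neighbor_logistic(states, iterations=25):
    """Fit logit P(site methylated) = b0 + b1 * previous_site by Newton's method."""
    y = states[1:].astype(float)
    X = np.column_stack([np.ones(len(y)), states[:-1].astype(float)])
    beta = np.zeros(2)
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


rng = np.random.default_rng(0)
states = simulate_chain(5000, b0=-1.0, b1=2.0, rng=rng)
b0_hat, b1_hat = fit_neighbor_logistic(states)
```

Under independence the neighbor coefficient `b1` would be zero, so an estimate well away from zero is the signature of dependence between adjacent sites in this toy setting.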

 

Usha Muppirala

Title: Predicting RNA-Protein Interactions Using Only Sequence Information

Authors:
Usha K Muppirala, Vasant G Honavar and Drena Dobbs

Abstract:

Background
RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.

Results
We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.

Conclusions
Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
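
The sequence encoding described in the Results can be sketched as follows. This is an illustration under stated assumptions, not the RPISeq implementation: the seven amino acid groups used here are one commonly used reduced alphabet (the abstract does not say which grouping is used), and "normalized" is taken to mean dividing each k-mer count by the total count. The function names are hypothetical.

```python
from itertools import product

# Assumed 7-group reduced amino acid alphabet (not specified in the abstract).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def kmer_composition(seq, alphabet, k):
    """Return k-mer frequencies over `alphabet`, in a fixed k-mer order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with unknown symbols are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def encode_rna(rna):
    """256-dimensional ribonucleotide 4-mer composition vector."""
    return kmer_composition(rna.upper().replace("T", "U"), "ACGU", 4)


def encode_protein(protein):
    """343-dimensional 3-mer composition over the 7-letter reduced alphabet."""
    reduced = "".join(AA_TO_GROUP.get(aa, "?") for aa in protein.upper())
    return kmer_composition(reduced, "0123456", 3)


# Made-up sequences; a classifier would take the concatenated vector as input.
features = encode_rna("ACGUACGUAC") + encode_protein("MKVLAAGDEC")
```

The concatenated 599-dimensional vector is the kind of fixed-length input that an SVM or Random Forest classifier can be trained on, regardless of the lengths of the original sequences.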


Arun Sethuraman

Title: Cryptic Phylogeographic Patterns in Midwestern Populations of Blanding’s Turtles (Emydoidea blandingii)

Authors: Arun Sethuraman, Morgan L Becker, Fredric J Janzen

Abstract: As part of a population genetics study of the imperiled Blanding’s Turtle, we genotyped 212 turtles sampled across 18 populations in Iowa, Illinois, Nebraska and Minnesota using 8 microsatellite markers. Isolation by Distance analysis captured little of the overall genetic variance (R² = 0.04 in a plot of genetic versus geographic distance). Further investigation detected considerable structure among 5 distinct groups of populations, with populations typically structuring into groups that accord with expected patterns of post-glacial re-colonization of the upper Midwest. Unexpectedly, though, populations from Grant County, Nebraska grouped with turtles sampled from the Greater Chicago Metropolitan Area, a pattern detected at all levels of population structure (K=1 through 9) and with multiple statistical inference tools. Divergence and ancestral gene flow estimates reveal a split between E. blandingii populations in these two regions around 50000 years ago (95% CI of 39590-12025 years), with negligible bidirectional migration since. Sequencing of the microsatellite flanking regions and rebuilding the population phylogeny will reveal patterns of ancestral divergence and resolve whether the unexpected pattern resulted from convergent evolution or saturation of microsatellites, or from incomplete lineage sorting from a putative ancestor in the southern Great Plains between these populations.
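
The R² reported for the isolation-by-distance plot is the squared correlation between pairwise geographic and genetic distances. A minimal sketch of that calculation is shown below; the function name and the distance values are made up for illustration and are not the study's data.

```python
import numpy as np


def isolation_by_distance_r2(geographic, genetic):
    """Squared Pearson correlation between paired distance values."""
    geographic = np.asarray(geographic, dtype=float)
    genetic = np.asarray(genetic, dtype=float)
    r = np.corrcoef(geographic, genetic)[0, 1]
    return r ** 2


# Made-up pairwise distances between populations (km, genetic distance).
geo_km = [12.0, 85.0, 140.0, 310.0, 450.0, 620.0]
gen_dist = [0.09, 0.02, 0.11, 0.05, 0.12, 0.07]
r2 = isolation_by_distance_r2(geo_km, gen_dist)
```

Because pairwise distances between populations are not independent observations, the significance of such a relationship is usually assessed with a permutation procedure rather than an ordinary regression test.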

 

Priyanka Surana

Title: Transcript based cloning of Rar3: a novel gene required for NB-LRR-mediated immunity in plants


Abstract: Mla is a polymorphic locus in barley (Hordeum vulgare) that contains allele-specific resistance (R) genes towards powdery mildew (Blumeria graminis f. sp. hordei). Fast neutron mutagenesis of CI 16151 (Mla6, Rar3) plants uncovered a novel gene, Rar3 (required for Mla6-specified resistance3). Wild-type plants are resistant to Blumeria isolate 5874, whereas rar3 mutants are susceptible to fungal infection. The wild-type and rar3 mutant plants were inoculated with powdery mildew and harvested at 16 and 32 hours after inoculation (hai). The 4 treatments (2 genotypes * 2 time points) with 2 replicates each were hybridized on the Affymetrix Barley1 GeneChip and sequenced using Illumina GAIIx. Paired end reads were obtained for each sample with a mean length of 150 base pairs. The main aim of this study is to identify the Rar3 gene using GeneChip and RNA-Sequencing data analysis. Differential expression analysis was performed on GeneChip data using the R package Limma. The RNA-Sequencing reads were filtered and trimmed using FASTX Toolkit and assembled using Trinity, and differential expression analysis was then performed on raw read counts using QuasiSeq (unpublished R package). Preliminary results show that Rar3 influences many more genes than another Rar mutant (rar1-m100, which is required for Mla-specified resistance1). GeneChip differential expression analysis shows that Rar3 affects genes controlling ATP binding, catalytic activity, transcription and phosphorylation in the membrane or the nucleus.

 

Sweta Vangaveti

Phospholipids are an integral part of all living cell membranes and play an important role in functions pertaining to the cell's interaction with its external environment and its communication with other cells. Such interactions are important for cell survival because they help not only to maintain the homogeneity of the cell (homeostasis) but also to identify and protect the cell from agents that can disrupt the cell machinery. The functions of the different constituents of the cell membrane have received widespread attention in recent years; however, the existing computational and experimental tools still cannot completely explain the mechanism involved in signaling, the transfer of information across the cell membrane. Simulating the entire process of signaling computationally is difficult because of the long time scales and the large number of particles involved. Efficient coarse-grained modeling is therefore essential to study the system in molecular detail. From experiments we know that one of the first steps involved in many signaling processes is the clustering of phospholipids (phosphoinositides in particular) and the presence of a high concentration of ions (especially calcium) at the site of action in the membrane. Studying the effects of ions on membranes may help in understanding this first step in signaling. Here we discuss a coarse-grained model of a simple phospholipid, phosphatidylserine, placed in a two-dimensional lattice, and the interaction of this lattice interface with electrolytes containing ions of different valencies. We propose that extending this model to other, more complex phospholipids could explain the role of ions in the first few steps of many signaling processes involving phospholipids.

 

Marie Vendettuoli

Title: Designing successful workflows for reproducible analysis in a regulatory statistics environment.

Abstract: We present a case study of the data challenge facing statisticians of USDA APHIS when evaluating submissions for product licensing. Specifically, we examine the impact on productivity that arises after introducing a workflow that relies on the functionality of off-the-shelf business productivity software, R, Sweave, and LaTeX to facilitate information transfer between stakeholders. To address the challenge imposed by having no common data format enforced for submissions, we created a workflow for rapid development and deployment of robust tools rooted in the fields of data visualization, reproducible research and human-computer interaction. This workflow is a platform facilitating development of technical solutions, the implementation of which both reduces overall turnaround times and increases submission quality. Application extends beyond the immediate needs of the current user group and may be leveraged to create multidisciplinary just-in-time tools that meet the fluid demands existing at the interface between statistics and business productivity audiences.

 

 

Rasna Walia

Abstract: Protein-RNA interactions play an important role in cellular processes like protein synthesis, RNA processing, and gene expression regulation. Reliable identification of the interfaces involved in RNA-protein interactions is essential for comprehending the mechanisms and the functional implications of these interactions and provides a valuable guide for rational drug discovery and design. Because the determination of 3D structures of protein-RNA complexes has various technical limitations and is typically costly, reliable in silico interface prediction methods that require only the sequence information are urgently needed.

We systematically examined protein-RNA interface conservation among putative sequence homologs. Based on our analysis, we found that RNA-binding interfaces are conserved among putative sequence homologs, and that the conservation space can be divided into three zones: Safe, Twilight, and Dark, according to the degree of interface residue conservation. We designed HomPRIP, a homologous sequence-based method for predicting RNA-binding sites in proteins. We compared the performance of HomPRIP on a benchmark of 198 proteins with that of several state-of-the-art protein-RNA interface prediction methods. Our preliminary results show that, when homologs of query proteins can be found, HomPRIP can reliably identify protein-RNA interface residues.
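
The general idea behind homology-based interface prediction, copying known RNA-binding labels from a homolog onto the aligned positions of a query protein, can be sketched as below. The abstract does not give HomPRIP's algorithm, so this is a generic illustration only; the function name, the label encoding, and the toy alignment are hypothetical.

```python
def transfer_interface_labels(query_aln, homolog_aln, homolog_labels):
    """Transfer per-residue interface labels across a pairwise alignment.

    query_aln, homolog_aln : aligned sequences of equal length, '-' for gaps
    homolog_labels         : one '1' (interface) or '0' per ungapped
                             homolog residue
    Returns one predicted label per ungapped query residue; query residues
    aligned to a gap in the homolog are predicted as non-interface ('0').
    """
    predicted = []
    homolog_index = 0
    for q_char, h_char in zip(query_aln, homolog_aln):
        if q_char != "-":
            if h_char != "-":
                predicted.append(homolog_labels[homolog_index])
            else:
                predicted.append("0")
        if h_char != "-":
            homolog_index += 1
    return "".join(predicted)


# Toy alignment (made up): query MKRLA aligned to homolog MKTRA.
prediction = transfer_interface_labels("MK-RLA", "MKTR-A", "01110")
```

How much such transferred labels can be trusted depends on how similar the homolog is to the query, which is the role the Safe, Twilight, and Dark zones play in the analysis described above.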

 

Gokul Wimalanathan

Title: Gene Expression Resources Available from MaizeGDB


Wimalanathan Kokulapalan 1, Jack Gardiner 4,5, Bremen Braun 2, Ethalinda KS Cannon 4, Mary Schaeffer 3,8, Lisa Harper 6,7, Carson Andorf 2, Darwin Campbell 2, Scott Birkett 4, Taner Sen 1,2,4, Nicholas Provart 9, and Carolyn Lawrence 1,2,4

1 Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011;
2 USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011;
3 Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211;
4 Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011;
5 School of Plant Sciences, University of Arizona, Tucson, AZ 85721-0036;
6 USDA-ARS Plant Gene Expression Center, Albany, CA 94710;
7 Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720;
8 USDA-ARS Plant Genetics Research Unit, University of Missouri, Columbia, MO 65211;
9 Department of Cell & Systems Biology, University of Toronto, Ontario Canada M5S 3G5

The completion of the maize genome sequence in 2009 has created both significant challenges and opportunities for maize researchers. The opportunities for understanding cellular processes underlying maize's phenomenal productivity have never been greater but this opportunity can only be seized if functional genomics software tools (FGSTs) are available to reduce the complexity of multimillion point data sets into manageable images and/or concepts. Currently, MaizeGDB is hosting numerous large gene expression data sets, and furthermore, indications from currently funded NSF Plant Genome Research Projects are that much more data will be deposited at MaizeGDB in the near future. Fortunately for maize researchers, free public domain FGSTs have been developed for other biological systems and their implementation at MaizeGDB can be accomplished with a moderate amount of effort.

In this poster, we describe current efforts at MaizeGDB that focus on leveraging two of these FGSTs, the eFP browser1 and MapMan2, by creating strategic linkages from MaizeGDB to the sites where these FGSTs are deployed. The eFP browser projects gene expression data onto a series of pictures (pictographs) representing the original plant tissues from which the expression data were derived. Each pictograph is colored according to the level of expression for the gene of interest. The MapMan software suite allows the visualization of a variety of functional genomics datasets in the context of well characterized biochemical processes and metabolic pathways. Our initial efforts focus on the 60 tissues within the B73 Maize Gene Atlas3 developed by the Kaeppler laboratory at the University of Wisconsin, with the expectation that additional expression data sets characterizing meristem and kernel development will be added in the near future.
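
To make the pictograph coloring concrete, the sketch below maps an expression level to a color on a linear yellow-to-red gradient. It is an illustration under stated assumptions, not the eFP browser's code: the function name, the linear scale, and the choice of yellow for low and red for high expression are all assumed here.

```python
def expression_to_rgb(value, max_value):
    """Map an expression level to an (R, G, B) color (hypothetical helper).

    Assumes a linear gradient from yellow (255, 255, 0) at zero expression
    to red (255, 0, 0) at max_value. Values outside [0, max_value] are clamped.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    fraction = min(max(value / max_value, 0.0), 1.0)
    green = int(255 * (1.0 - fraction))  # green channel fades as expression rises
    return (255, green, 0)


# Color each tissue of a toy expression profile for one gene.
# Tissue names and values are made up for illustration.
profile = {"leaf": 0.0, "root": 25.0, "tassel": 100.0}
colors = {tissue: expression_to_rgb(level, 100.0) for tissue, level in profile.items()}
```

Each tissue region of the pictograph would then be filled with its color, so that highly expressed tissues stand out in red.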

 

Michael Zimmermann

"Combining Statistical Potentials with Dynamics-Based Entropies Improves Selection from Protein Decoys and Docking Poses"

Abstract: Protein structure prediction and protein-protein docking are important and widely used tools, but methods to confidently evaluate the quality of a predicted structure or binding pose have had limited success. Typically, either knowledge-based or physics-based energy functions are employed to evaluate a set of predicted structures (termed “decoys” in structure prediction and “poses” in docking), with the lowest energy structure being assumed to be the one closest to the native state. While successful for many cases, failures are still common. Thus, better structure evaluation methods are essential for further progress. In this work, we combine multi-body statistical potentials with dynamics models, evaluating fluctuation-based entropies that include contributions from the entire structure. This leads to enhanced selection of native-like structures for CASP9 decoys, refined ClusPro docking poses, as well as large sets of docking poses from the Benchmark 3.0 and Dockground datasets. These datasets test both bound and unbound docking, with positive results for each. Not only does our method yield improved average results, but for high quality docking poses we often pick the best available pose.
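
One simple way to picture combining a potential with an entropy term is a free-energy-like score, energy minus a weighted entropy, with the lowest score ranked first. The sketch below assumes that form; the abstract does not state how the two terms are actually combined, and the function name, the weight, and the toy numbers are hypothetical.

```python
def rank_by_combined_score(decoys, entropy_weight=1.0):
    """Rank structures by a free-energy-like score (hypothetical helper).

    decoys: list of (name, potential_energy, entropy) tuples.
    The score is potential_energy - entropy_weight * entropy, an assumed
    combination for illustration. Lower scores rank first.
    Returns a list of (name, score) sorted from best to worst.
    """
    scored = [
        (name, energy - entropy_weight * entropy)
        for name, energy, entropy in decoys
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy decoy set: "b" has a slightly worse energy than "a" but a larger entropy.
decoys = [("a", -10.0, 1.0), ("b", -9.0, 3.0), ("c", -8.0, 0.5)]
energy_only = rank_by_combined_score(decoys, entropy_weight=0.0)
combined = rank_by_combined_score(decoys, entropy_weight=1.0)
```

With the entropy term switched off, the ranking follows energy alone; with it switched on, a structure with favorable entropy can overtake one with a marginally lower energy.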

 

Tao Zuo

Global gene expression profiling of maize copy number variation (CNV) with phenotypic impact

 

Tao Zuo 1, Jianbo Zhang 1, Sudhansu Dash 2, Dan Nettleton 3, Roger Wise 4,5 and Thomas Peterson 1

 

1Department of Genetics, Development and Cell Biology, Department of Agronomy, Iowa State University, Ames, IA, 50011, USA;

2Virtual Reality Application Center, Iowa State University, Ames, IA, 50011, USA;

3Department of Statistics, Iowa State University, Ames, IA, 50011, USA;

4Corn Insects and Crop Genetics Research Unit, USDA-ARS, Iowa State University, Ames, IA, 50011, USA;

5Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA;

 

Recent comparisons of different maize inbred lines via array-based comparative genomic hybridization (Springer NM et al. 2009) and sequence-based whole genome re-sequencing (Lai J et al. 2010) have begun to reveal significant levels of copy number variation (CNV) and presence/absence variation (PAV). However, the question of how structural variation contributes to phenotypic diversity has remained unanswered. Here, we analyzed lines that vary in copy number of a specific segment of chromosome 1 due to duplication caused by alternative Ac transposition. We have shown previously that directly-oriented Ac 3’ and 5’ termini can generate paired segmental deletions and duplications by Sister Chromatid Transposition (Zhang and Peterson, 1999, 2005). We isolated a number of such duplications and deletions, and conducted expression analysis on one case (p1-ww714), which contains an inverted duplication on chromosome 1S. The region duplicated in p1-ww714 is 14.7 Mb in size and is predicted to contain approximately 300 gene models according to MaizeGDB. Plants homozygous for p1-ww714 (i.e., four copies of the duplicated region) are significantly shorter and have smaller ears than normal siblings, whereas heterozygous plants (p1-ww714/normal; three copies of the duplicated region) are intermediate in height and ear size, suggesting that the segmental duplication in p1-ww714 exerts a CNV effect on phenotype. We implemented both GeneChip (new Affymetrix Maize WT 100K array) and high-throughput sequencing (mRNA-Seq) approaches to study the relationship between CNV and transcript accumulation. Preliminary results show that most genes within the duplicated segment exhibit dosage compensation, while some genes (~20%) exhibit dosage-dependent expression. Some genes outside the duplicated segment are differentially expressed and may represent the trans-effects of the duplicated genes. Genes within the segment are clearly overrepresented among all of the differentially expressed genes detected. These results provide insight into the transcriptional expression and phenotypic effect of a specific maize CNV.
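
The distinction between dosage compensation and dosage-dependent expression can be illustrated with a small calculation. If expression scaled with copy number, a gene present in four copies instead of two would be expressed at about twice the normal level (a log2 ratio of 1), while a fully compensated gene would stay near the normal level (a log2 ratio of 0). The sketch below classifies a gene by which of the two it is closer to. The function name and the midpoint cutoff are assumptions for illustration; they are not the statistical analysis used in this study.

```python
import math


def classify_dosage_response(normal_expr, duplicated_expr,
                             normal_copies=2, duplicated_copies=4):
    """Label a gene's response to a copy number increase (hypothetical helper).

    Compares the observed log2 expression ratio with the log2 copy number
    ratio. A gene at or above half of the expected log2 ratio is called
    "dosage-dependent", otherwise "dosage-compensated". The midpoint cutoff
    is an assumption of this sketch, not a statistical test.
    """
    if normal_expr <= 0 or duplicated_expr <= 0:
        raise ValueError("expression values must be positive")
    observed = math.log2(duplicated_expr / normal_expr)
    expected = math.log2(duplicated_copies / normal_copies)
    if observed >= expected / 2:
        return "dosage-dependent"
    return "dosage-compensated"


# Toy values: one gene nearly doubles in the four-copy line, one barely changes.
gene_a = classify_dosage_response(100.0, 195.0)
gene_b = classify_dosage_response(100.0, 105.0)
```

For heterozygous plants with three copies, the same comparison applies with duplicated_copies=3, where full dosage dependence corresponds to a 1.5-fold increase.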