BCB 690. Student Seminar in Computational Biology
Friday 2:10-3:00 PM
0296 Town Engineering
| Date | Presenters | Title |
| January 18, 2008 | No class | |
| January 25 | Chris Tuggle | How to Make Scientific Seminars |
| Feb. 1 | No class | |
| Feb. 8 | Saras Saraswathi; Matthew Moscou | Saras - An Improved Gene Selection Method for Accurate Classification of Microarray Gene Expression Data; Matt - Gene-For-Gene-Mediated Transcriptome Reprogramming In Barley-Powdery Mildew Interactions |
| Feb. 15 | No class | |
| Feb. 22 | Fadi Towfic; Xiaoyong Sun | Fadi - Exploring Gene Expression Networks With RetinaWorkbench; Xiaoyong - BioIDMapper: a R package for mapping biological IDs |
| Feb. 29 | Scott Boyken; Yuanyuan Huang | Scott - Itk SH3/SH2: Non-Canonical Interactions Regulate Kinase Activity; Yuanyuan - Apply Dominations to Prediction of RNA Secondary Structure |
| March 7 | Ben Lewis; Long Qu | Ben - Combining Structural Modeling with Machine Learning Approaches to Improve Prediction of Nucleic Acid Binding Sites in Telomerase; Long - Subsampling based bias reduction in estimating the number of differentially expressed genes from microarray data |
| March 14 | Deepak Reyon; Haining Lin | Haining - Characterization of paralogous protein families in rice; Deepak - Determination of Protein Structure using X-Ray Crystallography CusB |
| March 17-March 21 | Spring Break | |
| March 28 | Jon Hurst; Li Xue | Li - PCA on microsphere of biodegradable polymer adjuvant data |
| April 4 | John Van Hemert; Shreyartha Mukherjee | Shreyartha - Improving Secondary Protein Structure Prediction by generating decoys; John - TurtleBase: a facilitative ecoinformatics system |
| April 11 | Kyoungmin Roh; Ataur Katebi | Kyoungmin - Analysis of gene network using simulation |
| April 18 | Tian Xia; Mike Zimmermann | Mike - Normal Mode Analysis for Protein Dynamics; Tian - Omics Viz |
| April 25 | Ankit Agrawal; Bob Farnham | Ankit - PairwiseStatSig: Pairwise Statistical Significance Estimation for Local Sequence Alignment; Bob - An Algorithm for Finding Optimal Gene Network Models from Microarray Data |
| May 2 | Fengli Fu; Wengang Zhou | Fengli - Improve the Engineering of Zinc Finger Proteins (ZFPs) by Modular Design; Wengang - A Predicted Interactome for Vitis Vinifera |
| May 9 | Finals Week | |
BCB 690 Student Seminar
Friday, Jan. 18 at 2:10 p.m.
No Class
Friday, Jan. 25 at 2:10 p.m.
Speaker: Chris Tuggle, Chair, BCB
Title: How to Make Scientific Seminars
Good seminars tell a story and engage the listener in your journey
INTRODUCTION:
Lay the groundwork for YOUR seminar
-Relevant background information
-What are the key questions remaining
-What is the problem you are addressing
-What is the approach you are taking
ORGANIZATION:
Does the seminar flow logically
-WHY Background/Rationale
-WHAT Hypothesis and/or Specific Aims
-HOW Experimental Approach/Methods
-WHAT Experimental Results/Significance
-WHERE Future experiments/Applications
CLARITY:
Can the audience follow you?
How clear are the answers to why/what/how/where?
Seminar should be understandable to those outside immediate field.
DATA EVALUATION:
Audience should understand how results are used to answer questions posed. Statistical analyses should be included where appropriate.
Not applicable to all seminar presentations.
SEMINAR SUMMARY:
The “take home” message
Recap major findings and indicate how they answer the initial questions asked. Indicate significance/future directions/applications.
SCIENTIFIC EVALUATION:
Is the research scientifically sound?
Is the hypothesis reasonable?
Will the experimental design address the hypothesis?
Are there appropriate controls?
Are the conclusions justified by the data?
VISUAL AIDS
DO: Simplify graphics; one concept per slide; use cartoons; have large fonts; label gels; include titles on each slide; avoid complex illustrations
DO NOT: Use dark colors on a dark background (e.g. red on a blue background); use yellow on a white background; use small or elaborate fonts; use complex tables; let visual aids detract from your data (i.e. colors/graphics/transitions/animations).
EYE CONTACT
Talk to several audience members, not the screen.
SPEAKING VOICE
Clear, loud and with enthusiasm
HANDLING QUESTIONS
Repeat questions asked from front of the room
Keep answers brief and to the point - answer the question asked.
Don't be afraid to ask for clarification of question if you don't understand.
Don't be afraid to say "I don't know"
BEWARE OF TIME
Stay within time limits allotted for seminar
Keep answers to questions brief and on point
Other suggestions for presentations
1. Structure the talk in a logical way
Introduction
-Let the audience know what you are going to tell them in advance
-give Background; Rationale; Hypothesis/Question; Methodology for testing the hypothesis (describe your system)
Results
-Keep data slides as simple as possible
-Describe in detail how you tested your hypothesis; present the results; INTERPRET the results (why are your results significant?)
Conclusions
-Summarize the results
-Describe how these results support or did not support the hypothesis
-Provide a model or summary showing the significance of your results. Discuss the results in terms of the background, significance, and rationale presented in the introduction.
-Discuss future work: what are the next steps, or what is the utility of the work?
Acknowledgements
-Acknowledge your colleagues, collaborators and funding sources
Questions
When you take questions is up to you: during the talk, after you are finished, or both. Each speaker will let the class know if they would prefer that questions be held until the end.
2. Other things to consider
-The audience. The audience is comprised of students who have a good understanding of the principles of computation and some aspects of biology and statistics, but not necessarily your specific field of study.
-Do not use “lab lingo”. Be precise and define the terms.
-If you get really nervous about speaking, try memorizing an opening statement to get you started. Practice the talk several times.
-Use transitions to move between sections of your talk to establish a logical flow.
3. For more stuff to think about when preparing a presentation, check out these websites.
http://www.lions.odu.edu/~kkilburn/semhome.htm;
http://www.kumc.edu/SAH/OTEd/jradel/effective.html;
http://www.swarthmore.edu/NatSci/cpurrin1/powerpointadvice.htm
Friday, Feb. 1
No class.
Friday, February 8
Presenter: Saras Saraswathi
Rotating First-Year Student
BCB Graduate Program
Title: An Improved Gene Selection Method for Accurate Classification of Microarray Gene Expression Data
Abstract:
We present an improved gene selection method for accurate classification of cancers based on micro-array gene expression data (MGED data set). The cancer classification problem has a small number of samples with large input features, which makes it difficult to classify the data using machine learning techniques. Hence, reduction of input features, in addition to finding the right combination of genes for maximizing the classification accuracy is an important problem in bioinformatics.
A two-step ‘Integer Coded Genetic Algorithm using Extreme Learning Machine’ (IGA-ELM) is given for selecting the most relevant genes for maximizing the classification accuracy. For selecting the optimal parameters, the Particle Swarm Optimization algorithm is employed. For classification, the recently developed fast learning neural algorithm called ‘Extreme Learning Machine’ (ELM) is used. For the GA, the genes are used as decision variables and fitness is determined by the classification accuracy obtained using ELM. Genetic operators have been suitably defined to generate valid solutions for this problem. Performance comparison of the GA-based gene selection scheme with existing methods on the GCM dataset indicates superior performance of the proposed approach.
====================
Also Friday, Feb. 8
Presenter: Matthew Moscou
Major Professor: Roger Wise
Department of Plant Pathology
Title:
Gene-For-Gene-Mediated Transcriptome Reprogramming In Barley-Powdery Mildew Interactions
Abstract:
Matthew Moscou (a, b), Nick Lauter (b,c), Rico Caldo (d), and Roger Wise (a,b,c).
(a) Interdepartmental Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA 50011-1020
(b) Department of Plant Pathology and Center for Plant Responses to Environmental Stresses, Iowa State University, Ames, IA 50011-1020
(c) Corn Insects and Crop Genetics Research, USDA-ARS, Iowa State University, Ames, IA 50011-1020
(d) Monsanto, St. Louis, MO, 63167, USA
Barley has a complex interaction with powdery mildew [Blumeria graminis f. sp. hordei (Bgh)] that begins with early recognition of microbe-associated molecular patterns (MAMPs) from the pathogen. During Bgh invasion of the epidermis, the fate of cells is decided by the presence of resistance (R) genes that mediate an immediate response, which halts the progress of the pathogen. To understand the regulatory role and response associated with R-gene-mediated defense, we surveyed the transcriptional response of barley upon pathogen inoculation using three Manchuria NILs carrying allelic variants at the Mla locus, null mutations mla1-m508 and mla6-m9472, as well as Sultan 5 (Mla12) and mutants derived therefrom, mla12-m66, rar1-1, rar1-2, and rom1. Each experiment consisted of sampling at 0, 8, 16, 20, 24, and 32 hours after inoculation with three replications, varying only in the inclusion of non-inoculated material. We found that the resistance response manifests itself via dynamic reprogramming of the transcriptome, which includes over 5,000 genes and may exceed 10,000. The quantitative nature of the Mla control becomes apparent when we observe null mutations, where early signaling effects are compromised by this gene loss. Collectively, our results confirm the high-level regulatory control of Mla in gene-for-gene-mediated resistance and point to an essential early regulatory role as observed via massive transcriptome reprogramming.
Funding by NSF-Plant Genome Award #0500461.
February 15 - No Class
Friday, February 22
Presenter: Xiaoyong Sun
Major Professor: Dianne Cook
Home Department: Statistics
Title:
BioIDMapper: a R package for mapping biological IDs
Abstract: Many new databases of genes and proteins are being developed as more and
more species are sequenced. Navigating among different data resources, mapping
various IDs, and collecting and analyzing separate pieces of biological
knowledge has become a tedious job. Current popular databases include Entrez
Gene, UniProt, Gene Ontology, EMBL, OMIM, PubMed, KEGG, etc. Based on NCBI and
UniProt, BioIDMapper can facilitate mapping between different databases,
integrate various ID systems and provide a full practical view at the gene
level, mRNA level and functional level for one specific ID. This R package is
built on the RCurl and XML packages.
Also, February 22:
Presenter: Fadi Towfic
Major Professor: Vasant Honavar
Home Department: Computer Science
Title: Exploring Gene Expression Networks With RetinaWorkbench
Abstract: Many cellular processes often involve the interaction of multiple
gene-products that are produced at various time points in a pathway.
The discovery of interacting genes/gene products is usually a central
aim for hypotheses that strive to explain the dynamics of biological
pathways. RetinaWorkbench is a Cytoscape plugin that aims to integrate
easy access to public (as well as private) gene-expression datasets,
gene ontologies and user-defined annotations with a user-friendly
querying/visualization mechanism. RetinaWorkbench was used to
reconstruct interactions between genes involved in photoreceptor
differentiation in the mouse retina using publicly-available
gene-expression datasets. The results showed that RetinaWorkbench can
be used as a hypothesis-building tool for exploring relationships
between genes of interest.
Friday, February 29
Presenter: Yuanyuan Huang
Rotating First-Year Graduate Student
BCB Graduate Program
Title: Apply Dominations to Prediction of RNA Secondary Structure
Abstract: Understanding RNA molecules is important to genomics research. Recently researchers at the Courant Institute of Mathematical Sciences used graph theory to model RNA molecules and provided a database of trees representing possible secondary RNA molecules. In this research I want to use domination parameters to predict which 9-10 degree trees are more likely to exist in nature as RNA structures. This approach appears to have promise in graph theory applications in genomics research.
Because the functional repertoire of RNA molecules, like proteins, is closely linked to the diversity of their shapes, uncovering RNA's structural repertoire is vital for identifying novel RNAs, especially in genomic sequences. To help expand the limited number of known RNA families, we can use graphical representation and clustering analysis of RNA secondary structures to predict novel RNA topologies and their abundance as a function of size. Representing the essential topological properties of RNA secondary structures as graphs enables enumeration, generation, and prediction of novel RNA motifs.
I will apply a graphic parameter and logistic regression methods to construct the 1-10 degree RNA structure space. Significantly, nearly all existing RNAs fall into one group, which I refer to as "RNA-like"; I consider the other group "non-RNA-like". My method predicts many 9-10 degree candidates for novel RNA secondary topologies, some of which are remarkably similar to existing structures.
Also Friday, February 29
Presenter: Scott Boyken
Rotating First-Year Graduate Student
BCB Graduate Program
Title: Itk SH3/SH2: Non-Canonical Interactions Regulate Kinase Activity
Abstract:
In protein-tyrosine kinases, the SH3 domain usually binds proline-rich regions, and the SH2 domain usually binds phosphotyrosines; however, in the Tec family non-receptor kinase Itk, SH3 binds SH2 through non-canonical, intermolecular interactions, regulating oligomerization and kinase activity. The structure of the SH3/SH2 heterodimer has recently been solved, revealing the molecular details of this interaction. To verify the structure, several mutants have been designed, and the binding affinity of these mutants as compared to wild-type has been measured via NMR. Our NMR experiments reveal a key interaction between Glu189 on the SH3 domain and Arg332 on the SH2 domain.
Friday, March 7
Presenter: Long Qu
Major Professor: Jack Dekkers
Co-Major Professor: Dan Nettleton
Home Department: Animal Science
Title: Subsampling based bias reduction in estimating the number of differentially expressed genes from microarray data
Abstract: In microarray experiments, the proportion of genes that change their expression levels in response to different treatment conditions is both a global measure of the strength of biological responses and a critical quantity for false discovery rate (FDR) control. However, current statistical procedures for estimating its complement, the proportion of nondifferentially expressed genes (π0), often suffer from high biases and low statistical power. In this study, we developed a bias reduction procedure through a novel use of data subsampling, analogous to but extending the jackknife. Based on the fact that increasing sample sizes almost always increases power and the power reaches 1 for infinite sample size, our procedure repeatedly deletes some biological samples randomly to produce many subsamples of various sample sizes. For each subsample, the same set of hypotheses is tested and a histogram estimator is used to estimate the p-value density at 1. Unlike most existing methods that use the p-value density at 1 from the full sample as an estimate of π0, our procedure takes a further step by robustly regressing the p-value density estimates at 1 over the subsample sizes and then extrapolating the regression curve to infinity to get the final estimate of π0. This corresponds to estimating the p-value density at 1 with an infinite sample size, which in theory is exactly π0. We derived the exact functional form relating the p-value density at 1 to the sample size based on the assumption that p-values are from t-tests and the standardized effect sizes for differentially expressed genes follow a normal distribution with 0 mean and common unknown variance. Motivated by this heuristic, we proposed a flexible regression function, which includes the above exact form as a special case, to increase robustness to parametric assumptions.
Simulations showed that the new estimator has smaller mean squared error (MSE) than the currently most widely used q-value smoother method, greatly reducing the bias while mildly increasing the variance. For FDR control purposes, averaging the q-value smoother estimate with the new estimate can both provide a conservative safety margin and achieve smaller MSE. In conclusion, our new procedure leads to bias-reduced estimation of π0, improved statistical power in FDR control, and a smaller MSE. (Supported by USDA-NRI-2005-3560415618)
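As a rough illustration of the baseline quantity this abstract starts from (the p-value density near 1, used as an estimate of π0), here is a minimal Python sketch of a histogram-style estimator. It is not the subsampling and extrapolation procedure described in the talk; the function name `pi0_histogram` and the cutoff parameter `lam` are assumptions made for this example.

```python
def pi0_histogram(pvalues, lam=0.5):
    """Histogram-style estimate of pi0 (proportion of true null hypotheses).

    Under the null, p-values are uniform on [0, 1], so the fraction of
    p-values falling in (lam, 1], divided by the interval width (1 - lam),
    approximates the p-value density near 1. `lam` is an illustrative cutoff.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    if not pvalues:
        raise ValueError("pvalues must be non-empty")
    n_tail = sum(1 for p in pvalues if p > lam)
    estimate = n_tail / (len(pvalues) * (1.0 - lam))
    # A proportion cannot exceed 1, so clip the raw estimate.
    return min(1.0, estimate)


if __name__ == "__main__":
    # Hypothetical p-values: three small ones (likely non-null) and one large.
    print(pi0_histogram([0.01, 0.02, 0.03, 0.9]))
```

With few samples, power is low and many non-null p-values land near 1, which inflates this kind of estimate; that is the bias the abstract's procedure aims to reduce.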
Also Friday, March 7
Presenter: Ben Lewis
Rotating First-Year Graduate Student
BCB Graduate Program
Title: Combining Structural Modeling with Machine Learning Approaches to Improve Prediction of Nucleic Acid Binding Sites in Telomerase
Abstract: Telomerase is a ribonucleoprotein enzyme responsible for adding telomeric repeats to the ends of linear chromosomes and is overexpressed in ~90% of cancers as a method of circumventing the natural cell division limit. Unfortunately, the three-dimensional structure and RNA- and DNA-binding residues of human telomerase have not been determined. By combining structural models based on solved portions of the structure of Tetrahymena telomerase with machine learning approaches, it was possible to predict nucleic acid binding residues in human telomerase which correspond closely to residues experimentally shown to affect binding. These results indicate that this method may be able to provide a starting point on which biochemical experiments may be based.
Friday, March 14
Haining Lin
Genetics, Development and Cell Biology Department
Major Professor: Xun Gu and Robin Buell
Title: Characterization of paralogous protein families in rice
Abstract:
Background -
High gene numbers in plant genomes reflect polyploidy and major gene duplication events. Oryza sativa, cultivated rice, is a diploid monocotyledonous species with a ~390 Mb genome that has undergone segmental duplication of a substantial portion of its genome. This, coupled with other genetic events such as tandem duplications, has resulted in a substantial number of its genes, and resulting proteins, occurring in paralogous families.
Results
Using a computational pipeline that utilizes Pfam and novel protein domains, we characterized paralogous families in rice and compared these with paralogous families in the model dicotyledonous diploid species, Arabidopsis thaliana. Arabidopsis, which has undergone genome duplication as well, has a substantially smaller genome (~120 Mb) and gene complement compared to rice. Overall, 53% and 68% of the non-transposable element-related rice and Arabidopsis proteins could be classified into paralogous protein families, respectively. Singleton and paralogous family genes differed substantially in their likelihood of encoding a protein of known or putative function; 26% and 66% of singleton genes compared to 73% and 96% of the paralogous family genes encode a known or putative protein in rice and Arabidopsis, respectively. Furthermore, a major skew in the distribution of specific gene function was observed; a total of 17 Gene Ontology categories in both rice and Arabidopsis were statistically significant in their differential distribution between paralogous family and singleton proteins. In contrast to mammalian organisms, we found that duplicated genes in rice and Arabidopsis tend to have more alternative splice forms. Using data from Massively Parallel Signature Sequencing, we show that a significant portion of the duplicated genes in rice show divergent expression although a correlation between sequence divergence and correlation of expression could be seen in very young genes.
Conclusions
Collectively, these data suggest that while co-regulation and conserved function are present in some paralogous protein family members, evolutionary pressures have resulted in functional divergence with differential expression patterns.
Also Friday, March 14
Deepak Reyon
First-Year Graduate Student
BCB Graduate Program
Title:
Determination of Protein Structure using X-Ray Crystallography CusB
Abstract:
The Cus complex is a trans-membrane system in gram-negative bacteria that
mediates resistance to copper and silver by cation efflux. Copper is essential
to the cell but also toxic, so a homeostatic environment must be maintained. The
Cus complex consists of 3 core proteins: CusA (inner membrane), CusC (outer
membrane) and CusB (periplasmic). In Dr. Edward Yu's lab we are working on solving
the structure of this complex. I will present my progress in determining the
structure of CusB using X-Ray Crystallography.
Friday, March 21 - Spring Break
Friday, March 28
Presenter: Li Xue
BCB program, Home Department: MSE
Major professor: Krishna Rajan
Title: PCA on microsphere of biodegradable polymer adjuvant data
Abstract:
In immunology, an adjuvant is an agent that may stimulate the immune
system and increase the response to a vaccine, without having any
specific antigenic effect in itself (Wikipedia). With multiple
controlling factors, deciding on an optimum polymer chemistry that gives
a predictable immune response and enhanced stability of protein
immunogens is a challenge. In this study, microspheres
fabricated from 1,6-bis(p-carboxyphenoxy)hexane (CPH), sebacic acid
(SA), and 1,8-bis(p-carboxyphenoxy)-3,6-dioxaoctane (CPTEG) are used as
adjuvants and added to dendritic cells (DCs) to study cell marker
expression behavior, which is one of the characteristics of DC
activation. Here PCA (Principal Component Analysis), a dimension
reduction method, is applied to the multi-variable polymer adjuvant -
cell marker expression data. Some trends are detected showing that
more hydrophilic microspheres (CPTEG:CPH system) cause more cell
marker expression than more hydrophobic microspheres (CPH:SA system).
Also, an optimum chemistry seems to sit between 50:50 CPTEG:CPH and
100% CPTEG, which is consistent with polymer film experiments.
Also Friday, March 28
Presenter: Jon Hurst
First-Year BCB Graduate Student
BCB Program
TITLE: Markov Model Selection and Parameterization Using A Genetic Algorithm
ABSTRACT: Modeling ion channel function is problematic because transitions between many conformational states cannot be directly observed. One solution to this problem is to use a genetic algorithm to create models that fit all desired data. I have developed software that not only parameterizes but also creates the structure of Markov (state-based) models using this method. To determine whether this method can structure and parameterize mechanistically accurate models, test cases were conducted by using the genetic algorithm with known models. Preliminary results suggest that this method is well suited not only for parameterization of Markov models, but also for model selection. This method could also be applied to many disciplines besides neurophysiology that use hidden Markov models.
Friday, April 4
Presenter: Shreyartha Mukherjee
First Year BCB Rotating Student
BCB Graduate Program
Title:
Improving Secondary Protein Structure Prediction by generating decoys
Abstract: Predicting secondary protein structure using amino acid sequence information alone is one of the fundamental unsolved problems in computational molecular biology. Any algorithm that attempts to predict protein structure requires a scoring or discriminatory function that can distinguish between correct and incorrect conformations. If we can generate high-quality decoys with the aim of fooling scoring functions, we can take a step toward improving the existing scoring functions, leading to more accurate structure prediction.
Also Friday, April 4
Presenter: John Van Hemert
Home Department: Computer Engineering
Major Professor: Julie Dickerson
Title: TurtleBase: a facilitative ecoinformatics system
Abstract:
The observation that many ecological survey-projects have led to massive
collections of static data suggests the construction of a centralized
platform for eco-informatics. Much ecological and environmental research is
conducted by accumulating observational data across long timeframes. Dr.
Fred Janzen's lab has been conducting, and continues to conduct, just such a project
observing nesting Chrysemys picta (painted turtles) on a campground island
in the Mississippi River near Clinton, Iowa. Observations and measurements
are taken of the turtle mothers, hatchlings, and nests over a four to six
week period each summer. Since 1989 this system has entailed handwriting
notes on paper and manually transcribing data to digital tables and
spreadsheets for small scale analysis. I will present a project where a
relational database was designed, all turtle data since 1989 has been
imported to the database, and a web portal was created for access to the
data. Benefits were instantly available in several areas: consistency,
automation, data input, data analysis tools, and data retrieval via
download. Results from preliminary manual and automatic data mining will be
presented.
Friday, April 11
Kyoungmin Roh
Second-Year Graduate Student
BCB Graduate Program
Home Department: EEOB
Major professor: Professor Stephen Proulx
Title: Analysis of gene network using simulation
Abstract: The traditional approach of molecular biology research has been an inherently local one, focused on examining and collecting data on a single gene or a single reaction. Recently, however, there has been much interest in the dynamics of gene regulatory networks (GRNs), collections of DNA segments in a cell that interact with each other and with other substances in the cell. I applied a mathematical approach to modeling GRNs. The model describes the reaction kinetics of the constituent parts, with functions ultimately derived from simple expressions based on Michaelis-Menten enzyme kinetics. The functional forms are usually chosen as Hill functions, which serve as an approximation of the real molecular dynamics. These dynamics depend on several parameters, and I used a simulated annealing algorithm to calculate the optimal fitness and the optimal parameters of the gene network. I built a model that has two genes and experiences two different environments. From the simulation, I can obtain the optimal gene interaction network, and I will try more complicated evolutionary networks in the future.
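For readers unfamiliar with Hill functions, the following minimal Python sketch shows the standard activating and repressing forms. It is a generic illustration, not the model from the talk; the function names and the parameter names `k` (half-saturation concentration) and `n` (Hill coefficient) are chosen for this example.

```python
def hill_activation(x, k, n):
    """Fraction of maximal expression driven by an activator at concentration x.

    k is the concentration giving half-maximal expression and n is the Hill
    coefficient (steepness). With n = 1 this reduces to the Michaelis-Menten form.
    """
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("require x >= 0, k > 0, n > 0")
    return x**n / (k**n + x**n)


def hill_repression(x, k, n):
    """Fraction of maximal expression remaining under a repressor at concentration x."""
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("require x >= 0, k > 0, n > 0")
    return k**n / (k**n + x**n)


if __name__ == "__main__":
    # At x == k, expression is half-maximal regardless of n.
    print(hill_activation(2.0, 2.0, 4))
```

In a parameter search such as simulated annealing, values like `k` and `n` for each regulatory interaction would be among the quantities being tuned.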
Friday, April 18
Presenter: Mike Zimmermann
First Year Graduate Student
BCB Graduate Program
Title: Normal Mode Analysis for Protein Dynamics
Abstract: In recent years it has become clear that methods need to be developed to quickly calculate the molecular motions of proteins. Molecular Dynamics (MD) simulations are too computationally costly and require too much user time to be completed by most researchers. The amount of NMR and crystallographic structure data available to researchers is constantly growing while 3D structure prediction algorithms are also improving. This information usually only provides a static image of the protein structure rather than a dynamic depiction of the malleable entities that exist in cells. To analyze protein dynamics in a computationally inexpensive manner the coarse grained Elastic Network Model (ENM) was developed. Time independent variants of ENM have been developed and studied by various groups. Recently, a time dependent model was derived which allows for interesting analysis of protein motions from a sound physical and intuitive perspective. It is hoped that with further experimentation more accurate and meaningful protein dynamics may be derived through use of this model.
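To give a feel for how coarse-grained an elastic network is, here is a minimal Python sketch that builds the connectivity (Kirchhoff) matrix used by one common time-independent ENM variant, the Gaussian network model. The abstract does not say which variant or cutoff the talk uses, so the 7.0 Å cutoff, the function name, and the toy coordinates are all assumptions for illustration.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the connectivity matrix of a simple elastic network.

    coords: list of (x, y, z) positions, one per residue (e.g. C-alpha atoms).
    Residues closer than `cutoff` are joined by a spring: the off-diagonal
    entry is -1, and each diagonal entry counts that residue's springs.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


if __name__ == "__main__":
    # Hypothetical coordinates: three residues in a chain plus one far away.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in kirchhoff_matrix(toy):
        print(row)
```

Normal modes are then obtained from the eigenvectors of this matrix, which is far cheaper than running a molecular dynamics simulation.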
Presenter: Tian Xia
Home Department: Computer Engineering
Major Professor: Julie Dickerson
Title: OmicsViz
Abstract: OmicsViz is a Cytoscape plugin (Cytoscape 2.4, 2.5, 2.6) dedicated to providing useful visualization and an integrated analysis tool for large-scale omics data. OmicsViz imports omics data into Cytoscape and visualizes it on a graph according to the change in gene experimental values. OmicsViz also provides a mapping function between two different species, or between probe set and experimental names and the node names in a network.
Friday, April 25
Presenter: Ankit Agrawal
BCB Minor
Major Professor: Xiaoqiu Huang
Home Department: Computer Science
Title: PairwiseStatSig: Pairwise Statistical Significance Estimation for Local Sequence Alignment
Abstract: Estimation of statistical significance of a pairwise alignment is of wide
interest in sequence comparison. Currently, most of the popular alignment
programs report the statistical significance of a pairwise alignment in context
of a database search, which is dependent on the database. This work explores the
use of pairwise statistical significance, which depends only on the pair of
sequences being aligned and the alignment parameters, and can be useful in
assessing the relatedness of two sequences (or a small number of sequences) in
less time, independent of any database. We experimented with different methods
to determine that censored maximum likelihood fitting of the score distribution
(with censoring the distribution right of the peak) gives the most accurate
estimates of pairwise statistical significance. Further, we evaluated this
method in a homology detection experiment with a subset of CATH 2.3 database,
which has been previously used by researchers as a benchmark data set for
protein comparison. Comparison of results with popular database search programs
like SSEARCH and PSI-BLAST on the same database indicates that the results of
pairwise statistical significance are comparable, and sometimes better than
those of database statistical significance (with SSEARCH). However, PSI-BLAST
performs the best, presumably due to its use of query-specific substitution
matrices. Pairwise statistical significance can be extremely useful in
evaluating different parameter combinations for pairwise alignment - like
alignment program, substitution matrices and gap penalties. As an application of
pairwise statistical significance, we also conducted a series of homology
detection experiments to empirically determine the effective gap opening
penalties for pairwise protein alignment with the widely used BLOSUM
substitution matrices - BLOSUM45, BLOSUM50, BLOSUM62 and BLOSUM80, on the same
benchmark database. The proposed method is implemented in C language in a
program PairwiseStatSig, and is expected to be a useful tool for computational
biologists for pairwise statistical significance estimation purposes, especially
for smaller set of sequences without having to perform time-consuming database
searches. The program PairwiseStatSig is available for free academic use at
www.cs.iastate.edu/~ankitag/PairwiseStatSig.html.
Also Friday, April 25
Presenter: Bob Farnham
Major Professor: Srinivas Aluru
Home Department: Electrical and Computer Engineering
Title:
An Algorithm for Finding Optimal Gene Network Models from Microarray Data
Abstract: Finding gene networks from microarray data is computationally NP-hard. In this presentation, an O(n * 2^n) (time and space) sequential algorithm will be described. On single-processor systems, such algorithms are limited to inferring gene networks of up to 32 or so genes. Thus, gene network problems are a fruitful area of research for parallel systems. Some thoughts on how this problem may be mitigated through parallel approaches will be offered.
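The practical limit of "32 or so genes" follows from how quickly n * 2^n grows. The short Python sketch below works through that arithmetic; the figure of 8 bytes per table entry is an assumption for illustration only, since the abstract does not describe the algorithm's actual memory layout.

```python
def table_cells(n_genes):
    """Number of entries in a table of size n * 2^n."""
    return n_genes * 2**n_genes


def gib_needed(n_genes, bytes_per_cell=8):
    """Memory for such a table in GiB, assuming a fixed size per entry."""
    return table_cells(n_genes) * bytes_per_cell / 2**30


if __name__ == "__main__":
    for n in (20, 24, 28, 32):
        print(n, table_cells(n), gib_needed(n))
```

Under this assumption a 32-gene problem needs about 1 TiB, and each additional gene more than doubles the requirement, which is why distributing the table across many processors is attractive.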
Friday, May 2
Wengang Zhou
Major Professor: Julie Dickerson
Department: Electrical and Computer Engineering
Title: A Predicted Interactome for Vitis Vinifera
Abstract: High throughput technology such as yeast two-hybrid has produced a huge amount of interaction data. One of the important goals of functional genomics is to identify the complete protein interaction network, or interactome. In this study, we collected 55146 available interactions covering seven species from the DIP database. By applying best reciprocal BLAST analysis, we found 3082 grape orthologs for 19665 unique DIP proteins. The latest grape protein sequences published by the Italian-French Group were used. After removing redundant interactions, the predicted grape interactome contains 2380 interactions involving 1555 unique grape proteins. By mapping all involved grape proteins to their Arabidopsis orthologs, we further used BiNGO to find the overrepresented biological processes for three big subnetworks. Then, we present a structure-based feature to predict subcellular locations for the entire grape proteome based on multi-class classifiers. The protein secondary structures are predicted using the PSIPRED system, and the training data contains 7579 proteins within 12 locations. About 26% of all predicted interacting pairs come from the same subcellular location. Surprisingly, even though we have few mitochondrion and chloroplast proteins, most of them interact with each other. We also obtained more evidence from TAIR, which had 822 non-redundant protein interactions for Arabidopsis. Eleven of the 822 interactions match our predicted interactions.
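The ortholog-finding step mentioned in this abstract, best reciprocal BLAST analysis, can be sketched in a few lines of Python. This is a generic illustration operating on precomputed similarity scores, not the speaker's pipeline; the function names, the dictionary input format, and the example identifiers are all hypothetical.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring subject.

    scores: dict mapping (query_id, subject_id) -> similarity score
    (for example, a BLAST bit score parsed elsewhere).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between DIP proteins and grape proteins.
    dip_vs_grape = {("dip1", "grape1"): 200, ("dip1", "grape2"): 150,
                    ("dip2", "grape2"): 90}
    grape_vs_dip = {("grape1", "dip1"): 210, ("grape2", "dip1"): 140,
                    ("grape2", "dip2"): 80}
    print(reciprocal_best_hits(dip_vs_grape, grape_vs_dip))
```

Here `dip2` is not paired with `grape2` because `grape2` scores higher against `dip1`, so only mutually best pairs are reported as putative orthologs.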
Fengli Fu
Major Professor: Dan Voytas
Department: Genetics, Development and Cell Biology
Title: Improve the Engineering of Zinc Finger Proteins (ZFPs) by Modular Design
Abstract: The zinc finger motif is one of the best understood DNA-binding domains. Because it is typically modular both in structure and in DNA binding activity, it is the most suitable scaffold for constructing engineered DNA binding proteins. When fused with various functional effector domains to create artificial DNA modifiers, engineered ZFPs have many potential uses in both basic science and medical therapy. The dominant methodology currently available to academic laboratories for engineering zinc finger proteins is modular design, or modular assembly. At present, however, modular assembly has a high failure rate. We hypothesize that there are rules governing the construction of ZFPs by modular design and that understanding these rules will improve the engineering of ZFPs. In order to facilitate and improve the engineering of ZFPs for academic researchers, we developed a web database in which zinc fingers and engineered ZFPs were collected. Using ZFPs generated by selection methods, we determined the frequency of each amino acid at the 9 key positions of a 3-finger ZFP contacting a 9 bp binding site. Lab experiments showed that we can modify the zinc fingers according to the frequencies we found to improve the success rate of engineering ZFPs.
Friday, May 9
Finals Week - Presentations if needed.
Copyright© 2008, Iowa State University, all rights reserved.
Please direct corrections, suggestions, and comments to bcb@iastate.edu.