BCB Symposium - March 31, 2017

Friday, March 31, 2017 - 8:30am

The 3rd Annual BCB Symposium will take place on Friday, March 31, 2017 from 8:30 a.m. to 4:30 p.m. at Reiman Gardens.  This year's symposium is entitled: The Breadth and Depth of Bioinformatic Analysis.

Register here by March 30th: https://goo.gl/forms/00tqZh1vPCxJvJjg1

The symposium will feature several speakers, a poster session highlighting graduate and undergraduate student research, and a complimentary breakfast, lunch and coffee break.  There will be time for the speakers to interact with students. In addition to sharing their research, speakers are invited to discuss their journey to their current position and offer advice to current students who would like to pursue a career in academia.

If you cannot attend in person, you may join a webinar of the event: https://iastate.zoom.us/j/885106865

Or iPhone one-tap (US Toll):  +16465588656,885106865# or +14086380968,885106865#

Or Telephone:
    Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
    +1 888 683 9685 (US Toll Free)
    Webinar ID: 885 106 865
    International numbers available: https://iastate.zoom.us/zoomconference?m=KPyzJhCljRBeBSA05POLGMWjuZHkOuVM

Speakers for the symposium include:

Dr. Stephen Altschul (Senior Investigator, Computational Biology Branch, NCBI): Dr. Altschul is widely known for developing the BLAST and multiple sequence alignment algorithms that are integral to the analysis of nucleotide and protein sequences. His current work focuses on developing robust and accurate methods for multiple protein sequence alignment.

Dr. Altschul will present at 10:45 a.m. on: "Dirichlet Mixtures, the Dirichlet Process, and the Topography of Amino-Acid Multinomial Space".


Amino acid frequencies at homologous positions within related proteins have been fruitfully modeled by Dirichlet mixtures, whose components can be viewed as probability hills within amino-acid multinomial space.  We have used the Dirichlet Process to construct such mixtures with an unspecified and unbounded number of components.  The resulting mixtures model multiple alignment data substantially better than do those previously derived.  They consist of over 500 components, in contrast to fewer than 40 previously, and provide a novel perspective on proteins.  Individual protein positions should be seen not as falling into one of several categories, but rather as arrayed near probability ridges winding through amino-acid multinomial space.

Dr. Zhiping Weng (Professor, Biochemistry & Molecular Pharmacology, University of Massachusetts Medical School): Dr. Weng’s research focuses on bioinformatics and computational genomics, specifically gene regulation, small silencing RNAs, and protein docking, as well as the development of computational methods to study these mechanisms.

Dr. Weng will present at 12:40 p.m. on: "ENCODE Encyclopedia: Featuring a Registry of Candidate Regulatory Elements and the Visualization Tool SCREEN for Searching Them".


The Encyclopedia of DNA Elements (ENCODE) Consortium has generated thousands of genomic datasets with the goal of annotating the functional elements in the human and mouse genomes. As members of the Data Analysis Center of the Consortium, we have worked with other Consortium members to integrate such genomic datasets generated by the ENCODE and Roadmap Epigenomics Consortia to create a collection of genomic annotations termed the ENCODE Encyclopedia. The Encyclopedia has three levels. The ground level consists of annotations that are very close to the experimental data, such as DNase I hypersensitive sites (DHSs) and “peaks” of histone modifications or transcription factor occupancy. The middle level of the Encyclopedia holds a registry of candidate regulatory elements (cREs) that are anchored on DHSs in roughly 200 human cell types and 50 mouse cell types, and further annotated with histone modifications and transcription factor occupancy. We show that the registry accurately identifies functional promoters and enhancers in a cell-type-specific manner, as evaluated using functional data such as mouse transgenic assays. We also show that the registry is comprehensive: it can identify promoters and enhancers in cell types for which DNase-seq experiments have not been performed. The top level of the Encyclopedia contains chromatin state calls made with semi-automated genome annotation algorithms such as Segway and ChromHMM, for those cell types that have been interrogated by a complete or near-complete set of epigenomic assays. We have developed a Web-based database and visualization tool, SCREEN (Search Candidate Regulatory Elements by ENCODE). SCREEN enables users to explore cREs in the Registry across hundreds of cell and tissue types and to filter regions by various facets. Users can use SCREEN to compare enhancer-like elements across tissues, predict enhancer-gene interactions, visualize gene expression profiles, and annotate genetic variants. SCREEN also provides the functionality to access and download supporting data, as well as to visualize regions of interest using dynamic graphs and external genome browsers.

Dr. Scott Emrich (Associate Professor, University of Notre Dame) is an alumnus of the BCB Graduate Program, having obtained his PhD in 2007 under the mentorship of Srinivas Aluru. He is the Director of Bioinformatics at Notre Dame, a Concurrent Associate Professor in the Department of Biological Sciences, and a Research Associate Professor in the Department of Computer Science and Engineering.

Scott is involved in a robust research program at Notre Dame. Part of his $31.6 million in awards has come through two recent NIH awards on which he is PI. His research interests include genome-focused bioinformatics, parallel computing, and arthropod genomics (VectorBase and Arthropod Genomics Consortium/i5K). Specifically, his group focuses on non-model genome assembly and analysis, with applications to global health and ecology.

Dr. Emrich will present at 9 a.m. on "Getting into an interdisciplinary (sub)problem: from MAGIs to minhash".

Advances in “big” biological data processing, including genome assembly, have largely followed one of two tracks: improved algorithms (e.g., BWT-accelerated alignments) or better data decomposition/parallelization. For example, parallel sequence-based clustering was shown to be a promising decomposition method for a variety of problems I worked on at Iowa State over a decade ago. In this talk I will start with the Maize Assembled Genomic Island (MAGI) collaboration between the Aluru, Schnable and Ashlock groups and how it influenced my independent research program. I will then talk about my ongoing efforts on malaria and tiger mosquitoes, high-throughput computing, and my students’ and my own work in the inter-BIO-CS space. As a former BCB student, I will focus on my own common thread, “divide and conquer,” at multiple levels: starting with a good biological question, decomposing the computational problem into smaller, more tractable problems, and, most importantly, knowing your individual strengths in the context of diverse research teams. I will conclude with our most recently funded multidisciplinary project (R01), which looks at co-translational protein folding, and share some emerging data decomposition techniques related to minhash that one of my own BCB students is applying to problems in community ecology.

Bio: Prior to joining the faculty of the University of Notre Dame, Scott Emrich obtained a BS in Biology and Computer Science from Loyola College (MD) and a 2007 BCB PhD from Iowa State University, after which he won the 2008 Zaffarano Prize for Graduate Research. His research focus remains mostly computational genomics, and he has published over 75 peer-reviewed papers in venues such as Science (4, 2 covers), PNAS (3), Nature and Genome Research.

Scott will also speak at a March 30 seminar for the Computer Science Department at 3:40 p.m. in 2019 Morrill Hall. Here is the title and abstract for that presentation:

Integrating diverse data for improved computational genomics

Abstract: Genomics-driven analysis of many important species, which we have called “non-models”, remains challenging. My group is funded by the NIH to computationally leverage newer, higher-throughput sequencing and domain-expert-provided metadata (biological traits like drug resistance, protein folding, community-sourced data) to tackle problems mostly related to arthropod-borne diseases (e.g., malaria).

For this talk I will focus on updates to our 2016 ACM BCB paper, which, at the request of the organizers, has been submitted to IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). Previous computational approaches for imputing missing genotype data have relied on a linear order of markers and a genotype panel, neither of which is commonly available for non-models. We address this limitation with our ADDIT (Accurate Data-Driven Imputation Technique) approach, which is composed of two data integration-focused algorithms: a non-model variant that employs statistical inference, and a model organism variant that better leverages reference data using a supervised learning-based approach. I will show that ADDIT is more accurate, faster and requires less memory than state-of-the-art methods using model (human) and non-model (maize, apple, grape) datasets. I may also present emerging -omics results from three other funded projects, two involving closely related mosquito species complexes (Anopheles funestus and Culex quinquefasciatus) and a new R01 looking at sequence and network patterns linked to protein folding. These projects combine sequence analysis with graph- and sketch-based methods for integrating diverse data types with variable levels of uncertainty.

The full schedule for the BCB Symposium, including student presentations, is as follows:



8:30 - 9:00

Breakfast and Welcome

9:00 - 9:50

Speaker: Dr. Scott Emrich, Director of Bioinformatics at Notre Dame

ISU and BCB alumnus

10:00 - 10:40

Poster Session 1

10:45 - 11:35

Speaker: Dr. Stephen Altschul, Senior Investigator, NIH’s NLM/NCBI

11:35 - 12:35

Lunch

12:40 - 1:30

Speaker: Dr. Zhiping Weng, University of Massachusetts Medical School, Department of Biochemistry and Molecular Pharmacology

1:40 - 2:20

Poster Session 2

2:20 - 3:05

Student Presentations:

Lauren Laboissonniere – Single cell transcriptomics of photoreceptive retinal ganglion cells

Carla Mann – Predicted Protein Intrinsic Disorder Improves in silico identification of RNA-Protein Partners

Gokul Wimalanathan – Maize - GO Annotation Methods Evaluation and Review (MAIZE-GAMER)

3:05 – 3:25

Coffee Break

3:25 – 3:40

Student Presentations, continued

John Hsieh - Evaluation of Organism Identification for 16S rRNA Sequencing of Chicken Cecal Microbiome

3:40 – 4 p.m.

Undergraduate Student Presentations - A Buggy Dataset: Bioinformatics Analyses on RNA Sequencing Data from Insect BioBlitz

4:05 - 4:20

Closing Remarks: Dr. Carolyn Lawrence-Dill


Sponsors for the BCB Symposium include the Office of Biotechnology, the College of Liberal Arts and Sciences, the College of Agriculture and Life Sciences, the College of Veterinary Medicine, the College of Engineering, the BCB Graduate Program and the Graduate & Professional Student Senate.