Dr. Mingze He

Graduate Research Assistant, Lawrence-Dill Lab

Hi, people usually call me Ming :)

I received my Ph.D. in Bioinformatics from Iowa State University (ISU) in November 2018.

Before that, I worked at the Beijing Genomics Institute (BGI-Shenzhen) for 3 years as a bioinformatics researcher. My research focused on using Next Generation Sequencing (NGS) technology to study complex disease and human evolution. My co-authored work has appeared in high-profile peer-reviewed scientific journals and has been covered by mainstream media such as Time, The Economist, BBC, NPR, National Geographic, and New Scientist.



  • Doctoral training in computational biology with a comprehensive background in programming and statistical analysis in R; excellent presenter and published author
  • 3+ years of industry experience and 5+ years in academia with a convincing track record of scientific innovation, resulting in 7 peer-reviewed publications in top scientific journals, such as Nature, with 700+ citations within the past 5 years
  • Efficient in prototyping and building novel algorithmic tools, on local clusters and the AWS cloud, to solve biological problems
  • Independent, self-motivated investigator and excellent communicator with 5+ successful collaborations across 4 institutes and multifunctional teams



  • Apply R packages to build statistical and analytic frameworks on big data produced by Next Generation Sequencing (NGS) technology
  • Parallel computing on thousands of CPUs and storage management of terabytes of data on a High Performance Computing (HPC) cluster
  • Proficient in programming languages including Python, R, SQL, and Perl in a Linux environment
  • Build bioinformatics tools deployed with Docker on a local server and on the Amazon Web Services (AWS) cloud platform
  • Develop novel machine learning algorithms to identify target loci



  • Doctor of Philosophy (Ph.D.) Bioinformatics and Computational Biology, Iowa State University (ISU) Ames, Iowa, USA       2018
  • Bachelor of Science (B.S.) Biotechnology, South China University of Technology, China                                                        2012



A hybrid machine learning discovery platform

  • Developed statistical models to predict TCGA solid tumor types and tissue of origin from RNA-seq data (99.8% accuracy between normal and lung cancer)
  • Identified biomarkers predicting treatment response in model organisms
  • Performed early detection of 6 major cancer types from liquid biopsy (tumor-educated platelets), with accuracy improvements of 10-20% for each type
  • Elucidated key dependencies and factors explaining observed RNA expression profiles across cancer types via natural language processing and ontology grouping



Bioinformatics Research Assistant, Iowa State University, Ames, IA                 2013-2018

Majored in Bioinformatics and Computational Biology under a curriculum heavy in programming, statistics, and quantitative analysis

•       Collaborated with wet-lab researchers to develop an in-house RNA pathway enrichment analysis pipeline (in a Linux/Unix environment)
•       Mentored one undergraduate research assistant in learning programming languages (Perl, Java, and Python) to conduct qualitative analysis
•       Presented data mining and sequence analysis of the transcription-level regulatory roles of G-quadruplexes in maize at teleconferences, seminars, and international conferences
•       Drafted manuscripts and published as lead author on C-REx, a novel RNA-seq data visualization and statistical analysis algorithm hosted on a lab server (http://c-rex.dill-picl.org/)

Bioinformatics Visiting Researcher, UC-Berkeley, Berkeley, USA                       2012-2013

Conducted population genetics research analyzing 3,500 exome sequencing samples

•       Used dadi and fastsimcoal software (requiring parallel computing on HPC) to computationally infer the demographic history of populations
•       Performed root cause analysis of computational problems, identifying their sources (sample size statistical simulation) and the reasons for specific issues (likelihood convergence), and found appropriate solutions
•       Interpreted research findings and presented them at lab seminars and international conferences

Junior Bioinformatics Researcher, BGI-Shenzhen, China                                        2010-2013

Provided bioinformatics service to universities, hospitals and pharmaceutical companies

•       Acquired industry skills in NGS and bioinformatics
•       Developed bioinformatics analysis and quality control pipelines (including BWA, SAMtools, GATK, etc.) to effectively process high-throughput (terabyte-scale) data
•       Evaluated computational analysis outcomes produced by machine learning prediction (random forest) and by statistical simulation and sampling, comparing them with mass spectrum validation results
•       Presented results to project leads and department seniors



Research conducted @ ISU (PI Dr. Carolyn Lawrence-Dill)

G4 quadruplexes in and near regulatory elements of maize genes predict tissue type and altered transcriptional and translational response to abiotic stresses. In preparation

Compare expression profiles for pre-defined gene groups with C-REx. Mingze He, Peng Liu, Carolyn Lawrence-Dill. Under review at Bioinformatics

Response to Persistent ER Stress in Plants: a Multiphasic Process that Transitions Cells from Prosurvival Activities to Cell Death. The Plant Cell, 2018

A hypothesis-driven approach to assessing significance of differences in RNA expression levels among specific groups of genes. M He et al. Current Plant Biology, 2017

An ontology approach to comparative phenomics in plants. A Oellrich, RL Walls … M He et al. Plant Methods, 2015

Research conducted @ UC-Berkeley (Dr. Rasmus Nielsen) and BGI-Shenzhen (group leader Dr. Xin Jin)

Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. E Huerta-Sánchez, X Jin, …, M He et al. Nature, 2014, 512 (7513), 194-197

Research conducted @ BGI-Shenzhen (group leader Dr. Xin Jin)

Whole-genome sequencing in an autism multiplex family. L Shi, X Zhang, R Golhar, FG Otieno, M He et al. Molecular Autism, 2013

Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Y Jiang, RKC Yuen … M He et al. The American Journal of Human Genetics, 2013, Volume 93, Issue 2, 249-263

An effort to use human-based exome capture methods to analyze chimpanzee and macaque exomes. X Jin, M He, B Ferguson et al. PLoS One, 2012

Area of Expertise: 
