Mingze He

  • Senior Data Scientist at Williams-Sonoma, Inc.
  • Genetics, Development and Cell Biology
Hi, people usually call me Ming :)

I received my Ph.D. in Bioinformatics from Iowa State University (ISU) in November 2018.

Before that, I worked at the Beijing Genomics Institute (BGI-Shenzhen) for 3 years as a bioinformatics researcher. My research focused on using Next-Generation Sequencing (NGS) technology to study complex diseases and human evolution. My co-authored work has appeared in high-profile peer-reviewed scientific journals and has been covered by mainstream media such as Time, The Economist, BBC, NPR, National Geographic, and New Scientist.


Doctoral training in computational biology with a comprehensive background in programming and statistical analysis in R; excellent presenter and published author
3+ years of industry experience and 5+ years in academia with a convincing track record of scientific innovation, resulting in 7 peer-reviewed publications in top scientific journals, such as Nature, with 700+ citations within the past 5 years
Efficient in prototyping and building novel algorithmic tools, on local clusters and the AWS cloud, to solve biological problems
Independent, self-motivated investigator and excellent communicator with 5+ successful collaborations across 4 institutes and multifunctional teams


Apply R packages to build statistical and analytic frameworks on big data produced by Next-Generation Sequencing (NGS) technology
Parallel computing on thousands of CPUs and storage management of terabytes of data on High Performance Computing (HPC) clusters
Proficient in programming languages including Python, R, SQL, and Perl in a Linux environment
Build bioinformatics tools deployed with Docker on local servers and on the Amazon Web Services (AWS) cloud platform
Develop novel machine learning algorithms to identify target loci


Doctor of Philosophy (Ph.D.), Bioinformatics and Computational Biology, Iowa State University (ISU), Ames, Iowa, USA, 2018
Bachelor of Science (B.S.), Biotechnology, South China University of Technology, China, 2012


A hybrid machine learning discovery platform

Developed statistical models to predict TCGA solid tumor types and tissue of origin from RNA-seq data (99.8% accuracy between normal and lung cancer)
Identifying biomarkers predicting treatment response in model organisms
Early detection of 6 major cancer types from liquid biopsy (tumor-educated platelets), with accuracy improvements ranging from 10% to 20% for each type
Elucidated key dependencies and factors explaining observed RNA expression profiles across cancer types via natural language processing and ontology grouping


Bioinformatics Research Assistant, Iowa State University, Ames, IA, 2013-2018

Majored in Bioinformatics and Computational Biology, with a curriculum heavy in programming, statistics, and quantitative analysis

•       Collaborated with wet-lab researchers to develop an in-house RNA pathway enrichment analysis pipeline (in a Linux/Unix environment)
•       Mentored one undergraduate research assistant in learning programming languages (Perl, Java, and Python) to conduct qualitative analysis
•       Presented data mining and sequence analysis of the transcription-level regulatory roles of G-quadruplexes in maize at teleconferences, seminars, and international conferences
•       Drafted manuscripts and published, as lead author, a novel RNA-seq data visualization and statistical analysis algorithm, C-REx, hosted on a lab server (

Bioinformatics Visiting Researcher, UC-Berkeley, Berkeley, USA, 2012-2013

Conducted population genetics research analyzing 3,500 exome sequencing samples

•       Used dadi and fastsimcoal software (requiring parallel computing on HPC) to computationally infer the demographic history of populations
•       Performed root cause analysis of computational issues, identifying their sources (sample size in statistical simulation) and the reasons for specific problems (likelihood convergence), and found appropriate solutions
•       Interpreted research findings and presented them at lab seminars and international conferences

Junior Bioinformatics Researcher, BGI-Shenzhen, China, 2010-2013

Provided bioinformatics service to universities, hospitals and pharmaceutical companies

•       Acquired industry skills in NGS and bioinformatics 
•       Developed a bioinformatics analysis and quality control pipeline (including BWA, SAMtools, GATK, etc.) to effectively process high-throughput (terabyte-scale) data
•       Evaluated computational analysis outcomes produced by machine learning prediction (random forest) and by statistical simulation and sampling, comparing them with mass spectrometry validation results
•       Presented results to project leads and department seniors


Research conducted @ ISU (PI Dr. Carolyn Lawrence-Dill)

G4 quadruplexes in and near regulatory elements of maize genes predict tissue type and altered transcriptional and translational response to abiotic stresses. In preparation

Compare expression profiles for pre-defined gene groups with C-REx. Mingze He, Peng Liu, Carolyn Lawrence-Dill. Under review at Bioinformatics

Response to Persistent ER Stress in Plants: a Multiphasic Process that Transitions Cells from Prosurvival Activities to Cell Death. The Plant Cell 2018

A hypothesis-driven approach to assessing significance of differences in RNA expression levels among specific groups of genes. M He et al. Current Plant Biology, 2017

An ontology approach to comparative phenomics in plants. A Oellrich, RL Walls … M He et al. Plant Methods, 2015

Research conducted @ UC-Berkeley (Dr. Rasmus Nielsen) and BGI-Shenzhen (group leader Dr. Xin Jin)

Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. E Huerta-Sánchez, X Jin, …, M He et al. Nature, 2014, 512 (7513), 194-197

Research conducted @ BGI-Shenzhen (group leader Dr. Xin Jin)

Whole-genome sequencing in an autism multiplex family. L Shi, X Zhang, R Golhar, FG Otieno, M He et al. Molecular Autism, 2013

Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Y Jiang, RKC Yuen … M He et al. The American Journal of Human Genetics, 2013, 93 (2), 249-263

An effort to use human-based exome capture methods to analyze chimpanzee and macaque exomes. X Jin, M He, B Ferguson et al. PLoS One, 2012