Avoiding “double dipping” in the Analysis of Single-Cell RNA Sequencing Data. Statistics Colloquium by Anna Neufeld ’18, Ph.D. student, University of Washington. Wednesday, October 5, North Science Building 114, Wachenheim, or via Zoom at https://williams.zoom.us/j/93008831654.
Abstract: The hypothesis testing techniques that we teach in undergraduate statistics courses are designed for testing pre-specified hypotheses. The reality of modern data analysis is that scientists often explore their data to generate hypotheses, and then test those hypotheses using the same data. We refer to the practice of using the same data to generate and test a null hypothesis as “double dipping”. My research is focused on helping scientists avoid double dipping in applied settings.
In this talk, I will focus on double dipping that arises in the analysis of single-cell RNA sequencing data, where researchers first estimate cell types by clustering their data and then test which genes are associated with these estimated cell types. When the same data are used for both cell type estimation and downstream significance testing, standard hypothesis tests fail to control the Type I error rate. We introduce count splitting, a flexible framework that allows us to avoid double dipping in this setting.
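To give a flavor of the idea, here is a minimal sketch of one way to split a count matrix, assuming each count is Poisson distributed. Under that assumption, thinning each count X with a Binomial(X, eps) draw yields two independent Poisson pieces, so one piece can be used for clustering and the other for testing. The function name count_split, the parameter eps, and the simulated data are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def count_split(X, eps=0.5, rng=None):
    """Split a matrix of counts into two parts by binomial thinning.

    Assumption (not from the abstract): each entry of X is Poisson.
    Then X_train ~ Binomial(X, eps) and X_test = X - X_train are
    independent Poisson counts that sum back to X.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X)
    X_train = rng.binomial(X, eps)   # each unit count goes to train with prob. eps
    X_test = X - X_train             # the remaining counts go to test
    return X_train, X_test

# Simulated cells-by-genes matrix with no true cell types (hypothetical data).
sim_rng = np.random.default_rng(0)
X = sim_rng.poisson(lam=5.0, size=(200, 10))

X_train, X_test = count_split(X, eps=0.5, rng=1)
```

In a workflow like the one described above, cell types would be estimated by clustering X_train only, and the tests for associated genes would then be run on X_test using those cluster labels.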
Throughout the talk, I will highlight how pursuing a Ph.D. in statistics after Williams has allowed me both to use skills from my undergraduate math, statistics, and computer science courses and to explore completely new areas of application, such as genomics. I am happy to answer questions about graduate school!