Our work falls broadly into three areas: Population Genomics, Comparative Genomics, and Computational Biology. Lab members work both on independent projects and within larger, often multi-institutional, collaborative teams. A few recent and ongoing projects are described below to illustrate what we are currently doing.

Machine Learning for Population Genetic Inference


The explosion in genome sequencing data over the last two decades has demanded new approaches for making sense of the flood of information on genetic variation.
Our lab has pioneered the use of machine learning approaches (sometimes referred to as artificial intelligence algorithms) toward this end, with a particular emphasis on evolutionary questions. In recent years, supervised machine learning approaches have proven especially powerful for a range of inference tasks in population genetics (reviewed in Schrider and Kern, 2018), yet their adoption is still in its infancy. The lab is currently particularly interested in exploring deep learning methods for population genetics.


Population Genomics of Adaptation


Adaptive changes at the phenotypic level abound in the natural world, yet we have very little understanding of their genetic determinants. More generally, we are interested in the ways in which natural selection shapes patterns of genetic variation in genomes. A complete understanding of the population genomics of adaptation requires addressing at least three related issues: the identification of individual targets of natural selection, an understanding of the adaptive history of those targets, and an understanding of the ways in which linked genetic loci are influenced by those targets. Our research program centers on these themes, alternately developing methods and applying them to genomic data sets. To this end we have been developing machine learning approaches for inferring the location and mode of selection in genomes. One example is our method S/HIC, which combines spatial patterns of polymorphism along a recombining chromosome with a supervised machine learning method called an Extra-Trees classifier to identify selective sweeps, and to distinguish hard from soft sweeps, with high sensitivity and specificity (Schrider and Kern, 2016).
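To make "spatial patterns of polymorphism" more concrete, here is a minimal sketch of how a feature vector of that kind might be built for one genomic region. It assumes haplotypes are equal-length strings of 0/1 alleles at the listed segregating-site positions and uses a single statistic, nucleotide diversity (π), whereas S/HIC summarizes each sub-window with several statistics. The number of sub-windows (11 by default here) and the choice to express each sub-window's value as a fraction of the sum across sub-windows are simplifying assumptions; the helper names `window_pi` and `spatial_feature_vector` are hypothetical.

```python
from itertools import combinations
from math import comb


def window_pi(haplotypes, positions, start, end):
    """Nucleotide diversity (mean pairwise differences per bp) in [start, end)."""
    sites = [j for j, pos in enumerate(positions) if start <= pos < end]
    diffs = sum(
        sum(h1[j] != h2[j] for j in sites)
        for h1, h2 in combinations(haplotypes, 2)
    )
    return diffs / comb(len(haplotypes), 2) / (end - start)


def spatial_feature_vector(haplotypes, positions, region_len, n_windows=11):
    """Split a region into equal sub-windows and express pi in each one as a
    fraction of its sum over all sub-windows (a hypothetical simplification)."""
    width = region_len / n_windows
    raw = [
        window_pi(haplotypes, positions, w * width, (w + 1) * width)
        for w in range(n_windows)
    ]
    total = sum(raw)
    if total == 0:
        # No polymorphism anywhere in the region: fall back to a flat profile.
        return [1 / n_windows] * n_windows
    return [x / total for x in raw]


haps = ["0101", "0011", "0000"]
print(spatial_feature_vector(haps, [5, 15, 25, 35], region_len=40, n_windows=4))
```

A sweep centered in the region would be expected to depress the central value relative to the flanks. Vectors like this, stacked for several statistics and computed on simulated sweep and neutral regions, could then serve as training data for a classifier such as scikit-learn's `ExtraTreesClassifier`.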


Machine Learning Methods for Population Genomic Prediction of Function


With the rapid explosion in genome sequence data over the past decade, geneticists find themselves in a place where data analysis, rather than data collection, is the rate-limiting step toward gleaning new biological insights. In particular, as genome sequences from thousands of individuals are now available, population geneticists are faced with the daunting task of making sense of millions of genetic variants. It is therefore tempting to leverage these resources to identify functional elements in the genome, using genetic variation as a guide. Here again, we have been using a fusion of traditional population genetics and machine learning to build powerful methods of inference. One example of this research, which used so-called unsupervised learning techniques, was our development of population genetic hidden Markov models for scanning genomes for the footprint of natural selection (Kern and Haussler, 2010). More recently, we have turned our attention to supervised machine learning methods for annotating functional elements in genomes on the basis of population genomic data (Schrider and Kern, 2015).
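As a rough illustration of how a hidden Markov model can scan a genome, the sketch below uses the Viterbi algorithm to decode the most probable sequence of hidden states along a run of sites. The three states, the coarse "rare"/"mid"/"high" derived-allele-frequency observations, and all probabilities are invented for illustration and are not the model or parameters of Kern and Haussler (2010). Fitting parameters from unlabeled data (the unsupervised part, e.g. by Baum-Welch) is omitted.

```python
import math

# Hypothetical three-state model; every number below is made up for illustration.
STATES = ("negative", "neutral", "positive")
START = {"negative": 0.1, "neutral": 0.8, "positive": 0.1}
TRANS = {s: {r: (0.9 if r == s else 0.05) for r in STATES} for s in STATES}
EMIT = {
    "negative": {"rare": 0.8, "mid": 0.15, "high": 0.05},  # excess of rare alleles
    "neutral": {"rare": 0.4, "mid": 0.4, "high": 0.2},
    "positive": {"rare": 0.3, "mid": 0.2, "high": 0.5},  # excess of high-frequency derived
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence (log space)."""
    # score[s]: best log-probability of any path ending in state s at the current site.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (
                score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            )
            ptr[s] = prev
        backptrs.append(ptr)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


sites = ["mid"] * 5 + ["rare"] * 10 + ["mid"] * 5
print(viterbi(sites, STATES, START, TRANS, EMIT))
```

With these toy numbers, the run of sites dominated by rare variants is labeled "negative" while the flanks stay "neutral"; the high self-transition probabilities are what make the decoded states come in contiguous segments rather than flipping site by site.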

Demographic Estimation from Genetic Data



An important use of population genetic data is in the estimation of population histories. One thread of research in our group is the development and application of methods for such inference. A recent example is our collaboration with Dr. Jody Hey on the Isolation with Migration (IM) model, in which we developed a novel Markov chain representation of the coalescent process for exact calculation of the expected joint site frequency spectrum (Kern and Hey, 2017).
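The following is a minimal sketch of the general idea only: a Markov chain over coalescent states, enumerated exactly, used to compute an expected site frequency spectrum. It deliberately assumes the simplest case, a single population of constant size with no migration, rather than the two-population IM model; the function name `expected_sfs` is hypothetical. In this case the result should reproduce the classical expectation E[ξ_i] = θ/i, which makes a convenient check.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def expected_sfs(n, theta=1):
    """Exact expected SFS [E[xi_1], ..., E[xi_{n-1}]] for a sample of size n
    under the standard coalescent (one population, constant size).

    Each state is a sorted tuple giving the number of sampled descendants of
    every lineage still present. Time is in units of 2N generations and
    theta = 4*N*mu, so mutations land on a lineage at rate theta / 2.
    """
    theta = Fraction(theta)
    # Probability of each state, starting from n singleton lineages.
    layer = {tuple([1] * n): Fraction(1)}
    # branch[i]: expected total length of branches subtending exactly i samples.
    branch = [Fraction(0)] * n
    for k in range(n, 1, -1):
        pairs = comb(k, 2)
        mean_time = Fraction(1, pairs)  # waiting time ~ Exponential(rate = C(k, 2))
        next_layer = defaultdict(Fraction)
        for state, p in layer.items():
            for size in state:
                branch[size] += p * mean_time
            # Each of the C(k, 2) pairs is equally likely to coalesce next.
            for a in range(k):
                for b in range(a + 1, k):
                    rest = [s for j, s in enumerate(state) if j not in (a, b)]
                    child = tuple(sorted(rest + [state[a] + state[b]]))
                    next_layer[child] += p / pairs
        layer = next_layer
    return [theta / 2 * branch[i] for i in range(1, n)]


print(expected_sfs(5))
```

Using `Fraction` keeps the calculation exact rather than approximate. The number of states grows rapidly with n, and adding a second population and migration enlarges the state space further, so this brute-force enumeration is only practical for small samples.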