Pathway and Gene Selection with Guided Regularized Random Forests
Project Date: August 2017–February 2018
Advisor: Dr. Tyler Cook
Brief Project Description:
- Developed simulations in R to assess a new method that uses guided regularized random forests to identify the genes and genetic pathways most important for predicting a particular biological outcome from microarray data
- Applied the method to a breast cancer dataset to identify important pathways; the results were consistent with empirical findings
- Built an R application, using the shiny and forestFloor packages, to visualize random forest results; the app won the Best Visualization Award at the 2018 UCO CREIC Symposium
Project Files:
- The presentation I gave at the 2018 Joint Mathematics Meetings in San Diego.
- A detailed write-up of the simulations and breast cancer dataset results. (Note: This is part of a larger paper awaiting publication.)
- The code I wrote for the pathway-ranking and gene-selection components of the simulations, as well as the code used to obtain the breast cancer dataset results.
- The code for the data visualization app. Note that, for the purposes of the presentation, only the best- and worst-performing pathways (Glycolysis-Gluconeogenesis and BC-Regulation of hem, respectively) were available for display.