Reproducible research tools
BIOS 691-001
Mikhail Dozmorov, first dot last at vcuhealth dot edu
1 credit-hour, 8 hours
9:00 am to 12:00 pm
June 11 to June 14, 2018.
One Capitol Square, Rm 5009
By appointment
grep, awk, sed, vim
Data manipulation (dplyr) and visualization (ggplot2) in R, tidyverse
Reproducibility is the cornerstone of science. In data science, reproducibility means delegating the majority of scientific computations to automated workflows. Such automation minimizes the errors and irreproducibility of the point-and-click approach and makes it easier for others to trace and reconstruct the analytical steps. Although the importance of computational reproducibility is commonly recognized, it is still not widely adopted, in part because of limited systematic knowledge about the available tools for reproducible research.
This workshop-style course will introduce methods, tools, and software for reproducibly managing, manipulating, analyzing, and visualizing large-scale biological data. The goal is to familiarize students with best practices and computational tools that will have immediate and long-term benefits in the everyday work of a data scientist.
This course is not a statistics class. It is a data science-oriented course. Some general knowledge of statistics and study design is helpful but isn’t required.
After successful completion of this course, students will be able to:
Install the core packages listed below. If install.packages() generates errors, read the error messages carefully: most likely some dependency packages are missing. Install those dependencies first, then install the core packages.
install.packages(c("dplyr", "readr", "tidyr", "ggplot2", "knitr", "rmarkdown", "shiny", "shinythemes", "lubridate"))
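The install step above can be made repeatable with a short script that installs only what is missing and then checks that each package loads. This is a minimal sketch, not part of the official course materials: the helper names missing_pkgs(), install_missing(), and check_loadable() are hypothetical, and the sketch assumes the default CRAN repository is configured.

```r
# Core packages named in the syllabus
core_pkgs <- c("dplyr", "readr", "tidyr", "ggplot2", "knitr",
               "rmarkdown", "shiny", "shinythemes", "lubridate")

# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the missing packages.
# With the default dependencies = NA, install.packages() also installs
# the Depends/Imports/LinkingTo dependencies available from the repository.
install_missing <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Hypothetical helper: TRUE/FALSE per package, depending on whether it loads
check_loadable <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}
```

A typical session would run install_missing(core_pkgs) followed by check_loadable(core_pkgs). Rerunning the first call on a machine that is already set up installs nothing, and any FALSE entry in the second result points to a package whose installation error messages are worth reading carefully.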
Both undergraduates and graduate students are welcome to take the course. Auditing is possible contingent on class capacity. Contact the instructor for auditing arrangements.
This course will rely mainly on in-class participation, followed by assigned reading and practice with the software tools.
There will be four connected modules, each focusing on an important area of computational reproducible research. Each module will be presented in a traditional seminar format combined with real-life demonstrations of practical tasks. Students will learn reproducible research actively: by doing it.
None. Instead, a list of relevant readings will be provided.
Students are expected to attend every class and be on time. Participation counts toward the final grade. Homework will be assigned for each topic and also counts toward the final grade.
This course on GitHub: https://github.com/mdozmorov/BIOS691.2018