Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig. “Ten Simple Rules for Reproducible Computational Research.” PLoS Computational Biology 2013. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
List, Markus, Peter Ebert, and Felipe Albrecht. “Ten Simple Rules for Developing Usable Software in Computational Biology.” PLoS Computational Biology 2017. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005265
Millman, K Jarrod, and Fernando Pérez. “Developing Open-Source Scientific Practice.” Implementing Reproducible Research. 2014. http://www.jarrodmillman.com/oss-chapter.html. A thorough and practical account of all steps in computational reproducible research.
“Ten simple rules” collection of essays covering all professional aspects of a scientific career. http://collections.plos.org/ten-simple-rules
“Statistics for biologists” one-pagers about statistics, methods, and reproducibility by Nature journals. http://www.nature.com/collections/qghhqm
“Biologist’s Guide to Computing” book. From shell basics to GitHub to R to dplyr, ggplot2. Basic. http://biologistsguide2computing.com/
“Points of significance” collection of statistical primers by Nature Methods. https://www.nature.com/collections/qghhqm/pointsofsignificance
“Computational Biology Primers” one- or two-pagers on various topics of genomics and bioinformatics by Nature Biotechnology journal. https://liacs.leidenuniv.nl/~hoogeboomhj/mcb/nature_primer.html
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. “Good Enough Practices in Scientific Computing.” Edited by Francis Ouellette. PLOS Computational Biology 13, no. 6 (June 22, 2017): e1005510. https://doi.org/10.1371/journal.pcbi.1005510 - Best practices for data management, software development, collaborations, project organization, version control, manuscripts.
“Tools for Reproducible Research” course by Karl Broman. http://kbroman.org/Tools4RR/
“Initial steps toward reproducible research” by Karl Broman, http://kbroman.org/steps2rr
“Steps towards reproducible research” resources and reading. http://kbroman.org/steps2rr/pages/resources.html
Software Carpentry lessons on Unix, version control, automation, R & Python programming. http://software-carpentry.org/lessons/
UW-Madison Software Carpentry Workshop covering best practices of coding. https://github.com/UW-Madison-ACI/boot-camps
Biomedical Data Science Workshops by Stephen Turner. From R basics through data manipulation with dplyr, visualization with ggplot2, reproducible research with knitr. http://bioconnector.org/workshops/index.html
“Technical Foundations of Informatics” by Michael Freeman and Joel Ross. Introduction to R, Rmarkdown, plotly, shiny, git and github. https://info201-s17.github.io/book/index.html
Video: “How not to fool yourself with p-values and other statistic” by Regina Nuzzo, NIH Videocast. https://videocast.nih.gov/launch.asp?23420
Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31, no. 4 (2016): 337–50. https://doi.org/10.1007/s10654-016-0149-3 - P-value (mis)interpretation. Considering p-values in context of the underlying model, and how null/test hypotheses are (in)compatible with it. A list of 25 common misinterpretations of p-values, and their rebuttals.
Ioannidis, John P. A. “The Proposal to Lower P Value Thresholds to .005.” JAMA 319, no. 14 (April 10, 2018): 1429. https://doi.org/10.1001/jama.2018.1536 - Advantages and disadvantages of p-value threshold lowering. Examples of current practices. Table - Various Proposed Solutions for Improving Statistical Inference on a Large Scale.
“So you want to be a Data Scientist” - Nature Blogs. http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist
A Book for Anyone to Get Started with Unix. http://seankross.com/the-unix-workbench/, and the GitHub repository, https://github.com/seankross/the-unix-workbench
Data Coding 101 – Intro To Bash. Four episodes, video. https://data36.com/data-coding-bash-best-practices/
An interactive explainer of any shell command. http://explainshell.com/
Unix/Linux command reference sheets. https://cheat-sheets.s3.amazonaws.com/linux-commands-cheat-sheet-new.pdf and https://files.fosswire.com/2007/08/fwunixref.pdf
Survival guide for Unix newbies. http://matt.might.net/articles/basic-unix/
Settling into Unix tutorial. http://matt.might.net/articles/settling-into-unix/
Shell programming with bash tutorial. http://matt.might.net/articles/bash-by-example/
Master the power of command-line with a list of one-liner gems. http://www.commandlinefu.com/commands/browse
“The Unix shell”, Software Carpentry. https://swcarpentry.github.io/shell-novice/
A curated list of Terminal frameworks, plugins & resources for command-line interface (CLI) lovers. http://terminalsare.sexy and https://github.com/k4m4/terminals-are-sexy
A collection of links to learning resources about Unix, shell best practices, R and python tools for genomics. https://github.com/crazyhottommy/getting-started-with-genomics-tools-and-resources
Data Science at the Command Line, https://www.datascienceatthecommandline.com/
Linux basics manuals and tutorials by Thomas Girke. http://hpcc.ucr.edu/manuals_linux-basics_intro
A curated list of delightful Bash scripts and resources. https://github.com/awesome-lists/awesome-bash
One-pager simple git guide. https://rogerdudler.github.io/git-guide/
One-pager of git commands. https://github.com/kbroman/Tools4RR/blob/master/04_Git/GitCommands/git_notes.md
Learn git interactively in 15 min. https://try.github.io/levels/1/challenges/1
Interactive git branching tutorial. http://learngitbranching.js.org/
“Git and GitHub guide” by Karl Broman http://kbroman.org/github_tutorial/
Software Carpentry course on git. https://swcarpentry.github.io/git-novice/
Book “Version Control by Example” by Eric Sink. http://ericsink.com/vcbe/
Blischak, John D., Emily R. Davenport, and Greg Wilson. “A Quick Introduction to Version Control with Git and GitHub.” Edited by Francis Ouellette. PLOS Computational Biology 12, no. 1 (January 19, 2016): e1004668. https://doi.org/10.1371/journal.pcbi.1004668 - An excellent explanation of Git and GitHub. Definitions (Box 1), tutorial
Bryan, Jennifer. “Excuse Me, Do You Have a Moment to Talk about Version Control?” https://doi.org/10.7287/peerj.preprints.3159v2
Book(down) “Happy Git and GitHub for the useR” by Jenny Bryan. http://happygitwithr.com/
How to create pull requests. https://akrabat.com/the-beginners-guide-to-contributing-to-a-github-project/
Quick Git and GitHub videos. http://www.dataschool.io/git-and-github-videos-for-beginners/
GitHub training videos. https://www.youtube.com/user/GitHubGuides/videos
“Learn the most important Git commands in a free video course” by Trevor D. Miller, trevordmiller, https://trevordmiller.com/courses/real-world-git
Regular expression, Unix commands, Python quick reference, SQL reference card. http://practicalcomputing.org/files/PCfB_Appendices.pdf
Tutorial to sed by Bruce Barnett. http://www.grymoire.com/Unix/Sed.html
Vim introduction and tutorial. https://blog.interlinked.org/tutorials/vim_tutorial.html
Interactive Vim tutorial. http://www.openvim.com/
Vim reference card. http://web.mit.edu/merolish/Public/vi-ref.pdf
“Why Use Make” blog post by Mike Bostock, https://bost.ocks.org/mike/make/
A minimal tutorial on make by Karl Broman, http://kbroman.org/minimal_make/
“Learning about Makefiles” by Dave Tang. http://davetang.org/muse/2015/05/31/learning-about-makefiles/
Automation and Make by Software Carpentry. https://swcarpentry.github.io/make-novice/
Makefiles in bioinformatics, one PDF lecture and four exercises, https://github.com/vsbuffalo/makefiles-in-bioinfo
Stallman, Richard M., and Roland McGrath. “GNU Make-A Program for Directing Recompilation.” (1991). https://www.gnu.org/software/make/manual/make.pdf
Boettiger, Carl, and Dirk Eddelbuettel. “An Introduction to Rocker: Docker Containers for R.” The R Journal 9, no. 2 (2017): 527–36. https://journal.r-project.org/archive/2017/RJ-2017-065/index.html - Rocker. Docker definitions. Command examples. Singularity. https://www.rocker-project.org/
Bioconductor Dockers, https://github.com/Bioconductor/bioc_docker
Langmead, Ben, and Abhinav Nellore. “Cloud Computing for Genomic Data Analysis and Collaboration.” Nature Reviews Genetics, January 30, 2018. https://doi.org/10.1038/nrg.2017.113 - Big data and cloud computing overview. Tables with cloud computing providers, big data types. Chief source of key references.
Bioconductor AMIs in the cloud, https://bioconductor.org/help/bioconductor-cloud-ami/
Video: Brief guide on running RStudio Server’s web interface on Amazon Web Services, and references therein. https://www.youtube.com/watch?v=NQu3ugUkYTk&list=PLuFFQwn__hQddxX6RhiGGeEdDy0MjWOO1. Text version: https://amunategui.github.io/EC2-RStudioServer/
Data Carpentry Genomics Workshop, Amazon EC2 installation. http://www.datacarpentry.org/genomics-workshop/
Tips for organizing projects from Karl Broman. http://kbroman.org/steps2rr/pages/organize.html
Organizing data in spreadsheets tutorial. http://kbroman.org/dataorg/. Or, read the paper https://github.com/kbroman/Paper_DataOrg
Clean Code, best practices for function names, patterns and anti-patterns, and more on good programming practices http://www.cbs.dtu.dk/courses/27610/clean_code_index.html
Seemann, Torsten. “Ten Recommendations for Creating Usable Bioinformatics Command Line Software.” GigaScience 2, no. 1 (December 2013). https://doi.org/10.1186/2047-217X-2-15
List, Markus, Peter Ebert, and Felipe Albrecht. “Ten Simple Rules for Developing Usable Software in Computational Biology.” PLoS Computational Biology 13, no. 1 (January 2017): e1005265. https://doi.org/10.1371/journal.pcbi.1005265.
Ten Simple Rules for Robustifying Your Software. https://github.com/oicr-gsi/robust-paper
“Code and Data for the Social Sciences: A Practitioner’s Guide” book by Matthew Gentzkow and Jesse Shapiro, PDF. https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf
Wilson, Greg, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, et al. “Best Practices for Scientific Computing.” PLoS Biology 2014. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745
Organization of files, folders, code, by Data Carpentry. https://github.com/datacarpentry/rr-organization1
How to share data with a statistician, by Jeff Leek group. https://github.com/jtleek/datasharing
Broman, Karl W, and Kara H. Woo. “Data Organization in Spreadsheets.” Accessed March 29, 2018. https://doi.org/10.7287/peerj.preprints.3183v1 - Excel spreadsheet tips and best practices
Data Organization in Spreadsheets, common mistakes. http://www.datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/
Software Carpentry reading material on software engineering and scientific computing. http://software-carpentry.org/reading/
Software development skills for data scientists by Trey Causey, http://treycausey.com/software_dev_skills.html
Simple script to quickly set-up a version controlled project, https://github.com/adomingues/createProjectDirectories
ProjectTemplate - advanced project template, https://github.com/johnmyleswhite/ProjectTemplate, http://projecttemplate.net/
Mastering Software Development in R, https://bookdown.org/rdpeng/RProgDA/
R basics manuals and tutorials by Thomas Girke. https://sites.google.com/a/bioinformatics.ucr.edu/bioinformatics-manuals/home/R_BioCondManual
“R for Data Science” book by Garrett Grolemund & Hadley Wickham, covers ecosystem of R tools for data analysis and visualization done right. http://r4ds.had.co.nz/
Noble, William Stafford. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (July 2009): e1000424. https://doi.org/10.1371/journal.pcbi.1000424 - Computational projects organization, folder structure, command line scripts, version control.
A very short R introduction - https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf
Tutorials for learning R, https://www.r-bloggers.com/how-to-learn-r-2/
R Programming Software and Statistics Tutorials, https://www.youtube.com/user/marinstatlectures/featured
swirl R package for interactive R learning, http://swirlstats.com/
learnr: Interactive tutorials for R, https://github.com/rstudio/learnr, https://rstudio.github.io/learnr/
Transform repeated code into functions. http://kbroman.org/steps2rr/pages/functions.html
How to package functions. http://kbroman.org/steps2rr/pages/packages.html
Package tutorial by Hilary Parker. https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/
R package primer by Karl Broman. http://kbroman.org/pkg_primer/
“R packages” book by Hadley Wickham. http://r-pkgs.had.co.nz/
Jeff Leek on developing R packages. https://github.com/jtleek/rpackages
sinew R package for making templates of help headers for functions. https://github.com/metrumresearchgroup/sinew
pRojects R package for making project templates. https://github.com/lockedata/pRojects
mkrpkg - Template for making R packages. https://github.com/noamross/mkrpkg
“Turn scripts into reproducible reports” by Karl Broman. http://kbroman.org/steps2rr/pages/reports.html
“R Markdown” tutorial by Karl Broman. http://kbroman.org/knitr_knutshell/pages/markdown.html and http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
“A quick introduction to R/markdown” presentation by Peter Ralph, and some R Markdown gotchas (advanced). http://petrelharp.github.io/r-markdown-tutorial/using-rmarkdown.slides.html, and https://petrelharp.github.io/r-markdown-tutorial/gotchas.html
R Markdown guides from RStudio. https://support.rstudio.com/hc/en-us/articles/205368677-R-Markdown-Dynamic-Documents-for-R
R markdown reference sheets. https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf and https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
Create beautiful and semantically meaningful articles with pandoc. Example at https://pandoc-scholar.github.io, how to at https://github.com/pandoc-scholar/pandoc-scholar
An example of how to organize a PhD thesis, https://github.com/jarad/thesisTemplate
CV and resume in Markdown, https://github.com/ryanpeek/markdown_cv
Easy web publishing from R on Rpubs.com. http://rpubs.com/
“Bookdown: Authoring Books with R Markdown” by Yihui Xie. https://bookdown.org/yihui/bookdown/
GitHub Pages template for academic personal websites. https://github.com/academicpages/academicpages.github.io
Slidify: Modern, simple presentations written in R Markdown https://benjaminlmoore.wordpress.com/2014/02/24/slidify-presentations-in-r-markdown/
Xaringan, presentation template based on remark.js by Yihui Xie. https://github.com/yihui/xaringan
md2googleslides, Markdown to Google Slides converter. https://github.com/googlesamples/md2googleslides
An R package to produce posters. https://github.com/pzhaonet/postr. And, a collection of templates to make posters, presentations and publications in R Markdown. https://github.com/exporl/kuleuven-templates
pkgdown is designed to make it quick and easy to build a website for your package. https://github.com/r-lib/pkgdown
Organize your project into a research website, https://github.com/jdblischak/workflowr
“Beautiful Jekyll,” Build a beautiful and simple website in minutes. http://deanattali.com/beautiful-jekyll/
Data Manipulation Using R (& dplyr). PDF slides available at https://ramnarasimhan.files.wordpress.com/2014/10/data-manipulation-using-r_acm2014.pdf, and http://www.slideshare.net/Ram-N/data-manipulation-using-r-acm2014
Data Manipulation with dplyr. http://datascienceplus.com/data-manipulation-with-dplyr/
“Aggregating and analyzing data with dplyr” by Data Carpentry. http://www.datacarpentry.org/R-genomics/04-dplyr.html
Do your “data janitor work” like a boss with dplyr. http://www.gettinggeneticsdone.com/2014/08/do-your-data-janitor-work-like-boss.html
Introduction to dplyr for Faster Data Manipulation in R, https://rpubs.com/justmarkham/dplyr-tutorial
“Data Manipulation Using R (& dplyr)” slides by Ram Narasimhan, http://www.slideshare.net/Ram-N/data-manipulation-using-r-acm2014
Hands-on dplyr tutorial for faster data manipulation in R, https://www.youtube.com/watch?v=jWjqLW-u3hc
“Data visualization in R” by Data Carpentry. http://www.datacarpentry.org/R-genomics/05-data-visualization.html
“ggplot2 tutorial/slides/code examples/references” by Jenny Bryan. https://github.com/jennybc/ggplot2-tutorial
“R Graph Catalog”, visuals and code examples of graphs made with ggplot2. http://shiny.stat.ubc.ca/r-graph-catalog/
R Seminar: Introduction to ggplot2, comprehensive introduction, from UCLA. http://www.ats.ucla.edu/stat/r/seminars/ggplot2_intro/ggplot2_intro.htm
Interactive plots in R, https://davetang.org/muse/2018/05/18/interactive-plots-in-r/
“Licensing”, Software Carpentry. https://swcarpentry.github.io/git-novice/11-licensing.html
“Pick a License, Any License”. https://blog.codinghorror.com/pick-a-license-any-license/
“License your software” by Karl Broman. http://kbroman.org/steps2rr/pages/licenses.html
“The Whys and Hows of Licensing Scientific Code” by Jake VanderPlas. http://www.astrobetter.com/blog/2014/03/10/the-whys-and-hows-of-licensing-scientific-code/
Morin, Andrew, Jennifer Urban, and Piotr Sliz. “A Quick Guide to Software Licensing for the Scientist-Programmer.” PLoS Computational Biology 2012. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002598
Goodman, Alyssa, Alberto Pepe, Alexander W. Blocker, Christine L. Borgman, Kyle Cranmer, Merce Crosas, Rosanne Di Stefano, et al. “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLoS Computational Biology 2014. Lists all major data sharing repositories. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003542