Syllabus, "Statistical Methods for High-throughput Genomic Data I" course, BIOS 567
Instructor: Mikhail G. Dozmorov, Ph.D.; Jin Liu (TA)
Schedule: Monday, Wednesday 1:00pm – 2:20pm
Classroom: One Capital Square (OCS) 5009
Office hours: Mon, Wed 2:30pm – 4:00pm, OCS 730; TA: Mon, Wed 10:30am – 12:00pm
Prerequisites: BIOS 524 Biostatistical Computing, BIOS 553 Linear Regression, and BIOS 554 ANOVA
Required text: Sorin Draghici “Statistics and Data Analysis for Microarrays Using R and Bioconductor”, 2nd Ed., Chapman & Hall/CRC Press, 2012. ISBN 978-1-4398-0975-4, preview on Google Books. Supplemental course materials will be provided in class
Software: The R programming environment
Course description
Welcome to BIOS 567. This course is part of the Genomics curriculum and aims to introduce the core principles of Bioinformatics, Computational Genomics, and Biostatistics. It is a blended course that combines in-class learning with self-directed activities.
Bioinformatics is an interdisciplinary branch of computer science that requires a knowledge base in biology, technology, and statistics. The operational definition of ‘bioinformatics’ for this course is “The application of biotechnology and statistical methods to the study of biological problems.” The biological problems on which we will focus are gene expression studies conducted using microarray technology.
Course Objectives
- Gain insight into biological principles and the technology underlying custom spotted and oligonucleotide microarrays, image analysis, normalization, and expression summary methods
- Learn practical Exploratory Data Analysis, visualization, and quality control
- Critically evaluate and interpret statistical methods used in genomics data analysis
- Apply supervised and unsupervised methods to genomics data
- Assess statistical significance when multiple hypothesis tests are performed, such as in the analysis of differential gene expression
- Interpret biological findings obtained from statistical tests

At the conclusion of the course, students will be able to collect, analyze, and interpret real data within the R programming environment.
Grading
- Class participation (20%)
- Reading and weekly homework assignments (50%). Homework is due two weeks from the date of assignment unless specified otherwise. Late homework assignments will not receive any credit
- Problem solving, programming assignments, writing, oral presentations
- Final project (30%)
- The following scale is strictly observed (no rounding): A: 90-100, B: 80-89, C: <80
Use of the R programming environment is required for homework assignments. When submitting homework, both the solutions and the R code must be turned in. Instructions for generating reproducible reports with knitr/R Markdown will be provided.
Class Rules
- Attendance is required
- Read all assignments before class
- Bring your laptop and the book to every class
- Observe the VCU Honor Pledge in all class and homework activities
Course Outline
Topic | Reading | Links |
---|---|---|
Biological Background | Chapters 1, 2 | Intro, R and RStudio, Biology |
Microarray technology | Chapter 3 | Microarray technology, Microarray databases |
Image analysis | Chapter 5 | Image analysis |
Bioconductor overview | Chapter 7 | Bioconductor |
Quality control | Chapters 17, 19 | Quality control |
Normalization & Expression summarization | Chapter 20 | Normalization, Expression summarization |
Differential expression | Chapter 21 | Differential expression |
Multiple hypothesis testing | Chapter 16 | Multiple testing correction |
Unsupervised learning | Chapter 18 | Clustering |
Gene Ontology, Functional enrichment | Chapters 22, 23, 24 | Gene Ontology, Functional enrichment |
Methylation Analysis | Selected readings | |