Clustering
Hierarchical Clustering slides, HTML, PDF, Rmd
Non-Hierarchical Clustering slides, HTML, PDF, Rmd
Clustering QC slides, HTML, PDF, Rmd
Exercises
-
Single linkage clustering example, Single_Linkage.R
-
Complete and average linkage, Complete_Linkage.R, Average_linkage.R
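To see how the linkage rules named in these exercises differ, here is a minimal sketch using base R's `dist`, `hclust`, and `cutree`. The toy data and the variable names are assumptions made for illustration; they are not taken from Single_Linkage.R, Complete_Linkage.R, or Average_linkage.R.

```r
# Six points on a line with steadily growing gaps
# (illustrative toy data, not from the course scripts).
x <- matrix(c(0, 1, 2.1, 3.3, 4.6, 6), ncol = 1)
d <- dist(x)  # Euclidean distances between all pairs of points

# Same distance matrix, three different linkage rules
hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

# Merge heights show how each rule measures between-cluster distance
print(hc_single$height)
print(hc_complete$height)
print(hc_average$height)

# Cut each tree into two clusters and compare the memberships
groups_single   <- cutree(hc_single, k = 2)
groups_complete <- cutree(hc_complete, k = 2)
print(rbind(single = groups_single, complete = groups_complete))
```

On this toy data, single linkage chains the first five points together and leaves the last point on its own, while complete linkage puts the last two points in the same cluster. Calling `plot(hc_single)` and `plot(hc_complete)` draws the two dendrograms for a visual comparison.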
Examples
-
Multidimensional Scaling, J.B. Kruskal. AT&T Bell Laboratories (1962), http://stat-graphics.org/movies/multidimensional-scaling.html
-
More PCA examples https://genomicsclass.github.io/book/pages/pca_svd.html
References
-
Google Scholar search results for "cluster analysis" https://scholar.google.com/scholar?hl=en&q=cluster+analysis&btnG
-
Hierarchical clustering videos, https://www.youtube.com/playlist?list=PLBv09BD7ez_7qIbBhyQDr-LAKWUeycZtx
-
Distances in the vegdist function from the vegan package http://www.pmc.ucsc.edu/~mclapham/Rtips/cluster.htm
-
CRAN overview of clustering functions http://cran.at.r-project.org/web/views/Cluster.html
-
76 formulas for measuring (dis)similarity between binary vectors. Choi, Seung-Seok, Sung-Hyuk Cha, and Charles C Tappert. “A Survey of Binary Similarity and Distance Measures.” Journal of Systemics, Cybernetics and Informatics 2010. http://www.baskent.edu.tr/~hogul/binary.pdf
-
Lecture notes on agglomerative hierarchical clustering: the steps of the algorithm, Ward’s method, single and complete linkage. http://www.stat.cmu.edu/~cshalizi/350/lectures/07/lecture-07.pdf, http://www.stat.cmu.edu/~cshalizi/350/lectures/08/lecture-08.pdf
-
Chen et al. “EVALUATION AND COMPARISON OF CLUSTERING ALGORITHMS IN ANALYZING ES CELL GENE EXPRESSION DATA” http://www3.stat.sinica.edu.tw/statistica/oldpdf/A12n112.pdf
-
Ronan et al. “Avoiding Common Pitfalls When Clustering Biological Data.” Science Signaling 2016 http://stke.sciencemag.org/content/9/432/re6.long
-
Overview of dimensionality reduction techniques, Onderwater, Martijn. “Outlier Preservation by Dimensionality Reduction Techniques.” IJDATS 2015 http://www.inderscience.com/offer.php?id=71365
-
PCA statistics http://users.ics.aalto.fi/jhollmen/dippa/node30.html, https://onlinecourses.science.psu.edu/stat505/node/51
-
Relationship between SVD and PCA https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
-
NMF, nonnegative matrix factorization for gene expression studies. Brunet et al. “Metagenes and Molecular Pattern Discovery Using Matrix Factorization.” PNAS, 2004. http://www.pnas.org/content/101/12/4164.long
-
Biclustering, Pontes et al. “Biclustering on Expression Data: A Review.” Journal of Biomedical Informatics, 2015 http://www.sciencedirect.com/science/article/pii/S1532046415001380
-
Videos (~15 min each) explaining Hierarchical Agglomerative Clustering (https://youtu.be/OcoE7JlbXvY?list=PLaXDtXvwY-oDvedS3f4HW0b4KxqpJ_imw), K-Means Clustering (https://youtu.be/mfqmoUN-Cuw?list=PLaXDtXvwY-oDvedS3f4HW0b4KxqpJ_imw), Gaussian Mixture Models and EM (https://youtu.be/qMTuMa86NzU?list=PLaXDtXvwY-oDvedS3f4HW0b4KxqpJ_imw), PCA, SVD (https://youtu.be/F-nfsSq42ow?list=PLaXDtXvwY-oDvedS3f4HW0b4KxqpJ_imw)
Selected R packages
-
ConsensusClusterPlus - cluster count and membership, https://www.bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html
-
sigclust - Statistical Significance of Clustering, https://cran.r-project.org/web/packages/sigclust/index.html
-
pvclust - An R package for hierarchical clustering with p-values, http://www.sigmath.es.osaka-u.ac.jp/shimo-lab/prog/pvclust/
-
dynamicTreeCut - Methods for Detection of Clusters in Hierarchical Clustering Dendrograms, https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/BranchCutting/, https://cran.r-project.org/web/packages/dynamicTreeCut/index.html
-
WGCNA - Weighted Correlation Network Analysis, https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/, https://cran.r-project.org/web/packages/WGCNA/index.html
Datasets
- nci60.tsv - cell types can be clustered