GenomeRunner helps interpret the potential regulatory effects of SNPs (features of interest, FOIs) by identifying the functional elements (also known as (epi)genomic features, GFs) that are most significantly co-localized with them (see Enrichment analysis).
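The enrichment statistic itself is defined in the Enrichment analysis section and is not reproduced here. As an illustration only, the sketch below scores co-localization with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 table of FOIs vs. background by inside vs. outside a GF), assuming the FOIs are a subset of the background. The function name and the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_bg: int, n_bg_in_gf: int, n_foi: int, n_foi_in_gf: int) -> float:
    """P(X >= n_foi_in_gf) for X ~ Hypergeometric(n_bg, n_bg_in_gf, n_foi).

    n_bg:        SNPs in the background (the "universe")
    n_bg_in_gf:  background SNPs overlapping the genomic feature (GF)
    n_foi:       SNPs of interest, assumed to be drawn from the background
    n_foi_in_gf: SNPs of interest overlapping the GF
    """
    if not (0 <= n_bg_in_gf <= n_bg and 0 <= n_foi <= n_bg and 0 <= n_foi_in_gf):
        raise ValueError("inconsistent counts")
    total = comb(n_bg, n_foi)  # ways to draw n_foi SNPs from the background
    tail = sum(
        comb(n_bg_in_gf, k) * comb(n_bg - n_bg_in_gf, n_foi - k)
        for k in range(n_foi_in_gf, min(n_foi, n_bg_in_gf) + 1)
    )
    return tail / total  # exact integer arithmetic, one final division


# Hypothetical counts: 100 background SNPs, 10 of them in the GF;
# 5 SNPs of interest, 3 of them in the GF.
print(enrichment_p_value(100, 10, 5, 3))  # about 0.0066
```

Exact binomial coefficients get slow for genome-scale backgrounds such as snp141; a statistics library that works in log space would be the practical choice there.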
If one analyzes three or more sets of SNPs, such as SNPs from different individuals, populations, or diseases, GenomeRunner also visualizes their regulatory similarity (see Regulatory Similarity analysis). This information may be used, e.g., to group patients by the similarity of their individual sets of genomic variants within cell type-specific regulatory landscapes.
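How similarity is computed is described in the Regulatory Similarity analysis section. As a minimal sketch only, assuming each SNP set is summarized by a profile of signed -log10 p-values (positive for enrichment, negative for depletion) over the same GFs, pairwise similarity could be measured as the Pearson correlation between profiles. The set names, values, and helper functions below are made up for illustration.

```python
from math import log10, sqrt


def signed_log_p(p: float, enriched: bool) -> float:
    """Encode a p-value as -log10(p) for enrichment or +log10(p) for depletion."""
    return -log10(p) if enriched else log10(p)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if a profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def similarity_matrix(profiles: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlation for every ordered pair of SNP sets (diagonal included)."""
    return {(a, b): pearson(profiles[a], profiles[b]) for a in profiles for b in profiles}


# Hypothetical profiles over the same three GFs for three SNP sets.
profiles = {
    "setA": [signed_log_p(0.001, True), signed_log_p(0.1, False), 0.5],
    "setB": [2.5, -0.8, 0.4],
    "setC": [-3.0, 1.0, -0.5],
}
sim = similarity_matrix(profiles)
print(sim[("setA", "setC")])  # close to -1: setC mirrors setA
```

Clustering the resulting matrix (e.g., hierarchically) is one way such similarities are commonly visualized.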
Use tab-separated text files with the genomic coordinates of the SNPs of interest in BED format (see examples). At a bare minimum, the chromosome, start, and end coordinates should be provided. One can upload BED file(s) or copy-paste tab-separated coordinates.
Note: a set of SNPs should contain at least 5 SNPs to be eligible for the analysis. Genomic coordinates should be 0-based, and the end coordinate should equal the start coordinate + 1.
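The input rules above can be checked locally before uploading. The sketch below is a hypothetical pre-flight check, not GenomeRunner's own validator: it parses tab-separated BED text, skips blank lines, ignores columns beyond the third, and enforces the 5-SNP minimum and the end = start + 1 rule. The coordinates are illustrative, not real SNPs.

```python
MIN_SNPS = 5  # minimum set size stated above


def read_snp_bed(text: str) -> list[tuple[str, int, int]]:
    """Parse BED text (chrom, start, end, ...) and check the SNP coordinate rules."""
    snps = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected chrom, start, end separated by tabs")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # 0-based, half-open: a single-base SNP spans [start, start + 1)
        if start < 0 or end != start + 1:
            raise ValueError(f"line {line_no}: end must equal start + 1 (0-based)")
        snps.append((chrom, start, end))
    if len(snps) < MIN_SNPS:
        raise ValueError(f"{len(snps)} SNPs found; at least {MIN_SNPS} are required")
    return snps


example = "\n".join(
    f"chr1\t{start}\t{start + 1}\tsnp{i}"
    for i, start in enumerate([1000, 2000, 3000, 4000, 5000])
)
print(len(read_snp_bed(example)))  # 5
```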
Note: lists of rsIDs (e.g., rs2789489, rs4360154, rs630642), one rsID per line, can also be submitted.
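A pasted rsID list can be sanity-checked in a similar way. This hypothetical sketch assumes rsIDs follow the usual lowercase `rs`-plus-digits form; mapping the rsIDs to genomic coordinates is left to GenomeRunner and is not shown.

```python
import re

RSID = re.compile(r"rs\d+")


def read_rsids(text: str) -> list[str]:
    """Return rsIDs from text with one rsID per line; blank lines are ignored."""
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    bad = [x for x in ids if not RSID.fullmatch(x)]
    if bad:
        raise ValueError(f"not in rsID format: {bad}")
    return ids


print(read_rsids("rs2789489\nrs4360154\n\nrs630642\n"))
# ['rs2789489', 'rs4360154', 'rs630642']
```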
Sure. Several buttons on the front page select pre-defined sets of SNPs for the analysis. For Homo sapiens these include (to be updated):
Pre-defined sets of SNPs | What is it | When to use |
---|---|---|
gwasCatalog | Sets of disease- and trait-associated SNPs from gwasCatalog. Each SNP set has 15 or more SNPs. | To investigate enrichments and regulatory similarity among all disease- and trait-associated SNP sets. Use with gwasCatalog background. |
gwasCatalog_vs_DGV | Selected set of disease- and trait-associated SNPs from gwasCatalog. | Use for demo purposes, to be run against structural variants from DGV. Use with gwasCatalog background. |
gwasCatalog_vs_H3K4me3 | Selected set of disease- and trait-associated SNPs from gwasCatalog. | Use for demo purposes, to be run against the tissue-specific H3K4me3 histone methylation marks from the Trynka-Raychaudhuri paper. Use with gwasCatalog background. |
See the Background section. In short, the background is the “universe” of all SNPs assessed in a study, from which the SNPs of interest were drawn. Several pre-defined background sets are provided; for Homo sapiens these include:
Pre-defined background | When to use |
---|---|
snp141 (All Simple Nucleotide Polymorphisms (dbSNP 141)) | For sets of SNPs from whole-genome GWA studies |
snp141Common (Simple Nucleotide Polymorphisms (dbSNP 141) Found in >= 1% of Samples) | For sets of SNPs from studies where rare variants were ignored |
gwascatalog (NHGRI Catalog of Published Genome-Wide Association Studies) | For demo testing, to observe regulatory associations of disease-specific sets of SNPs compared with randomly selected SNPs from the whole gwasCatalog |
For a GWAS, the background is likely to be all SNPs (snp141 for Homo sapiens). For a study using microarrays, the background should contain the coordinates of all SNPs on the microarray; upload or copy-paste them.
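If the array manifest lists SNP positions as 1-based coordinates (check the vendor's documentation), they must be shifted to the 0-based BED convention described above. A minimal sketch, assuming the manifest has already been reduced to (chromosome, 1-based position) pairs; the function name and probe positions are hypothetical.

```python
def positions_to_bed(probes: list[tuple[str, int]]) -> str:
    """Turn (chrom, 1-based position) pairs into sorted, de-duplicated BED lines."""
    lines = []
    for chrom, pos in sorted(set(probes)):  # lexicographic chrom order, numeric position
        if pos < 1:
            raise ValueError(f"{chrom}:{pos} is not a valid 1-based position")
        lines.append(f"{chrom}\t{pos - 1}\t{pos}")  # 0-based start, end = start + 1
    return "\n".join(lines) + "\n"


# Two tab-separated lines: chr1 99 100, then chr2 4 5
print(positions_to_bed([("chr2", 5), ("chr1", 100), ("chr1", 100)]), end="")
```

The output can be saved as a .bed file and uploaded, or pasted, as a custom background.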
Note: the SNPs of interest should be a subset of the background SNPs. If some SNPs of interest do not overlap the background, a non-critical error is issued. Use BEDTools to create custom backgrounds and for any other manipulation of BED files.
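To find SNPs of interest that fall outside the background before uploading, BEDTools can report features in one file that have no overlap in another (`bedtools intersect -v -a snps.bed -b background.bed`). The same check for single-base SNP intervals is sketched below in plain Python; because each interval is one base long, two SNPs overlap exactly when they share chromosome and start. The function name and coordinates are hypothetical.

```python
Interval = tuple[str, int, int]  # (chrom, 0-based start, end)


def missing_from_background(fois: list[Interval], background: list[Interval]) -> list[Interval]:
    """Return SNPs of interest that do not overlap any background SNP.

    Assumes every interval spans a single base (end == start + 1), so overlap
    reduces to an exact (chrom, start) match.
    """
    bg = {(chrom, start) for chrom, start, _ in background}
    return [snp for snp in fois if (snp[0], snp[1]) not in bg]


fois = [("chr1", 99, 100), ("chr1", 199, 200), ("chr2", 4, 5)]
background = [("chr1", 99, 100), ("chr2", 4, 5), ("chr3", 0, 1)]
print(missing_from_background(fois, background))  # [('chr1', 199, 200)]
```

For intervals longer than one base, a real interval-overlap test (as `bedtools intersect` performs) would be needed instead of the exact match.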
Regulatory datasets are sets of discrete genomic regions with potential functional/regulatory properties. The vast majority of these data were obtained experimentally by the ENCODE project.
Don't panic. The genome annotation features are organized into categories mirrored from the UCSC Genome Browser (see Database structure). Use the search box and/or the checkboxes in the TreeView control to select one or more categories of regulatory datasets. Clicking on a regulatory dataset’s name brings up its description, if available.
The ENCODE data are organized by source/data type, tier (quality), and cell type. Hint: several well-known or specially processed sets of genome annotation features are brought forward as “default genome annotation features”. For Homo sapiens these include:
Genome annotation category | Experimental question: Are the SNPs of interest... |
---|---|
altSplicing (Alternative Splicing, Alternative Promoter and Similar Events in UCSC Genes, split by splicing type) | ... potentially disrupting a specific type of alternatively spliced region? |
chromStates (Chromatin State Segmentation by HMM from ENCODE/Broad, Gm12878 cell line, split by chromatin state type) | ... preferentially located in certain chromatin states? |
coriellVariants (Coriell Cell Line Copy Number Variants, split by cell types) | ... enriched in CNVs, and in which cell type? |
DGV (Gap locations) | ... located in gaps, telomeres, or heterochromatin regions? |
genomicVariants (Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del), split by variant type) | ... enriched in CNVs, or other types of structural variations? |
H3K4me3 (Tissue-specific histone 3 lysine 4 trimethylation marks) | ... enriched in a tissue-specific histone mark associated with active transcription? |
ncRNAs (C/D and H/ACA Box snoRNAs, scaRNAs, and microRNAs from snoRNABase and miRBase, split by ncRNA type) | ... associated with a class of non-coding elements? |
nestedRepeats (Fragments of Interrupted Repeats Joined by RepeatMasker ID, split by repeat class) | ... enriched in repetitive elements, and of which repeat class? |
tfbsConserved (HMR Conserved Transcription Factor Binding Sites, split by TFBS name) | ... potentially disrupting a specific computationally predicted transcription factor binding site? |
tfbsEncode (Transcription Factor ChIP-seq Clusters V3 (161 targets, 189 antibodies) from ENCODE, split by TFBS name) | ... potentially disrupting a specific experimentally defined transcription factor binding site? |
Examples of what to choose:
- tfbsEncode category, to find out whether the SNPs of interest are enriched in the binding sites of any of the 161 transcription factors profiled by ChIP-seq.
- H3K4me3 category, to get insight into whether the SNPs of interest are enriched in the H3K4me3 histone mark, and in which tissue/cell type.
- genes category, to find out whether the SNPs of interest are enriched in genes/exons.

The original version of GenomeRunner, hosted on SourceForge, was designed as an "all-purpose" tool. It has several advantages over the web interface, as well as disadvantages, such as a learning curve, various non-obvious settings, the need to download large databases, complicated database maintenance, restriction to the Windows platform, and a lack of visualization capabilities. The GenomeRunner web server addresses these issues; its key functionality includes:
The database update time is shown in the drop-down menu for database version selection. We last updated and optimized the hg19 and mm9 databases on December 1, 2014. Subsequent database updates are scheduled at 6-month intervals.
No! We plan to expand the database with other datasets, such as those from the Roadmap Epigenomics project, without sacrificing its categorical structure. In the current release, we added data from the Nuclear Receptor Cistrome DB.
If you find GenomeRunner useful, please cite the 2012 paper. A manuscript describing the web version has been submitted.
Please contact