First, create a folder named, e.g., db_2.00_06.10.2014, and change into it. The database name consists of the db prefix, the database version (2.00), and the date the database was created (06.10.2014). Throughout this guide, [dir] represents the full path to the database folder (e.g., [dir] = /home/testuser/Documents/db_2.00_06.10.2014). Do not use the ~ shortcut.
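The folder naming convention can be sketched in Python. This is a minimal illustration and not part of grsnp. The helper names `database_dir_name` and `database_dir_path` are hypothetical. The guide does not say whether 06.10.2014 is day.month.year or month.day.year, so the day.month.year default below is an assumption. The sketch also expands `~` and makes the path absolute, because [dir] must be a full path.

```python
import os
from datetime import date
from pathlib import Path


def database_dir_name(version: str, created: date, date_format: str = "%d.%m.%Y") -> str:
    """Build a folder name such as db_2.00_06.10.2014 (db prefix, version, creation date)."""
    # ASSUMPTION: day.month.year; the guide's example date could also be month.day.year.
    return f"db_{version}_{created.strftime(date_format)}"


def database_dir_path(parent: str, version: str, created: date) -> Path:
    """Return a full [dir] path: ~ is expanded and relative parents are made absolute."""
    full_parent = os.path.abspath(os.path.expanduser(parent))
    return Path(full_parent) / database_dir_name(version, created)
```

For example, `database_dir_path("/home/testuser/Documents", "2.00", date(2014, 10, 6))` yields the /home/testuser/Documents/db_2.00_06.10.2014 path used above.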
mkdir [dir]
cd [dir] # Recommended: the `dbcreator*` module should be executed within the [dir] folder
python -m grsnp.dbcreator_ucsc -g hg19 -d [dir]
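For scripted setups, the three shell steps can be mirrored from Python. This is a minimal sketch that assumes grsnp is installed for the interpreter running it. `prepare_database_dir` and `run_dbcreator_ucsc` are hypothetical helpers. Passing `cwd` to `subprocess.run` stands in for `cd [dir]`.

```python
import os
import subprocess
import sys


def prepare_database_dir(data_dir: str) -> str:
    """Equivalent of `mkdir [dir]`: create the folder if needed and return its full path."""
    full_path = os.path.abspath(os.path.expanduser(data_dir))  # no ~, full path only
    os.makedirs(full_path, exist_ok=True)
    return full_path


def run_dbcreator_ucsc(data_dir: str, organism: str = "hg19") -> None:
    """Equivalent of `cd [dir]` followed by `python -m grsnp.dbcreator_ucsc -g <organism> -d [dir]`.

    Assumes grsnp is importable by the current interpreter (sys.executable).
    """
    full_path = prepare_database_dir(data_dir)
    cmd = [sys.executable, "-m", "grsnp.dbcreator_ucsc", "-g", organism, "-d", full_path]
    # cwd=full_path runs the module from inside [dir]; check=True raises on a non-zero exit status
    subprocess.run(cmd, cwd=full_path, check=True)
```

For example, `run_dbcreator_ucsc("/home/testuser/Documents/db_2.00_06.10.2014")` would create the folder if needed and then run the same command as the shell example.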
Executing the dbcreator_ucsc module without any arguments shows a short help text:
usage: python -m grsnp.dbcreator_ucsc [-h] --data_dir [DATA_DIR]
                                      [--organism [ORGANISM]]
                                      [--featurenames [FEATURENAMES]] [--max [MAX]]
                                      [--galaxy] [--score [SCORE]] [--scoreonly]
Use the --help (or -h) argument to show detailed help.
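When command lines are built programmatically, the options from the usage text can be assembled into an argument list. This is a sketch under assumptions. `dbcreator_ucsc_argv` is a hypothetical helper that uses only the long option names shown in the usage text. Both --featurenames and --score are assumed to take a single comma-separated value, matching the evofold,gwasCatalog and 25,50,75 examples given for these options.

```python
import sys
from typing import List, Optional, Sequence


def dbcreator_ucsc_argv(
    data_dir: str,
    organism: Optional[str] = None,
    featurenames: Optional[Sequence[str]] = None,
    max_per_category: Optional[int] = None,
    score: Optional[Sequence[int]] = None,
) -> List[str]:
    """Assemble a `python -m grsnp.dbcreator_ucsc` command line (hypothetical helper)."""
    argv = [sys.executable, "-m", "grsnp.dbcreator_ucsc", "--data_dir", data_dir]
    if organism:
        argv += ["--organism", organism]
    if featurenames:
        # ASSUMPTION: several datasets are passed as one comma-separated value
        argv += ["--featurenames", ",".join(featurenames)]
    if max_per_category is not None:
        argv += ["--max", str(max_per_category)]
    if score:
        # ASSUMPTION: percentiles are passed comma-separated, e.g. 25,50,75
        argv += ["--score", ",".join(str(p) for p in score)]
    return argv
```

The resulting list can be passed directly to `subprocess.run(argv, cwd=data_dir, check=True)`. Keeping it as a list avoids shell quoting issues with paths that contain spaces.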
The --data_dir (-d) argument specifies the full path to [dir], where the database will be installed.
The --organism (-g) argument specifies the organism code, which also identifies the genome assembly version. Organism-specific genome annotation data are placed in the appropriate subfolders and processed automatically by the server module. Examples of organism codes: hg19, mm9, rn4, danRer7, dm3, ce6, sacCer3.
The --featurenames (-f) argument processes a single regulatory dataset or a comma-separated list of them. This is useful when you already know which datasets you need. Example: evofold,gwasCatalog.
The --max (-m) argument limits the number of regulatory datasets processed in each category. This is useful when testing the module's functionality.
The --score (-s) argument specifies the score percentiles used to filter the regulatory datasets. Default: the 25th, 50th, and 75th percentiles (25,50,75).
The --filteronly (-o) argument is used after the main, unfiltered database has been created. It triggers filtering of the regulatory datasets.
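The guide does not describe how the score percentiles are applied. As an illustration only, the sketch below shows one common form of percentile-based filtering. For each percentile, it computes a score threshold with the nearest-rank method and keeps the features whose score is at or above that threshold. The BED-like record layout, the nearest-rank definition, and the assumption that higher scores are better are all hypothetical. grsnp's actual filtering may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical BED-like record: (chrom, start, end, score)
Feature = Tuple[str, int, int, float]


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("no scores to compute a percentile from")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def filter_by_score_percentiles(
    features: Sequence[Feature], percentiles: Sequence[int] = (25, 50, 75)
) -> Dict[int, List[Feature]]:
    """Return, for each percentile, the features scoring at or above its threshold."""
    scores = [f[3] for f in features]
    filtered: Dict[int, List[Feature]] = {}
    for pct in percentiles:
        threshold = nearest_rank_percentile(scores, pct)
        filtered[pct] = [f for f in features if f[3] >= threshold]  # ASSUMPTION: higher is better
    return filtered
```

With four features scored 10, 20, 30, and 40, the default percentiles keep 4, 3, and 2 features, respectively.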