2. Tutorial (Basic Settings) - AmirUCR/allegro GitHub Wiki
This tutorial assumes that you have downloaded ALLEGRO and installed its dependencies, and that python src/main.py --soundcheck produces a success message and exits. To conduct an experiment using the default settings in your config.yaml file, simply execute the following:
python src/main.py
ALLEGRO will output the smallest gRNA library to target every record/gene in the 50 input files and place your library under data/output/ALLEGRO_EXAMPLE_RUN/ALLEGRO_EXAMPLE_RUN_library.txt.
Configuring ALLEGRO (Basic Settings)
There are two ways to configure ALLEGRO: via command line arguments, or by directly modifying config.yaml. Any argument you specify on the command line overrides the corresponding value in config.yaml, and any argument you leave unspecified defaults to the value in the config. If you simply run python src/main.py, ALLEGRO uses all arguments in the config.
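The precedence rule above can be pictured as a simple dictionary merge. The following is a minimal sketch of that idea, not ALLEGRO's actual implementation; the function name resolve_settings and the sample values are hypothetical and chosen only for illustration.

```python
def resolve_settings(config, cli_args):
    """Return the effective settings: command-line values win, config fills the rest.

    A command-line value of None stands for "not given on the command line".
    """
    settings = dict(config)  # start from everything in config.yaml
    for name, value in cli_args.items():
        if value is not None:
            settings[name] = value  # an explicit argument overrides the config
    return settings


# Hypothetical values, for illustration only.
config = {"experiment_name": "ALLEGRO_EXAMPLE_RUN", "track": "track_e", "multiplicity": 1}
cli_args = {"experiment_name": "my_run", "track": None, "multiplicity": None}

effective = resolve_settings(config, cli_args)
print(effective)
```

In this sketch only experiment_name was given on the command line, so it is the only value that differs from the config.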
We will continue exploring the basic capabilities of ALLEGRO using the provided example input. First we will open config.yaml and modify a few parameters. Go ahead and modify the value of experiment_name (-n) as you wish. We will leave path settings as they are for now, but here's a brief description:
input_directory or -id
- By default points to 'data/input/example_input'. This is where your input fasta files live. There must be at least one fasta file with at least one fasta record in this directory, which ALLEGRO will read as input.
input_species_path or -isp
- By default points to 'data/input/fifty_example_input_species.csv'. You can create your own input species CSV file and point ALLEGRO to it. Note that ALLEGRO requires this CSV file to have at least two columns, one of which must be named 'species_name'. You may name the second column whatever you wish; however, the values of this second column must correspond to file names existing under the directory specified by the input_directory parameter above. For example, if your input_species.csv looks like the following:

  | species_name | filename |
  | --- | --- |
  | test_fasta | my_test_fasta.fna |

  and you have specified input_directory: 'data/input/my_test_directory/', then your ALLEGRO file structure must look like:

  ├── data
  │   ├── input
  │   │   ├── my_test_directory
  │   │   │   ├── my_test_fasta.fna

  You may also refer to the provided data/input/fourdbs_input_species.csv to inspect the file we used for our experiments. Notice how you may have as many columns as you want, but ALLEGRO will only use 'species_name' and the other column(s) specified in the config file.
input_species_path_column or -ispc
- This tells ALLEGRO the name of the second required column in the input_species_path CSV file as described above. In the example above, we have two columns: 'species_name' and 'filename'. Therefore, the value for this parameter would be 'filename'.
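Before launching a run, it can help to check that every file named in your species CSV actually exists in the input directory. The sketch below is one way to do that with the Python standard library. It is not part of ALLEGRO: the helper name missing_input_files is hypothetical, and the sketch assumes a comma-delimited CSV with a header row. The demo builds a throwaway directory that mirrors the example above.

```python
import csv
import tempfile
from pathlib import Path


def missing_input_files(species_csv, input_directory, column="filename"):
    """Return the file names listed in the CSV that are absent from input_directory."""
    missing = []
    with open(species_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        # Both 'species_name' and the configured second column must be present.
        for required in ("species_name", column):
            if required not in fieldnames:
                raise ValueError(f"CSV is missing the required column {required!r}")
        for row in reader:
            if not (Path(input_directory) / row[column]).is_file():
                missing.append(row[column])
    return missing


# Demo in a temporary directory that mirrors the example above.
with tempfile.TemporaryDirectory() as tmp:
    input_directory = Path(tmp) / "my_test_directory"
    input_directory.mkdir()
    (input_directory / "my_test_fasta.fna").write_text(">gene1\nACGT\n")
    species_csv = Path(tmp) / "input_species.csv"
    species_csv.write_text("species_name,filename\ntest_fasta,my_test_fasta.fna\n")
    missing = missing_input_files(species_csv, input_directory, column="filename")

print(missing)
```

An empty list means every file named in the chosen column was found. The column argument plays the role of input_species_path_column.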
track or -t, and multiplicity or -m
- The value for track may be 'track_a' or 'track_e' (or simply 'a' or 'e'). By specifying track: 'track_a' (and multiplicity: 1), you require ALLEGRO to generate a guide RNA library that includes guides targeting anywhere in each input fasta file at least once. Increasing the multiplicity parameter increases the required number of guides per input fasta. By specifying track: 'track_e', you require ALLEGRO to target each gene/record in each input fasta file at least once, or more often as you increase the multiplicity.

  In the most trivial example, using track: 'track_a' and multiplicity: 1 on the following my_test_fasta.fna input

  >gene1
  AAAAAAAAAAAAAAAAAAAATGG|TTTTTTTTTTTTTTTTTTTTTGG
  >gene2
  ACACACACACACACACACACTGG|CCCCCCCCCCCCCCCCCCCCTGG

  yields a single guide as output, whereas using track: 'track_e' and multiplicity: 1 yields 2 guides, one to target gene1 and one to target gene2. Using track: 'track_e' and multiplicity: 2 yields 4 guides, with 2 guides per gene. Using a higher multiplicity in this example causes ALLEGRO to warn you that not enough guides exist and to exit gracefully. Note that you may mark the boundary of an intron and exon via the pipe | character. Guides that are split by (or span through) this delimiter are ignored by ALLEGRO.
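To see why the example tops out at two guides per gene, you can count the candidate guides in each record by hand or with a few lines of Python. The sketch below is a simplification and not ALLEGRO's guide finder: it scans only the strand as written, assumes 20-nt guides followed by an NGG PAM, and splits each record on the pipe so that no guide spans the delimiter. The function name candidate_guides is hypothetical. The two records are written with repeats for readability, assuming the 20-base runs shown in the example.

```python
import re

# Lookahead pattern so overlapping candidates are all found:
# 20 bases (the guide) followed by any base and then GG (the NGG PAM).
GUIDE_WITH_PAM = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def candidate_guides(record_sequence):
    """List 20-nt guides followed by an NGG PAM on the strand as written.

    The sequence is split on '|' first, so neither a guide nor its PAM
    can span the delimiter.
    """
    guides = []
    for segment in record_sequence.upper().split("|"):
        guides.extend(match.group(1) for match in GUIDE_WITH_PAM.finditer(segment))
    return guides


# gene1 and gene2 from the example, written with repeats for readability.
records = {
    "gene1": "A" * 20 + "TGG" + "|" + "T" * 20 + "TGG",
    "gene2": "AC" * 10 + "TGG" + "|" + "C" * 20 + "TGG",
}

guides_per_record = {name: candidate_guides(seq) for name, seq in records.items()}
for name, guides in guides_per_record.items():
    print(name, len(guides), guides)

# In this simplified view, the record with the fewest candidates limits
# how high multiplicity can go under track_e.
max_multiplicity_track_e = min(len(guides) for guides in guides_per_record.values())
print(max_multiplicity_track_e)
```

Each record contributes two candidates under these assumptions, which matches the text: track_e with multiplicity: 2 uses all four, and anything higher cannot be satisfied.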
filter_by_gc or -gc
- Dictates whether the guides output by ALLEGRO should be excluded if their GC content falls outside of the specified range. For example, if gc_max: 0.7, a guide with a GC content of 0.71 is excluded while a guide with a GC content of 0.7 is included. This is a boolean True/False value. The value for this filter is not a string ('False' with the quotation marks is not valid), and it must be capitalized (false is not valid). gc_max and gc_min are floating point values.
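The GC check described above amounts to a range comparison on the fraction of G and C bases in a guide. Here is a minimal sketch of that arithmetic, not ALLEGRO's own code. The function names are hypothetical, the gc_min value of 0.3 is made up for illustration, and treating the lower bound as inclusive is an assumption (the text only states this for gc_max).

```python
def gc_content(guide):
    """Fraction of bases in the guide that are G or C."""
    guide = guide.upper()
    return (guide.count("G") + guide.count("C")) / len(guide)


def within_gc_range(gc, gc_min, gc_max):
    """True when gc lies in [gc_min, gc_max]; both ends treated as inclusive here."""
    return gc_min <= gc <= gc_max


gc_max = 0.7  # value used in the text above
gc_min = 0.3  # hypothetical value, for illustration only

guide = "GC" * 7 + "AT" * 3  # 14 of 20 bases are G or C
print(gc_content(guide), within_gc_range(gc_content(guide), gc_min, gc_max))
print(within_gc_range(0.71, gc_min, gc_max))
```

A guide sitting exactly on gc_max is kept, while a GC content of 0.71 falls outside the range and is rejected.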
Manually Excluding Certain Guides
When you navigate to data/input, you will see an empty text file called _the_blocklist_.txt. Place any guides you want ALLEGRO to ignore in its calculation inside this file, one per line and without their NGG PAM.
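If your unwanted guides are stored together with their PAM, you need to trim the PAM before adding them to the blocklist. The sketch below shows one way to prepare the file contents. It is only an illustration: the helper name blocklist_lines is hypothetical, and it assumes each input sequence ends with a 3-nt PAM.

```python
def blocklist_lines(guides_with_pam, pam_length=3):
    """Drop the trailing PAM from each sequence and return one guide per line."""
    trimmed = [sequence.strip()[:-pam_length] for sequence in guides_with_pam]
    return "\n".join(trimmed) + "\n"


# Two made-up 23-nt sequences (20-nt guide + NGG PAM), for illustration only.
unwanted = ["A" * 20 + "TGG", "AC" * 10 + "AGG"]

text = blocklist_lines(unwanted)
print(text, end="")
```

The resulting text can then be pasted or written into data/input/_the_blocklist_.txt.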