Parsing - sungsik-kong/PhyNEST.jl GitHub Wiki
The function readPhylip(args) (where args stands for arguments) parses the input alignment and stores the observed site pattern frequencies for every quartet (i.e., every combination of four taxa or sequences) in the data, along with other relevant information, as a Julia object. readPhylip(args) accepts several arguments. What are they?
- The file name of the sequence alignment. For example,
julia> phylip_data = readPhylip("filename.phy")
- showProgress=true/false
  - The boolean argument showProgress displays the progress of data parsing and the estimated remaining time. showProgress=true by default.
- checkpoint=true/false
  - The boolean argument checkpoint creates a .ckp file in the working directory upon completion of the data parsing. This file will have the same name as the input alignment file with the extension .ckp. By default, checkpoint=false. See here for more information about checkpointing.
- writecsv=true/false
  - The boolean argument writecsv creates a .csv file in the working directory upon completion of the data parsing. This .csv file contains the observed site pattern frequencies extracted from the data for every quartet. writecsv=false by default.
- csvname=""
  - csvname allows users to change the name of the .csv file created when writecsv=true. The desired name is given inside the quotation marks. If csvname is not specified, the output .csv file will have the same name as the input alignment file with the prefix sitePatternCounts_.
Let's say we want to use the function readPhylip(args) to parse the input alignment sample_n5h1.phy, located in PhyNEST.jl.wiki/example-data, and name the data object phylip_data. Using the optional arguments, we turn off the progress bar and create .ckp and .csv files upon completion, where the .csv file will have the name sample_n5h1.csv. Can you guess the command?
julia> phylip_data = readPhylip("sample_n5h1.phy", showProgress=false, checkpoint=true, writecsv=true, csvname="sample_n5h1")
When you are ready, let's execute the command. This should take less than a minute.
julia> phylip_data = readPhylip("sample_n5h1.phy", showProgress=false, checkpoint=true, writecsv=true, csvname="sample_n5h1")
A [.csv] file is saved as sample_n5h1.csv.csv in the current working directory.
Summary of Phylip File
Parsing the file [sample_n5h1.phy] took 26.123 seconds.
Number of taxa: 5
Species names: ["outgroup", "species_4", "species_3", "species_1", "species_2"]
Alignment length (b.p): 1000000
Site patterns frequencies for 120 quartets computed and stored.
Try `show_sp()` function to see all quartet site patterns.
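The summary reports 120 quartets for 5 taxa. That number matches the count of ordered selections of four taxa (5 × 4 × 3 × 2 = 120) rather than the number of unordered combinations (5). Reading the output this way is an assumption based on the arithmetic alone, not a statement about how PhyNEST enumerates quartets internally. A quick check in plain Julia:

```julia
# Number of taxa reported in the summary above.
n_taxa = 5

# Unordered quartets: choose 4 taxa out of 5.
unordered_quartets = binomial(n_taxa, 4)

# Ordered quartets: each choice of 4 taxa can be arranged in 4! ways.
# Assumption: the "120 quartets" in the output counts ordered quartets.
ordered_quartets = unordered_quartets * factorial(4)

println("unordered: ", unordered_quartets, ", ordered: ", ordered_quartets)
```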
Now, we have parsed the input sequence alignment and computed observed quartet site pattern frequencies. But what does this mean?
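As a rough intuition, a site pattern for a quartet is the column of four bases, one per taxon, at a given alignment position, and its observed frequency is how often that column occurs. The sketch below counts such columns for a single quartet on a made-up toy alignment. The taxon names, sequences, and the helper count_site_patterns are all hypothetical, and this is not PhyNEST's implementation; how PhyNEST groups and stores the patterns is not reproduced here.

```julia
# Toy alignment: four hypothetical taxa, eight sites (made-up data).
seqs = Dict(
    "taxon_a" => "ACGTACGA",
    "taxon_b" => "ACGTACGC",
    "taxon_c" => "ACTTACGA",
    "taxon_d" => "ACTTACGG",
)

# One quartet, with a fixed order of taxa that determines how a pattern is written.
quartet = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]

# Hypothetical helper: count how often each column pattern occurs for the quartet.
function count_site_patterns(seqs, quartet)
    counts = Dict{String,Int}()
    nsites = length(seqs[quartet[1]])
    for site in 1:nsites
        # Read one base per taxon at this site, in quartet order.
        pattern = join(seqs[t][site] for t in quartet)
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    return counts
end

counts = count_site_patterns(seqs, quartet)

# Print the patterns in alphabetical order with their counts.
for pattern in sort(collect(keys(counts)))
    println(pattern, " => ", counts[pattern])
end
```

Repeating this for every quartet of taxa gives one table of counts per quartet, which is the kind of information the .csv file described above is said to contain.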
Next: Observed site patterns and Checkpointing