Parsing - sungsik-kong/PhyNEST.jl GitHub Wiki
The function readPhylip(args) (where args stands for arguments) parses the input alignment and stores the observed site pattern frequencies for every quartet (i.e., every combination of four taxa, or sequences) in the data, along with other relevant information, as a Julia object. readPhylip(args) can take multiple arguments. What are they?
- The file name of the sequence alignment. For example,
julia> phylip_data = readPhylip("filename.phy")
- showProgress=true/false: The boolean argument showProgress visualizes the process of data parsing and the estimated remaining time. showProgress=true by default.
- checkpoint=true/false: The boolean argument checkpoint creates a .ckp file in the working directory upon completion of the data parsing. This file will have the same name as the input alignment file with the extension .ckp. By default, checkpoint=false. See here for more information about checkpointing.
- writecsv=true/false: The boolean argument writecsv creates a .csv file in the working directory upon completion of the data parsing. This .csv file contains the observed site pattern frequencies extracted from the data for every quartet. writecsv=false by default.
- csvname="": The argument csvname allows users to change the name of the .csv file created when writecsv=true. The desired name is given inside the quotes. If csvname is not specified, the output .csv file will have the same name as the input alignment file with the prefix sitePatternCounts_.
Let's say we want to use the function readPhylip(args) to parse the input alignment sample_n5h1.phy located in PhyNEST.jl.wiki/example-data and name the resulting data object phylip_data. Using the optional arguments, we turn off the progress bar and create .ckp and .csv files upon completion, where the .csv file will have the name sample_n5h1.csv. Can you guess the command?
The command:
julia> phylip_data = readPhylip("sample_n5h1.phy", showProgress=false, checkpoint=true, writecsv=true, csvname="sample_n5h1")
When you are ready, let's execute the command. This should take less than a minute.
The output:
julia> phylip_data = readPhylip("sample_n5h1.phy", showProgress=false, checkpoint=true, writecsv=true, csvname="sample_n5h1")
A [.csv] file is saved as sample_n5h1.csv.csv in the current working directory.
Summary of Phylip File
Parsing the file [sample_n5h1.phy] took 26.123 seconds.
Number of taxa: 5
Species names: ["outgroup", "species_4", "species_3", "species_1", "species_2"]
Alignment length (b.p): 1000000
Site patterns frequencies for 120 quartets computed and stored.
Try `show_sp()` function to see all quartet site patterns.
Now, we have parsed the input sequence alignment and computed observed quartet site pattern frequencies. But what does this mean?
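As a rough illustration of the idea, the sketch below counts site patterns for a single quartet in a tiny, made-up alignment. It is not PhyNEST's implementation: the names `alignment`, `site_pattern_counts`, and `ordered_quartets` are hypothetical, and only base Julia is used. It also shows that the 120 quartets reported above for 5 taxa is consistent with counting ordered selections of four taxa (5 × 4 × 3 × 2 = 120).

```julia
# Toy alignment (made up for illustration): taxon name => sequence.
# All sequences must have the same length.
alignment = Dict(
    "A" => "ACGTAC",
    "B" => "ACGTTC",
    "C" => "ACCTTC",
    "D" => "AGCTTG",
)

# Hypothetical helper: for one quartet (four taxa in a fixed order), count how
# often each site pattern occurs. A site pattern is the string of four bases
# read down one alignment column, one base per taxon.
function site_pattern_counts(aln::Dict{String,String}, quartet::NTuple{4,String})
    seqs = [aln[t] for t in quartet]
    n = length(first(seqs))
    counts = Dict{String,Int}()
    for i in 1:n
        pattern = join(s[i] for s in seqs)
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    return counts
end

counts = site_pattern_counts(alignment, ("A", "B", "C", "D"))

# Number of ordered selections of four taxa out of n: n(n-1)(n-2)(n-3).
ordered_quartets(n) = n * (n - 1) * (n - 2) * (n - 3)

println(counts)
println(ordered_quartets(5))
```

In this toy example the pattern "CCCG" appears in two columns and every other pattern once. With a real alignment, the same kind of tally is made for every quartet, which is what the parsed data object stores.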
Next: Observed site patterns and Checkpointing