API Reference - xflouris/libpll GitHub Wiki

Partition management

Partition parameters setup

Transition probability matrices

Invariant sites

Conditional probability vectors

Likelihood computation

Likelihood derivatives

Ascertainment bias correction

Alignment parsing

Unrooted tree parsing and manipulation

Rooted tree parsing and manipulation

Auxiliary functions

Data structures


List of functions

pll_partition_create()
pll_partition_t * pll_partition_create(unsigned int tips, 
                                       unsigned int clv_buffers, 
                                       unsigned int states, 
                                       unsigned int sites,
                                       unsigned int rate_matrices, 
                                       unsigned int prob_matrices, 
                                       unsigned int rate_cats, 
                                       unsigned int scale_buffers, 
                                       unsigned int attributes);

Creates a partition with either tips character arrays or tips CLV arrays (depending on attributes, see Partition Attributes), and, additionally, clv_buffers CLV vectors, for storing conditional probabilities at inner nodes. The partition structure is constructed for states number of states (e.g. 4 for nucleotide and 20 for amino-acid data) and sufficient space is allocated to host an alignment of size sites*tips. The number of rate matrices that can be used is given by rate_matrices. Additionally, the function allocates space for hosting rate_matrices arrays of substitution parameters, frequencies, and auxiliary eigen-decomposition arrays (transparent to the user). The parameter prob_matrices dictates the number of probability matrices for which space will be allocated. This parameter is typically set to the number of branches the tree has (e.g., 2n-3 for unrooted and 2n-2 for rooted, where n is the number of tips/leaves). libpll will automatically create space for prob_matrices*rate_cats, where rate_cats is the number of different rate categories. The array of probability matrices is indexed from 0 to prob_matrices-1. Each matrix entry consists of sufficient space to accommodate rate_cats matrices, which are stored consecutively in memory. Note that libpll will not allocate space for the different substitution matrices specified by rate_matrices. The user must indicate that to libpll by multiplying prob_matrices with the corresponding factor. Finally, scale_buffers sets the number of scaling buffers to be allocated, and attributes states the hardware acceleration options to be used (see Partition Attributes). The function returns a pointer to the allocated pll_partition_t structure. Note that, rate_matrices are used to address heterotachy, i.e. transition probability matrices computed from different rate matrices. For more information, see [Updating transition probability matrices](see https://github.com/xflouris/libpll/wiki/Updating-transition-probability-matrices).

pll_partition_destroy()
void pll_partition_destroy(pll_partition_t * partition);

Deallocates all data associated with the partition pointed by partition.

pll_set_tip_states()
int pll_set_tip_states(pll_partition_t * partition,
                       unsigned int tip_index,
                       const unsigned int * map,
                       const char * sequence);

Set the tip CLV (or tip character array) with index tip_index of instance partition, according to the character sequence sequence and the conversion table map, which translates (maps) characters to states. For an example see Setting CLV vectors at tips from sequences and maps.

pll_set_tip_clv()
void pll_set_tip_clv(pll_partition_t * partition,
                     unsigned int tip_index,
                     const double * clv);

Set the tip CLV with index tip_index of instance partition, to the contents of the array clv. For an example see Setting CLV vectors manually. Note, this function cannot be used in conjuction with the PLL_ATTRIB_PATTERN_TIP (see Partition Attributes).

pll_set_subst_params()
void pll_set_subst_params(pll_partition_t * partition,
                          unsigned int params_index,
                          const double * params);

Set the parameters for substitution model with index params_index, where params_index ranges from 0 to rate_matrices-1, as specified in the pll_partition_create() call. Array params should contain exactly (states x states - states) / 2 parameters of type double. These values correspond to the upper triangle elements (above the main diagonal) of the rate matrix.

pll_set_frequencies()
void pll_set_frequencies(pll_partition_t * partition,
                         unsigned int params_index,
                         const double * frequencies);

Sets the base frequencies for the substitution model with index params_index, where params_index ranges from 0 to rate_matrices-1, as specified in the pll_partition_create() call. The array of base frequencies (frequencies) is copied into the instance. The order of bases in the array depends on the encoding used when converting tip sequences to CLV. For example, if the pll_map_nt map was used with the pll_set_tip_states() function to describe nucleotide data, then the order is A, C, G, T. However, this can be arbitrarily set by adjusting the provided map.

pll_set_pattern_weights()
void pll_set_pattern_weights(pll_partition_t * partition,
                             const unsigned int * pattern_weights);

Sets the vector of pattern weights (pattern_weights) for partition. The function reads and copies the first partition->sites elements of pattern_weights into partition->pattern_weights.

pll_set_category_rates()
void pll_set_category_rates(pll_partition_t * partition,
                            const double * rates);

Sets the rate categories for partition. The function reads and copies the first partition->rate_cats elements of array rates into partition->rates.

pll_update_invariant_sites()
int pll_update_invariant_sites(pll_partition_t * partition);

Updates the invariant sites array partition->invariant, according to the sequences in the partition. This function is implicitly called by pll_update_invariant_sites_proportion when the specified proportion of invariant sites is greater than zero, but it must be explicitly called by the client code if the sequences change.

pll_update_invariant_sites_proportion()
int pll_update_invariant_sites_proportion(pll_partition_t * partition,
                                          unsigned int params_index,
                                          double prop_invar);

Updates the proportion of invariant sites for the rate matrix of partition with index params_index. Note that, this call will not implicitly update the transition probability matrices computed from the particular rate matrix, but must be done explicitly for example with a call to pll_update_prob_matrices().

pll_update_prob_matrices()
void pll_update_prob_matrices(pll_partition_t * partition,
                              const unsigned int * params_indices,
                              const unsigned int * matrix_indices,
                              const double * branch_lengths,
                              unsigned int count);

Computes the transition probability matrices specified by the count indices in matrix_indices, for all rate categories. A matrix with index matrix_indices[i] will be computed using the branch length branch_lengths[i]. To compute the matrix for rate category j, the function uses the rate matrix with index params_indices[j]. Matrices are stored in partition->pmatrix[matrix_indices[i]]. Note that, each such entry holds the matrices for all rate categories, stored consecutively in memory.

pll_show_pmatrix()
void pll_show_pmatrix(pll_partition_t * partition,
                      unsigned int index,
                      unsigned int float_precision);

Prints the transition probability matrices for each rate category of partition associated with index to standard output. The floating point precision is dictated by float_precision.

pll_update_eigen()
void pll_update_eigen(pll_partition_t * partition,
                      unsigned int params_index);

Updates the eigenvectors (partition->eigenvecs[params_index]), inverse eigenvectors (partition->eigenvecs[params_index]), and eigenvalues (partition->eigenvals[params_index]) using the substitution parameters (partition->subst_params[params_index]) and base frequencies (partition->frequencies[params_index]) specified by params_index.

pll_count_invariant_sites()
void pll_count_invariant_sites(pll_partition_t * partition,
                               unsigned int * state_inv_count);

Returns the number of invariant sites in the sequence alignment from partition. The array state_inv_count must be of size partition->states and is filled such that entry i contains the count of invariant sites for state i.

pll_update_partials()
void pll_update_partials(pll_partition_t * partition,
                         const pll_operation_t * operations,
                         unsigned int count);

Updates the count conditional probability vectors (CPV) defined by the entries of operations, in the order they appear in the array. Each operations entry describes one CPV from partition. See also pll_operation_t.

pll_compute_root_loglikelihood()
double pll_compute_root_loglikelihood(pll_partition_t * partition,
                                      unsigned int clv_index,
                                      int scaler_index,
                                      const unsigned int * freqs_index,
                                      double * persite_lnl);

Evaluates the log-likelihood of a rooted tree, for the vector of conditional probabilities (partials) with index clv_index, scale buffer with index scaler_index (or PLL_SCALE_BUFFER_NONE), and base frequencies arrays with indices freqs_index (one per rate category). If persite_lnl is not NULL, then it must be large enough to hold partition->sites double-precision values, and will be filled with the per-site log-likelihoods.

pll_compute_edge_loglikelihood()
double pll_compute_edge_loglikelihood(pll_partition_t * partition,
                                      unsigned int parent_clv_index,
                                      int parent_scaler_index,
                                      unsigned int child_clv_index,
                                      int child_scaler_index,
                                      unsigned int matrix_index,
                                      const unsigned int * freqs_index,
                                      double * persite_lnl);

Evaluates the log-likelihood of an unrooted tree, by providing the conditional probability vectors (partials) for two nodes that share an edge with indices parent_clv_index resp. child_clv_index, scale buffers with indices parent_scaler_index resp. child_clv_index (or PLL_SCALE_BUFFER_NONE), the transition probability matrix with index matrix_index and base frequencies arrays with indices freqs_index (one per rate category). If persite_lnl is not NULL, then it must be large enough to hold partition->sites double-precision values, and will be filled with the per-site log-likelihoods.

pll_update_sumtable()
int pll_update_sumtable(pll_partition_t * partition,
                        unsigned int parent_clv_index,
                        unsigned int child_clv_index,
                        const unsigned int * params_indices,
                        double *sumtable);

Updates the sumtable (required for computing the likelihood derivatives) at the specified branch.

  • sumtable must be allocated beforehand for partition->sites x partition->rates x partition->states_padded x sizeof(double).
  • Partial likelihoods at parent_clv_index and child_clv_index must be updated.
pll_compute_likelihood_derivatives()
int pll_compute_likelihood_derivatives(pll_partition_t * partition,
                                       int parent_scaler_index,
                                       int child_scaler_index,
                                       double branch_length,
                                       const unsigned int * params_indices,
                                       const double * sumtable,
                                       double * d_f,
                                       double * dd_f);

Computes the first and second likelihood derivatives.

  • sumtable must be updated (pll_update_sumtable) and consistent with parent_scaler_index and child_scaler_index.
  • Derivatives are returned in d_f (first) and dd_f (second).
pll_set_asc_bias_type()
int pll_set_asc_bias_type(pll_partition_t * partition,
                          int asc_bias_type);

Set the ascertainment bias correction type to one of the following:

PLL_ATTRIB_ASC_BIAS_LEWIS
PLL_ATTRIB_ASC_BIAS_FELSENSTEIN
PLL_ATTRIB_ASC_BIAS_STAMATAKIS

Partition must have been created with PLL_ATTRIB_ASC_BIAS_MASK flag active. Otherwise, this function returns an error.

pll_set_asc_state_weights()

Set the number of invariant sites for each of the states when ascertainment bias correction is active.

void pll_set_asc_state_weights(pll_partition_t * partition,
                               const unsigned int * state_weights);
pll_fasta_open()
pll_fasta_t * pll_fasta_open(const char * filename,
                             const unsigned int * map);
pll_fasta_getnext()
int pll_fasta_getnext(pll_fasta_t * fd, char ** head,
                      long * head_len,  char ** seq,
                      long * seq_len, long * seqno);
pll_fasta_close()
void pll_fasta_close(pll_fasta_t * fd);
pll_fasta_rewind()
int pll_fasta_rewind(pll_fasta_t * fd);
pll_fasta_getfilesize()
long pll_fasta_getfilesize(pll_fasta_t * fd);
pll_fasta_getfilepos()
long pll_fasta_getfilepos(pll_fasta_t * fd);
pll_utree_parse_newick()
pll_utree_t * pll_utree_parse_newick(const char * filename,
                                     unsigned int * tip_count);
pll_utree_destroy()
void pll_utree_destroy(pll_utree_t * root);
pll_utree_show_ascii()
void pll_utree_show_ascii(pll_utree_t * tree, int options);
pll_utree_export_newick()
char * pll_utree_export_newick(pll_utree_t * root);
pll_utree_traverse()
int pll_utree_traverse(pll_utree_t * root,
                       int (*cbtrav)(pll_utree_t *),
                       pll_utree_t ** outbuffer,
                       unsigned int * trav_size);
pll_utree_query_tipnodes()
unsigned int pll_utree_query_tipnodes(pll_utree_t * root,
                                      pll_utree_t ** node_list);
pll_utree_query_innernodes()
unsigned int pll_utree_query_innernodes(pll_utree_t * root,
                                        pll_utree_t ** node_list);
pll_utree_create_operations()
void pll_utree_create_operations(pll_utree_t ** trav_buffer,
                                 unsigned int trav_buffer_size,
                                 double * branches,
                                 unsigned int * pmatrix_indices,
                                 pll_operation_t * ops,
                                 unsigned int * matrix_count,
                                 unsigned int * ops_count);
pll_utree_clone()
pll_utree_t * pll_utree_clone(pll_utree_t * root);
pll_utree_reset_template_indices()
void pll_utree_reset_template_indices(pll_utree_t * node,
                                      unsigned int tip_count);
pll_utree_every()
int pll_utree_every(pll_utree_t * node,
                    int (*cb)(pll_utree_t *));
pll_utree_check_integrity()
int pll_utree_check_integrity(pll_utree_t * root);

PLL_EXPORT int pll_utree_spr(pll_utree_t * p, pll_utree_t * r, pll_utree_rb_t * rb, double * branch_lengths, unsigned int * matrix_indices);

pll_utree_spr()
int pll_utree_spr_safe(pll_utree_t * p,
                       pll_utree_t * r,
                       pll_utree_rb_t * rb,
                       double * branch_lengths,
                       unsigned int * matrix_indices);
pll_utree_nni()
int pll_utree_nni(pll_utree_t * p,
                  int type,
                  pll_utree_rb_t * rb);
pll_utree_rollback()
int pll_utree_rollback(pll_utree_rb_t * rollback,
                       double * branch_lengths,
                       unsigned int * matrix_indices);
pll_rtree_parse_newick()
pll_rtree_t * pll_rtree_parse_newick(const char * filename,
                                     unsigned int * tip_count);
pll_rtree_destroy()
void pll_rtree_destroy(pll_rtree_t * root);
pll_rtree_show_ascii()
void pll_rtree_show_ascii(pll_rtree_t * tree, int options);
pll_rtree_export_newick()
char * pll_rtree_export_newick(pll_rtree_t * root);
pll_rtree_traverse()
int pll_rtree_traverse(pll_rtree_t * root,
                       int (*cbtrav)(pll_rtree_t *),
                       pll_rtree_t ** outbuffer,
                       unsigned int * trav_size);
pll_rtree_query_tipnodes()
unsigned int pll_rtree_query_tipnodes(pll_rtree_t * root,
                                      pll_rtree_t ** node_list);
pll_rtree_query_innernodes()
unsigned int pll_rtree_query_innernodes(pll_rtree_t * root,
                                        pll_rtree_t ** node_list);
pll_rtree_create_operations()
void pll_rtree_create_operations(pll_rtree_t ** trav_buffer,
                                 unsigned int trav_buffer_size,
                                 double * branches,
                                 unsigned int * pmatrix_indices,
                                 pll_operation_t * ops,
                                 unsigned int * matrix_count,
                                 unsigned int * ops_count);
pll_utree_t * pll_rtree_unroot()
pll_utree_t * pll_rtree_unroot(pll_rtree_t * root);
pll_compute_gamma_cats()
int pll_compute_gamma_cats(double alpha,
                           unsigned int categories,
                           double * output_rates);
pll_show_clv()
void pll_show_clv(pll_partition_t * partition,
                  unsigned int clv_index,
                  int scaler_index,
                  unsigned int float_precision);

Prints to standard output the conditional probability vector for index clv_index from partition, using the scale buffer with index scaler_index. If no scale buffer was used, then scaler_index must be passed the value PLL_SCALE_BUFFER_NONE. The floating precision (number of digits after decimal point) is dictated by float_precision. The output contains brackets, curly braces and round brackets to separate the values as sites, rate categories and states related, respectively.

pll_dlist_append()
int pll_dlist_append(pll_dlist_t ** dlist, void * data);
pll_dlist_prepend()
int pll_dlist_prepend(pll_dlist_t ** dlist, void * data);
pll_dlist_remove()
int pll_dlist_remove(pll_dlist_t ** dlist, void * data);
pll_utree_query_tipnodes()
int pll_query_utree_tipnodes(pll_utree_t * root,
                             pll_utree_t ** node_list);

Allocates and returns an array tips elements long of aliases to the tip labels in the pll_utree_t structs of the tree. The caller must free this array. The order of labels comes from a traversal of the tree

Data structures

pll_partition_t
typedef struct
{  
  unsigned int tips;
  unsigned int clv_buffers;
  unsigned int states;
  unsigned int sites;
  unsigned int pattern_weight_sum;
  unsigned int rate_matrices;
  unsigned int prob_matrices;
  unsigned int rate_cats;
  unsigned int scale_buffers;
  unsigned int attributes;

  const unsigned int * map;

  /* vectorization options */
  size_t alignment;
  unsigned int states_padded;

  double ** clv;
  double ** pmatrix;
  double * rates;
  double * rate_weights;
  double ** subst_params;
  unsigned int ** scale_buffer;
  double ** frequencies;
  double * prop_invar;
  int * invariant;
  unsigned int * pattern_weight;

  int * eigen_decomp_valid;
  double ** eigenvecs;
  double ** inv_eigenvecs;
  double ** eigenvals;

  /* tip-tip precomputation data */
  unsigned int maxstates;
  unsigned char ** tipchar;
  unsigned char * charmap;
  double * ttlookup;
  unsigned int * tipmap;

  /* ascertainment bias correction */
  int asc_bias_alloc;
} pll_partition_t;
Argument Description
tips Number of tip sequences (one CLV per tip will be allocated)
clv_buffers Number of extra CLV arrays to allocate (typically for inner nodes)
states Number of states for the data in this partition will have (typically 4 for nucleotide, 20 for aminoacid)
sites Number of sites that this partition will hold (length of alignment)
`pattern_weight_sum Sum of pattern weights (pattern_weights vector)
rate_matrices Number of different substitution matrices that can be used simultaneously
prob_matrices Number of probability matrices to create. Each entry will hold rate_cats matrices. Each matrix has size states x states
rate_cats Number of rate categories
scale_buffers Number of scale buffers to allocate (not yet implemented)
attributes Hardware acceleration options (not yet implemented)
map TODO
alignment The address of allocated memory for CLVs/matrices will always be a multiple of alignment (libpll inner use)
states_padded Number of states padded according to the vectorization
clv Array of conditional likelihood vectors for tips and inner nodes. The array has exactly tips + clv_buffers CLVs. Each CLV has sites x states x rate_cats elements of size double
pmatrix Array of exactly prob_matrices probability matrix arrays. Each probability matrix array is of size states x states x rate_cats elements of size double
rates Array of rate_matrices rate arrays. Each rate array is of size rate_cats elements of size double
rate_weights Weight of each rate category, summing to 1. For regular discrete gamma rates heterogeneity, the weights are equal
subst_params Array of rate_matrices substitution rate arrays. Each substitution rate array holds exactly (states x states - states) / 2 substitution rates
scale_buffer Array of scale buffers (not yet implemented)
frequencies Array of rate_matrices arrays of base frequencies. Each array contains exactly states elements of size double
prop_invar Array of rate_matrices proportions of invariant/invariable sites of size double
invariant Array of size sites where i-th element indicates whether i-th site is invariant. If it is invariant it is set to the index of frequency array that the character of the site is represented by. Otherewise, it is set to -1
pattern_weight Weight of each pattern
eigen_decomp_valid indicates whether the eigen decomposition is cached and valid. If it is, then eigen_decomp_valid is set to 1, 0 otherwise
eigenvecs Array of rate_matrices linearized arrays of eigen vectors. Entry i contains the eigenvectors (in linearized form) of the i-th instantaneous substitution matrix and is of size states x states
inv_eigenvecs Array of rate_matrices linearized arrays of inverse eigen vectors. Entry i contains the inverse eigenvectors (in linearized form) of the i-th instantaneous substitution matrix and is of size states x states
eigenvals Array of rate_matrices arrays of eigen values. Entry i contains the eigenvalues of the i-th instantaneous substitution matrix and is of size states
maxstates TODO
tipchars TODO
charmap TODO
ttlookup TODO
tipmap TODO
asc_bias_alloc Flag to determine whether the partition has been allocated with support for ascertainment bias correction
pll_operation_t
typedef struct
{  
  unsigned int parent_clv_index;
  int parent_scaler_index;
  unsigned int child1_clv_index;
  unsigned int child1_matrix_index;
  int child1_scaler_index;
  unsigned int child2_clv_index;
  unsigned int child2_matrix_index;
  int child2_scaler_index;
} pll_operation_t;
pll_dlist_t
typedef struct pll_dlist
{  
  struct pll_dlist * next;
  struct pll_dlist * prev;
  void * data;
} pll_dlist_t;
pll_msa_t
typedef struct pll_msa_s
{
  int count;
  int length;

  char ** sequence;
  char ** label;
} pll_msa_t;
pll_fasta_t
typedef struct
{  
  FILE * fp;
  char line[PLL_LINEALLOC];
  unsigned int * chrstatus;
  long no;
  long filesize;
  long lineno;
  long stripped_count;
  long stripped[256];
} pll_fasta_t;
pll_utree_t
typedef struct tree_noderec
{  
  char * label;
  double length;
  unsigned int node_index;
  unsigned int clv_index;
  int scaler_index;
  unsigned int pmatrix_index;
  struct tree_noderec * next;
  struct tree_noderec * back;
   
  void * data;
} pll_utree_t;
`pll_rtree_t
typedef struct pll_rtree
{
  char * label;
  double length;
  unsigned int node_index;
  unsigned int clv_index;
  int scaler_index;
  unsigned int pmatrix_index;
  struct pll_rtree * left;
  struct pll_rtree * right;
  struct pll_rtree * parent;

  void * data;
} pll_rtree_t;
pll_utree_rb_t
typedef struct pll_utree_rb_s
{
  int move_type;
  union
  {
    struct
    {
      pll_utree_t * p;
      pll_utree_t * r;
      pll_utree_t * rb;
      pll_utree_t * pnb;
      pll_utree_t * pnnb;
      double r_len;
      double pnb_len;
      double pnnb_len;
    } spr;
    struct
    {
      pll_utree_t * p;
      int nni_type;
    } nni;
  };
} pll_utree_rb_t;