Two Way ANOVA & Tukey's HSD - UMEcolGenetics/PawPawPulation-Genetics Wiki

Introduction to ANOVA

ANOVA (ANalysis Of VAriance) is a method used to ... This tutorial will focus on microsatellite data; however, a number of different marker types can be used.

Tukey's HSD is ...


This tutorial will show you how to run a two-way ANOVA and Tukey's HSD analysis on the combined pawpaw data using Excel/GenAlEx and JMP Pro 15. GenAlEx is a free add-in for Excel that is available to download (link). JMP Pro 15 is not free, though many universities and institutions have purchased access to it. If you are faculty, staff, or a student at a university, ask someone in your department how to gain access.

Transforming dataset

The input data for JMP contains columns for locus number, population, and any genetic diversity statistics that are of interest to you. Generating this data table will require some modifications from the output of GenAlEx to create a data table similar to this:

input 16.png

To get this data, follow these steps: First, your data should be properly formatted in Excel.

input 17.png

Then activate GenAlEx (if you have not done so already). From here, click on the "Frequency-based" option, then "Frequency".

input 1.png

Enter the correct parameters for your data, then click "OK".

input 2.png

This next window determines which genetic diversity statistics will be calculated. For the purpose of this tutorial, we will only look into the "Allele Frequency & Heterozygosity" section by clicking on "Frequency by pop" and "Het, Fstat, & Poly by Pop". If interested, you can also look into other statistics such as genetic distances, FST, private alleles, etc. by clicking the other options.

input 3.png

Once GenAlEx finishes calculating, go to the tab labeled "HFP", which stands for Heterozygosity, F-statistics, and Polymorphism by Population. This tab shows the genetic diversity measures for each locus within each population.

input 4.png

This is the data that will be used as input for JMP to calculate a two-way ANOVA and Tukey's HSD. Before using this data as input for JMP, we should edit it slightly.

First, change all "#N/A" and empty cells to "NA". This will make sure that JMP knows there is no data for that locus at that population.

Next, replace the locus IDs with numbers (1 to n) so that the locus names do not interfere with the analysis. Keeping the original locus names will probably work, but it may cause issues later.

After these steps, you are ready to transfer the data to JMP!
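If you would rather script the two clean-up steps above than edit cells by hand, they can be sketched in Python roughly as follows. This is only an illustration, not part of the GenAlEx or JMP workflow: it assumes the HFP tab has been exported as CSV text with the column headers Pop, Locus, Na, Ho, He, and F, and the population names, locus names, and values shown are made up.

```python
import csv
import io


def clean_hfp_rows(rows, locus_col="Locus"):
    """Replace '#N/A' and empty cells with 'NA', and renumber loci 1..n.

    Loci are numbered in order of first appearance. Returns the cleaned
    rows plus the mapping from original locus name to its new number.
    """
    locus_ids = {}
    cleaned = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            value = (value or "").strip()
            new_row[key] = "NA" if value in ("", "#N/A") else value
        name = new_row[locus_col]
        # Assign the next free number the first time a locus is seen.
        locus_ids.setdefault(name, len(locus_ids) + 1)
        new_row[locus_col] = str(locus_ids[name])
        cleaned.append(new_row)
    return cleaned, locus_ids


# Hypothetical export of the HFP tab (names and values are invented).
example_csv = """Pop,Locus,Na,Ho,He,F
PopA,LocusX,4,0.500,0.620,0.194
PopA,LocusY,1,0.000,0.000,#N/A
PopB,LocusX,3,0.450,,0.100
PopB,LocusY,2,0.300,0.320,0.063
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
cleaned, locus_ids = clean_hfp_rows(rows)

for row in cleaned:
    print(row)
print("Locus key:", locus_ids)
```

Keeping the returned locus key is worthwhile, since it lets you translate the numbered loci in the JMP report back to their original names.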

ANOVA using JMP Pro 15

Once JMP is opened, you should open a new file/table. From there, you can copy and paste the data from Excel/GenAlEx to this new table. The table should resemble this:

input 5.png

Next, you need to change the datatype of each genetic diversity measure (Na, Ho, He, F) to numeric. To do that, right click on the columns and select "Column info...".

input 7.png

Then change the datatype to Numeric and the modeling type to Continuous. The pop-up window should resemble this:

input 8.png

You'll know you did it right if, in the column list on the left of the screen, Locus and Pop are marked with red bar-graph icons and the genetic diversity measures with blue triangles.

input 6.png


Once your data is ready and resembles the previous image, you are ready to run a two-way ANOVA. To do this, click on "Analyze" in the menu bar, then click on "Fit Model".

input 9.png

A pop-up window should appear. Click on all genetic diversity measures (Na, Ho, He, F) and transfer them to Y variables, with Pop and Locus as Model Effects. You can also change the Personality and Emphasis of the analysis, though that is dependent on your goals. For the purpose of this tutorial, we will do a Standard Least Squares Personality and Effect Leverage Emphasis.

input 10.png

Click "Run"

After a minute or so, a report should appear showing graphs and tables for your genetic diversity measures, somewhat resembling the one below:

input 11.png

The two-way ANOVA is complete!

The two-way ANOVA tells you whether the genetic diversity measure you tested differs significantly among loci and/or populations. To check, look at the Analysis of Variance table and see whether Prob > F, reported in the Model row, is significant (p < 0.05). If so, that genetic diversity measure differs significantly across at least one of the two factors; the Effect Tests table reports Prob > F for Pop and Locus separately.
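To make the F ratios in the report less of a black box, here is a minimal Python sketch of the arithmetic behind a two-way ANOVA with Pop and Locus as main effects. It is not JMP's implementation. It assumes exactly one value per population-locus combination and no missing values, whereas JMP's least squares fit also handles incomplete tables. The Na values are invented, and the sketch stops at the F ratios because the Prob > F values additionally require the F distribution.

```python
def two_way_anova(values):
    """Two-way ANOVA (main effects only, one observation per cell).

    `values` maps (pop, locus) -> measurement and must contain every
    population-locus combination exactly once.
    """
    pops = sorted({pop for pop, _ in values})
    loci = sorted({locus for _, locus in values})
    a, b = len(pops), len(loci)
    if len(values) != a * b:
        raise ValueError("every Pop x Locus combination needs one value")

    grand = sum(values.values()) / (a * b)
    pop_means = {p: sum(values[(p, l)] for l in loci) / b for p in pops}
    locus_means = {l: sum(values[(p, l)] for p in pops) / a for l in loci}

    # Sums of squares for each factor, the total, and the residual.
    ss_pop = b * sum((m - grand) ** 2 for m in pop_means.values())
    ss_locus = a * sum((m - grand) ** 2 for m in locus_means.values())
    ss_total = sum((x - grand) ** 2 for x in values.values())
    ss_error = ss_total - ss_pop - ss_locus

    df_pop, df_locus = a - 1, b - 1
    df_error = df_pop * df_locus
    ms_error = ss_error / df_error

    return {
        "Pop": {"df": df_pop, "SS": ss_pop,
                "F": (ss_pop / df_pop) / ms_error},
        "Locus": {"df": df_locus, "SS": ss_locus,
                  "F": (ss_locus / df_locus) / ms_error},
        "Error": {"df": df_error, "SS": ss_error},
    }


# Hypothetical Na values for three populations at three loci.
example_na = {
    ("PopA", 1): 1, ("PopA", 2): 2, ("PopA", 3): 3,
    ("PopB", 1): 2, ("PopB", 2): 4, ("PopB", 3): 3,
    ("PopC", 1): 6, ("PopC", 2): 5, ("PopC", 3): 7,
}

result = two_way_anova(example_na)
for effect, stats in result.items():
    print(effect, stats)
```

A large F ratio for Pop relative to Locus, as in this invented example, means that most of the variation in the measure lies between populations rather than between loci.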

A Tukey's HSD test should then be performed on any measure that is significant.

Tukey's HSD

Tukey's HSD can now be run on the significant genetic diversity measures resulting from the two-way ANOVA. To do this, click on the down arrow of the plot/table that you are interested in. From there, you will click on the "... Tukey HSD" option from the drop-down menu.

input 12.png

From there, a table will show up and different cells will be highlighted showing if they are significant or not, similar to below.

input 13.png

If you scroll past the table, there will be another table showing the Level (population), a few columns of letters (i.e., A, B, C), then a Least Sq Mean column.

input 14.png

These letters indicate which Levels are statistically similar. Levels that share at least one letter are not significantly different from one another; for example, a Level labeled A is not significantly different from a Level labeled A, B, and C, because both carry an A. Levels that share no letter (e.g., A vs. B) are significantly different from one another.
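The rule for reading the letters can be written out as a short Python sketch. The population names and letter assignments below are hypothetical and are not taken from the pawpaw results; the only logic encoded is that two Levels differ significantly when they have no letter in common.

```python
from itertools import combinations


def significantly_different(letters_a, letters_b):
    """Return True if two levels share no letter in the letters report."""
    set_a = {c for c in letters_a if c.isalpha()}
    set_b = {c for c in letters_b if c.isalpha()}
    return not (set_a & set_b)


# Hypothetical letters report: population -> letters shown in its row.
report = {
    "Pop1": "A",
    "Pop2": "A B",
    "Pop3": "B",
    "Pop4": "C",
}

for (pop_a, let_a), (pop_b, let_b) in combinations(report.items(), 2):
    if significantly_different(let_a, let_b):
        verdict = "significantly different"
    else:
        verdict = "not significantly different"
    print(f"{pop_a} vs {pop_b}: {verdict}")
```

In this made-up report, Pop2 overlaps with both Pop1 and Pop3, so it cannot be separated from either, while Pop1 and Pop3 share no letter and therefore differ significantly.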

These letters can then be included in your genetic diversity statistics table to show which populations differ significantly from one another for each measure, with the Least Sq Mean values showing how large the differences are.