User Guide  OHBAanalysis/HMMMAR Wiki
UPDATING TO THE LATEST VERSION IS STRONGLY RECOMMENDED, DUE TO AN ERROR THAT WAS INTRODUCED AT THE END OF JANUARY 2021, AND CLEANED AT THE END OF MARCH 2021.
This page contains the manual of how to use the toolbox, divided in sections:
General guidelines
The toolbox can be applied to any data modality (EEG,MEG,fMRI,ECoG,etc). However, although the HMM is a general framework, different configurations can adapt better to the different modalities.
In particular, we found that using a Gaussian observation model (i.e. set the MAR order to zero) works well for wholebrain fMRI and is easier to link to the existing literature. Depending on your question, you might consider to let the covariance be shared for all states (see options.covtype below) such that the states are based on just activation changes, or you might prefer to avoid modelling the mean (options.zeromean) such that the states are distinct only in terms of their functional connectivity (Vidaurre et al. 2021). By default, the two will be modelled. This is the approach taken in Vidaurre et al. (2017). If the data is too highdimensional, we can run the HMM on PCAreduced data; see Ahrends et al. (2021) for some discussion about practical considerations on having too many parcels or time series. Although this approach is simple, it has some limitations. An alternative is having a PCA model per state Vidaurre (2021), an approach we refer to as HMMPCA; see the section below Timevarying PCA models for information to set this model up. One more possibility is to focus on the coherence of the signals and be (relatively) independent of the amplitude. This is the approach taken in Cabral et al. (2017), explained in detail below under the section Leading Eigenvector Dynamics Analysis.
In EEG/MEG/intracraneal data, it again depends on the question. If one is aiming to model the evoked response given certain task, the Gaussian model with a shared covariance matrix might be the simplest option. If we are interested in changes in the spectra in a specific set of regions, the MAR (order higher than zero and preferably an odd number) is recommended, preferably with no statewise modelling of the mean (options.zeromean=1). It is suggested to start with an order not very high to avoid overfitting, and then increase it if needed. The MAR tends to overfit nevertheless when applied on a large number of regions (e.g. MEG/EEG whole brain). In order to model whole brain M/EEG data, the timedelay embedded model (TDEHMM), presented in Vidaurre et al. (2017c), is a more useful approach. This is briefly explained in the section Timedelay embedded HMM.
If one is interested in investigating states of brain activity that are specific to a given stimulus or behaviour (where a state is where and how the stimulus is represented in the brain), then we recommend using Supervised HMM models, in which HMM states explicitly model the relationship between the brain activity and the stimulus.
Alternatively to the HMM, the model can be set up to be a mixture of distributions (e.g. of Gaussian distributions, or of decoders), so that we dispense with the temporal structure of the data.
Another consideration somewhat related to the choice of model and the data modality is the use of bandpass filtering. When using the MAR as an observation model (i.e. MAR order higher than zero), one must bear in mind that too aggressive bandpass filtering can quite strongly interact with the estimation of the MAR parameters. That can not only heavily interfere the estimation of the spectra (when computed using the MAR parameters  see below), but can also affect with the HMM inference. As a general rule of thumb, whenever bandpass filtering is a necessity, we might consider to downsample the data such that the sampling frequency is not much more than double the frequency cutoff. For example, if we bandpass filter our data to be between 0.1 and 50Hz, it is advised not to have a sampling frequency higher than 100Hz. Some basic preprocessing tools are provided as part of the toolbox (see below).
Please note that this is not intended to be a rigorous prescription. These guidelines are empirical, are based on our own research experience, and, in any case, are aimed to get you started rather than to provide optimal results. Whether certain configurations work better than others will also depend on the specifics of the data (amount of data, signal to noise ratio, number of channels, etc), and, crucially, on the question being addressed.
Installation
The toolbox does not have any external dependency, except for SPM if one intends to use SPM files as inputs. So, unless this is the case, one just needs to unzip the directory somewhere and add the paths:
addpath(genpath('my_HMMMAR_directory'))
Types of HMMs
The toolbox offer different possibilities (i.e. different families of probability distributions) to model the states. Which one to use will depend on the data modality and other characteristics of the data, as well as the on research question. In brief, the different possibilities and the basic parameters used to set up these are
 In the multivariate autoregressive model each state is a linear dynamical system, where time point t is expressed as a linear combination of the previous times points. This is set through options.order, indicating how many time points one goes back to predict the time point at t.
 In the Gaussian model, each state is a multivariate normal distribution with mean and covariance (or one of the two). The key parameter here are options.zeromean and options.covtype, which will determine whether the mean is modelled as statespecific, whether the covariance is modelled as statespecific, as well as the form of the covariance (diagonal or full matrix).
 In the timedelay embedded HMM, each state can be interpreted as a truncated Gaussian process. Here, the key parameters are options.embeddedlags and options.pca; see below for details.
 In Supervised HMM Models, each state is either an encoding or decoding model relating brain data and stimulus or action. To use this, a different function (tudatrain) is used altogether and does not require a specific option; see below for details.
 In HMMPCA, each state is a PCA projection, simultaneously encoding a timevarying functionalconnectivity pattern and a timevarying dimensionality reduction.
The next section lists all parameters.
HMM estimation
The estimation of the HMMMAR model is done through the function hmmmar(). Further details about the format of the inputs is provided in the next section. (Note that, by default, the data is standardised for each subject  see options.standardise below).
[hmm, Gamma, Xi, vpath, GammaInit, residuals, fehist] = hmmmar (data,T,options)
Input arguments
 data: The data time series.
 T: The length of each trial/segment/subject.
 options: a structure with the HMMMAR training options (see below for further specification).
Output arguments
 hmm: a structure with the estimated HMMMAR model (see below for further specification).
 Gamma: a (no. of time points X no. of states) matrix containing the state time courses. Note that if the MAR order is higher than 0, Gamma has less rows than data because the first order time points of each trial do not have an associated state time course estimation.
 Xi: a (no. of time points X no. of states X no. of states) matrix joint probability of past and future states conditioned on data. Xi has one row less per trial than Gamma.
 vpath: a (no. of time points X 1) vector with the Viterbi path.
 GammaInit: the state time courses used after initialisation.
 residuals: if the model is trained on the residuals (which happens if some of the connections have been clamped to a global value so that they do not drive the states), the value of such residuals.
 fehist: historial of the free energy, with one element per iteration of the variational inference.
The structure options has, or can have, the following fields:
 K: maximum number of HMM states (mandatory, with no default).
 id_mixture: if false, the model will be a mixture of distributions instead of an HMM, ignoring the temporal structure of the data (default to false).
 Fs: Sampling frequency (default to 1).
 order: maximum order of the MAR model; if zero, an HMM with Gaussian observations is trained (mandatory, with no default).
 timelag: the lapse between lags; for example, timelag==2 skips one sample for each sample that is taken, time_lag=3 skips 2, etc (default to 1).
 exptimelag: base for the exponential spacing of regressor samples. Samples are spaced by round(exptimelags^n) with n=0,1,2, never going further in the past than indicated by order. To check which past samples will be used, use the function formorders(); for finding out which value of exptimelag is needed to cover until certain frequency using a given number of lags, use higherorder(). If a value for exptimelag higher than 1 is specified, then timelag is ignored.
 orderoffset: offset to set the starting lag. For example, assuming order==4 and timelag==1, we will use lags {1,2,3,4} for orderoffset=0, or lags {3,4} for orderoffset==2. This parameter becomes particularly useful in situations of strong autocorrelations, as for example in MEG (default to 0).
 covtype: choice of the covariance matrix of the noise; "full" to have a full covariance matrix for each state (with offdiagonal elements different from zero), "sharedfull" to have one full covariance matrix for all states, "diag" to have a diagonal full covariance matrix for each state, and "shareddiag" to have one diagonal covariance matrix for all states (default to "full").
 zeromean: if 1, the mean of the time series will not be used to drive the states (default to 1 if order is higher than 0, and 0 otherwise).
 DirichletDiag: value of the diagonal of the prior of the transition probability matrix; the higher, the more persistent the states will be (default to 10). Note that this value is relative; the prior competes with the data, in such a way that if we have very long time series, DirichletDiag will have an irrelevant effect unless is set to a very big value.
 pca: if specified, the HMM will work on a reduced PCA space  refer to Section Working in a (PCA) reduced space for more information.
 standardise: whether or not to standardise each subject/trial such that each channel has mean equal to zero and standard deviation equal to one (default 1).
 standardise_pc: whether or not to standardise each subject/trial such that each principal component (if PCA has been applied by setting options.pca) has mean equal to zero and standard deviation equal to one (default 0).
 onpower: if 1, the HMM is run on power instead on using the raw time series (default 0).
 filter: if specified, a (Butterworth) bandpass filter will be applied to the data; it should contain a vector with two numbers, [hpc lpc] indicating respectively the high and low cutoff frequencies. If hpc is 0, it will only apply a lowpass filter; if lpc is Inf, it will only apply a highpass filter. Examples: options.filter = [1 45], options.filter = [0 8], options.filter = [30 Inf] . The default is [].
 detrend: if 1, linear detrending is applied to the data (default 0).
 downsample: if >0, it indicates the new sampling rate of the data, to which we want to downsample (default 0).
 sequential: If different from 0, it imposes restrictions on the state transitions. It it is a positive number, say 3, a state transition from state k can only go as far as k+3 (i.e. stay in k, go to k+1, k+2 or k+3); and it cannot retrocede to a previous state. If it is a negative number, say 3, then a state transition can go forwards and backwards (i.e. stay in k, go to k+1, k+2, k+3, k1, k2 or k3). Default is 0; that is, no restrictions.
 Pstructure and Pistructure: a more flexible way to restrict the possible pathways of state transitions. Pstructure is a K by K matrix with zeros or ones indicating which transitions are possible. Pistructure is a 1 by K vector of zeros and ones indicating with which states trials or sequences are allowed to start.
 S: a (no. of channels X no. of channels matrix), defining which autoregression coefficients are to be modelled and how. See section Targeting specific connections below.
 symmetricprior: if 1, the prior for the autoregressive coefficients relative to the relation from channel j to channel i is the same than the prior for the autoregressive coefficients modelling the relation from channel i to channel j (default to 1).
 dropstates: if 1, states whose fractional occupancy falls below a given threshold are eliminated from the model; if 0, they are kept but they are not updated anymore (default to 1).
 tol: minimum relative decrement of free energy (in comparison to the total decrement during the run) for the algorithm to stop (default to 1E5).
 cyc: maximum number of variational inference cycles. The algorithm with stop earlier if tol is reached. (Default to 1000).
 inittype: "hmmmar" for HMMMAR initialisation, "sequential" for an initialisation where trials or segments start on state 1 and progress in sequence towards state K, or "random" for random initialisation. (default to "hmmmar").
 initrep: number of repetitions of the initialisation algorithm, out of which the best will be used as a starting point for the variational inference (default to 5).
 initcyc: maximum number of optimisation cycles in the initialisation algorithm, per repetition (default is 25).
 Gamma: initial estimate for the states probabilities; if provided the initialisation procedure will not be run (optional)
 hmm: initial estimate for the HMMMAR structure; this is typically used as a warm restart using the output of a previous run as a starting point.
 updateGamma: If 0, the posterior state probabilities are left fixed to the initial value, which needs to be specified by parameter Gamma (default to 1).
 decodeGamma: if 1, the Viterbi algorithm will be run at the end of the traning to provide the Viterbi path in the output argument vpath (default to 1).
 useParallel: if 1, it uses the Matlab parallel computation engine; set 0 for architectures that are incompatible with parallel computing (default to 1).
 DirStats: if supplied with a directory name, information about the computation time of the run will be saved therein.
 plotGamma: use it display graphical progress during training. If 1, it shows a piece of the state courses; if 2, it shows a matrix representation of the state time courses and the acrosstrial average (default to 0, which does not show anything).
 verbose: whether to show algorithm progress (default to 1).
Note that, in the current implementation, the option dropstates cannot be used when using stochastic learning. It is however possible to call hmmprune() after the training to remove the unused states, if dropstates wasn't (or couldn't be) used.
[hmm, Gamma, Xi] = hmmprune (hmm, Gamma, Xi, threshold)
Where threshold indicates the minimum fractional occupancy for a state to be kept (default value to 1). Xi is also an optional argument.
Format of the inputs
The input data comes in two variables: data and T. The former contains the time series and the latter specifies the length of each segment (or trial or subject) of data. The time series data can be supplied in the following formats:
 A (no. of time points X no. of channels) matrix with double or single precision numeric elements, with all subjects concatenated
 A (no. of subjects X 1) Matlab cell containing one (no. of time points X no. of channels) matrix per subject.
 A (no. of subjects X 1) Matlab cell containing a string pointing at a file in disk, which contains the data for that subject. The file can be either a text file (.txt), an SPM file (in which case the file name will should be provided with no extension), or a Matlab file (.mat), in which case the data must be enclosed in a matrix named X.
SPM is necessary only if the data is specified as a SPM file, and will only be used to read the files.
Regarding T, it can be either a (N X 1) vector (where N is the total number of trials or segments for all subjects) containing the length of each trial/segment/subject, or a (no. of subjects X 1) cell with each element containing a vector (no. of trials X 1) reflecting the length of the trials for that particular subject.
For example, if there are 2 subjects, the first having 3 trials and the second having 4 trials, and all trials are 20000 time points, then T should be either T={[20000 20000 20000], [20000 20000 20000 20000]} or T=[20000 20000,20000 20000 20000 20000 20000]. This two formats are equivalent when using standard inference (the option by default), and only becomes relevant when performing stochastic inference (see below).
Note that data and T must be either both on cell format or both in matrix/vector format.
Format of the outputs
Struct hmm
In the current version of the toolbox, the observation model can be Gaussian (if options.order=0), a MAR model (if options.order>0), or a TUDA model (see below).
If the states are modelled as Gaussian distributions, the states are described by mean (hmm.state(k).W.Mu_W for state k) and covariance, which expected value can be retrieved by using getFuncConn(hmm,k). Note that both mean and covariance are themselves distributed as Gaussian and Wishart, respectively; the parameters of these distributions can be found in hmm.state(k).W and hmm.state(k).Omega.
If the states are modelled as MAR or TUDA, the states are described by the MAR/TUDA coefficients in hmm.state(k).W.Mu_W and covariance matrix of the MAR/TUDA error, which can also be retrieved by using getFuncConn(hmm,k). Depending on the specified model options, one of these two elements might be missing (for instance, if the model is Gaussian and the mean was not modelled, then hmm.state(k).W.Mu_W is empty).
More in detail, the HMMMAR structure hmm has the following fields:
 state: a struct array with the posterior and prior distributions of each state (see below).
 K: the number of inferred states (note that this number can be lower than the specified K, because some states were dropped).
 P: the (K X K) state transition probability matrices
 Dir2d_alpha: the parameters of the posterior Dirichlet distribution of the state transition probability matrices.
 Pi: the (K X 1) initial state probabilities.
 Dir_alpha: the parameters of the posterior Dirichlet distribution of the initial state probabilities.
 Omega: the parameters of the distribution of the covariance matrix, if modelled globally. If the covariance matrix is "sharedfull", then it corresponds to a Wishart distribution, with parameters Gam_rate, which is (no. of channels by no. of channels), and Gam_shape, which is a scalar. If the covariance matrix is "shareddiag", then it corresponds to a Gamma distribution (one per channel), with parameters Gam_rate, which is (1 by no. of channels), and Gam_shape, which is a scalar.
 prior: the parameters of the prior Dirichlet distribution of the state transition probability matrices, the initial state probabilities and the covariance matrix if modelled globally.
 train: an structure with the training options.
Each element of the state field contains the information of one state in a struct with fields:
 W: either the mean if this is a Gaussian distribution or the Gaussiandistributed MAR/TUDA coefficients. It contains fields Mu_W and S_W with, respectively, the mean and the covariance matrix of the MAR coefficients  not to be confused with the covariance matrix of the error, which is modelled in Omega. Mu_W is (no. of channels by no. of lags X number of channels), with an additional row the first if the mean is being modelled. If the covariance matrix is "full" or "sharedfull", S_W is (no. of channels^2 by no. of lags X no. of channels^2 by no. of lags), otherwise it is (no. of channels by no. of lags X no. of channels by no. of lags X no. of channels).
 Omega: the parameters of the distribution of the covariance matrix that models the error or the covariance if this is a Gaussian distribution, and if this is modelled statewise. If the covariance matrix is "full", then it corresponds to a Wishart distribution, with parameters Gam_rate, which is (no. of channels by no. of channels), and Gam_shape, which is a scalar. If the covariance matrix is "diag", then it corresponds to a Gamma distribution (one per channel), with parameters Gam_rate, which is (1 by no. of channels), and Gam_shape, which is a scalar.
 alpha: the parameters of the Gamma prior distribution on the lags (time domain), with fields Gam_rate, which is a (1 by no. of lags) vector, and Gam_shape, which is a scalar.
 sigma: the parameters of the Gamma prior distribution on the channels (space domain), with fields Gam_rate, which is a (no. of channels by no. of channels) vector, and Gam_shape, which is a scalar.
For more information about the Bayesian hierarchy, please refer to (Vidaurre et al., 2016).
State time courses
The second argument Gamma is a matrix (time points by states), with the probabilities for each state to be active at each time point. Note that the number of time points of Gamma will not coincide with the number of time points in the data when: (i) we use a MAR/TUDA model, (ii) when we use a TDE model (see below), or when we downsample the data using options.downsample. In order to put Gamma in the same dimensions of the data, we can use:
Gamma = padGamma (Gamma,T,options);
The third argument Xi is the probability of having states k1 and k2 in subsequent time points. For large data sets, this can be quite large, so it can be left unassigned.
The fourth argument vp is the Viterbi path (Vidaurre et al., 2016).
Preprocessing
The toolbox offers a few basic preprocessing options:

Filtering: the toolbox offers to possibility of filtering the data using a (Butterworth) lowpass, highpass or bandpass filter, by specifying options.filter. The default is options.filter=0, which does not do any filtering. Otherwise, it must be specified as a vector with two elements, which are the limits of the band of frequency of interest (the first can be 0, and the second can be Inf).

Detrending: In M/EEG it is often the case that there are strong trends in the data (e.g. the magnitude of the signal going up continuously for an entire trial). By using options.detrend=1, we would remove the linear trends in the data for each channel and trial separately.

Standardisation: In order to avoid the segmentation to be dominated by differences between trials in terms of signal scaling or shifting (which would hamper the model to focus on the dynamics), we can standardise (center and make the standard deviation equal to one) the signal for each trial, such that all trials and channels have the same mean and standard deviation. This is done by using options.standardise=1 (which is the option by default).

Correction for volume conduction (i.e. reduction of signal leakage): When source space M/EEG data is analysed, there is a strong artifactual correlation between proximal sources due to the uncertainty in source reconstruction. This is sometimes problematic because it can dominate the estimation of functional connectivity. Here, two solutions are implemented: the symmetric orthogonalisation proposed by Colclough et al. (2015) and the innovations orthogonalisation proposed by PascualMarqui el al. (2017). The former is done with options.leakagecorr=1. The second is done with options.leakagecorr=p, where p>0 is the autoregressive order used in the correction (see PascualMarqui et al. (2017) for details). Default is options.leakagecorr=0, which does not perform any leakage correction.

Computing the Hilbert envelope of the signal, in order to focus on changes in power and ignore phase. In this case, dealing with sign ambiguity is no longer necessary. This is done with options.onpower=1. Typically, we would set options.order=0, as in Baker et al (2014). Setting options.order to something higher than 0 will model the autocorrelation (and lagged correlation between channels) of the power fluctuations.

PCA: The data can be projected in a lowdimensional PCA space in order to alleviate computations. See Working in a (PCA) reduced space later.

Downsampling: The data can be downsampled to a new frequency, indicated (in Hz) in the field options.downsample. This is particularly convenient when working on the power time series, given that changes in power are known to be relatively slow. Note that if options.downsample is specified, options.Fs should be too.
Importantly, the preprocessing is done in this order. Change of the order of the steps could change the results. For example, doing PCA after downsampling is different than doing PCA before (as it is implemented). Currently, it is not an option to change the order of these steps, so, if desired, the preprocessing must be done manually.
Sometimes we might want to estimate the HMM states on some portion of the data, and apply these states (i.e. find when they activate) in a separate portion of the data. We might want to do this, for example, in order to relate different data sets, i.e. rest vs. task, movie watching vs. recollection, etc. In these cases, we need to have the same preprocessing being done on all data. In particular, if we are doing PCA, it is crucial to have the same PCA projection for everything. In order to make easier this kind of approach, it is possible to use the function
[data, T, options] = hmmpreprocess (data,T,options)
Which applies all the preprocessing steps to the data, returning also the struct options with the corresponding preprocessing options being removed so that this is not done again. Here, data and T will correspond to all data concatenated. Therefore, when calling hmmmar later on, it is important to use options as returned from this function specially if PCA is being carried out.
Analysis of the results
The HMM inference provides the state time courses (Gamma) indicating the probability of each state to be active at each time point, and the description of the probability distribution of each state. Both things are described elsewhere in this Wiki; here we introduce a few functions that are useful to interrogate the results. More details about arguments are contained in the headers of the corresponding .m files, under the path utils/analysis.

The mean pattern of activity of each state. The function getMean(hmm,k) outputs the mean of the Gaussian distribution, assuming that the option zeromean was set to 0 (see above). Note that this is mostly useful for the Gaussian distribution, but not that much for the MAR model.

The MAR coefficients of the model. Assuming the observation model is set to be MAR (_order>0), function getMARmodel(hmm,k) gets the MAR coefficients of state k.

The autocovariance matrix (or crosscorrelation matrix) of the model. Assuming a TDEHMM was deployed (by specifying options.embeddedlags), the function getAutoCovMat(hmm,k) gets the corresponding autocovariance matrix of state k.

The functional connectivity of each state. The function getFuncConn(hmm,k) outputs the covariance and correlation matrices of state k, when the states are distributed Gaussian (order=0). In the case of a MAR state model, these refer to the covariance matrix of the residual.

The transition probabilities. The function getTransProbs(hmm) returns the transition probabilies from any state to any other state, without considering the persistence probabilities (i.e. the probability to remain in the same state). The transition probability matrix including the persistence probabilities is contained in hmm.P.

The transition probabilities for a subset of the subjects. The function getMaskedTransProbMats returns the transition probabilities (including persistencies) for subsets of the data as specified in the parameter Masks. This is useful for example if we wish to find differences in the state transitions between groups.

The state time fractional occupancies, using getFractionalOccupancy(Gamma,T,dim). This can refer to either (i) how much time the HMM spends on each state at each time point on average (across trials), or (ii) how much time each subject/trial/session spends in each state (i.e. the average state probability across time, per session or subject). The former, useful for task, is computed when dim=1; the latter, useful to investigate differences in occupancies between subjects, is computed when dim=2.

The state switching rates. The function getSwitchingRate provides a measure of the state switching rate for each session/subject, and can be understood as a measure of stability per subject.

The state life times. The function _ getStateLifeTimes_ returns, per subject/session/trial, a vector of life times per state, containing the number of time points per visit. This reflects the temporal stability of the states.

The state interval times. The function _ getStateIntervalTimes_ returns, per subject/session/trial, a vector of interval times per state, containing the number of time points between visits. The interval times, the life times, the fractional occupancies and the switching rates are sometimes referred to collectively as the "chronnectome".

The maximum fractional occupancy, used to measure whether the HMM is effectively capturing the dynamics on the data. The function getMaxFractionalOccupancy finds the maximum fractional occupancy for each subject/trial. This is a useful statistic to diagnose whether the HMM solution is "mixing well" or states are assigned to describe entire subjects (in which case the HMM is doing a poor job in finding the data dynamics). See this reference for more information about this criterion.

The onset time of the states for each session. The function getStateOnsets provides a vector per session and state of when specifically the state becomes active. That can be used to relate to task information.

The evoked state probability relative to a stimulus. The function evokedStateProbability() can be used to examine how the stimulus modulates the state probabilities.

The similarity between two HMM runs. There are multiple ways to measure the similarity between HMM runs (on the same or comparable data). One of them is to measure the statistical dependence between the state time courses between the two runs. This is done with the function getGammaSimilarity(gamma1,gamma2), which measures the overlapping between the state probabilities after optimally reordering the states.
Apart from inspecting the HMM results, the toolbox provides a few other functions to inspect the data in order to understand the behaviour of the HMM.

Sometimes we might find that the states "do not mix well", that is, the state assignment is dominated by betweensubject differences and fails to capture the withinsession dynamics. For example, when looking at fMRI data, this might happen when the static functional connectivity is too distinct between subjects. In order to measure this, the function corrStaticFC computes the (subjects by subjects) matrix of static FC similarities (measured in terms of correlation) between each pair of subjects. If the obtained values are too low, then covtype='sharedfull' has a higher chance to do a good job.

A problem that can occur when using the MAR model on M/EEG data (specially for a large number or regions) is that the MAR model, when parametrised with a high order, can overfit the data with the result that one state ends up explaining the entire data set (i.e. the other states are dropped from the model). In order to assess this possibility, the function explainedvar_singleMAR.m computes the explained variance for one single MAR model on the data. If this value is too high (too close to 1.0), then the HMM segmentation will probably fail.

An important option when dealing with large data sets is pca, which can be used to reduce dimensionality. In order to assist in the choice of how many PCA components or how much variance we would like to explain by the PCA decomposition, the function explainedvar_PCA performs preprocessing on the data, performs PCA, and returns the explained variance for each PCA component, which we can then use to inspect the rank of the data and decide the number of PC components to use in an informed way.
Estimation of the state time courses and Viterbi path
Although these are outputs of the hmmmar() function, they can also be estimated separately, once we have the HMMMAR structure, using the function hmmdecode() . Importantly, you can use this function to track the estimated states in a different data set (that is, data and T do not need to be the same than in training). Further details about the format of the inputs were provided in the Format of the inputs section
[Gamma,Xi] = hmmdecode(data,T,hmm,0)
[viterbipath] = hmmdecode(data,T,hmm,1)
Input arguments
 data: a (no. of time points X no. of channels) matrix containing the time series.
 T: a vector containing the length of each trial/segment/subject (the sum of T must be equal to the number of time points contained in data).
 hmm: a structure with the HMMMAR training options (see below for further specification).
Output arguments
 Gamma: a (no. of time points X no. of states) matrix containing the state time courses. Note that if the MAR order is higher than 0, Gamma has less rows than data because the first order time points of each trial do not have an associated state time course estimation.
 Xi: a (no. of time points X no. of states X no. of states) matrix joint probability of past and future states conditioned on data. Xi has one row less per trial than Gamma.
 viterbipath: a (no. of time points X 1) vector with the Viterbi path.
Evoked State Probabilities
If we are dealing with task data, we can obtain the evoked state probabilities given certain task. For example, in (Vidaurre et al., 2016) we applied the HMMMAR on selfpaced fingertapping data, where the subjects pressed a button every now and then. Given some vector stimulus (time points by one) containing when the subject pressed the button (1 indicating the data points when that happened and 0 otherwise), we can obtain tasklocked state time courses by
evokedGamma = evokedStateProbability(stimulus,T,Gamma,window,Fs,downsample);
where window indicates the length of the window around the event (in seconds), T and Gamma are specified as before, Fs is the sampling frequency of the data on which the HMM was run (which also corresponds to window), and downsample is an optional argument that must be specified only if the HMM was run with the downsample parameter (in which case, it must match).
Working in a (PCA) reduced space
When the number of channels is high, the MAR can overparametrise and/or be computationally expensive. To get around this issue, one possibility is to work in a lowdimensional space that represents the data as faithfully as possible. In the toolbox, the possibility of working in PCA space is implemented. Other options would be possible (ICA, nonlinear PCA, etc) but they are not currently implemented. The PCA decomposition also works for the Gaussian model (order=0).
The parameter options.pca is used to select the dimensionality of the decomposition. There are various options to specify it. If options.pca < 1, then it represents the amount of variance, and the data will be decomposed in a number of components such that this amount of variance is explained. If options.pca >= 1, then it represents the number of PCA components to keep. It can also accept the option of specifying both, in a two elements vector where the first element is the amount of variance and the second is the number of components. In this case, it will use the minimum of the two. Finally, if options.pca has three elements, and the third element is equal to 2, then it will take the maximum instead of the minimum.
A sometimes better alternative is to have states defined as (probabilistic) PCA models; see Timevarying PCA models for details.
Computing the state spectra
The state spectra can be computed using either a parametric (MAR) or a nonparametric approach. For more information about these, check the Theory Section, or, for even more detail, refer to (Vidaurre et al., 2016). Note that for the parametric approach PDC is estimated by default, whereas for the nonparametric approach PDC is not estimated by default for computational reasons (see below).
The parametric spectra is computed by the function
fit = hmmspectramar(X,T,hmm,Gamma,options)
If the struct hmm is specified (using the output of hmmmar), then the MAR spectra will be computed from the parameters within hmm; in this case, X, T and Gamma can be left empty. If hmm is left empty or unspecified, the MAR parameters will be recomputed using maximum likelihood, in which case X, T and Gamma are mandatory.
Input arguments
 X: a (no. of time points X no. of channels) matrix containing the time series.
 T: a vector containing the length of each trial/segment/subject (the sum of T must be equal to the number of rows of X.
 hmm: a structure with the estimated HMMMAR model.
 Gamma: a (no. of time points X no. of states) matrix containing the state time courses.
 options.fpass: frequency band to be used [fmin fmax] (default to [0 options.Fs/2]).
 options.Nf: no. of frequencies to be computed in the range minHzmaxHz.
 options.p: pvalue for computing jackknife confidence intervals (default 0, in which case no confidence interval will be computed). Using this option is only possible when the MAR models are to be recomputed, i.e. if hmm is left empty.
 options.order: if we want a higher MAR order than that used for training, specify it here (a new set of MAR models will be estimated).
 options.completelags: if MLestimation is true and completelags is true, the MAR spectra will be calculated using maximum likelihood with the complete set of lags up to the specified order, i.e. lags=1,2,...,order (default to 1).
The rest of the arguments of options (e.g. for preprocessing) are the same than for hmmmar().
Output arguments
The variable fit contains a struct array struct with K elements, each of which contains:
 psd: (Nf x no. of channels x no. of channels) power Spectral Density matrix
 ipsd: (Nf x no. of channels x no. of channels) inverse Power Spectral Density matrix
 coh: (Nf x no. of channels x no. of channels) coherence matrix
 pcoh: (Nf x no. of channels x no. of channels) partial Coherence matrix
 pdc: (Nf x no. of channels x no. of channels) Baccala's Partial Directed Coherence
 phase: (Nf x no. of channels x no. of channels) phase matrix
 psderr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the crossspectral density
 coherr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the coherence.
 pcoherr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the partial coherence.
 pdcerr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the partial directed coherence.
 f: (Nf x 1) frequency vector.
The nonparametric spectra is computed by the function
fit = hmmspectramt(X,T,Gamma,options)
Input arguments
 X: a (no. of time points X no. of channels) matrix containing the time series.
 T: a vector containing the length of each trial/segment/subject (the sum of T must be equal to the number of rows of X.
 Gamma: a (no. of time points X no. of states) matrix containing the state time courses.
 options.Fs: sampling frequency of the data.
 options.order: the order with which the HMM was estimated with hmmmar().
 options.embeddedlags: the embedded lags with which the HMM was estimated with hmmmar(). Both order and embeddedlags should be the same as used in options when calling hmmmar().
 options.fpass: frequency band to be used [fmin fmax] (default to [0 options.Fs/2]).
 options.p: pvalue for computing jackknife confidence intervals (default 0, in which case no confidence interval will be computed).
 options.win: number of time points per nonoverlapping window; this parameter relates to the frequency resolution (default to min(T)).
 options.tapers: a numeric vector [TW K] where TW is the timebandwidth product and K is the number of tapers to be used in the multitaper calculation, which must be less than or equal to 2TW1 (default to [4 7]).
 options.pad: the amount of padding used by the fast Fourier transform (FFT), used to improve the efficiency of the FFT and increase the number of frequency bins (higher than 1, default to 0).
 options.tol: tolerance limit for the Wilson algorithm used to get PDC (default to 1e18).
 options.numIterations: no. iterations for the Wilson algorithm used to get PDC (default to 100).
 options.to_do: a (2 by 1) vector, with component indicating, respectively, whether a estimation of coherence and/or PDC is going to be produced (default is [1 0]).
The rest of the arguments of options (e.g. for preprocessing) are the same than for hmmmar().
Output arguments
 psd: (Nf x no. of channels x no. of channels) power Spectral Density matrix
 ipsd: (Nf x no. of channels x no. of channels) inverse Power Spectral Density matrix
 coh: (Nf x no. of channels x no. of channels) coherence matrix
 pcoh: (Nf x no. of channels x no. of channels) partial Coherence matrix
 pdc: (Nf x no. of channels x no. of channels) Baccala's Partial Directed Coherence
 phase: (Nf x no. of channels x no. of channels) phase matrix
 psderr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the crossspectral density
 coherr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the coherence.
 pcoherr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the partial coherence.
 pdcerr: (2 x Nf x no. of channels x no. of channels) interval of confidence for the partial directed coherence.
 sdphase: (2 x Nf x no. of channels x no. of channels) jackknife standard deviation of the phase
 f: (Nf x 1) frequency vector.
The part related to the multitaper calculation is inspired by the code in the Chronux toolbox.
Once all the spectra is computed either way, we use the evoked state probabilities given certain task (see above) to build a HMMbased timefrequency representation. (The same can be done on restingstate data for a any given fragment of the state time courses). This is done by
[psd_tf,coh_tf,pdc_tf] = hmmtimefreq(fit,Gamma,centered);
where the frequency information will be centered for each frequency bin if centered is 1.
Factorising the states spectra into frequency bands
The spectral information computed by either the multitaper or the MAR approach contains (states by frequency bins by channels by channels) values, just for the coherence estimation. This can be an overwhelming amount of information. For this reason, it could be advisable to factorise the frequency bins into a handful of frequency modes or bands. There are two ways of doing this: by defining the bands manually (e.g. theta, alpha, gamma), or by performing the factorisation in a data driven way.
In the simpler case, we would define the bands as a matrix with one row per band, and two columns indicating the beginning and the end of each band, such as
bands = [0 5; 5 10; 10 20; 20 40; 40 Inf];
Given a spectral estimation sp_fit as returned for example by hmmspectramt(), we would then apply
sp_fit = spectbands(sp_fit,bands);
For a datadriven frequency decomposition (into frequency modes) as in Vidaurre et al. (2018), we typically use a subject by subject spectral estimation which must be stored in a cell:
Gamma = padGamma(Gamma,T,hmm_train_options);
sp_fit_subj = cell(nsubjects,1); Tacc = 0;
for j = 1:nsubjects
ind = Tacc + (1:T{j}); Tacc = Tacc + length(ind);
sp_fit_subj{j} = hmmspectramt(X{j},T{j},hmm,Gamma(ind,:),options_spectra);
end
And then apply spectdecompose()
params_fac = struct();
params_fac.Method = 'NNMF';
params_fac.Ncomp = 4;
[spectral_factors,spectral_profiles,spectral_factors_subj] = spectdecompose(sp_fit_subj,params_fac);
In this case, we asked for four modes, and used nonnegative matrix factorisation (NNMF) as the decomposing method. Here, spectral_factors contains the group estimation of the frequency modes, i.e. a struct as returned by hmmspectramt() but where, instead of having for instance 250 frequency bins, we have 4 frequency modes; the equivalent estimation is stored subject by subject in spectral_factors_subj. Finally, spectral_profiles contains a matrix (frequency bins by frequency modes) with the frequency profile of each mode. Note that the NNMF estimation depends on an optimisation process with a random initialisation; therefore, it is recommended to inspect spectral_profiles visually and rerun if the frequency modes look too unclear. Within the function, NNMF is run a few times, and the solution with the most clearly unimodal modes (i.e. with just one peak per mode) is the chosen one.
Generating state brain maps
The toolbox contains functions to display the HMM states in brain space in terms of both amount of connectivity or connectivity. First, we will assume that the states are estimated as Gaussian distributions (as we normally do in fMRI, or when running the HMM on MEG power time series). After, some details are provided on how to generate brain maps in M/EEG source space using frequency information.
To generate brain activity maps, we can either project the states' activity into NIFTI format (volumetric space) or CIFTI format (surface space); which one of the two options to use will depend on the original space of the data. In either case, the HMM should have been trained using the mean parameter (options.zeromean=0); if not, the channel variance will be used instead. To generate the maps, we would use:
maps = makeMap(hmm,k,parcellationfile,maskfile,onconnectivity,centermaps,scalemaps,outputfile,wbdir)
where hmm is an hmm struct as returned from hmmmar.m (or an matrix/array of precomputed values – see inline documentation within the function), and k specifies which states will be projected into brain space, which can be a single number, a vector, or empty, in which case it projects all the states. If parcellationfile (which contains the original parcellation or ICA decomposition) is a NIFTI file, then also a maskfile needs to be specified, and OSL needs to be installed an in the path; if it is a CIFTI file, then it has to be a dtseries.nii file (see below to convert from func.gii to dtseries.nii). The flag onconnectivity indicates whether to build maps based on amplitude (option 0) or on connectivity, in which case it can be based on the degree of the covariance matrix (option 1), the degree of the correlation matrix (option 10), the first PCA component of the covariance matrix (option 2), or the first PCA component of the correlation matrix (option 20). Options centermaps and scalemaps are about whether or not to demean and standardize the maps across states. If wbdir is the route to the HCP Workbench, which is needed if parcellationfile is a CIFTI. If parcellationfile is NIFTI and wbdir is specified, then the maps will also be projected to surface space (CIFTI). Note that if parcellationfile is a CIFTI file, then the HMM states will be made into CIFTI files only. In this case, maskfile can be left empty. The output of the function, maps, is a (voxels by states) matrix with the projected maps. See the withinfile function documentation for more details about the different options.
About connectivity, there are two additional ways to plot the states: as a glass connectivity brain, or as a circle graph (with nodes/regions placed on the circumference and coloured lines indicating the connections). For a glass connectivity brain map, we would use
graphs = makeBrainGraph(hmm,k,parcellationfile,maskfile,centergraphs,scalegraphs,partialcorr,threshold,outputfile)
where as before k specifies which states will be projected into brain space (see above), partialcorr would be use to base the plots on partial (directed) correlations, and threshold would be used to only display the strongest connections (for example, threshold=0.95 only the 5% strongest connections would be shown). Parameters centergraphs and scalegraphs are as before. The output graphs is a (voxels by voxels by states) matrix with the projected connectivity graphs. To use makeBrainGraph, OSL has to be installed and on the path. Again, see function documentation for more details.
For a circle graph, we would do
graphs = makeConnectivityCircle(hmm,k,labels,centergraphs,scalegraphs,partialcorr,threshold)
where labels would contain the name of the regions, in a cell of strings, and k is as before. The output graphs is a (regions by regions by states) matrix with the connectivity graphs.
As mentioned, these functions are used to map either instantaneous activity or connectivity into brain space. In the case of MEG or EEG (when ran in source space), this information is normally a function of frequency. The following functions are made to create maps and graphs that are analogous to the above but using the frequency information, sp_fit, as estimated from hmmspectramar(), hmmspectramar(), spectbands() or spectdecompose() (see above for details about those):
maps = makeSpectralMap(sp_fit,k,freqindex,parcellationfile,maskfile,centermaps,scalemaps,outputfile,wbdir)
graphs = makeSpectralGraph(sp_fit,k,freqindex,type,parcellationfile,maskfile,centergraphs,scalegraphs,threshold,outputfile)
graphs = makeSpectralConnectivityCircle(sp_fit,k,freqindex,type,labels,centergraphs,scalegraphs,threshold)
In all these functions, freqindex tells which frequency bin or band to plot: it can be an index (e.g. band 2, or frequency bin 10) or a range to sum across (e.g. 1:10). In the connectivity functions, type refers to the spectral estimate to use: 1 for coherence, 2 for partial coherence.
As a support for this functions, two utilities are provided: parc_gii2dtseries() which maps from func.gii to dtseries.nii, so that the parcellation can be used for example by makeMap(); and parc_surf2vol() to convert a parcellation from surface (CIFTI) into volumetric space (NIFTI).
parc_surf2vol(input,surface,volume_space,output,wbdir)
parc_gii2dtseries(input,output,wbdir)
Finding subjectspecific states
Once we have run the inference process on the entire data, we have state time courses for each subject and grouplevel states (i.e. the same states for all subjects). In some cases, we may be also interested on the specific manifestation of the states for each subject, in terms of their state distribution representation (e.g. MAR) or in terms of their spectral properties. To obtain subjectspecific states we just need to use the state time courses for a particular subject and reinfer the states given such state time courses. From there we can estimate subjectspecific spectral information.
Assuming that the data for a given subject is stored in {X_subj,T_subj} and that we have the state time courses for this subject in Gamma_subj, we would do
[hmm_subj,Gamma_subj,vpath_subj] = hmmdual(data_subj,T_subj,hmm);
With the structure hmm_subj, we can estimate subjectspecific spectral information exactly as detailed above.
Importantly, the data should have been standardised beforehand, using the entire data set. Otherwise, if the standardisation were done within hmmdual, the properties of the data would change with respect to the rest of the data, and the (dual) estimation would not be correct.
Other model varieties
The basic state distribution is the Gaussian model, of which the MAR model can be considered an extension. Next, we describe some additional state distributions: the TimeDelay Embedded HMM (TDEHMM), Temporally Unconstrained Decoding Analysis (TUDA), timevarying PCA, and Leading Eigenvector Analysis (LEidA). Some basic intuition about when to use what is given in the General Guidelines above
Timedelay embedded HMM
When modelling M/EEG data over a sufficient number of regions (e.g. whole brain), using the MAR is not only computationally costly but also has problems of overfitting. One possibility to alleviate these issues is to model only the withinchannel AR coefficients, such that we do not model the crosschannel interactions and focus on the spectral properties of the individual channels (see section Targeting specific connections). A whole new possibility is to use the Timedelay embedded HMM (TDEHMM), as described in Vidaurre et al. (2018). The idea is that we model the main principal components (eigenvectors) of a Gaussian distribution defined not only over space but also over a window of time around the point of interest. This approach, inspired by and related to the theory of Gaussian processes, can capture both spatial and spectral properties without overfitting. The limitation is that the use of dimensionality reduction over this (space x time) by (space x time) matrices that represent the states makes the approach to be relatively blind to the high frequencies. In order to use this approach, we would need to specify:
 options.embeddedlags, defining the lags that form the time window. For example, if we set _options.embeddedlags_=7:7 we would be covering a window of 15 time points around the time point of interest. This is the configuration that we used in Vidaurre et al. (2018) to model MEG data source reconstructed to 42 regions, at a sampling frequency of 250Hz; that is, the window contains 15/250=60ms in this case. Making this window shorter will allow the model to pay attention to higher frequencies. Alternatively, we could also model interspersed lags, e.g. _options.embeddedlags_=15:2:15, although this possibility has not yet been explored.
 options.pca, defining the number of eigenvectors to model. For theoretical and practical reasons, when working with sourcereconstructed M/EEG data with correction for volume conduction (aka leakage correction), it is recommended to use double the number of regions (see Vidaurre et al. (2018) for details); in the example, that would correspond to options.pca=84.
Also, we need to specify
 options.order=0
 options.covtype='full'
 options.zeromean=1
The states are thus defined in time and space and contain spectral information. To retrieve this spectral information, we can use hmmspectramt(), described in section Computing the state spectra.
Intuitively, using a larger window (options.embeddedlags) will make the model focus more in slower frequencies and neglect high frequencies more. On the contrary, reducing this window will help to identify high frequencies. Likewise, having mode PCA components will include more information about the high frequencies.
Supervised HMM Models
For experimental data where different stimuli are presented on different trials, supervised HMM models can be used where each state explicitly models the relationship between the brain activity and the stimulus design matrix. Supervised HMM models allow researchers to track trialspecific neural dynamics of stimulus processing and decision making with high temporal precision, addressing a major limitation of traditional analysis approaches: that they rely on consistent timing of these processes over trials.
Here, unlike the standard HMM, each state is either a decoding model, called TUDA (as in Vidaurre et al. (2019)) or an encoding model, called STRM (as in Higgins et al. (2022)). In our more recent work, we show that the STRM model can achieve the same functions as decoding model HMMs, but with number of additional benefits. These include being more interpretable, achieving slightly higher classification accuracy and overcoming specific technical challenges in some experimental designs. For these reasons, we currently recommend users to use the STRM model formulation of the HMM as outlined in Higgins et al. (2022).
For either of these analyses, the main function to perform this analysis is invoked as:
[tuda,Gamma] = tudatrain (X,Y,T,options)
The input arguments are:
 X: The neural data, specified as a (time by regions) matrix. The other input formats are not yet implemented for supervised HMMs.
 Y: The value of the stimulus or behavioural action to relate the brain activity to. It can either be a (time by features) matrix with the same number of time points as X, or a (trials by features) matrix with the same number of rows as trials are specified in T. See below about important considerations about the nature and format of the stimulus.
 T: The length of each trial or segment.
 options: a structure with the training options (see below for further specification).
The output arguments are:
 tuda: The TUDA or STRM model, which fields are analogous to those of an HMM model (for details about how to extract the state decoding parameters see below).
 Gamma: A (no. of time points X no. of states) matrix containing the state time courses.
Many of the options that can be specified in the options struct match those of the HMM explained above, although stochastic inference is not yet implemented for TUDA. Apart from the preprocessing options (go here for details), the main options for TUDA (specified as fields in options) are:
 K: maximum number of TUDA/STRM states (mandatory, with no default).
 encodemodel: a Boolean value, if true the HMM is fit as an encoding model, if false it is fit as a decoding model (default is true).
 parallel_trials: set to true if the data is a collection of stimulus repetitions or trials; and false for continuous data, or data where the trials are not consistent in terms of stimulus presentation of behavioural action.
 Nfeatures: it is possible to perform some preselection of the channels that will be fed into TUDA, based on their predictive power. This is useful when the number of channels is too large. This option indicates how many channels are going to be preselected (not to confuse with the features to be predicted). The default is not to do channel screening and include all in the model.
Once the model is trained, we can extract the decoding or encoding coefficients for each state using
beta = tudabeta(tuda)
Here, beta is a (no. states by no. channels by no. predicted features) array of decoding or encoding coefficients.
Also, it is possible to recompute the state time courses (or apply the TUDA decoding models or STRM encoding models to a different data set), using a precomputed model (obtained with tudatrain):
[Gamma,vpath] = tudadecode(X,Y,T,tuda)
The output arguments are:
 Gamma: a (no. of time points X no. of states) matrix containing the state time courses. Note that if the MAR order is higher than 0, Gamma has less rows than data because the first order time points of each trial do not have an associated state time course estimation.
 vpath: a (no. of time points X 1) vector with the most likely sequence of states (referred to as Viterbi path).
Importantly, each state corresponds to a standard Bayesian regression model. That is, the response is treated as a continuous variable (i.e. the noise is assumed to be Gaussiandistributed). In practice, this means that the TUDA model is not optimally suited to decode a categorical variable, for which we again recommend using the STRM model implementation as outlined in Higgins et al. (2022).
Timevarying PCA models
When faced with highdimensional data (eg fMRI data parcellated over many parcels, or highdensity EEG data in sensor space), a typical approach would be to apply PCA on the data and then use the HMM on PCA space — which we would do here with the options.pca parameter. However, whenever the data covariance is very stationary this approach might be suboptimal; furthermore, as explained in Vidaurre (2017), even if the data covariance is stationary, this approach can bias the results in non trivial ways. An alternative is to have a HMM where the observation model itself is a probabilistic PCA model, which not only effects dimensionality reduction but also encodes a pattern of functional connectivity (simply because PCA is based on the data covariance), therefore doing both things in one single step. To use HMMPCA
 options.lowrank=p
where p would indicate the number of PCA component with the observation model. This option is not compatible with options.pca. Note that the HMMPCA model does not model the mean, as a standard Gaussian would do. Therefore, it only appropriate to model changes in covariance (more specifically, any change in amplitude would be absorbed into the covariance)
Leading Eigenvector Dynamics Analysis
In modelling fMRI, we might be mainly interested in functional connectivity. Although the assessment of functional connectivity is never free from amplitude modulations, we can place more emphasis on the former by using an appropriate observation model. In particular, the Leading Eigenvector Dynamics Analysis (LEiDa) approach, proposed by Cabral et al. (2017), is based on computing the phasic relations between the signals at each time point (measured in terms of coherence), and estimate the segmentation on those. More specifically, the time series to be HMMsegmented correspond to the first eigenvector (main trend) of these functional connectivity matrices, which are estimated using the Hilbert transformation time point by time point. In order to use LEiDA, we need to specify:
 options.leida=1
In Cabral et al. (2017), the kmeans algorithm was used instead of the HMM. Here, we can use LEiDA in a full probabilistic setting based on the HMM. That opens the possibility of using either options.covtype='shareddiag' or options.covtype='sharedfull'; the second can leverage the geometric properties of the Leading Eigenvector space and is thus recommended. The use of LEiDA is not compatible with options.order>0, options.pca and options.S, and statespecific covariance matrices are not recommended.
Advanced topics
We next introduce some advanced topics. These basically are some specific options and features that you might not need in all cases. Most importantly, the options to use stochastic inference are described for those cases when training the HMM is too computationally costly (either memory or computation time). Also, the sign flipping correction is important when working on sourcereconstructed M/EEG data.
Statistical testing
There are two ways of using statistical testing implemented in the toolbox. In the first one, we have trials or subjects distributed across a number of conditions, and we would like to test whether or not there are differences across conditions. This is done with the hmmtest() function. In the second one, we want to see how the state activations is modulated by some stimulus or decision. In this case, we perform statistical testing for each time point in the trial (across trials). This is done with the function hmmtest_epoched().
More specifically, assuming the data is divided into different conditions (e.g. a blockdesign task), the function hmmtest() runs GLMbased permutation testing in order to see if there is any difference in terms of fractional occupancies, switching rates and (optionally) transition probability matrices between the conditions across trials or subjects. The function receives a (no. trials by no. conditions) or (no. subjects by no. conditions) design matrix Y. Tests are done at the subject or at the group level, and, in the group level case, the tests can be paired or unpaired. The statistic used to evaluate the surrogates and unpermuted data is the squared error. Please note that tests are not corrected across multiple comparisons so that must be done manually.
tests = hmmtest (Gamma,T,Tsubject,Y,options,hmm,Xi)
Input arguments
 Gamma: Inferred state time courses (from the hmmmar() function).
 T: Length of time series (no. trials by 1) in number of time points.
 Tsubject: Length of time series per subject (no. subjects by 1); if empty it will be interpreted that there is only one subject.
 Y: Design matrix, which can be (no. subjects by conditions) if testing is at the group level or (no. trials by conditions) if testing is at the subject level.
 options: Permutation testing options.
 hmm: the HMM structure obtained from hmmmar().
 Xi: joint probability of past and future states conditioned on data (optional and only used if options.testP==true), as obtained from hmmmar(). If unspecified, it will be approximated.
The structure options has, or can have, the following fields:
 Nperm: Number of permutations (default 1000)
 subjectlevel: run subjectlevel tests? if 0, grouplevel testing will be run. If unspecified it will be guess based on the dimensions of Y.
 confounds: (no. trials by no. confounds) matrix of confounds, which will be regressed out before permuting.
 paired: paired tests? false by default (ie unpaired testing).
 pairs: if paired testing, a vector indicating the pairs (eg [ 1 1 2 2 ... ]), with as many elements as subjects.
 testP: testing on the transition probability matrices? false by default.
The output argument, tests, is a struct with either of the following fields:
 subjectlevel: subjectlevel tests, with fields containing pvalues for the different measures: p_fractional_occupancy (no. states by no. subjects), p_switching_rate (1 by no. subjects) and p_transProbMat (no. states by no. states by no. subjects).
 grouplevel: grouplevel tests, with fields: p_fractional_occupancy (no. states by 1), p_switching_rate (one pvalue) and p_transProbMat (no. states by no. states).
Additionally, the fractional occupancy is also tested across states using the NPC algorithm (Vidaurre et al, 2019), giving (across states) combined pvalues grouplevel.p_aggr_fractional_occupancy (one value) or subjectlevel.p_aggr_fractional_occupancy (one value per subject).
On the other hand, the function hmmtest_epoched() will do regressionbased permutation testing for each time point within the trial. In this case, the function expects a paradigm where all the trials (or elements of T) have the same length.
pvalues = hmmtest_epoched (Gamma,T,Y,Nperm)
Input arguments
 Gamma: Inferred state time courses (from the hmmmar() function).
 T: Length of time series (no. trials by 1) in number of time points. All elements of T must be equal.
 Y: Vector containing one value per trial, corresponding to the value of the stimulus for that trial.
 Nperm: Number of permutations to be performed (default 10000).
The only output argument, _ pvalues_, is a vector with one pvalue per time point within the trial (so, its length is equal to T(1)).
Stochastic inference
The HMM variational inference iteratively estimates the state distributions and the state time courses from data. When the data set is very large, estimating the state time courses (which can be parallelizable across trials but requires sequential computations within each trial) can take a significant amount of time. Likewise, the estimation of the state distributions needs to have the entire data set in memory, which in turn can take a significant amount of space in memory and can potentially lead to timeconsuming memory swapping. For this case, an alternative stochastic inference scheme is included in the toolbox. Details apart, this is based on taking subsets or batches of subjects at each iteration instead of the entire data set. Stochastic inference will be run when the parameter options.BIGNbatch is specified. This parameter indicates the amount of subject in each batch. A very small number (e.g. 1) will lead to very noise but quick and memorycheap estimations at each iteration. A higher number (approaching the total number of subjects) will resemble the nonstochastic inference. In practice, a small but not too small number will provide good solutions at a cheap cost. For example, for the HCP restingstate fMRI data set, which has up to 1000 subjects, we found 50 to be a good tradeoff. For the MEG data set, with ~50 subjects, we used options.BIGNbatch=5. (More details about the algorithm and the experiments are provided in the NeuroImage paper).
Another relevant aspect is the initialisation. The initialisation for the stochastic inference method is implemented by running the standard inference on subsets of subjects, and combine all runs (see the paper for more details). Importantly, when stochastic inference is used, the parameters tol, inittype, initrep and initcyc refer only to this initialisation step. The size of these subsets of subjects on which we build the initial model is specified by the parameter BIGNinitbatch.
Although options.BIGNbatch (and secondarily BIGNinitbatch) are the most relevant parameters, there are other parameters which have a modest impact in the inference.
 BIGcyc: maximum number of variational inference cycles. The algorithm with stop earlier if BIGtol is reached. (Default to 1000).
 BIGinitcyc: The solution is initialised in batches, each with using the standard HMM inference. BIGinitcyc (and not options.cyc) tells how many iterations do these inferences run.
 BIGtol: minimum relative decrement of free energy (in comparison to the total decrement during the run) for convergence to be considered (default to 1E5). See also BIGundertol_tostop.
 BIGmincyc: minimum number of inference cycles to run, considering that the estimations in the first iterations can be quite noisy (default 10).
 BIGundertol_tostop: number of inference steps with free energy decrements under options.BIGtol for the algorithm to stop (default 5).
 BIGcycnobetter_tostop: number of iterations without improvement before stopping, given that stochastic inference can have occasional no improvements in the free energy (default 20).
 BIGinitrep: number of stochastic initialisations, of which the best will be chosen (default 1).
 repetitions: If repetitions is higher than 1, it will run the training (after initialisation) and keep the run with the best free energy (default to 1).
 BIGforgetrate and BIGdelay define the learning rate function (see paper for details). Defaults are to 0.9 and 1.0.
 BIGbase_weights: The use of this parameter is designed to promote a democratic sampling of the subjects (see paper). The smaller, the more frequentlysampled subjects are discouraged to be sampled again; a value of 1 indicates standard (unbiased) sampling (default 0.9).
 BIGcomputeGamma: If 1, a nonstochastic estimation of Gamma will be provided at the last iteration (default to 1).
 BIGdecodeGamma: If 1, a nonstochastic estimation of the Viterbi path will be provided at the last iteration (default to 1).
 BIGverbose: whether to show progress in the stochastic inference (default to 1).
Importantly, if stochastic inference is to be used, the inputs (both data and T; see above) must be specified as a cell, with one element per subject. That is, data{1} will be a matrix with the data for subject 1, and T{1} will contain a vector with the trial lengths for subject 1 (or just length(data{1}) if this is continuous data with no breaks).
Semisupervised estimation
Semisupervised prediction is also done using the hmmmar() function, but the first argument data is instead specified as a struct, with fields X (time series; with dimension no. of time points X no. of channels) and C (states, no. of time points X no. of states). Some rows of C will be vectors with values between 0 and 1, and summing up to 1. These will be interpreted as fixed values of the state time courses and will remain unmodified. The rest of the elements of C must be specified as NaN, indicating that the state time courses must be estimated at these time points. Note that if the MAR order is higher than 0, the first order rows of each trial of C will be ignored.
Targeting specific connections
This is implemented using the option options.S. If the position (i,j) is 1 then the autoregression coefficients that predict channel j from channel i are not modelled (for example, if all offdiagonal elements of S are 1, then the model would be an ensemble of unrelated AR models). If the position (i,j) is 1, the corresponding coefficients are modelled normally, i.e. they participate in driving the state transitions. If the position (i,j) is 0, the corresponding coefficients are modelled globally, i.e. they do not drive the state and have a stationary value all over the training (default, all values equal to 1). In the current version, this is implemented only for diag or shareddiag error covariance matrices (parameter covtype).
Computation of the free energy
The main hmmmar() function provides the history of the free energy throughout the variational inference process. We can recompute the free energy using hmmfe(). The value obtained may be slightly lower than the final free energy calculation during training. This is because (unless supplied) the state time courses are recomputed once more within hmmfe(), as they are necessary for the free energy calculation. Therefore, that could be considered as an additional iteration of the inference process, and, thus, will reduce the free energy a little extra amount.
fe = hmmfe(data,T,hmm);
fe = hmmfe(data,T,hmm,Gamma,Xi);
Input arguments
 X: a (no. of time points X no. of channels) matrix containing the time series.
 T: a vector containing the length of each trial/segment/subject (the sum of T must be equal to the number of rows of X).
 hmm: a structure with the estimated HMMMAR model.
 Gamma: a (no. of time points X no. of states) matrix containing the state time courses (optional).
 Xi: a (no. of time points X no. of states X no. of states) matrix joint probability of past and future states conditioned on data (optional).
Output arguments
 fe: the variational free energy.
Crossvalidated loglikelihood
Crossvalidation is an alternative to the free energy for model selection of the MAR parameters (yet not of K), more principled but computationally more demanding
[mcv,cv] = cvhmmmar (X,T,options)
Input arguments
 X: a (no. of time points X no. of channels) matrix containing the time series.
 T: a vector containing the length of each trial/segment/subject (the sum of T must be equal to the number of rows of X.
 options: a structure with the HMMMAR training options, which are the same than for the HMMMAR model plus the following options:
 options.cvfolds: either the number of CV folds, or the result of calling to the Matlab function cvpartition, which contains the CV structure.
 options.cvrep: the number of HMMMAR per CV iteration, out of which the best according to the free energy will be selected (default to 1).
 options.cvverbose: whether to show algorithm progress (default to 0).
Output arguments
 mcv: the averaged crossvalidated likelihood.
 cv: the averaged crossvalidated likelihood.
Simulating data from a HMMMAR model
Simulating both the hidden state time courses and the data is possible with the function:
[X,T,Gamma,statepath] = simhmmmar(T,hmm,Gamma,nrep,sim_state_tcs_only)
Input arguments
 T: a vector containing the length of each trial/segment/subject.
 hmm: an hmm structure, with options specified in hmm.train.
 Gamma: the state courses (they will be simulated if empty).
 nrep: if the state time courses are to be simulated, nrep parametrises the multinomial sampling distribution. A value of 1 (by default) is recommended.
 sim_state_tcs_only: when specified to 1, only the state time courses will be simulated (default to 0).
Output arguments
 X: simulated observations.
 T: a vector containing the length of each generated trial/segment/subject.
 Gamma: simulated state time courses.
Handling the lags
In order to manipulate the configuration of the MAR lags, there are two functions that can be useful
[orders,order] = formorders(order,orderoffset,timelag,exptimelag)
[order,exptimelag] = higherorder(minHz,Fs,L,orderoffset)
The first provides the set of lags and the maximum lag for the configuration specified by the input parameters (see the hmmmar() function for more information about those).
The second, provided with the slowest frequency we want to cover (minHz), the sampling rate of the data (Fs), the number of desired lags (L) and the offset (orderoffset, check the HMMMAR parameters above for more details), returns the corresponding order and exptimelag parameters that we need to specify using an exponential lapse.
In general, based on our latest experience it is preferable to work with a low order and avoid using exptimelag or orderoffset if possible.
Sign flipping to correct for dipole ambiguity
The sign ambiguity issue (present when dealing with sourcereconstructed M/EEG data) is taken care of by the following function, which is not called by the hmmmar() function or any of the other functions, and, then, needs to be invoked explicitly. This step is however not necessary if working with the power time series (by setting options.onpower=1).
[flips,scorepath] = findflip(X,T,options)
Input arguments
 X: a (no. of time points X no. of channels) matrix containing the time series.
 T: a vector containing the length of each trial/segment/subject (the sum of T must be equal to the number of rows of X).
 options.maxlag: max lag to consider, considering that we are including lagged autocovariance matrices in the calculation (default to 4).
 options.nbatch: no. of channels to evaluate at each iteration of the optimisation process (default to no. of channels).
 options.noruns: how many random initialisations will be carried out (default to 50).
 options.standardize: if 1, standardize the data (default to 1).
 options.maxcyc: for each initialization, maximum number of cycles of the greedy algorithm (default to 100 * N * no. of channels).
 options.mincyc: for each initialization, minimum number of cycles of the greedy algorithm (default to 10).
Output arguments
 flips: (length(T) X No. channels) binary matrix saying which channels must be flipped for each of the time series.
 scorepath: cell with the score of the winning solutions.
The effective flipping is done with the function:
X = flipdata(X,T,flips)
which uses the output of findflips().